%-----------------------------------------------------------------------
%                  Open SkyQuery
%-----------------------------------------------------------------------
\documentclass[11pt,twoside]{article}
\usepackage{adassconf}

\begin{document}
\paperID{P2-18}
%%%% ID=P2-18

\title{Open SkyQuery -- VO Compliant Dynamic Federation of Astronomical 
Archives}  
%%%%% titlemark too long -- changed FO
%%\titlemark{Open SkyQuery -- VO Compliant Federation of Astronomical Archives}
\titlemark{Open SkyQuery}

\author{Tam\'as Budav\'ari\altaffilmark{1}, Alex
Szalay\altaffilmark{1}, Tanu Malik\altaffilmark{1}, Ani
Thakar\altaffilmark{1}, William O'Mullane\altaffilmark{1}, Roy
Williams\altaffilmark{2}, Jim Gray\altaffilmark{3}, Bob
Mann\altaffilmark{4}, Naoki Yasuda\altaffilmark{5}}

\iffalse	%%%(FO) Change to a standard list
\altaffiltext{1}{Dept.\ of Physics \& Astronomy, Johns Hopkins
University, Baltimore, MD 21218, USA}
\altaffiltext{2}{CACR, California Institute 
of Technology, Pasadena, CA, 91125, USA}
\altaffiltext{3}{Microsoft Bay Area Research Center, San Francisco,
CA 94105, USA}
\altaffiltext{4}{Institute for Astronomy, University of Edinburgh, 
%Royal Observatory, Blackford Hill, 
Edinburgh, EH9 3HJ, UK}
\altaffiltext{5}{National Astronomical Observatory of Japan, Tokyo
181-8588, Japan}
\else
  \vspace*{2ex}   % Corrected style FO
\altaffilmark{1}\affil{Dept.\ of Physics \& Astronomy, Johns Hopkins
   University, Baltimore, MD 21218, USA}
\altaffilmark{2}\affil{CACR, California Institute
   of Technology, Pasadena, CA, 91125, USA}
\altaffilmark{3}\affil{Microsoft Bay Area Research Center, San Francisco,
   CA 94105, USA}
\altaffilmark{4}\affil{Institute for Astronomy, University of Edinburgh,
   Edinburgh, EH9 3HJ, UK}
\altaffilmark{5}\affil{National Astronomical Observatory of Japan, Tokyo
   181-8588, Japan}
\fi

% contact
\contact{Tam\'as Budav\'ari} \email{budavari@jhu.edu}

% index
\paindex{Budav\'ari, T.} 
\aindex{Szalay, A. S.} 
\aindex{Gray, J.}
\aindex{OMullane@O'Mullane, W.} 
\aindex{Williams, R.} 
\aindex{Thakar, A.} 
\aindex{Malik, T.} 
\aindex{Yasuda, N.} 
\aindex{Mann, R.}

\authormark{Budav\'ari et al.}

\keywords{SkyQuery, Virtual Observatory: table, ConeSearch, Virtual Observatory: QL, ADQL, Virtual Observatory: registry, web services}

%-----------------------------------------------------------------------
%                  Abstract
%-----------------------------------------------------------------------

\begin{abstract}
We discuss the redesign of the SkyQuery architecture, originally
built as a simple proof of concept for dynamic federation of
astronomical archives. In keeping with the Virtual Observatory
philosophy of hierarchical services, the design of Open SkyQuery
is based upon higher level services extending the basic
functionality of the current VO standard, the ConeSearch. Open
SkyQuery implements the VO specifications for data access,
retrieval and spatial join. Data are published via Web Services
called SkyNodes, which provide rich functionality, including footprint
coverage. SkyNodes are discovered through the VO registry. We
propose at least two levels of SkyNode compliance (Basic
and Full). We will also provide templates for publishing data
through a SkyNode.
\end{abstract}

%-----------------------------------------------------------------------
%                  Main body
%-----------------------------------------------------------------------

\section{Motivation}

With the advent of large CCD detectors, the way astronomy is done
is changing rapidly. Because of the exponential growth in the size and
speed of silicon chips, new surveys are expected to have
significantly higher data rates. Survey projects are becoming both the
authors and the publishers of their data (Szalay et al.~2002). In this
exponential world, only 10\% of all astronomical information is
available in central archives at any given time. To access all
up-to-date observations, we need a way to federate geographically
separated astronomical archives.

Current sky surveys such as SDSS, 2MASS and DPOSS have shown that
discoveries are made at the boundaries, by going deeper or by using
more colors. Because they cover different wavelength ranges, surveys
complement one another well, provided they can be combined. In the
past, crossmatching many catalogs was prohibitively complex and
expensive. The Virtual Observatory, and Open SkyQuery in particular,
aim to make it simple and affordable.


\section{The Prototype SkyQuery}

The prototype SkyQuery was built last year in six weeks as a
feasibility study (Budav\'ari et al.~2003, Malik et al.~2002). It used
a hierarchy of XML Web Services to implement a distributed query
system that provided seamless access to SDSS, 2MASS and FIRST
data. Since the launch of the SkyQuery web site, many other catalogs
have become available (Purger et al.~2004), including the Isaac Newton
Telescope's Wide Field Survey (INTWFS), IRAS, NVSS, 2dF, PSCz, 2QZ and
ROSAT; see Figure~\ref{P2-18:fig:scr}.

\begin{figure}
\epsscale{0.7}
\plotone{P2-18_1.ps}
\caption{The SkyQuery web site currently provides access to 10 catalogs,
altogether close to one terabyte (1\,TB) of online astronomical data.}
\label{P2-18:fig:scr}
\end{figure}

\section{Building on Virtual Observatory Standards}

The SkyQuery architecture is being redesigned to use emerging VO
standards such as VOTable, the Astronomical Data Query
Language (ADQL; Yasuda et al.~2004), the VO Query Language (VOQL) and
the VO Registry services (Greene et al.~2004). The data will be
published by SkyNodes, XML Web Services that extend the basic
functionality of the ConeSearch.
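
The ConeSearch that SkyNodes extend is a plain HTTP GET request that
carries a sky position (\texttt{RA}, \texttt{DEC}, in decimal degrees)
and a search radius (\texttt{SR}, in degrees), and is answered with a
VOTable. The Python sketch below only builds such a request URL; the
function name and the service address are hypothetical placeholders,
not a real SkyQuery endpoint.

\begin{verbatim}
from urllib.parse import urlencode

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a ConeSearch request URL (RA, DEC and SR in decimal degrees)."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0
            and radius_deg > 0.0):
        raise ValueError("position or search radius out of range")
    if base_url.endswith(("?", "&")):
        sep = ""    # base URL already ends with a parameter separator
    elif "?" in base_url:
        sep = "&"   # base URL already carries other parameters
    else:
        sep = "?"
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return base_url + sep + query

# Hypothetical service address, for illustration only.
url = cone_search_url("http://example.org/cone?", 180.0, -1.5, 0.05)
# url == "http://example.org/cone?RA=180.0&DEC=-1.5&SR=0.05"
\end{verbatim}

Only minimal range checks are made here; a real client would also
parse and validate the returned VOTable.
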


\begin{figure}
%\epsscale{1}
%\plotone{arc.ps}
\epsscale{0.92}
\plotone{P2-18_2.ps}
\caption{Open SkyQuery architecture: Basic and Full IVOA Sky\-Nodes are
discovered dynamically in the VO registry by the SkyQuery portal.}
\label{P2-18:fig:arch}
\end{figure}

\section{Open SkyNode}

We propose three layers of VOQL, each building on the one below
(see also Yasuda et~al.\ 2004):

\begin{itemize}
\item {\bf VOQL1--ADQL:} ADQL and VOTable to query a single node
\item {\bf VOQL2--SkyQL:} SQL-like query language and federation
system%, i.e. combination of SkyQuery, JVOQL and VO standards.
\item {\bf VOQL3--Sky???:} a future query language
\end{itemize}
%
The SkyNode is the implementation that provides the
necessary services, i.e.\ automatic crossmatching and participation
in federated SkyQL queries. A SkyNode may publish anything from a small
amount of data, e.g.\ a single FITS file, to an entire survey such as
SDSS.
%
We distinguish between at least two levels in the implementation of the
SkyNode:
%
{\bf{}Basic SkyNodes} conform to the Layer-1 specifications. They know
how to execute ADQL requests against their own data and return the
results in VOTable format.
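
The core job of a Basic SkyNode, querying local data and answering in
VOTable, can be sketched in a few lines of Python. This assumes an
in-memory SQLite table standing in for the node's archive and plain SQL
standing in for ADQL (which is SQL-like but not identical); the function
name \texttt{execute\_as\_votable}, the table schema and the type
mapping are illustrative assumptions. A real node would also emit the
VOTable namespace, units, UCDs and other metadata.

\begin{verbatim}
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative mapping from Python value types to VOTable FIELD datatypes;
# character columns need an arraysize ("*" = variable length).
VOTABLE_TYPES = {int: ("long", None), float: ("double", None),
                 str: ("char", "*")}

def execute_as_votable(conn, sql):
    """Run a query on the node's own data; return the result as VOTable XML."""
    cur = conn.execute(sql)
    rows = cur.fetchall()
    names = [d[0] for d in cur.description]
    votable = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for i, name in enumerate(names):
        # Infer the column type from the first row; default to a string.
        sample = rows[0][i] if rows else ""
        datatype, arraysize = VOTABLE_TYPES.get(type(sample), ("char", "*"))
        field = ET.SubElement(table, "FIELD", name=name, datatype=datatype)
        if arraysize:
            field.set("arraysize", arraysize)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = "" if value is None else str(value)
    return ET.tostring(votable, encoding="unicode")

# Tiny in-memory table standing in for the node's archive (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (objid INTEGER, ra REAL, dec REAL)")
conn.executemany("INSERT INTO photo VALUES (?, ?, ?)",
                 [(1, 10.0, 20.0), (2, 185.0, -1.5)])
doc = execute_as_votable(conn, "SELECT objid, ra, dec FROM photo WHERE dec < 0")
\end{verbatim}
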
%
{\bf{}Full SkyNodes} support all methods required to take part in a
federated query (Layer-2). Advanced versions will also implement
footprint services, so that the intersection of the footprints of all
nodes used in the same SkyQL query can be determined dynamically. For
large surveys, a sky indexing scheme such as the Hierarchical
Triangular Mesh (HTM; Kunszt et al.\ 2001) is essentially a requirement
for quick lookup and spatial joins.
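
The idea behind HTM indexing can be illustrated with a simplified
Python sketch that maps a position to a trixel identifier, following
the scheme of Kunszt et al.\ (2001): start from the eight faces of an
octahedron (S0--S3 and N0--N3, numbered 8--15), then repeatedly split
the containing spherical triangle into four children, appending two
bits to the identifier at each level. The function names are
hypothetical and this is not the production HTM library; edge cases are
handled simply by choosing the triangle with the largest containment
margin. Objects close together usually share an identifier prefix,
which lets a spatial join compare only objects in the same or
neighbouring trixels.

\begin{verbatim}
import math

# Octahedron vertices and the eight root trixels (S0..S3 = 8..11,
# N0..N3 = 12..15), each listed counterclockwise as seen from outside.
V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]
ROOTS = [(8, (V[1], V[5], V[2])), (9, (V[2], V[5], V[3])),
         (10, (V[3], V[5], V[4])), (11, (V[4], V[5], V[1])),
         (12, (V[1], V[0], V[4])), (13, (V[4], V[0], V[3])),
         (14, (V[3], V[0], V[2])), (15, (V[2], V[0], V[1]))]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def midpoint(a, b):
    """Normalized midpoint of two unit vectors (point on the sphere)."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    n = math.sqrt(dot(s, s))
    return (s[0] / n, s[1] / n, s[2] / n)

def inside_margin(p, tri):
    """Smallest edge test; >= 0 means p lies inside the triangle."""
    a, b, c = tri
    return min(dot(cross(a, b), p), dot(cross(b, c), p),
               dot(cross(c, a), p))

def radec_to_vec(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra),
            math.sin(dec))

def htm_id(ra_deg, dec_deg, depth):
    """Identifier of the trixel containing (ra, dec) at the given depth."""
    p = radec_to_vec(ra_deg, dec_deg)
    hid, tri = max(ROOTS, key=lambda r: inside_margin(p, r[1]))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        k = max(range(4), key=lambda i: inside_margin(p, children[i]))
        hid, tri = hid * 4 + k, children[k]
    return hid
\end{verbatim}
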

Figure~\ref{P2-18:fig:arch} illustrates the Open SkyQuery architecture and
shows the relations between the components.


\section{SkyQuery Strategy}

To ensure fast response, the query plan must be optimized. Our
simulations show that simple sequential execution is optimal, because
wire speed is currently the limiting factor. The SkyNodes must be
arranged in {\em ascending} order of the number of matching records,
so that the {\em least} amount of data is transferred. This
significantly simplifies the logic of the portal. The SkyNodes are
nevertheless designed to handle more complicated query plans, so that
the system can easily be enhanced later. Another possible enhancement
is asynchronous data flow (O'Mullane et al.~2004).
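
The ordering rule can be illustrated with a toy, in-memory Python
simulation. The \texttt{SkyNode} class, its \texttt{count} and
\texttt{crossmatch} methods and the \texttt{run\_sequential} driver are
hypothetical stand-ins rather than the actual SkyNode web methods, and
positions are matched with a brute-force great-circle test instead of
an index. Starting from the node with the fewest matching records keeps
the intermediate result sets, and hence the data shipped between nodes,
small.

\begin{verbatim}
import math
from dataclasses import dataclass

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

@dataclass
class SkyNode:
    """Hypothetical in-memory stand-in for a remote SkyNode."""
    name: str
    objects: list  # (ra_deg, dec_deg) of rows passing local constraints

    def count(self):
        # Planning probe: how many local rows match the query?
        return len(self.objects)

    def crossmatch(self, tuples, radius_deg):
        """Extend each tuple with local objects near its first member."""
        out = []
        for t in tuples:
            ra0, dec0 = t[0]
            for obj in self.objects:
                if ang_sep_deg(ra0, dec0, obj[0], obj[1]) <= radius_deg:
                    out.append(t + (obj,))
        return out

def run_sequential(nodes, radius_deg):
    """Order nodes by ascending match count, then crossmatch in turn."""
    plan = sorted(nodes, key=lambda n: n.count())
    results = [(obj,) for obj in plan[0].objects]
    shipped = len(results)  # rows passed along the chain, incl. final result
    for node in plan[1:]:
        results = node.crossmatch(results, radius_deg)
        shipped += len(results)
    return [n.name for n in plan], results, shipped

nodes = [SkyNode("A", [(10.0, 20.0001), (50.0, 0.0), (60.0, 0.0)]),
         SkyNode("B", [(10.0, 20.0)]),
         SkyNode("C", [(10.0001, 20.0), (200.0, -30.0)])]
order, matches, shipped = run_sequential(nodes, radius_deg=1.0 / 3600)
\end{verbatim}

With a 1\,arcsec radius the plan visits B, C and A in that order, and
only the one object seen by all three toy nodes survives.
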


\section{Concluding Remarks}

The emerging Virtual Observatory infrastructure makes it possible to
develop a new generation of astronomical tools. These online tools
promise to be easy to use and to open new dimensions for scientists.

Open SkyQuery is one of the first steps. Its catalog services
will enable us to analyze geographically separated IVOA archives,
i.e.\ SkyNodes, as if they were parts of a single dataset. The VO
building blocks needed to make Open SkyQuery a reality are already
in place.


\acknowledgments 
SkyQuery is supported by NSF Awards 0122449 and 9980044, and NASA AISRP 
awards NAG5-10742 (2001) and NAG5-12092 (2002).


%-----------------------------------------------------------------------
%                 Links
%-----------------------------------------------------------------------
\paragraph{Links} \mbox{} \\

\htmladdURL{http://www.skyquery.net/}

\htmladdURL{http://skyservice.pha.jhu.edu/develop/vo/adql/}

\htmladdURL{http://www.ivoa.net/twiki/bin/view/IVOA/IvoaVOQL}


%-----------------------------------------------------------------------
%                 References
%-----------------------------------------------------------------------
\begin{references}
\reference Budav\'ari, T., et al. 2003, \adassxii, \adassref{xii:O10-1}{31}
\reference Greene, G., et al. 2004, \adassxiii, \paperref{P3-8}
\reference Kunszt, P. Z., Szalay, A. S., \& Thakar, A. R. 2001, in
Mining the Sky: Proc.\ of the MPA/ESO/MPE Workshop, Garching, ed.\
A.~J. Banday, S. Zaroubi, \& M. Bartelmann (Berlin: Springer-Verlag), 631
\reference Malik, T., et al. 2002, CIDR '03, 17, {\it `SkyQuery: A
WebService Approach to Federate Databases'}
\reference O'Mullane, W., et al. 2004, \adassxiii, \paperref{O4-4}
\reference Purger, N., et al. 2004, \adassxiii, \paperref{P2-26}
\reference Szalay, A. S., et al. 2002, Proc.\ of SPIE, 4846, {\it `Web
Services for the VO'}
\reference Yasuda, N., et al. 2004, \adassxiii, \paperref{P3-10}
\end{references}

\end{document}
