%% Modified FO
\documentclass[11pt,twoside]{article}  % Leave intact
\usepackage{adassconf}

\begin{document}   

\title{Increasing the Accessibility of Green Bank Telescope Data}
\paperID{P1-17}
%%%% ID=P1-17

\author{Karen O'Neil, Nicole Radziwill, Ronald J. Maddalena}
\affil{National Radio Astronomy Observatory,
P.O. Box 2, Green Bank, WV 24944, U.S.A.}
\contact{Karen O'Neil}
\email{koneil@nrao.edu}

\paindex{ONeil@O'Neil, K.}
\aindex{Radziwill, N.}
\aindex{Maddalena, R.}

\authormark{O'Neil, Radziwill, \& Maddalena}

\keywords{astronomy: radio, AIPS++, SDFITS, data: reduction, GBT, IDL   }

\begin{abstract}          
The Green Bank Telescope (GBT) currently outputs its raw data as a suite of
binary FITS files, approximately one per component device on the
telescope, which are then consolidated and pre-processed before being
written into an AIPS++ Measurement Set for more extensive analysis.
This design decision by the GBT project has essentially restricted astronomers
to a single data analysis package and reduced the productivity of those
who prefer other analysis packages. To maximize the scientific returns
from the unique features of the GBT, and to support a broader cross-section
of observers' backgrounds and interests, work is being done to combine
raw GBT data from the disparate FITS files into a variety of standardized
FITS file formats such as SDFITS and CLASS FITS. 
%hese files can then
%be analyzed using tools such as IDL, CLASS, Mathematica and Matlab,
%for example.
Here we describe prototyping exercises that were initiated
during the summer of 2003 for the purpose of identifying
how to make GBT data more readily accessible to a wider variety of data
reduction tools. Although further refinement is needed to support the
standard observing modes of the GBT in a production capacity, early
results from the investigation demonstrate the feasibility and applicability
of the approach.
\end{abstract}

\section{Background}

At present, a typical data set resulting from the Robert C. Byrd Green
Bank Telescope (GBT) is composed of individual FITS files for each device
required for an observation (e.g. the antenna, LO, backend) as well
as a log (also a FITS file) which indexes all of the device files
according to scans.
 GBT data can be assimilated into the AIPS++ DISH utility by using the 
AIPS++ d.import command, or by using the gbtmsfiller command, called 
from the UNIX command line. Either step transforms the raw data into
a representation that is sensible from the astronomical perspective.
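
As a minimal sketch of this per-scan organization, the Python fragment
below groups device files by scan and reports scans that lack a required
device. The tuple layout of the log rows, the file paths, and the helper
names \verb|files_by_scan| and \verb|missing_devices| are illustrative
assumptions; they do not reproduce the actual GBT log format or any
gbtmsfiller code.

\begin{verbatim}
from collections import defaultdict

# Hypothetical log entries: (scan, device, file path).
# The real GBT log is a FITS file; its columns are
# not reproduced here.
LOG_ROWS = [
    (1, "Antenna", "Antenna/scan1.fits"),
    (1, "LO", "LO/scan1.fits"),
    (1, "DCR", "DCR/scan1.fits"),
    (2, "Antenna", "Antenna/scan2.fits"),
    (2, "DCR", "DCR/scan2.fits"),
]


def files_by_scan(log_rows):
    """Map each scan number to a {device: path} dict."""
    index = defaultdict(dict)
    for scan, device, path in log_rows:
        index[scan][device] = path
    return dict(index)


def missing_devices(index, required):
    """Return {scan: [absent devices]} for incomplete scans."""
    gaps = {}
    for scan, devices in sorted(index.items()):
        absent = [d for d in required if d not in devices]
        if absent:
            gaps[scan] = absent
    return gaps
\end{verbatim}

An index of this kind could let a filler decide which scans are complete
enough to process before any device file is opened.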

Because the GBT was designed to produce its raw data as a collection
of FITS files, it is a challenge for any data reduction package to combine the information for analysis.
To fill data into an AIPS++ Measurement
Set, the development team spent up to two years resolving
issues associated with the data itself, and was eventually able to produce
the gbtmsfiller routine, which is in use today. Prior to the launch of the GBT
data accessibility exploration, IDL users (for example) had to follow
a similar process independently, writing their own modules to extract
and pre-process relevant information from the collection of GBT FITS
files. Users of other packages are still faced with this barrier.

%Because the raw data output from the GBT is segmented, several common
%issues are encountered regardless of which data analysis package an
%observer wishes to use. These data preprocessing functions, which currently
%exist in gbtmsfiller, could be componentized for general use. These
%include, but are not limited to, the following:
%\begin{itemize}
%\item The ability to select subsets of data to process;
%\item Associating an appropriate antenna pointing with each data sample;
%\item Generating a description of the frequency axis and polarization
%information for each data sample;
%\item Extracting Tcal values from receiver FITS files;
%\item Converting lags to spectra (required for spectrometer data only).
%\end{itemize}

%Users of other packages are not able to take advantage of solutions
%for these issues that have already been resolved in the process of making
%GBT data available to AIPS++. By creating a suite of data preprocessing
%components for general use, all that remains is to wrap the components
%within a script that also contains a data translation component, and
%output formats can be generated that are palatable to various data analysis
%packages (see Figure~\ref{P1-17:fig:map}).

The demand for greater accessibility has been expressed within NRAO
as well as by visiting observers. Several astronomers at Green Bank
have expressed a desire to process data in IDL, making use of IDL
modules relevant to astronomers that have been developed by third parties.
%This includes the Goddard Astronomy Users' Library which can be accessed
%at http://idlastro.gsfc.nasa.gov.
Engineers working on the Precision
Telescope Control System (PTCS) project (a major initiative currently
underway which will provide the pointing, collimation and surface accuracy
required to allow the GBT to operate effectively at 3~mm -- see papers in this
volume by Constantikes \paperref{O9-1} and Marganian \paperref{P9-9}) 
%%% FO: Added the paper-ref's
do much of
their analysis in Matlab and need to access data from astronomical observations within the Matlab application.
Requests have also been made to allow ready data reduction within the 
CLASS, Classic AIPS, and Mathematica packages.

\section{Goals and Objectives}

The primary goal for this effort is to make GBT data more readily accessible
to various data analysis packages.  It is understood that each package has its own
unique strengths and limitations, and not all packages may be able to
reduce all types of GBT observations. However, with a clear understanding
of what is possible with each package, an astronomer will have greater
leverage in choosing the tool that best suits his or her needs for a particular investigation.

This is not exclusively a data format issue, although knitting together
the disparate FITS files currently produced into one cohesive structure
is one important step to enable many of the data paths. The intention
is not to create a new, all-encompassing data format for the GBT, but
to arrive at a reasonable representation that will make it straightforward
to transition to future, standardized single dish data formats. (One
possibility is the MBFITS specification that is under discussion by ALMA.)
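
One step in building such a cohesive structure is attaching an antenna
position to each backend sample. The sketch below does this by linear
interpolation in time. It is an illustration only: the function names
\verb|interpolate_pointing| and \verb|unify|, the dictionary keys, and the
choice of linear interpolation are assumptions, not a description of how
gbtmsfiller or the prototypes work.

\begin{verbatim}
from bisect import bisect_right


def interpolate_pointing(sample_time, times, values):
    """Linearly interpolate a coordinate at sample_time.

    times must be sorted ascending; sample_time is
    clamped to the range the antenna data cover.
    """
    if sample_time <= times[0]:
        return values[0]
    if sample_time >= times[-1]:
        return values[-1]
    hi = bisect_right(times, sample_time)
    lo = hi - 1
    span = times[hi] - times[lo]
    frac = (sample_time - times[lo]) / span
    return values[lo] + frac * (values[hi] - values[lo])


def unify(samples, ant_times, ant_az, ant_el):
    """Attach (az, el) to each (time, power) sample."""
    return [
        {
            "time": t,
            "power": p,
            "az": interpolate_pointing(t, ant_times, ant_az),
            "el": interpolate_pointing(t, ant_times, ant_el),
        }
        for t, p in samples
    ]
\end{verbatim}

Each resulting row carries both the backend measurement and the telescope
position, which is the kind of single-record view that downstream packages
generally expect.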

Meeting the following objectives will help accomplish these goals:
\begin{itemize}
\item Find an easier way to get data out of FITS files. This step has
been accomplished through the development of FITS Query Language
(\htmladdnormallinkfoot{FQL}{http://wiki.gb.nrao.edu/bin/view/Data/FitsQueryLanguage});
\item Extract the preprocessing steps from gbtmsfiller and then rewrite them
in Python, so they can be used by multiple programs; 
\item Validate the preprocessing components against previously verified
parts of gbtmsfiller;
\item Use the Python preprocessing components to generate a unified
representation for GBT data;
\item Reduce basic continuum and spectral line observations; plot them
using various analysis packages and examine for correctness.\\
\end{itemize}
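
The validation objective can be pictured as an element-by-element
comparison between the output of a new preprocessing component and
reference values from an already verified source. The following is a
minimal sketch under that assumption; the function name
\verb|compare_samples| and the default tolerances are hypothetical and
would need to be chosen per data type.

\begin{verbatim}
import math


def compare_samples(candidate, reference,
                    rel_tol=1e-6, abs_tol=0.0):
    """Return the indices where the two sequences disagree."""
    if len(candidate) != len(reference):
        raise ValueError("sample counts differ")
    return [
        i
        for i, (new, old) in enumerate(zip(candidate, reference))
        if not math.isclose(new, old,
                            rel_tol=rel_tol, abs_tol=abs_tol)
    ]
\end{verbatim}

An empty result indicates agreement within the stated tolerance; a
non-empty result points to the samples that need inspection.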

Once this process is complete, we will be able to verify the consistency
of scientific results between data analysis packages (e.g. IDL vs. AIPS,
AIPS++ vs. CLASS, CLASS vs. IDL); until now we have not had two or
more packages with which cross-comparisons can be performed. Being able
to perform cross-comparisons will aid the process of commissioning data
reduction for new capabilities on the GBT, ensuring that errors are
captured well in advance of live observations using a new device.

\section{Prototyping Exercises}

Three types of data were evaluated during the initial exercises: continuum
data taken with the Digital Continuum Receiver and spectral line data
from both the GBT spectrometer and spectral processor.

Python was chosen as the programming language for all accessibility
prototypes because it is a powerful language with the array handling
needed for working with GBT data. It is also quick to learn:
skilled software engineers in Green Bank with no prior knowledge of
Python were able to produce useful results within 2--3 days of beginning
to work with the language. Additionally, several ALMA prototypes are
being written in Python, indicating that Python could become a core
competency among software engineers throughout NRAO.

Proof-of-concept exercises have been performed using IDL, and Matlab
experiments are in progress (Figure~\ref{P1-17:fig:map}). These experiments take advantage of the
FITS Query Language to create an intermediary data
format based on SDFITS. The next phase of prototype work to be completed
by the end of the year will explore data accessibility by other analysis packages.

\begin{figure}
\plotone{P1-17_fig1a.eps}\\
\plotone{P1-17_fig1b.eps}
\caption{A continuum 21-cm map, completed as an assignment in the 2003 Single Dish Summer School held in Green Bank, was produced in both IDL (top) and AIPS++ (bottom) with similar results. Note that the color scales of the two images differ. \label{P1-17:fig:map}}
\end{figure}

\section{Accessibility Strategy}

Making GBT data accessible to additional data analysis packages is being
done in a staged approach, aligned with demand from visiting observers
and other development priorities of the GBT project. IDL is being targeted
immediately, because of the strong demand that has been expressed by
visiting observers and local astronomers alike. Accessibility of GBT
data to Matlab is also being addressed at the present time to support
critical PTCS experiments. In the next stage, access to CLASS will be
investigated to support a wider audience of radio astronomers, and accessibility
to AIPS will be explored, in part to support research for GBT development
projects now in their earliest stages. Mathematica, which has the fewest
identified users to date, will be explored once solutions are in place
for other packages which are used more widely.

%Standard data preprocessing components can be reused for many of these
%cases, making the entire system more maintainable while granting easier
%access to a larger variety of data analysis programs. See
%Figure~\ref{P1-17:fig:preprocess} for
%details.  Standard data preprocessing components can be reused for many
%of these cases making the entire system more maintainable while granting
%easier access to a larger variety of data analysis programs.

\section{Current Status and Future Plans}

%Despite the demonstrated ability to import and plot data in IDL, and
%access raw data in Matlab, there is still much work remaining to be
%done. Errors in content and form in the unified output data format are
%have been resolved. The prototype programs and a memo
%describing their use is on track to be presented to a wider audience
%within NRAO by the middle of Q3 2003. NRAO astronomers can then provide
%comments based on the applicability of these programs to their own GBT
%data sets. After this time the feasibility of making the data sets a
%production offering can be accurately evaluated.
%
%AIPS++ is still being actively used in Green Bank for many purposes
% including individual research and the commissioning of new instrumentation
% Enhancements to the package will be made as appropriate, in response
%to specific demands from GBT development projects. Updated versions
%of AIPS++ will be made available three to four times a year for internal
%astronomers as well as visiting observers. IDL versions 5.5 and 6 are
%installed locally and available for public use, and CLASS has been installed
%but is not yet available. Matlab and Mathematica are licensed to individual
%staff members, who have purchased the packages for their own needs.
On November 24th, 2003, the beta version of the SDFITS generator was
released for wide internal review. Continuum data from the DCR, as well
as spectral line data from both the spectrometer and the spectral
processor, are fully supported. File sizes are somewhat smaller than the
total size of the raw data files, and much smaller than equivalent
Measurement Sets. The output in the SDFITS files has been validated
against the AIPS++ filler and is at least as accurate, although the
generator runs much more slowly. Future plans include making the preprocessing
components used to generate the SDFITS files fast enough to replace the
AIPS++ filler, so that data to be reduced in most data reduction
packages will be preprocessed by the same, uniformly validated
components.

The GBT project does not intend to provide dedicated support to users
of all the packages described herein; however, limited hands-on support
for select packages such as AIPS++ and IDL will be available.  The intent
is to provide sufficient documentation of all of the options to make it
possible for any observer to easily use the data analysis
package of their choice.

Up-to-date information on this project can be found online at\\
\makeURL{http://wiki.gb.nrao.edu/bin/view/Data/WebHome}.

\acknowledgements{ \small
Scientific validity of this activity has relied, and will continue to
rely, upon the contributions of NRAO astronomers Bob Garwood and Jim
Braatz, in consultation with Bill Cotton. Bob Garwood is also leading
the work to qualitatively and quantitatively assess the accuracy and
viability of reusable preprocessing components, and contributes extensive
knowledge about the processing of GBT data and the internals of gbtmsfiller,
which he wrote. Technical development has been made possible thanks
to the work of Green Bank Software Engineer Eric Sessoms, who conceived
the idea and developed the FQL utility, and built all initial versions
of data preprocessing components in Python. 
The technical efforts for producing a suitable evolutionary data format
are now being led by David Fleming, also a Software Engineer in Green
Bank. Work to access GBT data in Matlab is being done by Software Engineers
Ramon Creager and Paul Marganian. We also thank Kim Constantikes, who
is the lead user of Matlab as PTCS Project Engineer, as well as Carl
Heiles and Tim Robishaw, who have supplied us with tremendous insight
into how they currently use IDL to analyze GBT data.
}
\end{document} 
