J/A+A/668/A99 Gaia DR3 quasar and galaxy classification (Hughes+, 2022)
Quasar and galaxy classification using Gaia EDR3 and CatWise2020.
Hughes A.C.N., Bailer-Jones C.A.L., Jamal S.
<Astron. Astrophys. 668, A99 (2022)>
=2022A&A...668A..99H (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; QSOs ; Galaxies
Keywords: methods: statistical - surveys - quasars: general -
galaxies: general - stars: general - methods: data analysis
Abstract:
In this work, we assess the combined use of Gaia photometry and
astrometry with infrared data from CatWISE in improving the
identification of extragalactic sources compared to the classification
obtained using Gaia data. Here we perform a comprehensive study in
which we assess different input feature configurations and prior
functions to identify extragalactic sources in Gaia, with the aim of
presenting a classification methodology that integrates prior
knowledge stemming from realistic class distributions in the Universe.
In our work, we compare different classifiers, namely Gaussian mixture
models (GMMs) and the boosted decision trees, XGBoost and CatBoost, in
a supervised approach, and classify sources into three classes, namely
star, quasar, and galaxy, with the target quasar and galaxy class
labels obtained from the Sloan Digital Sky Survey Data Release 16
(SDSS16) and the star label from Gaia EDR3. In our approach, we adjust
the posterior probabilities to reflect the intrinsic distribution of
extragalactic sources in the Universe via a prior function. In
particular, we introduce two priors, a global prior reflecting the
overall rarity of quasars and galaxies, and a mixed prior that
incorporates in addition the distribution of the extragalactic sources
as a function of Galactic latitude and magnitude. Our best
classification performances, in terms of completeness and purity of
the extragalactic classes, namely the galaxy and quasar classes, are
achieved using the mixed prior for sources at high latitudes and in
the magnitude range G=18.5-19.5. We apply our identified
best-performing classifier to three application datasets from Gaia
Data Release 3 (GDR3), and find that the global prior is more
conservative in what it considers to be a quasar or a galaxy compared
to the mixed prior. In particular, when applied to the quasar and
galaxy candidate tables from GDR3, the classifier using a global prior
achieves purities of 55% for quasars and 93% for galaxies, and
purities of 59% and 91%, respectively, using the mixed prior. When
compared to the performances obtained on the GDR3 pure quasar and
galaxy candidate samples, we reach a higher level of purity, 97% for
quasars and 99.9% for galaxies using the global prior, and purities of
96% and 99%, respectively, using the mixed prior. When refining the
GDR3 candidate tables via a cross-match with SDSS DR16 confirmed
quasars and galaxies, the classifier reaches purities of 99.8% for
quasars and 99.9% for galaxies using a global prior, and 99.9% and
99.9% using the mixed prior. We conclude our work by discussing the
importance of applying adjusted priors that portray realistic class
distributions in the Universe and the effect of introducing infrared
data as ancillary inputs in the identification of extragalactic
sources.
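The prior adjustment described in the abstract can be illustrated with a
minimal sketch. It assumes that the classifier's raw outputs implicitly
reflect the class fractions of the training set, and rescales them to a
different prior with the generic prior-replacement rule. The function
name, the balanced training fractions, and the prior values are all
hypothetical; none of them is taken from the paper.

```python
def adjust_posterior(p_model, train_prior, new_prior):
    """Rescale class probabilities from train_prior to new_prior.

    All arguments are dicts keyed by class name. This is the generic
    prior-replacement rule, not necessarily the paper's exact procedure.
    """
    weighted = {c: p_model[c] * new_prior[c] / train_prior[c] for c in p_model}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Hypothetical numbers, for illustration only.
p_model = {"star": 0.2, "quasar": 0.7, "galaxy": 0.1}
# Assumed balanced training set.
train_prior = {"star": 1 / 3, "quasar": 1 / 3, "galaxy": 1 / 3}
# Assumed prior expressing that extragalactic sources are rare.
global_prior = {"star": 0.998, "quasar": 0.001, "galaxy": 0.001}

p_adj = adjust_posterior(p_model, train_prior, global_prior)
```

With a prior this strongly weighted towards stars, a source that the raw
model favours as a quasar ends up with most of its probability in the
star class, which shows how a rarity prior makes the classification
more conservative.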
Description:
We provide probabilistic quasar and galaxy classifications for sources
in the Gaia Data Release 3 quasar and galaxy candidate tables. These
were obtained with a supervised classification method (XGBoost) using
features from Gaia EDR3 and CatWISE2020. The model is trained
empirically to classify each object into one of three classes: star,
quasar, or galaxy. We provide the probabilities of being a star
(pStar), a quasar (pQSO), and a galaxy (pGAL); all other Gaia data can
be obtained by cross-matching with Gaia DR3 on the source identifier.
See the paper for details of the purity and completeness of these
samples, and for more details of their construction, contents, and
validation.
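As a reminder of what purity and completeness measure, the sketch below
computes both for a single class from paired true and predicted labels.
It uses the plain unweighted definitions (purity = TP/(TP+FP),
completeness = TP/(TP+FN)); the evaluation in the paper may differ, for
example by weighting for class fractions. The function name and the toy
labels are invented for illustration.

```python
def purity_completeness(true_labels, predicted_labels, target):
    """Return (purity, completeness) for one class from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    # Undefined ratios (no predictions or no true members) are returned as NaN.
    purity = tp / (tp + fp) if tp + fp else float("nan")
    completeness = tp / (tp + fn) if tp + fn else float("nan")
    return purity, completeness


# Toy labels, invented for illustration.
truth = ["quasar", "quasar", "star", "galaxy", "quasar", "star"]
predicted = ["quasar", "star", "quasar", "galaxy", "quasar", "star"]
qso_purity, qso_completeness = purity_completeness(truth, predicted, "quasar")
```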
The classes are defined by the supervised training sets used in the
classifier. Galaxies and quasars are identified for the training set
by a cross-match to objects with spectroscopic classifications from
the Sloan Digital Sky Survey Data Release 16. Stars are defined
directly from Gaia-EDR3. In addition, sources in all three classes are
required to have CatWISE2020 photometry.
File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
quasarpb.dat      92  4048626   Probabilities on the GDR3 quasar candidate table
galaxypb.dat      89  4194100   Probabilities on the GDR3 galaxy candidate table
--------------------------------------------------------------------------------
See also:
I/355 : Gaia DR3 Part 1. Main source (Gaia Collaboration, 2022)
I/356 : Gaia DR3 Part 2. Extra-galactic (Gaia Collaboration, 2022)
II/365 : The CatWISE2020 catalog (updated version 28-Jan-2021) (Marocco+, 2021)
V/154 : Sloan Digital Sky Surveys (SDSS), Release 16 (DR16) (Ahumada+, 2020)
Byte-by-byte Description of file: quasarpb.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 19 I19    ---     GaiaDR3   Gaia-DR3 identification number (source_id)
      21 A1     ---     isQSOpure [F/T] Binary flag indicating membership in
                                    the GDR3 pure quasar sample (isQSO_pure)
      23 A1     ---     isQSOSDSS [F/T] Binary flag indicating membership in
                                    the SDSS16 quasar sample (isQSO_SDSS)
  25- 46 E22.18 ---     pStar     Star probability (pStar)
  48- 69 E22.18 ---     pQSO      Quasar probability (pQSO)
  71- 92 E22.18 ---     pGAL      Galaxy probability (pGAL)
--------------------------------------------------------------------------------
Byte-by-byte Description of file: galaxypb.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 19 I19    ---     GaiaDR3   Gaia-DR3 identification number (source_id)
      21 A1     ---     isGALpure [F/T] Binary flag indicating membership in
                                    the GDR3 pure galaxy sample (isGAL_pure)
      23 A1     ---     isGALSDSS [F/T] Binary flag indicating membership in
                                    the SDSS16 galaxy sample (isGAL_SDSS)
  25- 46 E22.18 ---     pStar     Star probability (pStar)
  48- 68 E21.15 ---     pQSO      Quasar probability (pQSO)
  70- 89 E20.14 ---     pGAL      Galaxy probability (pGAL)
--------------------------------------------------------------------------------
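The byte-by-byte descriptions above translate directly into a
fixed-width parser. The following is a minimal sketch using only the
Python standard library; the byte ranges are copied from the two tables,
while the names LAYOUTS, parse_flag, and parse_record, the choice to map
the [F/T] flags to booleans, and the sample record are assumptions made
for illustration. Blank or malformed fields are not handled.

```python
def parse_flag(text):
    """Map the [F/T] flag character to a boolean."""
    return text == "T"


# (label, first byte, last byte, converter); bytes are 1-indexed, inclusive.
LAYOUTS = {
    "quasarpb.dat": [
        ("GaiaDR3", 1, 19, int),
        ("isQSOpure", 21, 21, parse_flag),
        ("isQSOSDSS", 23, 23, parse_flag),
        ("pStar", 25, 46, float),
        ("pQSO", 48, 69, float),
        ("pGAL", 71, 92, float),
    ],
    "galaxypb.dat": [
        ("GaiaDR3", 1, 19, int),
        ("isGALpure", 21, 21, parse_flag),
        ("isGALSDSS", 23, 23, parse_flag),
        ("pStar", 25, 46, float),
        ("pQSO", 48, 68, float),
        ("pGAL", 70, 89, float),
    ],
}


def parse_record(line, layout):
    """Convert one fixed-width record into a dict using a byte layout."""
    record = {}
    for label, first, last, convert in layout:
        # Byte N of the record is character N-1 of the Python string.
        record[label] = convert(line[first - 1:last].strip())
    return record


# Made-up 92-byte record for illustration; not a real catalogue entry.
sample = (
    f"{1234567890123456789:19d} T F "
    f"{0.001:22.15e} {0.990:22.15e} {0.009:22.15e}"
)
row = parse_record(sample, LAYOUTS["quasarpb.dat"])
```

To read a whole file, the same function could be applied to each line of
quasarpb.dat or galaxypb.dat with the matching layout; the resulting
GaiaDR3 values can then serve as the key for a cross-match with Gaia DR3.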
Acknowledgements:
Arvind Hughes, ahughes(at)mpia.de
(End) Patricia Vannier [CDS] 12-Oct-2022