J/A+A/668/A99       Gaia DR3 quasar and galaxy classification    (Hughes+, 2022)

Quasar and galaxy classification using Gaia EDR3 and CatWise2020.
    Hughes A.C.N., Bailer-Jones C.A.L., Jamal S.
    <Astron. Astrophys. 668, A99 (2022)>
    =2022A&A...668A..99H        (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; QSOs ; Galaxies
Keywords: methods: statistical - surveys - quasars: general -
          galaxies: general - stars: general - methods: data analysis

Abstract:
    In this work, we assess the combined use of Gaia photometry and
    astrometry with infrared data from CatWISE in improving the
    identification of extragalactic sources compared to the classification
    obtained using Gaia data. Here we perform a comprehensive study in which
    we assess different input feature configurations and prior functions to
    identify extragalactic sources in Gaia, with the aim of presenting a
    classification methodology that integrates prior knowledge stemming from
    realistic class distributions in the Universe. In our work, we compare
    different classifiers, namely Gaussian mixture models (GMMs) and the
    boosted decision trees, XGBoost and CatBoost, in a supervised approach,
    and classify sources into three classes, namely star, quasar, and
    galaxy, with the target quasar and galaxy class labels obtained from the
    Sloan Digital Sky Survey Data Release 16 (SDSS16) and the star label
    from Gaia EDR3. In our approach, we adjust the posterior probabilities
    to reflect the intrinsic distribution of extragalactic sources in the
    Universe via a prior function. In particular, we introduce two priors, a
    global prior reflecting the overall rarity of quasars and galaxies, and
    a mixed prior that incorporates in addition the distribution of the
    extragalactic sources as a function of Galactic latitude and magnitude.
    Our best classification performances, in terms of completeness and
    purity of the extragalactic classes, namely the galaxy and quasar
    classes, are achieved using the mixed prior for sources at high
    latitudes and in the magnitude range G=18.5-19.5. We apply our
    identified best-performing classifier to three application datasets from
    Gaia Data Release 3 (GDR3), and find that the global prior is more
    conservative in what it considers to be a quasar or a galaxy compared to
    the mixed prior.
    In particular, when applied to the quasar and galaxy candidate tables
    from GDR3, the classifier using a global prior achieves purities of 55%
    for quasars and 93% for galaxies, and purities of 59% and 91%,
    respectively, using the mixed prior. When compared to the performances
    obtained on the GDR3 pure quasar and galaxy candidate samples, we reach
    a higher level of purity, 97% for quasars and 99.9% for galaxies using
    the global prior, and purities of 96% and 99%, respectively, using the
    mixed prior. When refining the GDR3 candidate tables via a cross-match
    with SDSS DR16 confirmed quasars and galaxies, the classifier reaches
    purities of 99.8% for quasars and 99.9% for galaxies using a global
    prior, and 99.9% and 99.9% using the mixed prior. We conclude our work
    by discussing the importance of applying adjusted priors that portray
    realistic class distributions in the Universe and the effect of
    introducing infrared data as ancillary inputs in the identification of
    extragalactic sources.

Description:
    We provide probabilistic quasar and galaxy classifications for sources
    defined in the Gaia Data Release 3 quasar and galaxy candidate tables.
    This has been achieved using a supervised classification method
    (XGBoost) with features defined in Gaia EDR3 and CatWISE2020. The model
    is trained empirically to classify every object into one of three
    classes: star, quasar, or galaxy. We provide the probabilities for being
    a star (pStar), a quasar (pQSO) and a galaxy (pGAL); all other Gaia data
    can be obtained by cross-matching with Gaia-DR3 using the source
    identifier. See the paper for details of the purity and completeness of
    these samples, and for more details of their construction, contents, and
    validation.

    The classes are defined by the supervised training sets used in the
    classifier. Galaxies and quasars are identified for the training set by
    a cross-match to objects with spectroscopic classifications from the
    Sloan Digital Sky Survey Data Release 16.
    Stars are defined directly from Gaia-EDR3. In addition, all three
    classes require CatWISE2020 photometry.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
quasarpb.dat      92  4048626   Probabilities on the GDR3 quasar candidate table
galaxypb.dat      89  4194100   Probabilities on the GDR3 galaxy candidate table
--------------------------------------------------------------------------------

See also:
   I/355 : Gaia DR3 Part 1. Main source (Gaia Collaboration, 2022)
   I/356 : Gaia DR3 Part 2. Extra-galactic (Gaia Collaboration, 2022)
  II/365 : The CatWISE2020 catalog (updated version 28-Jan-2021) (Marocco+, 2021)
   V/154 : Sloan Digital Sky Surveys (SDSS), Release 16 (DR16) (Ahumada+, 2020)

Byte-by-byte Description of file: quasarpb.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 19  I19    ---    GaiaDR3   Gaia-DR3 identification number (source_id)
      21  A1     ---    isQSOpure [F/T] Flag indicating membership in the GDR3
                                   pure quasar sample (isQSO_pure)
      23  A1     ---    isQSOSDSS [F/T] Flag indicating membership in the SDSS16
                                   quasar sample (isQSO_SDSS)
  25- 46  E22.18 ---    pStar     Star probability (pStar)
  48- 69  E22.18 ---    pQSO      Quasar probability (pQSO)
  71- 92  E22.18 ---    pGAL      Galaxy probability (pGAL)
--------------------------------------------------------------------------------

Byte-by-byte Description of file: galaxypb.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 19  I19    ---    GaiaDR3   Gaia-DR3 identification number (source_id)
      21  A1     ---    isGALpure [F/T] Flag indicating membership in the GDR3
                                   pure galaxy sample (isGAL_pure)
      23  A1     ---    isGALSDSS [F/T] Flag indicating membership in the SDSS16
                                   galaxy sample (isGAL_SDSS)
  25- 46  E22.18 ---    pStar     Star probability (pStar)
  48- 68  E21.15 ---    pQSO      Quasar probability (pQSO)
  70- 89  E20.14 ---    pGAL      Galaxy probability (pGAL)
--------------------------------------------------------------------------------

Acknowledgements:
    Arvind Hughes, ahughes(at)mpia.de
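
Reading example:
    The byte ranges listed for quasarpb.dat can be turned into a small
    fixed-width parser. The following is a minimal sketch in Python, not
    part of the catalogue itself. The names QUASAR_COLUMNS, parse_record and
    most_probable_class are hypothetical, and the sample line is synthetic
    (built to match the layout, not taken from the data file). Picking the
    class with the largest probability is only one possible way to use the
    three columns; see the paper for the thresholds actually studied.

```python
# Column layout of quasarpb.dat, copied from the Byte-by-byte Description.
# Byte positions are 1-indexed and inclusive, as in the ReadMe.
QUASAR_COLUMNS = [
    ("GaiaDR3",    1, 19, int),
    ("isQSOpure", 21, 21, lambda s: s == "T"),
    ("isQSOSDSS", 23, 23, lambda s: s == "T"),
    ("pStar",     25, 46, float),
    ("pQSO",      48, 69, float),
    ("pGAL",      71, 92, float),
]


def parse_record(line, columns=QUASAR_COLUMNS):
    """Parse one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, start, end, convert in columns:
        # Convert the 1-indexed inclusive byte range to a Python slice.
        field = line[start - 1:end].strip()
        record[label] = convert(field)
    return record


def most_probable_class(record):
    """Hypothetical helper: name of the class with the largest probability."""
    probs = {
        "star": record["pStar"],
        "quasar": record["pQSO"],
        "galaxy": record["pGAL"],
    }
    return max(probs, key=probs.get)


# Synthetic 92-character record (an assumption for illustration only).
sample = "{:19d} {} {} {:22.15e} {:22.15e} {:22.15e}".format(
    1234567890123456789, "T", "F", 0.01, 0.97, 0.02
)
rec = parse_record(sample)
label = most_probable_class(rec)
```

    For galaxypb.dat the same approach applies with the byte ranges of that
    table (pQSO in bytes 48-68 and pGAL in bytes 70-89).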
(End) Patricia Vannier [CDS] 12-Oct-2022