J/A+A/683/A34         Boost recall in QSO selection           (Calderone+, 2024)

Boost recall in quasi-stellar object selection from highly imbalanced
photometric datasets. The reverse selection method.
    Calderone G., Guarneri F., Porru M., Cristiani S., Grazian A.,
    Nicastro L., Bischetti M., Boutsia K., Cupani G., D'Odorico V.,
    Feruglio C., Fontanot F.
    <Astron. Astrophys. 683, A34 (2024)>
    =2024A&A...683A..34C        (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; QSOs ; Redshifts
Keywords: methods: statistical - astronomical databases: miscellaneous -
          catalogs - surveys - quasars: general

Abstract:
    The identification of bright quasi-stellar objects (QSOs) is of
    fundamental importance to probe the intergalactic medium and address
    open questions in cosmology. Several approaches have been adopted to
    find such sources in the currently available photometric surveys,
    including machine learning methods. However, the rarity of bright QSOs
    at high redshifts compared to other contaminating sources (such as
    stars and galaxies) makes the selection of reliable candidates a
    difficult task, especially when high completeness is required.

    We present a novel technique to boost recall (i.e., completeness
    within the considered sample) in the selection of QSOs from
    photometric datasets dominated by stars, galaxies, and low-z QSOs
    (imbalanced datasets). Our heuristic method operates by iteratively
    removing sources whose probability of belonging to a noninteresting
    class exceeds a user-defined threshold, until the remaining dataset
    contains mainly high-z QSOs. Any existing machine learning method can
    be used as the underlying classifier, provided it allows for a
    classification probability to be estimated.

    We applied the method to a dataset obtained by cross-matching
    PanSTARRS1 (DR2), Gaia (DR3), and WISE, and identified the high-z QSO
    candidates using both our method and its direct multi-label
    counterpart. We ran several tests by randomly choosing the training
    and test datasets, and achieved significant improvements in recall
    which increased from ~50% to ~85% for QSOs with z>2.5, and from ~70%
    to ~90% for QSOs with z>3. Also, we identified a sample of 3098 new
    QSO candidates on a sample of 2.6x10^6 sources with no known
    classification. We obtained follow-up spectroscopy for 121 candidates,
    confirming 107 new QSOs with z>2.5.
    Finally, a comparison of our QSO candidates with those selected by an
    independent method based on GAIA spectroscopy shows that the two
    samples overlap by more than 90% and that both selection methods are
    potentially capable of achieving a high level of completeness.

Description:
    We present the entire sample of 3098 high-z QSO candidates obtained
    with our method (cands.dat) as well as the 107 QSOs with z>2.5 for
    which we obtained a spectroscopic classification and redshift
    (newqso.dat).

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
cands.dat         40     3098   The list of QSO candidates (table B1)
newqso.dat       103      107   The newly identified QSOs (table B1)
--------------------------------------------------------------------------------

Byte-by-byte Description of file: cands.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label   Explanations
--------------------------------------------------------------------------------
   1-  9  I9    ---     qid     Qubrics internal ID
  11- 12  I2    h       RAh     Right Ascension (J2000)
  14- 15  I2    min     RAm     Right Ascension (J2000)
  17- 21  F5.2  s       RAs     Right Ascension (J2000)
      23  A1    ---     DE-     Declination sign (J2000)
  24- 25  I2    deg     DEd     Declination (J2000)
  27- 28  I2    arcmin  DEm     Declination (J2000)
  30- 34  F5.2  arcsec  DEs     Declination (J2000)
  36- 40  F5.2  mag     imag    PanSTARRS i magnitude
--------------------------------------------------------------------------------

Byte-by-byte Description of file: newqso.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label   Explanations
--------------------------------------------------------------------------------
   1-  9  I9    ---     qid     Qubrics internal ID
  11- 12  I2    h       RAh     Right Ascension (J2000)
  14- 15  I2    min     RAm     Right Ascension (J2000)
  17- 21  F5.2  s       RAs     Right Ascension (J2000)
      23  A1    ---     DE-     Declination sign (J2000)
  24- 25  I2    deg     DEd     Declination (J2000)
  27- 28  I2    arcmin  DEm     Declination (J2000)
  30- 34  F5.2  arcsec  DEs     Declination (J2000)
  36- 40  F5.2  mag     imag    PanSTARRS i magnitude
  42- 45  F4.2  ---     z       Spectroscopic redshift
  47-103  A57   ---     Notes   Notes
--------------------------------------------------------------------------------

Acknowledgements:
    Giorgio Calderone, giorgio.calderone(at)inaf.it
(End) Giorgio Calderone [INAF-OATs], Patricia Vannier [CDS] 06-Jan-2024
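
Reading example:
    The byte-by-byte descriptions of cands.dat and newqso.dat can be read
    with a short fixed-width parser. The Python sketch below is only an
    illustration and is not part of the catalogue: the names CANDS_FIELDS,
    NEWQSO_FIELDS, parse_line, to_degrees and SAMPLE are hypothetical, and
    the SAMPLE record is invented to match the column layout (it is not a
    row of newqso.dat). Byte ranges are taken from the tables as 1-indexed
    and inclusive.

```python
# Column layout copied from the Byte-by-byte Description:
# (label, first byte, last byte, converter), bytes 1-indexed and inclusive.
CANDS_FIELDS = [
    ("qid", 1, 9, int),
    ("RAh", 11, 12, int),
    ("RAm", 14, 15, int),
    ("RAs", 17, 21, float),
    ("DE-", 23, 23, str),
    ("DEd", 24, 25, int),
    ("DEm", 27, 28, int),
    ("DEs", 30, 34, float),
    ("imag", 36, 40, float),
]
# newqso.dat shares the first 40 bytes and adds redshift and notes.
NEWQSO_FIELDS = CANDS_FIELDS + [
    ("z", 42, 45, float),
    ("Notes", 47, 103, str),
]


def parse_line(line, fields):
    """Split one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, first, last, convert in fields:
        raw = line[first - 1:last].strip()
        if convert is str:
            record[label] = raw
        else:
            # Blank numeric fields are returned as None instead of failing.
            record[label] = convert(raw) if raw else None
    return record


def to_degrees(record):
    """Convert sexagesimal RA/Dec columns to decimal degrees (J2000)."""
    ra = 15.0 * (record["RAh"] + record["RAm"] / 60.0 + record["RAs"] / 3600.0)
    # The sign has its own column so that declinations such as -00 30 00
    # keep their sign even though the degree field is zero.
    sign = -1.0 if record["DE-"] == "-" else 1.0
    dec = sign * (record["DEd"] + record["DEm"] / 60.0 + record["DEs"] / 3600.0)
    return ra, dec


# Invented record laid out like a newqso.dat line (NOT real catalogue data).
SAMPLE = "    12345 01 02 03.45 -12 34 56.78 17.50 3.12 example note"

if __name__ == "__main__":
    rec = parse_line(SAMPLE, NEWQSO_FIELDS)
    print(rec)
    print(to_degrees(rec))
```

    To read a whole file, one would loop over its lines and call
    parse_line(line, CANDS_FIELDS) for cands.dat or
    parse_line(line, NEWQSO_FIELDS) for newqso.dat.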