J/A+A/683/A34 Boost recall in QSO selection (Calderone+, 2024)
Boost recall in quasi-stellar object selection from highly imbalanced
photometric datasets. The reverse selection method.
Calderone G., Guarneri F., Porru M., Cristiani S., Grazian A., Nicastro L.,
Bischetti M., Boutsia K., Cupani G., D'Odorico V., Feruglio C., Fontanot F.
<Astron. Astrophys. 683, A34 (2024)>
=2024A&A...683A..34C (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; QSOs ; Redshifts
Keywords: methods: statistical - astronomical databases: miscellaneous -
catalogs - surveys - quasars: general
Abstract:
The identification of bright quasi-stellar objects (QSOs) is of
fundamental importance to probe the intergalactic medium and address
open questions in cosmology. Several approaches have been adopted to
find such sources in the currently available photometric surveys,
including machine learning methods. However, the rarity of bright QSOs
at high redshifts compared to other contaminating sources (such as
stars and galaxies) makes the selection of reliable candidates a
difficult task, especially when high completeness is required.
We present a novel technique to boost recall (i.e., completeness
within the considered sample) in the selection of QSOs from
photometric datasets dominated by stars, galaxies, and low-z QSOs
(imbalanced datasets).
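To show why recall rather than overall accuracy is the relevant figure on an imbalanced dataset, the short Python sketch below computes recall as TP/(TP+FN) for a single target class and compares it with accuracy. The class labels (`star`, `highz_qso`) and the counts are invented for illustration and are not taken from the paper.

```python
def recall(y_true, y_pred, positive):
    """Recall (completeness): TP / (TP + FN) for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else float("nan")


def accuracy(y_true, y_pred):
    """Fraction of all sources, of any class, that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy sample: 95 stars and 5 high-z QSOs.
y_true = ["star"] * 95 + ["highz_qso"] * 5
# The classifier recovers only 2 of the 5 QSOs and labels the rest as stars.
y_pred = ["star"] * 95 + ["highz_qso"] * 2 + ["star"] * 3
print(accuracy(y_true, y_pred))             # 0.97
print(recall(y_true, y_pred, "highz_qso"))  # 0.4
```

Accuracy looks excellent because the rare class barely contributes to it, while recall shows that most of the high-z QSOs were missed.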
Our heuristic method operates by iteratively removing sources whose
probability of belonging to a noninteresting class exceeds a
user-defined threshold, until the remaining dataset contains mainly
high-z QSOs. Any existing machine learning method can be used as the
underlying classifier, provided it allows for a classification
probability to be estimated. We applied the method to a dataset
obtained by cross-matching PanSTARRS1 (DR2), Gaia (DR3), and WISE, and
identified the high-z QSO candidates using both our method and its
direct multi-label counterpart.
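The iterative removal step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the classifier is retrained on the surviving training sources at each iteration, that a source is dropped when any single uninteresting class exceeds the threshold, and that the loop stops when nothing more is removed. The `trainer` interface, the class names, and the toy nearest-centroid classifier are hypothetical stand-ins for "any existing machine learning method" that yields a classification probability.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
ProbaFn = Callable[[Features], Dict[str, float]]
# Hypothetical interface: a trainer fits any probabilistic classifier on
# labelled rows and returns a function giving class probabilities.
Trainer = Callable[[Sequence[Tuple[Features, str]]], ProbaFn]


def reverse_selection(train, candidates, trainer: Trainer,
                      uninteresting, threshold=0.9, max_iter=10) -> List[Features]:
    """Iteratively drop sources likely to belong to an uninteresting class."""
    train, candidates = list(train), list(candidates)
    for _ in range(max_iter):
        proba = trainer(train)  # assumption: retrain on the surviving training set

        def keep(x: Features) -> bool:
            p = proba(x)
            # Remove a source if ANY uninteresting class exceeds the threshold.
            return all(p.get(c, 0.0) <= threshold for c in uninteresting)

        new_train = [(x, y) for x, y in train if keep(x)]
        new_cands = [x for x in candidates if keep(x)]
        converged = (len(new_train) == len(train)
                     and len(new_cands) == len(candidates))
        train, candidates = new_train, new_cands
        if converged or not train:
            break  # nothing left to remove, or nothing left to train on
    return candidates


def centroid_trainer(rows):
    """Toy classifier (hypothetical): softmax of -distance^2 to class centroids."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x[0]
        counts[y] = counts.get(y, 0) + 1
    centroids = {c: sums[c] / counts[c] for c in sums}

    def proba(x):
        logits = {c: -(x[0] - m) ** 2 for c, m in centroids.items()}
        top = max(logits.values())  # subtract the max for numerical stability
        w = {c: math.exp(v - top) for c, v in logits.items()}
        total = sum(w.values())
        return {c: v / total for c, v in w.items()}

    return proba


# Hypothetical one-feature training set and unlabelled candidates.
train = [((0.0,), "star"), ((0.2,), "star"),
         ((2.0,), "lowz_qso"), ((2.2,), "lowz_qso"),
         ((5.0,), "highz_qso"), ((5.2,), "highz_qso")]
candidates = [(0.1,), (2.1,), (4.9,), (3.5,)]
selected = reverse_selection(train, candidates, centroid_trainer,
                             uninteresting={"star", "lowz_qso"})
print(selected)  # [(4.9,), (3.5,)]
```

In this toy run the ambiguous source at 3.5 survives because no uninteresting class reaches the threshold: the removal rule only discards sources that are confidently uninteresting, which favours recall over purity.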
In several tests with randomly chosen training and test datasets, we
achieved significant improvements in recall, which increased from ∼50%
to ∼85% for QSOs with z>2.5 and from ∼70% to ∼90% for QSOs with z>3.
We also identified a sample of 3098 new QSO candidates among
2.6x10^6 sources with no known classification. We obtained follow-up
spectroscopy for 121 candidates, confirming 107 new QSOs with z>2.5.
Finally, a comparison of our QSO candidates with those selected by an
independent method based on Gaia spectroscopy shows that the two
samples overlap by more than 90% and that both selection methods are
potentially capable of achieving a high level of completeness.
Description:
We present the entire sample of 3098 high-z QSO candidates obtained
with our method (cands.dat) as well as the 107 QSOs with z>2.5 for
which we obtained a spectroscopic classification and redshift
(newqso.dat).
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
cands.dat 40 3098 The list of QSO candidates (table B1)
newqso.dat 103 107 The newly identified QSOs (table B1)
--------------------------------------------------------------------------------
Byte-by-byte Description of file: cands.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- qid Qubrics internal ID
11- 12 I2 h RAh Right Ascension (J2000)
14- 15 I2 min RAm Right Ascension (J2000)
17- 21 F5.2 s RAs Right Ascension (J2000)
23 A1 --- DE- Declination sign (J2000)
24- 25 I2 deg DEd Declination (J2000)
27- 28 I2 arcmin DEm Declination (J2000)
30- 34 F5.2 arcsec DEs Declination (J2000)
36- 40 F5.2 mag imag PanSTARRS i magnitude
--------------------------------------------------------------------------------
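A minimal Python sketch for reading cands.dat, assuming each record follows the byte layout above exactly. The function name `parse_cands_line` and the returned keys `ra`/`dec` (in decimal degrees) are illustrative choices, not part of the catalogue. The sign in byte 23 is applied to the whole declination, which is what makes sources with DEd = 00 come out correctly.

```python
def parse_cands_line(line):
    """Parse one fixed-width cands.dat record (byte ranges from the ReadMe)."""

    def field(first, last):
        # ReadMe byte ranges are 1-based and inclusive; Python slices are
        # 0-based and end-exclusive.
        return line[first - 1:last]

    qid = int(field(1, 9))
    rah, ram, ras = int(field(11, 12)), int(field(14, 15)), float(field(17, 21))
    sign = -1.0 if field(23, 23) == "-" else 1.0
    ded, dem, des = int(field(24, 25)), int(field(27, 28)), float(field(30, 34))
    imag = float(field(36, 40))
    ra_deg = 15.0 * (rah + ram / 60.0 + ras / 3600.0)    # 1 h = 15 deg
    dec_deg = sign * (ded + dem / 60.0 + des / 3600.0)   # sign applies to all parts
    return {"qid": qid, "ra": ra_deg, "dec": dec_deg, "imag": imag}


# Hypothetical record (not from the catalogue), laid out on bytes 1-40.
rec = parse_cands_line("123456789 01 02 03.00 -00 30 00.00 18.50")
print(rec["ra"], rec["dec"], rec["imag"])

# Usage (hypothetical path):
# with open("cands.dat") as fh:
#     cands = [parse_cands_line(l) for l in fh if l.strip()]
```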
Byte-by-byte Description of file: newqso.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- qid Qubrics internal ID
11- 12 I2 h RAh Right Ascension (J2000)
14- 15 I2 min RAm Right Ascension (J2000)
17- 21 F5.2 s RAs Right Ascension (J2000)
23 A1 --- DE- Declination sign (J2000)
24- 25 I2 deg DEd Declination (J2000)
27- 28 I2 arcmin DEm Declination (J2000)
30- 34 F5.2 arcsec DEs Declination (J2000)
36- 40 F5.2 mag imag PanSTARRS i magnitude
42- 45 F4.2 --- z Spectroscopic redshift
47-103 A57 --- Notes Notes
--------------------------------------------------------------------------------
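Since newqso.dat repeats the first 40 bytes of cands.dat and adds the redshift and a free-text note, a table-driven parser that copies the byte ranges above into a spec list is a convenient alternative. The sketch below is an assumption-based illustration: the names `NEWQSO_SPEC` and `parse_record` are hypothetical. If trailing blanks have been trimmed from a record, slicing past the end of the line simply yields an empty Notes field.

```python
# (label, first byte, last byte, converter), copied from the byte-by-byte
# description of newqso.dat; byte ranges are 1-based and inclusive.
NEWQSO_SPEC = [
    ("qid", 1, 9, int),
    ("RAh", 11, 12, int),
    ("RAm", 14, 15, int),
    ("RAs", 17, 21, float),
    ("DE-", 23, 23, str),
    ("DEd", 24, 25, int),
    ("DEm", 27, 28, int),
    ("DEs", 30, 34, float),
    ("imag", 36, 40, float),
    ("z", 42, 45, float),
    ("Notes", 47, 103, str.strip),
]


def parse_record(line, spec=NEWQSO_SPEC):
    """Slice one fixed-width record into a dict keyed by the ReadMe labels."""
    line = line.rstrip("\r\n")
    # Convert 1-based inclusive byte ranges to 0-based, end-exclusive slices.
    return {label: conv(line[first - 1:last]) for label, first, last, conv in spec}


# Hypothetical record (not from the catalogue): bytes 1-40, then z and Notes.
example = parse_record("123456789 01 02 03.00 -00 30 00.00 18.50 3.12 example note")
print(example["z"], example["Notes"])

# Usage (hypothetical path): keep only records above a stricter redshift cut.
# with open("newqso.dat") as fh:
#     qsos = [parse_record(l) for l in fh if l.strip()]
# high_z = [q for q in qsos if q["z"] > 3.0]
```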
Acknowledgements:
Giorgio Calderone, giorgio.calderone(at)inaf.it
(End) Giorgio Calderone [INAF-OATs], Patricia Vannier [CDS] 06-Jan-2024