J/A+A/701/A223  ML-aided selected Lyalpha candidates in COSMOS2020 (Vale+, 2025)

A gradient boosting and broadband approach to finding Lyman-alpha emitting
galaxies beyond narrow-band surveys.
    Vale A., Paulino-Afonso A., Humphrey A., Cunha P.A.C., Ribeiro B.,
    Cerqueira B., Carvajal R., Fonseca J.
    <Astron. Astrophys. 701, A223 (2025)>
    =2025A&A...701A.223V        (SIMBAD/NED BibCode)

ADC_Keywords: Galaxies ; Photometry ; Galaxy catalogs ; Models

Keywords: methods: data analysis - methods: statistical - surveys -
          galaxies: high redshift - galaxies: photometry

Abstract:
    The identification of Lyman-alpha emitting galaxies (LAEs) has
    traditionally relied on dedicated surveys using custom narrow-band
    filters, which constrain observations to specific narrow redshift
    intervals, or on blind spectroscopy, which, although unbiased, typically
    requires extensive telescope time, making it challenging to assemble
    large, statistically robust galaxy samples. With the advent of wide-area
    astronomical surveys producing datasets significantly larger than those
    of traditional surveys, new techniques are needed. We test whether
    gradient boosting algorithms, trained on broadband photometric data from
    traditional LAE surveys, can efficiently and accurately distinguish LAE
    candidates from typical star-forming galaxies at similar redshifts and
    brightness levels. Using galaxy samples at z ∈ [2, 6] derived from the
    COSMOS2020 and SC4K catalogs, we trained three gradient-boosting machine
    learning algorithms (LightGBM, XGBoost, and CatBoost) on optical and
    near-infrared broadband photometry. To ensure balanced performance, the
    models were trained on carefully selected datasets in which the LAE and
    non-LAE samples have similar redshift and i-band magnitude
    distributions. The models were also tested for robustness by perturbing
    the photometric data according to the associated observational
    uncertainties. Our classification models achieved F1-scores of ∼87%,
    identifying around 7000 objects on which all the models agree. This
    more than doubles the number of LAEs identified in the COSMOS field
    compared with the SC4K dataset. We spectroscopically confirmed 60 of
    these LAE candidates using publicly available catalogs of the COSMOS
    field.
    These results highlight the potential of machine learning for
    efficiently identifying LAE candidates and lay the foundations for its
    application to larger photometric surveys such as Euclid and LSST. By
    complementing traditional approaches and providing a robust
    pre-selection, our models facilitate the analysis of these objects,
    which is crucial to improving our knowledge of the overall LAE
    population.

Description:
    We applied three gradient-boosting machine-learning algorithms
    (LightGBM, XGBoost, and CatBoost) to identify LAE candidates from
    optical and NIR broadband photometric data (fluxes, magnitudes, and
    colors). From SC4K and COSMOS2020 we extracted five samples with similar
    redshift and i-band magnitude distributions, ensuring comparable LAE and
    non-LAE (nLAE) populations. We then trained, tested, and analyzed each
    of the three algorithms on each of the five samples, resulting in 15
    models. The file seldata.dat lists the Lyalpha candidates in COSMOS2020
    selected with these gradient-boosting models.

File Summary:
--------------------------------------------------------------------------------
 FileName     Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe           80        .   This file
seldata.dat      74     7073   Identification of the selected candidates
--------------------------------------------------------------------------------

See also:
 J/MNRAS/476/4725 : SC4K catalogue of candidate LAEs (Sobral+, 2018)
 J/ApJS/258/11    : The COSMOS2020 catalog (Weaver+, 2022)

Byte-by-byte Description of file: seldata.dat
--------------------------------------------------------------------------------
   Bytes Format Units Label        Explanations
--------------------------------------------------------------------------------
   1-  7 I7     ---   ID           COSMOS2020 ID number
   9- 26 F18.14 deg   RAdeg        COSMOS2020 Right Ascension (J2000.0)
  28- 45 F18.16 deg   DEdeg        COSMOS2020 Declination (J2000.0)
  47- 52 F6.4   ---   zph          COSMOS2020 LePhare redshift (lp_zBEST)
  54- 55 I2     ---   TimesPred    Number of LAE predictions (out of 15 models)
  57- 74 F18.16 ---   AvgPredProba Average predicted LAE probability
--------------------------------------------------------------------------------

Acknowledgements:
    Afonso Vale, afonso.vale(at)astro.up.pt
(End) Patricia Vannier [CDS] 29-Jul-2025
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate f77 programs that load the files into arrays or read them line by line.
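The Byte-by-byte Description of seldata.dat is enough to read the file without the generated f77 loader. The following is a minimal Python sketch, not part of the CDS distribution, assuming the file is plain ASCII with one record per line and no blank (null) fields. Candidate, parse_line, read_seldata and unanimous are hypothetical names chosen for illustration, and unanimous() assumes that TimesPred counts how many of the 15 models classified an object as an LAE candidate.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Layout copied from the "Byte-by-byte Description of file: seldata.dat":
# (label, first byte, last byte, converter). Byte positions are 1-based
# and inclusive, as in the CDS ReadMe.
COLUMNS = [
    ("ID", 1, 7, int),
    ("RAdeg", 9, 26, float),
    ("DEdeg", 28, 45, float),
    ("zph", 47, 52, float),
    ("TimesPred", 54, 55, int),
    ("AvgPredProba", 57, 74, float),
]


@dataclass
class Candidate:
    """One record of seldata.dat (hypothetical container for illustration)."""
    ID: int              # COSMOS2020 ID number
    RAdeg: float         # Right Ascension (J2000.0), deg
    DEdeg: float         # Declination (J2000.0), deg
    zph: float           # COSMOS2020 LePhare redshift (lp_zBEST)
    TimesPred: int       # number of LAE predictions (out of 15 models)
    AvgPredProba: float  # average predicted LAE probability


def parse_line(line: str) -> Candidate:
    """Parse one fixed-width record into a Candidate."""
    values = {}
    for label, first, last, convert in COLUMNS:
        # 1-based inclusive [first, last] -> 0-based half-open slice
        values[label] = convert(line[first - 1:last].strip())
    return Candidate(**values)


def read_seldata(lines: Iterable[str]) -> List[Candidate]:
    """Parse every non-blank line, e.g. from an open file object."""
    return [parse_line(line.rstrip("\n")) for line in lines if line.strip()]


def unanimous(candidates: Iterable[Candidate], n_models: int = 15) -> List[Candidate]:
    """Keep records with TimesPred == n_models (assumed: flagged by every model)."""
    return [c for c in candidates if c.TimesPred == n_models]
```

For example, passing an open seldata.dat file object to read_seldata should return the 7073 records listed in the File Summary; unanimous() then keeps the objects flagged by all 15 models. Slicing by byte range mirrors the ReadMe exactly, so the sketch does not depend on the single blank separators between fields.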