J/A+A/647/A116      YSO candidate catalog from ANN                (Cornu+, 2021)

A neural network-based methodology to select young stellar object candidates
from IR surveys.
    Cornu D., Montillaud J.
    <Astron. Astrophys. 647, A116 (2021)>
    =2021A&A...647A.116C        (SIMBAD/NED BibCode)
ADC_Keywords: YSOs ; Photometry, infrared ; Photometry, classification ;
              Magnitudes
Keywords: methods: statistical - methods: numerical - stars: protostars -
          stars: pre-main sequence - catalogs

Abstract:
    Observed young stellar objects (YSOs) are used to study star formation
    and to characterize star-forming regions. For this purpose, YSO candidate
    catalogs are compiled from various surveys, especially in the infrared
    (IR), and simple selection schemes in color-magnitude diagrams (CMDs) are
    often used to identify and classify YSOs. We propose a methodology for
    YSO classification through machine learning (ML) using Spitzer IR data.
    We detail our approach in order to ensure reproducibility and provide an
    in-depth example of how to efficiently apply ML to an astrophysical
    classification problem. We used feed-forward artificial neural networks
    (ANNs) that take the four IRAC bands (3.6, 4.5, 5.8, and 8 micron) and
    the 24 micron MIPS band from Spitzer as inputs to classify point sources
    as CI YSO candidates, CII YSO candidates, or contaminants. We focused on
    nearby (∼1 kpc) star-forming regions, including Orion and NGC 2264, and
    assessed the generalization capacity of our network from one region to
    another. We found that ANNs can be efficiently applied to YSO
    classification with a limited number of neurons (∼25). Knowledge gathered
    on one star-forming region proved partly effective for prediction in new
    regions. The best generalization capacity was achieved by training the
    network on a combination of several star-forming regions. Carefully
    rebalancing the training proportions was necessary to achieve good
    results. We observed that the predicted YSOs are mainly contaminated by
    under-constrained rare subclasses such as Shocks and polycyclic aromatic
    hydrocarbons (PAHs), or by the vastly dominant other kinds of stars
    (mostly on the main sequence). For our most general results, we achieved
    recovery rates above 90% and 97% for CI and CII YSOs, respectively, with
    precisions above 80% and 90%.
    We took advantage of the great flexibility of ANNs to define, for each
    object, an effective membership probability to each output class. Using a
    threshold on this probability was found to efficiently improve the
    classification results at a reasonable cost of object exclusion. With
    this additional selection, we reached 90% and 97% precision on CI and CII
    YSOs, respectively, for more than half of them. Our catalog of YSO
    candidates in Orion (365 CI, 2381 CII) and NGC 2264 (101 CI, 469 CII)
    predicted by our final ANN, along with the class membership probability
    for each object, is publicly available at the CDS. Compared to usual CMD
    selection schemes, ANNs make it possible to quantitatively study the
    properties and quality of the classification. Although some further
    improvement may be achieved by using more powerful ML methods, we
    established that the quality of the results depends mostly on the
    construction of the training set. Improving YSO identification in IR
    surveys with ML would require larger and more reliable training catalogs,
    obtained either from current and future surveys with facilities such as
    the VLA, ALMA, or Chandra, or by synthesizing such catalogs from
    simulations.

Description:
    YSO candidate catalog with the associated class predictions from our
    trained neural network (F-C). Predictions are made only for objects from
    the Orion (Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192) and
    NGC 2264 (Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124)
    catalogs that pass our pre-selection criteria. Our catalog lists the
    original catalog of each object, all the Spitzer bands and their
    uncertainties used as input features for the network, the target
    associated with each object using the subclasses of the Gutermuth et al.
    (2009ApJS..184...18G, Cat. J/ApJS/184/18) method, and the prediction of
    the network using our three output classes (CI, CII, Others).
    For each object, the predicted membership probability to each output
    class is provided, making it possible to select objects according to the
    reliability of their classification. This enables subsequent refinement
    of the classification following the prescriptions of the paper.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
catalog.dat      133    26903   Object classification including YSO candidates
--------------------------------------------------------------------------------

See also:
   J/AJ/144/192 : Orion A and B Spitzer survey. I. YSO catalog (Megeath+, 2012)
  J/ApJ/794/124 : Young SFR NGC 2264 Spitzer sources (Rapson+, 2014)

Byte-by-byte Description of file: catalog.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 11  F11.7 deg     RAdeg     Right ascension (J2000.0)
  13- 23  F11.7 deg     DEdeg     Declination (J2000.0)
      25  I1    ---     Catalog   [0/1] Original YSO catalog of the object (1)
  27- 32  A6    ---     OriClass  Class from the original catalog (2)
  35- 40  F6.3  mag     3.6mag    Spitzer/IRAC 3.6 micron band magnitude
  42- 46  F5.3  mag   e_3.6mag    Uncertainty in 3.6mag
  49- 54  F6.3  mag     4.5mag    Spitzer/IRAC 4.5 micron band magnitude
  56- 60  F5.3  mag   e_4.5mag    Uncertainty in 4.5mag
  63- 68  F6.3  mag     5.8mag    Spitzer/IRAC 5.8 micron band magnitude
  70- 74  F5.3  mag   e_5.8mag    Uncertainty in 5.8mag
  77- 82  F6.3  mag     8.0mag    Spitzer/IRAC 8 micron band magnitude
  84- 88  F5.3  mag   e_8.0mag    Uncertainty in 8.0mag
  91- 96  F6.3  mag     24mag     ? Spitzer/MIPS 24 micron band magnitude
  98-102  F5.3  mag   e_24mag     ? Uncertainty in 24mag
     104  I1    ---     Target    [0/6] Target subclass from the Gutermuth
                                   et al. (2009ApJS..184...18G) method (3)
     106  I1    ---     Pred      [0/2] Network class prediction (4)
 108-115  F8.6  ---     P(CI)     [0/1] CI YSO predicted membership
                                   probability (5)
 117-124  F8.6  ---     P(CII)    [0/1] CII YSO predicted membership
                                   probability (5)
 126-133  F8.6  ---     P(Other)  [0/1] Other predicted membership
                                   probability (5)
--------------------------------------------------------------------------------
Note (1): Original Spitzer IR catalog from which the object is taken, as
   follows:
       0 = Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192 - Orion dataset
       1 = Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124 - NGC 2264 dataset
Note (2): Associated class in the original catalog, as follows:
     For Megeath et al. (2012AJ....144..192M, Cat. J/AJ/144/192):
       0/I, II, III/F, AGN, PAH, SHOCK, T
     For Rapson et al. (2014ApJ...794..124R, Cat. J/ApJ/794/124):
       Proto/CI, Faint/CI, Disks/CII, Other
Note (3): Target subclass from our modified Gutermuth et al.
   (2009ApJS..184...18G) method, as follows:
       0 = CI YSOs
       1 = CII YSOs
       3 = Galaxies
       4 = Shocks
       5 = PAHs
       6 = Stars
Note (4): Class prediction from the network (highest probability), as follows:
       0 = CI YSOs
       1 = CII YSOs
       2 = Contaminants (Others)
Note (5): This is a membership probability according to the network, not a
   genuine probability of the object belonging to the associated class. The
   confusion matrix should be computed at a given probability threshold to
   estimate the true probability (precision) of the remaining objects for
   each class, as described in Section 5.3 of the paper.

--------------------------------------------------------------------------------
Acknowledgements:
    David Cornu, david.cornu(at)observatoiredeparis.psl.eu
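Note (5) can be illustrated with a short Python sketch that builds the 3x3 confusion matrix for the objects whose highest membership probability reaches a chosen threshold, then reads off the per-class precision and the number of excluded objects. It assumes (this ReadMe does not state it explicitly) that Target subclasses 0 and 1 correspond to Pred codes 0 and 1 and that every other subclass (Galaxies, Shocks, PAHs, Stars) counts as Others. The function names and toy rows are hypothetical and are not catalog data. Applied to the full catalog.dat, such a matrix may include objects that were also used for training, so the estimates of Section 5.3 of the paper remain the reference.

```python
"""Per-class precision at a membership-probability threshold (Note 5).

Hypothetical helper names; the Target -> output-class mapping is an assumption.
"""

CLASS_NAMES = ("CI", "CII", "Others")


def target_to_output(target: int) -> int:
    """Map a Target subclass (Note 3) to an output class code (Note 4).

    Assumption: 0 (CI) -> 0, 1 (CII) -> 1, any contaminant subclass -> 2.
    """
    return target if target in (0, 1) else 2


def confusion_at_threshold(rows, threshold):
    """Build matrix[true][pred] from (target, p_ci, p_cii, p_other) rows.

    Objects whose highest probability is below `threshold` are excluded.
    Returns the matrix, the number of kept objects and the number excluded.
    """
    matrix = [[0] * 3 for _ in range(3)]
    kept = excluded = 0
    for target, *probs in rows:
        # Predicted class = highest membership probability, as for Pred.
        pred = max(range(3), key=lambda k: probs[k])
        if probs[pred] < threshold:
            excluded += 1
            continue
        matrix[target_to_output(target)][pred] += 1
        kept += 1
    return matrix, kept, excluded


def precision(matrix, k):
    """Fraction of objects predicted as class k whose target is also k."""
    predicted_as_k = sum(matrix[t][k] for t in range(3))
    return matrix[k][k] / predicted_as_k if predicted_as_k else float("nan")


# Toy rows with invented values (not taken from catalog.dat).
toy_rows = [
    (0, 0.90, 0.05, 0.05),  # CI target, predicted CI
    (1, 0.10, 0.85, 0.05),  # CII target, predicted CII
    (6, 0.20, 0.60, 0.20),  # star predicted CII: a contaminant in CII
    (6, 0.05, 0.05, 0.90),  # star predicted Others
    (1, 0.40, 0.45, 0.15),  # max probability 0.45 < 0.5: excluded
]
matrix, kept, excluded = confusion_at_threshold(toy_rows, threshold=0.5)
for k, name in enumerate(CLASS_NAMES):
    print(name, precision(matrix, k))
print("kept", kept, "excluded", excluded)
```

Raising `threshold` removes more uncertain objects and tends to raise precision, which is the trade-off between classification quality and object exclusion described in the abstract.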
(End) David Cornu [UTINAM & LERMA, France], Patricia Vannier [CDS] 25-Jan-2021
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation, it is possible to generate f77 programs that load the files into arrays or read them line by line.
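For readers not using Fortran, a minimal Python sketch of the same idea reads catalog.dat line by line and slices each record with the 1-based, inclusive byte ranges of the byte-by-byte description; blank fields (such as the optional 24mag and e_24mag columns flagged with "?") become None. The names COLUMNS, parse_record, read_catalog, pred_is_argmax and the synthetic demo record are hypothetical illustrations, not part of the CDS distribution. The astropy.io.ascii package also offers a "cds" format for this kind of ReadMe/data pair.

```python
"""Line-by-line reader for catalog.dat following the byte-by-byte description.

Hypothetical helper names; byte ranges are copied from the ReadMe (1-based, inclusive).
"""

COLUMNS = [
    ("RAdeg", 1, 11, float),
    ("DEdeg", 13, 23, float),
    ("Catalog", 25, 25, int),
    ("OriClass", 27, 32, str),
    ("3.6mag", 35, 40, float),
    ("e_3.6mag", 42, 46, float),
    ("4.5mag", 49, 54, float),
    ("e_4.5mag", 56, 60, float),
    ("5.8mag", 63, 68, float),
    ("e_5.8mag", 70, 74, float),
    ("8.0mag", 77, 82, float),
    ("e_8.0mag", 84, 88, float),
    ("24mag", 91, 96, float),
    ("e_24mag", 98, 102, float),
    ("Target", 104, 104, int),
    ("Pred", 106, 106, int),
    ("P(CI)", 108, 115, float),
    ("P(CII)", 117, 124, float),
    ("P(Other)", 126, 133, float),
]


def parse_record(line):
    """Slice one fixed-width record into a dict keyed by the ReadMe labels."""
    record = {}
    for label, start, end, kind in COLUMNS:
        # Short lines (trailing blanks trimmed) simply yield empty slices.
        field = line[start - 1:end].strip()
        # Blank fields, e.g. the optional '?' columns, become None.
        record[label] = kind(field) if field else None
    return record


def read_catalog(path):
    """Yield one parsed record per line of the file at `path`."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            yield parse_record(line.rstrip("\n"))


def pred_is_argmax(record):
    """Check Note (4): Pred should be the class with the highest probability."""
    probs = [record["P(CI)"], record["P(CII)"], record["P(Other)"]]
    return record["Pred"] == probs.index(max(probs))


# Synthetic 133-byte record with invented values (no 24 micron detection).
demo_line = (
    f"{83.8221:11.7f} {-5.3911:11.7f} 0 {'II':<6}  "
    f"{10.5:6.3f} {0.02:5.3f}  {10.1:6.3f} {0.021:5.3f}  "
    f"{9.8:6.3f} {0.03:5.3f}  {9.2:6.3f} {0.04:5.3f}  "
    f"{'':6} {'':5} 1 1 {0.05:8.6f} {0.9:8.6f} {0.05:8.6f}"
)
demo = parse_record(demo_line)
print(len(demo_line), demo["OriClass"], demo["24mag"], pred_is_argmax(demo))
```

With the real file, `for rec in read_catalog("catalog.dat"): ...` iterates over all records without loading the whole catalog into memory.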