J/A+A/647/A116 YSO candidate catalog from ANN (Cornu+, 2021)
A neural network-based methodology to select young stellar object candidates
from IR surveys.
Cornu D., Montillaud J.
<Astron. Astrophys. 647, A116 (2021)>
=2021A&A...647A.116C (SIMBAD/NED BibCode)
ADC_Keywords: YSOs ; Photometry, infrared ; Photometry, classification ;
Magnitudes
Keywords: methods: statistical - methods: numerical - stars: protostars -
stars: pre-main sequence - catalogs
Abstract:
Observed young stellar objects (YSOs) are used to study star formation
and characterize star-forming regions. For this purpose, YSO candidate
catalogs are compiled from various surveys, especially in the infrared
(IR), and simple selection schemes in color-magnitude diagrams (CMDs)
are often used to identify and classify YSOs.
We propose a methodology for YSO classification through machine
learning (ML) using Spitzer IR data. We detail our approach in order
to ensure reproducibility and provide an in-depth example on how to
efficiently apply ML to an astrophysical classification.
We used feedforward artificial neural networks (ANNs) that take the
four IRAC bands (3.6, 4.5, 5.8, and 8 micron) and the 24 micron MIPS
band from Spitzer to classify point source objects into CI and CII YSO
candidates or as contaminants. We focused on nearby (∼1 kpc)
star-forming regions including Orion and NGC 2264, and assessed the
generalization capacity of our network from one region to another.
We found that ANNs can be efficiently applied to YSO classification
with a limited number of neurons (∼25). Knowledge gathered on one
star-forming region was shown to be partly effective for prediction in
new regions. The best generalization capacity was achieved using a
combination of several star-forming regions to train the network.
Carefully rebalancing the training proportions was necessary to
achieve good results. We observed that the predicted YSOs are mainly
contaminated by under-constrained rare subclasses like Shocks and
polycyclic aromatic hydrocarbons (PAHs), or by the vastly dominant
other kinds of stars (mostly on the main sequence).
We achieved recovery rates above 90% and 97% for CI and CII YSOs,
respectively, with precisions above 80% and 90% for our most general
results. We took advantage of the great flexibility of ANNs to define,
for each object, an effective membership probability to each output
class. Using a threshold in this probability was found to efficiently
improve the classification results at a reasonable cost of object
exclusion. With this additional selection, we reached 90% and 97%
precision on CI and CII YSOs, respectively, for more than half of
them. Our catalog of YSO candidates in Orion (365 CI, 2381 CII) and
NGC 2264 (101 CI, 469 CII) predicted by our final ANN, along with the
class membership probability for each object, is publicly available at
the CDS.
Compared to usual CMD selection schemes, ANNs provide a possibility to
quantitatively study the properties and quality of the classification.
Although some further improvement may be achieved by using more
powerful ML methods, we established that the result quality depends
mostly on the training set construction. Improvements in YSO
identification with IR surveys using ML would require larger and more
reliable training catalogs, either by taking advantage of current and
future surveys from various facilities like VLA, ALMA, or Chandra, or
by synthesizing such catalogs from simulations.
Description:
YSO candidate catalog with the associated class prediction from our
trained neural network (F-C). The prediction is made only for objects
from the Orion (Megeath et al., 2012AJ....144..192M, Cat.
J/AJ/144/192) and NGC 2264 (Rapson et al., 2014ApJ...794..124R, Cat.
J/ApJ/794/124) catalogs using our pre-selection criteria. Our catalog
lists the original catalog of each object, all the Spitzer bands and
their uncertainties that were used as input features for the network,
the target associated with each object using the subclasses of the
Gutermuth et al. (2009ApJS..184...18G, Cat. J/ApJS/184/18) method, and
the prediction of the network using our three output classes (CI, CII,
Others). For each object the predicted membership probability to each
output class is provided, making it possible to select objects
according to the reliability of their classification. This enables
subsequent refinement of the classification following the
prescriptions from the paper.
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
catalog.dat 133 26903 Object classification including YSO candidates
--------------------------------------------------------------------------------
See also:
J/AJ/144/192 : Orion A and B Spitzer survey. I. YSO catalog (Megeath+, 2012)
J/ApJ/794/124 : Young SFR NGC 2264 Spitzer sources (Rapson+, 2014)
Byte-by-byte Description of file: catalog.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 11 F11.7 deg RAdeg Right ascension (J2000.0)
13- 23 F11.7 deg DEdeg Declination (J2000.0)
25 I1 --- Catalog [0/1] Original YSO catalog of the object (1)
27- 32 A6 --- OriClass Class from the original catalog (2)
35- 40 F6.3 mag 3.6mag Spitzer/IRAC 3.6 micron band magnitude
42- 46 F5.3 mag e_3.6mag Uncertainty in 3.6mag
49- 54 F6.3 mag 4.5mag Spitzer/IRAC 4.5 micron band magnitude
56- 60 F5.3 mag e_4.5mag Uncertainty in 4.5mag
63- 68 F6.3 mag 5.8mag Spitzer/IRAC 5.8 micron band magnitude
70- 74 F5.3 mag e_5.8mag Uncertainty in 5.8mag
77- 82 F6.3 mag 8.0mag Spitzer/IRAC 8 micron band magnitude
84- 88 F5.3 mag e_8.0mag Uncertainty in 8.0mag
91- 96 F6.3 mag 24mag ? Spitzer/MIPS 24 micron band magnitude
98-102 F5.3 mag e_24mag ? Uncertainty in 24mag
104 I1 --- Target [0/6] The target subclass from the Gutermuth
et al. (2009ApJS..184...18G) method (3)
106 I1 --- Pred [0/2] Network class prediction (4)
108-115 F8.6 --- P(CI) [0/1] CI YSO predicted membership
probability (5)
117-124 F8.6 --- P(CII) [0/1] CII YSO predicted membership
probability (5)
126-133 F8.6 --- P(Other) [0/1] Other predicted membership
probability (5)
--------------------------------------------------------------------------------
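The byte ranges above can be turned into a small fixed-width reader. The
following is a minimal sketch, not an official reader: the names COLUMNS,
parse_line, and read_catalog are hypothetical, and it assumes that fields
flagged with "?" are left blank when no value is available, in which case
they are returned as None.

```python
# Column layout transcribed from the byte-by-byte description above:
# (label, first byte, last byte, converter). Bytes are 1-indexed, inclusive.
COLUMNS = [
    ("RAdeg", 1, 11, float),
    ("DEdeg", 13, 23, float),
    ("Catalog", 25, 25, int),
    ("OriClass", 27, 32, str),
    ("3.6mag", 35, 40, float),
    ("e_3.6mag", 42, 46, float),
    ("4.5mag", 49, 54, float),
    ("e_4.5mag", 56, 60, float),
    ("5.8mag", 63, 68, float),
    ("e_5.8mag", 70, 74, float),
    ("8.0mag", 77, 82, float),
    ("e_8.0mag", 84, 88, float),
    ("24mag", 91, 96, float),
    ("e_24mag", 98, 102, float),
    ("Target", 104, 104, int),
    ("Pred", 106, 106, int),
    ("P(CI)", 108, 115, float),
    ("P(CII)", 117, 124, float),
    ("P(Other)", 126, 133, float),
]


def parse_line(line):
    """Convert one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, first, last, convert in COLUMNS:
        # Convert the 1-indexed inclusive byte range to a Python slice.
        field = line[first - 1:last].strip()
        # Assumption: an empty field means "no value" and maps to None.
        record[label] = convert(field) if field else None
    return record


def read_catalog(path):
    """Read every non-empty line of a file laid out like catalog.dat."""
    with open(path) as handle:
        return [parse_line(line.rstrip("\n")) for line in handle if line.strip()]
```

Calling read_catalog("catalog.dat") would then return one dict per object,
for example to filter on the Pred or P(CI) columns.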
Note (1): Original Spitzer IR catalog from which the object is taken as follows:
0 = Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192 - Orion dataset
1 = Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124 - NGC 2264 dataset
Note (2): Associated class in the original catalog as follows:
0/I = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
II = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
III/F = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
AGN = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
PAH = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
SHOCK = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
T = for Megeath et al., 2012AJ....144..192M, Cat. J/AJ/144/192
Proto/CI = for Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124
Faint/CI = for Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124
Disks/CII = for Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124
Other = for Rapson et al., 2014ApJ...794..124R, Cat. J/ApJ/794/124
Note (3): Target subclass from our modified Gutermuth et al.
(2009ApJS..184...18G) method as follows:
0 = CI YSOs
1 = CII YSOs
3 = Galaxies
4 = Shocks
5 = PAHs
6 = Stars
Note (4): Class prediction from the network (highest probability) as follows:
0 = CI YSOs
1 = CII YSOs
2 = Contaminants (Others)
Note (5): This is a membership probability according to the network and not a
genuine probability of the object being of the associated class.
To estimate the true probability (precision) of the objects that remain
in each class, the confusion matrix should be computed at a given
probability threshold, as described in section 5.3 of the paper.
--------------------------------------------------------------------------------
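The thresholding described in Note (5) can be sketched as follows. This is
an illustration under stated assumptions, not the procedure of the paper:
it assumes the threshold is applied to the highest of the three membership
probabilities, and that every Target value other than 0 and 1 counts as
"Others". The function names and the five records are made up for the
example; the dict keys reuse the column labels of catalog.dat.

```python
def to_output_class(target):
    """Collapse a Target subclass (Note 3) onto the output classes (Note 4).

    Assumption: every subclass other than CI (0) and CII (1) is a contaminant.
    """
    return target if target in (0, 1) else 2


def confusion_matrix(records, threshold):
    """Count objects per (target class, predicted class) pair.

    Rows are target classes, columns are predicted classes. Objects whose
    highest membership probability is below the threshold are excluded.
    """
    matrix = [[0] * 3 for _ in range(3)]
    for rec in records:
        probs = (rec["P(CI)"], rec["P(CII)"], rec["P(Other)"])
        if max(probs) < threshold:
            continue
        matrix[to_output_class(rec["Target"])][rec["Pred"]] += 1
    return matrix


def precision(matrix, cls):
    """Fraction of objects predicted as cls whose target is also cls."""
    predicted = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted if predicted else None


# Made-up records for illustration only; they are not taken from catalog.dat.
records = [
    {"Target": 0, "Pred": 0, "P(CI)": 0.95, "P(CII)": 0.03, "P(Other)": 0.02},
    {"Target": 6, "Pred": 0, "P(CI)": 0.55, "P(CII)": 0.05, "P(Other)": 0.40},
    {"Target": 1, "Pred": 1, "P(CI)": 0.02, "P(CII)": 0.97, "P(Other)": 0.01},
    {"Target": 1, "Pred": 1, "P(CI)": 0.10, "P(CII)": 0.60, "P(Other)": 0.30},
    {"Target": 4, "Pred": 2, "P(CI)": 0.05, "P(CII)": 0.05, "P(Other)": 0.90},
]

for threshold in (0.0, 0.9):
    matrix = confusion_matrix(records, threshold)
    print(threshold, matrix, precision(matrix, 0))
```

In this toy sample, raising the threshold removes the low-confidence
contaminant predicted as CI, which raises the CI precision at the cost of
excluding two of the five objects.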
Acknowledgements:
David Cornu, david.cornu(at)observatoiredeparis.psl.eu
(End) David Cornu [UTINAM & LERMA, France], Patricia Vannier [CDS] 25-Jan-2021