J/ApJ/811/30 Machine learning metallicity predictions using SDSS (Miller, 2015)

The synthetic-oversampling method: using photometric colors to discover
extremely metal-poor stars.
    Miller A.A.
   <Astrophys. J., 811, 30 (2015)>
   =2015ApJ...811...30M
ADC_Keywords: Abundances, [Fe/H] ; Models ; Photometry, SDSS
Keywords: methods: data analysis; methods: statistical; stars: general;
          stars: statistics; stars: fundamental parameters; surveys

Abstract:
    Extremely metal-poor (EMP) stars ([Fe/H]≤-3.0dex) provide a unique
    window into understanding the first generation of stars and early
    chemical enrichment of the universe. EMP stars are exceptionally rare,
    however, and the relatively small number of confirmed discoveries limits
    our ability to exploit these near-field probes of the first ∼500Myr
    after the Big Bang. Here, a new method to photometrically estimate
    [Fe/H] from only broadband photometric colors is presented. I show that
    the method, which utilizes machine-learning algorithms and a training
    set of ∼170000 stars with spectroscopically measured [Fe/H], produces a
    typical scatter of ∼0.29dex. This performance is similar to what is
    achievable via low-resolution spectroscopy, and outperforms other
    photometric techniques, while also being more general. I further show
    that a slight alteration to the model, wherein synthetic EMP stars are
    added to the training set, yields the robust identification of EMP
    candidates. In particular, this synthetic-oversampling method recovers
    ∼20% of the EMP stars in the training set, at a precision of ∼0.05.
    Furthermore, ∼65% of the false positives from the model are very
    metal-poor stars ([Fe/H]≤-2.0dex). The synthetic-oversampling method is
    biased toward the discovery of warm (∼F-type) stars, a consequence of
    the targeting bias from the Sloan Digital Sky Survey/Sloan Extension for
    Galactic Understanding survey. This EMP selection method represents a
    significant improvement over alternative broadband optical selection
    techniques. The models are applied to >12 million stars, with an
    expected yield of ∼600 new EMP stars, which promises to open new avenues
    for exploring the early universe.
Description:
    Photometric colors and spectroscopic [Fe/H] measurements for the
    training set sources are selected from SDSS data release 10 (DR10; Ahn
    et al. 2014ApJS..211...17A). The selection criteria are designed to
    select sources with the most reliable photometric and spectroscopic
    measurements. It is important to note that each of these criteria can be
    applied to the ∼2.6x10^8 SDSS stars with no spectroscopic observations,
    ensuring that these choices do not introduce a significant bias in the
    final model predictions. See section 2 for further explanations.

    In addition to building a robust and representative training set, the
    choice of machine-learning algorithm is essential for the construction
    of a useful model. Three different algorithms are utilized in this
    study: the K-nearest Neighbors (KNN) regression, the Random Forest (RF)
    method and the Support Vector Machines (SVMs) model. See section 3 for
    further explanations.

File Summary:
--------------------------------------------------------------------------------
 FileName    Lrecl   Records    Explanations
--------------------------------------------------------------------------------
ReadMe          80         .    This file
table3.dat      95  12569529    Final metallicity predictions for field stars
--------------------------------------------------------------------------------

See also:
 V/139           : The SDSS Photometric Catalog, Release 9
                    (Adelman-McCarthy+, 2012)
 J/ApJ/807/171   : SkyMapper Survey metal-poor star spectrosc.
                    (Jacobson+, 2015)
 J/ApJ/798/122   : SEGUE Stellar Parameters Pipeline abundances (Miller+, 2015)
 J/A+A/568/A7    : Model SDSS colors for halo stars (Allende Prieto+, 2014)
 J/AJ/147/136    : Stars of very low metal abundance. VI. (Roederer+, 2014)
 J/AJ/145/13     : Metal-poor stars from SDSS/SEGUE. I. Abundances
                    (Aoki+, 2013)
 J/ApJS/199/30   : Effective temperatures for KIC stars (Pinsonneault+, 2012)
 J/MNRAS/414/2602 : Automated classification of HIP variables (Dubath+, 2011)
 J/AJ/137/4377   : List of SEGUE plate pairs (Yanny+, 2009)
 J/AJ/136/2070   : SEGUE stellar parameter pipeline. III.
                    (Allende Prieto+, 2008)
 J/AJ/136/2050   : SEGUE stellar parameter pipeline. II. (Lee+, 2008)
 J/A+A/484/721   : HES survey. IV. Candidate metal-poor stars
                    (Christlieb+, 2008)
 J/ApJ/652/1585  : Bright metal-poor stars from HES survey (Frebel+, 2006)
 J/AJ/103/1987   : Stars of very low metal abundance (Beers+ 1992)
 J/AJ/90/2089    : Stars of very low metal abundance. I (Beers+, 1985)
 http://www.sdss3.org/ : SDSS-III home page

Byte-by-byte Description of file: table3.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label    Explanations
--------------------------------------------------------------------------------
   1-  4  A4    ---     ---      [SDSS]
   6- 24  A19   ---     SDSS     SDSS object name (JHHMMSS.ss+DDMMSS.s)
  26- 44  I19   ---     objID    Object ID from the SDSS DR10 PhotoObjAll table
  46- 47  I2    h       RAh      Hour of Right Ascension (J2000)
  49- 50  I2    min     RAm      Minute of Right Ascension (J2000)
  52- 56  F5.2  s       RAs      Second of Right Ascension (J2000)
      58  A1    ---     DE-      Sign of the Declination (J2000)
  59- 60  I2    deg     DEd      Degree of Declination (J2000)
  62- 63  I2    arcmin  DEm      Arcminute of Declination (J2000)
  65- 68  F4.1  arcsec  DEs      Arcsecond of Declination (J2000)
  70- 73  I4    K       Teff     [4500/6988] Photometric Teff (1)
  75- 80  F6.3  [-]     [Fe/H]1  [-2.5/0.5] Photometric [Fe/H] using Support
                                  Vector Machine (SVM)-regression model
  82- 87  F6.3  [-]     [Fe/H]2  [-3.6/0.4] Photometric [Fe/H] using
                                  synthetic-oversampling
  89- 95  F7.4  ---     rho      [0.03/28.6] Proximity measure (ρ) of the
                                  given star to the training set (2)
--------------------------------------------------------------------------------
Note (1): After Pinsonneault et al. (2012, Cat. J/ApJS/199/30)
Note (2): ρ represents the mean Euclidean distance between a given source and
     its 60 nearest training-set neighbors. Sources with large ρ are likely
     to have unreliable estimates of [Fe/H].
     Thresholds on ρ as in table 4:
     ----------------------------
      Percentile      ρt
     ----------------------------
        68          0.0843
        90          0.1310
        95          0.1705
        99          0.3883
        99.5        0.5774
        99.7        0.7737
     ----------------------------
     Note: The threshold ρt corresponds to the percentage of training set
           sources with ρ≤ρt. See section 6 for further explanations.
--------------------------------------------------------------------------------

History:
    From electronic version of the journal
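
The byte-by-byte description and Note (2) above can be illustrated with a short Python sketch. It is not part of the catalogue or of the author's pipeline. It slices one table3.dat record by the documented byte ranges, converts the sexagesimal coordinates to degrees, and compares rho with the 95th-percentile threshold from Note (2). The record `SAMPLE_LINE` is invented purely to show the layout, and the names `FIELDS`, `parse_record`, `to_degrees` and `mean_knn_distance` are hypothetical. The feature space used to compute ρ is not specified in this ReadMe, so `mean_knn_distance` only demonstrates the definition on arbitrary coordinate tuples.

```python
import math

# Byte ranges (1-indexed, inclusive) copied from the byte-by-byte description.
FIELDS = {
    "SDSS": (6, 24, str),
    "objID": (26, 44, int),
    "RAh": (46, 47, int),
    "RAm": (49, 50, int),
    "RAs": (52, 56, float),
    "DE-": (58, 58, str),
    "DEd": (59, 60, int),
    "DEm": (62, 63, int),
    "DEs": (65, 68, float),
    "Teff": (70, 73, int),
    "[Fe/H]1": (75, 80, float),
    "[Fe/H]2": (82, 87, float),
    "rho": (89, 95, float),
}

# 95th-percentile threshold on rho, taken from Note (2).
RHO_T_95 = 0.1705

# Invented record (NOT real catalogue data), padded to the 95-byte layout.
SAMPLE_LINE = (
    "SDSS J123456.78+012345.6 1237650000000000001 "
    "12 34 56.78 +01 23 45.6 5800 -1.234 -1.500  0.0900"
)


def parse_record(line):
    """Slice one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, (start, end, cast) in FIELDS.items():
        record[label] = cast(line[start - 1:end].strip())
    return record


def to_degrees(record):
    """Convert the sexagesimal J2000 position to decimal degrees."""
    ra = 15.0 * (record["RAh"] + record["RAm"] / 60.0 + record["RAs"] / 3600.0)
    sign = -1.0 if record["DE-"] == "-" else 1.0
    dec = sign * (record["DEd"] + record["DEm"] / 60.0 + record["DEs"] / 3600.0)
    return ra, dec


def mean_knn_distance(point, training, k=60):
    """Mean Euclidean distance from point to its k nearest training points.

    Mirrors the definition of rho in Note (2); the choice of features
    is an assumption left to the caller.
    """
    nearest = sorted(math.dist(point, t) for t in training)[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    rec = parse_record(SAMPLE_LINE)
    ra_deg, dec_deg = to_degrees(rec)
    within_95 = rec["rho"] <= RHO_T_95
    print(rec["SDSS"], ra_deg, dec_deg, rec["[Fe/H]2"], within_95)
```

For the full 12569529-row file, a dedicated fixed-width or CDS-format table reader is likely a better fit than per-line slicing; the sketch only makes the column layout concrete.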
(End) Prepared by [AAS], Emmanuelle Perret [CDS] 21-Dec-2015