J/AJ/169/216 Predictions of exoplanet hosts & abundances (Torres-Quijano+, 2025)

Utilizing machine learning to predict host stars and the key elemental
abundances of small planets.
    Torres-Quijano A.R., Hinkel N.R., Wheeler C.H.I., Young P.A., Ghezzi L.,
    Baldo A.P.
    <Astron. J., 169, 216 (2025)>
    =2025AJ....169..216T
ADC_Keywords: Exoplanets; Abundances; Combined data; Models
Keywords: Computational methods ; Exoplanets ; Mini Neptunes ;
          Stellar abundances ; Super Earths

Abstract:
    Stars and their associated planets originate from the same cloud of gas
    and dust, making a star's elemental composition a valuable indicator for
    indirectly studying planetary compositions. While the connection between
    a star's iron (Fe) abundance and the presence of giant exoplanets is
    established, the relationship with small planets remains unclear. The
    elements Mg, Si, and Fe are important in forming small planets.
    Employing machine learning algorithms like XGBoost, trained on the
    abundances (e.g., the Hypatia Catalog) of known exoplanet-hosting stars
    (NASA Exoplanet Archive), allows us to determine significant "features"
    (abundances or molar ratios) that may indicate the presence of small
    planets. We test on three groups of exoplanets: (1) all small,
    RP<3.5Rgeo; (2) sub-Neptunes, 2.0Rgeo<RP<3.5Rgeo; and (3) super-Earths,
    1.0Rgeo<RP<2.0Rgeo, each subdivided into seven ensembles to test
    different combinations of features. We created a list of stars with ≥90%
    probability of hosting small planets across all ensembles and
    experiments ("overlap stars"). We found abundance trends for stars
    hosting small planets, possibly indicating star-planet chemical
    interplay during formation. We also found that Na and V are key features
    regardless of planetary radii. We expect our results to underscore the
    importance of elements in exoplanet formation and machine learning's
    role in target selection for future NASA missions, e.g., the James Webb
    Space Telescope, the Nancy Grace Roman Space Telescope, and the
    Habitable Worlds Observatory, all of which are aimed at small-planet
    detection.

Description:
    We obtain the stellar elemental abundance data for this study from the
    Hypatia Catalog (Hinkel+2014, J/AJ/148/54). It contains the abundance
    data for more than 100 elements and species for >11,000 FGKM-type stars
    within 500pc of the Sun.
    Of these stars, >1400 are known exoplanet hosts. We required a
    sufficiently large sample of stars (>200) with recorded stellar
    abundances as per Hinkel+2019 (J/ApJ/880/49) to create training and
    prediction data sets. We additionally required that each star have at
    least 50% of its relevant abundance values measured (as opposed to
    missing or "null" values). After removing those stars with <50% recorded
    abundances (which included all M-stars), we obtained a data set with
    10,178 stars.

    We focus on the abundances for 16 elements: C, O, Na, Mg, Al, Si, Ca,
    Sc, Ti, V, Cr, Mn, Fe, Co, Ni, and Y. The elements chosen for our
    analysis fall within the lithophile or siderophile groups. Lithophiles
    (Na, Mg, Al, Si, Ca, Sc, Ti, V, Mn, Y) are elements that bond with
    oxygen and form oxidized minerals, while siderophiles (Cr, Co, Ni)
    commonly alloy with Fe. The volatile elements (C and O) were also
    included based on the potential for a planet to possess volatile
    envelopes. See Section 2.1.

    We obtain small-planet data from the NASA Exoplanet Archive, which
    currently contains information for >5500 confirmed exoplanets. For our
    analysis, we remove any planets that have radii RP>3.5Rgeo. Furthermore,
    if the host star has multiple known planetary companions, we choose to
    keep the largest planet in the system (below the radius cutoff). We then
    cross-match these small-planet hosts to the stars available in the
    Hypatia Catalog to ensure that they have stellar abundances.

    We separate our study into three experiments based on planetary radii,
    as per Bergsten+ (2022AJ....164..190B), where a minimum of 200 planets
    are necessary to successfully train the algorithm (see Section 2.2).

      Experiment 1: Small planets (479 planets with RP<3.5Rgeo);
                    stellar host distribution: 79 F-types, 294 G-types,
                    106 K-types.
      Experiment 2: Sub-Neptunes (219 planets with 2.0Rgeo<RP<3.5Rgeo);
                    stellar host distribution: 28 F-types, 131 G-types,
                    60 K-types.
      Experiment 3: Super-Earths (211 planets with 1.0Rgeo<RP<2.0Rgeo);
                    stellar host distribution: 42 F-types, 135 G-types,
                    34 K-types.

    We further separate each of our three experiments into seven ensembles
    which are different combinations of either element abundance values
    expressed in dex notation or molar ratios (see Section 2.3). In order for
    XGBoost to make predictions on a given star, the star must have values
    recorded for all of the elements in the abundances or molar ratios within
    the ensemble. We note that Experiment 3 is an exception, where null
    abundances were included (see Section 3.3).

File Summary:
--------------------------------------------------------------------------------
 FileName    Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe          80        .   This file
stars.dat       30    10178   List of stars used in this study
table1.dat      59        7   List of each tested ensemble
table2.dat     339    30534  *Data from this study
--------------------------------------------------------------------------------
Note on table2.dat: The abundances and molar ratios were pulled October 2023
    from the Hypatia Catalog (Hinkel+2014, J/AJ/148/54,
    http://www.hypatiacatalog.com/)
--------------------------------------------------------------------------------

See also:
I/239          : The Hipparcos and Tycho Catalogues (ESA 1997)
I/311          : Hipparcos, the New Reduction (van Leeuwen, 2007)
I/337          : Gaia DR1 (Gaia Collaboration, 2016)
I/345          : Gaia DR2 (Gaia Collaboration, 2018)
I/355          : Gaia DR3 Part 1. Main source (Gaia Collaboration, 2022)
II/246         : 2MASS All-Sky Catalog of Point Sources (Cutri+ 2003)
J/A+A/410/527  : Abundances in the Galactic disk (Bensby+, 2003)
J/ApJ/622/1102 : The planet-metallicity correlation. (Fischer+, 2005)
J/ApJ/715/1050 : Predicted abundances for extrasolar planets. I. (Bond+, 2010)
J/AJ/148/54    : The Hypatia Catalog (Hinkel+, 2014)
J/ApJ/807/45   : Potential. habitable planets orbiting M dwarfs (Dressing+, 2015)
J/AJ/154/109   : California-Kepler Surv. (CKS). III. Planet radii (Fulton+, 2017)
J/ApJ/880/49   : Predictions of giant exoplanet host star's (Hinkel+, 2019)
J/ApJ/875/29   : Spectroscopic analysis of the CKS sample. I. (Martinez+, 2019)
J/AJ/165/125   : Planetary Orbit Eccentricity Trends (POET). I. (An+, 2023)
J/ApJ/946/61   : Hydra sp. obs. of K2 planet-host stars (Loaiza-Tacuri+, 2023)
http://www.hypatiacatalog.com/ : Hypatia Catalog
http://exoplanetarchive.ipac.caltech.edu/ : NASA Exoplanet Archive

Byte-by-byte Description of file: stars.dat
--------------------------------------------------------------------------------
   Bytes  Format Units Label    Explanations
--------------------------------------------------------------------------------
   1- 28  A28    ---   Star     Stellar ID based on associated catalog
      30  A1     ---   fPl      [*] * Indicates the star has an orbiting planet
--------------------------------------------------------------------------------

Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
   Bytes  Format Units Label    Explanations
--------------------------------------------------------------------------------
       1  I1     ---   Ens      [1/7] Ensemble number
   3- 44  A42    ---   Features Feature tested (1)
  46- 49  I4     ---   nS1      [2486/4767] Number of stars available for
                                  prediction in Experiment 1 (2)
  51- 54  I4     ---   nS2      [2486/4767] Number of stars available for
                                  prediction in Experiment 2 (2)
  56- 59  I4     ---   nS3      [9698] Number of stars available for
                                  prediction in Experiment 3 (2)
--------------------------------------------------------------------------------
Note (1): We note that volatiles here include C and O; lithophiles include
    Na, Mg, Al, Si, Ca, Sc, Ti, V, Mn, and Y; and siderophiles are Cr, Co,
    and Ni. See Section 2 for more details.
Note (2): In order for XGBoost to make predictions on a given star, the star
    must have values recorded for all of the elements in the abundances or
    molar ratios within the ensemble. We note that Experiment 3 is an
    exception, where null abundances were included. The experiments are as
    follows:
    Experiment 1: Small planets (RP<3.5Rgeo)
    Experiment 2: Sub-Neptunes (2.0Rgeo<RP<3.5Rgeo)
    Experiment 3: Super-Earths (1.0Rgeo<RP<2.0Rgeo)
--------------------------------------------------------------------------------

Byte-by-byte Description of file: table2.dat
--------------------------------------------------------------------------------
   Bytes  Format Units Label    Explanations
--------------------------------------------------------------------------------
   1- 28  A28    ---   Star     Stellar ID based on associated catalog
      30  I1     ---   Exp      [1/3] Experiment number for the star (1)
  32- 34  A3     ---   fOvlap   Is the star an "overlap star"? (2)
  36- 40  F5.2   [-]   [Fe/H]   [-5.29/1.05] The [Fe/H] abundance in dex
  42- 46  F5.2   [-]   [C/H]    [-2.94/7.68]? The [C/H] abundance in dex
  48- 52  F5.2   [-]   [O/H]    [-2.57/1.72]? The [O/H] abundance in dex
  54- 58  F5.2   [-]   [Na/H]   [-3.14/1.19]? The [Na/H] abundance in dex
  60- 64  F5.2   [-]   [Mg/H]   [-2.83/0.83]? The [Mg/H] abundance in dex
  66- 70  F5.2   [-]   [Al/H]   [-4.03/1.12]? The [Al/H] abundance in dex
  72- 76  F5.2   [-]   [Si/H]   [-4.54/1.2]? The [Si/H] abundance in dex
  78- 82  F5.2   [-]   [Ca/H]   [-3.37/1.04]? The [Ca/H] abundance in dex
  84- 88  F5.2   [-]   [Sc/H]   [-2.88/2.0]? The [Sc/H] abundance in dex
  90- 94  F5.2   [-]   [Ti/H]   [-7.11/1.1]? The [Ti/H] abundance in dex
  96-100  F5.2   [-]   [V/H]    [-3.36/1.51]? The [V/H] abundance in dex
 102-106  F5.2   [-]   [Cr/H]   [-5.97/1.5]? The [Cr/H] abundance in dex
 108-112  F5.2   [-]   [Mn/H]   [-3.56/0.88]? The [Mn/H] abundance in dex
 114-118  F5.2   [-]   [Co/H]   [-3.11/1.57]? The [Co/H] abundance in dex
 120-124  F5.2   [-]   [Ni/H]   [-3.12/0.79]? The [Ni/H] abundance in dex
 126-130  F5.2   [-]   [Y/H]    [-3.32/2.33]? The [Y/H] abundance in dex
 132-139  F8.4   ---   C/Mg     [0.13/446.69]? Molar ratio of C/Mg
 141-150  F10.4  ---   O/Mg     [0.72/45709]? Molar ratio of O/Mg
 152-158  F7.4   ---   Si/Mg    [4e-3/26.31]? Molar ratio of Si/Mg
 160-165  F6.4   ---   Ca/Mg    [1e-3/0.7943]? Molar ratio of Ca/Mg
 167-172  F6.4   ---   Ti/Mg    [0/0.31]? Molar ratio of Ti/Mg
 174-179  F6.4   ---   Fe/Mg    [0/4.58]? Molar ratio of Fe/Mg
 181-188  F8.4   ---   C/Si     [0.01/691.84]? Molar ratio of C/Si
 190-199  F10.4  ---   O/Si     [0.58/37153.53]? Molar ratio of O/Si
 201-208  F8.4   ---   Mg/Si    [0.03/229.09]? Molar ratio of Mg/Si
 210-218  F9.4   ---   Ca/Si    [2e-4/1905.47]? Molar ratio of Ca/Si
 220-225  F6.4   ---   Ti/Si    [0/0.05]? Molar ratio of Ti/Si
 227-234  F8.4   ---   Fe/Si    [0/251.19]? Molar ratio of Fe/Si
 236-241  F6.4   ---   C/O      [3e-4/6.46]? Molar ratio of C/O
 243-248  F6.4   ---   Si/O     [0/1.7]? Molar ratio of Si/O
 250-255  F6.4   ---   Mg/O     [0/1.39]? Molar ratio of Mg/O
 257-262  F6.4   ---   Ca/O     [0/0.07]? Molar ratio of Ca/O
 264-269  F6.4   ---   Ti/O     [0/0.002]? Molar ratio of Ti/O
 271-276  F6.4   ---   Fe/O     [0/0.42]? Molar ratio of Fe/O
     278  I1     ---   fPl      [0/1] Indicates whether star has an orbiting
                                  planet (3)
     280  A1     ---   Pl       Planet ID, largest planet in system (4)
     282  I1     ---   nPl      [1/5]? Number of planets in system (4)
 284-288  F5.3   Rgeo  RadPl    [0.58/3.5]? Planet radius (4)
     290  I1     ---   Disk     [1/2]? Location of star in the Galactic disk (5)
 292-297  F6.4   ---   Prob1    [3e-4/1]? Ensemble 1 probability of hosting a
                                  small planet (6)
 299-304  F6.4   ---   Prob2    [3e-4/1]? Ensemble 2 probability of hosting a
                                  small planet (6)
 306-311  F6.4   ---   Prob3    [3e-4/1]? Ensemble 3 probability of hosting a
                                  small planet (6)
 313-318  F6.4   ---   Prob4    [3e-4/1]? Ensemble 4 probability of hosting a
                                  small planet (6)
 320-325  F6.4   ---   Prob5    [3e-4/1]? Ensemble 5 probability of hosting a
                                  small planet (6)
 327-332  F6.4   ---   Prob6    [3e-4/1]? Ensemble 6 probability of hosting a
                                  small planet (6)
 334-339  F6.4   ---   Prob7    [3e-4/1]? Ensemble 7 probability of hosting a
                                  small planet (6)
--------------------------------------------------------------------------------
Note (1): Stars that were trained and predicted upon for each experiment
    (1, 2, or 3) are indicated by the Experiment value (see Section 2.2).
    Experiment number as follows:
    1 = Small Planet (RP<3.5Rgeo); 10178 occurrences
    2 = Sub-Neptune (2.0Rgeo<RP<3.5Rgeo); 10178 occurrences
    3 = Super-Earth (1.0Rgeo<RP<2.0Rgeo); 10178 occurrences
Note (2): Overlap flag as follows:
    Yes = "Overlap star", i.e. star with ≥90% probability of hosting a
          small planet across all ensembles in this experiment. See
          Section 4.3. (432 occurrences)
    No  = Not an overlap star (30102 occurrences)
Note (3): Flag as follows:
    1 = Star has an orbiting planet (1440 occurrences)
    0 = Star does not have an orbiting planet (29094 occurrences)
Note (4): Value provided per the NASA Exoplanet Archive
    (http://exoplanetarchive.ipac.caltech.edu/)
Note (5): Location in the Galactic disk as follows:
    1 = thin disk (18453 occurrences)
    2 = thick disk (1905 occurrences)
    0 = N/A (10176 occurrences)
Note (6): In order for XGBoost to make predictions on a given star, the star
    must have values recorded for all of the elements in the abundances or
    molar ratios within the ensemble. We note that Experiment 3 is an
    exception, where null abundances were included. See Table 1 for the list
    of tested ensembles (J/AJ/169/216/table1).
--------------------------------------------------------------------------------

History:
    From electronic version of the journal

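
Reading example (illustrative, not part of the catalogue):
    The byte ranges in the description of table2.dat can be used directly to
    slice a record. The Python sketch below does this for a subset of the
    columns and then applies the overlap criterion of Note (2) (probability
    of at least 0.90 in all seven ensembles). The names FIELDS, parse_record,
    is_overlap and make_line are hypothetical, the demo record is synthetic,
    and treating a blank probability as disqualifying is an assumption: the
    notes do not state how missing ensemble values enter the overlap flag.

```python
# Byte ranges (1-indexed, inclusive) copied from the table2.dat description.
# Only a subset of the columns is listed; the others follow the same pattern.
FIELDS = {
    "Star": (1, 28, str),
    "Exp": (30, 30, int),
    "fOvlap": (32, 34, str),
    "[Fe/H]": (36, 40, float),
    "fPl": (278, 278, int),
}
# Prob1..Prob7 are seven 6-byte fields starting at byte 292, 7 bytes apart.
for k in range(7):
    start = 292 + 7 * k
    FIELDS[f"Prob{k + 1}"] = (start, start + 5, float)

RECORD_LENGTH = 339  # Lrecl of table2.dat from the File Summary


def parse_record(line):
    """Slice one fixed-width record; blank fields become None."""
    line = line.rstrip("\n").ljust(RECORD_LENGTH)
    record = {}
    for label, (first, last, cast) in FIELDS.items():
        raw = line[first - 1:last].strip()
        record[label] = cast(raw) if raw else None
    return record


def is_overlap(record, threshold=0.90):
    """True if all seven ensemble probabilities reach the threshold.

    Assumption: a missing (blank) probability disqualifies the star.
    """
    probs = [record[f"Prob{k}"] for k in range(1, 8)]
    return all(p is not None and p >= threshold for p in probs)


def make_line(values):
    """Build a synthetic record for the demo (hypothetical helper)."""
    chars = [" "] * RECORD_LENGTH
    for label, text in values.items():
        first, last, cast = FIELDS[label]
        width = last - first + 1
        # Text columns are left-aligned, numeric columns right-aligned.
        field = text.ljust(width) if cast is str else text.rjust(width)
        chars[first - 1:last] = field
    return "".join(chars)


# Synthetic record, not a real catalogue entry.
demo_values = {"Star": "Demo Star 1", "Exp": "1", "fOvlap": "Yes",
               "[Fe/H]": "0.12", "fPl": "0"}
demo_values.update({f"Prob{k}": "0.9500" for k in range(1, 8)})
demo = make_line(demo_values)
rec = parse_record(demo)
print(rec["Star"], rec["Exp"], is_overlap(rec))
```

    Slicing by byte position, rather than splitting on whitespace, matters
    here because stellar identifiers contain spaces and unmeasured values
    are left blank.
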
(End) Prepared by [AAS], Robin Leichtnam [CDS] 14-Jan-2026
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate an f77 program that loads the files into arrays or reads them line by line.
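
Selection example (illustrative, not part of the catalogue):
    The sample cuts stated in the Description (at least 50% of the 16
    selected abundances measured; planets above the 3.5 Earth-radius cutoff
    removed; the largest remaining planet kept for each host) can be
    sketched in Python as follows. The data layout is an assumption made for
    the example: abundances as a dict keyed by element symbol with None for
    a null value, and planets as dicts with "name" and "radius" keys. It is
    not the export format of the Hypatia Catalog or the NASA Exoplanet
    Archive, and it is not the authors' pipeline.

```python
# The 16 elements listed in the Description.
ELEMENTS = ["C", "O", "Na", "Mg", "Al", "Si", "Ca", "Sc",
            "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Y"]


def measured_fraction(abundances):
    """Fraction of the 16 elements with a non-null abundance."""
    measured = sum(1 for el in ELEMENTS if abundances.get(el) is not None)
    return measured / len(ELEMENTS)


def passes_abundance_cut(abundances, min_fraction=0.5):
    """Keep stars with at least 50% of the abundances measured."""
    return measured_fraction(abundances) >= min_fraction


def largest_small_planet(planets, max_radius=3.5):
    """Drop planets above the radius cutoff (Earth radii), keep the largest.

    Returns None when no planet survives the cutoff.
    """
    small = [p for p in planets if p["radius"] <= max_radius]
    return max(small, key=lambda p: p["radius"]) if small else None


# Made-up values for illustration only.
star = {el: 0.0 for el in ELEMENTS[:8]}   # 8 of 16 elements measured
system = [{"name": "b", "radius": 1.4},
          {"name": "c", "radius": 2.9},
          {"name": "d", "radius": 6.1}]

print(passes_abundance_cut(star))
print(largest_small_planet(system))
```
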