J/AJ/169/216 Predictions of exoplanet hosts & abundances (Torres-Quijano+, 2025)
Utilizing machine learning to predict host stars and the key elemental
abundances of small planets.
Torres-Quijano A.R., Hinkel N.R., Wheeler C.H.I., Young P.A., Ghezzi L.,
Baldo A.P.
<Astron. J., 169, 216 (2025)>
=2025AJ....169..216T
ADC_Keywords: Exoplanets; Abundances; Combined data; Models
Keywords: Computational methods ; Exoplanets ; Mini Neptunes ;
Stellar abundances ; Super Earths
Abstract:
Stars and their associated planets originate from the same cloud of
gas and dust, making a star's elemental composition a valuable
indicator for indirectly studying planetary compositions. While the
connection between a star's iron (Fe) abundance and the presence of
giant exoplanets is established, the relationship with small planets
remains unclear. The elements Mg, Si, and Fe are important in forming
small planets. Employing machine learning algorithms like XGBoost,
trained on the abundances (e.g., the Hypatia Catalog) of known
exoplanet-hosting stars (NASA Exoplanet Archive), allows us to
determine significant "features" (abundances or molar ratios) that may
indicate the presence of small planets. We test on three groups of
exoplanets: (1) all small, RP<3.5R⊕;
(2) sub-Neptunes, 2.0R⊕<RP<3.5R⊕; and
(3) super-Earths, 1.0R⊕<RP<2.0R⊕ - each subdivided
into seven ensembles to test different combinations of features. We
created a list of stars with ≥90% probability of hosting small
planets across all ensembles and experiments ("overlap stars"). We
found abundance trends for stars hosting small planets, possibly
indicating star-planet chemical interplay during formation. We also
found that Na and V are key features regardless of planetary radii. We
expect our results to underscore the importance of elements in
exoplanet formation and machine learning's role in target selection
for future NASA missions, e.g., the James Webb Space Telescope, the
Nancy Grace Roman Space Telescope, and the Habitable Worlds
Observatory, all of which are aimed at small-planet detection.
Description:
We obtain the stellar elemental abundance data for this study from the
Hypatia Catalog (Hinkel+2014, J/AJ/148/54). It contains the abundance
data for more than 100 elements and species for >11,000 FGKM-type
stars within 500pc of the Sun. Of these stars, >1400 are known
exoplanet hosts. We required a sufficiently large sample of stars
(>200) with recorded stellar abundances as per Hinkel+2019
(J/ApJ/880/49) to create training and prediction data sets. We
additionally required that at least 50% of each star's relevant
abundance values be measured (as opposed to missing or "null"
values). After removing those stars with <50% recorded abundances
(which included all M-stars), we obtained a data set with 10,178 stars.
We focus on the abundances for 16 elements: C, O, Na, Mg, Al, Si, Ca,
Sc, Ti, V, Cr, Mn, Fe, Co, Ni, and Y. The elements chosen for our
analysis fall within the lithophile or siderophile groups. Lithophiles
(Na, Mg, Al, Si, Ca, Sc, Ti, V, Mn, Y) are elements that bond with
oxygen and form oxidized minerals, while siderophiles (Cr, Co, Ni)
commonly alloy with Fe. The volatile elements (C and O) were also
included based on the potential for a planet to possess volatile
envelopes. See Section 2.1.
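The completeness cut described above can be sketched as follows. This is
a minimal illustration, not the authors' code: it assumes that the
"relevant" abundances are the 16 elements listed above, that a missing
value is represented as None, and the function names are hypothetical.

```python
from typing import Mapping, Optional

# The 16 elements named in the Description above.
ELEMENTS = ("C", "O", "Na", "Mg", "Al", "Si", "Ca", "Sc",
            "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Y")


def measured_fraction(abundances: Mapping[str, Optional[float]]) -> float:
    """Fraction of the 16 elements with a recorded (non-null) value."""
    measured = sum(1 for el in ELEMENTS if abundances.get(el) is not None)
    return measured / len(ELEMENTS)


def passes_completeness_cut(abundances: Mapping[str, Optional[float]],
                            threshold: float = 0.5) -> bool:
    """Keep a star only if at least `threshold` of its values are measured."""
    return measured_fraction(abundances) >= threshold
```

Under these assumptions, a star with 8 of the 16 abundances recorded is
kept, while a star with 7 recorded is removed.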
We obtain small-planet data from the NASA Exoplanet Archive, which
currently contains information for >5500 confirmed exoplanets. For our
analysis, we remove any planets that have radii RP>3.5R⊕.
Furthermore, if the host star has multiple known planetary companions,
we keep only the largest planet in the system (below the radius
cutoff). We then cross-match these small-planet hosts to the stars
available in the Hypatia Catalog to ensure that they have stellar
abundances. We separate our study into three experiments based on
planetary radii, as per Bergsten+ (2022AJ....164..190B), where a
minimum of 200 planets are necessary to successfully train the
algorithm (see Section 2.2).
Experiment 1: Small planets (479 planets with RP<3.5R⊕);
stellar host distribution: 79 F-types, 294 G-types, 106 K-types.
Experiment 2: Sub-Neptunes (219 planets with
2.0R⊕<RP<3.5R⊕); stellar host distribution: 28 F-types,
131 G-types, 60 K-types.
Experiment 3: Super-Earths (211 planets with
1.0R⊕<RP<2.0R⊕); stellar host distribution: 42 F-types,
135 G-types, 34 K-types.
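The planet selection and the radius bins above can be sketched as
follows. This is an illustration under stated assumptions rather than
the authors' pipeline: planets are given as hypothetical (host, letter,
radius) tuples with radii in Earth radii, the inequalities are taken
literally from the text, and the treatment of planets exactly on a bin
edge is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

R_MAX = 3.5  # radius cutoff in Earth radii, from the text


def largest_small_planet(
    planets: Iterable[Tuple[str, str, float]]
) -> Dict[str, Tuple[str, float]]:
    """For each host, keep the largest planet not exceeding the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for host, letter, radius in planets:
        if radius > R_MAX:
            continue  # planets above the cutoff are removed first
        if host not in best or radius > best[host][1]:
            best[host] = (letter, radius)
    return best


def experiments_for_radius(radius: float) -> List[int]:
    """Experiments (1, 2, 3) whose radius range contains this planet."""
    experiments = []
    if radius < 3.5:
        experiments.append(1)  # all small planets
    if 2.0 < radius < 3.5:
        experiments.append(2)  # sub-Neptunes
    if 1.0 < radius < 2.0:
        experiments.append(3)  # super-Earths
    return experiments
```

With these bins, a planet smaller than 1.0 Earth radius falls in
Experiment 1 only, which is consistent with Experiment 1 containing more
planets than Experiments 2 and 3 combined.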
We further separate each of our three experiments into seven ensembles,
each a different combination of features: either elemental abundances
expressed in dex notation or molar ratios (see Section 2.3). In order
for XGBoost to make predictions on a given star, the star must have
values recorded for all of the elements in the abundances or molar
ratios within the ensemble. We note that Experiment 3 is an exception,
where null abundances were included (see Section 3.3).
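The eligibility rule above (a star needs a recorded value for every
feature in the ensemble, except where nulls are allowed) can be sketched
as follows. The function names, the dictionary layout, and the feature
lists in the usage note are hypothetical; the actual ensemble
definitions are in table1.dat.

```python
import math
from typing import Mapping, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as a null (unrecorded) value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def can_predict(star: Mapping[str, Optional[float]],
                features: Sequence[str],
                allow_nulls: bool = False) -> bool:
    """True if the star qualifies for prediction in this ensemble.

    allow_nulls=True mimics the exception noted for Experiment 3,
    where stars with null abundances were retained.
    """
    if allow_nulls:
        return True
    return all(not is_missing(star.get(name)) for name in features)
```

For example, a star with Na and Fe recorded but V missing would qualify
for a hypothetical ensemble using only Na and Fe, and would not qualify
for one that also requires V unless allow_nulls is set.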
File Summary:
--------------------------------------------------------------------------------
 FileName    Lrecl  Records   Explanations
--------------------------------------------------------------------------------
 ReadMe         80        .   This file
 stars.dat      30    10178   List of stars used in this study
 table1.dat     59        7   List of each tested ensemble
 table2.dat    339    30534   *Data from this study
--------------------------------------------------------------------------------
Note on table2.dat: The abundances and molar ratios were pulled October 2023
from the Hypatia Catalog (Hinkel+2014, J/AJ/148/54,
http://www.hypatiacatalog.com/)
--------------------------------------------------------------------------------
See also:
I/239 : The Hipparcos and Tycho Catalogues (ESA 1997)
I/311 : Hipparcos, the New Reduction (van Leeuwen, 2007)
I/337 : Gaia DR1 (Gaia Collaboration, 2016)
I/345 : Gaia DR2 (Gaia Collaboration, 2018)
I/355 : Gaia DR3 Part 1. Main source (Gaia Collaboration, 2022)
II/246 : 2MASS All-Sky Catalog of Point Sources (Cutri+ 2003)
J/A+A/410/527 : Abundances in the Galactic disk (Bensby+, 2003)
J/ApJ/622/1102 : The planet-metallicity correlation. (Fischer+, 2005)
J/ApJ/715/1050 : Predicted abundances for extrasolar planets. I. (Bond+, 2010)
J/AJ/148/54 : The Hypatia Catalog (Hinkel+, 2014)
J/ApJ/807/45 : Potential. habitable planets orbiting M dwarfs (Dressing+, 2015)
J/AJ/154/109 : California-Kepler Surv. (CKS). III. Planet radii (Fulton+, 2017)
J/ApJ/880/49 : Predictions of giant exoplanet host star's (Hinkel+, 2019)
J/ApJ/875/29 : Spectroscopic analysis of the CKS sample. I. (Martinez+, 2019)
J/AJ/165/125 : Planetary Orbit Eccentricity Trends (POET). I. (An+, 2023)
J/ApJ/946/61 : Hydra sp. obs. of K2 planet-host stars (Loaiza-Tacuri+, 2023)
http://www.hypatiacatalog.com/ : Hypatia Catalog
http://exoplanetarchive.ipac.caltech.edu/ : NASA Exoplanet Archive
Byte-by-byte Description of file: stars.dat
--------------------------------------------------------------------------------
   Bytes Format Units Label   Explanations
--------------------------------------------------------------------------------
   1- 28 A28    ---   Star    Stellar ID based on associated catalog
      30 A1     ---   fPl     [*] * Indicates the star has an orbiting planet
--------------------------------------------------------------------------------
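A minimal sketch of reading one record of stars.dat from the byte ranges
above. Byte positions in this ReadMe are 1-indexed and inclusive, so
bytes a-b correspond to the Python slice [a-1:b]. The record type and
function name are hypothetical, and the star name used in the usage note
is a made-up example.

```python
from typing import NamedTuple


class StarRecord(NamedTuple):
    star: str          # bytes 1-28
    has_planet: bool   # byte 30 is "*" when the star hosts a planet


def parse_stars_line(line: str) -> StarRecord:
    """Parse one fixed-width line of stars.dat (Lrecl = 30)."""
    # Pad short lines so that a blank flag column can still be indexed.
    padded = line.rstrip("\n").ljust(30)
    star = padded[0:28].strip()
    has_planet = padded[29] == "*"
    return StarRecord(star, has_planet)
```

For instance, a line holding "HIP 12345" in bytes 1-28 and "*" in byte
30 would parse to StarRecord(star="HIP 12345", has_planet=True).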
Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
   Bytes Format Units Label    Explanations
--------------------------------------------------------------------------------
       1 I1     ---   Ens      [1/7] Ensemble number
   3- 44 A42    ---   Features Feature tested (1)
  46- 49 I4     ---   nS1      [2486/4767] Number of stars available for
                               prediction in Experiment 1 (2)
  51- 54 I4     ---   nS2      [2486/4767] Number of stars available for
                               prediction in Experiment 2 (2)
  56- 59 I4     ---   nS3      [9698] Number of stars available for
                               prediction in Experiment 3 (2)
--------------------------------------------------------------------------------
Note (1): We note that volatiles here include C and O;
lithophiles include Na, Mg, Al, Si, Ca, Sc, Ti, V, Mn, and Y; and
siderophiles are Cr, Co, and Ni. See Section 2 for more details.
Note (2): In order for XGBoost to make predictions on a given star, the star
must have values recorded for all of the elements in the abundances or molar
ratios within the ensemble. We note that Experiment 3 is an exception, where
null abundances were included. The experiments are as follows:
Experiment 1: Small planets (RP<3.5R⊕)
Experiment 2: Sub-Neptunes (2.0R⊕<RP<3.5R⊕)
Experiment 3: Super-Earths (1.0R⊕<RP<2.0R⊕)
--------------------------------------------------------------------------------
Byte-by-byte Description of file: table2.dat
--------------------------------------------------------------------------------
   Bytes Format Units Label   Explanations
--------------------------------------------------------------------------------
   1- 28 A28    ---   Star    Stellar ID based on associated catalog
      30 I1     ---   Exp     [1/3] Experiment number for the star (1)
  32- 34 A3     ---   fOvlap  Is the star an "overlap star"? (2)
  36- 40 F5.2   [-]   [Fe/H]  [-5.29/1.05] The [Fe/H] abundance in dex
  42- 46 F5.2   [-]   [C/H]   [-2.94/7.68]? The [C/H] abundance in dex
  48- 52 F5.2   [-]   [O/H]   [-2.57/1.72]? The [O/H] abundance in dex
  54- 58 F5.2   [-]   [Na/H]  [-3.14/1.19]? The [Na/H] abundance in dex
  60- 64 F5.2   [-]   [Mg/H]  [-2.83/0.83]? The [Mg/H] abundance in dex
  66- 70 F5.2   [-]   [Al/H]  [-4.03/1.12]? The [Al/H] abundance in dex
  72- 76 F5.2   [-]   [Si/H]  [-4.54/1.2]? The [Si/H] abundance in dex
  78- 82 F5.2   [-]   [Ca/H]  [-3.37/1.04]? The [Ca/H] abundance in dex
  84- 88 F5.2   [-]   [Sc/H]  [-2.88/2.0]? The [Sc/H] abundance in dex
  90- 94 F5.2   [-]   [Ti/H]  [-7.11/1.1]? The [Ti/H] abundance in dex
 96- 100 F5.2   [-]   [V/H]   [-3.36/1.51]? The [V/H] abundance in dex
102- 106 F5.2   [-]   [Cr/H]  [-5.97/1.5]? The [Cr/H] abundance in dex
108- 112 F5.2   [-]   [Mn/H]  [-3.56/0.88]? The [Mn/H] abundance in dex
114- 118 F5.2   [-]   [Co/H]  [-3.11/1.57]? The [Co/H] abundance in dex
120- 124 F5.2   [-]   [Ni/H]  [-3.12/0.79]? The [Ni/H] abundance in dex
126- 130 F5.2   [-]   [Y/H]   [-3.32/2.33]? The [Y/H] abundance in dex
132- 139 F8.4   ---   C/Mg    [0.13/446.69]? Molar ratio of C/Mg
141- 150 F10.4  ---   O/Mg    [0.72/45709]? Molar ratio of O/Mg
152- 158 F7.4   ---   Si/Mg   [4e-3/26.31]? Molar ratio of Si/Mg
160- 165 F6.4   ---   Ca/Mg   [1e-3/0.7943]? Molar ratio of Ca/Mg
167- 172 F6.4   ---   Ti/Mg   [0/0.31]? Molar ratio of Ti/Mg
174- 179 F6.4   ---   Fe/Mg   [0/4.58]? Molar ratio of Fe/Mg
181- 188 F8.4   ---   C/Si    [0.01/691.84]? Molar ratio of C/Si
190- 199 F10.4  ---   O/Si    [0.58/37153.53]? Molar ratio of O/Si
201- 208 F8.4   ---   Mg/Si   [0.03/229.09]? Molar ratio of Mg/Si
210- 218 F9.4   ---   Ca/Si   [2e-4/1905.47]? Molar ratio of Ca/Si
220- 225 F6.4   ---   Ti/Si   [0/0.05]? Molar ratio of Ti/Si
227- 234 F8.4   ---   Fe/Si   [0/251.19]? Molar ratio of Fe/Si
236- 241 F6.4   ---   C/O     [3e-4/6.46]? Molar ratio of C/O
243- 248 F6.4   ---   Si/O    [0/1.7]? Molar ratio of Si/O
250- 255 F6.4   ---   Mg/O    [0/1.39]? Molar ratio of Mg/O
257- 262 F6.4   ---   Ca/O    [0/0.07]? Molar ratio of Ca/O
264- 269 F6.4   ---   Ti/O    [0/0.002]? Molar ratio of Ti/O
271- 276 F6.4   ---   Fe/O    [0/0.42]? Molar ratio of Fe/O
     278 I1     ---   fPl     [0/1] Indicates whether star has an orbiting
                              planet (3)
     280 A1     ---   Pl      Planet ID, largest planet in system (4)
     282 I1     ---   nPl     [1/5]? Number of planets in system (4)
284- 288 F5.3   Rgeo  RadPl   [0.58/3.5]? Planet radius (4)
     290 I1     ---   Disk    [1/2]? Location of star in the Galactic disk (5)
292- 297 F6.4   ---   Prob1   [3e-4/1]? Ensemble 1 probability of hosting a
                              small planet (6)
299- 304 F6.4   ---   Prob2   [3e-4/1]? Ensemble 2 probability of hosting a
                              small planet (6)
306- 311 F6.4   ---   Prob3   [3e-4/1]? Ensemble 3 probability of hosting a
                              small planet (6)
313- 318 F6.4   ---   Prob4   [3e-4/1]? Ensemble 4 probability of hosting a
                              small planet (6)
320- 325 F6.4   ---   Prob5   [3e-4/1]? Ensemble 5 probability of hosting a
                              small planet (6)
327- 332 F6.4   ---   Prob6   [3e-4/1]? Ensemble 6 probability of hosting a
                              small planet (6)
334- 339 F6.4   ---   Prob7   [3e-4/1]? Ensemble 7 probability of hosting a
                              small planet (6)
--------------------------------------------------------------------------------
Note (1): Stars used for training and prediction in each experiment (1, 2,
or 3) are indicated by the Experiment value (see Section 2.2).
Experiment number as follows:
1 = Small Planet (RP<3.5R⊕); 10178 occurrences
2 = Sub-Neptune (2.0R⊕<RP<3.5R⊕); 10178 occurrences
3 = Super-Earth (1.0R⊕<RP<2.0R⊕); 10178 occurrences
Note (2): Overlap flag as follows:
Yes = "Overlap star", i.e. star with ≥90% probability of hosting a small
planet across all ensembles in this experiment. See Section 4.3.
(432 occurrences)
No = Not an overlap star (30102 occurrences)
Note (3): Flag as follows:
1 = Star has an orbiting planet (1440 occurrences)
0 = Star does not have an orbiting planet (29094 occurrences)
Note (4): Value provided per the NASA Exoplanet Archive
(http://exoplanetarchive.ipac.caltech.edu/)
Note (5): Location in the Galactic disk as follows:
1 = thin disk (18453 occurrences)
2 = thick disk (1905 occurrences)
0 = N/A (10176 occurrences)
Note (6): In order for XGBoost to make predictions on a given star, the star
must have values recorded for all of the elements in the abundances
or molar ratios within the ensemble. We note that Experiment 3 is an
exception, where null abundances were included.
See Table 1 for the list of tested ensembles (J/AJ/169/216/table1).
--------------------------------------------------------------------------------
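A minimal sketch of reading selected columns of table2.dat and checking
the overlap criterion of Note (2) against the seven ensemble
probabilities. Byte ranges are 1-indexed and inclusive, so bytes a-b map
to the Python slice [a-1:b]; Prob1 to Prob7 start at bytes 292, 299, ...,
334 and are 6 bytes wide. The function names and dictionary keys are
hypothetical, and treating a blank probability as disqualifying is an
assumption, so this is a consistency check on fOvlap rather than the
authors' procedure.

```python
from typing import Dict, List, Optional


def parse_float(field: str) -> Optional[float]:
    """Blank fixed-width fields are nulls (marked '?' in the table)."""
    text = field.strip()
    return float(text) if text else None


def parse_table2_line(line: str) -> Dict[str, object]:
    """Parse a few columns of one table2.dat record (Lrecl = 339)."""
    padded = line.rstrip("\n").ljust(339)
    return {
        "star": padded[0:28].strip(),                   # bytes 1-28
        "exp": int(padded[29]),                         # byte 30
        "overlap_flag": padded[31:34].strip() == "Yes",  # bytes 32-34
        "fe_h": parse_float(padded[35:40]),             # bytes 36-40
        # Prob1..Prob7: bytes 292-297, 299-304, ..., 334-339
        "probs": [parse_float(padded[291 + 7 * i:297 + 7 * i])
                  for i in range(7)],
    }


def is_overlap(probs: List[Optional[float]], threshold: float = 0.90) -> bool:
    """True if every ensemble probability is recorded and >= threshold."""
    return all(p is not None and p >= threshold for p in probs)
```

Comparing is_overlap(record["probs"]) with record["overlap_flag"] for
each row is one way to verify that a file was read with the correct
column offsets.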
History:
From electronic version of the journal
(End) Prepared by [AAS], Robin Leichtnam [CDS] 14-Jan-2026