V/160 SHBoost 2024 (Khalatyan+, 2024)
Transferring spectroscopic stellar labels to 217 million Gaia DR3 XP stars
with SHBoost.
Khalatyan A., Anders F., Chiappini C., Queiroz A.B.A., Nepal S.,
dal Ponte M., Jordi C., Guiglion G., Valentini M., Torralba Elipe G.,
Steinmetz M., Pantaleoni-Gonzalez M., Malhotra S., Jimenez-Arranz O.,
Enke H., Casamiquela L., Ardevol J.
<Astron. Astrophys. 691, A98 (2024)>
=2024A&A...691A..98K (SIMBAD/NED BibCode)
=2024yCat.5160....0K
ADC_Keywords: Milky Way ; Stars, distances ; Effective temperatures ;
Radial velocities ; Space velocities ; Abundances, [Fe/H]
Keywords: catalogs - stars: general - stars: statistics - Galaxy: general -
Galaxy: stellar content - Galaxy: structure
Abstract:
With Gaia Data Release 3 (DR3), new and improved astrometric,
photometric, and spectroscopic measurements for 1.8 billion stars have
become available. Alongside this wealth of new data, however, there
are challenges in finding efficient and accurate computational methods
for their analysis. In this paper, we explore the feasibility of using
machine learning regression as a method of extracting basic stellar
parameters and line-of-sight extinctions from spectro-photometric
data. To this end, we built a stable gradient-boosted random-forest
regressor (xgboost), trained on spectroscopic data, capable of
producing output parameters with reliable uncertainties from Gaia DR3
data (most notably the low-resolution XP spectra), without
ground-based spectroscopic observations. Using Shapley additive
explanations, we interpret how the predictions for each star are
influenced by each data feature. For the training and testing of the
network, we used high-quality parameters obtained from the StarHorse
code for a sample of around eight million stars observed by major
spectroscopic stellar surveys, complemented by curated samples of hot
stars, very metal-poor stars, white dwarfs, and hot sub-dwarfs. The
training data cover the whole sky, all Galactic components, and almost
the full magnitude range of the Gaia DR3 XP sample of more than 217
million objects that also have reported parallaxes. We have achieved
median uncertainties of 0.20 mag in V-band extinction, 0.01 dex in
logarithmic effective temperature, 0.20 dex in surface gravity, 0.18 dex
in metallicity, and 12% in mass (over the full Gaia DR3 XP sample,
with considerable variations in precision as a function of magnitude
and stellar type). We succeeded in predicting competitive results
based on Gaia DR3 XP spectra compared to classical isochrone or
spectral-energy distribution fitting methods we employed in earlier
works, especially for parameters AV and Teff, along with the metallicity
values. Finally, we showcase some potential applications of this new
catalogue, including extinction maps, metallicity trends in the Milky
Way, and extended maps of young massive stars, metal-poor stars, and
metal-rich stars.
Description:
We use an xgboost regression to produce a catalogue of stellar
properties derived from Gaia DR3 XP spectra, astrometry, and
multi-wavelength photometry. This catalogue, referred to as SHBoost,
comprises the extinction, effective temperature, surface gravity,
[M/H], and mass estimates for more than 217 million stars.
See also:
I/352 : Distances to 1.47 billion stars in Gaia EDR3 (Bailer-Jones+, 2021)
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
shboost.sam 390 1000 Data model of the Gaia DR3 SHBoost catalogue
--------------------------------------------------------------------------------
Byte-by-byte Description of file: shboost.sam
--------------------------------------------------------------------------------
   Bytes Format Units   Label         Explanations
--------------------------------------------------------------------------------
   1- 19 I19    ---     GaiaDR3       Gaia DR3 source_id (source_id)
  21- 35 F15.11 deg     RAdeg         Right ascension (ICRS) at Ep=2016.0 (ra)
  37- 51 F15.11 deg     DEdeg         Declination (ICRS) at Ep=2016.0 (dec)
  53- 61 F9.6   mag     AV            Line-of-sight extinction at
                                      λ=5420Å, AV, xgboost point
                                      estimate (xgb_av)
  63- 71 F9.6   mag     AVmean        Line-of-sight extinction at
                                      λ=5420Å, AV,
                                      xgboost-distribution mean value
                                      (xgbdistavmean)
  73- 83 F11.6  mag     s_AVmean      Line-of-sight extinction at
                                      λ=5420Å, AV,
                                      xgboost-distribution standard deviation
                                      (xgbdistavstd)
  85- 92 F8.6   [K]     logTeff       Effective temperature, xgboost point
                                      estimate (xgb_logteff)
  94-101 F8.6   [K]     logTeffmean   Effective temperature,
                                      xgboost-distribution mean value
                                      (xgbdistlogteffmean)
 103-111 F9.6   [K]     s_logTeffmean Effective temperature,
                                      xgboost-distribution standard deviation
                                      (xgbdistlogteffstd)
 113-121 F9.6   [cm/s2] logg          Surface gravity, xgboost point estimate
                                      (xgb_logg)
 123-131 F9.6   [cm/s2] loggmean      Surface gravity, xgboost-distribution
                                      mean value (xgbdistloggmean)
 133-142 F10.6  [cm/s2] s_loggmean    Surface gravity, xgboost-distribution
                                      standard deviation (xgbdistloggstd)
 144-152 F9.6   [-]     Met           Metallicity, xgboost point estimate
                                      (xgb_met)
 154-162 F9.6   [-]     Metmean       Metallicity, xgboost-distribution
                                      mean value (xgbdistmetmean)
 164-172 F9.6   [-]     s_Metmean     Metallicity, xgboost-distribution
                                      standard deviation (xgbdistmetstd)
 174-182 F9.6   Msun    Mass          Stellar mass, xgboost point estimate
                                      (xgb_mass)
 184-192 F9.6   Msun    Massmean      Stellar mass, xgboost-distribution
                                      mean value (xgbdistmassmean)
 194-205 E12.6  Msun    s_Massmean    Stellar mass, xgboost-distribution
                                      standard deviation (xgbdistmassstd)
 207-216 F10.6  pc      Dist          Distance estimate from the literature
                                      (dist)
 218-227 F10.6  pc      b_Dist        ? 16th distance percentile from the
                                      literature (dist_lower)
 229-238 F10.6  pc      B_Dist        ? 84th distance percentile from the
                                      literature (dist_upper)
     240 I1     ---     f_Dist        [0/2] Distance flag (dist_flag) (1)
 242-249 F8.5   mag     (BP-RP)0      ? Dereddened colour, derived with
                                      gaiaedr3photutils (bprp0)
 251-259 F9.5   mag     GMAG0         Absolute magnitude, derived with
                                      gaiaedr3photutils (mg0)
 261-270 F10.5  kpc     Xgal          Galactocentric Cartesian X coordinate,
                                      derived from dist and assuming
                                      X0 = -8.2 kpc (xg)
 272-281 F10.5  kpc     Ygal          Galactocentric Cartesian Y coordinate,
                                      derived from dist and assuming
                                      Y0 = 0 kpc (yg)
 283-292 F10.5  kpc     Zgal          Cartesian Z coordinate, derived from dist
                                      and assuming Z0 = 0 (zg)
 294-302 F9.5   kpc     Rgal          Galactocentric planar distance,
                                      derived from Xgal and Ygal (rg)
 304-313 F10.4  km/s    VX            ? Galactic Cartesian velocity in
                                      X direction (vxg)
 315-324 F10.4  km/s    VY            ? Galactic Cartesian velocity in
                                      Y direction (vyg)
 326-336 F11.4  km/s    VZ            ? Galactic Cartesian velocity in
                                      Z direction (vzg)
 338-347 F10.4  km/s    VR            ? Galactic radial velocity (vrg)
 349-358 F10.4  km/s    Vphi          ? Galactic angular velocity (vphig)
 360-380 A21    ---     InputFlag     SHBoost input flag (xgb_inputflag)
     382 I1     ---     q_AV          [0/1] AV output quality flag
                                      (=0 if xgbdistavstd<0.3)
                                      (xgbavoutputflag)
     384 I1     ---     q_logTeff     [0/1] logTeff output quality flag
                                      (=0 if xgbdistlogteffstd<0.1)
                                      (xgblogteffoutputflag)
     386 I1     ---     q_logg        [0/1] logg output quality flag
                                      (=0 if xgbdistloggstd<0.3)
                                      (xgbloggoutputflag)
     388 I1     ---     q_Met         [0/1] [M/H] output quality flag
                                      (=0 if xgbdistmetstd<0.3)
                                      (xgbmetoutputflag)
     390 I1     ---     q_Mass        [0/1] Mass output quality flag (=0 if
                                      xgbdistmassstd/xgbdistmassmean<0.3)
                                      (xgbmassoutputflag)
--------------------------------------------------------------------------------
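The byte ranges above can be turned into a reader by slicing each record.
The following is a minimal sketch, not an official reader: the helper name
parse_record and the choice of columns are illustrative, and it assumes that
an all-blank field marks a missing value for the columns flagged with "?".
Byte positions in the table are 1-indexed and inclusive, so bytes a-b map to
the Python slice [a-1:b].

```python
# Byte ranges (1-indexed, inclusive) copied from the Byte-by-byte Description.
# Only a subset of columns is listed; extend the mapping as needed.
COLUMNS = {
    "GaiaDR3": (1, 19, int),
    "RAdeg": (21, 35, float),
    "DEdeg": (37, 51, float),
    "AV": (53, 61, float),
    "logTeff": (85, 92, float),
    "Dist": (207, 216, float),
    "b_Dist": (218, 227, float),
    "B_Dist": (229, 238, float),
    "f_Dist": (240, 240, int),
    "InputFlag": (360, 380, str),
}


def parse_record(line):
    """Parse one fixed-width record of shboost.sam into a dict.

    Hypothetical helper: blank fields are returned as None (assumption).
    """
    record = {}
    for label, (start, end, cast) in COLUMNS.items():
        # Convert the 1-indexed inclusive byte range to a Python slice.
        field = line[start - 1:end].strip()
        record[label] = cast(field) if field else None
    return record
```

Calling parse_record on each line of shboost.sam yields one dictionary per
star, keyed by the Label column of the table.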
Note (1): Distance flag as follows:
0 = StarHorse EDR3, Anders et al., 2022A&A...658A..91A, Cat. I/354
1 = Bailer-Jones et al., 2021AJ....161..147B, Cat. I/352, photogeo
2 = Bailer-Jones et al., 2021AJ....161..147B, Cat. I/352, geo
--------------------------------------------------------------------------------
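The distance flag of Note (1) and the output quality flags (q_AV to q_Mass)
follow simple rules that can be written down directly. The sketch below is
illustrative only: the names DIST_SOURCE and output_flags are hypothetical,
and it assumes that each quality flag is 1 whenever the stated "=0"
condition is not met, which the table implies through the [0/1] range but
does not state explicitly.

```python
# Meaning of f_Dist, copied from Note (1).
DIST_SOURCE = {
    0: "StarHorse EDR3 (Anders et al. 2022, Cat. I/354)",
    1: "Bailer-Jones et al. 2021 (Cat. I/352), photogeo",
    2: "Bailer-Jones et al. 2021 (Cat. I/352), geo",
}


def output_flags(s_av, s_logteff, s_logg, s_met, s_mass, mass_mean):
    """Recompute the output quality flags from the distribution widths.

    Arguments correspond to s_AVmean, s_logTeffmean, s_loggmean, s_Metmean,
    s_Massmean and Massmean. Thresholds are those quoted in the table;
    the value 1 for "condition not met" is an assumption.
    """
    return {
        "q_AV": 0 if s_av < 0.3 else 1,
        "q_logTeff": 0 if s_logteff < 0.1 else 1,
        "q_logg": 0 if s_logg < 0.3 else 1,
        "q_Met": 0 if s_met < 0.3 else 1,
        # The mass flag uses the relative, not the absolute, uncertainty.
        "q_Mass": 0 if s_mass / mass_mean < 0.3 else 1,
    }
```

A typical selection keeps only stars whose flag is 0 for the parameter of
interest, for example q_Met = 0 when studying metallicity trends.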
Acknowledgements:
Friedrich Anders, fanders(at)fqa.ub.edu
(End) Francois-Xavier Pineau, Patricia Vannier [CDS] 07-Oct-2024