J/MNRAS/414/2602 Automated classification of HIP variables (Dubath+, 2011)
Random forest automated supervised classification of Hipparcos periodic variable
stars.
Dubath P., Rimoldini L., Suveges M., Blomme J., Lopez M., Sarro L.M.,
De Ridder J., Cuypers J., Guy L., Lecoeur I., Nienartowicz K., Jan A.,
Beck M., Mowlavi N., De Cat P., Lebzelter T., Eyer L.
<Mon. Not. R. Astron. Soc., 414, 2602-2617 (2011)>
=2011MNRAS.414.2602D
ADC_Keywords: Models ; Stars, variable ; MK spectral classification
Keywords: methods: data analysis - methods: statistical -
techniques: photometric - catalogues - stars: variables: general
Abstract:
We present an evaluation of the performance of an automated
classification of the Hipparcos periodic variable stars into 26 types.
The sub-sample with the most reliable variability types available in
the literature is used to train supervised algorithms to characterize
the type dependencies on a number of attributes. The most useful
attributes evaluated with the random forest methodology include, in
decreasing order of importance, the period, the amplitude, the V-I
colour index, the absolute magnitude, the residual around the folded
light-curve model, the magnitude distribution skewness and the
amplitude of the second harmonic of the Fourier series model relative
to that of the fundamental frequency.
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
table3.dat 110 1661 The Hipparcos training set star list with
literature types and attribute values
table4.dat 110 882 Results obtained for the Hipparcos stars
excluded from the training set
--------------------------------------------------------------------------------
See also:
I/239 : The Hipparcos and Tycho Catalogues (ESA 1997)
Byte-by-byte Description of file: table[34].dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 6 I6 --- HIP HIP number
8- 15 A8 --- Type Type of variable
17- 23 A7 --- PType Predicted type (only in table4)
25- 31 F7.4 [d] logP Period extracted with the Lomb-Scargle method
[Log(Period)]
33- 37 F5.2 [---] logA Amplitude of the light-curve model
[Log(Amplitude)]
39- 43 F5.2 mag V-I Mean V-I colour index [V-I]
45- 50 F6.2 mag Mhip Hipparcos absolute magnitude derived from the
parallaxes neglecting interstellar
absorption [MHipparcos] (1)
52- 56 F5.2 --- res/raw Median Absolute Deviation (MAD) of the
residuals (obtained by subtracting model
values from the raw light curve) divided by
the MAD of the raw light-curve values
around the median [Scatter:res/raw]
58- 62 F5.2 --- Skew Unbiased weighted skewness of the magnitude
distribution [Skewness]
64- 68 F5.2 [-] log(1+A2/A1) Amplitude ratio between the second harmonic
and the fundamental (plus one, to avoid
negative values) [Log(1+A2/A1)]
70- 74 F5.2 --- P2p/2P Sum of the squares of the magnitude
differences between pairs of successive data
points in the light curve folded around twice
the period divided by the same quantity
derived from the raw light curve
[P2p scatter:2P/raw]
76- 80 F5.2 --- P2p Median of the absolute values of the
differences between successive magnitudes in
the raw light curve normalized by the MAD
around the median [P2p_scatter]
82- 86 F5.2 --- P90 The 90th percentile of the absolute residual
values around the 2P model divided by the
same quantity for the residuals around the P
model. The 2P model is a model recomputed
using twice the period value
[Percentile90:2P/P]
88- 92 F5.2 % Res Mean of the squared residuals around the
model [Residual_scatter]
94- 98 F5.2 rad Phase2 Phase of the second harmonic after setting the
phase of the fundamental to zero [Phase2] (2)
100-104 F5.2 --- P2P/P Median of the absolute values of the
differences between successive magnitudes in
the folded light curve normalized by the MAD
around the median of the raw light curve
[P2p_scatter:P/raw]
106-110 F5.2 --- Slope Sum of the squares of the slopes of lines
joining the data points before and after a
number of selected outliers towards faint
magnitude (e.g., data points during eclipses)
[P2p_slope] (3)
--------------------------------------------------------------------------------
Note (1): Because of measurement uncertainties, some stars have negative
parallax values. Each of these values is replaced by a positive value taken
randomly from a Gaussian distribution with zero mean and a standard deviation
equal to the measurement uncertainty. In many cases, the derived absolute
magnitudes represent lower limits as the parallax measurements are not
significant.
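The procedure in Note (1) can be sketched as follows. This is a minimal
illustration, not the authors' code. It assumes the standard distance-modulus
relation M = m + 5 log10(parallax in mas) - 10 with no extinction term, and it
assumes that "a positive value" is obtained by taking the absolute value of the
Gaussian draw, which the note does not specify. The function name and its
parameters are hypothetical.

```python
import math
import random


def absolute_magnitude(mag, parallax_mas, parallax_err_mas, rng=random):
    """Absolute magnitude from an apparent magnitude and a parallax in mas.

    Interstellar absorption is neglected, as stated for the Mhip column.
    """
    if parallax_mas <= 0.0:
        if parallax_err_mas <= 0.0:
            raise ValueError("a positive parallax uncertainty is required")
        # Assumption: positivity is enforced with |N(0, sigma)|; the note
        # only says the value is drawn from a zero-mean Gaussian.
        while parallax_mas <= 0.0:
            parallax_mas = abs(rng.gauss(0.0, parallax_err_mas))
    # Distance modulus with the parallax expressed in milliarcseconds.
    return mag + 5.0 * math.log10(parallax_mas) - 10.0
```

Passing a seeded generator such as random.Random(0) as rng makes the
replacement of non-positive parallaxes reproducible.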
Note (2): The phase of the fundamental is set to zero by an appropriate
transformation:
Phase2=arctan(sin(φ2-2φ1),cos(φ2-2φ1))
(Debosscher et al., 2007, Cat. J/A+A/475/1159).
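The transformation in Note (2) can be written as a short sketch. It assumes
that the two-argument arctan follows the usual atan2(y, x) convention and that
φ1 and φ2 are the phases of the fundamental and of the second harmonic in
radians. The function name is hypothetical.

```python
import math


def phase2(phi1, phi2):
    """Phase of the second harmonic once the fundamental phase is zeroed."""
    delta = phi2 - 2.0 * phi1
    # atan2 wraps the phase difference into the interval (-pi, pi].
    return math.atan2(math.sin(delta), math.cos(delta))
```

Under these assumptions the result does not depend on the time origin of the
light curve, because shifting the origin changes φ2 by twice the change in φ1.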
Note (3): This is set to zero if there are no such outliers in the light curve.
--------------------------------------------------------------------------------
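The byte ranges above map directly onto fixed-width slices. The following is a
minimal sketch of a reader in Python, assuming the data file has been
downloaded locally as table3.dat or table4.dat. The names COLUMNS, parse_line
and read_table are illustrative and not part of the catalogue. Blank fields
(for example PType in table3) are returned as None.

```python
# Column layout copied from the byte-by-byte description.
# Start and end bytes are 1-based and inclusive, as in the ReadMe.
COLUMNS = [
    ("HIP", 1, 6, int),
    ("Type", 8, 15, str),
    ("PType", 17, 23, str),
    ("logP", 25, 31, float),
    ("logA", 33, 37, float),
    ("V-I", 39, 43, float),
    ("Mhip", 45, 50, float),
    ("res/raw", 52, 56, float),
    ("Skew", 58, 62, float),
    ("log(1+A2/A1)", 64, 68, float),
    ("P2p/2P", 70, 74, float),
    ("P2p", 76, 80, float),
    ("P90", 82, 86, float),
    ("Res", 88, 92, float),
    ("Phase2", 94, 98, float),
    ("P2P/P", 100, 104, float),
    ("Slope", 106, 110, float),
]


def parse_line(line):
    """Convert one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, start, end, convert in COLUMNS:
        # Convert the 1-based inclusive byte range to a Python slice.
        field = line[start - 1:end].strip()
        record[label] = convert(field) if field else None
    return record


def read_table(path):
    """Read every non-empty line of a table file into a list of dicts."""
    with open(path, encoding="ascii") as handle:
        return [parse_line(row.rstrip("\n")) for row in handle if row.strip()]
```

A call such as read_table("table4.dat") would then return one dict per star,
with logP and logA still in logarithmic form as stored in the file.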
History:
From electronic version of the journal
(End) Patricia Vannier [CDS] 17-Feb-2012