J/MNRAS/414/2602    Automated classification of HIP variables    (Dubath+, 2011)
Random forest automated supervised classification of Hipparcos periodic variable
stars.
    Dubath P., Rimoldini L., Suveges M., Blomme J., Lopez M., Sarro L.M.,
    De Ridder J., Cuypers J., Guy L., Lecoeur I., Nienartowicz K., Jan A.,
    Beck M., Mowlavi N., De Cat P., Lebzelter T., Eyer L.
   <Mon. Not. R. Astron. Soc., 414, 2602-2617 (2011)>
   =2011MNRAS.414.2602D
ADC_Keywords: Models ; Stars, variable ; MK spectral classification
Keywords: methods: data analysis - methods: statistical -
          techniques: photometric - catalogues - stars: variables: general
Abstract:
    We present an evaluation of the performance of an automated
    classification of the Hipparcos periodic variable stars into 26 types.
    The sub-sample with the most reliable variability types available in
    the literature is used to train supervised algorithms to characterize
    the type dependencies on a number of attributes. The most useful
    attributes evaluated with the random forest methodology include, in
    decreasing order of importance, the period, the amplitude, the V-I
    colour index, the absolute magnitude, the residual around the folded
    light-curve model, the magnitude distribution skewness and the
    amplitude of the second harmonic of the Fourier series model relative
    to that of the fundamental frequency.
File Summary:
--------------------------------------------------------------------------------
 FileName   Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe         80        .   This file
table3.dat    110     1661   The Hipparcos training set star list with
                              literature types and attribute values
table4.dat    110      882   Results obtained for the Hipparcos stars
                              excluded from the training set
--------------------------------------------------------------------------------
See also:
   I/239 : The Hipparcos and Tycho Catalogues (ESA 1997)
Byte-by-byte Description of file: table[34].dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  6  I6    ---     HIP       HIP number
   8- 15  A8    ---     Type      Type of variable
  17- 23  A7    ---     PType     Predicted type (only in table4)
  25- 31  F7.4  [d]     logP      Period extracted with the Lomb-Scargle method
                                   [Log(Period)]
  33- 37  F5.2 [---]    logA      Amplitude of the light-curve model
                                   [Log(Amplitude)]
  39- 43  F5.2  mag     V-I       Mean V-I colour index [V-I]
  45- 50  F6.2  mag     Mhip      Hipparcos absolute magnitude derived from the
                                   parallaxes neglecting interstellar
                                   absorption [MHipparcos] (1)
  52- 56  F5.2  ---     res/raw   Median absolute value of the residuals
                                   (obtained by subtracting model values from
                                   the raw light curve) divided by the Median
                                   Absolute Deviation (MAD) of the raw
                                   light-curve values around the median
                                   [Scatter:res/raw]
  58- 62  F5.2  ---     Skew      Unbiased weighted skewness of the magnitude
                                   distribution [Skewness]
  64- 68  F5.2  [---]   log(1+A2/A1) Amplitude ratio between the second
                                   harmonic and the fundamental, plus one so
                                   that the logarithm is never negative
                                   [Log(1+A2/A1)]
  70- 74  F5.2  ---     P2p/2P    Sum of the squares of the magnitude
                                   differences between pairs of successive data
                                   points in the light curve folded with twice
                                   the period, divided by the same quantity
                                   derived from the raw light curve
                                   [P2p scatter:2P/raw]
  76- 80  F5.2  ---     P2p       Median of the absolute values of the
                                   differences between successive magnitudes in
                                   the raw light curve normalized by the MAD
                                   around the median [P2p_scatter]
  82- 86  F5.2  ---     P90       The 90th percentile of the absolute residual
                                   values around the 2P model divided by the
                                   same quantity for the residuals around the P
                                   model, where the 2P model is the light-curve
                                   model recomputed with twice the period
                                   [Percentile90:2P/P]
  88- 92  F5.2  %       Res       Mean of the squared residuals around the
                                   model [Residual_scatter]
  94- 98  F5.2  rad     Phase2    Phase of the second harmonic after setting the
                                   phase of the fundamental to zero [Phase2] (2)
 100-104  F5.2  ---     P2P/P     Median of the absolute values of the
                                   differences between successive magnitudes in
                                   the folded light curve normalized by the MAD
                                   around the median of the raw light curve
                                   [P2p_scatter:P/raw]
 106-110  F5.2  ---     Slope     Sum of the squares of the slopes of the lines
                                   joining the data points before and after a
                                   number of selected outliers towards faint
                                   magnitudes (e.g., data points during
                                   eclipses) [P2p_slope] (3)
--------------------------------------------------------------------------------
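A minimal Python sketch for reading one record of table3.dat or table4.dat,
assuming ASCII records laid out exactly as in the byte-by-byte description
above. FIELDS and parse_record are hypothetical helpers, not part of the
catalogue distribution; blank fields (such as PType in table3.dat) come back
as None, and logP/logA are kept in logarithmic form.

```python
# Hypothetical reader for table3.dat / table4.dat (not distributed with the
# catalogue). Byte ranges are 1-based and inclusive, as in the ReadMe.
FIELDS = [
    ("HIP", 1, 6, int),
    ("Type", 8, 15, str),
    ("PType", 17, 23, str),        # blank in table3.dat
    ("logP", 25, 31, float),
    ("logA", 33, 37, float),
    ("V-I", 39, 43, float),
    ("Mhip", 45, 50, float),
    ("res/raw", 52, 56, float),
    ("Skew", 58, 62, float),
    ("log(1+A2/A1)", 64, 68, float),
    ("P2p/2P", 70, 74, float),
    ("P2p", 76, 80, float),
    ("P90", 82, 86, float),
    ("Res", 88, 92, float),
    ("Phase2", 94, 98, float),
    ("P2P/P", 100, 104, float),
    ("Slope", 106, 110, float),
]


def parse_record(line):
    """Return a dict keyed by ReadMe label; blank fields become None."""
    # Drop the line ending and pad to the record length (Lrecl = 110)
    # so that every slice below is defined even for trimmed lines.
    line = line.rstrip("\r\n").ljust(110)
    record = {}
    for label, first, last, convert in FIELDS:
        raw = line[first - 1:last].strip()  # 1-based inclusive -> slice
        record[label] = convert(raw) if raw else None
    return record
```

For example, `with open("table3.dat") as f: rows = [parse_record(l) for l in f]`
would load the training set as a list of dictionaries.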
Note (1): Because of measurement uncertainties, some stars have negative
  parallax values. Each of these values is replaced by a positive value taken
  randomly from a Gaussian distribution with zero mean and a standard deviation
  equal to the measurement uncertainty. In many cases, the derived absolute
  magnitudes represent lower limits as the parallax measurements are not
  significant.
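The relation behind Mhip can be sketched as follows, assuming the apparent
Hipparcos magnitude Hp and the parallax in milliarcseconds are available
(neither is a column of table[34].dat). Taking the absolute value of a
zero-mean Gaussian draw is one assumed way of obtaining the positive random
parallax described in Note (1); the authors' exact procedure may differ, and
the function name is hypothetical.

```python
import math
import random


def absolute_magnitude(hp_mag, plx_mas, sigma_plx_mas, rng=random):
    """Absolute Hipparcos magnitude from apparent Hp and parallax (mas).

    Interstellar absorption is neglected, as stated for Mhip. A non-positive
    parallax is replaced by |x| with x ~ N(0, sigma_plx_mas), an assumed
    reading of Note (1) that yields a positive value from a zero-mean
    Gaussian whose standard deviation is the parallax uncertainty.
    """
    if plx_mas <= 0:
        plx_mas = abs(rng.gauss(0.0, sigma_plx_mas))
    # M = m + 5 log10(parallax in arcsec) + 5 = m + 5 log10(parallax in mas) - 10
    return hp_mag + 5.0 * math.log10(plx_mas) - 10.0
```

Passing a seeded generator, e.g. `rng=random.Random(1)`, makes the
replacement of negative parallaxes reproducible.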
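Note (2): Phase2 is brought into the interval (-π, π] with the two-argument
  arctangent, Phase2 = arctan2(sin(φ2-2φ1), cos(φ2-2φ1)), where φ1 and φ2 are
  the phases of the fundamental and of the second harmonic
  (Debosscher et al., 2007, Cat. J/A+A/475/1159).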
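A minimal sketch of the Phase2 transformation, assuming a Fourier model of the
form m(t) = c + A1 sin(2πft + φ1) + A2 sin(4πft + φ2); the function name is
hypothetical. Shifting the time origin adds some δ to φ1 and 2δ to φ2, so
φ2 - 2φ1 does not depend on the chosen epoch, which is what setting the phase
of the fundamental to zero achieves.

```python
import math


def phase2(phi1, phi2):
    """Second-harmonic phase relative to the fundamental, wrapped by atan2.

    Assumes sine-based harmonics whose phases shift as (d, 2d) when the
    time origin moves, so phi2 - 2*phi1 is epoch independent.
    """
    d = phi2 - 2.0 * phi1
    # atan2(sin d, cos d) maps any angle back into [-pi, pi]
    return math.atan2(math.sin(d), math.cos(d))
```

Using the two-argument arctangent avoids the quadrant ambiguity of a plain
arctan(sin/cos).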
Note (3): This is set to zero if there are no such outliers in the light curve.
--------------------------------------------------------------------------------
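The scatter-type attributes of the byte-by-byte description (res/raw, P2p,
P2P/P and P2p/2P) can be sketched as follows. This is a minimal sketch,
assuming magnitudes already sorted in time, an unscaled MAD, and a light-curve
model evaluated elsewhere at the observation times; all helper names are
hypothetical and are not taken from the authors' pipeline.

```python
import statistics


def mad(values):
    """Median absolute deviation around the median (unscaled, assumed)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def successive_diffs(values):
    """Differences between consecutive elements."""
    return [b - a for a, b in zip(values, values[1:])]


def fold_order(times, mags, period):
    """Magnitudes re-ordered by phase (t / period) mod 1, i.e. folded."""
    pairs = sorted(zip(times, mags), key=lambda tm: (tm[0] / period) % 1.0)
    return [m for _, m in pairs]


def res_raw(mags, model):
    """res/raw: median |mag - model| over the MAD of the raw magnitudes."""
    residuals = [abs(m - f) for m, f in zip(mags, model)]
    return statistics.median(residuals) / mad(mags)


def p2p_raw(mags):
    """P2p: median |successive difference| in time order over the raw MAD."""
    return statistics.median([abs(d) for d in successive_diffs(mags)]) / mad(mags)


def p2p_folded(times, mags, period):
    """P2P/P: as P2p, but with the curve folded at `period`."""
    folded = fold_order(times, mags, period)
    return statistics.median([abs(d) for d in successive_diffs(folded)]) / mad(mags)


def p2p_2p_over_raw(times, mags, period):
    """P2p/2P: sum of squared successive differences, folded at 2P vs raw."""
    folded = fold_order(times, mags, 2.0 * period)
    num = sum(d * d for d in successive_diffs(folded))
    den = sum(d * d for d in successive_diffs(mags))
    return num / den
```

Folding only changes the order of the points, so the folded statistics reuse
the raw MAD in the denominator, as the column descriptions specify.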
History:
    From electronic version of the journal
(End)                                      Patricia Vannier [CDS]    17-Feb-2012