J/MNRAS/427/2917    Classification of Hipparcos variables    (Rimoldini+, 2012)
Automated classification of Hipparcos unsolved variables.
    Rimoldini L., Dubath P., Suveges M., Lopez M., Sarro L.M., Blomme J.,
    De Ridder J., Cuypers J., Guy L., Mowlavi N., Lecoeur-Taibi I., Beck M.,
    Jan A., Nienartowicz K., Ordonez-Blanco D., Lebzelter T., Eyer L.
   <Mon. Not. R. Astron. Soc. 427, 2917 (2012)>
   =2012MNRAS.427.2917R
ADC_Keywords: Models ; Stars, variable ; Photometry, classification
Keywords: methods: data analysis - catalogues - stars: variables: general
Description:
    The Hipparcos catalogue (ESA 1997, Cat. I/239) and the AAVSO Variable
    Star Index (Watson et al., 2011, Cat. B/vsx) are employed to
    complement the training set of periodic variables of Dubath et al.
    (2011, Cat. J/MNRAS/414/2602) with irregular and non-periodic
    representatives, leading to 3881 sources in total which describe 24
    variability types. The attributes employed to characterize light-curve
    features are selected according to their relevance for classification.
    Classifier models are produced with random forests and a multi-stage
    methodology based on Bayesian networks, achieving overall
    misclassification rates under 12%. Both classifiers are applied to
    predict variability types for 6051 Hipparcos variables associated
    with uncertain or missing types in the literature.
File Summary:
--------------------------------------------------------------------------------
 FileName   Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe         80        .   This file
table2.dat    111     3881   Training set of Hipparcos variable stars
table4.dat    101     6051   Prediction set of Hipparcos unsolved variables
table5.dat     68     6051   Predictions of variability types
tablec1.dat   176     6051   Full random forest prediction probability arrays
tablec2.dat   176     6051   Full multi-stage Bayesian nets prediction
                             probability arrays
ori.tar       512     7080   Original files
--------------------------------------------------------------------------------
See also:
          I/239 : The Hipparcos and Tycho Catalogues (ESA 1997)
          I/311 : Hipparcos, the New Reduction (van Leeuwen, 2007)
          B/vsx : AAVSO International Variable Star Index VSX (Watson+, 2006-12)
 J/MNRAS/414/2602 : HIP variable automated classification (Dubath+, 2011)
Byte-by-byte Description of file: table[24].dat
--------------------------------------------------------------------------------
  Bytes Format Units  Label     Explanations
--------------------------------------------------------------------------------
   1-  6  I6    ---    HIP       [1/120404] Hipparcos number
   8- 12  F5.2  mag    V-I       Reddened V-I colour index in Cousins' system,
                                  as provided by ESA (1997) (1)
  14- 19  F6.2  ---    Skew      Unbiased skewness of the distribution of HIP
                                  magnitudes (Skewness) (2)
  21- 25  F5.2  [mag]  logAmp    Decadic logarithm of the difference between
                                  the faintest and the brightest values of the
                                  light-curve model (LogAmplitude) (3)
  27- 33  F7.4  [d]    logPer    Decadic logarithm of the period (LogPeriod) (4)
  35- 40  F6.2  mag    MAG       Absolute magnitude in the Hipparcos band
                                  (AbsoluteMag) (5)
  42- 48  F7.2  [-]    logFAP    [,0] Decadic logarithm of the probability that
                                  the maximum peak in the Lomb-Scargle
                                  periodogram (Scargle 1982ApJ...263..835S) is
                                  due to noise rather than the true signal (6)
  50- 54  F5.2  [-]    logP2P    Decadic logarithm of the point-to-point
                                  scatter of the time series
                                  (LogP2PscatterFoldedRaw) (7)
  56- 60  F5.2  [-]    logQSOvar Decadic logarithm of the reduced chi-square of
                                  the source variability with respect to a
                                  parametrized QSO variance model (8)
  62- 66  F5.2  [-]    logScRaw  Decadic logarithm of the ratio between the
                                  median of absolute deviations from the
                                  median of the raw time series and the median
                                  of absolute values of the residual time
                                  series (logScatterRawRes) (9)
  68- 72  F5.2  [mas]  logPlx    Decadic logarithm of the parallax value from
                                  van Leeuwen (2007, I/311) (LogParallax) (10)
  74- 78  F5.2  [mag]  logSt     Decadic logarithm of the unbiased standard
                                  deviation of the residual time series
                                  (logStdDevRes) (11)
  80- 84  F5.2  [mag]  logSVar   Decadic logarithm of the average of absolute
                                  values of magnitude differences between all
                                  pairs of measurements separated by
                                  time-scales from 0.01 to 0.1 day
                                  (logShortVar) (12)
  86- 89  F4.2  ---    Sum       Ratio between the sum of squared residuals of
                                  the model from the raw data and the sum of
                                  squared deviations of the raw time series
                                  from its mean value (SumSqResRaw) (13)
  91- 95  F5.2  deg    |b|       Absolute value of the Galactic latitude of
                                  the source position (AbsGLAT) (14)
  97-101  F5.2 10-4/d  eFreq     Error estimate of the derived frequency
                                  (FrequencyError) (15)
 103-111  A9    ---    Type      Variability type, only in table2 (16)
--------------------------------------------------------------------------------
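The byte ranges above can be turned into a small reader. The following is a
minimal Python sketch, not part of the original catalogue: the names COLUMNS,
parse_record and read_table are hypothetical, and blank fields are assumed to
mean missing values. The Type field is absent in table4.dat (its records are
101 bytes long), so it is returned as None for that file.

```python
# Byte ranges (1-indexed, inclusive) copied from the byte-by-byte table.
# COLUMNS, parse_record and read_table are illustrative names only.
COLUMNS = [
    ("HIP", 1, 6, int),
    ("V-I", 8, 12, float),
    ("Skew", 14, 19, float),
    ("logAmp", 21, 25, float),
    ("logPer", 27, 33, float),
    ("MAG", 35, 40, float),
    ("logFAP", 42, 48, float),
    ("logP2P", 50, 54, float),
    ("logQSOvar", 56, 60, float),
    ("logScRaw", 62, 66, float),
    ("logPlx", 68, 72, float),
    ("logSt", 74, 78, float),
    ("logSVar", 80, 84, float),
    ("Sum", 86, 89, float),
    ("|b|", 91, 95, float),
    ("eFreq", 97, 101, float),
    ("Type", 103, 111, str),  # only present in table2.dat
]


def parse_record(line):
    """Return a dict for one fixed-width record; blank fields become None."""
    record = {}
    for label, start, end, convert in COLUMNS:
        field = line[start - 1:end].strip()
        record[label] = convert(field) if field else None
    return record


def read_table(path):
    """Parse every non-empty line of table2.dat or table4.dat."""
    with open(path, encoding="ascii") as handle:
        return [parse_record(row.rstrip("\n")) for row in handle if row.strip()]
```

For example, read_table("table2.dat") would return one dictionary per
training-set source, keyed by the labels of the table above.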
Note (1): The reddened V-I colour index in Cousins' system, as provided by
  ESA (1997, I/239).
Note (2): The unbiased skewness of the distribution of Hipparcos magnitudes,
  weighted by the inverse of squared measurement uncertainties.
Note (3): The decadic logarithm of the difference between the faintest and
  the brightest values of the light-curve model.
Note (4): The decadic logarithm of the period computed with the generalized
  Lomb-Scargle method (Zechmeister & Kurster 2009A&A...496..577Z) for sources
  with weighted skewness of the magnitude distribution smaller than 1.6.
  Periods of sources with skewness greater than 1.6 are computed with the
  classical (unweighted) Lomb-Scargle method (Lomb 1976Ap&SS..39..447L;
  Scargle 1982ApJ...263..835S). Limitations regarding the recovered periods
  are described in Sec. 4.2 of the paper.
Note (5): The absolute magnitude in the Hipparcos band employing the parallax
  described in logPlx and neglecting interstellar absorption.
Note (6): The decadic logarithm of the probability that the maximum peak in the
  Lomb-Scargle periodogram (Scargle 1982ApJ...263..835S) is due to noise
  rather than the true signal, employing the beta distribution as indicated
  by Schwarzenberg-Czerny (1998MNRAS.301..831S). The computation assumed a
  number of independent frequencies equal to the number of frequencies
  tested divided by an oversampling factor (estimated by the largest value
  between one and the inverse of the product of the frequency spacing
  employed and the time-series duration).
Note (7): The decadic logarithm of the point-to-point scatter of the time series
  folded with twice the recovered period (measured by the sum of squared
  magnitude differences between successive measurements in phase) divided by the
  same quantity computed on the raw time series (i.e., with respect to
  successive measurements in time).
Note (8): The decadic logarithm of the reduced chi-square of the source
  variability with respect to a parametrized quasar variance model, denoted by
  χ²_QSO/ν in Butler & Bloom (2011AJ....141...93B). Following Richards
  et al. (2011ApJ...733...10R), the parameter values employed for the Hipparcos
  data correspond to the SDSS g-band at fixed magnitude of 19.
Note (9): The decadic logarithm of the ratio between the median of absolute
  deviations from the median of the raw time series and the median of absolute
  values of the residual time series (obtained by subtracting model values from
  the raw time series).
Note (10): The decadic logarithm of the parallax value as from the new reduction
  of the Hipparcos raw data (van Leeuwen, 2007, I/311). Non-positive values of
  parallax are replaced by positive values randomly extracted from a Gaussian
  distribution with zero mean and standard deviation equal to the measurement
  uncertainty.
Note (11): The decadic logarithm of the unbiased standard deviation of the
  residual time series, weighted by the inverse of squared measurement
  uncertainties.
Note (12): The decadic logarithm of the average of absolute values of magnitude
  differences between all pairs of measurements separated by time-scales
  from 0.01 to 0.1 day.
Note (13): The ratio between the sum of squared residuals of the model from the
  raw data and the sum of squared deviations of the raw time series from its
  mean value.
Note (14): The absolute value of the Galactic latitude of the source position.
Note (15): The error estimate of the derived frequency (multiplied by 10000),
  under the assumption of equidistant observations of a sinusoidal signal
  (Kovacs 1981Ap&SS..78..175K; Baliunas et al. 1985ApJ...294..310B;
  Gilliland & Fisher 1985PASP...97..285G).
Note (16): Variability types mostly from the AAVSO Variable Star Index
  (Watson et al., 2011, Cat. B/vsx; see also the "Note (G1)" below); other
  sources are detailed in the paper.
--------------------------------------------------------------------------------
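Some of the attributes defined in Notes (9), (12) and (13) can be illustrated
with a short Python sketch. It is only an illustration under stated
assumptions: all statistics are unweighted, the light-curve model values are
supplied by the caller, and the function names are hypothetical. It is not
the pipeline used in the paper.

```python
import math
from itertools import combinations
from statistics import mean, median


def log_scatter_raw_res(mags, model):
    """Note (9): log10 of MAD(raw) over the median of |raw - model|."""
    med = median(mags)
    mad_raw = median([abs(m - med) for m in mags])
    med_abs_res = median([abs(m - f) for m, f in zip(mags, model)])
    return math.log10(mad_raw / med_abs_res)


def log_short_var(times, mags, t_min=0.01, t_max=0.1):
    """Note (12): log10 of the mean |magnitude difference| over all pairs
    of measurements separated by t_min to t_max days (None if no pair)."""
    diffs = [
        abs(m1 - m2)
        for (t1, m1), (t2, m2) in combinations(zip(times, mags), 2)
        if t_min <= abs(t1 - t2) <= t_max
    ]
    return math.log10(mean(diffs)) if diffs else None


def sum_sq_res_raw(mags, model):
    """Note (13): sum of squared residuals over sum of squared deviations
    of the raw time series from its (unweighted) mean."""
    avg = mean(mags)
    ss_res = sum((m - f) ** 2 for m, f in zip(mags, model))
    ss_raw = sum((m - avg) ** 2 for m in mags)
    return ss_res / ss_raw
```

A value of sum_sq_res_raw close to 0 means the model accounts for most of the
observed scatter, while a value close to 1 means it explains very little.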
Byte-by-byte Description of file: table5.dat
--------------------------------------------------------------------------------
  Bytes Format Units  Label     Explanations
--------------------------------------------------------------------------------
   1-  6  I6    ---    HIP      [1/120404] Hipparcos number
   8-  9  A2    ---    Set      Hipparcos sets from which the sources have
                                 been selected (HipparcosSet) (17)
  11- 15  A5    ---    HIPtype  Variability types as listed in HIP
                                 (HipparcosType) (18)
  17- 38  A22   ---    VXtype   Variability types as listed in AAVSO (19)
  40- 48  A9    ---    RFtype   Variability types predicted by random
                                 forests (PredictedTypeRF) (20)
  50- 58  A9    ---    MBtype   Variability types predicted by a multi-stage
                                 methodology based on Bayesian networks
                                 (PredictedTypeMB) (21)
  60- 63  F4.2  ---    prRF     [0/1] Probability of the variability type
                                 predicted by random forests (ProbabilityRF)
  65- 68  F4.2  ---    prMB     [0/1] Probability of the variability type
                                 predicted by a multi-stage methodology
                                 based on Bayesian networks (ProbabilityMB)
--------------------------------------------------------------------------------
Note (17): The Hipparcos sets from which the sources have been selected
  (U1, U2 [unsolved], and M [micro-variable]), see Sec. 2 of the paper.
Note (18): Variability types from literature as listed in the Hipparcos
  catalogue (ESA 1997, I/239), when available.
Note (19): Variability types from literature included in the AAVSO Variable
  Star Index, Version 2011-01-16 (Watson et al., 2011, B/vsx), when available.
Note (20): Variability types predicted by random forests (limited to single
  types only); see the types in the "Note (G1)" below.
Note (21): Variability types predicted by a multi-stage methodology based on
  Bayesian networks (limited to single types only); see the types in the
  "Note (G1)" section below.
--------------------------------------------------------------------------------
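As an illustration of how the two sets of predictions in table5.dat might be
compared, the Python sketch below parses one record and flags sources for
which both classifiers return the same type. The function names and the
min_prob threshold are hypothetical choices made for this example; they are
not taken from the paper.

```python
def parse_table5(line):
    """Parse one fixed-width record of table5.dat (byte ranges as above)."""

    def text(start, end):
        # Byte ranges are 1-indexed and inclusive.
        return line[start - 1:end].strip()

    return {
        "HIP": int(text(1, 6)),
        "Set": text(8, 9),
        "HIPtype": text(11, 15) or None,   # blank when not available
        "VXtype": text(17, 38) or None,    # blank when not available
        "RFtype": text(40, 48),
        "MBtype": text(50, 58),
        "prRF": float(text(60, 63)),
        "prMB": float(text(65, 68)),
    }


def classifiers_agree(record, min_prob=0.5):
    """True if both classifiers predict the same type, each with a
    probability of at least min_prob (an arbitrary illustrative cut)."""
    return (
        record["RFtype"] == record["MBtype"]
        and record["prRF"] >= min_prob
        and record["prMB"] >= min_prob
    )
```

Lowering or raising min_prob trades completeness against purity of the
selected sample; the appropriate value depends on the application.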
Byte-by-byte Description of file: tablec?.dat
--------------------------------------------------------------------------------
   Bytes Format Units Label    Explanations
--------------------------------------------------------------------------------
   1-  6  I6    ---   HIP      [1/120404] Hipparcos number
   8- 11  F4.2  ---   IX       Probability of the source to be of type I_X, as
                               predicted by random forests (ProbabilityI_X) (G1)
  13- 16  F4.2  ---   LPVP     Probability of the source to be of type LPV_P,
                               as predicted by random forests
                               (ProbabilityLPV_P) (G1)
  18- 21  F4.2  ---   LPVX     Probability of the source to be of type LPV_X,
                               as predicted by random forests
                               (ProbabilityLPV_X) (G1)
  23- 26  F4.2  ---   RS+BYP   Probability of the source to be of type RS+BY_P,
                               as predicted by random forests
                               (ProbabilityRS+BY_P) (G1)
  28- 31  F4.2  ---   RS+BYX   Probability of the source to be of type RS+BY_X,
                               as predicted by random forests
                               (ProbabilityRS+BY_X) (G1)
  33- 36  F4.2  ---   BE+GCASP Probability of the source to be of type BE+GCAS_P
                               as predicted by random forests
                               (ProbabilityBE+GCAS_P) (G1)
  38- 41  F4.2  ---   BE+GCASX Probability of the source to be of type BE+GCAS_X
                               as predicted by random forests
                               (ProbabilityBE+GCAS_X) (G1)
  43- 46  F4.2  ---   SPBP     Probability of the source to be of type SPB_P,
                               as predicted by random forests
                               (ProbabilitySPB_P) (G1)
  48- 51  F4.2  ---   ACVP     Probability of the source to be of type ACV_P,
                               as predicted by random forests
                               (ProbabilityACV_P) (G1)
  53- 56  F4.2  ---   ACVX     Probability of the source to be of type ACV_X,
                               as predicted by random forests
                               (ProbabilityACV_X) (G1)
  58- 61  F4.2  ---   EAP      Probability of the source to be of type EA_P,
                               as predicted by random forests
                               (ProbabilityEA_P) (G1)
  63- 66  F4.2  ---   EAX      Probability of the source to be of type EA_X,
                               as predicted by random forests
                               (ProbabilityEA_X) (G1)
  68- 71  F4.2  ---   EBP      Probability of the source to be of type EB_P,
                               as predicted by random forests
                               (ProbabilityEB_P) (G1)
  73- 76  F4.2  ---   EWP      Probability of the source to be of type EW_P,
                               as predicted by random forests
                               (ProbabilityEW_P) (G1)
  78- 81  F4.2  ---   ELLP     Probability of the source to be of type ELL_P,
                               as predicted by random forests
                               (ProbabilityELL_P) (G1)
  83- 86  F4.2  ---   ACYGP    Probability of the source to be of type ACYG_P,
                               as predicted by random forests
                               (ProbabilityACYG_P) (G1)
  88- 91  F4.2  ---   ACYGX    Probability of the source to be of type ACYG_X,
                               as predicted by random forests
                               (ProbabilityACYG_X) (G1)
  93- 96  F4.2  ---   BCEPP    Probability of the source to be of type BCEP_P,
                               as predicted by random forests
                               (ProbabilityBCEP_P) (G1)
  98-101  F4.2  ---   BCEPX    Probability of the source to be of type BCEP_X,
                               as predicted by random forests
                               (ProbabilityBCEP_X) (G1)
 103-106  F4.2  ---   DCEPSP   Probability of the source to be of type DCEPS_P,
                               as predicted by random forests
                               (ProbabilityDCEPS_P) (G1)
 108-111  F4.2  ---   DCEPP    Probability of the source to be of type DCEP_P,
                               as predicted by random forests
                               (ProbabilityDCEP_P) (G1)
 113-116  F4.2  ---   CEP(B)P  Probability of the source to be of type CEP(B)_P,
                               as predicted by random forests
                               (ProbabilityCEP(B)_P) (G1)
 118-121  F4.2  ---   RRABP    Probability of the source to be of type RRAB_P,
                               as predicted by random forests
                               (ProbabilityRRAB_P) (G1)
 123-126  F4.2  ---   RRCP     Probability of the source to be of type RRC_P,
                               as predicted by random forests
                               (ProbabilityRRC_P) (G1)
 128-131  F4.2  ---   GDORP    Probability of the source to be of type GDOR_P,
                               as predicted by random forests
                               (ProbabilityGDOR_P) (G1)
 133-136  F4.2  ---   GDORX    Probability of the source to be of type GDOR_X,
                               as predicted by random forests
                               (ProbabilityGDOR_X) (G1)
 138-141  F4.2  ---   DSCTP    Probability of the source to be of type DSCT_P,
                               as predicted by random forests
                               (ProbabilityDSCT_P) (G1)
 143-146  F4.2  ---   DSCTX    Probability of the source to be of type DSCT_X,
                               as predicted by random forests
                               (ProbabilityDSCT_X) (G1)
 148-151  F4.2  ---   DSCTCP   Probability of the source to be of type DSCTC_P,
                               as predicted by random forests
                               (ProbabilityDSCTC_P) (G1)
 153-156  F4.2  ---   DSCTCX   Probability of the source to be of type DSCTC_X,
                               as predicted by random forests
                               (ProbabilityDSCTC_X) (G1)
 158-161  F4.2  ---   CWAP     Probability of the source to be of type CWA_P,
                               as predicted by random forests
                               (ProbabilityCWA_P) (G1)
 163-166  F4.2  ---   CWBP     Probability of the source to be of type CWB_P,
                               as predicted by random forests
                               (ProbabilityCWB_P) (G1)
 168-171  F4.2  ---   SXARIP   Probability of the source to be of type SXARI_P,
                               as predicted by random forests
                               (ProbabilitySXARI_P) (G1)
 173-176  F4.2  ---   RVP      Probability of the source to be of type RV_P,
                               as predicted by random forests
                               (ProbabilityRV_P) (G1)
--------------------------------------------------------------------------------
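The 34 probability columns above are regularly spaced (4 bytes wide, starting
every 5 bytes from byte 8), so a record can be read with a simple loop. The
Python sketch below is a hedged example: the names TYPES, parse_probabilities
and best_type are hypothetical, and every field is assumed to be filled. The
same layout applies to tablec1.dat and to tablec2.dat, which according to the
File Summary holds the multi-stage Bayesian networks arrays.

```python
# Type names in column order, as given in the "Probability..." identifiers.
TYPES = [
    "I_X", "LPV_P", "LPV_X", "RS+BY_P", "RS+BY_X", "BE+GCAS_P", "BE+GCAS_X",
    "SPB_P", "ACV_P", "ACV_X", "EA_P", "EA_X", "EB_P", "EW_P", "ELL_P",
    "ACYG_P", "ACYG_X", "BCEP_P", "BCEP_X", "DCEPS_P", "DCEP_P", "CEP(B)_P",
    "RRAB_P", "RRC_P", "GDOR_P", "GDOR_X", "DSCT_P", "DSCT_X", "DSCTC_P",
    "DSCTC_X", "CWA_P", "CWB_P", "SXARI_P", "RV_P",
]


def parse_probabilities(line):
    """Return (HIP, {type: probability}) for one tablec?.dat record."""
    hip = int(line[0:6])
    probs = {}
    for i, name in enumerate(TYPES):
        start = 7 + 5 * i  # 0-indexed offset of byte 8 + 5*i
        probs[name] = float(line[start:start + 4])
    return hip, probs


def best_type(probs):
    """Return the most probable type and its probability."""
    name = max(probs, key=probs.get)
    return name, probs[name]
```

Keeping the full dictionary, rather than only the most probable type, makes
it possible to inspect the runner-up types of ambiguous sources.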
Global Notes:
Note (G1): the variability types include a suffix _P for periodic variables,
     or _X for unsolved variables. The classifications, defined in table 1
     of the paper, are:
    I       = Irregular
    LPV     = Long period variables
    RS+BY   = RS CVn- and BY Dra- type variables (rotating dKe or dMe stars)
    BE+GCAS = B-type emission line star and γ Cas variables
    SPB     = Slowly pulsating B-type stars
    ACV     = α2CVn-type variables (rotating Ap stars)
    EA      = eclipsing binaries of Algol type (detached)
    EB      = eclipsing binaries of β Lyr type (semi-detached)
    EW      = eclipsing binaries of W UMa type (contact)
    ELL     = ellipsoidal rotating variables
    ACYG    = α Cyg variables (pulsating early-type supergiants)
    BCEP    = β Cep variables (pulsating early-type)
    DCEP    = classical Cepheid (δ Cep type)
    DCEPS   = First overtone Cepheid
    CEP(B)  = Multimode Cepheid
    RRAB    = RR Lyr asymmetric light curve
    RRC     = RR Lyr with nearly symmetric light curve
    GDOR    = γ Dor type (early F-type pulsating star)
    DSCT    = δ Scuti variable (pulsating A0-F5 stars);
              includes SX Phe-type stars
    DSCTC   = low-amplitude δ Scuti variables
    CWA     = pulsating variables of W Vir type with period>8d
    CWB     = pulsating variables of W Vir type with period<8d (BL Her-type)
    SXARI   = SX Ari-type star (rotating variable of Bp type)
    RV      = RV Tau-type (radially pulsating F-G supergiants)
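The suffix convention of Note (G1) can be handled programmatically. The
following Python sketch uses a hypothetical helper, split_type, to separate a
label such as RS+BY_P into its base type and a periodic or unsolved flag;
labels without a recognized suffix are returned unchanged with a flag of
None.

```python
def split_type(label):
    """Split a Note (G1) label into (base type, 'periodic'/'unsolved'/None)."""
    base, sep, suffix = label.rpartition("_")
    if sep and suffix == "P":
        return base, "periodic"
    if sep and suffix == "X":
        return base, "unsolved"
    # No recognized suffix: keep the label as it is.
    return label, None
```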
History:
    * 29-Nov-2012: on-line version
    * 24-Jan-2013: tables 2, 4 and 5 corrected (from author)
    * 20-Aug-2013: label logScFol corrected into logScQSO (from author)
Acknowledgements:
    Lorenzo Rimoldini, lorenzo(at)rimoldini.info
(End) L. Rimoldini [Geneva Obs./ISDC, Switzerland], P. Vannier [CDS] 19-Mar-2012