J/MNRAS/427/2917 Classification of Hipparcos variables (Rimoldini+, 2012)
Automated classification of Hipparcos unsolved variables.
Rimoldini L., Dubath P., Suveges M., Lopez M., Sarro L.M., Blomme J.,
De Ridder J., Cuypers J., Guy L., Mowlavi N., Lecoeur-Taibi I., Beck M.,
Jan A., Nienartowicz K., Ordonez-Blanco D., Lebzelter T., Eyer L.
<Mon. Not. R. Astron. Soc. 427, 2917 (2012)>
=2012MNRAS.427.2917R
ADC_Keywords: Models ; Stars, variable ; Photometry, classification
Keywords: methods: data analysis - catalogues - stars: variables: general
Description:
The Hipparcos catalogue (ESA 1997, Cat. I/239) and the AAVSO Variable
Star Index (Watson et al., 2011, Cat. B/vsx) are employed to
complement the training set of periodic variables of Dubath et al.
(2011, Cat. J/MNRAS/414/2602) with irregular and non-periodic
representatives, leading to 3881 sources in total that represent 24
variability types. The attributes employed to characterize light-curve
features are selected according to their relevance for classification.
Classifier models are produced with random forests and a multi-stage
methodology based on Bayesian networks, achieving overall
misclassification rates under 12%. Both classifiers are applied to
predict variability types for 6051 Hipparcos variables associated
with uncertain or missing types in the literature.
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
table2.dat 111 3881 Training set of Hipparcos variable stars
table4.dat 101 6051 Prediction set of Hipparcos unsolved variables
table5.dat 68 6051 Predictions of variability types
tablec1.dat 176 6051 Full random forest prediction probability arrays
tablec2.dat 176 6051 Full multi-stage Bayesian nets prediction
probability arrays
ori.tar 512 7080 Original files
--------------------------------------------------------------------------------
See also:
I/239 : The Hipparcos and Tycho Catalogues (ESA 1997)
I/311 : Hipparcos, the New Reduction (van Leeuwen, 2007)
B/vsx : AAVSO International Variable Star Index VSX (Watson+, 2006-12)
J/MNRAS/414/2602 : HIP variable automated classification (Dubath+, 2011)
Byte-by-byte Description of file: table[24].dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 6 I6 --- HIP [1/120404] Hipparcos number
8- 12 F5.2 mag V-I Reddened V-I colour index in Cousins' system,
as provided by ESA (1997) (1)
14- 19 F6.2 --- Skew Unbiased skewness of the distribution of HIP
magnitudes (Skewness) (2)
21- 25 F5.2 [mag] logAmp Decadic logarithm of the difference between
the faintest and the brightest values of the
light-curve model (LogAmplitude) (3)
27- 33 F7.4 [d] logPer Decadic logarithm of the period (LogPeriod) (4)
35- 40 F6.2 mag MAG Absolute magnitude in the Hipparcos band
(AbsoluteMag) (5)
42- 48 F7.2 [-] logFAP [,0] Decadic logarithm of the probability that
the maximum peak in the Lomb-Scargle
periodogram (Scargle 1982ApJ...263..835S) is
due to noise rather than the true signal (6)
50- 54 F5.2 [-] logP2P Decadic logarithm of the point-to-point
scatter of the time series
(LogP2PscatterFoldedRaw) (7)
56- 60 F5.2 [-] logQSOvar Decadic logarithm of the reduced chi-square of
the source variability with respect to a
parametrized QSO variance model (8)
62- 66 F5.2 [-] logScRaw Decadic logarithm of the ratio between the
median of absolute deviations from the
median of the raw time series and the median
of absolute values of the residual time
series (logScatterRawRes) (9)
68- 72 F5.2 [mas] logPlx Decadic logarithm of the parallax value as
provided by van Leeuwen (2007) (LogParallax) (10)
74- 78 F5.2 [mag] logSt Decadic logarithm of the unbiased standard
deviation of the residual time series
(logStdDevRes) (11)
80- 84 F5.2 [mag] logSVar Decadic logarithm of the average of absolute
values of magnitude differences between all
pairs of measurements separated by
time-scales from 0.01 to 0.1 day
(logShortVar) (12)
86- 89 F4.2 --- Sum Ratio between the sum of squared residuals of
the model from the raw data and the sum of
squared deviations of the raw time series
from its mean value (SumSqResRaw) (13)
91- 95 F5.2 deg |b| Absolute value of the Galactic latitude of
the source position (AbsGLAT) (14)
97-101 F5.2 10-4/d eFreq Error estimate of the derived frequency
(FrequencyError) (15)
103-111 A9 --- Type Variability type, only in table2 (16)
--------------------------------------------------------------------------------
Note (1): The reddened V-I colour index in Cousins' system, as provided by
ESA (1997, I/239).
Note (2): The unbiased skewness of the distribution of Hipparcos magnitudes,
weighted by the inverse of squared measurement uncertainties.
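A minimal sketch of an inverse-variance weighted skewness, with weights w = 1/sigma^2 as in Note (2). Assumption: this is the plain weighted third standardized moment; the small-sample bias correction behind the "unbiased" Skew column is not specified here and is not reproduced. The function name is hypothetical.

```python
def weighted_skewness(mags, errors):
    """Weighted third standardized moment, weights w = 1 / error**2.

    Assumption: plain (biased) weighted estimator; the unbiased
    correction used for the Skew column is NOT applied here.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, mags)) / total
    m2 = sum(w * (m - mean) ** 2 for w, m in zip(weights, mags)) / total
    m3 = sum(w * (m - mean) ** 3 for w, m in zip(weights, mags)) / total
    return m3 / m2 ** 1.5
```

Since magnitudes grow towards fainter flux, a positive value indicates a tail of faint measurements.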
Note (3): The decadic logarithm of the difference between the faintest and
the brightest values of the light-curve model.
Note (4): The decadic logarithm of the period computed with the generalized
Lomb-Scargle method (Zechmeister & Kurster 2009A&A...496..577Z) for sources
with weighted skewness of the magnitude distribution smaller than 1.6.
Periods of sources with skewness greater than 1.6 are computed with the
classical (unweighted) Lomb-Scargle method (Lomb 1976Ap&SS..39..447L;
Scargle 1982ApJ...263..835S). Limitations regarding the recovered periods
are described in Sec. 4.2 of the paper.
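The period selection in Note (4) can be sketched as follows. This is an illustrative reimplementation, not the pipeline used for the catalogue: the weighted branch uses the generalized Lomb-Scargle sums of Zechmeister & Kurster (2009), and the classical branch drops the offset term and uses equal weights on mean-subtracted data, which reproduces the classical Lomb-Scargle fit up to normalization. The function names, the frequency grid and the treatment of skewness exactly equal to 1.6 are assumptions.

```python
import math

def lomb_scargle_power(t, y, freqs, errors=None, fit_offset=True):
    """Normalized power p (0 <= p <= 1) at each frequency (> 0).

    fit_offset=True with inverse-variance weights: generalized
    Lomb-Scargle (GLS) sums, floating mean included.
    fit_offset=False with errors=None: equal weights, mean-subtracted
    data and no offset term (classical Lomb-Scargle, up to normalization).
    """
    n = len(t)
    raw_w = [1.0 / e ** 2 for e in errors] if errors else [1.0] * n
    total = sum(raw_w)
    w = [x / total for x in raw_w]  # weights normalized to sum to 1
    if not fit_offset:
        mean = sum(wi * yi for wi, yi in zip(w, y))
        y = [yi - mean for yi in y]
    Y = sum(wi * yi for wi, yi in zip(w, y)) if fit_offset else 0.0
    YY = sum(wi * yi * yi for wi, yi in zip(w, y)) - Y * Y
    powers = []
    for f in freqs:
        omega = 2.0 * math.pi * f
        c = [math.cos(omega * ti) for ti in t]
        s = [math.sin(omega * ti) for ti in t]
        C = sum(wi * ci for wi, ci in zip(w, c)) if fit_offset else 0.0
        S = sum(wi * si for wi, si in zip(w, s)) if fit_offset else 0.0
        YC = sum(wi * yi * ci for wi, yi, ci in zip(w, y, c)) - Y * C
        YS = sum(wi * yi * si for wi, yi, si in zip(w, y, s)) - Y * S
        CC = sum(wi * ci * ci for wi, ci in zip(w, c)) - C * C
        SS = sum(wi * si * si for wi, si in zip(w, s)) - S * S
        CS = sum(wi * ci * si for wi, ci, si in zip(w, c, s)) - C * S
        D = CC * SS - CS * CS
        powers.append((SS * YC * YC + CC * YS * YS - 2.0 * CS * YC * YS) / (YY * D))
    return powers


def best_period(t, y, errors, freqs, skewness, threshold=1.6):
    """Hypothetical helper: GLS if skewness < threshold, otherwise the
    classical unweighted periodogram (the equality case, left open by
    the note, goes to the classical branch here)."""
    if skewness < threshold:
        power = lomb_scargle_power(t, y, freqs, errors=errors, fit_offset=True)
    else:
        power = lomb_scargle_power(t, y, freqs, errors=None, fit_offset=False)
    k = max(range(len(freqs)), key=power.__getitem__)
    return 1.0 / freqs[k]
```

The pure-Python loops keep the sketch dependency-free; for real light curves a vectorized implementation would be preferable.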
Note (5): The absolute magnitude in the Hipparcos band employing the parallax
described in logPlx and neglecting interstellar absorption.
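With the parallax in mas, the distance modulus gives M = m + 5 log10(plx/mas) - 10, with no extinction term as stated in Note (5). The apparent Hipparcos magnitude m is not a column of table2/table4, so it is an input the user must supply; the function name is hypothetical.

```python
def absolute_magnitude(apparent_hp, log_plx_mas):
    """M = m + 5 * log10(parallax / mas) - 10, interstellar absorption neglected."""
    return apparent_hp + 5.0 * log_plx_mas - 10.0
```

For example, a parallax of 100 mas (logPlx = 2, i.e. 10 pc) leaves the magnitude unchanged.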
Note (6): The decadic logarithm of the probability that the maximum peak in
the Lomb-Scargle periodogram (Scargle 1982ApJ...263..835S) is due to noise
rather than the true signal, employing the beta distribution as indicated
by Schwarzenberg-Czerny (1998MNRAS.301..831S). The computation assumed a
number of independent frequencies equal to the number of frequencies
tested divided by an oversampling factor (taken as the larger of one and
the inverse of the product of the frequency spacing employed and the
time-series duration).
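A sketch of how the number of independent frequencies enters the false-alarm probability. The oversampling factor follows Note (6) directly. Assumption: the single-frequency tail is taken as the beta-distribution survival function (1 - p)^((N-3)/2) for a normalized power p, the form given by Zechmeister & Kurster (2009) for the GLS; the exact parametrization used for logFAP may differ. Function names are hypothetical.

```python
import math

def oversampling_factor(freq_step, duration):
    """Larger of 1 and 1 / (frequency spacing * time-series duration)."""
    return max(1.0, 1.0 / (freq_step * duration))

def log10_fap(power, n_points, n_freqs_tested, freq_step, duration):
    """log10 false-alarm probability of the highest peak (0 <= power < 1).

    Assumption: single-frequency tail (1 - power)**((N - 3) / 2);
    FAP = 1 - (1 - tail)**M with M independent frequencies.
    """
    if power <= 0.0:
        return 0.0
    n_indep = n_freqs_tested / oversampling_factor(freq_step, duration)
    log_tail = 0.5 * (n_points - 3) * math.log1p(-power)  # natural log
    if log_tail < math.log(1e-8):
        # For a tiny tail, FAP ~ M * tail; stay in log space to avoid underflow
        return (math.log(n_indep) + log_tail) / math.log(10.0)
    tail = math.exp(log_tail)
    fap = -math.expm1(n_indep * math.log1p(-tail))
    return math.log10(fap)
```

The result is never positive, consistent with the [,0] limit of logFAP.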
Note (7): The decadic logarithm of the point-to-point scatter of the time series
folded with twice the recovered period (measured by the sum of squared
magnitude differences between successive measurements in phase) divided by the
same quantity computed on the raw time series (i.e., with respect to
successive measurements in time).
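The ratio in Note (7) can be sketched directly: order the measurements by phase (folding with twice the period) and by time, and compare the sums of squared successive differences. Values well below zero indicate that the folded light curve is much smoother than the time-ordered one. Function names are hypothetical.

```python
import math

def _sum_sq_successive(values):
    """Sum of squared differences between consecutive values."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))

def log_p2p_scatter_folded_raw(times, mags, period):
    """log10 of the point-to-point scatter in phase (folded with 2 * period)
    divided by the same quantity in time order."""
    idx = range(len(times))
    by_time = sorted(idx, key=lambda i: times[i])
    by_phase = sorted(idx, key=lambda i: times[i] % (2.0 * period))
    folded = _sum_sq_successive([mags[i] for i in by_phase])
    raw = _sum_sq_successive([mags[i] for i in by_time])
    return math.log10(folded / raw)
```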
Note (8): The decadic logarithm of the reduced chi-square of the source
variability with respect to a parametrized quasar variance model, denoted by
χ²_QSO/ν in Butler & Bloom (2011AJ....141...93B). Following Richards
et al. (2011ApJ...733...10R), the parameter values employed for the Hipparcos
data correspond to the SDSS g band at a fixed magnitude of 19.
Note (9): The decadic logarithm of the ratio between the median of absolute
deviations from the median of the raw time series and the median of absolute
values of the residual time series (obtained by subtracting model values from
the raw time series).
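A direct transcription of Note (9) using the standard library; the model values are whatever light-curve model was fitted and are supplied by the user. The function name is hypothetical.

```python
import math
import statistics

def log_scatter_raw_res(mags, model):
    """log10[ median|raw - median(raw)| / median|raw - model| ]."""
    med = statistics.median(mags)
    mad_raw = statistics.median(abs(m - med) for m in mags)
    residuals = [m - f for m, f in zip(mags, model)]
    med_abs_res = statistics.median(abs(r) for r in residuals)
    return math.log10(mad_raw / med_abs_res)
```

A model that explains most of the variability gives small residuals and therefore a large positive value.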
Note (10): The decadic logarithm of the parallax value as from the new reduction
of the Hipparcos raw data (van Leeuwen, 2007, I/311). Non-positive values of
parallax are replaced by positive values randomly extracted from a Gaussian
distribution with zero mean and standard deviation equal to the measurement
uncertainty.
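A sketch of the replacement rule in Note (10). Assumption: "positive values randomly extracted from a Gaussian" is read as redrawing from N(0, uncertainty) until a positive value is obtained (rejection sampling); the function name and the optional random generator argument are hypothetical.

```python
import math
import random

def log_parallax(plx_mas, plx_err_mas, rng=random):
    """log10 of the parallax; non-positive parallaxes are replaced by a
    positive draw from a zero-mean Gaussian with sigma = uncertainty
    (assumption: rejection sampling of positive draws)."""
    if plx_err_mas <= 0.0:
        raise ValueError("parallax uncertainty must be positive")
    value = plx_mas
    while value <= 0.0:
        value = rng.gauss(0.0, plx_err_mas)
    return math.log10(value)
```

Passing a seeded `random.Random` instance as `rng` makes the replacement reproducible.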
Note (11): The decadic logarithm of the unbiased standard deviation of the
residual time series, weighted by the inverse of squared measurement
uncertainties.
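A sketch of the weighted standard deviation of Note (11), with w = 1/sigma^2. Assumption: "unbiased" is interpreted as the usual reliability-weights correction, dividing by V1 - V2/V1 (V1 = sum of weights, V2 = sum of squared weights), which reduces to the n - 1 denominator for equal weights. The function name is hypothetical.

```python
import math

def log_std_dev_res(residuals, errors):
    """log10 of the weighted standard deviation of the residuals.

    Assumption: reliability-weights bias correction V1 - V2 / V1.
    """
    w = [1.0 / e ** 2 for e in errors]
    v1 = sum(w)
    v2 = sum(x * x for x in w)
    mean = sum(wi * r for wi, r in zip(w, residuals)) / v1
    ss = sum(wi * (r - mean) ** 2 for wi, r in zip(w, residuals))
    return math.log10(math.sqrt(ss / (v1 - v2 / v1)))
```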
Note (12): The decadic logarithm of the average of absolute values of magnitude
differences between all pairs of measurements separated by time-scales
from 0.01 to 0.1 day.
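The short-time-scale variability of Note (12) compares every pair of measurements, which is affordable for Hipparcos-sized time series. Assumptions: both bounds are inclusive and None is returned when no pair falls in the window; the function name is hypothetical.

```python
import math

def log_short_var(times, mags, dt_min=0.01, dt_max=0.1):
    """log10 of the mean |m_i - m_j| over pairs with dt_min <= |t_i - t_j| <= dt_max (days)."""
    n = len(times)
    diffs = [abs(mags[i] - mags[j])
             for i in range(n) for j in range(i + 1, n)
             if dt_min <= abs(times[i] - times[j]) <= dt_max]
    if not diffs:
        return None  # no pair of measurements on these time-scales
    return math.log10(sum(diffs) / len(diffs))
```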
Note (13): The ratio between the sum of squared residuals of the model from the
raw data and the sum of squared deviations of the raw time series from its
mean value.
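Note (13) is the ratio of the residual sum of squares to the total sum of squares; for a least-squares model that includes a constant term it equals 1 - R^2. A direct sketch (hypothetical function name):

```python
def sum_sq_res_raw(mags, model):
    """sum (m - model)**2 / sum (m - mean(m))**2."""
    mean = sum(mags) / len(mags)
    ss_res = sum((m - f) ** 2 for m, f in zip(mags, model))
    ss_tot = sum((m - mean) ** 2 for m in mags)
    return ss_res / ss_tot
```

A perfect model gives 0, while a constant model at the mean value gives 1.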
Note (14): The absolute value of the Galactic latitude of the source position.
Note (15): The error estimate of the derived frequency (multiplied by 10000),
under the assumption of equidistant observations of a sinusoidal signal
(Kovacs 1981Ap&SS..78..175K; Baliunas et al. 1985ApJ...294..310B;
Gilliland & Fisher 1985PASP...97..285G).
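Note (15) cites the references but not the formula. A commonly used estimate of this kind, assumed here rather than taken from the paper, is sigma_f = sqrt(6/N) * sigma / (pi * T * A), where N is the number of points, T the time-series duration, sigma the residual scatter and A the semi-amplitude of the sinusoid. The stored eFreq value is that error multiplied by 10000. Both function names are hypothetical.

```python
import math

def frequency_error(n_points, duration, residual_sigma, semi_amplitude):
    """Assumed Kovacs-type estimate for equidistant sampling of a sinusoid;
    in cycles per day when duration is in days."""
    return math.sqrt(6.0 / n_points) * residual_sigma / (math.pi * duration * semi_amplitude)

def e_freq_column(sigma_f):
    """Value as tabulated in eFreq: the error multiplied by 10000 (10-4/d)."""
    return 1.0e4 * sigma_f
```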
Note (16): Variability types mostly from the AAVSO Variable Star Index
(Watson et al., 2011, Cat. B/vsx; see also the "Note (G1)" below); other
sources are detailed in the paper.
--------------------------------------------------------------------------------
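A minimal Python sketch for reading table2.dat or table4.dat with the byte positions listed above (1-based, inclusive, converted to Python slices). The column list mirrors the byte-by-byte description; the function names and the local file path passed to read_table are assumptions. Records of table4.dat are 101 bytes long, so Type comes back as None there.

```python
# (label, first byte, last byte, converter) from the description of table[24].dat
COLUMNS = [
    ("HIP", 1, 6, int), ("V-I", 8, 12, float), ("Skew", 14, 19, float),
    ("logAmp", 21, 25, float), ("logPer", 27, 33, float), ("MAG", 35, 40, float),
    ("logFAP", 42, 48, float), ("logP2P", 50, 54, float),
    ("logQSOvar", 56, 60, float), ("logScRaw", 62, 66, float),
    ("logPlx", 68, 72, float), ("logSt", 74, 78, float),
    ("logSVar", 80, 84, float), ("Sum", 86, 89, float), ("|b|", 91, 95, float),
    ("eFreq", 97, 101, float), ("Type", 103, 111, str),
]

def parse_record(line):
    """Parse one fixed-width record; blank or missing fields become None."""
    record = {}
    for label, start, end, cast in COLUMNS:
        field = line[start - 1:end].strip()
        record[label] = cast(field) if field else None
    return record

def read_table(path):
    """Read every non-empty record of table2.dat or table4.dat."""
    with open(path, encoding="ascii") as handle:
        return [parse_record(line.rstrip("\n")) for line in handle if line.strip()]
```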
Byte-by-byte Description of file: table5.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 6 I6 --- HIP [1/120404] Hipparcos number
8- 9 A2 --- Set Hipparcos sets from which the sources have
been selected (HipparcosSet) (17)
11- 15 A5 --- HIPtype Variability types as listed in HIP
(HipparcosType) (18)
17- 38 A22 --- VXtype Variability types as listed in AAVSO (19)
40- 48 A9 --- RFtype Variability types predicted by random
forests (PredictedTypeRF) (20)
50- 58 A9 --- MBtype Variability types predicted by a multi-stage
methodology based on Bayesian networks
(PredictedTypeMB) (21)
60- 63 F4.2 --- prRF [0/1] Probability of the variability type
predicted by random forests (ProbabilityRF)
65- 68 F4.2 --- prMB [0/1] Probability of the variability type
predicted by a multi-stage methodology
based on Bayesian networks (ProbabilityMB)
--------------------------------------------------------------------------------
Note (17): The Hipparcos sets from which the sources have been selected
(U1, U2 [unsolved], and M [micro-variable]), see Sec. 2 of the paper.
Note (18): Variability types from literature as listed in the Hipparcos
catalogue (ESA 1997, I/239), when available.
Note (19): Variability types from literature included in the AAVSO Variable
Star Index, Version 2011-01-16 (Watson et al., 2011, B/vsx), when available.
Note (20): Variability types predicted by random forests (limited to single
types only); see the types in the "Note (G1)" below.
Note (21): Variability types predicted by a multi-stage methodology based on
Bayesian networks (limited to single types only); see the types in the
"Note (G1)" section below.
--------------------------------------------------------------------------------
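A sketch for reading table5.dat and selecting stars on which both classifiers agree. The byte ranges come from the description above; the function names and the min_prob threshold are hypothetical choices for illustration, not criteria from the paper.

```python
# (label, first byte, last byte) from the description of table5.dat
T5_COLUMNS = [("HIP", 1, 6), ("Set", 8, 9), ("HIPtype", 11, 15),
              ("VXtype", 17, 38), ("RFtype", 40, 48), ("MBtype", 50, 58),
              ("prRF", 60, 63), ("prMB", 65, 68)]

def parse_table5_line(line):
    """Parse one record; text fields are stripped, probabilities become floats."""
    rec = {label: line[a - 1:b].strip() for label, a, b in T5_COLUMNS}
    rec["HIP"] = int(rec["HIP"])
    for key in ("prRF", "prMB"):
        rec[key] = float(rec[key]) if rec[key] else None
    return rec

def agreeing_predictions(records, min_prob=0.5):
    """HIP numbers where RF and MB predict the same type, both with
    probability >= min_prob (hypothetical selection threshold)."""
    return [r["HIP"] for r in records
            if r["RFtype"] == r["MBtype"]
            and (r["prRF"] or 0.0) >= min_prob
            and (r["prMB"] or 0.0) >= min_prob]
```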
Byte-by-byte Description of file: tablec?.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 6 I6 --- HIP [1/120404] Hipparcos number
8- 11 F4.2 --- IX Probability that the source is of type I_X
(ProbabilityI_X) (G1)
13- 16 F4.2 --- LPVP Probability that the source is of type LPV_P
(ProbabilityLPV_P) (G1)
18- 21 F4.2 --- LPVX Probability that the source is of type LPV_X
(ProbabilityLPV_X) (G1)
23- 26 F4.2 --- RS+BYP Probability that the source is of type RS+BY_P
(ProbabilityRS+BY_P) (G1)
28- 31 F4.2 --- RS+BYX Probability that the source is of type RS+BY_X
(ProbabilityRS+BY_X) (G1)
33- 36 F4.2 --- BE+GCASP Probability that the source is of type BE+GCAS_P
(ProbabilityBE+GCAS_P) (G1)
38- 41 F4.2 --- BE+GCASX Probability that the source is of type BE+GCAS_X
(ProbabilityBE+GCAS_X) (G1)
43- 46 F4.2 --- SPBP Probability that the source is of type SPB_P
(ProbabilitySPB_P) (G1)
48- 51 F4.2 --- ACVP Probability that the source is of type ACV_P
(ProbabilityACV_P) (G1)
53- 56 F4.2 --- ACVX Probability that the source is of type ACV_X
(ProbabilityACV_X) (G1)
58- 61 F4.2 --- EAP Probability that the source is of type EA_P
(ProbabilityEA_P) (G1)
63- 66 F4.2 --- EAX Probability that the source is of type EA_X
(ProbabilityEA_X) (G1)
68- 71 F4.2 --- EBP Probability that the source is of type EB_P
(ProbabilityEB_P) (G1)
73- 76 F4.2 --- EWP Probability that the source is of type EW_P
(ProbabilityEW_P) (G1)
78- 81 F4.2 --- ELLP Probability that the source is of type ELL_P
(ProbabilityELL_P) (G1)
83- 86 F4.2 --- ACYGP Probability that the source is of type ACYG_P
(ProbabilityACYG_P) (G1)
88- 91 F4.2 --- ACYGX Probability that the source is of type ACYG_X
(ProbabilityACYG_X) (G1)
93- 96 F4.2 --- BCEPP Probability that the source is of type BCEP_P
(ProbabilityBCEP_P) (G1)
98-101 F4.2 --- BCEPX Probability that the source is of type BCEP_X
(ProbabilityBCEP_X) (G1)
103-106 F4.2 --- DCEPSP Probability that the source is of type DCEPS_P
(ProbabilityDCEPS_P) (G1)
108-111 F4.2 --- DCEPP Probability that the source is of type DCEP_P
(ProbabilityDCEP_P) (G1)
113-116 F4.2 --- CEP(B)P Probability that the source is of type CEP(B)_P
(ProbabilityCEP(B)_P) (G1)
118-121 F4.2 --- RRABP Probability that the source is of type RRAB_P
(ProbabilityRRAB_P) (G1)
123-126 F4.2 --- RRCP Probability that the source is of type RRC_P
(ProbabilityRRC_P) (G1)
128-131 F4.2 --- GDORP Probability that the source is of type GDOR_P
(ProbabilityGDOR_P) (G1)
133-136 F4.2 --- GDORX Probability that the source is of type GDOR_X
(ProbabilityGDOR_X) (G1)
138-141 F4.2 --- DSCTP Probability that the source is of type DSCT_P
(ProbabilityDSCT_P) (G1)
143-146 F4.2 --- DSCTX Probability that the source is of type DSCT_X
(ProbabilityDSCT_X) (G1)
148-151 F4.2 --- DSCTCP Probability that the source is of type DSCTC_P
(ProbabilityDSCTC_P) (G1)
153-156 F4.2 --- DSCTCX Probability that the source is of type DSCTC_X
(ProbabilityDSCTC_X) (G1)
158-161 F4.2 --- CWAP Probability that the source is of type CWA_P
(ProbabilityCWA_P) (G1)
163-166 F4.2 --- CWBP Probability that the source is of type CWB_P
(ProbabilityCWB_P) (G1)
168-171 F4.2 --- SXARIP Probability that the source is of type SXARI_P
(ProbabilitySXARI_P) (G1)
173-176 F4.2 --- RVP Probability that the source is of type RV_P
(ProbabilityRV_P) (G1)
--------------------------------------------------------------------------------
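The same 34-column layout serves tablec1.dat (random forests) and tablec2.dat (multi-stage Bayesian networks): column k (counting from 0) occupies bytes 8+5k to 11+5k. A sketch for reading one record and picking the most probable type; the function names are hypothetical, and treating a blank field as probability 0 is an assumption.

```python
# Type labels in byte order, as in the description of tablec?.dat
PROB_LABELS = ["I_X", "LPV_P", "LPV_X", "RS+BY_P", "RS+BY_X", "BE+GCAS_P",
               "BE+GCAS_X", "SPB_P", "ACV_P", "ACV_X", "EA_P", "EA_X", "EB_P",
               "EW_P", "ELL_P", "ACYG_P", "ACYG_X", "BCEP_P", "BCEP_X",
               "DCEPS_P", "DCEP_P", "CEP(B)_P", "RRAB_P", "RRC_P", "GDOR_P",
               "GDOR_X", "DSCT_P", "DSCT_X", "DSCTC_P", "DSCTC_X", "CWA_P",
               "CWB_P", "SXARI_P", "RV_P"]

def parse_prob_line(line):
    """Return (HIP, {type: probability}) for one record."""
    hip = int(line[0:6])
    probs = {}
    for k, label in enumerate(PROB_LABELS):
        start = 7 + 5 * k  # 0-based index of byte 8 + 5k
        field = line[start:start + 4].strip()
        probs[label] = float(field) if field else 0.0  # assumption: blank = 0
    return hip, probs

def most_probable(probs):
    """Type with the highest probability and that probability."""
    label = max(probs, key=probs.get)
    return label, probs[label]
```

For tablec1.dat, the label returned by most_probable is expected to correspond to RFtype in table5.dat (and to MBtype for tablec2.dat), which makes a simple consistency check between the files.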
Global Notes:
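Note (G1): the variability types include a suffix _P for periodic variables,
or _X for unsolved variables. In tablec?.dat, the probabilities come from
random forests (tablec1.dat) or from the multi-stage Bayesian networks
(tablec2.dat). The classifications, defined in Table 1 of the paper, are: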
I = Irregular
LPV = Long-period variables
RS+BY = RS CVn- and BY Dra- type variables (rotating dKe or dMe stars)
BE+GCAS = B-type emission-line stars and γ Cas variables
SPB = Slowly pulsating B-type stars
ACV = α2CVn-type variables (rotating Ap stars)
EA = eclipsing binaries of Algol type (detached)
EB = eclipsing binaries of β Lyr type (semi-detached)
EW = eclipsing binaries of W UMa type (contact)
ELL = ellipsoidal rotating variables
ACYG = α Cyg variables (pulsating early-type supergiants)
BCEP = β Cep variables (pulsating early-type stars)
DCEP = classical Cepheid (δ Cep type)
DCEPS = First-overtone Cepheid
CEP(B) = Multimode Cepheid
RRAB = RR Lyr with asymmetric light curve
RRC = RR Lyr with nearly symmetric light curve
GDOR = γ Dor type (early F-type pulsating star)
DSCT = δ Scuti variable (pulsating A0-F5 stars);
includes SX Phe-type stars
DSCTC = low-amplitude δ Scuti variables
CWA = pulsating variables of W Vir type with period>8d
CWB = pulsating variables of W Vir type with period<8d (BL Her-type)
SXARI = SX Ari-type star (rotating variable of Bp type)
RV = RV Tau-type (radially pulsating F-G supergiants)
History:
* 29-Nov-2012: on-line version
* 24-Jan-2013: tables 2, 4 and 5 corrected (from author)
* 20-Aug-2013: label logScFol corrected into logScQSO (from author)
Acknowledgements:
Lorenzo Rimoldini, lorenzo(at)rimoldini.info
(End) L. Rimoldini [Geneva Obs./ISDC, Switzerland], P. Vannier [CDS] 19-Mar-2012