J/A+A/642/A58 RR Lyrae candidates in VVV (Cabral+, 2020)
Automatic catalog of RR Lyrae from ∼14 million VVV light curves:
How far can we go with traditional machine-learning?
Cabral J.B., Ramos F., Gurovich S., Granitto P.M.
<Astron. Astrophys., 642, A58 (2020)>
=2020A&A...642A..58C (SIMBAD/NED BibCode)
ADC_Keywords: Stars, variable
Keywords: methods: data analysis - methods: statistical - surveys - catalogs -
stars: variables: RR Lyrae - Galaxy: bulge
Abstract:
The creation of a 3D map of the bulge using RR Lyrae (RRL) is one of
the main goals of the VISTA Variables in the Via Lactea Survey (VVV)
and VVV(X) surveys. The overwhelming number of sources undergoing
analysis undoubtedly requires the use of automatic procedures. In this
context, previous studies have introduced the use of machine learning
(ML) methods for the task of variable star classification.
Our goal is to develop and test an entirely automatic ML-based
procedure for the identification of RRLs in the VVV Survey. This
automatic procedure is meant to be used to generate reliable catalogs
integrated over several tiles in the survey.
Following the reconstruction of light curves, we extracted a set of
period- and intensity-based features, which were already defined in
previous works. Also, for the first time, we put a new subset of
useful color features to use. We discuss in considerable detail all
the appropriate steps needed to define our fully automatic pipeline,
namely: the selection of quality measurements; sampling procedures;
classifier setup, and model selection.
As a result, we were able to construct an ensemble classifier with an
average recall of 0.48 and average precision of 0.86 over 15 tiles. We
also made all our processed datasets available and we published a
catalog of candidate RRLs.
Perhaps most interestingly, from a classification perspective based on
photometric broad-band data, our results indicate that color is an
informative feature type of the RRL objective class that should always
be considered in automatic classification methods via ML. We also
argue that recall and precision in both tables and curves are
high-quality metrics with regard to this highly imbalanced problem.
Furthermore, we show for our VVV data-set that to have good estimates,
it is important to use the original distribution more abundantly than
reduced samples with an artificial balance. Finally, we show that the
use of ensemble classifiers helps resolve the crucial model selection
step and that most errors in the identification of RRLs are related to
low-quality observations of some sources or to the increased
difficulty in resolving the RRL-C type given the data.
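
Note on the metrics: the sketch below illustrates why recall and
precision are preferred over accuracy for a highly imbalanced problem such
as this one. The confusion-matrix counts are hypothetical, chosen only so
that the resulting recall and precision land near the averages quoted
above; they are not taken from the paper, and the function names are
illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all sources that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# HYPOTHETICAL counts for one tile: 100 true RRL among 100,000 sources.
TP, FP, FN, TN = 48, 8, 52, 99_892

precision, recall = precision_recall(TP, FP, FN)
acc = accuracy(TP, FP, FN, TN)

# A trivial classifier that never predicts "RRL" on the same sample:
# every RRL is missed, yet almost every source is still labelled correctly.
trivial_acc = accuracy(0, 0, TP + FN, TN + FP)
_, trivial_recall = precision_recall(0, 0, TP + FN)

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={acc:.4f}")
print(f"trivial classifier: recall={trivial_recall:.2f} "
      f"accuracy={trivial_acc:.4f}")
```

With these assumed counts the two classifiers differ by less than 0.001
in accuracy, while recall separates them clearly.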
Description:
In this work, we derive a method for the automatic classification of
RRL stars. We begin by discussing the context of RRL as keystones for
stellar evolution and pulsation astrophysics and their importance as
rungs on the intra- and extragalactic distance scale ladder, as well
as for galaxy formation models. We base our models on RRL that were
classified in the literature prior to Gaia DR2. We match VVV data to
those stars and extract features using feets, an astropy-affiliated
package presented in Cabral et al. (2018, Astron. Comput., 25, 213).
We examine the difficulties inherent in existing semi-automatic
methods found in the literature and test some of these pitfalls, in
order to build a more robust RRL classifier for the VVV survey based
on a newly crafted ML tool.
File Summary:
--------------------------------------------------------------------------------
 FileName    Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe          80        .   This file
tableg.dat      84      242   RRL candidates, sorted by probability of
                               being an RRL
--------------------------------------------------------------------------------
See also:
II/348 : VISTA Variable in the Via Lactea Survey DR2 (Minniti+, 2017)
Byte-by-byte Description of file: tableg.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  3  I3    ---     Seq       [1/242] Sequential number
   5- 27  A23   ---     Name      Name (VVV JHHMMSS.ss+DDMMSS.s)
  29- 32  A4    ---     Tile      Tile where the candidate is located
  34- 42  F9.5  deg     RAdeg     Right ascension (J2000)
  44- 52  F9.5  deg     DEdeg     Declination (J2000)
  54- 60  F7.5  d       Per       Calculated period
  62- 70  F9.6  mag     magmean   Mean of magnitudes
  72- 78  F7.5  mag     Amp       Amplitude
  80- 84  F5.3  ---     Prob      Probability of being an RRL
--------------------------------------------------------------------------------
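
Reading example: the byte-by-byte description above can be turned into a
small fixed-width parser. This is a minimal sketch using only the Python
standard library. The names Candidate, FIELDS, parse_line and read_table
are illustrative, and SAMPLE is a synthetic row (including the tile name
"b000") built to match the stated formats; it is not an entry of
tableg.dat.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    seq: int
    name: str
    tile: str
    ra_deg: float
    de_deg: float
    period_d: float
    mag_mean: float
    amplitude: float
    prob: float


# (field, first byte, last byte, converter); byte positions are 1-based
# and inclusive, copied from the byte-by-byte description.
FIELDS = [
    ("seq", 1, 3, int),
    ("name", 5, 27, str),
    ("tile", 29, 32, str),
    ("ra_deg", 34, 42, float),
    ("de_deg", 44, 52, float),
    ("period_d", 54, 60, float),
    ("mag_mean", 62, 70, float),
    ("amplitude", 72, 78, float),
    ("prob", 80, 84, float),
]


def parse_line(line: str) -> Candidate:
    """Slice one 84-character record into typed fields."""
    values = []
    for _, first, last, convert in FIELDS:
        raw = line[first - 1:last].strip()
        values.append(convert(raw))
    return Candidate(*values)


def read_table(path: str) -> List[Candidate]:
    """Parse every non-empty line of a file laid out like tableg.dat."""
    with open(path, encoding="ascii") as handle:
        return [parse_line(row.rstrip("\n")) for row in handle if row.strip()]


# SYNTHETIC record for illustration only (not a real catalog entry).
SAMPLE = "%3d %-23s %-4s %9.5f %9.5f %7.5f %9.6f %7.5f %5.3f" % (
    1, "VVV J174500.00-290000.0", "b000",
    266.25, -29.0, 0.5, 14.0, 0.3, 0.9,
)

candidate = parse_line(SAMPLE)
print(candidate.name, candidate.period_d, candidate.prob)
```

Since the file is described as sorted by probability, a selection such as
[c for c in read_table("tableg.dat") if c.prob >= 0.9] would keep only
the leading rows; the 0.9 threshold is an arbitrary example. Where astropy
is available, its astropy.io.ascii CDS reader (format="cds" with the
readme argument) is likely the more convenient route.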
History:
From electronic version of the journal
(End) Patricia Vannier [CDS] 16-Nov-2020