J/MNRAS/499/524 Density-based outlier scoring on Kepler data (Giles+, 2020)
Density-based outlier scoring on Kepler data.
Giles D.K., Walkowicz L.
<Mon. Not. R. Astron. Soc., 499, 524-542 (2020)>
=2020MNRAS.499..524G (SIMBAD/NED BibCode)
ADC_Keywords: Stars, fundamental ; Surveys ; Optical
Keywords: methods: data analysis - surveys - stars: general
Abstract:
In the present era of large-scale surveys, big data present new
challenges to the discovery process for anomalous data. Such data can
be indicative of systematic errors, extreme (or rare) forms of known
phenomena, or most interestingly, truly novel phenomena that exhibit
as-of-yet unobserved behaviours. In this work, we present an outlier
scoring methodology to identify and characterize the most promising
unusual sources to facilitate discoveries of such anomalous data. We
have developed a data mining method based on k-nearest neighbour
distance in feature space to efficiently identify the most anomalous
light curves. We test variations of this method including using
principal components of the feature space, removing select features,
the effect of the choice of k, and scoring to subset samples. We
evaluate the performance of our scoring on known object classes and
find that our scoring consistently scores rare (<1000) object classes
higher than common classes. We have applied scoring to all long
cadence light curves of Quarters 1-17 of Kepler's prime mission and
present outlier scores for all 2.8 million light curves for the
roughly 200k objects.
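As a rough illustration of the scoring idea in the abstract, the sketch below
scores each point in a small feature table by the distance to its k-th nearest
neighbour and then min-max scales the scores to [0, 1]. It is not the authors'
pipeline: the feature values, the single choice of k, and the function names
are assumptions made for this example.

```python
import math

def knn_outlier_scores(features, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(features):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(features) if j != i)
        scores.append(dists[k - 1])
    return scores

def min_max_scale(scores):
    """Rescale scores so the smallest is 0 and the largest is 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

# Made-up 2-D feature vectors: four clustered points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
raw = knn_outlier_scores(points, k=2)
scaled = min_max_scale(raw)
```

The isolated point receives the largest raw score and therefore a scaled score
of one, which mirrors how the most outlying object in a quarter is reported in
table1.dat.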
Description:
The data we consider in this study are long-cadence photometric light
curves from Quarters 1 to 17 of NASA's Kepler mission. We utilize Data
Release 25 that reprocessed all Q0-Q17 data with the updated data
pipeline (Thompson et al. 2016ksci.rept....9T, 2016ksci.rept....3T).
The Kepler mission was designed to observe stars in a single 105 square-degree
field of view (FOV) centred at RA=19h22m40s and Dec.=44°30'00"
from 2009 March to 2013 May. Every 3 months, the Kepler spacecraft
rolled by 90° to re-align its solar panels; these rolls define the
epochs known as 'Quarters'.
We present here the results of outlier scoring on all long cadence
light curves observed by the Kepler prime mission, providing scores
for every light curve in the context of each quarter, as well as
alternative scores scaled relative to an artificial reference to
facilitate comparisons of scores by object across quarters.
File Summary:
--------------------------------------------------------------------------------
FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe           80        .   This file
table1.dat      400   201266   Outlier scores min-max scaled from zero to one
table3.dat       76   201266   Summary information for each object in the KIC
                               without the column references from Simbad
table3.txt    41696   201266   Summary information for each object in the KIC
                               with the column references from Simbad
tableb1.dat     638   201266   Full machine readable table of file names for
                               long cadence light curves from MAST
tablec1.dat     396   201266   Outlier scores scaled with respect to an
                               artificial reference source
--------------------------------------------------------------------------------
See also:
V/133 : Kepler Input Catalog (Kepler Mission Team, 2009)
Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- KIC KIC identifier
11- 32 E22.17 --- Q1 [0/1]? Outlier score for Quarter 1 (1)
34- 55 E22.17 --- Q2 [0/1]? Outlier score for Quarter 2 (1)
57- 78 E22.17 --- Q3 [0/1]? Outlier score for Quarter 3 (1)
80- 101 E22.17 --- Q4 [0/1]? Outlier score for Quarter 4 (1)
103- 124 E22.17 --- Q5 [0/1]? Outlier score for Quarter 5 (1)
126- 147 E22.17 --- Q6 [0/1]? Outlier score for Quarter 6 (1)
149- 170 E22.17 --- Q7 [0/1]? Outlier score for Quarter 7 (1)
172- 193 E22.17 --- Q8 [0/1]? Outlier score for Quarter 8 (1)
195- 216 E22.17 --- Q9 [0/1]? Outlier score for Quarter 9 (1)
218- 239 E22.17 --- Q10 [0/1]? Outlier score for Quarter 10 (1)
241- 262 E22.17 --- Q11 [0/1]? Outlier score for Quarter 11 (1)
264- 285 E22.17 --- Q12 [0/1]? Outlier score for Quarter 12 (1)
287- 308 E22.17 --- Q13 [0/1]? Outlier score for Quarter 13 (1)
310- 331 E22.17 --- Q14 [0/1]? Outlier score for Quarter 14 (1)
333- 354 E22.17 --- Q15 [0/1]? Outlier score for Quarter 15 (1)
356- 377 E22.17 --- Q16 [0/1]? Outlier score for Quarter 16 (1)
379- 400 E22.17 --- Q17 [0/1]? Outlier score for Quarter 17 (1)
--------------------------------------------------------------------------------
Note (1): We present here only the 4=<k<14 scores for the full feature set
without sampling. These scores are scaled from zero to one where the
most outlying object in the quarter has a score of one.
--------------------------------------------------------------------------------
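A minimal sketch of reading one table1.dat record using the byte ranges listed
above. The function name and the sample record are hypothetical; blank score
fields are returned as None, on the assumption that the "?" flag marks quarters
without a score.

```python
def parse_table1_line(line):
    """Split one fixed-width table1.dat record into (kic, {quarter: score})."""
    line = line.rstrip("\n").ljust(400)
    kic = int(line[0:9])                     # bytes 1-9
    scores = {}
    for quarter in range(1, 18):
        start = 10 + 23 * (quarter - 1)      # 0-based offset of bytes 11, 34, ...
        field = line[start:start + 22].strip()
        scores[quarter] = float(field) if field else None
    return kic, scores

# Synthetic record (made-up KIC and score): Quarter 1 filled, Quarters 2-17 blank.
fields = ["%22.15e" % 0.5] + [" " * 22] * 16
sample = "%9d" % 1234567 + " " + " ".join(fields)
kic, scores = parse_table1_line(sample)
```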
Byte-by-byte Description of file: table3.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- KIC KIC identifier
11- 19 F9.2 --- rankmed Median rank (1)
21- 28 F8.1 --- rankmin Minimum rank (1)
30- 32 A3 --- Qmin Quarter of minimum
34- 48 A15 --- Type Object type from Simbad (2)
50- 51 I2 h RAh ? Right ascension (J2000) (2)
53- 54 I2 min RAm ? Right ascension (J2000) (2)
56- 62 F7.4 s RAs ? Right ascension (J2000) (2)
64 A1 --- DE- ? Declination sign (J2000) (2)
65- 66 I2 deg DEd ? Declination (J2000) (2)
68- 69 I2 arcmin DEm ? Declination (J2000) (2)
71- 76 F6.3 arcsec DEs ? Declination (J2000) (2)
78-41696 A41618 --- Refs References, table3.txt only (2)
--------------------------------------------------------------------------------
Note (1): Rank information is from the k-average scores based on the full
feature set
Note (2): Position information, type, and bibliography are from the SIMBAD
database (bibliography only in table3.txt).
--------------------------------------------------------------------------------
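The sexagesimal position columns of table3.dat can be converted to decimal
degrees along the following lines. This is a sketch under stated assumptions:
the function name is hypothetical, the sample record is synthetic (it reuses
the field-centre coordinates quoted in the Description, with made-up rank and
type values), and records with blank position fields yield None.

```python
def parse_table3_position(line):
    """Return (ra_deg, dec_deg) from one table3.dat record, or None if blank."""
    line = line.rstrip("\n").ljust(76)
    ra_fields = (line[49:51], line[52:54], line[55:62])   # RAh, RAm, RAs
    de_fields = (line[64:66], line[67:69], line[70:76])   # DEd, DEm, DEs
    if not all(f.strip() for f in ra_fields + de_fields):
        return None
    rah, ram, ras = (float(f) for f in ra_fields)
    ded, dem, des = (float(f) for f in de_fields)
    sign = -1.0 if line[63] == "-" else 1.0               # DE- sign, byte 64
    ra_deg = 15.0 * (rah + ram / 60.0 + ras / 3600.0)     # hours to degrees
    dec_deg = sign * (ded + dem / 60.0 + des / 3600.0)
    return ra_deg, dec_deg

# Synthetic record laid out on the byte ranges above.
sample = "%9d %9.2f %8.1f %-3s %-15s %02d %02d %7.4f %s%02d %02d %6.3f" % (
    1234567, 100.0, 50.0, "Q4", "Star", 19, 22, 40.0, "+", 44, 30, 0.0)
position = parse_table3_position(sample)
```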
Byte-by-byte Description of file: tableb1.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- KIC KIC identifier
11- 46 A36 --- FileQ1 File name for long cadence light curve in
quarter 1
48- 83 A36 --- FileQ2 File name for long cadence light curve in
quarter 2
85- 120 A36 --- FileQ3 File name for long cadence light curve in
quarter 3
122- 157 A36 --- FileQ4 File name for long cadence light curve in
quarter 4
159- 194 A36 --- FileQ5 File name for long cadence light curve in
quarter 5
196- 231 A36 --- FileQ6 File name for long cadence light curve in
quarter 6
233- 268 A36 --- FileQ7 File name for long cadence light curve in
quarter 7
270- 305 A36 --- FileQ8 File name for long cadence light curve in
quarter 8
307- 342 A36 --- FileQ9 File name for long cadence light curve in
quarter 9
344- 379 A36 --- FileQ10 File name for long cadence light curve in
quarter 10
381- 416 A36 --- FileQ11 File name for long cadence light curve in
quarter 11
418- 453 A36 --- FileQ12 File name for long cadence light curve in
quarter 12
455- 490 A36 --- FileQ13 File name for long cadence light curve in
quarter 13
492- 527 A36 --- FileQ14 File name for long cadence light curve in
quarter 14
529- 564 A36 --- FileQ15 File name for long cadence light curve in
quarter 15
566- 601 A36 --- FileQ16 File name for long cadence light curve in
quarter 16
603- 638 A36 --- FileQ17 File name for long cadence light curve in
quarter 17
--------------------------------------------------------------------------------
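The file-name columns of tableb1.dat sit at a regular 37-byte stride, so a
record can be split as sketched below. The function name is hypothetical, the
sample file name only imitates the usual MAST long-cadence pattern
(kplr<9-digit KIC>-<timestamp>_llc.fits) with a made-up KIC, and treating a
blank field as "no light curve in that quarter" is an assumption that this
ReadMe does not state.

```python
def parse_tableb1_line(line):
    """Return (kic, {quarter: file_name}) for one tableb1.dat record."""
    line = line.rstrip("\n").ljust(638)
    kic = int(line[0:9])                     # bytes 1-9
    files = {}
    for quarter in range(1, 18):
        start = 10 + 37 * (quarter - 1)      # 0-based offset of bytes 11, 48, ...
        name = line[start:start + 36].strip()
        if name:                             # assumed: blank means not observed
            files[quarter] = name
    return kic, files

# Synthetic record: a file name for Quarter 1 only.
name = "kplr%09d-2009166043257_llc.fits" % 1234567
sample = "%9d " % 1234567 + " ".join([name] + [" " * 36] * 16)
kic, files = parse_tableb1_line(sample)
```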
Byte-by-byte Description of file: tablec1.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 9 I9 --- KIC KIC identifier
11- 31 F21.19 --- Q1 ? Outlier score scaled for Quarter 1 (1)
33- 54 F22.19 --- Q2 ? Outlier score scaled for Quarter 2 (1)
56- 77 F22.19 --- Q3 ? Outlier score scaled for Quarter 3 (1)
79- 100 F22.19 --- Q4 ? Outlier score scaled for Quarter 4 (1)
102- 123 F22.19 --- Q5 ? Outlier score scaled for Quarter 5 (1)
125- 145 F21.18 --- Q6 ? Outlier score scaled for Quarter 6 (1)
147- 167 F21.18 --- Q7 ? Outlier score scaled for Quarter 7 (1)
169- 190 F22.19 --- Q8 ? Outlier score scaled for Quarter 8 (1)
192- 213 F22.19 --- Q9 ? Outlier score scaled for Quarter 9 (1)
215- 236 F22.19 --- Q10 ? Outlier score scaled for Quarter 10 (1)
238- 259 F22.19 --- Q11 ? Outlier score scaled for Quarter 11 (1)
261- 282 F22.19 --- Q12 ? Outlier score scaled for Quarter 12 (1)
284- 305 F22.19 --- Q13 ? Outlier score scaled for Quarter 13 (1)
307- 328 F22.19 --- Q14 ? Outlier score scaled for Quarter 14 (1)
330- 351 F22.19 --- Q15 ? Outlier score scaled for Quarter 15 (1)
353- 374 F22.19 --- Q16 ? Outlier score scaled for Quarter 16 (1)
376- 396 F21.18 --- Q17 ? Outlier score scaled for Quarter 17 (1)
--------------------------------------------------------------------------------
Note (1): The outlier scores are scaled with respect to an artificial reference
source. This facilitates comparisons of scores across quarters. These
scores are exact k=1, sampled 10x1000.
--------------------------------------------------------------------------------
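Because the tablec1.dat scores share a common reference, they can be compared
across quarters for one object. The sketch below reads a record using the byte
ranges listed above (which are not evenly spaced) and takes the median over the
quarters that have a score. The function names and the sample values are
hypothetical, and the median is only one possible per-object summary.

```python
import statistics

# (first, last) byte of Q1..Q17, copied from the byte-by-byte description.
TABLEC1_BYTES = [
    (11, 31), (33, 54), (56, 77), (79, 100), (102, 123), (125, 145),
    (147, 167), (169, 190), (192, 213), (215, 236), (238, 259), (261, 282),
    (284, 305), (307, 328), (330, 351), (353, 374), (376, 396),
]

def parse_tablec1_line(line):
    """Return (kic, scores) where scores has 17 entries, None if blank."""
    line = line.rstrip("\n").ljust(396)
    kic = int(line[0:9])
    scores = []
    for first, last in TABLEC1_BYTES:
        field = line[first - 1:last].strip()
        scores.append(float(field) if field else None)
    return kic, scores

def median_score(scores):
    """Median over the quarters that have a score, or None if there are none."""
    present = [s for s in scores if s is not None]
    return statistics.median(present) if present else None

# Synthetic record: made-up scores for Quarters 1-3, other quarters blank.
record = list(" " * 396)
record[0:9] = "%9d" % 1234567
for (first, last), value in zip(TABLEC1_BYTES[:3], (0.2, 0.4, 0.9)):
    record[first - 1:last] = ("%.6f" % value).rjust(last - first + 1)
sample = "".join(record)
kic, scores = parse_tablec1_line(sample)
summary = median_score(scores)
```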
History:
From electronic version of the journal
(End) Ana Fiallos [CDS] 16-Aug-2023