J/MNRAS/499/524    Density-based outlier scoring on Kepler data   (Giles+, 2020)

Density-based outlier scoring on Kepler data.
    Giles D.K., Walkowicz L.
    <Mon. Not. R. Astron. Soc., 499, 524-542 (2020)>
    =2020MNRAS.499..524G    (SIMBAD/NED BibCode)
ADC_Keywords: Stars, fundamental ; Surveys ; Optical

Keywords: methods: data analysis - surveys - stars: general

Abstract:
    In the present era of large-scale surveys, big data present new
    challenges to the discovery process for anomalous data. Such data can be
    indicative of systematic errors, extreme (or rare) forms of known
    phenomena, or, most interestingly, truly novel phenomena that exhibit
    as-yet unobserved behaviours. In this work, we present an outlier
    scoring methodology to identify and characterize the most promising
    unusual sources, to facilitate the discovery of such anomalous data.
    We have developed a data mining method based on the k-nearest
    neighbour distance in feature space to efficiently identify the most
    anomalous light curves. We test variations of this method, including
    using principal components of the feature space, removing select
    features, varying the choice of k, and scoring against subset samples.
    We evaluate the performance of our scoring on known object classes and
    find that it consistently scores rare (<1000 objects) object classes
    higher than common classes. We have applied scoring to all long-cadence
    light curves of Quarters 1-17 of Kepler's prime mission and present
    outlier scores for all 2.8 million light curves of the roughly 200k
    objects.

Description:
    The data we consider in this study are long-cadence photometric light
    curves from Quarters 1 to 17 of NASA's Kepler mission. We use Data
    Release 25, which reprocessed all Q0-Q17 data with the updated data
    pipeline (Thompson et al. 2016ksci.rept....9T, 2016ksci.rept....3T).
    The Kepler mission was designed to observe stars in a single 105deg^2^
    field of view (FOV) centred at RA=19h22m40s and Dec.=44°30'00" from
    2009 March to 2013 May. Every 3 months (four times a year), the Kepler
    spacecraft rolled by 90° to re-align its solar panels; these rolls
    define the epochs known as 'Quarters'.
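    The k-nearest-neighbour distance scoring summarised in the abstract can
    be illustrated with a minimal sketch. It is not the authors' pipeline:
    the feature extraction, the principal-component and subsampling
    variants, and the k range (4=<k<14) and averaging behind the published
    scores are not reproduced. It assumes a precomputed numeric feature
    matrix (one row per light curve in a single quarter), brute-force
    Euclidean distances and a single k, and then applies the min-max
    scaling used for table1.dat, so the most outlying row scores one. The
    function name knn_outlier_scores is hypothetical.

```python
import numpy as np


def knn_outlier_scores(features: np.ndarray, k: int = 4) -> np.ndarray:
    """Hypothetical helper: k-th nearest-neighbour distance of each row,
    min-max scaled so the most outlying row scores 1 and the least 0."""
    n = features.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of rows")
    # Brute-force pairwise Euclidean distances: O(n^2) memory, sketch only.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance, so
    # column k is the distance to the k-th nearest other row.
    kth = np.sort(dist, axis=1)[:, k]
    lo, hi = kth.min(), kth.max()
    if hi == lo:  # every row equally isolated: nothing to single out
        return np.zeros(n)
    return (kth - lo) / (hi - lo)
```

    For a full quarter (roughly 200k rows) the O(n^2) distance matrix is
    impractical; a tree-based or approximate neighbour index, such as
    scikit-learn's NearestNeighbors, would take its place.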
    We present here the results of outlier scoring on all long-cadence light
    curves observed by the Kepler prime mission, providing scores for every
    light curve in the context of each quarter, as well as alternative
    scores scaled relative to an artificial reference to facilitate
    comparisons of scores by object across quarters.

File Summary:
--------------------------------------------------------------------------------
FileName     Lrecl  Records  Explanations
--------------------------------------------------------------------------------
ReadMe          80        .  This file
table1.dat     400   201266  Outlier scores min-max scaled from zero to one
table3.dat      76   201266  Summary information for each object in the KIC,
                              without the References column from SIMBAD
table3.txt   41696   201266  Summary information for each object in the KIC,
                              with the References column from SIMBAD
tableb1.dat    638   201266  Full machine-readable table of file names for
                              long-cadence light curves from MAST
tablec1.dat    396   201266  Outlier scores scaled with respect to an
                              artificial reference source
--------------------------------------------------------------------------------

See also:
   V/133 : Kepler Input Catalog (Kepler Mission Team, 2009)

Byte-by-byte Description of file: table1.dat
--------------------------------------------------------------------------------
   Bytes  Format Units  Label    Explanations
--------------------------------------------------------------------------------
   1-  9  I9     ---    KIC      KIC identifier
  11- 32  E22.17 ---    Q1       [0/1]? Outlier score for Quarter 1 (1)
  34- 55  E22.17 ---    Q2       [0/1]? Outlier score for Quarter 2 (1)
  57- 78  E22.17 ---    Q3       [0/1]? Outlier score for Quarter 3 (1)
  80-101  E22.17 ---    Q4       [0/1]? Outlier score for Quarter 4 (1)
 103-124  E22.17 ---    Q5       [0/1]? Outlier score for Quarter 5 (1)
 126-147  E22.17 ---    Q6       [0/1]? Outlier score for Quarter 6 (1)
 149-170  E22.17 ---    Q7       [0/1]? Outlier score for Quarter 7 (1)
 172-193  E22.17 ---    Q8       [0/1]? Outlier score for Quarter 8 (1)
 195-216  E22.17 ---    Q9       [0/1]? Outlier score for Quarter 9 (1)
 218-239  E22.17 ---    Q10      [0/1]? Outlier score for Quarter 10 (1)
 241-262  E22.17 ---    Q11      [0/1]? Outlier score for Quarter 11 (1)
 264-285  E22.17 ---    Q12      [0/1]? Outlier score for Quarter 12 (1)
 287-308  E22.17 ---    Q13      [0/1]? Outlier score for Quarter 13 (1)
 310-331  E22.17 ---    Q14      [0/1]? Outlier score for Quarter 14 (1)
 333-354  E22.17 ---    Q15      [0/1]? Outlier score for Quarter 15 (1)
 356-377  E22.17 ---    Q16      [0/1]? Outlier score for Quarter 16 (1)
 379-400  E22.17 ---    Q17      [0/1]? Outlier score for Quarter 17 (1)
--------------------------------------------------------------------------------
Note (1): We present here only the 4=<k<14 scores for the full feature set
  without sampling. These scores are scaled from zero to one, where the most
  outlying object in the quarter has a score of one.
--------------------------------------------------------------------------------

Byte-by-byte Description of file: table3.dat
--------------------------------------------------------------------------------
   Bytes  Format Units  Label    Explanations
--------------------------------------------------------------------------------
   1-  9  I9     ---    KIC      KIC identifier
  11- 19  F9.2   ---    rankmed  Median rank (1)
  21- 28  F8.1   ---    rankmin  Minimum rank (1)
  30- 32  A3     ---    Qmin     Quarter of minimum rank
  34- 48  A15    ---    Type     Object type from SIMBAD (2)
  50- 51  I2     h      RAh      ? Right ascension (J2000) (2)
  53- 54  I2     min    RAm      ? Right ascension (J2000) (2)
  56- 62  F7.4   s      RAs      ? Right ascension (J2000) (2)
      64  A1     ---    DE-      ? Declination sign (J2000) (2)
  65- 66  I2     deg    DEd      ? Declination (J2000) (2)
  68- 69  I2     arcmin DEm      ? Declination (J2000) (2)
  71- 76  F6.3   arcsec DEs      ? Declination (J2000) (2)
  78-41696 A41618 ---   Refs     References (2)
--------------------------------------------------------------------------------
Note (1): Rank information is from the k-average scores based on the full
  feature set.
Note (2): Position information, type, and bibliography are from the SIMBAD
  database (bibliography only in table3.txt).
--------------------------------------------------------------------------------

Byte-by-byte Description of file: tableb1.dat
--------------------------------------------------------------------------------
   Bytes  Format Units  Label    Explanations
--------------------------------------------------------------------------------
   1-  9  I9     ---    KIC      KIC identifier
  11- 46  A36    ---    FileQ1   File name for long-cadence light curve in
                                  quarter 1
  48- 83  A36    ---    FileQ2   File name for long-cadence light curve in
                                  quarter 2
  85-120  A36    ---    FileQ3   File name for long-cadence light curve in
                                  quarter 3
 122-157  A36    ---    FileQ4   File name for long-cadence light curve in
                                  quarter 4
 159-194  A36    ---    FileQ5   File name for long-cadence light curve in
                                  quarter 5
 196-231  A36    ---    FileQ6   File name for long-cadence light curve in
                                  quarter 6
 233-268  A36    ---    FileQ7   File name for long-cadence light curve in
                                  quarter 7
 270-305  A36    ---    FileQ8   File name for long-cadence light curve in
                                  quarter 8
 307-342  A36    ---    FileQ9   File name for long-cadence light curve in
                                  quarter 9
 344-379  A36    ---    FileQ10  File name for long-cadence light curve in
                                  quarter 10
 381-416  A36    ---    FileQ11  File name for long-cadence light curve in
                                  quarter 11
 418-453  A36    ---    FileQ12  File name for long-cadence light curve in
                                  quarter 12
 455-490  A36    ---    FileQ13  File name for long-cadence light curve in
                                  quarter 13
 492-527  A36    ---    FileQ14  File name for long-cadence light curve in
                                  quarter 14
 529-564  A36    ---    FileQ15  File name for long-cadence light curve in
                                  quarter 15
 566-601  A36    ---    FileQ16  File name for long-cadence light curve in
                                  quarter 16
 603-638  A36    ---    FileQ17  File name for long-cadence light curve in
                                  quarter 17
--------------------------------------------------------------------------------

Byte-by-byte Description of file: tablec1.dat
--------------------------------------------------------------------------------
   Bytes  Format Units  Label    Explanations
--------------------------------------------------------------------------------
   1-  9  I9     ---    KIC      KIC identifier
  11- 31  F21.19 ---    Q1       ? Outlier score scaled for Quarter 1 (1)
  33- 54  F22.19 ---    Q2       ? Outlier score scaled for Quarter 2 (1)
  56- 77  F22.19 ---    Q3       ? Outlier score scaled for Quarter 3 (1)
  79-100  F22.19 ---    Q4       ? Outlier score scaled for Quarter 4 (1)
 102-123  F22.19 ---    Q5       ? Outlier score scaled for Quarter 5 (1)
 125-145  F21.18 ---    Q6       ? Outlier score scaled for Quarter 6 (1)
 147-167  F21.18 ---    Q7       ? Outlier score scaled for Quarter 7 (1)
 169-190  F22.19 ---    Q8       ? Outlier score scaled for Quarter 8 (1)
 192-213  F22.19 ---    Q9       ? Outlier score scaled for Quarter 9 (1)
 215-236  F22.19 ---    Q10      ? Outlier score scaled for Quarter 10 (1)
 238-259  F22.19 ---    Q11      ? Outlier score scaled for Quarter 11 (1)
 261-282  F22.19 ---    Q12      ? Outlier score scaled for Quarter 12 (1)
 284-305  F22.19 ---    Q13      ? Outlier score scaled for Quarter 13 (1)
 307-328  F22.19 ---    Q14      ? Outlier score scaled for Quarter 14 (1)
 330-351  F22.19 ---    Q15      ? Outlier score scaled for Quarter 15 (1)
 353-374  F22.19 ---    Q16      ? Outlier score scaled for Quarter 16 (1)
 376-396  F21.18 ---    Q17      ? Outlier score scaled for Quarter 17 (1)
--------------------------------------------------------------------------------
Note (1): The outlier scores are scaled with respect to an artificial
  reference source, which facilitates comparisons of scores across quarters.
  These are exact k=1 scores, sampled 10x1000.
--------------------------------------------------------------------------------

History:
    From electronic version of the journal
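    The byte-by-byte descriptions are enough to read the data files with
    plain string slicing. The following is a minimal Python sketch, assuming
    ASCII records laid out exactly as listed (1-based, inclusive byte
    ranges) and treating a blank nullable ('?') field as missing; all
    function names are hypothetical. table1.dat and tableb1.dat use a
    constant stride per quarter (23 and 37 bytes), so their offsets are
    computed, while tablec1.dat mixes F21 and F22 fields and its byte ranges
    are copied from its description. astropy.io.ascii's "cds" format reader
    may also be able to load these files directly from this ReadMe.

```python
from typing import Optional, Tuple

N_QUARTERS = 17

# tablec1.dat mixes F21.x and F22.x fields, so its byte ranges are listed
# explicitly (1-based, inclusive), copied from the byte-by-byte description.
TABLEC1_BYTES = [
    (11, 31), (33, 54), (56, 77), (79, 100), (102, 123), (125, 145),
    (147, 167), (169, 190), (192, 213), (215, 236), (238, 259), (261, 282),
    (284, 305), (307, 328), (330, 351), (353, 374), (376, 396),
]


def field(line: str, start: int, end: int) -> str:
    """Return bytes start..end (1-based, inclusive) with blanks stripped."""
    return line[start - 1:end].strip()


def float_or_none(text: str) -> Optional[float]:
    """Nullable ('?') columns are blank when missing; map blank to None."""
    return float(text) if text else None


def parse_table1(line: str) -> dict:
    """KIC and the 17 min-max scaled scores. Q1 is bytes 11-32 and each
    later quarter starts 23 bytes further on (22-byte field + 1 blank)."""
    row = {"KIC": int(field(line, 1, 9))}
    for q in range(1, N_QUARTERS + 1):
        start = 11 + 23 * (q - 1)
        row[f"Q{q}"] = float_or_none(field(line, start, start + 21))
    return row


def parse_tablec1(line: str) -> dict:
    """KIC and the 17 scores scaled to the artificial reference source."""
    row = {"KIC": int(field(line, 1, 9))}
    for q, (start, end) in enumerate(TABLEC1_BYTES, start=1):
        row[f"Q{q}"] = float_or_none(field(line, start, end))
    return row


def parse_tableb1(line: str) -> dict:
    """KIC and the 17 MAST file names (A36 fields, 37-byte stride).
    A blank field comes back as an empty string."""
    row = {"KIC": int(field(line, 1, 9))}
    for q in range(1, N_QUARTERS + 1):
        start = 11 + 37 * (q - 1)
        row[f"FileQ{q}"] = field(line, start, start + 35)
    return row


def table3_position_deg(line: str) -> Optional[Tuple[float, float]]:
    """(RA, Dec) in degrees (J2000) from table3.dat, or None when blank."""
    if not field(line, 50, 51):
        return None
    ra_hours = (int(field(line, 50, 51)) + int(field(line, 53, 54)) / 60
                + float(field(line, 56, 62)) / 3600)
    # The sign sits in its own byte (64) so that -00 degrees survives.
    sign = -1.0 if field(line, 64, 64) == "-" else 1.0
    dec = sign * (int(field(line, 65, 66)) + int(field(line, 68, 69)) / 60
                  + float(field(line, 71, 76)) / 3600)
    return 15.0 * ra_hours, dec
```

    Each function handles one record (one line) of the corresponding file,
    e.g. [parse_table1(l) for l in open("table1.dat")]; slicing past the
    end of a short line simply yields a blank, hence missing, field.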
(End) Ana Fiallos [CDS] 16-Aug-2023