J/MNRAS/484/834 Unsupervised machine learning to detect anomalies (Giles+, 2019)

Systematic serendipity: a test of unsupervised machine learning as a method for
anomaly detection.
    Giles D., Walkowicz L.
    <Mon. Not. R. Astron. Soc., 484, 834-849 (2019)>
    =2019MNRAS.484..834G    (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; Positional data ; Optical
Keywords: methods: data analysis - surveys - stars: individual: KIC 846285 -
          stars: individual: KIC 8462852

Abstract:
    Advances in astronomy are often driven by serendipitous discoveries. As
    survey astronomy continues to grow, the size and complexity of
    astronomical data bases will increase, and the ability of astronomers to
    manually scour data and make such discoveries decreases. In this work,
    we introduce a machine learning-based method to identify anomalies in
    large data sets to facilitate such discoveries, and apply this method to
    long cadence light curves from NASA's Kepler Mission. Our method
    clusters data based on density, identifying anomalies as data that lie
    outside of dense regions. This work serves as a proof-of-concept case
    study and we test our method on four quarters of the Kepler long cadence
    light curves. We use Kepler's most notorious anomaly, Boyajian's star
    (KIC 8462852), as a rare 'ground truth' for testing outlier
    identification to verify that objects of genuine scientific interest are
    included among the identified anomalies. We evaluate the method's
    ability to identify known anomalies by identifying unusual behaviour in
    Boyajian's star; we report the full list of identified anomalies for
    these quarters, and present a sample subset of identified outliers that
    includes unusual phenomena, objects that are rare in the Kepler field,
    and data artefacts. By identifying <4 per cent of each quarter as
    outlying data, we demonstrate that this anomaly detection method can
    create a more targeted approach in searching for rare and novel
    phenomena.

Description:
    The data we consider in this study are long-cadence photometric light
    curves from Quarters 4, 8, 11, and 16 of NASA's Kepler mission. We
    utilize Data Release 25 that reprocessed all Q0-Q17 data with the
    updated data pipeline.

    The Kepler spacecraft was designed to obtain near-continuous photometry
    for stars in a single, star-rich 105deg2 field of view (FOV) centred at
    R.A.=19h22m40s and Dec=44°30'00" from 2009 March to 2013 May. The
    photometer camera contains 42 CCDs with 2200x1024 pixels, where each
    pixel covers 4arcsec. However, only pre-selected stars of interest were
    downloaded (Batalha et al. 2010ApJ...713L.109B). Four times a year,
    every 3 months, the Kepler spacecraft rolled by 90deg to re-align its
    solar panels, and these define epochs known as 'Quarters'. This will
    place any given star in one of four different positions on the focal
    plane depending on season: in this study Quarters 4, 8, and 16 are the
    same orientation with Quarter 11 in the preceding orientation.

    This work utilizes a proximity clustering approach to identify outliers,
    based on Density-Based Spatial Clustering of Applications with Noise
    (DBSCAN) (Ester et al. 1996, in Simoudis E., Han J., Fayyad U., eds,
    Proceedings of the 2nd International Conference on Knowledge Discovery
    and Data Mining. AAAI Press, Palo Alto. p.226). The DBSCAN algorithm is
    a nearest neighbour approach with two parameters defining what
    constitutes a cluster: the maximum separation (ε) in feature space
    between two points to be associated with one another, and the minimum
    number of associated neighbours (k) to qualify a point as a core cluster
    member.

    Across all quarters we considered 149789 objects, of which 8507 unique
    objects were identified as outliers representing 5.68 per cent of all
    objects considered (list of outliers shown in table 4). A total of
    141282 objects, 94.32 per cent of all objects, were identified only as
    part of a cluster, either as core cluster members or edge cluster
    members.
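
The core/edge/outlier distinction that DBSCAN draws from ε and k can be
illustrated with a minimal pure-Python sketch. This is not the authors'
implementation: the function name dbscan_flags is hypothetical, the search is
a brute-force O(n^2) loop, and it assumes that a point does not count itself
among its own neighbours (some libraries do count it). The return values
reuse the flag convention of table4.dat (-1 outlier, 0 core, 1 edge).

```python
from math import dist


def dbscan_flags(points, eps, k):
    """Label each point -1 (outlier), 0 (core member) or 1 (edge member).

    eps: maximum separation for two points to be associated.
    k:   minimum number of associated neighbours for a core member
         (assumption: the point itself is not counted).
    """
    n = len(points)
    # Indices of all other points lying within eps of each point.
    neighbours = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= k for nb in neighbours]

    flags = []
    for i in range(n):
        if is_core[i]:
            flags.append(0)       # enough neighbours: core cluster member
        elif any(is_core[j] for j in neighbours[i]):
            flags.append(1)       # within eps of a core member: edge member
        else:
            flags.append(-1)      # not associated with any cluster: outlier
    return flags


# Synthetic two-dimensional feature vectors, for illustration only.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.2, 1.0), (10.0, 10.0)]
flags = dbscan_flags(sample, eps=1.5, k=3)
print(flags)
```

In this toy sample the four points of the unit square are core members, the
point at (2.2, 1.0) is an edge member because its only neighbour is a core
member, and the isolated point at (10.0, 10.0) is an outlier.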
    Objects that were identified as outliers in every quarter constituted
    3584 of the outliers (2.39 per cent of all objects and 42 per cent of
    all outliers), and the remaining 4923 objects were found to be transient
    outliers, identified as an outlier and as a cluster member at least once
    each in different quarters.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
table4.dat        88     8507   List of outliers
--------------------------------------------------------------------------------

See also:
    V/133 : Kepler Input Catalog (Kepler Mission Team, 2009)

Byte-by-byte Description of file: table4.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label   Explanations
--------------------------------------------------------------------------------
   1-  9  I9    ---     KIC     KIC identification number
  11- 12  I2    h       RAh     Right ascension (J2000)
  14- 15  I2    min     RAm     Right ascension (J2000)
  17- 22  F6.3  s       RAs     Right ascension (J2000)
      24  A1    ---     DE-     Declination sign (J2000)
  25- 26  I2    deg     DEd     Declination (J2000)
  28- 29  I2    arcmin  DEm     Declination (J2000)
  31- 35  F5.2  arcsec  DEs     [0/60] Declination (J2000)
  37- 41  I5    K       Teff    ? Effective temperature
  43- 45  I3    K       E_Teff  ? Upper error on Teff
  47- 50  I4    K       e_Teff  ? Lower error on Teff
  52- 57  F6.3  [cm/s2] logg    ? Surface gravity
  59- 63  F5.3  [cm/s2] E_logg  ? Upper error on logg
  65- 69  F5.3  [cm/s2] e_logg  ? Lower error on logg
  71- 76  F6.3  mag     Kepmag  ? Kepler magnitude
  78- 79  I2    ---     Q4      Outlier flag on quarter 4 of the Kepler
                                 mission (1)
  81- 82  I2    ---     Q8      Outlier flag on quarter 8 of the Kepler
                                 mission (1)
  84- 85  I2    ---     Q11     Outlier flag on quarter 11 of the Kepler
                                 mission (1)
  87- 88  I2    ---     Q16     Outlier flag on quarter 16 of the Kepler
                                 mission (1)
--------------------------------------------------------------------------------
Note (1): Flag as follows:
    -1 = the object is outlying in this quarter
     0 = core cluster membership
     1 = the object is an edge cluster member
--------------------------------------------------------------------------------

History:
    From electronic version of the journal
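
The byte-by-byte description of table4.dat maps directly onto string slices.
The following is a minimal sketch of a reader, not an official CDS tool: the
names FIELDS, parse_record, to_degrees and outlier_class are hypothetical,
the sample record is synthetic (it is not a row of the catalogue), and
outlier_class assumes that all four quarter flags are populated. Byte ranges
are 1-indexed and inclusive, so bytes a-b become the slice [a-1:b].

```python
# (label, first byte, last byte, converter), copied from the byte-by-byte table.
FIELDS = [
    ("KIC", 1, 9, int), ("RAh", 11, 12, int), ("RAm", 14, 15, int),
    ("RAs", 17, 22, float), ("DE-", 24, 24, str), ("DEd", 25, 26, int),
    ("DEm", 28, 29, int), ("DEs", 31, 35, float), ("Teff", 37, 41, int),
    ("E_Teff", 43, 45, int), ("e_Teff", 47, 50, int), ("logg", 52, 57, float),
    ("E_logg", 59, 63, float), ("e_logg", 65, 69, float),
    ("Kepmag", 71, 76, float), ("Q4", 78, 79, int), ("Q8", 81, 82, int),
    ("Q11", 84, 85, int), ("Q16", 87, 88, int),
]


def parse_record(line):
    """Parse one 88-byte record; blank (nullable '?') fields become None."""
    line = line.rstrip("\n").ljust(88)
    record = {}
    for label, first, last, convert in FIELDS:
        text = line[first - 1:last].strip()
        record[label] = convert(text) if text else None
    return record


def to_degrees(record):
    """Return (RA, Dec) in decimal degrees from the sexagesimal columns."""
    ra = 15.0 * (record["RAh"] + record["RAm"] / 60.0 + record["RAs"] / 3600.0)
    sign = -1.0 if record["DE-"] == "-" else 1.0
    dec = sign * (record["DEd"] + record["DEm"] / 60.0 + record["DEs"] / 3600.0)
    return ra, dec


def outlier_class(record):
    """'persistent' if outlying (-1) in every quarter, else 'transient'."""
    flags = [record[q] for q in ("Q4", "Q8", "Q11", "Q16")]
    if all(flag == -1 for flag in flags):
        return "persistent"
    if any(flag == -1 for flag in flags):
        return "transient"
    return "cluster member"


# Synthetic record for illustration: made-up KIC number, the field-centre
# coordinates quoted in the Description, blank stellar parameters.
sample_line = (
    f"{1234567:9d} 19 22 40.000 +44 30 00.00"              # bytes 1-35
    + " " * 35                                              # bytes 36-70 blank
    + f"{12.345:6.3f} {-1:2d} {0:2d} {1:2d} {-1:2d}"        # bytes 71-88
)
record = parse_record(sample_line)
ra_deg, dec_deg = to_degrees(record)
print(record["KIC"], ra_deg, dec_deg, outlier_class(record))
```

Applied to the real file, counting the records for which outlier_class
returns 'persistent' or 'transient' should reproduce the split between
persistent and transient outliers quoted in the Description, provided the
assumption about populated flags holds.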
(End) Ana Fiallos [CDS] 16-Aug-2022