J/MNRAS/465/4530 Outlier detection algorithm for SDSS galaxies (Baron+, 2017)
The weirdest SDSS galaxies: results from an outlier detection algorithm.
Baron D., Poznanski D.
<Mon. Not. R. Astron. Soc., 465, 4530-4555 (2017)>
=2017MNRAS.465.4530B (SIMBAD/NED BibCode)
ADC_Keywords: Galaxy catalogs ; Redshifts
Keywords: methods: data analysis - methods: statistical - galaxies: general -
galaxies: peculiar
Abstract:
How can we discover objects we did not know existed within the large
data sets that now abound in astronomy? We present an outlier
detection algorithm that we developed, based on an unsupervised Random
Forest. We test the algorithm on more than two million galaxy spectra
from the Sloan Digital Sky Survey and examine the 400 galaxies with
the highest outlier score. We find objects which have extreme emission
line ratios and abnormally strong absorption lines, objects with
unusual continua, including extremely reddened galaxies. We find
galaxy-galaxy gravitational lenses, double-peaked emission line
galaxies and close galaxy pairs. We find galaxies with high ionization
lines, galaxies that host supernovae and galaxies with unusual gas
kinematics. Only a fraction of the outliers we find were reported by
previous studies that used specific and tailored algorithms to find a
single class of unusual objects. Our algorithm is general and detects
all of these classes, and many more, regardless of what makes them
peculiar. It can be executed on imaging, time series and other
spectroscopic data, operates well with thousands of features, is not
sensitive to missing values and is easily parallelizable.
Description:
We have introduced an outlier detection algorithm that is based on an
unsupervised implementation of Random Forest (RF). By construction,
the algorithm learns the most important features of the data and their
interconnections; it is completely general and can be applied to
imaging data, time series and other spectroscopic data as well. Out
of the 2355926 galaxies that compose the input sample, we chose the
400 galaxies with the highest weirdness score. We find objects with
unusual emission line ratios and complex velocity structures,
extremely red objects, objects with extremely strong absorption lines
(i.e. sodium and Hα), and galaxies which host supernovae or have
rare emission lines.
File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
tablea.dat        94      432   Outlying galaxies (tables A1-A16)
--------------------------------------------------------------------------------
See also:
V/147 : The SDSS Photometric Catalogue, Release 12 (Alam+, 2015)
Byte-by-byte Description of file: tablea.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  3  A3    ---     Code      Code (1)
   5-  8  I4    ---     Plate     SDSS plate number
  10- 14  I5    ---     MJD       SDSS MJD number
  16- 18  I3    ---     Fiber     SDSS Fiber number
  20- 25  F6.4  ---     z         ? Redshift (2)
  28- 65  A38   ---     Com       Comments
  67- 94  A28   ---     Ref       References (3)
--------------------------------------------------------------------------------
Note (1): Code as follows:
A1 = Unusual velocity structure
A2 = Galaxies with additional velocity structure near Hα
A3 = Double-peaked emission-line galaxies
A4 = Broad [OIII] emission line
A5 = Hα strong galaxies
A6 = Unusual emission lines
A7 = Outliers on BPT diagram
A8 = Weak Hα emission
A9 = Sodium excess galaxies
A10 = Extremely red galaxies
A11 = Galaxies hosting supernovae
A12 = Chance alignment of a galaxy and a nearby star
A13 = Galaxy-galaxy gravitational lenses
A14 = Multiple emission-line systems
A15 = Stars identified as galaxies
A16 = Bad spectra
Note (2): Redshift from the SDSS pipeline; whenever available we use zNOQSO,
which is more reliable.
Note (3): References as follows:
AN05 = Anderson et al., 2005AJ....130.2230A
BA05 = Balogh et al., 2005MNRAS.360..587B, Cat. J/MNRAS/360/587
BI07 = Bian et al., 2007ApJ...668..721B
BO08 = Bolton et al., 2008ApJ...682..964B, Cat. J/ApJ/682/964
GE12 = Ge et al., 2012ApJS..201...31G, Cat. J/ApJS/201/31
GI11 = Girven et al., 2011MNRAS.417.1210G, Cat. J/MNRAS/417/1210
GO03 = Goto et al., 2003PASJ...55..771G
GO04 = Goto, 2004A&A...427..125G
GO07 = Goto, 2007MNRAS.381..187G
GR15 = Graur et al., 2015MNRAS.450..905G, Cat. J/MNRAS/450/905
JE13 = Jeong et al., 2013ApJS..208....7J, Cat. J/ApJS/208/7
KL13 = Kleinman et al., 2013ApJS..204....5K, Cat. J/ApJS/204/5
MA03 = Madgwick et al., 2003ApJ...599L..33M
ME13 = Melnick & De Propris, 2013MNRAS.431.2034M, Cat. J/MNRAS/431/2034
PI12 = Pilyugin et al., 2012MNRAS.419..490P
SC13 = Schirmer et al., 2013ApJ...763...60S
SM10 = Smith et al., 2010ApJ...716..866S, Cat. J/ApJ/716/866
ST03 = Strateva et al., 2003AJ....126.1720S, Cat. J/AJ/126/1720
ST08 = Strateva et al., 2008ApJ...687..869S
WA12 = Wang et al., 2012ApJ...749..115W
WU04 = Wu & Liu, 2004ApJ...614...91W, Cat. J/ApJ/614/91
XU09 = Xu & Komossa, 2009ApJ...705L..20X
--------------------------------------------------------------------------------
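Reading example:
    The byte-by-byte description above maps directly onto fixed-width
    string slices. The following is a minimal sketch in Python, not part
    of the original catalogue: the names OutlierRow, parse_row and
    read_table are hypothetical, and the SAMPLE record is made up for
    illustration (it is not a row of tablea.dat). It assumes the file is
    plain text with one record per line, and that a blank redshift field
    (the "?" flag) should be returned as None.

```python
from typing import List, NamedTuple, Optional


class OutlierRow(NamedTuple):
    code: str            # bytes 1-3, table code A1..A16, see Note (1)
    plate: int           # bytes 5-8
    mjd: int             # bytes 10-14
    fiber: int           # bytes 16-18
    z: Optional[float]   # bytes 20-25, None when the field is blank
    com: str             # bytes 28-65
    ref: str             # bytes 67-94, raw reference string, see Note (3)


def parse_row(line: str) -> OutlierRow:
    """Slice one record; ReadMe bytes are 1-based and inclusive."""
    line = line.rstrip("\r\n").ljust(94)  # pad short lines so slices are safe
    z_text = line[19:25].strip()
    return OutlierRow(
        code=line[0:3].strip(),
        plate=int(line[4:8]),
        mjd=int(line[9:14]),
        fiber=int(line[15:18]),
        z=float(z_text) if z_text else None,
        com=line[27:65].strip(),
        ref=line[66:94].strip(),
    )


def read_table(path: str = "tablea.dat") -> List[OutlierRow]:
    """Parse every non-empty line of the table file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_row(line) for line in handle if line.strip()]


# Made-up record for illustration only; it is not a row from the catalogue.
SAMPLE = (
    f"{'A13':<3} {1234:4d} {53000:5d} {123:3d} {0.1234:6.4f}  "
    f"{'hypothetical comment':<38} {'BO08':<28}"
)
row = parse_row(SAMPLE)
print(row.code, row.plate, row.mjd, row.fiber, row.z, row.ref)
```

    Plate, MJD and Fiber together identify an SDSS spectrum, so the
    tuple (row.plate, row.mjd, row.fiber) is a natural key for matching
    these records against other SDSS tables. Readers who use astropy may
    prefer its CDS-format reader, which takes this ReadMe as input, over
    hand-written slices.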
History:
From electronic version of the journal
(End) Patricia Vannier [CDS] 26-Aug-2019