J/A+A/648/A122 SDSS galaxies morphological classification (Vavilova+, 2021)
Machine learning technique for morphological classification of galaxies from
the SDSS. I. Photometry-based approach.
Vavilova I.B., Dobrycheva D.V., Vasylenko M.Yu., Elyiv A.A., Melnyk O.V.,
Khramtsov V.
<Astron. Astrophys. 648, A122 (2021)>
=2021A&A...648A.122V (SIMBAD/NED BibCode)
ADC_Keywords: Galaxy catalogs ; Morphology ; Photometry, SDSS
Keywords: galaxies: general - methods: data analysis - galaxies: statistics -
galaxies: photometry - galaxies: spiral -
galaxies: elliptical and lenticular, cD
Abstract:
Machine learning methods are effective tools for classifying
astronomical objects by their individual features. One promising
application is the morphological classification of galaxies at
different redshifts.
We use the photometry-based approach for the SDSS data (1) to exploit
five supervised machine learning techniques and identify the most
effective among them for automated galaxy morphological
classification; (2) to test the influence of photometric data on
morphological classification; (3) to discuss the problematic aspects of
supervised machine learning and labeling bias; and (4) to apply the
best-performing machine learning methods to reveal the unknown
morphological types of galaxies from the SDSS DR9 at z<0.1.
We used different galaxy classification techniques: human labeling,
multi-photometry diagrams, naive Bayes, logistic regression,
support-vector machine, random forest, and k-nearest neighbors.
We present the results of a binary automated morphological
classification of galaxies conducted by human labeling,
multi-photometry, and five supervised machine learning methods. We
applied them to the sample of galaxies from the SDSS DR9 with redshifts
of 0.02<z<0.1 and absolute stellar magnitudes of -24mag<Mr<-19.4mag.
For the analysis we used absolute magnitudes Mu, Mg, Mr, Mi, Mz; color
indices Mu-Mr, Mg-Mi, Mu-Mg, Mr-Mz; and the inverse concentration
index to the center R50/R90. We determined the ability of each method
to predict the morphological type, and examined how the accuracy of
the methods depends on redshift, human labeling, morphological
shape, and overlap of different morphological types for galaxies with
the same color indices. We find that the morphology based on the
supervised machine learning methods trained over photometric
parameters demonstrates significantly less bias than the morphology
based on citizen-science classifiers.
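The photometric parameters listed above can be gathered into one feature vector per galaxy before any classifier is trained. The following is a minimal sketch, not the authors' code: the helper name `feature_vector` and the input dictionary layout are hypothetical, and the keys reuse the catalog.dat labels (uMAG to zMAG, R50, R90). The catalogue tabulates only u-r, g-i and r-z, so all four colours are derived here as differences of absolute magnitudes, which is algebraically identical to the colour-index formula because the distance term cancels.

```python
from typing import Dict, List

# Order of the ten photometric features named in the abstract.
FEATURE_NAMES: List[str] = [
    "Mu", "Mg", "Mr", "Mi", "Mz",          # absolute magnitudes
    "Mu-Mr", "Mg-Mi", "Mu-Mg", "Mr-Mz",    # colour indices
    "R50/R90",                             # inverse concentration index
]


def feature_vector(row: Dict[str, float]) -> List[float]:
    """Hypothetical helper: build the feature vector for one galaxy.

    `row` is assumed to hold the catalog.dat columns uMAG, gMAG, rMAG,
    iMAG, zMAG (absolute magnitudes) and R50, R90 (arcsec).
    """
    mu, mg, mr, mi, mz = (row[k] for k in ("uMAG", "gMAG", "rMAG", "iMAG", "zMAG"))
    return [
        mu, mg, mr, mi, mz,
        # Differences of absolute magnitudes equal the catalogue colour
        # indices, since the distance modulus cancels.
        mu - mr, mg - mi, mu - mg, mr - mz,
        row["R50"] / row["R90"],
    ]


if __name__ == "__main__":
    galaxy = {"uMAG": -19.0, "gMAG": -20.5, "rMAG": -21.2, "iMAG": -21.6,
              "zMAG": -21.9, "R50": 2.0, "R90": 6.0}  # made-up values
    print(dict(zip(FEATURE_NAMES, feature_vector(galaxy))))
```

Keeping the feature order in one list makes it easy to pass the same columns, in the same order, to whichever classifier is used.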
The support-vector machine and random forest methods, implemented with
the Scikit-learn machine learning library in Python, provide the
highest accuracy for the binary galaxy morphological classification.
Specifically, the success rate is 96.4% for support-vector machine
(96.1% early E and 96.9% late L types) and 95.5% for random forest
(96.7% early E and 92.8% late L types). Applying the support-vector
machine to the sample of 316031 galaxies from the SDSS DR9 at
z<0.1 with unknown morphological types, we found 139659 E and 176372 L
types among them.
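The overall and per-type success rates quoted above can be computed from lists of true and predicted labels. This is a minimal sketch under one assumption that the text does not state explicitly: the per-type rate is taken to be the fraction of galaxies of a given true type that are assigned that type (the recall). Under that reading the overall rate is the average of the per-type rates weighted by the number of galaxies of each type, so it lies between them, as for random forest (95.5% between 92.8% and 96.7%). The function name `success_rates` is hypothetical.

```python
from typing import Dict, Sequence


def success_rates(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Overall and per-type success rates (%) for the labels 'E' and 'L'.

    The per-type rate is assumed to be the recall: correct predictions
    among galaxies whose true type is that label.
    """
    if not true or len(true) != len(pred):
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(true, pred))
    rates = {"all": 100.0 * correct / len(true)}
    for label in ("E", "L"):
        hits = [p == label for t, p in zip(true, pred) if t == label]
        if hits:  # skip a type that is absent from the test set
            rates[label] = 100.0 * sum(hits) / len(hits)
    return rates


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    truth = ["E", "E", "E", "L"]
    guess = ["E", "E", "L", "L"]
    print(success_rates(truth, guess))
```

In the toy example, three E and one L galaxy give per-type rates of 2/3 and 1, and the overall 75% is their 3:1 weighted mean.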
Description:
Catalogue of the morphological types of 316031 galaxies with absolute
stellar magnitudes -24mag<Mr<-13mag at z<0.1 from the SDSS DR9, obtained
with machine learning methods (support vector machine, SVM; random forest,
RF). The preliminary sample contained ∼724,000 galaxies. Following the
SDSS recommendation, we imposed the limit mr<17.7 to avoid typical
statistical errors in spectroscopic flux. The absolute stellar magnitude
of each galaxy was obtained by the formula (example for the r-band):
Mr = mr - 5*log10(DL) - 25 - Kr(z) - extr,
where mr is the apparent stellar magnitude in the r-band, DL the
luminosity distance in Mpc, extr the Galactic extinction in the r-band,
and Kr(z) the k-correction in the r-band. The color indices were
calculated as (example for the g-i bands):
Mg-Mi = (mg-mi) - (extg-exti) - (Kg(z)-Ki(z)),
where mg and mi are the apparent stellar magnitudes, extg and exti the
Galactic extinctions, and Kg(z) and Ki(z) the k-corrections. We provide a
binary automated morphological classification: early type "E" and late
type "L". The Support Vector Machine and Random Forest methods,
implemented with the Scikit-learn machine learning library in Python,
provide the highest accuracy, namely 96.4% for SVM (96.1% early "E" and
96.9% late "L" types) and 95.5% for Random Forest (96.7% early "E" and
92.8% late "L" types). Applying the SVM, we found 139659 E-type and
176372 L-type galaxies.
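The magnitude and colour formulas above translate directly into code. The sketch below is an illustration, not the pipeline used for the catalogue: `lum_dist_mpc` is an independent Simpson's-rule integration for a flat Lambda-CDM model with the parameters of Note (1) (H0=71, assumed to be in km/s/Mpc, OmegaM=0.27, Lambda0=0.73), expected to be close to, but not guaranteed identical with, the LUMDIST output; radiation and curvature are neglected; `abs_mag` and `color_index` are hypothetical helper names.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def lum_dist_mpc(z: float, h0: float = 71.0, omega_m: float = 0.27,
                 omega_l: float = 0.73, n: int = 1000) -> float:
    """Luminosity distance in Mpc for a flat Lambda-CDM model.

    D_L = (1+z) * (c/H0) * integral_0^z dz' / E(z'),
    E(z) = sqrt(omega_m*(1+z)**3 + omega_l), evaluated with Simpson's rule.
    Assumes omega_m + omega_l = 1 (no curvature term).
    """
    if z <= 0.0:
        return 0.0
    n += n % 2  # Simpson's rule needs an even number of intervals
    step = z / n

    def inv_e(x: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + omega_l)

    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * step)
    comoving = (C_KM_S / h0) * total * step / 3.0
    return (1.0 + z) * comoving


def abs_mag(m: float, dl_mpc: float, ext: float, kcor: float) -> float:
    """M = m - 5*log10(DL) - 25 - ext - K(z), with DL in Mpc."""
    return m - 5.0 * math.log10(dl_mpc) - 25.0 - ext - kcor


def color_index(m1: float, m2: float, ext1: float, ext2: float,
                k1: float, k2: float) -> float:
    """(m1-m2) - (ext1-ext2) - (K1-K2); the distance term cancels."""
    return (m1 - m2) - (ext1 - ext2) - (k1 - k2)


if __name__ == "__main__":
    dl = lum_dist_mpc(0.05)  # made-up galaxy at z = 0.05
    print(f"DL = {dl:.1f} Mpc")
    print(f"Mr = {abs_mag(16.5, dl, 0.08, 0.05):.3f}")
    print(f"Mg-Mi = {color_index(17.3, 16.2, 0.11, 0.06, 0.09, 0.02):.3f}")
```

Computing the colour directly from apparent magnitudes gives the same value as subtracting two absolute magnitudes, which is a convenient consistency check on the catalogue columns.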
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
catalog.dat 436 316031 Binary morphology SDSS galaxies catalog
--------------------------------------------------------------------------------
See also:
http://skyserver.sdss.org/dr9 : SDSS DR9 Home Page
Byte-by-byte Description of file: catalog.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 19 I19 --- objID Long object SDSS identification, which is a
bit-encoded integer of run, rerun, camcol,
field, object
21- 34 F14.10 deg RAdeg Right ascension (J2000)
36- 55 E20.14 deg DEdeg Declination (J2000)
57- 64 F8.5 mag umag Model magnitude in u-band taken from SDSS
(modelMag_u)
66- 73 F8.5 mag gmag Model magnitude in g-band taken from SDSS
(modelMag_g)
75- 82 F8.5 mag rmag Model magnitude in r-band taken from SDSS
(modelMag_r)
84- 91 F8.5 mag imag Model magnitude in i-band taken from SDSS
(modelMag_i)
93-100 F8.5 mag zmag Model magnitude in z-band taken from SDSS
(modelMag_z)
102-111 F10.8 mag Extu Galactic extinction in u-band taken from SDSS
(Extinction_u)
113-122 F10.8 mag Extg Galactic extinction in g-band taken from SDSS
(Extinction_g)
124-134 F11.9 mag Extr Galactic extinction in r-band taken from SDSS
(Extinction_r)
136-146 F11.9 mag Exti Galactic extinction in i-band taken from SDSS
(Extinction_i)
148-158 F11.9 mag Extz Galactic extinction in z-band taken from SDSS
(Extinction_z)
160-170 F11.7 arcsec R50 Radius containing 50% of the Petrosian flux
for r-band from SDSS (petroR50_r)
172-182 F11.7 arcsec R90 Radius containing 90% of the Petrosian flux
for r-band from SDSS (petroR90_r)
184-200 F17.15 --- R50/R90 Inverse concentration index
(petroR50_r/petroR90_r)
202-212 F11.9 --- z Redshift from SDSS (z)
214-223 F10.6 Mpc DL Luminosity distance (DL) (1)
225-243 F19.15 mag uMAG Absolute stellar magnitude calculated for
u-band: umag-5*log10(DL)-25-Extu-kcoru (Mu)
245-262 F18.14 mag gMAG Absolute stellar magnitude calculated for
g-band: gmag-5*log10(DL)-25-Extg-kcorg (Mg)
264-280 F17.13 mag rMAG Absolute stellar magnitude calculated for
r-band: rmag-5*log10(DL)-25-Extr-kcorr (Mr)
282-299 F18.14 mag iMAG Absolute stellar magnitude calculated for
i-band: imag-5*log10(DL)-25-Exti-kcori (Mi)
301-319 F19.13 mag zMAG Absolute stellar magnitude calculated for
z-band: zmag-5*log10(DL)-25-Extz-kcorz (Mz)
321-338 F18.15 mag u-r Color index calculated in u-r bands:
umag-rmag-(Extu-Extr)-(kcoru-kcorr) (Mu-Mr)
340-357 F18.15 mag g-i Color index calculated in g-i bands:
gmag-imag-(Extg-Exti)-(kcorg-kcori) (Mg-Mi)
359-379 E21.15 mag r-z Color index calculated in r-z bands:
rmag-zmag-(Extr-Extz)-(kcorr-kcorz) (Mr-Mz)
381-389 E9.4 mag Kcorg K-correction in g-band (kcorrect_g) (2)
391-399 E9.4 mag Kcori K-correction in i-band (kcorrect_i) (2)
401-409 E9.4 mag Kcorr K-correction in r-band (kcorrect_r) (2)
411-420 F10.6 mag Kcoru K-correction in u-band (kcorrect_u) (2)
422-432 E11.6 mag Kcorz K-correction in z-band (kcorrect_z) (2)
434 I1 --- SVMPython [0/1] Morphological type of the galaxy
determined by Support Vector Machine method
(SVM_Python) (3)
436 I1 --- RFPython [0/1] Morphological type determined by Random
Forest method (RF_Python) (3)
--------------------------------------------------------------------------------
Note (1): Luminosity distance calculated with the LUMDIST program from the
redshift of the galaxy, assuming H0=71km/s/Mpc, OmegaM=0.27, Lambda0=0.73.
Note (2): K-corrections; see Chilingarian et al. 2010MNRAS.405.1409C and
Chilingarian & Zolotukhin 2012MNRAS.419.1727C.
Note (3): Morphological type of the galaxy as follows:
0 = Early "E" type
1 = Late "L" type
--------------------------------------------------------------------------------
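Because catalog.dat is a fixed-width ASCII file, each field can be read by slicing the byte ranges of the table above (1-based and inclusive, hence the `first - 1` offset in Python). The following sketch is a hypothetical reader, not software distributed with the catalogue: `COLUMNS` lists only a subset of the fields, `parse_record` and `count_types` are illustrative names, and blank fields are returned as None. The morphology flags are decoded following Note (3).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# name -> (first byte, last byte, converter); a subset of the byte-by-byte table
COLUMNS: Dict[str, Tuple[int, int, Callable[[str], object]]] = {
    "objID": (1, 19, int),
    "RAdeg": (21, 34, float),
    "DEdeg": (36, 55, float),   # E20.14; Python's float() accepts E notation
    "rmag": (75, 82, float),
    "z": (202, 212, float),
    "DL": (214, 223, float),
    "rMAG": (264, 280, float),
    "SVMPython": (434, 434, int),
    "RFPython": (436, 436, int),
}

MORPH = {0: "E", 1: "L"}  # Note (3): 0 = early type, 1 = late type


def parse_record(line: str) -> Dict[str, Optional[object]]:
    """Slice one catalog.dat line into typed fields (None if blank)."""
    record: Dict[str, Optional[object]] = {}
    for name, (first, last, convert) in COLUMNS.items():
        text = line[first - 1:last].strip()
        record[name] = convert(text) if text else None
    return record


def count_types(lines: Iterable[str], column: str = "SVMPython") -> Dict[str, int]:
    """Count E and L galaxies according to one classifier column."""
    counts = {"E": 0, "L": 0}
    for line in lines:
        flag = parse_record(line.rstrip("\n"))[column]
        if flag is not None:
            counts[MORPH[flag]] += 1
    return counts
```

For example, `with open("catalog.dat", encoding="ascii") as fh: print(count_types(fh))` reads the whole file; if it is complete, the SVMPython counts would be expected to match the 139659 E and 176372 L galaxies quoted in the Description, and passing `column="RFPython"` gives the random forest split for comparison.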
History:
From Irina Vavilova, irivav(at)mao.kiev.ua
Acknowledgements:
We thank Prof. Massimo Capaccioli and Dr. Valentina Karachentseva for
the fruitful discussion and remarks. We are grateful to the referee
for useful comments that allowed us to present the results of our study
more fully. This work was supported in the frame of the budgetary program
"Support for the development of priority fields of scientific
research" (CPCEL 6541230), the grant for Young Scientist's Research
Laboratories (2018-2019, Dobrycheva D.V.), and the Youth Scientific
Project (2019-2020, Dobrycheva D.V., Vasylenko M.Yu.) of the National
Academy of Sciences of Ukraine. We made extensive use of the SDSS (Ahn
et al., 2012ApJS..203...21A; Blanton et al., 2017AJ....154...28B;
Ahumada et al., 2020ApJS..249....3A), HyperLeda (Makarov et al.,
2014A&A...570A..13M), and the SAO/NASA Astrophysics Data System. This
study has also made use of the NASA/IPAC Extragalactic Database (NED),
which is operated by the Jet Propulsion Laboratory, California Institute
of Technology, under contract with NASA.
(End) Patricia Vannier [CDS] 23-Mar-2021