J/A+A/648/A122      SDSS galaxies morphological classification (Vavilova+, 2021)

Machine learning technique for morphological classification of galaxies from
the SDSS. I. Photometry-based approach.
    Vavilova I.B., Dobrycheva D.V., Vasylenko M.Yu., Elyiv A.A., Melnyk O.V.,
    Khramtsov V.
    <Astron. Astrophys. 648, A122 (2021)>
    =2021A&A...648A.122V        (SIMBAD/NED BibCode)
ADC_Keywords: Galaxy catalogs ; Morphology ; Photometry, SDSS
Keywords: galaxies: general - methods: data analysis - galaxies: statistics -
          galaxies: photometry - galaxies: spiral -
          galaxies: elliptical and lenticular, cD

Abstract:
    Machine learning methods are effective tools in astronomical tasks for
    classifying objects by their individual features. One of the promising
    utilities is related to the morphological classification of galaxies at
    different redshifts. We use the photometry-based approach for the SDSS
    data (1) to exploit five supervised machine learning techniques and
    define the most effective among them for the automated galaxy
    morphological classification; (2) to test the influence of photometry
    data on morphology classification; (3) to discuss problem points of
    supervised machine learning and labeling bias; and (4) to apply the best
    fitting machine learning methods for revealing the unknown morphological
    types of galaxies from the SDSS DR9 at z<0.1.

    We used different galaxy classification techniques: human labeling,
    multi-photometry diagrams, naive Bayes, logistic regression,
    support-vector machine, random forest, k-nearest neighbors. We present
    the results of a binary automated morphological classification of
    galaxies conducted by human labeling, multi-photometry, and five
    supervised machine learning methods. We applied it to the sample of
    galaxies from the SDSS DR9 with redshifts of 0.02<z<0.1 and absolute
    stellar magnitudes of -24mag<Mr<-19.4mag. For the analysis we used
    absolute magnitudes Mu, Mg, Mr, Mi, Mz; color indices Mu-Mr, Mg-Mi,
    Mu-Mg, Mr-Mz; and the inverse concentration index to the center R50/R90.
    We determined the ability of each method to predict the morphological
    type, and verified various dependencies of the method's accuracy on
    redshifts, human labeling, morphological shape, and overlap of different
    morphological types for galaxies with the same color indices.

    We find that the morphology based on the supervised machine learning
    methods trained over photometric parameters demonstrates significantly
    less bias than the morphology based on citizen-science classifiers. The
    support-vector machine and random forest methods with the Scikit-learn
    machine learning library in Python provide the highest accuracy for the
    binary galaxy morphological classification. Specifically, the success
    rate is 96.4% for support-vector machine (96.1% early E and 96.9% late L
    types) and 95.5% for random forest (96.7% early E and 92.8% late L
    types). Applying the support-vector machine for the sample of 316 031
    galaxies from the SDSS DR9 at z<0.1 with unknown morphological types, we
    found 139659 E and 176372 L types among them.

Description:
    The catalogue of the morphological types of 316031 galaxies with
    absolute stellar magnitudes -24m<Mr<-13m at z<0.1 from the SDSS DR9 was
    obtained by machine learning methods (support vector machine, SVM;
    random forest, RF).

    The preliminary sample contained ~724,000 galaxies. Following the SDSS
    recommendation, we imposed the limit mr<17.7 to avoid typical
    statistical errors in spectroscopic flux.

    The absolute stellar magnitude of the galaxy was obtained by the formula
    (example for r-band):
        Mr=mr-5lg(DL)-25-Kr(z)-extr,
    where mr - visual stellar magnitude in r-band, DL - luminosity distance
    (in Mpc), extr - the Galactic absorption in r-band, Kr(z) - k-correction
    in r-band.

    The color indices were calculated as (example for g-i bands):
        Mg-Mi=(mg-mi)-(extg-exti)-(Kg(z)-Ki(z)),
    where mg and mi - visual stellar magnitudes; extg and exti - the
    Galactic absorption; Kg(z) and Ki(z) - k-corrections.

    We provide a binary automated morphological classification: early type
    "E" and late type "L". The Support Vector Machine and Random Forest
    methods with the Scikit-learn machine learning library in Python provide
    the highest accuracy, namely 96.4% for SVM (96.1% early "E" and 96.9%
    late "L" types) and 95.5% for Random Forest (96.7% early "E" and 92.8%
    late "L" types). Applying the SVM, we found 139659 E and 176372 L type
    galaxies.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
catalog.dat      436   316031   Binary morphology SDSS galaxies catalog
--------------------------------------------------------------------------------

See also:
    http://skyserver.sdss.org/dr9 : SDSS DR9 Home Page

Byte-by-byte Description of file: catalog.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 19 I19    ---     objID     Long object SDSS identification, which is a
                                    bit-encoded integer of run, rerun, camcol,
                                    field, object
  21- 34 F14.10 deg     RAdeg     Right ascension (J2000)
  36- 55 E20.14 deg     DEdeg     Declination (J2000)
  57- 64 F8.5   mag     umag      Model magnitude in u-band taken from SDSS
                                    (modelMag_u)
  66- 73 F8.5   mag     gmag      Model magnitude in g-band taken from SDSS
                                    (modelMag_g)
  75- 82 F8.5   mag     rmag      Model magnitude in r-band taken from SDSS
                                    (modelMag_r)
  84- 91 F8.5   mag     imag      Model magnitude in i-band taken from SDSS
                                    (modelMag_i)
  93-100 F8.5   mag     zmag      Model magnitude in z-band taken from SDSS
                                    (modelMag_z)
 102-111 F10.8  mag     Extu      Galactic extinction in u-band taken from SDSS
                                    (Extinction_u)
 113-122 F10.8  mag     Extg      Galactic extinction in g-band taken from SDSS
                                    (Extinction_g)
 124-134 F11.9  mag     Extr      Galactic extinction in r-band taken from SDSS
                                    (Extinction_r)
 136-146 F11.9  mag     Exti      Galactic extinction in i-band taken from SDSS
                                    (Extinction_i)
 148-158 F11.9  mag     Extz      Galactic extinction in z-band taken from SDSS
                                    (Extinction_z)
 160-170 F11.7  arcsec  R50       Radius containing 50% of the Petrosian flux
                                    for r-band from SDSS (petroR50_r)
 172-182 F11.7  arcsec  R90       Radius containing 90% of the Petrosian flux
                                    for r-band from SDSS (petroR90_r)
 184-200 F17.15 ---     R50/R90   Inverse concentration index
                                    (petroR50r/petroR90r)
 202-212 F11.9  ---     z         Redshift from SDSS (z)
 214-223 F10.6  Mpc     DL        Luminosity distance (DL) (1)
 225-243 F19.15 mag     uMAG      Absolute stellar magnitude calculated for
                                    u-band: umag-5*log10(DL)-25-Extu-kcoru (Mu)
 245-262 F18.14 mag     gMAG      Absolute stellar magnitude calculated for
                                    g-band: gmag-5*log10(DL)-25-Extg-kcorg (Mg)
 264-280 F17.13 mag     rMAG      Absolute stellar magnitude calculated for
                                    r-band: rmag-5*log10(DL)-25-Extr-kcorr (Mr)
 282-299 F18.14 mag     iMAG      Absolute stellar magnitude calculated for
                                    i-band: imag-5*log10(DL)-25-Exti-kcori (Mi)
 301-319 F19.13 mag     zMAG      Absolute stellar magnitude calculated for
                                    z-band: zmag-5*log10(DL)-25-Extz-kcorz (Mz)
 321-338 F18.15 mag     u-r       Color index calculated in u-r bands:
                                    umag-rmag-(Extu-Extr)-(kcoru-kcorr) (Mu-Mr)
 340-357 F18.15 mag     g-i       Color index calculated in g-i bands:
                                    gmag-imag-(Extg-Exti)-(kcorg-kcori) (Mg-Mi)
 359-379 E21.15 mag     r-z       Color index calculated in r-z bands:
                                    rmag-zmag-(Extr-Extz)-(kcorr-kcorz) (Mr-Mz)
 381-389 E9.4   mag     Kcorg     K-correction in g-band (kcorrect_g) (2)
 391-399 E9.4   mag     Kcori     K-correction in i-band (kcorrect_i) (2)
 401-409 E9.4   mag     Kcorr     K-correction in r-band (kcorrect_r) (2)
 411-420 F10.6  mag     Kcoru     K-correction in u-band (kcorrect_u) (2)
 422-432 E11.6  mag     Kcorz     K-correction in z-band (kcorrect_z) (2)
     434 I1     ---     SVMPython [0/1] Morphological type of the galaxy
                                    determined by Support Vector Machine method
                                    (SVM_Python) (3)
     436 I1     ---     RFPython  [0/1] Morphological type determined by
                                    Random Forest method (RF_Python) (3)
--------------------------------------------------------------------------------
Note (1): Luminosity distance calculated with LUMDIST program using redshift of
    galaxy, H0=71, OmegaM=0.27, Lambda0=0.73.
Note (2): See Chilingarian et al. (2010MNRAS.405.1409C) and Chilingarian &
    Zolotukhin (2012MNRAS.419.1727C).
Note (3): Morphological type of the galaxy as follows:
    0 = Early "E" type
    1 = Late "L" type
--------------------------------------------------------------------------------

History:
    From Irina Vavilova, irivav(at)mao.kiev.ua

Acknowledgements:
    We thank Prof. Massimo Capaccioli and Dr. Valentina Karachentseva for
    the fruitful discussion and remarks. We are grateful to the referee for
    useful comments that allowed us to present the results of our study more
    fully. This work was supported in the frame of the budgetary program
    "Support for the development of priority fields of scientific research"
    (CPCEL 6541230), the grant for Young Scientist's Research Laboratories
    (2018-2019, Dobrycheva D.V.), and the Youth Scientific Project
    (2019-2020, Dobrycheva D.V., Vasylenko M.Yu.) of the National Academy of
    Sciences of Ukraine. This work made extensive use of the SDSS (Ahn et
    al., 2012ApJS..203...21A; Blanton et al., 2017AJ....154...28B; Ahumada
    et al., 2020ApJS..249....3A), HyperLeda (Makarov et al.,
    2014A&A...570A..13M), and the SAO/NASA Astrophysics Data System. This
    study has also made use of the NASA/IPAC Extragalactic Database (NED),
    which is operated by the Jet Propulsion Laboratory, California Institute
    of Technology, under contract with NASA.

(End) Patricia Vannier [CDS] 23-Mar-2021
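
As an illustration of the formulas given in the Description, the following
minimal Python sketch evaluates an absolute magnitude and a color index. The
function names and all input values are hypothetical, chosen only for this
example, and the luminosity distance is assumed to be in Mpc, as in the DL
column of catalog.dat. It is not the authors' code.

```python
import math


def absolute_magnitude(m, dl_mpc, ext, kcor):
    """M = m - 5*log10(DL) - 25 - K(z) - ext, with DL assumed to be in Mpc."""
    return m - 5.0 * math.log10(dl_mpc) - 25.0 - kcor - ext


def color_index(m1, m2, ext1, ext2, kcor1, kcor2):
    """(M1 - M2) = (m1 - m2) - (ext1 - ext2) - (K1 - K2)."""
    return (m1 - m2) - (ext1 - ext2) - (kcor1 - kcor2)


# Hypothetical inputs for illustration only (not taken from catalog.dat).
dl = 100.0                               # luminosity distance in Mpc
m_g, ext_g, k_g = 17.3, 0.11, 0.12       # g-band magnitude, extinction, K
m_i, ext_i, k_i = 16.2, 0.06, 0.03       # i-band magnitude, extinction, K

M_g = absolute_magnitude(m_g, dl, ext_g, k_g)
M_i = absolute_magnitude(m_i, dl, ext_i, k_i)
g_minus_i = color_index(m_g, m_i, ext_g, ext_i, k_g, k_i)

# The distance term cancels, so the color index equals M_g - M_i.
print(M_g, M_i, g_minus_i)
```

Because the distance modulus is the same in both bands, the color index
depends only on the apparent magnitudes, extinctions, and K-corrections.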
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate an f77 program to load files into arrays or line by line.
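
The byte-by-byte description of catalog.dat can also be used directly from
Python. The sketch below is one possible reader, not an official CDS tool: it
slices a few of the columns by their byte ranges (1-indexed, inclusive). The
names COLUMNS, MORPH_TYPE, parse_record, read_catalog, and make_demo_line are
hypothetical, and the demo record holds made-up values rather than real
catalogue data.

```python
# (label, first byte, last byte, converter); a subset of the catalog.dat columns.
COLUMNS = [
    ("objID", 1, 19, int),
    ("RAdeg", 21, 34, float),
    ("DEdeg", 36, 55, float),
    ("rmag", 75, 82, float),
    ("R50/R90", 184, 200, float),
    ("z", 202, 212, float),
    ("DL", 214, 223, float),
    ("rMAG", 264, 280, float),
    ("SVMPython", 434, 434, int),
    ("RFPython", 436, 436, int),
]

# Note (3) of the byte-by-byte description: 0 = early "E", 1 = late "L".
MORPH_TYPE = {0: "E", 1: "L"}


def parse_record(line):
    """Convert one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, first, last, convert in COLUMNS:
        field = line[first - 1:last].strip()   # 1-indexed inclusive -> slice
        record[label] = convert(field) if field else None
    return record


def read_catalog(path):
    """Yield parsed records from a catalog.dat-style file, line by line."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line.rstrip("\n"))


def make_demo_line():
    """Build a synthetic 436-byte record with made-up values (not real data)."""
    buf = [" "] * 436

    def put(first, last, text):
        width = last - first + 1
        buf[first - 1:last] = list(text.rjust(width)[:width])

    put(1, 19, "1237000000000000001")
    put(21, 34, "150.1234567890")
    put(36, 55, "2.5E+00")
    put(75, 82, "16.50000")
    put(184, 200, "0.35")
    put(202, 212, "0.05")
    put(214, 223, "215.0")
    put(264, 280, "-20.3")
    put(434, 434, "0")
    put(436, 436, "1")
    return "".join(buf)


demo = parse_record(make_demo_line())
print(demo["objID"], MORPH_TYPE[demo["SVMPython"]], MORPH_TYPE[demo["RFPython"]])
```

For the real file, iterating over read_catalog("catalog.dat") would yield one
dict per galaxy; the remaining columns can be added to COLUMNS in the same way.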