J/other/KNIT/28.3 Galaxies at 0.02<z<0.1 morphological catalog (Vavilova+, 2022)

Machine learning technique for morphological classification of galaxies from
SDSS. II. The image-based morphological catalogs of galaxies at 0.02<z<0.1.
    Vavilova I.B., Khramtsov V., Dobrycheva D.V., Vasylenko M.Yu.,
    Elyiv A.A., Melnyk O.V.
    <Space Science and Technology, 28, 3-22 (2022)>
    =2022KNIT...28....3V    (SIMBAD/NED BibCode)
ADC_Keywords: Galaxy catalogs ; Morphology ; Photometry, SDSS
Keywords: methods: data analysis; machine learning, convolutional neural
          networks - galaxies: general, morphological classification -
          galaxy catalogs - large-scale structure of the Universe

Abstract:
    We applied the image-based approach with a convolutional neural network
    (CNN) model to the sample of low-redshift galaxies with absolute
    magnitudes -24<Mr<-19.4 mag from the SDSS DR9. We divided it into two
    subsamples, the SDSS DR9 galaxy dataset and the Galaxy Zoo 2 (GZ2)
    dataset, considering them as the inference and training datasets,
    respectively. To train the CNN classifier more accurately, we took into
    consideration only those galaxies for which GZ2's volunteers gave the
    most votes. As a result, we created the morphological catalog of 315776
    galaxies at 0.02<z<0.1. The CNN classifier shows promising performance,
    attaining >93% accuracy for five-class morphology prediction, except for
    the cigar-shaped (~75%) and completely rounded (~83%) galaxies. The
    catalog includes 27378 completely rounded, 59194 rounded in-between,
    18862 cigar-shaped, 7831 edge-on, and 23119 spiral galaxies of the
    inference dataset, which were classified for the first time, as well as
    the galaxies from the GZ2 training sample with reassigned types, and
    with corrected types where the volunteer votes were low (Vavilova et
    al., 2022KNIT...28....3V). As for the classification of galaxies by
    their detailed structural morphological features, our CNN model gives an
    accuracy in the range of 83.3-99.4%, depending on the feature, the
    number of galaxies with the given feature in the inference dataset, and
    the galaxy image quality. This allowed us, for the first time, to assign
    the detailed morphological classification for more than 140K
    low-redshift galaxies. We describe in detail the adversarial validation
    technique as well as how we managed the optimal train-test split of
    galaxies from the training data set.
    We have also found optimal galaxy image transformations to increase the
    classifier's generalization ability. This can be considered another way
    to reduce the human bias for those galaxy images that had a poor vote
    classification in the GZ project. Such an approach, like
    autoimmunization, in which the CNN classifier trained on very good
    images is able to reclassify poor images from the same homogeneous
    sample, can be considered on a par with other methods of combating the
    human bias (Khramtsov et al., 2022KNIT...28....3V). We demonstrate that
    applying the CNN model with adversarial validation and adversarial image
    data augmentation improves the classification of smaller and fainter
    SDSS galaxies with mr<17.7. The proposed CNN model can address a range
    of galaxy classification problems, such as the quick selection of
    galaxies with a bar, bulge, ring, and other morphological features for
    their subsequent analysis.

    Galaxies at 0.02<z<0.1 morphological catalog v.2 (Vavilova et al.,
    2022KNIT...28....3V) is available in CSV format at
    ftp://ftp.mao.kiev.ua/pub/astro/cats/galaxies/galSDSSDR9zto0.1morph_classification.csv.

    This Catalog is supplemented with the VizieR Online Data Catalog: SDSS
    galaxies morphological classification (Vavilova et al.,
    2021A&A...648A.122V, Cat. J/A+A/648/A122).

    This Catalog is also supplemented with the papers:
      - Machine learning technique for morphological classification of
        galaxies from SDSS. II. The image-based morphological catalogs of
        galaxies at 0.02<z<0.1.
        Vavilova, I.B.; Khramtsov, V.; Dobrycheva, D.V.; Vasylenko, M.Yu.;
        Elyiv, A.A.; Melnyk, O.V.
        Space Science and Technology, Vol. 28, No. 1, pp. 03-22
        (2022KNIT...28....3V).
      - Machine learning technique for morphological classification of
        galaxies from the SDSS. III. The CNN image-based inference of
        detailed features.
        Khramtsov, V., Vavilova, I.B., Dobrycheva, D.V., Vasylenko, M.Yu.,
        Melnyk, O.V., Elyiv, A.A., Akhmetov, V.S., Dmytrenko, A.M.
        Space Science and Technology, Vol. 28, No. 5, pp. 27-55
        (2022KNIT...28....3V).
        https://doi.org/10.15407/knit2022.05.027

Description:
    The morphological catalog of 315776 galaxies at 0.02<z<0.1 with the
    absolute stellar magnitudes in the range of -24...-13 at z<0.1 from the
    SDSS DR9 was obtained by human labeling, multi-photometry, supervised
    machine learning methods, and a CNN classifier.

    For the photometric binary morphological classification, we used the
    absolute magnitudes Mu, Mg, Mr, Mi, Mz; the color indices Mu-Mr, Mg-Mi,
    Mu-Mg, Mr-Mz; and the inverse concentration index to the center
    R50/R90. The supervised methods provide an accuracy of 96.4% for Support
    Vector Machine (96.1% early and 96.9% late types) and 95.5% for Random
    Forest (96.7% early and 92.8% late types).

    To obtain the CNN image-based classification of morphological classes
    and features, we divided 315782 galaxies into two subsamples, the SDSS
    DR9 galaxy dataset and the Galaxy Zoo 2 (GZ2) dataset, considering them
    as the inference and training datasets, respectively. To train the CNN
    classifier more accurately, we took into consideration only those
    galaxies for which GZ2's volunteers gave the most votes. The criteria
    for each galaxy image are defined in the GZ2 project; their description
    is available at https://data.galaxyzoo.org/. The accuracy of the CNN
    classifier for the morphological classes is as follows: cigar-shaped
    (75%), completely round (83%), round in-between (93%), edge-on (93%),
    spiral (96%).
    As for the classification of galaxies by their 32 detailed structural
    morphological features, our CNN model gives an accuracy in the range of
    83.3-99.4%, depending on the feature (bar, rings, number of spiral arms,
    mergers, dust lane, edge-on, etc.), the number of galaxies with the
    given feature in the inference dataset, and the galaxy image quality. As
    a result, for the first time, we assigned the detailed morphological
    classification for ~140000 low-redshift galaxies, especially at the
    fainter end mr<17.7.

    Galaxies at 0.02<z<0.1 morphological catalog v.2 (Vavilova et al.,
    2022KNIT...28....3V) is available in CSV format at
    ftp://ftp.mao.kiev.ua/pub/astro/cats/galaxies/galSDSSDR9zto0.1morph_classification.csv.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
galclsv2.dat     404   315776   Morphological catalog of galaxies
                                 (update on 10-Feb-2023)
--------------------------------------------------------------------------------

See also:
 J/A+A/648/A122 : SDSS galaxies morphological classification (Vavilova+, 2021)

Byte-by-byte Description of file: galclsv2.dat
--------------------------------------------------------------------------------
  Bytes Format Units  Label        Explanations
--------------------------------------------------------------------------------
  1- 19 I19    ---    BESTObjID    Long object SDSS identification (1)
 21- 30 F10.6  deg    RAdeg        Right ascension (J2000) from SDSS
 32- 41 F10.6  deg    DEdeg        Declination (J2000) from SDSS
 43- 50 F8.5   mag    rmag         Model magnitude in r-band taken from SDSS
                                     (modelMag_r) (2)
 52- 58 F7.5   mag    Extr         Galactic extinction in r-band taken from
                                     SDSS (Extinction_r) (3)
 60- 67 F8.5   mag    Kcorr        K-correction in r-band (kcorrect_r) (4)
 69- 77 F9.5   arcsec R50          Radius containing 50% of the Petrosian
                                     flux for r-band from SDSS (petroR50_r) (5)
 79- 87 F9.5   arcsec R90          Radius containing 90% of the Petrosian
                                     flux for r-band from SDSS (petroR90_r) (5)
 89- 97 F9.5   Mpc    DL           Luminosity distance (DL) (6)
 99-105 F7.5   ---    z            Redshift z taken from SDSS
107-115 F9.5   mag    rMAG         Absolute stellar magnitude calculated for
                                     r-band (7)
117-117 A1     ---    MPD          [012] Morphological type determined by
                                     multi-photometry-diagrams method (8)
119-119 A1     ---    RFPython     [01] Morphological type determined by
                                     Random Forest method (RF_Python) (9)
121-121 A1     ---    SVMPython    [01] Morphological type of the galaxy
                                     determined by Support Vector Machine method
                                     (SVM_Python) (10)
123-130 F8.6   ---    compl-round  CNN probability of galaxy, the
                                     "Completely round" morphological class (11)
132-139 F8.6   ---    round-in-bet CNN probability of galaxy, the
                                     "Round-in-between" morphological class (12)
141-148 F8.6   ---    cigar-shap   CNN probability of galaxy, the
                                     "Cigar-shaped" morphological class (13)
150-156 F7.5   ---    edge-on      CNN probability of galaxy, the
                                     "Edge on" morphological class (14)
158-164 F7.5   ---    spiral       CNN probability of galaxy, the
                                     "Spiral" morphological class (15)
166-166 A1     ---    failed       [01] Failed flag (16)
168-173 F6.4   ---    smooth-ft    CNN probability of galaxy to have
                                     smooth_feature
175-180 F6.4   ---    disk-ft      CNN probability of galaxy to have
                                     featuresordisk_feature
182-187 F6.4   ---    art-ft       CNN probability of galaxy to have
                                     starorartifact_feature
189-194 F6.4   ---    edge-yes-ft  CNN probability of galaxy to have
                                     edgeonyesfeature
196-201 F6.4   ---    edge-no-ft   CNN probability of galaxy to have
                                     edgeonnofeature
203-208 F6.4   ---    bar-ft       CNN probability of galaxy to have
                                     bar_feature
210-215 F6.4   ---    no-bar-ft    CNN probability of galaxy to have
                                     nobarfeature
217-222 F6.4   ---    spiral-ft    CNN probability of galaxy to have
                                     spiral_feature
224-229 F6.4   ---    no-spiral-ft CNN probability of galaxy to have
                                     nospiralfeature
231-236 F6.4   ---    no-bulge-ft  CNN probability of galaxy to have
                                     nobulgefeature
238-243 F6.4   ---    bulge-jst-ft CNN probability of galaxy to have
                                     bulgejustnoticeable_feature
245-250 F6.4   ---    bulge-obv-ft CNN probability of galaxy to have
                                     bulgeobviousfeature
252-257 F6.4   ---    odd-yes-ft   CNN probability of galaxy to have
                                     oddyesfeature
259-264 F6.4   ---    odd-no-ft    CNN probability of galaxy to have
                                     oddnofeature
266-271 F6.4   ---    compl-rnd-ft CNN probability of galaxy to have
                                     completelyroundfeature
273-278 F6.4   ---    rnd-bet-ft   CNN probability of galaxy to have
                                     roundedinbetween_feature
280-285 F6.4   ---    cigar-ft     CNN probability of galaxy to have
                                     cigarshapedfeature
287-292 F6.4   ---    ring-ft      CNN probability of galaxy to have
                                     ring_feature
294-299 F6.4   ---    dist-ft      CNN probability of galaxy to have
                                     disturbed_feature
301-306 F6.4   ---    irreg-ft     CNN probability of galaxy to have
                                     irregular_feature
308-313 F6.4   ---    other-ft     CNN probability of galaxy to have
                                     other_feature
315-320 F6.4   ---    merger-ft    CNN probability of galaxy to have
                                     merger_feature
322-327 F6.4   ---    dust-ft      CNN probability of galaxy to have
                                     dustlanefeature
329-334 F6.4   ---    bulge-rnd-ft CNN probability of galaxy to have
                                     bulgeshaperound_feature
336-341 F6.4   ---    blg-sh-no-ft CNN probability of galaxy to have
                                     bulgeshapenobulgefeature
343-348 F6.4   ---    armswng-t-ft CNN probability of galaxy to have
                                     armswindingtight_feature
350-355 F6.4   ---    armswng-m-ft CNN probability of galaxy to have
                                     armswindingmedium_feature
357-362 F6.4   ---    armswng-l-ft CNN probability of galaxy to have
                                     armswindingloose_feature
364-369 F6.4   ---    arms-n1-ft   CNN probability of galaxy to have
                                     armsnumber1_feature
371-376 F6.4   ---    arms-n2-ft   CNN probability of galaxy to have
                                     armsnumber2_feature
378-383 F6.4   ---    arms-n3-ft   CNN probability of galaxy to have
                                     armsnumber3_feature
385-390 F6.4   ---    arms-n4-ft   CNN probability of galaxy to have
                                     armsnumber4_feature
392-397 F6.4   ---    arms-n5-ft   CNN probability of galaxy to have
                                     armsnumbermorethan4_feature
399-404 F6.4   ---    arms-non-ft  CNN probability of galaxy to have
                                     armsnumbercanttellfeature
--------------------------------------------------------------------------------
Note (1): A bit-encoded integer of run,
  rerun, camcol, field, object.
Note (2): Just as the PSF (point spread function) magnitudes are optimal
  measures of the fluxes of stars, the optimal measure of the flux of a
  galaxy would use a matched galaxy model. With this in mind, the code fits
  two models to the two-dimensional image of each object in each band: a
  pure de Vaucouleurs profile and a pure exponential profile.
Note (3): Galactic extinction in the r-band taken from SDSS. Reddening
  corrections in magnitudes at the position of each object, called
  extinction in the database, are computed in the r-band (Extinction_r)
  following Schlegel, Finkbeiner & Davis (1998ApJ...500..525S).
Note (4): The K-correction is a correction to an astronomical object's
  magnitude (or, equivalently, its flux) that allows a measurement of a
  quantity of light from an object at a redshift z to be converted to an
  equivalent measurement in the rest frame of the object. It was computed in
  the r-band (kcorrect_r) following Chilingarian et al.
  (2010MNRAS.405.1409C) and Chilingarian & Zolotukhin (2012MNRAS.419.1727C).
Note (5): petroR50_r, petroR90_r - the radii containing 50% and 90%,
  respectively, of the Petrosian flux for the r-band. The surface brightness
  measure used in the SDSS target selection pipeline is the mean surface
  brightness within petroR50. The ratio of petroR50 to petroR90, the
  so-called "inverse concentration index", is correlated with morphology.
Note (6): DL - luminosity distance (in Mpc). We calculated it with the
  LUMDIST program using the galaxy redshift and the following parameters:
  H0=71, OmegaM=0.27, Lambda0=0.73
  (https://idlastro.gsfc.nasa.gov/ftp/pro/astro/lumdist.pro).
Note (7): Mr - the absolute stellar magnitude in the r-band calculated by
  the formula:
    Mr = modelMag_r - 5*log10(DL) - 25 - Extinction_r - kcorrect_r,
  where
    modelMag_r   = the model magnitude in the r-band;
    DL           = the luminosity distance (in Mpc);
    Extinction_r = the Galactic extinction in the r-band;
    kcorrect_r   = the K-correction in the r-band.
Note (8): MPD - Morphological type determined by the
  multi-photometry-diagrams method, as follows:
    0 = Early types
    1 = Late types
    2 = Irregular
  (see Melnyk, O.V.; Dobrycheva, D.V.; Vavilova, I.B., 2012Ap.....55..293M,
  DOI: 10.1007/s10511-012-9236-7).
Note (9): Morphological type determined by the Random Forest method as
  follows:
    0 = Early types
    1 = Late types
  Accuracy of the method is 96.7% for Early and 92.8% for Late types
  (Vavilova, I.B.; Dobrycheva, D.V.; Vasylenko, M.Yu. et al.,
  2021A&A...648A.122V, DOI: 10.1051/0004-6361/202038981).
Note (10): Morphological type determined by the Support Vector Machine
  method as follows:
    0 = Early types
    1 = Late types
  Accuracy of the method is 96.1% for Early and 96.9% for Late types (96.4%
  overall) (Vavilova, I.B.; Dobrycheva, D.V.; Vasylenko, M.Yu. et al.,
  2021A&A...648A.122V, DOI: 10.1051/0004-6361/202038981).
Note (11): CNN probability of the galaxy to be assigned to the "Completely
  round" morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (12): CNN probability of the galaxy to be assigned to the
  "Round-in-between" morphological class in terms of the Galaxy Zoo 2
  nomenclature.
Note (13): CNN probability of the galaxy to be assigned to the
  "Cigar-shaped" morphological class in terms of the Galaxy Zoo 2
  nomenclature.
Note (14): CNN probability of the galaxy to be assigned to the "Edge on"
  morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (15): CNN probability of the galaxy to be assigned to the "Spiral"
  morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (16): Failed flag as follows:
    0 = the image quality is good enough for CNN classification
    1 = the image quality is not good enough for CNN classification
--------------------------------------------------------------------------------

Acknowledgements:
    Ludmila Pakuliak, pakuliak(at)mao.kiev.ua

References:
    Vavilova et al., Paper I    2021A&A...648A.122V, Cat. J/A+A/648/A122
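
Notes (6) and (7) can be illustrated with a short Python sketch. It is not a
port of the IDL LUMDIST routine: the luminosity distance is obtained here by
a simple Simpson integration that assumes a flat universe (OmegaM + Lambda0
= 1, which holds for the quoted parameters). The function names and the
input values are hypothetical and do not come from the catalog.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, h0=71.0, omega_m=0.27, omega_l=0.73, steps=1000):
    """Luminosity distance in Mpc, assuming a flat universe.

    Integrates 1/E(z) with composite Simpson's rule; an independent
    sketch, not the algorithm used by LUMDIST.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving = (C_KM_S / h0) * total * h / 3.0
    return (1.0 + z) * comoving


def absolute_magnitude_r(model_mag_r, dl_mpc, extinction_r, kcorrect_r):
    """Note (7): Mr = modelMag_r - 5*log10(DL) - 25 - Extinction_r - kcorrect_r."""
    return model_mag_r - 5.0 * math.log10(dl_mpc) - 25.0 - extinction_r - kcorrect_r


# Hypothetical inputs chosen only for illustration (not a catalog row).
dl = luminosity_distance_mpc(0.1)
m_abs = absolute_magnitude_r(17.0, dl, 0.1, 0.1)
print(f"DL = {dl:.1f} Mpc, Mr = {m_abs:.2f} mag")
```

Values computed this way may differ slightly from the DL and rMAG columns,
since the catalog values were produced with LUMDIST.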
(End) Ludmila Pakuliak [MAO NAS of Ukraine], Patricia Vannier [CDS] 07-Feb-2023
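
The five class probabilities (bytes 123-164), the failed flag (byte 166), and
the radii R50 and R90 can be combined into per-galaxy summaries. The sketch
below assumes that taking the highest class probability is an acceptable
decision rule, which this ReadMe does not prescribe; the function names and
the sample values are hypothetical.

```python
# Labels of the five CNN class-probability columns (bytes 123-164).
CLASS_LABELS = ("compl-round", "round-in-bet", "cigar-shap", "edge-on", "spiral")


def most_probable_class(probabilities, failed_flag):
    """Return (label, probability) of the highest-scoring class.

    Returns None when the failed flag is 1, i.e. when the image quality
    was not good enough for CNN classification (Note 16).
    """
    if str(failed_flag) == "1":
        return None
    label = max(CLASS_LABELS, key=lambda name: probabilities[name])
    return label, probabilities[label]


def inverse_concentration(r50, r90):
    """Inverse concentration index R50/R90 (Note 5)."""
    return r50 / r90


# Hypothetical probabilities for one galaxy (not a catalog row).
example = {
    "compl-round": 0.01,
    "round-in-bet": 0.03,
    "cigar-shap": 0.02,
    "edge-on": 0.03,
    "spiral": 0.91,
}
print(most_probable_class(example, "0"))
print(inverse_concentration(2.0, 5.0))
```

A probability threshold could be applied on top of this rule to keep only
confident classifications; the appropriate value is left to the user.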
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate an f77 program to load files into arrays or line by line.
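
The same byte-by-byte description can also drive a reader in other
languages. As a minimal sketch in Python, the code below slices a record of
galclsv2.dat by the 1-based, inclusive byte ranges listed above. Only a few
columns are included for brevity; the helper names and the sample values are
hypothetical, and the sample line is synthetic rather than a real catalog
row.

```python
# (label, first byte, last byte, converter); bytes are 1-based and inclusive,
# copied from the byte-by-byte description of galclsv2.dat.
COLUMNS = [
    ("BESTObjID", 1, 19, int),
    ("RAdeg", 21, 30, float),
    ("DEdeg", 32, 41, float),
    ("rmag", 43, 50, float),
    ("z", 99, 105, float),
    ("failed", 166, 166, str),
]


def parse_record(line):
    """Convert one fixed-width record into a dict; blank fields become None."""
    record = {}
    for label, first, last, convert in COLUMNS:
        field = line[first - 1:last].strip()
        record[label] = convert(field) if field else None
    return record


def make_line(values, width=404):
    """Build a synthetic 404-byte record for testing (hypothetical helper)."""
    chars = [" "] * width
    for label, first, last, _ in COLUMNS:
        text = str(values[label]).rjust(last - first + 1)
        chars[first - 1:last] = text
    return "".join(chars)


# Synthetic values chosen only for illustration.
sample = {
    "BESTObjID": "1237648720693755918",
    "RAdeg": "146.714200",
    "DEdeg": "-1.041300",
    "rmag": "15.12345",
    "z": "0.05123",
    "failed": "0",
}
line = make_line(sample)
record = parse_record(line)
print(record)
```

To read the real file, the same `parse_record` function can be applied to
each line of galclsv2.dat opened in text mode.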