J/other/KNIT/28.3 Galaxies at 0.02<z<0.1 morphological catalog (Vavilova+, 2022)
Machine learning technique for morphological classification of galaxies from
SDSS. II. The image-based morphological catalogs of galaxies at 0.02<z<0.1.
Vavilova I.B., Khramtsov V., Dobrycheva D.V., Vasylenko M.Yu., Elyiv A.A.,
Melnyk O.V.
<Space Science and Technology, 28, 3-22 (2022)>
=2022KNIT...28....3V (SIMBAD/NED BibCode)
ADC_Keywords: Galaxy catalogs ; Morphology ; Photometry, SDSS
Keywords: methods: data analysis; machine learning,
convolutional neural networks -
galaxies: general, morphological classification -
galaxy catalogs - large-scale structure of the Universe
Abstract:
We applied the image-based approach with a convolutional neural
network model to the sample of low-redshift galaxies with absolute
magnitudes -24<Mr<-19.4 mag from the SDSS DR9. We divided it into two
subsamples, SDSS DR9 galaxy dataset and Galaxy Zoo 2 (GZ2) dataset,
considering them as the inference and training datasets, respectively.
When training the CNN classifier for a more accurate result, we took
into consideration only those galaxies for which GZ2's volunteers
gave the most votes. As a result, we created the morphological catalog
of 315776 galaxies at 0.02<z<0.1. The CNN classifier shows promising
performance in morphological classification, attaining >93% accuracy
in five-class morphology prediction, except for the cigar-shaped
(∼75%) and completely rounded (∼83%) galaxies. The catalog includes
27378 completely rounded, 59194 rounded in-between, 18862
cigar-shaped, 7831 edge-on, and 23119 spiral galaxies of the inference
dataset, whose types were defined for the first time, as well as the
galaxies from the GZ2 training sample with reassigned types and
corrected types in case of low volunteer votes (Vavilova et al.,
2022KNIT...28....3V). As for the classification of galaxies by their
detailed structural morphological features, our CNN model gives an
accuracy in the range of 83.3-99.4%, depending on the feature, the
number of galaxies with the given feature in the inference dataset,
and the galaxy image quality. This allowed us, for the first time, to
assign a detailed morphological classification to more than 140K
low-redshift galaxies. We describe in detail the adversarial
validation technique as well as how we managed the optimal train-test
split of galaxies from the training data set. We have also found
optimal galaxy image transformations to increase the classifier
generalization ability. This can be considered another way to mitigate
the human bias for those galaxy images that had a poor vote
classification in the GZ project. Such an approach, akin to
auto-immunization, in which the CNN classifier trained on very good
images is able to reclassify bad images from the same homogeneous
sample, can be considered on a par with other methods of combating
the human bias (Khramtsov et al., 2022KNIT...28....3V). We demonstrate
that applying the CNN model with adversarial validation and
adversarial image data augmentation improves classification of smaller
and fainter SDSS galaxies with mr<17.7. The proposed CNN model allows
solving a range of galaxy classification problems, such as the quick
selection of galaxies with a bar, bulge, ring, and other morphological
features for subsequent analysis.
Galaxies at 0.02<z<0.1 morphological catalog v.2 (Vavilova et al.,
2022KNIT...28....3V) is available in CSV format at
ftp://ftp.mao.kiev.ua/pub/astro/cats/galaxies/
galSDSSDR9zto0.1morph_classification.csv.
This Catalog is supplemented with the VizieR Online Data Catalog: SDSS
galaxies morphological classification (Vavilova et al.,
2021A&A...648A.122V, Cat. J/A+A/648/A122).
This Catalog is also supplemented with the papers:
- Machine learning technique for morphological classification of
  galaxies from SDSS. II. The image-based morphological catalogs of
  galaxies at 0.02<z<0.1. Vavilova, I.B., Khramtsov, V., Dobrycheva,
  D.V., Vasylenko, M.Yu., Elyiv, A.A., Melnyk, O.V. Space Science and
  Technology, Vol. 28, No. 1, pp. 3-22 (2022KNIT...28....3V).
- Machine learning technique for morphological classification of
  galaxies from the SDSS. III. The CNN image-based inference of
  detailed features. Khramtsov, V., Vavilova, I.B., Dobrycheva, D.V.,
  Vasylenko, M.Yu., Melnyk, O.V., Elyiv, A.A., Akhmetov, V.S.,
  Dmytrenko, A.M. Space Science and Technology, Vol. 28, No. 5,
  pp. 27-55 (2022KNIT...28....3V).
  https://doi.org/10.15407/knit2022.05.027
Description:
The morphological catalog of 315 776 galaxies at 0.02<z<0.1 with
absolute stellar magnitudes in the range of -24...-13 from the SDSS
DR9 was obtained using human labeling, multi-photometry diagrams,
supervised machine learning methods, and a CNN classifier.
For the photometric binary morphological classification, we used
absolute magnitudes Mu, Mg, Mr, Mi, Mz; color indices
Mu-Mr, Mg-Mi, Mu-Mg, Mr-Mz; and the inverse
concentration index to the center R50/R90. The supervised methods
provide an accuracy of 96.4% for Support Vector Machine (96.1% early
and 96.9% late types) and 95.5% for Random Forest (96.7% early and
92.8% late types). To obtain the CNN image-based classification of
morphological classes and features, we divided 315782 galaxies into
two subsamples, SDSS DR9 galaxy dataset and Galaxy Zoo 2 (GZ2)
dataset, considering them as the inference and training datasets,
respectively. When training the CNN classifier for a more accurate
result, we took into consideration only those galaxies for which
GZ2's volunteers gave the most votes. The criteria for each galaxy
image are defined in the GZ2 project; their description is available
on the website https://data.galaxyzoo.org/.
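
The photometric inputs listed earlier in this description (five absolute
magnitudes, four color indices, and the inverse concentration index) can be
assembled as in the minimal Python sketch below. The function name and the
dictionary layout are hypothetical and chosen only for illustration; this is
not the authors' code. Note that this catalog tabulates only the r-band
quantities, so the other bands are assumed to come from SDSS directly.

```python
def photometric_features(abs_mag, r50, r90):
    """Build the photometric feature set named in the description.

    abs_mag: dict of absolute magnitudes keyed by SDSS band
             ('u', 'g', 'r', 'i', 'z').
    r50, r90: Petrosian radii in the r-band, in the same units.
    """
    if r90 <= 0:
        raise ValueError("r90 must be positive")
    # Absolute magnitudes Mu, Mg, Mr, Mi, Mz.
    features = {f"M{band}": abs_mag[band] for band in "ugriz"}
    # The four color indices named in the text.
    features["Mu-Mr"] = abs_mag["u"] - abs_mag["r"]
    features["Mg-Mi"] = abs_mag["g"] - abs_mag["i"]
    features["Mu-Mg"] = abs_mag["u"] - abs_mag["g"]
    features["Mr-Mz"] = abs_mag["r"] - abs_mag["z"]
    # Inverse concentration index.
    features["R50/R90"] = r50 / r90
    return features
```

The resulting dictionary has ten entries, one per photometric input, and
could be passed to any classifier that accepts tabular features.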
The accuracy of the CNN classifier for the morphological classes is as
follows: cigar-shaped (75%), completely round (83%), round in-between
(93%), edge-on (93%), spiral (96%). As for the classification of
galaxies by their 32 detailed structural morphological features, our
CNN model gives an accuracy in the range of 83.3-99.4%, depending on
the feature (bar, rings, number of spiral arms, mergers, dust lane,
edge-on, etc.), the number of galaxies with the given feature in the
inference dataset, and the galaxy image quality. As a result, for the
first time, we assigned a detailed morphological classification to
∼140 000 low-redshift galaxies, especially at the fainter end,
mr<17.7.
Galaxies at 0.02<z<0.1 morphological catalog v.2 (Vavilova et al.,
2022KNIT...28....3V) is available in CSV format at
ftp://ftp.mao.kiev.ua/pub/astro/cats/galaxies/
galSDSSDR9zto0.1morph_classification.csv.
File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
galclsv2.dat     404   315776   Morphological catalog of galaxies
                                (update on 10-Feb-2023)
--------------------------------------------------------------------------------
See also:
J/A+A/648/A122 : SDSS galaxies morphological classification (Vavilova+, 2021)
Byte-by-byte Description of file: galclsv2.dat
--------------------------------------------------------------------------------
  Bytes Format Units  Label        Explanations
--------------------------------------------------------------------------------
  1- 19 I19    ---    BESTObjID    Long object SDSS identification (1)
 21- 30 F10.6  deg    RAdeg        Right ascension (J2000) from SDSS
 32- 41 F10.6  deg    DEdeg        Declination (J2000) from SDSS
 43- 50 F8.5   mag    rmag         Model magnitude in r-band taken from
                                     SDSS (modelMag_r) (2)
 52- 58 F7.5   mag    Extr         Galactic extinction in different bands
                                     taken from SDSS (Extinction_r) (3)
 60- 67 F8.5   mag    Kcorr        K-correction in r-band (kcorrect_r) (4)
 69- 77 F9.5   arcsec R50          Radius containing 50% of the Petrosian
                                     flux for r-band from SDSS (petroR50_r) (5)
 79- 87 F9.5   arcsec R90          Radius containing 90% of the Petrosian
                                     flux for r-band from SDSS (petroR90_r) (5)
 89- 97 F9.5   Mpc    DL           Luminosity distance (DL) (6)
 99-105 F7.5   ---    z            Redshift z taken from SDSS
107-115 F9.5   mag    rMAG         Absolute stellar magnitude calculated for
                                     r-band (7)
117-117 A1     ---    MPD          [012] Morphological type determined
                                     by multi-photometry-diagrams method (8)
119-119 A1     ---    RFPython     [01] Morphological type determined by
                                     Random Forest method (RF_Python) (9)
121-121 A1     ---    SVMPython    [01] Morphological type of the galaxy
                                     determined by Support Vector Machine method
                                     (SVM_Python) (10)
123-130 F8.6   ---    compl-round  CNN probability of galaxy, the "Completely
                                     round" morphological class (11)
132-139 F8.6   ---    round-in-bet CNN probability of galaxy,
                                     the "Round-in-between" morphological
                                     class (12)
141-148 F8.6   ---    cigar-shap   CNN probability of galaxy,
                                     the "Cigar-shaped" morphological
                                     class (13)
150-156 F7.5   ---    edge-on      CNN probability of galaxy,
                                     the "Edge on" morphological class (14)
158-164 F7.5   ---    spiral       CNN probability of galaxy, the "Spiral"
                                     morphological class (15)
166-166 A1     ---    failed       [01] Failed flag (16)
168-173 F6.4   ---    smooth-ft    CNN probability of galaxy to have
                                     smooth_feature
175-180 F6.4   ---    disk-ft      CNN probability of galaxy to have
                                     features_or_disk_feature
182-187 F6.4   ---    art-ft       CNN probability of galaxy to have
                                     star_or_artifact_feature
189-194 F6.4   ---    edge-yes-ft  CNN probability of galaxy to have
                                     edgeon_yes_feature
196-201 F6.4   ---    edge-no-ft   CNN probability of galaxy to have
                                     edgeon_no_feature
203-208 F6.4   ---    bar-ft       CNN probability of galaxy to have
                                     bar_feature
210-215 F6.4   ---    no-bar-ft    CNN probability of galaxy to have
                                     no_bar_feature
217-222 F6.4   ---    spiral-ft    CNN probability of galaxy to have
                                     spiral_feature
224-229 F6.4   ---    no-spiral-ft CNN probability of galaxy to have
                                     no_spiral_feature
231-236 F6.4   ---    no-bulge-ft  CNN probability of galaxy to have
                                     no_bulge_feature
238-243 F6.4   ---    bulge-jst-ft CNN probability of galaxy to have
                                     bulge_just_noticeable_feature
245-250 F6.4   ---    bulge-obv-ft CNN probability of galaxy to have
                                     bulge_obvious_feature
252-257 F6.4   ---    odd-yes-ft   CNN probability of galaxy to have
                                     odd_yes_feature
259-264 F6.4   ---    odd-no-ft    CNN probability of galaxy to have
                                     odd_no_feature
266-271 F6.4   ---    compl-rnd-ft CNN probability of galaxy to have
                                     completely_round_feature
273-278 F6.4   ---    rnd-bet-ft   CNN probability of galaxy to have
                                     rounded_in_between_feature
280-285 F6.4   ---    cigar-ft     CNN probability of galaxy to have
                                     cigar_shaped_feature
287-292 F6.4   ---    ring-ft      CNN probability of galaxy to have
                                     ring_feature
294-299 F6.4   ---    dist-ft      CNN probability of galaxy to have
                                     disturbed_feature
301-306 F6.4   ---    irreg-ft     CNN probability of galaxy to have
                                     irregular_feature
308-313 F6.4   ---    other-ft     CNN probability of galaxy to have
                                     other_feature
315-320 F6.4   ---    merger-ft    CNN probability of galaxy to have
                                     merger_feature
322-327 F6.4   ---    dust-ft      CNN probability of galaxy to have
                                     dust_lane_feature
329-334 F6.4   ---    bulge-rnd-ft CNN probability of galaxy to have
                                     bulge_shape_round_feature
336-341 F6.4   ---    blg-sh-no-ft CNN probability of galaxy to have
                                     bulge_shape_no_bulge_feature
343-348 F6.4   ---    armswng-t-ft CNN probability of galaxy to have
                                     arms_winding_tight_feature
350-355 F6.4   ---    armswng-m-ft CNN probability of galaxy to have
                                     arms_winding_medium_feature
357-362 F6.4   ---    armswng-l-ft CNN probability of galaxy to have
                                     arms_winding_loose_feature
364-369 F6.4   ---    arms-n1-ft   CNN probability of galaxy to have
                                     arms_number_1_feature
371-376 F6.4   ---    arms-n2-ft   CNN probability of galaxy to have
                                     arms_number_2_feature
378-383 F6.4   ---    arms-n3-ft   CNN probability of galaxy to have
                                     arms_number_3_feature
385-390 F6.4   ---    arms-n4-ft   CNN probability of galaxy to have
                                     arms_number_4_feature
392-397 F6.4   ---    arms-n5-ft   CNN probability of galaxy to have
                                     arms_number_more_than_4_feature
399-404 F6.4   ---    arms-non-ft  CNN probability of galaxy to have
                                     arms_number_cant_tell_feature
--------------------------------------------------------------------------------
Note (1): A bit-encoded integer of run, rerun, camcol, field, object
Note (2): Just as the PSF (point spread function) magnitudes are
optimal measures of the fluxes of stars, the optimal measure of
the flux of a galaxy would use a matched galaxy model. With this
in mind, the code fits two models to the two-dimensional image of
each object in each band: a pure de Vaucouleurs profile, and a pure
exponential profile.
Note (3): Galactic extinction in different bands taken from SDSS. Reddening
corrections in magnitudes at the position of each object, called extinction
in the database, are computed in the r-band (Extinction_r) following
Schlegel, Finkbeiner & Davis (1998ApJ...500..525S).
Note (4): The K-correction is a correction to an astronomical object's
magnitude (or equivalently, its flux) that allows a measurement of a
quantity of light from an object at a redshift z to be converted to an
equivalent measurement in the rest frame of the object. It was computed in
the r-band (kcorrect_r) following Chilingarian et al. (2010MNRAS.405.1409C)
and Chilingarian & Zolotukhin (2012MNRAS.419.1727C).
Note (5): petroR50_r, petroR90_r - the radii containing 50% and 90%,
respectively, of the Petrosian flux in the r-band. The surface-brightness
measure used in the SDSS target selection pipeline is the mean surface
brightness within petroR50. The ratio of petroR50 to petroR90, the
so-called "inverse concentration index", is correlated with morphology.
Note (6): DL - luminosity distance (in Mpc). We calculated it with the
LUMDIST program using the following parameters: galaxy redshift,
H0=71 km/s/Mpc, OmegaM=0.27, Lambda0=0.73
(https://idlastro.gsfc.nasa.gov/ftp/pro/astro/lumdist.pro).
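
LUMDIST is an IDL routine. For readers without IDL, the quantity in Note (6)
can be approximated with the minimal Python sketch below. It is not the
LUMDIST code: it assumes a flat Lambda-CDM model (OmegaM + Lambda0 = 1, which
holds for the values quoted above) and integrates 1/E(z) with Simpson's rule.
The function name and the `steps` parameter are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, h0=71.0, omega_m=0.27, omega_lambda=0.73,
                            steps=1000):
    """Luminosity distance in Mpc for a flat Lambda-CDM cosmology."""
    if abs(omega_m + omega_lambda - 1.0) > 1e-9:
        raise ValueError("this sketch only handles the flat case")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def inv_e(zp):
        # 1/E(z) for a flat universe with matter and a cosmological constant
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_lambda)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving = (C_KM_S / h0) * total * h / 3.0  # comoving distance in Mpc
    return (1.0 + z) * comoving
```

Over the catalog's redshift range the result stays close to the low-redshift
approximation c*z/H0, which gives a quick sanity check on tabulated DL values.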
Note (7): Mr - the absolute stellar magnitudes in r-band calculated by
formula:
Mr = modelMag_r - 5*log10(DL) - 25 - Extinction_r - kcorrect_r, where
modelMag_r - the model magnitudes in r-band;
DL - luminosity distance;
Extinction_r - Galactic extinction in an r-band;
kcorrect_r - k-correction in an r-band.
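
The formula in Note (7) translates directly into code. The sketch below is
only an illustration with a hypothetical function name; it takes DL in Mpc,
as tabulated in the catalog, and all other inputs in magnitudes.

```python
import math


def absolute_magnitude_r(model_mag_r, dl_mpc, extinction_r, kcorrect_r):
    """Absolute r-band magnitude following the formula in Note (7)."""
    if dl_mpc <= 0:
        raise ValueError("luminosity distance must be positive")
    # 5*log10(DL) + 25 is the distance modulus for DL expressed in Mpc.
    return (model_mag_r - 5.0 * math.log10(dl_mpc) - 25.0
            - extinction_r - kcorrect_r)
```

As a worked example with made-up inputs, modelMag_r = 17.0, DL = 100 Mpc,
Extinction_r = 0.1 and kcorrect_r = 0.05 give
Mr = 17.0 - 10.0 - 25.0 - 0.1 - 0.05 = -18.15.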
Note (8): MPD - Morphological type determined by multi-photometry-diagrams
method, as follows:
0 = Early types
1 = Late types
2 = Irregular
(see the paper by Melnyk, O.V.; Dobrycheva, D.V.; Vavilova, I.B.,
2012Ap.....55..293M, DOI: 10.1007/s10511-012-9236-7).
Note (9): Morphological type determined by Random Forest method as follows:
0 = Early types
1 = Late types
Accuracy of the method is 96.7% for Early and 92.8% for Late types
(Vavilova, I.B.; Dobrycheva, D.V.; Vasylenko, M.Yu. et al.,
2021A&A...648A.122V, DOI: 10.1051/0004-6361/202038981).
Note (10): Morphological type determined by Support Vector Machine method
as follows:
0 = Early types
1 = Late types
Accuracy of the method is 96.1% for Early and 96.9% for Late types
(Vavilova, I.B.; Dobrycheva, D.V.; Vasylenko, M.Yu. et al.,
2021A&A...648A.122V, DOI: 10.1051/0004-6361/202038981).
Note (11): CNN probability of galaxy to be assigned to the "Completely round"
morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (12): CNN probability of galaxy to be assigned to the "Round-in-between"
morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (13): CNN probability of galaxy to be assigned to the "Cigar-shaped"
morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (14): CNN probability of galaxy to be assigned to the "Edge on"
morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (15): CNN probability of galaxy to be assigned to the "Spiral"
morphological class in terms of the Galaxy Zoo 2 nomenclature.
Note (16): Failed flag as follows:
0 = the image quality for CNN classification is good
1 = the image quality for CNN classification is failed
--------------------------------------------------------------------------------
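
The byte-by-byte description above is enough to read galclsv2.dat without a
dedicated reader. The following is a minimal Python sketch that slices a
subset of the columns from one fixed-width record and picks the class with
the highest CNN probability. The byte ranges are taken from the table; the
function names are hypothetical, and taking the maximum probability is one
simple convention assumed here, not necessarily the rule the authors used to
assign classes.

```python
# Byte ranges (1-indexed, inclusive) copied from the byte-by-byte description.
FIELDS = {
    "BESTObjID": (1, 19, int),
    "RAdeg": (21, 30, float),
    "DEdeg": (32, 41, float),
    "rmag": (43, 50, float),
    "z": (99, 105, float),
    "rMAG": (107, 115, float),
    "compl-round": (123, 130, float),
    "round-in-bet": (132, 139, float),
    "cigar-shap": (141, 148, float),
    "edge-on": (150, 156, float),
    "spiral": (158, 164, float),
    "failed": (166, 166, int),
}
CLASS_COLUMNS = ("compl-round", "round-in-bet", "cigar-shap",
                 "edge-on", "spiral")


def parse_record(line):
    """Slice one fixed-width record into a dict of typed values."""
    record = {}
    for label, (start, end, convert) in FIELDS.items():
        text = line[start - 1:end].strip()
        # Blank fields are kept as None rather than guessed.
        record[label] = convert(text) if text else None
    return record


def most_probable_class(record):
    """Label of the class column with the highest CNN probability.

    Returns None when the record carries failed=1 or has no probabilities.
    """
    if record["failed"] == 1:
        return None
    probs = {c: record[c] for c in CLASS_COLUMNS if record[c] is not None}
    if not probs:
        return None
    return max(probs, key=probs.get)
```

The remaining columns, including the 34 feature probabilities in bytes
168-404, can be added to FIELDS in the same way.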
Acknowledgements:
Ludmila Pakuliak, pakuliak(at)mao.kiev.ua
References:
Vavilova et al., Paper I 2021A&A...648A.122V, Cat. J/A+A/648/A122
(End) Ludmila Pakuliak [MAO NAS of Ukraine], Patricia Vannier [CDS] 07-Feb-2023