J/MNRAS/503/5263 Classification of 4XMM-DR9 sources (Zhang+, 2021)
Classification of 4XMM-DR9 sources by machine learning.
Zhang Y., Zhao Y., Wu X.-B.
<Mon. Not. R. Astron. Soc., 503, 5263-5273 (2021)>
=2021MNRAS.503.5263Z (SIMBAD/NED BibCode)
ADC_Keywords: X-ray sources ; Optical ; Infrared sources ; Galaxies ; QSOs ;
Stars, normal
Keywords: methods: data analysis - methods: statistical -
astronomical data bases: miscellaneous - catalogues - stars: general -
galaxies: general
Abstract:
The ESA's X-ray Multi-mirror Mission (XMM-Newton) created a new
high-quality version of the XMM-Newton serendipitous source catalogue,
4XMM-DR9, which provides a wealth of information for observed sources.
The 4XMM-DR9 catalogue is correlated with the Sloan Digital Sky Survey
(SDSS) DR12 photometric data base and the AllWISE data base; we then
get X-ray sources with information from the X-ray, optical, and/or
infrared bands and obtain the XMM-WISE, XMM-SDSS, and XMM-WISE-SDSS
samples. Based on the large spectroscopic surveys of SDSS and the
Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST), we
cross-match the XMM-WISE-SDSS sample with sources of known spectral
classes, and obtain known samples of stars, galaxies, and quasars. The
distribution of stars, galaxies, and quasars as well as all spectral
classes of stars in 2D parameter space is presented. Various
machine-learning methods are applied to different samples from
different bands. The better classified results are retained. For the
sample from the X-ray band, a rotation-forest classifier performs the
best. For the sample from the X-ray and infrared bands, a
random-forest algorithm outperforms all other methods. For the samples
from the X-ray, optical, and/or infrared bands, the LogitBoost
classifier shows its superiority. Thus, all X-ray sources in the
4XMM-DR9 catalogue with different input patterns are classified by
their respective models that are created by these best methods. Class
membership and membership probabilities are assigned to individual X-ray
sources. The classified result will be of great value for
the further research of X-ray sources in greater detail.
Description:
First, we cross-matched the 4XMM-DR9 catalogue (Webb et al.
2020A&A...641A.136W, Cat. IX/59) with the SDSS DR12 photometric data base
(Alam et al. 2015ApJS..219...12A, Cat. V/147) and the AllWISE data base
(Cutri et al. 2013, Cat. II/328), correlating them through the parameters
of known objects. We obtained the XMM-WISE, XMM-SDSS, and XMM-WISE-SDSS
samples, which contain X-ray sources with information in the X-ray, optical,
and/or infrared
bands. Second, we used the Large Sky Area Multi-object Fiber Spectroscopic
Telescope (LAMOST; Cui et al. 2012RAA....12.1197C; Luo et al.
2015RAA....15.1095L, Cat. V/146) to identify stars and galaxies, and the
SDSS Data Release 14 Quasar catalogue (DR14Q; Paris et al.
2018A&A...613A..51P, Cat. VII/286) to identify quasars. We created
multiple samples of known objects in order to classify X-ray sources into
three groups: the Galaxy class (with subclasses such as AGN, SB, etc.),
the Star class (with subclasses such as O, A, etc.) and QSOs.
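The positional cross-matching used in the two steps above can be sketched
roughly as follows. This is only an illustration, not the authors' procedure:
the function names and the matching radius are assumptions (this file does not
state the radius that was used), and the brute-force search is far too slow
for catalogues of this size.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def nearest_match(source, catalogue, radius_arcsec):
    """Index of the closest catalogue entry within the radius, else None.

    source is an (ra, dec) pair and catalogue a list of (ra, dec) pairs,
    all in degrees. radius_arcsec is a hypothetical matching radius.
    """
    best_index, best_sep = None, radius_arcsec / 3600.0
    for i, (ra, dec) in enumerate(catalogue):
        sep = angular_separation_deg(source[0], source[1], ra, dec)
        if sep <= best_sep:
            best_index, best_sep = i, sep
    return best_index
```

For real catalogues a spatially indexed matcher would normally replace the
loop; the sketch only shows the nearest-neighbour-within-radius logic.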
Finally, we trained three machine-learning algorithms: rotation-forest
(Rodriguez, Kuncheva & Alonso 2006, IEEE Trans. Pattern Analysis and
Machine Intelligence, 28, 1619), random-forest (Breiman 2001, Machine
Learning, 45, 5) and LogitBoost (Friedman, Hastie & Tibshirani 2000, Ann.
Statistics, 28, 337) on a set of input pattern parameters (see section for
more details) in order to recognize subclasses of galaxies, stars, and
quasars for the four sample cases (X-ray band only; X-ray and optical
bands only; X-ray and infrared bands only; X-ray, optical and infrared
bands). We assigned LogitBoost to two cases (X-ray and optical bands only;
X-ray, optical and infrared bands), which gives two different classifiers,
the rotation-forest classifier to the X-ray-band-only case, and the
random-forest classifier to the X-ray and infrared bands only case.
Given the precision the algorithms reached when classifying and
sub-classifying the known X-ray sources in our training samples, we
decided to keep only the three main classes (stars, galaxies and QSOs) to
obtain the most accurate classification probabilities for unknown X-ray
sources (see sections 4 and 5). For the 4XMM-DR9 sources, all predicted
results are given in table10.dat.
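The assignment of algorithms to input patterns described above can be
summarised as a small lookup. This is a sketch for orientation only: the
dictionary, the band names and the function are hypothetical and are not part
of the catalogue or of the authors' code.

```python
# Algorithm assigned to each input pattern, as listed in the Description.
# The keys (band names) are illustrative labels chosen for this sketch.
CLASSIFIER_BY_PATTERN = {
    frozenset({"xray"}): "rotation-forest",
    frozenset({"xray", "optical"}): "LogitBoost",
    frozenset({"xray", "infrared"}): "random-forest",
    frozenset({"xray", "optical", "infrared"}): "LogitBoost",
}


def classifier_for(has_optical, has_infrared):
    """Name of the algorithm assigned to a source's richest input pattern.

    Every source has X-ray data, so only the optical and infrared
    availability flags vary.
    """
    bands = {"xray"}
    if has_optical:
        bands.add("optical")
    if has_infrared:
        bands.add("infrared")
    return CLASSIFIER_BY_PATTERN[frozenset(bands)]
```

For example, classifier_for(False, True) returns "random-forest", the
algorithm assigned to the X-ray and infrared case.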
File Summary:
--------------------------------------------------------------------------------
FileName     Lrecl  Records  Explanations
--------------------------------------------------------------------------------
ReadMe          80        .  This file
table10.dat    108   550124  predicted results of machine learning
                               classifications of 4XMM-DR9 sources
--------------------------------------------------------------------------------
See also:
IX/59 : XMM-Newton Serendipitous Source Catalogue 4XMM-DR9 (Webb+, 2020)
V/147 : The SDSS Photometric Catalogue, Release 12 (Alam+, 2015)
II/328 : AllWISE Data Release (Cutri+, 2013)
V/146 : LAMOST DR1 catalogs (Luo+, 2015)
VII/286 : SDSS quasar catalog, fourteenth data release (Paris+, 2018)
Byte-by-byte Description of file: table10.dat
--------------------------------------------------------------------------------
  Bytes Format Units Label    Explanations
--------------------------------------------------------------------------------
  1- 15 I15    ---   Source   Source ID (scrid)
 17- 35 E19.17 deg   RAdeg    Right ascension in decimal degrees (sc_ra)
                                (J2000)
 37- 56 E20.17 deg   DEdeg    Declination in decimal degrees (sc_dec)
                                (J2000)
 58- 63 A6     ---   ClassX   Source classification from the first
                                machine-learning classifier (rotation-forest
                                classifier, X-ray information only) (Class_x)
 65- 69 F5.3   ---   PX       Classification probability derived for
                                sources that only have the X-ray band (P_x)
 71- 76 A6     ---   ClassXO  ? Source classification from the third
                                machine-learning classifier (LogitBoost
                                classifier, X-ray and optical band
                                information only) (Class_xo)
 78- 82 F5.3   ---   PXO      ? Classification probability derived for
                                sources that only have the X-ray and optical
                                bands (P_xo)
 84- 89 A6     ---   ClassXI  ? Source classification from the second
                                machine-learning classifier (random-forest
                                classifier, X-ray and infrared band
                                information only) (Class_xi)
 91- 95 F5.3   ---   PXI      ? Classification probability derived for
                                sources that only have the X-ray and infrared
                                bands (P_xi)
 97-102 A6     ---   ClassXIO ? Source classification from the fourth
                                machine-learning classifier (LogitBoost
                                classifier, X-ray, optical and infrared band
                                information only) (Class_xio)
104-108 F5.3   ---   PXIO     ? Classification probability derived for
                                sources that have X-ray, optical and infrared
                                bands (P_xio)
--------------------------------------------------------------------------------
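The byte ranges above are enough to read table10.dat without any external
library. The following is a minimal sketch under that assumption; the names
COLUMNS and parse_record are introduced here for illustration, and blank
fields (the columns flagged with "?") are returned as None.

```python
# (label, first byte, last byte, converter), taken from the byte-by-byte
# description; byte positions are 1-based and inclusive.
COLUMNS = [
    ("Source", 1, 15, int),
    ("RAdeg", 17, 35, float),
    ("DEdeg", 37, 56, float),
    ("ClassX", 58, 63, str),
    ("PX", 65, 69, float),
    ("ClassXO", 71, 76, str),
    ("PXO", 78, 82, float),
    ("ClassXI", 84, 89, str),
    ("PXI", 91, 95, float),
    ("ClassXIO", 97, 102, str),
    ("PXIO", 104, 108, float),
]


def parse_record(line):
    """Split one fixed-width record of table10.dat into a dict."""
    record = {}
    for label, first, last, convert in COLUMNS:
        # Convert 1-based inclusive byte ranges to a Python slice.
        field = line[first - 1:last].strip()
        record[label] = convert(field) if field else None
    return record
```

A typical use would be to call parse_record(line.rstrip("\n")) for each line
of the file. If astropy is installed, its astropy.io.ascii.read function with
format="cds" and the readme argument pointing at this file should be able to
do the same job directly from the description above.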
History:
From electronic version of the journal
(End) Luc Trabelsi[CDS] 16-Apr-2024