J/A+A/615/A56      Code for iterative clustering method - iSRS  (Pacciani, 2018)

Identification of activity peaks in time-tagged data with a scan-statistics driven clustering method and its application to gamma-ray data samples.
    Pacciani L.
   <Astron. Astrophys. 615, A56 (2018)>
   =2018A&A...615A..56P        (SIMBAD/NED BibCode)
ADC_Keywords: Models ; Gamma rays
Keywords: methods: statistical - methods: data-analysis - techniques: photometric - gamma-ray: general

Abstract:
The investigation of activity periods in time-tagged data samples is a topic of large interest. Among astrophysical samples, gamma-ray sources are widely studied, due to the huge quasi-continuum data set available today from the FERMI-LAT and AGILE-GRID gamma-ray telescopes. I developed a general temporal-unbinned method to identify flaring periods in time-tagged data and discriminate statistically-significant flares: I propose an event clustering method in one dimension to identify flaring episodes, and scan statistics to evaluate the flare significance within the whole data sample. This is a photometric algorithm. The comparison of the photometric results (e.g., photometric flux, gamma-ray spatial distribution) for the identified peaks with the standard likelihood analysis for the same period is mandatory to establish if source confusion is spoiling results. The result of the proposed method is similar to a photometric light curve, but peaks are resolved, they are statistically significant within the whole period of investigation, and peak detection capability does not suffer from time-binning related issues. The method can be applied to reveal flares in any time-tagged data sample. I will show results for gamma-ray sources of known celestial position, e.g., from a catalog. Furthermore, it can be used when it is necessary to assess the statistical significance within the whole period of investigation of a flare from an unknown gamma-ray source.

Description:
The C code implementing the iSRS clustering (isrs.c) and the removal of random clusters (rmrndm.c) from a generic data sample is reported here. The "isrs" executable builds the set of candidate clusters Ci (no removal of random clusters is applied). The "rmrndm" executable performs the removal of random clusters.
It produces the list of surviving clusters (the unbinned light curve) and the list of flares, which includes peak flux, peak time and flare FWHM. A first release (version 1) of the scan-statistics tables (stab.txt), filled using the 32-bit Marsaglia-Zaman RANMAR random number generator, is reported. The frequency of false positive samples (fcoinc) discussed in the paper (section 5) is reported too (ctab.txt).

The iSRS clustering is a method of event clustering in 1D, iterated to obtain all the conceivable clusters from the original sample. It depends on one parameter, Ntol: its meaning is explained in section 3 of the paper, and appendix A of the paper explains how to choose it correctly.

Clustering method:
The clustering method (SRS) has two parameters, Ntol and Δthr. The Ntol parameter is kept constant. Δthr naively corresponds to the maximum allowed distance among the elements belonging to a certain cluster. The clustering procedure is iterated (iSRS) by scanning the Δthr parameter: it is finely decreased, starting from the largest spacing between contiguous events of the data sample under investigation. The Δthr decreasing procedure stops when only clusters of size 2 (two events) remain. At the end of the scan, the Δthr space is fully explored. The same cluster can be obtained for a sub-set of Δthr values; duplicate clusters are removed.

Instructions:
A "Makefile" builds the executables "isrs" and "rmrndm" for linux based systems with gcc installed. To prepare the executables, put all the files provided here in the same directory, and at the prompt type:
    make clean
    make
The algorithm has two steps.
1) production of candidate clusters, command syntax:
    ./isrs <N_tol>
2) removal of random clusters, command syntax:
    ./rmrndm PEAKLIST_FILE
- The file with the candidate clusters is an ascii file; its format and the meaning of each column are specified in the header of the file.
- The file with the surviving clusters is an ascii file; its format and the meaning of each column are specified in the header of the file.
- The file with the flare list is an ascii file; its format and the meaning of each column are specified in the header of the file.

The threshold probability (1 - confidence level) for the removal of random clusters can be set directly by adding the string THR_PROB to the "rmrndm" command. Alternatively, the threshold probability can be set in standard gaussian units by adding the string NSIGMA to the "rmrndm" command. The default value is NSIGMA=3. Currently the threshold can be chosen in the range 2 - 3.5 standard deviations.
Example:
    ./isrs 1 knox.txt knoxcandidateclusters.txt
    ./rmrndm knoxcandidateclusters.txt knoxsurvivedclusters.txt NSIGMA 2. PEAKLIST_FILE knox_peaks.txt

Optionally, the format of the output file can be chosen by adding to the "isrs" command a string like this:
    T_FORMAT %18.10lf
The format string must be in C style.
Example:
    ./isrs 1 knox.txt knoxcandidateclusters.txt T_FORMAT %18.10lf
    ./rmrndm knoxcandidateclusters.txt knoxsurvivedclusters.txt NSIGMA 3. PEAKLIST_FILE knox_peaks.txt

Optionally, the files stab.txt and ctab.txt can be put in another directory. To specify the path, add a string like this to the "rmrndm" command:
    SCANTABLEDIR
Example:
    ./isrs 1 knox.txt knoxcandidateclusters.txt
    ./rmrndm knoxcandidateclusters.txt knoxsurvivedclusters.txt NSIGMA 2. PEAKLIST_FILE knox_peaks.txt SCANTABLEDIR ../../table_dir/

An ordered list of events is the input of the procedure. The input list can be an ascii or a binary file. If the exposure is uniform during the observation, the input list consists of a column (T column) with the time of occurrence of each event. If the exposure is not uniform during the observation (e.g.
for events extracted within the FERMI-LAT gamma-ray data sample, see section 3 of the paper), the input list consists of two columns: the first column (X column) contains the cumulative exposure from the start of the observation to the occurrence of each event; the second column (t column) contains the time of occurrence of each event.

Ascii input list:
It has a header consisting of 3 rows. The first header row specifies the sample size.

If the input list has one column: the second header row specifies the start of the observation, and the third header row specifies the stop of the observation.
Header example for a "one column" event list:
    SAMPLE SIZE 35
    START 0.
    STOP 2191.
The ascii input list ("knox.txt") for the "Knox data set" (Knox, G., 1959, British Journal of Preventive Social Medicine, 13, 222) is reported here as an example of a "one column" event list.

If the input list has two columns: the second header row specifies the cumulative exposure at the start of the observation (first field) and the starting time of the observation (second field). The third header row specifies the cumulative exposure at the stop of the observation (first field) and the ending time of the observation (second field).
Header example for a "two columns" event list:
    SAMPLE SIZE 82749
    START 0.00000000E+00 54682.65603222
    STOP 2.18668820E+11 57412.44772204
The ascii input list ("3C1.txt") is a "two columns" input event list. The file contains the event list for the FSRQ 3C 454.3, with gamma-ray data extracted above 0.3 GeV between MJD 56500 and MJD 56600 (see section 3 of the paper for an explanation of the extraction method).

The ascii event list is limited by the chosen format: it could happen that, for two contiguous events, the same cumulative exposure is tabulated. To prevent this, an ascii event list can be prepared reporting in the X column the exposure between the reported event and the previous one (called here differential exposure).
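The conversion from cumulative to differential exposure can be sketched in C as follows. This is only a minimal illustration and is not part of the distributed isrs.c or rmrndm.c: the function name cumulative_to_differential and its arguments are hypothetical. It assumes that the cumulative exposures are held in memory as doubles, so that the differences are taken before any rounding imposed by the ascii format, and that the first event is referred to the cumulative exposure at the start of the observation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from isrs.c or rmrndm.c).
 *
 * x_start : cumulative exposure at the start of the observation
 * cum[i]  : cumulative exposure at the occurrence of event i (time ordered)
 * diff[i] : output, exposure between event i and the previous event;
 *           for i = 0, exposure between the start of the observation
 *           and the first event
 *
 * Returns 0 on success, -1 if the cumulative exposure ever decreases.
 */
int cumulative_to_differential(double x_start, const double *cum,
                               size_t n, double *diff)
{
    assert(cum != NULL && diff != NULL);

    double prev = x_start;
    for (size_t i = 0; i < n; i++) {
        if (cum[i] < prev) {
            return -1;          /* input is not an ordered event list */
        }
        diff[i] = cum[i] - prev;
        prev = cum[i];
    }
    return 0;
}
```

Taking the differences in double precision, before the values are printed, keeps two contiguous events distinguishable even when their cumulative exposures would be written as the same string in the chosen ascii format.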
For the first event, the exposure between the start of the observation and the time of occurrence of the event is reported. The ascii "two columns" input list with differential exposure is reported in "3C2.txt" (3C2.txt contains the same events reported in 3C1.txt).

The binary input list uses the same sequence of input header and input data specified for the ascii input list, but there are no input specifiers. The binary file is a list of header values and data of C datatype "double". For a "one column" event list, the input file contains the following fields (all with double datatype):