J/A+A/615/A56


J/A+A/615/A56      Code for iterative clustering method - iSRS  (Pacciani, 2018)
Identification of activity peaks in time-tagged data with a scan-statistics
driven clustering method and its application to gamma-ray data samples.
    Pacciani L.
    <Astron. Astrophys. 615, A56 (2018)>
    =2018A&A...615A..56P 2018A&A...615A..56P        (SIMBAD/NED BibCode)
ADC_Keywords: Models ; Gamma rays
Keywords: methods: statistical - methods: data-analysis -
          techniques: photometric - gamma-ray: general -

Abstract:
    The investigation of activity periods in time-tagged data samples is a
    topic of great interest. Among astrophysical samples, gamma-ray
    sources are widely studied, owing to the huge quasi-continuous data sets
    available today from the FERMI-LAT and AGILE-GRID gamma-ray
    telescopes.

    I developed a general, temporally unbinned method to identify flaring
    periods in time-tagged data and to discriminate statistically significant
    flares: I propose an event clustering method in one dimension to
    identify flaring episodes, and scan statistics to evaluate the flare
    significance within the whole data sample.

    This is a photometric algorithm. Comparing the photometric results
    (e.g., photometric flux, gamma-ray spatial distribution) for the
    identified peaks with the standard likelihood analysis for the same
    period is mandatory to establish whether source confusion is spoiling
    the results. The result of the proposed method is similar to a
    photometric light curve, but the peaks are resolved, they are
    statistically significant within the whole period of investigation, and
    the peak detection capability does not suffer from time-binning-related
    issues.

    The method can be applied to reveal flares in any time-tagged data
    sample. I show results for gamma-ray sources of known celestial
    position, e.g., from a catalog. Furthermore, it can be used when it is
    necessary to assess the statistical significance, within the whole
    period of investigation, of a flare from an unknown gamma-ray source.

Description:
    The C code implementing the iSRS clustering (isrs.c) and the removal
    of random clusters (rmrndm.c) from a generic data sample is provided
    here. The "isrs" executable builds the set of candidate clusters C_i
    (no removal of random clusters is applied).

    The "rmrndm" executable performs the removal of random clusters. It
    produces the list of survived clusters (the unbinned light curve) and
    the list of flares, which includes peak flux, peak time and flare FWHM.

    A first release (version 1) of the scan-statistics tables (stab.txt),
    filled using the 32-bit Marsaglia-Zaman RANMAR random number generator,
    is provided. The frequency of false positive samples (f_coinc)
    discussed in the paper (section 5) is provided too (ctab.txt).

    The iSRS clustering is a method of event clustering in 1D, iterated to
    obtain all the conceivable clusters from the original sample. It
    depends on one parameter, N_tol (its meaning is explained in section 3
    of the paper).

    Appendix A of the paper explains how to choose it correctly.
    Clustering method: the clustering method (SRS) has two parameters,
    N_tol and Δ_thr. The N_tol parameter is kept constant.
    Δ_thr naively corresponds to the maximum allowed distance
    among the elements belonging to a given cluster.

    The clustering procedure is iterated (iSRS) by scanning the Δ_thr
    parameter, which is finely decreased starting from the largest spacing
    between contiguous events of the data sample under investigation. The
    decrease of Δ_thr stops when only clusters of size 2 (two events)
    remain. At the end of the scan, the Δ_thr space is fully explored.
    The same cluster can be obtained for a range of Δ_thr values, so
    duplicate clusters are removed.

Instructions:
  A "Makefile" builds the executables "isrs" and "rmrndm" for Linux-based
  systems with gcc installed.

  To prepare the executables, put all the files provided here in the same
  directory and, at the prompt, type:
  make clean
  make

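  The algorithm has two steps.
  1) production of candidate clusters, command syntax:
      ./isrs <N_tol> <input event list file> <candidate clusters file>

  2) removal of random clusters, command syntax:
      ./rmrndm <candidate clusters file> <survived clusters file>
      PEAKLIST_FILE <flare list file>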

  - The file with the candidate clusters is an ascii file; the format and
     meaning of each column are specified in the file header.
  - The file with the survived clusters is an ascii file; the format and
     meaning of each column are specified in the file header.
  - The file with the flare list is an ascii file; the format and
     meaning of each column are specified in the file header.

  The threshold probability (1 - confidence level) for the removal of random
  clusters can be set directly by adding the string
  THR_PROB <threshold probability>
  to the "rmrndm" command.
  Alternatively, the threshold probability can be set in standard gaussian
  units by adding the string
  NSIGMA <number of standard deviations>
  to the "rmrndm" command.
  The default value is NSIGMA=3.
  Currently the threshold can be chosen in the range 2 - 3.5 standard
  deviations.

  Example:
  ./isrs 1 knox.txt knox_candidateclusters.txt
  ./rmrndm knox_candidateclusters.txt knox_survivedclusters.txt NSIGMA 2.
  PEAKLIST_FILE knox_peaks.txt

  Optionally, the format of the output file can be chosen by adding to the
  "isrs" command a string like this:
  T_FORMAT %18.10lf
  The format string must be in C style.

  Example:
  ./isrs 1 knox.txt knox_candidateclusters.txt T_FORMAT %18.10lf
  ./rmrndm knox_candidateclusters.txt knox_survivedclusters.txt NSIGMA 3.
  PEAKLIST_FILE knox_peaks.txt

  Optionally, the files stab.txt and ctab.txt can be put in another
  directory. To specify the path, add a string like this to the
  "rmrndm" command:
  SCANTABLEDIR <path of the directory containing stab.txt and ctab.txt>

  Example:
  ./isrs 1 knox.txt knox_candidateclusters.txt
  ./rmrndm knox_candidateclusters.txt knox_survivedclusters.txt NSIGMA 2.
  PEAKLIST_FILE knox_peaks.txt  SCANTABLEDIR ../../table_dir/

  An ordered list of events is the input of the procedure.
  The input list can be an ascii or a binary file.
  If the exposure is uniform during the observation, the input list
  consists of a column (T column) with the time of occurrence of each event.

  If the exposure is not uniform during the observation (e.g. for events
  extracted within the FERMI-LAT gamma-ray data sample, see section 3 of the
  paper), the input list consists of two columns:
  The first column (X column) contains the cumulative exposure from the
  start of observation to the occurrence of each event.
  The second column (t column) contains the time of the occurrence of each
  event.

  Ascii input list:
  It has a header consisting of 3 rows.
  Header first row: specifies the sample size.
  If the input list has one column:
  The header second row specifies the start of the observation.
  The header third row specifies the stop of the observation.
  Header example for "one column" event list:
  SAMPLE SIZE 35
  START 0.
  STOP 2191.
  The ascii input list ("knox.txt") for the
  "Knox data set" (Knox, G., 1959,
  British Journal of Preventive and Social Medicine, 13, 222) is provided
  here as an example of a "one column" event list.

  If the input list has two columns:
  The header second row specifies the cumulative exposure
  at the start of the observation (first field), and the starting
  time of the observation (second field).
  The header third row specifies the cumulative exposure
  at the stop of the observation (first field), and the ending
  time of the observation (second field).
  Header example for "two columns" event list:
  SAMPLE SIZE   82749
  START  0.00000000E+00  54682.65603222
  STOP  2.18668820E+11  57412.44772204
  The ascii input list ("3C1.txt") is a
  "two columns" input event list.
  The file contains the event list for the FSRQ 3C 454.3 with gamma-ray
  data extracted above 0.3 GeV between MJD 56500 and MJD 56600 (see
  section 3 of the paper for an explanation of the extraction method).

  The precision of an ascii event list is limited by the chosen format: two
  contiguous events may end up with the same tabulated cumulative exposure.
  To prevent this, an ascii event list can be prepared reporting in the
  X column the exposure between the reported event and the previous one
  (called here differential exposure). For the first event, the exposure
  between the occurrence of the event and the start of the observation is
  reported. The ascii "two column" input list with differential exposure
  is provided in "3C2.txt" (3C2.txt contains the same events as 3C1.txt).

  The binary input list uses the same sequence of header and data values
  specified for the ascii input list, but without the text specifiers
  (SAMPLE SIZE, START, STOP). The binary file is a list of header values
  and data of C datatype "double".
  For a "one column" event list, the input file contains the following fields
  (all with double datatype):
  <sample size> <start> <stop> <time of event 1> ... <time of event i> ...
  For a "two column" event list, the input file contains the following fields
  (all with double datatype):
  <sample size> <X at start> <t at start> <X at stop> <t at stop>
  <X of event 1> <t of event 1> ... <X of event i> <t of event i> ...
  where X is the exposure (cumulative or differential) evaluated at the
  time of occurrence of the event, and t is the time of occurrence of the
  event.

  When the input list has two columns, the option "TWOCOLUMNS" must be
  specified to the isrs command:
  example:
  ./isrs 50 3C1.txt 3C1_candidateclusters.txt TWOCOLUMNS

  When the first column (t column in "one column" mode, X column in
  "two column" mode) is reported as differential, the option
  "DIFFERENTIALX" must be specified to the isrs command.
  example:
  ./isrs 50 3C2.txt 3C2_candidateclusters.txt TWOCOLUMNS DIFFERENTIALX

  When the input list is a binary file, the option "BINARY" must be
  specified to the isrs command:
  example:
  ./isrs 50 3C1.bin 3C1_candidateclusters.txt TWOCOLUMNS BINARY


  For the "two columns" input list, the output format of the X column can
  also be chosen, by adding to the "isrs" command a string like this:
  X_FORMAT %18.10lE
  As for T_FORMAT (e.g., T_FORMAT %18.10lf), the format string must be in
  C style.


  Example1:
  input: "one column" event list
  ./isrs 1 knox.txt knox_candidateclusters.txt
  ./rmrndm knox_candidateclusters.txt knox_survivedclusters.txt NSIGMA 3.
  PEAKLIST_FILE knox_peaks.txt

  Example2:
  input: "two column" event list
  ./isrs 50 3C1.txt 3C1_candidateclusters.txt TWOCOLUMNS T_FORMAT %18.10lf
  X_FORMAT %18.10lE
  ./rmrndm 3C1_candidateclusters.txt 3C1_survivedclusters.txt NSIGMA 3.
  PEAKLIST_FILE 3C1_peaks.txt

  Example3:
  input: "two column" event list in which the "X column" reports the
  exposure between the occurrence of each event and that of the previous
  one, instead of the cumulative exposure.
  ./isrs 50 3C2.txt 3C2_candidateclusters.txt TWOCOLUMNS T_FORMAT %18.10lf
  X_FORMAT %18.10lE DIFFERENTIALX
  ./rmrndm 3C2_candidateclusters.txt 3C2_survivedclusters.txt NSIGMA 3.
  PEAKLIST_FILE 3C2_peaks.txt

  For any question, contact the author (see Acknowledgements below).

stab.txt content:
  This ascii file contains the tables with the cumulative distributions
  of the m-spacing. Each table is for a predefined value of the sample
  size. The first three rows contain the format version id, the version
  id, and the release date of the file.
  Then the number of rows reserved for comments is reported.
  The comments follow.

  The field
  Ntables:  
  contains the number of reported tables.
  The following rows contain, for each table, its description fields
  followed by the table itself.

  For each table
  The field
  NELE:  
  contains the size of the random sample under consideration.
  The field:
  NTRY:   
  contains the number of samples used to fill the table.
  The field:
  SEED:   
  contains the input seed of the random engine.
  The field:
  Nmele:  
  contains the number of rows of the table.
  The rows of the table follow.
  First column (integer number) contains the value of "m" of the m-spacing
  for which the cumulative distribution is reported.
  The other columns report the length "l" (real number) of the m-spacing
  for which the probability of obtaining an m-spacing shorter than "l",
  expressed in gaussian standard deviations, is 2, 2.5, 3, 3.5, 4, 4.5, 5
  (see section 5 of the paper).

ctab.txt content:
  This ascii file contains the frequency of false positive samples
  discussed in section 5 of the paper.
  The first three rows contain the format version id, the version id, and
  the release date of the file.
  Then the number of rows reserved for comments is reported.
  The comments follow.
  Then the table is reported:
  First column reports the sample size (integer number).
  Second column reports the value of $Θ^*$ threshold (real number).
  Third column reports the 100*frequency of false positive ($f_{coinc}$)
  samples (real number).
  Fourth column is the negative error on 100*$f_{coinc}$ (real number).
  Fifth column is the positive error on 100*$f_{coinc}$ (real number).

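Plotting the unbinned light curve:
  The IDL procedure pulc.pro is provided to plot the unbinned light curve.
  The output encapsulated PostScript file with the unbinned light curve
  is "uLC.eps".
  At the IDL prompt, type:
  .r pulc
  pulc, <file name with the list of survived clusters>

  pulc accepts several options:
  to specify x-axis range:
  pulc, <file name with the list of survived clusters> ,trange=[<min for x axis>,<max for x axis>]

  to specify x-axis title:
  pulc, <file name with the list of survived clusters> ,xtitle=<title of x axis>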

  to specify y-axis title:
  pulc, <file name with the list of survived clusters> ,ytitle=<title of y axis>

  to specify max value on y axis:
  pulc, <file name with the list of survived clusters> ,ymax=<max for y axis>

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
isrs.c           263      819   C code for iSRS clustering
rmrndm.c         347     1293   C code for removal of random clusters, peak
                                 finding, and FWHM evaluation
ldstab.c         512       55   C code used in rmrndm.c to read stab.txt and
                                 ctab.txt
ldstab.h         192       68   include file with the definition of the data
                                 structure used for scan-statistics tables.
                                 This structure is filled within "ldstab.c"
                                 and used within "ldstab.c" and "rmrndm.c"
stab.txt         108    43710   the m-spacing cumulative distributions
ctab.txt          58      114   the frequency of false positive samples
                                 (f_coinc) discussed in the paper (section 5)
Makefile          48        9   Makefile to build the "isrs" and "rmrndm"
                                 executables (for linux gcc only)
knox.txt          14       38   one column input event list ("Knox data set")
3C1.txt           33     1571   two columns input event list (FSRQ 3C454.3
                                 gamma-ray data extracted between MJD 56500 and
                                 MJD 56600)
3C2.txt           33     1571   two columns input event list with same content
                                 as 3C1.txt, but for each event, the exposure
                                 between the time of the event and the time of
                                 previous one is reported (instead of the
                                 cumulative exposure).
pulc.pro         181      157   procedure to plot the unbinned light curve
LICENSE           79       34   BSD 3-Clause License
--------------------------------------------------------------------------------

Acknowledgements:
    Luigi Pacciani, luigi.pacciani(at)iaps.inaf.it

<HR size=4>(End)                                        Patricia Vannier [CDS]  12-Apr-2018
</pre>
</body></html>