<< back to HepSim manual

Analysis Primer

This analysis tutorial covers Python codding on the Java platform (Jython). Look at the main description for other topics. For C++/ROOT, please refer ProMC web page.

To ensure platform independence and a possibility to run programs using web browsers 1), validation programs are written in Jython, which is an implementation of the Python language on the Java platform. Look at several links:

See Java code API.

The example codes on this page can be run using several options:

  • Using Java web start from the browser.
  • “hs-ide file.py” command to launch the editor (from hs-tools package)
  • “hs-run file.py” command to run using a command-line (“batch mode”, from the hs-tools package)

Accessing data

Here is a simple example to make a list of files using the data URL:

from hepsim import HepSim
url="https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_pythia8/"
flist=HepSim.getList(url)
print flist

If you do not know URL, you can use the dataset name or URL of the info page of this dataset:

from hepsim import HepSim
url=HepSim.urlRedirector("tev100_ttbar_mg5")
flist=HepSim.getList(url)
print flist

Note that you can find URL list of datasets using the standard Python.

name="tev100pp_pythia8_minbias_a14"
hepsim_server="http://atlaswww.hep.anl.gov/hepsim/"
import urllib2
response = urllib2.urlopen(hepsim_server+'/geturlmirrors.php?name='+name)
html = response.read()
response.close()
html=html.split(";")
print html # print a list of URL locations 

If ProMC files were downloaded to the local file system, one can build the file list as:

from jhplot.utils import FileList
flist=FileList.get("./data","promc")
print flist

Here “./data” is a directory with files, while “promc” indicated that we build the file list using the file extensions “.promc”.

Often, the output list contains multiple files generated with different pT or mass ranges. For example, a group of files generated with minimum pT=1000 GeV have names *_pt1000_*.py, while files generated with pt=5000 GeV have names *_pt5000_*.py. One can build a list with pT ranges using this method:

xmin=HepSim.getRanges(flist,"_pt","_");

This methods takes file list, and identify pT ranges given between the strings “_pt” and “_”. In the above example, the output list will be [1000,5000].

Using ProMC files

Reading the file header

To read data from a ProMC file, you will need these statements:

from proto import FileMC        
f=FileMC("data.promc")              # get input file from current directory (or URL)
des=f.getDescription()              # get description
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units

The latter 2 values are needed to convert from integer presentation of 4-momenta, typically used for variable byte encoding to save space, to double values. The values are typically around 1000-10000, depending on importance of representing low-momentum with a better precision. The header file also stores some additional information. Look at Java API

Reading the event records

from proto import FileMC    
f=FileMC("data.promc")              # get input file from current directory (or URL)
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units
for i in range(f.size()):           # run over all events in the file 
    eve = f.read(i)                 # ProMCEvent record for event "i"
    pa = eve.getParticles()         # particle information

Look at Java API of the ProMCEvent class:

Here is the Java API for “Particles”:

In the above example, “pa” is an object Particles with particle information. We can create a typical Lorentz particle using the class LParticle as:

from proto import FileMC  
from hephysics.particle import LParticle 
f=FileMC("data.promc")              # get input file from current directory (or URL)
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units
for i in range(f.size()):           # run over all events in the file 
    eve = f.read(i)                 # ProMCEvent record for event "i"
    pa = eve.getParticles()         # particle information
    for j in range(pa.getPxCount()): 
           px=pa.getPx(j)/un         # pX
           py=pa.getPy(j)/un         # pY
           pz=pa.getPz(j)/un         # pZ
           e= pa.getEnergy(j)/un     # energy
           m= pa.getMass(j)/un       # mass
           p=LParticle(px,py,pz,e,m) # fill it

To access the PDG status code, us “pa.getPdgId(j) method. For stable particles, use the HEPEVT status code (the method “pa.getStatus(j)”). Here are the methods to retrieve information from the event records on a particle at a position j:

  • pa.getPdgId(j) - get PDG id (int)
  • pa.getStatus(j) - get status information (int)
  • pa.getPx(j), pa.getPy(j), pa.getPz(j), pa.getEnergy(j) - get Px, Py, Pz, Energy of a particle (in int64 varint)
  • pa.getX(j), pa.getY(j), pa.getZ(j), pa.getT(j) - get (x, y, z, t) position of a particle (in int64 varint)
  • pa.getMass(j) - get mass (in int64 varint)
  • pa.getMother1(j), pa.getMother2(j) - get positions of first and second parent of the particle (int)
  • pa.getDaughter1(j), pa.getDaughter1(j) - get position of 1st and 2nd daughter (int)
  • pa.getBarcode(j) - get barcode (int, in filled)

Look at the API of the Lorentz-particle class:

  • LParticle - a simple Lorentz particle with transformations
  • HEParticle - a typical HEP particle based on the Lorentz particle with transformations

In addition, you can build an ArrayList with particles using a convenient class PromcUtil. Also, when possible, use the method “getParticle()” that returns a HEParticle particle with 4-momenta and postilion.

from proto import FileMC
from hephysics.hepsim import PromcUtil
f=FileMC("data.promc")              # get input file from current directory (or URL)
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units
for i in range(f.size()):           # run over all events in the file 
    eve = f.read(i)                 # ProMCEvent record for event "i"
    pa = eve.getParticles()         # particle information
    par=PromcUtil.getParticleDList(f.getHeader(), pa, 1, -1, 1000) # List with particle 4-momenta
    print "Nr of final-state particles=",par.size()

The above example fills a list with all stable particles, without any cuts on pT and Eta (2 second arguments).

The next example creates HEParticle using PromcUtil:

from proto import FileMC
from hephysics.hepsim import PromcUtil
f=FileMC("data.promc")              # get input file from current directory (or URL)
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units
for i in range(f.size()):           # run over all events in the file 
    eve = f.read(i)                 # ProMCEvent record for event "i"
    pa = eve.getParticles()         # particle information
    for j in range(pa.getPxCount()): 
           particle=PromcUtil.getParticle(pa, un, lunit, j) # a HEP particle
 

In this example, we fill a list with all stable particles (status “1”), without any pT cut (”-1“) and maximum absolute value of pseudo-rapidity 1000. The object “par” is a list with ParticleD.

Finality, let us extend the above example: we will create anti-kT jets using the list “par”:

from proto import FileMC
from hephysics.hepsim import PromcUtil
from hephysics.jet import JetN2 
 
ktjet=JetN2(0.5,"antikt",20)        # initialize jets with R=0.5, E-mode, anti-KT,min pT=20
print ktjet.info()                  # print its settings
 
f=FileMC("data.promc")              # get input file from current directory (or URL)
header = f.getHeader()              # to access information on this file
un=float(header.getMomentumUnit())  # conversion energy units
lunit=float(header.getLengthUnit()) # conversion length units
for i in range(f.size()):           # run over all events in the file 
    eve = f.read(i)                 # ProMCEvent record for event "i"
    pa = eve.getParticles()         # particle information
    par=PromcUtil.getParticleDList(f.getHeader(), pa, 1, -1, 1000) # List with particle 4-momenta
    ktjet.buildJets(par)            # build anti-kT jets              
    jets=ktjet.getJetsSorted()      # get a list with sorted jets
    if (len(jets)>0):
                 print "pT of a leading jet =",jets[0].perp()," GeV" 

To build anti-kT jets, we use JetN2 clustering algorithm implemented in Java. The description of this algorithm is given in http://arxiv.org/abs/1510.06638. It is a similar to FastJet N*N algorithm and typically shows a performance similar to the C++ analogue. Tests indicate that the JetN2 clustering algorithm is about twice slower that C++, but by a factor 20 faster than the traditional N^3 jet clustering algorithms. You can find benchmarking results in HepJet github repository. Look at a typical example of jet clustering in the qcd_pythia8_ptbins_jets.py code.

Event statistics

Before closing ProMC file, you may need to access event statistics, cross sections and errors on cross sections.

from proto import FileMC    
f=FileMC("data.promc")                       # get input file from current directory (or URL)
stat = f.getStat()                           # get statistics
cross=stat.getCrossSectionAccumulated()      # accumulated cross section in pb
error=stat.getCrossSectionErrorAccumulated() # error on accumulated cross section in pb 
print "Cross section:", cross, "+/-", error, " pb"
f.close()

Look at the API:

Using ProIO files

ProIO files have their own API (https://github.com/decibelcooper/proio). It is at early stage of development. You can look at example of how to use ProIO here [http://atlaswww.hep.anl.gov/hepsim/info.php?item=325)

Using ROOT files

ROOT files are used to store Delphes fast simulations. To analyses ROOT files, either use ROOT (see Delphes manual) or the Jas4pp program.

Plots and histograms

HepSim uses histogram packages supported by DataMelt (and implemented using JAIDA and FreeHep).

Here are a few most common classes:

  • HPlot - simple canvas to show graphs
  • H1D - 1D histogram
  • H2D - 2D histogram
  • P1D - data container with support of errors

The main canvas to show histograms H1D and data points P1D is HPlot. To process scripts in a background without a pop-up HPlot, use method “visible(False)”, and set sys.exit(0) at the end of the scripts.

Usually, plots are saved in the vector-graphics SVG format. They can be converted on Linux to EPS as:

inkscape --without-gui --export-eps output.eps input.svg

However, one can also save images in PDF or EPS using the corresponding file extensions for the “export” command.

In addition, data are saved in the form of XML (with the extension ”.jdat“) files. For example, one can read such files and create histograms or data objects as:

from jhplot.io import HBook
hb = HBook("http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.jdat") 
print hb.getKeys()         # print all the keys
p1=hb.get("1")
print p1.toString()

The object p1 belongs to the class P1D. Analogously, one can store histograms etc.

HepSim API

All HepSim Jython/Java validation scripts are based on a few classes. With such classes, you can make any observable from the particle lists stored inside ProMC data files:

Histograms and plots

  • HPlot - canvas to show X-Y data and histograms in 2D
  • HPlot3D - canvas to show X-Y-Z data and histograms in 3D
  • H1D - 1D histogram
  • H2D - 2D histogram
  • P1D - X-Y container with support of 2-level errors

See also other math and graphics classes in the jhplot package.

Lorentz particles and Jets

  • PromcUtil convenient method to fill arrays with particles from ProMC files
  • LParticle a HEP particle with the Lorentz transformations
  • FastParticle a HEP particles with precomputed Et2,Eta,Phi for jet algorithms.
  • Physics vectors typical HEP physics vectors the Lorentz transformations
  • JetN2 recommended kt-type jet clustering ng algorithms (kT, anti-kT, CA) implemented in Java using N^2 approach, similar to the FastJet algorithm. Recommended for hadron collisions
  • SCJet traditional kt-type jet clustering algorithms (kT, anti-kT) for pp implemented in Java using N^3 approach (slow)
  • Jets and event shapes traditional jet algorithms and event shapes for e+e- (Geneva, Jade, Durham jets)

Java implementation of the longitudinally invariant kT and anti-kT clustering algorithms uses the E-scheme to combine particles (p1+p2) and Eta-Phi space (not Rapidity-Phi). Also, Cambridge/Aachen jets are supported. HepJet on github shows benchmarks of the Java implementation of the anti-KT with the original FastJet package.

You can build the standard kt-jets using PromcUtil, followed by the KTjet class.

Working with ProMC files

Also, look at the ProMC Java API that is used to store data:

It is simplified compared to full MC simulations, and also includes additional arrays “idata” and “fdata” to store information on event weights and uncertainties.

The project uses the community edition of DataMelt project and the FreeHEP Java Libraries.

GROOT API

In addition, this tool supports GRoot histograms and canvases (see GRoot poject). The syntax for histogram creation and plotting is similar to the PyROOT commands, thus it is easier to use this package for PyROOT users. You can find the Java API of this package in here.

The following histograms are available:

They can be plotted on the EmbeddedCanvas.

Sergei Chekanov 2015/03/06 08:49


1) Cross-platform validation and analysis environment for particle physics, S.V. Chekanov, I. Pogrebnyak, D. Wilbern, arXiv:1510.06638 http://arxiv.org/abs/1510.06638