Table of Contents
HepSim Analysis Primer
This analysis tutorial covers Python codding on the Java platform (Jython). Look at the main description for other topics. For C++/ROOT, please refer ProMC web page.
To ensure platform independence and a possibility to run programs using web browsers [(Cross-platform validation and analysis environment for particle physics, S.V. Chekanov, I. Pogrebnyak, D. Wilbern, arXiv:1510.06638 http://arxiv.org/abs/1510.06638)], validation programs are written in Jython, which is an implementation of the Python language on the Java platform. Look at several links:
The example codes on this page can be run using several options:
- Java web start from the browser from the HepSim database info pages
- The light-weight “hs-tools” package as described in hepsim_software_toolkit
- Jas4pp program (typically used for full Geant4 simulation files)
- DMelt IDE (which can also be used with other languages in addition to Python)
Accessing data
Here is a simple example to make a list of files using the data URL:
from hepsim import HepSim url="http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_pythia8/" flist=HepSim.getList(url) print flist
If you do not know URL, you can use the dataset name or URL of the info page of this dataset:
from hepsim import HepSim url=HepSim.urlRedirector("tev100_ttbar_mg5") flist=HepSim.getList(url) print flist
If ProMC files were downloaded to the local file system, one can build the file list as:
from jhplot.utils import FileList flist=FileList.get("./data","promc") print flist
Here “./data” is a directory with files, while “promc” indicated that we build the file list using the file extensions “.promc”.
Often, the output list contains multiple files generated with different pT or mass ranges. For example, a group of files generated with minimum pT=1000 GeV have names *_pt1000_*.py, while files generated with pt=5000 GeV have names *_pt5000_*.py. One can build a list with pT ranges using this method:
xmin=HepSim.getRanges(flist,"_pt","_");
This methods takes file list, and identify pT ranges given between the strings “_pt” and “_”. In the above example, the output list will be [1000,5000].
Reading the file header
To read data from a ProMC file, you will need these statements:
from proto import FileMC f=FileMC("data.promc") # get input file from current directory (or URL) des=f.getDescription() # get description header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units
The latter 2 values are needed to convert from integer presentation of 4-momenta, typically used for variable byte encoding to save space, to double values. The values are typically around 1000-10000, depending on importance of representing low-momentum with a better precision. The header file also stores some additional information. Look at Java API
- HepSim API for FileMC
- ProMCHeader for MC records
- ProMCHeader for NLO records
Reading the event records
from proto import FileMC f=FileMC("data.promc") # get input file from current directory (or URL) header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units for i in range(f.size()): # run over all events in the file eve = f.read(i) # ProMCEvent record for event "i" pa = eve.getParticles() # particle information
Look at Java API of the ProMCEvent class:
Here is the Java API for “Particles”:
In the above example, “pa” is an object Particles with particle information. We can create a typical Lorentz particle using the class LParticle as:
from proto import FileMC from hephysics.particle import LParticle f=FileMC("data.promc") # get input file from current directory (or URL) header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units for i in range(f.size()): # run over all events in the file eve = f.read(i) # ProMCEvent record for event "i" pa = eve.getParticles() # particle information for j in range(pa.getPxCount()): px=pa.getPx(j)/un # pX py=pa.getPy(j)/un # pY pz=pa.getPz(j)/un # pZ e= pa.getEnergy(j)/un # energy m= pa.getMass(j)/un # mass p=LParticle(px,py,pz,e,m) # fill it
To access the PDG status code, us “pa.getPdgId(j) method. For stable particles, use the HEPEVT status code (the method “pa.getStatus(j)”). Here are the methods to retrieve information from the event records on a particle at a position j:
- pa.getPdgId(j) - get PDG id (int)
- pa.getStatus(j) - get status information (int)
- pa.getPx(j), pa.getPy(j), pa.getPz(j), pa.getEnergy(j) - get Px, Py, Pz, Energy of a particle (in int64 varint)
- pa.getX(j), pa.getY(j), pa.getZ(j), pa.getT(j) - get (x, y, z, t) position of a particle (in int64 varint)
- pa.getMass(j) - get mass (in int64 varint)
- pa.getMother1(j), pa.getMother2(j) - get positions of first and second parent of the particle (int)
- pa.getDaughter1(j), pa.getDaughter1(j) - get position of 1st and 2nd daughter (int)
- pa.getBarcode(j) - get barcode (int, in filled)
Look at the API of the Lorentz-particle class:
- LParticle - a simple Lorentz particle with transformations
- HEParticle - a typical HEP particle based on the Lorentz particle with transformations
In addition, you can build an ArrayList with particles using a convenient class PromcUtil. Also, when possible, use the method “getParticle()” that returns a HEParticle particle with 4-momenta and postilion.
from proto import FileMC from hephysics.hepsim import PromcUtil f=FileMC("data.promc") # get input file from current directory (or URL) header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units for i in range(f.size()): # run over all events in the file eve = f.read(i) # ProMCEvent record for event "i" pa = eve.getParticles() # particle information par=PromcUtil.getParticleDList(f.getHeader(), pa, 1, -1, 1000) # List with particle 4-momenta print "Nr of final-state particles=",par.size()
The above example fills a list with all stable particles, without any cuts on pT and Eta (2 second arguments).
The next example creates HEParticle using PromcUtil:
from proto import FileMC from hephysics.hepsim import PromcUtil f=FileMC("data.promc") # get input file from current directory (or URL) header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units for i in range(f.size()): # run over all events in the file eve = f.read(i) # ProMCEvent record for event "i" pa = eve.getParticles() # particle information for j in range(pa.getPxCount()): particle=PromcUtil.getParticle(pa, un, lunit, j) # a HEP particle
In this example, we fill a list with all stable particles (status “1”), without any pT cut (”-1“) and maximum absolute value of pseudo-rapidity 1000. The object “par” is a list with ParticleD.
Finality, let us extend the above example: we will create anti-kT jets using the list “par”:
from proto import FileMC from hephysics.hepsim import PromcUtil from hephysics.jet import JetN2 ktjet=JetN2(0.5,"antikt",20) # initialize jets with R=0.5, E-mode, anti-KT,min pT=20 print ktjet.info() # print its settings f=FileMC("data.promc") # get input file from current directory (or URL) header = f.getHeader() # to access information on this file un=float(header.getMomentumUnit()) # conversion energy units lunit=float(header.getLengthUnit()) # conversion length units for i in range(f.size()): # run over all events in the file eve = f.read(i) # ProMCEvent record for event "i" pa = eve.getParticles() # particle information par=PromcUtil.getParticleDList(f.getHeader(), pa, 1, -1, 1000) # List with particle 4-momenta ktjet.buildJets(par) # build anti-kT jets jets=ktjet.getJetsSorted() # get a list with sorted jets if (len(jets)>0): print "pT of a leading jet =",jets[0].perp()," GeV"
To build anti-kT jets, we use JetN2 clustering algorithm implemented in Java. The description of this algorithm is given in http://arxiv.org/abs/1510.06638. It is a similar to FastJet N*N algorithm and typically shows a performance similar to the C++ analogue. Tests indicate that the JetN2 clustering algorithm is about twice slower that C++, but by a factor 20 faster than the traditional N^3 jet clustering algorithms. You can find benchmarking results in HepJet github repository. Look at a typical example of jet clustering in the qcd_pythia8_ptbins_jets.py code.
Event statistics
Before closing ProMC file, you may need to access event statistics, cross sections and errors on cross sections.
from proto import FileMC f=FileMC("data.promc") # get input file from current directory (or URL) stat = f.getStat() # get statistics cross=stat.getCrossSectionAccumulated() # accumulated cross section in pb error=stat.getCrossSectionErrorAccumulated() # error on accumulated cross section in pb print "Cross section:", cross, "+/-", error, " pb" f.close()
Look at the API:
Plots and histograms
HepSim uses histogram packages supported by DataMelt (and implemented using JAIDA and FreeHep).
Here are a few most common classes:
- HPlot - simple canvas to show graphs
- H1D - 1D histogram
- H2D - 2D histogram
- P1D - data container with support of errors
The main canvas to show histograms H1D and data points P1D is HPlot. To process scripts in a background without a pop-up HPlot, use method “visible(False)”, and set sys.exit(0) at the end of the scripts.
Usually, plots are saved in the vector-graphics SVG format. They can be converted on Linux to EPS as:
inkscape --without-gui --export-eps output.eps input.svg
However, one can also save images in PDF or EPS using the corresponding file extensions for the “export” command.
In addition, data are saved in the form of XML (with the extension ”.jdat“) files. Look the manual DatMelt IO. For example, one can read such files and create histograms or data objects as:
from jhplot.io import HBook hb = HBook("http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.jdat") print hb.getKeys() # print all the keys p1=hb.get("1") print p1.toString()
The object p1 belongs to the class P1D. Analogously, one can store histograms etc.
HepSim API
All HepSim Jython/Java validation scripts are based on a few classes. With such classes, you can make any observable from the particle lists stored inside ProMC data files:
Histograms and plots
- HPlot - canvas to show X-Y data and histograms in 2D
- HPlot3D - canvas to show X-Y-Z data and histograms in 3D
- H1D - 1D histogram
- H2D - 2D histogram
- P1D - X-Y container with support of 2-level errors
See also other math and graphics classes in the jhplot package.
Lorentz particles and Jets
- PromcUtil convenient method to fill arrays with particles from ProMC files
- LParticle a HEP particle with the Lorentz transformations
- FastParticle a HEP particles with precomputed Et2,Eta,Phi for jet algorithms.
- Physics vectors typical HEP physics vectors the Lorentz transformations
- JetN2 recommended kt-type jet clustering ng algorithms (kT, anti-kT, CA) implemented in Java using N^2 approach, similar to the FastJet algorithm. Recommended for hadron collisions
- SCJet traditional kt-type jet clustering algorithms (kT, anti-kT) for pp implemented in Java using N^3 approach (slow)
- Jets and event shapes traditional jet algorithms and event shapes for e+e- (Geneva, Jade, Durham jets)
Java implementation of the longitudinally invariant kT and anti-kT clustering algorithms uses the E-scheme to combine particles (p1+p2) and Eta-Phi space (not Rapidity-Phi). Also, Cambridge/Aachen jets are supported. HepJet on github shows benchmarks of the Java implementation of the anti-KT with the original FastJet package.
You can build the standard kt-jets using PromcUtil, followed by the KTjet class.
Working with ProMC files
Also, look at the ProMC Java API that is used to store data:
- ProMCHeader for MC and ProMCHeader for NLO
It is simplified compared to full MC simulations, and also includes additional arrays “idata” and “fdata” to store information on event weights and uncertainties.
The project uses the community edition of DMelt project and the FreeHEP Java Libraries.
— Sergei Chekanov 2015/03/06 08:49