Quick start

Install the HepSim software toolkit using the “bash” shell on Linux/Mac:

bash   #  set bash if you haven't done this before 
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh

This creates the directory “hs-toolkit” with HepSim commands. You can also download it as hs-toolkit.tgz. Note that Java 8 or above must be installed. You can view the available commands from the bash shell by typing:

bash> hs-help

The directory contains several bash scripts for Linux/Mac and Windows batch (BAT) files to process events on Windows. The package is used to download, view and analyze truth-level events in the PROMC or PROIO format.

Use the Jas4pp program to analyse LCIO (*.lcio) files with Geant4 simulations. This program can also be used for truth-level records in the PROMC, PROIO and ROOT file formats. You can also use the Delphes/ROOT framework for ROOT files.

Finding data files

Let us show how to find the files associated with a given Monte Carlo event sample. Go to the HepSim database and find the “Files” column. It shows the URL of the truth-level files (“EVGEN”), i.e. files created directly by event generators.

You can use the command line tool to list files associated with a dataset as:

hs-ls [name] 

where [name] is the dataset name. One can also use the URL of the Info page instead, or the URL of the location of all files. This command shows a table with file names.

Here is an example illustrating how to list all files from the Higgs to ttbar Monte Carlo sample:

hs-ls tev100pp_higgs_ttbar_mg5 

(“tev” defines the energy unit, “pp” means pp collisions, “mg5” means MadGraph5). Similarly, one can use the download URL:

hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/ 

Note that in this approach, one can use URL mirrors close to your geographical location.
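
The naming convention noted above (“tev” for the energy unit, “pp” for the collision type) can be unpacked programmatically. The following is a minimal Python sketch, assuming that dataset names follow the pattern seen in the examples of this manual (unit, energy value, collision type, underscore, remainder). The function parse_dataset_name and the regular expression are illustrative and are not part of hs-toolkit.

```python
import re

# Pattern inferred from the example names in this manual (an assumption):
# <unit><energy><collision>_<sample>, e.g. "tev100pp_higgs_ttbar_mg5".
NAME_PATTERN = re.compile(
    r"^(?P<unit>[a-z]+?)(?P<energy>\d+)(?P<collision>[a-z]+)_(?P<sample>.+)$"
)


def parse_dataset_name(name):
    """Split a dataset name into energy unit, energy value, collision type and sample."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError("name does not follow the assumed pattern: %r" % name)
    return {
        "unit": match.group("unit"),            # e.g. "tev"
        "energy": int(match.group("energy")),   # e.g. 100
        "collision": match.group("collision"),  # e.g. "pp"
        "sample": match.group("sample"),        # process and generator, left unsplit
    }


if __name__ == "__main__":
    print(parse_dataset_name("tev100pp_higgs_ttbar_mg5"))
```

The remainder of the name is left unsplit on purpose: the last token is a generator label in “tev100pp_higgs_ttbar_mg5”, but not in every name (for example “tev13pp_higgs_pythia8_ptbins”).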

If you need a list of all files for download, use this syntax:

hs-ls [name] simple     > input.list     # make list of ProMC files (without URL path)
hs-ls [name] simple-url > input_url.list # make list with URL from the main server

where [name] is the name of the dataset. You can also use a URL if you want to create a list of files from certain (mirror) servers.

Searching for datasets

The best method to find the needed sample is to use the web page with database search.

Enter “rfull” in the search field, and you will see all samples with full simulation tags. Enter “rfast”, and you will see samples with fast simulations. If you search for “higgs”, you will see the list of samples for Higgs. You can also use complex searches, such as “pythia%rfast001” (Pythia sample after a fast simulation with the tag 001).

You can also use convenient URL links that search for some datasets. Here are a few examples:

If you prefer the command-line approach, you can find the URL that corresponds to a dataset using this command:

hs-find [search word] 

The search covers dataset names, Monte Carlo model names and file descriptions. For example, to find all URL locations that correspond to simulated samples with Higgs, try this:

hs-find higgs 

If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:

hs-find pythia%rfast001 

It will search for Pythia samples after a fast detector simulation with the tag “001”. To search for a full detector simulation, replace “rfast” with “rfull”.

Downloading EVGEN files

One can download all EVGEN files for a given dataset as:

hs-get [name] [OUTPUT_DIR]

where [name] is the dataset name. This can also be the URL of the Info page, or a direct URL pointing to the location of the ProMC files. This example downloads all files from the “tev100pp_higgs_ttbar_mg5” dataset to the directory “data”:

hs-get tev100pp_higgs_ttbar_mg5 data

Alternatively, this example downloads the same files using the URL of the Info page:

hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 

Or, if you know the download URL, use it:

hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data

All these examples will download all files from the “tev100pp_higgs_ttbar_mg5” event sample.

One can add an integer value to the end of this command to specify the number of threads. If a second integer is given, it sets the maximum number of files to download. This example downloads 10 files in 3 threads from the dataset “tev100pp_higgs_ttbar_mg5”:

hs-get tev100pp_higgs_ttbar_mg5 data 3 10                       

One can also download files that have a certain pattern in their names. If a URL contains files generated with different pT cuts, the file names usually contain the string “pt” followed by the pT cut. In this case, one can download such files as:

hs-get tev13pp_higgs_pythia8_ptbins  data 3 10 pt100_

where the name is tev13pp_higgs_pythia8_ptbins.

The command downloads files to the “data” directory in 3 threads. The maximum number of downloaded files is 10, and all file names contain the string “pt100_” (i.e. pT>100 GeV).

The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:

hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]

where [URL] is the URL of the Info page, the download URL, or the dataset name.
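
Because the optional arguments are positional, a later one can only be given when the earlier ones are present. The sketch below illustrates this ordering by assembling the argument list in Python. It is a hypothetical helper (build_hs_get_command is not part of hs-toolkit) and it only builds the command, it does not run it.

```python
def build_hs_get_command(source, output_dir, threads=None, max_files=None, pattern=None):
    """Return the hs-get argument list in the positional order of the usage line.

    source is a dataset name, an Info page URL or a download URL.
    """
    # Optional arguments are positional, so they cannot be skipped selectively.
    if max_files is not None and threads is None:
        raise ValueError("the number of files must follow the number of threads")
    if pattern is not None and max_files is None:
        raise ValueError("the pattern must follow the number of files")

    command = ["hs-get", source, output_dir]
    for value in (threads, max_files, pattern):
        if value is not None:
            command.append(str(value))
    return command


if __name__ == "__main__":
    args = build_hs_get_command("tev13pp_higgs_pythia8_ptbins", "data", 3, 10, "pt100_")
    print(" ".join(args))
```

With hs-toolkit set up in the current shell, such a list could be passed to subprocess.run(args, check=True) from the standard library.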

Downloading RECO files

Many datasets contain data files after Geant4 detector simulation and reconstruction (“RECO”). The files are in the “LCIO” format. They include complete tracks, hits, calorimeter clusters etc. Reconstructed files are stored inside directories with the tag “rfastNNN” (Delphes fast simulation) or “rfullNNN” (full simulation), where “NNN” is a version number. You can identify the detector geometries that correspond to the tags using the detector description page. For example, the tev100pp_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:

hs-ls  tev100pp_ttbar_mg5%rfast001      # list ROOT files with the tag "rfast001" 
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory 

The symbol “%” separates the dataset name (“tev100pp_ttbar_mg5”) from the reconstruction tag (“rfast001”). You can skip “data” in the second example; in this case, the data will be copied to the directory “tev100pp_ttbar_mg5%rfast001”.
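
The “name%tag” convention can be illustrated with a short Python sketch. It assumes that tags always have the form “rfast” or “rfull” followed by digits, as described above; split_dataset_and_tag is a hypothetical helper and not an hs-toolkit function.

```python
import re

# Tag layout as described in this manual: "rfast" or "rfull" plus a version number.
TAG_PATTERN = re.compile(r"^(rfast|rfull)(\d+)$")


def split_dataset_and_tag(spec):
    """Split 'name%tag' into (name, tag, simulation kind, version).

    Without a '%' the three tag-related values are None.
    """
    name, separator, tag = spec.partition("%")
    if not separator:
        return name, None, None, None
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError("unexpected reconstruction tag: %r" % tag)
    kind = "fast" if match.group(1) == "rfast" else "full"
    return name, tag, kind, match.group(2)


if __name__ == "__main__":
    print(split_dataset_and_tag("tev100pp_ttbar_mg5%rfast001"))
```

The version is kept as a string so that leading zeros (“001”) are preserved.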

One can also download files in several threads. If you want to download 10 files in 3 threads, run:

hs-get  tev100pp_ttbar_mg5%rfast001 data 3 10

As before, one can also download the files using the URL:

hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data

File validation

One can check file consistency and print additional information as:

hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc

The last argument can also be a file location on the local disk (this is faster than using a URL). The output of the above command will look something like this:

   ProMC version = 2
   Last modified = 2013-06-05 16:32:18
   Description   = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;
                ParticleDecays:limitTau0 = on;
                ParticleDecays:tau0Max = 10;HiggsSM:all = on;
   Events        = 10000
   Sigma    (pb) = 2.72474E1 ± 1.92589E-1
   Lumi   (pb-1) = 3.67007E2
   Varint units  = E:100000 L:1000
   Log file:     = logfile.txt
   The file was validated. Exit.

You can see that this file includes a complete logfile (“logfile.txt”). We will explain how to extract it later.
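
The numbers in this listing are consistent with the luminosity being the number of events divided by the cross section (10000 / 27.2474 pb ≈ 367.01 pb-1). The Python sketch below checks this relation on the text of the listing. It assumes the output layout shown above; parse_hs_info is a hypothetical helper and not part of hs-toolkit.

```python
import re

# Excerpt of the hs-info output shown above.
SAMPLE_OUTPUT = """
   Events        = 10000
   Sigma    (pb) = 2.72474E1 ± 1.92589E-1
   Lumi   (pb-1) = 3.67007E2
"""

NUMBER = r"([0-9.]+(?:E[+-]?[0-9]+)?)"


def parse_hs_info(text):
    """Extract (events, cross section in pb, luminosity in pb-1) from hs-info text."""
    events = int(re.search(r"Events\s*=\s*(\d+)", text).group(1))
    sigma = float(re.search(r"Sigma\s*\(pb\)\s*=\s*" + NUMBER, text).group(1))
    lumi = float(re.search(r"Lumi\s*\(pb-1\)\s*=\s*" + NUMBER, text).group(1))
    return events, sigma, lumi


if __name__ == "__main__":
    events, sigma, lumi = parse_hs_info(SAMPLE_OUTPUT)
    # Compare the luminosity recomputed from events and cross section
    # with the value stored in the file.
    print("recomputed: %.3f pb-1, stored: %.3f pb-1" % (events / sigma, lumi))
```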

Looking at separate events

One can print separate EVGEN events using the above command after passing an integer argument that specifies the event to be looked at. This command prints the event number “100”:

hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc 100 

More conveniently, one can open the file in a GUI mode to look at all events:

hs-view [promc file]              

This command brings up a GUI window to look at separate events. If you work on a remote machine, enable X11 forwarding to see the GUI. On Windows, download the file hepsim.jar and click on it, then open the file using [File]-[Open file].

Alternatively, on Windows, click “hs-view.bat” and open the ProMC file using the File menu.

You can also view EVGEN events without downloading files. Simply pass a URL to the above command and you will stream Monte Carlo events:

hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc

Here we looked at one file of a Pythia8 (QCD) sample. Files with NLO predictions are identified automatically: for such files, you will see a few particles per event and weights for calculations of PDF uncertainties.

Monte Carlo logfile

Each ProMC/ProIO file includes a logfile from the Monte Carlo generator. Show this file on the screen as:

hs-log [file]

where [file] is either a ProMC or ProIO file (you can use URL instead of the full path on the local computer).

In the case of ProMC files, one can use the standard Linux commands, such as “unzip”:

unzip -p [promc file]  logfile.txt

where [promc file] is the file name. This command extracts a logfile with original generator-level information. The next command shows the actual number of stored events:

unzip -p [promc file]  promc_nevents

This command lists the stored events (each event is a ProtoBuffer binary file):

unzip -l [promc file]
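
Since the “unzip” commands above work on ProMC files, the same entries can be read with Python's standard zipfile module. This is a minimal sketch under that assumption (that a ProMC file can be opened as an ordinary zip archive); the entry names “logfile.txt” and “promc_nevents” are taken from the commands above, while the function names are illustrative.

```python
import zipfile


def read_promc_entry(promc_file, entry_name):
    """Return the text of one entry (e.g. 'logfile.txt') stored in a ProMC file.

    promc_file is a path on the local disk or a binary file-like object.
    """
    with zipfile.ZipFile(promc_file) as archive:
        return archive.read(entry_name).decode("utf-8", errors="replace")


def list_promc_entries(promc_file):
    """Return the names of all entries, similar to 'unzip -l'."""
    with zipfile.ZipFile(promc_file) as archive:
        return archive.namelist()


if __name__ == "__main__":
    import sys

    # Usage: python read_promc.py [promc file]
    path = sys.argv[1]
    print("stored events:", read_promc_entry(path, "promc_nevents").strip())
    print(read_promc_entry(path, "logfile.txt"))
```

This reads local files only; streaming from a URL, which hs-log supports, is outside the scope of the sketch.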

Pileup mixing

One can mix events from a signal ProMC file with inelastic (minbias) events using the “pileup mixing” command:

hs-pileup pN signal.promc minbias.promc output.promc

Here “p” indicates that events from “minbias.promc” will be mixed with every event from “signal.promc” using a Poisson distribution with the mean “N”. If “p” before N is not given, then exactly N (random) events from minbias.promc will be added to every event from “signal.promc”. Use large numbers of events in “minbias.promc” to minimise reuse of the same events. The barcode of particles inside “output.promc” indicates the event origin (0 is set for particles from “signal.promc”).
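
The two mixing modes can be illustrated with a short Python sketch that decides, for one signal event, which minbias events to add. It is not the hs-pileup implementation: the Poisson sampler (Knuth's multiplication method) and the choice to draw minbias events with replacement are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method).

    Suitable for moderate means; exp(-mean) underflows for very large ones.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pick_pileup(n_minbias, mixing, rng):
    """Return indices of minbias events to add to one signal event.

    mixing is 'pN' (Poisson with mean N) or 'N' (exactly N events).
    Events are drawn with replacement (an assumption of this sketch).
    """
    if mixing.startswith("p"):
        n_add = poisson(float(mixing[1:]), rng)
    else:
        n_add = int(mixing)
    return [rng.randrange(n_minbias) for _ in range(n_add)]


if __name__ == "__main__":
    rng = random.Random(1)
    print(pick_pileup(100000, "p5", rng))  # varying number of events, mean 5
    print(pick_pileup(100000, "5", rng))   # always 5 events
```

The sketch also shows why a large minbias file helps: the fewer events there are to draw from, the more often the same index is picked again.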

Analysing EVNT files

One can analyse Monte Carlo events on Windows, Linux and Mac with Java 7/8. Many HepSim samples include *.py scripts to calculate differential cross sections. One can run validation scripts from the Web using Java Web Start. One can also run scripts on a desktop, either streaming data over the network or using downloaded files (in which case you pass the directory with the *.promc files as an argument). Here are a few approaches showing how to run *.py scripts:

Using Java Web Start

Many “Info” pages of HepSim have Jython (Python) scripts for validation and analysis. One can run such scripts from a web browser using the Java Web Start technology. Click the “Launch” button. You will see an editor. Then click the “Run” button to process events.

To use Java Web Start, you should configure Java permissions. For Linux/Mac, run “ControlPanel”, go to the “Security” tab and add “http://atlaswww.hep.anl.gov” to the exception list. For Windows, find “Java Control Panel” and do the same. See “Why are Java applications blocked by your security settings?”. In addition, if you are a Mac user, you should allow execution of programs from outside the Mac App Store.

Using stand-alone Python

In this example, we will run a Python (to be more exact, Jython) script and, at the same time, stream data from the web. Find a HepSim event sample by clicking a link in the “Info” column. For example, look at a ttbar sample from MadGraph: ttbar_mg5. Find the URL of the analysis script (“ttbar_mg5.py”) located at the bottom. Copy it to some folder, or use “wget”:

wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py

Then process this code in batch mode as:

hs-run ttbar_mg5.py

If you do not want a pop-up canvas with the output histogram, change line 71 to “c1.visible(0)” (or “c1.visible(False)”) and add “sys.exit(0)” at the very end of the “ttbar_mg5.py” macro.

One can view, edit and run the analysis file using a simple GUI editor:

hs-ide ttbar_mg5.py

It opens this file for editing. One can run it by clicking on the “run” button. It also provides an interactive Jython shell.

If you use Windows OS, click the file “hs-ide.bat”, open the Python script using the menu, and then run this script using the “Run” button.

When possible, use downloaded ProMC files rather than streaming the data over the network. The calculations run faster when the program reads local files. Let us assume that we put all ProMC files in the directory “data”. Then run the script as:

hs-run ttbar_mg5.py data

Here is a complete example: we download data to the directory “ttbar_mg5”, then we download the analysis script, and then we run this script over the local data using 10000 events:

hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
hs-run ttbar_mg5.py ttbar_mg5 10000

Using a Java IDE

The above example has some limitations since it uses a rather simple editor. Another approach is to use the full-featured Jas4pp or DataMelt programs, which give more flexibility.

wget -O dmelt.zip http://jwork.org/dmelt/download/current.php;
unzip dmelt.zip; 
./dmelt/dmelt_batch.sh ttbar_mg5.py 

You can also pass a URL with data as an argument and limit the calculation to 10000 events:

./dmelt/dmelt_batch.sh ttbar_mg5.py https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/ 10000 

As before, the batch mode works best with downloaded ProMC files. Let us assume that we put all ProMC files in the directory “data”. Then run DataMelt over the data as:

./dmelt/dmelt_batch.sh ttbar_mg5.py data 

Here is a complete example: we download data to the directory “ttbar_mg5”, then we download the analysis script, and then we run this script over the local data using 10000 events:

hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
./dmelt/dmelt_batch.sh ttbar_mg5.py ttbar_mg5 10000 

Then click “run” (or [F8]). One can also start DataMelt without input files:

./dmelt/dmelt.sh

on Linux/Mac. On Windows, run “dmelt.bat” instead. You will see the DataMelt IDE. Locate the URL of the analysis script, such as ttbar_mg5 (it can be found under the “Info” link). Then copy this link using the right mouse button (“Copy URL Location”). Next, in the DMelt menu, go to “File”→“Read script from URL”. Paste the URL of the *.py file into the pop-up DataMelt URL dialog and click “run”. The program will start reading data from the Web. At the end of the run, you will see a pop-up window with a histogram. This method works for the Python/Jython, Java, Ruby, Groovy and BeanShell languages.

Using C++/ROOT

Install ProMC and ROOT. Make sure that the environment variables $PROMC and $ROOTSYS are set correctly. Then look at the examples:

$PROMC/examples/reader_mc/  # shows how to read ProMC files from a typical Monte Carlo generator
$PROMC/examples/reader_nlo/ # shows how to read ProMC files with NLO calculations (e.g. MCFM)
$PROMC/examples/promc2root/ # shows how to read ProMC files and create a ROOT tree

For C++/ROOT, you should download the files in advance, since streaming over the network is not supported. There is a simple example showing how to read multiple Monte Carlo files from HepSim, build anti-kT jets using FastJet, and fill ROOT histograms. Download the hepsim-cpp package and compile it:

wget http://atlaswww.hep.anl.gov/asc/hepsim/soft/hepsim-cpp.tgz -O - | tar -xz;
cd hepsim-cpp/; make 

Read the file “README” inside this directory.

Analyzing RECO data

HepSim includes data after fast and full detector simulations. There are several methods to analyse such files.

Analyzing Delphes ROOT files (fast simulation)

The Delphes ROOT files are typically posted using the reconstruction tag “rfast[XXX]”, where “[XXX]” is a number. Use the search to find such samples. Read the Delphes documentation to learn how to read Delphes ROOT files.

You can find all samples that contain fast simulations using this link.

Full simulation: LCIO files

Events after detector simulation and reconstruction (“RECO”) are posted under the tag “rfull[XXX]”, where “[XXX]” is a number. We use the LCIO file format, which is readable by C++, Fortran and Java. Such files have the extension “slcio”. You can analyse the LCIO files using the Jas4pp program, which allows you to read files using the Python syntax.

If you need to read LCIO files in C++ code with ROOT/FastJet, use the example package https://github.com/chekanov/HepSim.

You can find all samples that contain full simulations using this link.

Conversion to ROOT, HEPMC, HEPEVT, LHE, STDHEP, LCIO

One can convert a ProMC file to ROOT to look at branches. If the ProMC package is installed, run the converter:

cp -rf $PROMC/examples/promc2root .
cd promc2root
make
./promc2root [promc file] output.root

The output file will contain ROOT branches with px, py, pz, e, etc.

One can also convert ProMC to HEPMC using the example

$PROMC/examples/promc2hepmc

(see the ProMC manual). In addition, the directory

$PROMC/examples/ 

has examples showing how to convert ProMC to HEPEVT records (promc2hepevnt), STDHEP (promc2stdhep), LHE (promc2lhe) and LCIO (promc2lcio).

Anomaly detection with HepSim

You can use ADFilter to process PROMC and DELPHES files to check how anomalous your events are. In addition, ADFilter can convert LHE files (extension *.lhe.gz) into PROMC files using the web interface.

Comparing with HepData

The Durham HepData database maintains “DMelt” scripts compatible with HepSim analysis scripts, so it is relatively easy to overlay Monte Carlo predictions and data from published articles. For example, look at the link AAD 2013 from HepData and download a “DMelt” Jython script with published data. You can run this script inside the DMelt IDE, or using the “hs-run” and “hs-ide” commands from hs-toolkit. Then one can combine this script with a HepSim script (from the “Info” description) that runs over HepSim Monte Carlo data, creating plots that show the agreement between data and theoretical calculations.

Sergei Chekanov 2017/02/06 17:25