HepSim

Repository with Monte Carlo predictions for HEP experiments

HepSim manual


Please refer to the HepSim Wiki for the most up-to-date manual and for physics/performance studies based on HepSim.

Quick start with HepSim

Install the HepSim software toolkit using the "bash" shell on Linux/Mac:
      wget http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
      source hs-toolkit/setup.sh
This creates the directory "hs-toolkit" with the HepSim commands. Java 7 or above must be installed. You can view the commands by typing:
      hs-help
This package is used to download, view and process HepSim files. To work with full simulation files, use the JAS4PP program.

(1) List available data files

Let us show how to find all files associated with a given Monte Carlo event sample. Go to the HepSim database and find the "File" column. It shows truth-level files ("EVGEN"). Then list the files with:
      hs-ls [name] 
where [name] is the dataset name. One can also use the URL of the Info page instead, or the URL of the location of all files. This command shows a table with file names and their sizes.

Here is an example illustrating how to list all files from the Higgs to ttbar Monte Carlo sample:
      hs-ls tev100_higgs_ttbar_mg5 
Similarly, one can use the download URL:
      hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/ 

(2) Searching for datasets

The best method to find the needed sample is to use the web page with the database search. Enter "rfull" in the search field, and you will see all samples with full simulation tags. Enter "rfast", and you will see samples with fast simulations. If you search for "higgs", you will see the list of Higgs samples. You can also use complex searches, such as "pythia%rfast001" (Pythia samples after a fast simulation with the tag 001).

You can also use convenient URL links that search for particular datasets. Here are a few examples:

1) http://atlaswww.hep.anl.gov/hepsim/list.php?find=rfull - lists MC samples after full simulation

2) http://atlaswww.hep.anl.gov/hepsim/list.php?find=rfast - lists MC samples after fast simulation

3) http://atlaswww.hep.anl.gov/hepsim/list.php?find=mg5 - lists all Madgraph5 samples

4) http://atlaswww.hep.anl.gov/hepsim/list.php?find=higgs%rfast - lists Higgs samples after fast simulation
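The search links above all follow the same pattern: the search page, then "?find=", then a search word that may be followed by "%" and a tag. A minimal Python sketch that builds such links is shown here. The function name search_url is hypothetical, and the "%" is written literally because that is how the example links write it.

```python
# Base URL of the dataset search page, as used in the example links above.
SEARCH_PAGE = "http://atlaswww.hep.anl.gov/hepsim/list.php"


def search_url(word, tag=None):
    """Return a search link for `word`, optionally restricted to a tag.

    The "%" separator is written literally, exactly as in the example
    links; no assumption is made about encoded forms of the separator.
    """
    query = word if tag is None else f"{word}%{tag}"
    return f"{SEARCH_PAGE}?find={query}"


if __name__ == "__main__":
    print(search_url("mg5"))             # all Madgraph5 samples
    print(search_url("higgs", "rfast"))  # Higgs samples after fast simulation
```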

If you prefer the command-line approach, you can find the URL that corresponds to a dataset using this command:
      hs-find [search word] 
The search is performed using names of datasets or Monte Carlo models, or in the file description. For example, to find all URL locations that correspond to simulated samples with Higgs, try this:
      hs-find higgs 
If you are interested in a specific reconstruction tag, use "%" to separate the search string and the tag name. Example:
     hs-find pythia%rfast001 
It will search for Pythia samples after a fast detector simulation with the tag "001". To search for a full detector simulation, replace "rfast" with "rfull".

(3) Downloading truth-level files

One can download all files for a given dataset as:
      hs-get [name] [OUTPUT_DIR]
where [name] is the dataset name. This also can be the URL of the Info page, or a direct URL pointing to the locations of ProMC files. This example downloads all files from the "tev100_higgs_ttbar_mg5" dataset to the directory "data":
      hs-get tev100_higgs_ttbar_mg5 data
Alternatively, this example downloads the same files using the URL of the Info page:
      hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 
Or, if you know the download URL, use it:
      hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the "tev100_higgs_ttbar_mg5" event sample.

One can add an integer value to the end of this command to specify the number of threads. If a second integer is given, it sets the maximum number of files to download. This example downloads 10 files in 3 threads from the dataset "tev100_higgs_ttbar_mg5":
      hs-get tev100_higgs_ttbar_mg5 data 3 10                       
One can also download files that have a certain pattern in their names. If a URL contains files generated with different pT cuts, the file names usually have the substring "pt", followed by the pT cut. In this case, one can download such files as:
      hs-get tev13_higgs_pythia8_ptbins  data 3 10 pt100_
where the dataset name is tev13_higgs_pythia8_ptbins. The command downloads files to the "data" directory in 3 threads. The maximum number of downloaded files is 10, and all file names contain the "pt100_" substring (i.e. pT>100 GeV).

The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:
     [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is the URL of the Info page, the download URL, or the dataset name.
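To make the order of the positional arguments explicit, here is a small Python sketch that maps an hs-get argument list to named fields. It only mirrors the usage line above. The helper parse_hs_get_args and its field names are hypothetical and are not taken from the hs-get source.

```python
def parse_hs_get_args(args):
    """Map the positional arguments of hs-get to named fields.

    `args` excludes the command name itself. This is a hypothetical
    helper that mirrors the usage line; it is not the hs-get source.
    """
    if not 2 <= len(args) <= 5:
        raise ValueError("hs-get expects 2, 3, 4 or 5 arguments")
    parsed = {
        "source": args[0],      # dataset name, Info-page URL or download URL
        "output_dir": args[1],
        "threads": None,        # None means "not given on the command line"
        "max_files": None,
        "pattern": None,
    }
    if len(args) >= 3:
        parsed["threads"] = int(args[2])
    if len(args) >= 4:
        parsed["max_files"] = int(args[3])
    if len(args) == 5:
        parsed["pattern"] = args[4]
    return parsed


if __name__ == "__main__":
    command = "tev13_higgs_pythia8_ptbins data 3 10 pt100_"
    print(parse_hs_get_args(command.split()))
```

The pattern always comes last, so giving a pattern also requires giving the number of threads and the number of files.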

(4) Downloading files after detector simulation

Some datasets contain data files after detector simulations. Reconstructed files are stored inside directories with the tag "rfastNNN" (fast simulation) or "rfullNNN" (full simulation), where "NNN" is a version number. For example, the tev100_ttbar_mg5 sample includes the link "rfast001" (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag "rfast001", use this syntax:
      hs-ls  tev100_ttbar_mg5%rfast001      # list ROOT files with the tag "rfast001" 
      hs-get tev100_ttbar_mg5%rfast001 data # download to the "data" directory 
The symbol "%" separates the dataset name ("tev100_ttbar_mg5") from the reconstruction tag ("rfast001"). You can skip "data" in the second example - in this case, data will be copied to the directory "tev100_ttbar_mg5%rfast001".
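The "name%tag" convention can be illustrated with a short Python sketch that splits such a specifier into the dataset name, the simulation type and the version number. The function split_dataset_spec is hypothetical. The tag pattern follows the "rfastNNN"/"rfullNNN" naming described above.

```python
import re

# Tags look like "rfast001" or "rfull001": a simulation type plus a version.
TAG_PATTERN = re.compile(r"^r(fast|full)(\d+)$")


def split_dataset_spec(spec):
    """Split "name%tag" into (name, kind, version).

    kind is "fast" or "full"; kind and version are both None when the
    specifier carries no reconstruction tag (truth-level files).
    """
    name, sep, tag = spec.partition("%")
    if not sep:
        return name, None, None
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognised reconstruction tag: {tag!r}")
    return name, match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_dataset_spec("tev100_ttbar_mg5%rfast001"))
    print(split_dataset_spec("tev100_ttbar_mg5"))
```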

One can also download files in several threads. If you want to download 10 files in 3 threads, run:
      hs-get  tev100_ttbar_mg5%rfast001 data 3 10
As before, one can also download the files using the URL:
      hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
      hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data

(5) File validation

One can check file consistency and print additional information as:
      hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc
The argument can also be a file on the local disk (this works faster than a URL). The output of the above command is:
   ProMC version = 2
   Last modified = 2013-06-05 16:32:18
   Description   = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;
                ParticleDecays:limitTau0 = on;
                ParticleDecays:tau0Max = 10;HiggsSM:all = on;
   Events        = 10000
   Sigma    (pb) = 2.72474E1 ± 1.92589E-1
   Lumi   (pb-1) = 3.67007E2
   Varint units  = E:100000 L:1000
   Log file:     = logfile.txt
   The file was validated. Exit.
You can see that this file includes a complete logfile ("logfile.txt"). We will explain how to extract it later.
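The "Lumi" value in this output is consistent with the usual relation between event count and cross section, L = N / sigma. The following Python sketch redoes this arithmetic with the numbers printed above. It assumes that hs-info derives the luminosity in this way, which the output suggests but the manual does not state.

```python
# Values copied from the hs-info output above.
events = 10000
sigma_pb = 2.72474e1        # cross section in pb
sigma_err_pb = 1.92589e-1   # its uncertainty in pb

# Integrated luminosity that corresponds to the sample: L = N / sigma.
lumi_invpb = events / sigma_pb

# Relative uncertainty of the cross section, which carries over to any
# histogram normalised with this cross section.
rel_err = sigma_err_pb / sigma_pb

print(f"Lumi (pb-1)    = {lumi_invpb:.5E}")
print(f"Sigma rel. err = {rel_err:.2%}")
```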

(6) Looking at separate events

One can print separate events using the above command after passing an integer argument that specifies the event to be looked at. This command prints the event number "100":
      hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc 100 
More conveniently, one can open the file in a GUI mode to look at all events:
      hs-view [promc file]              
This command brings up a GUI window to look at separate events. You should forward X11 to see the GUI. On Windows, download the file hepsim.jar and click on it. Then open the file using [File]-[Open file]. See the HepSim wiki for details (hepsim.jar is similar to browser_promc.jar).

You can also view events without downloading files. Simply pass a URL to the above command and you will stream Monte Carlo events:
      hs-view http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/qcd/pythia8/pythia8_qcd_1.promc
Here we looked at one file of the Pythia8 (QCD) sample. Files with NLO predictions are identified automatically: for such files, you will see a few particles per event and weights for calculating PDF uncertainties.

(7) Monte Carlo logfile

You can work with ProMC using the standard Linux commands, such as "unzip":
      unzip -p [promc file]  logfile.txt
where [promc file] is the file name. This command extracts a logfile with original generator-level information. The next command shows the actual number of stored events:
      unzip -p [promc file]  promc_nevents
This command lists the stored events (each event is a ProtoBuffer binary file):
      unzip -l [promc file]
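Since "unzip" works on these files, the same entries can also be read with Python's standard zipfile module. The sketch below shows the idea on a small in-memory archive built for the purpose. The archive only mimics the entry names "logfile.txt" and "promc_nevents" mentioned above. Its contents are placeholders, and the event entry names "0" and "1" are assumptions made for illustration.

```python
import io
import zipfile


def read_promc_entry(archive, name):
    """Return the raw bytes of one named entry, like `unzip -p`."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)


def list_promc_entries(archive):
    """Return all entry names, like the name column of `unzip -l`."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def make_mock_archive():
    """Build a tiny in-memory ZIP that only mimics the entry names.

    The contents are placeholders, not real ProMC records; the event
    entry names "0" and "1" are an assumption made for this sketch.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("0", b"placeholder event")
        zf.writestr("1", b"placeholder event")
        zf.writestr("logfile.txt", "generator log goes here\n")
        zf.writestr("promc_nevents", "2")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    mock = make_mock_archive()
    print(read_promc_entry(mock, "logfile.txt").decode())
    print(int(read_promc_entry(mock, "promc_nevents")))
    print(list_promc_entries(mock))
```

For a downloaded file, pass its path instead of the mock buffer to the two reader functions.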

(8) Pileup mixing

One can mix events from a signal ProMC file with inelastic (minbias) events using the "pileup mixing" command:
      hs-pileup pN signal.promc minbias.promc output.promc
Here "p" indicates that events from "minbias.promc" will be mixed with every event from "signal.promc" using a Poisson distribution with the mean "N". If "p" before N is not given, then exactly N (random) events from "minbias.promc" will be added to every event from "signal.promc". Use a large number of events in "minbias.promc" to minimise reuse of the same events. The barcode of particles inside "output.promc" indicates the event origin (0 is set for particles from "signal.promc"). See the example in the Pileup Mixing section of the wiki.
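The two mixing modes can be sketched in a few lines of Python. This is only an illustration of the "pN" versus "N" logic described above and not the hs-pileup implementation. The function names are hypothetical, and drawing minbias events with replacement is an assumption of the sketch.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method).

    Adequate for modest means; exp(-mean) underflows to zero for means
    of several hundred or more, so use another sampler in that regime.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pileup_indices(spec, n_minbias, rng):
    """Choose which minbias events to overlay on one signal event.

    spec is "pN" (Poisson-distributed count with mean N) or "N" (exactly N).
    Events are drawn with replacement, which is an assumption of this sketch.
    """
    if spec.startswith("p"):
        count = poisson(float(spec[1:]), rng)
    else:
        count = int(spec)
    return [rng.randrange(n_minbias) for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(12345)
    print(pileup_indices("3", 1000, rng))    # always three indices
    print(pileup_indices("p3", 1000, rng))   # three indices on average
```

The sketch also shows why a large minbias file helps: with few minbias events, the same index is drawn repeatedly.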

(9) Processing with a fast detector simulation

You need to install the ProMC C++ package and then Delphes. The easiest way is to use the FastHepSim package as explained in the HepSim wiki, since it includes everything you need (Delphes, ProMC and input cards). Otherwise, install everything manually.

Here is a Delphes command to create a ROOT file with the fast detector simulation using "delphes.tcl" card:
      DelphesProMC delphes.tcl output.root [promc file] 
If [promc file] is slimmed, remove the line “TauTagging” from delphes_card.tcl to avoid a crash. If you want to run over multiple ProMC files without manual download, use this command:
      hs-exec DelphesProMC delphes.tcl output.root [URL] [Nfiles]  
where [URL] is the HepSim location of the files and [Nfiles] is the number of files to process. The output ROOT file will be located inside the "hepsim_output" directory. Here is a small example:
      hs-exec DelphesProMC delphes.tcl output.root  http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 5
which processes 5 files from the Higgs to ttbar sample. Omit "5" at the end to process all files.

(10) Analysing HepSim truth-level files

One can analyse Monte Carlo events on Windows, Linux and Mac with Java 7 or above. Many HepSim samples include *.py scripts to calculate differential cross sections. One can run validation scripts from the Web using Java Web Start. One can also run scripts on a desktop while streaming data over the network, or using downloaded files (in which case you pass the directory with *.promc files as an argument).

The HepSim Analysis Primer explains how to write validation and analysis programs. Here are a few approaches showing how to run *.py scripts:

(a) Using Java Web start

Many "Info" pages of HepSim have Jython (Python) scripts for validation and analysis. One can run such scripts from the web browsers using the Java Web Start technology. Click the "Launch" button. You will see an editor. Then click the "Run" button to process events.

To use Java Web Start, you should configure Java permissions. On Linux/Mac, run "ControlPanel", go to the "Security" tab and add "http://atlaswww.hep.anl.gov" to the exception list. On Windows, find "Java Control Panel" and do the same. Read "Why are Java applications blocked by your security settings?". In addition, if you are a Mac user, you should allow execution of programs from outside the Mac App Store.

(b) Using stand-alone Python

In this example, we will run a Python (to be more exact, Jython) script and, at the same time, stream data from the web. Find a HepSim event sample by clicking a link in the "Info" column. For example, look at a ttbar sample from Madgraph: ttbar_mg5. Find the URL of the analysis script ("ttbar_mg5.py") located at the bottom of the page. Copy it to some folder, or use "wget":
      wget http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
Then process this code in a batch mode as:
      hs-run ttbar_mg5.py
If you do not want a pop-up canvas with the output histogram, change line 71 to "c1.visible(0)" (or "c1.visible(False)") and add "sys.exit(0)" at the very end of the "ttbar_mg5.py" macro.

One can view, edit and run the analysis file using a simple GUI editor:
      hs-ide ttbar_mg5.py
It opens this file for editing. One can run it by clicking on the "run" button. It also provides an interactive Jython shell.

When possible, use downloaded ProMC files rather than streaming the data over the network. The calculations will run faster since the program reads local files. Let us assume that we put all ProMC files into the directory "data". Then run the script as:
      hs-run ttbar_mg5.py data
Here is a complete example: we download data to the directory "ttbar_mg5", then we download the analysis script, and then we run this script over the local data using 10000 events:
      hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
      wget http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
      hs-run ttbar_mg5.py ttbar_mg5 10000

(c) Using a Java IDE

The above example has some limitations since it uses a rather simple editor. Another approach is to use the full-featured Jas4pp or DataMelt programs, which give more flexibility:
      wget -O dmelt.zip http://jwork.org/dmelt/download/current.php;
      unzip dmelt.zip; 
      ./dmelt/dmelt_batch.sh ttbar_mg5.py 
You can also pass the URL with the data as an argument and limit the calculation to 10000 events:
      ./dmelt/dmelt_batch.sh ttbar_mg5.py http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/ 10000 

As before, the batch mode can use downloaded ProMC files. Let us assume that we put all ProMC files into the directory "data". Then run DataMelt over the data as:
      ./dmelt/dmelt_batch.sh ttbar_mg5.py data 
Here is a complete example: we download data to the directory "ttbar_mg5", then we download the analysis script, and then we run this script over the local data using 10000 events:
      hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
      wget http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
      ./dmelt/dmelt_batch.sh ttbar_mg5.py ttbar_mg5 10000 
Then click "run" (or [F8]). One can also start DataMelt without input files:
      ./dmelt/dmelt.sh
on Linux/Mac. On Windows, run "dmelt.bat" instead. You will see the DataMelt IDE. Locate the URL of the analysis script, such as ttbar_mg5 (it can be found under the "Info" link). Then copy this link using the right mouse button ("Copy URL Location"). Next, in the DataMelt (formerly ScaVis) menu, go to "File"→"Read script from URL". Paste the URL of the *.py file into the pop-up DataMelt URL dialog and click "run". The program will start reading data from the Web. At the end of the run, you will see a pop-up window with a histogram. This method works for the Python/Jython, Java, Ruby, Groovy and BeanShell languages.

(d) Using C++/ROOT

Install ProMC and ROOT. Make sure that the environment variables $PROMC and $ROOTSYS are set correctly. Then look at the examples:
  $PROMC/examples/reader_mc/  - shows how to read ProMC files from a typical Monte Carlo generator
  $PROMC/examples/reader_nlo/ - shows how to read ProMC files with NLO calculations (e.g. MCFM)
  $PROMC/examples/promc2root/ - shows how to read ProMC files and create a ROOT tree
One can read ProMC files in C++/ROOT or CPython as explained in the HepSim wiki. For C++/ROOT, you should download the files in advance since streaming over the network is not supported.
There is a simple example showing how to read multiple Monte Carlo files from HepSim, build anti-kT jets using FastJet, and fill ROOT histograms. Download the hepsim-cpp package and compile it:
    wget http://atlaswww.hep.anl.gov/asc/hepsim/soft/hepsim-cpp.tgz -O - | tar -xz;
    cd hepsim-cpp/; make 
Read the file "README" inside this directory.

(11) Analysing data after detector simulations

HepSim includes data after fast and full detector simulations. There are several methods to analyse such files.

(a) Analysing Delphes ROOT files (fast simulation)

The Delphes ROOT files are typically posted using the reconstruction tag "rfast[XXX]", where "[XXX]" is a number. Use search to find such samples. Read the Delphes documentation about how to read Delphes ROOT files.

You can find all samples that contain fast simulations using this link: http://atlaswww.hep.anl.gov/hepsim/list.php?find=rfast

(b) Full simulation: LCIO files

The full simulation files are posted under the tag "rfull[XXX]", where "[XXX]" is a number. We use the LCIO file format, which is readable by C++, Fortran and Java. Such files have the extension "slcio". You can analyse LCIO files using the Jas4pp program, which allows you to read files using the Python syntax.

You can find all samples that contain full simulations using this link: http://atlaswww.hep.anl.gov/hepsim/list.php?find=rfull

(12) Conversion to ROOT, HEPMC, HEPEVT, LHE, STDHEP, LCIO

One can convert a ProMC file to ROOT to look at its branches. If the ProMC package is installed, run the converter:
      cp -rf $PROMC/examples/promc2root .
      cd promc2root
      make
      ./promc2root [promc file] output.root
The output file will contain ROOT branches with px,py,pz,e, etc.
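Given the four-momentum components px, py, pz and e of a particle, derived quantities follow from standard kinematics. The Python sketch below computes the transverse momentum and the invariant mass from such values. It uses plain numbers rather than reading a ROOT file, and the function names are hypothetical.

```python
import math


def transverse_momentum(px, py):
    """pT = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)


def invariant_mass(px, py, pz, e):
    """m = sqrt(e^2 - |p|^2), clipped at zero against rounding errors."""
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))


if __name__ == "__main__":
    # Illustrative four-momentum in arbitrary but consistent units.
    px, py, pz, e = 3.0, 4.0, 0.0, 13.0
    print(transverse_momentum(px, py))
    print(invariant_mass(px, py, pz, e))
```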

One can also convert ProMC to HEPMC using the example $PROMC/examples/promc2hepmc (see the ProMC manual). In addition, the directory $PROMC/examples/ has examples showing how to convert ProMC to HEPEVT records (promc2hepevnt), STDHEP (promc2stdhep), LHE (promc2lhe) and LCIO (promc2lcio).

(13) Comparing with data

The Durham HepData database maintains "DMelt" scripts compatible with HepSim analysis scripts, so it is relatively easy to overlay Monte Carlo predictions and data from published articles. For example, look at the link AAD 2013 from HepData and download a "DMelt" Jython script with published data. You can run this script inside the DMelt IDE, or using the "hs-run" and "hs-ide" commands from hs-toolkit. One can then combine this script with a HepSim script (from the "Info" description) that runs over HepSim Monte Carlo data, creating plots that show the agreement between data and theoretical calculations.

2013-2016. S. Chekanov (ANL)

HEP.ANL.GOV