Install the HepSim software toolkit using the “bash” shell on Linux/Mac:
bash # set bash if you haven't done this before
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh
This creates the directory “hs-toolkit” with HepSim commands. You can also download it as hs-toolkit.tgz. Note that Java 8 or above should be installed. You can view the commands from the bash shell by typing:
bash> hs-help
The directory contains several bash scripts for Linux/Mac, and Windows batch (BAT) files to process events on Windows. The package is used to download, view and analyze truth-level events in the ProMC or ProIO format.
Let us show how to find the files associated with a given Monte Carlo event sample. Go to the HepSim database and find the “Files” column. It shows the URL of the truth-level files (“EVGEN”), i.e. files created directly by event generators.
You can use the command line tool to list files associated with a dataset as:
hs-ls [name]
where [name] is the dataset name. One can also use the URL of the Info page instead, or the URL of the location of all files. This command shows a table with file names.
Here is an example illustrating how to list all files from the Higgs to ttbar Monte Carlo sample:
hs-ls tev100pp_higgs_ttbar_mg5
(“tev” defines the energy unit, “pp” means pp collisions, “mg5” means MadGraph5). Similarly, one can use the download URL:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/
Note that with this approach you can use URL mirrors close to your geographical location.
If you need to create a list of all files for download, use this syntax:
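The naming convention described above can be illustrated with a small Python helper. This is only a sketch: the function `parse_dataset_name` and its regular expression are hypothetical and not part of the HepSim toolkit, and they assume that names follow the pattern unit, energy, collision type, underscore, remainder, as in the two examples on this page. Real datasets should always be looked up with `hs-ls` or `hs-find`.

```python
import re

# Assumed pattern: <unit><energy><collision>_<rest>, e.g. "tev100pp_higgs_ttbar_mg5".
# This convention is inferred from the examples on this page only.
_NAME_RE = re.compile(r"^([a-z]+?)(\d+)([a-z]+)_(.+)$")


def parse_dataset_name(name):
    """Split a dataset name into its parts (illustrative helper, not a HepSim API)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("name does not follow the assumed convention: %r" % name)
    unit, energy, collision, rest = match.groups()
    return {
        "unit": unit,            # energy unit, e.g. "tev"
        "energy": int(energy),   # centre-of-mass energy in that unit
        "collision": collision,  # colliding particles, e.g. "pp"
        "rest": rest,            # process and generator part of the name
    }


info = parse_dataset_name("tev100pp_higgs_ttbar_mg5")
print(info)
```

The remainder of the name is kept as a single string, because the position of the generator label (such as “mg5”) is not fixed in every dataset name.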
hs-ls [name] simple > input.list # make list of ProMC files (without URL path)
hs-ls [name] simple-url > input_url.list # make list with URL from the main server
where [name] is the name of the dataset. You can also use a URL if you want to create a list of files from certain (mirror) servers.
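If a plain list of file names has been saved this way, it can be turned into full URLs for a chosen mirror with a few lines of Python. The sketch below assumes that the list contains one file name per line; the helper names and the file names in the usage example are hypothetical and serve only as an illustration.

```python
def read_file_list(text):
    """Return the non-empty lines of a list file, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def with_base_url(base_url, names):
    """Prefix every file name with a base URL (for example, a mirror location)."""
    base = base_url.rstrip("/") + "/"
    return [base + name for name in names]


# Hypothetical content of "input.list" (one file name per line is an assumption).
example_list = "sample_001.promc\nsample_002.promc\n"
mirror = "https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/"

for url in with_base_url(mirror, read_file_list(example_list)):
    print(url)
```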
The best method to find the needed sample is to use the web page with database search.
Enter “rfull” in the search field, and you will see all samples with full simulation tags. Enter “rfast”, and you will see samples with fast simulations. If you search for “higgs”, you will see the list of samples for Higgs. You can also use complex searches, such as “pythia%rfast001” (Pythia sample after a fast simulation with the tag 001).
You can also use convenient URL links that search for some datasets. Here are a few examples:
If you prefer the command-line approach, you can find the URL that corresponds to a dataset using this command:
hs-find [search word]
The search is performed using names of datasets or Monte Carlo models, or in the file description. For example, to find all URL locations that correspond to simulated samples with Higgs, try this:
hs-find higgs
If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:
hs-find pythia%rfast001
It will search for Pythia samples after a fast detector simulation with the tag “001”. To search for a full detector simulation, replace “rfast” with “rfull”.
One can download all EVGEN files for a given dataset as:
hs-get [name] [OUTPUT_DIR]
where [name] is the dataset name. This also can be the URL of the Info page, or a direct URL pointing to the locations of ProMC files. This example downloads all files from the “tev100pp_higgs_ttbar_mg5” dataset to the directory “data”:
hs-get tev100pp_higgs_ttbar_mg5 data
Alternatively, this example downloads the same files using the URL of the Info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
Or, if you know the download URL, use it:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the “tev100pp_higgs_ttbar_mg5” event sample.
One can add an integer value to the end of this command which specifies the number of threads. If a second integer is given, it will be used to set the maximum number of files to download. This example will download 10 files in 3 threads from the dataset “tev100pp_higgs_ttbar_mg5”.
hs-get tev100pp_higgs_ttbar_mg5 data 3 10
One can also download files that have a certain pattern in their names. If a URL contains files generated with different pT cuts, the names usually have the string “pt”, followed by the pT cut. In this case, one can download such files as:
hs-get tev13pp_higgs_pythia8_ptbins data 3 10 pt100_
where the name is tev13pp_higgs_pythia8_ptbins.
The command downloads files to the “data” directory in 3 threads. The maximum number of downloaded files is 10, and all file names contain the string “pt100_” (i.e. pT>100 GeV).
The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is either the URL of the Info page, the download URL, or the dataset name.
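Because the optional arguments are positional, a later one (such as the pattern) can only be given when the earlier ones are present. A minimal Python sketch of this rule is shown below. The helper `build_hs_get_args` is hypothetical: it only assembles the argument list in the order documented above and does not run `hs-get` itself.

```python
import shlex


def build_hs_get_args(source, output_dir, threads=None, max_files=None, pattern=None):
    """Assemble the positional arguments of hs-get in the documented order."""
    args = ["hs-get", source, output_dir]
    seen_missing = False
    for value in (threads, max_files, pattern):
        if value is None:
            seen_missing = True
        elif seen_missing:
            # A positional argument cannot follow one that was left out.
            raise ValueError("optional arguments must be given in order")
        else:
            args.append(str(value))
    return args


cmd = build_hs_get_args("tev13pp_higgs_pythia8_ptbins", "data", 3, 10, "pt100_")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where the toolkit is installed and `hs-toolkit/setup.sh` has been sourced.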
Many datasets contain data files after Geant4 detector simulation and reconstruction (“RECO”). The files are in “LCIO” format. They include complete tracks, hits, calorimeter clusters etc. Reconstructed files are stored inside the directories with the tag “rfastNNN” (Delphes fast simulation) or “rfullNNN” (full simulation), where “NNN” is a version number. You can identify detector geometries that correspond to the tags using the detector description page. For example, the tev100pp_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:
hs-ls tev100pp_ttbar_mg5%rfast001 # list ROOT files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
The symbol “%” separates the dataset name (“tev100pp_ttbar_mg5”) from the reconstruction tag (“rfast001”). You can skip “data” in the second example - in this case, data will be copied to the directory “tev100pp_ttbar_mg5%rfast001”.
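The structure of such an argument can be sketched in Python as follows. The helper `split_name_and_tag` is hypothetical and only mirrors the convention described above (“rfastNNN” for fast and “rfullNNN” for full simulation); it is not how the toolkit itself parses the argument.

```python
import re

# Tags are assumed to look like "rfastNNN" or "rfullNNN", as described in the text.
_TAG_RE = re.compile(r"^(rfast|rfull)(\d+)$")


def split_name_and_tag(spec):
    """Split "name%tag" into the dataset name, the tag, its kind and its version."""
    name, separator, tag = spec.partition("%")
    result = {"name": name, "tag": None, "kind": None, "version": None}
    if not separator:
        return result  # no reconstruction tag: a plain EVGEN dataset name
    result["tag"] = tag
    match = _TAG_RE.match(tag)
    if match is not None:
        result["kind"] = "fast" if match.group(1) == "rfast" else "full"
        result["version"] = match.group(2)  # kept as a string to preserve leading zeros
    return result


print(split_name_and_tag("tev100pp_ttbar_mg5%rfast001"))
```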
One can also download files in several threads. If you want to download 10 files in 3 threads, run:
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
As before, one can also download the files using the URL:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
One can check file consistency and print additional information as:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc
The last argument can be a file location on the local disk (works faster than URL!). The output of the above command will look something like this:
You can see that this file includes a complete logfile (“logfile.txt”). We will explain how to extract it later.
One can print separate EVGEN events using the above command after passing an integer argument that specifies the event to be looked at. This command prints the event number “100”:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc 100
More conveniently, one can open the file in a GUI mode to look at all events:
hs-view [promc file]
This command brings up a GUI window to look at separate events. You should forward X11 to see the GUI. For Windows: download the file hepsim.jar and click on it. Then open the file as [File]-[Open file].
You can also view EVGEN events without downloading files. Simply pass a URL to the above command and you will stream Monte Carlo events:
hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc
Here we looked at one file of the Pythia8 (QCD) sample. Files with NLO predictions will be identified automatically: for such files, you will see a few particles per event and weights for calculations of PDF uncertainties.
Each ProMC/ProIO file includes a logfile from the Monte Carlo generator. Show this file on the screen as:
hs-log [file]
where [file] is either a ProMC or ProIO file (you can use URL instead of the full path on the local computer).
In the case of ProMC files, one can use the standard Linux commands, such as “unzip”:
unzip -p [promc file] logfile.txt
where [promc file] is the file name. This command extracts a logfile with original generator-level information. The next command shows the actual number of stored events:
unzip -p [promc file] promc_nevents
This command lists the stored events (each event is a ProtoBuffer binary file):
unzip -l [promc file]
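Since the “unzip” commands above work on ProMC files, the same entries can in principle be read with the Python standard library module `zipfile`. The sketch below assumes that “promc_nevents” stores the event count as plain text. The function `make_standin_archive` is hypothetical: it builds a tiny in-memory archive that only mimics the two entry names, so the example can be tried without a real ProMC file.

```python
import io
import zipfile


def read_logfile(promc):
    """Return the generator logfile stored in the archive as text."""
    with zipfile.ZipFile(promc) as archive:
        return archive.read("logfile.txt").decode("utf-8", errors="replace")


def read_nevents(promc):
    """Return the number of stored events (assumes a plain-text number)."""
    with zipfile.ZipFile(promc) as archive:
        return int(archive.read("promc_nevents").decode("ascii").strip())


def count_entries(promc):
    """Return the number of entries in the archive, similar to "unzip -l"."""
    with zipfile.ZipFile(promc) as archive:
        return len(archive.namelist())


def make_standin_archive():
    """Build a small in-memory zip that mimics the entry names (not a real ProMC file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("logfile.txt", "stand-in generator log\n")
        archive.writestr("promc_nevents", "3\n")
    buffer.seek(0)
    return buffer


demo = make_standin_archive()
print(read_nevents(demo))
print(read_logfile(demo))
```

For a downloaded file, pass its path instead of the stand-in object, for example `read_nevents("data/file.promc")`, where the path is a placeholder.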
One can mix events from a signal ProMC file with inelastic (minbias) events using the “pileup mixing” command:
hs-pileup pN signal.promc minbias.promc output.promc
Here “p” indicates that events from “minbias.promc” will be mixed with every event from “signal.promc” using a Poisson distribution with the mean “N”. If “p” before N is not given, then exactly N (random) events from minbias.promc will be added to every event from “signal.promc”. Use large numbers of events in “minbias.promc” to minimise reuse of the same events. The barcode of particles inside “output.promc” indicates the event origin (0 is set for particles from “signal.promc”).
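The difference between the two modes can be illustrated with a short Python sketch. It is not the implementation used by hs-pileup: the helper names are hypothetical, the Poisson sampler is Knuth's simple multiplication method (adequate for moderate means only), and drawing minbias events with replacement is an assumption.

```python
import math
import random


def parse_pileup_arg(arg):
    """Interpret "pN" as Poisson with mean N, and "N" as exactly N events."""
    if arg.startswith("p"):
        return True, float(arg[1:])
    return False, int(arg)


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def pileup_indices(arg, n_minbias, rng):
    """Pick indices of minbias events to add to one signal event."""
    poisson, n = parse_pileup_arg(arg)
    count = sample_poisson(n, rng) if poisson else n
    # Assumption: events are drawn at random with replacement.
    return [rng.randrange(n_minbias) for _ in range(count)]


rng = random.Random(12345)
print(len(pileup_indices("20", 100000, rng)))   # fixed mode: always 20 events
print(len(pileup_indices("p20", 100000, rng)))  # Poisson mode: varies around 20
```

With a small minbias sample the same index is drawn repeatedly, which is why a large “minbias.promc” file is recommended.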
One can analyse Monte Carlo events on Windows, Linux and Mac with Java 7/8. Many HepSim samples include *.py scripts to calculate differential cross sections. One can run validation scripts from the Web using Java Web Start. One can also run scripts on a desktop while streaming data over the network, or using downloaded files (in which case you pass the directory with *.promc files as an argument). Here are a few approaches showing how to run *.py scripts:
Many “Info” pages of HepSim have Jython (Python) scripts for validation and analysis. One can run such scripts from the web browsers using the Java Web Start technology. Click the “Launch” button. You will see an editor. Then click the “Run” button to process events.
To use Java Web Start, you should configure Java permissions: For Linux/Mac, run “ControlPanel”, go to the “Security” tab and add “http://atlaswww.hep.anl.gov” to the exception list. For Windows, find “Java Control Panel” and do the same. Read Why are Java applications blocked by your security settings. In addition, if you are a Mac user, you should allow execution of programs outside Mac App Store.
In this example, we will run a Python (to be more exact, Jython) script and, at the same time, stream data from the web. Find a HepSim event sample by clicking the “Info” column. For example, look at a ttbar sample from MadGraph: ttbar_mg5. Find the URL of the analysis script (“ttbar_mg5.py”) located at the bottom. Copy it to some folder, or use “wget”:
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
Then process this code in a batch mode as:
hs-run ttbar_mg5.py
If you do not want a pop-up canvas with the output histogram, change the line 71 to “c1.visible(0)” (or “c1.visible(False)”) and add “sys.exit(0)” at the very end of the “ttbar_mg5.py” macro.
One can view, edit and run the analysis file using a simple GUI editor.
hs-ide ttbar_mg5.py
It opens this file for editing. One can run it by clicking on the “run” button. It also provides an interactive Jython shell.
When possible, use the downloaded ProMC files rather than streaming the data over the network. The calculations will run faster since the program reads local files. Let us assume that we put all ProMC files in the directory “data”. Then run the script as:
hs-run ttbar_mg5.py data
Here is a complete example: we download data to the directory “ttbar_mg5”, then we download the analysis script, and then we run this script over the local data using 10000 events:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
hs-run ttbar_mg5.py ttbar_mg5 10000
The above example has some limitations since it uses a rather simple editor. Another approach is to use the full-featured Jas4pp or DataMelt programs, which give more flexibility.
wget -O dmelt.zip http://jwork.org/dmelt/download/current.php;
unzip dmelt.zip;
./dmelt/dmelt_batch.sh ttbar_mg5.py
You can also pass the URL of the data as an argument and limit the calculation to 10000 events:
./dmelt/dmelt_batch.sh ttbar_mg5.py https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/ 10000
As before, it is better to run in batch mode over downloaded ProMC files. Let us assume that we put all ProMC files in the directory “data”. Then run DataMelt over the data as:
./dmelt/dmelt_batch.sh ttbar_mg5.py data
Here is a complete example: we download data to the directory “ttbar_mg5”, then we download the analysis script, and then we run this script over the local data using 10000 events:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
./dmelt/dmelt_batch.sh ttbar_mg5.py ttbar_mg5 10000
Then click “run” (or [F8]). One can also start DataMelt without input files:
./dmelt/dmelt.sh
on Linux/Mac. On Windows, run “dmelt.bat” instead. You will see the DataMelt IDE. Locate the URL of the analysis script, such as ttbar_mg5 (it can be found under the “Info” link). Then copy this link using the right mouse button (“Copy URL Location”). Next, in the DMelt menu, go to “File”→“Read script from URL”. Paste the URL of the *.py file into the pop-up DataMelt URL dialog and click “run”. The program will start reading data from the Web. At the end of the run, you will see a pop-up window with a histogram. This method works for the Python/Jython, Java, Ruby, Groovy and BeanShell languages.
Install ProMC and ROOT. Make sure that the environment variables $PROMC and $ROOTSYS are set correctly. Then look at the examples:
$PROMC/examples/reader_mc/ # shows how to read ProMC files from a typical Monte Carlo generator
$PROMC/examples/reader_nlo/ # shows how to read ProMC files with NLO calculations (e.g. MCFM)
$PROMC/examples/promc2root/ # shows how to read ProMC files and create a ROOT tree
For C++/ROOT, you should download the files in advance, since streaming over the network is not supported. There is a simple example showing how to read multiple Monte Carlo files from HepSim, build anti-kT jets using FastJet, and fill ROOT histograms. Download the hepsim-cpp package and compile it:
wget http://atlaswww.hep.anl.gov/asc/hepsim/soft/hepsim-cpp.tgz -O - | tar -xz;
cd hepsim-cpp/;
make
Read the file “README” inside this directory.
HepSim includes data after fast and full detector simulations. There are several methods to analyse such files.
The Delphes ROOT files are typically posted using the reconstruction tag “rfast[XXX]”, where “[XXX]” is a number. Use search to find such samples. Read the Delphes documentation about how to read Delphes ROOT files.
You can find all samples that contain fast simulations using this link.
Events after detector simulation and reconstruction (“RECO”) are posted under the tag “rfull[XXX]”, where “[XXX]” is a number. We use the LCIO file format, which is readable by C++, Fortran and Java. Such files have the extension “slcio”. You can analyse the LCIO files using the Jas4pp program, which allows you to read files using the Python syntax.
If you need to read LCIO files in C++ code with ROOT/FastJet, use the example package https://github.com/chekanov/HepSim.
You can find all samples that contain full simulations using this link.
One can convert a ProMC file to ROOT to look at branches. If the ProMC package is installed, run the converter:
cp -rf $PROMC/examples/promc2root .
cd promc2root
make
./promc2root [promc file] output.root
The output file will contain ROOT branches with px,py,pz,e, etc.
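Quantities such as the transverse momentum, pseudorapidity and invariant mass follow directly from these four components. The Python sketch below uses the standard definitions and a hypothetical helper named `kinematics`; it does not read the ROOT file itself, and the input values are made up for illustration.

```python
import math


def kinematics(px, py, pz, e):
    """Return (pT, pseudorapidity, mass) computed from a four-momentum."""
    pt = math.hypot(px, py)
    p_squared = px * px + py * py + pz * pz
    # Guard against small negative values caused by rounding.
    mass = math.sqrt(max(e * e - p_squared, 0.0))
    if pt > 0.0:
        eta = math.asinh(pz / pt)
    else:
        # Particle along the beam axis: pseudorapidity is infinite.
        eta = math.copysign(math.inf, pz)
    return pt, eta, mass


pt, eta, mass = kinematics(3.0, 4.0, 0.0, 13.0)
print(pt, eta, mass)
```

The results are in the same units as the input components.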
One can also convert ProMC to HEPMC using the example
$PROMC/examples/promc2hepmc
(see the ProMC manual). In addition, the directory
$PROMC/examples/
has examples showing how to convert ProMC to HEPEVT records (promc2hepevnt), STDHEP (promc2stdhep), LHE (promc2lhe) and LCIO (promc2lcio).
You can use ADFilter to process PROMC and DELPHES files to check how anomalous your events are. In addition, ADFilter can convert LHE files (extension *.lhe.gz) into PROMC files using the web interface.
The Durham HepData database maintains “DMelt” scripts compatible with HepSim analysis scripts, so it is relatively easy to overlay Monte Carlo predictions and data from published articles. For example, look at the link AAD 2013 from HepData and download a “DMelt” Jython script with published data. You can run this script inside the DMelt IDE, or using the “hs-run” and “hs-ide” commands from hs-tools. Then one can combine this script with a HepSim script (from the “Info” description) that runs over HepSim Monte Carlo data, creating plots that show the agreement between data and theoretical calculations.
— Sergei Chekanov 2017/02/06 17:25