{{indexmenu_n>1}}
[[:|<< back to HepSim manual]]
====== Quick start ======
Install the HepSim software toolkit using the "bash" shell on Linux/Mac:
bash # switch to the bash shell if it is not already your default shell
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh
This creates the directory "hs-toolkit" with the HepSim commands. You can also download the package directly as [[https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | hs-toolkit.tgz]]. Note that
[[https://www.java.com/en/download/|Java 8]] or later must be installed.
To list the available commands, type:
bash> hs-help
The directory contains bash scripts for Linux/Mac and Windows batch (BAT) files for processing events on Windows.
The package is used to download, view and analyze truth-level events in the [[https://atlaswww.hep.anl.gov/asc/promc/ | PROMC]] or [[https://github.com/proio-org/ | PROIO]] format.
Use the [[https://atlaswww.hep.anl.gov/asc/jas4pp | JAS4PP program]] to analyse LCIO
(*.slcio) files with Geant4 simulations.
This program can also read truth-level records in the [[https://atlaswww.hep.anl.gov/asc/promc/|PROMC]], [[https://github.com/proio-org/| PROIO]] and [[https://root.cern/|ROOT]] file formats. For ROOT files, you can also use the Delphes/ROOT framework.
====== Finding data files ======
Let us show how to find the files associated with a given Monte Carlo event sample.
Go to [[https://atlaswww.hep.anl.gov/hepsim/|HepSim database]] and find the "Files" column.
It shows the URLs of truth-level files ("EVGEN"), i.e. files created directly by event generators.
To list the files associated with a dataset from the command line, use:
hs-ls [name]
where [name] is the dataset name. You can also use the URL of the Info page,
or the URL of the directory that contains all files. The command prints a table with the file names.
Here is an example illustrating how to list all files from the
[[http://atlaswww.hep.anl.gov/hepsim/info.php?item=2|Higgs to ttbar]] Monte Carlo sample:
hs-ls tev100pp_higgs_ttbar_mg5
("tev100" denotes a 100 TeV collision energy, "pp" means pp collisions, "mg5" means MadGraph5).
Similarly, one can use the download URL:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/
With this approach, you can use a mirror URL close to your geographical location.
To create a list of all files for downloading, use:
hs-ls [name] simple > input.list # make a list of ProMC files (without the URL path)
hs-ls [name] simple-url > input_url.list # make a list with URLs from the main server
where [name] is the dataset name. You can also pass a URL to create the list of files from a specific (mirror) server.
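If you have a plain list from "hs-ls [name] simple" and want full URLs pointing at a mirror of your choice, the prefixing step can be sketched in Python as below. This assumes the "simple" output contains one file name per line; the script, its function name and its two command-line arguments (list file, base URL) are hypothetical and not part of hs-toolkit.

```python
import sys
from urllib.parse import urljoin


def make_url_list(file_names, base_url):
    """Prefix each non-empty file name with a base (mirror) URL."""
    # urljoin replaces the last path component unless the base ends with "/"
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name.strip()) for name in file_names if name.strip()]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python make_urls.py input.list https://mirror.example/path/
    with open(sys.argv[1]) as list_file:
        for url in make_url_list(list_file, sys.argv[2]):
            print(url)
```

Choosing the base URL yourself is what makes this useful for mirrors: the same "input.list" can be turned into download lists for different servers.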
====== Searching for datasets ======
The best way to find a sample is the web page with
[[https://atlaswww.hep.anl.gov/hepsim/search.php|database search]].
Enter "rfull" in the search field to see all samples with full-simulation tags. Enter "rfast" to
see samples with fast simulations. If you search for "higgs", you will see the list of Higgs samples.
You can also use compound searches, such as "pythia%rfast001" (Pythia samples after a fast simulation with the tag 001).
You can also use URL links that run predefined searches. Here are a few examples:
* [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=rfull]] - lists MC samples after full simulation
* [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=rfast]] - lists MC samples after fast simulation
* [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=mg5]] - lists all MadGraph5 samples
* [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=higgs%rfast]] - lists Higgs samples after fast simulation
If you prefer the command line, you can find the URL that corresponds to a dataset with:
hs-find [search word]
The search matches dataset names, Monte Carlo model names and file descriptions.
For example, to find all URL locations that correspond to simulated samples with Higgs, try this:
hs-find higgs
If you are interested in a specific reconstruction tag, use "%" to separate the search string and the tag name.
Example:
hs-find pythia%rfast001
It will search for Pythia samples after a fast detector simulation with the tag "001". To search for a full detector simulation, replace
"rfast" with "rfull".
====== Downloading EVGEN files ======
One can download all EVGEN files for a given dataset as:
hs-get [name] [OUTPUT_DIR]
where [name] is the dataset name. It can also be the URL of the Info page,
or a direct URL pointing to the location of the ProMC files.
This example downloads all files from the "tev100pp_higgs_ttbar_mg5" dataset
to the directory "data":
hs-get tev100pp_higgs_ttbar_mg5 data
Alternatively, this example downloads the same files using the URL of the Info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
Or, if you know the download URL, use it:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the "tev100pp_higgs_ttbar_mg5" event sample.
An optional integer at the end of the command sets the number of download threads.
If a second integer is given, it sets the maximum number of files to download.
This example downloads 10 files in 3 threads from the dataset "tev100pp_higgs_ttbar_mg5":
hs-get tev100pp_higgs_ttbar_mg5 data 3 10
You can also download only the files whose names contain a certain pattern. If a dataset contains files generated with different pT cuts,
their names usually contain the string "pt" followed by the pT cut. In this case, download such files as:
hs-get tev13pp_higgs_pythia8_ptbins data 3 10 pt100_
where the name is [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=92|tev13pp_higgs_pythia8_ptbins]].
This command downloads files to the "data" directory in 3 threads. At most 10 files are downloaded, and all of their names contain the string "pt100_" (i.e. pT>100 GeV).
The general usage of the **hs-get** command takes 2 to 5 arguments:
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is the Info-page URL, the download URL, or the dataset name.
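The meaning of the optional arguments (number of threads, maximum number of files, name pattern) can be illustrated with a short Python sketch. It is not the hs-get implementation: it assumes the pattern is applied before the file limit, uses a placeholder "fetch" function instead of real HTTP downloads, and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def select_files(names, max_files=None, pattern=None):
    """Keep names containing `pattern` (if given), then cap the list at `max_files`."""
    if pattern:
        names = [n for n in names if pattern in n]
    return names if max_files is None else names[:max_files]


def download_all(names, fetch, threads=1):
    """Call fetch(name) for every file using a pool of worker threads.

    Results come back in the same order as `names`.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, names))


if __name__ == "__main__":
    # Hypothetical file names in the style of a pT-binned dataset
    names = ["sample_pt50_001.promc", "sample_pt100_001.promc", "sample_pt100_002.promc"]
    selected = select_files(names, max_files=10, pattern="pt100_")
    # Stand-in "download" that only reports the name; a real one would fetch over HTTP
    print(download_all(selected, lambda name: "fetched " + name, threads=3))
```

Threads help here because downloads are limited by network latency rather than CPU, so several transfers can overlap.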
====== Downloading RECO files ======
Many datasets also contain files after detector simulation and reconstruction ("RECO").
Full (Geant4) simulations are stored in the "LCIO" format and include complete tracks, hits, calorimeter clusters, etc., while Delphes fast simulations are stored as ROOT files.
Reconstructed files are stored inside the directories with the tag "rfastNNN" (Delphes fast simulation) or "rfullNNN" (full simulation),
where "NNN" is a version number.
You can identify detector geometries that correspond to the tags using [[http://atlaswww.hep.anl.gov/hepsim/detectors.php|detector description page]].
For example, [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|tev100pp_ttbar_mg5]]
sample includes the link "rfast001" (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag "rfast001", use this syntax:
hs-ls tev100pp_ttbar_mg5%rfast001 # list ROOT files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
The symbol "%" separates the dataset name ("tev100pp_ttbar_mg5") from the reconstruction tag ("rfast001").
If you omit "data" in the second command, the files are downloaded to the directory "tev100pp_ttbar_mg5%rfast001".
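The "dataset%tag" convention and the tag names "rfastNNN"/"rfullNNN" are simple enough to handle in your own scripts. The helpers below are a hypothetical Python sketch (hs-get does its own parsing); "output_dir" only mirrors the documented default of using the full "dataset%tag" name when no output directory is given.

```python
import re


def split_dataset(spec):
    """Split 'dataset%tag' into (dataset, tag); tag is None when no '%' is present."""
    name, sep, tag = spec.partition("%")
    return name, (tag if sep else None)


def parse_reco_tag(tag):
    """Map a tag such as 'rfast001' or 'rfull002' to (simulation type, version string)."""
    match = re.fullmatch(r"(rfast|rfull)(\d+)", tag)
    if match is None:
        raise ValueError(f"not a reconstruction tag: {tag!r}")
    kind = "fast (Delphes)" if match.group(1) == "rfast" else "full (Geant4)"
    return kind, match.group(2)


def output_dir(spec, out_dir=None):
    """Use the given directory, or fall back to the full 'dataset%tag' string."""
    return out_dir if out_dir else spec


if __name__ == "__main__":
    name, tag = split_dataset("tev100pp_ttbar_mg5%rfast001")
    print(name, parse_reco_tag(tag), output_dir("tev100pp_ttbar_mg5%rfast001"))
```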
One can also download files in several threads. If you want to download 10 files in 3 threads, run:
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
As before, one can also download the files using the URL:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
====== File validation ======
One can check file consistency and print additional information as:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc
The argument can also be the path to a file on the local disk, which is faster than reading from a URL.
The output of the above command will look something like this:
ProMC version = 2
Last modified = 2013-06-05 16:32:18
Description = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;
ParticleDecays:limitTau0 = on;
ParticleDecays:tau0Max = 10;HiggsSM:all = on;
Events = 10000
Sigma (pb) = 2.72474E1 ± 1.92589E-1
Lumi (pb-1) = 3.67007E2
Varint units = E:100000 L:1000
Log file: = logfile.txt
The file was validated. Exit.
You can see that this file includes a complete logfile ("logfile.txt"). We will explain how to extract it later.
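The "Lumi" value is consistent with the equivalent luminosity of the sample, i.e. the number of events divided by the cross section: L = N/σ = 10000 / 27.2474 pb ≈ 367.007 pb-1. The sketch below parses a few lines of the output shown above (embedded as a string) and repeats this check. The parser is an illustration, not part of hs-toolkit; it simply splits each line at its first "=".

```python
import math

# A few lines of the hs-info output shown above
HS_INFO_OUTPUT = """\
ProMC version = 2
Events = 10000
Sigma (pb) = 2.72474E1 ± 1.92589E-1
Lumi (pb-1) = 3.67007E2
"""


def parse_info(text):
    """Turn 'key = value' lines into a dict, splitting each line at its first '='."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


info = parse_info(HS_INFO_OUTPUT)
n_events = int(info["Events"])
sigma_pb = float(info["Sigma (pb)"].split("±")[0])  # central value; uncertainty dropped
lumi = n_events / sigma_pb                           # L = N / sigma, in pb^-1
reported = float(info["Lumi (pb-1)"])
print(f"computed {lumi:.5E} pb-1, reported {reported:.5E} pb-1")
assert math.isclose(lumi, reported, rel_tol=1e-5)
```

The two numbers agree to the precision printed by hs-info.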
====== Looking at separate events ======
To print a single EVGEN event, pass the event number as an additional integer argument to the above command.
This command prints event number 100:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc 100
More conveniently, one can open the file in a GUI mode to look at all events:
hs-view [promc file]
This command opens a GUI window for browsing separate events. On a remote machine, enable X11 forwarding to see the GUI. On Windows, download the file [[https://atlaswww.hep.anl.gov/asc/hepsim/hepsim.jar|hepsim.jar]], double-click it and open the file via [File]-[Open file].
Alternatively, on Windows, click "hs-view.bat" and open the ProMC file using the File menu.
You can also view EVGEN events without downloading files. Simply pass a URL to the above command and you will stream Monte Carlo events:
hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/pythia8_higgs2mumu/tev14_pythia8_h2mm_1.promc
Here we looked at one file of the Pythia8 Higgs-to-dimuon sample. Other samples, such as the [[http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/qcd/pythia8/|Pythia8 (QCD) sample]], can be viewed in the same way.
Files with NLO predictions are identified automatically: for such files, you will see a few particles per event and the weights used to calculate PDF uncertainties.
====== Monte Carlo logfile ======
Each ProMC/ProIO file includes a logfile from the Monte Carlo generator. To print it on the screen, use:
hs-log [file]
where [file] is either a ProMC or ProIO file (a URL can be used instead of a path on the local computer).
ProMC files are ZIP archives, so standard Linux commands such as "unzip" also work:
unzip -p [promc file] logfile.txt
where [promc file] is the file name. This command extracts a logfile with original generator-level information.
The next command shows the actual number of stored events:
unzip -p [promc file] promc_nevents
This command lists the stored events (each event is stored as a binary Protocol Buffers record):
unzip -l [promc file]
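Since the commands above treat a ProMC file as a ZIP archive, the same metadata can be read from Python with the standard "zipfile" module. This is a sketch for local files only (no URL streaming), and it assumes that "promc_nevents" holds the event count as plain text.

```python
import sys
import zipfile


def read_promc_metadata(path_or_file):
    """Return (logfile text, stored event count, archive entry names) of a local ProMC file.

    Assumes the file is a ZIP archive with "logfile.txt" and a plain-text
    "promc_nevents" entry, as the unzip commands above suggest.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        log = zf.read("logfile.txt").decode("utf-8", errors="replace")
        nevents = int(zf.read("promc_nevents").decode("ascii").strip())
        return log, nevents, zf.namelist()


if __name__ == "__main__" and len(sys.argv) > 1:
    log, nevents, names = read_promc_metadata(sys.argv[1])
    print(log)
    print("stored events:", nevents)
    print("archive entries:", len(names))
```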
====== Pileup mixing ======
One can mix events from a signal ProMC file with
inelastic (minbias) events using the "pileup mixing" command:
hs-pileup pN signal.promc minbias.promc output.promc
Here "pN" means that, for every event from "signal.promc", the number of added events from "minbias.promc"
is drawn from a Poisson distribution with mean N.
If the "p" prefix is omitted, exactly N random events from "minbias.promc" are added to every event from "signal.promc".
Use large numbers of events in "minbias.promc" to minimise reuse of the same events.
The barcode of particles inside "output.promc" indicates the event origin (0 is set for particles from "signal.promc").
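The mixing rule can be sketched in Python as follows. Events are represented here as plain lists of hypothetical particle labels rather than ProMC records, minbias events are picked at random with replacement (hence the advice to use a large minbias sample), and numbering the added events 1, 2, ... in the origin tag is an assumption; only the value 0 for signal particles comes from the description above. Python's "random" module has no Poisson generator, so Knuth's multiplication method is used, which is adequate for pileup means up to a few hundred.

```python
import math
import random


def poisson(mean, rng):
    """Draw a Poisson-distributed integer with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def mix_pileup(signal_events, minbias_events, n, poisson_mean=True, seed=None):
    """Attach random minbias events to every signal event.

    Returns events made of (origin, particle) pairs:
    origin 0 = signal, 1..k = the k-th added minbias event (assumed numbering).
    """
    rng = random.Random(seed)
    mixed = []
    for signal in signal_events:
        k = poisson(n, rng) if poisson_mean else n   # "pN" versus plain "N"
        event = [(0, p) for p in signal]
        for origin in range(1, k + 1):
            minbias = rng.choice(minbias_events)     # same event may be reused
            event.extend((origin, p) for p in minbias)
        mixed.append(event)
    return mixed


if __name__ == "__main__":
    signal = [["mu+", "mu-"]]                 # hypothetical particles of one signal event
    minbias = [["pi+"], ["pi-", "gamma"]]     # hypothetical minbias events
    for origin, particle in mix_pileup(signal, minbias, 2, seed=42)[0]:
        print(origin, particle)
```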
====== Analysing EVGEN files ======
One can analyse Monte Carlo events on Windows, Linux and Mac with [[https://www.java.com/en/download/|Java 8 or later]].
Many HepSim samples include *.py scripts that calculate differential cross sections. These validation
scripts can be run from the Web using [[https://www.java.com/en/download/faq/java_webstart.xml|Java Web Start]].
You can also run the scripts on a desktop while streaming data over the network,
or over downloaded files (in which case you pass the directory with the *.promc files as an argument).
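As an illustration of what such a differential cross section involves, the usual normalisation for unweighted events divides each bin count by the number of generated events and by the bin width, and multiplies by the total cross section: dσ/dx ≈ (σ/N) · n_i / Δx_i. The plain-Python sketch below applies this to a list of observable values. It is a generic example, not code taken from the HepSim scripts, and it assumes unweighted events (samples with event weights, such as NLO files, would need the weights summed instead of counted).

```python
from bisect import bisect_right


def differential_xsec(values, sigma_pb, n_events, edges):
    """Return dsigma/dx per bin (pb per unit of x) for unweighted events.

    values   : one observable value per selected event (e.g. a jet pT in GeV)
    sigma_pb : total cross section of the sample, e.g. "Sigma (pb)" from hs-info
    n_events : total number of generated events, not only those that were filled
    edges    : increasing bin edges; bins are half-open [edges[i], edges[i+1])
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):          # values outside the range are skipped
            counts[i] += 1
    weight = sigma_pb / n_events          # cross section carried by one event
    return [weight * c / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical pT values (GeV) with the cross section and event count of a sample
    print(differential_xsec([120.0, 180.0, 260.0], 27.2474, 10000, [100, 200, 300]))
```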
Here are a few ways to run *.py scripts:
===== Using Java Web Start =====
Many "Info" pages of HepSim have Jython (Python) scripts for validation and analysis.
One can run such scripts from a web browser
using the [[https://www.java.com/en/download/faq/java_webstart.xml|Java Web Start]] technology.
Click the "Launch" button. You will see an editor. Then click the "Run" button to process events.
To use [[https://www.java.com/en/download/faq/java_webstart.xml|Java Web Start]], you should configure
Java permissions:
For Linux/Mac, run "ControlPanel", go to the "Security" tab and add "http://atlaswww.hep.anl.gov" to the exception list.
For Windows, find "Java Control Panel" and do the same.
Read [[https://www.java.com/en/download/help/java_blocked.xml|Why are Java applications blocked]] by your security settings.
In addition, if you are a Mac user, you should allow execution of programs [[http://www.macrumors.com/2012/02/16/os-x-mountain-lion-limits-apps-to-mac-app-store-signed-apps-by-default/|outside Mac App Store]].
===== Using stand-alone Python =====
In this example, we will run a Python (to be more exact, Jython) script and, at the same time, will stream data from the web.
Find a HepSim event sample and click its link in the "Info" column.
For example, look at a ttbar sample from Madgraph: [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|ttbar_mg5]].
Find the URL of the analysis script ("ttbar_mg5.py") at the bottom of the page. Copy the script to a local folder, or download it with "wget":
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
Then run this script in batch mode:
hs-run ttbar_mg5.py
If you do not want a pop-up canvas with the output histogram, change line 71 to "c1.visible(0)" (or "c1.visible(False)")
and add "sys.exit(0)" at the very end of the "ttbar_mg5.py" macro.
One can view, edit and run the analysis file using a simple GUI editor.
hs-ide ttbar_mg5.py
It opens this file for editing. One can run it by clicking on the "run" button.
It also provides an interactive Jython shell.
On Windows, click the file "hs-ide.bat", open the Python script using the menu, and run it using the "Run" button.
When possible, use downloaded ProMC files rather than streaming the data over the network.
Reading local files is faster than streaming them.
Let us assume that all ProMC files are in the directory "data". Then run the script as:
hs-run ttbar_mg5.py data
Here is a complete example: we download the data to the directory "ttbar_mg5",
then we download the analysis script, and then we run this script over the local data using 10000 events:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
hs-run ttbar_mg5.py ttbar_mg5 10000
===== Using a Java IDE =====
The above approach is limited by its rather simple editor.
A more flexible approach is to use the full-featured [[http://atlaswww.hep.anl.gov/asc/jas4pp/|Jas4pp]] or
[[https://datamelt.org|DataMelt]] programs. For example, to download DataMelt and run the script in batch mode:
wget -O dmelt.zip http://jwork.org/dmelt/download/current.php;
unzip dmelt.zip;
./dmelt/dmelt_batch.sh ttbar_mg5.py
You can also pass the data URL as an argument and limit the calculation to 10000 events:
./dmelt/dmelt_batch.sh ttbar_mg5.py https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/ 10000
As before, the batch mode can also process downloaded ProMC files.
Let us assume that all ProMC files are in the directory "data". Then run [[https://datamelt.org/|DataMelt]] over the data as:
./dmelt/dmelt_batch.sh ttbar_mg5.py data
Here is a complete example: we download the data to the directory "ttbar_mg5",
then we download the analysis script, and then we run this script over the local data using 10000 events:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 ttbar_mg5
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/macros/ttbar_mg5.py
./dmelt/dmelt_batch.sh ttbar_mg5.py ttbar_mg5 10000
If you open the script in the DataMelt IDE instead, click "run" (or press [F8]).
One can also start [[https://datamelt.org|DataMelt]] without input files:
./dmelt/dmelt.sh
on Linux/Mac. On Windows, run "dmelt.bat" instead. You will see the DataMelt IDE.
Locate the URL of the analysis script, for example on the [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|ttbar_mg5]] page (found under the "Info" link), and copy it using the right mouse button ("Copy URL Location").
Next, in the DMelt menu, go to "File"→"Read script from URL". Paste the URL of the *.py file into the pop-up DataMelt
URL dialog and click "run". The program will start reading data from the Web. At the end of the run, you will see a pop-up
window with a histogram.
This method works for Python/Jython, Java, Ruby, Groovy, BeanShell languages.
===== Using C++/ROOT =====
Install [[http://atlaswww.hep.anl.gov/asc/promc/|ProMC]] and [[http://root.cern.ch/drupal/|ROOT]]. Make sure that the environment variables $PROMC and $ROOTSYS are set correctly. Then look at the examples:
$PROMC/examples/reader_mc/ # shows how to read ProMC files from a typical Monte Carlo generator
$PROMC/examples/reader_nlo/ # shows how to read ProMC files with NLO calculations (i.e. MCFM)
$PROMC/examples/promc2root/ # shows how to read PROMC files and create ROOT Tree.
For C++/ROOT, download the files beforehand, since streaming over the network is not supported.
There is a simple example showing how to read multiple Monte Carlo files from HepSim,
build anti-kT jets using FastJet, and fill ROOT histograms. Download
[[https://atlaswww.hep.anl.gov/asc/hepsim/soft/hepsim-cpp.tgz|hepsim-cpp package]] and compile it:
wget http://atlaswww.hep.anl.gov/asc/hepsim/soft/hepsim-cpp.tgz -O - | tar -xz;
cd hepsim-cpp/; make
Read the file "README" inside this directory.
====== Analyzing RECO data ======
HepSim includes data after fast and full detector simulations. There are several methods to analyse such files.
===== Analyzing Delphes ROOT files (fast simulation) =====
The Delphes ROOT files are typically posted using the reconstruction tag "rfast[XXX]", where "[XXX]" is a number.
Use search to find such samples. Read the [[https://cp3.irmp.ucl.ac.be/projects/delphes|Delphes documentation]]
about how to read Delphes ROOT files.
You can find all samples that contain fast simulations using [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=rfast|this link]].
===== Full simulation: LCIO files =====
Events after detector simulation and reconstruction ("RECO") are posted under the tag "rfull[XXX]", where "[XXX]" is a number.
HepSim uses the [[http://lcio.desy.de/|LCIO]] file format, which can be read from C++, Fortran and Java.
Such files have the extension "slcio". You can analyse LCIO files using the
[[https://atlaswww.hep.anl.gov/asc/jas4pp/|Jas4pp program]], which allows
you to read files using the Python syntax.
If you need to read LCIO files in C++ code with ROOT/FastJet, use the example package [[https://github.com/chekanov/HepSim]].
You can find all samples that contain full simulations using [[https://atlaswww.hep.anl.gov/hepsim/list.php?find=rfull|this link]].
===== Conversion to ROOT, HEPMC, HEPEVT, LHE, STDHEP, LCIO =====
One can convert a ProMC file to ROOT to look at its branches. If the ProMC package is installed, run the converter:
cp -rf $PROMC/examples/promc2root .
cd promc2root
make
./promc2root [promc file] output.root
The output file will contain ROOT branches with px,py,pz,e, etc.
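For a quick look at such branches, the standard kinematic variables can be derived from px, py, pz and e. The Python sketch below is generic (it does not read ROOT files) and assumes all four components are in the same units, e.g. GeV; it uses η = asinh(pz/pT), which equals -ln tan(θ/2).

```python
import math


def kinematics(px, py, pz, e):
    """Return (pT, eta, phi, mass) of a four-momentum given in consistent units."""
    pt = math.hypot(px, py)
    phi = math.atan2(py, px)
    if pt > 0:
        eta = math.asinh(pz / pt)
    elif pz != 0:
        eta = math.copysign(math.inf, pz)   # particle exactly along the beam axis
    else:
        eta = 0.0                           # zero momentum: eta undefined, use 0
    m2 = e * e - (pt * pt + pz * pz)
    mass = math.sqrt(m2) if m2 > 0 else 0.0  # clamp small negative m^2 from rounding
    return pt, eta, phi, mass


if __name__ == "__main__":
    print(kinematics(3.0, 4.0, 0.0, 13.0))
```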
One can also convert ProMC to HEPMC using the example
$PROMC/examples/promc2hepmc
(see the ProMC manual). In addition, the directory
$PROMC/examples/
has examples showing how to convert
ProMC to HEPEVT records (promc2hepevnt), STDHEP (promc2stdhep), LHE (promc2lhe) and LCIO (promc2lcio).
====== Anomaly detection with HepSim ======
You can use [[http://mc.hep.anl.gov/asc/adfilter|ADFilter]] to process PROMC and DELPHES files to check how anomalous your events are.
In addition, [[http://mc.hep.anl.gov/asc/adfilter|ADFilter]] can convert LHE files (extension *.lhe.gz) into PROMC files using the web interface.
====== Comparing with HepData ======
The Durham [[http://durpdg.dur.ac.uk/|HepData]] database maintains "DMelt" scripts compatible
with [[https://atlaswww.hep.anl.gov/hepsim/|HepSim]] analysis scripts, so it is relatively easy to overlay Monte Carlo predictions on data from published articles.
For example, open the link [[http://durpdg.dur.ac.uk/view/ins1253852|AAD 2013]] from [[http://durpdg.dur.ac.uk/|HepData]] and
download the "DMelt" Jython script with the published data.
You can run this script inside the DMelt IDE, or using the "hs-run" and "hs-ide"
commands from the hs-toolkit.
Then one can combine this script with a HepSim script
(from the "Info" description)
that runs over HepSim Monte Carlo data,
creating plots showing agreement between data and theoretical calculations.
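Once the published points and the Monte Carlo prediction are filled with the same binning, a simple numerical comparison is the MC/data ratio per bin together with a χ² built from the data uncertainties. The sketch below assumes identical bins and uncorrelated uncertainties, and it ignores the Monte Carlo statistical error; the input numbers in the example are made up.

```python
def compare(data, data_err, mc):
    """Return (per-bin MC/data ratios, chi2 = sum(((data - mc) / err)^2))."""
    if not (len(data) == len(data_err) == len(mc)):
        raise ValueError("inputs must have the same number of bins")
    ratios = [m / d if d != 0 else float("nan") for d, m in zip(data, mc)]
    chi2 = sum(((d - m) / s) ** 2 for d, s, m in zip(data, data_err, mc))
    return ratios, chi2


if __name__ == "__main__":
    # Hypothetical cross sections (pb) in two bins: data, data uncertainty, MC prediction
    print(compare([10.0, 20.0], [1.0, 2.0], [11.0, 18.0]))
```

Dividing the χ² by the number of bins gives a rough measure of agreement when the uncertainties are well understood.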
--- //[[chekanov@anl.gov|Sergei Chekanov]] 2017/02/06 17:25//