
HepSim data files

EVGEN events

Truth-level (EVGEN) data are stored in a platform-independent format called ProMC, which achieves efficient compression through variable-byte encoding. The format is supported by popular programming languages (C++, Java, Python) on the major operating systems (Windows, Mac, Linux, etc.). Open access to these files over the http protocol is central to the design of HepSim, because further processing with fast and full detector simulations uses computing resources at multiple locations. Compact ProMC files optimized for web streaming, combined with the http protocol's efficient handling of many relatively small files, are one of the distinctive features of HepSim compared to other production systems.

Data produced by fast and full detector simulations are stored in the ROOT and LCIO formats.

HepSim software toolkit

To work with the HepSim database, install the HepSim software toolkit. For Linux/Mac with the bash shell, use:

wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh

or using curl:

curl https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh

This creates the directory “hs-toolkit” containing the HepSim commands. Running them requires Java 7 or later. One can also download hs-toolkit.tgz directly. Most commands are platform-independent and based on hepsim.jar, which can also be used on Windows/Mac. The setup script adds all commands to the execution path.

You can view the available commands from this package by typing:

hs-help

The programs included in hs-toolkit can be used to search for and download ProMC files, as well as files produced by fast and full simulations. To analyse and display reconstructed events in the LCIO format, use the Jas4PP program, which includes hs-toolkit and adds packages for processing SLCIO files. Jas4PP can also process ProMC files.

Event record browser

Use the event-record browser to view separate EVGEN events, cross sections and log files. On Linux/Mac, use:

hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc

Here we looked at one file of the 14 TeV Httbar sample generated with MadGraph5 (mg5).

One can also view a local copy of the file after downloading it:

wget https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
hs-view tev14_mg5_Httbar_001.promc

On Windows, download hepsim.jar and double-click it. Then open the ProMC file using the “File” menu. A GUI browser pops up and displays the MC record. You can search for a given particle name and view data layouts and log files using the menu.

Figure: ProMC browser

The browser works for full parton-shower simulations with detailed particle information. Unlike the usual parton-shower Monte Carlo output, the record also contains detailed information on event weights, PDF uncertainties and, in some cases, scale uncertainties. The browser shows the 4-momenta of the particles in each event as well as the total cross section (for NLO samples, all events must be read to obtain an accurate cross section). See the ProMC file description for details.

File validation

To check the consistency of an EVGEN file and print additional information about it, use:

hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc

The argument can also be a path to a file on the local disk, which is faster than reading from a URL. The output of the above command is:

File          = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
ProMC version = 4
Last modified = 2015-10-03 12:06:52
Description   = run_Httbar_14tev
Events        = 10000
Sigma    (pb) = 5.61176E-1 ± 3.3035E-3
Lumi   (pb-1) = 1.78197E4
Varint units  = E:100000 L:1000
Log file:     = logfile.txt
####  The file is healthy!  ####

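Most entries are self-explanatory. Lumi is the integrated luminosity corresponding to the sample, i.e. the number of events divided by the cross section (10000 / 0.561176 pb ≈ 1.782E4 pb-1). Varint units are the multipliers used to convert floating-point values to variable-byte integers: “E:100000” means that all px, py, pz, e and mass values are multiplied by 100000, while “L:1000” means that all distances (x, y, z, t) are multiplied by 1000. See the ProMC archive format.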
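To illustrate what the varint units mean, here is a minimal Python sketch of the general idea. A floating-point value is scaled by the unit, rounded to an integer, and stored as a base-128 variable-byte integer, so small values take fewer bytes. The rounding step, the zigzag mapping for signed values and the helper names are assumptions for illustration only. The exact ProMC field types are described in the ProMC archive format, not here.

```python
# Sketch of "varint units": scale a float by a fixed multiplier, round it to an
# integer, and encode that integer as a variable-byte (base-128) value.
# ASSUMPTIONS: rounding to nearest and protobuf-style zigzag encoding for signed
# values are illustrative choices; the helper names are hypothetical.

ENERGY_UNIT = 100000  # "E:100000" from the hs-info output
LENGTH_UNIT = 1000    # "L:1000" from the hs-info output


def zigzag(n: int) -> int:
    """Map signed to unsigned integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def encode_varint(u: int) -> bytes:
    """Base-128 varint: 7 data bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def to_varint_units(value: float, unit: int) -> int:
    """Convert a physical value (e.g. a px component) to the stored integer."""
    return round(value * unit)


def from_varint_units(stored: int, unit: int) -> float:
    """Recover the physical value; the precision is 1/unit."""
    return stored / unit


if __name__ == "__main__":
    px = -12.345678  # hypothetical momentum component
    stored = to_varint_units(px, ENERGY_UNIT)
    print(stored, from_varint_units(stored, ENERGY_UNIT), len(encode_varint(zigzag(stored))))
```

With E:100000, any energy-like value is kept to a precision of 1/100000 of its unit. A larger multiplier gives finer precision at the cost of longer varints.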

To look at a single event, pass its number as an additional integer argument. This command prints event number 100:

hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 100

List available data files

To find all files associated with a given Monte Carlo event sample, go to the HepSim database and follow the “Files” link, which lists the available files. The files can also be listed from the command line:

hs-ls [name]

where [name] is the dataset name, the URL of its Info page, or the URL of the directory containing the files. The command prints a table with file names and their sizes.

For example, this lists all files of the tev100pp_higgs_ttbar_mg5 Monte Carlo sample:

hs-ls tev100pp_higgs_ttbar_mg5

To produce a plain list without decorations, for example as a file to be read by another program, use:

hs-ls tev100pp_higgs_ttbar_mg5 simple

The “simple” option removes the decorations. To get a plain list with full URLs, use:

hs-ls tev100pp_higgs_ttbar_mg5 simple-url
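A plain URL list is convenient as input for your own scripts. The following minimal Python sketch assumes the output was saved with shell redirection (e.g. hs-ls tev100pp_higgs_ttbar_mg5 simple-url > files.txt). It keeps only lines that look like file URLs with the chosen extension. Skipping any other lines is a defensive assumption, since the exact output format of hs-ls is not specified here. The function name is hypothetical.

```python
# Sketch: read a URL list produced by "hs-ls <dataset> simple-url" and saved to a file.
# ASSUMPTION: lines that are not http(s) URLs ending in the given suffix (blank lines,
# possible status messages) are skipped; read_url_list is a hypothetical helper.

from typing import List


def read_url_list(text: str, suffix: str = ".promc") -> List[str]:
    """Return the http(s) URLs in text that end with suffix, in their original order."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("http://", "https://")) and line.endswith(suffix):
            urls.append(line)
    return urls
```

For example, read_url_list(open("files.txt").read()) returns the list of ProMC URLs. Pass suffix=".root" for reconstructed files stored in ROOT format.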

Similarly, one can pass the Info page URL or the download URL:

hs-ls https://atlaswww.hep.anl.gov/hepsim/info.php?item=2

or

hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/

The commands show this table:

Looking for http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5
File name                size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc	24464
mg5_Httbar_100tev_002.promc	24360
mg5_Httbar_100tev_003.promc	24488
mg5_Httbar_100tev_004.promc	24516
mg5_Httbar_100tev_005.promc	24448
mg5_Httbar_100tev_006.promc	24484
mg5_Httbar_100tev_007.promc	24468
mg5_Httbar_100tev_008.promc	24516
mg5_Httbar_100tev_009.promc	24440
mg5_Httbar_100tev_010.promc	24592
mg5_Httbar_100tev_011.promc	24456
mg5_Httbar_100tev_012.promc	24568
mg5_Httbar_100tev_013.promc	24436
mg5_Httbar_100tev_014.promc	24416
mg5_Httbar_100tev_015.promc	24408
mg5_Httbar_100tev_016.promc	24484
mg5_Httbar_100tev_017.promc	24400
mg5_Httbar_100tev_018.promc	24460
mg5_Httbar_100tev_019.promc	24452
mg5_Httbar_100tev_020.promc	24420
------------------------------------
-> Summary: Nr of files=20  Total size=489 MB
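The summary line can be reproduced from the rows of the table. The minimal Python sketch below assumes that each data row is a file name followed by whitespace and an integer size in kB, and that 1 MB = 1000 kB. That convention matches the table above: its rows add up to 489276 kB, i.e. 489 MB. The function name is hypothetical.

```python
# Sketch: recompute "Nr of files" and "Total size" from an hs-ls table.
# ASSUMPTIONS: data rows are "<file name><whitespace><size in kB>", and the summary
# uses 1 MB = 1000 kB; summarize_listing is a hypothetical helper.

from typing import Tuple


def summarize_listing(table: str) -> Tuple[int, int]:
    """Return (number of files, total size in kB) from rows with a numeric size column."""
    n_files, total_kb = 0, 0
    for line in table.splitlines():
        parts = line.split()
        # header, separator and summary lines do not have exactly two fields
        # with a numeric second field, so they are ignored
        if len(parts) == 2 and parts[1].isdigit():
            n_files += 1
            total_kb += int(parts[1])
    return n_files, total_kb


if __name__ == "__main__":
    n, kb = summarize_listing("f_001.promc\t24464\nf_002.promc\t24360\n")
    print(f"Nr of files={n}  Total size={round(kb / 1000)} MB")
```

Applied to the full table above, the function returns (20, 489276), which rounds to the reported 489 MB.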

Searching for a dataset

One can find a URL that corresponds to a dataset using a search word. The syntax is:

hs-find [search word]

For example, this command lists all datasets that have the keyword “higgs” in the dataset name, in the name of the Monte Carlo model, or in the file description:

hs-find higgs

Note that there is no need for wildcard characters such as “*” or “%”: the command matches every dataset that contains the substring “higgs”. The same functionality is implemented in the “Search” menu of the HepSim interface.
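Substring matching of this kind is simple to express. The minimal Python sketch below checks each field of a dataset record for the search word, in the spirit of hs-find. The record layout, the helper names and the case-insensitive comparison are assumptions made for illustration. The real hs-find queries the HepSim database and may match differently.

```python
# Sketch of substring search over dataset records, in the spirit of hs-find.
# ASSUMPTIONS: the Dataset fields, the find() helper and case-insensitive matching
# are hypothetical; no wildcards are needed because any substring matches.

from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    name: str         # dataset name
    model: str        # Monte Carlo model
    description: str  # file description


def find(datasets: List[Dataset], word: str) -> List[str]:
    """Return names of datasets whose name, model or description contains word."""
    w = word.lower()
    return [d.name for d in datasets
            if w in d.name.lower() or w in d.model.lower() or w in d.description.lower()]
```

Because the test is a plain substring check, “higgs” matches any dataset that contains that word anywhere in the three fields.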

If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:

hs-find pythia%rfast001

This searches for Pythia samples processed with a fast detector simulation with the tag “rfast001”. To search for full detector simulations, replace “rfast” with “rfull”.

Downloading truth-level files

One can download all files for a given dataset as:

hs-get [name] [OUTPUT_DIR]

where [name] is the name of the dataset, the URL of its Info page in the HepSim repository, or a direct URL pointing to the location of the ProMC files. This example downloads the dataset “tev100pp_higgs_ttbar_mg5” to the directory “data”:

hs-get tev100pp_higgs_ttbar_mg5 data

If alternative mirrors exist, you will be prompted to choose one. Select a mirror to start downloading the files.

Alternatively, this example downloads files using the URL of the Info page:

hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data

Or, if you know the download URL with the file locations, use this command:

hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data

All these examples will download all files from the “tev100pp_higgs_ttbar_mg5” event sample.

If the download is slow, use an alternative URL from the mirror list given for each dataset.

You can stop the download using [Ctrl-D]. When you run the command again, it continues the download. By default, 2 threads are used. To increase the number of threads, add an integer to the end of the command. If a second integer is given, it sets the maximum number of files to download. This example shows how to download 10 files using 3 threads into the directory “higgs_ttbar_mg5”:

hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
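The resume, thread and file-limit behaviour described above can be illustrated with a short sketch. This is not hs-get's implementation. It is a minimal Python sketch assuming that files already present in the output directory count as done, which makes a second run continue where the first stopped. Each file is written under a temporary “.part” name first, so an interrupted file is not mistaken for a complete one. The fetch parameter and all function names are hypothetical. By default, fetch reads the URL with urllib.

```python
# Sketch of a resumable multi-threaded download in the spirit of hs-get.
# ASSUMPTIONS: files already in out_dir are treated as complete; "fetch" is any
# function returning the content of a URL; all names here are hypothetical.

import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def http_fetch(url: str) -> bytes:
    """Default fetcher: read the whole file over http(s)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_all(urls: List[str], out_dir: str, n_threads: int = 2,
                 max_files: Optional[int] = None,
                 fetch: Callable[[str], bytes] = http_fetch) -> List[str]:
    """Download up to max_files URLs into out_dir; return names downloaded in this run."""
    os.makedirs(out_dir, exist_ok=True)
    selected = urls if max_files is None else urls[:max_files]

    def one(url: str) -> Optional[str]:
        name = url.rstrip("/").rsplit("/", 1)[-1]
        path = os.path.join(out_dir, name)
        if os.path.exists(path):   # finished in a previous run: skip (resume)
            return None
        tmp = path + ".part"       # incomplete downloads never get the final name
        with open(tmp, "wb") as f:
            f.write(fetch(url))
        os.replace(tmp, path)
        return name

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(one, selected))
    return [r for r in results if r is not None]
```

A call such as download_all(urls, "data", 3, 10) mirrors the “3 10” arguments of the example. Running it a second time returns an empty list, because every selected file already exists.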

Instead of the download URL, one can use the URL of the Info page or the name of the dataset. Here are two equivalent examples that download 5 files using a single thread into the output directory “data”:

Using the URL of the info page:

hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2  data 1 5

or, when using the dataset name given on the info page:

hs-get tev100pp_higgs_ttbar_mg5  data 1 5

You can also download only the files whose names contain a certain pattern. If a directory contains files generated with different pT cuts, the file names usually contain the substring “pt” followed by the pT cut. In this case, one can download such files as:

hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_

The last argument requires that all downloaded files have the string “pt100_” in their names (in this case, files generated with pT>100 GeV).

The general usage of the hs-get command takes 2, 3, 4 or 5 arguments:

hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]

where [URL] is the Info page URL, the download URL, or the dataset name.

It is not recommended to use more than 6 threads for downloads.
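The positional arguments can be read as follows. This is a minimal Python sketch of how the documented hs-get arguments could be interpreted, not hs-get's own code. The defaults (2 threads, all files, no pattern) follow the text. Applying the pattern before the file limit and warning above 6 threads are assumptions based on the examples and advice above. GetArgs, parse_hs_get_args and select_files are hypothetical names.

```python
# Sketch: interpret hs-get [URL] [OUTPUT_DIR] [Nr of threads] [Nr of files] [pattern].
# ASSUMPTIONS: defaults follow the manual (2 threads); the pattern is applied before
# the file limit; all names here are hypothetical.

from dataclasses import dataclass
from typing import List, Optional

MAX_RECOMMENDED_THREADS = 6


@dataclass
class GetArgs:
    source: str                     # dataset name, Info page URL or download URL
    out_dir: str
    threads: int = 2
    max_files: Optional[int] = None
    pattern: Optional[str] = None


def parse_hs_get_args(argv: List[str]) -> GetArgs:
    if not 2 <= len(argv) <= 5:
        raise ValueError("expected 2 to 5 arguments")
    args = GetArgs(argv[0], argv[1])
    if len(argv) >= 3:
        args.threads = int(argv[2])
    if len(argv) >= 4:
        args.max_files = int(argv[3])
    if len(argv) == 5:
        args.pattern = argv[4]
    if args.threads > MAX_RECOMMENDED_THREADS:
        print(f"warning: more than {MAX_RECOMMENDED_THREADS} threads is not recommended")
    return args


def select_files(names: List[str], args: GetArgs) -> List[str]:
    """Keep names containing the pattern (if given), then apply the file limit."""
    chosen = [n for n in names if args.pattern is None or args.pattern in n]
    return chosen if args.max_files is None else chosen[:args.max_files]
```

For the earlier example “tev13pp_higgs_pythia8_ptbins data 2 5 pt100_”, this sketch keeps at most 5 files whose names contain “pt100_”.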

Downloading files after detector simulation

Some datasets contain reconstructed files after detector simulation. These are stored in the directory “rfastNNN” (fast simulation) or “rfullNNN” (full simulation), where “NNN” is the version number. For example, the tev100pp_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:

hs-ls  tev100pp_ttbar_mg5%rfast001      # list reco files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory

The symbol “%” separates the sample name (tev100pp_ttbar_mg5) from the reconstruction tag (rfast001). To download 10 files using 3 threads, use:
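The “sample%tag” syntax and the tag directory layout can be handled with a few string operations. The minimal Python sketch below splits the specification at “%”, checks that the tag looks like rfastNNN or rfullNNN, and appends the tag as a subdirectory of a download URL, as in .../ttbar_mg5/rfast001/. The three-digit version format is inferred from the examples in this manual. The helper names are hypothetical.

```python
# Sketch: split "sample%tag", decode the reconstruction tag, and build the tag URL.
# ASSUMPTION: tags look like rfastNNN or rfullNNN with a 3-digit version, as in the
# examples of this manual; the helper names are hypothetical.

import re
from typing import Optional, Tuple

TAG_RE = re.compile(r"^r(fast|full)(\d{3})$")


def split_name_tag(spec: str) -> Tuple[str, Optional[str]]:
    """'tev100pp_ttbar_mg5%rfast001' -> ('tev100pp_ttbar_mg5', 'rfast001')."""
    name, sep, tag = spec.partition("%")
    return name, (tag if sep else None)


def describe_tag(tag: str) -> Tuple[str, str]:
    """Return ('fast' or 'full', version string), or raise ValueError."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unexpected reconstruction tag: {tag!r}")
    return m.group(1), m.group(2)


def tag_url(download_url: str, tag: str) -> str:
    """Append the tag as a subdirectory, separated by a forward slash."""
    return download_url.rstrip("/") + "/" + tag + "/"
```

The same split applies to search strings such as “pythia%rfast001” used with hs-find.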

hs-get  tev100pp_ttbar_mg5%rfast001 data 3 10

As before, one can also download the files using the URL approach:

hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data

Note that in the URL the reconstruction tag “rfast001” is a subdirectory, separated by a forward slash as in a usual directory path. As before, the URL can be taken from the list of mirrors. For example, to download files from NERSC, use this URL:

hs-ls  http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data

Written by Sergei Chekanov (ANL) 2016/02/08 10:26