EVGEN events

Truth-level (EVGEN) data are stored in a platform-independent format called ProMC, which achieves very effective compression using a variable-byte encoding. This data format is supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.). Open access to such files via the HTTP protocol is the central element in the design of HepSim, since further processing using fast and full simulations is done using computer resources at multiple locations. The combination of compact ProMC files optimized for web streaming and the HTTP protocol, which handles many (relatively small) files well, is one of the distinct features of HepSim compared to other production systems.
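
The size benefit of a variable-byte encoding can be illustrated with a small shell sketch. It assumes a base-128 scheme with 7 payload bits per byte (as used by Protocol Buffers varints) and handles non-negative integers only; the text above does not spell out the exact scheme, and varint_size is a hypothetical helper, not part of any HepSim tool.

```shell
# Number of bytes a non-negative integer occupies in a base-128
# variable-byte encoding (7 payload bits per byte).
# Assumption: protobuf-style varints; signed values are not handled.
varint_size() {
    n=$1
    bytes=1
    while [ "$n" -ge 128 ]; do
        n=$((n / 128))
        bytes=$((bytes + 1))
    done
    echo "$bytes"
}

varint_size 100         # a small value
varint_size 12550000    # a larger value
```

Small integers need a single byte, whereas a fixed-width 64-bit number always needs eight, which is why storing scaled integers instead of floating-point values keeps the files compact.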

Simulated data after fast and full detector simulations are kept in ROOT and LCIO formats.

HepSim software toolkit

To work with the HepSim database, install the HepSim software toolkit. On Linux/Mac with the bash shell, use:

wget http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh

or using curl:

curl http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh

This creates the directory “hs-toolkit” with the HepSim commands. To run them, Java 7 or above must be installed. One can also download hs-toolkit.tgz manually. Most commands are platform-independent and based on hepsim.jar, which can also be used on Windows/Mac. The setup script adds all commands to the execution path.

You can view the available commands from this package by typing:

hs-help

The programs included in hs-toolkit can be used for searching and downloading ProMC files, as well as other files produced by fast and full simulations. If you need to analyse and display reconstructed events in the LCIO format, use the Jas4PP program, which includes hs-toolkit and adds packages for processing SLCIO files. This program can also process ProMC files.

Event record browser

Use the event-record browser to view individual EVGEN events, cross sections and log files. On Linux/Mac, use:

hs-view http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc

Here we look at one file of the Pythia8 Higgs sample.

One can also open a local file:

hs-view pythia8_higgs_1.promc

On Windows, download hepsim.jar and click the “hepsim.jar” file. Then open the ProMC file using the “File” menu. A pop-up GUI browser displays the MC record. You can search for a given particle name and view data layouts and log files using the [Menu].

This works for full parton-shower simulations with detailed information on particles. Unlike the usual parton-shower Monte Carlo, this browser has detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show the 4-momenta for each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section). See the ProMC file description for details.

File validation

Check the consistency of an EVGEN file and print additional information with:

hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc

The argument can also be a file path on the local disk (faster than a URL). The output of the above command is:

File          = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc
ProMC version = 2
Description   = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;ParticleDecays:limitTau0 = on;
ParticleDecays:tau0Max = 10;HiggsSM:all = on;
Events        = 10000
Sigma    (pb) = 2.72474E1 ± 1.92589E-1
Lumi   (pb-1) = 3.67007E2
Varint units  = E:100000 L:1000
Log file:     = logfile.txt
The file was validated. Exit.

All entries are self-explanatory. “Varint units” are the factors by which energies (momenta) and distances are multiplied before conversion to variable-byte integers. “E:100000” means that all px, py, pz, e and mass values were multiplied by 100000, while “L:1000” means that all distances (x, y, z, t) were multiplied by 1000. See the ProMC archive format.
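
Two of these entries can be cross-checked by hand. The sketch below assumes that the luminosity is the number of events divided by the cross section, which reproduces the values printed above, and that scaled values are rounded to the nearest integer (the rounding rule is an assumption, the text only says "multiplied"). The function names are hypothetical and only awk is required.

```shell
# Integrated luminosity in pb-1 from the number of events and sigma in pb.
lumi() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.3f\n", n / s }'
}

# Scale a value by a Varint unit (100000 for E, 1000 for L) and round it
# to the nearest integer. The rounding rule is an assumption of this sketch.
to_varint() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        x = v * u
        r = (x >= 0) ? int(x + 0.5) : -int(0.5 - x)
        printf "%d\n", r
    }'
}

# Convert a stored integer back to the original units.
from_varint() {
    awk -v n="$1" -v u="$2" 'BEGIN { printf "%.5f\n", n / u }'
}

lumi 10000 27.2474            # compare with "Lumi (pb-1) = 3.67007E2"
to_varint 125.5 100000        # an energy-like value scaled by E:100000
from_varint 12550000 100000   # and converted back
```

With a unit of 100000, energies keep five decimal places; anything finer is lost when the value is stored as an integer.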

One can inspect an individual event by passing an integer argument that specifies the event number. This command prints event number 100:

hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc 100

List available data files

Let us show how to find all files associated with a given Monte Carlo event sample. Go to the HepSim database and look at the “Files” link, which lists the available files. The same files can be listed from the command line as:

hs-ls [name]

where [name] is the dataset name, or the URL of the Info page, or the URL of the location of all files. This command shows a table with file names and their sizes.

Here is an example illustrating how to list all files from the Higgs to ttbar Monte Carlo sample:

hs-ls tev100_higgs_ttbar_mg5

If you want a plain list without decoration, for example to write a file that is read by another program, use:

hs-ls tev100_higgs_ttbar_mg5 simple

The string “simple” removes the decorations. If you want a list with full URLs and without decorations, use:

hs-ls tev100_higgs_ttbar_mg5 simple-url
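
A plain URL list is convenient for scripting. As a minimal sketch, the loop below runs hs-info on every file of a dataset, assuming that the simple-url mode prints one URL per line (not verified here). The function for_each_url and the variable HS_CMD are hypothetical; HS_CMD exists only to allow a dry run with echo.

```shell
# Run a command (hs-info by default) on every URL read from standard input.
# HS_CMD is a hypothetical override, e.g. HS_CMD=echo for a dry run.
for_each_url() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip empty lines
        # Redirect stdin so the command cannot consume the URL list.
        ${HS_CMD:-hs-info} "$url" < /dev/null || echo "FAILED: $url" >&2
    done
}

# Usage (requires hs-toolkit and network access):
#   hs-ls tev100_higgs_ttbar_mg5 simple-url | for_each_url
```

Failures are reported on standard error, so the standard output of the validation can still be redirected to a file.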

The dataset can also be specified by the URL of the Info page or by the URL of the file location:

hs-ls http://atlaswww.hep.anl.gov/hepsim/info.php?item=2

or

hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/

The commands show this table:

Looking for http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5
File name                size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc	24464
mg5_Httbar_100tev_002.promc	24360
mg5_Httbar_100tev_003.promc	24488
mg5_Httbar_100tev_004.promc	24516
mg5_Httbar_100tev_005.promc	24448
mg5_Httbar_100tev_006.promc	24484
mg5_Httbar_100tev_007.promc	24468
mg5_Httbar_100tev_008.promc	24516
mg5_Httbar_100tev_009.promc	24440
mg5_Httbar_100tev_010.promc	24592
mg5_Httbar_100tev_011.promc	24456
mg5_Httbar_100tev_012.promc	24568
mg5_Httbar_100tev_013.promc	24436
mg5_Httbar_100tev_014.promc	24416
mg5_Httbar_100tev_015.promc	24408
mg5_Httbar_100tev_016.promc	24484
mg5_Httbar_100tev_017.promc	24400
mg5_Httbar_100tev_018.promc	24460
mg5_Httbar_100tev_019.promc	24452
mg5_Httbar_100tev_020.promc	24420
------------------------------------
-> Summary: Nr of files=20  Total size=489 MB
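
The summary line can be reproduced from the table itself. The sketch below is a hypothetical helper that assumes the layout shown above (file name, whitespace, size in kB) and ignores header, separator and summary lines.

```shell
# Read an hs-ls table on standard input and print the number of files
# and their total size in kB. Assumes the table layout shown above.
summarize_listing() {
    awk '$1 ~ /\.promc$/ && $2 ~ /^[0-9]+$/ { n++; kb += $2 }
         END { printf "files=%d total_kB=%d\n", n, kb }'
}

# Usage (requires hs-toolkit and network access):
#   hs-ls tev100_higgs_ttbar_mg5 | summarize_listing
```

For the 20 files listed above the sizes add up to 489276 kB, consistent with the reported total of 489 MB.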

Searching for a dataset

One can find a URL that corresponds to a dataset using a search word. The syntax is:

hs-find [search word]

For example, this command lists all datasets that have the keyword “higgs” in the name of the dataset, in the name of the Monte Carlo model, or in the file description:

hs-find higgs

Note that there is no need to use wildcard characters such as “*” or “%”, as one would in a pattern-matching expression. Any dataset that contains the substring “higgs” is matched. The same functionality is implemented in the “Search” menu of the HepSim interface.

If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:

hs-find pythia%rfast001

This searches for Pythia samples after a fast detector simulation with the tag “rfast001”. To search for a full detector simulation, replace “rfast” with “rfull”.

Downloading files

Download all files of a dataset with the hs-get command. The syntax is:

hs-get [name] [OUTPUT_DIR]

where [name] is the name of the dataset, the URL of the Info page in the HepSim repository, or a direct URL pointing to the location of the ProMC files. This example downloads the dataset “tev100_higgs_ttbar_mg5” to the directory “data”:

hs-get tev100_higgs_ttbar_mg5 data

You will be prompted to select a mirror (if alternative mirrors exist). Select the mirror to start downloading the files.

Alternatively, this example downloads files using the URL of the Info page:

hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data

Or, if you know the download URL with the file locations, use this command:

hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data

All these examples will download all files from the “tev100_higgs_ttbar_mg5” event sample.

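If the download is slow, use an alternative URL from the mirror list, which is given for each dataset.

One can also limit the number of files and download them in several threads. This example downloads 10 files in 3 threads to the directory “higgs_ttbar_mg5”:

hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10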

Instead of the download URL, one can use the URL of the Info page or the name of the dataset. Here are two equivalent examples that download 5 files using a single thread to the output directory “data”:

Using the URL of the info page:

hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2  data 1 5

or, when using the dataset name given on the info page:

hs-get tev100_higgs_ttbar_mg5  data 1 5

You can also download files whose names contain a certain pattern. If a directory contains files generated with different pT cuts, the names usually contain the substring “pt” followed by the pT cut. In this case, one can download such files as:

hs-get tev13_higgs_pythia8_ptbins data 2 5 pt100_

The last argument requires that all downloaded files have the string “pt100_” in their names (in this case, it selects the files generated with pT>100 GeV).

The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:

[URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]

where [URL] is the URL of the Info page, the download URL, or the dataset name.
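
The argument rules above can be sketched as a small dry-run wrapper. The function hs_get_dry_run is hypothetical: it only checks the argument count and prints the command line instead of running hs-get. The requirement that the thread and file counts be positive integers is a sanity check assumed by this sketch, not documented behaviour.

```shell
# Check the arguments against the usage above and print the resulting
# hs-get command line without executing it.
hs_get_dry_run() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        echo "usage: [URL] [OUTPUT_DIR] [threads] [files] [pattern]" >&2
        return 1
    fi
    # Arguments 3 and 4, when given, must be positive integers (assumption).
    for n in "${3:-1}" "${4:-1}"; do
        case "$n" in
            ''|*[!0-9]*|0)
                echo "not a positive integer: $n" >&2
                return 1 ;;
        esac
    done
    echo "hs-get $*"
}

hs_get_dry_run tev13_higgs_pythia8_ptbins data 2 5 pt100_
```

Once the printed line looks right, the same arguments can be passed to hs-get directly.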

Some datasets contain reconstructed files after detector simulations. Reconstructed files are stored inside the directory “rfastNNN” (fast simulation) or “rfullNNN” (full simulation), where “NNN” is the version number. For example, the tev100_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:

hs-ls  tev100_ttbar_mg5%rfast001      # list reco files with the tag "rfast001"
hs-get tev100_ttbar_mg5%rfast001 data # download to the "data" directory

The symbol “%” separates the sample name (tev100_ttbar_mg5) from the reconstruction tag (rfast001). If you want to download 10 files in 3 threads, use this:

hs-get  tev100_ttbar_mg5%rfast001 data 3 10
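
The naming convention for reconstruction tags can be checked before a download is started. The helper below is a hypothetical sketch: it splits a “sample%tag” string at the “%” symbol and classifies the tag, assuming a three-digit version number as in “rfast001” and treating a name without a tag as a truth-level (EVGEN) sample.

```shell
# Split "sample%tag" and classify the tag.
# Prints: <sample> <kind> [<version>]
parse_reco_spec() {
    spec=$1
    case "$spec" in
        *%*) name=${spec%%\%*}; tag=${spec#*\%} ;;
        *)   name=$spec; tag= ;;
    esac
    case "$tag" in
        '')                   echo "$name evgen" ;;
        rfast[0-9][0-9][0-9]) echo "$name fast ${tag#rfast}" ;;
        rfull[0-9][0-9][0-9]) echo "$name full ${tag#rfull}" ;;
        *) echo "unrecognized tag: $tag" >&2; return 1 ;;
    esac
}

parse_reco_spec tev100_ttbar_mg5%rfast001
```

A misspelled tag is rejected locally instead of producing a failed request to the server.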

As before, one can also download the files using the URL approach:

hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data

Note that in a URL the reconstruction tag “rfast001” is separated by a forward slash, as for a usual directory. As before, the URL can be taken from the list of mirrors. For example, if you want to download files from NERSC, use this URL:

hs-ls  http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data

Send comments to: — Sergei Chekanov (ANL) 2014/02/08 10:26