HepSim data files
Truth-level (EVGEN) data are stored in a platform-independent format called ProMC, which achieves very effective compression using a variable-byte encoding. This data format is supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.). Open access to such files via the HTTP protocol is the central element in the design of HepSim, since further processing using fast and full simulations is done using computer resources at multiple locations. The combination of compact ProMC files optimized for web streaming and the HTTP protocol, which handles many (relatively small) files well, is one of the distinct features of HepSim compared to other production systems.
Simulated data after fast and full detector simulations are kept in ROOT and LCIO formats.
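To illustrate why a variable-byte encoding compresses well, the sketch below counts how many bytes a non-negative integer would occupy under the common base-128 scheme (7 payload bits per byte). This is an assumption made for illustration only: the helper name varint_size is hypothetical, and the exact ProMC wire format (for example, its handling of negative values) is not described on this page.

```shell
#!/bin/sh
# Illustrative only: number of bytes a non-negative integer needs under a
# base-128 variable-byte encoding (7 payload bits per byte).
# varint_size is a hypothetical helper, not an hs-toolkit command.
varint_size() {
    n=$1
    bytes=1
    while [ "$n" -ge 128 ]; do
        n=$((n / 128))
        bytes=$((bytes + 1))
    done
    echo "$bytes"
}

# Small values need fewer bytes than a fixed-width 8-byte value would.
for value in 5 300 1250000; do
    echo "$value -> $(varint_size "$value") byte(s)"
done
```

Under this scheme, the smaller the stored integers, the smaller the file, which is why compact integer values matter for web streaming.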
HepSim software toolkit
In order to work with the HepSim database, install the HepSim software toolkit. For Linux/Mac with the bash shell, use:
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh
or using curl:
curl https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh
This creates the directory “hs-toolkit” with the HepSim commands. Running them requires Java 7 or later. One can also download hs-toolkit.tgz directly. Most commands are platform-independent and based on hepsim.jar, which can also be used on Windows/Mac. The setup script adds all commands to the execution path.
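Since the commands need Java 7 or later, it can help to check the installed version before sourcing the setup script. The following is a minimal sketch, not part of hs-toolkit: the helper names java_major and check_java are hypothetical, and the parsing assumes the usual version "..." string printed by java -version.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hs-toolkit).

# Extract the major Java version from a version string.
# Old scheme: "1.8.0_281" -> 8; new scheme: "11.0.2" -> 11.
java_major() {
    case "$1" in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;
        *)   v=${1%%.*}; echo "${v%%[!0-9]*}" ;;
    esac
}

# Succeed only if a java executable is found and reports version 7 or later.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH" >&2
        return 1
    fi
    # java -version prints to stderr; take the first quoted version string.
    ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$ver")
    [ -n "$major" ] && [ "$major" -ge 7 ]
}

# Example use:
#   check_java && . hs-toolkit/setup.sh
```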
You can view the available commands from this package by typing:
Event record browser
Use the event-record browser to view separate EVGEN events, cross sections and log files. On Linux/Mac, use:
Here we have looked at one file of the Pythia8 (QCD) sample.
One can also view a local file after downloading it:
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
hs-view tev14_mg5_Httbar_001.promc
On Windows, download hepsim.jar and click the “hepsim.jar” file. Then open the ProMC file using the “File” menu. You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, view data layouts and log files using the [Menu]:
This works for full parton-shower simulations with detailed information on particles. In addition to the usual parton-shower Monte Carlo record, the browser shows detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show the 4-momenta for each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section). See the ProMC file description for details.
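Check the consistency of an EVGEN file and print additional information with:
hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc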
The last argument can be a file path on the local disk (faster than URL!). The output of the above command is:
File = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
ProMC version = 4
Last modified = 2015-10-03 12:06:52
Description = run_Httbar_14tev
Events = 10000
Sigma (pb) = 5.61176E-1 ± 3.3035E-3
Lumi (pb-1) = 1.78197E4
Varint units = E:100000 L:1000
Log file: = logfile.txt
#### The file is healthy! ####
All entries are self-explanatory. “Varint units” are the factors by which values are multiplied before they are stored as variable-byte integers. “E:100000” means that all px, py, pz, e and mass values are multiplied by 100000, while “L:1000” means that all distances (x, y, z, t) are multiplied by 1000. See the ProMC archive format.
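The scaling by the varint units can be sketched as a pair of conversions. This is an illustration under stated assumptions, not the ProMC implementation: the helper names to_stored and from_stored are hypothetical, and rounding to the nearest integer is assumed (the actual library may round differently).

```shell
#!/bin/sh
# Multipliers taken from "Varint units = E:100000 L:1000" in the output above.
E_UNIT=100000   # energies, momenta and masses
L_UNIT=1000     # distances and times

# Physical value -> stored integer (assumed: rounded to the nearest integer).
to_stored() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        x = v * u
        r = (x >= 0) ? int(x + 0.5) : int(x - 0.5)
        printf "%d\n", r
    }'
}

# Stored integer -> physical value.
from_stored() {
    awk -v n="$1" -v u="$2" 'BEGIN { printf "%.6g\n", n / u }'
}

to_stored 12.5 "$E_UNIT"        # an energy-like value of 12.5
from_stored 1250000 "$E_UNIT"   # and back again
from_stored 1500 "$L_UNIT"      # a stored distance of 1500
```

With E:100000, energy-like values keep a resolution of 0.00001 in their original units; with L:1000, distances keep a resolution of 0.001.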
One can look at individual events with the above command by passing an integer argument that specifies the event number. This command prints event number 100:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 100
List available data files
Let us show how to find all files associated with a given Monte Carlo event sample. Go to the HepSim database and look at the “Files” links, which list the available files. The same list can be obtained on the command line as:
hs-ls [name]
where [name] is the dataset name, or the URL of the Info page, or the URL of the location of all files. This command shows a table with file names and their sizes.
Here is an example illustrating how to list all files from the Higgs to ttbar Monte Carlo sample:
hs-ls tev100pp_higgs_ttbar_mg5
To produce a plain list without decorations, for example to write a file that another program will read, use:
hs-ls tev100pp_higgs_ttbar_mg5 simple
The string “simple” removes the decorations. If you want a list with full URL and without decorations, use:
hs-ls tev100pp_higgs_ttbar_mg5 simple-url
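A plain list of this kind is convenient to feed into other tools. The sketch below assumes that the “simple-url” output contains one URL per line (the exact output layout is not shown on this page); the helper names promc_urls and url_basenames are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; they read a list from standard input.

# Keep only lines that look like URLs of ProMC files
# (assumed layout: one entry per line).
promc_urls() {
    grep -E '^https?://.*\.promc$'
}

# Print the file name (last path component) of each URL.
url_basenames() {
    while IFS= read -r url; do
        echo "${url##*/}"
    done
}

# Example with the real command (needs hs-toolkit and network access):
#   hs-ls tev100pp_higgs_ttbar_mg5 simple-url | promc_urls | url_basenames
```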
Similarly, one can use the Info or Download URL path:
The commands show this table:
Searching for a dataset
One can find a URL that corresponds to a dataset using a search word. The syntax is:
hs-find [search word]
For example, this command lists all datasets that have the keyword “higgs” in the dataset name, in the name of the Monte Carlo model, or in the file description:
hs-find higgs
Note that there is no need for wildcard characters such as “*” or “%”, as in the usual pattern matching. Any dataset that contains the sub-string “higgs” is matched. The same functionality is implemented in the “Search” menu of the HepSim interface.
If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:
It will search for Pythia samples after a fast detector simulation with the tag “001”. To search for a full detector simulation, replace “rfast” with “rfull”.
Downloading truth-level files
One can download all files for a given dataset as:
hs-get [name] [OUTPUT_DIR]
where [name] is either the name of the dataset, the URL of its Info page in the HepSim repository, or a direct URL pointing to the location of the ProMC files. This example downloads the dataset “tev100pp_higgs_ttbar_mg5” to the directory “data”:
hs-get tev100pp_higgs_ttbar_mg5 data
You will be prompted to choose a mirror (if alternative mirrors exist). Select the mirror to start downloading the files.
Alternatively, this example downloads files using the URL of the Info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
Or, if you know the download URL with the file locations, use this command:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the “tev100pp_higgs_ttbar_mg5” event sample.
You can stop the download using [Ctrl-D]. The next time you start it, the download continues where it stopped. By default, 2 threads are used. One can increase the number of threads by adding an integer to the end of this command. If a second integer is given, it sets the maximum number of files to download. This example shows how to download 10 files using 3 threads:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
Instead of the [Download URL], one can use the URL of the Info page or the name of the dataset. Here are two equivalent examples that download 5 files using a single thread into the output directory “data”:
Using the URL of the info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 1 5
or, when using the dataset name given on the info page:
hs-get tev100pp_higgs_ttbar_mg5 data 1 5
You can also download files that have a certain pattern in their names. If a directory contains files generated with different pT cuts, the names usually contain the sub-string “pt”, followed by the pT cut. In this case, one can download such files as:
hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
The last argument requires that all downloaded files have the string “pt100_” in their names (in this case, it indicates that the files were generated with pT>100 GeV).
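The effect of such a pattern can be sketched as a plain sub-string test. This is only an illustration of the selection rule described above, not the code used by hs-get; the helper name name_matches and the file names are hypothetical.

```shell
#!/bin/sh
# Succeed if file name $1 contains the sub-string $2.
# name_matches is a hypothetical helper, not an hs-toolkit command.
name_matches() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical file names, for illustration only.
for f in sample_pt100_001.promc sample_pt1000_001.promc sample_pt300_001.promc; do
    if name_matches "$f" "pt100_"; then
        echo "selected: $f"
    fi
done
```

The trailing underscore in “pt100_” matters: the shorter pattern “pt100” would also select names containing “pt1000”.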
The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is either the Info URL, the [Download URL], or the dataset name.
It is not recommended to use more than 6 threads for downloads.
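When hs-get is called from scripts, the argument rules above can be checked before starting a download. The following is a minimal sketch: check_hs_get_args is a hypothetical helper (not part of hs-toolkit), and it assumes that the thread and file counts must be positive integers.

```shell
#!/bin/sh
# Validate hs-get style arguments:
#   [URL] [OUTPUT_DIR] [threads] [files] [pattern]
# Returns non-zero on an invalid call; only warns above 6 threads.
check_hs_get_args() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        echo "expected 2 to 5 arguments, got $#" >&2
        return 1
    fi
    # Optional 3rd and 4th arguments (assumed: positive integers).
    for n in "${3:-1}" "${4:-1}"; do
        case "$n" in
            ''|*[!0-9]*|0)
                echo "not a positive integer: $n" >&2
                return 1 ;;
        esac
    done
    if [ "${3:-2}" -gt 6 ]; then
        echo "warning: more than 6 threads is not recommended" >&2
    fi
    return 0
}

# Example use inside a wrapper script:
#   check_hs_get_args "$@" && hs-get "$@"
```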
Downloading files after detector simulation
Some datasets contain reconstructed files after detector simulations. Reconstructed files are stored inside the directory “rfastNNN” (fast simulation) or “rfullNNN” (full simulation), where “NNN” is the version number. For example, the tev100pp_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:
hs-ls tev100pp_ttbar_mg5%rfast001 # list reco files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
The symbol “%” separates the sample name (tev100pp_ttbar_mg5) from the reconstruction tag (rfast001). If you want to download 10 files in 3 threads, use this:
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
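The “sample%tag” convention is easy to handle in scripts. The sketch below splits such a string and checks the tag against the rfastNNN / rfullNNN naming; the helper names are hypothetical, and a three-digit version number is assumed from the “NNN” notation.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hs-toolkit).

# Part before the first "%": the sample name.
reco_sample() {
    printf '%s\n' "$1" | cut -d'%' -f1
}

# Part after the first "%": the reconstruction tag (empty if there is none).
reco_tag() {
    printf '%s\n' "$1" | cut -s -d'%' -f2-
}

# Succeed if the tag looks like rfastNNN or rfullNNN (assumed: three digits).
valid_reco_tag() {
    printf '%s\n' "$1" | grep -Eq '^r(fast|full)[0-9]{3}$'
}

spec='tev100pp_ttbar_mg5%rfast001'
echo "sample: $(reco_sample "$spec")"
echo "tag:    $(reco_tag "$spec")"
```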
As before, one can also download the files using the URL approach:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
Note that in a URL the reconstruction tag “rfast001” is separated by a slash, as for a usual directory. As before, the URL can be taken from the list of mirrors. For example, if you want to download files from NERSC, use this URL:
hs-ls http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data
Written by Sergei Chekanov (ANL) 2016/02/08 10:26