{{indexmenu_n>2}}
====== HepSim data files ======
====== EVGEN events ======
Truth-level (EVGEN) data are stored in a platform-independent format called [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC]], which achieves very effective compression using a variable-byte encoding. This data format is
supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.).
Open access to such files via the HTTP protocol is the central element in the design of HepSim, since
further processing using fast and full simulations is done using computer resources at multiple locations.
The compact ProMC files, optimized for web streaming, together with the HTTP protocol, which is well suited
to handling many (relatively small) files, are among the distinct features of HepSim compared to other production systems.
Simulated data after fast and full detector simulations are kept in ROOT and LCIO formats.
====== HepSim software toolkit ======
To work with the HepSim database, install the HepSim software toolkit. For Linux/Mac with the bash shell, use:
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh
or using curl:
curl https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh
This creates the directory "hs-toolkit" with the HepSim commands. To run them,
[[https://www.java.com/en/download/|Java 7/8]] or above must be installed.
You can also download [[http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz|hs-toolkit.tgz]] manually.
Most commands are platform-independent and based on
[[http://atlaswww.hep.anl.gov/hepsim/hepsim.jar|hepsim.jar]], which can also be used on Windows/Mac.
The setup command adds all commands to the execution path.
You can view the available commands from this package by typing:
hs-help
The programs included in **hs-toolkit** can be used for searching and downloading ProMC files, and other files after fast and full simulations.
If you need to analyse and display reconstructed events in the LCIO format, use the [[https://atlaswww.hep.anl.gov/asc/jas4pp|Jas4PP program]], which includes **hs-toolkit** and adds packages for processing SLCIO files. This program can also process ProMC files.
====== Event record browser ======
Use the event-record browser to view individual EVGEN events, cross sections and log files. On Linux/Mac, use:
hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
Here we have looked at one file of the [[http://atlaswww.hep.anl.gov/hepsim/info.php?item=141 | Pythia8 (QCD) sample]].
You can also download the file first and then open the local copy:
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
hs-view tev14_mg5_Httbar_001.promc
On Windows, download [[https://atlaswww.hep.anl.gov/asc/hepsim/hepsim.jar| hepsim.jar]] and click the "hepsim.jar" file. Then open the ProMC file using the "File" menu.
You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, view data layouts and log files using the [Menu]:
{{:hepsim:promc_browser.png| ProMC browser}}
This works for full parton-shower simulations with detailed information on particles.
In addition to what a usual parton-shower Monte Carlo record provides, the browser shows detailed information on event weights, PDF uncertainties and (in some cases) scale uncertainties. It can show the 4-momenta for each event as well as the total cross section (for NLO, you need to read all events to get an accurate cross section). See the [[https://atlaswww.hep.anl.gov/asc/promc/| ProMC file]] description for details.
====== File validation ======
Check the consistency of an EVGEN file and print additional information with:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
The argument can also be a path to a file on the local disk (this is faster than using a URL). The output of the above command is:
File = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
ProMC version = 4
Last modified = 2015-10-03 12:06:52
Description = run_Httbar_14tev
Events = 10000
Sigma (pb) = 5.61176E-1 ± 3.3035E-3
Lumi (pb-1) = 1.78197E4
Varint units = E:100000 L:1000
Log file: = logfile.txt
#### The file is healthy! ####
Most entries are self-explanatory. "Varint units" lists the factors by which floating-point values are multiplied before they are converted to variable-byte integers.
Here "E:100000" means that all px, py, pz, e and mass values are multiplied by 100000, while "L:1000" means that all distances (x,y,z,t) are multiplied by 1000.
See the [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC archive format]].
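To make the unit convention concrete, here is a minimal Python sketch of the scaling implied by "E:100000 L:1000". The function names and the use of rounding are assumptions made for illustration; the actual ProMC encoder may round or truncate differently.

```python
# Illustrative sketch of the "Varint units" scaling; not the ProMC implementation.
E_UNIT = 100000  # from "E:100000": multiplier for px, py, pz, e, mass
L_UNIT = 1000    # from "L:1000": multiplier for x, y, z, t


def to_varint_value(value, unit):
    """Scale a floating-point value to the integer that would be stored.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return int(round(value * unit))


def from_varint_value(stored, unit):
    """Convert a stored integer back to a floating-point value."""
    return stored / unit


stored_e = to_varint_value(125.09, E_UNIT)
print(stored_e, from_varint_value(stored_e, E_UNIT))
```

With these factors, energies and momenta keep five decimal places and distances keep three, in whatever units the file uses.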
To look at an individual event, pass an additional integer argument that specifies
the event number. This command prints event number 100:
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 100
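If the summary printed by hs-info needs to be processed by a script, its "key = value" lines are easy to parse. The Python sketch below uses a few lines copied from the output above; the helper name ''parse_summary'' is hypothetical, and the relation "Lumi = Events / Sigma" is an assumption that happens to agree with the numbers shown.

```python
# Sketch: parse "key = value" lines copied from the hs-info output above.
SUMMARY = """\
Events = 10000
Sigma (pb) = 5.61176E-1 ± 3.3035E-3
Lumi (pb-1) = 1.78197E4
Varint units = E:100000 L:1000
"""


def parse_summary(text):
    """Return a dict mapping each key to its raw string value."""
    fields = {}
    for line in text.splitlines():
        if " = " in line:
            key, value = line.split(" = ", 1)
            fields[key.strip()] = value.strip()
    return fields


fields = parse_summary(SUMMARY)
events = int(fields["Events"])
# Keep only the central value, dropping the "± error" part.
sigma_pb = float(fields["Sigma (pb)"].split("±")[0])
lumi_pb = float(fields["Lumi (pb-1)"])

# Assumption: integrated luminosity = number of events / cross section.
print(events / sigma_pb, lumi_pb)
```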
====== List available data files ======
Let us show how to find all files associated with a given Monte Carlo event sample.
Go to the [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim database]] and look at the "Files" link, which lists the available files.
You can also list the files from the command line:
hs-ls [name]
where [name] is the dataset name, or the URL of the Info page, or the URL of the location of all files.
This command shows a table with file names and their sizes.
Here is an example illustrating how to list all files from the [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=2|Higgs to ttbar]]
Monte Carlo sample:
hs-ls tev100pp_higgs_ttbar_mg5
If you want a simple list without decorations, for example to create a file that will be read by another program, use:
hs-ls tev100pp_higgs_ttbar_mg5 simple
The string "simple" removes the decorations. If you want a list with full URL and without decorations, use:
hs-ls tev100pp_higgs_ttbar_mg5 simple-url
Similarly, one can use the Info or Download URL path:
hs-ls https://atlaswww.hep.anl.gov/hepsim/info.php?item=2
or
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/
The commands show this table:
Looking for http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5
File name size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc 24464
mg5_Httbar_100tev_002.promc 24360
mg5_Httbar_100tev_003.promc 24488
mg5_Httbar_100tev_004.promc 24516
mg5_Httbar_100tev_005.promc 24448
mg5_Httbar_100tev_006.promc 24484
mg5_Httbar_100tev_007.promc 24468
mg5_Httbar_100tev_008.promc 24516
mg5_Httbar_100tev_009.promc 24440
mg5_Httbar_100tev_010.promc 24592
mg5_Httbar_100tev_011.promc 24456
mg5_Httbar_100tev_012.promc 24568
mg5_Httbar_100tev_013.promc 24436
mg5_Httbar_100tev_014.promc 24416
mg5_Httbar_100tev_015.promc 24408
mg5_Httbar_100tev_016.promc 24484
mg5_Httbar_100tev_017.promc 24400
mg5_Httbar_100tev_018.promc 24460
mg5_Httbar_100tev_019.promc 24452
mg5_Httbar_100tev_020.promc 24420
------------------------------------
-> Summary: Nr of files=20 Total size=489 MB
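The decorated table is meant for reading, but it can still be processed by a script. The Python sketch below parses the listing shown above and recomputes the summary line; the helper name ''parse_listing'' is hypothetical, and the parser assumes that each data row consists of a file name followed by an integer size.

```python
# Sketch: sum the sizes from the decorated hs-ls table shown above.
LISTING = """\
File name size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc 24464
mg5_Httbar_100tev_002.promc 24360
mg5_Httbar_100tev_003.promc 24488
mg5_Httbar_100tev_004.promc 24516
mg5_Httbar_100tev_005.promc 24448
mg5_Httbar_100tev_006.promc 24484
mg5_Httbar_100tev_007.promc 24468
mg5_Httbar_100tev_008.promc 24516
mg5_Httbar_100tev_009.promc 24440
mg5_Httbar_100tev_010.promc 24592
mg5_Httbar_100tev_011.promc 24456
mg5_Httbar_100tev_012.promc 24568
mg5_Httbar_100tev_013.promc 24436
mg5_Httbar_100tev_014.promc 24416
mg5_Httbar_100tev_015.promc 24408
mg5_Httbar_100tev_016.promc 24484
mg5_Httbar_100tev_017.promc 24400
mg5_Httbar_100tev_018.promc 24460
mg5_Httbar_100tev_019.promc 24452
mg5_Httbar_100tev_020.promc 24420
------------------------------------
-> Summary: Nr of files=20 Total size=489 MB
"""


def parse_listing(text):
    """Return (file name, size in kB) pairs; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    return rows


rows = parse_listing(LISTING)
total_kb = sum(size for _, size in rows)
print(len(rows), total_kb, total_kb // 1000)
```

The 20 sizes add up to 489276 kB, which agrees with the "489 MB" in the summary line if 1 MB is counted as 1000 kB.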
====== Searching for a dataset ======
One can find a URL that corresponds to a dataset using a search word. The syntax is:
hs-find [search word]
For example, this command lists all datasets that have the keyword "higgs" in the name of the dataset,
in the name of the Monte Carlo model, or in the file description:
hs-find higgs
Note that there is no need for wildcard characters such as "*" or "%", unlike in typical pattern-matching syntax.
The command matches any dataset that contains the sub-string "higgs".
The same functionality is implemented in the "Search" menu of the HepSim interface.
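The sub-string matching described above can be pictured with a short Python sketch. This is not the hs-find implementation: the function name, the case-insensitive comparison, and the model and description values in the sample entries are assumptions made for illustration.

```python
# Sketch of the sub-string search semantics; not the hs-find implementation.
def matches(word, dataset):
    """Return True if the word occurs in the dataset name, model or description.

    Case-insensitive matching is an assumption of this sketch.
    """
    haystack = " ".join(
        [dataset["name"], dataset["model"], dataset["description"]]
    )
    return word.lower() in haystack.lower()


# Hypothetical catalogue entries; model and description values are made up.
catalogue = [
    {"name": "tev100pp_higgs_ttbar_mg5", "model": "MadGraph5",
     "description": "H to ttbar"},
    {"name": "tev100pp_ttbar_mg5", "model": "MadGraph5",
     "description": "ttbar production"},
]

found = [d["name"] for d in catalogue if matches("higgs", d)]
print(found)
```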
If you are interested in a specific reconstruction tag, use "%" to separate the search string from the tag name.
Example:
hs-find pythia%rfast001
This searches for Pythia samples that have a fast detector simulation with the tag "rfast001". To search for a full detector simulation, replace
"rfast" with "rfull".
====== Downloading truth-level files ======
One can download all files for a given dataset as:
hs-get [name] [OUTPUT_DIR]
where [name] is either the name of the dataset, or the URL of its Info page in the [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim repository]], or a direct URL pointing to the location of the ProMC files.
This example downloads dataset "tev100pp_higgs_ttbar_mg5" to the directory "data":
hs-get tev100pp_higgs_ttbar_mg5 data
If alternative mirrors exist, you will be prompted to choose one. Select the mirror and start downloading the files.
Alternatively, this example downloads files using the URL of the Info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
Or, if you know the download URL with the file locations, use this command:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the "tev100pp_higgs_ttbar_mg5" event sample.
If you see that the download is slow, use an alternative URL from the mirror list which is given for each dataset.
You can stop downloading using [Ctrl-D]. The next time you start the command, it will continue the download.
By default, 2 threads are used. One can increase the number of threads by adding an integer number to the end of this command.
If a second integer is given, it will be used to set the maximum number of files to download.
This example shows how to download 10 files using 3 threads:
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
Instead of [Download URL], one can use the URL of the info page, or the name of the dataset.
Here are two equivalent examples that download 5 files using a single thread into the output directory "data":
Using the URL of the info page:
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 1 5
or, when using the dataset name given on the info page:
hs-get tev100pp_higgs_ttbar_mg5 data 1 5
You can also download files that have a certain pattern in their names. If a directory contains files generated with different pT cuts,
the names usually contain the sub-string "pt", followed by the value of the pT cut. In this case, one can download such files as:
hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
The last argument requires that all downloaded files have the string "pt100_" in their names (in this case, it means that the
files were generated with pT>100 GeV).
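The effect of the pattern argument can be pictured as a plain sub-string filter on the file names. In the Python sketch below, the function name and the file names are hypothetical; it only illustrates why the trailing underscore in "pt100_" is useful.

```python
# Sketch: select file names containing a pattern, as described for the last hs-get argument.
def select_by_pattern(names, pattern):
    """Keep the names that contain the given sub-string."""
    return [name for name in names if pattern in name]


# Hypothetical file names; the real names in a dataset may differ.
names = [
    "higgs_pythia8_pt100_001.promc",
    "higgs_pythia8_pt1000_001.promc",
    "higgs_pythia8_pt300_001.promc",
]

print(select_by_pattern(names, "pt100_"))  # only the pt100 file
print(select_by_pattern(names, "pt100"))   # also picks up the pt1000 file
```

Without the trailing underscore, a pattern such as "pt100" would also match names containing "pt1000".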
The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is either the Info URL, the [Download URL], or the dataset name.
It is not recommended to use more than 6 threads for downloads.
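Because the optional arguments are positional, a later one can only be given if all earlier ones are present. The Python sketch below builds the hs-get argument list under that reading of the syntax above; the helper name ''hs_get_args'' and its keyword names are hypothetical.

```python
# Sketch: build an hs-get command line following the positional layout above.
def hs_get_args(name, output_dir, threads=None, max_files=None, pattern=None):
    """Return the argument list for hs-get.

    The optional arguments are positional, so one that follows a missing
    argument is rejected.
    """
    args = ["hs-get", name, output_dir]
    seen_missing = False
    for value in (threads, max_files, pattern):
        if value is None:
            seen_missing = True
        elif seen_missing:
            raise ValueError("optional arguments must be given in order")
        else:
            args.append(str(value))
    return args


print(" ".join(hs_get_args("tev100pp_higgs_ttbar_mg5", "data", 1, 5)))
```

The resulting list can be passed to Python's ''subprocess.run'' once the toolkit commands are on the execution path.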
====== Downloading files after detector simulation ======
Some datasets contain reconstructed files after detector simulations.
Reconstructed files are stored inside the directory "rfastNNN" (fast simulation) or "rfullNNN" (full simulation),
where "NNN" is the version number. For example,
the [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|tev100pp_ttbar_mg5]] sample includes the link "rfast001" (Delphes
fast simulation, version 001). To list and download the reconstructed events for the reconstruction tag "rfast001", use this syntax:
hs-ls tev100pp_ttbar_mg5%rfast001 # list reco files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
The symbol "%" separates the sample name (tev100pp_ttbar_mg5) from the reconstruction tag (rfast001).
If you want to download 10 files in 3 threads, use this:
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
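The "sample%tag" convention can be decoded with a few lines of Python. This is a sketch with a hypothetical helper name; it assumes that the version "NNN" is always exactly three digits, as the naming above suggests.

```python
import re

# Assumption: tags look like "rfastNNN" or "rfullNNN" with exactly three digits.
TAG_RE = re.compile(r"^r(fast|full)(\d{3})$")


def parse_sample(spec):
    """Split "sample%tag" and decode the tag into (sample, kind, version)."""
    sample, sep, tag = spec.partition("%")
    if not sep:
        # No "%" present: a truth-level sample without a reconstruction tag.
        return sample, None, None
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("unexpected reconstruction tag: " + tag)
    return sample, match.group(1), match.group(2)


print(parse_sample("tev100pp_ttbar_mg5%rfast001"))
```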
As before, one can also download the files using the URL approach:
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
Note that in the URL the reconstruction tag "rfast001" is separated by a forward slash, as for a usual directory.
As before, the URL can be taken from the list of mirrors. For example, if you want to download files from NERSC, use this URL:
hs-ls http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data
Written by //[[chekanov@anl.gov|Sergei Chekanov (ANL)]] 2016/02/08 10:26//