How to download, view and find data
Truth-level (EVGEN) data are stored in a platform-independent format called ProMC, which achieves very effective compression using a variable-byte encoding. This data format is supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.). Open access to such files via the HTTP protocol is the central element in the design of HepSim, since further processing using fast and full simulations is done using computer resources at multiple locations. The compact ProMC files optimized for web streaming, together with the HTTP protocol optimized to handle many (relatively small) files, are one of the distinct features of HepSim compared to other production systems.
Simulated data after fast and full detector simulations are kept in ROOT and LCIO formats.
HepSim software toolkit
To work with the HepSim database, install the HepSim software toolkit. On Linux/Mac with the bash shell, use:
wget http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh
or using curl:
curl http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh
This creates the directory “hs-toolkit” with the HepSim commands. Running them requires Java 7 or later. One can also download hs-toolkit.tgz directly. Most commands are platform-independent and based on hepsim.jar, which can also be used on Windows/Mac. The setup script adds all commands to the execution path.
You can view the available commands from this package by typing:
Event record browser
Use the event-record browser to view separate EVGEN events, cross sections and log files. On Linux/Mac, use:
Here we looked at one file of the Pythia8 (QCD) sample.
Of course, one can look at a local file as well:
after you have downloaded it.
On Windows, download hepsim.jar and click the “hepsim.jar” file. Then open the ProMC file using the “File” menu. You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, view data layouts and log files using the [Menu]:
This works for full parton-shower simulations with detailed information on particles. Unlike the usual parton-shower Monte Carlo record, the browser also shows detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show the 4-momenta of each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section). See the ProMC file description for details.
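Check the consistency of an EVGEN file and print additional information with:
hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc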
The last argument can be a file path on the local disk (faster than URL!). The output of the above command is:
File = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc
ProMC version = 2
Last modified = 2013-06-05 16:32:18
Description = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;ParticleDecays:limitTau0 = on; ParticleDecays:tau0Max = 10;HiggsSM:all = on;
Events = 10000
Sigma (pb) = 2.72474E1 ± 1.92589E-1
Lumi (pb-1) = 3.67007E2
Varint units = E:100000 L:1000
Log file: = logfile.txt
The file was validated. Exit.
Most entries are self-explanatory. “Varint units” are the factors by which energies (momenta) and distances are multiplied before they are converted to variable-byte integers. Here “E:100000” means that all px, py, pz, e and mass values were multiplied by 100000, while “L:1000” means that all distances (x, y, z, t) were multiplied by 1000. See the ProMC archive format.
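To make the “Varint units” concrete, here is a minimal Python sketch of the scaling arithmetic. The function names are hypothetical and are not part of the ProMC API, and rounding to the nearest integer is an assumption about how the conversion is done. The last function re-derives the “Lumi” entry from “Events” and “Sigma”, assuming the luminosity is simply the number of events divided by the cross section.

```python
# Scale factors taken from the "Varint units" line of the output above.
ENERGY_UNIT = 100000   # E:100000, applied to px, py, pz, e, mass
LENGTH_UNIT = 1000     # L:1000, applied to x, y, z, t


def to_varint_units(value, unit):
    """Convert a floating-point value to a scaled integer (rounding is an assumption)."""
    return int(round(value * unit))


def from_varint_units(stored, unit):
    """Convert a stored integer back to a floating-point value."""
    return stored / unit


def luminosity(n_events, sigma_pb):
    """Integrated luminosity in pb-1, assuming Lumi = Events / Sigma."""
    return n_events / sigma_pb


if __name__ == "__main__":
    stored = to_varint_units(45.25, ENERGY_UNIT)   # an example energy value
    print(stored, from_varint_units(stored, ENERGY_UNIT))
    print(luminosity(10000, 2.72474e1))            # compare with the "Lumi" entry
```

Under these assumptions, energies are kept with a resolution of 1/100000 of the energy unit and distances with a resolution of 1/1000 of the length unit; anything finer is lost in the conversion.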
To look at a single event, pass an integer argument with the event number to the above command. This command prints event number 100:
hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc 100
List available data files
Let us show how to find all files associated with a given Monte Carlo event sample. Go to the HepSim database and look at the “Files” links, which list the available files. Then find the files as:
hs-ls [name]
where [name] is the dataset name, or the URL of the Info page, or the URL of the location of all files. This command shows a table with file names and their sizes.
Here is an example that lists all files from the Higgs to ttbar Monte Carlo sample:
hs-ls tev100_higgs_ttbar_mg5
If you want a plain list without decoration, for example to create a file that is read by another program, use:
hs-ls tev100_higgs_ttbar_mg5 simple
The string “simple” removes the decorations. If you want a list with full URL and without decorations, use:
hs-ls tev100_higgs_ttbar_mg5 simple-url
Similarly, one can use the Info or Download URL path:
The commands show this table:
Looking for http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5
File name                      size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc    24464
mg5_Httbar_100tev_002.promc    24360
mg5_Httbar_100tev_003.promc    24488
mg5_Httbar_100tev_004.promc    24516
mg5_Httbar_100tev_005.promc    24448
mg5_Httbar_100tev_006.promc    24484
mg5_Httbar_100tev_007.promc    24468
mg5_Httbar_100tev_008.promc    24516
mg5_Httbar_100tev_009.promc    24440
mg5_Httbar_100tev_010.promc    24592
mg5_Httbar_100tev_011.promc    24456
mg5_Httbar_100tev_012.promc    24568
mg5_Httbar_100tev_013.promc    24436
mg5_Httbar_100tev_014.promc    24416
mg5_Httbar_100tev_015.promc    24408
mg5_Httbar_100tev_016.promc    24484
mg5_Httbar_100tev_017.promc    24400
mg5_Httbar_100tev_018.promc    24460
mg5_Httbar_100tev_019.promc    24452
mg5_Httbar_100tev_020.promc    24420
------------------------------------
-> Summary: Nr of files=20 Total size=489 MB
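A table like this is easy to post-process. The following Python sketch parses such output and sums the file sizes. It assumes one file per line with the size in kB as the last column, and the helper name is hypothetical, not part of the toolkit. Only the first three rows of the listing are reproduced in the sample text.

```python
def parse_file_table(text):
    """Return a list of (file name, size in kB) pairs from a file table."""
    files = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows of the form "<name>.promc <integer size>";
        # the header, separator and summary lines are skipped.
        if len(parts) == 2 and parts[0].endswith(".promc") and parts[1].isdigit():
            files.append((parts[0], int(parts[1])))
    return files


SAMPLE = """\
File name                      size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc    24464
mg5_Httbar_100tev_002.promc    24360
mg5_Httbar_100tev_003.promc    24488
------------------------------------
"""

if __name__ == "__main__":
    files = parse_file_table(SAMPLE)
    total_kb = sum(size for _, size in files)
    print(len(files), "files,", total_kb, "kB")
```

Summing all 20 sizes in the full listing gives 489276 kB, consistent with the quoted total of 489 MB.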
Searching for a dataset
One can find a URL that corresponds to a dataset using a search word. The syntax is:
hs-find [search word]
For example, this command lists all datasets that have the keyword “higgs” in the name of the dataset, in the name of the Monte Carlo models, or in the file description:
hs-find higgs
Note that there is no need to use wildcard characters such as “*” or “%”, as one would in the usual regular expressions. Any dataset that contains the sub-string “higgs” is matched. The same functionality is implemented in the “Search” menu of the HepSim interface.
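The sub-string matching can be illustrated with a short Python sketch. It is not the HepSim implementation: the function name is hypothetical, only dataset names are searched here (the real search also covers model names and file descriptions), and the strings are compared exactly as given, since the case handling of the real search is not specified on this page.

```python
def find_datasets(names, word):
    """Return the names that contain `word` as a plain sub-string (no wildcards)."""
    return [name for name in names if word in name]


# Dataset names that appear in the examples on this page.
DATASETS = [
    "tev100_higgs_ttbar_mg5",
    "tev13_higgs_pythia8_ptbins",
    "tev100_ttbar_mg5",
]

if __name__ == "__main__":
    print(find_datasets(DATASETS, "higgs"))
```

In this sketch a wildcard character is treated as ordinary text, so a search word such as “hig*” matches nothing.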
If you are interested in a specific reconstruction tag, use “%” to separate the search string and the tag name. Example:
It will search for Pythia samples after a fast detector simulation with the tag “001”. To search for a full detector simulation, replace “rfast” with “rfull”.
Downloading truth-level files
One can download all files for a given dataset as:
hs-get [name] [OUTPUT_DIR]
where [name] is either the name of the dataset, or the URL of the Info page in the HepSim repository, or a direct URL pointing to the location of the ProMC files. This example downloads the dataset “tev100_higgs_ttbar_mg5” to the directory “data”:
hs-get tev100_higgs_ttbar_mg5 data
You will be prompted to choose a mirror (if alternative mirrors exist). Select the mirror to start downloading the files.
Alternatively, this example downloads files using the URL of the Info page:
hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
Or, if you know the download URL with the file locations, use this command:
hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
All these examples will download all files from the “tev100_higgs_ttbar_mg5” event sample.
You can stop the download using [Ctrl-D]. The next time you start it, the download will continue. By default, 2 threads are used. The number of threads can be increased by appending an integer to the command. If a second integer is given, it sets the maximum number of files to download. This example shows how to download 10 files using 3 threads:
hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
Instead of [Download URL], one can use the URL of the info page, or the name of the dataset. Here are 2 equivalent examples that download 5 files using a single (1) thread and the output directory “data”:
Using the URL of the info page:
hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 1 5
or, when using the dataset name given on the info page:
hs-get tev100_higgs_ttbar_mg5 data 1 5
You can also download files that have a certain pattern in their names. If a directory contains files generated with different pT cuts, the names usually contain the substring “pt” followed by the pT cut. In this case, one can download such files as:
hs-get tev13_higgs_pythia8_ptbins data 2 5 pt100_
The last argument requires that all downloaded files have the string “pt100_” in their names (in this case, it indicates that the files were generated with pT>100 GeV).
The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
where [URL] is either the info URL, the [Download URL], or the dataset name.
It is not recommended to use more than 6 threads for downloads.
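When hs-get is called from a script, the positional arguments can be assembled programmatically. The Python sketch below builds the argument list in the order given above. The function name, the checks and the warning are illustrative assumptions, not part of the toolkit, and the commented-out call presumes that hs-get is on the execution path. Because the arguments are positional, the sketch treats a pattern as valid only when both the number of threads and the number of files are given.

```python
import warnings


def hs_get_args(source, output_dir, threads=None, max_files=None, pattern=None):
    """Build the argument list for hs-get (2 to 5 positional arguments)."""
    if max_files is not None and threads is None:
        raise ValueError("the number of files requires the number of threads")
    if pattern is not None and max_files is None:
        raise ValueError("a pattern requires the number of threads and files")
    if threads is not None and threads > 6:
        warnings.warn("more than 6 download threads is not recommended")
    args = ["hs-get", source, output_dir]
    # Optional arguments are appended in their fixed positional order.
    for optional in (threads, max_files, pattern):
        if optional is not None:
            args.append(str(optional))
    return args


if __name__ == "__main__":
    cmd = hs_get_args("tev13_higgs_pythia8_ptbins", "data", 2, 5, "pt100_")
    print(" ".join(cmd))
    # To actually run the download (requires the toolkit to be set up):
    # import subprocess; subprocess.run(cmd, check=True)
```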
Downloading files after detector simulation
Some datasets contain reconstructed files after detector simulations. Reconstructed files are stored inside the directory “rfastNNN” (fast simulation) or “rfullNNN” (full simulation), where “NNN” is the version number. For example, the tev100_ttbar_mg5 sample includes the link “rfast001” (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag “rfast001”, use this syntax:
hs-ls tev100_ttbar_mg5%rfast001 # list reco files with the tag "rfast001"
hs-get tev100_ttbar_mg5%rfast001 data # download to the "data" directory
The symbol “%” separates the sample name (tev100_ttbar_mg5) from the reconstruction tag (rfast001). If you want to download 10 files in 3 threads, use this:
hs-get tev100_ttbar_mg5%rfast001 data 3 10
As before, one can also download the files using the URL approach:
hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
Note that the reconstruction tag “rfast001” is separated by a slash, as for a usual directory. As before, the URL can be taken from the list of mirrors. For example, if you want to download files from NERSC, use this URL:
hs-ls http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data
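The two ways of addressing reconstructed files, “name%tag” and a URL ending in the tag directory, can be sketched in Python as follows. Both helper names are hypothetical, and the code only manipulates strings: it does not contact any server or check that the tag exists.

```python
def split_sample_tag(name):
    """Split "sample%tag" into (sample, tag); tag is None for truth-level data."""
    sample, sep, tag = name.partition("%")
    return sample, (tag if sep else None)


def reco_url(download_url, tag):
    """Append the reconstruction tag to a download URL as a sub-directory."""
    return download_url.rstrip("/") + "/" + tag + "/"


if __name__ == "__main__":
    print(split_sample_tag("tev100_ttbar_mg5%rfast001"))
    print(reco_url("http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5", "rfast001"))
```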
Send comments to: — Sergei Chekanov (ANL) 2014/02/08 10:26