====== Downloading HepSim data ======

====== Data formats ======
Truth-level (EVGEN) data are stored in a platform-independent format called ProMC, which is
supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.).
Open access to such files via the HTTP protocol is a central element of HepSim:
further processing using fast and full simulations is done using computer resources at multiple locations.
The compact ProMC files, optimized for web streaming, together with the HTTP protocol, which is well suited
to handling many (relatively small) files, are among the distinct features of HepSim compared to other production systems.

Simulated data after fast and full detector simulations are kept in the ROOT and LCIO formats.

====== Installing the HepSim toolkit ======

In order to work with the HepSim database, install the HepSim software toolkit. For Linux/Mac with the bash shell, use:

<code bash>
wget https://
source hs-toolkit/
</code>

or using curl:

<code bash>
curl https://
source hs-toolkit/
</code>

This creates the directory "hs-toolkit" with the HepSim commands.
Most commands are platform-independent.
The setup script makes the HepSim commands available in the current shell session.

You can view the available commands from this package by typing:

<code bash>
hs-help
</code>

<note tip>The programs included in **hs-toolkit** can be used for searching and downloading ProMC files, as well as other files produced by the fast and full simulations.
If you need to analyse and display reconstructed events in the LCIO format, use dedicated LCIO analysis software.
</note>

====== Viewing EVGEN events ======

Use the event-record browser to view separate EVGEN events, cross sections and log files. On Linux/Mac, use:

<code bash>
hs-view https://
</code>
Here we have looked at one file of a Monte Carlo event sample.

Of course, one can look at a local file as well:

<code bash>
wget https://
hs-view tev14_mg5_Httbar_001.promc
</code>
after you have downloaded it.

On Windows, download the event-record browser separately.
You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, and view data layouts and log files using the [Menu]:

//(Screenshot of the event-record browser.)//

This works for full parton-shower simulations with detailed information on particles.
Unlike the usual parton-shower Monte Carlo records, the files viewed in this browser can also carry detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show the 4-momenta of each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section).

====== Checking EVGEN files ======

Check the consistency of an EVGEN file, and print additional information, as:

<code bash>
hs-info https://
</code>

The last argument can also be a file path on the local disk (faster than a URL!). The output of the above command is:

<code>
File           = http://
ProMC version  = 4
Last modified  = 2015-10-03 12:06:52
Description    =
Events         =
Sigma (pb)     = 5.61176E-1 ± 3.3035E-3
Lumi           =
Varint units   = E:100000 L:1000
Log file       = logfile.txt
#### The file is healthy!
</code>

All entries are self-explanatory. "Varint units" are the values used to multiply energies (momenta) in order to convert them to variable-byte integers.
The message at the end confirms that the file is not corrupted.
See the ProMC documentation for more detail.

One can look at separate events using the above command after passing an integer argument that specifies
the event to be looked at. This command prints event number 100:

<code bash>
hs-info [URL] 100
</code>

====== List available data files ======

Let us show how to find all files associated with a given Monte Carlo event sample.
Go to the HepSim database and note the dataset name.
Then find the files as:
<code bash>
hs-ls [name]
</code>
where [name] is the dataset name, the URL of the Info page, or the URL of the location of all files.
This command shows a table with file names and their sizes.

Here is an example illustrating how to list all files from the "tev100pp_higgs_ttbar_mg5"
Monte Carlo sample:

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5
</code>

If you want to create a simple list without decoration, e.g. to make a file that can be read by another program, use

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5 simple
</code>
The string "simple" prints the file names only, without the table decoration. To print the full URLs of the files, use "simple-url" instead:

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5 simple-url
</code>

Similarly, one can use the Info or Download URL path:

<code bash>
hs-ls https://
</code>

or

<code bash>
hs-ls https://
</code>

The commands show this table:

<code>
Looking for http://
File name                    size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc      24464
mg5_Httbar_100tev_002.promc      24360
mg5_Httbar_100tev_003.promc      24488
mg5_Httbar_100tev_004.promc      24516
mg5_Httbar_100tev_005.promc      24448
mg5_Httbar_100tev_006.promc      24484
mg5_Httbar_100tev_007.promc      24468
mg5_Httbar_100tev_008.promc      24516
mg5_Httbar_100tev_009.promc      24440
mg5_Httbar_100tev_010.promc      24592
mg5_Httbar_100tev_011.promc      24456
mg5_Httbar_100tev_012.promc      24568
mg5_Httbar_100tev_013.promc      24436
mg5_Httbar_100tev_014.promc      24416
mg5_Httbar_100tev_015.promc      24408
mg5_Httbar_100tev_016.promc      24484
mg5_Httbar_100tev_017.promc      24400
mg5_Httbar_100tev_018.promc      24460
mg5_Httbar_100tev_019.promc      24452
mg5_Httbar_100tev_020.promc      24420
------------------------------------
-> Summary: Nr of files=20
</code>

====== Searching for a dataset ======

One can find the URL that corresponds to a dataset using a search word. The syntax is:

<code bash>
hs-find [search word]
</code>

For example, this command lists all datasets that have the keyword "higgs"
in the name of the Monte Carlo model, or in the file description:

<code bash>
hs-find higgs
</code>

Note that there is no need to use any wildcard character such as "*".
The match is done using any dataset that contains the sub-string "higgs".
The same search functionality is available on the HepSim web page.

If you are interested in a specific reconstruction tag, use "%" followed by the tag name.
Example:

<code bash>
hs-find pythia%rfast001
</code>
It will search for Pythia samples after a fast detector simulation with the tag "rfast001", i.e. the string after
"%" specifies the reconstruction tag.

====== Downloading truth-level files ======

One can download all files for a given dataset as:
<code bash>
hs-get [name] [OUTPUT_DIR]
</code>
where [name] is either the name of the dataset, the URL of its Info page, or the URL with the file locations.
This example downloads the dataset "tev100pp_higgs_ttbar_mg5" to the directory "data":
<code bash>
hs-get tev100pp_higgs_ttbar_mg5 data
</code>
You will be prompted to select a certain mirror (if there are alternative mirrors). Select the mirror and start downloading the files.

Alternatively, one can use the URL of the Info page:
<code bash>
hs-get https://
</code>
Or, if you know the download URL with the file locations, use this command:
<code bash>
hs-get https://
</code>
All these examples download all files from the "tev100pp_higgs_ttbar_mg5" dataset.

You can stop downloading using [Ctrl-D]. Next time you start it, it will continue the download.
By default, we use 2 threads. One can increase the number of threads by adding an integer number to the end of this command.
If a second integer is given, it will be used to set the maximum number of files to download.
This example shows how to download 10 files using 3 threads:
<code bash>
hs-get https:// data 3 10
</code>

Instead of the [Download URL], one can use the URL of the info page, or the name of the dataset.
Here are 2 identical examples to download 5 files using a single (1) thread and the output directory "data".

Using the URL of the info page:
<code bash>
hs-get https:// data 1 5
</code>
or, when using the dataset name given on the info page:
<code bash>
hs-get tev100pp_higgs_ttbar_mg5 data 1 5
</code>

You can also download files that have a certain pattern in their names. If a directory contains files generated with different pT cuts,
the names usually include the sub-string "pt" followed by the minimum pT cut used in the generation. For example:

<code bash>
hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
</code>
The last argument means that all the downloaded files should contain the string "pt100_" in their names (i.e. these
files are generated with pT>100 GeV).

The general usage of the hs-get command requires 2, 3, 4 or 5 arguments:

<code bash>
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
</code>
where [URL] is either the info URL, the [Download URL], or the dataset name.

It is not recommended to use more than 6 threads for downloads.

====== Downloading files after detector simulation ======

Some datasets contain reconstructed files after detector simulations.
Reconstructed files are stored inside a directory named after the reconstruction tag,
such as "rfast001" (here "fast" means a fast simulation, version 001).
To download the reconstructed events for the reconstruction tag "rfast001", use:

<code bash>
hs-ls tev100pp_ttbar_mg5%rfast001          # list the available files
hs-get tev100pp_ttbar_mg5%rfast001 data    # download to the "data" directory
</code>
The symbol "%" separates the dataset name and the reconstruction tag.
If you want to download 10 files in 3 threads, use this:
<code bash>
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
</code>

As before, one can also download the files using the URL approach:
<code bash>
hs-ls https://
hs-get https://
</code>
Note that the reconstruction tag "rfast001" is included in the URL in this case.
As before, the URL can be taken from the list of mirrors. For example, if you want to download from a mirror:

<code bash>
hs-ls http://
hs-get http://
</code>

+ | |||
+ | |||
+ | Written by // | ||