hepsim:usage_download

hepsim:usage_download [2024/07/01 21:25] (current) – created - external edit 127.0.0.1
{{indexmenu_n>2}}

[[:|<< back to HepSim manual]]

======  HepSim data files ======

======  EVGEN events ======
Truth-level (EVGEN) data are stored in a platform-independent format called [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC]], which achieves very effective compression using a variable-byte encoding. This data format is
supported by popular programming languages (C++, Java, Python) on the major operating systems (Windows, Mac, Linux, etc.).
Open access to these files via the HTTP protocol is central to the design of HepSim, since
further processing with fast and full detector simulations is done on computing resources at multiple locations.
Compact ProMC files optimized for web streaming, together with HTTP access optimized for handling many
(relatively small) files, are one of the distinctive features of HepSim compared to other production systems.
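To make the idea of variable-byte encoding concrete, here is a minimal bash sketch of a generic unsigned varint encoder, assuming the Protocol Buffers-style scheme that ProMC builds on: each byte carries 7 bits of the value (least-significant group first), and its high bit flags that more bytes follow, so small integers take fewer bytes. The function "varint_hex" is hypothetical, is not part of hs-toolkit, and ignores negative values, which real encoders treat separately.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of hs-toolkit): print the bytes of the
# unsigned variable-byte (varint) encoding of a non-negative integer, in hex.
# Each byte stores 7 bits, least-significant group first; the high bit (0x80)
# is set on every byte except the last. Negative numbers are not handled.
varint_hex() {
  local n=$1 byte out=""
  while (( n >= 128 )); do
    byte=$(( (n & 0x7F) | 0x80 ))   # low 7 bits plus "more bytes follow" flag
    out+=$(printf '%02x ' "$byte")
    (( n >>= 7 ))
  done
  out+=$(printf '%02x' "$n")        # last byte: high bit clear
  printf '%s\n' "$out"
}

varint_hex 100        # one byte:    64
varint_hex 300        # two bytes:   ac 02
varint_hex 1234567    # three bytes: 87 ad 4b
```

For example, the integer 1234567, which would occupy 8 bytes as a fixed-width 64-bit number, needs only 3 bytes in this encoding.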

Data after fast and full detector simulations are kept in the ROOT and LCIO formats.

======  HepSim software toolkit ======

To work with the HepSim database, install the HepSim software toolkit. On Linux/Mac with the bash shell, use:

<code bash>
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
source hs-toolkit/setup.sh
</code>

or, using curl:

<code bash>
curl https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
source hs-toolkit/setup.sh
</code>

This creates the directory "hs-toolkit" containing the HepSim commands. To run them,
[[https://www.java.com/en/download/|Java]] 7 or newer must be installed.
The archive [[http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz|hs-toolkit.tgz]] can also be downloaded directly.
Most commands are platform-independent and based on
[[http://atlaswww.hep.anl.gov/hepsim/hepsim.jar|hepsim.jar]], which can also be used on Windows/Mac.
The setup script adds all commands to the execution path (PATH).
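To confirm that the setup script worked, one can check that Java and the HepSim commands are found on the PATH. The function "check_cmds" below is a hypothetical helper, not part of hs-toolkit; which commands to check is up to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every command in the argument list that is
# not found on the PATH. Returns 0 if all are found, 1 otherwise.
check_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, run after "source hs-toolkit/setup.sh":
#   check_cmds java hs-help hs-ls hs-get && echo "toolkit ready"
```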

To list the commands available in this package, type:

<code bash>
hs-help
</code>

<note tip>The programs included in **hs-toolkit** can be used to search for and download ProMC files, as well as files produced by fast and full detector simulations.
To analyse and display reconstructed events in the LCIO format, use the [[https://atlaswww.hep.anl.gov/asc/jas4pp|Jas4PP program]], which includes **hs-toolkit** and adds packages for processing SLCIO files. Jas4PP can also process ProMC files.
</note>


======  Event record browser ======

Use the event-record browser to view individual EVGEN events, cross sections and log files. On Linux/Mac, use:

<code bash>
hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
</code>
Here we look at one file of the 14 TeV MadGraph5 sample stored in the "mg5_httbar" directory.

One can also open a local file after downloading it:

<code bash>
wget https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
hs-view tev14_mg5_Httbar_001.promc
</code>

On Windows, download [[https://atlaswww.hep.anl.gov/asc/hepsim/hepsim.jar| hepsim.jar]] and double-click the "hepsim.jar" file. Then open the ProMC file using the "File" menu.
A GUI browser pops up and displays the MC record. Using the menu, you can search for a particle by name and view the data layout and log files:

<hidden>
{{:hepsim:promc_browser.png| ProMC browser}}
</hidden>

This works for full parton-shower simulations with detailed information on particles.
Unlike typical parton-shower Monte Carlo output, the browser shows detailed information on event weights, PDF uncertainties and (in some cases) scale uncertainties. It can show the 4-momenta of particles in each event as well as the total cross section (for NLO samples, all events must be read to obtain an accurate cross section). See the [[https://atlaswww.hep.anl.gov/asc/promc/| ProMC file]] description for details.

======  File validation ======

To check the consistency of an EVGEN file and print additional information about it, use:

<code bash>
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
</code>

The last argument can also be a path to a file on the local disk (faster than a URL). The output of the above command is:

<code>
File          = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
ProMC version = 4
Last modified = 2015-10-03 12:06:52
Description   = run_Httbar_14tev
Events        = 10000
Sigma    (pb) = 5.61176E-1 ± 3.3035E-3
Lumi   (pb-1) = 1.78197E4
Varint units  = E:100000 L:1000
Log file:     = logfile.txt
####  The file is healthy!  ####
</code>

Most entries are self-explanatory. "Varint units" are the factors used to convert floating-point values into variable-byte integers:
"E:100000" means that all px, py, pz, e and mass values are multiplied by 100000, while "L:1000" means that all distances (x, y, z, t) are multiplied by 1000.
See the [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC archive format]].
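As a sketch of how these header fields fit together, the hypothetical bash/awk helpers below (not part of hs-toolkit) apply a varint unit to a value, undo it, and recompute the luminosity. Two assumptions are made: the sketch rounds to the nearest integer (whether ProMC rounds or truncates is not specified on this page), and the luminosity is taken as the number of events divided by the cross section, which is what the example output suggests (10000 / 0.561176 pb gives about 1.78197E4 pb-1).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for interpreting the "Varint units" and "Lumi"
# fields printed by hs-info. awk is used because bash has no floating point.

# Scale a value by a varint unit and round it to the nearest integer.
to_varint() {   # usage: to_varint VALUE UNIT
  awk -v v="$1" -v u="$2" 'BEGIN { printf "%.0f\n", v * u }'
}

# Convert a stored integer back to a floating-point value.
from_varint() { # usage: from_varint INTEGER UNIT
  awk -v i="$1" -v u="$2" 'BEGIN { printf "%.10g\n", i / u }'
}

# Assumed relation: luminosity (pb-1) = number of events / cross section (pb).
lumi() {        # usage: lumi EVENTS SIGMA_PB
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.5E\n", n / s }'
}

to_varint 12.34567 100000     # E:100000 -> 1234567
from_varint 1234567 100000    # -> 12.34567
to_varint 2.5 1000            # L:1000   -> 2500
lumi 10000 5.61176E-1         # -> 1.78197E+04, matching the hs-info output
```

With "E:100000", a stored integer therefore resolves momenta to 1/100000 of the file's energy unit, and with "L:1000" distances to 1/1000 of its length unit.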

To look at an individual event, pass the event number as an additional integer argument.
This command prints event number 100:

<code bash>
hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 100
</code>

====== List available data files ======

This section shows how to find all files associated with a given Monte Carlo event sample.
In the [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim database]], the "Files" link of each sample lists its available files.
The same list can be obtained from the command line:
<code bash>
hs-ls [name]
</code>
where [name] is the dataset name, the URL of its Info page, or the URL of the directory containing all its files.
This command prints a table with the file names and their sizes.

For example, to list all files of the [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=2|Higgs to ttbar]]
Monte Carlo sample, use:

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5
</code>

To create a plain list without decorations (for example, an input file for another program), use:

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5 simple
</code>
The option "simple" removes the decorations. For a plain list of full URLs, use:

<code bash>
hs-ls tev100pp_higgs_ttbar_mg5 simple-url
</code>
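A plain list is convenient for scripting. The minimal sketch below turns the output of "simple-url" into local file paths that an analysis program could read once the files have been downloaded with hs-get. It assumes (not confirmed here) that "simple-url" prints one URL per line and that hs-get keeps the original file names in the output directory; "urls_to_local" is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read URLs (one per line) on stdin and print the
# corresponding paths inside a local directory, e.g. the hs-get output
# directory. Blank lines are skipped; a last line without "\n" is kept.
urls_to_local() {   # usage: urls_to_local DIR < url_list.txt
  local dir=${1%/} url
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue
    printf '%s/%s\n' "$dir" "${url##*/}"   # keep only the file name
  done
}

# Possible usage:
#   hs-ls tev100pp_higgs_ttbar_mg5 simple-url > urls.txt
#   hs-get tev100pp_higgs_ttbar_mg5 data
#   urls_to_local data < urls.txt > inputs.txt
```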


Similarly, one can use the URL of the Info page or the download URL:

<code bash>
hs-ls https://atlaswww.hep.anl.gov/hepsim/info.php?item=2
</code>

or

<code bash>
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/
</code>

Each of these commands prints this table:

<hidden>
<code>
Looking for http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5
File name                size (kB)
------------------------------------
mg5_Httbar_100tev_001.promc 24464
mg5_Httbar_100tev_002.promc 24360
mg5_Httbar_100tev_003.promc 24488
mg5_Httbar_100tev_004.promc 24516
mg5_Httbar_100tev_005.promc 24448
mg5_Httbar_100tev_006.promc 24484
mg5_Httbar_100tev_007.promc 24468
mg5_Httbar_100tev_008.promc 24516
mg5_Httbar_100tev_009.promc 24440
mg5_Httbar_100tev_010.promc 24592
mg5_Httbar_100tev_011.promc 24456
mg5_Httbar_100tev_012.promc 24568
mg5_Httbar_100tev_013.promc 24436
mg5_Httbar_100tev_014.promc 24416
mg5_Httbar_100tev_015.promc 24408
mg5_Httbar_100tev_016.promc 24484
mg5_Httbar_100tev_017.promc 24400
mg5_Httbar_100tev_018.promc 24460
mg5_Httbar_100tev_019.promc 24452
mg5_Httbar_100tev_020.promc 24420
------------------------------------
-> Summary: Nr of files=20  Total size=489 MB
</code>
</hidden>
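Before downloading, the table can also give an estimate of the disk space needed. The awk sketch below sums the size column; it assumes the layout shown above (a file name followed by its size in kB on each data line), and "total_kb" is a hypothetical helper name. For the 20 files listed above the sizes add up to 489276 kB, consistent with the reported total of 489 MB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the "size (kB)" column of an hs-ls table read
# from stdin. Only lines with exactly two fields, the second one an
# integer, are counted; header, separator and summary lines are skipped.
total_kb() {
  awk 'NF == 2 && $2 ~ /^[0-9]+$/ { sum += $2 } END { printf "%d\n", sum }'
}

# Possible usage:
#   hs-ls tev100pp_higgs_ttbar_mg5 | total_kb
```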



====== Searching for a dataset  ======

To find the URL that corresponds to a dataset using a search word, use:

<code bash>
hs-find [search word]
</code>

For example, this command lists all datasets that contain the keyword "higgs" in the dataset name,
in the name of the Monte Carlo model, or in the file description:

<code bash>
hs-find higgs
</code>

There is no need for wildcard characters such as "*" or "%": every dataset that contains the substring "higgs" matches.
The same functionality is available in the "Search" menu of the HepSim web interface.

To search for a specific reconstruction tag, use "%" to separate the search string from the tag name.
For example:

<code bash>
hs-find pythia%rfast001
</code>
This searches for Pythia samples processed with the fast detector simulation, reconstruction tag "rfast001". To search for samples with full detector simulation, replace
"rfast" with "rfull".

====== Downloading truth-level files ======

To download all files of a given dataset, use:
<code bash>
hs-get [name] [OUTPUT_DIR]
</code>
where [name] is the name of the dataset, the URL of its Info page in the [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim repository]], or a direct URL pointing to the location of the ProMC files.
This example downloads the dataset "tev100pp_higgs_ttbar_mg5" into the directory "data":
<code bash>
hs-get tev100pp_higgs_ttbar_mg5 data
</code>
If alternative mirrors exist, you will be prompted to choose one. Select a mirror to start downloading the files.

Alternatively, this example downloads the files using the URL of the Info page:
<code bash>
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
</code>
Or, if you know the download URL of the file location, use this command:
<code bash>
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
</code>
Each of these examples downloads all files of the "tev100pp_higgs_ttbar_mg5" event sample.

<note important>If the download is slow, use an alternative URL from the mirror list given for each dataset.
</note>

You can stop the download using [Ctrl-D]. The next time you start it, the download resumes.
By default, 2 threads are used. To increase the number of threads, add an integer to the end of the command.
A second integer, if given, sets the maximum number of files to download.
This example downloads 10 files using 3 threads into the directory "higgs_ttbar_mg5":
<code bash>
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
</code>
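Since hs-get resumes interrupted downloads by itself, the following is only an optional cross-check of which files of a dataset are not yet present in the output directory. It is a sketch under two assumptions not confirmed here: that "hs-ls [name] simple" prints one file name per line (possibly followed by other columns), and that downloaded files keep their original names. "missing_files" is a hypothetical helper; it checks only that a file exists, not that it is complete.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read file names on stdin (the first column is used)
# and print those that do not exist in the given directory.
missing_files() {   # usage: missing_files DIR < list.txt
  local dir=${1%/} name rest
  while read -r name rest || [ -n "$name" ]; do
    [ -n "$name" ] || continue              # skip blank lines
    [ -e "$dir/$name" ] || printf '%s\n' "$name"
  done
}

# Possible usage:
#   hs-ls tev100pp_higgs_ttbar_mg5 simple > list.txt
#   missing_files data < list.txt
```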

Instead of the download URL, one can use the URL of the Info page or the name of the dataset.
Here are two equivalent examples that download 5 files using a single thread into the output directory "data":

Using the URL of the Info page:
<code bash>
hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data 1 5
</code>
or using the dataset name given on the Info page:
<code bash>
hs-get tev100pp_higgs_ttbar_mg5 data 1 5
</code>

You can also download only the files whose names contain a given pattern. If a directory contains files generated with different pT cuts,
the names usually contain the substring "pt" followed by the pT cut. In this case, one can download the files for a single cut as:

<code bash>
hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
</code>
The last argument requires all downloaded files to contain the string "pt100_" in their names (here, this selects
files generated with pT > 100 GeV).
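Assuming the pattern is a plain substring match, as described above, the trailing underscore in "pt100_" matters: the shorter pattern "pt100" would also match a file with "pt1000" in its name. The hypothetical helper below reproduces this with grep -F (fixed-string matching) on made-up file names, and could be used to preview a list from "hs-ls [name] simple" before downloading.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the names (one per line on stdin) that
# contain PATTERN as a literal substring (grep -F: no regular expressions).
filter_names() {   # usage: filter_names PATTERN < names.txt
  grep -F -- "$1" || true   # grep exits 1 when nothing matches; not an error here
}

# Made-up file names, for illustration only:
names=$'higgs_pt100_001.promc\nhiggs_pt1000_001.promc'
filter_names pt100  <<< "$names"   # prints both names
filter_names pt100_ <<< "$names"   # prints only higgs_pt100_001.promc
```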

In general, the hs-get command takes 2, 3, 4 or 5 arguments:

<code bash>
hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
</code>
where [URL] is the URL of the Info page, the download URL, or the dataset name.

Using more than 6 threads for downloads is not recommended.


====== Downloading files after detector simulation ======


Some datasets include reconstructed files after detector simulation.
Reconstructed files are stored in the directory "rfastNNN" (fast simulation) or "rfullNNN" (full simulation),
where "NNN" is the version number. For example, the
[[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|tev100pp_ttbar_mg5]] sample includes the link "rfast001" (Delphes
fast simulation, version 001). To list and download the reconstructed events with the reconstruction tag "rfast001", use:

<code bash>
hs-ls  tev100pp_ttbar_mg5%rfast001      # list reco files with the tag "rfast001"
hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
</code>
The symbol "%" separates the sample name (tev100pp_ttbar_mg5) from the reconstruction tag (rfast001).
To download 10 files using 3 threads, use:
<code bash>
hs-get tev100pp_ttbar_mg5%rfast001 data 3 10
</code>

As before, the files can also be downloaded using the URL:
<code bash>
hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
</code>
Note that the reconstruction tag "rfast001" is separated by a forward slash ("/"), as in a usual directory path.
As before, the URL can be taken from the mirror list. For example, to download the files from the NERSC mirror, use this URL:

<code bash>
hs-ls  http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ # list all files
hs-get http://portal.nersc.gov/project/m1758/data/events/pp/100tev/ttbar_mg5/rfast001/ data
</code>
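The two ways of addressing reconstructed files shown above are related by simple string handling: the part before "%" is the sample name, and the part after it becomes a subdirectory of the sample's download URL. The hypothetical helpers below only illustrate this relation (hs-ls and hs-get look up the actual download URL themselves; here it is taken as given), together with how a tag such as "rfast001" combines the simulation type with a three-digit version number.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the "name%tag" convention.

# Build a reconstruction tag: type ("rfast" or "rfull") + 3-digit version.
reco_tag() {        # usage: reco_tag rfast 1  ->  rfast001
  printf '%s%03d\n' "$1" "$((10#$2))"   # 10# avoids octal parsing of "008"
}

# Split "name%tag" into its two parts, printed space-separated.
split_spec() {      # usage: split_spec tev100pp_ttbar_mg5%rfast001
  local spec=$1
  local name=${spec%%\%*} tag=${spec#*\%}
  printf '%s %s\n' "$name" "$tag"
}

# Append a tag as a subdirectory of a download URL (trailing "/" as in hs-ls).
reco_url() {        # usage: reco_url DOWNLOAD_URL TAG
  printf '%s/%s/\n' "${1%/}" "$2"
}

reco_tag rfast 1
split_spec tev100pp_ttbar_mg5%rfast001
reco_url https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5 rfast001
```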



Written by //Sergei Chekanov (ANL) 2016/02/08 10:26//
  
hepsim/usage_download.txt · Last modified: 2024/07/01 21:25 by 127.0.0.1