Differences

This shows you the differences between two versions of the page.

hepsim:usage_download [2017/02/06 23:20]
hepsim17
hepsim:usage_download [2020/05/22 02:45] (current)
hepsim17
Line 3: Line 3:
 [[:|<< back to HepSim manual]] [[:|<< back to HepSim manual]]
  
-======  How to download, view and find data ======+======  HepSim data files ======
  
 ======  EVGEN events ====== ======  EVGEN events ======
-Truth-level (EVGEN) data are stored in a platform-independent format called [[http://atlaswww.hep.anl.gov/asc/promc/ | ProMC]] that allows very effective compression using a variable-byte encoding.  This data format is+Truth-level (EVGEN) data are stored in a platform-independent format called [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC]] that allows very effective compression using a variable-byte encoding.  This data format is
 supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.). supported by popular programming languages (C++, Java, Python) on major operating systems (Windows, Mac, Linux, etc.).
 Open access to such files via the http protocol is the central element  in the design of HepSim, since Open access to such files via the http protocol is the central element  in the design of HepSim, since
Line 12: Line 12:
 The compact ProMC files optimized for web streaming, together with the http protocol optimized   The compact ProMC files optimized for web streaming, together with the http protocol optimized  
 to handle many (relatively small) files, is one of the distinct features of HepSim compared to other production systems. to handle many (relatively small) files, is one of the distinct features of HepSim compared to other production systems.
- 
  
 Simulated data after fast and full detector simulations are kept in ROOT and LCIO  formats.  Simulated data after fast and full detector simulations are kept in ROOT and LCIO  formats. 
- 
  
 ======  HepSim software toolkit ====== ======  HepSim software toolkit ======
Line 22: Line 20:
  
 <code bash> <code bash>
-wget http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz+wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz
 source hs-toolkit/setup.sh source hs-toolkit/setup.sh
 </code> </code>
Line 29: Line 27:
  
 <code bash> <code bash>
-curl http://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz+curl https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz | tar -xz
 source hs-toolkit/setup.sh source hs-toolkit/setup.sh
 </code> </code>
Line 56: Line 54:
  
 <code bash> <code bash>
-hs-view http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc +hs-view https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
 </code> </code>
-Here we looked at one file of the [[http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/qcd/pythia8/ | Pythia8 (QCD) sample]]. +Here we have looked at one file of the [[http://atlaswww.hep.anl.gov/hepsim/info.php?item=141 | Pythia8 (QCD) sample]]. 
  
 Of course, one can look at the local file as well: Of course, one can look at the local file as well:
  
 <code bash> <code bash>
-hs-view pythia8_higgs_1.promc +wget https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 
 +hs-view tev14_mg5_Httbar_001.promc
 </code> </code>
 after you downloaded it. after you downloaded it.
  
-On Windows, download [[http://atlaswww.hep.anl.gov/asc/hepsim/hepsim.jar| hepsim.jar]] and click the "hepsim.jar" file. Then  open the ProMC file using the "File" menu. +On Windows, download [[https://atlaswww.hep.anl.gov/asc/hepsim/hepsim.jar| hepsim.jar]] and click the "hepsim.jar" file. Then  open the ProMC file using the "File" menu. 
 You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, view data layouts and log files using the [Menu]: You will see a pop-up GUI browser which displays the MC record. You can search for a given particle name, view data layouts and log files using the [Menu]:
  
 <hidden> <hidden>
-{{:community:promc_browser.png| ProMC browser}}+{{:hepsim:promc_browser.png| ProMC browser}}
 </hidden> </hidden>
  
 This works for full parton-shower simulations with detailed information on particles. This works for full parton-shower simulations with detailed information on particles.
-Unlike the usual parton shower Monte Carlo, this browser has detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show 4-momenta of each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section). Look at the [[asc:promc| ProMC file]] description. +Unlike the usual parton shower Monte Carlo, this browser has detailed information on event weights, PDF uncertainties and scale uncertainties (in some cases). The browser can show 4-momenta of each event as well as the total cross sections (for NLO, you need to read all events to get an accurate cross section). Look at the [[https://atlaswww.hep.anl.gov/asc/promc/| ProMC file]] description.
- +
  
 ======  File validation ====== ======  File validation ======
Line 84: Line 81:
  
 <code bash> <code bash>
-hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc+hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc
 </code> </code>
  
Line 90: Line 87:
  
 <code> <code>
-File          = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc +File          = http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 
-ProMC version = 2 +ProMC version = 4 
-Last modified = 2013-06-05 16:32:18 +Last modified = 2015-10-03 12:06:52 
-Description   PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;ParticleDecays:limitTau0 = on; +Description   run_Httbar_14tev
-                ParticleDecays:tau0Max = 10;HiggsSM:all = on;+
 Events        = 10000 Events        = 10000
-Sigma    (pb) = 2.72474E1 ± 1.92589E-1 +Sigma    (pb) = 5.61176E-1 ± 3.3035E-3
-Lumi   (pb-1) = 3.67007E2+Lumi   (pb-1) = 1.78197E4
 Varint units  = E:100000 L:1000 Varint units  = E:100000 L:1000
 Log file:     = logfile.txt Log file:     = logfile.txt
-The file was validated. Exit.+####  The file is healthy!  ####
 </code> </code>
  
 All entries are self-explanatory. Varint units - values used to multiply energy (momenta) to convert to variable-byte integers. All entries are self-explanatory. Varint units - values used to multiply energy (momenta) to convert to variable-byte integers.
-The "E:100000" means that all px,py,pz,e,mass were multiplied by 100000, while all distances (x,y,z,t) were multiplied by  1000. +The "E:100000" means that all px, py, pz, e, mass values are multiplied by 100000, while all distances (x,y,z,t) are multiplied by  1000. 
-See the [[http://atlaswww.hep.anl.gov/asc/promc/ | ProMC archive format]].+See the [[https://atlaswww.hep.anl.gov/asc/promc/ | ProMC archive format]].
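
The scaling described above can be illustrated with a small shell sketch. The helper names (''to_varint'', ''from_varint'', ''lumi_from_sigma'') are hypothetical and are not part of hs-toolkit; rounding to the nearest integer is an assumption about how values are stored. The last helper only checks that the listed numbers are consistent with Lumi = Events / Sigma, using the values from the older revision of the listing (10000 events, Sigma = 2.72474E1 pb, Lumi = 3.67007E2 pb-1).

```bash
# Sketch only: these helpers are hypothetical, not hs-toolkit commands.

# Value -> stored integer for a given varint unit
# (100000 for px,py,pz,e,mass; 1000 for x,y,z,t).
# Rounding to the nearest integer is an assumption.
to_varint() {
  awk -v v="$1" -v u="$2" 'BEGIN { printf "%.0f\n", v * u }'
}

# Stored integer -> original value (up to the precision kept by the unit).
from_varint() {
  awk -v n="$1" -v u="$2" 'BEGIN { printf "%.10g\n", n / u }'
}

# Integrated luminosity (pb-1) implied by an event count and a cross section (pb).
lumi_from_sigma() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.3f\n", n / s }'
}

to_varint 45.12345 100000      # energy-like value with unit E:100000
from_varint 4512345 100000     # back to the original value
lumi_from_sigma 10000 27.2474  # compare with the "Lumi" entry
```

With the unit 100000, anything beyond the fifth decimal place of an energy or momentum is lost, which is the price paid for the compact integer encoding.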
  
 One can look at separate events using the above command after passing an integer argument that specifies One can look at separate events using the above command after passing an integer argument that specifies
Line 111: Line 107:
  
 <code bash> <code bash>
-hs-info http://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/higgs/pythia8/pythia8_higgs_1.promc 100+hs-info https://mc.hep.anl.gov/asc/hepsim/events/pp/14tev/mg5_httbar/tev14_mg5_Httbar_001.promc 100
 </code> </code>
- 
- 
  
 ====== List available data files ====== ====== List available data files ======
  
 Let us show how to find all files associated with a given Monte Carlo event sample. Let us show how to find all files associated with a given Monte Carlo event sample.
-Go to [[http://atlaswww.hep.anl.gov/hepsim/ | HepSim database]]. Look at the "Files" links, which list the available files.+Go to [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim database]]. Look at the "Files" links, which list the available files.
 Then find the files as:                        Then find the files as:                       
 <code bash> <code bash>
Line 127: Line 121:
 This command shows a table with file names and their sizes. This command shows a table with file names and their sizes.
  
-Here is an example illustrating how to list all files from the [[http://atlaswww.hep.anl.gov/hepsim/info.php?item=2|Higgs to ttbar]]                +Here is an example illustrating how to list all files from the [[https://atlaswww.hep.anl.gov/hepsim/info.php?item=2|Higgs to ttbar]]                
 Monte Carlo sample: Monte Carlo sample:
  
 <code bash> <code bash>
-hs-ls tev100_higgs_ttbar_mg5+hs-ls tev100pp_higgs_ttbar_mg5
 </code> </code>
  
Line 137: Line 131:
  
 <code bash> <code bash>
-hs-ls tev100_higgs_ttbar_mg5 simple+hs-ls tev100pp_higgs_ttbar_mg5 simple
 </code> </code>
 The string "simple" removes the decorations. If you want a list with full URL and without decorations, use: The string "simple" removes the decorations. If you want a list with full URL and without decorations, use:
  
 <code bash> <code bash>
-hs-ls tev100_higgs_ttbar_mg5 simple-url+hs-ls tev100pp_higgs_ttbar_mg5 simple-url
 </code> </code>
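
An undecorated URL list is convenient for scripting. The following is a minimal sketch, assuming that the "simple-url" mode prints one URL per line (an assumption about the output layout, not something stated here). The helper ''url_basenames'' is hypothetical, and the two URLs fed to it are placeholders rather than real HepSim files.

```bash
# Hypothetical helper: read URLs from standard input (one per line)
# and print only the file-name part of each.
url_basenames() {
  while IFS= read -r url; do
    if [ -n "$url" ]; then
      basename "$url"
    fi
  done
}

# Demonstration with placeholder URLs standing in for real output:
printf '%s\n' \
  'https://example.org/data/sample_001.promc' \
  'https://example.org/data/sample_002.promc' | url_basenames

# With the toolkit installed, the intended use would be:
#   hs-ls tev100pp_higgs_ttbar_mg5 simple-url | url_basenames
```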
  
Line 149: Line 143:
  
 <code bash> <code bash>
-hs-ls http://atlaswww.hep.anl.gov/hepsim/info.php?item=2+hs-ls https://atlaswww.hep.anl.gov/hepsim/info.php?item=2
 </code> </code>
  
Line 155: Line 149:
  
 <code bash> <code bash>
-hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/+hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5/
 </code> </code>
  
Line 213: Line 207:
 If you are interested in a specific reconstruction tag, use "%" to separate the search string and the tag name. If you are interested in a specific reconstruction tag, use "%" to separate the search string and the tag name.
 Example: Example:
 +
 <code bash> <code bash>
 hs-find pythia%rfast001 hs-find pythia%rfast001
Line 225: Line 220:
 hs-get [name] [OUTPUT_DIR] hs-get [name] [OUTPUT_DIR]
 </code> </code>
-where [name] is either the name of the dataset, or the URL of Info page [[http://atlaswww.hep.anl.gov/hepsim/ | HepSim repository]], or a direct URL pointing to the locations of ProMC files. +where [name] is either the name of the dataset, or the URL of Info page [[https://atlaswww.hep.anl.gov/hepsim/ | HepSim repository]], or a direct URL pointing to the locations of ProMC files. 
-This example downloads dataset "tev100_higgs_ttbar_mg5" to the directory "data":+This example downloads dataset "tev100pp_higgs_ttbar_mg5" to the directory "data":
 <code bash> <code bash>
-hs-get tev100_higgs_ttbar_mg5 data+hs-get tev100pp_higgs_ttbar_mg5 data
 </code> </code>
 You will be prompted to use certain mirror (if there are alternative mirrors). Select the mirror and start downloading the files. You will be prompted to use certain mirror (if there are alternative mirrors). Select the mirror and start downloading the files.
Line 234: Line 229:
 Alternatively, this example downloads files using the URL of the Info page: Alternatively, this example downloads files using the URL of the Info page:
 <code bash> <code bash>
-hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data+hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2 data
 </code> </code>
 Or, if you know the download URL with the file locations, use this command: Or, if you know the download URL with the file locations, use this command:
 <code bash> <code bash>
-hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data+hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 data
 </code> </code>
-All these examples will download all files from the "tev100_higgs_ttbar_mg5" event sample. +All these examples will download all files from the "tev100pp_higgs_ttbar_mg5" event sample. 
  
 <note important>If you see that the download is slow, use an alternative URL from the mirror list which is given for each dataset. <note important>If you see that the download is slow, use an alternative URL from the mirror list which is given for each dataset.
Line 250: Line 245:
 This example shows how to download 10 files using 3  threads: This example shows how to download 10 files using 3  threads:
 <code bash> <code bash>
-hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10+hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/higgs_ttbar_mg5 higgs_ttbar_mg5 3 10
 </code> </code>
  
 Instead of [Download URL], one can use the URL of the info page, or the name of the dataset. Instead of [Download URL], one can use the URL of the info page, or the name of the dataset.
-Here are 2 identical examples to download 5 files using single (1) thread and the ouput directory "data":+Here are 2 identical examples to download 5 files using single (1) thread and the output directory "data":
  
 Using the URL of the info page: Using the URL of the info page:
 <code bash> <code bash>
-hs-get http://atlaswww.hep.anl.gov/hepsim/info.php?item=2  data 1 5+hs-get https://atlaswww.hep.anl.gov/hepsim/info.php?item=2  data 1 5
 </code> </code>
 or, when using the dataset name given on the info page: or, when using the dataset name given on the info page:
 <code bash> <code bash>
-hs-get tev100_higgs_ttbar_mg5  data 1 5+hs-get tev100pp_higgs_ttbar_mg5  data 1 5
 </code> </code>
  
 You can also download files that have certain pattern in the names. If a directory contains files generated with different pt cuts, You can also download files that have certain pattern in the names. If a directory contains files generated with different pt cuts,
-the names usually contain the substring "pt", followed by the pT cuts. In this case, one can download such files as:+the names usually contain the sub-string "pt", followed by the pT cuts. In this case, one can download such files as:
  
 <code bash> <code bash>
-hs-get tev13_higgs_pythia8_ptbins data 2 5 pt100_+hs-get tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
 </code> </code>
 The last argument shows that all the downloaded files should have the string "pt100_" in their names (in this case, it tells that the  The last argument shows that all the downloaded files should have the string "pt100_" in their names (in this case, it tells that the 
Line 277: Line 272:
  
 <code bash> <code bash>
-[URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)].+hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)].
 </code> </code>
 where [URL] is either info URL, [Download URL], or the dataset name. where [URL] is either info URL, [Download URL], or the dataset name.
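
Because the optional arguments are positional, a pattern can only be given after both the number of threads and the number of files. The sketch below is a hypothetical dry-run helper (''hs_get_cmd'' is not part of hs-toolkit): it checks the arguments and prints the resulting command line instead of running it, which is handy when preparing batch downloads.

```bash
# Hypothetical dry-run helper: validates arguments and prints the
# hs-get command line in the order described above; nothing is downloaded.
hs_get_cmd() {
  local name=${1:-} outdir=${2:-} threads=${3:-} nfiles=${4:-} pattern=${5:-}
  local n cmd

  if [ -z "$name" ] || [ -z "$outdir" ]; then
    echo "usage: hs_get_cmd NAME OUTPUT_DIR [THREADS] [FILES] [PATTERN]" >&2
    return 1
  fi

  # Optional counts must be positive integers when present.
  for n in "$threads" "$nfiles"; do
    case "$n" in
      '') ;;
      *[!0-9]*|0) echo "not a positive integer: $n" >&2; return 1 ;;
    esac
  done

  cmd="hs-get $name $outdir"
  if [ -n "$threads" ]; then cmd="$cmd $threads"; fi
  if [ -n "$nfiles" ];  then cmd="$cmd $nfiles";  fi
  if [ -n "$pattern" ]; then cmd="$cmd $pattern"; fi
  printf '%s\n' "$cmd"
}

# Print the command for 5 files, 2 threads, names containing "pt100_":
hs_get_cmd tev13pp_higgs_pythia8_ptbins data 2 5 pt100_
```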
Line 290: Line 285:
 Reconstructed files are stored inside the directory "rfastNNN" (fast simulation) or "rfullNNN" (full simulation), Reconstructed files are stored inside the directory "rfastNNN" (fast simulation) or "rfullNNN" (full simulation),
 where "NNN" is the version number. For example, where "NNN" is the version number. For example,
-[[http://atlaswww.hep.anl.gov/hepsim/info.php?item=15|tev100_ttbar_mg5]] sample includes the link "rfast001" (Delphes+[[https://atlaswww.hep.anl.gov/hepsim/info.php?item=15|tev100pp_ttbar_mg5]] sample includes the link "rfast001" (Delphes
 fast simulation, version 001). To download the reconstructed events for the reconstruction tag "rfast001", use this syntax: fast simulation, version 001). To download the reconstructed events for the reconstruction tag "rfast001", use this syntax:
  
 <code bash> <code bash>
-hs-ls  tev100_ttbar_mg5%rfast001      # list reco files with the tag "rfast001" +hs-ls  tev100pp_ttbar_mg5%rfast001      # list reco files with the tag "rfast001" 
-hs-get tev100_ttbar_mg5%rfast001 data # download to the "data" directory+hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory
 </code> </code>
-The symbol "%" separates the sample name (tev100_ttbar_mg5) from the reconstruction tag (rfast001).+The symbol "%" separates the sample name (tev100pp_ttbar_mg5) from the reconstruction tag (rfast001).
 If you want to download 10 files in 3 threads, use this: If you want to download 10 files in 3 threads, use this:
 <code> <code>
-hs-get  tev100_ttbar_mg5%rfast001 data 3 10+hs-get  tev100pp_ttbar_mg5%rfast001 data 3 10
 </code> </code>
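
In scripts it can be useful to split such a "name%tag" string back into its two parts. This is a minimal sketch using standard shell parameter expansion; the function name ''split_spec'' is hypothetical and not provided by hs-toolkit.

```bash
# Hypothetical helper: split "sample%tag" into sample name and
# reconstruction tag. Without a "%", only the sample name is printed.
split_spec() {
  local spec=$1
  case "$spec" in
    *%*)
      # ${spec%%\%*} drops everything from the first "%" onward;
      # ${spec#*%}   drops everything up to and including the first "%".
      printf '%s %s\n' "${spec%%\%*}" "${spec#*%}"
      ;;
    *)
      printf '%s\n' "$spec"
      ;;
  esac
}

split_spec 'tev100pp_ttbar_mg5%rfast001'
split_spec 'tev100pp_ttbar_mg5'
```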
  
 As before, one can also download the files using the URL approach: As before, one can also download the files using the URL approach:
 <code bash> <code bash>
-hs-ls http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files +hs-ls https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ # list all files 
-hs-get http://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data+hs-get https://mc.hep.anl.gov/asc/hepsim/events/pp/100tev/ttbar_mg5/rfast001/ data
 </code> </code>
 Note that the reconstruction tag "rfast001" is separated by a forward slash ("/"), as for a usual directory. Note that the reconstruction tag "rfast001" is separated by a forward slash ("/"), as for a usual directory.
Line 318: Line 313:
  
  
- +Written by  //[[[email protected]|Sergei Chekanov (ANL)]] 2016/02/08 10:26//
- +
- Send comments to:  --- //[[[email protected]|Sergei Chekanov (ANL)]] 2014/02/08 10:26//+