hepsim:quick [2024/07/01 21:25] (current) – created - external edit
 +[[:|<< back to HepSim manual]]
 +====== Quick start ======
 +Install the HepSim software toolkit using the "bash" shell on Linux/Mac:
 +<code bash>      
 +bash   #  set bash if you haven't done this before 
 +wget -O - | tar -xz;
 +source hs-toolkit/
 +This creates the directory "hs-toolkit" with HepSim commands. You can also download it as [[ | hs-toolkit.tgz]]. Note that
 +[[|Java 8]] or above must be installed. 
 +You can view the commands using the bash shell by typing:
 +<code bash>      
 +bash> hs-help
 +The directory contains several bash scripts for Linux/Mac, and Windows batch (BAT) files to process events on Windows OS.
 +The package is used to download, view and analyze truth-level events in the [[ | PROMC]] or [[ | PROIO]] format.  
 +<note tip>
 +Use the [[ | JAS4PP program]] to analyse LCIO 
 +(*.lcio) files with Geant4 simulations.
 +This program can also be used for truth-level records in the [[|PROMC]], [[| PROIO]] and [[|ROOT]] file formats. You can also use the Delphes/ROOT framework for ROOT files. 
 +====== Finding data files ======
 +Let us show how to find the files associated with a given Monte Carlo event sample. 
 +Go to the [[|HepSim database]] and find the "Files" column.
 +It shows the URLs of truth-level files ("EVGEN"), i.e. files created directly by event generators.  
 +You can use the command line tool to list files associated with a dataset as:  
 +<code bash>      
 +hs-ls [name] 
 +where [name] is the dataset name. One can also use the URL of the Info page instead,  
 +or the URL of the location of all files. This command shows a table with file names.
 +Here is an example illustrating how to list all files from the 
 +[[|Higgs to ttbar]] Monte Carlo sample:
 +<code bash>      
 +hs-ls tev100pp_higgs_ttbar_mg5 
 +("tev" defines the energy unit, "pp" means pp collisions, "mg5" means MadGraph5).
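The naming pattern can be unpacked with plain shell parameter expansion. The sketch below assumes, purely for illustration, that fields are separated by "_", that the first field carries the energy and collision type, and that the last field names the generator. "describe_dataset" is a hypothetical helper and is not part of hs-toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a HepSim-style dataset name into its parts.
# Assumption: "_" separates fields; first field = energy/collision tag,
# last field = generator tag, everything in between = physics process.
describe_dataset() {
    local name="$1"
    local beam="${name%%_*}"       # text before the first "_"
    local generator="${name##*_}"  # text after the last "_"
    local process="${name#*_}"     # drop the first field ...
    process="${process%_*}"        # ... and the last field
    printf 'beam=%s process=%s generator=%s\n' "$beam" "$process" "$generator"
}

describe_dataset tev100pp_higgs_ttbar_mg5
```

For the sample above this prints "beam=tev100pp process=higgs_ttbar generator=mg5". Names that do not follow the assumed layout will be split differently.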
 +Similarly, one can use the download URL:
 +<code bash>
 +Note that with this approach you can use a URL mirror close to your geographical location.
 +If you need to create a list of all files for download, use this syntax:
 +<code bash>
 +hs-ls [name] simple     > input.list     # make list of ProMC files (without URL path)
 +hs-ls [name] simple-url > input_url.list # make list with URL from the main server
 +where [name] is the name of the dataset. You can also use a URL if you want to create a list of files from a certain (mirror) server.
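Such a list is easy to post-process with standard shell tools. The sketch below assumes the list is plain text with one URL per line. The two entries are placeholders written by the script itself so that it runs without network access ("example.org" is not a HepSim server), and "list_basenames" is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Placeholder list standing in for the output of "hs-ls [name] simple-url".
list="$(mktemp)"
cat > "$list" <<'EOF'
https://example.org/events/sample_001.promc
https://example.org/events/sample_002.promc
EOF

# Print the bare file name for every non-empty line of the list.
list_basenames() {
    local url
    while IFS= read -r url; do
        if [ -n "$url" ]; then
            printf '%s\n' "${url##*/}"   # strip everything up to the last "/"
        fi
    done < "$1"
}

list_basenames "$list"
rm -f "$list"
```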
 +====== Searching for datasets ======
 +The best way to find a sample is to use the web page with 
 +[[|database search]].
 +Enter "rfull" in the search field, and you will see all samples with full simulation tags. Enter "rfast", and you will
 +see samples with fast simulations. If you search for "higgs", you will see the list of Higgs samples.
 +You can also use complex searches, such as "pythia%rfast001" (Pythia sample after a fast simulation with the tag 001). 
 +You can also use convenient URL links that search for some datasets. Here are a few examples: 
 +  * [[]] - lists MC samples after full simulations
 +  * [[]] - lists MC samples after fast simulations
 +  * [[]] - lists all Madgraph5 samples
 +  * [[]] - lists Higgs samples after fast simulation
 +If you prefer the command line, you can find the URL that corresponds to a dataset using this command:
 +hs-find [search word] 
 +The search covers the names of datasets, the names of Monte Carlo models and the file descriptions.  
 +For example, to find all URL locations that correspond to simulated samples with Higgs, try this:
 +hs-find higgs 
 +If you are interested in a specific reconstruction tag, use "%" to separate the search string and the tag name.
 +hs-find pythia%rfast001 
 +It will search for Pythia samples after a fast detector simulation with the tag "001". To search for a full detector simulation, replace
 +"rfast" with "rfull".
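The "word%tag" search string can also be assembled programmatically, which helps when looping over several tags. The following is a minimal sketch; "hs_query" is a hypothetical helper, and the three-digit zero padding follows the "001" example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a "word%tag" search string for hs-find.
# kind must be "rfast" or "rfull"; version is zero-padded to three digits.
hs_query() {
    local word="$1" kind="$2" version="$3"
    case "$kind" in
        rfast|rfull) ;;
        *) echo "unknown tag type: $kind" >&2; return 1 ;;
    esac
    # 10# forces base 10, so inputs such as "008" are not read as octal.
    printf '%s%%%s%03d\n' "$word" "$kind" "$((10#$version))"
}

hs_query pythia rfast 1
```

The result can be passed on directly, for example as hs-find "$(hs_query pythia rfast 1)".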
 +====== Downloading EVGEN files ======
 +One can download all EVGEN files for a given dataset as:
 +hs-get [name] [OUTPUT_DIR]
 +where [name] is the dataset name. This also can be the URL of the Info page, 
 +or a direct URL pointing to the locations of ProMC files. 
 +This example downloads all files from the "tev100pp_higgs_ttbar_mg5" dataset 
 +to the directory "data":
 +hs-get tev100pp_higgs_ttbar_mg5 data
 +Alternatively, this example downloads the same files using the URL of the Info page: 
 +hs-get data 
 +Or, if you know the download URL, use it:
 +hs-get data
 +All these examples will download all files from the "tev100pp_higgs_ttbar_mg5" event sample.
 +One can add an integer value to the end of this command which specifies the number of threads.
 +If a second integer is given, it will be used to set the maximum number of files to download.
 +This example downloads 10 files in 3 threads from the dataset "tev100pp_higgs_ttbar_mg5".
 +hs-get tev100pp_higgs_ttbar_mg5 data 3 10                       
 +One can also download files that have a certain pattern in their names. If a URL contains files generated with different pT cuts, 
 +the names usually contain the string "pt", followed by the pT cut. In this case, one can download such files as:
 +hs-get tev13pp_higgs_pythia8_ptbins  data 3 10 pt100_
 +where the name is [[|tev13pp_higgs_pythia8_ptbins]]. 
 +The command downloads files to the "data" directory in 3 threads. The maximum number of downloaded files is 10, and all file names contain the string "pt100_" (i.e. pT>100 GeV).
 +The general usage of the **hs-get** command requires 2, 3, 4 or 5 arguments:
 +hs-get [URL] [OUTPUT_DIR] [Nr of threads (optional)] [Nr of files (optional)] [pattern (optional)]
 +where [URL] is the URL of the Info page, the download URL, or the dataset name.
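Because the optional arguments are positional, it is easy to mix up their order. The sketch below wraps the command in a hypothetical function, "hs_get_checked", which checks the argument count and the two integer arguments and then only prints the resulting command. It is a dry-run illustration under those assumptions and not part of hs-toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: validate the arguments, then print the hs-get command.
# Drop the final "echo" to run the download instead of printing it.
hs_get_checked() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
        echo "usage: hs_get_checked NAME_OR_URL OUTPUT_DIR [THREADS] [NFILES] [PATTERN]" >&2
        return 2
    fi
    local n
    for n in "${@:3:2}"; do            # arguments 3 and 4, if present
        case "$n" in
            ''|*[!0-9]*) echo "not an integer: $n" >&2; return 2 ;;
        esac
    done
    echo hs-get "$@"
}

hs_get_checked tev100pp_higgs_ttbar_mg5 data 3 10
```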
 +====== Downloading RECO files ======
 +Many datasets contain data files after Geant4 detector simulation and reconstruction ("RECO").
 +The files are in "LCIO" format. They include complete tracks, hits, calorimeter clusters etc.
 +Reconstructed files are stored inside the directories with the tag "rfastNNN" (Delphes fast simulation) or "rfullNNN" (full simulation),
 +where "NNN" is a version number. 
 +You can identify the detector geometries that correspond to the tags using the [[|detector description page]].
 +For example, the [[|tev100pp_ttbar_mg5]] 
 +sample includes the link "rfast001" (Delphes fast simulation, version 001). To download the reconstructed events for the reconstruction tag "rfast001", use this syntax: 
 +<code bash>
 +hs-ls  tev100pp_ttbar_mg5%rfast001      # list ROOT files with the tag "rfast001" 
 +hs-get tev100pp_ttbar_mg5%rfast001 data # download to the "data" directory 
 +The symbol "%" separates the dataset name ("tev100pp_ttbar_mg5") from the reconstruction tag ("rfast001").
 +You can skip "data" in the second example - in this case, data will be copied to the directory "tev100pp_ttbar_mg5%rfast001".
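The "dataset%tag" convention can be taken apart in the shell itself, for example to choose an output directory per tag. This is a minimal sketch; "split_spec" is a hypothetical helper, and the classification relies only on the "rfastNNN" and "rfullNNN" patterns described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "dataset%tag" and classify the reconstruction tag.
split_spec() {
    local spec="$1" name tag kind
    name="${spec%%\%*}"                      # text before the first "%"
    tag="${spec#*%}"                         # text after the first "%"
    if [ "$tag" = "$spec" ]; then tag=""; fi # no "%" present: no tag
    case "$tag" in
        rfast[0-9][0-9][0-9]) kind="fast simulation" ;;
        rfull[0-9][0-9][0-9]) kind="full simulation" ;;
        "")                   kind="truth level" ;;
        *)                    kind="unknown" ;;
    esac
    printf 'name=%s tag=%s kind=%s\n' "$name" "${tag:-none}" "$kind"
}

split_spec 'tev100pp_ttbar_mg5%rfast001'
```

For the example specification this prints "name=tev100pp_ttbar_mg5 tag=rfast001 kind=fast simulation".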
 +One can also download files in several threads. If you want to download 10 files in 3 threads, run:
 +<code bash>
 +hs-get  tev100pp_ttbar_mg5%rfast001 data 3 10
 +As before, one can also download the files using the URL:
 +<code bash>
 +hs-ls # list all files
 +hs-get data
 +====== File validation ======
 +One can check file consistency and print additional information as:
 +<code bash> 
 +The last argument can be a file location on the local disk (works faster than URL!). 
 +The output of the above command will look something like this:
 +   ProMC version = 2
 +   Last modified = 2013-06-05 16:32:18
 +   Description   = PYTHIA8;PhaseSpace:mHatMin = 20;PhaseSpace:pTHatMin = 20;
 +                ParticleDecays:limitTau0 = on;
 +                ParticleDecays:tau0Max = 10;HiggsSM:all = on;
 +   Events        = 10000
 +   Sigma    (pb) = 2.72474E1 ± 1.92589E-1
 +   Lumi   (pb-1) = 3.67007E2
 +   Varint units  = E:100000 L:1000
 +   Log file:     = logfile.txt
 +   The file was validated. Exit.
 +You can see that this file includes a complete logfile ("logfile.txt"). We will explain how to extract it later. 
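The numbers in this report can be cross-checked: the quoted luminosity is consistent with the number of events divided by the cross section. The sketch below recomputes it with awk, assuming the "Events" and "Sigma" lines have the layout shown above. "lumi_from_report" is a hypothetical helper, and the here-document reuses three lines of the sample output.

```bash
#!/usr/bin/env bash
# Recompute the integrated luminosity (pb-1) as Events / Sigma from a
# report in the layout shown above, read from standard input.
lumi_from_report() {
    awk -F'=' '
        /^ *Events / { events = $2 + 0 }  # number of stored events
        /^ *Sigma /  { sigma  = $2 + 0 }  # cross section in pb (value before the error)
        END {
            if (sigma > 0) printf "%.3f\n", events / sigma
        }'
}

lumi_from_report <<'EOF'
   Events        = 10000
   Sigma    (pb) = 2.72474E1 ± 1.92589E-1
   Lumi   (pb-1) = 3.67007E2
EOF
```

This prints 367.007, which agrees with the "Lumi" line of the report (3.67007E2 pb-1).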
 +====== Looking at separate events ======
 +One can print separate EVGEN events using the above command after passing an integer argument that specifies
 +the event to be looked at. This command prints the event number "100":
 +hs-info 100 
 +More conveniently, one can open the file in a GUI mode to look at all events: 
 +hs-view [promc file]              
 +This command brings up a GUI window to look at separate events. You should forward X11 to see the GUI. For Windows: download the file [[|hepsim.jar]] and  click on it. Then open the file as [File]-[Open file].
 +<note tip>If you use Windows OS, click "hs-view.bat" and open ProMC file using the File Menu.</note>
 +You can also view EVGEN events without downloading files. Simply pass a URL to the above command and you will stream Monte Carlo events:
 +Here we looked at one file of [[|Pythia8 (QCD) sample]]. 
 +Files with NLO predictions will be automatically identified: for such files, you will see a few particles per event and weights for calculations of PDF uncertainties. 
 +====== Monte Carlo logfile ======
 +Each ProMC/ProIO file includes a logfile from the Monte Carlo generator. Show this file on the screen as:
 +hs-log [file]
 +where [file] is either a ProMC or ProIO file (you can use URL instead of the full path on the local computer).
 +In the case of ProMC files, one can use the standard Linux commands, such as "unzip": 
 +unzip -p [promc file]  logfile.txt
 +where [promc file] is the file name. This command extracts a logfile with original generator-level information.
 +The next command shows the actual number of stored events:
 +unzip -p [promc file]  promc_nevents
 +This command lists the stored events (each event is a ProtoBuffer binary file):
 +unzip -l [promc file]
 +====== Pileup mixing ======
 +One can mix events from a signal ProMC file with 
 +inelastic (minbias) events using  the  "pileup mixing" command: 
 +hs-pileup pN signal.promc minbias.promc output.promc
 +Here "p" indicates that events from "minbias.promc" will be mixed 
 +with every event from "signal.promc" using a Poisson distribution with the mean "N".
 +If "p" before N is not given, then exactly N (random) events from minbias.promc will be added  to every event from "signal.promc".
 +Use large numbers of events in "minbias.promc" to minimise reuse of the same events.
 +The barcode of particles inside "output.promc" indicates the event origin (0 is set for particles from "signal.promc").
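The two mixing modes differ only in the "p" prefix of the first argument, which is easy to overlook. The sketch below makes the choice explicit. "pileup_cmd" is a hypothetical helper that only prints the command, and the file names are the placeholders used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the hs-pileup command for a given mixing mode.
#   poisson -> "pN": Poisson-distributed number of minbias events, mean N
#   fixed   -> "N" : exactly N minbias events per signal event
pileup_cmd() {
    local mode="$1" n="$2" signal="$3" minbias="$4" output="$5"
    case "$mode" in
        poisson) n="p$n" ;;
        fixed)   ;;
        *) echo "mode must be 'poisson' or 'fixed'" >&2; return 2 ;;
    esac
    echo hs-pileup "$n" "$signal" "$minbias" "$output"
}

pileup_cmd poisson 100 signal.promc minbias.promc output.promc
pileup_cmd fixed   100 signal.promc minbias.promc output.promc
```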
 +====== Analysing EVGEN files ======
 +One can analyse Monte Carlo events on Windows, Linux and Mac with [[|Java7/8]].  
 +Many HepSim samples include *.py scripts to calculate differential cross sections. One can run
 +validation scripts from the Web using [[|Java Web Start]]. 
 +Also, one can run scripts using a desktop and streaming  data via the network, 
 +or using downloaded files (in which case you pass the directory with *promc files as an argument).
 +Here are a few approaches showing how to run *.py scripts:
 +===== Using Java Web start =====
 +Many "Info" pages of HepSim have Jython (Python) scripts for validation and analysis. 
 +One can run such scripts from a web browser 
 +using the [[|Java Web Start]] technology.  
 +Click the "Launch" button.  You will see an editor. Then click the "Run" button to process events. 
 +To use [[|Java Web Start]], you should configure
 +Java permissions:   
 +For Linux/Mac, run "ControlPanel", go to the "Security" tab and add "" to the exception list.
 +For Windows, find "Java Control Panel" and do the same. 
 +Read [[|Why are Java applications blocked]] by your security settings.
 +In addition, if you are a Mac user, you should allow execution of programs [[|outside Mac App Store]]. 
 +===== Using stand-alone Python =====
 +In this example, we will run a Python (to be more exact, Jython) script and, at the same time, will stream data from the web.
 +Find a HepSim event sample by clicking a link in the "Info" column. 
 +For example, look at a ttbar sample from Madgraph: [[|ttbar_mg5]].  
 +Find the URL of the analysis script ("") located at the bottom. Copy it to some folder. Or use "wget":
 +<code bash>
 +Then process this code in a batch mode as:
 +If you do not want a pop-up canvas with the output histogram, change line 71 to "c1.visible(0)" (or "c1.visible(False)")
 +and add "sys.exit(0)" at the very end of the "" macro.
 +One can view, edit and run the analysis file using a simple GUI editor. 
 +It opens this file for editing. One can run it by clicking on the "run" button.
 +It also provides an interactive Jython shell.
 +<note tip>If you use Windows OS, click the file "hs-ide.bat", open the Python script using the menu, and then run this script using the "Run" button.</note>
 +When possible, use the downloaded ProMC files, rather than streaming the data over the network.
 +The calculations will run faster since the program  does calculations using local files. 
 +Let us assume that we have put all ProMC files in the directory "data". Then run the script as: 
 +hs-run data
 +Here is a complete example: we download data to the directory "ttbar_mg5",
 +then we download the analysis script, and then we run this script over the local data using 10000 events:
 +hs-get ttbar_mg5
 +hs-run ttbar_mg5 10000
 +===== Using a Java IDE =====
 +The above example has some limitations since it uses a rather simple editor. 
 +Another approach is to use the full-featured [[|Jas4pp]] or   
 +[[|DataMelt]] programs which give more flexibility.
 +<code bash>
 +wget -O;
 +You can also pass URL with data as an argument and limit the calculation to 10000 events:
 +<code bash>
 +./dmelt/ 10000 
 +As before, you can run in batch mode over downloaded ProMC files.
 +Let us assume that we have put all ProMC files in the directory "data". Then run [[|DataMelt]] over the data as:
 +<code bash>
 +./dmelt/ data 
 +Here is a complete example: we download data to the directory "ttbar_mg5", 
 +then we download the analysis script, and then we run this script over the local data using 10000 events:
 +hs-get ttbar_mg5
 +./dmelt/ ttbar_mg5 10000 
 +Then click "run" (or [F8]). 
 +One can also start  [[|DataMelt]] without input files: 
 +on Linux/Mac. On Windows, run "dmelt.bat" instead. You will see the DataMelt IDE.
 +Locate the URL of the analysis script, such as [[|ttbar_mg5]] (it can be found under the "Info" link). Then copy this link using the right mouse button ("Copy URL Location"). 
 +Next, in the DMelt menu, go to "File"→"Read script from URL". Copy the URL link of the *.py file to the pop-up DataMelt 
 +URL dialog and click "run". The program will start reading data from the Web. At the end of the run, you will see  a pop-up
 +window with a histogram.
 +This method works for the Python/Jython, Java, Ruby, Groovy and BeanShell languages. 
 +===== Using C++/ROOT =====
 +Install [[|ProMC]] and [[|ROOT]]. Make sure that the environment variables $PROMC and $ROOTSYS are set correctly. Then look at the examples:
 +<code bash>
 +$PROMC/examples/reader_mc/  # shows how to read ProMC files from a typical Monte Carlo generator
 +$PROMC/examples/reader_nlo/ # shows how to read ProMC files with NLO calculations (e.g. MCFM)
 +$PROMC/examples/promc2root/ # shows how to read ProMC files and create a ROOT tree
 +For C++/ROOT, you should download files //a priori// since streaming over the network is not supported.
 +There is a simple example showing how to read multiple Monte Carlo files from HepSim,
 +build anti-kT jets using FastJet, and fill ROOT histograms. Download 
 +[[|hepsim-cpp package]]  and compile it:   
 +wget -O - | tar -xz;
 +cd hepsim-cpp/; make 
 +Read the file "README" inside this directory.
 +====== Analyzing RECO data ======
 +HepSim includes data after fast and full detector simulations. There are several methods to analyse such files.
 +===== Analyzing Delphes ROOT files (fast simulation) =====
 +The Delphes ROOT files are typically posted using the reconstruction tag "rfast[XXX]", where "[XXX]" is a number.
 +Use search to find such samples. Read the [[|Delphes documentation]]
 +about how to read Delphes ROOT files. 
 +You can find all samples that contain fast simulations using [[|this link]]. 
 +===== Full simulation: LCIO files =====
 +Events after detector simulation and reconstruction ("RECO") are posted under the tag "rfull[XXX]", where "[XXX]" is a number.
 +We use the [[|LCIO]] file format, which is readable by C++, Fortran and Java.
 +Such files have the extension "slcio". You can analyse the LCIO files using the
 +[[|Jas4pp program]], which allows
 +you to read files using the Python syntax. 
 +If you need to read LCIO files in C++ code with ROOT/FastJet, use the example package [[]].
 +You can find all samples that contain full simulations using [[|this link]].
 +===== Conversion to ROOT, HEPMC, HEPEVT, LHE, STDHEP, LCIO =====
 +One can convert a ProMC file to ROOT to look at branches. If the ProMC package is installed, run the converter:
 +<code bash>
 +cp -rf $PROMC/examples/promc2root .
 +cd promc2root
 +./promc2root [promc file] output.root
 +The output file will contain ROOT branches with px,py,pz,e, etc.
 +One can also convert ProMC to HEPMC using the example 
 +(see the ProMC manual). In addition, the directory 
 +has examples  showing how to convert
 +ProMC to HEPEVT records (promc2hepevnt), STDHEP (promc2stdhep), LHE (promc2lhe) and LCIO (promc2lcio). 
 +====== Anomaly detection with HepSim ======
 +You can use [[|ADFilter]] to process PROMC and DELPHES files to check how anomalous your events are.
 +In addition, [[|ADFilter]] can convert LHE files (extension *.lhe.gz) into PROMC files using the web interface.
 +====== Comparing with HepData ======
 +The Durham [[|HepData]] database maintains "DMelt" scripts compatible 
 +with [[|HepSim]] analysis scripts, so it is relatively easy to overlay Monte Carlo predictions and data from published articles.
 +For example, look at the link  [[|AAD 2013]] from [[|HepData]] and  
 +download a "DMelt" Jython script with published data. 
 +You can run this script inside the DMelt IDE, or using the "hs-run" and "hs-ide" 
 +commands from hs-toolkit.
 +Then one can combine this script with a HepSim script
 +(from the "Info" description) 
 +that runs over HepSim Monte Carlo data, 
 +creating plots showing agreement between data and theoretical calculations.
 +--- //Sergei Chekanov 2017/02/06 17:25//
