
Linking event storage

Monte Carlo files from any location can be published on HepSim. HepSim is not a storage system but a catalog of files stored at multiple URL locations. It is your responsibility to keep your Monte Carlo data accessible from HepSim. HepSim tools allow you:

  • to index files so that they become available from HepSim;
  • to make mirrors of data on a server close to your location for faster downloads.

If the dataset is new, a new entry will appear in the HepSim database, and your authorship will be shown on the HepSim web page. If you have mirrored existing data, your server will be added as a mirror for that dataset.

If you created Monte Carlo files, you can publish them in the “petrel#hepsim” project and link them to HepSim after indexing them as explained below. In this case you do not need to maintain a web server.

What you need

If you decide to publish Monte Carlo files (or make a mirror of the existing files), you need the following:

  • A web server that can hold your data (more than 1 TB of space for /var/www/html is recommended). It is advisable to create a separate RAID10 partition, for example /data, and link it into /var/www/html/.
  • Linux OS (any flavor)
  • Apache2 with the PHP module, serving /var/www/html/
  • Java 8 JDK or JRE (optional, needed for the data-checking tools)
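Before installing anything, it can help to confirm that the command-line tools used on this page are present. The following is a minimal sketch, not part of hs-toolkit; the function name `check_tools` is hypothetical, and the tool names in the usage comment (for instance `apachectl` for Apache2) are assumptions that may differ between Linux flavors.

```shell
# Hypothetical helper (not part of hs-toolkit): report which of the given
# commands are available on PATH. Returns non-zero if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # exit status of the function: success only if nothing was missing
  [ "$missing" -eq 0 ]
}

# Example (tool names are assumptions; adjust for your distribution):
#   check_tools wget tar java apachectl
```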

If you do not have a web server, you can add files to the “petrel#hepsim” project using Globus, as described in the section “Using Petrel from Globus”.

The root directory of a typical HepSim repository is named “events”. This directory should be served by Apache, i.e. it should be visible from /var/www/html. You will need to create a directory structure such as:

  |-events               # HepSim root directory
  |  |-pp                # process type: pp, ee, ep, misc (single particles)
  |  |  |-14tev          # CM energy. Can be 100tev, 250gev, 500gev
  |  |  |  |-qcd_pythia8 # this directory contains ProMC files for EVGEN
  |  |  |  |  |-rfast001 # subdirectory with ROOT/Delphes files
  |  |  |  |  |-rfull001 # subdirectory with SLCIO files
  |  |  |  |  |-macros   # files for truth validation (optional)

(This example shows a 14 TeV pp data sample named “qcd_pythia8”.) The directory “events/pp/14tev/qcd_pythia8” should contain truth-level ProMC files. The subdirectory “rfast001” contains fast-simulation ROOT/Delphes files (tag “rfast001”), while “rfull001” should contain LCIO files with full simulation. The optional directory “macros” contains Jython macros for validation, as well as images.
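The layout above can be created with a few `mkdir -p` calls. This is a minimal sketch: the function name `make_dataset_dirs` is hypothetical, the base path is up to you, and only the `events/<process>/<energy>/<dataset>` pattern and the subdirectory names come from this page.

```shell
# Hypothetical helper: create the HepSim directory layout for one dataset
# under a chosen base directory and print the dataset path.
make_dataset_dirs() {
  local root=$1 process=$2 energy=$3 dataset=$4
  local dir="$root/events/$process/$energy/$dataset"
  # ProMC (EVGEN) files go directly into $dir;
  # rfast001 holds ROOT/Delphes files, rfull001 holds SLCIO files,
  # macros holds optional Jython validation macros and images.
  mkdir -p "$dir/rfast001" "$dir/rfull001" "$dir/macros"
  echo "$dir"
}

# Example: make_dataset_dirs /data pp 14tev qcd_pythia8
```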

After populating the directory “events/pp/14tev/qcd_pythia8” with ProMC files, you will need to index all files (including reconstruction tags). Install the needed packages as explained below, starting with the example package hepsim_web.tgz. Assuming that you are in the directory above “events”, run these commands:

wget "https://atlaswww.hep.anl.gov/hepsim/doc/lib/exe/fetch.php?media=hepsim:hepsim_web.tgz" -O hepsim_web.tgz
tar -zvxf hepsim_web.tgz
cd hepsim_web/web_post
wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh # this checks if Java is installed

This package has two directories:

  • The “events” directory should be served by Apache over HTTP. You can move it to /var/www/html.
  • The “web_post” directory should be outside the Apache area. It is used to index the files in “events”.
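One way to follow the earlier advice of keeping data on a separate partition is to move “events” there and expose it to Apache with a symbolic link. This is a sketch under assumptions: the function name `publish_events` is hypothetical, /data and /var/www/html are the example locations from this page, and Apache may need `Options FollowSymLinks` for that directory.

```shell
# Hypothetical helper: move the "events" tree to a data partition and
# link it into the Apache document root.
publish_events() {
  local src=$1 data_dir=${2:-/data} web_root=${3:-/var/www/html}
  # refuse to overwrite an existing tree or link
  if [ -e "$data_dir/events" ] || [ -e "$web_root/events" ]; then
    echo "events already exists in $data_dir or $web_root" >&2
    return 1
  fi
  mkdir -p "$data_dir" &&
  mv "$src" "$data_dir/events" &&
  ln -s "$data_dir/events" "$web_root/events"
}

# Example (run with sufficient permissions):
#   publish_events hepsim_web/events
```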

This package has 2 example files in “events/ee/250gev/pythia6_higgs_gamgam” to illustrate how the indexing works.

Now, index all ProMC/ROOT/SLCIO files located in “events”. Open the script “A_RUN_hepsim.sh” and specify the location of the “events” directory. Then run “bash ./A_RUN_hepsim.sh”. It will process the directory “events/”. You can edit the script to index a different directory, for example “events/pp/100tev/”. After indexing, several new files should appear, such as “files.zip”, “metadata.txt” and “dirs.idx”.
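After running the indexing script, a quick way to spot directories that were missed is sketched below. It is hypothetical (not part of the HepSim tools) and ASSUMES that “files.zip” and “metadata.txt” are written into each directory that holds ProMC files (extension .promc); adjust the check if your index files land elsewhere.

```shell
# Hypothetical helper: print directories that contain *.promc files but
# lack files.zip or metadata.txt next to them.
# Assumption: the indexer writes both files into each such directory.
find_unindexed() {
  local root=$1 dir
  # %h prints the directory part of each match (GNU find, standard on Linux)
  find "$root" -type f -name '*.promc' -printf '%h\n' | sort -u |
  while IFS= read -r dir; do
    if [ ! -f "$dir/files.zip" ] || [ ! -f "$dir/metadata.txt" ]; then
      echo "$dir"
    fi
  done
}

# Example: find_unindexed events
```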

To check that everything works, open “http://yourserver/events/” in a browser. You should see your files and the directory structure. Note that this listing differs from what HepSim shows, since HepSim does not use the “index.php” file.

Now you can:

  • populate the directory tree with your files (for a new dataset). ProMC is used for EVGEN files, and SLCIO/ROOT files for the “rfullXXX/rfastXXX” directories, where XXX is a tag number.
  • copy existing files from HepSim and put them into the correct directory (this makes a mirror).

In both cases, make sure that the Linux system administrator has set the correct permissions for the “events” directory, so that you can copy files into it. The directory used for indexing (“web_post”) should be in your private area, outside the Web area.
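The exact permissions are up to your administrator. As one hedged example, the sketch below lets the owner and a group write to “events” while everyone, including Apache, can read it. The function name is hypothetical, and so is any group name you pass (it must already exist, e.g. created with `groupadd`).

```shell
# Hypothetical helper: owner and group can add files, others (including
# Apache) can read. The optional second argument is a group name.
set_events_permissions() {
  local dir=$1 group=${2:-}
  if [ -n "$group" ]; then
    chgrp -R "$group" "$dir"
  fi
  # X adds execute (directory traversal) only to directories and to files
  # that are already executable by someone.
  chmod -R u+rwX,g+rwX,o+rX "$dir"
  # setgid on directories so that new files inherit the directory's group
  find "$dir" -type d -exec chmod g+s {} +
}

# Example: set_events_permissions /var/www/html/events hepsim
```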

If you have indexed the files successfully, send a request to “[email protected]” (or [email protected]) to include your repository in the HepSim database. Please include a short description of your files and your name (it will be shown on the web page).

How to mirror an entire dataset

You can mirror an entire dataset by copying HepSim files to your mirror web server (or local computer) with the command “hs-mirror” from the “hs-toolkit” package. For example, to mirror a dataset with a known URL:

wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh # setup HepSim programs
# now copy a dataset from URL to a new location
SOURCE=https://cepcgit.ihep.ac.cn:81/hepsim/events/ee/250gev/pythia6_higgs_gamgam_test
OUTPUT_DIR=/var/www/html/
hs-mirror -i "$SOURCE" -o "$OUTPUT_DIR"

This example creates the directory “/var/www/html/events/ee/250gev/pythia6_higgs_gamgam_test” and copies all files from the URL into it. The URL of each dataset can be found on the HepSim web page.

If you want to download only EVGEN files, without reconstructed events, use this command:

hs-mirror -i "$SOURCE" -o "$OUTPUT_DIR" -t evgen

Generally, you do not need to index files in the mirror directory.
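To mirror several datasets at once, the `hs-mirror` command shown above can be wrapped in a loop. This is a minimal sketch using only the `-i`, `-o` and `-t evgen` options from this page; the function name and the `DRY_RUN`/`EVGEN_ONLY` variables are hypothetical conveniences, not hs-toolkit features.

```shell
# Hypothetical wrapper: mirror each dataset URL into one output directory.
# DRY_RUN=1 prints the commands instead of running them;
# EVGEN_ONLY=1 adds "-t evgen" to skip reconstructed events.
mirror_datasets() {
  local output_dir=$1 url
  shift
  for url in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "hs-mirror -i $url -o $output_dir${EVGEN_ONLY:+ -t evgen}"
    else
      # unquoted expansion splits into the two words "-t" "evgen" when set
      hs-mirror -i "$url" -o "$output_dir" ${EVGEN_ONLY:+-t evgen} || return
    fi
  done
}

# Example:
#   EVGEN_ONLY=1 mirror_datasets /var/www/html/ "$SOURCE"
```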

How to index directories

If you have made new files for HepSim (or added new files to a mirrored directory), you will need to index all files. Assume your files are located in the directory “events/ep/pythi8_dis”. Then run the “hs-index” script:

wget https://atlaswww.hep.anl.gov/hepsim/soft/hs-toolkit.tgz -O - | tar -xz;
source hs-toolkit/setup.sh # setup HepSim programs
hs-index events/ep/pythi8_dis

If you need to re-index all subdirectories inside the directory “events”, pass the top-level directory to the script:

hs-index events

This script creates:

  • “files.zip” - the list of files
  • “metadata.txt” - information extracted from the ProMC files
  • “dirs.txt” - a list of subdirectories
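When only a few datasets have changed, re-indexing everything may be unnecessary. The sketch below is hypothetical (not an hs-toolkit command): it ASSUMES that “files.zip” sits in each indexed directory and uses file modification times to report directories containing files newer than their “files.zip”, which could then be passed to `hs-index` one by one.

```shell
# Hypothetical helper: print indexed directories whose contents changed
# after the last indexing. Assumption: files.zip lives in each indexed
# directory; metadata.txt and dirs.* are treated as index outputs.
find_stale_indexes() {
  local root=$1 zip dir
  find "$root" -type f -name files.zip | sort |
  while IFS= read -r zip; do
    dir=$(dirname "$zip")
    # stop at the first regular file newer than files.zip (GNU find)
    if [ -n "$(find "$dir" -type f -newer "$zip" ! -name metadata.txt ! -name 'dirs.*' -print -quit)" ]; then
      echo "$dir"
    fi
  done
}

# Example: find_stale_indexes events | xargs -r -n1 hs-index
```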

Using Petrel from Globus

Monte Carlo files can be added to the “petrel#hepsim” project (https://press3.mcs.anl.gov/petrel/) on Globus, so that they can be accessed via Globus and publicly over HTTPS. In this case there is no need to run a private web server. A description of Petrel can be found at https://press3.mcs.anl.gov/petrel/documentation/. You will need a Globus account to store files on Petrel.

You can use your own Petrel project (with HTTPS enabled) to host HepSim-supported files, or you can add your files to the existing “petrel#hepsim” project. The directory with Monte Carlo files should be properly indexed as described above; you can then copy this directory to the Petrel web-based storage using the Globus transfer tools.

A typical workflow is as follows:

  • Create a directory with MC files on OSG or other resources
  • Index the files using the tools described above
  • Move the directory with the files to a Petrel project (or ask to be included in petrel#hepsim)
  • Send the URL of your files to [email protected]
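The transfer step can also be scripted with the Globus CLI (`globus transfer` with `--recursive`), after authenticating with `globus login`. This is a sketch under assumptions: the endpoint IDs and paths are placeholders (look up the real Petrel endpoint in the Globus web interface), the function name is hypothetical, and `DRY_RUN` is a hypothetical switch that only prints the command.

```shell
# Hypothetical wrapper around the Globus CLI: copy an indexed dataset
# directory recursively from a source endpoint to a Petrel endpoint.
# Arguments: source endpoint ID, source path, destination endpoint ID,
# destination path (all placeholders).
transfer_to_petrel() {
  local src="$1:$2" dst="$3:$4"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "globus transfer --recursive --label hepsim-upload $src $dst"
  else
    globus transfer --recursive --label hepsim-upload "$src" "$dst"
  fi
}

# Example (placeholder IDs and paths):
#   transfer_to_petrel SRC_ENDPOINT_ID ~/events/ep/my_dataset PETREL_ENDPOINT_ID /hepsim/events/ep/my_dataset
```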

Summary

When everything is done, this is what you should expect:

  • Your files will be visible on the main HepSim web page and on the current mirrors (it takes a few days for the changes to propagate)
  • You can search for your files using the main HepSim web page
  • You can search, list and download files using hs-toolkit commands (hs-find, hs-ls, hs-get, etc.)
  • Since your files are public, there is a good chance that somebody will make a mirror to reduce the load on your server
  • Your server will be listed on the statistics summary page

Sergei Chekanov 2016/04/28 21:31
Sergei Chekanov 2016/10/24 21:31