ProMC Examples

(written by S.Chekanov, ANL)

ProMC files are compact, self-describing binary files. They are typically 30-50% smaller than ROOT files or gzipped HEPMC files because they use a variable-byte encoding, in which small numbers occupy fewer bytes. They can be processed in C++, Java, Python and other languages. The examples below show how to write, read and browse ProMC files, and how to convert HEPMC files to ProMC files. More information is in the Introduction. Since ProMC is the main format for HepSim, check the description of that database.
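The variable-byte encoding mentioned above corresponds to the "varint" scheme of Google's Protocol Buffers, on which ProMC records are built. The following is not ProMC code, only a minimal Python sketch (the function names are ours) of how such an encoding uses 7 payload bits per byte, so that small integers occupy fewer bytes:

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protocol Buffers style varint."""
    if value < 0:
        raise ValueError("varint encoding expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F              # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)      # more bytes follow: set the continuation bit
        else:
            out.append(low7)             # last byte: continuation bit stays clear
            return bytes(out)


def decode_varint(data):
    """Decode bytes produced by encode_varint back into an integer."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:              # continuation bit clear: this was the last byte
            break
    return result


if __name__ == "__main__":
    for n in (1, 127, 128, 300, 1000000):
        print(n, "->", len(encode_varint(n)), "byte(s)")
```

A fixed-width 64-bit integer always takes 8 bytes, whereas here the cost grows with the magnitude of the number. This is why the distribution of the stored values matters for the final file size.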

File browser

This is a very simple example. You can work with ProMC files without installing the ProMC package. You need Linux, Windows or Mac with Java 7 installed (check with “java -version”; it should report version 1.7.X). On Linux/Mac, look at a ProMC file as follows:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc      # get example file
wget  http://atlaswww.hep.anl.gov/asc/promc/download/browser_promc.jar  # get GUI browser
java -jar browser_promc.jar Pythia8.promc                               # run GUI browser on this file

This will bring up a GUI window so one can look at separate events and the data layout (see the example below).

On Windows, click “browser_promc.jar” and open the file in the browser via [File]→[Open file]. If ProMC is installed, simply run the command promc_browser <file>.promc. Read the manual for more details.

One can also view files using the network (http/ftp protocols) without downloading them on the disk:

java -jar browser_promc.jar http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc

Note that the file will still be downloaded into the JVM memory (not to disk). See the Python examples discussed below for how to use random access over the network.

Note that the browser can open ProMC files of any size, since it loads only a fraction of data. This is especially useful compared to limitations of some formats that cannot be viewed in text editors if their sizes are too large.

Using zip tools

You can quickly look at the ProMC files without installing the ProMC package as:

unzip -p Pythia8.promc  logfile.txt   
unzip -p Pythia8.promc  promc_nevents 

Here we extract the attached log file (printing it to the screen) and look at the number of stored events. You can extract any event by random access with “unzip”, or even add additional entries. See the ProMC manual.
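The same two entries can be read with any zip library. Below is a minimal Python sketch using the standard zipfile module. The helper names are ours, and the stand-in archive only mimics the entry names used above (“logfile.txt”, “promc_nevents”); it is an assumption that these entries hold plain text, as the “unzip -p” output suggests.

```python
import io
import zipfile


def read_text_entry(promc, name):
    """Return one named entry of a ProMC (zip) file as text.

    `promc` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(promc) as zf:
        return zf.read(name).decode("utf-8", errors="replace").strip()


def make_standin():
    """Build a tiny in-memory zip that only mimics the entry names used above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("logfile.txt", "fake log line\n")
        zf.writestr("promc_nevents", "1000")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_text_entry(make_standin(), "promc_nevents"))
    # With a real file: read_text_entry("Pythia8.promc", "logfile.txt")
```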

Reading a ProMC file in C++

After the installation, all examples are located in the directory:

$PROMC/examples 

If you already know that your file includes information from a typical parton-shower Monte Carlo model, look at the example:

$PROMC/examples/reader_mc 

that reads a ProMC file. The makefile links “libpromc.a” library which describes the event structure.

If you know that the stored data are from an NLO program, look at:

$PROMC/examples/reader_nlo

that links the “libpronlo.a” library describing a typical NLO record.

ProMC files are self-describing, so you can read and write any type of data and generate static libraries from platform-neutral templates. If you have a ProMC file but do not know how the data are organized inside it, you can generate analysis code in C++, Java or Python. This requires ProMC to be installed.

Let us assume we want to read Monte Carlo events generated for the Snowmass 2013 studies. Such events were processed by the Delphes fast simulation, so in addition to the truth particle record they contain reconstructed jets, muons, photons, etc. Here are the steps to read such data in C++/Java/Python:

mkdir test    # create a test directory
cd test
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
promc_info  Pythia8.promc  # check information about this file
promc_proto Pythia8.promc  # extracts data layouts into the directory "proto"
promc_code                 # creates  C++/Java/Python analysis codes in src/, java/, python/
make                       # compiles C++ code reader.cc
./reader   Pythia8.promc   # runs the C++ analysis code

The command “promc_proto” is important. It generates a platform-neutral layout of the stored data. The command “promc_code” generates language-specific source code in C++ (src/), Java (java/) and Python (python/). One can also use the Java browser to look at detailed information about this file (see later).

Now you can modify the program “reader.cc”. Look at the data layout in “proto/ProMC.proto”. The corresponding C++ source code is in “src” (see the src/ProMC.* files). See also the Protocol Buffers language guide, which describes the syntax used for such files.

You can also run over this file using Java (without C++-dependent libraries). Go to the directory “java” and run the example “run.sh”.

cd java
./run.sh ../Pythia8.promc

You can modify the example code “ReadProMC.java” and check the available methods in “src/promc/io/ProMC.java”

The command “promc_code” also generates a code example in Python. Go to the directory “python” and run the script:

cd python
python reader.py ../Pythia8.promc

Modify the analysis code as needed. You can look at Python modules in the directory “modules”.

Writing ProMC files

Let us write “fake” Monte Carlo events (created using random numbers) and read them back. Copy the “examples” directory from the installation directory and run the examples. We assume that you have set up PROMC by running “source setup.sh” (see the Installation section).

cp -r $PROMC/examples .
cd examples/random/
ln -s $PROMC/proto/promc  proto  # append data-layout files
promc_code                       # create analysis source codes (C++ in src, Java in java/, Python in python/)
make                             # compile the code
./writer                         # write random events to out/; it also writes a TXT file (similar to HEPMC)

The example generates the file “out/output.promc” with “fake” ProMC event records. In addition, it dumps the same information to the ASCII file “out/output.txt” for comparison.

It is good practice to keep a directory “proto” with the ProtoBuffers files used to create the event records, so that one can later generate C++, Java or Python code for reading the data stored inside the ProMC file. This is what makes the output file “self-describing”. Either create a “proto” directory and put the ProtoBuffers files into it, or link the existing default “proto” directory. The command “promc_code” generates the analysis code from the data description in the existing “proto” directory.

Even better, one can embed the log file in the ProMC file if it is named “logfile.txt”. This removes the need to keep separate log files alongside the ProMC files. To attach the log file, just make sure that “logfile.txt” is located in the same directory:

./writer > logfile.txt 2>&1         # logfile.txt will be attached to the ProMC file

In this case, “logfile.txt” is created and automatically embedded (and compressed) inside the “out/output.promc” file. You can extract this file later as “promc_log file.promc”.

Now, try to read the “out/output.promc” file as:

./reader  # read the file in out/output.promc

For 1000 events with 5000 particles (using a record similar to HEPMC), the output is

174 K   event.proto     # single event record as ProtoBuf message
105 MB  output.promc    # all events in one ProMC file
325 MB  output.txt      # all events as a text file

The ProMC file is a factor 3.1 smaller than the TXT file. After gzipping the text file, you can compare the gzipped version with the ProMC output:

136 MB  output.txt.gz

so the ProMC file (105 MB) is about 23% smaller than the gzip-compressed text file (equivalently, the gzipped file is about 30% larger), indicating improved compression. The gain depends on the pT distribution. In the above example, we used exponentially distributed random numbers to mimic the Px, Py, Pz spectra. If the pT spectrum falls faster (which is normally the case for pileup), the compression will be better. If the momenta follow a flat distribution, there is not much improvement compared to the gzipped version of the TXT file.

This example still stores the mass of each particle. Using the built-in map in the header record, we can set masses to 0 for the most common particles. This should reduce the file size further (by an estimated 10%).

You can access metadata information (description, proto files used to generate event records and logfile) as explained in the Introduction section.
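The dependence on the momentum spectrum can be illustrated with a toy calculation. The sketch below is not ProMC code: it assumes that momenta are stored as integers after multiplication by a unit factor (1000 per GeV here, a hypothetical value), counts the bytes a 7-bit-per-byte varint would need, and ignores both the sign encoding and the zip compression layer. It also recomputes the size ratios quoted above.

```python
import random


def varint_size(n):
    """Number of bytes a non-negative integer needs as a 7-bit-per-byte varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def mean_bytes(values_gev, unit=1000):
    """Average varint size when values are stored as integers of 1/unit GeV.

    The sign is dropped for simplicity; a real signed encoding costs one extra bit.
    """
    sizes = [varint_size(int(abs(v) * unit)) for v in values_gev]
    return sum(sizes) / len(sizes)


rng = random.Random(1)                                   # fixed seed for a repeatable toy
n = 20000
steep = [rng.expovariate(1 / 0.5) for _ in range(n)]     # falling spectrum, mean 0.5 GeV
flat = [rng.uniform(0.0, 100.0) for _ in range(n)]       # flat spectrum up to 100 GeV

print("steep spectrum: %.2f bytes per value" % mean_bytes(steep))
print("flat spectrum : %.2f bytes per value" % mean_bytes(flat))

# File sizes quoted in the text, in MB
promc, txt, txt_gz = 105.0, 325.0, 136.0
print("TXT / ProMC       = %.1f" % (txt / promc))
print("1 - ProMC / gzip  = %.0f%%" % (100 * (1 - promc / txt_gz)))
```

In this toy model the steeply falling spectrum needs fewer bytes per value than the flat one, because most of its values are small integers.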

Reading data using Java

The above example also generates Java code. See the directory “java”. Make sure that Java 7 is installed (type “java -version”). Go to this directory and run:

./run.sh ../out/output.promc

You can modify the example code “ReadProMC.java” and check the available methods in “src/promc/io/ProMC.java”

Reading data using Python

The above example also generates the code in Python. See the directory “python”. Go to this directory and run:

python reader.py ../out/output.promc

You can modify the example code “reader.py” and check the available methods in “modules/”

Filling ROOT tree from ProMC

The example in the directory “examples/root” shows how to fill a ROOT tree from a ProMC record. We assume that the file “output.promc” from the previous example has already been created. Go to “examples/root” and type “make” (ROOT must be installed). Then dump the ProMC record to the ROOT tree file. We assume that the 4-momenta are written as “Double32_t” (stored as 4-byte floats). The output file is “out/output.root”.
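Storing 4-momenta as 4-byte floats limits their precision. The sketch below uses only the Python standard library (not ROOT) and shows the effect of a round trip through a 4-byte IEEE float. It assumes that the on-disk representation behaves like a plain single-precision float; the momentum value is a made-up number.

```python
import struct


def as_float32(x):
    """Round-trip a Python float (64-bit) through a 4-byte IEEE float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


px = 123.456789012345            # hypothetical momentum component
stored = as_float32(px)          # what survives after 4-byte storage
rel_err = abs(stored - px) / abs(px)

print("original :", repr(px))
print("stored   :", repr(stored))
print("rel. err :", rel_err)
```

The relative error stays below about 1e-7, i.e. roughly seven significant digits are kept.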

Writing Pythia8 event record

Here is a small example of how to write a Pythia8 event record in the ProMC data format. It is located in the directory

$PROMC/examples/pythia

You should set the locations of PYTHIA and HEPMC (this example also writes HEPMC events for comparison). In the Makefile, make sure that the ProMC include files needed for compilation are visible (it assumes that the PROMC variable is set after running setup.sh):

-I${PROMC}/include

You will need to link 2 libraries:

 -L${PROMC}/lib -lprotoc -lcbook

An example program that fills the ProMC file record is given in the Pythia8 to ProMC example.

An example showing how to read such a ProMC file can be found in “examples/random/reader.cc”. Starting from Pythia8 version 8180, ProMC is included in the Pythia8 package. See the example “main46.cc” in the “examples” directory of Pythia8.

ProMC File Browser

You can browse events and other information stored in ProMC files using a browser (implemented in Java; it runs on Linux/Windows/Mac). First, get the browser:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/browser_promc.jar

Then run it (this assumes Java 7 or above):

java -jar browser_promc.jar

Now we can open a ProMC file. Let's get an example ProMC file that holds 1,000 events generated by Pythia8. We download this file and then look at what is inside:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc

Open this file in the browser as: [File]→[Open file]. Or you can open it using the prompt:

java -jar browser_promc.jar Pythia8.promc

Similarly, read the data through the network (http/ftp):

java -jar browser_promc.jar http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc

This opens the file and shows the metadata (i.e. information stored in the header and statistics records). Note: on Linux/Mac, if ProMC is installed, you can also open the browser as:

promc_browser Pythia8.promc

On the left, you will see event numbers. Double-click any number. The browser will display the event record with all stored particles for this event (PID, Status, Px, Py, Pz, etc.).

You can access metadata information on stored particle data, such as particle types, PID and masses, using the [Metadata]→[Particle data] menu. This information is common to all events: to save space, ProMC does not store particle names and masses for each event.

If the ProMC file was made “self-describing” and stores the templates for data layouts used to generate analysis code, you can open the “Data layout” menu to view them.

You can look at event information (process ID, PDF, alphaS, weight) if you navigate with the mouse to the event number on the left, and click on the right button. You will see a pop-up menu. Select “Event information”.

ProMC vs HepMC

The event record in Pythia8.promc can always be compared with the standard HEPMC file (in gzipped form). Get the equivalent HEPMC file:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.hepmc.gz

You can see that the equivalent HEPMC file after standard compression (gzip) is about 107 MB, while the same information fits into a 29 MB ProMC file.

Reading and writing in Java

One can read event records generated by ProMC in Java (natively), without external C++ libraries. Look at the example in “examples/random/java”. Assuming that “output.promc” was already generated inside “examples/random/java/”, you can execute this example as

run.sh  ../output/output.promc  # compiles Java code and reads this file 

This compiles the Java class ReadProMC which reads the “output.promc” file.

Reading data using Jython

You can make histograms on any platform (Windows/Linux/Mac) when using DataMelt. Copy the “browser_promc.jar” file from “example/browser/” of the installation directory to the directory “lib/user” of the DataMelt installation. To avoid a clash with the library shipped with DataMelt, remove “promc-protobuf.jar” inside the DataMelt installation directory.

rm   lib/system/promc-protobuf.jar

(on Windows, go to this directory and remove this file) and restart DataMelt. Then you can write a small Jython script like this:

promc.py
# Reading a Pythia8 file in the ProMC format using DataMelt (http://jwork.org/dmelt)
# S.Chekanov (ANL)
from java.io import *
from java.awt import *
from promc.io import *
from proto import *   # import FileMC
from jhplot import *  # import DataMelt graphics
 
file = FileMC("Pythia8.promc")
print "ProMC version=",file.getVersion()
print "Last Modified=",file.getLastModified()
header = file.getHeader()                     # get header file
unit=float(header.getMomentumUnit())
lunit=float(header.getLengthUnit())
print "Momentum unit=",unit
print "Length unit=",lunit
 
for j in range(header.getParticleDataCount()): # look at PDG info stored in the header
  d = header.getParticleData(j)
  pid = d.getId(); mass = d.getMass(); name = d.getName();
  print name, pid, mass
 
h1= H1D("Px",100,0,10)   # create a histogram
print "File size=",file.size()
 
for i in range(file.size()):  # run over all events    
      if (i%100==0): print "Event=",i
      entry = file.read(i)
      p = entry.getParticles()                 # get particles
      for j in range( p.getPxCount() ):
            h1.fill(p.getPx(j)/unit)
c1 = HPlot("Canvas",600,400) # plot histogram
c1.visible()
c1.setAutoRange()
c1.draw(h1)
c1.export("px.pdf")                          # create PDF file             

Now get a ProMC file:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc

Start DataMelt and run this script. It will show the Px spectrum for all stored particles. (You can also run the script as “dmelt.sh promc.py”.)

Random access

You can extract a given record/event using the random-access capabilities of this format. Check the example in “examples/random_access”. Type “make” to compile it, then run the code. The needed event is extracted with the method “event(index)”.
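Random access works because a zip archive keeps a central directory that records where each entry starts. The Python sketch below is not the C++ “event(index)” implementation. It uses the standard zipfile module and assumes that each event is stored as an entry named by its number (as in the remote-reading example on this page). The stand-in archive contains fake payloads only.

```python
import io
import zipfile


def event_location(promc, index):
    """Look up where event `index` sits inside the zip, without reading other events.

    Returns (offset of the local header, compressed size, uncompressed size).
    """
    with zipfile.ZipFile(promc) as zf:
        info = zf.getinfo(str(index))        # raises KeyError if there is no such entry
        return info.header_offset, info.compress_size, info.file_size


def read_event_bytes(promc, index):
    """Return the raw (still serialized) bytes of one event record."""
    with zipfile.ZipFile(promc) as zf:
        return zf.read(str(index))


def make_standin(nevents=5):
    """In-memory zip with fake payloads, standing in for a real ProMC file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(nevents):
            zf.writestr(str(i), b"event-%d" % i)
    return buf


if __name__ == "__main__":
    f = make_standin()
    print(event_location(f, 3))
    print(read_event_bytes(f, 3))
```

The returned bytes are a serialized Protocol Buffers message. Decoding them requires the classes generated from the “proto” files.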

Reading data remotely (with random access)

You can read or stream data from a remote server without downloading files to disk. The easiest way is to use the Python reader (see the example in examples/python). Below we show how to read a single event (event=100) remotely using Python:

file.py
# Chekanov. Shows how to read an event from a remote file with MC events
import urllib2, cStringIO, zipfile
url = "http://mc.hep.anl.gov/asc/snowmass2013/delphes36/TruthRecords/higgs14tev/pythia8/pythia8_higgs_1.promc"
 
try:
    remotezip = urllib2.urlopen(url)
    zipinmemory = cStringIO.StringIO(remotezip.read())
    zip = zipfile.ZipFile(zipinmemory)
    for fn in zip.namelist():
        # print fn
        if fn=="100":
             data = zip.read(fn)
             print "Read event=100"
except urllib2.HTTPError:
       print "no file"

In this example, “data” represents a ProMC event record. See the example in examples/python for how to print such information.
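The script above is written for Python 2 (urllib2, cStringIO). For Python 3, a minimal equivalent sketch could look as follows. The function names are ours, and it keeps the same assumption that the event is stored under the entry name “100”. Instead of looping over all entry names, it looks the entry up directly.

```python
import io
import urllib.error
import urllib.request
import zipfile


def fetch(url, timeout=60):
    """Download the whole remote file into memory and return its bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_event(zip_bytes, index):
    """Return the serialized record of event `index`, or None if it is absent."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        try:
            return zf.read(str(index))       # direct lookup by entry name
        except KeyError:
            return None


def read_remote_event(url, index):
    """Fetch a remote ProMC file and pull out a single event record."""
    try:
        return extract_event(fetch(url), index)
    except urllib.error.URLError:            # HTTPError is a subclass of URLError
        return None
```

Calling read_remote_event(url, 100) with the URL from the script above would return the same “data” bytes. As in the original script, the complete file is transferred into memory first; only the extraction of the event uses random access.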

HEPMC to PROMC converter

A prototype converter, hepmc2promc (written in C++), converts a HEPMC file into a ProMC file. You should specify the HEPMC directory during the installation, as described in Installation. See also the example code in examples/hepmc2promc in the installation directory. Assuming that the converter is installed and you have run the setup.sh script, the syntax for conversion is:

hepmc2promc [input HEPMC file] [Output ProMC file] [Description]

You may keep a “logfile.txt” with additional information in the same directory (it will be attached to the ProMC file automatically). You can also do the opposite and convert a ProMC file to a HepMC file with the command “promc2hepmc”. Here is a small example:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/HiggsTTbar.hepmc.gz
gunzip HiggsTTbar.hepmc.gz
hepmc2promc HiggsTTbar.hepmc HiggsTTbar.promc "Higgs plus ttbar at 14 TeV"

This will create the PROMC file HiggsTTbar.promc. To look at the events, run the browser:

java -jar browser_promc.jar HiggsTTbar.promc

Double-click an event number (on the left) to see what a single event looks like.

If you have a problem with the converter, just get this ProMC file:

wget  http://atlaswww.hep.anl.gov/asc/promc/download/HiggsTTbar.promc
java -jar browser_promc.jar HiggsTTbar.promc

STDHEP to PROMC converter

The ProMC package includes a STDHEP to ProMC converter. It can be compiled by typing “make” in the directory “examples/stdhep2promc”. For ProMC v1.2, this converter is not installed by default.

Default data layouts

Structural data in ProMC are represented by platform-neutral layout files written as Google's Protocol Buffers template files. They are attached to the ProMC file and can be extracted using the “promc_proto” tool or the ProMC Java browser. The default layout files shipped with ProMC are designed to store truth MC records and have the following structure:

  1. ProMC record - This is the simplest data layout, which keeps only truth particle information. It is shipped with the default ProMC installation.
  2. ProMC for Delphes - This is a more complicated data layout suitable for data and reconstructed MC. It shows how to include reconstructed objects (jets, photons, muons).
  3. ProMC.proto for Delphes plus jet constituents - This is an even more complicated layout. It shows how to include clusters (jet constituents) for each jet.
  4. ProMC NLO record - This is a data layout to keep NLO calculations.

More complex layout files (for example, for the Delphes fast simulation) can be found inside the directory “examples/”.

ProMC file manipulation

Read the relevant section to learn about the ProMC commands. One can dump events, get information, and extract and save a smaller number of events into a new file. Example:

promc_info file.promc                        # get info
promc_dump file.promc                        # dump events
promc_extract   file.promc  new.promc 10     # save 10 events in a new file  new.promc
promc_log file.promc                         # extract log file (if attached)

A ProMC file is a simple zip file with ProtoBuffer messages. One can break out a ProMC file into pieces as:

unzip file.promc

Then you can assemble the files back using the “zip” command. You can also merge ProMC files, add entries, etc.

One can also split a file into an arbitrary number of files using the promc_split program:
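Before re-assembling or merging archives by hand, it helps to see which entries are event records and which are metadata. The Python sketch below is not a ProMC tool. It assumes that event records are the entries whose names are plain integers, while everything else (for example “logfile.txt” or “promc_nevents”) is metadata. The stand-in archive is fake.

```python
import io
import zipfile


def split_entries(promc):
    """Separate zip entry names into event records and everything else."""
    with zipfile.ZipFile(promc) as zf:
        names = zf.namelist()
    events = sorted((n for n in names if n.isdigit()), key=int)   # numeric names, in order
    other = [n for n in names if not n.isdigit()]                 # metadata, archive order
    return events, other


def make_standin():
    """In-memory zip that mimics a ProMC layout with three fake events."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("logfile.txt", "fake log\n")
        for i in (2, 0, 1):                                       # deliberately unordered
            zf.writestr(str(i), b"fake event")
        zf.writestr("promc_nevents", "3")
    return buf


if __name__ == "__main__":
    events, other = split_entries(make_standin())
    print("events  :", events)
    print("metadata:", other)
```

Note that a hand-assembled archive may have metadata (such as the stored number of events) that no longer matches its content, so the dedicated commands above are the safer route.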

cp -r $PROMC/examples/promc_split .
cd promc_split
promc_proto file.promc
promc_code 
make
promc_split file.promc 7  # splits the original file into 7 files in the directory out/ with the same number of events

Accessing data in PHP

You can access entries of ProMC files in PHP. Currently, you can read the entries, the version and the description entry. Here is an example; run it as “php test.php”:

<?php
$zip = zip_open("Pythia8.promc");
if ($zip) {
    while ($zip_entry = zip_read($zip)) {
        echo "Name:               " . zip_entry_name($zip_entry) . "\n";
        echo "Actual Filesize:    " . zip_entry_filesize($zip_entry) . "\n";
        echo "Compressed Size:    " . zip_entry_compressedsize($zip_entry) . "\n";
        echo "Compression Method: " . zip_entry_compressionmethod($zip_entry) . "\n";
        if (zip_entry_name($zip_entry) == "promc_description" || zip_entry_name($zip_entry) == "version") {
        if (zip_entry_open($zip, $zip_entry, "r")) {
            echo "File Contents:\n";
            $buf = zip_entry_read($zip_entry, zip_entry_filesize($zip_entry));
            $buf = preg_replace('/[^(\x20-\x7F)]*/','', $buf);
            echo "$buf\n";
 
            zip_entry_close($zip_entry);
        }
        }
        echo "\n";
 
    }
    zip_close($zip);
}
?>

Where and how one can use ProMC

The default version of ProMC has 2 records per event: “Event” (event information) and “Particles” (truth particle information).
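The two-record structure can be pictured with the following Python sketch. It is not the generated ProMC code, and all class and field names are hypothetical. It only mirrors what the examples on this page suggest: event-level information is kept separately, particles are stored column-wise (one list per quantity), and momenta are stored as integers scaled by the momentum unit from the header.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Per-event information (hypothetical field names)."""
    number: int = 0
    process_id: int = 0
    weight: float = 1.0


@dataclass
class Particles:
    """Truth particles stored column-wise: entry j of each list is particle j."""
    pid: List[int] = field(default_factory=list)
    status: List[int] = field(default_factory=list)
    px: List[int] = field(default_factory=list)    # integers in units of 1/momentum_unit
    py: List[int] = field(default_factory=list)
    pz: List[int] = field(default_factory=list)

    def add(self, pid, status, px, py, pz, momentum_unit):
        """Append one particle, converting float momenta to scaled integers."""
        self.pid.append(pid)
        self.status.append(status)
        self.px.append(int(round(px * momentum_unit)))
        self.py.append(int(round(py * momentum_unit)))
        self.pz.append(int(round(pz * momentum_unit)))

    def momentum(self, j, momentum_unit):
        """Return (px, py, pz) of particle j converted back to physical units."""
        return tuple(c[j] / momentum_unit for c in (self.px, self.py, self.pz))


@dataclass
class EventRecord:
    """One stored entry: event information plus its particles."""
    event: Event = field(default_factory=Event)
    particles: Particles = field(default_factory=Particles)


if __name__ == "__main__":
    rec = EventRecord(event=Event(number=1, process_id=101))
    rec.particles.add(211, 1, 1.234, -0.5, 10.0, momentum_unit=1000)
    print(rec.particles.px, rec.particles.momentum(0, 1000))
```

Storing scaled integers rather than floats is what allows the variable-byte encoding to save space for small values.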

You can also use a modified version which keeps reconstructed objects, such as “Jets”, “Electrons”, “Photons”, “Muons”. ProMC is used for Snowmass2013 to keep Delphes fast simulation files. See the Snowmass web page. However, it is still in a prototype stage.

Sergei Chekanov 2013/03/11 21:45
Sergei Chekanov 2013/05/03 12:58

asc/promc/examples.txt · Last modified: 2015/05/10 21:08 by asc