====== ProMC Examples ======
(written by S.Chekanov, ANL)
ProMC files are compact, self-describing binary files, typically 30-50% smaller than ROOT files or gzipped HEPMC files thanks to a variable-byte encoding (small numbers use fewer bytes). They can be processed in C++, Java, Python and other languages. The examples below show how to write, read and browse ProMC files and how to convert HEPMC files
to ProMC files. More information is in the [[asc:promc:introduction|Introduction]]. Since ProMC is the main format for [[http://atlaswww.hep.anl.gov/hepsim/| HepSim]], check the description of that database as well.
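The "small numbers use fewer bytes" idea can be illustrated with a minimal sketch of the base-128 "varint" encoding used by Google's Protocol Buffers, on which ProMC records are built. The function name ''encode_varint'' is made up for this illustration and is not part of the ProMC package; ProMC itself relies on the Protocol Buffers library for the encoding.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit is cleared
            return bytes(out)


# Print each number together with the number of bytes its encoding needs.
for n in (1, 127, 128, 300, 100000):
    print(n, len(encode_varint(n)))
```

Values below 128 fit into a single byte, while a fixed-width 32-bit or 64-bit integer always takes 4 or 8 bytes regardless of its value.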
==== File browser ====
This is a very simple example. You can work with ProMC files without installing the ProMC package.
You need a Linux, Windows or Mac computer with Java 7 installed (check with "**java -version**"; it should report version 1.7.X). On Linux/Mac, look at a ProMC file as follows:
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc # get example file
wget http://atlaswww.hep.anl.gov/asc/promc/download/browser_promc.jar # get GUI browser
java -jar browser_promc.jar Pythia8.promc # run GUI browser on this file
This will bring up a GUI window so one can look at separate events and the data layout (see the example below).
//On Windows:// click "browser_promc.jar" and open this file in the browser as: [File]->[Open file]. Read more details in the manual.
If ProMC is installed, simply run the **promc_browser Pythia8.promc** command. See the manual for details.
One can also view files using the network (http/ftp protocols) without downloading them on the disk:
java -jar browser_promc.jar http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
Note that the file will still be downloaded into the JVM memory (not to the disk).
See the Python examples discussed below for how to use random access via the network.
Note that the browser can open ProMC files of any size, since it loads only a fraction of the data. This is especially useful compared with
formats that cannot be viewed in text editors when the files are too large.
==== Using zip tools ====
You can quickly look at the ProMC files without installing the ProMC package as:
unzip -p Pythia8.promc logfile.txt
unzip -p Pythia8.promc promc_nevents
Here we extract the attached logfile (printing it to the screen) and also look at the number of stored events.
You can extract any event by random access using "unzip", or even add additional entries. See the [[https://atlaswww.hep.anl.gov/asc/wikidoc/doku.php?id=asc:promc:introduction#using_zip_to_extract_events| ProMC manual]].
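The same two entries can also be read with Python's standard ''zipfile'' module. The sketch below is only an illustration: the function name ''read_promc_metadata'' is hypothetical, the entry names are taken from the unzip commands above, and it assumes that "promc_nevents" holds the number of events as plain text.

```python
import zipfile


def read_promc_metadata(path):
    """Return (number_of_events, logfile_text) from a ProMC (zip) file.

    The entry names "promc_nevents" and "logfile.txt" come from the unzip
    commands above. Assumption: "promc_nevents" stores the count as text.
    logfile_text is None when no log file was attached.
    """
    with zipfile.ZipFile(path) as archive:
        nevents = int(archive.read("promc_nevents").decode("ascii").strip())
        logfile = None
        if "logfile.txt" in archive.namelist():
            logfile = archive.read("logfile.txt").decode("utf-8", errors="replace")
    return nevents, logfile
```

For the example file this would be called as ''read_promc_metadata("Pythia8.promc")''.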
===== Reading a ProMC file in C++ =====
After the installation, all examples are located in the directory:
$PROMC/examples
If you already know that your file includes information from a typical parton-shower Monte Carlo model, look at the example:
$PROMC/examples/reader_mc
that reads a ProMC file. The makefile links the "libpromc.a" library, which describes the event structure.
If you know that the stored data are from an NLO program, look at:
$PROMC/examples/reader_nlo
that links the "libpronlo.a" library describing a typical NLO record.
ProMC files are self-describing, so you can read and write any type of data and generate static libraries from platform-neutral templates.
Even if you have a ProMC file but do not know how the data are organized inside it, you can generate analysis code in C++, Java and Python.
For this you need to install [[asc:promc:installation|ProMC]].
Let us assume we want to read Monte Carlo events generated for the [[http://mc.hep.anl.gov/asc/snowmass2013/delphes36/ | Snowmass 2013 studies]]. Such events were processed by the Delphes fast simulation. Thus, in addition to the truth particle record, they contain reconstructed jets, muons, photons, etc.
Here are the steps to read such data in C++/Java/Python:
mkdir test # create a test directory
cd test
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
promc_info Pythia8.promc # check information about this file
promc_proto Pythia8.promc # extracts data layouts into the directory "proto"
promc_code # creates C++/Java/Python analysis codes in src/, java/, python/
make # compiles C++ code reader.cc
./reader Pythia8.promc # runs the C++ analysis code
The command "**promc_proto**" is important. It generates a platform-neutral layout of the stored data. The command "**promc_code**" generates language-specific source code in C++ (src/), Java (java/) and Python (python/). One can also use the Java browser to look at detailed information about this file (see later).
Now you can modify the program "reader.cc". Look at the data layout in "proto/ProMC.proto".
The corresponding C++ source code is given in "src" (see the src/ProMC.* files). See the
[[https://developers.google.com/protocol-buffers/docs/proto| protocol-buffers]] language guide for the syntax used in such files.
You can also run over this file using Java (without C++-dependent libraries). Go to the directory "java" and run the example "run.sh":
cd java
run.sh ../Pythia8.promc
You can modify the example code "ReadProMC.java" and check the available methods in "src/promc/io/ProMC.java".
The command "**promc_code**" also generates a code example in Python. Go to the directory "python"
and run the script:
cd python
python reader.py ../Pythia8.promc
Modify the analysis code as needed. You can look at Python modules in the directory "modules".
===== Writing ProMC files =====
Let us write "fake" Monte Carlo events (created using random numbers) and read them back.
Copy the "examples" directory from the installation directory and run the examples.
We assume that you have set up ProMC by running "source setup.sh". See the section [[asc:promc:installation|Installation]].
cp -r $PROMC/examples .
cd examples/random/
ln -s $PROMC/proto/promc proto # append data-layout files
promc_code # create analysis source codes (C++ in src, Java in java/, Python in python/)
make # compile the code
./writer # write random events to out/; it also writes TXT files (similar to HEPMC)
The example generates the file "out/output.promc" with "fake" ProMC event records.
In addition, it dumps the same information to the ASCII file "out/output.txt" for comparison.
It is good practice to keep a directory "proto" with the ProtoBuffers files
used to create the event records, so that one can later generate C++, Java and Python code
for reading the data stored inside the ProMC files.
This makes the output file "self-describing". To achieve this,
create a directory "proto" and put the ProtoBuffers files used to create the event record into it, or link the existing default "proto" directory as done above. The command "promc_code" generates the C++, Java and Python code using the data description in the existing "proto" directory.
Even better, one can include the log file in the ProMC file if it has the name "//logfile.txt//". This removes the need to keep separate log files for the ProMC files. To attach the log file, just make sure that "//logfile.txt//" is located in the same directory:
./writer > logfile.txt 2>&1 # logfile.txt will be attached to the ProMC file
In this case, "logfile.txt" is created and automatically embedded (and compressed) inside the "out/output.promc" file.
You can extract this file later as "promc_log file.promc".
Now, try to read the "out/output.promc" file as:
./reader # read the file in out/output.promc
For 1000 events with 5000 particles (using a record similar to HEPMC), the output is
174 K event.proto # single event record as ProtoBuf message
105 MB output.promc # all events in one ProMC file
325 MB output.txt # all events as a text file
The ProMC file is a factor of 3.1 smaller than the TXT file. After gzipping the text file, you can compare the gzipped version with the ProMC output:
136 MB output.txt.gz
so the ProMC file (105 MB) is about 23% smaller than the gzip-compressed file (equivalently, the gzipped file is about 30% larger), indicating an improvement in the compression.
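The size comparison can be checked with a few lines of arithmetic. The numbers are copied from the listing above; the variable names are only illustrative.

```python
# File sizes in MB, copied from the listing above.
promc_mb = 105.0
text_mb = 325.0
gzip_mb = 136.0

text_over_promc = text_mb / promc_mb     # how many times larger the text file is
promc_saving = 1.0 - promc_mb / gzip_mb  # fraction saved relative to the gzipped text
gzip_excess = gzip_mb / promc_mb - 1.0   # how much larger the gzipped text is than ProMC

print("text / ProMC  = %.1f" % text_over_promc)
print("ProMC vs gzip = %.0f%% smaller" % (100 * promc_saving))
print("gzip vs ProMC = %.0f%% larger" % (100 * gzip_excess))
```

The text file is a factor of 3.1 larger than the ProMC file. Relative to the gzipped text, the ProMC file is about 23% smaller, which is the same statement as the gzipped text being about 30% larger than the ProMC file.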
The improvement in the compression depends on the pT distribution. In the above example, we used exponential random numbers to mimic the Px, Py, Pz spectra.
If the pT spectrum decays faster (which is normally the case for pileup), the compression will be better. If the momenta follow a flat distribution, there is not much improvement compared to the gzipped version of the TXT file.
This example still assumes that we store masses for each particle. Using the built-in map in the header record, we can set masses to 0 for the most common particles. This should further reduce the file size (by an estimated 10%).
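The dependence on the momentum spectrum noted above can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not the ProMC implementation: it assumes that momenta are stored as integers scaled by a momentum unit (the value ''UNIT'' below is hypothetical) and that signed values are zigzag-mapped before the variable-byte encoding, as Protocol Buffers does for its signed "sint" types. The spectra and their parameters are invented for the comparison.

```python
import random

UNIT = 1000  # hypothetical momentum unit: stored integer = round(value_in_GeV * UNIT)


def zigzag(n):
    """Map a signed integer to a non-negative one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint_size(n):
    """Number of bytes a base-128 varint needs for a non-negative integer."""
    return max(1, (n.bit_length() + 6) // 7)


def mean_bytes(values_gev):
    """Average encoded size per value after scaling to integers."""
    sizes = [varint_size(zigzag(int(round(v * UNIT)))) for v in values_gev]
    return sum(sizes) / float(len(sizes))


rng = random.Random(12345)  # fixed seed so the comparison is reproducible
n = 20000
# Steeply falling spectrum: exponential with a mean of 1 GeV and a random sign.
falling = [rng.choice((-1, 1)) * rng.expovariate(1.0) for _ in range(n)]
# Flat spectrum over the range -100..100 GeV.
flat = [rng.uniform(-100.0, 100.0) for _ in range(n)]

print("falling spectrum: %.2f bytes per value" % mean_bytes(falling))
print("flat spectrum:    %.2f bytes per value" % mean_bytes(flat))
```

In this toy model the falling spectrum needs fewer bytes per value, because most of its values are small integers that fit into one or two bytes.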
You can access metadata information (description, proto files used to generate event records and logfile) as explained in the [[asc:promc:introduction|Introduction]] section.
==== Reading data using Java ====
The above example also generates the code in Java. See the directory "java".
Make sure that Java 7 is installed (type //java -version//).
Go to this directory and run:
./run.sh ../out/output.promc
You can modify the example code "ReadProMC.java" and check the available methods in "src/promc/io/ProMC.java".
==== Reading data using Python ====
The above example also generates the code in Python. See the directory "python".
Go to this directory and run:
python reader.py ../out/output.promc
You can modify the example code "reader.py" and check the available methods in "modules/".
===== Filling ROOT tree from ProMC =====
The example in the directory "examples/root" shows how to fill a ROOT tree from a ProMC record. We assume that the file "output.promc" from the previous example has already been created. Go to "examples/root" and type "make" (ROOT should be installed). Then dump the ProMC record to the ROOT tree file. We assume that the 4-momenta are written as "Double32_t" (stored as 4-byte floats). The output file is "out/output.root".
===== Writing Pythia8 event record =====
Here is a small example of how to write the Pythia8 event record in the ProMC data format. It is located
in the directory
$PROMC/examples/pythia
In the Makefile, make sure that the ProMC include files needed for compilation are found using the flag shown below (this assumes
that the PROMC variable is set after running setup.sh). You should also set the locations of PYTHIA and
HEPMC (this example also writes HEPMC events for comparison).
-I${PROMC}/include
You will need to link 2 libraries:
-L${PROMC}/lib -lprotoc -lcbook
An example program which fills the ProMC file record is given in
[[https://github.com/Argonne-National-Laboratory/ProMC/blob/master/examples/pythia/|Pythia8 to ProMC example]].
An example showing how to read such a ProMC file can be found in "examples/random/reader.cc".
Starting from Pythia8 version 8180, ProMC is included in the Pythia8 package. See the example "main46.cc" in the "examples" directory of Pythia8.
===== ProMC File Browser =====
You can browse events and other information stored in ProMC files using a browser
(implemented in Java; it runs on Linux/Windows/Mac).
First, get the browser:
wget http://atlaswww.hep.anl.gov/asc/promc/download/browser_promc.jar
Then run it as follows (this assumes Java 7 or above):
java -jar browser_promc.jar
Now we can open a ProMC file. Let's get an example ProMC file which keeps 1,000 events generated by Pythia8. We download this file and run several commands to check what is inside:
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
Open this file in the browser as: [File]->[Open file]. Or you can open it using the prompt:
java -jar browser_promc.jar Pythia8.promc
Similarly, read the data through the network (http/ftp):
java -jar browser_promc.jar http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
On Linux/Mac, if ProMC is installed, you can also open the browser as:
promc_browser Pythia8.promc
This opens the file and shows the metadata (i.e. information stored in the header and statistics records):
{{:asc:promc:screenshot_from_2013-05-11_21_43_06.png}}
On the left, you will see event numbers. Double-click on any number. The browser will display the event record with all stored particles for this event (PID, Status, Px, Py, Pz, etc.).
{{:asc:promc:screenshot_from_2013-05-11_21_43_55.png}}
You can access metadata information on stored particle data, such as particle types, PID and masses, using the [Metadata]->[Particle data] menu. This information is common for all events.
To save space, ProMC does not store particle names and masses for each event.
{{:asc:promc:screenshot_from_2013-05-11_21_43_34.png}}
If the ProMC file was made "self-describing" and stores templates for data layouts used to generate analysis
code, you can open the "Data layout" menu:
{{:asc:promc:screenshot_from_2013-05-11_21_44_10.png}}
You can look at event information (process ID, PDF, alphaS, weight) by moving the mouse to the event number on the left and clicking the right mouse button. A pop-up menu will appear. Select "Event information".
===== ProMC vs HepMC =====
The event record in Pythia8.promc can always be compared with the standard HEPMC file (in gzipped form). Get the equivalent HEPMC file:
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.hepmc.gz
You can see that the equivalent HEPMC file after the standard compression (gzip) is about 107 MB, while the corresponding information fits into a 29 MB ProMC file.
===== Reading and writing in Java =====
One can read event records generated by ProMC in Java (natively), without external C++ libraries.
Look at the example in "examples/random/java". Assuming that "out/output.promc" has already been generated by the writer example in "examples/random", you can run the Java example from the "java" directory as
./run.sh ../out/output.promc # compiles Java code and reads this file
This compiles the Java class ReadProMC which reads the "output.promc" file.
===== Reading data using Jython =====
You can make histograms on any platform (Windows/Linux/Mac) using [[http://jwork.org/dmelt/|DataMelt]]. Copy the "browser_promc.jar" file from
"example/browser/" of the installation directory to the directory "lib/user" of the DataMelt installation.
To avoid a clash with the library shipped with DataMelt, remove "promc-protobuf.jar" inside the DataMelt installation directory:
rm lib/system/promc-protobuf.jar
(on Windows, go to this directory and remove this file) and restart DataMelt. Then you can write a small Jython script like this:
# Reading a Pythia8 file in the ProMC format using DataMelt (http://jwork.org/dmelt)
# S.Chekanov (ANL)
from java.io import *
from java.awt import *
from promc.io import *
from proto import * # import FileMC
from jhplot import * # import DataMelt graphics
file = FileMC("Pythia8.promc")
print "ProMC version=",file.getVersion()
print "Last Modified=",file.getLastModified()
header = file.getHeader() # get the header record
unit=float(header.getMomentumUnit())
lunit=float(header.getLengthUnit())
print "Momentum unit=",unit
print "Length unit=",lunit
for j in range(header.getParticleDataCount()): # look at PDG info stored in the header
    d = header.getParticleData(j)
    pid = d.getId(); mass = d.getMass(); name = d.getName()
    print name, pid, mass
h1= H1D("Px",100,0,10) # create a histogram
print "File size=",file.size()
for i in range(file.size()): # run over all events
    if (i%100==0): print "Event=",i
    entry = file.read(i)
    p = entry.getParticles() # get particles
    for j in range( p.getPxCount() ):
        h1.fill(p.getPx(j)/unit) # convert the stored value back to physical units
c1 = HPlot("Canvas",600,400) # plot histogram
c1.visible()
c1.setAutoRange()
c1.draw(h1)
c1.export("px.pdf") # create PDF file
Now download a ProMC file:
wget http://atlaswww.hep.anl.gov/asc/promc/download/Pythia8.promc
Start DataMelt and run this script. It will show the Px spectrum for all stored particles.
(You can load this file as "dmelt.sh promc.py")
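The division by the momentum unit in the script suggests that momenta are stored as integers scaled by that unit. A minimal Python sketch of such a round trip is given below. The function names, the unit value and the momentum value are hypothetical; in a real file the unit comes from the header via "getMomentumUnit()".

```python
def to_stored(value, unit):
    """Convert a physical value to the integer that would be stored."""
    return int(round(value * unit))


def from_stored(stored, unit):
    """Recover the physical value, as p.getPx(j)/unit does in the script above."""
    return stored / float(unit)


unit = 100000.0   # hypothetical value; the real one is read from the file header
px = 12.3456789   # hypothetical momentum component
stored = to_stored(px, unit)
restored = from_stored(stored, unit)
print(stored, restored)
```

With this scheme the rounding error of a restored value is at most 0.5/unit, so a larger unit keeps more precision at the cost of larger integers.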
===== Random access =====
You can extract a given record/event using the random access capabilities of this format.
Check the example in "examples/random_access". Type "make" to compile it, then run the code.
The needed event is extracted using the method "event(index)".
===== Reading data remotely (with random access) =====
You can read or stream data from a remote server without downloading files. The easiest way is to use the Python reader (see the example
in examples/python). Below we show how to read a single event (event=100) remotely using Python:
# Chekanov. Shows how to read event from a remote file with MC events
import urllib2, cStringIO, zipfile
url = "http://mc.hep.anl.gov/asc/snowmass2013/delphes36/TruthRecords/higgs14tev/pythia8/pythia8_higgs_1.promc"
try:
    remotezip = urllib2.urlopen(url)
    zipinmemory = cStringIO.StringIO(remotezip.read())
    zip = zipfile.ZipFile(zipinmemory)
    for fn in zip.namelist():
        # print fn
        if fn=="100":
            data = zip.read(fn)
            print "Read event=100"
except urllib2.HTTPError:
    print "no file"
In this example, "data" represents a ProMC event record. Look at the example
in examples/python to see how to print such information.
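The script above uses Python 2 modules ("urllib2", "cStringIO"). For Python 3, the same idea could be sketched with "urllib.request" and "io.BytesIO" as shown below. The function names are made up for this illustration. As in the original script, the whole file is first downloaded into memory, and decoding the returned bytes still requires the generated ProMC Python modules.

```python
import io
import urllib.request
import zipfile


def read_entry(zip_bytes, name):
    """Return the raw bytes of one named entry from a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(name)  # raises KeyError if the entry does not exist


def fetch(url):
    """Download the whole remote file into memory (nothing is written to disk)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_remote_event(url, index):
    """Fetch a remote ProMC file and return the undecoded record of one event."""
    return read_entry(fetch(url), str(index))
```

With the URL from the script above, ''read_remote_event(url, 100)'' would return the same bytes as "data".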
===== HEPMC to PROMC converter =====
A prototype converter, hepmc2promc (written in C++), converts a HEPMC file into a ProMC file.
You should specify the HEPMC directory during the installation as described in [[asc:promc:installation|Installation]].
Look also at the example code in examples/hepmc2promc in the installation directory.
Assuming that the converter is installed and you have run the **setup.sh** script, the syntax for the conversion is:
hepmc2promc [input HEPMC file] [Output ProMC file] [Description]
You may have "logfile.txt" in the same directory with additional information (it will be attached to the ProMC file automatically).
You can also do the opposite: convert a ProMC file to a HepMC file with the command "promc2hepmc".
Here is a small example:
wget http://atlaswww.hep.anl.gov/asc/promc/download/HiggsTTbar.hepmc.gz
gunzip HiggsTTbar.hepmc.gz
hepmc2promc HiggsTTbar.hepmc HiggsTTbar.promc "Higgs plus ttbar at 14 TeV"
This will create the ProMC file HiggsTTbar.promc. To look at the events, run the browser:
java -jar browser_promc.jar HiggsTTbar.promc
Double-click the event number (on the left) and you will see what a single event looks like.
If you have a problem with the converter, just get this ProMC file as:
wget http://atlaswww.hep.anl.gov/asc/promc/download/HiggsTTbar.promc
java -jar browser_promc.jar HiggsTTbar.promc
===== STDHEP to PROMC converter =====
The ProMC package includes a STDHEP to ProMC converter. It can be compiled by typing "make" in the directory "examples/stdhep2promc". For ProMC v1.2, this converter is not installed by default.
===== Default data layouts =====
Structured data in ProMC are represented by platform-neutral layout files
written as Google's Protocol Buffers template files. They are attached to the ProMC file
and can be extracted using the "promc_proto" tool or the ProMC Java browser.
The default layout files shipped with ProMC are designed
to store truth MC records and have the following structure:
- [[https://github.com/Argonne-National-Laboratory/ProMC/tree/master/proto/promc|ProMC record]] - This is the simplest data layout, which keeps only truth particle information. It is shipped with the default ProMC installation.
- [[https://github.com/Argonne-National-Laboratory/ProMC/tree/master/proto/proreco | ProMC for Delphes]] - This is a more complicated data layout suitable for data and reconstructed MC. It shows how to include reconstructed objects (jets, photons, muons).
- [[http://atlaswww.hep.anl.gov/asc/WebSVN/filedetails.php?repname=ProMC&path=%2FProMC%2Ftrunk%2Fexamples%2Fproto%2Fdelphes_constituents%2FProMC.proto | ProMC.proto for Delphes plus jet constituents]] - This is an even more complicated layout. It shows how to include clusters (jet constituents) for each jet.
- [[https://github.com/Argonne-National-Laboratory/ProMC/tree/master/proto/pronlo|ProMC NLO record]] - This is a data layout to keep NLO calculations.
More complex layout files (for example, for the Delphes fast simulation) can be found
inside the directory "examples/".
===== ProMC file manipulation =====
Read the [[asc:promc:introduction|Introduction]] section to learn about the ProMC commands. One can dump events, get information, and extract and save a smaller number of events into a new file. Example:
promc_info file.promc # get info
promc_dump file.promc # dump events
promc_extract file.promc new.promc 10 # save 10 events in a new file new.promc
promc_log file.promc # extract log file (if attached)
A ProMC file is a simple zip file with ProtoBuffer messages.
One can break out a ProMC file into pieces as:
unzip file.promc
Then you can assemble the pieces back into a file using the "zip" command. You can also merge ProMC files, add entries to them, etc.
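Since a ProMC file is a zip archive, its entries can also be listed from Python without unpacking the file. The sketch below is only an illustration: the function name is hypothetical, and it assumes that entries whose names consist only of digits are event records, as suggested by the entry named "100" in the remote-reading example.

```python
import zipfile


def summarize_entries(path):
    """Split the entries of a ProMC (zip) file into event records and the rest.

    Assumption: entries whose names consist only of digits are event records;
    everything else (for example "logfile.txt") is treated as metadata.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    events = sorted(int(n) for n in names if n.isdigit())
    other = sorted(n for n in names if not n.isdigit())
    return events, other
```

For example, ''events, other = summarize_entries("file.promc")'' gives the list of event numbers and the names of the remaining entries.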
One can also split a file into an arbitrary number of files. Use the promc_split program:
cp -r $PROMC/examples/promc_split .
cd promc_split
promc_proto file.promc
promc_code
make
promc_split file.promc 7 # splits the original file into 7 files in the directory out/ with the same number of events
===== Accessing data in PHP =====
You can access entries of PROMC files in PHP. Currently, you can read entries,
version and description entry. This is an example. Run it as "php test.php"
===== Where and how one can use ProMC =====
The default version of ProMC has 2 records per event: Event (event information)
and "Particles" (truth particle information).
You can also use a modified version which keeps reconstructed objects, such as "Jets", "Electrons", "Photons", "Muons". ProMC is used for Snowmass2013 to keep Delphes fast simulation files. See the [[snowmass2013:analyse_d36_promc| Snowmass web page]].
However, it is still in a prototype stage.
--- //[[chekanov@anl.gov|Sergei Chekanov]] 2013/03/11 21:45//
--- //[[chekanov@anl.gov|Sergei Chekanov]] 2013/05/03 12:58//