# StatShape

## a jet-shape identification based on a linear regression

The StatShape package characterizes an arbitrary 3D distribution (for example, jet constituents described by Eta, Phi and Energy) in terms of a few variables. It thus addresses the general problem of dimensionality reduction: how to reduce the amount of information in the original multi-dimensional data while keeping only a few parameters that capture its most basic spatial features. When applied to jet constituents, the package allows one to:
• Reconstruct the major and minor axes using a linear regression
• Calculate the jet size in terms of the major and minor lengths
• Calculate the extent of elongation of the jet shape (the so-called eccentricity)
• Calculate the degree of skewness of the jet profile with respect to the major and minor axes

Unlike the traditional approaches based on the jet width and eccentricity, usually derived using principal-component analysis (see, for example, Phys. Rev. D81 (2010) 114038), the current approach provides a significantly larger number of jet-shape characteristics. It goes beyond simple jet elongation and provides various degrees of skewness of jet shapes in the longitudinal (along the major axis) and transverse (along the minor axis) directions. This means that the approach can be used to detect jets with a triangular shape (!), taking into account not only the geometrical shape but also the weights given by the energies of the jet constituents. This is potentially important for top quarks decaying to qqb, or for other unbalanced energy flows inside a jet due to asymmetric decays. The best results are obtained for jets reconstructed with the kT algorithm (not anti-kT, which makes circular jets!).
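The linear-regression step used to reconstruct the major axis can be illustrated with a minimal sketch. It is not taken from "statshapes.cxx": the `Constituent` and `AxisFit` structs and the function `fitMajorAxis` are hypothetical names introduced here, and the fit is a plain energy-weighted least-squares line in the Eta-Phi plane, which is an assumption about how the regression is weighted. The sketch ignores the wrap-around of Phi and jets aligned exactly along Phi.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for one jet constituent (not the package's CParticle).
struct Constituent {
    double eta;
    double phi;
    double energy;
};

// Result of the fit: a straight line through the energy-weighted centre.
struct AxisFit {
    double eta0;   // energy-weighted mean Eta
    double phi0;   // energy-weighted mean Phi
    double slope;  // d(Phi)/d(Eta) of the fitted line
};

// Energy-weighted least-squares fit of Phi as a linear function of Eta.
// Assumes Phi does not wrap around +-pi inside the jet and that the
// constituents are not all at the same Eta (infinite slope).
AxisFit fitMajorAxis(const std::vector<Constituent>& parts) {
    assert(!parts.empty());
    double sumW = 0.0, sumEta = 0.0, sumPhi = 0.0;
    for (const Constituent& c : parts) {
        sumW += c.energy;
        sumEta += c.energy * c.eta;
        sumPhi += c.energy * c.phi;
    }
    assert(sumW > 0.0);
    const double eta0 = sumEta / sumW;
    const double phi0 = sumPhi / sumW;

    double sxx = 0.0, sxy = 0.0;
    for (const Constituent& c : parts) {
        const double dEta = c.eta - eta0;
        const double dPhi = c.phi - phi0;
        sxx += c.energy * dEta * dEta;
        sxy += c.energy * dEta * dPhi;
    }
    assert(sxx > 0.0);
    return AxisFit{eta0, phi0, sxy / sxx};
}
```

The fitted line passes through the energy-weighted centre of the jet, so a single slope is enough to describe the axis direction in this simplified picture.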

## Algorithm implementation

### C++ code

C++ code can be downloaded here. Unpack the tar file (tar -zvxf StatShape.tgz) and type "make" to compile it. ROOT must be installed.

The main C++ code is "statshapes.cxx", which takes a jet (represented by the LParticle class) and builds all jet-shape variables from the jet constituents (each represented by the CParticle class). The LParticle and CParticle classes are provided as convenient inputs and can be replaced by your own interface classes.

More instructions are given in the file "README". To run a simple example that reads a file with jet constituents, type "./main".

The package returns an array stat[] with 23 shape variables. Below, 'quadrants' signifies the "quadrant method", while 'nq' denotes the "non-quadrant method", as defined in the paper hep-ph/1009.2749. The variables are defined there as well.

Also below, 'method2' signifies an orthogonal projection of the jet constituents onto the major and minor axis lines before continuing with the distance calculations.
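As an illustration of what an orthogonal projection onto an axis line involves, here is a minimal sketch. The names `Point`, `projectOntoLine` and `projectedExtent` are hypothetical and do not come from the package, and the axis is assumed to be given by a point on the line plus a slope d(Phi)/d(Eta). The second function takes the distance between the two furthest projected points, which is one way to turn the projected coordinates into a length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical position of a constituent in the Eta-Phi plane.
struct Point {
    double eta;
    double phi;
};

// Signed coordinate of p along the line through (eta0, phi0) with the
// given slope, i.e. the orthogonal projection of p onto that line.
double projectOntoLine(const Point& p, double eta0, double phi0, double slope) {
    const double norm = std::sqrt(1.0 + slope * slope);
    const double ux = 1.0 / norm;    // unit vector along the line, Eta part
    const double uy = slope / norm;  // unit vector along the line, Phi part
    return (p.eta - eta0) * ux + (p.phi - phi0) * uy;
}

// Distance between the two furthest points after projection onto the line.
double projectedExtent(const std::vector<Point>& pts,
                       double eta0, double phi0, double slope) {
    assert(!pts.empty());
    double lo = projectOntoLine(pts[0], eta0, phi0, slope);
    double hi = lo;
    for (const Point& p : pts) {
        const double t = projectOntoLine(p, eta0, phi0, slope);
        lo = std::min(lo, t);
        hi = std::max(hi, t);
    }
    return hi - lo;
}
```

Working with a single signed coordinate per constituent reduces the subsequent distance calculations to one dimension.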

Description of the output array:

3- semi-major length 1; distance from jet-center to closer major center (quadrants)

4- semi-major length 2; distance from jet-center to further major center (quadrants)

5- Major Eccentricity; 'skewness' of jet to one side of the minor axis; 1 - (stat[3]/stat[4])

6- semi-minor length 1; distance from jet-center to closer minor center (quadrants)

7- semi-minor length 2; distance from jet-center to further minor center (quadrants)

8- Minor Eccentricity; 'skewness' of jet to one side of the major axis; 1 - (stat[6]/stat[7])

9- absolute major length; distance between two furthest jet constituents (after major axis-line projection)

10- absolute minor length; distance between two furthest jet constituents (after minor axis-line projection)

11- major length "method2"

12- minor length "method2"

13- Eccentricity "method2"; 1 - (stat[12]/stat[11])

14- major length (nq); distance between opposite major semi-plane weighted centers

15- minor length (nq); distance between opposite minor semi-plane weighted centers

16- Eccentricity (nq); 1 - (stat[15]/stat[14])

17- major length "method2" (nq)

18- minor length "method2" (nq)

19- Eccentricity "method2" (nq); 1-(stat[18]/stat[17])

20- Major Eccentricity (nq)

21- Minor Eccentricity (nq)

22- Fmax; fraction of jet energy held by the most energetic constituent
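Two of the definitions in the list are simple enough to sketch directly. The helper names `eccentricity` and `fmax` are hypothetical and are not part of the package. The first follows the pattern 1 - (shorter/longer) used for the eccentricities above. The second assumes that the jet energy is the sum of the constituent energies, which may differ from what the package does.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Eccentricity in the form used in the list above: 1 - (shorter / longer).
// Returns 0 for equal lengths and approaches 1 for a strongly lopsided shape.
double eccentricity(double shorter, double longer) {
    assert(longer > 0.0);
    return 1.0 - shorter / longer;
}

// Fraction of the summed constituent energy carried by the most energetic
// constituent. Assumption: jet energy = sum of constituent energies.
double fmax(const std::vector<double>& energies) {
    assert(!energies.empty());
    const double total = std::accumulate(energies.begin(), energies.end(), 0.0);
    assert(total > 0.0);
    return *std::max_element(energies.begin(), energies.end()) / total;
}
```

With the array returned by the package, a call such as `eccentricity(stat[3], stat[4])` would reproduce the expression given for entry 5.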

Note that not all variables are interesting, and some of them are strongly correlated. The paper hep-ph/1009.2749 discusses the few most interesting ones.

The code also calculates the jet width and jet eccentricity using principal-component analysis (PCA). In addition, it reconstructs the so-called pull values as described in hep-ph/1001.5027 by J. Gallicchio and M.D. Schwartz.
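The PCA step can be illustrated by a minimal sketch that computes the two principal variances of the energy-weighted constituent distribution in the Eta-Phi plane. The names `Constituent`, `Variances` and `principalVariances` are hypothetical, the energy weighting is an assumption, and the sketch stops at the variances: how the package converts them into a jet width and eccentricity is not shown here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for one jet constituent (not the package's CParticle).
struct Constituent {
    double eta;
    double phi;
    double energy;
};

// Variances along the two principal axes (major >= minor).
struct Variances {
    double major;
    double minor;
};

// Eigenvalues of the energy-weighted 2x2 covariance matrix in (Eta, Phi).
// Assumes Phi does not wrap around +-pi inside the jet.
Variances principalVariances(const std::vector<Constituent>& parts) {
    assert(!parts.empty());
    double sumW = 0.0, sumEta = 0.0, sumPhi = 0.0;
    for (const Constituent& c : parts) {
        sumW += c.energy;
        sumEta += c.energy * c.eta;
        sumPhi += c.energy * c.phi;
    }
    assert(sumW > 0.0);
    const double eta0 = sumEta / sumW;
    const double phi0 = sumPhi / sumW;

    // Covariance matrix [[a, b], [b, c]].
    double a = 0.0, b = 0.0, c = 0.0;
    for (const Constituent& p : parts) {
        const double dEta = p.eta - eta0;
        const double dPhi = p.phi - phi0;
        a += p.energy * dEta * dEta;
        b += p.energy * dEta * dPhi;
        c += p.energy * dPhi * dPhi;
    }
    a /= sumW;
    b /= sumW;
    c /= sumW;

    // Closed-form eigenvalues of a symmetric 2x2 matrix.
    const double mean = 0.5 * (a + c);
    const double diff = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
    return Variances{mean + diff, mean - diff};
}
```

For a symmetric 2x2 matrix the eigenvalues have a closed form, so no linear-algebra library is needed for this step.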

### Java code

The code is also implemented in Java for visualization and easy debugging on an event-by-event basis. One can build an arbitrary jet shape from several overlaid 2D Gaussian distributions ("jet constituents") and study such distributions by running a Python (Jython) script. This StatShape calculator is included in the jHepWork data-analysis framework. Look at the statistics examples (the Jython script "Identification of 2D shapes in data", statistics_statshape.py). The Java class API is also available.

## References:

• New approach for jet-shape identification of TeV-scale particles at the LHC
S. Chekanov, C. Levy, J. Proudfoot, R. Yoshida,
Phys. Rev. D82 (2010) 094029   E-print: hep-ph/1009.2749
• Searches for TeV-scale particles at the LHC using jet substructure
S. Chekanov, J. Proudfoot,
Phys. Rev. D81 (2010) 114038   E-print: hep-ph/1002.3982

## Presentation:

Page editors: S.Chekanov, C.Levy, L.Asquith