====== IO benchmarks using ProMC format ======
(E.May)
  
A number of IO tests are being performed on BlueGene/Q
20.5 M events in 1197 sec (~20 min).
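As a quick sanity check, the quoted run can be converted into the per-event cost R used in the plots below. A minimal sketch; only the 20.5 M events and 1197 s figures come from the text above:

```python
# Convert the quoted Vesta run (20.5 M events in 1197 s) into the
# per-event cost R = sec/event used in the efficiency plots below.
events = 20.5e6
seconds = 1197.0

rate = events / seconds   # events per second
r = seconds / events      # R == sec/event

print(f"rate = {rate:.0f} ev/s, R = {r:.2e} s/ev")
```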
  
===== Fig 1 =====

{{:hpc:bghep:alcf-vesta-pythia6-promc-mpi-a.png?400|}}
  
The first plot shows the (inverse) rate as a function of cores. There is a noticeable jog at 1024 cores and above. I ran a second job at
  
  
===== Fig 2 =====
  
Plot 2 shows the data arranged as a speed-up presentation. Note the use of log scales, which minimises the nonlinearity.
  
{{:hpc:bghep:alcf-vesta-pythia6-promc-mpi-b.png?400|}}
  
===== Fig 3 =====
  
A more appropriate measure is shown in Plot 3, which is the effective utilization of
the multiple cores. Above 100 cores the fraction begins to drop, reaching only 20% for 1024 (and above) cores. This is really rather poor performance! Plot 5 shows the I/O performance, indicating that the code is not really pushing the I/O capabilities of Vesta.
  
{{:hpc:bghep:alcf-vesta-pythia6-promc-mpi-c.png?400|}}
  
  
This plot shows the efficiency R_1c / (Nc * R_Nc) vs. Nc, where

<code>
R  == sec/event = job_run_time / total_number_of_events
Nc == number of cores used in the job
</code>

For a perfect speed-up this would always be 1. My experience with clusters of
smaller size is that 80% is usually achievable, while 20% is quite low; the
usual interpretation is that the code has a high fraction of serialization. For this
case it would be more efficient to run 8 jobs of 512 cores than 1 job of 4096 cores.
This is of course speculation on my part, as I have not identified the cause of
the inefficiency!
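To connect the 20% figure to the serialization interpretation, one can invert Amdahl's law for the serial fraction implied by a measured efficiency. A minimal sketch, assuming pure Amdahl-style scaling (the function names and example numbers are illustrative, not the benchmark code):

```python
def efficiency(r_1c: float, r_nc: float, nc: int) -> float:
    """Parallel efficiency R_1c / (Nc * R_Nc), where R == sec/event."""
    return r_1c / (nc * r_nc)

def implied_serial_fraction(eff: float, nc: int) -> float:
    """Serial fraction s reproducing this efficiency under Amdahl's law:
    eff = 1 / (nc * s + (1 - s))."""
    return (1.0 / eff - 1.0) / (nc - 1)

# Perfect scaling: R drops exactly as 1/Nc, so the efficiency is 1.
assert efficiency(1.0, 1.0 / 1024, 1024) == 1.0

# 20% efficiency at 1024 cores (as in Fig 3) needs only ~0.4% serial code.
s = implied_serial_fraction(0.20, 1024)
print(f"implied serial fraction ~ {s:.2%}")
```

A striking feature of Amdahl's law is how small a serial fraction suffices to cause this: well under 1% of serialized work is enough to drag 1024 cores down to 20% utilization.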
===== Fig 4 =====

{{:hpc:bghep:alcf-vesta-pythia6-promc-mpi-d.png?400|}}
 +
The ALCF experts suggested that the I/O model of 1 directory with many files in that 1 directory would perform badly due to lock contention
on the directory! Thus the example code was modified to use a model
of 1 output ProMC data file per directory. Running the modified code
produced the following figures:
[[http://www.hep.anl.gov/may/ALCF/new.vesta.4up.pdf|Vesta Plots]]
Focusing on the 'Efficiency' plots there appear to be some (small)
improvements, both at low core numbers and at high core numbers: 80% rising to 90% and 20% rising to 40%, respectively.

The large step between 512 and 1024 cores is still present!

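The 1-file-per-directory layout can be sketched as follows. This is a hypothetical helper, not the actual benchmark code (the directory and file naming are my own assumptions), shown in Python for brevity:

```python
import os

def promc_output_path(base_dir: str, rank: int) -> str:
    """Per-rank output layout: each MPI rank writes its ProMC file into
    its own directory, avoiding lock contention on one shared directory."""
    rank_dir = os.path.join(base_dir, f"rank{rank:05d}")
    os.makedirs(rank_dir, exist_ok=True)  # each rank touches only its own dir
    return os.path.join(rank_dir, f"output_{rank:05d}.promc")

# e.g. rank 7 writes to <base>/rank00007/output_00007.promc
```

The point of the change is that directory metadata updates (file creation) are serialized per directory, so giving each rank a private directory removes the shared lock from the critical path.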
As part of the bootcamp for Mira the code was moved to the
BG/Q Mira and a subset of the benchmarks was run in the new
IO model. The results are shown in
[[http://www.hep.anl.gov/may/ALCF/new.mira.4up.pdf|Mira figures]]
Again focusing on the Efficiency plot, the results are very
similar. This suggests that this naive IO model, with each
MPI rank writing its own output ProMC file, should be limited to
jobs of 512 cores or less for good utilization of the machine
and IO resources. This is OK for Vesta, where the minimum charge
is 32 nodes (i.e. 512 cores). On Mira, however, the minimum is 512
nodes (i.e. 8192 cores), so there is not a good match!
  
[[hpc:bghep:| << back]]
 --- //[[[email protected]|Ed May]] 2014/02/04 //
 --- //[[[email protected]|Ed May]] 2014/05/27 //
  
  
  --- //[[[email protected]|Sergei Chekanov]] 2014/02/04 10:42//  --- //[[[email protected]|Sergei Chekanov]] 2014/02/04 10:42//