Nodes                 1        2        4        8       16       32       64      128
E ext (s)       51310.6  25655.3  12797.5   6408.5   3267.4   1666.4    851.6    438.5
E int (s)         396.1    212.3    133.7     92.1     67.9     55.9     49.5     46.3
Wait (s)            0.0    151.5     84.1     51.1     36.5     27.4     35.0     20.2
Comm (s)            0.0     33.0     52.3     65.4     76.5     84.9     98.3    104.8
List (s)         3275.9   1695.7   1010.1    504.5    255.2    133.2     72.0     42.4
Integ (s)         126.2     75.0     38.9     21.2     12.7      8.8      7.2      6.8
Total (s)       55108.8  27822.8  14116.6   7142.8   3716.2   1976.7   1113.6    659.0
Total (hours)     15.31     7.73     3.92     1.98     1.03     0.55     0.31     0.18
Eff              100.0%    99.0%    97.6%    96.4%    92.6%    87.1%    77.3%    65.3%
Speedup             1.0     1.98      3.9      7.7       15       28       49       84
E ext   : External energy terms (electrostatics + Lennard-Jones)
E int   : Internal energy terms (bond, angle, dihedral)
Wait    : Time lost to load imbalance
Comm    : Communication time (vector distribution, global sum, global broadcast)
List    : Nonbonded list generation time
Integ   : Time needed to integrate the equations of motion
Total   : Total elapsed time
Eff     : Efficiency = speedup divided by the number of nodes
Speedup : Time for one node divided by time for N nodes
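The Speedup and Eff definitions can be checked directly against the Total row. The sketch below is a minimal example that assumes the component and Total times are in seconds (dividing Total by 3600 reproduces the Total(hours) row). It recomputes hours, speedup and efficiency, and also checks that Total is the sum of the six component rows to within rounding. The variable and function names are illustrative only.

```python
# Timings copied from the table above, in seconds.
nodes = [1, 2, 4, 8, 16, 32, 64, 128]
components = {
    "E ext": [51310.6, 25655.3, 12797.5, 6408.5, 3267.4, 1666.4, 851.6, 438.5],
    "E int": [396.1, 212.3, 133.7, 92.1, 67.9, 55.9, 49.5, 46.3],
    "Wait":  [0.0, 151.5, 84.1, 51.1, 36.5, 27.4, 35.0, 20.2],
    "Comm":  [0.0, 33.0, 52.3, 65.4, 76.5, 84.9, 98.3, 104.8],
    "List":  [3275.9, 1695.7, 1010.1, 504.5, 255.2, 133.2, 72.0, 42.4],
    "Integ": [126.2, 75.0, 38.9, 21.2, 12.7, 8.8, 7.2, 6.8],
}
total = [55108.8, 27822.8, 14116.6, 7142.8, 3716.2, 1976.7, 1113.6, 659.0]


def scaling(node_counts, times):
    """Return (speedup, efficiency) lists relative to the first entry.

    speedup(N)    = T(1) / T(N)
    efficiency(N) = speedup(N) / N
    """
    t1 = times[0]
    speedup = [t1 / t for t in times]
    efficiency = [s / n for s, n in zip(speedup, node_counts)]
    return speedup, efficiency


speedup, efficiency = scaling(nodes, total)
for i, n in enumerate(nodes):
    summed = sum(row[i] for row in components.values())
    # Total should equal the sum of the components, up to 0.1 s rounding.
    assert abs(summed - total[i]) <= 0.15
    print(f"{n:4d} nodes: {total[i] / 3600:6.2f} h  "
          f"speedup {speedup[i]:6.2f}  efficiency {efficiency[i]:6.1%}")
```

The recomputed efficiencies agree with the Eff row to within 0.1 percentage point.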
See the Intel WWW pages for more details.
Timothy G. Matson (tgm@ssd.intel.com) writes:
However, I have also been running the program on our existing supercomputers. There were a few simple optimizations I had to make. The most important was to replace your own global operations with the most recent library from van de Geijn, called iCC. The default routines contained a race condition (which I documented and passed on to Bernie). Also, the iCC library runs on any number of nodes, so I don't have to worry about the power-of-2 restriction. The other optimization was to limit the number of communication buffers managed by each Paragon node by setting the runtime flag -loc to log(P)+1, where P is the number of processors. This makes message passing much more efficient. With these changes, I get the following times for the Paragon using R1.2 with the message co-processor turned on:

Machine       Nodes   Time (hours)
Paragon XP        1   14.37
                  8    1.92
                 16    0.98
                 32    0.52
                 64    0.29
                128    0.18
                256    0.14
                512    0.098

This is pretty cool! As far as I know, the 0.098-hour time on 512 nodes is the fastest anyone has reported for this benchmark.
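The -loc rule and the R1.2 times can be tabulated the same way. This is a sketch under two assumptions not stated in the quote: the log in log(P)+1 is taken as base 2, and it is rounded up for node counts that are not powers of two. `loc_value` is a hypothetical helper name, not part of the Paragon tools. Speedup is computed relative to the 1-node time of 14.37 hours.

```python
import math

# Paragon XP, R1.2, message co-processor on: (nodes, hours), from the table above.
paragon = [(1, 14.37), (8, 1.92), (16, 0.98), (32, 0.52),
           (64, 0.29), (128, 0.18), (256, 0.14), (512, 0.098)]


def loc_value(p):
    """Hypothetical helper: log2(P) + 1, rounded up when P is not a power of two."""
    return math.ceil(math.log2(p)) + 1


t1 = paragon[0][1]
for p, hours in paragon:
    speedup = t1 / hours
    print(f"P={p:4d}  -loc {loc_value(p):2d}  "
          f"speedup {speedup:6.1f}  efficiency {speedup / p:6.1%}")
```

At 512 nodes this gives a speedup of roughly 147 over the 1-node run.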