===================== July 25, 1998 =============================
I had the opportunity to have your benchmark run on two SGI machines: an
Octane with two R10000 processors at 250 MHz and an Origin200QC (2 MB of
secondary cache).  The benchmarks were run by an SGI employee.

Octane 2xR10000@250MHz

  # processors | time (hours)
  ----------------------------
        1      |     .93
        2      |     .48

Origin 200QC 2xR10000@180MHz

  # processors | time (hours)
  ----------------------------
        1      |     1.2
        2      |     .62
        4      |     .32

Tru
--
CEA Centre d'etudes de Saclay
91191 Gif sur Yvette CEDEX, FRANCE
DSM/DRECAM/SCM      | DSV/DBCM/SBPM
Bat 137 piece 107   | Bat 528 piece 215
thuynh@cea.fr
===================================================================
Nodes              1        2        4        8       16
---------------------------------------------------------
E ext         9423.8   4697.3   2438.0   1170.7    587.8
E int           85.6     47.6     29.5     17.0     11.4
Wait             0.0     38.8     62.6     22.1     24.8
Comm             0.1     17.7     41.5     39.9     68.9
List           853.3    428.7    223.2    112.0     63.6
Integ           21.7     10.6      7.7      5.6      9.7
Total        10384.5   5240.7   2802.4   1367.2    766.1
Total(hours)    2.88    1.456    0.778    0.380    0.213
Eff           100.0%    99.1%    93.0%    95.0%    84.7%
Speedup          1.0     1.98     3.71      7.6    13.55
E ext  : External energy terms (electrostatics + Lennard-Jones)
E int  : Internal energy terms (bond, angle, dihedral)
Wait   : Load imbalance
Comm   : Communication time (Vector Distr. Global {Sum,Brdcst})
List   : Nonbond list generation time
Integ  : Time needed to integrate the equations of motion
Total  : Total elapsed time
Eff    : Efficiency = speedup divided by the number of nodes
Speedup: Time for one node divided by time for N nodes
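
For example, taking the Total row at 16 nodes: Speedup = 10384.5 s / 766.1 s
= 13.55, and Eff = 13.55 / 16 = 84.7%, in agreement with the last column of
the table above.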
Origin @ dontask.ii.uib.no (http://www.parallab.uib.no)
(64 processors, load was 10)

Since the f77 7.2 compiler gives wrong results when the whole of CHARMM is
compiled with -O3, only the following makefiles have the -O3 option in them
(applied to all files within the module): energy.mk, images.mk, nbonds.mk.
This gives correct results, while the speed on the benchmarks below is the
same as when compiling all of CHARMM with -O3.  (EXPAND performs the same
as NOEXPAND.)

c26a1 has to be patched in order to support the CMPI MPI keyword
combination.  This combination means that CHARMM uses MPI only for send
and receive, together with the global combine routines implemented inside
CHARMM, while specifying just PARALLEL and PARAFULL means that the combine
routines provided by MPI itself are used.  (An illustrative pref.dat
fragment is sketched after the tables below.)

The following CHARMM executables were tried:

name                  keywords in pref.dat
------------------------------------------------------------
charmm-mpi            PARALLEL PARAFULL
charmm-pvm            PARALLEL PARAFULL CMPI PVMC SGIMP SYNCHRON
charmm-pvm-gencomm    PARALLEL PARAFULL CMPI PVMC SGIMP SYNCHRON GENCOMM
charmm-cmpi-sync      PARALLEL PARAFULL CMPI MPI SYNCHRON
charmm-cmpi-async     PARALLEL PARAFULL CMPI MPI
charmm-cmpi-gencomm   PARALLEL PARAFULL CMPI MPI GENCOMM

bench: MbCO+3830w, 1000 steps of dynamics [time in seconds]:
============================================================
executable                      Number of nodes
                            8         16         32
------------------------------------------------------------
charmm-mpi               634.8      351.4      226.2
charmm-pvm               626.4      344.7      did not start
charmm-pvm-gencomm       634.3      351.7      did not start
charmm-cmpi-sync         624.0      358.6      200.4
charmm-cmpi-async        642.2      346.1      201.6
charmm-cmpi-gencomm      625.4      343.1      211.7

bench1: MbCO+4985w, 55.5 A cubic box, PME simulation (100 steps)
[time in seconds]:
============================================================
executable                      Number of nodes
                            8         16         32
------------------------------------------------------------
charmm-mpi               255.7      344.0      323.2
charmm-pvm               181.5      138.9      did not start
charmm-pvm-gencomm       175.7      139.3      did not start
charmm-cmpi-sync         194.3      214.6      291.4
charmm-cmpi-async        194.6      187.1      286.0
charmm-cmpi-gencomm      175.1      207.6      115.1
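
To make the keyword table above concrete, here is a sketch of the
parallel-related part of a pref.dat for the charmm-cmpi-async build.  This
is only an illustration, assuming the usual one-keyword-per-line layout; a
real pref.dat is written by the install script and contains many additional
machine and feature keywords that are omitted here:

    PARALLEL
    PARAFULL
    CMPI
    MPI

Dropping CMPI and MPI (leaving only PARALLEL and PARAFULL) gives the
charmm-mpi build, which relies on the combine routines provided by MPI
itself, while adding SYNCHRON or GENCOMM to the list selects the -sync and
-gencomm variants benchmarked above.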
                 1 node
-----------------------
Power Indigo 2   2.93 h
Indy             8.93 h
See SGI WWW pages for more details