Gaussian G09 Scaling Benchmarks
Jemmy Hu, SHARCNET
June-July, 2009

Systems:

Name     CPUs/node                                         RAM/node  OS               Interconnect
Saw      8 (2 quad-core), 3.0 GHz Xeon                     16.0 GB   HP Linux XC 4.0  InfiniBand
Narwhal  4 (2 dual-core), 2.2 GHz Opteron                  8.0 GB    HP Linux XC 3.0  Myrinet 2g (gm)
Silky    SGI Altix (Itanium2, 1.6 GHz)                     256 GB    SUSE Enterprise  SMP, NUMA
Hound    16 (4 quad) Xeon @ 2.4 GHz,                       128 GB    CentOS 5         InfiniBand, NFS storage
         32 (8 quad) Opteron @ 2.2 GHz                                                file system

Molecules and Methods/Models:

Molecule\Module                    B3LYP       MP2         CISD          CCSD
I    C4H4Cl2P2Pd (test job 445)    Opt + Freq  Opt + Freq
III  CH3OH (test job 58)                                   Opt + Freq
IV   CH3CH2 (test job 684)                                               Opt + Freq
Basis set                          BS on card  BS on card  6-31g(2df,p)  6-31g*, 6-31g(2df,p)

Gaussian versions:

G09-A.01  Binary versions from Gaussian Inc.
G09-A.02  Compiled from source on silky; binaries for others
G03-E.01  Binary versions from Gaussian Inc.

Target goals:

[1] Scaling results for typical models/methods in Gaussian 09
[2] Scaling on different systems: clusters (saw, narwhal, hound) vs. SMP (silky)
[3] G03 vs. G09
General conclusions:

1. Gaussian 09 scales quite well for shared-memory jobs.
   Silky (SMP machine): DFT-type methods scale very well up to 16 processors (small speedup from 16 to 32 processors);
   MP2-type methods scale very well up to 8 CPUs (small speedup from 8 to 16 processors).
   Saw (8-cpu nodes): DFT scales well up to 8 processors; MP2 scales up to 4 processors (small speedup at 8 processors).
2. Gaussian does not scale for CI- and CC-based methods.
3. G09 is about 2 times faster than G03 for DFT-, CI- and CC-based methods.

Maximum processors for G09 jobs (in practice, in order to run more jobs on a system, smaller-CPU jobs are recommended):

[1] Silky (SMP machine): [table of recommended processor counts for Opt, Freq, and Energy jobs: HF, DFT (B3LYP, etc.), MP(2, 3, 4), CISD (cis, cid, cisd, qcisd), CCSD (ccd, ccsd, ccsd(t))]
[2] Saw (2 quad-core nodes): [same methods; MP(2, 3, 4) marked *]
    * Due to the one-node-per-job nature of LSF, running 8-way MP2 on saw is fine. If a node can be shared by multiple jobs (Torque on hound), 4-way MP2 jobs are recommended.
[3] Bull, goblin (and other 4-core-node XC clusters): [same methods]
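The scaling statements above are in terms of speedup S(n) = T(1)/T(n). A minimal sketch of computing speedup and parallel efficiency from wall times in the h/m/s format used by these tables (the function names and sample times are illustrative, not from the report):

```python
import re

def to_seconds(t):
    """Parse a Gaussian wall-time string like '2h33m25s', '35m3s', or '58s'."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", t)
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return 3600 * h + 60 * mi + s

def speedup_table(times_by_ncpu):
    """Speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n."""
    t1 = to_seconds(times_by_ncpu[1])
    table = {}
    for n, t in sorted(times_by_ncpu.items()):
        tn = to_seconds(t)
        table[n] = (round(t1 / tn, 2), round(t1 / tn / n, 2))
    return table

# Hypothetical run times in the style of the report's tables:
print(speedup_table({1: "32m", 2: "17m4s", 4: "9m34s", 8: "5m53s"}))
```

A quickly flattening efficiency column (E(n) well below 1 at 8 CPUs) is exactly the "small speedup" pattern noted for MP2 above.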
Results on saw, Molecule I:
[Charts: B3LYP-Optimization, MP2-Optimization, and MP2-Frequency speedups on saw, G03-E.01 vs. G09-A.01]

Molecules III and IV on saw:
[Charts: CISD-Opt and CISD-Freq for Molecule III (G03-E.01 vs. G09-A.02); CCSD-Opt and CCSD-Freq for Molecule IV (G03-E.01 vs. G09-A.01)]

Results on narwhal, Molecule I:
[Charts: B3LYP-Opt, B3LYP-Freq, MP2-Opt, and MP2-Freq speedups, G03-C.02 vs. G03-E.01 vs. G09-A.01]

Results on silky, Molecule I:
[Charts: B3LYP-Opt, B3LYP-Freq, MP2-Opt, and MP2-Freq speedups, G09-A.02]

Molecule WH(CO)(NO)(PMe3)3:
[Charts: rmpw1pw91-opt and rmpw1pw91-freq speedups on silky, G09-A.02]
Results: saw, Molecule I:
[Tables: run times and speedups at 1, 2, 4, and 8 CPUs for B3LYP/Opt, B3LYP/Freq, MP2/Opt, and MP2/Freq, G03-E.01 vs. G09-A.01 and A.02]

Results: CISD (Molecule III) and CCSD (Molecule IV):
[Tables: saw — CISD run times and speedups with G03-E.01 and G09-A.01, 6-31g(2df,p); CCSD run times and speedups with G09-A.01 (6-31g* and 6-31g(2df,p)) and G03-E.01 (6-31g*)]

Gaussian does not scale for CI- or CC-based methods, but G09-A.01 is about 2 times faster than G03-E.01 for the CISD and CCSD jobs (6-31g* results).

Cluster: narwhal, Molecule I:
[Tables: MP2/Opt, MP2/Freq, B3LYP/Opt, and B3LYP/Freq run times and speedups, G03-C.02 vs. G03-E.01]

Cluster: silky — Molecule I (benchmark) and Molecule II, Dmitri's sample (#rmpw1pw91/genecp nosymm opt freq):
[Tables: DFT/opt, DFT/Freq, and MP2 run times and speedups on the ia64 systems]

Cluster: hound, Molecule I (NFS storage file system, results are meaningless):
[Tables: B3LYP/opt, B3LYP/Freq, MP2/opt, and MP2/Freq run times with G09-A.01 on the amd nodes, up to 32 CPUs]
Input files

%mem = 2GB for B3LYP, %mem = 4GB for MP2 computations
%mem = 2GB for CISD, %mem = 4GB for CCSD computations
%nproc varies over 1, 2, 4, 8, 16, and 32 threads/cpus depending on the node structures

Molecule I, (H2PCH2CH2PH2)PdCl2(CH3)2, for B3LYP and MP2:
It is from Gaussian test job 445; the geometry and basis sets can be found in test445.com in the directory
/opt/sharcnet/gaussian/g09/tests/com or /opt/sharcnet/gaussian/g03/tests/com
The following leading lines have been added above the geometry inputs (%nproc varies for scaling tests):

%nosave
%mem=2gb
%chk=benchmark-b3lyp-
%nproc=
#p b3lyp/gen 6d opt freq     (for B3LYP computations)
[#p mp2/gen 6d opt freq      (for MP2 computations)]

Gaussian Test Job 445:
(H2PCH2CH2PH2)PdCl2(CH3)2 benchmark optimization

Molecule WH(CO)(NO)(PMe3)3, for rmpw1pw91:

%chk=test4cpussilky.chk
%mem=256mw
%nproc=4
#opt rmpw1pw91/genecp nosymm

WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs

[Geometry: Z-matrix atom list W, P, P, P, N, O, C, O, ...]
[Geometry continued: C and H atoms of the three PMe3 groups]

H C N O P 0
6-31g(d,p)
****
W 0
sdd
****

W 0
sdd

--Link1--
%chk=test4cpussilky.chk
%mem=512mw
%nproc=4
#freq geom=check guess=read rmpw1pw91/genecp nosymm

WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs

H C N O P 0
6-31g(d,p)
****
W 0
sdd
****

W 0
sdd

Molecule III, for CISD Opt and Freq:

%NoSave
%chk=ch3oh_cisd-4
%mem=2gb
%nproc=4
#p cisd/6-31g(2df,p) opt freq

Gaussian Test Job 58:
MEOH opt, freq STD MOD cisd

[Z-matrix: methanol — C, O, three H on C, one H on O; variables CO 1.43, CH 1.09, OH 0.96]

Molecule IV, for CCSD Opt and Freq:

%NoSave
%chk=ch3ch2_ccsd-8
%mem=4gb
%nproc=8
#p ccsd/6-31g* opt freq

Gaussian Test Job 684:
Ethyl radical CCSD opt+freq

[Z-matrix: ethyl radical — two C, five H; variables CC 1.54, CH 1.09]
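Only the leading link-0 lines change between the runs of a scaling series. A minimal sketch of generating the B3LYP headers shown above for each CPU count (the make_header helper and output file names are assumptions, not part of the report):

```python
# Generates the %nproc-varying headers used for the scaling runs; the geometry
# and basis sets would be appended from test445.com afterwards.

ROUTE = "#p b3lyp/gen 6d opt freq"   # use "#p mp2/gen 6d opt freq" for MP2

def make_header(nproc, mem="2gb", tag="benchmark-b3lyp"):
    """Link-0 section plus route line for one scaling point."""
    return "\n".join([
        "%nosave",
        f"%mem={mem}",
        f"%chk={tag}-{nproc}",
        f"%nproc={nproc}",
        ROUTE,
        "",
    ])

for n in (1, 2, 4, 8, 16, 32):
    with open(f"benchmark-b3lyp-{n}.com", "w") as f:
        f.write(make_header(n))
```

Keeping %chk tagged with the CPU count, as the report's "%chk=benchmark-b3lyp-" naming suggests, avoids the runs overwriting each other's checkpoint files.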
More informationGetting the Best Performance from an HPC Cluster: BY BARIS GULER; JENWEI HSIEH, PH.D.; RAJIV KAPOOR; LANCE SHULER; AND JOHN BENNINGHOFF
Getting the Best Performance from an HPC Cluster: A STAR-CD Case Study High-performance computing (HPC) clusters represent a new era in supercomputing. Because HPC clusters usually comprise standards-based,
More informationParallel Programming with MPI
Parallel Programming with MPI Science and Technology Support Ohio Supercomputer Center 1224 Kinnear Road. Columbus, OH 43212 (614) 292-1800 oschelp@osc.edu http://www.osc.edu/supercomputing/ Functions
More informationDistributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5
Distributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5 Paper IEEE Computer (May 2016) What is DAS? Distributed common infrastructure for Dutch Computer Science Distributed: multiple (4-6) clusters
More informationLinux Clusters for High- Performance Computing: An Introduction
Linux Clusters for High- Performance Computing: An Introduction Jim Phillips, Tim Skirvin Outline Why and why not clusters? Consider your Users Application Budget Environment Hardware System Software HPC
More informationVeritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List
Veritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List Created on July 21, 2010 Copyright 2010 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Backup
More informationCP2K Performance Benchmark and Profiling. April 2011
CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council HPC works working group activities Participating vendors: HP, Intel, Mellanox
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationEfficient Power Management
Efficient Power Management on Dell PowerEdge Servers with AMD Opteron Processors Efficient power management enables enterprises to help reduce overall IT costs by avoiding unnecessary energy use. This
More informationIntel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2
Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting
More information