Gaussian G09 Scaling Benchmarks

Size: px
Start display at page:

Download "Gaussian G09 Scaling Benchmarks"

Transcription

1 Systems: Gaussian G9 Scaling Benchmarks Jemmy Hu SHARCNET June-July, 29 Name CPUs/node RAM/node OS Interconnect Saw 8 (2 quad-core), 6. GB HP Linux XC 4 InfiniBand 3. GHz Narwhal Silky Hound 4 (2 dual-core), 2.2 GHz SGI, GHz 6 (4 quad) Xeon@2.4GHz, 32 (8 quad) Opteron@2.2 GHz 8. GB HP Linux XC 3. Myrinet 2g (gm) 256GB SUSE Enterprise SMP, NUMA 28 GB Centos 5 InfiniBand, NFS storage file system Molecules and Methods/Models: Molecule\Module B3LYP MP2 CISD CCSD I III IV C4H4Cl2P2Pd (test job 445) CH3OH (test job 58 ) CH3CH2 (test job 684 ) Opt + Freq Opt + Freq Opt + Freq Opt +Freq BS on card BS on card 6-3g(2df,p) 6-3g*, 6-3g(2df,p) Gaussian versions: Gaussian versions G9-A. Binary versions from Gaussian Inc G9-A.2 Compiled from source on Silky Binaries for others G3-E. Binary versions from Gaussian Inc Target goals: [] Scaling results for typical models/methods in Gaussian 9 [2] Scaling on different systems: clusters (saw, narwhal, hound) vs. SMP (silky) [3] G3 vs G9

2 General conclusions:. Gaussian 9 scales quite good for shared memory jobs. Silky (SMP machine): DFT type of methods scale very good to 6 processors (small speedup from 6 to 32 processors) MP2 type of methods scale very good to 8 CPUs (small speedup from 8 to 6 processors) Saw (8-cpu nodes): DFT scale good to 8 processors MP2 scales to 4 processors (small speedup for 8 processors) 2. Gaussian does not scale for CI and CC based methods. 3. G9 is about 2 times faster than G3 for DFT, CI and CC based methods. Maximum processors for G9 jobs (In practice, in order to run more jobs on a system, smaller cpus/size jobs are recommended) [] Silky (SMP machine) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4) CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t)) [2]Saw (2 quad-core nodes) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4)* CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t)) *due to the node per job LSF nature, run 8-way MP2 on saw is fine. If a node can be shared by multiple jobs (torque on hound), 4-way MP2 jobs are recommended. [3] bull, goblin (and other 4-core node XC clusters) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4) CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t))

3 Results on saw, molecule-i 8 6 B3LYP-Optimization on saw 8 6 B3LYP-Optimization on saw G3-E. G9-A G3-E. G9-A MP2-Optimization on saw MP2-Frequency on saw G3-E. G9-A G3-E. G9-A

4 Molecule-III CISD - Opt on saw CISD -Freq on saw G3-E. G9-A.2 G3-E. G9-A No. pf CPUs Molecule-IV CCSD-Opt on saw CCSD-Freq on saw G3-E. G9-A G3-E. G9-A

5 Results on narwhal, molecule-i B3LYP-Opt on narwhal B3LYP-Freq on narwhal G3-C.2 G3-E. G9-A G3-C.2 G3-E. G9-A MP2-Opt on narwhal MP2-Freq on narwhal G3-C.2 G3-E. G9-A G3-C.2 G3-E. G9-A No. f CPUs

6 Results on silky, molecule-i B3LYP-Opt on silky B3LYP-Freq on silky G9-A G9-A No of CPUs No of CPUs MP2-Opt on silky MP2-Freq G9-A G9-A No of CPUs

7 Molecule: WH(CO)(NO)(PMe3)3 4 rmpwpw9-opt on silky 8 rmpwpw9-freq on silky G9-A G9-A No of CPUs

8 Results: saw, Molecule - I B3lyp / opt CPUs G3-E. G9-A., 62 (27ms) 2 7 (7m5s) 4 7 (m4s) (9m57s) (8m3s) (m39s) (7m) (5m37s) 29 (8m49s) 639 (m39s) 423 (7m3s) 326 (5m26s) (26m4s) (2m32s) (8ms) (6m9s) B3LYP / Freq G3-E. 56 (26ms) (4m7s) 4 53 (8m33s) (5m45s) G9-A., (3m23s) (7m) (3m54s) (2m38s) 832 (3m52s) 42 (7m) 233 (3m53s) 57 (2m37s) (2m) (9m26s) (5m) (3m6s) MP2 / opt G3-E (37m26s) (22m57s) (6m37s) 8* 5 (7m3s) G9-A., (32m4s) (9m43s) (4m32s) (4m27s) 234 (39ms) 74 (9m34s) 87 (4m3s) 866 (4m26s) (5m45s) (23m5s) (5m58s) (5m53s) MP2 / Freq G3-E (9h35m7s) (5h36m32s) (3h37m34s) 8* 2849 (3h34m9s) G9-A., (5h5ms) 9836 (5h3m36s) 9355 (2h33m25s) (2h35m52s) 63 (h43m4s) (h45m) 5455 (h37m23s) (h3m55s) (9h6s)? (3h58m3s) (2h2m2s) (h45m39s)

9 Results: CISD, Molecule-III Cluster: saw, G3-E./G9-A., 6-3g(2df,p) G3-E. G9-A. Opt Freq Opt Freq Run time Speedup 626 (m26s) 533 (h28m23s) 332 (5m32s) 273 (34m33s) (8m27s) (h2m2s) (5m37s) (37m24s) (2m42s) (h3m44s) (5m28s) (35m3s) (3m) (h27m) (5m44s) (36m) Results: CCSD, Molecule - IV Cluster: saw, G9-A. Opt Freq Opt Freq 6-3g* 6-3g* 6-3g(2df,p) 6-3g(2df,p) 48 (8m) 2 36 (6m) (7m2s) 8 36 (6m) 2355 (39m5s) (33m23s) (36m25s) (32m) 338 (55m8s).8 25 (4m4s).8 34 (5m54s) (47m4s) 6785 (4h39m45s) (3h32m4s) (4h25m53s) Cluster: saw, G3-E. Opt+Freq Opt Freq 6-3g* 6-3g* 6-3g* Run time Speed up 34m45s 6 (ms) 358 (58m38s) 2 34m6s (m5s) (h8m6s) 4 34ms (m2s) (h2m39s) (m22s) (h4m24s) Gaussian does not scale for CI or CC based methods, but G9-A. is about 2 times faster than G3- E. for the CISD and CCSD jobs (6-3g* results)

10 Cluster: narwhal, Molecule - I MP2 / Opt CPUs G3-C.2 G3-E. Runtimes(s) 7522 (2h5m22s) (45m44s) (38m5s) speedup 3629 (h29s) (35m57s) (33m26s) 466 (h7m46s) (35m4s).8 2 (33m2s) MP2 / Freq G3-C.2 G3-E (9h23m9s) (5h49m22s) (4h5m5s) 5452 (5h8m4s) (8h8m5s) (6h46m42s) (9h6m39s) (4h5m44s) (3hms) B3LYP / OPT G3-C.2 G3-E. 464 (h6m54s) (32m5s) 4 43 (23m23s) 397 (5m37s) (24ms) (8m2) 246 (35m46s) (8ms) (2m24s) B3LYP / Freq G3-C.2 G3-E. 288 (48m) (23m55s) (3m49s) 244 (35m44s) (8m45s) (m23s) 44 (24ms) (m6s) (6m4s) (35m35s) (8m32s) (2m42s) (26m29s) (2m4s) (7m6s)

11 Cluster: silky Molecule I, benchmark- Molecule II, Dmitri s sample (#rmpwpw9/genecp nosymm opt freq) DFT / opt CPUs ia64-b, M2 (rmpwpw9) ia64-b, M (9h28m46s) (35m3s) 2 35 (2m55s) (2h44m3s) (3m5s) (h32m46s) (8m5s) (52m24s) (5m5s) (34m3s) (B3LYP) ia64-s, M 33m8s.62 22m54s 2.7 2m3s 2m52s, 2m4s DFT / Freq ia64-b, M2 (rmpwpw9) ia64-b, M (4h28m5s) (4m7s) 2 48 (8m) (h24m46s) (4m2s) (54m46s) (2m3s) (42m37s) (.36) (m33s) (4m24s) (B3LYP) ia64-s, M 4m4s.76 8ms m9s 4m2s, 4m32s MP2 M Opt 4946 (h22m26s) 4 99 (33ms) 8* 232 (2m32s) 6 5 (7m3s) (3ms) Freq 682 (4h4m2s) (h4m46s) (h2m26s) (h4m2s) (54m45s)

12 Cluster: hound, Molecule I (NFS storage file system, results are meaningless) B3lyp / opt CPUs G9-A., amd4 Runtime(s) Speedu p G9-A., 43m34s 24m5s 25ms 34m2s 4 7m4s 9m4s 9m3s 6m3s 8 2m25s 6m42s 6m4s 9m 6 m29s 6m8s 6m7s 7m36s 32 7m4s B3LYP / Freq G9-A., G9-A., 3m42 7m5s 36m4s 24m3s 4 8m48s 5ms 4m57s 6m49s 8 5m 2m49s 2m49s 3m4s 6 5m42s 2ms (2ms) 2m5s 32 2m44s MP2 / opt G9-A., G9-A., h57m47s h4m22s h9m39s h5m44s 4 42m4s 9m7s 26m53s 32m2s 8* h5m53s 5m 22m22s 4m53s 6 h47m34s 2m9s 2m3s 32 2h4m26s MP2 / Freq G9-A., G9-A., h57m49s h4m34s 7h3m48s h45m8s 4 6h3m4s 6h23m45s 3h4m48s 5h2m 8* 7h3m3s 7h23m6s 2h52m7s 3h47m38s 6 3h5m42s 5h9m28s 5h33m29s 32 4h46m42s

13 Input files %mem = 2GB for B3LYP %mem = 4GB for MP2 computations %mem = 2GB for CISD %mem = 4GB for CCSD computations %nproc varies from, 2, 4, 8, 6 to 32 threads/cpus depending on the node structures Molecule I, (H2PCH2CH2PH2)PdCl2(CH3)2 for B3LYP and MP2 It is from Gaussian test job 445, the geom. and basis sets can be found in test445.com in the directory /opt/sharcnet/gaussian/g9/tests/com or /opt/sharcnet/gaussian/g3/tests/com The following leading lines have been added above the geom. inputs (%nproc varies for scaling tests) %nosave %mem=2gb %chk=benchmark-b3lyp- %nproc= #p b3lyp/gen 6d opt freq (for B3LYP computations) [#p mp2/gen 6d opt freq (for MP2 computations)] Gaussian Test Job 445: (H2PCH2CH2PH2)PdCl2(CH3)2 benchmark optimization Molecule: WH(CO)(NO)(PMe3)3 for rmpwpw9 %chk=test4cpussilky.chk %mem=256mw %nproc=4 #opt rmpwpw9/genecp nosymm WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs W P P P N O C O

14 C H H H C H H H C H H H C H H H C H H H C H H H C H H H C H H H C H H H H H C N O P 6-3g(d,p) **** W sdd **** W sdd --Link-- %chk=test4cpussilky.chk %mem=52mw %nproc=4 #freq geom=check guess=read rmpwpw9/genecp nosymm WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs H C N O P

15 6-3g(d,p) **** W sdd **** W Sdd Molecule III for CISD Opt and Freq %NoSave %chk=ch3oh_cisd-4 %mem=2gb %nproc=4 #p cisd/6-3g(2df,p) opt freq Gaussian Test Job 58: MEOH opt, freq STD MOD cisd C O CO H CH 2 T H CH 2 T 3 T H CH 2 T 3 T - H 2 OH T 3 8. CO.43 CH.9 OH.96 T Molecule IV, for CCSD Opt and Freq %NoSave %chk=ch3ch2_ccsd-8 %mem=4gb %nproc=8 #p ccsd/6-3g* opt freq Gaussian Test Job 684: Ethyl radical CCSD opt+freq 2 C C2 C CC H C CH C2 T H2 C CH C2 T H T H3 C2 CH C T H 8. H4 C2 CH C T H3 2. H5 C2 CH C T H3 24. CC.54 CH.9 T

CURRENT STATUS OF THE PROJECT TO ENABLE GAUSSIAN 09 ON GPGPUS

CURRENT STATUS OF THE PROJECT TO ENABLE GAUSSIAN 09 ON GPGPUS CURRENT STATUS OF THE PROJECT TO ENABLE GAUSSIAN 09 ON GPGPUS Roberto Gomperts (NVIDIA, Corp.) Michael Frisch (Gaussian, Inc.) Giovanni Scalmani (Gaussian, Inc.) Brent Leback (PGI) TOPICS Gaussian Design

More information

Platform Choices for LS-DYNA

Platform Choices for LS-DYNA Platform Choices for LS-DYNA Manfred Willem and Lee Fisher High Performance Computing Division, HP lee.fisher@hp.com October, 2004 Public Benchmarks for LS-DYNA www.topcrunch.org administered by University

More information

COSC 6385 Computer Architecture - Multi Processor Systems

COSC 6385 Computer Architecture - Multi Processor Systems COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:

More information

APPENDIX 2. Density Functional Theory Calculation of Vibrational Frequencies with Gaussian 98W 1

APPENDIX 2. Density Functional Theory Calculation of Vibrational Frequencies with Gaussian 98W 1 APPENDIX 2 Density Functional Theory Calculation of Vibrational Frequencies with Gaussian 98W 1 This appendix describes the use of Gaussian 98W software for calculating vibrational frequencies for the

More information

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances) HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access

More information

Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc.

Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters 2006 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Our Business Simulation Driven Product Development Deliver superior

More information

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009

ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 What is ACEnet? Shared resource......for research computing... physics, chemistry, oceanography, biology, math, engineering,

More information

Appendix 1b Gaussian 03W on Dell XPS

Appendix 1b Gaussian 03W on Dell XPS WARNING NOTICE: The experiments described in these materials are potentially hazardous and require a high level of safety training, special facilities and equipment, and supervision by appropriate individuals.

More information

Materials Simulations using Gaussian 09. Eunhwan Jung

Materials Simulations using Gaussian 09. Eunhwan Jung Materials Simulations using Gaussian 09 Eunhwan Jung 1 INTRODUCTION 2 Information of Gaussian Gaussian website www.gaussian.com Gaussian Online Manual http://www.gaussian.com/g_tech/g_ur/g09help.htm Gaussian

More information

MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2

MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2 MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2 NB: please read this page carefully, as it contains 4 separate specifications for a Workstation

More information

Readme for Platform Open Cluster Stack (OCS)

Readme for Platform Open Cluster Stack (OCS) Readme for Platform Open Cluster Stack (OCS) Version 4.1.1-2.0 October 25 2006 Platform Computing Contents What is Platform OCS? What's New in Platform OCS 4.1.1-2.0? Supported Architecture Distribution

More information

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007 Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise

More information

Future Trends in Hardware and Software for use in Simulation

Future Trends in Hardware and Software for use in Simulation Future Trends in Hardware and Software for use in Simulation Steve Feldman VP/IT, CD-adapco April, 2009 HighPerformanceComputing Building Blocks CPU I/O Interconnect Software General CPU Maximum clock

More information

Assessment of LS-DYNA Scalability Performance on Cray XD1

Assessment of LS-DYNA Scalability Performance on Cray XD1 5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123

More information

Minimum Hardware and OS Specifications

Minimum Hardware and OS Specifications Hardware and OS Specifications File Stream Document Management Software System Requirements for v4.5 NB: please read through carefully, as it contains 4 separate specifications for a Workstation PC, a

More information

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems

SMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]

More information

SPEC MPI2007 Benchmarks for HPC Systems

SPEC MPI2007 Benchmarks for HPC Systems SPEC MPI2007 Benchmarks for HPC Systems Ron Lieberman Chair, SPEC HPG HP-MPI Performance Hewlett-Packard Company Dr. Tom Elken Manager, Performance Engineering QLogic Corporation Dr William Brantley Manager

More information

Installation and Test of Molecular Dynamics Simulation Packages on SGI Altix and Hydra-Cluster at JKU Linz

Installation and Test of Molecular Dynamics Simulation Packages on SGI Altix and Hydra-Cluster at JKU Linz Installation and Test of Molecular Dynamics Simulation Packages on SGI Altix and Hydra-Cluster at JKU Linz Rene Kobler May 25, 25 Contents 1 Abstract 2 2 Status 2 3 Changelog 2 4 Installation Notes 3 4.1

More information

IT Business Management System Requirements Guide

IT Business Management System Requirements Guide IT Business Management System Requirements Guide IT Business Management Advanced or Enterprise Edition 8.1 This document supports the version of each product listed and supports all subsequent versions

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Manufacturing Bringing New Levels of Performance to CAE Applications

Manufacturing Bringing New Levels of Performance to CAE Applications Solution Brief: Manufacturing Bringing New Levels of Performance to CAE Applications Abstract Computer Aided Engineering (CAE) is used to help manufacturers bring products to market faster while maintaining

More information

Maximizing Memory Performance for ANSYS Simulations

Maximizing Memory Performance for ANSYS Simulations Maximizing Memory Performance for ANSYS Simulations By Alex Pickard, 2018-11-19 Memory or RAM is an important aspect of configuring computers for high performance computing (HPC) simulation work. The performance

More information

Experiences with HP SFS / Lustre in HPC Production

Experiences with HP SFS / Lustre in HPC Production Experiences with HP SFS / Lustre in HPC Production Computing Centre (SSCK) University of Karlsruhe Laifer@rz.uni-karlsruhe.de page 1 Outline» What is HP StorageWorks Scalable File Share (HP SFS)? A Lustre

More information

Full Vehicle Dynamic Analysis using Automated Component Modal Synthesis. Peter Schartz, Parallel Project Manager ClusterWorld Conference June 2003

Full Vehicle Dynamic Analysis using Automated Component Modal Synthesis. Peter Schartz, Parallel Project Manager ClusterWorld Conference June 2003 Full Vehicle Dynamic Analysis using Automated Component Modal Synthesis Peter Schartz, Parallel Project Manager Conference Outline Introduction Background Theory Case Studies Full Vehicle Dynamic Analysis

More information

The Optimal CPU and Interconnect for an HPC Cluster

The Optimal CPU and Interconnect for an HPC Cluster 5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance

More information

HPC In The Cloud? Michael Kleber. July 2, Department of Computer Sciences University of Salzburg, Austria

HPC In The Cloud? Michael Kleber. July 2, Department of Computer Sciences University of Salzburg, Austria HPC In The Cloud? Michael Kleber Department of Computer Sciences University of Salzburg, Austria July 2, 2012 Content 1 2 3 MUSCLE NASA 4 5 Motivation wide spread availability of cloud services easy access

More information

WRF performance on Intel Processors

WRF performance on Intel Processors WRF performance on Intel Processors R. Dubtsov, A. Semenov, D. Shkurko Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {roman.s.dubtsov, alexander.l.semenov,dmitry.v.shkurko,}@intel.com

More information

Gaussian03 and Gaussview Presentation Anita Orendt Center for High Performance Computing University of Utah anita.orendt@utah.edu April 24, 2008 http://www.chpc.utah.edu 4/29/08 http://www.chpc.utah.edu

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

Technical guide. Windows HPC server 2016 for LS-DYNA How to setup. Reference system setup - v1.0

Technical guide. Windows HPC server 2016 for LS-DYNA How to setup. Reference system setup - v1.0 Technical guide Windows HPC server 2016 for LS-DYNA How to setup Reference system setup - v1.0 2018-02-17 2018 DYNAmore Nordic AB LS-DYNA / LS-PrePost 1 Introduction - Running LS-DYNA on Windows HPC cluster

More information

an exceptional performance/price ratio are fully configured for parallel computation

an exceptional performance/price ratio are fully configured for parallel computation WHO ARE WE? (PQS) manufactures parallel computers with integrated software for high-performance computational chemistry. Our offerings have an exceptional performance/price ratio are fully configured for

More information

vrealize Business System Requirements Guide

vrealize Business System Requirements Guide vrealize Business System Requirements Guide vrealize Business Advanced and Enterprise 8.2.1 This document supports the version of each product listed and supports all subsequent versions until the document

More information

Basis Sets, Electronic Properties, and Visualization

Basis Sets, Electronic Properties, and Visualization Basis Sets, Electronic Properties, and Visualization Goals of the Exercise - Perform common quantum chemical calculations using the Gaussian package. - Practice preparation, monitoring, and analysis of

More information

NUMA replicated pagecache for Linux

NUMA replicated pagecache for Linux NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations

More information

Introduction to the SHARCNET Environment May-25 Pre-(summer)school webinar Speaker: Alex Razoumov University of Ontario Institute of Technology

Introduction to the SHARCNET Environment May-25 Pre-(summer)school webinar Speaker: Alex Razoumov University of Ontario Institute of Technology Introduction to the SHARCNET Environment 2010-May-25 Pre-(summer)school webinar Speaker: Alex Razoumov University of Ontario Institute of Technology available hardware and software resources our web portal

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:

More information

InfoBrief. Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux

InfoBrief. Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux InfoBrief Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux Leveraging Oracle 9i Real Application Clusters (RAC) Technology and Red

More information

Advances of parallel computing. Kirill Bogachev May 2016

Advances of parallel computing. Kirill Bogachev May 2016 Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being

More information

Introduction to High Performance Computing at ZIH

Introduction to High Performance Computing at ZIH Center for Information Services and High Performance Computing (ZIH) Introduction to High Performance Computing at ZIH Architecture of the PC Farm (Deimos) Zellescher Weg 12 Trefftz-Bau/HRSK 151 Phone

More information

HPC Solution. Technology for a New Era in Computing

HPC Solution. Technology for a New Era in Computing HPC Solution Technology for a New Era in Computing TEL IN HPC & Storage.. 20 years of changing with Technology Complete Solution Integrators for Select Verticals Mechanical Design & Engineering High Performance

More information

COSC 6374 Parallel Computation. Parallel Computer Architectures

COSC 6374 Parallel Computation. Parallel Computer Architectures OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy

More information

OP2 FOR MANY-CORE ARCHITECTURES

OP2 FOR MANY-CORE ARCHITECTURES OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC

More information

Itanium 2. Itanium.

Itanium 2. Itanium. Itanium 2 Itanium 2 Itanium www.intel.com/itanium2 ... 2... 2... 4... 4... 4... 4... 5... 5... 5... 6 Itanium 9MB L3 Itanium 2 1.60GHz Itanium Itanium 2 Itanium 2 Itanium 2 25% 1 5 15% IA-32 Itanium 2

More information

Session 40 Evaluating Server Options

Session 40 Evaluating Server Options Quintessential School Systems Session 40 Evaluating Server Options Presented by Duane Percox Bill Genske Copyright Quintessential School Systems, 2009 All Rights Reserved 867 American Street --- Second

More information

Our new HPC-Cluster An overview

Our new HPC-Cluster An overview Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization

More information

MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces

MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces Hye-Churn Jang Hyun-Wook (Jin) Jin Department of Computer Science and Engineering Konkuk University Seoul, Korea {comfact,

More information

Defense Technical Information Center Compilation Part Notice

Defense Technical Information Center Compilation Part Notice UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP023800 TITLE: A Comparative Study of ARL Linux Cluster Performance DISTRIBUTION: Approved for public release, distribution unlimited

More information

CP2K Performance Benchmark and Profiling. April 2011

CP2K Performance Benchmark and Profiling. April 2011 CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC

More information

Cray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:

Cray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe: Cray events! Cray User Group (CUG):! When: May 16-19, 2005! Where: Albuquerque, New Mexico - USA! Registration: reserved to CUG members! Web site: http://www.cug.org! Cray Technical Workshop Europe:! When:

More information

High Volume Transaction Processing in Enterprise Applications

High Volume Transaction Processing in Enterprise Applications High Volume Transaction Processing in Enterprise Applications By Thomas Wheeler Recursion Software, Inc. February 16, 2005 TABLE OF CONTENTS Overview... 1 Products, Tools, and Environment... 1 OS and hardware

More information

Maximizing Six-Core AMD Opteron Processor Performance with RHEL

Maximizing Six-Core AMD Opteron Processor Performance with RHEL Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor

More information

Itanium 2 Impact Software / Systems MSC.Software. Jay Clark Director, Business Development High Performance Computing

Itanium 2 Impact Software / Systems MSC.Software. Jay Clark Director, Business Development High Performance Computing Itanium 2 Impact Software / Systems MSC.Software Jay Clark Director, Business Development High Performance Computing jay.clark@mscsoftware.com Agenda What MSC.Software does Software vendor point of view

More information

Coherent HyperTransport Enables The Return of the SMP

Coherent HyperTransport Enables The Return of the SMP Coherent HyperTransport Enables The Return of the SMP Einar Rustad Copyright 2010 - All rights reserved. 1 Top500 History The expensive SMPs used to rule: Cray XMP, Convex Exemplar, Sun ES NOW, the Clusters

More information

Determining the MPP LS-DYNA Communication and Computation Costs with the 3-Vehicle Collision Model and the Infiniband Interconnect

Determining the MPP LS-DYNA Communication and Computation Costs with the 3-Vehicle Collision Model and the Infiniband Interconnect 8 th International LS-DYNA Users Conference Computing / Code Tech (1) Determining the MPP LS-DYNA Communication and Computation Costs with the 3-Vehicle Collision Model and the Infiniband Interconnect

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel

More information

Symantec NetBackup PureDisk Compatibility Matrix Created August 26, 2010

Symantec NetBackup PureDisk Compatibility Matrix Created August 26, 2010 Symantec NetBackup PureDisk 6.6.1 Compatibility Matrix Created August 26, 2010 Copyright 2010 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Backup Exec are trademarks or registered

More information

Ab Initio modelling of surfaces and surfactants

Ab Initio modelling of surfaces and surfactants Ab Initio modelling of surfaces and surfactants Outline Background System studied Hardware used MPI programming Background Flotation of minerals is a multi billion dollar industry Much is allready known

More information

Benchmarking AMD64 and EMT64

Benchmarking AMD64 and EMT64 Benchmarking AMD64 and EMT64 Hans Wenzel, Oliver Gutsche, FNAL, Batavia, IL 60510, USA Mako Furukawa, University of Nebraska, Lincoln, USA Abstract We have benchmarked various single and dual core 64 Bit

More information

Parallel Processing & Multicore computers

Parallel Processing & Multicore computers Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)

More information

Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters

Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Matthew Koop 1 Miao Luo D. K. Panda matthew.koop@nasa.gov {luom, panda}@cse.ohio-state.edu 1 NASA Center for Computational

More information

vcdm System Requirements Version 6.6 English

vcdm System Requirements Version 6.6 English vcdm System Requirements Version 6.6 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user

More information

OCTOPUS Performance Benchmark and Profiling. June 2015

OCTOPUS Performance Benchmark and Profiling. June 2015 OCTOPUS Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the

More information

2008 International ANSYS Conference

2008 International ANSYS Conference 28 International ANSYS Conference Maximizing Performance for Large Scale Analysis on Multi-core Processor Systems Don Mize Technical Consultant Hewlett Packard 28 ANSYS, Inc. All rights reserved. 1 ANSYS,

More information

vcdm System Requirements Version 6.4 English

vcdm System Requirements Version 6.4 English vcdm System Requirements Version 6.4 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user

More information

Comp. Org II, Spring

Comp. Org II, Spring Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer

More information

Graham vs legacy systems

Graham vs legacy systems New User Seminar Graham vs legacy systems This webinar only covers topics pertaining to graham. For the introduction to our legacy systems (Orca etc.), please check the following recorded webinar: SHARCNet

More information

Performance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand. Abstract

Performance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand. Abstract Performance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand Abstract...1 Introduction...2 Overview of ConnectX Architecture...2 Performance Results...3 Acknowledgments...7 For

More information

Computer Systems Architecture

Computer Systems Architecture Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student

More information

Designing High Performance Communication Middleware with Emerging Multi-core Architectures

Designing High Performance Communication Middleware with Emerging Multi-core Architectures Designing High Performance Communication Middleware with Emerging Multi-core Architectures Dhabaleswar K. (DK) Panda Department of Computer Science and Engg. The Ohio State University E-mail: panda@cse.ohio-state.edu

More information

ORACLE Linux / TSC.

ORACLE Linux / TSC. ORACLE Linux / TSC Sekook.jang@oracle.com Unbreakable Linux Unbreakable Support Unbreakable Products Unbreakable Performance Asianux Then. Next? Microsoft Scalability 20 User Workgroup Computing Microsoft

More information

GEN_OMEGA2: The HPSUMMARY Procedure: A SAS Macro for Computing the Generalized Omega-Squared Effect Size Associated with

GEN_OMEGA2: The HPSUMMARY Procedure: A SAS Macro for Computing the Generalized Omega-Squared Effect Size Associated with GEN_OMEGA2: A SAS Macro for Computing the Generalized Omega-Squared Effect Size Associated with The HPSUMMARY Procedure: Analysis of Variance Models An Old Friend s Younger (and Brawnier) Cousin The HPSUMMARY

More information

Parallel Architecture. Hwansoo Han

Parallel Architecture. Hwansoo Han Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range

More information

Parallel File Systems Compared

Parallel File Systems Compared Parallel File Systems Compared Computing Centre (SSCK) University of Karlsruhe, Germany Laifer@rz.uni-karlsruhe.de page 1 Outline» Parallel file systems (PFS) Design and typical usage Important features

More information

Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud

Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud Summarized by: Michael Riera 9/17/2011 University of Central Florida CDA5532 Agenda

More information

CAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director

CAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director CAS 2K13 Sept. 2013 Jean-Pierre Panziera Chief Technology Director 1 personal note 2 Complete solutions for Extreme Computing b ubullx ssupercomputer u p e r c o p u t e r suite s u e Production ready

More information

Introduction to HPC2N

Introduction to HPC2N Introduction to HPC2N Birgitte Brydsø HPC2N, Umeå University 4 May 2017 1 / 24 Overview Kebnekaise and Abisko Using our systems The File System The Module System Overview Compiler Tool Chains Examples

More information

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization

More information

FUSION1200 Scalable x86 SMP System

FUSION1200 Scalable x86 SMP System FUSION1200 Scalable x86 SMP System Introduction Life Sciences Departmental System Manufacturing (CAE) Departmental System Competitive Analysis: IBM x3950 Competitive Analysis: SUN x4600 / SUN x4600 M2

More information

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda

HPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Agenda 1 Agenda-Day 1 HPC Overview What is a cluster? Shared v.s. Distributed Parallel v.s. Massively Parallel Interconnects

More information

Cluster Network Products

Cluster Network Products Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster

More information

vcdm System Requirements Version 6.2 English

vcdm System Requirements Version 6.2 English vcdm System Requirements Version 6.2 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user

More information

MD NASTRAN on Advanced SGI Architectures *

MD NASTRAN on Advanced SGI Architectures * W h i t e P a p e r MD NASTRAN on Advanced SGI Architectures * Olivier Schreiber, Scott Shaw, Joe Griffin** Abstract MD Nastran tackles all important Normal Mode Analyses utilizing both Shared Memory Parallelism

More information

4. LS-DYNA Anwenderforum, Bamberg 2005 IT I. September 28, 2005 Computation Products Group 1. September 28, 2005 Computation Products Group 2

4. LS-DYNA Anwenderforum, Bamberg 2005 IT I. September 28, 2005 Computation Products Group 1. September 28, 2005 Computation Products Group 2 4. LS-DYNA Anwenderforum, Bamberg 2005 IT I High Performance Enterprise Computing Hardware Design & Performance Application Optimization Guide Performance Evaluation Lynn Lewis Director WW FAE MSS lynn.lewis@amd.com

More information

HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON

HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON Li Tan 1, Longxiang Chen 1, Zizhong Chen 1, Ziliang Zong 2, Rong Ge 3, and Dong Li 4 1 University of California,

More information

GROMACS Performance Benchmark and Profiling. August 2011

GROMACS Performance Benchmark and Profiling. August 2011 GROMACS Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource

More information

Building 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004

Building 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004 Building 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004 Brian Dennis, Ph.D. Visiting Associate Professor University of Tokyo Designing the Cluster Goal: provide

More information

Before We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop

Before We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop Before We Start Sign in hpcxx account slips Windows Users: Download PuTTY Google PuTTY First result Save putty.exe to Desktop Research Computing at Virginia Tech Advanced Research Computing Compute Resources

More information

Parallel Computing: From Inexpensive Servers to Supercomputers

Parallel Computing: From Inexpensive Servers to Supercomputers Parallel Computing: From Inexpensive Servers to Supercomputers Lyle N. Long The Pennsylvania State University & The California Institute of Technology Seminar to the Koch Lab http://www.personal.psu.edu/lnl

More information

2008 International ANSYS Conference

2008 International ANSYS Conference 2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,

More information

Performance Analysis of LS-DYNA in Huawei HPC Environment

Performance Analysis of LS-DYNA in Huawei HPC Environment Performance Analysis of LS-DYNA in Huawei HPC Environment Pak Lui, Zhanxian Chen, Xiangxu Fu, Yaoguo Hu, Jingsong Huang Huawei Technologies Abstract LS-DYNA is a general-purpose finite element analysis

More information

HOKUSAI System. Figure 0-1 System diagram

HOKUSAI System. Figure 0-1 System diagram HOKUSAI System October 11, 2017 Information Systems Division, RIKEN 1.1 System Overview The HOKUSAI system consists of the following key components: - Massively Parallel Computer(GWMPC,BWMPC) - Application

More information

Getting the Best Performance from an HPC Cluster: BY BARIS GULER; JENWEI HSIEH, PH.D.; RAJIV KAPOOR; LANCE SHULER; AND JOHN BENNINGHOFF

Getting the Best Performance from an HPC Cluster: BY BARIS GULER; JENWEI HSIEH, PH.D.; RAJIV KAPOOR; LANCE SHULER; AND JOHN BENNINGHOFF Getting the Best Performance from an HPC Cluster: A STAR-CD Case Study High-performance computing (HPC) clusters represent a new era in supercomputing. Because HPC clusters usually comprise standards-based,

More information

Parallel Programming with MPI

Parallel Programming with MPI Parallel Programming with MPI Science and Technology Support Ohio Supercomputer Center 1224 Kinnear Road. Columbus, OH 43212 (614) 292-1800 oschelp@osc.edu http://www.osc.edu/supercomputing/ Functions

More information

Distributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5

Distributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5 Distributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5 Paper IEEE Computer (May 2016) What is DAS? Distributed common infrastructure for Dutch Computer Science Distributed: multiple (4-6) clusters

More information

Linux Clusters for High- Performance Computing: An Introduction

Linux Clusters for High- Performance Computing: An Introduction Linux Clusters for High- Performance Computing: An Introduction Jim Phillips, Tim Skirvin Outline Why and why not clusters? Consider your Users Application Budget Environment Hardware System Software HPC

More information

Veritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List

Veritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List Veritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List Created on July 21, 2010 Copyright 2010 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Backup

More information

CP2K Performance Benchmark and Profiling. April 2011

CP2K Performance Benchmark and Profiling. April 2011 CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council HPC works working group activities Participating vendors: HP, Intel, Mellanox

More information

HPC Architectures. Types of resource currently in use

HPC Architectures. Types of resource currently in use HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Efficient Power Management

Efficient Power Management Efficient Power Management on Dell PowerEdge Servers with AMD Opteron Processors Efficient power management enables enterprises to help reduce overall IT costs by avoiding unnecessary energy use. This

More information

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting

More information