Gaussian G09 Scaling Benchmarks
|
|
- Lawrence Harper
- 5 years ago
- Views:
Transcription
1 Systems: Gaussian G9 Scaling Benchmarks Jemmy Hu SHARCNET June-July, 29 Name CPUs/node RAM/node OS Interconnect Saw 8 (2 quad-core), 6. GB HP Linux XC 4 InfiniBand 3. GHz Narwhal Silky Hound 4 (2 dual-core), 2.2 GHz SGI, GHz 6 (4 quad) Xeon@2.4GHz, 32 (8 quad) Opteron@2.2 GHz 8. GB HP Linux XC 3. Myrinet 2g (gm) 256GB SUSE Enterprise SMP, NUMA 28 GB Centos 5 InfiniBand, NFS storage file system Molecules and Methods/Models: Molecule\Module B3LYP MP2 CISD CCSD I III IV C4H4Cl2P2Pd (test job 445) CH3OH (test job 58 ) CH3CH2 (test job 684 ) Opt + Freq Opt + Freq Opt + Freq Opt +Freq BS on card BS on card 6-3g(2df,p) 6-3g*, 6-3g(2df,p) Gaussian versions: Gaussian versions G9-A. Binary versions from Gaussian Inc G9-A.2 Compiled from source on Silky Binaries for others G3-E. Binary versions from Gaussian Inc Target goals: [] Scaling results for typical models/methods in Gaussian 9 [2] Scaling on different systems: clusters (saw, narwhal, hound) vs. SMP (silky) [3] G3 vs G9
2 General conclusions:. Gaussian 9 scales quite good for shared memory jobs. Silky (SMP machine): DFT type of methods scale very good to 6 processors (small speedup from 6 to 32 processors) MP2 type of methods scale very good to 8 CPUs (small speedup from 8 to 6 processors) Saw (8-cpu nodes): DFT scale good to 8 processors MP2 scales to 4 processors (small speedup for 8 processors) 2. Gaussian does not scale for CI and CC based methods. 3. G9 is about 2 times faster than G3 for DFT, CI and CC based methods. Maximum processors for G9 jobs (In practice, in order to run more jobs on a system, smaller cpus/size jobs are recommended) [] Silky (SMP machine) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4) CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t)) [2]Saw (2 quad-core nodes) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4)* CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t)) *due to the node per job LSF nature, run 8-way MP2 on saw is fine. If a node can be shared by multiple jobs (torque on hound), 4-way MP2 jobs are recommended. [3] bull, goblin (and other 4-core node XC clusters) Methods/Modules Opt Freq Energy HF DFT (B3LYP, etc) MP(2, 3, 4) CISD (cis, cid, cisd, qcisd) CCSD (ccd, ccsd, ccsd(t))
3 Results on saw, molecule-i 8 6 B3LYP-Optimization on saw 8 6 B3LYP-Optimization on saw G3-E. G9-A G3-E. G9-A MP2-Optimization on saw MP2-Frequency on saw G3-E. G9-A G3-E. G9-A
4 Molecule-III CISD - Opt on saw CISD -Freq on saw G3-E. G9-A.2 G3-E. G9-A No. pf CPUs Molecule-IV CCSD-Opt on saw CCSD-Freq on saw G3-E. G9-A G3-E. G9-A
5 Results on narwhal, molecule-i B3LYP-Opt on narwhal B3LYP-Freq on narwhal G3-C.2 G3-E. G9-A G3-C.2 G3-E. G9-A MP2-Opt on narwhal MP2-Freq on narwhal G3-C.2 G3-E. G9-A G3-C.2 G3-E. G9-A No. f CPUs
6 Results on silky, molecule-i B3LYP-Opt on silky B3LYP-Freq on silky G9-A G9-A No of CPUs No of CPUs MP2-Opt on silky MP2-Freq G9-A G9-A No of CPUs
7 Molecule: WH(CO)(NO)(PMe3)3 4 rmpwpw9-opt on silky 8 rmpwpw9-freq on silky G9-A G9-A No of CPUs
8 Results: saw, Molecule - I B3lyp / opt CPUs G3-E. G9-A., 62 (27ms) 2 7 (7m5s) 4 7 (m4s) (9m57s) (8m3s) (m39s) (7m) (5m37s) 29 (8m49s) 639 (m39s) 423 (7m3s) 326 (5m26s) (26m4s) (2m32s) (8ms) (6m9s) B3LYP / Freq G3-E. 56 (26ms) (4m7s) 4 53 (8m33s) (5m45s) G9-A., (3m23s) (7m) (3m54s) (2m38s) 832 (3m52s) 42 (7m) 233 (3m53s) 57 (2m37s) (2m) (9m26s) (5m) (3m6s) MP2 / opt G3-E (37m26s) (22m57s) (6m37s) 8* 5 (7m3s) G9-A., (32m4s) (9m43s) (4m32s) (4m27s) 234 (39ms) 74 (9m34s) 87 (4m3s) 866 (4m26s) (5m45s) (23m5s) (5m58s) (5m53s) MP2 / Freq G3-E (9h35m7s) (5h36m32s) (3h37m34s) 8* 2849 (3h34m9s) G9-A., (5h5ms) 9836 (5h3m36s) 9355 (2h33m25s) (2h35m52s) 63 (h43m4s) (h45m) 5455 (h37m23s) (h3m55s) (9h6s)? (3h58m3s) (2h2m2s) (h45m39s)
9 Results: CISD, Molecule-III Cluster: saw, G3-E./G9-A., 6-3g(2df,p) G3-E. G9-A. Opt Freq Opt Freq Run time Speedup 626 (m26s) 533 (h28m23s) 332 (5m32s) 273 (34m33s) (8m27s) (h2m2s) (5m37s) (37m24s) (2m42s) (h3m44s) (5m28s) (35m3s) (3m) (h27m) (5m44s) (36m) Results: CCSD, Molecule - IV Cluster: saw, G9-A. Opt Freq Opt Freq 6-3g* 6-3g* 6-3g(2df,p) 6-3g(2df,p) 48 (8m) 2 36 (6m) (7m2s) 8 36 (6m) 2355 (39m5s) (33m23s) (36m25s) (32m) 338 (55m8s).8 25 (4m4s).8 34 (5m54s) (47m4s) 6785 (4h39m45s) (3h32m4s) (4h25m53s) Cluster: saw, G3-E. Opt+Freq Opt Freq 6-3g* 6-3g* 6-3g* Run time Speed up 34m45s 6 (ms) 358 (58m38s) 2 34m6s (m5s) (h8m6s) 4 34ms (m2s) (h2m39s) (m22s) (h4m24s) Gaussian does not scale for CI or CC based methods, but G9-A. is about 2 times faster than G3- E. for the CISD and CCSD jobs (6-3g* results)
10 Cluster: narwhal, Molecule - I MP2 / Opt CPUs G3-C.2 G3-E. Runtimes(s) 7522 (2h5m22s) (45m44s) (38m5s) speedup 3629 (h29s) (35m57s) (33m26s) 466 (h7m46s) (35m4s).8 2 (33m2s) MP2 / Freq G3-C.2 G3-E (9h23m9s) (5h49m22s) (4h5m5s) 5452 (5h8m4s) (8h8m5s) (6h46m42s) (9h6m39s) (4h5m44s) (3hms) B3LYP / OPT G3-C.2 G3-E. 464 (h6m54s) (32m5s) 4 43 (23m23s) 397 (5m37s) (24ms) (8m2) 246 (35m46s) (8ms) (2m24s) B3LYP / Freq G3-C.2 G3-E. 288 (48m) (23m55s) (3m49s) 244 (35m44s) (8m45s) (m23s) 44 (24ms) (m6s) (6m4s) (35m35s) (8m32s) (2m42s) (26m29s) (2m4s) (7m6s)
11 Cluster: silky Molecule I, benchmark- Molecule II, Dmitri s sample (#rmpwpw9/genecp nosymm opt freq) DFT / opt CPUs ia64-b, M2 (rmpwpw9) ia64-b, M (9h28m46s) (35m3s) 2 35 (2m55s) (2h44m3s) (3m5s) (h32m46s) (8m5s) (52m24s) (5m5s) (34m3s) (B3LYP) ia64-s, M 33m8s.62 22m54s 2.7 2m3s 2m52s, 2m4s DFT / Freq ia64-b, M2 (rmpwpw9) ia64-b, M (4h28m5s) (4m7s) 2 48 (8m) (h24m46s) (4m2s) (54m46s) (2m3s) (42m37s) (.36) (m33s) (4m24s) (B3LYP) ia64-s, M 4m4s.76 8ms m9s 4m2s, 4m32s MP2 M Opt 4946 (h22m26s) 4 99 (33ms) 8* 232 (2m32s) 6 5 (7m3s) (3ms) Freq 682 (4h4m2s) (h4m46s) (h2m26s) (h4m2s) (54m45s)
12 Cluster: hound, Molecule I (NFS storage file system, results are meaningless) B3lyp / opt CPUs G9-A., amd4 Runtime(s) Speedu p G9-A., 43m34s 24m5s 25ms 34m2s 4 7m4s 9m4s 9m3s 6m3s 8 2m25s 6m42s 6m4s 9m 6 m29s 6m8s 6m7s 7m36s 32 7m4s B3LYP / Freq G9-A., G9-A., 3m42 7m5s 36m4s 24m3s 4 8m48s 5ms 4m57s 6m49s 8 5m 2m49s 2m49s 3m4s 6 5m42s 2ms (2ms) 2m5s 32 2m44s MP2 / opt G9-A., G9-A., h57m47s h4m22s h9m39s h5m44s 4 42m4s 9m7s 26m53s 32m2s 8* h5m53s 5m 22m22s 4m53s 6 h47m34s 2m9s 2m3s 32 2h4m26s MP2 / Freq G9-A., G9-A., h57m49s h4m34s 7h3m48s h45m8s 4 6h3m4s 6h23m45s 3h4m48s 5h2m 8* 7h3m3s 7h23m6s 2h52m7s 3h47m38s 6 3h5m42s 5h9m28s 5h33m29s 32 4h46m42s
13 Input files %mem = 2GB for B3LYP %mem = 4GB for MP2 computations %mem = 2GB for CISD %mem = 4GB for CCSD computations %nproc varies from, 2, 4, 8, 6 to 32 threads/cpus depending on the node structures Molecule I, (H2PCH2CH2PH2)PdCl2(CH3)2 for B3LYP and MP2 It is from Gaussian test job 445, the geom. and basis sets can be found in test445.com in the directory /opt/sharcnet/gaussian/g9/tests/com or /opt/sharcnet/gaussian/g3/tests/com The following leading lines have been added above the geom. inputs (%nproc varies for scaling tests) %nosave %mem=2gb %chk=benchmark-b3lyp- %nproc= #p b3lyp/gen 6d opt freq (for B3LYP computations) [#p mp2/gen 6d opt freq (for MP2 computations)] Gaussian Test Job 445: (H2PCH2CH2PH2)PdCl2(CH3)2 benchmark optimization Molecule: WH(CO)(NO)(PMe3)3 for rmpwpw9 %chk=test4cpussilky.chk %mem=256mw %nproc=4 #opt rmpwpw9/genecp nosymm WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs W P P P N O C O
14 C H H H C H H H C H H H C H H H C H H H C H H H C H H H C H H H C H H H H H C N O P 6-3g(d,p) **** W sdd **** W sdd --Link-- %chk=test4cpussilky.chk %mem=52mw %nproc=4 #freq geom=check guess=read rmpwpw9/genecp nosymm WH(CO)(NO)(PMe3)3 test calculation using 4 CPUs H C N O P
15 6-3g(d,p) **** W sdd **** W Sdd Molecule III for CISD Opt and Freq %NoSave %chk=ch3oh_cisd-4 %mem=2gb %nproc=4 #p cisd/6-3g(2df,p) opt freq Gaussian Test Job 58: MEOH opt, freq STD MOD cisd C O CO H CH 2 T H CH 2 T 3 T H CH 2 T 3 T - H 2 OH T 3 8. CO.43 CH.9 OH.96 T Molecule IV, for CCSD Opt and Freq %NoSave %chk=ch3ch2_ccsd-8 %mem=4gb %nproc=8 #p ccsd/6-3g* opt freq Gaussian Test Job 684: Ethyl radical CCSD opt+freq 2 C C2 C CC H C CH C2 T H2 C CH C2 T H T H3 C2 CH C T H 8. H4 C2 CH C T H3 2. H5 C2 CH C T H3 24. CC.54 CH.9 T
CURRENT STATUS OF THE PROJECT TO ENABLE GAUSSIAN 09 ON GPGPUS
CURRENT STATUS OF THE PROJECT TO ENABLE GAUSSIAN 09 ON GPGPUS Roberto Gomperts (NVIDIA, Corp.) Michael Frisch (Gaussian, Inc.) Giovanni Scalmani (Gaussian, Inc.) Brent Leback (PGI) TOPICS Gaussian Design
More informationPlatform Choices for LS-DYNA
Platform Choices for LS-DYNA Manfred Willem and Lee Fisher High Performance Computing Division, HP lee.fisher@hp.com October, 2004 Public Benchmarks for LS-DYNA www.topcrunch.org administered by University
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationAPPENDIX 2. Density Functional Theory Calculation of Vibrational Frequencies with Gaussian 98W 1
APPENDIX 2 Density Functional Theory Calculation of Vibrational Frequencies with Gaussian 98W 1 This appendix describes the use of Gaussian 98W software for calculating vibrational frequencies for the
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationComputer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc.
Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters 2006 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Our Business Simulation Driven Product Development Deliver superior
More informationACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009
ACEnet for CS6702 Ross Dickson, Computational Research Consultant 29 Sep 2009 What is ACEnet? Shared resource......for research computing... physics, chemistry, oceanography, biology, math, engineering,
More informationAppendix 1b Gaussian 03W on Dell XPS
WARNING NOTICE: The experiments described in these materials are potentially hazardous and require a high level of safety training, special facilities and equipment, and supervision by appropriate individuals.
More informationMaterials Simulations using Gaussian 09. Eunhwan Jung
Materials Simulations using Gaussian 09 Eunhwan Jung 1 INTRODUCTION 2 Information of Gaussian Gaussian website www.gaussian.com Gaussian Online Manual http://www.gaussian.com/g_tech/g_ur/g09help.htm Gaussian
More informationMINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2
MINIMUM HARDWARE AND OS SPECIFICATIONS File Stream Document Management Software - System Requirements for V4.2 NB: please read this page carefully, as it contains 4 separate specifications for a Workstation
More informationReadme for Platform Open Cluster Stack (OCS)
Readme for Platform Open Cluster Stack (OCS) Version 4.1.1-2.0 October 25 2006 Platform Computing Contents What is Platform OCS? What's New in Platform OCS 4.1.1-2.0? Supported Architecture Distribution
More informationMellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007
Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise
More informationFuture Trends in Hardware and Software for use in Simulation
Future Trends in Hardware and Software for use in Simulation Steve Feldman VP/IT, CD-adapco April, 2009 HighPerformanceComputing Building Blocks CPU I/O Interconnect Software General CPU Maximum clock
More informationAssessment of LS-DYNA Scalability Performance on Cray XD1
5 th European LS-DYNA Users Conference Computing Technology (2) Assessment of LS-DYNA Scalability Performance on Cray Author: Ting-Ting Zhu, Cray Inc. Correspondence: Telephone: 651-65-987 Fax: 651-65-9123
More informationMinimum Hardware and OS Specifications
Hardware and OS Specifications File Stream Document Management Software System Requirements for v4.5 NB: please read through carefully, as it contains 4 separate specifications for a Workstation PC, a
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationSPEC MPI2007 Benchmarks for HPC Systems
SPEC MPI2007 Benchmarks for HPC Systems Ron Lieberman Chair, SPEC HPG HP-MPI Performance Hewlett-Packard Company Dr. Tom Elken Manager, Performance Engineering QLogic Corporation Dr William Brantley Manager
More informationInstallation and Test of Molecular Dynamics Simulation Packages on SGI Altix and Hydra-Cluster at JKU Linz
Installation and Test of Molecular Dynamics Simulation Packages on SGI Altix and Hydra-Cluster at JKU Linz Rene Kobler May 25, 25 Contents 1 Abstract 2 2 Status 2 3 Changelog 2 4 Installation Notes 3 4.1
More informationIT Business Management System Requirements Guide
IT Business Management System Requirements Guide IT Business Management Advanced or Enterprise Edition 8.1 This document supports the version of each product listed and supports all subsequent versions
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationManufacturing Bringing New Levels of Performance to CAE Applications
Solution Brief: Manufacturing Bringing New Levels of Performance to CAE Applications Abstract Computer Aided Engineering (CAE) is used to help manufacturers bring products to market faster while maintaining
More informationMaximizing Memory Performance for ANSYS Simulations
Maximizing Memory Performance for ANSYS Simulations By Alex Pickard, 2018-11-19 Memory or RAM is an important aspect of configuring computers for high performance computing (HPC) simulation work. The performance
More informationExperiences with HP SFS / Lustre in HPC Production
Experiences with HP SFS / Lustre in HPC Production Computing Centre (SSCK) University of Karlsruhe Laifer@rz.uni-karlsruhe.de page 1 Outline» What is HP StorageWorks Scalable File Share (HP SFS)? A Lustre
More informationFull Vehicle Dynamic Analysis using Automated Component Modal Synthesis. Peter Schartz, Parallel Project Manager ClusterWorld Conference June 2003
Full Vehicle Dynamic Analysis using Automated Component Modal Synthesis Peter Schartz, Parallel Project Manager Conference Outline Introduction Background Theory Case Studies Full Vehicle Dynamic Analysis
More informationThe Optimal CPU and Interconnect for an HPC Cluster
5. LS-DYNA Anwenderforum, Ulm 2006 Cluster / High Performance Computing I The Optimal CPU and Interconnect for an HPC Cluster Andreas Koch Transtec AG, Tübingen, Deutschland F - I - 15 Cluster / High Performance
More informationHPC In The Cloud? Michael Kleber. July 2, Department of Computer Sciences University of Salzburg, Austria
HPC In The Cloud? Michael Kleber Department of Computer Sciences University of Salzburg, Austria July 2, 2012 Content 1 2 3 MUSCLE NASA 4 5 Motivation wide spread availability of cloud services easy access
More informationWRF performance on Intel Processors
WRF performance on Intel Processors R. Dubtsov, A. Semenov, D. Shkurko Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {roman.s.dubtsov, alexander.l.semenov,dmitry.v.shkurko,}@intel.com
More informationGaussian03 and Gaussview Presentation Anita Orendt Center for High Performance Computing University of Utah anita.orendt@utah.edu April 24, 2008 http://www.chpc.utah.edu 4/29/08 http://www.chpc.utah.edu
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationTechnical guide. Windows HPC server 2016 for LS-DYNA How to setup. Reference system setup - v1.0
Technical guide Windows HPC server 2016 for LS-DYNA How to setup Reference system setup - v1.0 2018-02-17 2018 DYNAmore Nordic AB LS-DYNA / LS-PrePost 1 Introduction - Running LS-DYNA on Windows HPC cluster
More informationan exceptional performance/price ratio are fully configured for parallel computation
WHO ARE WE? (PQS) manufactures parallel computers with integrated software for high-performance computational chemistry. Our offerings have an exceptional performance/price ratio are fully configured for
More informationvrealize Business System Requirements Guide
vrealize Business System Requirements Guide vrealize Business Advanced and Enterprise 8.2.1 This document supports the version of each product listed and supports all subsequent versions until the document
More informationBasis Sets, Electronic Properties, and Visualization
Basis Sets, Electronic Properties, and Visualization Goals of the Exercise - Perform common quantum chemical calculations using the Gaussian package. - Practice preparation, monitoring, and analysis of
More informationNUMA replicated pagecache for Linux
NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations
More informationIntroduction to the SHARCNET Environment May-25 Pre-(summer)school webinar Speaker: Alex Razoumov University of Ontario Institute of Technology
Introduction to the SHARCNET Environment 2010-May-25 Pre-(summer)school webinar Speaker: Alex Razoumov University of Ontario Institute of Technology available hardware and software resources our web portal
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:
More informationInfoBrief. Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux
InfoBrief Dell 2-Node Cluster Achieves Unprecedented Result with Three-tier SAP SD Parallel Standard Application Benchmark on Linux Leveraging Oracle 9i Real Application Clusters (RAC) Technology and Red
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationIntroduction to High Performance Computing at ZIH
Center for Information Services and High Performance Computing (ZIH) Introduction to High Performance Computing at ZIH Architecture of the PC Farm (Deimos) Zellescher Weg 12 Trefftz-Bau/HRSK 151 Phone
More informationHPC Solution. Technology for a New Era in Computing
HPC Solution Technology for a New Era in Computing TEL IN HPC & Storage.. 20 years of changing with Technology Complete Solution Integrators for Select Verticals Mechanical Design & Engineering High Performance
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy
More informationOP2 FOR MANY-CORE ARCHITECTURES
OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC
More informationItanium 2. Itanium.
Itanium 2 Itanium 2 Itanium www.intel.com/itanium2 ... 2... 2... 4... 4... 4... 4... 5... 5... 5... 6 Itanium 9MB L3 Itanium 2 1.60GHz Itanium Itanium 2 Itanium 2 Itanium 2 25% 1 5 15% IA-32 Itanium 2
More informationSession 40 Evaluating Server Options
Quintessential School Systems Session 40 Evaluating Server Options Presented by Duane Percox Bill Genske Copyright Quintessential School Systems, 2009 All Rights Reserved 867 American Street --- Second
More informationOur new HPC-Cluster An overview
Our new HPC-Cluster An overview Christian Hagen Universität Regensburg Regensburg, 15.05.2009 Outline 1 Layout 2 Hardware 3 Software 4 Getting an account 5 Compiling 6 Queueing system 7 Parallelization
More informationMiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces
MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces Hye-Churn Jang Hyun-Wook (Jin) Jin Department of Computer Science and Engineering Konkuk University Seoul, Korea {comfact,
More informationDefense Technical Information Center Compilation Part Notice
UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP023800 TITLE: A Comparative Study of ARL Linux Cluster Performance DISTRIBUTION: Approved for public release, distribution unlimited
More informationCP2K Performance Benchmark and Profiling. April 2011
CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox Compute resource - HPC
More informationCray events. ! Cray User Group (CUG): ! Cray Technical Workshop Europe:
Cray events! Cray User Group (CUG):! When: May 16-19, 2005! Where: Albuquerque, New Mexico - USA! Registration: reserved to CUG members! Web site: http://www.cug.org! Cray Technical Workshop Europe:! When:
More informationHigh Volume Transaction Processing in Enterprise Applications
High Volume Transaction Processing in Enterprise Applications By Thomas Wheeler Recursion Software, Inc. February 16, 2005 TABLE OF CONTENTS Overview... 1 Products, Tools, and Environment... 1 OS and hardware
More informationMaximizing Six-Core AMD Opteron Processor Performance with RHEL
Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor
More informationItanium 2 Impact Software / Systems MSC.Software. Jay Clark Director, Business Development High Performance Computing
Itanium 2 Impact Software / Systems MSC.Software Jay Clark Director, Business Development High Performance Computing jay.clark@mscsoftware.com Agenda What MSC.Software does Software vendor point of view
More informationCoherent HyperTransport Enables The Return of the SMP
Coherent HyperTransport Enables The Return of the SMP Einar Rustad Copyright 2010 - All rights reserved. 1 Top500 History The expensive SMPs used to rule: Cray XMP, Convex Exemplar, Sun ES NOW, the Clusters
More informationDetermining the MPP LS-DYNA Communication and Computation Costs with the 3-Vehicle Collision Model and the Infiniband Interconnect
8 th International LS-DYNA Users Conference Computing / Code Tech (1) Determining the MPP LS-DYNA Communication and Computation Costs with the 3-Vehicle Collision Model and the Infiniband Interconnect
More informationComp. Org II, Spring
Lecture 11 Parallel Processor Architectures Flynn s taxonomy from 1972 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing (Sta09 Fig 17.1) 2 Parallel
More informationSymantec NetBackup PureDisk Compatibility Matrix Created August 26, 2010
Symantec NetBackup PureDisk 6.6.1 Compatibility Matrix Created August 26, 2010 Copyright 2010 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Backup Exec are trademarks or registered
More informationAb Initio modelling of surfaces and surfactants
Ab Initio modelling of surfaces and surfactants Outline Background System studied Hardware used MPI programming Background Flotation of minerals is a multi billion dollar industry Much is allready known
More informationBenchmarking AMD64 and EMT64
Benchmarking AMD64 and EMT64 Hans Wenzel, Oliver Gutsche, FNAL, Batavia, IL 60510, USA Mako Furukawa, University of Nebraska, Lincoln, USA Abstract We have benchmarked various single and dual core 64 Bit
More informationParallel Processing & Multicore computers
Lecture 11 Parallel Processing & Multicore computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1)
More informationReducing Network Contention with Mixed Workloads on Modern Multicore Clusters
Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Matthew Koop 1 Miao Luo D. K. Panda matthew.koop@nasa.gov {luom, panda}@cse.ohio-state.edu 1 NASA Center for Computational
More informationvcdm System Requirements Version 6.6 English
vcdm System Requirements Version 6.6 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user
More informationOCTOPUS Performance Benchmark and Profiling. June 2015
OCTOPUS Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the
More information2008 International ANSYS Conference
28 International ANSYS Conference Maximizing Performance for Large Scale Analysis on Multi-core Processor Systems Don Mize Technical Consultant Hewlett Packard 28 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationvcdm System Requirements Version 6.4 English
vcdm System Requirements Version 6.4 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user
More informationComp. Org II, Spring
Lecture 11 Parallel Processing & computers 8th edition: Ch 17 & 18 Earlier editions contain only Parallel Processing Parallel Processor Architectures Flynn s taxonomy from 1972 (Sta09 Fig 17.1) Computer
More informationGraham vs legacy systems
New User Seminar Graham vs legacy systems This webinar only covers topics pertaining to graham. For the introduction to our legacy systems (Orca etc.), please check the following recorded webinar: SHARCNet
More informationPerformance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand. Abstract
Performance of Mellanox ConnectX Adapter on Multi-core Architectures Using InfiniBand Abstract...1 Introduction...2 Overview of ConnectX Architecture...2 Performance Results...3 Acknowledgments...7 For
More informationComputer Systems Architecture
Computer Systems Architecture Lecture 24 Mahadevan Gomathisankaran April 29, 2010 04/29/2010 Lecture 24 CSCE 4610/5610 1 Reminder ABET Feedback: http://www.cse.unt.edu/exitsurvey.cgi?csce+4610+001 Student
More informationDesigning High Performance Communication Middleware with Emerging Multi-core Architectures
Designing High Performance Communication Middleware with Emerging Multi-core Architectures Dhabaleswar K. (DK) Panda Department of Computer Science and Engg. The Ohio State University E-mail: panda@cse.ohio-state.edu
More informationORACLE Linux / TSC.
ORACLE Linux / TSC Sekook.jang@oracle.com Unbreakable Linux Unbreakable Support Unbreakable Products Unbreakable Performance Asianux Then. Next? Microsoft Scalability 20 User Workgroup Computing Microsoft
More informationGEN_OMEGA2: The HPSUMMARY Procedure: A SAS Macro for Computing the Generalized Omega-Squared Effect Size Associated with
GEN_OMEGA2: A SAS Macro for Computing the Generalized Omega-Squared Effect Size Associated with The HPSUMMARY Procedure: Analysis of Variance Models An Old Friend s Younger (and Brawnier) Cousin The HPSUMMARY
More informationParallel Architecture. Hwansoo Han
Parallel Architecture Hwansoo Han Performance Curve 2 Unicore Limitations Performance scaling stopped due to: Power Wire delay DRAM latency Limitation in ILP 3 Power Consumption (watts) 4 Wire Delay Range
More informationParallel File Systems Compared
Parallel File Systems Compared Computing Centre (SSCK) University of Karlsruhe, Germany Laifer@rz.uni-karlsruhe.de page 1 Outline» Parallel file systems (PFS) Design and typical usage Important features
More informationAmazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud
Amazon Web Services: Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud Summarized by: Michael Riera 9/17/2011 University of Central Florida CDA5532 Agenda
More informationCAS 2K13 Sept Jean-Pierre Panziera Chief Technology Director
CAS 2K13 Sept. 2013 Jean-Pierre Panziera Chief Technology Director 1 personal note 2 Complete solutions for Extreme Computing b ubullx ssupercomputer u p e r c o p u t e r suite s u e Production ready
More informationIntroduction to HPC2N
Introduction to HPC2N Birgitte Brydsø HPC2N, Umeå University 4 May 2017 1 / 24 Overview Kebnekaise and Abisko Using our systems The File System The Module System Overview Compiler Tool Chains Examples
More informationIFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor
IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor D.Sc. Mikko Byckling 17th Workshop on High Performance Computing in Meteorology October 24 th 2016, Reading, UK Legal Disclaimer & Optimization
More informationFUSION1200 Scalable x86 SMP System
FUSION1200 Scalable x86 SMP System Introduction Life Sciences Departmental System Manufacturing (CAE) Departmental System Competitive Analysis: IBM x3950 Competitive Analysis: SUN x4600 / SUN x4600 M2
More informationHPC Middle East. KFUPM HPC Workshop April Mohamed Mekias HPC Solutions Consultant. Agenda
KFUPM HPC Workshop April 29-30 2015 Mohamed Mekias HPC Solutions Consultant Agenda 1 Agenda-Day 1 HPC Overview What is a cluster? Shared v.s. Distributed Parallel v.s. Massively Parallel Interconnects
More informationCluster Network Products
Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster
More informationvcdm System Requirements Version 6.2 English
vcdm System Requirements Version 6.2 English Imprint Vector Informatik GmbH Ingersheimer Straße 24 70499 Stuttgart, Germany Vector reserves the right to modify any information and/or data in this user
More informationMD NASTRAN on Advanced SGI Architectures *
W h i t e P a p e r MD NASTRAN on Advanced SGI Architectures * Olivier Schreiber, Scott Shaw, Joe Griffin** Abstract MD Nastran tackles all important Normal Mode Analyses utilizing both Shared Memory Parallelism
More information4. LS-DYNA Anwenderforum, Bamberg 2005 IT I. September 28, 2005 Computation Products Group 1. September 28, 2005 Computation Products Group 2
4. LS-DYNA Anwenderforum, Bamberg 2005 IT I High Performance Enterprise Computing Hardware Design & Performance Application Optimization Guide Performance Evaluation Lynn Lewis Director WW FAE MSS lynn.lewis@amd.com
More informationHP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON
HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON Li Tan 1, Longxiang Chen 1, Zizhong Chen 1, Ziliang Zong 2, Rong Ge 3, and Dong Li 4 1 University of California,
More informationGROMACS Performance Benchmark and Profiling. August 2011
GROMACS Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource
More informationBuilding 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004
Building 96-processor Opteron Cluster at Florida International University (FIU) January 5-10, 2004 Brian Dennis, Ph.D. Visiting Associate Professor University of Tokyo Designing the Cluster Goal: provide
More informationBefore We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop
Before We Start Sign in hpcxx account slips Windows Users: Download PuTTY Google PuTTY First result Save putty.exe to Desktop Research Computing at Virginia Tech Advanced Research Computing Compute Resources
More informationParallel Computing: From Inexpensive Servers to Supercomputers
Parallel Computing: From Inexpensive Servers to Supercomputers Lyle N. Long The Pennsylvania State University & The California Institute of Technology Seminar to the Koch Lab http://www.personal.psu.edu/lnl
More information2008 International ANSYS Conference
2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationPerformance Analysis of LS-DYNA in Huawei HPC Environment
Performance Analysis of LS-DYNA in Huawei HPC Environment Pak Lui, Zhanxian Chen, Xiangxu Fu, Yaoguo Hu, Jingsong Huang Huawei Technologies Abstract LS-DYNA is a general-purpose finite element analysis
More informationHOKUSAI System. Figure 0-1 System diagram
HOKUSAI System October 11, 2017 Information Systems Division, RIKEN 1.1 System Overview The HOKUSAI system consists of the following key components: - Massively Parallel Computer(GWMPC,BWMPC) - Application
More informationGetting the Best Performance from an HPC Cluster: BY BARIS GULER; JENWEI HSIEH, PH.D.; RAJIV KAPOOR; LANCE SHULER; AND JOHN BENNINGHOFF
Getting the Best Performance from an HPC Cluster: A STAR-CD Case Study High-performance computing (HPC) clusters represent a new era in supercomputing. Because HPC clusters usually comprise standards-based,
More informationParallel Programming with MPI
Parallel Programming with MPI Science and Technology Support Ohio Supercomputer Center 1224 Kinnear Road. Columbus, OH 43212 (614) 292-1800 oschelp@osc.edu http://www.osc.edu/supercomputing/ Functions
More informationDistributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5
Distributed ASCI Supercomputer DAS-1 DAS-2 DAS-3 DAS-4 DAS-5 Paper IEEE Computer (May 2016) What is DAS? Distributed common infrastructure for Dutch Computer Science Distributed: multiple (4-6) clusters
More informationLinux Clusters for High- Performance Computing: An Introduction
Linux Clusters for High- Performance Computing: An Introduction Jim Phillips, Tim Skirvin Outline Why and why not clusters? Consider your Users Application Budget Environment Hardware System Software HPC
More informationVeritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List
Veritas NetBackup Enterprise Server and Server 6.x OS Software Compatibility List Created on July 21, 2010 Copyright 2010 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Backup
More informationCP2K Performance Benchmark and Profiling. April 2011
CP2K Performance Benchmark and Profiling April 2011 Note The following research was performed under the HPC Advisory Council HPC works working group activities Participating vendors: HP, Intel, Mellanox
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationEfficient Power Management
Efficient Power Management on Dell PowerEdge Servers with AMD Opteron Processors Efficient power management enables enterprises to help reduce overall IT costs by avoiding unnecessary energy use. This
More informationIntel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2
Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2 This release of the Intel C++ Compiler 16.0 product is a Pre-Release, and as such is 64 architecture processor supporting
More information