NAMD GPU Performance Benchmark. March 2011

Note
- The following research was performed under the HPC Advisory Council activities
- Participating vendors: Dell, Intel, Mellanox
- Compute resource: HPC Advisory Council Cluster Center
- For more information please refer to:
  http://www.dell.com
  http://www.intel.com
  http://www.mellanox.com
  http://www.ks.uiuc.edu/research/namd/

NAMD
- A parallel molecular dynamics code that received the 2002 Gordon Bell Award
- Designed for high-performance simulation of large biomolecular systems
- Scales to hundreds of processors and millions of atoms
- Developed by the joint collaboration of the Theoretical and Computational Biophysics Group (TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at Urbana-Champaign
- NAMD is distributed free of charge with source code

Objectives
- The presented research was done to provide best practices for:
  - NAMD performance benchmarking
  - Interconnect performance comparisons
  - Ways to increase NAMD productivity
  - Power-efficient simulations
- The presented results will demonstrate:
  - The scalability of the compute environment
  - Considerations for performance optimization

Test Cluster Configuration
- Dell PowerEdge C6100 4-node cluster
  - Six-Core Intel Xeon X5670 CPUs @ 2.93 GHz, 24GB memory per node
  - OS: RHEL5 Update 5, OFED 1.5.2 InfiniBand SW stack
- Dell PowerEdge C410x PCIe Expansion Chassis
  - 4 NVIDIA Tesla C2050 "Fermi" GPUs
- Mellanox ConnectX-2 VPI mezzanine cards for InfiniBand and 10Gb Ethernet
- Mellanox MTS3600Q 36-port 40Gb/s QDR InfiniBand switch
- Fulcrum-based 10Gb/s Ethernet switch
- CUDA driver and runtime version 3.2
- Application: NAMD 2.7 (Linux-x86_64-CUDA and ibverbs-CUDA distributions)
- Benchmark workloads:
  - STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
  - F1ATPase benchmark (327,506 atoms, periodic, PME)
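
To make the setup concrete, below is a minimal sketch of how the ibverbs-CUDA build of NAMD 2.7 might be launched across such a cluster with charmrun. The install path, nodelist file, and benchmark input path are assumptions for illustration only, and the exact charmrun/namd2 flags (+p, ++nodelist, +idlepoll, +devices) can differ between Charm++/NAMD builds.

```python
# Hypothetical launch helper for the ibverbs-CUDA NAMD 2.7 build used in this study.
# Paths and the nodelist file are assumptions; verify flag names against your build.
import subprocess

NAMD_DIR = "/opt/namd/NAMD_2.7_Linux-x86_64-ibverbs-CUDA"   # assumed install path
NODELIST = "nodelist"        # Charm++ nodelist file listing the 4 cluster nodes
CONFIG   = "stmv/stmv.namd"  # STMV benchmark input (1,066,628 atoms)

def run_namd(nodes: int, cores_per_node: int = 12, gpus_per_node: int = 1) -> None:
    """Launch NAMD on `nodes` nodes with `cores_per_node` PEs each, using the GPUs."""
    devices = ",".join(str(i) for i in range(gpus_per_node))
    cmd = [
        f"{NAMD_DIR}/charmrun",
        f"+p{nodes * cores_per_node}",   # total worker processes (PEs)
        "++nodelist", NODELIST,
        f"{NAMD_DIR}/namd2",
        "+idlepoll",                     # commonly recommended for CUDA builds of this era
        "+devices", devices,             # GPU IDs to use on each node
        CONFIG,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_namd(nodes=4, gpus_per_node=1)   # e.g. 4 nodes x 12 cores x 1 GPU
```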

NAMD Performance: GPUs
- Dataset: STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
- Up to 138% higher performance with 4 GPUs versus 1 GPU on a single node
- Up to 60% higher performance with 4 nodes each using a single GPU versus 4 GPUs on a single node
[Chart: STMV performance for the GPU configurations over InfiniBand QDR, 12 cores/node; higher is better]
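
The percentage figures on this and the following slides are simple throughput ratios. The snippet below shows the arithmetic only; the ns/day values are placeholders chosen to reproduce the reported 138%, not measured results from this study.

```python
# Illustrative conversion between throughput (e.g. ns/day) and the
# "% higher performance" figures quoted on the slides.
def percent_higher(new: float, baseline: float) -> float:
    """Return how much faster `new` is than `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0

single_gpu_ns_day = 0.100   # assumed throughput with 1 GPU on one node
quad_gpu_ns_day   = 0.238   # assumed throughput with 4 GPUs on one node
print(f"{percent_higher(quad_gpu_ns_day, single_gpu_ns_day):.0f}% higher")   # ~138%
```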

NAMD Performance: Interconnect
- InfiniBand provides the highest performance as the cluster scales
  - Up to 210% higher performance than 1GigE at 4 nodes
  - Up to 20% higher performance than 10GigE at 4 nodes
- 1GigE provides no job gain beyond 2 nodes
  - GPU communication between nodes becomes a significant burden to the network
[Chart: STMV performance by interconnect, 12 cores/node; higher is better]

NAMD Performance: Scalability
- InfiniBand enables nearly 100% scalability
- Only 31% of the system compute capability can be utilized with 1GigE
[Chart: scaling efficiency by interconnect, 12 cores/node; higher is better]
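
The utilization figures above follow from standard parallel-efficiency arithmetic: measured speedup over the single-node result divided by the node count. The sketch below uses placeholder throughput values chosen only to reproduce the reported percentages, not actual benchmark data.

```python
# Parallel efficiency = (throughput_N / throughput_1) / N for N nodes.
def efficiency(perf: dict[int, float]) -> dict[int, float]:
    """Map node count -> parallel efficiency relative to the 1-node result."""
    base = perf[1]
    return {n: (p / base) / n for n, p in perf.items()}

infiniband = {1: 0.060, 2: 0.119, 4: 0.236}   # placeholder ns/day, ~98% efficient at 4 nodes
gige       = {1: 0.060, 2: 0.085, 4: 0.074}   # placeholder ns/day, ~31% efficient at 4 nodes

for name, perf in (("InfiniBand QDR", infiniband), ("1GigE", gige)):
    print(f"{name}: {efficiency(perf)[4]:.0%} of the 4-node compute capability utilized")
```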

NAMD Performance: GPU Utilization
- Dataset: STMV (virus) benchmark (1,066,628 atoms, periodic, PME)
- Using GPUs for computation shows a clear benefit
  - Results show a 92% gain in additional productivity with the use of 1 GPU per node
- ibverbs+cuda: GPUs used for computation, nodes connected via InfiniBand
- ibverbs: CPUs only, no GPUs used for the simulation, nodes connected via InfiniBand
[Chart: STMV performance, ibverbs versus ibverbs+cuda, 12 cores/node; higher is better]

NAMD Performance: GPU Computation Result
- Dataset: F1ATPase benchmark (327,506 atoms, periodic, PME)
- Utilizing GPUs for computation does not demonstrate a benefit
  - Results show that utilizing additional GPUs does not translate into a performance improvement
  - Due to the small size of the dataset
- ibverbs+cuda: GPUs used for computation, nodes connected via InfiniBand
- ibverbs: CPUs only, no GPUs used for the simulation, nodes connected via InfiniBand
[Chart: F1ATPase performance, ibverbs versus ibverbs+cuda, 12 cores/node; higher is better]

Summary
- Interconnect comparison for NAMD
  - InfiniBand enables the highest performance and best GPU cluster scalability
- Spreading GPUs across more nodes improves job productivity
  - Distributing the GPUs across the cluster allows the full GPU potential to be unleashed
- NAMD performance can benefit from a boost with GPUs
  - The STMV dataset shows a big performance advantage with GPUs
  - The F1ATPase dataset shows no performance gain due to its small size

Thank You
HPC Advisory Council

All trademarks are property of their respective owners. All information is provided "as is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein. The HPC Advisory Council and Mellanox undertake no duty and assume no obligation to update or correct any information presented herein.