ANSYS Fluent 14 Performance Benchmark and Profiling. October 2012


Note: The following research was performed under the HPC Advisory Council activities. Special thanks to HP and Mellanox. For more information on the supporting vendors' solutions, please refer to www.mellanox.com and http://www.hp.com/go/hpc. For more information on the application, see http://www.ansys.com.

FLUENT
Computational Fluid Dynamics (CFD) is a computational technology that enables the study of the dynamics of things that flow by generating numerical solutions to a system of partial differential equations that describe fluid flow.
CFD enables a better understanding of the qualitative and quantitative physical phenomena in the flow, which is used to improve engineering design.
CFD brings together a number of different disciplines: fluid dynamics, the mathematical theory of partial differential systems, computational geometry, numerical analysis, and computer science.
ANSYS FLUENT is a leading CFD application from ANSYS, widely used in almost every industry sector and manufactured product.
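As an illustration of the kind of partial differential equations a CFD solver discretizes, a common starting point is the incompressible Navier-Stokes system; this is background material, not taken from the report itself:

```latex
% Incompressible Navier-Stokes equations (illustrative background, not from the report):
% conservation of mass and momentum for velocity u, pressure p,
% density rho, kinematic viscosity nu, and body force f.
\begin{aligned}
  \nabla \cdot \mathbf{u} &= 0, \\
  \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
    &= -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{f}.
\end{aligned}
```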

Objectives
The presented research was done to provide best practices:
- Fluent performance benchmarking
- Interconnect performance comparisons
- MPI performance comparison
- Understanding Fluent communication patterns
The presented results will demonstrate the scalability of the compute environment to provide nearly linear application scalability.

Test Cluster Configuration
- HP ProLiant SL230s Gen8 4-node "Athena" cluster
- Processors: dual eight-core Intel Xeon E5-2680 @ 2.7 GHz
- Memory: 32 GB per node, 1600 MHz DDR3 DIMMs
- OS: RHEL 6 Update 2, OFED 1.5.3-3.10 InfiniBand SW stack
- Mellanox ConnectX-3 VPI InfiniBand adapters
- Mellanox SwitchX SX6036 56 Gb/s InfiniBand and 40 Gb/s Ethernet switch
- MPI (vendor-provided): Intel MPI 4 Update 2, Open MPI 1.3.3, Platform MPI 8.1.2
- Application: ANSYS Fluent 14.0.0
- Benchmark: Eddy_417k, reacting flow with the eddy dissipation model (417K elements)

About HP ProLiant SL230s Gen8
- Processor: Two Intel Xeon E5-2600 Series, 4/6/8 cores
- Chipset: Intel Sandy Bridge EP Socket-R
- Memory: 16 sockets, DDR3 up to 1600 MHz, ECC (512 GB)
- Max Memory: 512 GB
- Internal Storage: Two LFF non-hot-plug SAS/SATA bays, or four SFF non-hot-plug SAS/SATA/SSD bays; two hot-plug SFF drives (option)
- Max Internal Storage: 8 TB
- Networking: Dual-port 1GbE NIC / single 10G NIC
- I/O Slots: One PCIe Gen3 x16 LP slot; 1Gb and 10Gb Ethernet, IB, and FlexFabric options
- Ports: Front: (1) management, (2) 1GbE, (1) serial, (1) S.U.V port, (2) PCIe; internal Micro SD card and Active Health
- Power Supplies: 750 W, 1200 W (92% or 94%), high-power chassis
- Integrated Management: iLO4; hardware-based power capping via SL Advanced Power Manager
- Additional Features: Shared power and cooling with up to 8 nodes per 4U chassis; single GPU support; Fusion I/O support
- Form Factor: 16P / 8 GPUs / 4U chassis

Fluent Performance: CPU Generations
The Intel E5-2680 (Sandy Bridge) cluster outperforms the prior generation, performing 46% better than the Intel X5670-based Plutus cluster.
System components used:
- Athena: 2-socket Intel E5-2680 @ 2.7 GHz, 1600 MHz DIMMs, FDR InfiniBand
- Plutus: 2-socket Intel X5670 @ 2.93 GHz, 1333 MHz DIMMs, QDR InfiniBand
(Chart: Platform MPI over InfiniBand FDR; higher is better)
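For reference, a sketch of how such percentage gains are typically computed. The rating definition below is the usual ANSYS Fluent benchmark metric (jobs per day) and is an assumption here, since the slides do not define the chart's units:

```latex
% Assumed Fluent benchmark "solver rating": number of benchmark jobs per day.
\text{rating} = \frac{86400\ \text{s/day}}{T_{\text{elapsed}}\ \text{(s per job)}},
\qquad
\text{gain} = \left(\frac{\text{rating}_{\text{Athena}}}{\text{rating}_{\text{Plutus}}} - 1\right) \times 100\%.
```

Under this reading, a 46% gain means the Athena rating is roughly 1.46 times the Plutus rating at the same node count.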

Fluent Performance: Network
InfiniBand FDR is the most efficient inter-node communication for Fluent:
- Outperforms 1GbE by 230% at 4 nodes
- Outperforms 40GbE by 80% at 4 nodes
(Chart: 16 processes per node; higher is better)

Fluent Performance: MPI
Platform MPI provides better scalability than Open MPI and Intel MPI:
- Fluent 14 provides Platform MPI as the default MPI option
- Up to 24% better performance than Open MPI and 16% better than Intel MPI at 4 nodes
The default Fluent run script was used for all cases shown; no other optimization flags were added.
(Chart: InfiniBand FDR; higher is better)

Fluent Profiling: Time Spent in MPI
InfiniBand FDR reduces the time needed for communication, freeing up more time for computation.
The Ethernet solutions consume from 74% to 86% of the total time in MPI communication.
(Chart: 16 processes per node; higher is better)

Fluent Profiling: Number of MPI Calls
The most used MPI call is MPI_Iprobe. Aside from MPI_Iprobe, MPI_Isend and MPI_Irecv are the next most used calls.
(Charts: all MPI calls, and all MPI calls excluding MPI_Iprobe; 16 processes per node)
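To make the call mix concrete, here is a minimal C sketch of the pattern these counts suggest: posting non-blocking sends, then polling with MPI_Iprobe and receiving whatever has arrived. This is a hand-written illustration of the MPI calls named above, not Fluent's (closed-source) implementation:

```c
/* Illustrative MPI_Iprobe / MPI_Isend / MPI_Irecv polling pattern,
 * similar in spirit to the neighbor exchanges the profile suggests.
 * Sketch only; not Fluent code. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sends a small message (the profile shows most Fluent
     * messages are below 4 KB) to its right neighbor without blocking. */
    int right = (rank + 1) % size;
    double sendbuf[64];
    for (int i = 0; i < 64; i++) sendbuf[i] = rank + 0.01 * i;

    MPI_Request sreq;
    MPI_Isend(sendbuf, 64, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);

    /* Poll with MPI_Iprobe until a message is available; a real solver
     * would overlap this polling with computation. */
    int flag = 0;
    MPI_Status status;
    while (!flag)
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);

    /* Size the receive buffer from the probed message, then receive it. */
    int count;
    MPI_Get_count(&status, MPI_DOUBLE, &count);
    double *recvbuf = malloc(count * sizeof(double));

    MPI_Request rreq;
    MPI_Irecv(recvbuf, count, MPI_DOUBLE, status.MPI_SOURCE, status.MPI_TAG,
              MPI_COMM_WORLD, &rreq);
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);

    printf("rank %d received %d doubles from rank %d\n",
           rank, count, status.MPI_SOURCE);

    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```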

Fluent Profiling: MPI Communication Time
The majority of MPI communication time is spent in MPI_Recv (28%) and MPI_Allreduce (27%).
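MPI_Allreduce is the collective CFD solvers commonly use to form global quantities such as residual norms, which is a plausible reason for its share of the communication time. Below is a minimal, illustrative sketch of that usage; it is not Fluent's actual code:

```c
/* Illustrative use of MPI_Allreduce to combine locally computed residuals
 * into a global norm available on every rank. Sketch only; not Fluent code. */
#include <math.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Pretend each rank computed the squared residual of its mesh partition. */
    double local_sq_residual = (rank + 1) * 0.5;

    /* Sum the partial results across all ranks; every rank gets the total. */
    double global_sq_residual = 0.0;
    MPI_Allreduce(&local_sq_residual, &global_sq_residual, 1, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global residual norm = %f\n", sqrt(global_sq_residual));

    MPI_Finalize();
    return 0;
}
```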

Fluent Profiling: MPI Message Sizes
The majority of messages are small; messages below 4 KB are the most used.

Fluent Summary
HP ProLiant Gen8 servers deliver better performance than their predecessors:
- ProLiant Gen8 equipped with Intel E5-series processors and InfiniBand FDR achieves up to 46% higher performance than ProLiant G7 (running Intel Xeon X5670) when compared at 4 nodes
InfiniBand FDR is the most efficient inter-node communication for Fluent:
- Outperforms 1GbE by 230% at 4 nodes
- Outperforms 10GbE by 80% at 4 nodes
Fluent profiling:
- Platform MPI performs 24% better than Open MPI and 16% better than Intel MPI
- InfiniBand FDR reduces communication time and provides more time for computation: it consumes 53% of total time, versus 74-86% for the Ethernet solutions
- Large MPI call volumes go to testing and posting non-blocking data transfers (MPI_Iprobe, MPI_Isend, MPI_Irecv)
- MPI time is spent mostly in MPI_Recv and MPI_Allreduce
- Messages are concentrated at small sizes, from 0 B to 4 KB

Thank You
HPC Advisory Council
All trademarks are property of their respective owners. All information is provided "as is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. The HPC Advisory Council and Mellanox undertake no duty and assume no obligation to update or correct any information presented herein.