OpenFOAM Performance Testing and Profiling. October 2017

Note

The following research was performed under the HPC Advisory Council activities:
- Participating vendors: Huawei, Mellanox
- Compute resource: HPC Advisory Council Cluster Center

The following was done to provide best practices:
- OpenFOAM performance overview
- Understanding OpenFOAM communication patterns
- Ways to increase OpenFOAM productivity
- MPI library comparisons

For more information, please refer to:
- http://www.huawei.com
- http://www.mellanox.com/hpc/
- https://www.openfoam.com/

OpenFOAM

OpenFOAM (Open Field Operation and Manipulation) is an open-source CFD toolbox that can simulate complex fluid flows involving:
- Chemical reactions
- Turbulence
- Heat transfer
- Solid dynamics
- Electromagnetics
- The pricing of financial options

OpenFOAM support can be obtained from OpenCFD Ltd.

Objectives

The presented research was done to provide best practices for:
- OpenFOAM performance benchmarking
- MPI library performance comparison
- Interconnect performance comparison
- Compiler comparison
- Optimization tuning

The presented results will demonstrate:
- The scalability of the compute environment/application
- Considerations for higher productivity and efficiency

Test Cluster Configuration

- Huawei FusionServer X6000 Broadwell cluster with Huawei FusionServer XH321 V3, 32 server nodes
- Dual-socket 14-core Intel Xeon E5-2690 v4 @ 2.60 GHz CPUs (35 MB cache, Turbo @ 3.50 GHz)
- Dual-socket 16-core Intel Xeon E5-2697A v4 @ 2.60 GHz CPUs (40 MB cache, Turbo @ 3.60 GHz)
- Memory: 256 GB DDR4 2400 MHz; Memory Snoop Mode in BIOS set to Home Snoop
- OS: RHEL 7.2, MLNX_OFED_LINUX-4.1-1.0.2.0 InfiniBand SW stack
- Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
- Mellanox Switch-IB SB7800 36-port EDR 100Gb/s InfiniBand switch
- Huawei OceanStor 9000 scale-out NAS storage system
- Compilers: Intel Parallel Studio XE 2018
- MPI: Intel MPI 2018, Mellanox HPC-X MPI Toolkit v1.9.7
- Application: OpenFOAM v1612+, single precision
- MPI Profiler: IPM (from Mellanox HPC-X)
- Benchmark: MotorBike, 160K elements, 100 steps
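
The MPI-level results shown later were collected with the IPM profiler listed above. Profilers of this kind typically interpose on MPI calls through the standard PMPI profiling interface; the following minimal C sketch (illustrative only, not IPM's actual implementation) shows how time spent in MPI_Waitall can be measured without changing the application:

    /* Illustrative PMPI wrapper: time every MPI_Waitall call and report the
     * accumulated total at MPI_Finalize. Link this object ahead of the MPI
     * library; the application itself is left untouched. */
    #include <mpi.h>
    #include <stdio.h>

    static double waitall_seconds = 0.0;

    int MPI_Waitall(int count, MPI_Request reqs[], MPI_Status stats[])
    {
        double t0 = MPI_Wtime();
        int err = PMPI_Waitall(count, reqs, stats);  /* call the real routine */
        waitall_seconds += MPI_Wtime() - t0;
        return err;
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d: time in MPI_Waitall = %.3f s\n", rank, waitall_seconds);
        return PMPI_Finalize();
    }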

About Huawei FusionServer X6000 XH321 V3

- Form factor: Half-width server node
- Processors: 1 or 2 Intel Xeon E5-2600 v3/v4
- Memory: 16 DDR4 DIMMs, providing up to 1 TB of memory when configured with LRDIMMs
- Internal storage: 6 SAS/SATA HDDs or NVMe SSDs
- Node RAID: RAID 0, 1, 10, 5, 50, 6, or 60; supercapacitor
- LOM network ports: 2 GE, or 2 GE + 2 x 10GE
- PCIe expansion: Up to 2 PCIe x16 slots (Note: the mainboard has two PCIe slots: PCIe slot 1 (left) shared by a RAID controller card and PCIe slot 2 (right) shared by an IB card)
- LOM storage: 1 SATADOM and 1 M.2 NVMe SSD
- Operating temperature: 5°C to 35°C with 120 W to 145 W processors; 5°C to 40°C with processors below 120 W
- Dimensions: 40.5 mm x 177.9 mm x 545.5 mm (1.59 in. x 7.00 in. x 21.48 in.)

XH321 V3 Highlights

- High performance: supports 1 or 2 Intel Xeon E5-2600 v3/v4 series processors with up to 44 cores and 55 MB of L3 cache
- High reliability: supports multiple RAID levels, a supercapacitor for power failure protection, and TPM disk encryption
- Diverse LOM I/O ports: mainboards support up to two 1GbE and two 10GbE LOM ports
- Features: hot-swappable fans, 2.5-inch NVMe SSDs, mixed NVMe SSD and SAS/SATA SSD configurations, 1 m deep cabinets, 40°C operating temperature, liquid cooling

OpenFOAM Performance - Processor Speed

OpenFOAM can benefit from higher-frequency CPUs:
- The difference can be ~11% when running at 2.6 GHz compared with 2.0 GHz at base clock (at 32 nodes)
- The difference can be ~10% when comparing 2.6 GHz CPUs with and without turbo speed (at 32 nodes)
- turbostat shows the cores run at turbo speed (3087 MHz) when turbo is enabled

(Chart: higher is better; 28 MPI processes per node)

OpenFOAM Performance - MPI Libraries

Comparing the two MPI libraries:
- HPC-X provides 19% higher performance at 32 nodes

(Chart: higher is better; 28 MPI processes per node)

OpenFOAM Profiling - MPI Time

The MPI profiler shows the types of underlying MPI network communications:
- The majority of the communications that occur are non-blocking
- The majority of the MPI time is spent on non-blocking communications at 32 nodes: MPI_Waitall (11% of wall time), 8-byte MPI_Recv (1.4% of wall time), 1-byte MPI_Recv (0.7% of wall time)
- Only 14% of the overall runtime is spent on MPI communications at 32 nodes (when EDR is used)
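
MPI_Waitall dominating the MPI time is characteristic of a non-blocking halo exchange: each rank posts receives and sends for its boundary data, then waits for all of them to complete before continuing the solve, so the wait call absorbs the communication time. The sketch below is a generic MPI illustration of that pattern (the ring neighbors and buffer sizes are assumptions; it is not OpenFOAM source code):

    /* Generic non-blocking halo exchange: post all receives and sends,
     * then block in MPI_Waitall, where a profiler attributes the MPI time. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Illustrative neighbor list: left and right rank in a 1-D ring.
         * A real CFD decomposition gives each rank an irregular neighbor set. */
        int neighbors[2] = { (rank - 1 + size) % size, (rank + 1) % size };
        const int nneigh = 2, count = 1000;

        double *sendbuf = malloc(nneigh * count * sizeof(double));
        double *recvbuf = malloc(nneigh * count * sizeof(double));
        for (int i = 0; i < nneigh * count; i++) sendbuf[i] = (double)rank;

        MPI_Request reqs[4];

        /* Both calls return immediately; transfers proceed in the background. */
        for (int i = 0; i < nneigh; i++)
            MPI_Irecv(recvbuf + i * count, count, MPI_DOUBLE, neighbors[i],
                      0, MPI_COMM_WORLD, &reqs[i]);
        for (int i = 0; i < nneigh; i++)
            MPI_Isend(sendbuf + i * count, count, MPI_DOUBLE, neighbors[i],
                      0, MPI_COMM_WORLD, &reqs[nneigh + i]);

        /* The waiting (and therefore most of the measured MPI time) happens here. */
        MPI_Waitall(2 * nneigh, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0) printf("halo exchange completed on %d ranks\n", size);
        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }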

OpenFOAM Profiling - MPI Communication Topology

The communication topology shows the communication patterns among MPI ranks:
- MPI processes mainly communicate with their neighbors, but some other patterns are also visible

(Figures: communication topology at 4, 8, 16, and 32 nodes)
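
The mostly neighbor-to-neighbor pattern follows from domain decomposition: each rank owns a contiguous piece of the mesh and exchanges halo data only with ranks owning adjacent pieces. The sketch below uses a plain MPI Cartesian communicator to show how each rank ends up with a small, fixed neighbor set; OpenFOAM determines its neighbors from the mesh decomposition itself (e.g. via decomposePar), so this is a simplified assumption rather than the application's actual code:

    /* Illustrative 2-D process grid: each rank has at most four direct
     * neighbors, which is why the communication matrix concentrates near
     * the diagonal in the topology plots. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int size, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Let MPI pick a balanced 2-D grid (e.g. 896 ranks -> 32 x 28). */
        int dims[2] = {0, 0}, periods[2] = {0, 0};
        MPI_Dims_create(size, 2, dims);

        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        int left, right, down, up;
        MPI_Cart_shift(cart, 0, 1, &left, &right);   /* neighbors along dim 0 */
        MPI_Cart_shift(cart, 1, 1, &down, &up);      /* neighbors along dim 1 */

        printf("rank %d: neighbors left=%d right=%d down=%d up=%d\n",
               rank, left, right, down, up);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }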

OpenFOAM Summary

- The network and MPI library are important factors for better OpenFOAM scalability
  - HPC-X and tuning can improve performance by up to 19% at 32 nodes
- OpenFOAM can benefit from CPUs that run at a higher clock speed
  - ~11% better performance when running at 2.6 GHz compared with 2.0 GHz at base clock (at 32 nodes)
  - ~10% better performance when running 2.6 GHz CPUs with turbo speed than without (at 32 nodes)
- MPI profiling: most of the MPI time is spent on non-blocking communications
  - At 32 nodes: MPI_Waitall (11% of wall time), 8-byte MPI_Recv (1.4% of wall time), 1-byte MPI_Recv (0.7% of wall time)

Thank You
HPC Advisory Council

All trademarks are property of their respective owners. All information is provided "as is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein, and undertakes no duty and assumes no obligation to update or correct any information presented herein.