Application Performance Optimizations. Pak Lui


1 Application Performance Optimizations Pak Lui

2 2 140 Applications Best Practices Published: Abaqus, ABySS, AcuSolve, Amber, AMG, AMR, ANSYS CFX, ANSYS FLUENT, ANSYS Mechanics, BQCD, CCSM, CESM, COSMO, CP2K, CPMD, Dacapo, Desmond, DL-POLY, Eclipse, FLOW-3D, GADGET-2, GROMACS, Himeno, HOOMD-blue, HYCOM, ICON, LAMMPS, Lattice QCD, LS-DYNA, MILC, minife, MM5, MPQC, MR Bayes, MSC Nastran, NAMD, Nekbone, NEMO, NWChem, Octopus, OpenAtom, OpenFOAM, OpenMX, PARATEC, PFA, PFLOTRAN, Quantum ESPRESSO, RADIOSS, SPECFEM3D, WRF. For more information, visit:

3 3 HPC Advisory Council HPC Center systems:
- Dell PowerEdge R node cluster
- Dell PowerEdge R720/R720xd 32-node cluster
- Dell PowerEdge C node cluster
- Dell PowerEdge R node cluster
- HP ProLiant XL230a Gen9 10-node cluster
- HP ProLiant SL230s Gen8 4-node cluster
- HP Cluster Platform 3000SL 16-node cluster
- Dell PowerVault MD3420 / MD3460 InfiniBand-based Lustre storage
- Colfax CX1350s-XK5 4-node cluster
- Dell PowerEdge C node cluster
- Dell PowerEdge M node cluster

4 4 Agenda:
- Overview of HPC application performance
- Ways to inspect, profile, and optimize HPC applications: CPU, memory, file I/O, network
- System configurations and tuning
- Case studies, performance comparisons, optimizations and highlights
- Conclusions

5 5 HPC Application Performance Overview
Achieving scalable performance with HPC applications involves understanding the workload through profile analysis, then tuning where the most time is spent (CPU, network, I/O, etc.). An underlying implicit requirement is that each node performs similarly: run CPU, memory and network tests or a cluster checker to identify bad nodes. Comparing behavior across different hardware components helps pinpoint bottlenecks in different areas of the HPC cluster. A selection of HPC applications will be shown to demonstrate the method of profiling and analysis, to determine the bottleneck in software or hardware, and to determine the effectiveness of tuning to improve performance.
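A quick way to enforce the "each node performs similarly" requirement is to sweep a memory benchmark across all nodes and compare the numbers; the sketch below is illustrative only (the hostfile and STREAM binary path are assumptions, not part of the original study):

    # run STREAM on every node and compare the Triad bandwidth line
    for host in $(cat hostfile); do
        echo -n "$host: "
        ssh "$host" /opt/benchmarks/stream | grep Triad
    done

A node whose Triad bandwidth falls well below its peers is a candidate for memory, BIOS or DIMM-population checks before scaling runs are attempted.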

6 6 Ways To Inspect and Profile Applications
- Computation (CPU/accelerators): tools such as gprof, top, htop, perf top, pstack, Visual Profiler; tests and benchmarks: HPL, STREAM
- File I/O (bandwidth and block size): tools such as iostat, collectl, darshan; characterization tools and benchmarks: iozone, ior
- Network interconnect and MPI communications: tools and profilers such as perfquery and MPI profilers (IPM, TAU); latency and bandwidth characterization benchmarks: OSU micro-benchmarks, IMB
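As a hedged illustration of how these tools are typically invoked (host names, PIDs and paths are placeholders, not taken from the original deck):

    # CPU: live hot-spot view of a running rank
    perf top -p <pid>

    # File I/O: per-device bandwidth and request sizes, refreshed every 2 seconds
    iostat -xm 2

    # Network: point-to-point latency and bandwidth between two nodes (OSU micro-benchmarks)
    mpirun -np 2 -host node01,node02 osu_latency
    mpirun -np 2 -host node01,node02 osu_bw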

7 7 Case Study: LS-DYNA

8 8 Note
The following research was performed under the HPC Advisory Council activities. Participating vendors: Intel, Dell, Mellanox. Compute resource: HPC Advisory Council Cluster Center. The following was done to provide best practices: LS-DYNA performance overview, understanding LS-DYNA communication patterns, ways to increase LS-DYNA productivity, and MPI library comparisons. For more info please refer to

9 9 LS-DYNA
LS-DYNA is a general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems, developed by the Livermore Software Technology Corporation (LSTC). LS-DYNA is used in automotive, aerospace, construction, military, manufacturing, and bioengineering applications.

10 Objectives
The presented research was done to provide best practices for LS-DYNA performance benchmarking: MPI library performance comparison, interconnect performance comparison, CPU cores/speed comparison, and optimization tuning. The presented results will demonstrate the scalability of the compute environment/application and considerations for higher productivity and efficiency.

11 11 Test Cluster Configuration
- Dell PowerEdge R730 32-node (896-core) Thor cluster
- Dual-socket 14-core Intel Xeon E5-2697 v3 2.60 GHz CPUs (BIOS Power Management set to Maximum Performance)
- Memory: 64GB per node, DDR4 2133 MHz; BIOS Memory Snoop Mode set to Home Snoop
- OS: RHEL 6.5, MLNX_OFED_LINUX InfiniBand SW stack
- Hard drives: 2x 1TB 7.2K RPM SATA 2.5" in RAID 1
- Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
- Mellanox Switch-IB SB-series EDR 100Gb/s InfiniBand switch
- Mellanox ConnectX-3 FDR VPI InfiniBand and 40Gb/s Ethernet adapters
- Mellanox SwitchX-2 SX-series 56Gb/s FDR InfiniBand / VPI Ethernet switch
- MPI: Open MPI 1.8.4, Mellanox HPC-X, Intel MPI, IBM Platform MPI 9.1
- Application: LS-DYNA (builds 95359, 95610), single precision
- Benchmarks: 3 Vehicle Collision, Neon refined revised

12 PowerEdge R730: massive flexibility for data-intensive operations
Performance and efficiency: intelligent hardware-driven systems management with extensive power management features; innovative tools including automation for parts replacement and lifecycle manageability; broad choice of networking technologies from GigE to IB; built-in redundancy with hot-plug and swappable PSUs, HDDs and fans.
Benefits: designed for performance workloads, from big data analytics, distributed storage or distributed computing where local storage is key, to classic HPC and large-scale hosting environments; high-performance scale-out compute and low-cost dense storage in one package.
Hardware capabilities: flexible compute platform with dense storage capacity; 2S/2U server with 6 PCIe slots; large memory footprint (up to 768GB across 24 DIMMs); high I/O performance and optional storage configurations; HDD options of 12 x 3.5" or 24 x 2.5" drives, for up to 26 HDDs with 2 hot-plug drives in the rear of the server for boot or scratch.

13 13 LS-DYNA Performance: Network Interconnects
EDR InfiniBand delivers superior scalability in application performance, providing 4-5 times higher performance than 1GbE, 10GbE and 40GbE (chart labels: 444%, 505%, 572%). 1GbE stops scaling beyond 4 nodes and 10GbE stops scaling beyond 8 nodes, while InfiniBand demonstrates continuous performance gain at scale. (Chart: higher is better; 28 MPI processes per node.)

14 14 LS-DYNA Profiling: Time Spent in MPI
The majority of the MPI time is spent in MPI_Recv and MPI collective operations: MPI_Recv (36%), MPI_Allreduce (27%), MPI_Bcast (24%). Similar communication characteristics are seen for both input datasets; both exhibit similar communication patterns. (Profiles shown for Neon_refined_revised and 3 Vehicle Collision at 32 nodes.)
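A per-routine MPI time breakdown like the one above can be collected without recompiling by preloading a profiling library such as IPM; a minimal sketch, assuming IPM is installed under /opt/ipm and using placeholders for the LS-DYNA MPP executable and input deck:

    # intercept MPI calls and report time per routine at MPI_Finalize
    export LD_PRELOAD=/opt/ipm/lib/libipm.so
    mpirun -np 896 <ls-dyna-mpp-executable> i=<input.k>
    # IPM prints a summary and writes an XML profile for post-processing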

15 15 LS-DYNA Profiling: MPI Message Sizes
Most MPI messages are small to medium in size; most message sizes fall between 0 and 64B. For the most time-consuming MPI calls: MPI_Recv messages are mostly under 4KB; MPI_Bcast messages are mostly less than 16B, though larger messages exist; MPI_Allreduce messages are mostly less than 256B. (Data shown for neon_refined_revised.)

16 16 LS-DYNA Performance: EDR vs FDR InfiniBand
EDR InfiniBand delivers superior scalability in application performance. As the cluster scales, the performance gap in favor of EDR IB widens; the advantage of EDR InfiniBand increases for larger core counts. EDR IB provides a 15% advantage over FDR IB at 32 nodes (896 cores). (Chart: higher is better; 28 MPI processes per node.)

17 17 LS-DYNA Performance: Cores Per Node
Better performance is seen at scale with fewer CPU cores per node. At low node counts, higher performance can be achieved with more cores per node; at high node counts, slightly better performance (chart labels: 2-5%) is seen when using fewer cores per node, as memory bandwidth may become the limit when more CPU cores are used. (Chart: higher is better; 2.6GHz.)
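The cores-per-node sweep above can be reproduced with Open MPI / HPC-X by fixing the number of ranks placed on each node; the executable and input names are placeholders:

    # 24 of the 28 cores per node across 32 nodes, ranks bound to cores
    mpirun -np 768 --map-by ppr:24:node --bind-to core <ls-dyna-mpp-executable> i=<input.k>
    # all 28 cores per node
    mpirun -np 896 --map-by ppr:28:node --bind-to core <ls-dyna-mpp-executable> i=<input.k>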

18 18 LS-DYNA Performance: AVX2/SSE2 CPU Instructions
LS-DYNA provides executables built for different CPU instruction sets: AVX2 is supported on Haswell, while SSE2 is supported on previous generations. Due to a runtime issue, an AVX2 executable build was used instead of the public build. A slight improvement of ~2-4% is seen with the AVX2 executable, even though AVX2 instructions run at a lower clock speed (2.2GHz) than the normal CPU clock (2.6GHz). (Chart: higher is better; 24 MPI processes per node.)
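Whether a node actually exposes AVX2, and hence whether the AVX2 executable can be used, can be checked directly from the OS; for example:

    # prints 'avx2' once if the CPU advertises the instruction set
    grep -m1 -o avx2 /proc/cpuinfo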

19 19 LS-DYNA Performance: Turbo Mode
Turbo Boost enables processors to run above their base frequency, allowing CPU cores to clock up dynamically when thermal headroom permits. The 2.6GHz base clock can boost to a maximum Turbo frequency of 3.3GHz. Running with Turbo Boost translates to a ~25% performance boost here (chart labels: 25%, 40%). (Chart: higher is better; 28 MPI processes per node.)
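On Linux the Turbo Boost state can also be inspected from user space; a hedged sketch, assuming the intel_pstate driver and the cpupower utility are available (on this cluster the setting was controlled in the BIOS):

    # 0 means Turbo is allowed, 1 means it is disabled
    cat /sys/devices/system/cpu/intel_pstate/no_turbo
    # show supported frequencies and boost state
    cpupower frequency-info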

20 20 LS-DYNA Performance: Memory Optimization
Setting environment variables for the memory allocator improves performance; modifying the allocator behavior allows faster memory registration for communications. Environment variables used: export MALLOC_MMAP_MAX_=0 and export MALLOC_TRIM_THRESHOLD_=-1. (Chart labels: 19%, 176%; higher is better; 28 MPI processes per node.)
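For these allocator settings to take effect on every rank, they must be exported to all nodes; with Open MPI / HPC-X this can be done on the mpirun command line (executable and input names are placeholders):

    mpirun -np 896 \
        -x MALLOC_MMAP_MAX_=0 \
        -x MALLOC_TRIM_THRESHOLD_=-1 \
        <ls-dyna-mpp-executable> i=<input.k>

MALLOC_MMAP_MAX_=0 stops glibc from using mmap for large allocations, and MALLOC_TRIM_THRESHOLD_=-1 keeps freed memory in the heap, so communication buffers stay registered with the InfiniBand HCA instead of being repeatedly unmapped and re-registered.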

21 21 LS-DYNA Performance: MPI Optimization (HPC-X)
FCA and MXM enhance LS-DYNA performance at scale with HPC-X, which is based on the Open MPI distribution. The yalla PML, the MXM UD transport and the memory optimizations in HPC-X reduce overhead; MXM provides a speedup of 38% over the un-tuned baseline run at 32 nodes (768 cores). MCA parameters: -mca btl_sm_use_knem 1 -mca pml yalla -x MXM_TLS=ud,shm,self -x MXM_SHM_RNDV_THRESH= -x HCOLL_CONTEXT_CACHE_ENABLE=1. (Chart: higher is better; 24 MPI processes per node.)
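Put together, the HPC-X tuning above corresponds to an mpirun invocation along these lines (rank count and executable are placeholders; the MXM_SHM_RNDV_THRESH value, elided in the original, is left as a placeholder too):

    mpirun -np 768 \
        -mca pml yalla -mca btl_sm_use_knem 1 \
        -x MXM_TLS=ud,shm,self \
        -x MXM_SHM_RNDV_THRESH=<threshold> \
        -x HCOLL_CONTEXT_CACHE_ENABLE=1 \
        <ls-dyna-mpp-executable> i=<input.k>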

22 22 LS-DYNA Performance: Intel MPI Optimization
The DAPL provider performs better than the OFA provider for Intel MPI; DAPL provides better scalability for Intel MPI on LS-DYNA. Intel MPI parameters, common to both tests: I_MPI_DAPL_SCALABLE_PROGRESS 1, I_MPI_RDMA_TRANSLATION_CACHE 1, I_MPI_FAIR_CONN_SPIN_COUNT, I_MPI_FAIR_READ_SPIN_COUNT, I_MPI_ADJUST_REDUCE 2, I_MPI_ADJUST_BCAST 0, I_MPI_RDMA_RNDV_BUF_ALIGN 65536, I_MPI_SPIN_COUNT 121. For OFA: -IB, MV2_USE_APM 0, I_MPI_OFA_USE_XRC 1. For DAPL: -DAPL, I_MPI_DAPL_DIRECT_COPY_THRESHOLD 65536, I_MPI_DAPL_UD enable, I_MPI_DAPL_PROVIDER ofa-v2-mlx5_0-1u. (Chart label: 20%; higher is better; 24 MPI processes per node.)
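With Intel MPI these settings are usually passed as environment variables, for example via -genv on the mpirun command line; a hedged sketch of the DAPL variant (values elided in the original remain unset here, and the executable name is a placeholder):

    mpirun -np 768 \
        -genv I_MPI_FABRICS shm:dapl \
        -genv I_MPI_DAPL_UD enable \
        -genv I_MPI_DAPL_PROVIDER ofa-v2-mlx5_0-1u \
        -genv I_MPI_DAPL_DIRECT_COPY_THRESHOLD 65536 \
        -genv I_MPI_DAPL_SCALABLE_PROGRESS 1 \
        -genv I_MPI_RDMA_TRANSLATION_CACHE 1 \
        <ls-dyna-mpp-executable> i=<input.k>

Setting I_MPI_FABRICS to shm:dapl is roughly the explicit form of the -DAPL shortcut listed above.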

23 23 LS-DYNA Performance: MPI Libraries
HPC-X outperforms Platform MPI and Open MPI in scalability. HPC-X delivers higher performance than Intel MPI (OFA) by 33%, Intel MPI (DAPL) by 11%, and Platform MPI by 27% on neon_refined_revised; on 3cars it is 20% faster than Intel MPI (OFA) and 8% faster than Platform MPI. Tuning parameters used: for Open MPI, -bind-to-core and KNEM; for Platform MPI, -cpu_bind and -xrc; for Intel MPI, see the previous slide. (Chart: higher is better; 24 MPI processes per node.)

24 24 LS-DYNA Performance: System Generations
The current Haswell system configuration outperforms prior system generations: Ivy Bridge by 47%, Sandy Bridge by 75%, Westmere by 148%, and Nehalem by 290%. Scalability support from EDR InfiniBand and HPC-X provides a large boost in performance at scale for LS-DYNA. System components used:
- Haswell: 2-socket 14-core, 2133MHz DIMMs, ConnectX-4 EDR InfiniBand
- Ivy Bridge: 2-socket 10-core, 1600MHz DIMMs, Connect-IB FDR InfiniBand
- Sandy Bridge: 2-socket 8-core, 1600MHz DIMMs, ConnectX-3 FDR InfiniBand
- Westmere: 2-socket 6-core, 1333MHz DIMMs, ConnectX-2 QDR InfiniBand
- Nehalem: 2-socket 4-core, 1333MHz DIMMs, ConnectX-2 QDR InfiniBand
(Chart: higher is better.)

25 25 LS-DYNA Summary
- Compute: the Intel Haswell cluster outperforms previous system generations, by 47% over Ivy Bridge, 75% over Sandy Bridge, 148% over Westmere, and 290% over Nehalem. Using an executable with AVX2 instructions provides a slight advantage of ~2-4%.
- Turbo Mode: Turbo Boost, which lets processors run above their base frequency, provides a ~25% performance boost in some cases.
- Network: EDR InfiniBand and the HPC-X MPI library deliver superior scalability in application performance. EDR IB provides 4-5 times higher performance than 1GbE, 10GbE and 40GbE, and 15% higher than FDR IB at 32 nodes.
- MPI tuning: HPC-X enhances LS-DYNA performance at scale; MXM UD provides a 38% speedup over the un-tuned baseline at 32 nodes. HPC-X outperforms Platform MPI and Open MPI in scalability, by up to 27% over Platform MPI on neon_refined_revised and 8% over Platform MPI on 3cars.

26 26 Case Study: GROMACS

27 27 GROMACS
GROMACS (GROningen MAchine for Chemical Simulation) is a molecular dynamics simulation package, primarily designed for biochemical molecules such as proteins, lipids and nucleic acids. Many algorithmic optimizations have been introduced in the code, making it extremely fast at calculating the non-bonded interactions. Ongoing development extends GROMACS with interfaces to both quantum chemistry and bioinformatics/databases. It is open source software released under the GPL.

28 Objectives
The presented research was done to provide best practices for GROMACS performance benchmarking: CPU performance comparison, MPI library performance comparison, interconnect performance comparison, and system generations comparison. The presented results will demonstrate the scalability of the compute environment/application and considerations for higher productivity and efficiency.

29 29 Test Cluster Configuration
- Dell PowerEdge R730 32-node (896-core) Thor cluster
- Dual-socket 14-core Intel Xeon E5-2697 v3 2.60 GHz CPUs (BIOS Power Management set to Maximum Performance)
- Memory: 64GB per node, DDR4 2133 MHz; BIOS Memory Snoop Mode set to Home Snoop; Turbo enabled
- OS: RHEL 6.5, MLNX_OFED_LINUX InfiniBand SW stack
- Hard drives: 2x 1TB 7.2K RPM SATA 2.5" in RAID 1
- Mellanox ConnectX-4 EDR 100Gb/s InfiniBand adapters
- Mellanox Switch-IB SB-series EDR 100Gb/s InfiniBand switch
- Mellanox ConnectX-3 FDR InfiniBand / 10/40GbE Ethernet VPI adapters
- Mellanox SwitchX-2 SX-series 56Gb/s FDR InfiniBand / VPI Ethernet switch
- MPI: Mellanox HPC-X
- Compiler and libraries: Intel Composer XE and MKL
- Application: GROMACS
- Benchmark dataset: DPPC in Water (d.dppc, single precision) unless stated otherwise

30 30 GROMACS Performance: Network Interconnects
InfiniBand is the only interconnect that delivers scalable performance. EDR InfiniBand provides higher performance and better scalability than 1GbE, 10GbE or 40GbE (chart labels: 4.1x, 4.2x, 4.6x); Ethernet performance stays flat (stops scaling) beyond 2 nodes. EDR InfiniBand also outperforms 10GbE-RoCE by 55% at 32 nodes (896 cores) and demonstrates continuous performance gain at scale. (Chart: higher is better; 28 MPI processes per node.)

31 31 GROMACS Profiling: Time Spent by MPI Calls
The most time-consuming MPI call is MPI_Sendrecv at 66% of MPI time (or 27% of runtime) at 32 nodes (896 cores), followed by MPI_Waitall at 18% (7% of runtime) and MPI_Bcast at 6% (2% of runtime). Point-to-point and non-blocking sends and receives consume most of the communication time in GROMACS. (Profile at 32 nodes.)

32 32 GROMACS Profiling: MPI Message Sizes
The majority of data-transfer messages are of medium size, except that MPI_Sendrecv shows a large concentration of messages from 8B to 8KB and MPI_Bcast also shows a concentration at certain sizes. (Profile at 32 nodes.)

33 33 GROMACS Profiling: MPI Data Transfer
As the cluster grows, similar communication behavior is seen; the majority of communications are between neighboring ranks. Non-blocking and point-to-point transfers are shown in the graph, and the volume of collective data communication is small compared to point-to-point communication. (Shown for 2 nodes / 56 cores and 32 nodes / 896 cores.)

34 34 GROMACS Performance: EDR vs FDR InfiniBand
EDR InfiniBand delivers superior scalability in application performance. As the number of nodes scales, the performance gap in favor of EDR IB widens; the advantage of EDR InfiniBand increases for larger core counts, reaching 29% over FDR InfiniBand at 32 nodes (896 cores). (Chart: higher is better; 28 MPI processes per node.)

35 35 GROMACS Performance: System Generations
The Thor cluster (based on Intel E5-2697 v3, Haswell) outperforms prior generations, delivering 1.1x to 3.5x higher performance than clusters based on previous generations of Intel architecture. System components used:
- Janus: 2-socket 6-core Xeon 2.93GHz, 1333MHz DIMMs, ConnectX-2 QDR IB
- Jupiter: 2-socket 8-core Xeon 2.7GHz, 1600MHz DIMMs, ConnectX-3 FDR IB
- Thor: 2-socket 14-core Xeon 2.6GHz, 2133MHz DIMMs, ConnectX-4 EDR IB
(Chart labels: 118%, 356%; higher is better.)

36 36 GROMACS Performance: Cores Per Node
Running more CPU cores per node provides higher performance: ~7-10% higher productivity with 28 PPN compared to 24 PPN. Higher demand on memory bandwidth and the network may limit performance as more cores are used. (Chart labels: 7%, 13%, 39%; higher is better; 2.6GHz.)

37 37 GROMACS Performance: Turbo Mode and CPU Clock
Advantages are seen when running at a higher clock rate, either by enabling Turbo mode or by using a higher CPU clock frequency; boosting the CPU clock rate yields higher performance at lower cost. Increasing the clock from 2300MHz to 2600MHz runs 11% faster. (Chart labels: 11%, 16%; higher is better; 2.6GHz.)

38 38 GROMACS Performance: Floating Point Precision
GROMACS can be run at either single precision (SP) or double precision (DP). Running at SP is around 41-47% faster than running at DP; all other slides use single precision. (Chart: higher is better; 2.6GHz.)
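The precision is chosen at build time through the GMX_DOUBLE CMake option (GROMACS 4.6 and later); a minimal sketch of producing both variants, with the install paths as assumptions:

    # single precision (default) - the configuration used for the results above
    cmake .. -DGMX_DOUBLE=OFF -DCMAKE_INSTALL_PREFIX=/opt/gromacs-sp
    # double precision build for comparison
    cmake .. -DGMX_DOUBLE=ON -DCMAKE_INSTALL_PREFIX=/opt/gromacs-dp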

39 39 GROMACS Summary
- Compute: the latest system generation improves GROMACS performance at scale; the Intel Haswell cluster outperforms the Sandy Bridge cluster by 110% and the Westmere cluster by 350% at 32 nodes. Running more CPU cores per node provides higher performance, with ~7-10% higher productivity at 28 PPN compared to 24 PPN.
- Network: EDR InfiniBand delivers superior scalability, providing higher performance and better scaling than 1GbE, 10GbE or 40GbE; Ethernet performance stays flat beyond 2 nodes, and EDR InfiniBand outperforms 10GbE-RoCE by 55% at 32 nodes (896 cores).
- Precision: running at single precision is around 41-47% faster than running at double precision.
- MPI profile: the majority of data transfer is point-to-point and non-blocking communication; MPI_Sendrecv and MPI_Waitall are the most used MPI calls.

40 Thank You
HPC Advisory Council
All trademarks are property of their respective owners. All information is provided "As-Is" without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy or completeness of the information contained herein, and undertakes no duty and assumes no obligation to update or correct any information presented herein.
