TFLOP Performance for ANSYS Mechanical


1 TFLOP Performance for ANSYS Mechanical
Dr. Herbert Güttler, Engineering GmbH, Holunderweg, Bernstadt
Engineering H. Güttler, Page 1

2 May 2009, ANSYS 12, 512 cores, 1 TFLOP per second

3

4 Numerical Effort for a random selection of MCE Projects
ANSYS MAPDL, sparse solver. How long will your simulation take? Runtime can vary by an order of magnitude for the same number of DOFs.
Source: AnandTech

5 Stats data can be found in the solver output, e.g. file.dsp:

===========================
= multifrontal statistics =
===========================
number of equations                     =
no. of nonzeroes in lower triangle of a =
no. of nonzeroes in the factor l        =
ratio of nonzeroes in factor (min/max)  =
number of super nodes                   =
maximum order of a front matrix         =
maximum size of a front matrix          =
maximum size of a front trapezoid       =
no. of floating point ops for factor    = D+13
no. of floating point ops for solve     = D+10
ratio of flops for factor (min/max)     =
near zero pivot monitoring activated
number of pivots adjusted               = 0
negative pivot monitoring activated
number of negative pivots encountered   = 0
factorization panel size                = 128
number of cores used                    = 64
GPU acceleration activated
percentage of GPU accelerated flops     =
time (cpu & wall) for structure input   =
time (cpu & wall) for ordering          =
time (cpu & wall) for other matrix prep =
time (cpu & wall) for value input       =
time (cpu & wall) for matrix distrib.   =
time (cpu & wall) for numeric factor    =
computational rate (mflops) for factor  =
time (cpu & wall) for numeric solve     =
computational rate (mflops) for solve   =
effective I/O rate (MB/sec) for solve   =
Memory allocated on core 0              = MB
Memory allocated on core 1              = MB
Memory allocated on core 62             = MB
Memory allocated on core 63             = MB
Total Memory allocated by all cores     = MB
DSP Matrix Solver CPU Time (sec)        =
DSP Matrix Solver ELAPSED Time (sec)    =
DSP Matrix Solver Memory Used ( MB)     =
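Because the statistics block is plain text, the key figures (equation count, flop counts, cores) can be pulled out with a few regular expressions. A minimal sketch, assuming the field names shown above and Fortran-style "D" exponents; the parser name and the sample values below are hypothetical, not taken from the slide:

```python
import re

def parse_dsp_stats(text):
    """Extract a few numeric fields from an ANSYS DSPARSE
    'multifrontal statistics' block (as written to file.dsp).
    The patterns are assumptions based on the listing above;
    adapt them to your solver's actual output."""
    fields = {
        "equations": r"number of equations\s*=\s*([0-9]+)",
        "factor_flops": r"no\. of floating point ops for factor\s*=\s*([0-9.DEde+]+)",
        "cores": r"number of cores used\s*=\s*([0-9]+)",
    }
    out = {}
    for name, pattern in fields.items():
        match = re.search(pattern, text)
        if match:
            # Fortran-style exponents use 'D' instead of 'E'
            out[name] = float(match.group(1).replace("D", "E"))
    return out

# Hypothetical sample in the format of the listing above:
sample = """
 number of equations                  =        5000000
 no. of floating point ops for factor =     2.1000D+13
 number of cores used                 =             64
"""
stats = parse_dsp_stats(sample)
print(stats)
```

This gives the factorization flop count as a plain float, which is the number needed for the runtime estimates on the following slides.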

6 Performance Results

7 Numerical Effort for a random selection of MCE Projects
ANSYS MAPDL, sparse solver: 260 s on a 1 TFLOP/s machine, 40 s on a 2 TFLOP/s machine.
Source: AnandTech
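The estimate behind these numbers is simply the factorization flop count from the solver statistics divided by the sustained rate of the machine. A back-of-the-envelope sketch; the flop figure used below is hypothetical, not a value from the slide:

```python
def factor_time_seconds(factor_flops, sustained_tflops):
    """Rough lower bound for the sparse-factorization wall time:
    total floating point ops divided by the sustained rate.
    Ignores ordering, matrix input, I/O, and the solve phase."""
    return factor_flops / (sustained_tflops * 1e12)

# A hypothetical 2.6e14-flop factorization at 1 and 2 TFLOP/s sustained:
t1 = factor_time_seconds(2.6e14, 1.0)  # 260.0 s
t2 = factor_time_seconds(2.6e14, 2.0)  # 130.0 s
```

Real runs rarely sustain the nominal rate for the whole job, which is one reason measured times need not scale linearly with the quoted TFLOP/s figure.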

8 Current status of HPC Computing Source: AnandTech

9 Tools (Hardware: Oct 2010)
Compute servers:
- 8 Intel Harpertown systems (SUN X4150): total of 64 cores, 496 GB RAM
- 16 Intel Nehalem systems (SUN X4170): total of 128 cores, 1140 GB RAM
Memory per core typ. 8 GB; Infiniband interconnect across servers; each with a local RAID 0 disk array. Operating system: SUSE Linux Enterprise Server.
Latest addition: 1 AMD Opteron 6172 system (Magny-Cours), 48 cores, 192 GB RAM.
UPS, air conditioning; max. power consumption ~18 kW.
Applications: ANSYS Mechanical, optiSLang

10 Interconnect: FDR Performance
Latencies and bandwidths from the MPI startup check: latency time (µs) and communication speed (MB/sec) from the master to cores 1-3, 9-11, 16-18, and 28-31, spanning core-core on die, socket-socket, and node-node links.

11 Tools (Hardware: Jan 2013)
- 128 E5 Sandy Bridge cores, 2.9 GHz (up to 4 GPUs per node)
- 156 Westmere cores, 2.9 GHz (up to 2 GPUs per node)

12 Tools (Hardware: April 2013)
1 TB RAM, 1.1-3.5 kW (theoretical peak 2.5 TFLOPs)
128 E5 Sandy Bridge cores, 2.9 GHz; 8 nodes in a 4U case

13 Tools (Hardware: June 2013)
0.4 TB RAM, kW (theoretical peak TFLOPs)
2 nodes, total of 32 E5 cores, 2.9 GHz, + 8 K20X GPUs

14 Comparison for 5 MDOF model (R14.5.7)

w/o GPUs (16x E5 2690), multifrontal statistics:
no. of floating point ops for factor = D+13
no. of floating point ops for solve  = D+10
factorization panel size             = 128
number of cores used                 = 128
(equations, nonzero counts, front-matrix sizes, pivot monitoring, per-phase times, computational rates, and per-core memory reported as in the statistics listing on slide 5)

w/ dual GPUs (4x E5 / 8x Kepler K20x), multifrontal statistics:
no. of floating point ops for factor = D+13
no. of floating point ops for solve  = D+10
factorization panel size             = 128
number of cores used                 = 32
GPU acceleration activated; percentage of GPU accelerated flops reported
(remaining fields, including DSP Matrix Solver CPU/ELAPSED time and memory used, as in the listing on slide 5)

15 Applications

16 Example: Ball grid array
MODEL SUMMARY per processor (number, max, min): elements, nodes, shared nodes, DOFs.
SOLID186 & SOLID187 elements only, no contact elements. Components: mold, solder balls, PCB.

17 HPC with ANSYS

18 HPC with ANSYS 14.0

19 HPC with ANSYS 14.0

20 HPC with ANSYS 14.5

21 BGA Benchmark with R14.5 on Sandy Bridge Xeons + GPUs Single node / Workstation class

22 BGA Benchmark with R14.5 (compilation of all results)

23 GPU Acceleration in real life
Hardware: E5 Xeon + Tesla K20X Accelerator, DSPARSE solver. Duty cycle ca. %.

24 Next steps

25 Applications: BGA, LQFP; individual components & system-level analysis; focus on solder creep

26 Benchmark Results: Leda
Benchmark procedure across ANSYS 11, 12, 12.x, 13, 14, 14.5, and 14.5 SP02:
- Thermal (full model, 3 MDOF): 4 h (8 cores); 1 h (8 cores + 1 GPU); 0.8 h (32 cores)
- Thermomechanical simulation (full model, 7.8 MDOF): submodel ~5.5 days for 163 iterations (8 cores); 37 h for 16 load steps; 34.3 h for 164 iterations (20 cores); with ANSYS 12 identical to ANSYS 11 (8 cores); ~5.5 days (16 cores); 38.5 h (76 cores); 492 iterations (64 cores + 8 GPUs); 498 iterations (16 cores); 4.2 h (256 cores)
- Interpolation of boundary conditions: 0.2 h (improved algorithm); 0.2 h
- Creep strain analysis (5.5 MDOF): 12.5 h for 195 iterations (64 cores); identical to ANSYS 11; 7.5 h for 195 iterations (128 cores); 6.1 h for 488 iterations (128 cores); 6.4 h for 196 iterations (128 E5 cores); 4 h for 498 iterations (128 E5 cores); 4.8 h (128 E5 cores + 16 GPUs); 7.2 h for 196 iterations (72 cores + 12 GPUs); 5.5 h for 498 iterations (72 cores + 12 GPUs)
- Overall turnaround: 2 weeks → 5 days → 2 days → 1 day → ½ day
Best performance with E5 Xeons. All runs with the SMP Sparse or DSPARSE solver.
Hardware 11 & 12: dual X5460 (3.16 GHz Harpertown Xeon). Hardware 12.x-14: dual X5570 (2.93 GHz Nehalem Xeon) or dual X5670 (2.93 GHz Westmere Xeon), M207x NVIDIA GPUs; 14.5 results also with dual E5 (2.9 GHz Sandy Bridge Xeon).
ANSYS creep runs with NROPT,,crpl + DDOPT,metis. ANSYS runs with Infiniband interconnect.
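Timings like these are usually compared via speedup and parallel efficiency. A small sketch (the helper name is ours), using the creep-strain figures above as illustrative inputs:

```python
def speedup_and_efficiency(t_base, cores_base, t_new, cores_new):
    """Speedup of the new run over the baseline, and parallel
    efficiency relative to the increase in core count."""
    speedup = t_base / t_new
    efficiency = speedup / (cores_new / cores_base)
    return speedup, efficiency

# Creep-strain analysis: 12.5 h on 64 cores vs. 7.5 h on 128 cores
s, e = speedup_and_efficiency(t_base=12.5, cores_base=64, t_new=7.5, cores_new=128)
print(f"speedup {s:.2f}x at {e:.0%} parallel efficiency")
```

Where two runs differ in iteration count (e.g. 195 vs. 488 iterations), normalize to time per iteration before comparing, or the efficiency figure is misleading.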

27 Comparison: 2009 vs. 2013. Update 2013: software costs dominate (128 cores).

28 Examples: periodic structure, identical pins

29 Comparison for 5 MDOF model (w. contacts; R14.5)

w/o GPUs (E5 2690), multifrontal statistics:
no. of floating point ops for factor = D+13
no. of floating point ops for solve  = D+10
factorization panel size             = 128
number of cores used                 = 128
(equations, nonzero counts, front-matrix sizes, pivot monitoring, per-phase times, computational rates, and per-core memory reported as in the statistics listing on slide 5)

w/ dual GPUs (E5 2690), multifrontal statistics:
no. of floating point ops for factor = D+13
no. of floating point ops for solve  = D+10
factorization panel size             = 128
number of cores used                 = 64
GPU acceleration activated; percentage of GPU accelerated flops reported
(remaining fields, including DSP Matrix Solver CPU/ELAPSED time and memory used, as in the listing on slide 5)

30 GPU Performance, tested with a mold-injected part (with fibers)

31 Objective
For a plastic cover produced via injection molding from a fiber-reinforced plastic (PA66 GF30), there is considerable variation of the material properties caused by variation in the fiber orientation. Furthermore, the degree of orientation varies locally. The fiber orientation can be calculated outside of ANSYS and mapped onto the model. However, a much finer mesh is needed to represent the locally varying material accurately, compared to the situation with a homogeneous material. During a customer project we made a study with models of different meshing density (meshed inside Workbench) to investigate the displacements under thermal load. The model is a simple bulk model (SOLID186), no contacts, no material nonlinearities.
Coarse model (2 mm tets): 0.7 MDOF
Medium model (0.5 mm HexDom): 5.9 MDOF

32 Objective: Orientation, Material Mapping

33 Model 0.5 mm Hex Dominant

34 Difference in Displacements (free expansion)
2 mm tet mesh vs. 0.5 mm hex-dominant mesh. Coarse model (2 mm tets): 0.7 MDOF; medium model (0.5 mm HexDom): 5.9 MDOF. Displacement range differs by about 50%.

35 Results for the 0.5 mm HexDom model: 100% speedup when using GPUs and the latest hardware.

36 Conclusions
ANSYS Mechanical routinely delivers TFLOP-per-second performance in an HPC environment!
Highest peak performance with GPUs (and a suitable case).
A conventional solution provides similar performance with fewer surprises.
GPU licensing & stability are critical for adoption.

37 Acknowledgements
Jeff Beisheim, ANSYS Inc.; Erke Wang, Peter Tiefenthaler, CADFEM GmbH; Natalja Schafet, Wolfgang Müller-Hirsch, Robert Bosch GmbH; Philipp Schmid, Holger Mai, Engineering GmbH

38


2008 International ANSYS Conference

2008 International ANSYS Conference 28 International ANSYS Conference Maximizing Performance for Large Scale Analysis on Multi-core Processor Systems Don Mize Technical Consultant Hewlett Packard 28 ANSYS, Inc. All rights reserved. 1 ANSYS,

More information

Sami Saarinen Peter Towers. 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1

Sami Saarinen Peter Towers. 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1 Acknowledgements: Petra Kogel Sami Saarinen Peter Towers 11th ECMWF Workshop on the Use of HPC in Meteorology Slide 1 Motivation Opteron and P690+ clusters MPI communications IFS Forecast Model IFS 4D-Var

More information

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions

Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing

More information

The Stampede is Coming: A New Petascale Resource for the Open Science Community

The Stampede is Coming: A New Petascale Resource for the Open Science Community The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation

More information

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing

Accelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing Accelerating HPC (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing SAAHPC, Knoxville, July 13, 2010 Legal Disclaimer Intel may make changes to specifications and product

More information

Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance

Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance This Dell EMC technical white paper discusses performance benchmarking results and analysis for Simulia

More information

ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016

ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016 ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016 Challenges What is Algebraic Multi-Grid (AMG)? AGENDA Why use AMG? When to use AMG? NVIDIA AmgX Results 2

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

HPC Enabling R&D at Philip Morris International

HPC Enabling R&D at Philip Morris International HPC Enabling R&D at Philip Morris International Jim Geuther*, Filipe Bonjour, Bruce O Neel, Didier Bouttefeux, Sylvain Gubian, Stephane Cano, and Brian Suomela * Philip Morris International IT Service

More information

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine

SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine SUN CUSTOMER READY HPC CLUSTER: REFERENCE CONFIGURATIONS WITH SUN FIRE X4100, X4200, AND X4600 SERVERS Jeff Lu, Systems Group Sun BluePrints OnLine April 2007 Part No 820-1270-11 Revision 1.1, 4/18/07

More information

FEMAP/NX NASTRAN PERFORMANCE TUNING

FEMAP/NX NASTRAN PERFORMANCE TUNING FEMAP/NX NASTRAN PERFORMANCE TUNING Chris Teague - Saratech (949) 481-3267 www.saratechinc.com NX Nastran Hardware Performance History Running Nastran in 1984: Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)

More information

CST STUDIO SUITE R Supported GPU Hardware

CST STUDIO SUITE R Supported GPU Hardware CST STUDIO SUITE R 2017 Supported GPU Hardware 1 Supported Hardware CST STUDIO SUITE currently supports up to 8 GPU devices in a single host system, meaning each number of GPU devices between 1 and 8 is

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.

More information

Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem

Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem Guan Wang and Matthias K. Gobbert Department of Mathematics and Statistics, University of

More information

QLogic TrueScale InfiniBand and Teraflop Simulations

QLogic TrueScale InfiniBand and Teraflop Simulations WHITE Paper QLogic TrueScale InfiniBand and Teraflop Simulations For ANSYS Mechanical v12 High Performance Interconnect for ANSYS Computer Aided Engineering Solutions Executive Summary Today s challenging

More information

Optimising the Mantevo benchmark suite for multi- and many-core architectures

Optimising the Mantevo benchmark suite for multi- and many-core architectures Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of

More information

Philippe Thierry Sr Staff Engineer Intel Corp.

Philippe Thierry Sr Staff Engineer Intel Corp. HPC@Intel Philippe Thierry Sr Staff Engineer Intel Corp. IBM, April 8, 2009 1 Agenda CPU update: roadmap, micro-μ and performance Solid State Disk Impact What s next Q & A Tick Tock Model Perenity market

More information

Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc.

Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters 2006 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Our Business Simulation Driven Product Development Deliver superior

More information

PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent

PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python F.D. Witherden, M. Klemm, P.E. Vincent 1 Overview Motivation. Accelerators and Modern Hardware Python and PyFR. Summary. Motivation

More information

OzenCloud Case Studies

OzenCloud Case Studies OzenCloud Case Studies Case Studies, April 20, 2015 ANSYS in the Cloud Case Studies: Aerodynamics & fluttering study on an aircraft wing using fluid structure interaction 1 Powered by UberCloud http://www.theubercloud.com

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA

The AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA The AMD64 Technology for Server and Workstation Dr. Ulrich Knechtel Enterprise Program Manager EMEA Agenda Direct Connect Architecture AMD Opteron TM Processor Roadmap Competition OEM support The AMD64

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

HMEM and Lemaitre2: First bricks of the CÉCI s infrastructure

HMEM and Lemaitre2: First bricks of the CÉCI s infrastructure HMEM and Lemaitre2: First bricks of the CÉCI s infrastructure - CÉCI: What we want - Cluster HMEM - Cluster Lemaitre2 - Comparison - What next? - Support and training - Conclusions CÉCI: What we want CÉCI:

More information

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea

More information

LS-DYNA Performance Benchmark and Profiling. October 2017

LS-DYNA Performance Benchmark and Profiling. October 2017 LS-DYNA Performance Benchmark and Profiling October 2017 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: LSTC, Huawei, Mellanox Compute resource

More information

Automatic Tuning of the High Performance Linpack Benchmark

Automatic Tuning of the High Performance Linpack Benchmark Automatic Tuning of the High Performance Linpack Benchmark Ruowei Chen Supervisor: Dr. Peter Strazdins The Australian National University What is the HPL Benchmark? World s Top 500 Supercomputers http://www.top500.org

More information

The Road from Peta to ExaFlop

The Road from Peta to ExaFlop The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units

More information

Performance Analysis and Prediction for distributed homogeneous Clusters

Performance Analysis and Prediction for distributed homogeneous Clusters Performance Analysis and Prediction for distributed homogeneous Clusters Heinz Kredel, Hans-Günther Kruse, Sabine Richling, Erich Strohmaier IT-Center, University of Mannheim, Germany IT-Center, University

More information

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU

NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated

More information

SATGPU - A Step Change in Model Runtimes

SATGPU - A Step Change in Model Runtimes SATGPU - A Step Change in Model Runtimes User Group Meeting Thursday 16 th November 2017 Ian Wright, Atkins Peter Heywood, University of Sheffield 20 November 2017 1 SATGPU: Phased Development Phase 1

More information

Improving Your Structural Mechanics Simulations with Release 14.0

Improving Your Structural Mechanics Simulations with Release 14.0 Improving Your Structural Mechanics Simulations with Release 14.0 1 What will Release 14.0 bring you? 2 Let s now take a closer look at some topics 3 MAPDL/WB Integration Finite Element Information Access

More information

Properly Sizing Processing and Memory for your AWMS Server

Properly Sizing Processing and Memory for your AWMS Server Overview This document provides guidelines for purchasing new hardware which will host the AirWave Wireless Management System. Your hardware should incorporate margin for WLAN expansion as well as future

More information

HPC Current Development in Indonesia. Dr. Bens Pardamean Bina Nusantara University Indonesia

HPC Current Development in Indonesia. Dr. Bens Pardamean Bina Nusantara University Indonesia HPC Current Development in Indonesia Dr. Bens Pardamean Bina Nusantara University Indonesia HPC Facilities Educational & Research Institutions in Indonesia CIBINONG SITE Basic Nodes: 80 node 2 processors

More information

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA 3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires

More information

Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet. Swamy N. Kandadai and Xinghong He and

Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet. Swamy N. Kandadai and Xinghong He and Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet Swamy N. Kandadai and Xinghong He swamy@us.ibm.com and xinghong@us.ibm.com ABSTRACT: We compare the performance of several applications

More information

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

(software agnostic) Computational Considerations

(software agnostic) Computational Considerations (software agnostic) Computational Considerations The Issues CPU GPU Emerging - FPGA, Phi, Nervana Storage Networking CPU 2 Threads core core Processor/Chip Processor/Chip Computer CPU Threads vs. Cores

More information

High Performance Computing with Accelerators

High Performance Computing with Accelerators High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing

More information

The determination of the correct

The determination of the correct SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total

More information

Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations

Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations Performance Brief Quad-Core Workstation Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations With eight cores and up to 80 GFLOPS of peak performance at your fingertips,

More information

APENet: LQCD clusters a la APE

APENet: LQCD clusters a la APE Overview Hardware/Software Benchmarks Conclusions APENet: LQCD clusters a la APE Concept, Development and Use Roberto Ammendola Istituto Nazionale di Fisica Nucleare, Sezione Roma Tor Vergata Centro Ricerce

More information

HPC Architectures. Types of resource currently in use

HPC Architectures. Types of resource currently in use HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling

Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Iterative Solvers Numerical Results Conclusion and outlook 1/22 Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Part II: GPU Implementation and Scaling on Titan Eike

More information

High performance Computing and O&G Challenges

High performance Computing and O&G Challenges High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating

More information

Industrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009

Industrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Industrial finite element analysis: Evolution and current challenges Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Dr. Chief Numerical Analyst Office of Architecture and

More information

Computing on GPU Clusters

Computing on GPU Clusters Computing on GPU Clusters Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information