TFLOP Performance for ANSYS Mechanical
Slide 1: TFLOP Performance for ANSYS Mechanical. Dr. Herbert Güttler, Engineering GmbH, Holunderweg, Bernstadt. Engineering H. Güttler, Seite 1.
Slide 2: May 2009, ANSYS 12, 512 cores, 1 TFLOP per second.
Slide 3: (no text transcribed)
Slide 4: Numerical Effort for a Random Selection of MCE Projects (ANSYS MAPDL, sparse solver). How long will your simulation take? Runtime can vary by an order of magnitude for the same number of DOFs. Source: AnandTech.
Slide 5: Stats data can be found in the sparse solver output, e.g. file.DSP:

    ===========================
    = multifrontal statistics =
    ===========================
    number of equations                     =
    no. of nonzeroes in lower triangle of a =
    no. of nonzeroes in the factor l        =
    ratio of nonzeroes in factor (min/max)  =
    number of super nodes                   =
    maximum order of a front matrix         =
    maximum size of a front matrix          =
    maximum size of a front trapezoid       =
    no. of floating point ops for factor    =  D+13
    no. of floating point ops for solve     =  D+10
    ratio of flops for factor (min/max)     =
    near zero pivot monitoring activated
    number of pivots adjusted               = 0
    negative pivot monitoring activated
    number of negative pivots encountered   = 0
    factorization panel size                = 128
    number of cores used                    = 64
    GPU acceleration activated
    percentage of GPU accelerated flops     =
    time (cpu & wall) for structure input   =
    time (cpu & wall) for ordering          =
    time (cpu & wall) for other matrix prep =
    time (cpu & wall) for value input       =
    time (cpu & wall) for matrix distrib.   =
    time (cpu & wall) for numeric factor    =
    computational rate (mflops) for factor  =
    time (cpu & wall) for numeric solve     =
    computational rate (mflops) for solve   =
    effective I/O rate (MB/sec) for solve   =
    Memory allocated on core 0              =  MB
    Memory allocated on core 1              =  MB
    ...
    Memory allocated on core 62             =  MB
    Memory allocated on core 63             =  MB
    Total Memory allocated by all cores     =  MB
    DSP Matrix Solver CPU Time (sec)        =
    DSP Matrix Solver ELAPSED Time (sec)    =
    DSP Matrix Solver Memory Used (MB)      =

(Numeric values lost in transcription.)
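The statistics block above is plain "key = value" text, so the figures can be pulled out of a .DSP file with a few lines of script. A minimal sketch in Python; the field names follow the listing above, but the sample values are placeholders, since the slide's actual numbers were not captured in this transcription.

```python
import re

def parse_dsp_stats(text):
    """Collect 'key = value' lines from an ANSYS .DSP multifrontal
    statistics block into a dict (values kept as strings)."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*=\s*(\S.*)$", line)
        if m:
            stats[m.group(1)] = m.group(2).strip()
    return stats

# Placeholder sample (real values come from the solver's file.DSP):
sample = """
 number of equations                  =      5000000
 no. of floating point ops for factor =   2.6000D+14
 number of cores used                 =           64
"""
stats = parse_dsp_stats(sample)
# Fortran-style D exponent -> Python float:
factor_flops = float(stats["no. of floating point ops for factor"].replace("D", "E"))
print(stats["number of cores used"], factor_flops)
```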
Slide 6: Performance Results.
Slide 7: Numerical Effort for a Random Selection of MCE Projects (ANSYS MAPDL, sparse solver): 260 s on a 1 TFLOP/s machine, 40 s on a 2 TFLOP/s machine. Source: AnandTech.
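A runtime figure like "260 s on a 1 TFLOP/s machine" follows directly from dividing the factorization flop count by the sustained floating-point rate. A small worked sketch, assuming (purely for illustration) a job with 2.6e14 factor flops; that flop count is not taken from the slide:

```python
def factor_time_s(factor_flops, sustained_tflops):
    """Lower-bound wall time for the numeric factorization:
    total factor flops / sustained floating-point rate."""
    return factor_flops / (sustained_tflops * 1e12)

# 2.6e14 factor flops at a sustained 1 TFLOP/s:
print(factor_time_s(2.6e14, 1.0))  # 260.0 seconds
```

This is only a lower bound: a real run also spends time on ordering, matrix input, distribution, and the triangular solve, which is why measured times for the same DOF count can still vary widely.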
Slide 8: Current Status of HPC Computing. Source: AnandTech.
Slide 9: Tools (Hardware: Oct 2010). Compute servers: 8 Intel Harpertown systems (Sun X4150), a total of 64 cores and 496 GB RAM; 16 Intel Nehalem systems (Sun X4170), a total of 128 cores and 1140 GB RAM; memory per core typically 8 GB. InfiniBand interconnect across servers; each server with a local RAID 0 disk array. Operating system: SUSE Linux Enterprise Server. Latest addition: 1 AMD Opteron 6172 system (Magny-Cours), 48 cores, 192 GB RAM. UPS and air conditioning; max. power consumption ~18 kW. Applications: ANSYS Mechanical, optiSLang.
Slide 10: Interconnect: FDR Performance. Table of latencies (µs) and communication speeds (MB/sec) from the master to selected cores (cores 1-3, 9-11, 16-18, 28-31), grouped as core to core (on die), socket to socket, and node to node. Numeric values lost in transcription.
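The grouping at the bottom of the slide (core to core on die, socket to socket, node to node) can be read off a ping-pong latency measurement. A rough illustration in Python; the microsecond thresholds and sample latencies are assumptions chosen for the sketch, not values from the slide:

```python
def classify_link(latency_us):
    """Map a measured master-to-core ping-pong latency onto a link
    class. The thresholds below are illustrative assumptions only."""
    if latency_us < 1.0:
        return "core-core (on die)"
    elif latency_us < 3.0:
        return "socket-socket"
    else:
        return "node-node (interconnect)"

# Hypothetical measurements in microseconds:
for lat in (0.4, 1.8, 6.5):
    print(f"{lat} us -> {classify_link(lat)}")
```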
Slide 11: Tools (Hardware: Jan 2013). 128 E5 Sandy Bridge cores at 2.9 GHz (up to 4 GPUs per node); 156 Westmere cores at 2.9 GHz (up to 2 GPUs per node).
Slide 12: Tools (Hardware: April 2013). 128 E5 Sandy Bridge cores at 2.9 GHz, 8 nodes in a 4U case, 1 TB RAM, 1.1 to 3.5 kW (theoretical peak 2.5 TFLOPs).
Slide 13: Tools (Hardware: June 2013). 2 nodes with a total of 32 E5 cores at 2.9 GHz plus 8 K20X GPUs, 0.4 TB RAM; power draw (kW) and theoretical peak (TFLOPs) not captured in transcription.
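The "theoretical peak" figures on these hardware slides come from cores x clock x FLOPs per cycle. A quick check, assuming 8 double-precision FLOPs per cycle for a Sandy Bridge core with AVX (an assumption about the hardware, not a number from the slides):

```python
def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical double-precision peak in TFLOPs:
    cores x clock (GHz) x FLOPs per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

# 128 Sandy Bridge E5 cores at 2.9 GHz, 8 DP FLOPs/cycle with AVX:
cpu_peak = peak_tflops(128, 2.9, 8)
print(round(cpu_peak, 2))  # 2.97 TFLOPs
```

For the GPU side, NVIDIA's advertised double-precision peak for one Kepler K20X is about 1.31 TFLOPs, so 8 of them would add roughly 10.5 TFLOPs of theoretical peak on top of the CPUs.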
Slide 14: Comparison for 5 MDOF model (R14.5.7). Side-by-side multifrontal statistics: without GPUs (16x E5-2690, 128 cores used) versus with dual GPUs per node (4x E5 + 8x Kepler K20X, 32 cores used, GPU acceleration activated). Both listings follow the same file.DSP statistics format shown on slide 5; numeric values lost in transcription.
Slide 15: Applications.
Slide 16: Example: Ball Grid Array. Model summary table (per-processor max/min counts of elements, nodes, shared nodes, and DOFs; numeric values lost in transcription). SOLID186 and SOLID187 elements only, no contact elements. Components: mold, solder balls, PCB.
Slide 17: HPC with ANSYS.
Slide 18: HPC with ANSYS 14.0.
Slide 19: HPC with ANSYS 14.0.
Slide 20: HPC with ANSYS 14.5.
Slide 21: BGA Benchmark with R14.5 on Sandy Bridge Xeons + GPUs (single node / workstation class).
Slide 22: BGA Benchmark with R14.5 (compilation of all results).
Slide 23: GPU Acceleration in Real Life. Hardware: E5 Xeon plus Tesla K20X accelerator, DSPARSE solver; GPU duty cycle of ca. [value lost in transcription] %.
Slide 24: Next Steps.
Slide 25: Applications: BGA, LQFP; individual components and system-level analysis, with a focus on solder creep.
Slide 26: Benchmark Results: Leda. Benchmark procedure across ANSYS 11, 12, 12.x, 13, 14, 14.5, and 14.5 SP02 (table columns scrambled in transcription; recoverable figures below).
- Thermal (full model), 3 MDOF: 4 h (8 cores); 1 h (8 cores + 1 GPU); 0.8 h (32 cores).
- Thermomechanical simulation (full model), 7.8 MDOF, with submodel: ~5.5 days for 163 iterations (8 cores); 37 h for 16 load steps; 34.3 h for 164 iterations (20 cores); 38.5 h (16 cores); results identical to ANSYS 11.
- Interpolation of boundary conditions: 0.2 h (improved algorithm).
- Creep strain analysis, 5.5 MDOF: 12.5 h for 195 iterations (64 cores); 6.1 h for 488 iterations (128 cores); 7.5 h for 195 iterations (128 cores); 4.2 h (256 cores); 6.4 h for 196 iterations (128 E5 cores); 4 h for 498 iterations (128 E5 cores); 4.8 h (128 E5 cores + 16 GPUs); 7.2 h for 196 iterations (72 cores + 12 GPUs); 5.5 h for 498 iterations (72 cores + 12 GPUs); further runs with 492 iterations (76 cores) and 498 iterations (64 cores + 8 GPUs).
- Overall runtime trend across releases: 2 weeks, 5 days, 2 days, 1 day, 1/2 day. Best performance with E5 Xeons.
- All runs with the SMP sparse or DSPARSE solver. Hardware for ANSYS 11 & 12: dual X5460 (3.16 GHz Harpertown Xeon). Hardware for later releases: dual X5570 (2.93 GHz Nehalem Xeon) or dual X5670 (2.93 GHz Westmere Xeon) with M207x NVIDIA GPUs; 14.5 results also with dual E5 (2.9 GHz Sandy Bridge Xeon). Creep runs with NROPT,,crpl and DDOPT,metis; runs with InfiniBand interconnect.
Slide 27: Comparison: 2009 vs. 2013. Update 2013: software costs dominate (128 cores).
Slide 28: Examples: periodic structure, identical pins.
Slide 29: Comparison for 5 MDOF model (with contacts; R14.5). Side-by-side multifrontal statistics: without GPUs (E5-2690, 128 cores used) versus with dual GPUs (E5-2690, 64 cores used, GPU acceleration activated). Both listings follow the same file.DSP statistics format shown on slide 5; numeric values lost in transcription.
Slide 30: GPU Performance, tested with a mold-injected part (with fibers).
Slide 31: Objective. For a plastic cover produced by injection molding from a fiber-reinforced plastic (PA66 GF30), there is considerable variation in the material properties caused by variation in the fiber orientation direction; furthermore, the degree of orientation varies locally. The fiber orientation can be calculated outside of ANSYS and mapped onto the model. However, a much finer mesh is needed to represent the locally varying material accurately than in the case of a homogeneous material. During a customer project we made a study with models of different mesh density (meshed inside Workbench) to investigate the displacements under thermal load. The model is a simple bulk model (SOLID186), no contacts, no material nonlinearities. Coarse model (2 mm tets): 0.7 MDOF. Medium model (0.5 mm hex-dominant): 5.9 MDOF.
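Mapping externally computed fiber-orientation data onto the analysis mesh amounts to transferring point data between two meshes. A deliberately minimal nearest-neighbour sketch in Python; real mapping tools (and the workflow on this slide) interpolate full orientation tensors rather than copying a scalar from the closest point, and all names and data below are illustrative assumptions:

```python
import math

def map_orientation(source_pts, source_vals, target_pts):
    """Transfer a per-point quantity (e.g. a scalar degree of fiber
    alignment) from a molding-simulation mesh onto a finer FE mesh
    by nearest-neighbour lookup. O(n*m), fine for a sketch; real
    tools use spatial search trees and tensor interpolation."""
    out = []
    for p in target_pts:
        nearest = min(range(len(source_pts)),
                      key=lambda i: math.dist(p, source_pts[i]))
        out.append(source_vals[nearest])
    return out

# Hypothetical data: two source points, three target nodes.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
vals = [0.9, 0.3]
tgt = [(0.1, 0.0, 0.0), (0.8, 0.0, 0.0), (0.6, 0.1, 0.0)]
print(map_orientation(src, vals, tgt))  # [0.9, 0.3, 0.3]
```

The finer the target mesh relative to the source data, the more nodes share the same source value, which is exactly why the slide notes that a much finer mesh is needed to resolve the locally varying material.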
Slide 32: Objective: Orientation, Material, Mapping.
Slide 33: Model, 0.5 mm hex-dominant mesh.
Slide 34: Difference in Displacements (free expansion), 2 mm tet mesh vs. 0.5 mm hex-dominant mesh. Coarse model (2 mm tets): 0.7 MDOF; medium model (0.5 mm hex-dominant): 5.9 MDOF. Displacement range differs by about 50%.
Slide 35: Results for the 0.5 mm hex-dominant model: 100% speedup when using GPUs and the latest hardware.
Slide 36: Conclusions. ANSYS Mechanical routinely delivers TFLOP-per-second performance in an HPC environment. Highest peak performance is reached with GPUs (and a suitable case); a conventional CPU-only solution provides similar performance with fewer surprises. GPU licensing and stability are critical for adoption.
Slide 37: Acknowledgements. Jeff Beisheim, ANSYS Inc.; Erke Wang, Peter Tiefenthaler, CADFEM GmbH; Natalja Schafet, Wolfgang Müller-Hirsch, Robert Bosch GmbH; Philipp Schmid, Holger Mai, Engineering GmbH.
Slide 38: (no text transcribed)
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.
More informationPerformance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem
Performance Comparison between Blocking and Non-Blocking Communications for a Three-Dimensional Poisson Problem Guan Wang and Matthias K. Gobbert Department of Mathematics and Statistics, University of
More informationQLogic TrueScale InfiniBand and Teraflop Simulations
WHITE Paper QLogic TrueScale InfiniBand and Teraflop Simulations For ANSYS Mechanical v12 High Performance Interconnect for ANSYS Computer Aided Engineering Solutions Executive Summary Today s challenging
More informationOptimising the Mantevo benchmark suite for multi- and many-core architectures
Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of
More informationPhilippe Thierry Sr Staff Engineer Intel Corp.
HPC@Intel Philippe Thierry Sr Staff Engineer Intel Corp. IBM, April 8, 2009 1 Agenda CPU update: roadmap, micro-μ and performance Solid State Disk Impact What s next Q & A Tick Tock Model Perenity market
More informationComputer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters ANSYS, Inc. All rights reserved. 1 ANSYS, Inc.
Computer Aided Engineering with Today's Multicore, InfiniBand-Based Clusters 2006 ANSYS, Inc. All rights reserved. 1 ANSYS, Inc. Proprietary Our Business Simulation Driven Product Development Deliver superior
More informationPyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent
PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python F.D. Witherden, M. Klemm, P.E. Vincent 1 Overview Motivation. Accelerators and Modern Hardware Python and PyFR. Summary. Motivation
More informationOzenCloud Case Studies
OzenCloud Case Studies Case Studies, April 20, 2015 ANSYS in the Cloud Case Studies: Aerodynamics & fluttering study on an aircraft wing using fluid structure interaction 1 Powered by UberCloud http://www.theubercloud.com
More informationResources Current and Future Systems. Timothy H. Kaiser, Ph.D.
Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic
More informationThe AMD64 Technology for Server and Workstation. Dr. Ulrich Knechtel Enterprise Program Manager EMEA
The AMD64 Technology for Server and Workstation Dr. Ulrich Knechtel Enterprise Program Manager EMEA Agenda Direct Connect Architecture AMD Opteron TM Processor Roadmap Competition OEM support The AMD64
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationHMEM and Lemaitre2: First bricks of the CÉCI s infrastructure
HMEM and Lemaitre2: First bricks of the CÉCI s infrastructure - CÉCI: What we want - Cluster HMEM - Cluster Lemaitre2 - Comparison - What next? - Support and training - Conclusions CÉCI: What we want CÉCI:
More informationHigh-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs
High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea
More informationLS-DYNA Performance Benchmark and Profiling. October 2017
LS-DYNA Performance Benchmark and Profiling October 2017 2 Note The following research was performed under the HPC Advisory Council activities Participating vendors: LSTC, Huawei, Mellanox Compute resource
More informationAutomatic Tuning of the High Performance Linpack Benchmark
Automatic Tuning of the High Performance Linpack Benchmark Ruowei Chen Supervisor: Dr. Peter Strazdins The Australian National University What is the HPL Benchmark? World s Top 500 Supercomputers http://www.top500.org
More informationThe Road from Peta to ExaFlop
The Road from Peta to ExaFlop Andreas Bechtolsheim June 23, 2009 HPC Driving the Computer Business Server Unit Mix (IDC 2008) Enterprise HPC Web 100 75 50 25 0 2003 2008 2013 HPC grew from 13% of units
More informationPerformance Analysis and Prediction for distributed homogeneous Clusters
Performance Analysis and Prediction for distributed homogeneous Clusters Heinz Kredel, Hans-Günther Kruse, Sabine Richling, Erich Strohmaier IT-Center, University of Mannheim, Germany IT-Center, University
More informationNVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU
NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated
More informationSATGPU - A Step Change in Model Runtimes
SATGPU - A Step Change in Model Runtimes User Group Meeting Thursday 16 th November 2017 Ian Wright, Atkins Peter Heywood, University of Sheffield 20 November 2017 1 SATGPU: Phased Development Phase 1
More informationImproving Your Structural Mechanics Simulations with Release 14.0
Improving Your Structural Mechanics Simulations with Release 14.0 1 What will Release 14.0 bring you? 2 Let s now take a closer look at some topics 3 MAPDL/WB Integration Finite Element Information Access
More informationProperly Sizing Processing and Memory for your AWMS Server
Overview This document provides guidelines for purchasing new hardware which will host the AirWave Wireless Management System. Your hardware should incorporate margin for WLAN expansion as well as future
More informationHPC Current Development in Indonesia. Dr. Bens Pardamean Bina Nusantara University Indonesia
HPC Current Development in Indonesia Dr. Bens Pardamean Bina Nusantara University Indonesia HPC Facilities Educational & Research Institutions in Indonesia CIBINONG SITE Basic Nodes: 80 node 2 processors
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More informationPerformance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet. Swamy N. Kandadai and Xinghong He and
Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet Swamy N. Kandadai and Xinghong He swamy@us.ibm.com and xinghong@us.ibm.com ABSTRACT: We compare the performance of several applications
More informationAdaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics
Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics
More informationTwo-Phase flows on massively parallel multi-gpu clusters
Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous
More information(software agnostic) Computational Considerations
(software agnostic) Computational Considerations The Issues CPU GPU Emerging - FPGA, Phi, Nervana Storage Networking CPU 2 Threads core core Processor/Chip Processor/Chip Computer CPU Threads vs. Cores
More informationHigh Performance Computing with Accelerators
High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing
More informationThe determination of the correct
SPECIAL High-performance SECTION: H i gh-performance computing computing MARK NOBLE, Mines ParisTech PHILIPPE THIERRY, Intel CEDRIC TAILLANDIER, CGGVeritas (formerly Mines ParisTech) HENRI CALANDRA, Total
More informationEnhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations
Performance Brief Quad-Core Workstation Enhancing Analysis-Based Design with Quad-Core Intel Xeon Processor-Based Workstations With eight cores and up to 80 GFLOPS of peak performance at your fingertips,
More informationAPENet: LQCD clusters a la APE
Overview Hardware/Software Benchmarks Conclusions APENet: LQCD clusters a la APE Concept, Development and Use Roberto Ammendola Istituto Nazionale di Fisica Nucleare, Sezione Roma Tor Vergata Centro Ricerce
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationEfficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling
Iterative Solvers Numerical Results Conclusion and outlook 1/22 Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Part II: GPU Implementation and Scaling on Titan Eike
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationIndustrial finite element analysis: Evolution and current challenges. Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009
Industrial finite element analysis: Evolution and current challenges Keynote presentation at NAFEMS World Congress Crete, Greece June 16-19, 2009 Dr. Chief Numerical Analyst Office of Architecture and
More informationComputing on GPU Clusters
Computing on GPU Clusters Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More information