GPU-Acceleration of CAE Simulations. Bhushan Desam NVIDIA Corporation


GPU-Acceleration of CAE Simulations. Bhushan Desam, NVIDIA Corporation, bdesam@nvidia.com

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 2

NVIDIA Enterprise Group: Visualization, Accelerated Computing & Virtualization. QUADRO: revolutionizing design & visualization. TESLA: accelerating momentum in HPC and big-data analytics. GRID: enabling end-to-end enterprise virtualization.

TESLA: Accelerating Computing. GPUs enable tremendous breakthroughs by simply enabling us to do more, faster. ANSYS, SIMULIA, and other major ISVs leverage GPUs to accelerate engineering simulation for better design. The world's top 10 energy-efficient supercomputers use NVIDIA GPUs, according to the Green500 list.

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 5

Business Challenges in Product Development: improve product quality, achieve faster time-to-market, and manage product complexity. Simulation plays an important role in meeting these challenges, and thus adds value for many product development companies.

Changing Role of Simulation in Product Development: from insight to product innovation. [Figure: two design-space plots (design variable 1 vs. design variable 2), each marking experiments (E) and simulations (S). In the traditional "building insight" approach, experiments dominate the experience envelope and simulations merely support the final design concept; in simulation-driven product innovation, simulations dominate and extend an innovation envelope beyond the experience envelope.]

Computing Capacity Is Still a Major Challenge. How frequently do compute-infrastructure or turnaround-time limitations force engineers to limit the size/detail of their simulation models? Nearly every model: 34%; for some models: 57%; almost never: 9%. Source: survey by ANSYS with over 1,800 respondents.

Increasing GPU Performance & Memory Bandwidth. [Figure: two charts spanning 2007-2012. Left: peak double-precision floating point (GFLOPS); right: peak memory bandwidth (GB/s, ECC off). NVIDIA GPUs (Tesla M1060, Fermi M2070, Fermi+ M2090, Kepler) pull steadily ahead of x86 CPUs (Nehalem 3 GHz, Westmere 3 GHz, 8-core Sandy Bridge 3 GHz) on both metrics.]

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 10

Basics of GPU Computing. The GPU is an accelerator attached to an x86 CPU, and GPU acceleration is user-transparent: jobs launch and complete without additional user steps. The CPU begins and ends the job; the GPU manages the heavy computations. [Schematic: an x86 CPU with cache and DDR memory connected over PCI-Express to a GPU with GDDR memory.] 1. Job is launched on the CPU. 2. Solver operations are sent to the GPU. 3. The GPU sends results back to the CPU. 4. Job completes on the CPU.

GPU Acceleration of a CAE Application. The CAE application reads the input and matrix and performs GPU set-up on the CPU. The implicit sparse matrix operations, which account for 40%-75% of profile time but a small percentage of the lines of code, run on the GPU via hand-written CUDA kernels, GPU libraries such as CUBLAS, or OpenACC directives. The global solution and output writing remain on the CPU. (OpenACC is being investigated for moving more tasks onto the GPU.)
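Because only the sparse-solver portion of the run is accelerated, Amdahl's law bounds the achievable end-to-end gain. A small illustrative sketch (not from the slides) for the 40%-75% profile-time range quoted above:

```python
def max_speedup(accelerated_fraction):
    """Amdahl's-law upper bound on overall speedup when the accelerated
    fraction of runtime is made arbitrarily fast; the CPU-bound
    remainder limits the job."""
    return 1.0 / (1.0 - accelerated_fraction)

low = max_speedup(0.40)   # ~1.67x bound when 40% of time is on the GPU
high = max_speedup(0.75)  # 4.0x bound when 75% of time is on the GPU
```

This is why the slides emphasize workloads with a high solver fraction: the bound tightens quickly as the un-accelerated share grows.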

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 13

GPU-accelerated CFD Applications

ANSYS Fluent 15.0

GPU Acceleration in Fluent 15.0. In flow problems, the pressure-based coupled solver typically spends 60-70% of Fluent solution time in AMG, whereas the segregated solver spends only 30-40% of its time there (the remainder is non-AMG work on the CPU). Higher AMG fractions are ideal for GPU acceleration, so coupled problems benefit most from GPUs.

Fluent Speed-up from GPU Acceleration. [Figure: overall speed-up factor in Fluent (1.0-3.0) vs. linear solver fraction (0.3-0.9), plotted for AMG speed-ups on GPU of 2.0x, 2.5x, and 3.5x. The coupled solver, with its larger linear solver fraction, sees a considerably higher overall speed-up than the segregated solver.]

ANSYS 15.0 HPC Licenses (new). Each GPU is now treated like a CPU core, which significantly increases the simulation productivity obtained from HPC licenses. All ANSYS HPC products unlock GPUs in 15.0, including HPC, HPC Pack, HPC Workgroup, and HPC Enterprise products.

GPU support in ANSYS licensing:
License type          | ANSYS 14.5                        | ANSYS 15.0
HPC per-core licenses | none                              | 1 license = 1 GPU; 2 licenses = 2 GPUs; N licenses = N GPUs
HPC Pack              | 1 pack = 1 GPU; 2 packs = 4 GPUs  | 1 pack = 4 GPUs; 2 packs = 16 GPUs
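The licensing table can be encoded as a small lookup. This is an illustrative sketch of the values shown on the slide only, not official ANSYS licensing logic:

```python
# GPUs unlocked by HPC Packs, per the table on the slide (illustrative only).
GPUS_PER_HPC_PACK = {
    "14.5": {1: 1, 2: 4},
    "15.0": {1: 4, 2: 16},
}

def gpus_from_per_core_licenses_150(n_licenses):
    # In ANSYS 15.0, N HPC per-core licenses unlock N GPUs.
    return n_licenses

def gpus_from_packs(n_packs, release="15.0"):
    # Look up only the pack counts the table actually lists.
    return GPUS_PER_HPC_PACK[release][n_packs]
```

Keeping the mapping table-driven avoids extrapolating beyond the pack counts the slide documents.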

GPU Acceleration of Water Jacket Analysis. ANSYS Fluent 15.0 performance with the pressure-based coupled solver (lower is better): AMG solver time 4557 s (CPU only) vs. 775 s (CPU + GPU), a 5.9x speed-up; total solution time 6391 s (CPU only) vs. 2520 s (CPU + GPU), a 2.5x speed-up. Water jacket model: unsteady RANS, fluid: water, internal flow. CPU: Intel Xeon E5-2680, 8 cores; GPU: 2x Tesla K40. Note: times are for 20 time steps.

GPU Value Proposition for Fluent 15.0. Formula 1 aerodynamic study (144 million cells, lower is better): 25 s/iteration on CPU only (160 cores) vs. 12 s/iteration on CPU + GPU (32x K40), a 2.1x speed-up. Adding the GPUs and HPC licenses raises solution cost by 55% while adding 110% productivity, roughly 2x the productivity of the CPU-only system. All results are based on turbulent flow over an F1 case (144 million cells) over 1000 iterations; steady-state, pressure-based coupled solver with single precision. CPU: 8 Ivy Bridge nodes with 20 cores each, F-cycle, size 8; GPU: 32x Tesla K40, V-cycle, size 2. CPU-only solution cost is approximated and includes both hardware and paid-up software license costs; benefit/productivity is based on the number of completed Fluent jobs/day.
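The cost/benefit bars reduce to simple arithmetic; a sketch using the percentages on the slide (baseline cost and productivity both normalized to 100%):

```python
def productivity_per_dollar(added_cost_pct, added_benefit_pct):
    """Productivity-per-dollar of the GPU-augmented system relative to
    the CPU-only baseline (baseline cost = benefit = 100%)."""
    cpu = 100.0 / 100.0
    gpu = (100.0 + added_benefit_pct) / (100.0 + added_cost_pct)
    return gpu / cpu

# Fluent F1 case: +55% cost for +110% productivity
ratio = productivity_per_dollar(55.0, 110.0)  # ~1.35x more jobs/day per dollar
```

The same function applies to the Mechanical and Abaqus value-proposition slides later in the deck, with their respective percentages.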

GPU Scaling in Fluent 15.0 (F1 case, lower is better): 120 CPU cores: 33 s/iter vs. 24x K40: 17 s/iter (1.9x, CPU:GPU = 5); 160 CPU cores: 25 s/iter vs. 32x K40: 12 s/iter (2.1x, CPU:GPU = 5); 180 CPU cores: 21 s/iter vs. 30x K40: 12 s/iter (1.8x, CPU:GPU = 6). All results are based on turbulent flow over an F1 case (144 million cells) over 1000 iterations; steady-state, pressure-based coupled solver with single precision. CPU: Ivy Bridge with 10 cores per socket, F-cycle, size 8; GPU: Tesla K40, V-cycle, size 2.

Shorter Time to Solution with GPUs at PSI Inc. (a customer success story). Objective: meeting engineering-services schedules and budgets, along with technical excellence, is imperative for success. HPC solution: PSI evaluates and implements new software (ANSYS 15.0) and hardware (NVIDIA GPU) technology as soon as possible; the GPU produced a 43% reduction in Fluent solution time on an Intel Xeon E5-2687 workstation (8 cores, 64 GB) equipped with an NVIDIA K40 GPU. Design impact: increased simulation throughput allows PSI to meet delivery-time requirements for engineering services. Images courtesy of Parametric Solutions, Inc.

ANSYS Fluent 15.0 Resources. ANSYS solution web page at NVIDIA: http://www.nvidia.com/ansys. GPU user guide for ANSYS Fluent 15.0: http://www.ansys.com/resource+library/technical+briefs/Accelerating+ANSYS+Fluent+15.0+Using+NVIDIA+GPUs. Previously recorded ANSYS IT webcast series webinar, "How to Speed Up ANSYS 15.0 with NVIDIA GPUs", available at http://www.ansys.com.

GPU Acceleration of Computational Fluid Dynamics (CFD) in Industrial Applications using Culises and aerofluidx

Library Culises: Concept and Features. Culises = CUDA Library for Solving Linear Equation Systems (see also www.culises.com).
- State-of-the-art solvers for the solution of linear systems; multi-GPU and multi-node capable; single or double precision
- Krylov subspace methods: CG, BiCGStab, GMRES for symmetric/non-symmetric matrices
- Preconditioning options: Jacobi (diagonal), incomplete Cholesky (IC), incomplete LU (ILU), algebraic multigrid (AMG, see below)
- Stand-alone multigrid method: algebraic aggregation and classical coarsening; a multitude of smoothers (Jacobi, Gauss-Seidel, ILU, etc.)
- Flexible interfaces for arbitrary applications, e.g. the established coupling with simulation tools such as OpenFOAM
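To illustrate the kind of Krylov method Culises provides, here is a minimal Jacobi-preconditioned conjugate gradient in plain Python. This is a textbook sketch, not Culises code; a production solver operates on sparse, GPU-resident matrices rather than dense Python lists:

```python
def matvec(A, x):
    # Dense matrix-vector product (stand-in for a sparse GPU SpMV).
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(A, b, tol=1e-10, maxit=100):
    """Conjugate gradient for SPD A with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual b - A*x with x = 0
    minv = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    z = [minv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x
```

Swapping the Jacobi preconditioner for AMG yields the AMGPCG combination used on the following slides.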

Culises: Auto OEM Model, Multi-GPU Runs. Automotive industrial setup (Japanese OEM). CPU linear solver for pressure: geometric-algebraic multigrid (GAMG) of OpenFOAM; GPU linear solver for pressure: AMG-preconditioned CG (AMGPCG) of Culises; 200 SIMPLE iterations.

Grid cells | CPU cores (Intel E5-2650) | GPUs (NVIDIA K40) | Linear solve time [s] | Total simulation time [s] | Speedup, linear solver | Speedup, total simulation
18M        | 8                         | 1                 | 1779                  | 8407                      | 3.83                   | 1.60
18M        | 8                         | 2                 | 1238                  | 7846                      | 5.50                   | 1.71
18M        | 16                        | 2                 | 1194                  | 4564                      | 2.50                   | 1.39
62M        | 16                        | 2                 | 4170                  | 16337                     | 2.62                   | 1.42
62M        | 32                        | 4                 | 2488                  | 7905                      | 1.90                   | 1.29

Culises: Potential Speedup for the Hybrid Approach. Let f = (CPU time spent in the linear solver) / (total CPU time), so that f is the "solve linear system" share and 1 - f the "assembly of linear system" share; let a_s be the speedup of the linear solver and a_a the speedup of the matrix assembly. The total speedup is

    s = 1 / ((1 - f)/a_a + f/a_s)

With only the linear solver accelerated on the GPU (a_a = 1.0, e.g. a_s = 2.5), the total speedup is limited by the un-accelerated assembly fraction. Note that f(steady-state run) << f(transient run).
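Evaluating the formula above shows the cap numerically. A sketch; the 2.5x solver speedup matches the example value on the slide, while the 60% solver fraction is a representative figure taken from the coupled-solver slide earlier:

```python
def hybrid_speedup(f, a_solver, a_assembly=1.0):
    """Total speedup when a fraction f of CPU time (the linear solve) is
    accelerated by a_solver and the remaining 1-f by a_assembly."""
    return 1.0 / ((1.0 - f) / a_assembly + f / a_solver)

# 60% of time in the linear solver, solver 2.5x faster, assembly on CPU:
s = hybrid_speedup(0.6, 2.5)  # 1.5625 -- well below the solver's 2.5x
```

Only when the assembly is also accelerated (a_assembly > 1, the aerofluidx approach on the next slide) does the total speedup approach the solver speedup.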

aerofluidx: an Extension of the Hybrid Approach. In the hybrid approach, a CPU flow solver (e.g. OpenFOAM) performs preprocessing, discretization, and postprocessing, with only the linear solver offloaded to Culises. aerofluidx also ports the discretization of the equations to the GPU: a finite-volume (FV) discretization module runs on the GPU, with the possibility of direct coupling to Culises. This gives zero overhead from CPU-GPU memory transfer and matrix-format conversion, and makes solving the momentum equations on the GPU beneficial as well. The OpenFOAM environment is supported, enabling a plug-in solution for OpenFOAM customers, but communication with other input/output file formats is also possible.

aerofluidx: NACA0012 Airfoil Flow. CPU: Intel E5-2650 (all 8 cores); GPU: NVIDIA K40; 4M grid cells (unstructured). Running 100 SIMPLE steps with: OpenFOAM (OF): pressure GAMG, velocity Gauss-Seidel; OpenFOAM + Culises (OFC): pressure Culises AMGPCG (1.5x), velocity Gauss-Seidel; aerofluidx + Culises (AFXC): pressure Culises AMGPCG, velocity Culises Jacobi. Total speedup: OF 1x, OFC 1.22x, AFXC 1.82x. [Figure: normalized computing time split into "all assembly" (assembly of all linear systems, pressure and velocity) and "all linear solve" (solution of all linear systems); relative to OpenFOAM, OFC accelerates the linear solve by 1.35x, while AFXC accelerates the assembly by 2.23x and the linear solve by 1.71x.]

aerofluidx Release Planning. Milestones fv0.98, fv1.0 (= aerofluidx V1.0), fv1.2, and fv2.0, spanning 2014-2016, add progressively: steady-state laminar flow on a single GPU (speedup* > 2x); turbulent flow (RANS: k-omega SST, Spalart-Allmaras); multi-GPU, first single-node then multi-node; unsteady flow; basic support for moving geometries (MRF); porous media (speedup* 2-3x); advanced turbulence modelling (LES/DES); and an advanced model for rotating devices (sliding-mesh approach). (* speedup against standard OpenFOAM)

Summary.
Culises: a hybrid approach for accelerated CFD applications (OpenFOAM).
- Generally applicable to industrial cases, including various existing flow models
- Significant speedup (~2x) of the linear solver employing GPUs; moderate speedup (~1.6x) of the total simulation
- Culises V1.1 released: commercial and academic licensing available; free testing & benchmarking opportunities on FluiDyna GPU servers
aerofluidx: a fully ported flow solver on the GPU to harvest the full GPU computing power.
- General applicability requires rewriting a large portion of the existing code
- Steady-state, incompressible, unstructured multigrid flow solver established & validated
- Significant speedup (~2x) of matrix assembly, without full code tuning/optimization, and enhanced speedup for the total simulation

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 32

GPU-accelerated CSM Applications

ANSYS Mechanical 15.0

GPU Acceleration in Mechanical 15.0. Source: "Accelerating Mechanical Solutions with GPUs" by Sheldon Imaoka, ANSYS Advantage, Volume VII, Issue 3, 2013.

ANSYS Mechanical 15.0 on Tesla K40 (V14sp-5 model, higher is better). [Figure: ANSYS Mechanical jobs/day, showing 2.5x and up to 3.8x gains with the GPU.] Model: turbine geometry, 2,100,000 DOF, SOLID187 finite elements, static nonlinear analysis, Distributed ANSYS 15.0, direct sparse solver. Configuration: Distributed ANSYS Mechanical 15.0 with an 8-core Sandy Bridge CPU (Xeon E5-2687W, 3.1 GHz) and a Tesla K40 GPU with boost clocks; V14sp-5 model, turbine geometry, 2.5M DOF, direct sparse solver.

GPU Value Proposition for Mechanical 15.0, with one HPC license + Tesla K20 (V14sp-6 model, higher is better). Simulation productivity: 59 ANSYS Mechanical jobs/day on 2 CPU cores vs. 165 jobs/day on 2 CPU cores + Tesla K20, a 2.8x gain. Adding a GPU and one HPC license raises solution cost by 25% while adding 180% productivity, roughly 7x additional productivity relative to the additional cost. Benchmark: V14sp-6, 4.9M DOF, static nonlinear analysis, direct sparse solver, Distributed ANSYS Mechanical 15.0 with an Intel Xeon E5-2697 v2 2.7 GHz CPU; Tesla K20 GPU. CPU-only solution cost is approximated and includes both hardware and software license costs; benefit is based on the number of completed Mechanical jobs/day.

GPU Value Proposition for Mechanical 15.0, with an HPC Pack + Tesla K20 (V14sp-6 model, higher is better). Simulation productivity: 180 ANSYS Mechanical jobs/day on 8 CPU cores vs. 270 jobs/day on 7 CPU cores + Tesla K20, a 1.5x gain. Adding a GPU raises solution cost by 12% while adding 50% productivity, roughly 4x additional productivity relative to the additional cost. Benchmark: V14sp-6, 4.9M DOF, static nonlinear analysis, direct sparse solver, Distributed ANSYS Mechanical 15.0 with an Intel Xeon E5-2697 v2 2.7 GHz CPU; Tesla K20 GPU. CPU-only solution cost is approximated and includes both hardware and software license costs; benefit is based on the number of completed Mechanical jobs/day.

ANSYS Mechanical 15.0 Success Story: PSI Inc. Configuration: ANSYS 14.5.7, Windows 7 Pro SP1, 5.8 million DOF, 8 cores (Xeon E5-2687), 64 GB RAM, NVIDIA C2075 GPU. Solution time: ~4 hours CPU-only vs. ~2 hours with GPU: the GPU produces a 50% reduction in solution time. Images courtesy of Parametric Solutions, Inc.

ANSYS Mechanical 15.0 Resources. ANSYS solution web page at NVIDIA: http://www.nvidia.com/ansys. ANSYS Advantage magazine article "Accelerating Mechanical Solutions with GPUs" (ANSYS Advantage, Volume VII, Issue 3, 2013), available for download at http://www.ansys.com. Previously recorded ANSYS IT webcast series webinar, "How to Speed Up ANSYS 15.0 with NVIDIA GPUs", available at http://www.ansys.com.

SIMULIA Abaqus with NVIDIA GPUs

Abaqus/Standard GPU Computing.
- Abaqus 6.11 (June 2011): direct sparse solver accelerated on a single GPU
- Abaqus 6.12 (June 2012): multi-GPU per node; multi-node DMP clusters
- Abaqus 6.13 (June 2013): unsymmetric sparse solver on GPU; official Kepler (Tesla K20/K20X) support
- Abaqus 6.14 (July 2014*): direct sparse solver with relaxed GPU memory requirements and improved performance with DMP split; AMS eigensolver with GPUs used in the AMS reduced eigensolution phase (relevant only for models with ~10,000 or more modes)
The AMS eigensolver has three phases: the reduction phase (reduce the structure onto substructure modal subspaces), the reduced eigensolution phase (compute reduced eigenmodes), and the recovery phase (recover full/partial eigenmodes).

Abaqus Performance with GPU (customer: Rolls-Royce). Up to 3.3x faster with NVIDIA GPU (elapsed time, lower is better): 1 DMP (8 cores): 5.1 h CPU-only vs. 1.5 h with GPU (3.3x); 2 DMP split (16 cores): 3.0 h CPU-only vs. 1.0 h with GPU (3.0x). Large model (~77 TFLOPs), 4.71M DOF, nonlinear static, direct sparse solver; Abaqus 6.14-PR2 with an Intel Xeon E5-2690v2 3.0 GHz CPU, 128 GB memory; Tesla K20X GPU.

Abaqus Performance with GPU (customer: Rolls-Royce). Elapsed time (lower is better): 2.4 h on 20 CPU cores vs. 1.0 h on 20 CPU cores + 2x Tesla K20X, a 2.4x speed-up. Adding the GPUs and HPC licenses raises solution cost by 15% while adding 140% productivity, roughly 9x additional productivity per dollar spent on GPUs. Large model (~77 TFLOPs), 4.71M DOF, nonlinear static, direct sparse solver, 2 DMP split; Abaqus 6.14-PR2 with an Intel Xeon E5-2690v2 3.0 GHz CPU, 128 GB memory; Tesla K20X. CPU-only solution cost is approximated and includes both hardware and paid-up software license costs; benefit/productivity is based on the number of completed Abaqus jobs/day.

Symmetric Solver Speed-up with DMP Split. [Figure: speed-up factor relative to 32 cores without GPU, for direct-solver workloads of 43.6, 68.1, 88.7, 155, and 175 TFLOPs. Abaqus 6.13 DMP reaches 1.31x-1.57x, while Abaqus 6.14 DMP split reaches 1.91x, 2.12x, 2.15x, 2.66x, and 2.25x, respectively.] Hardware: 2x HP SL250, each with 2 Intel E5-2660 CPUs (32 cores total), 2 NVIDIA K20m GPUs per compute node, and 128 GB memory per compute node.

Customer-Confidential Auto Model: up to 50% time savings. [Figure: Abaqus/Standard elapsed time with and without GPUs on 2, 3, and 4 nodes; GPU speed-ups of 1.98x, 1.74x, and 1.48x, respectively.] In addition to the time savings, there is a license-cost advantage to running on 2 nodes with GPUs compared to 3 or 4 nodes without GPUs. Configuration: 2 MPI processes per compute node; 11M DOF on Sandy Bridge (16 cores + 256 GB) and two K20 cards; accelerated DMP execution mode (an optional feature in 6.14).

MSC Nastran with NVIDIA GPUs

MSC Nastran GPU Computing.
- The MSC Nastran direct equation solver is GPU-accelerated: sparse direct factorization (MSCLDL, MSCLU); handles very large fronts
- Impacts several solution sequences: high impact (SOL101 static stress, SOL108 direct frequency response), medium (SOL103 modal analysis), low (SOL111 modal frequency response, SOL400 nonlinear static & dynamic)
- Supports NVIDIA multi-GPU configurations: Tesla K20/K20X, Tesla K40, Quadro K6000 (compute & pre/post), Tesla 20-series
- GPU licensing: a separate license feature supports unlimited GPU cores

MSC Nastran GPU Computing Timeline.
- MSC Nastran 2012.1 (2H 2011): real, symmetric sparse direct solver accelerated on the GPU
- MSC Nastran 2012.x (2012): complex and unsymmetric sparse direct solver accelerated on the GPU
- MSC Nastran 2013 & 2013.1 (2013): vastly reduced use of pinned host memory; ability to handle arbitrarily large fronts, enabling very large models (> 15M DOF) on a single GPU with 6 GB of device memory

MSC Nastran 2013.1 SMP, SOL101 and SOL103: 30-70% time savings (higher is better). [Figure: speed-up of 4 CPU cores + K20X over 4 CPU cores alone, ranging from 1.68x to 2.65x across the three cases: SOL101 "Eser", SOL101 "xx0kst0", and SOL103 "piston"; models include a hollow sphere, a turbine blade, and a piston.] Server node: Ivy Bridge E5-2697v2 (2.7 GHz), Tesla K20X GPU, 128 GB memory.

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 51

GPU-accelerated CEM Applications

GPU-acceleration of ANSYS HFSS: an average speed-up of 2.41x and a maximum of 5.21x on a Tesla K20.

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 54

NVIDIA Kepler-family GPUs for CAE simulations: K20 (5 GB), K20X (6 GB), K40 (12 GB), K6000 (12 GB).

MAXIMUS Solution for Workstations. NVIDIA MAXIMUS pairs visual computing (Quadro: CAD operations, pre-processing, post-processing) with parallel computing (Tesla: FEA, CFD, CEM) under a unified Quadro + Tesla driver with intelligent GPU job allocation. ISV application certifications; offered by HP, Dell, Lenovo, and others; now with Kepler-based GPUs; available since November 2011.

AGENDA GPUs in Enterprise Computing Business Challenges in Product Development NVIDIA GPUs for CAE Applications Computational Fluid Dynamics (CFD) Applications Computational Structural Mechanics (CSM) Applications Computational Electro Magnetics (CEM) Applications NVIDIA Solutions Value of GPU-accelerated Simulations 57

Benefits of GPU-Accelerated Simulations: more simulations in the same amount of time, or the same number of simulations in less time. Improve product quality: more design points can be analyzed for better-quality products without slipping project schedules. Faster time-to-market: simulation times can be cut in half, shortening product development. Complex simulations: mesh sizes can be doubled, or advanced models used, without increasing simulation times.

Thank you. Bhushan Desam, bdesam@nvidia.com