ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTI GRID Chris Gottbrath, Nov 2016


AGENDA
- Challenges
- What is Algebraic Multi-Grid (AMG)?
- Why use AMG?
- When to use AMG?
- NVIDIA AmgX
- Results

CHALLENGES
Computational Fluid Dynamics and Reservoir Simulations
- Large volumes
- Complex geometry
- High velocities
- Turbulence
- Multiple fluids

ACCURACY & PERFORMANCE MATTER
Quality matters
- Designing critical systems
- Oil field management: $$$
Time to solution matters
- Limits size and accuracy
- Limits coverage

WHAT IS AMG: ALGEBRAIC MULTI GRID
A way to efficiently solve very large systems
- System: Ax - b = 0
  - A = matrix (the system)
  - x, b = vectors (the state)
- Residual function: r = b - Ax
- Start with a proposed solution vector x
- The iterative process reduces r
Trick: approximate solutions
- Make a small system based on the big one
- Solve that to get a good approximation
- Project back to the full system
Fast, powerful, general technique!

ALGEBRAIC MULTI GRID: THE V CYCLE
- Down (coarsen + smooth): Full Size System -> Medium System -> Tiny System
- Solve the Tiny System
- Up (prolong + smooth): Tiny System -> Medium System -> Full Size System
- Repeat a few times
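The shape of the V cycle can be shown with a small recursive sketch in C. To stay short and self-contained it makes a substitution that should be read loudly: it uses geometric coarsening on a 1D Poisson problem (-u'' = f, zero boundary values, n = 2^k - 1 interior points), whereas AMG builds its coarse levels from the matrix entries alone. The function names, the Gauss-Seidel smoother, and the full-weighting and linear-interpolation transfer operators are all assumptions chosen for illustration, not what AmgX does internally.

```c
#include <math.h>
#include <stdlib.h>

/* Gauss-Seidel sweeps for (2u[i] - u[i-1] - u[i+1]) / h^2 = f[i]. */
static void smooth(double *u, const double *f, int n, double h, int sweeps)
{
    for (int s = 0; s < sweeps; ++s)
        for (int i = 1; i <= n; ++i)
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
}

/* r = f - A u on the interior points; returns the max-norm of r. */
double residual(const double *u, const double *f, double *r, int n, double h)
{
    double rmax = 0.0;
    r[0] = r[n + 1] = 0.0;
    for (int i = 1; i <= n; ++i) {
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h);
        if (fabs(r[i]) > rmax)
            rmax = fabs(r[i]);
    }
    return rmax;
}

/* One V cycle. Arrays hold n + 2 entries; u[0] and u[n+1] are the
   (zero) boundary values. Requires n = 2^k - 1. */
void v_cycle(double *u, const double *f, int n)
{
    double h = 1.0 / (n + 1);
    if (n == 1) {                        /* tiny system: solve exactly */
        u[1] = 0.5 * h * h * f[1];
        return;
    }
    int nc = (n - 1) / 2;                /* coarse interior points */
    double *r  = calloc((size_t)n + 2, sizeof *r);
    double *rc = calloc((size_t)nc + 2, sizeof *rc);
    double *ec = calloc((size_t)nc + 2, sizeof *ec);
    if (r && rc && ec) {
        smooth(u, f, n, h, 1);           /* pre-smooth */
        residual(u, f, r, n, h);
        for (int j = 1; j <= nc; ++j)    /* coarsen: full weighting */
            rc[j] = 0.25 * (r[2 * j - 1] + 2.0 * r[2 * j] + r[2 * j + 1]);
        v_cycle(ec, rc, nc);             /* coarse error equation A e = r */
        for (int j = 0; j <= nc; ++j) {  /* prolong: linear interpolation */
            u[2 * j]     += ec[j];
            u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1]);
        }
        smooth(u, f, n, h, 1);           /* post-smooth */
    }
    free(r);
    free(rc);
    free(ec);
}
```

Calling `v_cycle` repeatedly on the same `u` is the "repeat a few times" step: each call smooths on the way down, solves the one-point system exactly, and corrects on the way back up.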

Coarsening

Why use AMG?
A powerful solver that handles
- Complex geometries
- Complex physics
- Huge systems
- High resolution
A fast algorithm
- Each iteration reduces r by 2-10x
- Converges within 6-20 iterations
It runs really well on NVIDIA GPUs!
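The two figures on this slide are consistent with each other, which a short worked estimate makes visible. The reduction factor \rho per iteration comes from the slide. The target of a 10^{-6} relative residual reduction is an assumption made here for illustration, since the slide does not state a tolerance.

```latex
% If each iteration divides the residual norm by a factor \rho, then after k iterations
\[
  \|r_k\| \le \rho^{-k}\,\|r_0\| ,
\]
% so reaching a relative reduction \varepsilon requires
\[
  k \ge \frac{\ln(1/\varepsilon)}{\ln \rho} .
\]
% With the assumed target \varepsilon = 10^{-6}:
\[
  \rho = 10:\quad k \ge \frac{\ln 10^{6}}{\ln 10} = 6 ,
  \qquad
  \rho = 2:\quad k \ge \frac{\ln 10^{6}}{\ln 2} \approx 19.9
  \;\Rightarrow\; k = 20 .
\]
```

Under that assumed tolerance, a 2-10x reduction per iteration corresponds to roughly 20 down to 6 iterations, matching the range quoted above.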

NVIDIA AmgX
Unstructured Implicit Linear Systems - Solved
> 15x Speedup vs HYPRE
The AmgX library provides a configurable and scalable GPU-accelerated algebraic multi-grid solver for large sparse linear systems.
- CFD and reservoir simulation
- Scales up to hundreds of GPUs
- Rich collection of algorithms
- Accelerates existing simulations
https://developer.nvidia.com/amgx
[Chart: Speedup vs. HYPRE for AmgX on K40, M40, and P100-SXM2; vertical axis 0x to 18x; data labels in the transcript: 14x, 17x, 16x, 15x, 14x, 15x]
Benchmark details
- Florida Matrix Collection; total time to solution
- HYPRE AMG package (http://acts.nersc.gov/hypre) on Intel Xeon E5-2697 v4 @ 2.3 GHz, 3.6 GHz Turbo, Hyperthreading off
- AmgX on K40m, M40, P100; base clocks
- Host system: Intel Xeon Haswell single-socket 16-core E5-2698 v3 @ 2.3 GHz, 3.6 GHz Turbo
- CentOS 7.2 x86-64 with 128 GB system memory

THE Industry Standard CFD is accelerated with AmgX
ANSYS Mechanical 16.0 on Tesla K80
Simulation productivity (with HPC Pack), in ANSYS Mechanical jobs/day. Higher is better.

Model     Geometry          DOF         8 CPU cores   6 CPU cores + K80 GPU   Speedup
V15sp-4   Turbine           3,200,000   159           371                     2.3x
V15sp-5   Ball Grid Array   6,000,000   135           247                     1.8x

- V15sp-4: SOLID187 FEs; static, nonlinear; Distributed ANSYS 16.0; direct sparse solver
- V15sp-5: static, nonlinear; Distributed ANSYS 16.0; direct sparse solver
- Distributed ANSYS Mechanical 16.0 with Ivy Bridge (Xeon E5-2697 V2 2.7 GHz) 8-core CPU and a Tesla K80 GPU

AmgX in Reservoir Simulation
[Chart: Application time (seconds), lower is better; vertical axis 0 to 1500. Bar labels, in the order shown: CPU, GPU Custom, AmgX. Values, in the order shown: 1150, 197, 98.]
- 3-phase Black Oil Reservoir Simulation
- 400K grid blocks solved fully implicitly
- CPU: Intel Xeon CPU E5-2670
- GPU: NVIDIA Tesla K10

Minimal Example With Config

Code (handles cfg, res, solver, A, x, b, the mode, and the host buffer result are declared elsewhere):

    // One header
    #include <amgx_c.h>

    // Initialize the library, then read the config file
    AMGX_initialize();
    AMGX_config_create_from_file(&cfg, cfgfile);
    // Create resources based on config
    AMGX_resources_create_simple(&res, cfg);
    // Create solver object, A, x, b; mode sets the precision
    AMGX_solver_create(&solver, res, mode, cfg);
    AMGX_matrix_create(&A, res, mode);
    AMGX_vector_create(&x, res, mode);
    AMGX_vector_create(&b, res, mode);
    // Read coefficients from a file (arguments: matrix, right-hand side, solution)
    AMGX_read_system(A, b, x, matrixfile);
    // Setup and solve loop
    AMGX_solver_setup(solver, A);
    AMGX_solver_solve(solver, b, x);
    // Download result into the host buffer
    AMGX_vector_download(x, result);

Config file:

    solver(main)=fgmres
    main:max_iters=100
    main:convergence=relative_max
    main:tolerance=0.1
    main:preconditioner(amg)=amg
    amg:algorithm=aggregation
    amg:selector=size_8
    amg:cycle=v
    amg:max_iters=1
    amg:max_levels=10
    amg:smoother(smoother)=block_jacobi
    amg:relaxation_factor=0.75
    amg:presweeps=1
    amg:postsweeps=2
    amg:coarsest_sweeps=4
    determinism_flag=1

AmgX
GPU acceleration for your simulation
- Flexible and powerful technique
- Simple to adopt
- Get results 15x faster with Pascal
- Scales out to handle large systems
http://developer.nvidia.com/amgx

Thanks! Questions?
http://developer.nvidia.com/amgx
cgottbrath@nvidia.com

Numerical Models
There are many different ways to model a system.
- AmgX is only used for PDE-type models.
- It works best with unstructured implicit codes.
- Other codes can be accelerated using CUDA libraries such as cuRAND, cuBLAS, cuFFT, cuSPARSE, and cuSOLVER.
Model types
- Event-based models
- Lattice Boltzmann
- Monte Carlo
- Finite state machine
- N-body / SPH
- PDE-based models
  - Grid: structured or unstructured
  - Method: implicit or explicit

Second Order PDEs
Many physical problem domains can be modeled using second-order PDEs.
These can be classified by how they behave mathematically:
- Hyperbolic: discontinuities (shocks) will persist. Examples: wave equation, supersonic fluid flow.
- Parabolic: solutions smooth out over time. Example: heat transfer.
- Elliptic: smooth within the volume, potentially discontinuous boundary values. Example: subsonic fluid flow.
Solver guidance across the range from hyperbolic through parabolic to elliptic:
- Pure hyperbolic: use other solvers
- Elliptic: use AmgX
- Helmholtz problems are currently not well suited
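The three classes can be told apart mechanically for the standard two-variable form a*u_xx + b*u_xy + c*u_yy + (lower-order terms) = 0, using the sign of the discriminant b^2 - 4ac. The sketch below assumes constant coefficients and that standard form. The enum and function names are hypothetical, introduced only for this illustration.

```c
/* Classes of second-order linear PDEs in two variables. */
typedef enum { PDE_ELLIPTIC, PDE_PARABOLIC, PDE_HYPERBOLIC } pde_class;

/* Classify a*u_xx + b*u_xy + c*u_yy + (lower-order terms) = 0
   by the sign of the discriminant b^2 - 4ac:
   negative -> elliptic, zero -> parabolic, positive -> hyperbolic. */
pde_class classify_pde(double a, double b, double c)
{
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0)
        return PDE_ELLIPTIC;
    if (disc > 0.0)
        return PDE_HYPERBOLIC;
    return PDE_PARABOLIC;
}
```

For example, the Laplace equation u_xx + u_yy = 0 gives (a, b, c) = (1, 0, 1) and is elliptic. The heat equation u_t = u_xx has only one second-derivative term, (1, 0, 0), and is parabolic. The wave equation u_tt - u_xx = 0 gives (1, 0, -1) and is hyperbolic.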

KEY FEATURES
- Classical and Aggregation AMG
- Robust aggressive coarsening algorithms
- Krylov methods: CG, GMRES, BiCGStab, IDR
- Smoothers / solvers: Jacobi-L1, Block-Jacobi, ILU[0,1,2], DILU, Dense LU, KPZ-Polynomial, Chebyshev
- High-level API with composition through configuration
- MPI support with consolidation
- Multi-precision
- Adaptors for HYPRE, PETSc, Trilinos
- Python bindings