ACCELERATING CFD AND RESERVOIR SIMULATIONS WITH ALGEBRAIC MULTIGRID
Chris Gottbrath, Nov 2016
AGENDA
- Challenges
- What is Algebraic Multigrid (AMG)?
- Why use AMG?
- When to use AMG?
- NVIDIA AmgX
- Results
CHALLENGES
Computational Fluid Dynamics and Reservoir Simulations
- Large volumes
- Complex geometry
- High velocities
- Turbulence
- Multiple fluids
ACCURACY & PERFORMANCE MATTER
Quality matters:
- Designing critical systems
- Oil field management -- $$$
Time to solution matters:
- Limits size and accuracy
- Limits coverage
WHAT IS AMG?
ALGEBRAIC MULTIGRID
A way to efficiently solve very large linear systems.
System: Ax = b, where A is the system matrix and x, b are state vectors.
Residual: r = b - Ax
Start with a proposed solution vector x; an iterative process reduces r.
The trick is approximate solutions:
- Make a small system based on the big one
- Solve that to get a good approximation
- Project it back to the full system
Fast, powerful, general technique!
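Spelled out as a single coarse-grid correction step (the notation here is mine, not from the slide: R restricts fine-level values to the coarse level, P prolongs coarse-level values back):

    r   = b - A x          (fine-level residual)
    A_c = R A P            (small coarse-level matrix)
    solve A_c e_c = R r    (cheap, because the system is small)
    x  <- x + P e_c        (project the correction back)

In practice the correction is sandwiched between smoothing sweeps, which is exactly the V-cycle on the next slide.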
ALGEBRAIC MULTIGRID: THE V-CYCLE
Full-size system
  -> coarsen + smooth -> medium system
    -> coarsen + smooth -> tiny system
      solve tiny system
    -> prolong + smooth -> medium system
  -> prolong + smooth -> full-size system
Repeat a few times.
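To make the cycle concrete, here is a self-contained toy two-grid cycle in C for the 1D Poisson problem. This is my illustration, not AmgX code: AmgX builds its coarse levels algebraically from the matrix entries, while this toy uses the geometric grid, and all helper names are mine. It should compile with, e.g., cc demo.c -lm.

/* Two-grid V-cycle demo for the 1D Poisson problem -u'' = f, u(0)=u(1)=0,
 * discretized with the [-1, 2, -1]/h^2 stencil on n interior points. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* r = b - A x for the Poisson stencil */
static void residual(int n, const double *x, const double *b, double *r) {
    double h2 = 1.0 / ((double)(n + 1) * (n + 1));
    for (int i = 0; i < n; i++) {
        double l = (i > 0) ? x[i - 1] : 0.0;
        double rt = (i < n - 1) ? x[i + 1] : 0.0;
        r[i] = b[i] - (2.0 * x[i] - l - rt) / h2;
    }
}

/* weighted-Jacobi smoothing sweeps: x <- x + (2/3) D^{-1} (b - A x) */
static void smooth(int n, double *x, const double *b, int sweeps) {
    double h2 = 1.0 / ((double)(n + 1) * (n + 1));
    double *y = malloc(n * sizeof *y);
    for (int s = 0; s < sweeps; s++) {
        for (int i = 0; i < n; i++) {
            double l = (i > 0) ? x[i - 1] : 0.0;
            double rt = (i < n - 1) ? x[i + 1] : 0.0;
            y[i] = x[i] + (2.0 / 3.0) * (0.5 * (h2 * b[i] + l + rt) - x[i]);
        }
        for (int i = 0; i < n; i++) x[i] = y[i];
    }
    free(y);
}

/* exact tridiagonal solve of the coarse system (Thomas algorithm) */
static void coarse_solve(int n, double *x, const double *b) {
    double h2 = 1.0 / ((double)(n + 1) * (n + 1));
    double *c = malloc(n * sizeof *c), *d = malloc(n * sizeof *d);
    c[0] = -0.5;
    d[0] = 0.5 * h2 * b[0];
    for (int i = 1; i < n; i++) {
        double m = 2.0 + c[i - 1];      /* pivot after elimination */
        c[i] = -1.0 / m;
        d[i] = (h2 * b[i] + d[i - 1]) / m;
    }
    x[n - 1] = d[n - 1];
    for (int i = n - 2; i >= 0; i--) x[i] = d[i] - c[i] * x[i + 1];
    free(c); free(d);
}

static double norm(int n, const double *v) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += v[i] * v[i];
    return sqrt(s);
}

int main(void) {
    int n = 127, nc = 63;   /* coarse level keeps every other point */
    double *x = calloc(n, sizeof *x), *b = malloc(n * sizeof *b);
    double *r = malloc(n * sizeof *r), *e = calloc(n, sizeof *e);
    double *rc = malloc(nc * sizeof *rc), *ec = malloc(nc * sizeof *ec);
    for (int i = 0; i < n; i++) b[i] = 1.0;   /* simple right-hand side */

    for (int cycle = 1; cycle <= 3; cycle++) {
        smooth(n, x, b, 2);                     /* pre-smooth */
        residual(n, x, b, r);
        for (int j = 0; j < nc; j++)            /* coarsen: restrict r */
            rc[j] = 0.25 * (r[2 * j] + 2.0 * r[2 * j + 1] + r[2 * j + 2]);
        coarse_solve(nc, ec, rc);               /* solve tiny system */
        for (int j = 0; j < nc; j++)            /* prolong: interpolate */
            e[2 * j + 1] = ec[j];
        for (int j = 0; j <= nc; j++) {
            double lo = (j > 0) ? ec[j - 1] : 0.0;
            double hi = (j < nc) ? ec[j] : 0.0;
            e[2 * j] = 0.5 * (lo + hi);
        }
        for (int i = 0; i < n; i++) x[i] += e[i];
        smooth(n, x, b, 2);                     /* post-smooth */
        residual(n, x, b, r);
        printf("cycle %d: ||r|| = %g\n", cycle, norm(n, r));
    }
    free(x); free(b); free(r); free(e); free(rc); free(ec);
    return 0;
}

With an exact coarse solve and two weighted-Jacobi sweeps before and after the correction, each cycle should cut the residual norm by roughly an order of magnitude, which is the behavior the "repeat a few times" step refers to.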
[Figures: two-stage illustration of coarsening]
Why use AMG?
A powerful solver:
- Handles complex geometries and complex physics
- Huge systems at high resolution
A fast algorithm:
- Each iteration reduces r by 2-10x, so solves converge within 6-20 iterations
  (at 5x per iteration, for example, 10 iterations shrink the residual by a factor of about 10^7)
And it runs really well on NVIDIA GPUs!
NVIDIA AmgX
Unstructured implicit linear systems solved with >15x speedup vs. HYPRE
The AmgX library provides a configurable and scalable GPU-accelerated algebraic multigrid solver for large sparse linear systems.
- CFD and reservoir simulation
- Scales up to hundreds of GPUs
- Rich collection of algorithms
- Accelerates existing simulations
https://developer.nvidia.com/amgx
[Chart: total time-to-solution speedup vs. HYPRE on K40, M40, and P100-SXM2 for Florida Matrix Collection problems, roughly 14x-17x]
Benchmark setup: HYPRE AMG package (http://acts.nersc.gov/hypre) on Intel Xeon E5-2697 v4 @ 2.3GHz, 3.6GHz Turbo, Hyperthreading off; AmgX on K40m, M40, P100 at base clocks; host system: single-socket 16-core Intel Xeon E5-2698 v3 (Haswell) @ 2.3GHz, 3.6GHz Turbo, CentOS 7.2 x86-64, 128GB system memory
ANSYS Mechanical
ANSYS Mechanical 16.0 on Tesla K80: simulation productivity in jobs/day (with HPC Pack); higher is better. The industry-standard CFD code is likewise accelerated with AmgX.
V15sp-4 model: 159 jobs/day on 8 CPU cores vs. 371 jobs/day on 6 CPU cores + K80 GPU (2.3x)
  (turbine geometry, 3,200,000 DOF, SOLID187 FEs, static, nonlinear, Distributed ANSYS 16.0, direct sparse solver)
V15sp-5 model: 135 jobs/day on 8 CPU cores vs. 247 jobs/day on 6 CPU cores + K80 GPU (1.8x)
  (ball grid array geometry, 6,000,000 DOF, static, nonlinear, Distributed ANSYS 16.0, direct sparse solver)
Distributed ANSYS Mechanical 16.0 with an 8-core Ivy Bridge CPU (Xeon E5-2697 v2, 2.7 GHz) and a Tesla K80 GPU.
AmgX in Reservoir Simulation
Application time in seconds; lower is better:
- Custom solver on CPU: 1150 s
- Custom solver on GPU: 197 s
- AmgX on GPU: 98 s
3-phase black oil reservoir simulation, 400K grid blocks, solved fully implicitly.
CPU: Intel Xeon E5-2670; GPU: NVIDIA Tesla K10
Minimal Example With Config

// One header
#include "amgx_c.h"

// Initialize the library
AMGX_initialize();

// Read config file
AMGX_config_create_from_file(&cfg, cfgfile);

// Create resources based on config
AMGX_resources_create_simple(&rsrc, cfg);

// Create solver object and A, x, b; mode selects the precision
AMGX_solver_create(&solver, rsrc, mode, cfg);
AMGX_matrix_create(&A, rsrc, mode);
AMGX_vector_create(&x, rsrc, mode);
AMGX_vector_create(&b, rsrc, mode);

// Read coefficients from a file (matrix, right-hand side, initial guess)
AMGX_read_system(A, b, x, matrixfile);

// Setup and solve
AMGX_solver_setup(solver, A);
AMGX_solver_solve(solver, b, x);

// Download result into a host array
AMGX_vector_download(x, result);

The config file:

solver(main)=fgmres
main:max_iters=100
main:convergence=relative_max
main:tolerance=0.1
main:preconditioner(amg)=amg
amg:algorithm=aggregation
amg:selector=size_8
amg:cycle=v
amg:max_iters=1
amg:max_levels=10
amg:smoother(smoother)=block_jacobi
amg:relaxation_factor=0.75
amg:presweeps=1
amg:postsweeps=2
amg:coarsest_sweeps=4
determinism_flag=1
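Reading the config: the outer solver is FGMRES (at most 100 iterations, relative tolerance 0.1), preconditioned by one aggregation-AMG V-cycle per outer iteration (size-8 aggregates, up to 10 levels) with block-Jacobi smoothing (1 pre-sweep, 2 post-sweeps, 4 sweeps on the coarsest level). A complete program would also tear everything down at the end; a sketch, assuming the usual create/destroy symmetry of the C API:

// Destroy objects in reverse order, then finalize the library
AMGX_solver_destroy(solver);
AMGX_matrix_destroy(A);
AMGX_vector_destroy(x);
AMGX_vector_destroy(b);
AMGX_resources_destroy(rsrc);
AMGX_config_destroy(cfg);
AMGX_finalize();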
AmgX
GPU acceleration for your simulation
- Flexible & powerful technique
- Simple to adopt
- Get results 15x faster with Pascal
- Scales out to handle large systems
http://developer.nvidia.com/amgx
Thanks! Questions? http://developer.nvidia.com/amgx cgottbrath@nvidia.com
Numerical Models
There are a lot of different ways to model:
- Event-based models
- Finite state machines
- Lattice Boltzmann
- Monte Carlo
- N-body / SPH
- PDE-based models (grid: structured or unstructured; method: implicit or explicit)
AmgX is only used for PDE-type models, and it works best with unstructured implicit codes. Other codes can be accelerated using CUDA libraries such as cuRAND, cuBLAS, cuFFT, cuSPARSE, and cuSOLVER.
Second-Order PDEs
Many physical problem domains can be modeled using second-order PDEs, which can be classified by how they behave mathematically:
- Elliptic: smooth within the volume, potentially discontinuous boundary values (e.g., subsonic fluid flow) -- use AmgX
- Parabolic: solutions smooth out over time (e.g., heat transfer) -- use AmgX
- Hyperbolic: discontinuities (shocks) persist (e.g., the wave equation, supersonic fluid flow) -- for pure hyperbolic problems, use other solvers
- Helmholtz problems are currently not well suited to AmgX
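For background (a standard textbook fact, not from the slide), the classification comes from the discriminant of the second-order terms. For a PDE in two independent variables,

    A u_xx + B u_xy + C u_yy + lower-order terms = 0

    B^2 - 4AC < 0  ->  elliptic    (e.g., Laplace equation u_xx + u_yy = 0)
    B^2 - 4AC = 0  ->  parabolic   (e.g., heat equation u_t = u_xx)
    B^2 - 4AC > 0  ->  hyperbolic  (e.g., wave equation u_tt = u_xx)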
KEY FEATURES
- Classical and aggregation AMG
- Robust aggressive-coarsening algorithms
- Krylov methods: CG, GMRES, BiCGStab, IDR
- Smoothers / solvers: Jacobi-L1, block-Jacobi, ILU[0,1,2], DILU, dense LU, KPZ polynomial, Chebyshev
- High-level API with composition through configuration
- MPI support with consolidation
- Multi-precision
- Adaptors for HYPRE, PETSc, Trilinos
- Python bindings