Evacuate Now? Faster-than-real-time Shallow Water Simulations on GPUs NVIDIA GPU Technology Conference San Jose, California, 2010 André R. Brodtkorb

Talk Outline Learn how to simulate a half-hour dam break in 27 seconds Introduction Why Shallow Water Simulations? The Shallow Water Equations Numerical scheme Our contribution Simulator Implementation Results including screen capture video Live Demo on a standard laptop Summary 2

The Shallow Water Equations First described by de Saint-Venant (1797-1886) Gravity-induced fluid motion 2D free surface Negligible vertical acceleration Wave length much larger than depth Conservation of mass and momentum Not only for water: Atmospheric flow Avalanches... Water image from http://freephoto.com / Ian Britton 3

Target application areas Tsunamis: 2004 Indian Ocean (230,000) Floods: 2010 Pakistan (2,000+) Storm surges: 2005 Hurricane Katrina (1,836) Dam breaks: 1959 Malpasset (423) Images from wikipedia.org 4

Mathematical Formulation Vector of Conserved variables Flux Functions Bed slope source term Bed friction source term 5
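
The formulas on this slide survive only as images in the transcription; for reference, the two-dimensional shallow water (Saint-Venant) system with bed slope and bed friction source terms is commonly written as (notation assumed, not copied from the slide):

    \partial_t [ h, hu, hv ]^T
      + \partial_x [ hu, hu^2 + (1/2) g h^2, huv ]^T
      + \partial_y [ hv, huv, hv^2 + (1/2) g h^2 ]^T
      = H_B + H_f,
    H_B = [ 0, -g h \partial B/\partial x, -g h \partial B/\partial y ]^T,

with Q = [h, hu, hv]^T the vector of conserved variables (water depth and discharges), the x- and y-terms the two flux functions, B the bottom topography, g the gravitational acceleration, and H_f the bed friction source term (e.g. a Manning-type friction law).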

The Shallow Water Equations Conserved variables: water depth (h), discharge in the x-direction (hu), and discharge in the y-direction (hv) 6

Explicit Numerical Schemes Hyperbolic partial differential equation Enables explicit schemes Accurate modeling of discontinuities / shocks High accuracy in smooth parts without oscillations near discontinuities Capable of representing dry states Negative water depths ruin simulations Images from wikipedia.org, James Kilfiger 7

Explicit Numerical Schemes Additional wanted properties: Second order accurate fluxes Total variation diminishing Well balancedness 8

Explicit Numerical Schemes Additional wanted properties: Second order accurate fluxes Total variation diminishing Well balancedness Scheme of choice: A. Kurganov and G. Petrova, A Second-Order Well-Balanced Positivity Preserving Central-Upwind Scheme for the Saint-Venant System Communications in Mathematical Sciences, 5 (2007), 133-160 9

Kurganov-Petrova Spatial discretization Rewrite in terms of w = h + B Write in vector form Impose finite-volume grid 10
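
Written out (an assumption based on the standard Kurganov-Petrova formulation, since the slide's formulas are not preserved in the transcription), the rewrite replaces the water depth h by the water surface elevation:

    w = h + B,    Q = [ w, hu, hv ]^T,

so that the lake-at-rest steady state (constant w, zero discharge) can be preserved exactly by the discrete scheme, i.e. the well-balancedness listed among the wanted properties.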

Kurganov-Petrova Finite Volume Grid Q defined as cell averages B defined as piecewise bilinear F and G calculated across cell interfaces Source terms, H, calculated as cell averages 11

Kurganov-Petrova Flux calculations Continuous variables Discrete variables Slope reconstruction Flux calculation Integration points Dry states fix 12

Kurganov-Petrova Temporal discretization Gather all explicit terms One ordinary differential equation in time per cell 13
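
In symbols (notation assumed), gathering the fluxes F, G and the source terms H for each cell (i, j) gives the semi-discrete form

    dQ_ij/dt = H_ij - ( F_{i+1/2,j} - F_{i-1/2,j} ) / \Delta x - ( G_{i,j+1/2} - G_{i,j-1/2} ) / \Delta y =: R_ij(Q),

one ordinary differential equation in time per cell, as stated above.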

Kurganov-Petrova Temporal discretization Discretize using second order Runge-Kutta Total variation diminishing Semi-implicit friction source term Discretize in time 14
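
The slide only names the scheme; the standard second-order total variation diminishing Runge-Kutta step it refers to reads

    Q^* = Q^n + \Delta t R(Q^n),
    Q^{n+1} = (1/2) Q^n + (1/2) ( Q^* + \Delta t R(Q^*) ),

with the bed friction source term folded into each substep semi-implicitly (the exact factor is not preserved in the transcription).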

Kurganov-Petrova CFL condition Explicit scheme, time step restriction: Time step size restricted by a Courant-Friedrichs-Lewy condition The numerical domain of dependence must include the domain of dependence of the equation Each wave is allowed to travel at most one quarter grid cell per time step [Figure: space-time diagram of the mathematical propagation speed, showing stable vs. unstable time steps] 15
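
With the quarter-cell rule above and gravity wave speeds u ± \sqrt{gh} and v ± \sqrt{gh}, the time step restriction takes the form (a sketch, not copied from the slide)

    \Delta t <= (1/4) min_ij ( \Delta x / max|u ± \sqrt{gh}| , \Delta y / max|v ± \sqrt{gh}| ).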

Kurganov-Petrova Simulation cycle 1. Calculate fluxes 2. Calculate Δt 3. Halfstep 4. Calculate fluxes 5. Evolve in time 6. Boundary conditions 16

Implementation GPU code Four CUDA kernels (fraction of runtime): Flux 87% Timestep size (CFL condition) <1% Forward Euler step 12% Set boundary conditions <1% 17
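
To make the division of labour concrete, the host side of one simulation cycle might look roughly as follows; kernel names, signatures, and the CFL helper are hypothetical illustrations, not the actual simulator code.

#include <cuda_runtime.h>
#include <math.h>

// Hypothetical kernels, defined elsewhere (names and signatures illustrative only).
__global__ void fluxKernel(const float* Q, const float* B, float* R, float* waveSpeed);
__global__ void rkKernel(float* Qout, const float* Qn, const float* Qk,
                         const float* R, float dt, float alpha, int nCells);
__global__ void boundaryKernel(float* Q);
float reduceMaxWaveSpeed(const float* waveSpeed, int nBlocks);  // host wrapper around a GPU reduction

// One second-order step: flux -> dt -> halfstep -> flux -> evolve -> boundaries.
float simulationStep(float* Q, float* Qstar, float* R, const float* B, float* waveSpeed,
                     dim3 grid, dim3 block, int nCells, float dx, float dy)
{
    fluxKernel<<<grid, block>>>(Q, B, R, waveSpeed);                  // ~87% of the runtime
    float maxSpeed = reduceMaxWaveSpeed(waveSpeed, grid.x * grid.y);  // <1%: CFL reduction
    float dt = 0.25f * fminf(dx, dy) / maxSpeed;                      // simplified quarter-cell CFL rule
    rkKernel<<<grid, block>>>(Qstar, Q, Q, R, dt, 0.0f, nCells);      // Q* = Q + dt*R(Q)
    fluxKernel<<<grid, block>>>(Qstar, B, R, waveSpeed);
    rkKernel<<<grid, block>>>(Q, Q, Qstar, R, dt, 0.5f, nCells);      // Q = 0.5*Q + 0.5*(Q* + dt*R(Q*))
    boundaryKernel<<<1, 512>>>(Q);                                    // one block sets all four boundaries
    return dt;
}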

Flux kernel Domain decomposition A nine-point nonlinear stencil Comprised of simpler stencils Heavy use of shmem Computationally demanding Traditional block decomposition Overlapping ghost cells (a.k.a. apron) Global ghost cells for boundary conditions Domain padding 18
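
A minimal sketch of this block decomposition with a two-cell apron loaded cooperatively into shared memory (block size taken from the next slide; buffer names, the single w buffer, and the clamping at the domain edge are assumptions for illustration, while the real kernel keeps many such buffers):

#define BLOCK_W 16
#define BLOCK_H 14

// Load a (BLOCK_W+4) x (BLOCK_H+4) tile: the block's own cells plus two rows/
// columns of overlapping ghost cells (the apron) on every side.
__global__ void fluxKernelSketch(const float* __restrict__ w, float* __restrict__ out,
                                 int nx, int ny)
{
    __shared__ float tile[BLOCK_H + 4][BLOCK_W + 4];

    const int bx = blockIdx.x * BLOCK_W;
    const int by = blockIdx.y * BLOCK_H;

    // Cooperative load: each thread fetches one or more tile entries.
    for (int j = threadIdx.y; j < BLOCK_H + 4; j += BLOCK_H) {
        for (int i = threadIdx.x; i < BLOCK_W + 4; i += BLOCK_W) {
            int gi = min(max(bx + i - 2, 0), nx - 1);  // clamp stands in for global ghost cells
            int gj = min(max(by + j - 2, 0), ny - 1);
            tile[j][i] = w[gj * nx + gi];
        }
    }
    __syncthreads();

    // Each thread now owns tile entry (ty, tx) and can evaluate its part of the
    // nine-point stencil purely from shared memory; a placeholder x-difference here.
    const int tx = threadIdx.x + 2, ty = threadIdx.y + 2;
    const int gx = bx + threadIdx.x, gy = by + threadIdx.y;
    if (gx < nx && gy < ny) {
        out[gy * nx + gx] = tile[ty][tx + 1] - tile[ty][tx - 1];
    }
}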

Flux kernel Block size Block size is 16x14 Warp size: multiple of 32 Shared memory use: 16 shmem buffers use ~16 KB Occupancy: use 48 KB shared mem, 16 KB cache Three resident blocks Trades cache for occupancy [Figure: Fermi cache hierarchy and global memory access] 19

Flux kernel - computations Input Slopes Integration points Flux calculations Flux across north and east interfaces Bed slope source term for the cell Collective stencil operations: n threads and n+1 interfaces, so one warp performs extra calculations! Alternative is one thread per stencil operation: many idle threads, and extra register pressure 20

Flux kernel flux limiter Limits the fluxes to obtain a non-oscillatory solution Generalized minmod limiter Least steep slope, or zero if signs differ Creates divergent code paths Use branchless implementation (2007) Requires special sign function Much faster than naïve approach

float minmod(float a, float b, float c) {
    return 0.25f
        * sign(a)
        * (sign(a) + sign(b))
        * (sign(b) + sign(c))
        * min(min(abs(a), abs(b)), abs(c));
}

(2007) T. Hagen, M. Henriksen, J. Hjelmervik, and K.-A. Lie. How to solve systems of conservation laws numerically using the graphics processor as a high-performance computational engine. Geometrical Modeling, Numerical Simulation, and Optimization: Industrial Mathematics at SINTEF, pp. 211-264. Springer Verlag, 2007. 21
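
The branchless minmod above relies on a sign function returning exactly -1.0f, 0.0f, or +1.0f; one common way to write such a helper (an assumption, not necessarily the exact function used in the talk) is:

__device__ float sign(float x)
{
    // The comparisons compile to predicated selects rather than branches, so the
    // generalized minmod limiter introduces no divergent code paths.
    return (float)((x > 0.0f) - (x < 0.0f));
}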

Timestep size kernel Flux kernel calculates wave speed per cell Find global maximum Calculate timestep using the CFL condition Parallel reduction: Modeled on the CUDA SDK sample Template code Fully coalesced reads Without bank conflicts Optimization: Perform partial reduction in flux kernel Reduces memory and bandwidth by a factor of 192 (each 16x14 block reduced to a single value) Image from Optimizing Parallel Reduction in CUDA, Mark Harris 22
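
A sketch of a per-block max-reduction in the spirit of the CUDA SDK sample referred to here (template parameter, buffer names, and the host-side CFL formula are assumptions):

template <unsigned int BLOCK_SIZE>
__global__ void maxWaveSpeedKernel(const float* __restrict__ in, float* __restrict__ out,
                                   unsigned int n)
{
    __shared__ float sdata[BLOCK_SIZE];
    const unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (BLOCK_SIZE * 2) + tid;
    const unsigned int gridSize = BLOCK_SIZE * 2 * gridDim.x;

    // Grid-stride loop: fully coalesced reads, two elements per thread per pass.
    float m = 0.0f;  // wave speeds are non-negative
    while (i < n) {
        m = fmaxf(m, in[i]);
        if (i + BLOCK_SIZE < n) m = fmaxf(m, in[i + BLOCK_SIZE]);
        i += gridSize;
    }
    sdata[tid] = m;
    __syncthreads();

    // Tree reduction in shared memory, sequential addressing (no bank conflicts).
    for (unsigned int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Host side (assumed): reduce the per-block maxima once more, then apply the CFL
// condition, e.g. dt = 0.25f * fminf(dx / maxSpeedX, dy / maxSpeedY);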

Time integration kernel Computes Q* or Q^(n+1) Solves the time ODE per cell Trivial to implement Fully coalesced memory access Memory bound 23
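
A minimal sketch of such a kernel (array layout and names are assumptions): one thread per value, fully coalesced, and covering both Runge-Kutta substeps through a blending weight.

// Computes Qout = alpha*Qn + (1-alpha)*(Qk + dt*R): alpha = 0 gives the forward
// Euler half step Q* = Q^n + dt*R(Q^n), alpha = 0.5 gives the final TVD-RK2 step.
__global__ void rkKernel(float* __restrict__ Qout,
                         const float* __restrict__ Qn,
                         const float* __restrict__ Qk,
                         const float* __restrict__ R,
                         float dt, float alpha, int nCells)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nCells) {
        Qout[i] = alpha * Qn[i] + (1.0f - alpha) * (Qk[i] + dt * R[i]);
    }
}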

Boundary conditions kernel Global boundary uses ghost cells Fixed inlet / outlet discharge Fixed depth Reflecting Outflow/Absorbing Currently no mixed boundaries Can also supply hydrograph: Tsunamis Storm surges Tidal waves [Figures: global boundary with local ghost cells; example hydrographs: 3.5 m tsunami over 1 h, 10 m storm surge over 4 d] 24

Boundary conditions kernel Similar to CUDA SDK reduction sample, using templates: One block sets all four boundaries Boundary length (>64, >128, >256, >512) Boundary type (none, reflecting, fixed depth, fixed discharge, absorbing outlet) In total: 4*5*5*5*5 = 2500 realizations

switch(block.x) {
    case 512: BCKernelLauncher<512, N, S, E, W>(grid, block, stream); break;
    case 256: BCKernelLauncher<256, N, S, E, W>(grid, block, stream); break;
    case 128: BCKernelLauncher<128, N, S, E, W>(grid, block, stream); break;
    case  64: BCKernelLauncher< 64, N, S, E, W>(grid, block, stream); break;
}

25

Optimization: Early exit Observation: Many dry areas do not require computation Use a small buffer to store wet blocks Exit flux kernel if the block and its nearest neighbours are dry Up to 6x speedup Blocks still have to be scheduled Blocks read the auxiliary buffer One wet cell marks the whole block as wet 26
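
A sketch of how such an early-exit test might look at the top of the flux kernel (the wet-block map, its layout, and the helper are assumptions):

// One entry per block in a small auxiliary buffer: nonzero means at least one
// wet cell was found in that block during the previous step.
__device__ bool neighbourhoodIsWet(const int* __restrict__ wetMap, int nbx, int nby)
{
    const int bx = blockIdx.x, by = blockIdx.y;
    bool wet = wetMap[by * nbx + bx] != 0;
    if (bx > 0)       wet = wet || wetMap[by * nbx + (bx - 1)] != 0;
    if (bx < nbx - 1) wet = wet || wetMap[by * nbx + (bx + 1)] != 0;
    if (by > 0)       wet = wet || wetMap[(by - 1) * nbx + bx] != 0;
    if (by < nby - 1) wet = wet || wetMap[(by + 1) * nbx + bx] != 0;
    return wet;
}

// At the top of the flux kernel:
//     if (!neighbourhoodIsWet(wetMap, nbx, nby)) return;  // whole block exits early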

Results - Performance Circular Dam break 1st order Euler 30% wet cells: 1200 megacells / s 50% wet cells: 900 megacells / s 100% wet cells: 300 megacells / s 2nd order Runge-Kutta 30% wet cells: 600 megacells / s 50% wet cells: 450 megacells / s 100% wet cells: 150 megacells / s 27

Results Multiple GPUs Single-node multi-GPU Four Tesla GPUs Threading Near-perfect weak scaling Near-perfect strong scaling Up to 380 million cells (16 GB) 19 000 x 19 000 cells 28

Verification 2D Parabolic basin Planar water surface oscillates 100 x 100 cells Horizontal scale: 8 km Vertical scale: 3.3 m Simulation and analytical solution match well But, as with most schemes, growing errors along the wet-dry interface 29

Validation Barrage de Malpasset South-east France near Fréjus Burst at 21:13, December 2nd 1959 40-meter-high wall of water at 70 km/h (43 mi/h) Reached the Mediterranean in 30 minutes 423 casualties, $68 million in damages Double-curvature dam: 66.5 m high, 220 m crest length, 55 million cubic metres of water Images from Google maps, TeraMetrics 30

Validation Experimental data from 1:400 model 482 000 cells 1100 x 440 bathymetry values 15 meter resolution Accurately predicts maximum elevation and front arrival time Largest discrepancy at gauges 14 (arrival time) and 9 (elevation) Compares well with published results 31

Implementation CPU framework Simulation loop executed by the CPU Output to NetCDF Direct visualization via OpenGL 32

Video: http://www.youtube.com/watch?v=fbzbr-fjrwy 33

Live Demo Dell XPS m1330, Flamingo Pink Purchased 09-2008, price ~$1850 Intel Core 2 Duo T9300 @ 2.5 GHz 4.0 GB RAM NVIDIA GeForce 8400M GS, 128 MB graphics RAM Only 16 CUDA cores (GTX 480 has 480) Windows Vista Ultimate SP2 32-bit CUDA toolkit/SDK 3.1 32-bit CUDA driver 257.21 Microsoft Visual Studio 2008 Images from dell.com 34

Summary Learn how to simulate a half-hour dam break in seconds Faster than real-time performance: 150-1200 megacells per second Verified and validated results Can accurately predict real-world events using single precision Direct visualization Interactive exploration of simulation results 35

References
A. R. Brodtkorb, T. R. Hagen, K.-A. Lie and J. R. Natvig, Simulation and Visualization of the Saint-Venant System using GPUs, Computing and Visualization in Science, special issue on Hot Topics in Computational Engineering, 2010 [forthcoming].
A. R. Brodtkorb, M. L. Sætra, and M. Altinakar, Efficient Shallow Water Simulations on GPUs: Implementation, Visualization, Verification, and Validation, in review, 2010.
A. R. Brodtkorb, Scientific Computing on Heterogeneous Architectures, Ph.D. thesis, University of Oslo, submitted, 2010. 36

Thank you for your attention. Questions? http://babrodtk.at.ifi.uio.no http://hetcomp.com Andre.Brodtkorb@sintef.no 37