Simulating Shallow Water on GPUs Programming of Heterogeneous Systems in Physics

Martin Pfeiffer (m.pfeiffer@uni-jena.de)
Friedrich Schiller University Jena
06.10.2011

Course Project

Origin: student project for the "Programming with CUDA" course
- Learned CUDA in the first part of the course
- Student project in the second part

Conditions:
- 5 weeks to complete (20 h per week)
- Two-person team
- Almost zero prior experience with C/OpenGL

Initial idea: Hey, let's simulate tsunami waves!

Applications for Shallow Water Equations

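Shallow Water Equation

\[
\frac{\partial}{\partial t}
\begin{pmatrix} h \\ uh \\ vh \end{pmatrix}
+ \frac{\partial}{\partial x}
\begin{pmatrix} uh \\ u^2 h + \tfrac{1}{2} g h^2 \\ uvh \end{pmatrix}
+ \frac{\partial}{\partial y}
\begin{pmatrix} vh \\ uvh \\ v^2 h + \tfrac{1}{2} g h^2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ -g h B_x \\ -g h B_y \end{pmatrix}
\]

The vector under the time derivative is the state variable, the vectors under the x and y derivatives are the fluxes (conservation of mass and momentum), and the right-hand side is the slope term.

- u: water velocity in the x direction (horizontal)
- v: water velocity in the y direction (vertical)
- h: water height
- B: sea bed height
- g: gravitational acceleration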
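The flux and slope vectors can be evaluated per grid point. The following is a minimal CPU-side Python sketch, not the project's CUDA code. The function names `fluxes` and `slope_source`, the default value of `g`, and the dry-cell handling are assumptions made for illustration, and the slope term is written in its usual form, -g h times the bed gradient.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def fluxes(h, uh, vh, g=G):
    """Return the x-flux and y-flux vectors for one state (h, uh, vh)."""
    if h <= 0.0:
        # Assumption: a dry cell transports nothing.
        return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    u = uh / h                      # velocity in x
    v = vh / h                      # velocity in y
    pressure = 0.5 * g * h * h      # hydrostatic pressure term
    flux_x = (uh, u * uh + pressure, u * vh)
    flux_y = (vh, u * vh, v * vh + pressure)
    return flux_x, flux_y


def slope_source(h, bed_dx, bed_dy, g=G):
    """Return the slope term for bed gradients bed_dx = dB/dx, bed_dy = dB/dy."""
    return (0.0, -g * h * bed_dx, -g * h * bed_dy)
```

For water at rest only the pressure entries of the fluxes are non-zero, which is a quick sanity check for any implementation.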

The Lax-Wendroff Method

Grid structure:
- A grid point stores water height (h), momentum (uh, vh) and sea bed height (B)
- A half-step point defines (h, uh, vh)^T at time n + 1/2 between 2 grid points

[Figure: half step, full step]
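The half-step and full-step structure can be sketched in one dimension. This is a simplified Python illustration of the two-step (Richtmyer) form of Lax-Wendroff with a flat bed and periodic boundaries. Those simplifications, and the names `flux` and `lax_wendroff_step`, are assumptions and do not come from the project code, which works on a 2D grid with bed topography.

```python
def flux(h, uh, g):
    """1D shallow water flux for the state (h, uh)."""
    u = uh / h
    return uh, u * uh + 0.5 * g * h * h


def lax_wendroff_step(h, uh, dx, dt, g=9.81):
    """Advance lists h and uh by one time step dt on a periodic 1D grid."""
    n = len(h)
    # Half step: state at time n + 1/2, located between grid points i and i + 1.
    h_half, uh_half = [], []
    for i in range(n):
        j = (i + 1) % n
        f_i = flux(h[i], uh[i], g)
        f_j = flux(h[j], uh[j], g)
        h_half.append(0.5 * (h[i] + h[j]) - 0.5 * dt / dx * (f_j[0] - f_i[0]))
        uh_half.append(0.5 * (uh[i] + uh[j]) - 0.5 * dt / dx * (f_j[1] - f_i[1]))
    # Full step: update each grid point from its two neighbouring half-step points.
    h_new, uh_new = [], []
    for i in range(n):
        f_right = flux(h_half[i], uh_half[i], g)
        f_left = flux(h_half[i - 1], uh_half[i - 1], g)  # index -1 wraps around
        h_new.append(h[i] - dt / dx * (f_right[0] - f_left[0]))
        uh_new.append(uh[i] - dt / dx * (f_right[1] - f_left[1]))
    return h_new, uh_new
```

Because the full step only subtracts flux differences, the total water volume on a periodic grid is conserved up to rounding, which makes a convenient test.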

Implementation Challenges I - Host-Device Bandwidth

The water waves have to be visualized.

Problems:
- Host-device memory copy is slow
- PCI Express 2.0 x16 allows 8 gigabytes/s
- Visualization slows down the graphics device

Solution:
- Don't visualize every step
- Don't copy state variables, only water height and color
- Use OpenGL-compatible data structures
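A rough back-of-the-envelope estimate shows why the amount of copied data matters. The Python sketch below uses the 8 GB/s figure from this slide and the 1654x1654 grid from the performance slide. The 4-byte value size and the assumption that the color fits into one packed 4-byte value are illustrative guesses, not figures from the talk, and the estimate ignores latency and driver overhead.

```python
PCIE_BYTES_PER_S = 8e9   # PCI Express 2.0 x16 peak rate, from the slide
GRID_POINTS = 1654 * 1654  # grid size taken from the performance slide
VALUE_BYTES = 4          # assumption: single-precision values


def copy_ms(values_per_point, points=GRID_POINTS):
    """Ideal transfer time in milliseconds for one copy of the grid."""
    total_bytes = values_per_point * VALUE_BYTES * points
    return total_bytes / PCIE_BYTES_PER_S * 1e3


full_state_ms = copy_ms(4)  # h, uh, vh and B for every grid point
reduced_ms = copy_ms(2)     # assumption: height plus one packed color value
frame_budget_ms = 1000.0 / 24  # 24 FPS, as on the performance slide

print(f"full state: {full_state_ms:.2f} ms, reduced: {reduced_ms:.2f} ms, "
      f"frame budget: {frame_budget_ms:.2f} ms")
```

Under these assumptions the full state costs about 5.5 ms per copy at peak rate, so copying it after every one of 20 wave steps would exceed the frame budget, while a single reduced copy per frame fits easily.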

Computation Cycle

Decoupling Computation-Visualisation

Maximize wave steps per frame:
- More CUDA work
- Less memory transfer
- But fewer graphical updates

Other benefits:
- Slow/fast motion
- FPS cap: less visualization
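The decoupling amounts to a nested loop: several simulation steps, then one render. A minimal Python sketch follows, where `run`, `step` and `render` are hypothetical names standing in for the kernel launches and the OpenGL draw call.

```python
def run(frames, steps_per_frame, dt, step, render):
    """Run `steps_per_frame` simulation steps between two rendered frames."""
    sim_time = 0.0
    for _ in range(frames):
        for _ in range(steps_per_frame):
            step(dt)          # compute only, no data leaves the device
            sim_time += dt
        render()              # one visual update per frame
    return sim_time
```

Changing `steps_per_frame` at a fixed frame rate changes how much simulated time passes per displayed frame, which is one way to obtain the slow or fast motion effect.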

Implementation Challenges II - Memory Access

Every point on the grid accesses the state of its neighbours.

Problems:
- Slow global memory access on older devices
- Memory access dominates computation performance
- GPU is idle while waiting for operands
- Don't update a grid point before all reads of it have finished

Solution:
- Texture memory
- Recalculation is faster than sharing
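The rule about not updating a grid point before it has been read is commonly handled with two buffers that swap roles after every step. The Python sketch below illustrates that idea with a simple three-point average as a stand-in stencil. Whether the project uses exactly this ping-pong scheme is an assumption, and `smooth_step` and `run_steps` are hypothetical names.

```python
def smooth_step(src, dst):
    """Write the three-point average of src into dst (periodic grid)."""
    n = len(src)
    for i in range(n):
        dst[i] = (src[i - 1] + src[i] + src[(i + 1) % n]) / 3.0


def run_steps(values, steps):
    """Apply the stencil `steps` times, reading one buffer and writing the other."""
    src = list(values)
    dst = [0.0] * len(values)
    for _ in range(steps):
        smooth_step(src, dst)
        src, dst = dst, src   # swap roles instead of copying
    return src
```

Calling `smooth_step(a, a)` on a single buffer would let later points read already updated neighbours and give a different result, which is exactly the hazard named on the slide.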

Texture Memory
- Optimized for 2D memory access
- Spatially aware cache
- Read-only for kernels
- Interpolation
- Fast!
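To make the interpolation feature concrete, the following Python sketch emulates on the CPU what a bilinear texture fetch with unnormalised coordinates and clamped addressing computes. The function name `tex2d_linear` is hypothetical, and the sketch uses full floating-point weights, whereas the hardware stores the interpolation weights in a low-precision fixed-point format.

```python
import math


def tex2d_linear(texels, x, y):
    """Emulate a bilinear, clamped 2D texture fetch; texels is a list of rows."""
    height, width = len(texels), len(texels[0])
    xb, yb = x - 0.5, y - 0.5            # texel centres sit at index + 0.5
    i, j = math.floor(xb), math.floor(yb)
    a, b = xb - i, yb - j                # fractional interpolation weights

    def fetch(col, row):
        # Clamp addressing: out-of-range reads return the nearest edge texel.
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        return texels[row][col]

    return ((1 - a) * (1 - b) * fetch(i, j)
            + a * (1 - b) * fetch(i + 1, j)
            + (1 - a) * b * fetch(i, j + 1)
            + a * b * fetch(i + 1, j + 1))
```

Sampling at a texel centre returns that texel unchanged, while sampling between centres blends the four surrounding texels.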

Performance Improvement - Memory

Now the instructions dominate the memory access!

Implementation Challenges III - Divergence

Every thread should do the same work.

Problems:
- Divergent branches within a warp are serialized
- Problem with handling boundary conditions
- Model requires non-negative water height

Solution:
- Compute non-boundary grid points first
- Fix boundary grid points with a separate kernel
- Minimize divergent branch workload
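The split into an interior pass and a boundary pass can be sketched as two functions that mirror the two kernels. This Python version is only an illustration: the four-point average stands in for the real update, the reflective boundary (copying the neighbouring interior value) is an assumed boundary condition, and `update_interior` and `fix_boundary` are hypothetical names.

```python
def update_interior(h, new_h):
    """First pass: every interior point runs the same code, without boundary checks."""
    rows, cols = len(h), len(h[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            avg = 0.25 * (h[r - 1][c] + h[r + 1][c] + h[r][c - 1] + h[r][c + 1])
            # max() keeps the height non-negative without an if/else branch.
            new_h[r][c] = max(avg, 0.0)


def fix_boundary(new_h):
    """Second pass: only edge points, copied from their interior neighbours."""
    rows, cols = len(new_h), len(new_h[0])
    for c in range(1, cols - 1):
        new_h[0][c] = new_h[1][c]
        new_h[rows - 1][c] = new_h[rows - 2][c]
    for r in range(rows):
        new_h[r][0] = new_h[r][1]
        new_h[r][cols - 1] = new_h[r][cols - 2]
```

The boundary pass touches far fewer points than the interior pass, so any branching it needs affects only a small share of the total work.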

Computation performance

GPU            GFlops max   Achieved   % of max   Realtime (1)
Tesla C2050    1030         410 *      39.8       1654x1654
GeForce 330M   182          53 *       29.1       512x512

-use_fast_math switch:
- Reduces register usage
- Higher occupancy
- About 15% more GFlops

* Nvidia's nbody demo achieves 540 GFlops @ Tesla C2050 / 59 GFlops @ GeForce 330M
(1) Grid size @ 24 FPS & 20 wave steps per frame
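The derived numbers can be checked with a few lines of arithmetic. The Python sketch below recomputes the percentage of peak and, from the footnote on grid size, frame rate and steps per frame, the number of grid-point updates per second that "realtime" implies. The dictionary layout and the name `summarize` are illustrative choices.

```python
FPS = 24               # from the footnote
STEPS_PER_FRAME = 20   # from the footnote

results = {
    "Tesla C2050": {"peak": 1030, "achieved": 410, "grid": 1654},
    "GeForce 330M": {"peak": 182, "achieved": 53, "grid": 512},
}


def summarize(entry):
    """Return (% of peak GFlops, grid-point updates per second)."""
    percent = 100.0 * entry["achieved"] / entry["peak"]
    updates_per_s = entry["grid"] ** 2 * FPS * STEPS_PER_FRAME
    return percent, updates_per_s


for name, entry in results.items():
    percent, updates = summarize(entry)
    print(f"{name}: {percent:.1f}% of peak, {updates / 1e6:.0f} million updates/s")
```

For the Tesla C2050 this gives roughly 1.3 billion grid-point updates per second at the realtime grid size.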

Landscape Data and Graphical Output

Data:
- Landscape data from the US National Geophysical Data Center
- Can also be read from image files (ppm)
- Initial waves are read from image files

Graphics:
- 3D graphics & movement
- OpenGL for visualization
- Multi-platform support
- Heavy use of vertex buffer objects
- Same data structures as in CUDA

[Figure: sample landscape image file]
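Reading a height field from a PPM image could look like the Python sketch below. It handles only the ASCII variant (P3) and maps pixel brightness linearly to height. Both choices, as well as the name `read_ppm_heights` and the `max_height` parameter, are assumptions for illustration, since the slides do not say which PPM variant or which mapping the project uses.

```python
def read_ppm_heights(text, max_height=1.0):
    """Parse ASCII PPM (P3) text and return rows of heights in [0, max_height]."""
    # Drop '#' comments, then split everything into whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("only ASCII PPM (P3) is handled in this sketch")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    samples = [int(t) for t in tokens[4:4 + 3 * width * height]]
    rows = []
    for r in range(height):
        row = []
        for c in range(width):
            k = 3 * (r * width + c)                          # start of the RGB triple
            brightness = sum(samples[k:k + 3]) / (3.0 * maxval)
            row.append(brightness * max_height)              # assumed linear mapping
        rows.append(row)
    return rows
```

A black pixel then becomes height 0 and a white pixel becomes `max_height`.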

Cross your fingers! - Demo Time

Questions?

Course URL: theinf2.informatik.uni-jena.de/lectures/programming+with+cuda.html
Code repository: github.com/frty2/cuda Shallow-water-equations

Special thanks to Daniel Kirbst, Jens Mueller, Thomas Baumbach, Prof. Joachim Giesen & Prof. Gerhard Zumbusch.
