Computational Fluid Dynamics (CFD) using Graphics Processing Units

Similar documents
Case Study - Computational Fluid Dynamics (CFD) using Graphics Processing Units

Module 3: CUDA Execution Model -I. Objective

Introduction to GPU programming. Introduction to GPU programming p. 1/17

Comparison of different solvers for two-dimensional steady heat conduction equation ME 412 Project 2

Module Memory and Data Locality

This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC.

1.2 Numerical Solutions of Flow Problems

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata

Lattice Boltzmann with CUDA

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Programming in CUDA. Malik M Khan

CUDA/OpenGL Fluid Simulation. Nolan Goodnight

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

GPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum

CSE 599 I Accelerated Computing - Programming GPUS. Memory performance

GPU Computing Master Clss. Development Tools

ECE 408 / CS 483 Final Exam, Fall 2014

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

AMath 483/583 Lecture 21 May 13, 2011

Example 1: Color-to-Grayscale Image Processing

AMath 483/583 Lecture 24. Notes: Notes: Steady state diffusion. Notes: Finite difference method. Outline:

CUDA Parallelism Model

AMath 483/583 Lecture 24

f xx + f yy = F (x, y)

GPU programming basics. Prof. Marco Bertini

A Novel Approach to High Speed Collision

GPU Memory Memory issue for CUDA programming

GPU Memory. GPU Memory. Memory issue for CUDA programming. Copyright 2013 Yong Cao, Referencing UIUC ECE498AL Course Notes

cuibm A GPU Accelerated Immersed Boundary Method

This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC.

CFD-1. Introduction: What is CFD? T. J. Craft. Msc CFD-1. CFD: Computational Fluid Dynamics

GPU Programming. Performance Considerations. Miaoqing Huang University of Arkansas Fall / 60

Dense Linear Algebra. HPC - Algorithms and Applications

Massively Parallel Architectures

Solving the heat equation with CUDA

Two-Phase flows on massively parallel multi-gpu clusters

2/2/11. Administrative. L6: Memory Hierarchy Optimization IV, Bandwidth Optimization. Project Proposal (due 3/9) Faculty Project Suggestions

CUDA Memory Model. Monday, 21 February Some material David Kirk, NVIDIA and Wen-mei W. Hwu, (used with permission)

Data Parallel Execution Model

Lessons learned from a simple application

S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs

Slide credit: Slides adapted from David Kirk/NVIDIA and Wen-mei W. Hwu, DRAM Bandwidth

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3

Driven Cavity Example

Computation to Core Mapping Lessons learned from a simple application

GPU Computing with CUDA

ACCELERATION OF A COMPUTATIONAL FLUID DYNAMICS CODE WITH GPU USING OPENACC

Lecture 3: Introduction to CUDA

GPU programming CUDA C. GPU programming,ii. COMP528 Multi-Core Programming. Different ways:

Accelerating CFD with Graphics Hardware

Information Coding / Computer Graphics, ISY, LiTH. CUDA memory! ! Coalescing!! Constant memory!! Texture memory!! Pinned memory 26(86)

Realistic Animation of Fluids

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices

Semester Final Report

Matrix Multiplication in CUDA. A case study

CUDA Memory Types All material not from online sources/textbook copyright Travis Desell, 2012

DRAM Bank Organization

CUDA GPGPU Workshop CUDA/GPGPU Arch&Prog

Technische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics

CSE 591: GPU Programming. Memories. Klaus Mueller. Computer Science Department Stony Brook University

Introduction to Multigrid and its Parallelization

Overview of Traditional Surface Tracking Methods

1/25/12. Administrative

CSE 599 I Accelerated Computing - Programming GPUS. Parallel Patterns: Graph Search

MASSACHUSETTS INSTITUTE OF TECHNOLOGY. Analyzing wind flow around the square plate using ADINA Project. Ankur Bajoria

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

Finite Volume Discretization on Irregular Voronoi Grids

Device Memories and Matrix Multiplication

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics

Outline 2011/10/8. Memory Management. Kernels. Matrix multiplication. CIS 565 Fall 2011 Qing Sun

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s)

An added mass partitioned algorithm for rigid bodies and incompressible flows

Lab 1 Part 1: Introduction to CUDA

2/17/10. Administrative. L7: Memory Hierarchy Optimization IV, Bandwidth Optimization and Case Studies. Administrative, cont.

Stream Function-Vorticity CFD Solver MAE 6263

A GPU Parallel Finite Volume Method for a 3D Poisson Equation on Arbitrary Geometries

Lecture 10. Stencil Methods Limits to Performance

Chapter 1 - Basic Equations

COMPUTATIONAL FLUID DYNAMICS USING GRAPHICS PROCESSING UNITS: CHALLENGES AND OPPORTUNITIES

2 T. x + 2 T. , T( x, y = 0) = T 1

Hardware/Software Co-Design

Introduction to Parallel Computing with CUDA. Oswald Haan

ECE408/CS483 Fall Applied Parallel Programming. Objective. To learn about tiled convolution algorithms

Figure 6.1: Truss topology optimization diagram.

Convolution, Constant Memory and Constant Caching

CUDA programming model. N. Cardoso & P. Bicudo. Física Computacional (FC5)

Module Memory and Data Locality

Introduction to CUDA (1 of n*)

OpenACC programming for GPGPUs: Rotor wake simulation

Didem Unat, Xing Cai, Scott Baden

GPU Programming. Lecture 2: CUDA C Basics. Miaoqing Huang University of Arkansas 1 / 34

HPC with Multicore and GPUs

Example of 3D Injection Molding CAE Directly Connected with 3D CAD

University of Quebec at Chicoutimi (UQAC) Friction Stir Welding Research Group. A CUDA Fortran. Story

Introduction to C omputational F luid Dynamics. D. Murrin

Adarsh Krishnamurthy (cs184-bb) Bela Stepanova (cs184-bs)

GPU-based Distributed Behavior Models with CUDA

PART 1: FLUID-STRUCTURE INTERACTION Simulation of particle filtration processes in deformable media

Partial Differential Equations

Transcription:

Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores

Example CFD problem: Heat Conduction in Plate Figure: Physical domain: unit square heated from the top. Steady-State 2D Heat Conduction (Laplace Equation) 2 T = 2 T x 2 + 2 T y 2 = 0

Discretization/Solution of Heat Conduction Equation Finite-Volume Method (Temperatures are cell-centered) Iterative solver: Red-Black Gauss-Seidel with SOR (parallel algorithm) Gauss-Seidel where T n+1 p = (a n)(t n n ) + (a s )(T n s ) + (a e )(T n e ) + (a w )(T n w) a p a p = a n + a s + a e + a w and n=north, s=south, e=east, w=west Successive-Overrelaxation (SOR) Tp n+1 (accepted) = ωtp n+1 + (1 ω)tp n

Red-Black Gauss-Seidel Color the grid like a checkboard. First update red cells from n to n + 1 (only depends on black cells at n). Then update black cells from n to n + 1 (only depends on the red cells at n + 1) Figure: Red-Black coloring scheme on internal grid cells and stencil.

Developing the CUDA algorithm Experimented with various memory models Tried shared memory with if statements to handle B.C.s for each sub-domain slow Tried global memory where each thread loads its nearest-neighbors fast Currently using global memory Next step: texture memory

CUDA algorithm on host Programmed CPU (host) in C and GPU (device) in CUDA Pseudo-Code for Laplace solver on host (CPU) dynamic memory allocation set I.C. and B.C. setup coefficients (a n, a s, a e, a w, a p ) allocate device memory and copy all variables to device setup the execution configuration iteration loop: call red kernel, call black kernel each iteration copy final results from device back to host

CUDA algorithm on host: main.cu Execution configuration dim3 dimblock(block_size, BLOCK_SIZE); dim3 dimgrid(imx / BLOCK_SIZE, jmx / BLOCK_SIZE); Iteration loop for (iter=1; iter <= itmax; iter++) { // Launch kernel to update red squares red_kernel<<<dimgrid, dimblock>>> (T_old_d,an_d,as_d,ae_d,aw_d,ap_d,imx,jmx); } // Launch kernel to update black squares black_kernel<<<dimgrid, dimblock>>> (T_old_d,an_d,as_d,ae_d,aw_d,ap_d,imx,jmx);

CUDA algorithm on device: redkernel.cu Relations between threads and mesh cells // global thread indices (tx,ty) int tx = blockidx.x * BLOCK_SIZE + threadidx.x; int ty = blockidx.y * BLOCK_SIZE + threadidx.y; // convert thread indices to mesh indices row = (ty+1); col = (tx+1);

CUDA algorithm on device: redkernel.cu

CUDA algorithm on device: redkernel.cu Gauss-Seidel with SOR if ( (row + col) % 2 == 0 ) { // red cell float omega = 1.85; float sum; k = row*imax + col; // perform SOR on red squares sum = aw_d[k]*t_old_d[row*imax+ (col-1)] + \ ae_d[k]*t_old_d[row*imax+ (col+1)] + \ as_d[k]*t_old_d[(row+1)*imax+ col] + \ an_d[k]*t_old_d[(row-1)*imax+ col]; T_old_d[k]=T_old_d[k]*(1.0-omega)+omega*(sum/ap_d[k]); }

GPU Results Figure: Solution of 2D heat conduction equation on unit square with T=1 as top B.C. and T=0 along left, right, and bottom

Governing Equations Conservation of Mass Conservation of Momentum Conservation of Energy ρc p DT Dt ρ t + ρu = 0 ρ Du Dt = p + τ = βt Dp Dt + (k T ) + Φ where the viscous stress tensor is ( ui τ = µ + u ) j + δ ij λ( u) x j x i

Discretization of Governing Equations Fractional-Step Method ρûn+1 u n t ρ un+1 û n+1 t = H n ρu n+1 = 0 = p n+1 Pressure-Poisson Equation can consume 70-95% of CPU time! Boundary Conditions u n+1 = u b p n+1 n = 0

GPU research Developed Red-Black SOR solver for 2D heat conduction equation for GPU in CUDA GPU code currently 17 times faster than CPU code Developing CUDA code for Large-Eddy Simulations Collaborating with Prof. Wen-mei Hwu in ECE dept. Also collaborating with Jonathan Cohen from NVIDIA Their guidance is greatly appreciated!