Lattice Boltzmann with CUDA Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1
Outline Overview of LBM A usage of LBM Algorithm Implementation in CUDA and Optimization Performance Demo Page 2
Overview of LBM
The Lattice Boltzmann Method (LBM) is a class of computational fluid dynamics (CFD) methods for fluid simulation.
CFD methods:
Volume mesh (irregular/regular):
- Euler equations
- Navier-Stokes equations
Smoothed particle hydrodynamics (SPH):
- Lagrangian method
Spectral methods:
- spherical harmonics
- Chebyshev polynomials
LBM: simulates an equivalent mesoscopic system on a Cartesian grid
Page 4
Overview of LBM
From macroscopic to mesoscopic to microscopic
[Figure: the three scales of description, labeled with the symbols ρ, T, u, e_i, r, v]
Page 5
Overview of LBM lattice structure: D2Q9, D3Q19... Page 6
Overview of LBM
Boundary conditions:
Domain boundary:
- the outermost surrounding lattice nodes
Obstacle boundary:
- objects inside the lattice grid that act as obstacles and block the fluid flow
Solution:
- no change
- bounce-back
Page 7
Overview of LBM
LBM is resource intensive!
- Grids larger than 100x100x100 points are not practical due to slow memory access and long processing time
- LBM is explicit in nature and requires only nearest-neighbor interaction
- This makes it very suitable for implementation on GPUs
Parallel computing:
- Single-Program Multiple-Data (SPMD) model
- within-processor memory
Page 8
Target Model Lid Driven Cavity Page 10
Reformulation of the LBM Equation
Discrete lattice Boltzmann equation
Collide step:
Stream step:
Page 11
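The formulas on this slide are not present in the text. As a sketch only, the standard single-relaxation-time (BGK) form of the discrete lattice Boltzmann equation and its split into the two steps is given below. It assumes that the relaxation parameter is the ω passed to the Collide kernel as d_omega; the symbols f_i^{*} (post-collision value) and f_i^{eq} (equilibrium distribution) are notation introduced here.

```latex
% Discrete lattice Boltzmann equation (BGK, assumed form)
f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t) - \omega \left[ f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right]

% Collide step: local relaxation towards equilibrium
f_i^{*}(\mathbf{x}, t)
  = f_i(\mathbf{x}, t) - \omega \left[ f_i(\mathbf{x}, t) - f_i^{eq}(\mathbf{x}, t) \right]

% Stream step: propagation to the neighbor in direction e_i
f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t) = f_i^{*}(\mathbf{x}, t)
```

Substituting the collide step into the stream step gives back the first equation, so the split changes only the order of the work, not the result.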
Stream Step Fluid particles propagate to neighboring cells Page 12
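The propagation can be illustrated with a small CPU reference in C++. This is a sketch under assumptions, not the deck's Stream kernel: the flat array layout, the helper idx, and the choice of N as +y are hypothetical, and only the direction order (C, N, S, W, E, NW, NE, SW, SE) is taken from the cell layout used elsewhere in the deck.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int Q = 9;
// Direction order (C, N, S, W, E, NW, NE, SW, SE); N = +y is an assumption.
constexpr int EX[Q] = {0, 0, 0, -1, 1, -1, 1, -1, 1};
constexpr int EY[Q] = {0, 1, -1, 0, 0, 1, 1, -1, -1};

// Hypothetical layout: index of direction i at cell (x, y), nx cells per row.
inline int idx(int x, int y, int i, int nx) { return (y * nx + x) * Q + i; }

// Push streaming: every interior cell copies f_i to its neighbor in direction i.
// Cells on the outer ring only receive values; they play the role of the boundary.
void stream(const std::vector<double>& src, std::vector<double>& dst, int nx, int ny) {
    assert(src.size() == dst.size());
    assert(src.size() == static_cast<std::size_t>(nx) * ny * Q);
    for (int y = 1; y < ny - 1; ++y)
        for (int x = 1; x < nx - 1; ++x)
            for (int i = 0; i < Q; ++i)
                dst[idx(x + EX[i], y + EY[i], i, nx)] = src[idx(x, y, i, nx)];
}
```

Writing into a second array (dst) mirrors the d_cell / d_temp_cell pair passed to the Stream kernel: no value is overwritten before it has been read.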
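Collide Step
Lattice weights: 4/9 (rest), 1/9 (axis directions), 1/36 (diagonal directions)
[Figure: lattice velocity diagram with axis ticks -1, 0, 1]
Page 13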
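How the weights 4/9, 1/9 and 1/36 enter the collision can be sketched in C++ for a single cell. The code assumes the standard D2Q9 equilibrium and a BGK relaxation with parameter omega; the function names and the N = +y convention are hypothetical, and the deck's Collide kernel may differ in detail.

```cpp
#include <cassert>
#include <cmath>

constexpr int Q = 9;
// Direction order (C, N, S, W, E, NW, NE, SW, SE); N = +y is an assumption.
constexpr int EX[Q] = {0, 0, 0, -1, 1, -1, 1, -1, 1};
constexpr int EY[Q] = {0, 1, -1, 0, 0, 1, 1, -1, -1};
constexpr double W[Q] = {4.0 / 9,  1.0 / 9,  1.0 / 9,  1.0 / 9, 1.0 / 9,
                         1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};

// Standard D2Q9 equilibrium distribution for density rho and velocity (ux, uy).
double equilibrium(int i, double rho, double ux, double uy) {
    const double eu = EX[i] * ux + EY[i] * uy;
    const double uu = ux * ux + uy * uy;
    return W[i] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * uu);
}

// BGK collision for one cell: compute rho and u, then relax towards equilibrium.
void collide_cell(double f[Q], double omega) {
    double rho = 0.0, ux = 0.0, uy = 0.0;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        ux += f[i] * EX[i];
        uy += f[i] * EY[i];
    }
    assert(rho > 0.0);
    ux /= rho;
    uy /= rho;
    for (int i = 0; i < Q; ++i)
        f[i] -= omega * (f[i] - equilibrium(i, rho, ux, uy));
}
```

The collision touches only the nine values of one cell, which is why it maps naturally to one GPU thread per cell.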
Boundary Condition (BC) Treatment
For non-moving walls:
For the moving wall:
(with the velocity of the moving wall as a parameter)
[Figure: lattice velocity diagram with axis ticks -1, 0, 1]
Page 14
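The wall formulas are missing from the text, so the following C++ sketch substitutes a common choice: plain bounce-back for resting walls and bounce-back with a momentum correction for a moving wall. It is an assumption that the deck uses this scheme; the function bounce_back and the table OPP are hypothetical names.

```cpp
#include <cassert>

constexpr int Q = 9;
// Direction order (C, N, S, W, E, NW, NE, SW, SE); N = +y is an assumption.
constexpr int EX[Q] = {0, 0, 0, -1, 1, -1, 1, -1, 1};
constexpr int EY[Q] = {0, 1, -1, 0, 0, 1, 1, -1, -1};
constexpr double W[Q] = {4.0 / 9,  1.0 / 9,  1.0 / 9,  1.0 / 9, 1.0 / 9,
                         1.0 / 36, 1.0 / 36, 1.0 / 36, 1.0 / 36};
// Opposite direction of each entry: N<->S, W<->E, NW<->SE, NE<->SW.
constexpr int OPP[Q] = {0, 2, 1, 4, 3, 8, 7, 6, 5};

// Value to store in direction OPP[i] when a population f_out leaving along
// direction i hits a wall moving with velocity (uwx, uwy).
// With a resting wall (uwx = uwy = 0) this is plain bounce-back.
double bounce_back(double f_out, int i, double rho, double uwx, double uwy) {
    assert(i >= 0 && i < Q);
    const double eu = EX[i] * uwx + EY[i] * uwy;
    return f_out - 6.0 * W[i] * rho * eu;  // 6 = 2 / c_s^2 with c_s^2 = 1/3
}
```

For a lid moving in +x, the population returning along SW is reduced and the one returning along SE is increased, so the fluid next to the lid gains momentum in the lid's direction.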
Algorithm
1. Initialize distribution functions, density, and velocity for each cell
2. Set initial time (t0)
3. Treat boundary cells
4. Perform Stream operation
5. Perform Collide operation
6. Increment time by one step
7. Go to step 3 unless end time is reached
[Flowchart: Initialization, Boundary Condition Treatment, Stream, Collide, increment time step, "End time is reached?" (False: back to Boundary Condition Treatment; True: End)]
Page 15
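The loop formed by steps 3 to 7 can be written as a small host-side driver in C++. This is a sketch only: the function run and its callable parameters are hypothetical stand-ins for the BC, Stream and Collide operations, and initialization (steps 1 and 2) is left to the caller.

```cpp
#include <cassert>

// Repeats boundary treatment, stream and collide until the end time is reached.
// Returns the number of completed time steps.
template <class BC, class Stream, class Collide>
int run(double t0, double t_end, double dt, BC bc, Stream stream, Collide collide) {
    assert(dt > 0.0);
    int steps = 0;
    // Note: accumulating a floating-point time can drift for long runs;
    // counting integer steps is a more robust alternative.
    for (double t = t0; t < t_end; t += dt) {
        bc();       // step 3: treat boundary cells
        stream();   // step 4: stream
        collide();  // step 5: collide
        ++steps;    // step 6: time is advanced by the loop increment
    }
    return steps;
}
```

In a GPU version each callable would launch the corresponding kernel, with the loop itself staying on the host.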
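Implementation in CUDA and Optimization
Kernels

    #define BLOCK_SIZE 16

    // One thread per cell, 16x16 threads per block.
    dim3 dimBlock( BLOCK_SIZE, BLOCK_SIZE );
    // Round up so that blocks on the edge are included; this assumes the
    // arrays are padded to a multiple of BLOCK_SIZE in each direction.
    dim3 dimGrid( (cmd.sizex + 2 + BLOCK_SIZE - 1) / BLOCK_SIZE,
                  (cmd.sizey + 2 + BLOCK_SIZE - 1) / BLOCK_SIZE );

    BC<<<dimGrid, dimBlock>>>( d_cell, d_rho, d_wall_velocity, d_sizex, d_sizey );
    Stream<<<dimGrid, dimBlock>>>( d_cell, d_temp_cell, d_sizex );
    Collide<<<dimGrid, dimBlock>>>( d_cell, d_rho, d_u, d_omega, d_sizex, d_sizey );

Page 17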
Implementation in CUDA and Optimization
Coalescing
Block: 16x16 = 256 cells
Cell: elements 0..9 are (C, N, S, W, E, NW, NE, SW, SE, Flag)
Uncoalesced access:
- 0..9 | 0..9 | 0..9 | ... | 0..9 (all 256 cells)
- 256 vectors of 10 elements
Coalesced access:
- 0,0,...,0 | 1,1,...,1 | 2,2,...,2 | ... | 9,9,...,9 (all 10 elements)
- 10 vectors of 256 elements
Page 18
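The two layouts differ only in how a (cell, element) pair is mapped to a memory offset. The C++ sketch below shows both mappings for one 16x16 block; the function names aos_index and soa_index are hypothetical.

```cpp
#include <cassert>

constexpr int ELEMS = 10;   // C, N, S, W, E, NW, NE, SW, SE, Flag
constexpr int CELLS = 256;  // one 16x16 block

// Uncoalesced layout: the 10 elements of one cell are adjacent,
// so neighboring threads (cells) are 10 elements apart.
inline int aos_index(int cell, int elem) {
    assert(cell >= 0 && cell < CELLS && elem >= 0 && elem < ELEMS);
    return cell * ELEMS + elem;
}

// Coalesced layout: the same element of all 256 cells is adjacent,
// so neighboring threads (cells) read neighboring addresses.
inline int soa_index(int cell, int elem) {
    assert(cell >= 0 && cell < CELLS && elem >= 0 && elem < ELEMS);
    return elem * CELLS + cell;
}
```

With one thread per cell, all threads of a block read the same element at the same time; in the second layout those reads fall on consecutive addresses, which is the access pattern the GPU can combine into few memory transactions.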
Implementation in CUDA and Optimization
Ghost Cell
[Figure: Block(i, j) with cell labels (0,0), (1,0), (2,0), ..., (15,0), (16,0), (0,1), next to Block(i+1, j) with cell labels (0,0), (1,0), (2,0), ..., (15,0), (0,1)]
Page 19
Implementation in CUDA and Optimization
Ghost Cell: how it works
Page 20
Implementation in CUDA and Optimization
Matrix vs. Standard Block
Matrix padding:
- the matrix is decomposed into blocks
- every block must be 16x16 cells
- if a block on the edge is smaller than 16x16, it is padded with 0
[Figure: original matrix and padded standard matrix, with dimension labels a, b, x, y]
Page 21
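The padding rule can be sketched in C++ as follows. The helper names padded_size and pad_with_zeros and the row-major layout are assumptions made for illustration; only the block size of 16 and the zero fill come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int BLOCK_SIZE = 16;

// Smallest multiple of BLOCK_SIZE that is >= n.
inline int padded_size(int n) {
    assert(n >= 0);
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

// Copies a row-major rows x cols matrix into a zero-filled matrix whose
// dimensions are both multiples of BLOCK_SIZE.
std::vector<double> pad_with_zeros(const std::vector<double>& m, int rows, int cols) {
    assert(m.size() == static_cast<std::size_t>(rows) * cols);
    const int prows = padded_size(rows);
    const int pcols = padded_size(cols);
    std::vector<double> out(static_cast<std::size_t>(prows) * pcols, 0.0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[static_cast<std::size_t>(r) * pcols + c] =
                m[static_cast<std::size_t>(r) * cols + c];
    return out;
}
```

For example, a grid of 102 cells per side is padded to 112, which gives 7 full blocks per direction and lets every kernel run without per-thread bounds checks.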
Chart : optimization Page 23
Chart : GPU vs GPU Page 24