Lattice Boltzmann with CUDA

Lan Shi, Li Yi & Liyuan Zhang
Hauptseminar (advanced seminar): Multicore Architectures and Programming
Page 1

Outline
- Overview of LBM
- An application of the LBM algorithm
- Implementation in CUDA and optimization
- Performance
- Demo
Page 2

Overview of LBM
The Lattice Boltzmann Method (LBM) is a class of computational fluid dynamics (CFD) methods for fluid simulation.
CFD methods:
- Volume mesh (irregular/regular): Euler equations, Navier-Stokes equations
- Smoothed particle hydrodynamics (SPH): Lagrangian method
- Spectral methods: spherical harmonics, Chebyshev polynomials
- LBM: simulates an equivalent mesoscopic system on a Cartesian grid
Page 4

Overview of LBM
From macroscopic to mesoscopic to microscopic
[Figure: variables at each scale, including ρ, T and u]
Page 5

Overview of LBM
Lattice structure: D2Q9, D3Q19, ...
Page 6
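
As a rough illustration, the D2Q9 lattice named on this slide (2 dimensions, 9 discrete velocities per cell) can be written down as a small table. The direction order C, N, S, W, E, NW, NE, SW, SE follows the cell layout given later in the slides; the axis orientation (x = east, y = north) and the identifiers `Q`, `EX`, `EY` and `count_directions` are assumptions of this sketch, not names from the original code.

```cpp
#include <array>

// D2Q9: 2 dimensions, 9 discrete velocities per cell.
// Assumed order: C, N, S, W, E, NW, NE, SW, SE (x = east, y = north).
constexpr int Q = 9;
constexpr std::array<int, Q> EX = {0, 0,  0, -1, 1, -1, 1, -1,  1};
constexpr std::array<int, Q> EY = {0, 1, -1,  0, 0,  1, 1, -1, -1};

// Counts the directions with a given number of non-zero components:
// 0 = rest direction, 1 = axis-aligned, 2 = diagonal.
int count_directions(int nonzero_components) {
    int n = 0;
    for (int i = 0; i < Q; ++i) {
        int len = (EX[i] != 0) + (EY[i] != 0);
        if (len == nonzero_components) ++n;
    }
    return n;
}
```

The table has one rest direction, four axis-aligned directions and four diagonal directions, which is where the "9" in D2Q9 comes from.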

Overview of LBM
Boundary conditions:
- Domain boundary: the outermost surrounding lattice nodes
- Obstacle boundary: objects inside the lattice grid that block the fluid flow
Solutions:
- no change
- bounce-back
Page 7
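
A minimal sketch of the bounce-back idea for a single obstacle cell, assuming the D2Q9 direction order C, N, S, W, E, NW, NE, SW, SE used for the cell layout later in the slides. The table `OPPOSITE` and the function `bounce_back` are hypothetical names; the slides do not show how the original kernel implements this rule.

```cpp
#include <array>

constexpr int Q = 9;
// Index of the opposite direction for C, N, S, W, E, NW, NE, SW, SE.
constexpr std::array<int, Q> OPPOSITE = {0, 2, 1, 4, 3, 8, 7, 6, 5};

// Full bounce-back: every distribution arriving at an obstacle cell is
// sent back in the direction it came from.
std::array<double, Q> bounce_back(const std::array<double, Q>& f) {
    std::array<double, Q> out{};
    for (int i = 0; i < Q; ++i) out[OPPOSITE[i]] = f[i];
    return out;
}
```

Applying `bounce_back` twice returns the original distributions, which is a quick sanity check on the opposite-direction table.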

Overview of LBM
LBM is resource-intensive!
- Grids larger than 100x100x100 points are not practical because of slow memory access and long processing times
- The method is explicit in nature and requires only nearest-neighbor interaction, which makes it very suitable for implementation on GPUs
Parallel computing:
- Single-Program Multiple-Data (SPMD) model
- within-processor memory
Page 8

Target Model
Lid-driven cavity
Page 10

Reformulating the LBM Equation
The discrete lattice Boltzmann equation is split into two steps: a collide step and a stream step.
Page 11
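
The equations on this slide are not present in the transcript. For orientation only, the widely used single-relaxation-time (BGK) form of the two steps is given here; that the slides used exactly this form is an assumption. Here $f_i$ is the distribution along lattice direction $i$, $\mathbf{e}_i$ the lattice velocity, $f_i^{\mathrm{eq}}$ the equilibrium distribution, and $\omega$ the relaxation parameter (the kernel call later in the slides passes a `d_omega`).

```latex
\begin{aligned}
\text{Collide:}\quad & f_i^{*}(\mathbf{x}, t) = f_i(\mathbf{x}, t) - \omega \left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\rho, \mathbf{u}) \right] \\
\text{Stream:}\quad & f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t) = f_i^{*}(\mathbf{x}, t)
\end{aligned}
```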

Stream Step
Fluid particles propagate to neighboring cells.
Page 12
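
The propagation can be sketched on the CPU as a loop that copies each distribution into the neighboring cell along its own direction. This is a hedged reference version, not the `Stream` kernel from the slides: the memory layout, the direction order (C, N, S, W, E, NW, NE, SW, SE with x = east, y = north) and the choice to drop distributions that leave the grid are all assumptions made for brevity.

```cpp
#include <vector>

constexpr int Q = 9;
// Assumed direction order: C, N, S, W, E, NW, NE, SW, SE.
constexpr int EX[Q] = {0, 0,  0, -1, 1, -1, 1, -1,  1};
constexpr int EY[Q] = {0, 1, -1,  0, 0,  1, 1, -1, -1};

// Push-style streaming on an nx-by-ny grid stored as src[(y * nx + x) * Q + i].
// Distributions that would leave the grid are dropped in this sketch;
// a real solver handles them in the boundary treatment.
std::vector<double> stream(const std::vector<double>& src, int nx, int ny) {
    std::vector<double> dst(src.size(), 0.0);
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x)
            for (int i = 0; i < Q; ++i) {
                int tx = x + EX[i], ty = y + EY[i];
                if (tx < 0 || tx >= nx || ty < 0 || ty >= ny) continue;
                dst[(ty * nx + tx) * Q + i] = src[(y * nx + x) * Q + i];
            }
    return dst;
}
```

Writing into a separate destination buffer avoids overwriting values that other cells still need to read, which matches the two buffers (`d_cell`, `d_temp_cell`) passed to the `Stream` kernel in the slides.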

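Collide Step
D2Q9 weights: 4/9 (rest direction), 1/9 (axis directions), 1/36 (diagonal directions)
[Figure: D2Q9 lattice directions on axes from -1 to 1]
Page 13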

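Boundary Condition (BC) Treatment
Non-moving walls and the moving wall are treated separately; the rule for the moving wall depends on the velocity of the moving wall.
[Figure: lattice directions on axes from -1 to 1]
Page 14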

Algorithm
1. Initialize distribution functions, density, and velocity for each cell
2. Set the initial time (t0)
3. Treat boundary cells
4. Perform the stream operation
5. Perform the collide operation
6. Increment the time by one step
7. Go to step 3 unless the end time is reached
[Flowchart: Initialization -> Boundary condition treatment -> Stream -> Collide -> Increment time step -> End time reached? (false: back to boundary condition treatment; true: end)]
Page 15

Implementation in CUDA and Optimization
Kernels
    #define BLOCK_SIZE 16
    // One thread per cell: 16x16 threads per block.
    dim3 dimBlock( BLOCK_SIZE, BLOCK_SIZE );
    // Integer division: (size + 2) must be a multiple of BLOCK_SIZE for full coverage.
    dim3 dimGrid( (cmd.sizex+2) / BLOCK_SIZE, (cmd.sizey+2) / BLOCK_SIZE );
    // One time step: boundary treatment, then stream, then collide.
    BC<<<dimGrid,dimBlock>>>( d_cell, d_rho, d_wall_velocity, d_sizex, d_sizey );
    Stream<<<dimGrid,dimBlock>>>( d_cell, d_temp_cell, d_sizex );
    Collide<<<dimGrid,dimBlock>>>( d_cell, d_rho, d_u, d_omega, d_sizex, d_sizey );
Page 17
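
The grid size on this slide is computed with truncating integer division, so it only covers every cell when the extent is a multiple of the block size. A common alternative, shown here as a sketch and not as what the authors did, is to round up. The helper `blocks_for` is a hypothetical name.

```cpp
constexpr int BLOCK_SIZE = 16;

// Number of blocks needed to cover `cells` cells, rounding up so that a
// partially filled last block is still launched.
constexpr int blocks_for(int cells) {
    return (cells + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```

With this helper the grid would be built from `blocks_for(cmd.sizex + 2)` and `blocks_for(cmd.sizey + 2)`. Kernels launched this way need a bounds check for threads that fall outside the domain, unless the arrays are padded to a multiple of the block size as the slides describe.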

Implementation in CUDA and Optimization
Coalescing
Block: 16x16 = 256 cells
Cell: indices 0..9 mean (C, N, S, W, E, NW, NE, SW, SE, Flag)
Uncoalesced access: 0..9 | 0..9 | 0..9 | ... (all 256 cells, stored as 256 10-vectors)
Coalesced access: 0,0,...,0 | 1,1,...,1 | ... | 9,9,...,9 (all 10 elements, stored as 10 256-vectors)
Page 18
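
The two layouts differ only in how a (cell, element) pair is mapped to a linear index. The sketch below makes that mapping explicit for one 16x16 block; `aos_index` and `soa_index` are hypothetical names, and the original code may compute its offsets differently.

```cpp
#include <cstddef>

constexpr std::size_t CELLS = 256;  // one 16x16 block
constexpr std::size_t ELEMS = 10;   // C, N, S, W, E, NW, NE, SW, SE, Flag

// Uncoalesced layout (256 10-vectors): the 10 elements of a cell are adjacent,
// so neighboring cells reading the same element are 10 entries apart.
constexpr std::size_t aos_index(std::size_t cell, std::size_t elem) {
    return cell * ELEMS + elem;
}

// Coalesced layout (10 256-vectors): all 256 copies of one element are adjacent,
// so neighboring cells reading the same element touch consecutive entries.
constexpr std::size_t soa_index(std::size_t cell, std::size_t elem) {
    return elem * CELLS + cell;
}
```

When consecutive threads handle consecutive cells, the second mapping lets them read consecutive addresses, which is the access pattern the hardware can combine into few memory transactions.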

Implementation in CUDA and Optimization
Ghost Cell
[Figure: Block(i, j) with cells (0,0), (1,0), (2,0), ..., (15,0), (16,0), then (0,1), ...; Block(i+1, j) with cells (0,0), (1,0), (2,0), ..., (15,0), then (0,1), ...]
Page 19

Implementation in CUDA and Optimization
Ghost Cell: how it works
Page 20

Implementation in CUDA and Optimization
Matrix vs. Standard Block
Matrix completion (padding):
- The matrix is decomposed into blocks
- Every block must be 16x16 cells
- If a block on the edge is smaller than 16x16, it is completed with 0
[Figure: original matrix and standard matrix, with dimension labels a, b, x, y]
Page 21
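
A minimal host-side sketch of this padding, assuming a row-major matrix of `float` values. The function names `padded` and `pad_matrix` and the element type are illustrative assumptions, not taken from the original code.

```cpp
#include <cstddef>
#include <vector>

constexpr int BLOCK_SIZE = 16;

// Smallest multiple of BLOCK_SIZE that is greater than or equal to n.
int padded(int n) {
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

// Copies a row-major rows-by-cols matrix into a zero-filled matrix whose
// dimensions are both multiples of BLOCK_SIZE.
std::vector<float> pad_matrix(const std::vector<float>& m, int rows, int cols) {
    int prows = padded(rows), pcols = padded(cols);
    std::vector<float> out(static_cast<std::size_t>(prows) * pcols, 0.0f);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[static_cast<std::size_t>(r) * pcols + c] =
                m[static_cast<std::size_t>(r) * cols + c];
    return out;
}
```

After padding, every block is a full 16x16 tile, so a grid computed with plain integer division covers the whole array and the kernels need no special case for edge blocks.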

Chart: optimization
Page 23

Chart: GPU vs GPU
Page 24
