cuIBM: A GPU-Accelerated Immersed Boundary Method

S. K. Layton, A. Krishnan and L. A. Barba
Corresponding author: labarba@bu.edu
Department of Mechanical Engineering, Boston University, Boston, MA 02215, USA.

Abstract: A projection-based immersed boundary method (IBM) is dominated by sparse linear algebra routines. Using the open-source CUSP library, we observe a speedup with respect to a single CPU core that reflects the constraints of a bandwidth-dominated problem on the GPU. Nevertheless, GPUs offer the capacity to solve large problems on commodity hardware. This work includes validation and a convergence study of the GPU-accelerated IBM, and various optimizations.

Keywords: Immersed Boundary, Computational Fluid Dynamics, GPU Computing.

1 Introduction

Conventional CFD techniques require the generation of a mesh that conforms to the geometry of any boundaries in the fluid domain. The immersed boundary method (IBM), in contrast, allows the use of a grid that does not conform to solid boundaries. In the IBM, the fluid is represented on an Eulerian grid (typically a Cartesian grid) and the solid boundary is represented by a collection of Lagrangian points. This has several advantages: mesh generation is trivial, and simulations involving moving solid bodies and boundaries are made simpler. The Navier-Stokes equations are solved on the entire grid (including points within the solid), and the effect of the solid body is modelled by adding to the fluid a singular force distribution f along the solid boundary, which enforces the no-slip condition. The governing equations are:

\[ \frac{\partial u}{\partial t} + u \cdot \nabla u = -\nabla p + \nu \nabla^2 u + \int_s f(\xi(s,t))\, \delta(\xi - x)\, ds, \tag{1a} \]
\[ \nabla \cdot u = 0, \tag{1b} \]
\[ u(\xi(s,t)) = \int_x u(x)\, \delta(x - \xi)\, dx = u_B(\xi(s,t)), \tag{1c} \]

where $u_B$ is the velocity of the body at the boundary point locations. IBM formulations differ in the technique they use to calculate the forcing term f.

The IBM was introduced in 1972 by Peskin [3] to model blood flow within the elastic membranes of the heart. It experienced a revival in the 1990s thanks to increased computational capacity and growing interest in moving-boundary problems. The reader can find various IBM formulations described in the 2005 review by Mittal and Iaccarino [1]. In the present work, we implement the algorithm presented in [4] for the solution of two-dimensional incompressible viscous flows with immersed boundaries, explained in detail in Section 2. To our knowledge, the IBM has not previously been implemented on the GPU. The prospect this opens is the capacity to solve large three-dimensional moving-boundary problems on commodity hardware.
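
The delta-function integrals in the governing equations become, once discretized, weighted sums over the grid points near each Lagrangian point. The following one-dimensional sketch illustrates the idea. It is not taken from the paper: the text does not say which discrete delta function is used, so the three-point function of Roma et al. is assumed here, and the names `roma_delta`, `interpolate` and `spread` are hypothetical.

```python
import math


def roma_delta(r):
    """Three-point discrete delta function (assumed choice); r is a distance in cell widths."""
    r = abs(r)
    if r <= 0.5:
        return (1.0 + math.sqrt(1.0 - 3.0 * r * r)) / 3.0
    if r <= 1.5:
        return (5.0 - 3.0 * r - math.sqrt(1.0 - 3.0 * (1.0 - r) ** 2)) / 6.0
    return 0.0


def interpolate(u, h, xi):
    """Discrete analogue of the interpolation integral on a 1D grid x_i = i * h."""
    return sum(u_i * roma_delta((i * h - xi) / h) for i, u_i in enumerate(u))


def spread(force, h, xi, n):
    """Discrete analogue of the forcing integral: distribute a point force over n grid nodes."""
    return [force * roma_delta((i * h - xi) / h) / h for i in range(n)]


if __name__ == "__main__":
    h = 0.1
    u = [2.0 + 3.0 * i * h for i in range(11)]  # a linear velocity field
    print(interpolate(u, h, 0.43))               # value carried to the Lagrangian point
    print(sum(spread(2.0, h, 0.43, 11)) * h)     # total force recovered from the grid
```

Because this delta function satisfies the zeroth and first moment conditions, a linear field is interpolated exactly and the total force is conserved when it is spread to the grid, provided the Lagrangian point is at least 1.5 cells away from the ends of the grid.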

2 Immersed Boundary Projection Method

The Navier-Stokes equations (1a)-(1c) are discretized on a staggered grid, which gives the following set of algebraic equations:

\[ \hat{A} u^{n+1} - \hat{r}^n = -\hat{G}\phi + \widehat{bc}_1 + \hat{H} f, \tag{2a} \]
\[ \hat{D} u^{n+1} = bc_2, \tag{2b} \]
\[ \hat{E} u^{n+1} = u_B^{n+1}. \tag{2c} \]

Here, $\phi$ and $f$ are vectors containing the pressure and the values of the singular force at the points of the immersed boundary, respectively. The velocity at the current time step, $u^n$, is known; $\widehat{bc}_1$ and $bc_2$ are obtained from the boundary conditions on the velocity; and $\hat{H}$ and $\hat{E}$ are the regularization and interpolation matrices, respectively. These matrices transfer values of the flow variables between the Eulerian and Lagrangian grids. The above system of equations can be solved to obtain the velocity field at time step $n+1$, the pressure (to a constant) and the body forces. However, the left-hand side matrix is indefinite, and solving the system directly would be ill-advised. For time stepping, an explicit second-order Adams-Bashforth scheme is used for the convection terms and Crank-Nicolson is used for diffusion. All spatial derivatives are calculated using central differences.

By performing appropriate transformations (see [4] for details), one can show that the above system is equivalent to:

\[ \begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix} \begin{pmatrix} q^{n+1} \\ \lambda \end{pmatrix} = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}, \tag{3} \]

where $q^{n+1}$ is the momentum flux at each cell boundary and $\lambda$ is a vector containing both the pressure and the body force values. Consider an $N$th-order approximation of the inverse of matrix $A$, given by $B^N$. We can now perform the same factorisation as described in [2] to obtain the following set of equations, which can be solved to obtain the velocity distribution at time step $n+1$ (here $q^*$ is an intermediate flux):

\[ A q^* = r_1, \tag{4a} \]
\[ Q^T B^N Q \lambda = Q^T q^* - r_2, \tag{4b} \]
\[ q^{n+1} = q^* - B^N Q \lambda. \tag{4c} \]

Only the left-hand side of (3) is affected by the factorisation, and hence $r_1$ and $r_2$ remain the same.
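
For reference, the three steps of the factorised scheme can be read as an approximate block-LU factorisation of the matrix in (3), in the spirit of [2]. The following is a sketch rather than the derivation given in [4]; $q^*$ denotes the intermediate solution of the first step, and the factorisation is approximate only because $A B^N$ approximates the identity:

```latex
\begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix}
\approx
\begin{pmatrix} A & 0 \\ Q^T & -Q^T B^N Q \end{pmatrix}
\begin{pmatrix} I & B^N Q \\ 0 & I \end{pmatrix}
=
\begin{pmatrix} A & A B^N Q \\ Q^T & 0 \end{pmatrix},
\qquad
\begin{aligned}
A q^* &= r_1, \\
Q^T q^* - Q^T B^N Q \lambda &= r_2, \\
q^{n+1} + B^N Q \lambda &= q^*.
\end{aligned}
```

The first two relations come from solving with the lower-triangular factor and the third from back-substitution through the upper-triangular factor; rearranged, they are the three steps of the scheme.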
This factorisation is advantageous because the two linear systems (4a) and (4b) that we now need to solve can be made positive definite, and can therefore be solved efficiently using the conjugate gradient method. In the absence of an immersed boundary, this set of equations is the same as that solved in the traditional fractional step method or projection method [2]. The projection step (4c) simultaneously ensures that the velocity field is divergence-free and that the no-slip condition on the immersed boundary is satisfied at the next time step.

3 Implementation

The matrices $A$, $Q$ and $Q^T$ are sparse, the vectors $q^{n+1}$ and $\lambda$ are dense, and all operations require tools for sparse linear algebra. To take advantage of the GPU, we need some way of both representing and operating on these matrices and vectors on the device. Currently, there are two main choices for this: CUSPARSE, part of NVIDIA's CUDA, or the external library CUSP. The CUSP library is being developed by several NVIDIA employees with minimal software dependencies and released freely under an open-source license. We use the CUSP library for several reasons: it is actively developed and separate from the main CUDA distribution, allowing for faster addition of new features (such as new preconditioners and solvers); and all objects and methods from the library are usable on both CPU and GPU. This gives us the flexibility to, for example, run branching-heavy code on the CPU before transferring the data to the device and running, for instance, a linear solve there, where it will be significantly faster. It also allows us to maintain both a CPU and a GPU code.

[Figure 1. Panel (a), "Timing breakdown": time in seconds, averaged over time steps, for AXPY, Apply BCs, Conversion, Force Calculation, Force Output, Generate bc, Generate r2, Generate rn, MMM, Mat vec, Mem Transfer, Output, Preconditioner, Solve, Solve 2, Transfer q, Transpose, Update B and Update QT. Panel (b), "Solving linear equations": time in seconds against number of unknowns, for CPU and GPU.]
Figure 1: (a) Timing breakdown for flow past a cylinder at Re = 4 using the GPU code. (b) Comparison of time taken to solve a system of linear equations Ax = b on the CPU and GPU. A is chosen as the standard 5-pt Poisson stencil.

Figure 1(a) shows a breakdown of the timings, averaged over time steps, from an example run of flow past a cylinder at Re = 4. The mesh comprises 4 8 cells, resulting in systems of over 3, unknowns. Even in this relatively small test, the time is dominated by the solution of a linear system, denoted by Solve 2. Speeding up this linear solve is the major motivation for using the GPU.

Figure 1(b) shows a timing comparison between the CPU and GPU using CUSP's conjugate gradient solver. The system being solved in this case is given by a traditional 5-point Poisson stencil, which, while not directly used in the IBM code, gives a good measure of relative performance. The plot shows the wall-clock time required to solve to a relative accuracy of 5 for numbers of unknowns ranging from 25 to 4 6.
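
The benchmark described above solves a 5-point Poisson system with the conjugate gradient method. The sketch below shows what such a solve involves, in plain Python on the CPU. It does not use CUSP and is not the authors' benchmark code; the function names, the zero Dirichlet boundary treatment and the default tolerance are assumptions made for illustration.

```python
def apply_poisson(x, n):
    """Matrix-free product y = A x for the 5-point stencil on an n-by-n grid
    of unknowns, assuming zero values outside the grid."""
    y = [0.0] * (n * n)
    for j in range(n):
        for i in range(n):
            k = j * n + i
            v = 4.0 * x[k]
            if i > 0:
                v -= x[k - 1]
            if i < n - 1:
                v -= x[k + 1]
            if j > 0:
                v -= x[k - n]
            if j < n - 1:
                v -= x[k + n]
            y[k] = v
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(apply_a, b, rel_tol=1e-5, max_iter=1000):
    """Unpreconditioned CG; stops when ||r|| <= rel_tol * ||b||.
    Returns the solution and the number of iterations performed."""
    x = [0.0] * len(b)
    r = list(b)  # residual for the zero initial guess
    p = list(r)
    rs_old = dot(r, r)
    b_norm2 = dot(b, b)
    if b_norm2 == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new <= rel_tol ** 2 * b_norm2:
            return x, it
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 16
    b = [1.0] * (n * n)
    x, iters = conjugate_gradient(lambda v: apply_poisson(v, n), b)
    print("iterations:", iters)
```

Each iteration costs one sparse matrix-vector product, two dot products and three vector updates, all of which move a lot of data per arithmetic operation. This is consistent with the abstract's description of the problem as bandwidth-dominated.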
For large systems, the GPU solve is significantly faster, with a speedup of 8 for the largest system shown.

Our choice of tools allows us to perform all sparse linear algebra operations on the GPU with little effort. However, some parts of the algorithm cannot easily be expressed as linear algebra, such as generating the convection term with a finite-difference stencil and applying boundary conditions to the velocities (which involves modifying selected values of the appropriate arrays). One way of performing these operations is to transfer the data from the GPU, do the calculations on the CPU and transfer the modified vectors back to the GPU every time step, but this incurs a prohibitively high cost in memory transfers. The alternative is to use custom-written CUDA kernels, with all appropriate techniques including the use of shared memory, to perform these operations on the GPU. This requires access to the underlying data in the CUSP data structures, which is possible through the Thrust library, on which CUSP is built.

With the combination of accelerated linear algebra and custom kernels on the GPU, initial runs show a speedup of up to 7 over our equivalent CPU code for the problem sizes we ran. This is almost as good as the speedup of 8 obtained by the 5-point Poisson solver in Figure 1(b).
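
To illustrate why the stencil and boundary-condition operations sit outside the sparse linear algebra toolbox, here is a deliberately simplified sketch in plain Python. It is one-dimensional and collocated, whereas the solver described in this paper is two-dimensional on a staggered grid, and the function names are hypothetical. What it shows is the access pattern: every output entry depends only on a fixed neighbourhood of the input, so each loop index could be assigned to its own GPU thread.

```python
def convection_term(u, h):
    """Central-difference estimate of u * du/dx at the interior points of a 1D grid.
    The end points are left at zero; they are handled by the boundary conditions."""
    n = len(u)
    out = [0.0] * n
    for i in range(1, n - 1):  # iterations are independent of each other
        out[i] = u[i] * (u[i + 1] - u[i - 1]) / (2.0 * h)
    return out


def apply_dirichlet(u, left, right):
    """Overwrite selected entries in place, as a boundary-condition update does."""
    u[0] = left
    u[-1] = right
    return u


if __name__ == "__main__":
    h = 0.5
    u = [i * h for i in range(5)]  # u(x) = x, so u * du/dx = x in the interior
    print(convection_term(u, h))
    print(apply_dirichlet(u, 0.0, 1.0))
```

In a CUDA version of such a loop, the array would stay in device memory between time steps, which is the point of writing custom kernels instead of copying data back to the CPU.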

4 Validation

4.1 Couette flow between concentric cylinders

As a validation test, we calculate the flow between two concentric cylinders of inner radius $r_i$ = .5 and outer radius $r_o$, centered at the origin. The outer cylinder is held stationary while the inner cylinder is impulsively rotated from rest with an angular velocity of $\Omega$ = .5. The cylinders are contained in a square stationary box of side .5 centered at the origin. The fluid in the entire domain is initially at rest, and the calculations were carried out for kinematic viscosity $\nu$ = .3. The steady-state analytical solution for this flow is known. The velocity distribution in the interior of the inner cylinder is the same as for solid-body rotation, and the azimuthal velocity between the two cylinders is given by:

\[ u_\theta(r) = \Omega r_i \, \frac{r_o/r - r/r_o}{r_o/r_i - r_i/r_o}. \tag{5} \]

We compared this to the numerical solution for six different grid sizes ranging from 75 75 to 45 45. Table 1 shows the $L_2$ and $L_\infty$ norms of the relative errors and Figure 2(b) shows that the scheme is first-order accurate in space, as expected for the IBM formulation we used.

[Figure 2. Panel (a): $u_\theta(r)$ against $r$ for the numerical solution (5x5 grid) and the analytical solution. Panel (b): error norm against cell width, showing the L-2 norm, the L-inf norm and a 1st-order convergence reference line.]
Figure 2: (a) Comparison of the numerical solution on a 5 5 grid with the analytical solution and (b) convergence study, showing errors for different grid sizes.

To verify the temporal order of convergence, we ran a simulation up to t = 8 on a 5 5 grid, using different time steps ($\Delta t$ = ., .5 and .25). Both first- and third-order accurate expansions of $B^N$ were used. The calculated orders of convergence at various times, obtained from the $L_2$ norms of the differences between the solutions, are summarised in Table 1 and are as expected.

Time    Order of convergence (N = 1)    Order of convergence (N = 3)
.8      .97                             2.67
2       .99                             2.85
4       .93                             2.73
8       .97                             2.83

Table 1: Calculated order of convergence at different times for Couette-flow validation.
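
The two checks used in this validation can be sketched in a few lines of Python. The first function evaluates the steady analytical profile (solid-body rotation inside the inner cylinder, the closed-form expression between the cylinders). The second estimates the observed order of convergence from three solutions computed with successively halved time steps, using the $L_2$ norms of their differences. This is a standard procedure and is assumed here; it is not necessarily the exact computation behind the table. All function names are hypothetical, and the parameter values in the usage lines are illustrative only.

```python
import math


def couette_u_theta(r, r_i, r_o, omega):
    """Steady azimuthal velocity for 0 <= r <= r_o with a rotating inner cylinder."""
    if r < 0.0 or r > r_o:
        raise ValueError("r must lie in [0, r_o]")
    if r <= r_i:
        return omega * r  # solid-body rotation inside the inner cylinder
    return omega * r_i * (r_o / r - r / r_o) / (r_o / r_i - r_i / r_o)


def l2_diff(a, b):
    """L2 norm of the difference between two solution vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def observed_order(sol_dt, sol_half, sol_quarter):
    """Order estimated from solutions obtained with time steps dt, dt/2 and dt/4."""
    return math.log2(l2_diff(sol_dt, sol_half) / l2_diff(sol_half, sol_quarter))


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the text.
    print(couette_u_theta(0.75, r_i=0.5, r_o=1.0, omega=0.5))

    # Synthetic solutions whose error scales as dt**2, to exercise the estimator.
    exact = [1.0, 2.0, 3.0]
    make = lambda dt: [v + dt ** 2 for v in exact]
    print(observed_order(make(0.1), make(0.05), make(0.025)))
```

The estimator needs no exact solution: if the error behaves like $C \Delta t^p$, the ratio of successive differences is $2^p$.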

4.2 External flow over a circular cylinder

We also carried out computations to simulate external flow over a circular cylinder at Reynolds number 4. The cylinder, of diameter $d$, is centered at the origin and is placed in an external flow with freestream velocity $u_\infty$. The simulation was carried out on a 2 2 grid with uniform cell spacing in the entire domain, which was a square with opposite corners at ( 5, 5) and (5, 5). The velocity at the inlet, top and bottom of the domain was fixed to the freestream velocity, and the outlet boundary condition used was

\[ \frac{\partial u}{\partial t} + u_\infty \frac{\partial u}{\partial x} = 0. \]

The initial condition was a uniform velocity field in the entire domain. The vorticity field obtained for this case is shown in Figure 3(a) and the time-varying drag coefficient is plotted in Figure 3(b). The drag coefficient at steady state is found to be .6, which is in good agreement with the expected value [5].

[Figure 3. Panel (a): vorticity contours around the cylinder. Panel (b): drag coefficient against time.]
Figure 3: Steady state vorticity field (a) and time varying drag coefficient (b) for external flow over a circular cylinder at Reynolds number 4. The contour lines in (a) are drawn from -3 to 3 in steps of .4.

5 Conclusions and Future Work

At this time, we have a validated GPU code for the projection IBM, and we have shown convergence at the expected rates. Using the free and open-source CUSP and Thrust libraries to provide sparse linear algebra functionality, a speedup of 7 over the equivalent CPU code was obtained for the largest tested problem. In the final paper we will provide a more extensive study of optimizations, timings and breakdowns, and demonstrate moving-boundary applications.

References

[1] R. Mittal and G. Iaccarino. Immersed boundary methods. Ann. Rev. Fluid Mech., 37(1):239-261, 2005.
[2] J. B. Perot. An analysis of the fractional step method. J. Comp. Phys., 108(1):51-58, 1993.
[3] C. S. Peskin. Flow patterns around heart valves: A numerical method. J. Comp. Phys., 10(2):252-271, 1972.
[4] K. Taira and T. Colonius. The immersed boundary method: A projection approach. J. Comp. Phys., 225(2):2118-2137, 2007.
[5] D. J. Tritton. Experiments on the flow past a circular cylinder at low Reynolds numbers. J. Fluid Mech., 6(4):547-567, 1959.