Fast Multipole Method on the GPU


1 Fast Multipole Method on the GPU, with application to the Adaptive Vortex Method. University of Bristol, Bristol, United Kingdom.

2 Introduction. Particle methods: highly parallel, computationally intensive. Numerical challenge: the N-body problem. Opportunity: clever algorithms and massively parallel architectures (GPUs). Contribution: a mesh-less method, accelerated using clever algorithms (FMM), implemented for GPUs.

3 Overview of the presentation. Adaptive Vortex Method: brief introduction, algorithmic representation. The Fast Multipole Method: introduction to the algorithm, GPU implementation. Lessons learned. Final remark.

4 Vortex Method for fluid simulation

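5 Vortex Method for fluid simulation. Incompressible Newtonian fluid (2D case): $\frac{\partial u}{\partial t} + u \cdot \nabla u = -\frac{\nabla p}{\rho} + \nu \nabla^2 u$. Navier-Stokes equation in vorticity formulation, with $\omega = \nabla \times u$: $\frac{\partial \omega}{\partial t} + u \cdot \nabla \omega = \omega \cdot \nabla u + \nu \nabla^2 \omega$.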

6 Vortex Method for fluid simulation. Discretize the vorticity field into particles: $\omega_\sigma(x,t) = \sum_{i=1}^{N} \gamma_i\,\zeta_\sigma(x - x_i)$. Each particle carries vorticity: $\zeta_\sigma(x) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|x|^2}{2\sigma^2}\right)$. Particles move with the fluid: $\frac{dx_i}{dt} = u(x_i, t)$.

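7 Vortex Method for fluid simulation. The velocity can be obtained from the vorticity field: $\omega = -\nabla^2 \psi$, $u(x) = -\frac{1}{2\pi} \int \frac{(x - x') \times \omega(x')\,\hat{e}_z}{|x - x'|^2}\,dx'$, where $\omega$ is given by the discretized vorticity field, which results in an N-body problem: $u_\sigma(x,t) = \sum_{i=1}^{N} \gamma_i\,K_\sigma(x - x_i)$, with $K_\sigma(x) = \frac{1}{2\pi |x|^2}\,(-x_2, x_1)\left(1 - \exp\left(-\frac{|x|^2}{2\sigma^2}\right)\right)$.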
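The velocity sum on this slide can be sketched directly. The following is a minimal pure-Python illustration, not the authors' implementation; the function names `k_sigma` and `velocity` and the tuple layout of the particles are assumptions made for the example. It evaluates the regularized kernel $K_\sigma$ and sums the contribution of every particle at every target, which is the N-body problem the FMM is later used to accelerate.

```python
import math

def k_sigma(dx, dy, sigma):
    """Regularized Biot-Savart kernel K_sigma for a Gaussian particle (2D)."""
    r2 = dx * dx + dy * dy
    if r2 == 0.0:
        # The regularized kernel tends to zero at the particle centre.
        return 0.0, 0.0
    factor = (1.0 - math.exp(-r2 / (2.0 * sigma * sigma))) / (2.0 * math.pi * r2)
    return -dy * factor, dx * factor

def velocity(targets, particles, sigma):
    """Direct O(N^2) evaluation of u_sigma at every target point.

    targets: list of (x, y); particles: list of (x, y, gamma).
    """
    result = []
    for tx, ty in targets:
        u = v = 0.0
        for px, py, gamma in particles:
            kx, ky = k_sigma(tx - px, ty - py, sigma)
            u += gamma * kx
            v += gamma * ky
        result.append((u, v))
    return result

# Far from a single particle the field matches a point vortex, gamma / (2 pi r).
particles = [(0.0, 0.0, 1.0)]
far = velocity([(10.0, 0.0)], particles, sigma=0.1)[0]
```

Far from the particle the Gaussian cut-off is negligible, so `far` agrees with the point-vortex velocity to rounding error.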

8 Vortex Method Algorithm

9 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. Step 1, discretization: $\omega(x,t) \approx \omega_\sigma(x,t) = \sum_{i=1}^{N} \Gamma_i(t)\,\zeta_{\sigma_i}(x - x_i(t))$.

10 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. Step 2, velocity evaluation: $u_\sigma(x,t) = \sum_{j=1}^{N} \Gamma_j\,K_\sigma(x - x_j)$.

11 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. Step 3, convection: $\frac{dx_i}{dt} = u(x_i, t)$.

12 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. Step 4, diffusion: $\frac{d\omega}{dt} = \nu \nabla^2 \omega$.

13 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. Step 5, spatial adaptation: $\omega(x,t) \approx \omega_\sigma(x,t) = \sum_{i=1}^{N} \Gamma_i(t)\,\zeta_{\sigma_i}(x - x_i(t))$.

14 VM advantages: low numerical diffusion; no mesh; it adapts to the fluid. VM challenges: efficient treatment of boundary conditions; numerical solution of an N-body problem.

15 Fast Multipole Method

16 Fast summation problem. Accelerate the evaluation of problems of the form $f(y_j) = \sum_{i=1}^{N} c_i\,K(y_j - x_i)$, $j \in [1 \ldots N]$. For N evaluations the total amount of work is proportional to $N^2$. We want to solve this kind of problem in less than $O(N^2)$: we want an $O(N)$ and highly accurate algorithm. The FMM exchanges accuracy for speed, and we control the accuracy.
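The quadratic cost is easy to see by counting kernel evaluations in a direct summation. This is a small pure-Python sketch; the helper `direct_sum` and the smooth `kernel` used here are hypothetical and chosen only for illustration.

```python
def direct_sum(targets, sources, weights, kernel):
    """Evaluate f(y_j) = sum_i c_i K(y_j - x_i) for every target y_j."""
    evaluations = 0
    values = []
    for y in targets:
        total = 0.0
        for x, c in zip(sources, weights):
            total += c * kernel(y - x)
            evaluations += 1
        values.append(total)
    return values, evaluations

def kernel(d):
    # Hypothetical smooth kernel, used only to have something to sum.
    return 1.0 / (1.0 + d * d)

counts = {}
for n in (100, 200, 400):
    points = [i / n for i in range(n)]
    _, counts[n] = direct_sum(points, points, [1.0] * n, kernel)
# counts == {100: 10000, 200: 40000, 400: 160000}
```

Doubling N quadruples the number of kernel evaluations, which is the cost the FMM is designed to avoid.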

17 The Fast Multipole Method. The FMM is based on Multipole Expansions (ME) to approximate the kernel function when evaluated far away from the origin. An ME is an infinite series truncated after p terms; this is how we control the accuracy of the approximation. $K(y - x_c) = \sum_{m=0}^{p} a_m(x_c)\,f_m(y)$, where $a_m(x_c)$ are the coefficient terms. [Figure: cluster of particles $x_i$ of radius r around the centre $x_c$, and an evaluation point y.]
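A truncated multipole expansion can be tried out on a simple case. The sketch below assumes the 2D kernel $K(y - x) = 1/(y - x)$ with points stored as complex numbers, which is a choice made for this example and not taken from the slides. For that kernel $a_m = \sum_i c_i (x_i - x_c)^m$ and $f_m(y) = (y - x_c)^{-(m+1)}$. All function names are hypothetical.

```python
def multipole_coefficients(sources, weights, center, p):
    """Coefficients a_m = sum_i c_i (x_i - x_c)^m for m = 0 .. p-1."""
    return [sum(c * (x - center) ** m for x, c in zip(sources, weights))
            for m in range(p)]

def evaluate_multipole(coeffs, center, y):
    """Evaluate sum_m a_m f_m(y) with f_m(y) = (y - x_c)^-(m+1)."""
    return sum(a / (y - center) ** (m + 1) for m, a in enumerate(coeffs))

def direct(sources, weights, y):
    """Reference value: sum_i c_i / (y - x_i)."""
    return sum(c / (y - x) for x, c in zip(sources, weights))

# A small cluster around the origin and an evaluation point well outside it.
sources = [0.1 + 0.2j, -0.3 + 0.1j, 0.2 - 0.25j, -0.15 - 0.3j]
weights = [1.0, -0.5, 2.0, 0.75]
center = 0j
y = 3.0 + 2.0j

exact = direct(sources, weights, y)
errors = []
for p in (2, 4, 8, 16):
    coeffs = multipole_coefficients(sources, weights, center, p)
    errors.append(abs(evaluate_multipole(coeffs, center, y) - exact))
```

The entries of `errors` shrink as p grows, which is the accuracy control described on the slide: more terms cost more work and give a closer approximation.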

18 The Fast Multipole Method. The basic idea is to use this ME to approximate a cluster of particles as a single pseudo-particle. The greater the distance to the cluster, the bigger the pseudo-particles can be. Direct evaluation is used for all particles in the near field. [Figure: domain decomposition, showing particles and pseudo-particles against distance from the evaluation point.]

19 The Fast Multipole Method. A Local Expansion (LE) is used to approximate the influence of a group of Multipole Expansions. An LE provides a local description of the influence of particles that are located far away. The far field is evaluated using a single Local Expansion.
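How several MEs combine into one LE can be sketched for the same illustrative kernel $1/(y - x)$ in the complex plane (an assumption for this example, not the kernel of the talk). Writing $t = z_L - z_M$ for the vector from the ME centre to the LE centre, the binomial series gives $b_n = (-1)^n \sum_k a_k \binom{n+k}{k} t^{-(n+k+1)}$. Function names are hypothetical.

```python
from math import comb

def multipole_coefficients(sources, weights, center, p):
    """ME coefficients a_m = sum_i c_i (x_i - z_M)^m for the kernel 1/(y - x)."""
    return [sum(c * (x - center) ** m for x, c in zip(sources, weights))
            for m in range(p)]

def m2l(multipole, t, p):
    """Translate one ME into LE coefficients; t = z_L - z_M."""
    return [(-1) ** n * sum(a * comb(n + k, k) / t ** (n + k + 1)
                            for k, a in enumerate(multipole))
            for n in range(p)]

def evaluate_local(local, center, y):
    """Evaluate the LE sum_n b_n (y - z_L)^n."""
    return sum(b * (y - center) ** n for n, b in enumerate(local))

p = 20
z_local = 0j
# Two far-away clusters: (ME centre, source positions, source weights).
clusters = [
    (4 + 0j, [4.1 + 0.2j, 3.8 - 0.1j], [1.0, 2.0]),
    (0 + 4j, [0.2 + 4.1j, -0.1 + 3.9j], [-1.5, 0.5]),
]

# Accumulate every transformed ME into a single LE.
local = [0j] * p
for center, sources, weights in clusters:
    me = multipole_coefficients(sources, weights, center, p)
    for n, b in enumerate(m2l(me, z_local - center, p)):
        local[n] += b

y = 0.2 + 0.1j
approx = evaluate_local(local, z_local, y)
exact = sum(c / (y - x)
            for _, sources, weights in clusters
            for x, c in zip(sources, weights))
error = abs(approx - exact)
```

Once the LE has been accumulated, evaluating the far field at any nearby point costs p terms, independent of how many distant particles contributed.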


21 The Fast Multipole Method. The computation related to the tree structure, in the O(N) algorithm. Upward Sweep, create Multipole Expansions: P2M, M2M. Downward Sweep, evaluate Local Expansions: M2L, L2L, L2P.

22 Fast Multipole Method on the GPU

23 Exposing task-level parallelism. Stages: Setup, Upward Sweep, Downward Sweep, Evaluation. The Directed Acyclic Graph of the FMM shows task dependencies and exposes task-level parallelism. Tasks: 1. Tree creation. 2. Particle clustering. 3. Listing of cluster interactions. 4. Particle to Multipole. 5. Multipole to Multipole. 6. Multipole to Local. 7. Local to Local. 8. Local to Particle. 9. Near-field evaluation. 10. Adding near- and far-field contributions.
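The ten tasks can be arranged as a dependency graph and scheduled in waves. The transcript does not preserve the edges of the original diagram, so the dependencies below are an assumption based on the usual structure of the FMM; only the task names come from the slide. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

tasks = {
    1: "Tree creation", 2: "Particle clustering",
    3: "Listing of cluster interactions", 4: "Particle to Multipole",
    5: "Multipole to Multipole", 6: "Multipole to Local",
    7: "Local to Local", 8: "Local to Particle",
    9: "Near-field evaluation", 10: "Adding near- and far-field contributions",
}

# task -> set of tasks it waits for (assumed edges, see the note above)
dependencies = {
    1: set(), 2: {1}, 3: {1},
    4: {2}, 5: {4}, 6: {3, 5}, 7: {6}, 8: {7},
    9: {2, 3}, 10: {8, 9},
}

def parallel_waves(deps):
    """Group tasks into waves; tasks in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = parallel_waves(dependencies)
for wave in waves:
    print(", ".join(f"{t}. {tasks[t]}" for t in wave))
```

Under these assumed edges, the near-field evaluation (task 9) becomes ready at the same time as the upward sweep starts, so it can overlap with the whole far-field pipeline.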

24 FMM: Computational time per stage. Downward Sweep (M2L) and particle evaluation account for over 99% of the time. There are opportunities for big gains in these two stages. Particle evaluation is easy to implement for the GPU, so the focus is on Multipole-to-Local operations (M2L). [Figure: computational time in seconds against number of processors for ME Initialization, Upward Sweep, Downward Sweep, Evaluation and Total time. Parallel FMM (PetFMM), 10 million particles, FMM level 9, 17 FMM terms.]

25 Accelerating the M2L. The M2L stage can take over 99% of the computation time. One LE is formed by several transformed MEs. In total, many LEs are produced, but only one per cluster (L=5 requires 27,648 M2L translations). The M2L transformation acts as a matrix-vector operator. The M2L implementation is matrix-free and computationally intensive. [Figure: MEs (orange) used to produce a single LE (blue) through the M2L transformation M2L(t).]
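The figure of 27,648 translations is consistent with a 2D quadtree in which each of the $4^5 = 1024$ cells at level 5 has a full interaction list of 27 cells, since $1024 \times 27 = 27{,}648$. That reading is an assumption about how the number was obtained. The sketch below builds the interaction lists of a uniform quadtree (children of the parent's neighbours that are not adjacent to the cell itself); `interaction_list` is a hypothetical helper.

```python
def interaction_list(i, j, level):
    """Cells at `level` that are children of the parent's neighbours
    but are not adjacent to cell (i, j)."""
    n = 2 ** level
    pi, pj = i // 2, j // 2
    return [(a, b)
            for a in range(max(0, 2 * (pi - 1)), min(n, 2 * (pi + 2)))
            for b in range(max(0, 2 * (pj - 1)), min(n, 2 * (pj + 2)))
            if max(abs(a - i), abs(b - j)) > 1]

level = 5
n = 2 ** level
sizes = [len(interaction_list(i, j, level)) for i in range(n) for j in range(n)]

full_list_count = 4 ** level * 27   # every cell counted with 27 entries
actual_count = sum(sizes)           # cells near the domain boundary have shorter lists
print(max(sizes), full_list_count)  # prints "27 27648"
```

Each of these translations is an independent, identically shaped piece of work, which is what makes the stage attractive for a GPU.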

26 Accelerating the M2L. Work reorganization: from hierarchical structure to a queue. Homogeneous units of work. Improved temporal locality. [Diagram: Upward Sweep, create Multipole Expansions (P2M, M2M); Downward Sweep, evaluate Local Expansions (M2L, L2L, L2P).]

27 Accelerating the M2L. Work reorganization: from hierarchical structure to a queue. Homogeneous units of work. Improved temporal locality. Reorganized task queue: M2L(A, $c_1$), M2L(A, $c_2$), M2L(A, $c_3$), M2L(B, $c_1$), M2L(B, $c_2$), M2L(B, $c_3$). [Diagram: computations from the Upward Sweep (P2M, M2M) and Downward Sweep (M2L, L2L, L2P) are reorganized into the queue.]
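The reorganization can be sketched as flattening per-cell interaction lists into one flat list of identical M2L tasks. The data layout below (a dictionary from target cell to its source MEs) and the name `build_m2l_queue` are assumptions for illustration; the labels A, B and c1 to c3 follow the slide.

```python
def build_m2l_queue(interaction_lists):
    """Flatten {target: [sources]} into a list of (source, target) tasks.

    Sorting groups the tasks by source ME, so consecutive tasks reuse
    the same expansion.
    """
    queue = [(source, target)
             for target, sources in interaction_lists.items()
             for source in sources]
    queue.sort()
    return queue

# Tree-ordered view: each target cell lists the MEs it needs.
interaction_lists = {
    "c1": ["A", "B"],
    "c2": ["A", "B"],
    "c3": ["A", "B"],
}

queue = build_m2l_queue(interaction_lists)
for source, target in queue:
    print(f"M2L({source}, {target})")
```

The printed order is M2L(A, c1), M2L(A, c2), M2L(A, c3), M2L(B, c1), M2L(B, c2), M2L(B, c3), matching the queue on the slide. In a heterogeneous setting this step would run on the CPU, and the flat queue would then be handed to the GPU.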

28 GPU kernel version 1. Each thread transforms one ME. Matrix-free multiplication. Efficient matrix creation and multiplication. No thread synchronization is required. Resource-intensive threads. Non-coalesced memory transactions. [Figure: single-thread computation pattern, ME to LE.] Result: 20 giga-operations (1 C1060 card), 20x speedup.

29 GPU kernel version 2. Many threads transform one ME; each thread computes only one term. Less efficient in floating-point operations. More parallelism. Coalesced memory transactions. Fewer resources per thread. Other memory tricks. [Figure: multiple-thread computation pattern, ME to LE.] Result: 482 giga-operations (1 C1060 card), 100x speedup.
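The difference between the two kernel versions is how work is mapped to threads. The sketch below simulates both mappings sequentially in Python; it is not the CUDA code of the talk, and it again assumes the illustrative kernel $1/(y - x)$, for which LE term n of a translated ME is $(-1)^n \sum_k a_k \binom{n+k}{k} t^{-(n+k+1)}$. All names are hypothetical.

```python
from math import comb

def m2l_term(multipole, t, n):
    """Term n of the LE obtained by translating one ME over the vector t."""
    return (-1) ** n * sum(a * comb(n + k, k) / t ** (n + k + 1)
                           for k, a in enumerate(multipole))

def version1(tasks, p):
    """One work item per M2L task: it computes all p terms of the result."""
    return [[m2l_term(me, t, n) for n in range(p)] for me, t in tasks]

def version2(tasks, p):
    """One work item per (task, term) pair, addressed by a flat thread id."""
    out = [[0j] * p for _ in tasks]
    for tid in range(len(tasks) * p):   # each tid stands for an independent thread
        task, n = divmod(tid, p)
        me, t = tasks[task]
        out[task][n] = m2l_term(me, t, n)
    return out

p = 3
tasks = [
    ([1 + 0j, 0.2j, -0.1 + 0j], 4 + 0j),
    ([0.5 + 0j, 0.1 + 0j, 0.3j], 3 - 2j),
]
same = version1(tasks, p) == version2(tasks, p)
work_items_v1 = len(tasks)
work_items_v2 = len(tasks) * p
```

Both mappings give the same coefficients. The second exposes p times as many work items, each of which needs less state, at the price of every item recomputing quantities such as the powers of t on its own, which is one way to read "less efficient in floating-point operations".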

30 Lessons Learned

31 Paradigm shift. Start by exposing parallelism: think about homogeneous units of work; think about thousands of parallel operations; think about smart usage of resources. Trade operation efficiency for more parallel and resource-efficient kernels. Think about heterogeneous computing: GPUs are not a silver bullet; use the CPU to reorganize work.

32 Conclusions. Heterogeneous computing: use all available hardware! Current FMM peak: 480 giga-ops. Methodology: identify and expose parallelism; distribute work between CPU and GPU; use the best for each job! Current work: parallel FMM library (many applications); multi-GPU implementation of the FMM.

33 Ongoing work. Particle methods map well to new architectures. However, particle methods have the disadvantage of not being as mature as mesh-based methods: much more research has been done on conventional mesh methods. Ongoing work: a compromise between methods, hybrid particle-mesh methods on new architectures.

34 Final remark. Novel architectures and current applications: how do we cross the bridge from new technologies to current applications? Re-developing algorithms can give large speedups but is far from trivial. Porting algorithms can give small speedups with less effort. Cost-effective solution: research and development of heterogeneity-aware libraries.

35 Thanks for listening

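36 Velocity calculation: N-body problem. Gaussian particles: $\zeta_\sigma(x) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|x|^2}{2\sigma^2}\right)$. Vorticity: $\omega_\sigma(x,t) = \sum_{i=1}^{N} \gamma_i\,\zeta_\sigma(x - x_i)$. Velocity: $u_\sigma(x,t) = \sum_{i=1}^{N} \gamma_i\,K_\sigma(x - x_i)$, with $K_\sigma(x) = \frac{1}{2\pi |x|^2}\,(-x_2, x_1)\left(1 - \exp\left(-\frac{|x|^2}{2\sigma^2}\right)\right)$.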

37 Vortex sheet Discontinuity in the velocity field. Represented by vortex elements. γ(s) 1 π [ n [log x(s) x(s ) ] ρ 1(s) L ]γ(s )ds = 2 u slip ŝ ω t ν ω =0, ω(t δt) =0, ν ω n = γ(s) δt 37

38 Vortex Method algorithm: 1. Discretization, 2. Velocity evaluation, 3. Convection, 4. Diffusion, 5. Spatial adaptation. [Diagram: loop of the five steps from Start to End.]

39 Vortex method algorithm with panel-free boundary conditions. [Diagram: loop of steps 1 to 5 from Start to End, with two added steps: A. Vortex sheet calculation, B. Vortex sheet diffusion.]



42 Panel-free method. Particle discretization: discretize into points. The points are the control points. Boundary conditions are enforced at the control points. RBF solution.

43 Panel-free method. Particle discretization: discretize into points. The points are the control points. Boundary conditions are enforced at the control points. RBF solution: $\gamma(x) \approx \sum_{i=1}^{N} \phi(|x - c_i|)\,\alpha_i$.

44 Accelerating the M2L. M2L is a two-stage computation. Stage 1: transformation of the MEs. Stage 2: reduction into the LE. [Figure: several MEs reduced into a single LE.]

45 PetFMM: Parallel extensible toolkit for the FMM. [Figure: parallelization strategy. Root tree down to level k, with Sub-tree 1 to Sub-tree 8 below it and the local domain; M2M and L2L translations and M2L transformations are marked.]

46 PetFMM: Parallel extensible toolkit for the FMM. [Figure: parallel work distribution, with labels $w_i$, $c_{ij}$, $w_j$.]

47 PetFMM: Parallel extensible toolkit for the FMM. [Figure: speedup of PetFMM against number of processors for different test cases: uniform 4ML8R5, uniform 10ML9R5, spiral 1ML8R5, spiral with space-filling 1ML8R5, and perfect speedup.]


Realistic Animation of Fluids Realistic Animation of Fluids p. 1/2 Realistic Animation of Fluids Nick Foster and Dimitri Metaxas Realistic Animation of Fluids p. 2/2 Overview Problem Statement Previous Work Navier-Stokes Equations

More information

Intermediate Parallel Programming & Cluster Computing

Intermediate Parallel Programming & Cluster Computing High Performance Computing Modernization Program (HPCMP) Summer 2011 Puerto Rico Workshop on Intermediate Parallel Programming & Cluster Computing in conjunction with the National Computational Science

More information

Gradient Free Design of Microfluidic Structures on a GPU Cluster

Gradient Free Design of Microfluidic Structures on a GPU Cluster Gradient Free Design of Microfluidic Structures on a GPU Cluster Austen Duffy - Florida State University SIAM Conference on Computational Science and Engineering March 2, 2011 Acknowledgements This work

More information

COMPUTATIONAL METHODS FOR ENVIRONMENTAL FLUID MECHANICS

COMPUTATIONAL METHODS FOR ENVIRONMENTAL FLUID MECHANICS COMPUTATIONAL METHODS FOR ENVIRONMENTAL FLUID MECHANICS Tayfun Tezduyar tezduyar@rice.edu Team for Advanced Flow Simulation and Modeling (T*AFSM) Mechanical Engineering and Materials Science Rice University

More information

Available online at ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013)

Available online at  ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013) Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 61 ( 2013 ) 81 86 Parallel Computational Fluid Dynamics Conference (ParCFD2013) An OpenCL-based parallel CFD code for simulations

More information

Lecture 7: Introduction to HFSS-IE

Lecture 7: Introduction to HFSS-IE Lecture 7: Introduction to HFSS-IE 2015.0 Release ANSYS HFSS for Antenna Design 1 2015 ANSYS, Inc. HFSS-IE: Integral Equation Solver Introduction HFSS-IE: Technology An Integral Equation solver technology

More information

Superdiffusion and Lévy Flights. A Particle Transport Monte Carlo Simulation Code

Superdiffusion and Lévy Flights. A Particle Transport Monte Carlo Simulation Code Superdiffusion and Lévy Flights A Particle Transport Monte Carlo Simulation Code Eduardo J. Nunes-Pereira Centro de Física Escola de Ciências Universidade do Minho Page 1 of 49 ANOMALOUS TRANSPORT Definitions

More information

Accepted Manuscript. A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

Accepted Manuscript. A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion Accepted Manuscript A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion Seungjoon Lee, Ioannis G. Kevrekidis, George Em Karniadakis

More information

Coping with the Ice Accumulation Problems on Power Transmission Lines

Coping with the Ice Accumulation Problems on Power Transmission Lines Coping with the Ice Accumulation Problems on Power Transmission Lines P.N. Shivakumar 1, J.F.Peters 2, R.Thulasiram 3, and S.H.Lui 1 1 Department of Mathematics 2 Department of Electrical & Computer Engineering

More information

Lattice Boltzmann with CUDA

Lattice Boltzmann with CUDA Lattice Boltzmann with CUDA Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Overview of LBM An usage of LBM Algorithm Implementation in CUDA and Optimization

More information

Computational Fluid Dynamics using OpenCL a Practical Introduction

Computational Fluid Dynamics using OpenCL a Practical Introduction 19th International Congress on Modelling and Simulation, Perth, Australia, 12 16 December 2011 http://mssanz.org.au/modsim2011 Computational Fluid Dynamics using OpenCL a Practical Introduction T Bednarz

More information

Inviscid Flows. Introduction. T. J. Craft George Begg Building, C41. The Euler Equations. 3rd Year Fluid Mechanics

Inviscid Flows. Introduction. T. J. Craft George Begg Building, C41. The Euler Equations. 3rd Year Fluid Mechanics Contents: Navier-Stokes equations Inviscid flows Boundary layers Transition, Reynolds averaging Mixing-length models of turbulence Turbulent kinetic energy equation One- and Two-equation models Flow management

More information

MESHLESS SOLUTION OF INCOMPRESSIBLE FLOW OVER BACKWARD-FACING STEP

MESHLESS SOLUTION OF INCOMPRESSIBLE FLOW OVER BACKWARD-FACING STEP Vol. 12, Issue 1/2016, 63-68 DOI: 10.1515/cee-2016-0009 MESHLESS SOLUTION OF INCOMPRESSIBLE FLOW OVER BACKWARD-FACING STEP Juraj MUŽÍK 1,* 1 Department of Geotechnics, Faculty of Civil Engineering, University

More information

Numerical Analysis of Shock Tube Problem by using TVD and ACM Schemes

Numerical Analysis of Shock Tube Problem by using TVD and ACM Schemes Numerical Analysis of Shock Tube Problem by using TVD and Schemes Dr. Mukkarum Husain, Dr. M. Nauman Qureshi, Syed Zaid Hasany IST Karachi, Email: mrmukkarum@yahoo.com Abstract Computational Fluid Dynamics

More information

Application of STAR-CCM+ to Helicopter Rotors in Hover

Application of STAR-CCM+ to Helicopter Rotors in Hover Application of STAR-CCM+ to Helicopter Rotors in Hover Lakshmi N. Sankar and Chong Zhou School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA Ritu Marpu Eschol CD-Adapco, Inc.,

More information

An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm. Martin Burtscher Department of Computer Science Texas State University-San Marcos

An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm. Martin Burtscher Department of Computer Science Texas State University-San Marcos An Efficient CUDA Implementation of a Tree-Based N-Body Algorithm Martin Burtscher Department of Computer Science Texas State University-San Marcos Mapping Regular Code to GPUs Regular codes Operate on

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University, No.150 (60-70) March 2016 Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Report 2: For the Case

More information

SPH: Why and what for?

SPH: Why and what for? SPH: Why and what for? 4 th SPHERIC training day David Le Touzé, Fluid Mechanics Laboratory, Ecole Centrale de Nantes / CNRS SPH What for and why? How it works? Why not for everything? Duality of SPH SPH

More information

Transition modeling using data driven approaches

Transition modeling using data driven approaches Center for urbulence Research Proceedings of the Summer Program 2014 427 ransition modeling using data driven approaches By K. Duraisamy AND P.A. Durbin An intermittency transport-based model for bypass

More information

Network traffic: Scaling

Network traffic: Scaling Network traffic: Scaling 1 Ways of representing a time series Timeseries Timeseries: information in time domain 2 Ways of representing a time series Timeseries FFT Timeseries: information in time domain

More information

1 Past Research and Achievements

1 Past Research and Achievements Parallel Mesh Generation and Adaptation using MAdLib T. K. Sheel MEMA, Universite Catholique de Louvain Batiment Euler, Louvain-La-Neuve, BELGIUM Email: tarun.sheel@uclouvain.be 1 Past Research and Achievements

More information

Shallow Water Simulations on Graphics Hardware

Shallow Water Simulations on Graphics Hardware Shallow Water Simulations on Graphics Hardware Ph.D. Thesis Presentation 2014-06-27 Martin Lilleeng Sætra Outline Introduction Parallel Computing and the GPU Simulating Shallow Water Flow Topics of Thesis

More information

Overview of research activities Toward portability of performance

Overview of research activities Toward portability of performance Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into

More information

Parallel FFT Program Optimizations on Heterogeneous Computers

Parallel FFT Program Optimizations on Heterogeneous Computers Parallel FFT Program Optimizations on Heterogeneous Computers Shuo Chen, Xiaoming Li Department of Electrical and Computer Engineering University of Delaware, Newark, DE 19716 Outline Part I: A Hybrid

More information

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System

More information

The Immersed Interface Method

The Immersed Interface Method The Immersed Interface Method Numerical Solutions of PDEs Involving Interfaces and Irregular Domains Zhiiin Li Kazufumi Ito North Carolina State University Raleigh, North Carolina Society for Industrial

More information

Finite Volume Discretization on Irregular Voronoi Grids

Finite Volume Discretization on Irregular Voronoi Grids Finite Volume Discretization on Irregular Voronoi Grids C.Huettig 1, W. Moore 1 1 Hampton University / National Institute of Aerospace Folie 1 The earth and its terrestrial neighbors NASA Colin Rose, Dorling

More information

A Novel Approach to High Speed Collision

A Novel Approach to High Speed Collision A Novel Approach to High Speed Collision Avril Slone University of Greenwich Motivation High Speed Impact Currently a very active research area. Generic projectile- target collision 11 th September 2001.

More information

smooth coefficients H. Köstler, U. Rüde

smooth coefficients H. Köstler, U. Rüde A robust multigrid solver for the optical flow problem with non- smooth coefficients H. Köstler, U. Rüde Overview Optical Flow Problem Data term and various regularizers A Robust Multigrid Solver Galerkin

More information

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA

3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA 3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires

More information

Technical Report TR

Technical Report TR Technical Report TR-2015-09 Boundary condition enforcing methods for smoothed particle hydrodynamics Arman Pazouki 1, Baofang Song 2, Dan Negrut 1 1 University of Wisconsin-Madison, Madison, WI, 53706-1572,

More information

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software

Reproducibility of Complex Turbulent Flow Using Commercially-Available CFD Software Reports of Research Institute for Applied Mechanics, Kyushu University No.150 (47 59) March 2016 Reproducibility of Complex Turbulent Using Commercially-Available CFD Software Report 1: For the Case of

More information

LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS

LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS NAVIER-STOKES EQUATIONS u t + u u + 1 ρ p = Ԧg + ν u u=0 WHAT IS COMPUTATIONAL FLUID DYNAMICS? Branch of Fluid Dynamics which uses computer power to approximate

More information

FOURTH ORDER COMPACT FORMULATION OF STEADY NAVIER-STOKES EQUATIONS ON NON-UNIFORM GRIDS

FOURTH ORDER COMPACT FORMULATION OF STEADY NAVIER-STOKES EQUATIONS ON NON-UNIFORM GRIDS International Journal of Mechanical Engineering and Technology (IJMET Volume 9 Issue 10 October 2018 pp. 179 189 Article ID: IJMET_09_10_11 Available online at http://www.iaeme.com/ijmet/issues.asp?jtypeijmet&vtype9&itype10

More information