S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs
|
|
- Donald Lester
- 5 years ago
- Views:
Transcription
1 S7260: Microswimmers on Speed: Simulating Spheroidal Squirmers on GPUs Elmar Westphal - Forschungszentrum Jülich GmbH
2 Spheroids Spheroid: A volume formed by rotating an ellipse around one of its axes Two kinds: oblate (rotated around its shorter axis), like a pumpkin or teapot prolate (rotated around its longer axis), like an American football 2
3 Squirmers Squirmer is a model for simulating micro swimmers Model origins date back to the 1950s One of today s standard models for self propelled swimmers Can use different means to swim (flagella, arm-like structures etc.), here we simulate the flow caused by surfaces covered with short cilia (filaments) Example: Paramecium bursaria Ein grünes Pantoffeltierchen Frank Fox / license CC BY-SA 3.0 de 3
4 The Simulation Our simulation is split into three parts: Movement of the squirmers and interactions between them Simulation of the liquid Interactions between the squirmers and the liquid 4
5 Simulation of the Squirmers Number of squirmers is low to moderate (up to a few 1000) Simulation of the squirmers and their interactions is sufficiently fast on CPU This may change for future projects 5
6 Simulation of the Fluid We use discrete fluid particles and an algorithm called Multi-Particle Collision Dynamics (MPC) to simulate the fluid MPC inherently conserves energy and momentum of the fluid The phenomena we want to study also require: Conservation of the angular momentum of the fluid particles Adding this roughly doubles the computational effort Walls at one dimension of our system to form a slit This requires additional ghost particles A sufficiently large simulation box for a moderate number of squirmers contains millions of fluid particles 6
7 Simulation of the Fluid We use discrete fluid particles and an algorithm called Multi-Particle Collision Dynamics (MPC) to simulate the fluid MPC inherently conserves energy and momentum of the fluid The phenomena we want to study also require: Conservation of the angular momentum of the fluid particles Adding this roughly doubles the computational effort Walls at one dimension of our system to form a slit This requires additional ghost particles GPU to the Rescue! A sufficiently large simulation box for a moderate number of squirmers contains millions of fluid particles 6
8 Please note that our simulation is in fact a 3-dimensional system. To explain the algorithms, 2D-drawings are used for the sake of simplicity. 7
9 Multi-particle Collision Dynamics (MPC) Fluid particles 8
10 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically One thread per particle, memory-bound 9
11 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically and sorted into cells of a randomly shifted grid One thread per particle, memory-bound 10
12 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically and sorted into cells of a randomly shifted grid For each cell the centre of mass, centre of mass velocity, kinetic energy and angular momentum are computed One thread per particle, atomic-bound, then one thread per cell, memory-bound 11
13 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically and sorted into cells of a randomly shifted grid For each cell the centre of mass, centre of mass velocity, kinetic energy and angular momentum are computed Relative velocities are calculated One thread per particle, memory-bound by random cell-data reads (texture cache) 12
14 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically and sorted into cells of a randomly shifted grid For each cell the centre of mass, centre of mass velocity, kinetic energy and angular momentum are computed Relative velocities are calculated and rotated around a random axis One thread per particle, memory-bound by random cell-data reads (texture cache) 13
15 Multi-particle Collision Dynamics (MPC) Fluid particles are moved ballistically and sorted into cells of a randomly shifted grid For each cell the centre of mass, centre of mass velocity, kinetic energy and angular momentum are computed Relative velocities are calculated and rotated around a random axis Angular momentum and kinetic energy need to be restored several steps using one thread per particle or cell, mostly atomic-bound 14
16 MPC on GPU Is limited by speed of atomic operations and memory bandwidth Implementation uses optimisations as described in some of my earlier GTC talks Reordering of particles to preserve data locality (S2036*) Reducing the number of atomic operations (S5151) *the speed of atomic operations has improved significantly over time, so parts of the implementation described here have been abandoned 15
17 Interactions between Fluid and Squirmers Squirmer surfaces are considered impenetrable for the fluid and are therefore boundaries for the fluid particles Collisions have to be detected Their impact has to be combined and applied to the squirmer and fluid particles accordingly This happens on the GPU (large number of fluid particles), The total impact for each squirmer is passed to and processed by the CPU (low number of squirmers) 16
18 Collisions Between Fluid Particles and Squirmers Fluid particles and Squirmers move during each time-step 17
19 Collisions Between Fluid Particles and Squirmers Fluid particles and Squirmers move during each time-step 17
20 Collisions Between Fluid Particles and Squirmers Fluid particles and Squirmers move during each time-step Squirmer walls are considered impenetrable X 17
21 Collisions Between Fluid Particles and Squirmers Fluid particles and Squirmers move during each time-step Squirmer walls are considered impenetrable Fluid particles entering the squirmer have to be dealt with accordingly 17
22 Collisions Between Fluid Particles and Squirmers Detecting penetration requires rotating particles into the squirmers frame of reference and scaling according to its ratio Checking every fluid particle against every squirmer is too much work, even for a GPU 18
23 Collisions Between Fluid Particles and Squirmers Detecting penetration requires rotating particles into the squirmers frame of reference and scaling according to its ratio Checking every fluid particle against every squirmer is too much work, even for a GPU 18
24 Collisions Between Fluid Particles and Squirmers Detecting penetration requires rotating particles into the squirmers frame of reference and scaling according to its ratio X Checking every fluid particle against every squirmer is too much work, even for a GPU 18
25 Handling Squirmer- Fluid Collisions 19
26 Handling Squirmer- Fluid Collisions The system is divided into cells 19
27 Handling Squirmer- Fluid Collisions The system is divided into cells Regardless of its orientation, a spheroid can not exceed a sphere matching its centre and largest radius 19
28 Handling Squirmer- Fluid Collisions The system is divided into cells Regardless of its orientation, a spheroid can not exceed a sphere matching its centre and largest radius Each squirmer has a list of the cells affected by its surrounding sphere (they may overlap) 19
29 Handling Squirmer- Fluid Collisions The system is divided into cells Regardless of its orientation, a spheroid can not exceed a sphere matching its centre and largest radius Each squirmer has a list of the cells affected by its surrounding sphere (they may overlap) Each cell has a list of the fluid particles it contains 19
30 Handling Squirmer- Fluid Collisions The system is divided into cells Regardless of its orientation, a spheroid can not exceed a sphere matching its centre and largest radius Each squirmer has a list of the cells affected by its surrounding sphere (they may overlap) Each cell has a list of the fluid particles it contains This reduces the number of checks from Nsq x Nfluid to ~Nsq x Vsq* ρ 19
31 Handling Squirmer Surroundings on GPU For each squirmer, a cubical box of cells is defined that contains its surrounding sphere Cells can be conveniently coded into grid- and blockdimensions: x- and y- directions are coded into threadidx z-direction is coded into blockidx.x (limit of 1024 threads per block) Squirmer-index is coded into blockidx.y Affected cells are selected using approximate cell stencils Cells are processed one per thread blockidx.x threadidx.y threadidx.x blockidx.y 20
32 Handling Squirmer Surroundings on GPU Threads from the described grid are used to Select affected cells Check particles from selected cells for collision and Move colliding particles accordingly Sum up collision impact to apply to squirmers Generate random fluid particles inside squirmers to preserve physical properties of the system 21
33 Performance Pitfalls Squirmers and MPC using walls require volumes filled with random particles Their relative impact depends on: Ratio of surface to height of the simulation box (wall algorithm adds an additional cell in one direction) Number and size of squirmers Lack of data locality in squirmer random particles has a severe impact on the performance of the MPC implementation on GPU: Fewer memory accesses can be combined Atomics optimisations depend on data locality 22
34 Computation Time Mostly due to the influence of the spatially unordered random particles inside the squirmers, computation times vary significantly with the number of squirmers (measured on Tesla K80): 4.9 ns per iteration and MPC-particle for large systems with 1 squirmer 13.2 ns per iteration and MPC-particle for large systems with 3692 squirmers Squirmer interactions done on CPU only ~4% of computation time 23
35 General Implementation Heavily templated C++-11 code MPC parts works in 2D, 3D, different precisions and with a variety of options, generating optimised code for different applications and GPU architectures Uses a class template for particle sets to allow mixed precisions and features in a single simulation User managed particle sets can be injected into most steps of the process 24
36 General Implementation Based on a std::vector-like class template using managed memory Replacing the allocator is not enough (members/ operators not defined for device) Easy exchange of data between CPU- and GPU-parts Uses texture caching where applicable Operator templates for CUDA vector-types (float4 etc.) make life a lot easier 25
37 Memory Considerations MPC particles use 52 bytes of memory each MPC cells use 160 bytes of memory each (with angular momentum conservation) Squirmer data uses about 600 bytes of memory Random particles inside squirmers do not count, because they replace particles of the original fluid On recent GPUs, this allows system sizes of up to ~10M cells and ~15K squirmers 26
38 Outlook The next version (currently testing) will definitely feature MPI parallelisation, allowing larger simulations and/or higher speed Hybrid code using template magic to generate CPU or GPU based executables from the same sources (real kernels, not pragma-based) Fewer restraints through the use of even more templates and is prepared for Bringing some (spatial) order to the random particle chaos Bit-true, reproducible calculations 27
39 Example 800 Squirmers in slit, Lx=Lz=300, Ly=7, ~7M fluid particles, 20M timesteps, 2-3 weeks walltime Simulation and rendering courtesy of Mario Theers Further reading: DOI: /C6SM01424K (Paper) Soft Matter, 2016, 12,
40 Example 800 Squirmers in slit, Lx=Lz=300, Ly=7, ~7M fluid particles, 20M timesteps, 2-3 weeks walltime Simulation and rendering courtesy of Mario Theers Further reading: DOI: /C6SM01424K (Paper) Soft Matter, 2016, 12,
41 Thank you for your time Questions? ~100 Squirmers in flow, Lx=300, Ly=Lz=60, ~11M fluid particles, 1M timesteps, 14h walltime Simulation courtesy of Hemalatha Annepu, rendering courtesy of Mario Theers 29
42 Thank you for your time Questions? ~100 Squirmers in flow, Lx=300, Ly=Lz=60, ~11M fluid particles, 1M timesteps, 14h walltime Simulation courtesy of Hemalatha Annepu, rendering courtesy of Mario Theers 29
Computational Fluid Dynamics (CFD) using Graphics Processing Units
Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores
More informationCUDA Particles. Simon Green
CUDA Particles Simon Green sdkfeedback@nvidia.com Document Change History Version Date Responsible Reason for Change 1.0 Sept 19 2007 Simon Green Initial draft Abstract Particle systems [1] are a commonly
More informationS Voting And Shuffling For Fewer Atomic Operations
S5151 - Voting And Shuffling For Fewer Atomic Operations Elmar Westphal, Forschungszentrum Jülich GmbH Contents On atomic operations and speed problems A possible remedy About intra-warp communication
More informationFrom Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA (S5133)
From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA (S5133) Dr Paul Richmond Research Fellow University of Sheffield (NVIDIA CUDA Research Centre) Overview Complex
More informationInformation Coding / Computer Graphics, ISY, LiTH. CUDA memory! ! Coalescing!! Constant memory!! Texture memory!! Pinned memory 26(86)
26(86) Information Coding / Computer Graphics, ISY, LiTH CUDA memory Coalescing Constant memory Texture memory Pinned memory 26(86) CUDA memory We already know... Global memory is slow. Shared memory is
More information2006: Short-Range Molecular Dynamics on GPU. San Jose, CA September 22, 2010 Peng Wang, NVIDIA
2006: Short-Range Molecular Dynamics on GPU San Jose, CA September 22, 2010 Peng Wang, NVIDIA Overview The LAMMPS molecular dynamics (MD) code Cell-list generation and force calculation Algorithm & performance
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationProfiling & Tuning Applications. CUDA Course István Reguly
Profiling & Tuning Applications CUDA Course István Reguly Introduction Why is my application running slow? Work it out on paper Instrument code Profile it NVIDIA Visual Profiler Works with CUDA, needs
More informationAbout Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS.
About Phoenix FD PLUGIN FOR 3DS MAX AND MAYA. SIMULATING AND RENDERING BOTH LIQUIDS AND FIRE/SMOKE. USED IN MOVIES, GAMES AND COMMERCIALS. Phoenix FD core SIMULATION & RENDERING. SIMULATION CORE - GRID-BASED
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More informationLarge and Sparse Mass Spectrometry Data Processing in the GPU Jose de Corral 2012 GPU Technology Conference
Large and Sparse Mass Spectrometry Data Processing in the GPU Jose de Corral 2012 GPU Technology Conference 2012 Waters Corporation 1 Agenda Overview of LC/IMS/MS 3D Data Processing 4D Data Processing
More informationIntroducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method
Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System
More information1/25/12. Administrative
Administrative L3: Memory Hierarchy Optimization I, Locality and Data Placement Next assignment due Friday, 5 PM Use handin program on CADE machines handin CS6235 lab1 TA: Preethi Kotari - Email:
More informationGPU-based Distributed Behavior Models with CUDA
GPU-based Distributed Behavior Models with CUDA Courtesy: YouTube, ISIS Lab, Universita degli Studi di Salerno Bradly Alicea Introduction Flocking: Reynolds boids algorithm. * models simple local behaviors
More informationNumerical Algorithms on Multi-GPU Architectures
Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications
More informationShape of Things to Come: Next-Gen Physics Deep Dive
Shape of Things to Come: Next-Gen Physics Deep Dive Jean Pierre Bordes NVIDIA Corporation Free PhysX on CUDA PhysX by NVIDIA since March 2008 PhysX on CUDA available: August 2008 GPU PhysX in Games Physical
More informationCUDA/OpenGL Fluid Simulation. Nolan Goodnight
CUDA/OpenGL Fluid Simulation Nolan Goodnight ngoodnight@nvidia.com Document Change History Version Date Responsible Reason for Change 0.1 2/22/07 Nolan Goodnight Initial draft 1.0 4/02/07 Nolan Goodnight
More informationGPU programming basics. Prof. Marco Bertini
GPU programming basics Prof. Marco Bertini CUDA: atomic operations, privatization, algorithms Atomic operations The basics atomic operation in hardware is something like a read-modify-write operation performed
More informationSoftware and Performance Engineering for numerical codes on GPU clusters
Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3
More informationCUDA Memory Types All material not from online sources/textbook copyright Travis Desell, 2012
CUDA Memory Types All material not from online sources/textbook copyright Travis Desell, 2012 Overview 1. Memory Access Efficiency 2. CUDA Memory Types 3. Reducing Global Memory Traffic 4. Example: Matrix-Matrix
More informationAccelerating SRD Simulation on GPU
Accelerating SRD Simulation on GPU by Zhilu Chen A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE In partial fulfillment of the requirements for the Degree of Master of Science
More informationEfficient Tridiagonal Solvers for ADI methods and Fluid Simulation
Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular
More informationParallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung
More informationModeling Evaporating Liquid Spray
Tutorial 17. Modeling Evaporating Liquid Spray Introduction In this tutorial, the air-blast atomizer model in ANSYS FLUENT is used to predict the behavior of an evaporating methanol spray. Initially, the
More informationCUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata
CUDA Fluid simulation Lattice Boltzmann Models Cellular Automata Please excuse my layout of slides for the remaining part of the talk! Fluid Simulation Navier Stokes equations for incompressible fluids
More informationGPU Computation Strategies & Tricks. Ian Buck NVIDIA
GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit
More informationRAMSES on the GPU: An OpenACC-Based Approach
RAMSES on the GPU: An OpenACC-Based Approach Claudio Gheller (ETHZ-CSCS) Giacomo Rosilho de Souza (EPFL Lausanne) Romain Teyssier (University of Zurich) Markus Wetzstein (ETHZ-CSCS) PRACE-2IP project EU
More informationNVIDIA. Interacting with Particle Simulation in Maya using CUDA & Maximus. Wil Braithwaite NVIDIA Applied Engineering Digital Film
NVIDIA Interacting with Particle Simulation in Maya using CUDA & Maximus Wil Braithwaite NVIDIA Applied Engineering Digital Film Some particle milestones FX Rendering Physics 1982 - First CG particle FX
More informationWhat is a Rigid Body?
Physics on the GPU What is a Rigid Body? A rigid body is a non-deformable object that is a idealized solid Each rigid body is defined in space by its center of mass To make things simpler we assume the
More informationLecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1
Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei
More informationOpenACC programming for GPGPUs: Rotor wake simulation
DLR.de Chart 1 OpenACC programming for GPGPUs: Rotor wake simulation Melven Röhrig-Zöllner, Achim Basermann Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU) GPU computing
More informationLATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS
LATTICE-BOLTZMANN AND COMPUTATIONAL FLUID DYNAMICS NAVIER-STOKES EQUATIONS u t + u u + 1 ρ p = Ԧg + ν u u=0 WHAT IS COMPUTATIONAL FLUID DYNAMICS? Branch of Fluid Dynamics which uses computer power to approximate
More informationOptimisation Myths and Facts as Seen in Statistical Physics
Optimisation Myths and Facts as Seen in Statistical Physics Massimo Bernaschi Institute for Applied Computing National Research Council & Computer Science Department University La Sapienza Rome - ITALY
More informationNavier-Stokes & Flow Simulation
Last Time? Navier-Stokes & Flow Simulation Implicit Surfaces Marching Cubes/Tetras Collision Detection & Response Conservative Bounding Regions backtracking fixing Today Flow Simulations in Graphics Flow
More informationCUDA Particles. Simon Green
CUDA Particles Simon Green sdkfeedback@nvidia.com Document Change History Version Date Responsible Reason for Change 1.0 Sept 19 2007 Simon Green Initial draft 1.1 Nov 3 2007 Simon Green Fixed some mistakes,
More informationReal-time Graphics 9. GPGPU
Real-time Graphics 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing
More informationAbstract. Introduction. Kevin Todisco
- Kevin Todisco Figure 1: A large scale example of the simulation. The leftmost image shows the beginning of the test case, and shows how the fluid refracts the environment around it. The middle image
More informationDesigning a Domain-specific Language to Simulate Particles. dan bailey
Designing a Domain-specific Language to Simulate Particles dan bailey Double Negative Largest Visual Effects studio in Europe Offices in London and Singapore Large and growing R & D team Squirt Fluid Solver
More informationAn Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture
An Introduction to GPGPU Pro g ra m m ing - CUDA Arc hitec ture Rafia Inam Mälardalen Real-Time Research Centre Mälardalen University, Västerås, Sweden http://www.mrtc.mdh.se rafia.inam@mdh.se CONTENTS
More informationParticle Systems. Sample Particle System. What is a particle system? Types of Particle Systems. Stateless Particle System
Sample Particle System Particle Systems GPU Graphics Water Fire and Smoke What is a particle system? Types of Particle Systems One of the original uses was in the movie Star Trek II William Reeves (implementor)
More informationAutomatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo
Automatic translation from CUDA to C++ Luca Atzori, Vincenzo Innocente, Felice Pantaleo, Danilo Piparo 31 August, 2015 Goals Running CUDA code on CPUs. Why? Performance portability! A major challenge faced
More informationTesla Architecture, CUDA and Optimization Strategies
Tesla Architecture, CUDA and Optimization Strategies Lan Shi, Li Yi & Liyuan Zhang Hauptseminar: Multicore Architectures and Programming Page 1 Outline Tesla Architecture & CUDA CUDA Programming Optimization
More informationThis is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC.
David Kirk/NVIDIA and Wen-mei Hwu, 2006-2008 This is a draft chapter from an upcoming CUDA textbook by David Kirk from NVIDIA and Prof. Wen-mei Hwu from UIUC. Please send any comment to dkirk@nvidia.com
More informationGPU & High Performance Computing (by NVIDIA) CUDA. Compute Unified Device Architecture Florian Schornbaum
GPU & High Performance Computing (by NVIDIA) CUDA Compute Unified Device Architecture 29.02.2008 Florian Schornbaum GPU Computing Performance In the last few years the GPU has evolved into an absolute
More informationConvolution Soup: A case study in CUDA optimization. The Fairmont San Jose 10:30 AM Friday October 2, 2009 Joe Stam
Convolution Soup: A case study in CUDA optimization The Fairmont San Jose 10:30 AM Friday October 2, 2009 Joe Stam Optimization GPUs are very fast BUT Naïve programming can result in disappointing performance
More information6. Parallel Volume Rendering Algorithms
6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks
More informationCGT 581 G Fluids. Overview. Some terms. Some terms
CGT 581 G Fluids Bedřich Beneš, Ph.D. Purdue University Department of Computer Graphics Technology Overview Some terms Incompressible Navier-Stokes Boundary conditions Lagrange vs. Euler Eulerian approaches
More informationJohn W. Romein. Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands
Signal Processing on GPUs for Radio Telescopes John W. Romein Netherlands Institute for Radio Astronomy (ASTRON) Dwingeloo, the Netherlands 1 Overview radio telescopes six radio telescope algorithms on
More informationModeling Evaporating Liquid Spray
Tutorial 16. Modeling Evaporating Liquid Spray Introduction In this tutorial, FLUENT s air-blast atomizer model is used to predict the behavior of an evaporating methanol spray. Initially, the air flow
More informationVector Addition on the Device: main()
Vector Addition on the Device: main() #define N 512 int main(void) { int *a, *b, *c; // host copies of a, b, c int *d_a, *d_b, *d_c; // device copies of a, b, c int size = N * sizeof(int); // Alloc space
More informationImage convolution with CUDA
Image convolution with CUDA Lecture Alexey Abramov abramov _at_ physik3.gwdg.de Georg-August University, Bernstein Center for Computational Neuroscience, III Physikalisches Institut, Göttingen, Germany
More informationDevice Memories and Matrix Multiplication
Device Memories and Matrix Multiplication 1 Device Memories global, constant, and shared memories CUDA variable type qualifiers 2 Matrix Multiplication an application of tiling runningmatrixmul in the
More informationAnatomy of a Physics Engine. Erwin Coumans
Anatomy of a Physics Engine Erwin Coumans erwin_coumans@playstation.sony.com How it fits together» Terminology» Rigid Body Dynamics» Collision Detection» Software Design Decisions» Trip through the Physics
More informationECE 574 Cluster Computing Lecture 15
ECE 574 Cluster Computing Lecture 15 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 30 March 2017 HW#7 (MPI) posted. Project topics due. Update on the PAPI paper Announcements
More informationSimulation of Liquid-Gas-Solid Flows with the Lattice Boltzmann Method
Simulation of Liquid-Gas-Solid Flows with the Lattice Boltzmann Method June 21, 2011 Introduction Free Surface LBM Liquid-Gas-Solid Flows Parallel Computing Examples and More References Fig. Simulation
More informationAccelerating the acceleration search a case study. By Chris Laidler
Accelerating the acceleration search a case study By Chris Laidler Optimization cycle Assess Test Parallelise Optimise Profile Identify the function or functions in which the application is spending most
More informationG P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G
Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty
More informationGAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD
GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES TAKAHIRO HARADA, AMD DESTRUCTION FOR GAMES ERWIN COUMANS, AMD GAME PROGRAMMING ON HYBRID CPU-GPU ARCHITECTURES Jason Yang, Takahiro Harada AMD HYBRID CPU-GPU
More informationCUDA. Schedule API. Language extensions. nvcc. Function type qualifiers (1) CUDA compiler to handle the standard C extensions.
Schedule CUDA Digging further into the programming manual Application Programming Interface (API) text only part, sorry Image utilities (simple CUDA examples) Performace considerations Matrix multiplication
More informationReview. Lecture 10. Today s Outline. Review. 03b.cu. 03?.cu CUDA (II) Matrix addition CUDA-C API
Review Lecture 10 CUDA (II) host device CUDA many core processor threads thread blocks grid # threads >> # of cores to be efficient Threads within blocks can cooperate Threads between thread blocks cannot
More informationHigh Performance Computing and GPU Programming
High Performance Computing and GPU Programming Lecture 1: Introduction Objectives C++/CPU Review GPU Intro Programming Model Objectives Objectives Before we begin a little motivation Intel Xeon 2.67GHz
More informationGPGPUGPGPU: Multi-GPU Programming
GPGPUGPGPU: Multi-GPU Programming Fall 2012 HW4 global void cuda_transpose(const float *ad, const int n, float *atd) { } int i = threadidx.y + blockidx.y*blockdim.y; int j = threadidx.x + blockidx.x*blockdim.x;
More informationUberFlow: A GPU-Based Particle Engine
UberFlow: A GPU-Based Particle Engine Peter Kipfer Mark Segal Rüdiger Westermann Technische Universität München ATI Research Technische Universität München Motivation Want to create, modify and render
More informationReal-time Graphics 9. GPGPU
9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing GPGPU general-purpose
More informationOptimizing CUDA for GPU Architecture. CSInParallel Project
Optimizing CUDA for GPU Architecture CSInParallel Project August 13, 2014 CONTENTS 1 CUDA Architecture 2 1.1 Physical Architecture........................................... 2 1.2 Virtual Architecture...........................................
More informationUsing GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation
Using GPUs to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation GPU Technology Conference 2012 May 15, 2012 Thomas M. Benson, Daniel P. Campbell, Daniel A. Cook thomas.benson@gtri.gatech.edu
More informationHigh Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters
SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014
More informationDense Linear Algebra. HPC - Algorithms and Applications
Dense Linear Algebra HPC - Algorithms and Applications Alexander Pöppl Technical University of Munich Chair of Scientific Computing November 6 th 2017 Last Tutorial CUDA Architecture thread hierarchy:
More informationMatrix Multiplication in CUDA. A case study
Matrix Multiplication in CUDA A case study 1 Matrix Multiplication: A Case Study Matrix multiplication illustrates many of the basic features of memory and thread management in CUDA Usage of thread/block
More informationGPU Tutorial 2: Integrating GPU Computing into a Project
GPU Tutorial 2: Integrating GPU Computing into a Project Summary This tutorial extends the concept of GPU computation, introduced in the previous tutorial. We consider the specifics of integrating GPU
More informationModule 3: CUDA Execution Model -I. Objective
ECE 8823A GPU Architectures odule 3: CUDA Execution odel -I 1 Objective A more detailed look at kernel execution Data to thread assignment To understand the organization and scheduling of threads Resource
More informationLecture 2: CUDA Programming
CS 515 Programming Language and Compilers I Lecture 2: CUDA Programming Zheng (Eddy) Zhang Rutgers University Fall 2017, 9/12/2017 Review: Programming in CUDA Let s look at a sequential program in C first:
More informationExploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology
Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation
More informationNavier-Stokes & Flow Simulation
Last Time? Navier-Stokes & Flow Simulation Pop Worksheet! Teams of 2. Hand in to Jeramey after we discuss. Sketch the first few frames of a 2D explicit Euler mass-spring simulation for a 2x3 cloth network
More informationOutline. L5: Writing Correct Programs, cont. Is this CUDA code correct? Administrative 2/11/09. Next assignment (a homework) given out on Monday
Outline L5: Writing Correct Programs, cont. 1 How to tell if your parallelization is correct? Race conditions and data dependences Tools that detect race conditions Abstractions for writing correct parallel
More informationLECTURE 4. Announcements
LECTURE 4 Announcements Retries Email your grader email your grader email your grader email your grader email your grader email your grader email your grader email your grader email your grader email your
More informationHow to Optimize Geometric Multigrid Methods on GPUs
How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient
More informationGPU Computing: Particle Simulation
GPU Computing: Particle Simulation Dipl.-Ing. Jan Novák Dipl.-Inf. Gábor Liktor Prof. Dr.-Ing. Carsten Dachsbacher Abstract In this assignment we will learn how to implement two simple particle systems
More informationSimulation of Flow Development in a Pipe
Tutorial 4. Simulation of Flow Development in a Pipe Introduction The purpose of this tutorial is to illustrate the setup and solution of a 3D turbulent fluid flow in a pipe. The pipe networks are common
More informationExample 1: Color-to-Grayscale Image Processing
GPU Teaching Kit Accelerated Computing Lecture 16: CUDA Parallelism Model Examples Example 1: Color-to-Grayscale Image Processing RGB Color Image Representation Each pixel in an image is an RGB value The
More informationMassively Parallel Architectures
Massively Parallel Architectures A Take on Cell Processor and GPU programming Joel Falcou - LRI joel.falcou@lri.fr Bat. 490 - Bureau 104 20 janvier 2009 Motivation The CELL processor Harder,Better,Faster,Stronger
More informationEXAMINATIONS 2017 TRIMESTER 2
EXAMINATIONS 2017 TRIMESTER 2 CGRA 151 INTRODUCTION TO COMPUTER GRAPHICS Time Allowed: TWO HOURS CLOSED BOOK Permitted materials: Silent non-programmable calculators or silent programmable calculators
More informationLecture 9. Outline. CUDA : a General-Purpose Parallel Computing Architecture. CUDA Device and Threads CUDA. CUDA Architecture CUDA (I)
Lecture 9 CUDA CUDA (I) Compute Unified Device Architecture 1 2 Outline CUDA Architecture CUDA Architecture CUDA programming model CUDA-C 3 4 CUDA : a General-Purpose Parallel Computing Architecture CUDA
More informationGPU-accelerated data expansion for the Marching Cubes algorithm
GPU-accelerated data expansion for the Marching Cubes algorithm San Jose (CA) September 23rd, 2010 Christopher Dyken, SINTEF Norway Gernot Ziegler, NVIDIA UK Agenda Motivation & Background Data Compaction
More informationParallelising Pipelined Wavefront Computations on the GPU
Parallelising Pipelined Wavefront Computations on the GPU S.J. Pennycook G.R. Mudalige, S.D. Hammond, and S.A. Jarvis. High Performance Systems Group Department of Computer Science University of Warwick
More informationRigid Body Dynamics, Collision Response, & Deformation
Rigid Body Dynamics, Collision Response, & Deformation Pop Worksheet! Teams of 2. SOMEONE YOU HAVEN T ALREADY WORKED WITH What are the horizontal and face velocities after 1, 2, and many iterations of
More informationPARALLEL SIMULATION OF A FLUID FLOW BY MEANS OF THE SPH METHOD: OPENMP VS. MPI COMPARISON. Pawe l Wróblewski, Krzysztof Boryczko
Computing and Informatics, Vol. 28, 2009, 139 150 PARALLEL SIMULATION OF A FLUID FLOW BY MEANS OF THE SPH METHOD: OPENMP VS. MPI COMPARISON Pawe l Wróblewski, Krzysztof Boryczko Department of Computer
More informationL17: Lessons from Particle System Implementations
L17: Lessons from Particle System Implementations Administrative Still missing some design reviews - Please email to me slides from presentation - And updates to reports - By Thursday, Apr 16, 5PM Grading
More informationWarps and Reduction Algorithms
Warps and Reduction Algorithms 1 more on Thread Execution block partitioning into warps single-instruction, multiple-thread, and divergence 2 Parallel Reduction Algorithms computing the sum or the maximum
More informationLearn CUDA in an Afternoon. Alan Gray EPCC The University of Edinburgh
Learn CUDA in an Afternoon Alan Gray EPCC The University of Edinburgh Overview Introduction to CUDA Practical Exercise 1: Getting started with CUDA GPU Optimisation Practical Exercise 2: Optimising a CUDA
More informationVirtual Environments
ELG 524 Virtual Environments Jean-Christian Delannoy viewing in 3D world coordinates / geometric modeling culling lighting virtualized reality collision detection collision response interactive forces
More informationCS 314 Principles of Programming Languages
CS 314 Principles of Programming Languages Zheng Zhang Fall 2016 Dec 14 GPU Programming Rutgers University Programming with CUDA Compute Unified Device Architecture (CUDA) Mapping and managing computations
More informationCE 530 Molecular Simulation
1 CE 530 Molecular Simulation Lecture 3 Common Elements of a Molecular Simulation David A. Kofke Department of Chemical Engineering SUNY Buffalo kofke@eng.buffalo.edu 2 Boundary Conditions Impractical
More informationAccelerating Octo-Tiger: Stellar Mergers on Intel Knights Landing with HPX
Accelerating Octo-Tiger: Stellar Mergers on Intel Knights Landing with HPX David Pfander*, Gregor Daiß*, Dominic Marcello**, Hartmut Kaiser**, Dirk Pflüger* * University of Stuttgart ** Louisiana State
More informationMemory concept. Grid concept, Synchronization. GPU Programming. Szénási Sándor.
Memory concept Grid concept, Synchronization GPU Programming http://cuda.nik.uni-obuda.hu Szénási Sándor szenasi.sandor@nik.uni-obuda.hu GPU Education Center of Óbuda University MEMORY CONCEPT Off-chip
More informationAdaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics
Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics
More informationGPGPU LAB. Case study: Finite-Difference Time- Domain Method on CUDA
GPGPU LAB Case study: Finite-Difference Time- Domain Method on CUDA Ana Balevic IPVS 1 Finite-Difference Time-Domain Method Numerical computation of solutions to partial differential equations Explicit
More informationB. Tech. Project Second Stage Report on
B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic
More informationOptimization solutions for the segmented sum algorithmic function
Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code
More informationFrom Notebooks to Supercomputers: Tap the Full Potential of Your CUDA Resources with LibGeoDecomp
From Notebooks to Supercomputers: Tap the Full Potential of Your CUDA Resources with andreas.schaefer@cs.fau.de Friedrich-Alexander-Universität Erlangen-Nürnberg GPU Technology Conference 2013, San José,
More information