A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters
- Nigel Parks
1 A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters
Peter Zaspel, Michael Griebel
Institute for Numerical Simulation, Rheinische Friedrich-Wilhelms-Universität Bonn
GPU Technology Conference 2012
2 Outline
1 Introduction
2 Multi-GPU Fluid Solver
3 Uncertainty Quantification in CFD
3 Two-phase flows
- major topic in computational fluid dynamics
- simulating the interaction of two fluids, such as air & water or water & oil
- interesting small-scale phenomena: surface tension effects, droplet deformation, bubble dynamics
- large-scale studies: ship construction, river simulation
4 NaSt3DGPF - A 3D two-phase Navier-Stokes solver
- 3D grid-based fluid solver for the non-stationary two-phase incompressible Navier-Stokes equations
- simulation of two interacting fluids (e.g. air, water)
- with high-order methods
- running on HPC clusters
5 Application: Flows through hydraulic constructions
- A real-world sluice system was successfully modeled and simulated.
- Joint project with the German Federal Waterways Engineering and Research Institute
6 Bionics-motivated textile design
- development of textiles and materials which can store oil
- model in nature: specialized bees that transport oil
7 Application: Fluid animation
Coupling NaSt3DGPF and Autodesk Maya
- integration of full fluid simulation pipeline in Maya
- graphical setup of simulation scenes
- automatic parallel solver run
- integration of computed fluid simulations
- photo-realistic rendering of fluid surfaces
Z., Griebel: Photorealistic Visualization and Fluid Animation [...], Comp. Vis. Sci., 2012, accepted.
8 Outline
1 Introduction
2 Multi-GPU Fluid Solver
3 Uncertainty Quantification in CFD
9 NaSt3DGPF - A 3D two-phase Navier-Stokes solver
We have ported our in-house fluid solver to the GPU.
- 3D grid-based fluid solver for the non-stationary two-phase incompressible Navier-Stokes equations
- simulation of two interacting fluids (e.g. air, water)
- FD solver based on Chorin's pressure projection approach
- high-order space discretizations: e.g. 5th-order WENO
- time discretizations: 3rd-order Runge-Kutta, Adams-Bashforth
- Jacobi-preconditioned conjugate gradient solver for the Poisson equation
- complex geometries with different boundary conditions
- multi-GPU MPI parallelization by domain decomposition
Griebel, Z.: A multi-GPU accelerated solver for the three-dimensional two-phase incompressible Navier-Stokes equations. Comp. Sci. Res. Dev., 25(1-2):65-73, May
Z., Griebel: Solving incompressible two-phase flows on multi-GPU clusters. Comp. Fluids, accepted.
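The Jacobi-preconditioned conjugate gradient solver named on this slide can be illustrated with a small sketch. This is not the NaSt3DGPF implementation: it assumes a 1D Poisson matrix tridiag(-1, 2, -1) with homogeneous Dirichlet boundaries instead of the 3D variable-density pressure operator, and the names `apply_laplacian_1d` and `jacobi_pcg` are hypothetical.

```python
def apply_laplacian_1d(x):
    """Matrix-free product y = A x for A = tridiag(-1, 2, -1) (assumed model problem)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_pcg(apply_a, diag, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with the diagonal (Jacobi) preconditioner z = r / diag."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual for the start vector x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        ap = apply_a(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 32
b = [1.0] * n
x, iterations = jacobi_pcg(apply_laplacian_1d, [2.0] * n, b)
residual = max(abs(bi - yi) for bi, yi in zip(b, apply_laplacian_1d(x)))
```

With a constant diagonal, as in this model problem, the Jacobi preconditioner is only a scaling. It matters more when the diagonal varies from cell to cell, which is plausibly the case for a pressure operator with density-dependent coefficients.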
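10 Core technique for two-phase flows: Level-set method
Level-set method
- Representation of a free surface $\Gamma_t$ by a signed distance function $\phi: \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R}$:
  $\Gamma_t = \{x \mid \phi(x, t) = 0\}$, $|\nabla \phi| = 1$
- Fluid phase distinction by sign of level set function:
  $\phi(x, t) > 0$ for $x \in \Omega_1$, $\phi(x, t) \le 0$ for $x \in \Omega_2$
[Figure: two fluid domains separated by the free surface, labeled $\Omega_1: \phi > 0$ and $\Omega_2: \phi < 0$]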
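A small sketch may make the signed distance representation concrete. It uses the spherical air bubble of the benchmark problem (radius 3 cm, center (10 cm, 6 cm, 10 cm)). Which phase lies inside the bubble is an assumption here: the code takes the gas phase as $\Omega_2$ ($\phi \le 0$ inside the sphere). The function names are hypothetical.

```python
import math

CENTER = (10.0, 6.0, 10.0)   # bubble center in cm (from the benchmark slide)
RADIUS = 3.0                 # bubble radius in cm


def phi(x):
    """Signed distance to the sphere surface: negative inside, positive outside (assumed convention)."""
    distance = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER)))
    return distance - RADIUS


def phase(x):
    """Return 1 for Omega_1 (phi > 0) and 2 for Omega_2 (phi <= 0)."""
    return 1 if phi(x) > 0.0 else 2


def gradient_norm(x, h=1e-5):
    """Central-difference estimate of |grad phi|; close to 1 for a signed distance function."""
    components = []
    for axis in range(3):
        forward = list(x)
        backward = list(x)
        forward[axis] += h
        backward[axis] -= h
        components.append((phi(forward) - phi(backward)) / (2.0 * h))
    return math.sqrt(sum(c * c for c in components))
```

For example, `phase(CENTER)` gives 2, `phase((0.0, 0.0, 0.0))` gives 1, and `gradient_norm((1.0, 2.0, 3.0))` is 1 up to discretization error.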
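11 Two-phase Navier-Stokes equations
PDE system
$\rho(\phi)\,(\partial_t u + (u \cdot \nabla) u) = \nabla \cdot (\mu(\phi) S) - \nabla p - \sigma \kappa(\phi) \delta(\phi) \nabla \phi + \rho(\phi) g$
$\nabla \cdot u = 0$
$\partial_t \phi + u \cdot \nabla \phi = 0$
with
$S := \nabla u + \{\nabla u\}^T$
$\rho(\phi) := \rho_2 + (\rho_1 - \rho_2) H(\phi)$
$\mu(\phi) := \mu_2 + (\mu_1 - \mu_2) H(\phi)$
$H(\phi) := \begin{cases} 0 & \text{if } \phi < 0 \\ \tfrac{1}{2} & \text{if } \phi = 0 \\ 1 & \text{if } \phi > 0 \end{cases}$
Symbols
- $u$: fluid velocity
- $p$: pressure
- $\phi$: level set function
- $\rho$: density
- $\mu$: dynamic viscosity
- $S$: stress tensor
- $\sigma$: surface tension
- $g$: volume forces
- $\kappa$: local curvature of fluid surface
- $\delta$: Dirac delta functional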
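The phase-dependent material properties follow directly from the Heaviside function. The sketch below evaluates $\rho(\phi)$ and $\mu(\phi)$ exactly as defined on this slide. The numerical values are approximate textbook values for water and air at 20 °C and, like the choice of water as phase 1, are assumptions for illustration only.

```python
def heaviside(phi):
    """Sharp Heaviside function H(phi) as defined on the slide."""
    if phi < 0.0:
        return 0.0
    if phi > 0.0:
        return 1.0
    return 0.5


def blend(value_1, value_2, phi):
    """Generic form of rho(phi) and mu(phi): value_2 + (value_1 - value_2) * H(phi)."""
    return value_2 + (value_1 - value_2) * heaviside(phi)


# Assumed, approximate material data at 20 degrees Celsius (not taken from the slides).
RHO_WATER, RHO_AIR = 998.2, 1.204      # kg / m^3
MU_WATER, MU_AIR = 1.002e-3, 1.81e-5   # Pa s


def density(phi):
    return blend(RHO_WATER, RHO_AIR, phi)


def viscosity(phi):
    return blend(MU_WATER, MU_AIR, phi)
```

In phase 1 (`phi > 0`) the functions return the water values, in phase 2 the air values, and exactly on the interface the arithmetic mean of both.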
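12 Solver algorithm based on pressure projection
For t = 1, 2, ... do:
1 set boundary conditions for $u^n$
2 compute intermediate velocity field $u^*$:
  $\frac{u^* - u^n}{\delta t} = -(u^n \cdot \nabla) u^n + g + \frac{1}{\rho(\phi^n)} \nabla \cdot (\mu(\phi^n) S^n) - \frac{1}{\rho(\phi^n)} \sigma \kappa(\phi^n) \delta(\phi^n) \nabla \phi^n$
3 apply boundary conditions and transport level-set function:
  $\phi^* = \phi^n - \delta t\,(u^n \cdot \nabla \phi^n)$
4 reinitialize level-set function by solving
  $\partial_\tau d + \mathrm{sign}(\phi^*)(|\nabla d| - 1) = 0$, $d^0 = \phi^*$
5 solve the pressure Poisson equation with $\phi^{n+1} = d$:
  $\nabla \cdot \left( \frac{\delta t}{\rho(\phi^{n+1})} \nabla p^{n+1} \right) = \nabla \cdot u^*$
6 apply velocity correction:
  $u^{n+1} = u^* - \frac{\delta t}{\rho(\phi^{n+1})} \nabla p^{n+1}$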
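Step 3 of the algorithm, the transport of the level-set function, can be sketched in one dimension. The code below performs one explicit Euler step of $\partial_t \phi + u\,\partial_x \phi = 0$ on a periodic grid. It deliberately swaps in a first-order upwind difference for the high-order WENO discretization used by the solver, so it only illustrates the structure of the update. The function name is hypothetical.

```python
def transport_step(phi, u, dt, h):
    """One explicit Euler step of d(phi)/dt + u * d(phi)/dx = 0 on a periodic 1D grid.

    First-order upwind differences are used as a simple stand-in for WENO.
    """
    n = len(phi)
    new_phi = [0.0] * n
    for i in range(n):
        if u[i] >= 0.0:
            # information travels to the right: backward difference (index -1 wraps around)
            dphi_dx = (phi[i] - phi[i - 1]) / h
        else:
            # information travels to the left: forward difference
            dphi_dx = (phi[(i + 1) % n] - phi[i]) / h
        new_phi[i] = phi[i] - dt * u[i] * dphi_dx
    return new_phi


# Periodic profile with zero crossings (the "interface") at cells 2 and 6.
phi_old = [abs(i - 4) - 2 for i in range(8)]
phi_new = transport_step(phi_old, u=[1.0] * 8, dt=1.0, h=1.0)
```

With unit velocity and `dt == h`, the upwind step shifts the profile by exactly one cell, so the zero crossings move from cells 2 and 6 to cells 3 and 7.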
13 Solver algorithm based on pressure projection
(same six steps as on slide 12)
This is now done on multiple GPUs.
14 CPU-to-GPU porting process
Our approach
1 identification of the most time-consuming parts of the CPU code (a good starting point)
2 stepwise porting with full CPU/GPU data copy before and after GPU computation and per-method memory allocation
3 continuously: GPU code validation for each porting step
4 stepwise unification of data fields and reduction of CPU/GPU data transfers
5 overall optimization
Advantage
- first results within a short period of time
- easy code validation during the porting process
15 Design principles of the GPU code (1)
General
- CUDA as GPU programming framework
- full double precision implementation
- linearization of 3D data fields
Memory hierarchies
- use global memory wherever acceptable: low algorithmic complexity, L1 / L2 caches more and more popular and faster
- optimization with shared memory for most time-critical parts
- shmem-based parallel reduction used from SDK
Compute configuration for maximized GPU occupancy
- use of maximum number of threads supported by streaming multiprocessor (SM)
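The linearization of 3D data fields mentioned above amounts to mapping a grid index (i, j, k) to one offset in a flat array. The following sketch assumes an x-fastest layout; the slides do not state the actual memory layout of the solver, and the function names are hypothetical.

```python
def linear_index(i, j, k, nx, ny):
    """Offset of grid cell (i, j, k) in a flat array, assuming x varies fastest."""
    return i + nx * (j + ny * k)


def grid_index(idx, nx, ny):
    """Inverse mapping: recover (i, j, k) from a flat offset."""
    k, rest = divmod(idx, nx * ny)
    j, i = divmod(rest, nx)
    return i, j, k


nx, ny, nz = 4, 5, 6
field = [0.0] * (nx * ny * nz)                 # one contiguous buffer for the 3D field
field[linear_index(1, 2, 3, nx, ny)] = 42.0    # write cell (1, 2, 3)
```

In a GPU kernel, each thread would typically derive its own (i, j, k) from its block and thread indices and then apply the same formula, so that neighboring threads touch neighboring memory locations.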
16 Design principles of the GPU code (2)
Compute-intensive kernels
- high instruction count per kernel
- high register usage = slow kernels (example: WENO stencil)
- solution: precompute some parts in other kernel
What remains on CPU?
- configuration file parser
- binary/visualization data file input/output
- parallel communication
17 Multi-GPU parallelization by domain decomposition
- Multi-GPU parallelization fully integrated with distributed memory MPI parallelization of CPU code: 1 GPU per 1 CPU core
18 Optimizing multi-GPU data exchanges
Prepacking of boundary data
[Figure: boundary data is packed into a buffer on the GPU, copied to a buffer in CPU RAM, and unpacked from a buffer on the receiving GPU]
Overlapping communication and computation (PCG solver)
[Figure: the matrix-vector product Ax on inner cells runs while boundary data is exchanged; the matrix-vector product on boundary cells follows and completes the result Ax]
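The overlap idea for the matrix-vector product can be sketched without any GPU or MPI code. The example below splits a 1D vector into subdomains, computes the rows that need no neighbor data first, and finishes the two boundary rows once the halo values are available. The matrix tridiag(-1, 2, -1) and all names are assumptions for illustration. In an MPI code the exchange would typically be posted with nonblocking calls before the inner product is started.

```python
def matvec_global(x):
    """Reference product with tridiag(-1, 2, -1) on the undivided vector."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def matvec_inner(local):
    """Rows 1 .. m-2 of a subdomain: they only touch local data."""
    y = [0.0] * len(local)
    for i in range(1, len(local) - 1):
        y[i] = 2.0 * local[i] - local[i - 1] - local[i + 1]
    return y


def matvec_boundary(local, y, halo_left, halo_right):
    """First and last row of a subdomain (length >= 2): they need the received halo values."""
    y[0] = 2.0 * local[0] - halo_left - local[1]
    y[-1] = 2.0 * local[-1] - local[-2] - halo_right
    return y


x = [float(i * i % 7) for i in range(12)]
parts = [x[0:4], x[4:8], x[8:12]]            # three subdomains, e.g. one per GPU

# "Exchange": every subdomain obtains the neighboring boundary values (prepacked halos).
halos = []
for rank, part in enumerate(parts):
    left = parts[rank - 1][-1] if rank > 0 else 0.0
    right = parts[rank + 1][0] if rank < len(parts) - 1 else 0.0
    halos.append((left, right))

result = []
for part, (halo_left, halo_right) in zip(parts, halos):
    y = matvec_inner(part)                   # could run while the exchange is in flight
    matvec_boundary(part, y, halo_left, halo_right)
    result.extend(y)
```

The distributed result equals `matvec_global(x)`, which shows that the split into inner and boundary rows does not change the product; it only reorders the work so that communication latency can be hidden.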
19 Results
20 Benchmarking problem: air bubble rising in water
Properties
- domain size: 20 cm × 20 cm × 20 cm
- liquid phase: water at 20 °C
- gas phase: air at 20 °C
- surface tension: standard
- volume forces: standard gravity
- initial air bubble radius: 3 cm
- initial center position of bubble: (10 cm, 6 cm, 10 cm)
21 Performance measurements for GPUs
Perfectly fair CPU-GPU benchmarks are very hard!
1 GPU vs. 1 CPU core
- + good GPU results
- concerns: CPU speed unclear; not realistic wrt. price
1 GPU vs. 1 CPU socket
- + better price realism
- concerns: # of cores per socket? speed per CPU core?
Performance per dollar
- ++ best price realism
- concerns: price per node / CPU? prices subject to changes
Performance per Watt
- ++ Green IT
- + power costs
- concern: high influence of config.
22 Benchmarking platforms
CPU Hardware
- dual 6-core Intel Xeon E5650 CPU, 2.67 GHz
- 24 GB DDR3-RAM
GPU Cluster (8 GT200 GPUs)
- 2 workstations with 4-core Intel Core i7-920 CPU, 2.66 GHz
- 12 GB DDR3-RAM
- NVIDIA Tesla S1070 (4 GPUs)
- InfiniBand 40G QDR ConnectX
Ubuntu Linux bit, GCC compiler, CUDA 3.2 SDK, OpenMPI
GPU Hardware (GF100 Fermi)
- 4-core Intel Xeon E5620 CPU, 2.40 GHz
- 6 GB DDR3-RAM
- NVIDIA Tesla C2050 GPU
GPU Cluster (48 GF100 GPUs), RWTH Aachen University, Germany
- 24 double-GPU systems with 2 Intel Xeon X GHz
- 2 NVIDIA Quadro 6000 GPUs
- QDR InfiniBand
- Scientific Linux 6.1, GCC 4.4.5, OpenMPI 1.5.3, CUDA
23 Performance per dollar
[Figure: speed-up on one GPU over simulation grid resolution; curves: GT200 GPU vs. 6-core Xeon CPU, GF100 GPU w. ECC vs. dual 6-core Xeon CPU, GF100 GPU w/o ECC vs. dual 6-core Xeon CPU]
Costs
- NVIDIA Tesla C1060 GPU similar to 6-core Xeon CPU
- new NVIDIA Tesla C2050 Fermi GPU similar to two 6-core Xeon CPUs
Speed-ups
- 1 core vs. 1 GPU (w. ECC): > 36x speedup
- 4-core socket vs. 1 GPU (w. ECC): > 9x speedup
24 Performance per Watt
[Figure: power consumption in kWh over grid resolution; curves: dual 6-core Xeon CPU, 8 GT200 GPUs, GF100 GPU with ECC, GF100 GPU w/o ECC]
- Fermi-type GPU more than two times more power-efficient
25 Multi-GPU performance (GT200 GPUs)
[Figure, strong scaling / speed-up: speed-up relative to one GT200 GPU over number of GPUs, for a fixed grid resolution, without and with overlap]
[Figure, weak scaling / scale-up: weak scaling efficiency (in %) relative to one GPU over number of GT200 GPUs, for a fixed grid resolution per GPU, without and with overlap]
26 Multi-GPU performance (Fermi GPUs)
[Figure, strong scaling / speed-up: speed-up relative to one Fermi GPU over number of Fermi GPUs, for a fixed grid resolution]
[Figure, weak scaling / scale-up: weak scaling efficiency (in percent) relative to one GPU over number of Fermi GPUs, for a fixed grid resolution per GPU]
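The two quantities plotted on the scaling slides can be computed from wall-clock times as follows. This is a sketch using the common definitions (speed-up relative to one GPU for a fixed total problem size, and efficiency relative to one GPU for a fixed problem size per GPU). All timing values below are hypothetical and are not measurements from the talk.

```python
def strong_scaling_speedup(t_one_gpu, t_p_gpus):
    """Speed-up relative to one GPU for a fixed total grid resolution."""
    return t_one_gpu / t_p_gpus


def weak_scaling_efficiency(t_one_gpu, t_p_gpus):
    """Efficiency in percent relative to one GPU for a fixed grid resolution per GPU."""
    return 100.0 * t_one_gpu / t_p_gpus


# Hypothetical wall-clock times in seconds, keyed by number of GPUs.
strong_timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}
weak_timings = {1: 50.0, 2: 52.0, 4: 55.0, 8: 62.5}

speedups = {p: strong_scaling_speedup(strong_timings[1], t) for p, t in strong_timings.items()}
efficiencies = {p: weak_scaling_efficiency(weak_timings[1], t) for p, t in weak_timings.items()}
```

Ideal strong scaling would give a speed-up equal to the number of GPUs, and ideal weak scaling an efficiency of 100 %. In this made-up data set, eight GPUs reach a speed-up of 5 and a weak scaling efficiency of 80 %.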
27 Outline
1 Introduction
2 Multi-GPU Fluid Solver
3 Uncertainty Quantification in CFD
28 Uncertainty Quantification (UQ) in CFD
Current standard
- CFD simulations for fixed and known input parameters
Problem
- natural phenomena: input data not known exactly
- engineering: constructions / measurements always subject to perturbations
Solution
- numerical modeling and analysis of uncertainties: Uncertainty Quantification
joint project with M. Griebel and C. Rieger
29 High-level idea of standard UQ methods
Algorithmic idea:
1 sampling of stochastic input parameters according to some distribution
2 computation of hundreds or thousands of stochastic realizations (high-resolution simulations)
3 extraction of averaged data (expectation value), variance information, ... as post-processing step
- non-intrusive method: all available solvers usable without reprogramming
Challenge:
- computation of realizations highly parallel but extremely time-intensive
- handling of enormous data quantities in post-processing step
- optimal setting for multi-GPU clusters in Exascale
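The three steps above can be sketched with a plain Monte Carlo loop. The `toy_solver` function is a hypothetical stand-in for one full fluid simulation; it is chosen to be linear so that the exact mean (2) and variance (1/3) for a uniform input on [0, 1] are known. The uniform input distribution is likewise an assumption.

```python
import random
import statistics


def toy_solver(y):
    """Hypothetical stand-in for one deterministic simulation run with input parameter y."""
    return 2.0 * y + 1.0


def monte_carlo_uq(solver, n_samples, seed=0):
    rng = random.Random(seed)
    # Step 1: sample the stochastic input parameter (assumed uniform on [0, 1]).
    inputs = [rng.uniform(0.0, 1.0) for _ in range(n_samples)]
    # Step 2: one independent solver run per sample; the solver itself is left untouched.
    realizations = [solver(y) for y in inputs]
    # Step 3: post-processing of the realizations.
    return statistics.mean(realizations), statistics.variance(realizations)


mean_estimate, variance_estimate = monte_carlo_uq(toy_solver, n_samples=10000)
```

Because the solver is only called as a black box, the runs in step 2 are independent of each other, which is what makes the approach non-intrusive and easy to distribute over many GPUs.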
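30 Stochastic formulation of the Navier-Stokes equations
Find stochastic functions $u: D \times \mathbb{R}_+ \times \Omega \to \mathbb{R}^3$, $p: D \times \mathbb{R}_+ \times \Omega \to \mathbb{R}$ such that P-almost surely:
$\frac{\partial}{\partial t} u(x, t, \omega) + u(x, t, \omega) \cdot \nabla u(x, t, \omega) + \frac{1}{\rho(\omega)} \left( \nabla p(x, t, \omega) - \mu(\omega) \Delta u(x, t, \omega) \right) = f(x, \omega)$ on $D$
$\nabla \cdot u(x, t, \omega) = 0$ on $D$
$u(x, t, \omega) = g(x, \omega)$ on $\partial D$
- $x \in D \subset \mathbb{R}^3$: physical domain
- $(\Omega, \mathcal{F}, P)$: complete probability space
- $\Omega$: set of outcomes, $\mathcal{F} \subset 2^\Omega$: σ-algebra of events, $P: \mathcal{F} \to [0, 1]$: probability measure
- $\rho, \mu: \Omega \to \mathbb{R}$, $f, g: \Omega \to \mathbb{R}^3$: stochastic functions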
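31 General setting
Formulation for general setting
Find stochastic functions $u: D \times \mathbb{R}_+ \times \Omega \to \mathbb{R}^l$ such that P-almost surely:
$L(a(x, t, \omega))\, u(x, t, \omega) = f(x, t, \omega)$ for all $x \in D \subset \mathbb{R}^d$, $t \in \mathbb{R}_+$, + BC
- $L$: general operator depending on parameters modeled by random field $a$
- $a, f: D \times \mathbb{R}_+ \times \Omega \to \mathbb{R}^s$: stochastic functions
- $x \in D \subset \mathbb{R}^3$: physical domain
- $(\Omega, \mathcal{F}, P)$: complete probability space
- $\Omega$: set of outcomes, $\mathcal{F} \subset 2^\Omega$: σ-algebra of events, $P: \mathcal{F} \to [0, 1]$: probability measure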
32 Finite noise assumption
Introduction of finite number of random variables:
$\Gamma_M: \Omega \to \mathbb{R}^M$, $\omega \mapsto (Y_1(\omega), \ldots, Y_M(\omega))^T = (y^{(1)}, \ldots, y^{(M)})^T =: y$, $M \in \mathbb{N}$
$\Omega_M := \Gamma_M(\Omega)$, $(\Omega, \mathcal{F}, P) \to (\Omega_M, \mathcal{B}(\Omega_M), \rho_M(y))$, $\rho_M$ joint probability density
$E[Y_i] = 0$, $E[Y_i Y_j] = \delta_{i,j}$ for all $i, j = 1, \ldots, M$
Finite dimensional representation of stochastic space:
$a(x, t, \omega) \approx a_M(x, t, \Gamma_M(\omega)) = a_M(x, t, y)$
$f(x, t, \omega) \approx f_M(x, t, \Gamma_M(\omega)) = f_M(x, t, y)$
With further assumptions, solution can be given as:
$u(x, t, \omega) \approx u_M(x, t, \Gamma_M(\omega)) = u_M(x, t, y)$
33 Stochastic RBF collocation
1. generate $N(\Omega)$ stochastic samples (collocation points) wrt. some distribution:
   $X_{N(\Omega)} := \{y_1, \ldots, y_{N(\Omega)}\} \subset \Omega_M$
2. solve $N(\Omega)$ discretized deterministic problems:
   $L^{(h)}(a_M^{(h)}(x, t, y_i))\, u_M^{(h)}(x, t, y_i) = f_M^{(h)}(x, t, y_i)$ for all $x \in D \subset \mathbb{R}^d$, $t \in \mathbb{R}_+$
3. compute Lagrange basis $\{l_i\}$ in trial space $V_{X_{N(\Omega)}} := \mathrm{span}\{K(\cdot, y_i) \mid y_i \in X_{N(\Omega)}\}$:
   $\{l_i(y)\}_{i=1}^{N(\Omega)}$, $l_i(y) \in V_{X_{N(\Omega)}}$, $l_i(y_j) = \delta_{i,j}$
   $K(\cdot, \cdot)$ RBF kernel, i.e. $K(y_i, y_j) = k(\|y_i - y_j\|)$, e.g. $k(x) = e^{-\alpha x^2}$
4. interpolate solution:
   $u_M^{(h)}(x, t, y) = \sum_{i=1}^{N(\Omega)} u^{(h)}(x, t, y_i)\, l_i(y)$
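34 Computation of Lagrange basis $\{l_i(y)\}_{i=1}^{N(\Omega)}$
Given: samples in stochastic space: $X_{N(\Omega)} := \{y_1, \ldots, y_{N(\Omega)}\} \subset \Omega_M$
Target: Lagrange basis $\{l_i(y)\}_{i=1}^{N(\Omega)}$ in trial space $V_{X_{N(\Omega)}} := \mathrm{span}\{K(\cdot, y_i) \mid y_i \in X_{N(\Omega)}\}$ with
$l_i(y) \in V_{X_{N(\Omega)}}$, $l_i(y_j) = \delta_{i,j}$, $K(y_i, y_j) = k(\|y_i - y_j\|)$, $k(x) = e^{-\alpha x^2}$
Evaluation of Lagrange basis at point $q \in \Omega_M$: solution of a linear system for each $q$,
$A\, l(q) = R(q)$
with
$A = \begin{pmatrix} K(y_1, y_1) & \cdots & K(y_1, y_{N(\Omega)}) \\ \vdots & & \vdots \\ K(y_{N(\Omega)}, y_1) & \cdots & K(y_{N(\Omega)}, y_{N(\Omega)}) \end{pmatrix}$, $l(q) = \begin{pmatrix} l_1(q) \\ \vdots \\ l_{N(\Omega)}(q) \end{pmatrix}$, $R(q) = \begin{pmatrix} K(q, y_1) \\ \vdots \\ K(q, y_{N(\Omega)}) \end{pmatrix}$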
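The evaluation of the Lagrange basis at a point q can be sketched directly from the linear system A l(q) = R(q). The code below uses the Gaussian kernel from the slide with an assumed shape parameter α = 8, five hypothetical 2D sample points, and a small Gaussian elimination routine as a stand-in for a library LU factorization.

```python
import math

ALPHA = 8.0  # assumed shape parameter of the Gaussian kernel


def kernel(a, b):
    """K(a, b) = k(|a - b|) with k(x) = exp(-ALPHA * x^2)."""
    r2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-ALPHA * r2)


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting on a copy of the dense system."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n + 1):
                a[row][c] -= factor * a[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        partial = sum(a[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (a[row][n] - partial) / a[row][row]
    return x


def lagrange_basis(samples, q):
    """Return (l_1(q), ..., l_N(q)) by solving A l(q) = R(q)."""
    a_matrix = [[kernel(yi, yj) for yj in samples] for yi in samples]
    r_vector = [kernel(q, yi) for yi in samples]
    return solve(a_matrix, r_vector)


# Hypothetical collocation points in a two-dimensional stochastic space.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
basis_at_sample_2 = lagrange_basis(samples, samples[2])
```

Evaluating at a collocation point returns a unit vector (here the third one), which is the Lagrange property $l_i(y_j) = \delta_{i,j}$. Since A does not depend on q, a production code would factorize A once and reuse the factorization for all evaluation points.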
35 For now: expectation value
Remember
$u_M^{(h)}(x, t, y) = \sum_{i=1}^{N(\Omega)} u^{(h)}(x, t, y_i)\, l_i(y)$
Expectation value
$E(u_M^{(h)}(x, t, y)) = \int_{\Omega_M} u_M^{(h)}(x, t, y)\, \rho_M(y)\, dy$
$E(u_M^{(h)}(x, t, y)) = \sum_{i=1}^{N(\Omega)} u^{(h)}(x, t, y_i)\, E(l_i(y))$, $E(l_i(y)) = \int_{\Omega_M} l_i(y)\, \rho_M(y)\, dy$
Monte Carlo using quadrature points $Q = \{q_1, \ldots, q_{|Q|}\} \subset \Omega_M$ gives
$E(l_i(y)) \approx \frac{1}{|Q|} \sum_{q \in Q} l_i(q)\, \rho_M(q)$
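The expectation formula can be tried on a one-dimensional toy problem. The sketch assumes a uniform density $\rho_M = 1$ on $[0, 1]$, uniformly distributed quadrature points, and five equidistant collocation points. It also uses a shortcut that the slide does not mention: because the system is linear, averaging the right-hand sides R(q) first and solving once gives the same weights $E(l_i)$ as solving for every q and averaging afterwards. All names are hypothetical.

```python
import math
import random

ALPHA = 8.0  # assumed shape parameter of the Gaussian kernel


def kernel(a, b):
    return math.exp(-ALPHA * (a - b) ** 2)


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (stand-in for an LU library call)."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n + 1):
                a[row][c] -= factor * a[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        partial = sum(a[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (a[row][n] - partial) / a[row][row]
    return x


def expectation_weights(samples, quadrature_points):
    """Approximate E(l_i) for all i: solve A w = mean over q of R(q)."""
    a_matrix = [[kernel(yi, yj) for yj in samples] for yi in samples]
    mean_rhs = [sum(kernel(q, yi) for q in quadrature_points) / len(quadrature_points)
                for yi in samples]
    return solve(a_matrix, mean_rhs), mean_rhs


samples = [0.0, 0.25, 0.5, 0.75, 1.0]                       # collocation points y_i
rng = random.Random(1)
quadrature_points = [rng.uniform(0.0, 1.0) for _ in range(4000)]
weights, mean_rhs = expectation_weights(samples, quadrature_points)

# "Simulation results" at the collocation points: here u(y) = K(y, 0.5), a toy function.
u_values = [kernel(y, 0.5) for y in samples]
expectation = sum(u * w for u, w in zip(u_values, weights))
```

For this toy function the exact expectation is $\sqrt{\pi/\alpha}\,\mathrm{erf}(\sqrt{\alpha}/2) \approx 0.598$, and the estimate agrees with it up to the Monte Carlo error. In the actual setting, each `u_values[i]` would be a complete simulation field rather than a single number.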
36 Implementation
Framework
- configuration files for independent fluid solver runs generated by Python scripts
- simulations performed with full multi-GPU code on a cluster of 36 Nvidia Tesla M2090 GPUs at our institute
- stochastics as post-processing on GPU using VTK data files produced by fluid solver
GPU programming
- Nvidia CUDA 4.1
- Thrust (transforms, reductions, ...)
- CULA Dense R14 (double precision LU factorization)
- CUBLAS (dot product with stride, not available in Thrust)
- custom GPU kernels (matrix setup, integration, ...)
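Generating one configuration file per independent solver run might look like the sketch below. The file format, the key names, the sampled viscosity range, and the function name are all hypothetical; the slides only state that Python scripts produce the configuration files.

```python
import random
import tempfile
from pathlib import Path


def write_run_configs(out_dir, n_runs, seed=0):
    """Write one small key = value file per stochastic sample and return the file paths."""
    rng = random.Random(seed)
    paths = []
    for run in range(n_runs):
        # Hypothetical uncertain input: a viscosity sampled uniformly from an assumed range.
        viscosity = rng.uniform(0.9e-3, 1.1e-3)
        path = Path(out_dir) / f"run_{run:04d}.cfg"
        path.write_text(
            f"run_id = {run}\n"
            f"viscosity = {viscosity:.6e}\n"
            f"output = run_{run:04d}.vtk\n"
        )
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    configs = write_run_configs(tmp, n_runs=4)
    names = [p.name for p in configs]
    first_config = configs[0].read_text()
```

Each file describes a self-contained run, so the runs can be submitted to the cluster independently and their output files collected afterwards for the stochastic post-processing.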
37 First results: Uncertainty in reattachment length
38 First results: Mixing in bubble reactors
39 Summary
- NaSt3DGPF solves the two-phase incompressible Navier-Stokes equations
- Code scales well on multi-GPU clusters
- Uncertainty Quantification profits from GPUs
Thank you!
More informationPORTABLE AND SCALABLE SOLUTIONS FOR CFD ON MODERN SUPERCOMPUTERS
PORTABLE AND SCALABLE SOLUTIONS FOR CFD ON MODERN SUPERCOMPUTERS Ricard Borrell Pol Head and Mass Transfer Technological Center cttc.upc.edu Termo Fluids S.L termofluids.co Barcelona Supercomputing Center
More informationComputational Fluid Dynamics (CFD) using Graphics Processing Units
Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores
More informationFast Multipole Method on the GPU
Fast Multipole Method on the GPU with application to the Adaptive Vortex Method University of Bristol, Bristol, United Kingdom. 1 Introduction Particle methods Highly parallel Computational intensive Numerical
More informationGPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)
GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization
More informationFLUID SIMULATION. Kristofer Schlachter
FLUID SIMULATION Kristofer Schlachter The Equations Incompressible Navier-Stokes: @u @t = (r u)u 1 rp + vr2 u + F Incompressibility condition r u =0 Breakdown @u @t The derivative of velocity with respect
More informationGeneric Refinement and Block Partitioning enabling efficient GPU CFD on Unstructured Grids
Generic Refinement and Block Partitioning enabling efficient GPU CFD on Unstructured Grids Matthieu Lefebvre 1, Jean-Marie Le Gouez 2 1 PhD at Onera, now post-doc at Princeton, department of Geosciences,
More informationCS-184: Computer Graphics Lecture #21: Fluid Simulation II
CS-184: Computer Graphics Lecture #21: Fluid Simulation II Rahul Narain University of California, Berkeley Nov. 18 19, 2013 Grid-based fluid simulation Recap: Eulerian viewpoint Grid is fixed, fluid moves
More informationAccelerating GPU computation through mixed-precision methods. Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University
Accelerating GPU computation through mixed-precision methods Michael Clark Harvard-Smithsonian Center for Astrophysics Harvard University Outline Motivation Truncated Precision using CUDA Solving Linear
More informationWhy Use the GPU? How to Exploit? New Hardware Features. Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid. Semiconductor trends
Imagine stream processor; Bill Dally, Stanford Connection Machine CM; Thinking Machines Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid Jeffrey Bolz Eitan Grinspun Caltech Ian Farmer
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationA Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids
A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011
More informationHigh Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters
SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014
More informationDevelopment of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics
Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics I. Pantle Fachgebiet Strömungsmaschinen Karlsruher Institut für Technologie KIT Motivation
More informationEfficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling
Iterative Solvers Numerical Results Conclusion and outlook 1/22 Efficient multigrid solvers for strongly anisotropic PDEs in atmospheric modelling Part II: GPU Implementation and Scaling on Titan Eike
More informationGTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013
GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationMassively Parallel Phase Field Simulations using HPC Framework walberla
Massively Parallel Phase Field Simulations using HPC Framework walberla SIAM CSE 2015, March 15 th 2015 Martin Bauer, Florian Schornbaum, Christian Godenschwager, Johannes Hötzer, Harald Köstler and Ulrich
More informationDirected Optimization On Stencil-based Computational Fluid Dynamics Application(s)
Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s) Islam Harb 08/21/2015 Agenda Motivation Research Challenges Contributions & Approach Results Conclusion Future Work 2
More informationPrim ary break-up: DNS of liquid jet to improve atomization m odelling
Computational Methods in Multiphase Flow III 343 Prim ary break-up: DNS of liquid jet to improve atomization m odelling T. Ménard, P. A. Beau, S. Tanguy, F. X. Demoulin & A. Berlemont CNRS-UMR6614-CORIA,
More informationInterdisciplinary practical course on parallel finite element method using HiFlow 3
Interdisciplinary practical course on parallel finite element method using HiFlow 3 E. Treiber, S. Gawlok, M. Hoffmann, V. Heuveline, W. Karl EuroEDUPAR, 2015/08/24 KARLSRUHE INSTITUTE OF TECHNOLOGY -
More informationAb initio NMR Chemical Shift Calculations for Biomolecular Systems Using Fragment Molecular Orbital Method
4 Ab initio NMR Chemical Shift Calculations for Biomolecular Systems Using Fragment Molecular Orbital Method A Large-scale Two-phase Flow Simulation Evolutive Image/ Video Coding with Massively Parallel
More informationNVIDIA. Interacting with Particle Simulation in Maya using CUDA & Maximus. Wil Braithwaite NVIDIA Applied Engineering Digital Film
NVIDIA Interacting with Particle Simulation in Maya using CUDA & Maximus Wil Braithwaite NVIDIA Applied Engineering Digital Film Some particle milestones FX Rendering Physics 1982 - First CG particle FX
More informationAerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project
Workshop HPC enabling of OpenFOAM for CFD applications Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project A. De Maio (1), V. Krastev (2), P. Lanucara (3),
More informationMultiphysics simulations of nuclear reactors and more
Multiphysics simulations of nuclear reactors and more Gothenburg Region OpenFOAM User Group Meeting Klas Jareteg klasjareteg@chalmersse Division of Nuclear Engineering Department of Applied Physics Chalmers
More informationHPC Usage for Aerodynamic Flow Computation with Different Levels of Detail
DLR.de Folie 1 HPCN-Workshop 14./15. Mai 2018 HPC Usage for Aerodynamic Flow Computation with Different Levels of Detail Cornelia Grabe, Marco Burnazzi, Axel Probst, Silvia Probst DLR, Institute of Aerodynamics
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationInvestigation of the critical submergence at pump intakes based on multiphase CFD calculations
Advances in Fluid Mechanics X 143 Investigation of the critical submergence at pump intakes based on multiphase CFD calculations P. Pandazis & F. Blömeling TÜV NORD SysTec GmbH and Co. KG, Germany Abstract
More informationInteraction of Fluid Simulation Based on PhysX Physics Engine. Huibai Wang, Jianfei Wan, Fengquan Zhang
4th International Conference on Sensors, Measurement and Intelligent Materials (ICSMIM 2015) Interaction of Fluid Simulation Based on PhysX Physics Engine Huibai Wang, Jianfei Wan, Fengquan Zhang College
More informationGPUs and Einstein s Equations
GPUs and Einstein s Equations Tim Dewey Advisor: Dr. Manuel Tiglio AMSC Scientific Computing University of Maryland May 5, 2011 Outline 1 Project Summary 2 Evolving Einstein s Equations 3 Implementation
More informationHYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE
HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S
More information