Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters
Manfred Liebmann
Technische Universität München, Chair of Optimal Control, Center for Mathematical Sciences, M17
January 26, 2016
(1) The Vijayasundaram Method for Multi-Physics Euler Equations

The Euler equations are given by a system of differential equations. We consider two gas species with densities $\rho_1$ and $\rho_2$ for the simulations and ideal gas state equations. More complicated and realistic state equations can also be handled by the ARMO simulation code.

Let $\rho_1, \rho_2$ be the densities of the gas species and $\rho = \rho_1 + \rho_2$ the density of the gas, $p$ the pressure, $p_1, p_2, p_3$ the components of the gas momentum density, and $E$ the total energy density. Let $x = (x_1, x_2, x_3) \in \Omega \subset \mathbb{R}^3$ and $t \in (0, T) \subset \mathbb{R}$ be the space-time coordinates. Then the conserved quantity $w(x, t)$ is given by

$$ w = (\rho_1, \rho_2, p_1, p_2, p_3, E)^T \tag{1} $$

and the flux vectors are defined as

$$ f_k(w) = \begin{pmatrix} \rho_1 p_k/\rho \\ \rho_2 p_k/\rho \\ p_1 p_k/\rho + \delta_{1k}\, p \\ p_2 p_k/\rho + \delta_{2k}\, p \\ p_3 p_k/\rho + \delta_{3k}\, p \\ (E + p)\, p_k/\rho \end{pmatrix}, \quad k \in \{1, 2, 3\} \tag{2} $$

The Euler equations on the domain $\Omega \times (0, T)$ can then be expressed as

$$ \frac{\partial}{\partial t} w(x, t) + \frac{\partial}{\partial x_1} f_1(w(x, t)) + \frac{\partial}{\partial x_2} f_2(w(x, t)) + \frac{\partial}{\partial x_3} f_3(w(x, t)) = 0 \tag{3} $$

and together with suitable boundary conditions the system can be solved with the finite volume approach. The finite volume method can be formulated by applying Green's theorem

$$ \frac{d}{dt} \int_{\Omega} w(x, t)\, dx = -\int_{\partial\Omega} \left( f_1 n_1 + f_2 n_2 + f_3 n_3 \right) dS \tag{4} $$

where $n = (n_1, n_2, n_3)$ denotes the outer normal to the boundary $\partial\Omega$. The discrete version is then derived by integration over a time interval $[t_n, t_n + \Delta t]$ and averaging over the cells $K_i$:

$$ w^{(n+1)}_{K_i} = w^{(n)}_{K_i} - \Delta t \sum_{j \in S(i)} \frac{|\Gamma_{ij}|}{|K_i|} \sum_{k=1}^{3} F_{k,\Gamma_{ij}}\!\left(w^{(n)}_{K_i}, w^{(n)}_{K_j}\right) n_k \tag{5} $$

Here $\{K_i\}_{i \in I}$ is a tetrahedral approximation of $\Omega$, $\Gamma_{ij}$ are the interfaces between the cells $K_i$ and $K_j$, and the set $S(i)$ stores the indices of the neighboring cells of $K_i$.
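As a rough illustration of equations (2) and (5), the following Python sketch evaluates a flux vector and applies one explicit update to a single cell. It is not the ARMO implementation. The function names are hypothetical, the pressure closure uses a single ratio of specific heats `gamma` for both species (an assumption, since the slides only say "ideal gas state equations"), and `central_flux` is a plain averaging placeholder, not the Vijayasundaram flux.

```python
def ideal_gas_pressure(w, gamma=1.4):
    """Assumed closure: p = (gamma - 1) * (E - |momentum|^2 / (2 * rho))."""
    rho1, rho2, p1, p2, p3, energy = w
    rho = rho1 + rho2
    kinetic = (p1 * p1 + p2 * p2 + p3 * p3) / (2.0 * rho)
    return (gamma - 1.0) * (energy - kinetic)


def euler_flux(w, pressure, k):
    """Flux vector f_k(w) of equation (2), k in {1, 2, 3}.

    w = (rho1, rho2, p1, p2, p3, E); the pressure is passed in explicitly,
    so this function does not depend on a particular state equation.
    """
    rho1, rho2, p1, p2, p3, energy = w
    rho = rho1 + rho2
    momentum = (p1, p2, p3)
    u_k = momentum[k - 1] / rho  # velocity component p_k / rho
    flux = [rho1 * u_k, rho2 * u_k]
    for d in (1, 2, 3):
        delta = 1.0 if d == k else 0.0  # Kronecker delta_{dk}
        flux.append(momentum[d - 1] * u_k + delta * pressure)
    flux.append((energy + pressure) * u_k)
    return flux


def central_flux(k, u, v):
    """Placeholder numerical flux: the average of f_k(u) and f_k(v)."""
    fu = euler_flux(u, ideal_gas_pressure(u), k)
    fv = euler_flux(v, ideal_gas_pressure(v), k)
    return [0.5 * (a + b) for a, b in zip(fu, fv)]


def fv_update(w_i, neighbours, volume, dt, numerical_flux):
    """One explicit step of equation (5) for a single cell K_i.

    neighbours: list of (w_j, area, normal), where area = |Gamma_ij| and
    normal is the unit outer normal of that interface.
    numerical_flux(k, u, v) returns F_k(u, v) as a list.
    """
    w_new = list(w_i)
    for w_j, area, normal in neighbours:
        for k in (1, 2, 3):
            f = numerical_flux(k, w_i, w_j)
            for c in range(len(w_new)):
                w_new[c] -= dt * area / volume * f[c] * normal[k - 1]
    return w_new


# Outer normals of the six faces of a unit cube (used only for the demo).
UNIT_CUBE_NORMALS = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

if __name__ == "__main__":
    state = (0.5, 0.5, 0.3, 0.0, 0.0, 2.5)
    faces = [(state, 1.0, n) for n in UNIT_CUBE_NORMALS]
    print(fv_update(state, faces, 1.0, 0.1, central_flux))
```

The demo uses a unit cube rather than a tetrahedron only to keep the face data short. With identical states in all neighbors the face contributions cancel, so the cell state is unchanged up to rounding, which is a quick consistency check for any flux plugged into `fv_update`.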
The Vijayasundaram method defines the fluxes as

$$ F_{k,\Gamma_{ij}}(u, v) = A_k^{+}\!\left(\frac{u + v}{2}\right) u + A_k^{-}\!\left(\frac{u + v}{2}\right) v, \quad k = 1, 2, 3 \tag{6} $$

The essence of the Vijayasundaram method is the calculation of an eigenspace decomposition of $A_k = df_k/dw$, $k = 1, 2, 3$, into positive and negative subspaces. Thus the matrices $A_k^{+}$, $A_k^{-}$ are constructed from the positive and negative eigenvalues of

$$ A_k = R_k \Lambda_k L_k, \quad \Lambda_k = \mathrm{diag}(\lambda_{k,1}, \ldots, \lambda_{k,6}), \quad k = 1, 2, 3 \tag{7} $$

$$ A_k^{\pm} = R_k \Lambda_k^{\pm} L_k, \quad \Lambda_k^{\pm} = \mathrm{diag}(\lambda_{k,1}^{\pm}, \ldots, \lambda_{k,6}^{\pm}) \tag{8} $$

$$ \lambda_{k,i}^{+} = \max(\lambda_{k,i}, 0), \quad \lambda_{k,i}^{-} = \min(\lambda_{k,i}, 0), \quad i = 1, \ldots, 6 \tag{9} $$
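The splitting in equations (7) to (9) can be tried out on a small example. The sketch below applies it to a constant 2x2 matrix with eigenvalues $+c$ and $-c$ (a linear wave system chosen here purely for illustration), not to the 6x6 Euler Jacobian, whose eigenvectors the slides do not list. All function names are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_matrix(right, eigenvalues, left):
    """Return (A_plus, A_minus) with A^± = R Λ^± L, as in equations (8) and (9)."""
    n = len(eigenvalues)
    lam_plus = [
        [max(eigenvalues[i], 0.0) if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    lam_minus = [
        [min(eigenvalues[i], 0.0) if i == j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    a_plus = matmul(matmul(right, lam_plus), left)
    a_minus = matmul(matmul(right, lam_minus), left)
    return a_plus, a_minus


def split_flux(a_plus, a_minus, u, v):
    """F(u, v) = A^+ u + A^- v, i.e. equation (6) for a constant matrix."""
    n = len(u)
    return [
        sum(a_plus[i][j] * u[j] + a_minus[i][j] * v[j] for j in range(n))
        for i in range(n)
    ]


# Example system A = [[0, 1], [c^2, 0]] with c = 2: eigenvalues +c and -c,
# right eigenvectors (1, c) and (1, -c) as columns of R, and L = R^{-1}.
C = 2.0
R = [[1.0, 1.0], [C, -C]]
L = [[0.5, 0.5 / C], [0.5, -0.5 / C]]
A_PLUS, A_MINUS = split_matrix(R, [C, -C], L)

if __name__ == "__main__":
    print(A_PLUS, A_MINUS)
    print(split_flux(A_PLUS, A_MINUS, [1.0, 3.0], [1.0, 3.0]))
```

Two properties are worth checking for any such splitting: $A^{+} + A^{-}$ reproduces $A$, and the flux is consistent, meaning $F(u, u) = A u$. In the Euler case the matrices are re-evaluated at the averaged state $(u + v)/2$ for every interface, which this constant-matrix sketch does not show.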
(2) ARMO CPU/GPU Algorithms

High level parallel CPU algorithm:

Require: f, g, com, nei, geo, pio
Require: t_max, i_max, C, σ, m, n
t ← 0, i ← 0
while t < t_max and i < i_max do
    exchange(m, n, f, g, com)
    mpi_alltoall(m, n, g, f)
    vijaya(n, nei, geo, pio, f, g, σ)
    mpi_allreduce_max(σ)
    update(n, f, g, σ, C)
    i ← i + 1
    t ← t + C/σ
end while
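The control flow of the loop above can be mimicked in a few lines of serial Python. This is only a mock: `compute_local_sigmas` is a hypothetical stand-in for the `vijaya` kernel, returning one σ per simulated MPI rank, and Python's `max` stands in for `mpi_allreduce_max`. The slides do not say what σ measures, so it is treated here simply as a positive number whose global maximum sets the time step C/σ.

```python
def run_time_loop(t_max, i_max, c, compute_local_sigmas):
    """Serial mock of the outer time loop.

    Stops when the simulated time reaches t_max or after i_max iterations,
    whichever comes first. Returns the final time, the iteration count and
    the list of time steps taken.
    """
    t, i = 0.0, 0
    steps = []
    while t < t_max and i < i_max:
        local_sigmas = compute_local_sigmas(i)  # stands in for vijaya(...)
        sigma = max(local_sigmas)               # stands in for mpi_allreduce_max
        dt = c / sigma                          # time step used by update(...)
        steps.append(dt)
        i += 1
        t += dt
    return t, i, steps


if __name__ == "__main__":
    # Three simulated ranks with constant local sigmas.
    print(run_time_loop(1.0, 100, 0.5, lambda i: [1.0, 2.0, 4.0]))
```

Because every rank must advance with the same time step, the reduction is a global maximum: the rank with the largest σ dictates the step for all ranks.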
High level parallel GPU algorithm:

Require: f_D, g_D, com_D, nei_D, geo_D, pio_D, σ_D
Require: t_max, i_max, C, σ, m, n, snd, rcv
t ← 0, i ← 0
while t < t_max and i < i_max do
    exchange_D(m, n, f_D, g_D, com_D)
    device_to_host(n, g_D, snd)
    mpi_alltoall(snd, rcv)
    host_to_device(n, f_D, rcv)
    vijaya_D(n, nei_D, geo_D, pio_D, f_D, g_D, σ_D)
    device_to_host(σ_D, σ)
    mpi_allreduce_max(σ)
    host_to_device(σ_D, σ)
    update_D(n, f_D, g_D, σ_D, C)
    i ← i + 1
    t ← t + C/σ
end while
(3) ARMO CPU/GPU Benchmarks

Figure 1: GPU Cluster: mephisto.uni-graz.at

GPU Computing Hardware

kepler: 4x Nvidia Tesla K20 GPU (9,984 cores / 24 GB on-board RAM)
mephisto: 20x Nvidia Tesla C2070 GPU (8,960 cores / 120 GB on-board RAM)
iscsergpu: 32x Nvidia Geforce GTX 295 (15,360 cores / 56 GB on-board RAM)
gtx: 4x Nvidia Geforce GTX 280 (960 cores / 4 GB on-board RAM)
fermi: 2x Nvidia Geforce GTX 480 (960 cores / 3 GB on-board RAM)

GPU Clusters and Servers

kepler: 2x Intel Xeon 2.0 GHz with 256 GB RAM (4x Tesla K20)
mephisto: 12x Intel Xeon 2.67 GHz with 520 GB RAM (20x Tesla C2070)
iscsergpu: 8x Intel Core i7 3.2 GHz with 12 GB RAM (32x GTX 295)
gtx: AMD Phenom 2.6 GHz with 8 GB RAM (4x GTX 280)
fermi: Intel Core i GHz with 12 GB RAM (2x GTX 480)

CPU Clusters and Servers

memo: 8x Intel Xeon 2.27 GHz with 1024 GB RAM
penge: 12x Dual Intel Xeon 3.0 GHz with 16 GB RAM
quad2: 4x AMD Opteron 1.9 GHz with 32 GB RAM

Benchmark example: Intake port of a diesel engine with 155,325 elements.

Four pieces of the intake port for parallel processing using domain decomposition.
CPU cores: memo quad2 gtx iscsergpu penge fermi kepler mephisto
(6) [1] 1.27 [1] (1.76)
16 (12) [2] 0.64 [2] 0.72 (0.84) [1]
32 (24) [4] 0.33 [4] (0.41) [2]
64 (48) [8] (0.21) [4]
Speedup
Efficiency

GPUs: memo quad2 gtx iscsergpu penge fermi kepler mephisto
ECC: on/off / / / [1] [1] / [2] / [4]
Speedup / 4.72
Efficiency / 0.29

Table 1: Parallel scalability benchmark for an intake port with 155,325 elements.
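The slides do not spell out how the speedup and efficiency rows are computed. A minimal sketch, assuming the usual definitions (speedup as reference runtime divided by parallel runtime, efficiency as speedup divided by the number of processing units), looks like this; the function names are hypothetical.

```python
def speedup(t_reference, t_parallel):
    """Speedup of a parallel run relative to a reference run (assumed definition)."""
    return t_reference / t_parallel


def efficiency(speedup_value, workers):
    """Parallel efficiency: speedup per processing unit (assumed definition)."""
    return speedup_value / workers


if __name__ == "__main__":
    # Surviving GPU figures from Table 1, under the assumption of 16 GPUs.
    print(efficiency(4.72, 16))
```

Under these definitions, the surviving GPU entries of Table 1 (speedup 4.72, efficiency 0.29) fit together if the speedup was measured on 16 GPUs, since 4.72 / 16 ≈ 0.295. The GPU count is an inference from the ratio; the garbled table text does not confirm it.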
Benchmark example: Nozzle with 642,700, 2,570,800, and 10,283,200 elements.

CPU cores: quad2 gtx iscsergpu fermi kepler mephisto
(6) [1] (7.92)
16 (12) [2] 3.26 (3.75) [1]
32 (24) 2.42 [4] (1.74) [2]
64 (48) (0.84) [4]
Speedup
Efficiency

GPUs: quad2 gtx iscsergpu fermi kepler mephisto
ECC: on/off / / / [1] [1] / [2] / [4]
Speedup / 7.40
Efficiency / 0.46

Table 2: Parallel scalability benchmark for a nozzle with 642,700 elements.

CPU cores: quad2 gtx iscsergpu fermi kepler mephisto
(6) [1] (29.74)
16 (12) [2] (14.58) [1]
32 (24) 7.40 [4] (7.16) [2]
64 (48) 3.75 [8] (3.49) [4]
Speedup
Efficiency

GPUs: quad2 gtx iscsergpu fermi kepler mephisto
ECC: on/off / / / [1] [1] / [2] [2] / [4]
Speedup /
Efficiency / 0.73

Table 3: Parallel scalability benchmark for a nozzle with 2,570,800 elements.

CPU cores: quad2 gtx iscsergpu fermi kepler mephisto
(6) [1] (109.45)
16 (12) [2] (54.44) [1]
32 (24) [4] (27.16) [2]
64 (48) [8] (13.66) [4]
Speedup
Efficiency

GPUs: quad2 gtx iscsergpu fermi kepler mephisto
ECC: on/off 1 * * / / / [1] [1] / [2] [2] / [4]
32 (24) [4] (0.495) / * [6] [8]
Speedup /
Efficiency / 0.94

Table 4: Parallel scalability benchmark for a nozzle with 10,283,200 elements.
Effective GFLOPS for ARMO Simulator

CPU / GPU Hardware: Intake-port, Nozzle, Nozzle, Nozzle
Elements: 155, ,700 2,570,800 10,283,200
kepler 2x Intel Xeon E [2] [2] [2] [2]
kepler 4x Nvidia Tesla K [4] [4] [4] [4]
mephisto 16x Nvidia Tesla C [16] [16] [16] [16]
iscsergpu 32x Nvidia GTX [8] [8] [16] [64]

Table 5: Effective GFLOPS for ARMO simulator.

GPU cluster performance is equivalent to CPU cores!
Conclusions

GPUs deliver excellent performance for CFD problems!
speedup on GPU cluster with 4 to 64 GPUs compared with modern CPU core
New GPU hardware: Maxwell architecture brings even more performance
CUDA programming model fits well
Essential software design decision: Element-based loops!
More information1 Past Research and Achievements
Parallel Mesh Generation and Adaptation using MAdLib T. K. Sheel MEMA, Universite Catholique de Louvain Batiment Euler, Louvain-La-Neuve, BELGIUM Email: tarun.sheel@uclouvain.be 1 Past Research and Achievements
More informationDDFV Schemes for the Euler Equations
DDFV Schemes for the Euler Equations Christophe Berthon, Yves Coudière, Vivien Desveaux Journées du GDR Calcul, 5 juillet 2011 Introduction Hyperbolic system of conservation laws in 2D t W + x f(w)+ y
More informationFINITE POINTSET METHOD FOR 2D DAM-BREAK PROBLEM WITH GPU-ACCELERATION. M. Panchatcharam 1, S. Sundar 2
International Journal of Applied Mathematics Volume 25 No. 4 2012, 547-557 FINITE POINTSET METHOD FOR 2D DAM-BREAK PROBLEM WITH GPU-ACCELERATION M. Panchatcharam 1, S. Sundar 2 1,2 Department of Mathematics
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationMathematical computations with GPUs
Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing
More informationGPU Acceleration of Unmodified CSM and CFD Solvers
GPU Acceleration of Unmodified CSM and CFD Solvers Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de
More informationA GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv: v1 [cs.dc] 5 Sep 2013
A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv:1309.1230v1 [cs.dc] 5 Sep 2013 Kerry A. Seitz, Jr. 1, Alex Kennedy 1, Owen Ransom 2, Bassam A. Younis 2, and John D. Owens 3 1 Department
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationBlazer Pro V2.1 Client Requirements & Hardware Performance
Blazer Pro V2.1 Client Requirements & Hardware Performance Table of Contents Chapter 1 Client Requirements... 2 Chapter 2 Control Client Performance... 3 2.1 Local Control Client on Blazer Pro Server...
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationOverview of Parallel Computing. Timothy H. Kaiser, PH.D.
Overview of Parallel Computing Timothy H. Kaiser, PH.D. tkaiser@mines.edu Introduction What is parallel computing? Why go parallel? The best example of parallel computing Some Terminology Slides and examples
More informationMatrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs
Iterative Solvers Numerical Results Conclusion and outlook 1/18 Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Eike Hermann Müller, Robert Scheichl, Eero Vainikko
More informationHigh Performance Computing for PDE Towards Petascale Computing
High Performance Computing for PDE Towards Petascale Computing S. Turek, D. Göddeke with support by: Chr. Becker, S. Buijssen, M. Grajewski, H. Wobker Institut für Angewandte Mathematik, Univ. Dortmund
More information