Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Size: px
Start display at page:

Download "Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters"

Transcription

1 Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 January 26, 2016

2 (1) The Vijayasundaram Method for Multi-Physics Euler Equations The Euler equations are given by a system of differential equations. We consider two gas species with densities ρ 1 and ρ 2 for the simulations and ideal gas state equations. More complicated and realistic state equation can also be handled by the ARMO simulation code. Let ρ 1, ρ 2 be the densities of the gas species and ρ = ρ 1 + ρ 2 the density of the gas, p the pressure, and p 1, p 2, p 3 the components of the gas momentum density, and E the total energy density. Let x = {x 1, x 2, x 3 } Ω R 3 and t (0, T ) R be the space time coordinates. Then the conserved quantity w(x, t) is given by w = ρ 1 ρ 2 p 1 p 2 p 3 E (1) Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 1

3 and the flux vectors are defined as f k (w) = ρ 1 p k /ρ ρ 2 p k /ρ p 1 p k /ρ + δ 1k p p 2 p k /ρ + δ 2k p p 3 p k /ρ + δ 3k p (E + p)p k /ρ, k {1, 2, 3} (2) The Euler equations on the domain Ω (0, T ) can then be expressed as w(x, t) + t x 1 f 1 (w(x, t)) + x 2 f 2 (w(x, t)) + x 3 f 3 (w(x, t)) = 0 (3) and together with suitable boundary conditions the system can be solved with the finite volume approach. The finite volume method can be formulated by applying Green s theorem d dt Ω w(x, t)dx = Ω f 1 n 1 + f 2 n 2 + f 3 n 3 ds (4) Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 2

4 where n = (n 1, n 2, n 3 ) denotes the outer normal to the boundary Ω. The discrete version is then derived by integration over a time intervall [t n, t n + t] and averaging over the cells K i. w (n+1) Ki = w (n) Ki t j S(i) Γ ij K i 3 F k,γij (w (n) Ki, w (n) Kj )n k (5) k=1 With a tetrahedral approximation to Ω {K i } i I and Γ ij are the interfaces between the cells K i, K j and the set S(i) stores the indices of the neighboring cells of K i Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 3

5 The Vijayasundaram method defines the fluxes as ( ) ( ) u + v u + v F k,γij (u, v) = A + k u + A k v, 2 2 k = 1, 2, 3 (6) The essence of the Vijayasundaram method is the calculation of an eigenspace decomposition of A k = df k /dw, k = 1, 2, 3 into positive and negative subspaces. Thus the matrices A + k, A k are constructed from the positive and negative eigenvalues of A k = R k Λ k L k with Λ k = diag(λ k,1,..., λ k,6 ) and k = 1, 2, 3. A ± k = R kλ ± k L k, Λ ± k = diag(λ± k,1,..., λ± k,m ), (8) λ + k,i = max(λ k,i, 0), λ k,i = min(λ k,i, 0), i = 1,..., 6 (9) (7) Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 4

6 (2) ARMO CPU/GPU Algorithms High level parallel CPU algorithm: Require: f, g, com, nei, geo, pio Require: t max, i max, C, σ, m, n t 0, i 0 while t < t max and i < i max do exchange(m, n, f, g, com) mpi alltoall(m, n, g, f) vijaya(n, nei, geo, pio, f, g, σ) mpi allreduce max(σ) update(n, f, g, σ, C) i i + 1 t t + C/σ end while Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 5

7 High level parallel GPU algorithm: Require: f D, g D, com D, nei D, geo D, pio D, σ D Require: t max, i max, C, σ, m, n, snd, rcv t 0, i 0 while t < t max and i < i max do exchange D (m, n, f D, g D, com D ) device to host(n, g D, snd) mpi alltoall(snd, rcv) host to device(n, f D, rcv) vijaya D (n, nei D, geo D, pio D, f D, g D, σ D ) device to host(σ D, σ) mpi allreduce max(σ) host to device(σ D, σ) update D (n, f D, g D, σ D, C) i i + 1 t t + C/σ end while Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 6

8 (3) ARMO CPU/GPU Benchmarks Figure 1: GPU Cluster: mephisto.uni-graz.at Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 7

9 GPU Computing Hardware kepler: 4x Nvidia Tesla K20 GPU (9,984 cores / 24 GB on-board RAM) mephisto: 20x Nvidia Tesla C2070 GPU (8,960 cores / 120 GB on-board RAM) iscsergpu: 32x Nvidia Geforce GTX 295 (15,360 cores / 56 GB on-board RAM) gtx: 4x Nvidia Geforce GTX 280 (960 cores / 4 GB on-board RAM) fermi: 2x Nvidia Geforce GTX 480 (960 cores / 3 GB on-board RAM) Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 8

10 GPU Clusters and Servers kepler: 2x Intel Xeon 2.0 GHz with 256 GB RAM (4x Tesla K20) mephisto: 12x Intel Xeon 2.67 GHz with 520 GB RAM (20x Tesla C2070) iscsergpu: 8x Intel Core i7 3.2 GHz with 12 GB RAM (32x GTX 295) gtx: AMD Phenom 2.6 GHz with 8 GB RAM (4x GTX 280) fermi: Intel Core i GHz with 12 GB RAM (2x GTX 480) CPU Clusters and Servers memo: 8x Intel Xeon 2.27 GHz with 1024 GB RAM penge: 12x Dual Intel Xeon 3.0 GHz with 16 GB RAM quad2: 4x AMD Opteron 1.9 GHz with 32 GB RAM Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 9

11 Benchmark example: Intake port of a diesel engine with 155,325 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 10

12 Four pieces of the intake port for parallel processing using domain decomposition. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 11

13 CPU cores memo quad2 gtx iscsergpu penge fermi kepler mephisto (6) [1] 1.27 [1] (1.76) 16 (12) [2] 0.64 [2] 0.72 (0.84) [1] 32 (24) [4] 0.33 [4] (0.41) [2] 64 (48) [8] (0.21) [4] Speedup , Efficiency GPUs memo quad2 gtx iscsergpu penge fermi kepler mephisto ECC: on/off / / / [1] [1] / [2] / [4] Speedup / 4.72 Efficiency / 0.29 Table 1: Parallel scalability benchmark for an intake-port with 155,325 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 12

14 Benchmark example: Nozzle with 642,700, 2,570,800, and 10,283,200 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 13

15 CPU cores quad2 gtx iscsergpu fermi kepler mephisto (6) [1] (7.92) 16 (12) [2] 3.26 (3.75) [1] 32 (24) 2.42 [4] (1.74) [2] 64 (48) (0.84) [4] Speedup Efficiency GPUs quad2 gtx iscsergpu fermi kepler mephisto ECC: on/off / / / [1] [1] / [2] / [4] Speedup / 7.40 Efficiency / 0.46 Table 2: Parallel scalability benchmark for a nozzle with 642,700 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 14

16 CPU cores quad2 gtx iscsergpu fermi kepler mephisto (6) [1] (29.74) 16 (12) [2] (14.58) [1] 32 (24) 7.40 [4] (7.16) [2] 64 (48) 3.75 [8] (3.49) [4] Speedup Efficiency GPUs quad2 gtx iscsergpu fermi kepler mephisto ECC: on/off / / / [1] [1] / [2] [2] / [4] Speedup / Efficiency / 0.73 Table 3: Parallel scalability benchmark for a nozzle with 2,570,800 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 15

17 CPU cores quad2 gtx iscsergpu fermi kepler mephisto (6) [1] (109.45) 16 (12) [2] (54.44) [1] 32 (24) [4] (27.16) [2] 64 (48) [8] (13.66) [4] Speedup Efficiency GPUs quad2 gtx iscsergpu fermi kepler mephisto ECC: on/off 1 * * / / / [1] [1] / [2] [2] / [4] 32 (24) [4] (0.495) / * [6] [8] Speedup / Efficiency / 0.94 Table 4: Parallel scalability benchmark for a nozzle with 10,283,200 elements. Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 16

18 Effective GFLOPS for ARMO Simulator Intake-port Nozzle Nozzle Nozzle CPU / GPU Hardware 155, ,700 2,570,800 10,283,200 kepler 2x Intel Xeon E [2] [2] [2] [2] kepler 4x Nvidia Tesla K [4] [4] [4] [4] mephisto 16x Nvidia Tesla C [16] [16] [16] [16] iscsergpu 32x Nvidia GTX [8] [8] [16] [64] Table 5: Effective GFLOPS for ARMO simulator. GPU cluster performance is equivalent to CPU cores! Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 17

19 Conclusions GPUs deliver excellent performance for CFD problems! speedup on GPU cluster with 4 64 GPUs compared with modern CPU core New GPU hardware: Maxwell architecture brings even more performance CUDA programming model fits well Essential software design decision: Element-based loops! Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters 18

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters

Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Flux Vector Splitting Methods for the Euler Equations on 3D Unstructured Meshes for CPU/GPU Clusters Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences,

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1

Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1 Advanced Topics in High Performance Scientific Computing [MA5327] Exercise 1 Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 manfred.liebmann@tum.de

More information

Asynchronous OpenCL/MPI numerical simulations of conservation laws

Asynchronous OpenCL/MPI numerical simulations of conservation laws Asynchronous OpenCL/MPI numerical simulations of conservation laws Philippe HELLUY 1,3, Thomas STRUB 2. 1 IRMA, Université de Strasbourg, 2 AxesSim, 3 Inria Tonus, France IWOCL 2015, Stanford Conservation

More information

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics

Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics

More information

Large scale Imaging on Current Many- Core Platforms

Large scale Imaging on Current Many- Core Platforms Large scale Imaging on Current Many- Core Platforms SIAM Conf. on Imaging Science 2012 May 20, 2012 Dr. Harald Köstler Chair for System Simulation Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen,

More information

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs

High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea

More information

Applications of Berkeley s Dwarfs on Nvidia GPUs

Applications of Berkeley s Dwarfs on Nvidia GPUs Applications of Berkeley s Dwarfs on Nvidia GPUs Seminar: Topics in High-Performance and Scientific Computing Team N2: Yang Zhang, Haiqing Wang 05.02.2015 Overview CUDA The Dwarfs Dynamic Programming Sparse

More information

GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations

GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations GPU Accelerated Solvers for ODEs Describing Cardiac Membrane Equations Fred Lionetti @ CSE Andrew McCulloch @ Bioeng Scott Baden @ CSE University of California, San Diego What is heart modeling? Bioengineer

More information

OP2 FOR MANY-CORE ARCHITECTURES

OP2 FOR MANY-CORE ARCHITECTURES OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC

More information

Finite Element Integration and Assembly on Modern Multi and Many-core Processors

Finite Element Integration and Assembly on Modern Multi and Many-core Processors Finite Element Integration and Assembly on Modern Multi and Many-core Processors Krzysztof Banaś, Jan Bielański, Kazimierz Chłoń AGH University of Science and Technology, Mickiewicza 30, 30-059 Kraków,

More information

International Supercomputing Conference 2009

International Supercomputing Conference 2009 International Supercomputing Conference 2009 Implementation of a Lattice-Boltzmann-Method for Numerical Fluid Mechanics Using the nvidia CUDA Technology E. Riegel, T. Indinger, N.A. Adams Technische Universität

More information

A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters

A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters A Massively Parallel Two-Phase Solver for Incompressible Fluids on Multi-GPU Clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn GPU

More information

J. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst

J. Blair Perot. Ali Khajeh-Saeed. Software Engineer CD-adapco. Mechanical Engineering UMASS, Amherst Ali Khajeh-Saeed Software Engineer CD-adapco J. Blair Perot Mechanical Engineering UMASS, Amherst Supercomputers Optimization Stream Benchmark Stag++ (3D Incompressible Flow Code) Matrix Multiply Function

More information

Faster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs. Baskar Rajagopalan Accelerated Computing, NVIDIA

Faster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs. Baskar Rajagopalan Accelerated Computing, NVIDIA Faster Innovation - Accelerating SIMULIA Abaqus Simulations with NVIDIA GPUs Baskar Rajagopalan Accelerated Computing, NVIDIA 1 Engineering & IT Challenges/Trends NVIDIA GPU Solutions AGENDA Abaqus GPU

More information

Session S0069: GPU Computing Advances in 3D Electromagnetic Simulation

Session S0069: GPU Computing Advances in 3D Electromagnetic Simulation Session S0069: GPU Computing Advances in 3D Electromagnetic Simulation Andreas Buhr, Alexander Langwost, Fabrizio Zanella CST (Computer Simulation Technology) Abstract Computer Simulation Technology (CST)

More information

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE

A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE A TALENTED CPU-TO-GPU MEMORY MAPPING TECHNIQUE Abu Asaduzzaman, Deepthi Gummadi, and Chok M. Yip Department of Electrical Engineering and Computer Science Wichita State University Wichita, Kansas, USA

More information

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids

A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids A Scalable GPU-Based Compressible Fluid Flow Solver for Unstructured Grids Patrice Castonguay and Antony Jameson Aerospace Computing Lab, Stanford University GTC Asia, Beijing, China December 15 th, 2011

More information

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs

Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung

More information

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters

High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters SIAM PP 2014 High Scalability of Lattice Boltzmann Simulations with Turbulence Models using Heterogeneous Clusters C. Riesinger, A. Bakhtiari, M. Schreiber Technische Universität München February 20, 2014

More information

Optimization of HOM Couplers using Time Domain Schemes

Optimization of HOM Couplers using Time Domain Schemes Optimization of HOM Couplers using Time Domain Schemes Workshop on HOM Damping in Superconducting RF Cavities Carsten Potratz Universität Rostock October 11, 2010 10/11/2010 2009 UNIVERSITÄT ROSTOCK FAKULTÄT

More information

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA

HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh

More information

Application of GPU technology to OpenFOAM simulations

Application of GPU technology to OpenFOAM simulations Application of GPU technology to OpenFOAM simulations Jakub Poła, Andrzej Kosior, Łukasz Miroslaw jakub.pola@vratis.com, www.vratis.com Wroclaw, Poland Agenda Motivation Partial acceleration SpeedIT OpenFOAM

More information

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Speedup Altair RADIOSS Solvers Using NVIDIA GPU Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair

More information

A Simulated Annealing algorithm for GPU clusters

A Simulated Annealing algorithm for GPU clusters A Simulated Annealing algorithm for GPU clusters Institute of Computer Science Warsaw University of Technology Parallel Processing and Applied Mathematics 2011 1 Introduction 2 3 The lower level The upper

More information

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation

Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Nikolai Sakharnykh - NVIDIA San Jose Convention Center, San Jose, CA September 21, 2010 Introduction Tridiagonal solvers very popular

More information

Large-scale Gas Turbine Simulations on GPU clusters

Large-scale Gas Turbine Simulations on GPU clusters Large-scale Gas Turbine Simulations on GPU clusters Tobias Brandvik and Graham Pullan Whittle Laboratory University of Cambridge A large-scale simulation Overview PART I: Turbomachinery PART II: Stencil-based

More information

Hardware Recommendations for SOLIDWORKS 2017

Hardware Recommendations for SOLIDWORKS 2017 Hardware Recommendations for 2017 Minimum System OS: Windows 10, Windows 8.1 64, or Windows 7 64 CPU: Intel i5 Core Intel i7 Dual Core, or equivalent AMD Hard Drive: >250GB, 7200rpm Graphics Card: 2GB

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Turbostream: A CFD solver for manycore

Turbostream: A CFD solver for manycore Turbostream: A CFD solver for manycore processors Tobias Brandvik Whittle Laboratory University of Cambridge Aim To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware

More information

Unstructured Finite Volume Code on a Cluster with Mul6ple GPUs per Node

Unstructured Finite Volume Code on a Cluster with Mul6ple GPUs per Node Unstructured Finite Volume Code on a Cluster with Mul6ple GPUs per Node Keith Obenschain & Andrew Corrigan Laboratory for Computa;onal Physics and Fluid Dynamics Naval Research Laboratory Washington DC,

More information

Center for Computational Science

Center for Computational Science Center for Computational Science Toward GPU-accelerated meshfree fluids simulation using the fast multipole method Lorena A Barba Boston University Department of Mechanical Engineering with: Felipe Cruz,

More information

Gradient Free Design of Microfluidic Structures on a GPU Cluster

Gradient Free Design of Microfluidic Structures on a GPU Cluster Gradient Free Design of Microfluidic Structures on a GPU Cluster Austen Duffy - Florida State University SIAM Conference on Computational Science and Engineering March 2, 2011 Acknowledgements This work

More information

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent

More information

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)

GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

Adaptive Mesh Astrophysical Fluid Simulations on GPU. San Jose 10/2/2009 Peng Wang, NVIDIA

Adaptive Mesh Astrophysical Fluid Simulations on GPU. San Jose 10/2/2009 Peng Wang, NVIDIA Adaptive Mesh Astrophysical Fluid Simulations on GPU San Jose 10/2/2009 Peng Wang, NVIDIA Overview Astrophysical motivation & the Enzo code Finite volume method and adaptive mesh refinement (AMR) CUDA

More information

Harnessing GPU speed to accelerate LAMMPS particle simulations

Harnessing GPU speed to accelerate LAMMPS particle simulations Harnessing GPU speed to accelerate LAMMPS particle simulations Paul S. Crozier, W. Michael Brown, Peng Wang pscrozi@sandia.gov, wmbrown@sandia.gov, penwang@nvidia.com SC09, Portland, Oregon November 18,

More information

GPU Acceleration of the Longwave Rapid Radiative Transfer Model in WRF using CUDA Fortran. G. Ruetsch, M. Fatica, E. Phillips, N.

GPU Acceleration of the Longwave Rapid Radiative Transfer Model in WRF using CUDA Fortran. G. Ruetsch, M. Fatica, E. Phillips, N. GPU Acceleration of the Longwave Rapid Radiative Transfer Model in WRF using CUDA Fortran G. Ruetsch, M. Fatica, E. Phillips, N. Juffa Outline WRF and RRTM Previous Work CUDA Fortran Features RRTM in CUDA

More information

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka

DIFFERENTIAL. Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka USE OF FOR Tomáš Oberhuber, Atsushi Suzuki, Jan Vacata, Vítězslav Žabka Faculty of Nuclear Sciences and Physical Engineering Czech Technical University in Prague Mini workshop on advanced numerical methods

More information

Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core

Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core Tom Henderson NOAA/OAR/ESRL/GSD/ACE Thomas.B.Henderson@noaa.gov Mark Govett, Jacques Middlecoff Paul Madden,

More information

Numerical Methods for PDEs. SSC Workgroup Meetings Juan J. Alonso October 8, SSC Working Group Meetings, JJA 1

Numerical Methods for PDEs. SSC Workgroup Meetings Juan J. Alonso October 8, SSC Working Group Meetings, JJA 1 Numerical Methods for PDEs SSC Workgroup Meetings Juan J. Alonso October 8, 2001 SSC Working Group Meetings, JJA 1 Overview These notes are meant to be an overview of the various memory access patterns

More information

Computational Fluid Dynamics (CFD) using Graphics Processing Units

Computational Fluid Dynamics (CFD) using Graphics Processing Units Computational Fluid Dynamics (CFD) using Graphics Processing Units Aaron F. Shinn Mechanical Science and Engineering Dept., UIUC Accelerators for Science and Engineering Applications: GPUs and Multicores

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System Alan Humphrey, Qingyu Meng, Martin Berzins Scientific Computing and Imaging Institute & University of Utah I. Uintah Overview

More information

Efficient Imaging Algorithms on Many-Core Platforms

Efficient Imaging Algorithms on Many-Core Platforms Efficient Imaging Algorithms on Many-Core Platforms H. Köstler Dagstuhl, 22.11.2011 Contents Imaging Applications HDR Compression performance of PDE-based models Image Denoising performance of patch-based

More information

MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures

MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010

More information

The Fermi GPU and HPC Application Breakthroughs

The Fermi GPU and HPC Application Breakthroughs The Fermi GPU and HPC Application Breakthroughs Peng Wang, PhD HPC Developer Technology Group Stan Posey HPC Industry Development NVIDIA, Santa Clara, CA, USA NVIDIA Corporation 2009 Overview GPU Computing:

More information

TR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut

TR An Overview of NVIDIA Tegra K1 Architecture. Ang Li, Radu Serban, Dan Negrut TR-2014-17 An Overview of NVIDIA Tegra K1 Architecture Ang Li, Radu Serban, Dan Negrut November 20, 2014 Abstract This paperwork gives an overview of NVIDIA s Jetson TK1 Development Kit and its Tegra K1

More information

Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project

Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project Workshop HPC enabling of OpenFOAM for CFD applications Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project A. De Maio (1), V. Krastev (2), P. Lanucara (3),

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

SENSEI / SENSEI-Lite / SENEI-LDC Updates

SENSEI / SENSEI-Lite / SENEI-LDC Updates SENSEI / SENSEI-Lite / SENEI-LDC Updates Chris Roy and Brent Pickering Aerospace and Ocean Engineering Dept. Virginia Tech July 23, 2014 Collaborations with Math Collaboration on the implicit SENSEI-LDC

More information

Accelerating Financial Applications on the GPU

Accelerating Financial Applications on the GPU Accelerating Financial Applications on the GPU Scott Grauer-Gray Robert Searles William Killian John Cavazos Department of Computer and Information Science University of Delaware Sixth Workshop on General

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs

3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs 3D Helmholtz Krylov Solver Preconditioned by a Shifted Laplace Multigrid Method on Multi-GPUs H. Knibbe, C. W. Oosterlee, C. Vuik Abstract We are focusing on an iterative solver for the three-dimensional

More information

Accelerator Programming Lecture 1

Accelerator Programming Lecture 1 Accelerator Programming Lecture 1 Manfred Liebmann Technische Universität München Chair of Optimal Control Center for Mathematical Sciences, M17 manfred.liebmann@tum.de January 11, 2016 Accelerator Programming

More information

Implementation of a compressible-flow simulation code in the D programming language

Implementation of a compressible-flow simulation code in the D programming language Implementation of a compressible-flow simulation code in the D programming language Peter Jacobsa * and Rowan Gollanb School of Mechanical and Mining Engineering, The University of Queensland, Brisbane,

More information

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013 GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»

More information

Nonoscillatory Central Schemes on Unstructured Triangulations for Hyperbolic Systems of Conservation Laws

Nonoscillatory Central Schemes on Unstructured Triangulations for Hyperbolic Systems of Conservation Laws Nonoscillatory Central Schemes on Unstructured Triangulations for Hyperbolic Systems of Conservation Laws Ivan Christov Bojan Popov Department of Mathematics, Texas A&M University, College Station, Texas

More information

14MMFD-34 Parallel Efficiency and Algorithmic Optimality in Reservoir Simulation on GPUs

14MMFD-34 Parallel Efficiency and Algorithmic Optimality in Reservoir Simulation on GPUs 14MMFD-34 Parallel Efficiency and Algorithmic Optimality in Reservoir Simulation on GPUs K. Esler, D. Dembeck, K. Mukundakrishnan, V. Natoli, J. Shumway and Y. Zhang Stone Ridge Technology, Bel Air, MD

More information

Dual Interpolants for Finite Element Methods

Dual Interpolants for Finite Element Methods Dual Interpolants for Finite Element Methods Andrew Gillette joint work with Chandrajit Bajaj and Alexander Rand Department of Mathematics Institute of Computational Engineering and Sciences University

More information

GPU Cluster Computing for FEM

GPU Cluster Computing for FEM GPU Cluster Computing for FEM Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de GPU Computing

More information

Parallel Interpolation in FSI Problems Using Radial Basis Functions and Problem Size Reduction

Parallel Interpolation in FSI Problems Using Radial Basis Functions and Problem Size Reduction Parallel Interpolation in FSI Problems Using Radial Basis Functions and Problem Size Reduction Sergey Kopysov, Igor Kuzmin, Alexander Novikov, Nikita Nedozhogin, and Leonid Tonkov Institute of Mechanics,

More information

Optimising the Mantevo benchmark suite for multi- and many-core architectures

Optimising the Mantevo benchmark suite for multi- and many-core architectures Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of

More information

Parallel Performance Studies for a Clustering Algorithm

Parallel Performance Studies for a Clustering Algorithm Parallel Performance Studies for a Clustering Algorithm Robin V. Blasberg and Matthias K. Gobbert Naval Research Laboratory, Washington, D.C. Department of Mathematics and Statistics, University of Maryland,

More information

CUDA Accelerated Linpack on Clusters. E. Phillips, NVIDIA Corporation

CUDA Accelerated Linpack on Clusters. E. Phillips, NVIDIA Corporation CUDA Accelerated Linpack on Clusters E. Phillips, NVIDIA Corporation Outline Linpack benchmark CUDA Acceleration Strategy Fermi DGEMM Optimization / Performance Linpack Results Conclusions LINPACK Benchmark

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows

Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Performance Metrics of a Parallel Three Dimensional Two-Phase DSMC Method for Particle-Laden Flows Benzi John* and M. Damodaran** Division of Thermal and Fluids Engineering, School of Mechanical and Aerospace

More information

Laptop Requirement: Technical Specifications and Guidelines. Frequently Asked Questions

Laptop Requirement: Technical Specifications and Guidelines. Frequently Asked Questions Laptop Requirement: Technical Specifications and Guidelines As artists and designers, you will be working in an increasingly digital landscape. The Parsons curriculum addresses this by making digital literacy

More information

Performance Benefits of NVIDIA GPUs for LS-DYNA

Performance Benefits of NVIDIA GPUs for LS-DYNA Performance Benefits of NVIDIA GPUs for LS-DYNA Mr. Stan Posey and Dr. Srinivas Kodiyalam NVIDIA Corporation, Santa Clara, CA, USA Summary: This work examines the performance characteristics of LS-DYNA

More information

High Performance Computing with Accelerators

High Performance Computing with Accelerators High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing

More information

The Cray CX1 puts massive power and flexibility right where you need it in your workgroup

The Cray CX1 puts massive power and flexibility right where you need it in your workgroup The Cray CX1 puts massive power and flexibility right where you need it in your workgroup Up to 96 cores of Intel 5600 compute power 3D visualization Up to 32TB of storage GPU acceleration Small footprint

More information

Software and Performance Engineering for numerical codes on GPU clusters

Software and Performance Engineering for numerical codes on GPU clusters Software and Performance Engineering for numerical codes on GPU clusters H. Köstler International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering Harbin, China 28.7.2010 2 3

More information

DEMO KIT quick start guide MULTI-TOUCH PROJECTED CAPACITIVE

DEMO KIT quick start guide MULTI-TOUCH PROJECTED CAPACITIVE DEMO KIT quick start guide MULTI-TOUCH PROJECTED CAPACITIVE 2222 W. Rundberg Ln., Suite 200, Austin, Texas 78758 T: 512-832-8292 F: 512-832-8291 DEMO KIT QUICK START GUIDE: 3 Easy Steps STEP 1. Get to

More information

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata

CUDA. Fluid simulation Lattice Boltzmann Models Cellular Automata CUDA Fluid simulation Lattice Boltzmann Models Cellular Automata Please excuse my layout of slides for the remaining part of the talk! Fluid Simulation Navier Stokes equations for incompressible fluids

More information

Computing on GPU Clusters

Computing on GPU Clusters Computing on GPU Clusters Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009

More information

Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs

Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs Presented at the 2014 ANSYS Regional Conference- Detroit, June 5, 2014 Maximize automotive simulation productivity with ANSYS HPC and NVIDIA GPUs Bhushan Desam, Ph.D. NVIDIA Corporation 1 NVIDIA Enterprise

More information

Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 1627, FI-70211, Finland

Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 1627, FI-70211, Finland Timo Lähivaara, Tomi Huttunen, Simo-Pekka Simonaho University of Kuopio, Department of Physics P.O.Box 627, FI-72, Finland timo.lahivaara@uku.fi INTRODUCTION The modeling of the acoustic wave fields often

More information

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123

2.7 Cloth Animation. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter 2 123 2.7 Cloth Animation 320491: Advanced Graphics - Chapter 2 123 Example: Cloth draping Image Michael Kass 320491: Advanced Graphics - Chapter 2 124 Cloth using mass-spring model Network of masses and springs

More information

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations

Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations Block Lanczos-Montgomery Method over Large Prime Fields with GPU Accelerated Dense Operations D. Zheltkov, N. Zamarashkin INM RAS September 24, 2018 Scalability of Lanczos method Notations Matrix order

More information

GPU Simulations of Violent Flows with Smooth Particle Hydrodynamics (SPH) Method

GPU Simulations of Violent Flows with Smooth Particle Hydrodynamics (SPH) Method Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe GPU Simulations of Violent Flows with Smooth Particle Hydrodynamics (SPH) Method T. Arslan a*, M. Özbulut b a Norwegian

More information

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations

Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Block Lanczos-Montgomery method over large prime fields with GPU accelerated dense operations Nikolai Zamarashkin and Dmitry Zheltkov INM RAS, Gubkina 8, Moscow, Russia {nikolai.zamarashkin,dmitry.zheltkov}@gmail.com

More information

Numerical Algorithms on Multi-GPU Architectures

Numerical Algorithms on Multi-GPU Architectures Numerical Algorithms on Multi-GPU Architectures Dr.-Ing. Harald Köstler 2 nd International Workshops on Advances in Computational Mechanics Yokohama, Japan 30.3.2010 2 3 Contents Motivation: Applications

More information

Conforming Vector Interpolation Functions for Polyhedral Meshes

Conforming Vector Interpolation Functions for Polyhedral Meshes Conforming Vector Interpolation Functions for Polyhedral Meshes Andrew Gillette joint work with Chandrajit Bajaj and Alexander Rand Department of Mathematics Institute of Computational Engineering and

More information

Empirical Modeling: an Auto-tuning Method for Linear Algebra Routines on CPU plus Multi-GPU Platforms

Empirical Modeling: an Auto-tuning Method for Linear Algebra Routines on CPU plus Multi-GPU Platforms Empirical Modeling: an Auto-tuning Method for Linear Algebra Routines on CPU plus Multi-GPU Platforms Javier Cuenca Luis-Pedro García Domingo Giménez Francisco J. Herrera Scientific Computing and Parallel

More information

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.

System Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. System Design of Kepler Based HPC Solutions Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. Introduction The System Level View K20 GPU is a powerful parallel processor! K20 has

More information

OFX SERIES quick start guide 7 to 32 STANDARD OPEN FRAME TOUCH MONITORS

OFX SERIES quick start guide 7 to 32 STANDARD OPEN FRAME TOUCH MONITORS OFX SERIES quick start guide 7 to 32 STANDARD OPEN FRAME TOUCH MONITORS 2222 W. Rundberg Ln., Suite 200, Austin, Texas 78758 T: 512-832-8292 F: 512-832-8291 OFX Series QUICK START GUIDE: 3 Easy Steps STEP

More information

1 Past Research and Achievements

1 Past Research and Achievements Parallel Mesh Generation and Adaptation using MAdLib T. K. Sheel MEMA, Universite Catholique de Louvain Batiment Euler, Louvain-La-Neuve, BELGIUM Email: tarun.sheel@uclouvain.be 1 Past Research and Achievements

More information

DDFV Schemes for the Euler Equations

DDFV Schemes for the Euler Equations DDFV Schemes for the Euler Equations Christophe Berthon, Yves Coudière, Vivien Desveaux Journées du GDR Calcul, 5 juillet 2011 Introduction Hyperbolic system of conservation laws in 2D t W + x f(w)+ y

More information

FINITE POINTSET METHOD FOR 2D DAM-BREAK PROBLEM WITH GPU-ACCELERATION. M. Panchatcharam 1, S. Sundar 2

FINITE POINTSET METHOD FOR 2D DAM-BREAK PROBLEM WITH GPU-ACCELERATION. M. Panchatcharam 1, S. Sundar 2 International Journal of Applied Mathematics Volume 25 No. 4 2012, 547-557 FINITE POINTSET METHOD FOR 2D DAM-BREAK PROBLEM WITH GPU-ACCELERATION M. Panchatcharam 1, S. Sundar 2 1,2 Department of Mathematics

More information

Accelerating CFD with Graphics Hardware

Accelerating CFD with Graphics Hardware Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery

More information

Mathematical computations with GPUs

Mathematical computations with GPUs Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing

More information

GPU Acceleration of Unmodified CSM and CFD Solvers

GPU Acceleration of Unmodified CSM and CFD Solvers GPU Acceleration of Unmodified CSM and CFD Solvers Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de

More information

A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv: v1 [cs.dc] 5 Sep 2013

A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv: v1 [cs.dc] 5 Sep 2013 A GPU Implementation for Two-Dimensional Shallow Water Modeling arxiv:1309.1230v1 [cs.dc] 5 Sep 2013 Kerry A. Seitz, Jr. 1, Alex Kennedy 1, Owen Ransom 2, Bassam A. Younis 2, and John D. Owens 3 1 Department

More information

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620 Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved

More information

Blazer Pro V2.1 Client Requirements & Hardware Performance

Blazer Pro V2.1 Client Requirements & Hardware Performance Blazer Pro V2.1 Client Requirements & Hardware Performance Table of Contents Chapter 1 Client Requirements... 2 Chapter 2 Control Client Performance... 3 2.1 Local Control Client on Blazer Pro Server...

More information

AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015

AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative

More information

Overview of Parallel Computing. Timothy H. Kaiser, PH.D.

Overview of Parallel Computing. Timothy H. Kaiser, PH.D. Overview of Parallel Computing Timothy H. Kaiser, PH.D. tkaiser@mines.edu Introduction What is parallel computing? Why go parallel? The best example of parallel computing Some Terminology Slides and examples

More information

Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs

Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Iterative Solvers Numerical Results Conclusion and outlook 1/18 Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs Eike Hermann Müller, Robert Scheichl, Eero Vainikko

More information

High Performance Computing for PDE Towards Petascale Computing

High Performance Computing for PDE Towards Petascale Computing High Performance Computing for PDE Towards Petascale Computing S. Turek, D. Göddeke with support by: Chr. Becker, S. Buijssen, M. Grajewski, H. Wobker Institut für Angewandte Mathematik, Univ. Dortmund

More information