Block Distributed Schur Complement Preconditioners for CFD Computations on Many-Core Systems


1 Block Distributed Schur Complement Preconditioners for CFD Computations on Many-Core Systems. Dr.-Ing. Achim Basermann, Melven Zöllner**, German Aerospace Center (DLR), Simulation and Software Technology, Distributed Systems and Component Software, Porz-Wahnheide, Linder Höhe, D Cologne, Germany. **Also RWTH Aachen University.

2 DLR, the German Aerospace Center: research institution, space agency, project management agency.

3 Locations and employees. Germany: 6,900 employees across 33 research institutes and facilities at 15 sites. Offices in Brussels, Paris and Washington. Sites labelled on the map: Hamburg, Bremen, Neustrelitz, Trauen, Berlin, Braunschweig, Koeln, Bonn, Goettingen, Lampoldshausen, Stuttgart, Oberpfaffenhofen, Weilheim.

4 Survey: CFD computations at DLR; storage schemes for sparse matrices; the Distributed Schur Complement (DSC) method; experiments with TRACE and TAU matrices; conclusions and future work.

5 Parallel Simulation System TRACE. TRACE (Turbo-machinery Research Aerodynamic Computational Environment) is developed by the Institute for Propulsion Technology of the German Aerospace Center (DLR-AT). It calculates internal turbo-machinery flows using a finite volume method with block-structured grids. The linearized TRACE modules require the parallel, preconditioned iterative solution of large, sparse, non-symmetric, real or complex systems of linear equations.

6 Preconditioners for TAU: Background. TAU was developed by the DLR Institute of Aerodynamics and Flow Technology for the aerodynamic design of aircraft. It is an unstructured RANS (Reynolds-averaged Navier-Stokes) solver based on finite volumes. It requires the parallel, preconditioned iterative solution of large, sparse, real, non-symmetric systems of linear equations. Solvers used: geometric multigrid and AMG-preconditioned GMRes. Here: experiments with DSC methods.

7 Storage Schemes for Sparse Matrices. Compressed Row Storage (CSR) and Block Compressed Row Storage (BCSR). [Figure: an example matrix with its CSR arrays: non-zero values (row-wise), column indices (row-wise), and row pointers.] TRACE and TAU apply BCSR with 5x5 blocks. Advantage: less indirect addressing. Disadvantage: a few zeros are stored.
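The trade-off named on this slide can be made concrete with a small sketch. The code below is illustrative only and is not taken from TRACE or TAU: the function names, the example matrix, and the block size of 2 (instead of the 5x5 blocks reported on the slide) are assumptions chosen to keep the example short. It builds both formats from a dense matrix and multiplies each by a vector.

```python
# Illustrative sketch only: tiny CSR and BCSR containers with matrix-vector products.
# Block size 2 keeps the example small; the slides report 5x5 blocks.

def dense_to_csr(a):
    """Return (values, col_idx, row_ptr) for a dense list-of-lists matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]  # one indirect access per non-zero
    return y

def dense_to_bcsr(a, bs):
    """Return (blocks, bcol_idx, brow_ptr); dimensions must be multiples of bs."""
    blocks, bcol_idx, brow_ptr = [], [], [0]
    for bi in range(len(a) // bs):
        for bj in range(len(a[0]) // bs):
            block = [a[bi * bs + r][bj * bs:bj * bs + bs] for r in range(bs)]
            if any(v != 0 for brow in block for v in brow):
                blocks.append(block)      # zeros inside a kept block are stored
                bcol_idx.append(bj)
        brow_ptr.append(len(blocks))
    return blocks, bcol_idx, brow_ptr

def bcsr_matvec(blocks, bcol_idx, brow_ptr, x, bs):
    y = [0] * ((len(brow_ptr) - 1) * bs)
    for bi in range(len(brow_ptr) - 1):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            x0 = bcol_idx[k] * bs         # one indirect access per block
            for r in range(bs):
                for c in range(bs):
                    y[bi * bs + r] += blocks[k][r][c] * x[x0 + c]
    return y

A = [[4, 1, 0, 0],
     [0, 3, 0, 0],
     [0, 0, 2, 5],
     [1, 0, 0, 6]]
x = [1, 2, 3, 4]
csr = dense_to_csr(A)
bcsr = dense_to_bcsr(A, 2)
```

For this matrix, CSR stores 7 values and 7 column indices, while BCSR stores 3 blocks (12 values, 5 of them zeros) and only 3 block column indices. Inside a block, the accesses to `x` are contiguous, which is where the reduced indirect addressing comes from.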

8 DSC Method (1). [Figure: distributed matrix, 2 processors.]
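The figure for this slide did not survive transcription. As orientation, the following is the formulation commonly used in the distributed Schur complement literature (for example by Saad and Sosonkina); the symbols are assumptions introduced here and do not appear on the slides. Processor $i$ orders its interior unknowns $u_i$ first and its interface unknowns $y_i$ last, and $N_i$ denotes the set of neighbouring processors.

```latex
% Local equations on processor i (interior rows first, interface rows last)
\begin{pmatrix} B_i & F_i \\ E_i & C_i \end{pmatrix}
\begin{pmatrix} u_i \\ y_i \end{pmatrix}
+
\begin{pmatrix} 0 \\ \sum_{j \in N_i} E_{ij}\, y_j \end{pmatrix}
=
\begin{pmatrix} f_i \\ g_i \end{pmatrix}

% Eliminating u_i = B_i^{-1} (f_i - F_i y_i) leaves a system in the interface unknowns only
S_i\, y_i + \sum_{j \in N_i} E_{ij}\, y_j = g_i - E_i B_i^{-1} f_i,
\qquad
S_i = C_i - E_i B_i^{-1} F_i
```

Only the interface unknowns couple the processors, so the global iteration can be restricted to them while the interior unknowns are recovered locally.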

9 DSC Method (2): DSC algorithm. BiCGstab or GMRes iteration for the local interface rows (unknowns). [Figure: schematic view on each processor.]
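The idea of solving for the interface unknowns separately can be sketched on a single small system. This is a minimal illustration under stated assumptions, not the authors' implementation: the 4x4 matrix is hypothetical, the unknowns are split into two interior and two interface unknowns, and exact direct solves stand in for both the incomplete factorization and the BiCGstab or GMRes interface iteration named on the slide.

```python
from fractions import Fraction

def solve(m, rhs):
    """Solve m * x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(m)
    aug = [[Fraction(v) for v in m[i]] + [Fraction(rhs[i])] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # first non-zero pivot
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def column(m, j):
    return [row[j] for row in m]

# Hypothetical system, ordered as interior unknowns u (2) then interface unknowns y (2):
#   [B F] [u]   [f]
#   [E C] [y] = [g]
B = [[4, 1], [1, 3]]   # interior-interior coupling
F = [[1, 0], [0, 1]]   # interior-interface coupling
E = [[1, 0], [0, 2]]   # interface-interior coupling
C = [[5, 1], [1, 4]]   # interface-interface coupling
f = [1, 2]
g = [3, 4]

# Schur complement S = C - E B^{-1} F, built from the columns of B^{-1} F.
binv_f_cols = [solve(B, column(F, j)) for j in range(2)]
S = [[C[i][j] - sum(E[i][k] * binv_f_cols[j][k] for k in range(2))
      for j in range(2)] for i in range(2)]

# Reduced right-hand side g - E B^{-1} f, then the interface and interior solves.
binv_rhs = solve(B, f)
reduced_rhs = [gi - v for gi, v in zip(g, matvec(E, binv_rhs))]
y = solve(S, reduced_rhs)                                   # interface unknowns
u = solve(B, [fi - v for fi, v in zip(f, matvec(F, y))])    # interior unknowns

# Reference: solve the full 4x4 system directly.
full = [B[0] + F[0], B[1] + F[1], E[0] + C[0], E[1] + C[1]]
x_ref = solve(full, f + g)
```

Because the arithmetic is exact, `u + y` reproduces `x_ref`. In a preconditioner the solves with `B` and `S` would only be approximate, which is why the slide pairs the scheme with an outer Krylov method.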

10 DSC Method (3). [Figure: preconditioning within the DSC algorithm.]

11 Hardware. System: RWTH Bull HPC cluster. Intel Westmere X5675 CPUs, 6 cores per CPU at 3.06 GHz, 12 cores (2 CPUs) per node. Computations with 1 MPI process per core.

12 Experiments: CSR versus BCSR Format. Block-Jacobi-ILU preconditioning with 12 processes. TAU matrix: n = 541,980; nz = 170,610,950; ILU fill-in ratio 0.8; rel. res. < 10^-5. [Figure: execution time in seconds (ILU construction, iterations) versus block size.]

13 Experiments: Strong Scaling, Iterations. TRACE matrix UHBR: n = 4,497,520; nz = 552,324,700; threshold= ; rel. res. < 10^-5. [Figure: number of iterations versus number of processes.]

14 Experiments: Strong Scaling, Time. TRACE matrix UHBR: n = 4,497,520; nz = 552,324,700; threshold= ; rel. res. < 10^-5. [Figure: execution time in seconds versus number of processes.]
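Strong-scaling plots like the two above are usually summarized as speedup and parallel efficiency relative to the smallest process count. The sketch below shows that calculation; the function name is an assumption, and the timings are hypothetical placeholders, not values from the slides, whose measured data did not survive transcription.

```python
def strong_scaling(times):
    """Map process count -> (speedup, efficiency) relative to the smallest run.

    times: dict of process count -> wall time in seconds for a fixed problem size.
    """
    p0 = min(times)          # baseline process count
    t0 = times[p0]           # baseline wall time
    result = {}
    for p, t in sorted(times.items()):
        speedup = t0 / t
        efficiency = speedup * p0 / p   # 1.0 means ideal scaling
        result[p] = (speedup, efficiency)
    return result

# Hypothetical timings in seconds (NOT taken from the slides).
timings = {12: 100.0, 24: 55.0, 48: 32.0}
table = strong_scaling(timings)
```

With these placeholder numbers, quadrupling the process count from 12 to 48 gives a speedup of 3.125 and an efficiency of about 0.78.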

15 lineartrace Performance: Internal versus DSC Solver (2x Intel Xeon E5520 with 4 cores each, 2.26 GHz). dsc2011 solver for lineartrace: 8 processes, test case "THD stator", dim = 0.8 million, nnz = 90 million. [Figure: time in seconds, split into trace (matrix setup etc.), solver iteration, and preconditioner preparation (ILUT), for the internal solver (gmres100, ssor(0.7,3)) and dsc2011 (fgmres40, dsc gmres 5, ilut(0.01,1)); bars annotated "# 140 iterations" and "# 57 iterations".]

16 Conclusions. Applying the BCSR format significantly outperforms applying the CSR format for real TRACE and TAU problems. The DSC method achieves higher scalability and faster iteration than block-local methods. The DSC method is very suitable for TRACE and TAU problems. Future work: hybrid parallelization is appropriate to further improve scalability.

17 Questions?

18 DSC Solver: CSR versus BCSR Format (2x Intel Xeon E5520 with 4 cores each, 2.26 GHz). lineartrace matrix: 8 processes, dim = 56,240, nnz = 2.6 million. [Figure: time in seconds (axis from 0 to 5), split into total, ILUT construction, and solver iteration, for four cases: real, real blocked (bs=5), complex, complex blocked (bs=5); iteration counts annotated as # 76, # 40, # 32, # 29.]

19 DSC Method: Effect of the Interface Iteration (real) (2x Intel Xeon E5520 with 4 cores each, 2.26 GHz). Results on 8 cores. TAU matrix: n = 541,980; nz = 170,610,950; threshold = 10^-3; rel. residual < 10^-7. [Figure: solver iteration time in seconds versus number of interface iterations; legend: interf-bicgstab, bs=1; interf-gmres, bs=1; interf-bicgstab, bs=5; interf-gmres, bs= .]


More information

Available online at ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013)

Available online at  ScienceDirect. Parallel Computational Fluid Dynamics Conference (ParCFD2013) Available online at www.sciencedirect.com ScienceDirect Procedia Engineering 61 ( 2013 ) 81 86 Parallel Computational Fluid Dynamics Conference (ParCFD2013) An OpenCL-based parallel CFD code for simulations

More information

A Parallel Solver for Laplacian Matrices. Tristan Konolige (me) and Jed Brown

A Parallel Solver for Laplacian Matrices. Tristan Konolige (me) and Jed Brown A Parallel Solver for Laplacian Matrices Tristan Konolige (me) and Jed Brown Graph Laplacian Matrices Covered by other speakers (hopefully) Useful in a variety of areas Graphs are getting very big Facebook

More information

Krishnan Suresh Associate Professor Mechanical Engineering

Krishnan Suresh Associate Professor Mechanical Engineering Large Scale FEA on the GPU Krishnan Suresh Associate Professor Mechanical Engineering High-Performance Trick Computations (i.e., 3.4*1.22): essentially free Memory access determines speed of code Pick

More information

Nonsymmetric Problems. Abstract. The eect of a threshold variant TPABLO of the permutation

Nonsymmetric Problems. Abstract. The eect of a threshold variant TPABLO of the permutation Threshold Ordering for Preconditioning Nonsymmetric Problems Michele Benzi 1, Hwajeong Choi 2, Daniel B. Szyld 2? 1 CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France (benzi@cerfacs.fr) 2 Department

More information

Application of GPU technology to OpenFOAM simulations

Application of GPU technology to OpenFOAM simulations Application of GPU technology to OpenFOAM simulations Jakub Poła, Andrzej Kosior, Łukasz Miroslaw jakub.pola@vratis.com, www.vratis.com Wroclaw, Poland Agenda Motivation Partial acceleration SpeedIT OpenFOAM

More information

Automatic Tuning of Sparse Matrix Kernels

Automatic Tuning of Sparse Matrix Kernels Automatic Tuning of Sparse Matrix Kernels Kathy Yelick U.C. Berkeley and Lawrence Berkeley National Laboratory Richard Vuduc, Lawrence Livermore National Laboratory James Demmel, U.C. Berkeley Berkeley

More information

PROJECT REPORT. Symbol based Multigrid method. Hreinn Juliusson, Johanna Brodin, Tianhao Zhang Project in Computational Science: Report

PROJECT REPORT. Symbol based Multigrid method. Hreinn Juliusson, Johanna Brodin, Tianhao Zhang Project in Computational Science: Report Symbol based Multigrid method Hreinn Juliusson, Johanna Brodin, Tianhao Zhang Project in Computational Science: Report February 20, 2018 PROJECT REPORT Department of Information Technology Abstract Discretization

More information

Hybrid Simulation of Wake Vortices during Landing HPCN-Workshop 2014

Hybrid Simulation of Wake Vortices during Landing HPCN-Workshop 2014 Hybrid Simulation of Wake Vortices during Landing HPCN-Workshop 2014 A. Stephan 1, F. Holzäpfel 1, T. Heel 1 1 Institut für Physik der Atmosphäre, DLR, Oberpfaffenhofen, Germany Aircraft wake vortices

More information

Combinatorial problems in a Parallel Hybrid Linear Solver

Combinatorial problems in a Parallel Hybrid Linear Solver Combinatorial problems in a Parallel Hybrid Linear Solver Ichitaro Yamazaki and Xiaoye Li Lawrence Berkeley National Laboratory François-Henry Rouet and Bora Uçar ENSEEIHT-IRIT and LIP, ENS-Lyon SIAM workshop

More information

VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems Liao, Jia

VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems Liao, Jia University of Groningen VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems Liao, Jia IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics I. Pantle Fachgebiet Strömungsmaschinen Karlsruher Institut für Technologie KIT Motivation

More information

The next-generation CFD solver Flucs HPC aspects

The next-generation CFD solver Flucs HPC aspects The next-generation CFD solver Flucs HPC aspects Jens Jägersküpper German Aerospace Center Institute of Aerodynamics and Flow Technology Center for Computer Applications in AeroSpace Science and Engineering

More information

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea.

PhD Student. Associate Professor, Co-Director, Center for Computational Earth and Environmental Science. Abdulrahman Manea. Abdulrahman Manea PhD Student Hamdi Tchelepi Associate Professor, Co-Director, Center for Computational Earth and Environmental Science Energy Resources Engineering Department School of Earth Sciences

More information

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report

ESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new

More information

Research Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity Problems

Research Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity Problems Mathematical Problems in Engineering Volume 2015, Article ID 147286, 7 pages http://dx.doi.org/10.1155/2015/147286 Research Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity

More information

Real Application Performance and Beyond

Real Application Performance and Beyond Real Application Performance and Beyond Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400 Fax: 408-970-3403 http://www.mellanox.com Scientists, engineers and analysts

More information

Two-Phase flows on massively parallel multi-gpu clusters

Two-Phase flows on massively parallel multi-gpu clusters Two-Phase flows on massively parallel multi-gpu clusters Peter Zaspel Michael Griebel Institute for Numerical Simulation Rheinische Friedrich-Wilhelms-Universität Bonn Workshop Programming of Heterogeneous

More information

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation

ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent

More information

NAG Library Function Document nag_sparse_nsym_sol (f11dec)

NAG Library Function Document nag_sparse_nsym_sol (f11dec) f11 Large Scale Linear Systems NAG Library Function Document nag_sparse_nsym_sol () 1 Purpose nag_sparse_nsym_sol () solves a real sparse nonsymmetric system of linear equations, represented in coordinate

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964

More information

Scheduling Strategies for Parallel Sparse Backward/Forward Substitution

Scheduling Strategies for Parallel Sparse Backward/Forward Substitution Scheduling Strategies for Parallel Sparse Backward/Forward Substitution J.I. Aliaga M. Bollhöfer A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ. Jaume I (Spain) {aliaga,martina,quintana}@icc.uji.es

More information