Performance Strategies for Parallel Mathematical Libraries Based on Historical Knowledgebase


1 Performance Strategies for Parallel Mathematical Libraries Based on Historical Knowledgebase. CScADS workshop 29. Eduardo Cesar, Anna Morajko, Ihab Salawdeh, Universitat Autònoma de Barcelona.

2 Outline: Objective; Mathematical Library (PETSc); Performance Methodology; Knowledgebase; Data Analysis; Load balancing; Memory pre-allocation; Summary; What we have / would like.

3 Objective: design a methodology, based on input data, that can automatically choose between existing solving parameters to improve a PETSc application's performance.

4 [Figure: PETSc layers ordered by level of abstraction. Application Code and other higher-level solvers; solver layer (PDE Solvers, SNES, TS Time Stepping, SLES); Mathematical Algorithms (KSP, PC, Draw); Data Structure (Matrices, Vectors, Index set); BLAS, LAPACK, MPI.]

5 Data Structure (Matrices, Vectors, Index set): the fundamental objects for storing linear operators. No single data structure is appropriate for all problems. PETSc supports: dense storage, where each process stores its entries in the usual array style; compressed sparse row storage, where only the nonzero values are stored; block compressed sparse row storage; block diagonal storage.
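The compressed sparse row idea on this slide can be illustrated with a small standalone sketch. This is not PETSc's internal layout or API; the `CsrMatrix` type and `csr_matvec` function are hypothetical names introduced only to show how storing nonzeros alone still supports a matrix-vector product.

```c
#include <assert.h>
#include <stddef.h>

/* Compressed sparse row (CSR) view of an n_rows x n_cols matrix.
 * row_ptr has n_rows + 1 entries; row i owns the stored entries
 * row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and values.
 * Illustrative type, not a PETSc structure. */
typedef struct {
    int n_rows;
    int n_cols;
    const int *row_ptr;
    const int *col_idx;
    const double *values;
} CsrMatrix;

/* Compute y = A * x while visiting only the stored nonzeros. */
void csr_matvec(const CsrMatrix *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (int i = 0; i < a->n_rows; i++) {
        double sum = 0.0;
        for (int k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++) {
            assert(a->col_idx[k] >= 0 && a->col_idx[k] < a->n_cols);
            sum += a->values[k] * x[a->col_idx[k]];
        }
        y[i] = sum;
    }
}
```

For a 3x3 tridiagonal matrix with stencil (-1, 2, -1), only 7 of the 9 entries are stored; the saving grows with matrix size because the number of nonzeros per row stays fixed.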


7 Mathematical Algorithms (KSP, PC): to solve a linear system, first compute the matrix's preconditioner (PC), then apply the Krylov subspace method (KSP). Different mathematical algorithms exist both for computing the preconditioner (Jacobi, LU, etc.) and for the KSP (GMRES, CG, etc.).
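To make the PC-then-KSP split concrete, here is a minimal standalone sketch that pairs a Jacobi preconditioner with conjugate gradients on a small dense matrix. It does not use PETSc; `jacobi_pcg` and its parameters are hypothetical names, and the dense row-major layout is an assumption chosen to keep the example short. It assumes the matrix is symmetric positive definite.

```c
#include <assert.h>
#include <stdlib.h>

/* Solve A x = b by conjugate gradients (the Krylov method) preconditioned
 * with M = diag(A) (the Jacobi preconditioner). A is a dense, symmetric
 * positive definite n x n matrix in row-major order. x holds the initial
 * guess on entry and the solution on return. Returns the iteration count,
 * or -1 if max_it is reached without convergence or allocation fails. */
int jacobi_pcg(int n, const double *a, const double *b, double *x,
               double tol, int max_it)
{
    assert(n > 0 && a != NULL && b != NULL && x != NULL);
    double *r = malloc(sizeof(double) * 4 * (size_t)n);
    if (r == NULL) return -1;
    double *z = r + n, *p = z + n, *q = p + n;
    double rz = 0.0, bnorm2 = 0.0;

    for (int i = 0; i < n; i++) {
        double ax = 0.0;
        for (int j = 0; j < n; j++) ax += a[i * n + j] * x[j];
        r[i] = b[i] - ax;            /* initial residual */
        z[i] = r[i] / a[i * n + i];  /* PC step: z = M^{-1} r */
        p[i] = z[i];
        rz += r[i] * z[i];
        bnorm2 += b[i] * b[i];
    }

    int result = -1;
    for (int it = 0; it <= max_it; it++) {
        double rnorm2 = 0.0;
        for (int i = 0; i < n; i++) rnorm2 += r[i] * r[i];
        /* relative residual test, done on squared norms */
        if (rnorm2 <= tol * tol * bnorm2) { result = it; break; }
        if (it == max_it) break;

        double pq = 0.0;
        for (int i = 0; i < n; i++) {
            q[i] = 0.0;
            for (int j = 0; j < n; j++) q[i] += a[i * n + j] * p[j];
            pq += p[i] * q[i];
        }
        double alpha = rz / pq;
        double rz_new = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = r[i] / a[i * n + i];  /* re-apply the preconditioner */
            rz_new += r[i] * z[i];
        }
        double beta = rz_new / rz;
        for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    free(r);
    return result;
}
```

Swapping the two commented preconditioner lines for a different `M^{-1}` changes the PC while leaving the Krylov loop untouched, which is the separation that lets the KSP and PC be chosen independently.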

8 [Figure: methodology overview. Components: Pattern recognition, Data Mining, Knowledgebase, Mathematical Constraints, Load balancing, Memory preallocation, Solve problem, Control Solution. The candidate parameter combinations shown (matrix type, KSP, PC) include mpiaij / richardson / jacobi, mpiaij / chebychev / asm, and dense / cgs / bjacobi.]

9 [Figure: structure of a PETSc application solving Ax = b. Main Routine; Application Initialization; Initialize A and b; Linear Solvers (KSP) with PC; Post-Processing. Legend: user code, PETSc code.]

10 [Figure: the same application structure extended with tuning code: Pattern recognition, Knowledgebase, Data Mining, and Solver Parameters Decision. Legend: user code, PETSc code, tuning code.]

11 [Figure: the diagram of slide 10 shown again with the same elements.]

12 The knowledgebase contains performance characterizations of historical executions for different input data and hardware configurations. It is organized by the most common matrix patterns and by matrix size.

13 Knowledgebase records (times use a decimal comma; entries cut off in the source are left as they appear):

Input Type | Input Size | No. Processors | Memory Representation | KSP | PC | Total Time (s)
Around-Diagonal | 952X952 | 8 | mpiaij | bcgs | asm | 3,
Around-Diagonal | 952X952 | 8 | mpiaij | bcgs | bjacobi | 25,5934
Around-Diagonal | 952X952 | 8 | mpiaij | bcgs | jacobi | 35,
Around-Diagonal | 952X952 | 8 | mpiaij | bcgsl | asm | 222,55694
Around-Diagonal | 952X952 | 8 | mpiaij | bcgsl | bjacobi | 89,236794
Distributed | 9242X | | mpiaij | bcgs | asm | 522,5523
Distributed | 9242X | | mpiaij | bcgs | bjacobi | 455,43445
Distributed | 9242X | | mpiaij | bcgs | jacobi | 459,62954
Distributed | 9242X | | mpiaij | bcgsl | asm | 457,2476
Distributed | 9242X | | mpiaij | bcgsl | bjacobi | 456,5858
Distributed | 9242X | | mpiaij | bcgsl | jacobi | 455,3952

[Matrix Pattern column: the per-record pattern values are not recoverable from the source.]
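A record layout matching the table columns, together with a lookup of the fastest historical run, might be sketched as follows. The `KbRecord` type and `kb_fastest` function are hypothetical, and exact matching on pattern name and processor count is a simplifying assumption; the slides do not say how the knowledgebase is queried.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One historical execution, mirroring the table columns.
 * Illustrative layout, not the authors' schema. */
typedef struct {
    const char *pattern;   /* input type, e.g. "Around-Diagonal" */
    int size;              /* matrix dimension */
    int n_procs;           /* number of processors */
    const char *mat_type;  /* memory representation, e.g. "mpiaij" */
    const char *ksp;       /* Krylov method */
    const char *pc;        /* preconditioner */
    double total_time_s;   /* total time in seconds */
} KbRecord;

/* Return the record with the smallest total time among those that match
 * the given pattern and processor count, or NULL if none matches. */
const KbRecord *kb_fastest(const KbRecord *kb, size_t n,
                           const char *pattern, int n_procs)
{
    assert(kb != NULL || n == 0);
    assert(pattern != NULL);
    const KbRecord *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(kb[i].pattern, pattern) != 0 || kb[i].n_procs != n_procs)
            continue;
        if (best == NULL || kb[i].total_time_s < best->total_time_s)
            best = &kb[i];
    }
    return best;
}
```

The `ksp`, `pc` and `mat_type` fields of the returned record are the solver parameters that a tuning layer could then pass on to the library.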

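14 [Figure: matrix pattern annotated with its number of non-zero blocks and its percentage of non-zero diagonal.]

15 [Figure: a second matrix pattern with the same annotations.]

16 [Figure: two matrix patterns compared by non-zero blocks and percentage of non-zero diagonal, with the resulting diagonal distance.]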

17 Number of Similar Entries = 39 Number of Different Entries = 25

18 [Table columns: Matrix | No. Similar Entries (intersection) | No. of Different Entries | Union | Matrix Similarity (Fitness) | City Block (Manhattan) distance | Diagonal Density Distance. Row values are not recoverable; surviving annotations: "- Fitness", "Distance", "Max Distance (64)".]

19 [Table columns: Matrix | No. Similar Entries (intersection) | No. of Different Entries | Union | Matrix Similarity (Fitness) | N. City Block (Manhattan) distance | N. Diagonal Density Distance. Row values are not recoverable.]
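One way to read the similarity columns is as an intersection-over-union measure on block patterns. The sketch below assumes that "similar entries" are cells nonzero in both patterns, "different entries" are cells nonzero in exactly one, and fitness is their ratio to the union; these definitions, and the names `PatternMatch` and `pattern_compare`, are assumptions rather than the authors' stated formulas.

```c
#include <assert.h>
#include <stddef.h>

/* Result of comparing two block patterns (assumed definitions). */
typedef struct {
    int similar;     /* cells nonzero in both patterns (intersection) */
    int different;   /* cells nonzero in exactly one pattern */
    double fitness;  /* similar / (similar + different) */
} PatternMatch;

/* Compare two patterns of n_cells cells each; a cell is "nonzero" when
 * its byte is not 0. Two empty patterns are treated as identical. */
PatternMatch pattern_compare(const unsigned char *p, const unsigned char *q,
                             size_t n_cells)
{
    assert(p != NULL && q != NULL);
    PatternMatch m = {0, 0, 0.0};
    for (size_t i = 0; i < n_cells; i++) {
        int a = p[i] != 0;
        int b = q[i] != 0;
        if (a && b) m.similar++;
        else if (a != b) m.different++;
    }
    int cells_in_union = m.similar + m.different;
    m.fitness = cells_in_union > 0 ? (double)m.similar / cells_in_union : 1.0;
    return m;
}
```

Under this reading, the counts on slide 17 (39 similar, 25 different) give a union of 64 and a fitness of 39/64, about 0.61. For binary patterns the `different` count is also the City Block (Manhattan) distance between them.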


21 [Figure: three panels. Uniform Matrix Distribution using PETSC_DECIDE; Non-uniform Matrix Distribution using PETSC_DECIDE; Non-uniform Matrix Distribution using the Load Balancing Module.]

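22 Creating and assembling the matrix; with PETSC_DECIDE, PETSc determines the matrix's cross-processor distribution:

    Mat A;
    PetscInt column[3], i;      /* global column indices */
    PetscScalar value[3];
    PetscMPIInt rank;

    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
    MatSetFromOptions(A);
    MatSetUp(A);                /* needed in recent PETSc versions before MatSetValues */

    /* interior rows 1 .. N-2 of the tridiagonal matrix, addressed by global index */
    value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
    if (rank == 0) {
        for (i = 1; i < N - 1; i++) {
            column[0] = i - 1; column[1] = i; column[2] = i + 1;
            MatSetValues(A, 1, &i, 3, column, value, INSERT_VALUES);
        }
    }
    /* rows 0 and N-1 must also be set */

    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);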

23 Line-based distribution for a 9242x9242 matrix (PETSC_DECIDE). [Table columns: Proc. | Number of Lines | Values, with a Total row; values not recoverable.]

24 Value-based distribution for a 9242x9242 matrix. [Table columns: Proc. | Number of Lines | Values, with a Total row; values not recoverable.]
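The difference between line-based and value-based distribution can be sketched as a partition of contiguous rows by cumulative nonzero count. This is a hypothetical greedy split, not the authors' Load Balancing Module; `balance_rows_by_nnz` and its arguments are illustrative names.

```c
#include <assert.h>

/* Split n_rows contiguous rows among n_procs so that each process gets
 * roughly the same number of nonzero values. row_nnz[i] is the nonzero
 * count of row i. first_row must have n_procs + 1 entries; on return,
 * process p owns rows first_row[p] .. first_row[p + 1] - 1. */
void balance_rows_by_nnz(int n_rows, const int *row_nnz, int n_procs,
                         int *first_row)
{
    assert(n_rows >= 0 && n_procs > 0 && row_nnz && first_row);
    long total = 0;
    for (int i = 0; i < n_rows; i++) total += row_nnz[i];

    long seen = 0;
    int row = 0;
    first_row[0] = 0;
    for (int p = 1; p < n_procs; p++) {
        /* The first p processes together should hold at most p/n_procs
         * of all nonzeros; stop before the row that would exceed that. */
        long target = (total * p) / n_procs;
        while (row < n_rows && seen + row_nnz[row] <= target) {
            seen += row_nnz[row];
            row++;
        }
        first_row[p] = row;
    }
    first_row[n_procs] = n_rows;
}
```

With row counts {8, 1, 1, 1, 1, 2, 2} and two processes, a line-based split would give roughly half the rows to each, while this split gives row 0 to the first process and the other six rows to the second, 8 nonzeros each. In PETSc, such per-process row counts could be passed as the local sizes of `MatSetSizes` in place of `PETSC_DECIDE`; whether the module described here does so is an assumption.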

25 After specifying the data structure's properties and how it will be distributed, the memory must be allocated. Four strategies are possible. Automatic: on each SET_VALUE call, PETSc allocates the memory for that value (nonzero values malloc). Each processor allocates memory the size of its corresponding rows ( malloc). Allocate the memory for the local submatrix based on its densest row ( malloc). Calculate the number of nonzero values per row and allocate the memory for each row (No. rows malloc). Each local processor loads its own data from the file.
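The trade-off between the densest-row and per-row strategies comes down to how many storage slots each one reserves. The sketch below computes both from per-row nonzero counts; `PreallocEstimate` and `prealloc_estimate` are hypothetical names, and the code only counts slots rather than calling any PETSc routine.

```c
#include <assert.h>
#include <stddef.h>

/* Storage slots reserved for a local block of rows under two strategies. */
typedef struct {
    long densest_row_slots;  /* every row sized like the densest row */
    long per_row_slots;      /* every row sized to its own nonzero count */
} PreallocEstimate;

/* row_nnz[i] is the number of nonzero values in local row i. */
PreallocEstimate prealloc_estimate(int n_rows, const int *row_nnz)
{
    assert(n_rows >= 0 && (row_nnz != NULL || n_rows == 0));
    int max_nnz = 0;
    long sum = 0;
    for (int i = 0; i < n_rows; i++) {
        assert(row_nnz[i] >= 0);
        if (row_nnz[i] > max_nnz) max_nnz = row_nnz[i];
        sum += row_nnz[i];
    }
    PreallocEstimate e = { (long)max_nnz * n_rows, sum };
    return e;
}
```

In PETSc's AIJ preallocation routines these two options correspond to passing a single `nz` value versus a per-row `nnz` array, for example in `MatSeqAIJSetPreallocation`. The gap between the two totals is the memory the densest-row strategy over-reserves.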

26 [Figure: two "Performance" charts of execution time (s) broken down by phase: Create, Setting, Reading, Assembly, Solve.]

27 Summary: We have designed a methodology, based on input data, that can automatically choose between existing solving parameters to improve a PETSc application's performance. Our methodology follows data mining techniques and is based on comparing the input matrix with a set of predefined patterns stored in a knowledgebase. The load balancing method automatically distributes data to all processes based on the distribution of nonzero values across the matrix. Choosing a suitable memory pre-allocation method improves memory utilization and reduces the cost of memory allocation.

28 What we have / would like: Components for analyzing the input matrix. A database with hundreds of executions. Components for load balancing and memory preallocation. We can easily produce a wrapper for PETSc calls. A more detailed characterization of the hardware (cache, memory, FLOPS, #cores?).


Outline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency 1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming

More information

Uppsala University Department of Information technology. Hands-on 1: Ill-conditioning = x 2

Uppsala University Department of Information technology. Hands-on 1: Ill-conditioning = x 2 Uppsala University Department of Information technology Hands-on : Ill-conditioning Exercise (Ill-conditioned linear systems) Definition A system of linear equations is said to be ill-conditioned when

More information

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes

Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes Pramod Kumbhar August 19, 2011 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2011 Abstract

More information

Advanced Computer Graphics

Advanced Computer Graphics G22.2274 001, Fall 2010 Advanced Computer Graphics Project details and tools 1 Projects Details of each project are on the website under Projects Please review all the projects and come see me if you would

More information

Scientific Computing. Some slides from James Lambers, Stanford

Scientific Computing. Some slides from James Lambers, Stanford Scientific Computing Some slides from James Lambers, Stanford Dense Linear Algebra Scaling and sums Transpose Rank-one updates Rotations Matrix vector products Matrix Matrix products BLAS Designing Numerical

More information

MAGMA: a New Generation

MAGMA: a New Generation 1.3 MAGMA: a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Jack Dongarra T. Dong, M. Gates, A. Haidar, S. Tomov, and I. Yamazaki University of Tennessee, Knoxville Release

More information

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction

FOR P3: A monolithic multigrid FEM solver for fluid structure interaction FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany

More information

Figure 6.1: Truss topology optimization diagram.

Figure 6.1: Truss topology optimization diagram. 6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.

More information

HPC Libraries. Hartmut Kaiser PhD. High Performance Computing: Concepts, Methods & Means

HPC Libraries. Hartmut Kaiser PhD. High Performance Computing: Concepts, Methods & Means High Performance Computing: Concepts, Methods & Means HPC Libraries Hartmut Kaiser PhD Center for Computation & Technology Louisiana State University April 19 th, 2007 Outline Introduction to High Performance

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

MAGMA. Matrix Algebra on GPU and Multicore Architectures

MAGMA. Matrix Algebra on GPU and Multicore Architectures MAGMA Matrix Algebra on GPU and Multicore Architectures Innovative Computing Laboratory Electrical Engineering and Computer Science University of Tennessee Piotr Luszczek (presenter) web.eecs.utk.edu/~luszczek/conf/

More information

Review of previous examinations TMA4280 Introduction to Supercomputing

Review of previous examinations TMA4280 Introduction to Supercomputing Review of previous examinations TMA4280 Introduction to Supercomputing NTNU, IMF April 24. 2017 1 Examination The examination is usually comprised of: one problem related to linear algebra operations with

More information

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics

Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Summer 2009 REU: Introduction to Some Advanced Topics in Computational Mathematics Moysey Brio & Paul Dostert July 4, 2009 1 / 18 Sparse Matrices In many areas of applied mathematics and modeling, one

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert

More information

NAG Library Function Document nag_superlu_lu_factorize (f11mec)

NAG Library Function Document nag_superlu_lu_factorize (f11mec) NAG Library Function Document nag_superlu_lu_factorize () 1 Purpose nag_superlu_lu_factorize () computes the LU factorization of a real sparse matrix in compressed column (Harwell Boeing), column-permuted

More information

Solving Dense Linear Systems on Graphics Processors

Solving Dense Linear Systems on Graphics Processors Solving Dense Linear Systems on Graphics Processors Sergio Barrachina Maribel Castillo Francisco Igual Rafael Mayo Enrique S. Quintana-Ortí High Performance Computing & Architectures Group Universidad

More information

Hands-On Exercises for SLEPc

Hands-On Exercises for SLEPc Scalable Library for Eigenvalue Problem Computations http://slepc.upv.es Hands-On Exercises for SLEPc Oct 2017 Intended for use with version 3.8 of SLEPc These exercises introduce application development

More information

Maths for Signals and Systems Linear Algebra in Engineering. Some problems by Gilbert Strang

Maths for Signals and Systems Linear Algebra in Engineering. Some problems by Gilbert Strang Maths for Signals and Systems Linear Algebra in Engineering Some problems by Gilbert Strang Problems. Consider u, v, w to be non-zero vectors in R 7. These vectors span a vector space. What are the possible

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

Algebraic Iterative Methods for Computed Tomography

Algebraic Iterative Methods for Computed Tomography Algebraic Iterative Methods for Computed Tomography Per Christian Hansen DTU Compute Department of Applied Mathematics and Computer Science Technical University of Denmark Per Christian Hansen Algebraic

More information

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND Student Submission for the 5 th OpenFOAM User Conference 2017, Wiesbaden - Germany: SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND TESSA UROIĆ Faculty of Mechanical Engineering and Naval Architecture, Ivana

More information

Adaptable benchmarks for register blocked sparse matrix-vector multiplication

Adaptable benchmarks for register blocked sparse matrix-vector multiplication Adaptable benchmarks for register blocked sparse matrix-vector multiplication Berkeley Benchmarking and Optimization group (BeBOP) Hormozd Gahvari and Mark Hoemmen Based on research of: Eun-Jin Im Rich

More information

Brief notes on setting up semi-high performance computing environments. July 25, 2014

Brief notes on setting up semi-high performance computing environments. July 25, 2014 Brief notes on setting up semi-high performance computing environments July 25, 2014 1 We have two different computing environments for fitting demanding models to large space and/or time data sets. 1

More information

Tomonori Kouya Shizuoka Institute of Science and Technology Toyosawa, Fukuroi, Shizuoka Japan. October 5, 2018

Tomonori Kouya Shizuoka Institute of Science and Technology Toyosawa, Fukuroi, Shizuoka Japan. October 5, 2018 arxiv:1411.2377v1 [math.na] 10 Nov 2014 A Highly Efficient Implementation of Multiple Precision Sparse Matrix-Vector Multiplication and Its Application to Product-type Krylov Subspace Methods Tomonori

More information

A GPU Sparse Direct Solver for AX=B

A GPU Sparse Direct Solver for AX=B 1 / 25 A GPU Sparse Direct Solver for AX=B Jonathan Hogg, Evgueni Ovtchinnikov, Jennifer Scott* STFC Rutherford Appleton Laboratory 26 March 2014 GPU Technology Conference San Jose, California * Thanks

More information

Sparse Matrices Direct methods

Sparse Matrices Direct methods Sparse Matrices Direct methods Iain Duff STFC Rutherford Appleton Laboratory and CERFACS Summer School The 6th de Brùn Workshop. Linear Algebra and Matrix Theory: connections, applications and computations.

More information

nag sparse nsym sol (f11dec)

nag sparse nsym sol (f11dec) f11 Sparse Linear Algebra f11dec nag sparse nsym sol (f11dec) 1. Purpose nag sparse nsym sol (f11dec) solves a real sparse nonsymmetric system of linear equations, represented in coordinate storage format,

More information

Preliminary Implementation of PETSc Using GPUs

Preliminary Implementation of PETSc Using GPUs Preliminary Implementation of PETSc Using GPUs Victor Minden, Barry Smith, and Matthew G. Knepley To appear in the Proceedings of the 2010 International Workshop of GPU Solutions to Multiscale Problems

More information

Supercomputing and Science An Introduction to High Performance Computing

Supercomputing and Science An Introduction to High Performance Computing Supercomputing and Science An Introduction to High Performance Computing Part VII: Scientific Computing Henry Neeman, Director OU Supercomputing Center for Education & Research Outline Scientific Computing

More information

Sparse Matrix Formats

Sparse Matrix Formats Christopher Bross Friedrich-Alexander-Universität Erlangen-Nürnberg Motivation Sparse Matrices are everywhere Sparse Matrix Formats C. Bross BGCE Research Day, Erlangen, 09.06.2016 2/16 Motivation Sparse

More information

Mixed Precision Methods

Mixed Precision Methods Mixed Precision Methods Mixed precision, use the lowest precision required to achieve a given accuracy outcome " Improves runtime, reduce power consumption, lower data movement " Reformulate to find correction

More information

Hierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs

Hierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs JH160041-NAHI Hierarchical Low-Rank Approximation Methods on Distributed Memory and GPUs Rio Yokota(Tokyo Institute of Technology) Hierarchical low-rank approximation methods such as H-matrix, H2-matrix,

More information