Overview of MUMPS (A multifrontal Massively Parallel Solver)


1 Overview of MUMPS (A multifrontal Massively Parallel Solver)
E. Agullo, P. Amestoy, A. Buttari, P. Combes, A. Guermouche, J.-Y. L'Excellent, T. Slavova, B. Uçar
CERFACS, ENSEEIHT-IRIT, INRIA (LIP and LaBRI)
Petascale Applications, Algorithms and Programming (PAAAP) workshop, Toulouse, June 24

2 Outline
1. Introduction-History
2. Sparse Direct Methods and preprocessing
3. Parallel sparse solver - Memory issues
4. Conclusion - Ongoing work

3 History
Objective: solve Ax = b, where A is large and sparse.
At the beginning: LTR (Long Term Research) European project, from 1996 to 1999, which led to the first public domain version.
Now: MUMPS is supported by CERFACS (Toulouse), ENSEEIHT-IRIT (Toulouse) and INRIA (Lyon, Bordeaux).
- Public domain (available free of charge)
- 300,000 lines of F90/C/MPI code
- 1000 users, 3 requests per day

4 History
Current development team:
- Patrick Amestoy, ENSEEIHT-IRIT (Toulouse)
- Alfredo Buttari, INRIA Lyon
- Philippe Combes, CNRS/ENS Lyon
- Abdou Guermouche, LaBRI-INRIA (Bordeaux)
- Jean-Yves L'Excellent, INRIA
- Bora Uçar, CERFACS
PhD students:
- Emmanuel Agullo, ENS-Lyon
- Tzvetomila Slavova, CERFACS (Toulouse)

5 Users
Users by region:
- Europe: 39%
- North America: 31%
- Asia: 19%
- Eastern Europe: 6%
- South America: 4%
- Oceania: 2%
- Africa: < 1%

6 Outline: 2. Sparse Direct Methods and preprocessing

7 MUMPS
MUMPS solves large systems of linear equations of the form Ax = b by factorizing A into A = LU or A = LDL^T. It uses a multifrontal technique, which is a direct method.
3 main steps (plus initialization and termination):
JOB = -1 -> ANALYSIS (JOB = 1) -> FACTORIZATION (JOB = 2) -> SOLVE (JOB = 3) -> JOB = -2
- JOB = -1: initialization; set the solver type (LU, LDL^T) and the default parameters.
- JOB = 1: analysis; analyse the matrix, build an ordering, prepare the factorization.
- JOB = 2: (parallel) numerical factorization A = LU.
- JOB = 3: (parallel) solution step; forward and backward substitutions (Ly = b, Ux = y).
- JOB = -2: termination; deallocate all MUMPS data structures.
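
The JOB sequence above can be mimicked with a small toy. This is only a sketch of the calling pattern, not the MUMPS interface: the class `ToySolver`, its `run` method and its attributes are hypothetical names, and the factorization is a dense LU without pivoting, so it only suits small, diagonally dominant matrices.

```python
# Toy illustration of the JOB sequence; NOT the MUMPS API.
# ToySolver and its attributes are hypothetical names chosen for this sketch.

class ToySolver:
    def __init__(self):
        self.run(-1)

    def run(self, job, A=None, b=None):
        if job == -1:            # initialization: default state
            self.n = 0
            self.nnz = 0
            self.L = self.U = None
        elif job == 1:           # analysis: inspect the structure only
            self.n = len(A)
            self.nnz = sum(1 for row in A for v in row if v != 0)
        elif job == 2:           # factorization: A = LU (no pivoting)
            n = self.n
            L = [[float(i == j) for j in range(n)] for i in range(n)]
            U = [list(map(float, row)) for row in A]
            for k in range(n):
                for i in range(k + 1, n):
                    L[i][k] = U[i][k] / U[k][k]
                    for j in range(k, n):
                        U[i][j] -= L[i][k] * U[k][j]
            self.L, self.U = L, U
        elif job == 3:           # solve: Ly = b, then Ux = y
            n = self.n
            y = [0.0] * n
            for i in range(n):
                y[i] = b[i] - sum(self.L[i][j] * y[j] for j in range(i))
            x = [0.0] * n
            for i in reversed(range(n)):
                s = sum(self.U[i][j] * x[j] for j in range(i + 1, n))
                x[i] = (y[i] - s) / self.U[i][i]
            return x
        elif job == -2:          # termination: release the factors
            self.L = self.U = None
        else:
            raise ValueError(f"unknown job {job}")


A = [[4, 1, 0], [1, 4, 1], [0, 1, 4]]
b = [5, 6, 5]
solver = ToySolver()        # JOB = -1
solver.run(1, A=A)          # JOB = 1
solver.run(2, A=A)          # JOB = 2
x = solver.run(3, b=b)      # JOB = 3
solver.run(-2)              # JOB = -2
```

Separating the phases this way means the analysis and the factorization can be reused when only the right-hand side changes: JOB = 3 may be called again without repeating JOB = 1 or JOB = 2.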

13 Sparse solver for Ax = b: only a black box?
Preprocessing and postprocessing:
- Symmetric permutations to reduce fill: Ax = b becomes (P A P^T)(P x) = P b
- Numerical pivoting, scaling to preserve numerical accuracy
- Maximum transversal (set large entries on the diagonal)
- Preprocessing for parallelism (influence of task mapping on parallelism)
- Iterative refinement, error analysis
A default (often automatic/adaptive) setting of the options is available. However, a better knowledge of the options can help the user to further improve memory usage, time for solution, and numerical accuracy.
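
Why a symmetric permutation reduces fill can be seen on a small "arrow" pattern. The sketch below is an illustration only: the function `fill_in` and the example graph are assumptions made for this note, and it performs a purely symbolic elimination on the adjacency graph rather than any ordering heuristic used by MUMPS.

```python
def fill_in(adj, order):
    """Count the fill edges created when eliminating vertices in `order`.

    adj maps each vertex to the set of its neighbours (symmetric pattern).
    """
    g = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    fill = 0
    for v in order:
        nbrs = g.pop(v)
        for u in nbrs:
            g[u].discard(v)
        # eliminating v makes its remaining neighbours pairwise adjacent
        nbrs = sorted(nbrs)
        for i, a in enumerate(nbrs):
            for c in nbrs[i + 1:]:
                if c not in g[a]:
                    g[a].add(c)
                    g[c].add(a)
                    fill += 1
    return fill


n = 5
# arrow pattern: vertex 0 is coupled to every other vertex
adj = {0: set(range(1, n))}
for i in range(1, n):
    adj[i] = {0}

hub_first = fill_in(adj, [0, 1, 2, 3, 4])   # dense row/column eliminated first
hub_last = fill_in(adj, [1, 2, 3, 4, 0])    # dense row/column eliminated last
```

Eliminating the dense vertex first couples all remaining unknowns, whereas moving it to the end of the ordering creates no fill at all; both orderings solve the same system.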

14 Preprocessing - illustration
[Figure: sparsity patterns of the original matrix (A = lhr01) and of the preprocessed matrix A'(lhr01)]
Modified problem: A' x' = b' with A' = P_n D_r P A Q P^T D_c

15 Impact of fill-reducing heuristics
[Table: number of operations (millions) for the orderings METIS, SCOTCH, PORD, AMF and AMD on the matrices gupta, ship, twotone, wang and xenon]

16 Multifrontal method (Duff and Reid, 83)
[Figure: a matrix A, its factors L+U with fill-in, and the corresponding elimination tree]
Memory is divided into two parts (that can overlap in time):
- the factors
- the active memory (active frontal matrix and stack of contribution blocks)
The elimination tree represents task dependencies.
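
As an illustration of how an elimination tree encodes task dependencies, the sketch below computes the parent array of the tree from a symmetric sparsity pattern, using the classical ancestor-compression algorithm for elimination trees. The function name and the input layout (`upper[j]` lists the rows i < j with a nonzero in column j) are choices made for this example, not MUMPS data structures.

```python
def elimination_tree(n, upper):
    """Return parent[] of the elimination tree of a symmetric pattern.

    upper[j] holds the row indices i < j such that A[i][j] != 0.
    parent[j] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n          # path-compressed shortcut towards the root
    for j in range(n):
        for i in upper[j]:
            # climb from i to its current root, redirecting shortcuts to j
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    parent[i] = j
                i = nxt
    return parent


# 4x4 pattern with off-diagonal nonzeros (0,2), (1,2) and (2,3)
parent = elimination_tree(4, [[], [], [0, 1], [2]])
```

Here nodes 0 and 1 are both children of node 2, so they have no dependency on each other and can be processed in parallel; node 2 must wait for both, and node 3 for node 2.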

17 Impact of fill-reducing heuristics
[Table: peak of active memory for the multifrontal approach (10^6 entries) for the orderings METIS, SCOTCH, PORD, AMF and AMD on the matrices gupta, ship, twotone, wang and xenon]

18 Impact of fill-reducing heuristics
[Table: time for factorization (seconds) on 1, 16, 32, 64 and 128 processors, with METIS and PORD orderings, for the matrices coneshl and audi]

19 Preprocessing - Maximum weighted matching (I)
Objective: set large entries on the diagonal.
- Unsymmetric permutation and scaling
- The preprocessed matrix B = D_1 A Q D_2 is such that |b_ii| = 1 and |b_ij| <= 1
[Figure: sparsity patterns of the original matrix (A = lhr01) and of the permuted matrix (A' = AQ)]

20 Preprocessing - Maximum weighted matching (II)
Influence of maximum weighted matching on the performance
[Table: matching OFF and ON for the matrices twotone and fidapm11; columns: symmetry, LU (10^6), flops (10^9), backward error]
- On very unsymmetric matrices: reduces flops, factor size and memory used.
- In general, improves accuracy and reduces the number of iterative refinement steps.
- Improves the reliability of memory estimates.

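
The objective of the matching step, placing large entries on the diagonal through a column permutation Q, can be illustrated by exhaustive search on a tiny matrix. This is a deliberately naive sketch: `best_column_permutation` is a hypothetical helper that tries every permutation, which is only feasible for very small n, and it does not compute the scaling matrices D_1 and D_2.

```python
from itertools import permutations
from math import prod


def best_column_permutation(A):
    """Return perm maximizing the product of |A[i][perm[i]]| over all rows i.

    Brute force over all n! permutations; for illustration on tiny matrices.
    """
    n = len(A)
    return max(permutations(range(n)),
               key=lambda p: prod(abs(A[i][p[i]]) for i in range(n)))


A = [[0.0, 2.0, 0.1],
     [5.0, 0.0, 0.0],
     [0.2, 0.0, 3.0]]

perm = best_column_permutation(A)
# AQ: column j of the permuted matrix is column perm[j] of A
AQ = [[row[perm[j]] for j in range(len(A))] for row in A]
diagonal = [AQ[i][i] for i in range(len(A))]
```

In this example the original diagonal contains two zeros, while the permuted matrix has a zero-free diagonal carrying the largest entry of each row.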

22 Preprocessing symmetric matrices - Compressed ordering
- Perform an unsymmetric weighted matching.
- Select matched entries.
- Symmetrically permute the matrix to set large entries near the diagonal: B = Q^T A Q.
- Compression: 2x2 diagonal blocks become supervariables; compress the permuted matrix B.
Influence of using a compressed graph (with scaling)
[Table: total time (seconds) and number of entries in factors in millions (estimated and effective), with compression OFF and ON, for the matrices cont, cvxqp and stokes]

27 Outline: 3. Parallel sparse solver - Memory issues

28 Out-of-core
Solving sparse linear systems Ax = b: 10 M variables, A = LU (direct methods).
Current limits: BRGM matrix [Table: variables, nonzeros in A, nonzeros in LU, flops]
- Physical constraint: when the memory required exceeds the core memory, the run ends in a memory crash.
- Out-of-core: core memory is complemented by disks, so the memory required is met through the use of disks.

31 Out-of-core solution phase
For a symmetric case: Ax = b, A = LDL^T
[Figure: matrix pattern, elimination tree, L factors, and location of the factors on disk]
The out-of-core solution phase (LDL^T x = b) is often as costly as the factorization.

32 Out-of-core - Basics
Objective: reduce the time for the solution of Ax = b in a parallel, limited-memory environment (OOC).
The time performance of the solution phase is related to:
- the bandwidth and the number of disk accesses for reading data
- the number of processors and the volume of factors per processor
- the regularity of the disk accesses
Scheduling can significantly improve the parallel performance of the solution phase.
[Table: test matrices qimonda07 (origin: Qimonda AG) and cas4r-l15 (origin: EADS); columns: order, number of entries (millions), factor size (MBytes), number of nodes in the tree]
Publicly available matrices can be obtained from the gridtlse.org web site.


34 Performance study - I
[Tables: for qimonda07 (Qimonda AG) and cas4r-l15 (EADS), comparison of the Standard and New strategies; columns: number of processors, factor size (MB), workspace (MB), forward time (sec), backward time (sec)]

35 Memory related to communication buffers
The communication buffer size becomes critical in out-of-core execution.
- Idea: send each message in packets that fit in a small buffer of fixed size.
- Cost: more synchronizations (the receiver must be ready to receive before the next packet can be sent).
[Table: size of the communication buffers (MB) with 32 processors, for the large-buffer and small-buffer communication schemes, on the matrices AUDIKW, CONESHL MOD, CONV3D and ULTRASOUND. Total IC: 1260. Total OOC: 800.]
A balance must be found to avoid too many synchronizations between senders and receivers.
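
The trade-off between buffer size and synchronizations can be sketched in a few lines. This is an illustration under simple assumptions, not the communication layer of MUMPS: `send_in_packets` is a hypothetical helper, the message is a plain byte string, and one synchronization is assumed per packet.

```python
def send_in_packets(message, buffer_size):
    """Split `message` (bytes) into packets of at most `buffer_size` bytes.

    Returns the list of packets; in this sketch each packet stands for one
    send that needs the receiver to be ready.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    return [message[i:i + buffer_size]
            for i in range(0, len(message), buffer_size)]


message = bytes(1000)                     # a 1000-byte message
large = send_in_packets(message, 1024)    # fits in one large buffer
small = send_in_packets(message, 64)      # needs many small packets
```

With the large buffer the message goes out in one piece; with the 64-byte buffer the same message needs 16 packets, so the memory saved on buffers is paid for with extra sender/receiver synchronizations.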

38 Preliminary work: scheduling influences the peak memory needed
Modify the tree mapping to reduce the minimum memory requirement in parallel executions.
[Table: estimated total memory (MB), AUDIKW 1 matrix, 16 processors; labels: factors, performance-oriented and memory-oriented mappings, in-core (max, avg) and out-of-core (max, avg)]

39 Outline: 4. Conclusion - Ongoing work

40 Conclusion - Ongoing work
Future functionalities (ongoing work):
- Out-of-core (continuing)
- Parallel analysis phase (orderings, scalings)
- Rank detection and null space basis
- Scheduling for memory (OOC, fault tolerance)
- Multi-core (flops are not the best metric to minimize)
Ongoing projects:
- ANR CIS-SOLSTICE (ANR-06-CIS6-010)
- France-Berkeley (2008-)
- REDIMPS: CNRS/JST (Japan) cooperation project
- SEISCOPE (http://www-geoazur.unice.fr/seiscope)


More information

Sparse Matrix Algorithms

Sparse Matrix Algorithms Sparse Matrix Algorithms combinatorics + numerical methods + applications Math + X Tim Davis University of Florida June 2013 contributions to the field current work vision for the future Outline Math+X

More information
