PARALLEL SPARSE CHOLESKY FACTORIZATION

JÜRGEN SCHULZE
University of Paderborn, Department of Computer Science
Fürstenallee 11, 33102 Paderborn, Germany

This work was supported by the German Federal Department of Science and Technology (PARALOR project) and by the EU HC&M project SCOOP.

Sparse matrix factorization plays an important role in many numerical algorithms. In this paper we describe a scalable parallel algorithm based on the Multifrontal Method. Computational experiments on a Parsytec system with 32 processors show that large sparse matrices can be factorized in only a few seconds.

1 Introduction

Let $A \in M(n, \mathbb{R})$, $A = (a_{ij})$, be a sparse symmetric positive definite matrix. $A$ can be factorized into a lower triangular matrix $L$ so that $A = L L^t$. $L$ is called the Cholesky factor of $A$ and can be computed column by column using Eq. 1.

$$
A = \begin{pmatrix} d & v^t \\ v & B \end{pmatrix}
  = \begin{pmatrix} \sqrt{d} & 0 \\ v/\sqrt{d} & I \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & B - vv^t/d \end{pmatrix}
    \begin{pmatrix} \sqrt{d} & v^t/\sqrt{d} \\ 0 & I \end{pmatrix}
\qquad (1)
$$

Here, $d$ denotes the first diagonal entry and $v$ is an $(n-1)$-vector. $\sqrt{d}$ and $v/\sqrt{d}$ together form the first column of $L$. The remaining columns of $L$ can be obtained by recursively applying Eq. 1 to the submatrix $B - vv^t/d$.

Eq. 1 also shows that the factorization process can introduce some fill into $L$, i.e. an element $a_{ij} = 0$ may become nonzero in $L$. To demonstrate this, let $v = (v_1 \ldots v_i \ldots v_j \ldots v_{n-1})$ with $v_i, v_j \neq 0$. Then $v_i v_j \neq 0$ in $vv^t$ and, thus, the corresponding element in $B - vv^t/d$ is nonzero even if $a_{ij} = 0$. In general, $L$ has many more nonzeros than $A$, and this fill heavily influences the performance of the overall factorization process.

It is well known that the amount of fill can be reduced by reordering the columns and rows of $A$ prior to factorization. The problem of determining the optimal ordering is NP-complete [10], therefore heuristics are used. All heuristics are based on the observation that a symmetric matrix can be interpreted as the adjacency matrix of an undirected graph $G$. In $G$ each node corresponds to a column in $A$. Hence, a renumbering of the nodes in $G$ gives a reordering of the columns in $A$.

One of the most successful and widely used ordering heuristics is nested dissection [2]. It starts with computing a minimal node separator $S$ that divides $G$ into two equally sized parts $U$ and $V$. All nodes in $S$ are numbered higher than the nodes in $U$ and $V$. The method is recursively applied to the subgraphs induced by $U$ and $V$.
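The recursion of Eq. 1 and the effect of the ordering on fill can be tried out on a small dense example. The following is only a minimal sketch under stated assumptions, not the implementation used in the paper: the function names (`cholesky_by_eq1`, `count_nonzeros_lower`, `arrow_matrix`) and the "arrow" test matrix are chosen here for illustration, and the matrix is stored densely so that fill can simply be counted.

```python
import math

def cholesky_by_eq1(a):
    """Dense Cholesky factor computed column by column as in Eq. 1.

    `a` is a symmetric positive definite matrix given as a list of rows.
    Step j emits the column (sqrt(d), v/sqrt(d)) and then replaces the
    trailing submatrix B by B - v v^t / d.
    """
    n = len(a)
    work = [row[:] for row in a]              # keep the input untouched
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = work[j][j]
        root = math.sqrt(d)
        L[j][j] = root
        for i in range(j + 1, n):
            L[i][j] = work[i][j] / root       # v / sqrt(d)
        for i in range(j + 1, n):
            for k in range(j + 1, n):
                work[i][k] -= work[i][j] * work[k][j] / d   # B - v v^t / d
    return L

def count_nonzeros_lower(m, tol=1e-12):
    """Number of nonzeros on and below the diagonal."""
    return sum(1 for i, row in enumerate(m)
               for k in range(i + 1) if abs(row[k]) > tol)

def arrow_matrix(n, dense_first):
    """Diagonally dominant 'arrow' matrix: diagonal plus one dense row/column.

    The dense row/column is numbered first or last, which models two
    different orderings of the same graph (a star).
    """
    hub = 0 if dense_first else n - 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][hub] = a[hub][i] = 1.0
        a[i][i] = float(n)
    return a

n = 6
for dense_first in (True, False):
    A = arrow_matrix(n, dense_first)
    L = cholesky_by_eq1(A)
    print(dense_first, count_nonzeros_lower(A), count_nonzeros_lower(L))
# prints: True 11 21
#         False 11 11
```

Numbering the dense node first turns the whole lower triangle of $L$ into nonzeros, while numbering it last produces no fill at all. This is the same idea as numbering separator nodes last in nested dissection.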

for column j := 1 to n do
    Let $j, i_1, \ldots, i_r$ be the locations of nonzeros in column $j$ of $L$;
    Let $c_1, \ldots, c_s$ be the children of $j$ in the elimination tree;
    Form the frontal matrix $F_j$ using the update matrices of all children of $j$:
        $F_j := \begin{pmatrix} a_{j,j} & a_{j,i_1} & \cdots & a_{j,i_r} \\ a_{i_1,j} & & & \\ \vdots & & 0 & \\ a_{i_r,j} & & & \end{pmatrix} \oplus U_{c_1} \oplus \cdots \oplus U_{c_s}$;
    Factor the frontal matrix $F_j$ into
        $\begin{pmatrix} l_{j,j} & 0 \\ \begin{matrix} l_{i_1,j} \\ \vdots \\ l_{i_r,j} \end{matrix} & I \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & U_j \end{pmatrix} \begin{pmatrix} l_{j,j} & l_{i_1,j} \; \cdots \; l_{i_r,j} \\ 0 & I \end{pmatrix}$;
od

Figure 1: Core of the multifrontal method.

2 The Multifrontal Method

The multifrontal method reduces the factorization of the sparse input matrix $A$ to the factorization of several dense submatrices [6]. The method is guided by a special data structure, the so-called elimination tree [7]. It consists of $n$ nodes, each corresponding to a column in $L$, and is defined as follows: node $p$ is the parent of node $j$ if and only if $p = \min\{i > j \mid l_{ij} \neq 0\}$ ($p$ is the row index of the first nonzero subdiagonal element in column $j$ of $L$).

Figure 2 shows the top levels of an elimination tree induced by a nested dissection ordering with separators $S_3$, $S_2$, $S_1$. Nodes belonging to the same separator in $G$ form a chain in the elimination tree. Following the nested dissection rule, nodes in $S_3$ are numbered higher than all other nodes. Hence, $S_3$ is placed at the top of the elimination tree above $S_1$ and $S_2$, which are determined in the next recursion level of the nested dissection method.

[Figure 2: Nested dissection ordering (l) and structure of corresponding elimination tree (r). Labels in the figure: graph $G$, separators $S_1, S_2, S_3$, node sets $V_0, \ldots, V_3$, frontal matrices $F_i, F_j, F_k$, processors $P_{00}, P_{01}, P_{10}, P_{11}$.]

Figure 1 presents the core of the multifrontal method from an algorithmic point of view. Again, $L$ is computed column by column. A single iteration of the algorithm can be described as follows: Let $j, i_1, \ldots, i_r$ be the row indices of all nonzero elements in column $j$. Now consider all children $c_1, \ldots, c_s$ of $j$ in the elimination tree. With each child $c_l$ an update matrix $U_{c_l}$ is associated (we will show how this update matrix is computed for $j$). The update matrices are summed up to form the frontal matrix $F_j$. In general, the subscripts of the update matrices are a subset of $j, i_1, \ldots, i_r$. Therefore, the update matrices have to be extended to conform with the subscripts in $F_j$. This extension together with the addition is symbolized by the operator $\oplus$ (extended-add).

In the next step, Eq. 1 is applied to $F_j$. This gives column $L_{*,j} = (l_{j,j}, l_{i_1,j}, \ldots, l_{i_r,j})$ and the update matrix $U_j$ associated with $j$. Let $\tilde{F}_j$ denote $F_j$ without the first column and the first row. Then $U_j = \tilde{F}_j - (l_{i_1,j} \ldots l_{i_r,j})(l_{i_1,j} \ldots l_{i_r,j})^t$.

3 Parallelization of the Multifrontal Method

The elimination tree has the interesting property that columns in different branches of the tree can be factorized in parallel. Thus, the elimination tree provides useful information for parallelizing the multifrontal method. The parallel algorithm can be described best with the help of a simple example (a detailed description can be found in the literature [3]).

Let us assume that $P = 4$ processors are used to compute $L$ and that the processors are numbered in binary from $(00)_2$ to $(11)_2$ (cf. Figure 2). Each of the 4 subtrees induced by the node sets $V_0, \ldots, V_3$ is completely mapped to a different processor. The factorization of the associated columns can be done without any communication. In the next level (levels are separated by dashed lines in Figure 2) all even columns of the frontal matrix $F_i$ ($F_j$) are mapped to processor $P_{00}$ ($P_{10}$) and all odd columns to processor $P_{01}$ ($P_{11}$). The frontal matrix $F_k$ at the topmost level is distributed over all 4 processors. For example, processor $P_{10}$ stores all elements of $F_k$ with even column and odd row index.

In this way a cyclic mapping of the columns and rows of a frontal matrix can be obtained for $P = 2^k$ processors using only the binary representation of the processor number. All even bits (bits are numbered from left to right and counting starts with 1) determine which columns, and all odd bits which rows, are mapped on the processor. In the following, we describe in more detail what data has to be exchanged between the processors when Eq. 1 is applied to $F_j$.
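The bit rule of the cyclic mapping can be sketched in a few lines of Python. The function name `owner`, the restriction to an even number of bits `k`, and the choice that the left-most bit of each group is its most significant bit are assumptions made here for illustration; the text only states which bit positions belong to rows and which to columns.

```python
def owner(row, col, k):
    """Processor number that stores element (row, col) of a frontal matrix.

    Assumes 2**k processors with k even. Bits of the processor number are
    counted from the left starting with 1: odd positions carry the row
    index modulo 2**(k/2), even positions the column index modulo 2**(k/2).
    """
    half = k // 2
    r = row % (1 << half)
    c = col % (1 << half)
    proc = 0
    for b in range(half - 1, -1, -1):          # most significant bit first
        proc = (proc << 1) | ((r >> b) & 1)    # odd bit position: row
        proc = (proc << 1) | ((c >> b) & 1)    # even bit position: column
    return proc

# 4 processors: an element with odd row and even column index
print(owner(1, 0, 2))                       # 2, i.e. (10)_2
# 16 processors: owners of the first column and of the diagonal
print([owner(i, 0, 4) for i in range(4)])   # [0, 2, 8, 10]
print([owner(i, i, 4) for i in range(4)])   # [0, 3, 12, 15]
```

Because only residues modulo $2^{k/2}$ enter the processor number, rows and columns are dealt out cyclically, so every processor receives entries from all parts of the frontal matrix.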
For ease of presentation we assume that $F_j$ is a $12 \times 12$ matrix with nonzeros in rows $0, 1, \ldots, 11$ (i.e. $j = 0$) and that the matrix is mapped on 16 processors according to the rule given above. Figure 3(l) shows for each diagonal and subdiagonal element of $F_0$ the number of the processor it is mapped on.

[Figure 3: Mapping of a $12 \times 12$ frontal matrix on 16 processors (l) and logical processor grid (r).]

The elements of the first column of $F_0$ are mapped on processors 0, 2, 8 and 10. These processors are called pivot processors and together compute the factor column $L_{*,0}$. In the next step, each processor computes its fraction of $U_0$. For this, the processor must have access to certain elements of $L_{*,0}$. For example, processor 9 has to compute $(l_{2,0}, l_{6,0}, l_{10,0})(l_{1,0}, l_{5,0}, l_{9,0})^t$ to obtain $u_{2,1}, u_{6,1}, u_{10,1}, u_{2,5}, u_{2,9}, u_{10,9}$.

To show how each processor receives the requested data, consider the logical processor grid given in Figure 3(r). A horizontal and vertical hypercubic broadcast scheme [9] is used to distribute $L_{*,0}$ among the processors of the logical grid. First, each pivot processor initiates a horizontal broadcast to distribute its part of $L_{*,0}$ among all processors in the same row of the logical grid. As soon as information arrives at a diagonal processor of the grid (in Figure 3(r) processors 0, 3, 12 and 15 are diagonal processors), a vertical broadcast is initiated to distribute the information among all processors in the same column of the grid. For example, all elements in $(l_{2,0}, l_{6,0}, l_{10,0})$ are mapped on pivot processor 8, and processor 9 receives the elements during the horizontal broadcast initiated by 8. The second vector $(l_{1,0}, l_{5,0}, l_{9,0})$ is mapped on pivot processor 2, and processor 9 receives it during the vertical broadcast initiated by diagonal processor 3.

Horizontal and vertical broadcasts are implemented as additional threads to minimize the communication overhead of the parallel algorithm. The overhead is further reduced by using a block cyclic mapping scheme for the columns and rows of a frontal matrix.
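The data flow of the two broadcasts can be traced with a small sketch. It assumes that the logical grid of Figure 3(r) places each processor at (row residue, column residue) modulo 4, which is consistent with the pivot and diagonal processors named above but cannot be checked against the figure itself. All names (`proc_at`, `grid_position`, `incoming`) are hypothetical, and the sketch only records who sends which indices of $L_{*,0}$; it does not model the hypercubic broadcast steps or the threads.

```python
K = 4                      # 2**K = 16 processors
SIDE = 1 << (K // 2)       # the logical grid is SIDE x SIDE

def proc_at(grid_row, grid_col):
    """Processor number at a grid position (bits interleaved, row bit first)."""
    proc = 0
    for b in range(K // 2 - 1, -1, -1):
        proc = (proc << 1) | ((grid_row >> b) & 1)
        proc = (proc << 1) | ((grid_col >> b) & 1)
    return proc

def grid_position(proc):
    """Inverse of proc_at: returns (grid_row, grid_col)."""
    for r in range(SIDE):
        for c in range(SIDE):
            if proc_at(r, c) == proc:
                return r, c
    raise ValueError("processor number out of range")

def incoming(proc, n_rows=12):
    """Which parts of the factor column a processor receives, and from whom.

    Row 0 is the pivot row and does not appear in the update matrix,
    so only indices 1 .. n_rows-1 are listed.
    """
    r, c = grid_position(proc)
    return {
        # horizontal broadcast: started by the pivot processor of grid row r
        "horizontal_from_pivot": proc_at(r, 0),
        "row_part": [i for i in range(1, n_rows) if i % SIDE == r],
        # vertical broadcast: started by the diagonal processor of grid column c,
        # which got its data from the pivot processor of grid row c
        "vertical_from_diagonal": proc_at(c, c),
        "original_pivot": proc_at(c, 0),
        "column_part": [i for i in range(1, n_rows) if i % SIDE == c],
    }

info = incoming(9)
print(info["horizontal_from_pivot"], info["row_part"])      # 8 [2, 6, 10]
print(info["vertical_from_diagonal"], info["column_part"])  # 3 [1, 5, 9]
print(info["original_pivot"])                               # 2
```

Under these assumptions every processor needs exactly two pieces of the factor column, one per broadcast direction, which is why a diagonal processor can forward its row's data as soon as it arrives.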
Table 1 shows the running times in seconds for the factorization of four selected test problems on a Parsytec system with 32 nodes. Each node consists of a PowerPC 604 (133 MHz) with 64 MB memory (further technical information can be found at …). The first three columns of Table 1 show the number of columns/rows in $A$ and the number of nonzeros in $A$ and $L$. While problems GRID255 and GRID5 have a regular grid structure, problems mat2hf and mat3hf are obtained from unstructured finite element meshes.

[Table 1: Size of matrices and running times on Parsytec system (in sec.). Columns: $N$, $|A|$, $|L|$, followed by the running times; rows: the two GRID problems, mat2hf, mat3hf.]

For the nested dissection ordering a multilevel graph bisection method [4] combined with the helpful set heuristic [1] was used. Computational experiments have shown that high quality orderings can be obtained when using multilevel methods [8, 5].

References

Parallel algorithms for sparse matrix factorization have been extensively investigated by many researchers during the last decades. Due to space limitations only a small fraction of the relevant literature can be mentioned here.

1. R. Diekmann, B. Monien, R. Preis, Using helpful sets to improve graph bisections, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, Volume 21.
2. A. George, Nested dissection of a regular finite element mesh, SIAM J. Num. Anal., 10 (1973).
3. A. Gupta, G. Karypis, V. Kumar, Highly Scalable Parallel Algorithms for Sparse Matrix Factorization, TR 94-63, CS-Dept., Univ. Minnesota.
4. B. Hendrickson, R. Leland, The Chaco User's Guide, Tech. Rep. SAND …, Sandia Nat. Lab.
5. G. Karypis, V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, TR 95-035, CS-Dept., Univ. Minnesota.
6. J. W.-H. Liu, The Multifrontal Method for Sparse Matrix Solution: Theory and Practice, SIAM Review, 34 (1992).
7. J. W.-H. Liu, The Role of Elimination Trees in Sparse Factorization, SIAM J. Matrix Anal. Appl., Vol. 11, No. 1 (1990).
8. J. Schulze, R. Diekmann, R. Preis, Comparing Nested Dissection Orderings for Parallel Sparse Matrix Factorization, Proc. of PDPTA '95, CSREA 96-3.
9. J. Schulze, Implementation of a Parallel Algorithm for Sparse Matrix Factorization, Tech. Rep. (in preparation), CS-Dept., Univ. of Paderborn.
10. M. Yannakakis, Computing the minimum fill-in is NP-complete, SIAM J. Algebraic Discrete Methods, 2 (1981).

A High Performance Sparse Cholesky Factorization Algorithm For. University of Minnesota. Abstract

A High Performance Sparse Cholesky Factorization Algorithm For. University of Minnesota. Abstract A High Performance Sparse holesky Factorization Algorithm For Scalable Parallel omputers George Karypis and Vipin Kumar Department of omputer Science University of Minnesota Minneapolis, MN 55455 Technical

More information

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering

Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering George Karypis and Vipin Kumar Brian Shi CSci 8314 03/09/2017 Outline Introduction Graph Partitioning Problem Multilevel

More information

Dense Matrix Algorithms

Dense Matrix Algorithms Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication

More information

Technical Report TR , Computer and Information Sciences Department, University. Abstract

Technical Report TR , Computer and Information Sciences Department, University. Abstract An Approach for Parallelizing any General Unsymmetric Sparse Matrix Algorithm Tariq Rashid y Timothy A.Davis z Technical Report TR-94-036, Computer and Information Sciences Department, University of Florida,

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing

Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Erik G. Boman 1, Umit V. Catalyurek 2, Cédric Chevalier 1, Karen D. Devine 1, Ilya Safro 3, Michael M. Wolf

More information

sizes become smaller than some threshold value. This ordering guarantees that no non zero term can appear in the factorization process between unknown

sizes become smaller than some threshold value. This ordering guarantees that no non zero term can appear in the factorization process between unknown Hybridizing Nested Dissection and Halo Approximate Minimum Degree for Ecient Sparse Matrix Ordering? François Pellegrini 1, Jean Roman 1, and Patrick Amestoy 2 1 LaBRI, UMR CNRS 5800, Université Bordeaux

More information

Native mesh ordering with Scotch 4.0

Native mesh ordering with Scotch 4.0 Native mesh ordering with Scotch 4.0 François Pellegrini INRIA Futurs Project ScAlApplix pelegrin@labri.fr Abstract. Sparse matrix reordering is a key issue for the the efficient factorization of sparse

More information

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the

More information

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3 UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis

More information

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam CS 441 Discrete Mathematics for CS Lecture 26 Graphs Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Final exam Saturday, April 26, 2014 at 10:00-11:50am The same classroom as lectures The exam

More information

An Introduction to Graph Theory

An Introduction to Graph Theory An Introduction to Graph Theory CIS008-2 Logic and Foundations of Mathematics David Goodwin david.goodwin@perisic.com 12:00, Friday 17 th February 2012 Outline 1 Graphs 2 Paths and cycles 3 Graphs and

More information

The JOSTLE executable user guide : Version 3.1

The JOSTLE executable user guide : Version 3.1 The JOSTLE executable user guide : Version 3.1 Chris Walshaw School of Computing & Mathematical Sciences, University of Greenwich, London, SE10 9LS, UK email: jostle@gre.ac.uk July 6, 2005 Contents 1 The

More information

Parallel Implementations of Gaussian Elimination

Parallel Implementations of Gaussian Elimination s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 5: Sparse Linear Systems and Factorization Methods Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 18 Sparse

More information

SUPERFAST MULTIFRONTAL METHOD FOR STRUCTURED LINEAR SYSTEMS OF EQUATIONS

SUPERFAST MULTIFRONTAL METHOD FOR STRUCTURED LINEAR SYSTEMS OF EQUATIONS SUPERFAS MULIFRONAL MEHOD FOR SRUCURED LINEAR SYSEMS OF EQUAIONS S. CHANDRASEKARAN, M. GU, X. S. LI, AND J. XIA Abstract. In this paper we develop a fast direct solver for discretized linear systems using

More information

Performance Evaluation of a New Parallel Preconditioner

Performance Evaluation of a New Parallel Preconditioner Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller Marco Zagha School of Computer Science Carnegie Mellon University 5 Forbes Avenue Pittsburgh PA 15213 Abstract The

More information

Lecture 4: Principles of Parallel Algorithm Design (part 4)

Lecture 4: Principles of Parallel Algorithm Design (part 4) Lecture 4: Principles of Parallel Algorithm Design (part 4) 1 Mapping Technique for Load Balancing Minimize execution time Reduce overheads of execution Sources of overheads: Inter-process interaction

More information

Numerical Algorithms

Numerical Algorithms Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0

More information

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 13-A. Graphs. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 3-A Graphs Graphs A directed graph (or digraph) G is a pair (V, E), where V is a finite set, and E is a binary relation on V The set V: Vertex set of G The set E: Edge set of

More information

Minimizing Communication Cost in Fine-Grain Partitioning of Sparse Matrices

Minimizing Communication Cost in Fine-Grain Partitioning of Sparse Matrices Minimizing Communication Cost in Fine-Grain Partitioning of Sparse Matrices Bora Uçar and Cevdet Aykanat Department of Computer Engineering, Bilkent University, 06800, Ankara, Turkey {ubora,aykanat}@cs.bilkent.edu.tr

More information

Level 3: Level 2: Level 1: Level 0:

Level 3: Level 2: Level 1: Level 0: A Graph Based Method for Generating the Fiedler Vector of Irregular Problems 1 Michael Holzrichter 1 and Suely Oliveira 2 1 Texas A&M University, College Station, TX,77843-3112 2 The University of Iowa,

More information

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY General definitions; Representations; Graph Traversals; Topological sort; Graphs definitions & representations Graph theory is a fundamental tool in sparse

More information

A Connection between Network Coding and. Convolutional Codes

A Connection between Network Coding and. Convolutional Codes A Connection between Network Coding and 1 Convolutional Codes Christina Fragouli, Emina Soljanin christina.fragouli@epfl.ch, emina@lucent.com Abstract The min-cut, max-flow theorem states that a source

More information

CERFACS 42 av. Gaspard Coriolis, Toulouse, Cedex 1, France. Available at Date: August 7, 2009.

CERFACS 42 av. Gaspard Coriolis, Toulouse, Cedex 1, France. Available at  Date: August 7, 2009. COMBINATORIAL PROBLEMS IN SOLVING LINEAR SYSTEMS IAIN S. DUFF and BORA UÇAR Technical Report: No: TR/PA/09/60 CERFACS 42 av. Gaspard Coriolis, 31057 Toulouse, Cedex 1, France. Available at http://www.cerfacs.fr/algor/reports/

More information

On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme

On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme Emmanuel Agullo 1,6,3,7, Abdou Guermouche 5,4,8, and Jean-Yves L Excellent 2,6,3,9 1 École Normale Supérieure de

More information

Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning

Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning Parallel Multilevel Algorithms for Multi-constraint Graph Partitioning Kirk Schloegel, George Karypis, and Vipin Kumar Army HPC Research Center Department of Computer Science and Engineering University

More information

Graph and Hypergraph Partitioning for Parallel Computing

Graph and Hypergraph Partitioning for Parallel Computing Graph and Hypergraph Partitioning for Parallel Computing Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology June 29, 2016 Graph and hypergraph partitioning References:

More information

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems

Solving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination

More information

Elimination Structures in Scientific Computing

Elimination Structures in Scientific Computing Chapter 1 Elimination Structures in Scientific Computing Alex Pothen Old Dominion University Sivan Toledo Tel-Aviv University The most fundamental computation in numerical linear algebra is the factorization

More information

A Modified Inertial Method for Loop-free Decomposition of Acyclic Directed Graphs

A Modified Inertial Method for Loop-free Decomposition of Acyclic Directed Graphs MACRo 2015-5 th International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics A Modified Inertial Method for Loop-free Decomposition of Acyclic Directed Graphs

More information

Combinatorial problems in solving linear systems

Combinatorial problems in solving linear systems Combinatorial problems in solving linear systems Iain S. Duff 1,2, Bora Uçar 3 1 CERFACS, 42 Av. G. Coriolis, 31057, Toulouse, France iain.duff@stfc.ac.uk 2 Atlas Centre, RAL, Oxon, OX11 0QX, England 3

More information

LINE AND PLANE SEPARATORS. PADMA RAGHAVAN y. Abstract. We consider sparse matrices arising from nite-element or nite-dierence methods.

LINE AND PLANE SEPARATORS. PADMA RAGHAVAN y. Abstract. We consider sparse matrices arising from nite-element or nite-dierence methods. LAPACK WORKING NOTE 63 (UT CS-93-202) LINE AND PLANE SEPARATORS PADMA RAGHAVAN y Abstract. We consider sparse matrices arising from nite-element or nite-dierence methods. The graphs of such matrices are

More information

Sparse matrices, graphs, and tree elimination

Sparse matrices, graphs, and tree elimination Logistics Week 6: Friday, Oct 2 1. I will be out of town next Tuesday, October 6, and so will not have office hours on that day. I will be around on Monday, except during the SCAN seminar (1:25-2:15);

More information

Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning

Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning Michael M. Wolf 1,2, Erik G. Boman 2, and Bruce A. Hendrickson 3 1 Dept. of Computer Science, University of Illinois at Urbana-Champaign,

More information

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static

More information

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters CHALLENGE JOURNAL OF STRUCTURAL MECHANICS 1 (4) (2015) 156 160 A substructure based parallel dynamic solution of large systems on homogeneous PC clusters Semih Özmen, Tunç Bahçecioğlu, Özgür Kurç * Department

More information

Memory Hierarchy Management for Iterative Graph Structures

Memory Hierarchy Management for Iterative Graph Structures Memory Hierarchy Management for Iterative Graph Structures Ibraheem Al-Furaih y Syracuse University Sanjay Ranka University of Florida Abstract The increasing gap in processor and memory speeds has forced

More information

A Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology.

A Fast Recursive Mapping Algorithm. Department of Computer and Information Science. New Jersey Institute of Technology. A Fast Recursive Mapping Algorithm Song Chen and Mary M. Eshaghian Department of Computer and Information Science New Jersey Institute of Technology Newark, NJ 7 Abstract This paper presents a generic

More information

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method VLADIMIR ROTKIN and SIVAN TOLEDO Tel-Aviv University We describe a new out-of-core sparse Cholesky factorization

More information

Parallel Multilevel Graph Partitioning

Parallel Multilevel Graph Partitioning Parallel Multilevel raph Partitioning eorge Karypis and Vipin Kumar University of Minnesota, Department of Computer Science, Minneapolis, MN 55455 Abstract In this paper we present a parallel formulation

More information

Sparse Matrices Direct methods

Sparse Matrices Direct methods Sparse Matrices Direct methods Iain Duff STFC Rutherford Appleton Laboratory and CERFACS Summer School The 6th de Brùn Workshop. Linear Algebra and Matrix Theory: connections, applications and computations.

More information

Multilevel Graph Partitioning

Multilevel Graph Partitioning Multilevel Graph Partitioning George Karypis and Vipin Kumar Adapted from Jmes Demmel s slide (UC-Berkely 2009) and Wasim Mohiuddin (2011) Cover image from: Wang, Wanyi, et al. "Polygonal Clustering Analysis

More information

BIL694-Lecture 1: Introduction to Graphs

BIL694-Lecture 1: Introduction to Graphs BIL694-Lecture 1: Introduction to Graphs Lecturer: Lale Özkahya Resources for the presentation: http://www.math.ucsd.edu/ gptesler/184a/calendar.html http://www.inf.ed.ac.uk/teaching/courses/dmmr/ Outline

More information

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar Basic Communication Operations Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003 Topic Overview One-to-All Broadcast

More information

Nested-Dissection Orderings for Sparse LU with Partial Pivoting

Nested-Dissection Orderings for Sparse LU with Partial Pivoting Nested-Dissection Orderings for Sparse LU with Partial Pivoting Igor Brainman 1 and Sivan Toledo 1 School of Mathematical Sciences, Tel-Aviv University Tel-Aviv 69978, ISRAEL Email: sivan@math.tau.ac.il

More information

Matrix multiplication

Matrix multiplication Matrix multiplication Standard serial algorithm: procedure MAT_VECT (A, x, y) begin for i := 0 to n - 1 do begin y[i] := 0 for j := 0 to n - 1 do y[i] := y[i] + A[i, j] * x [j] end end MAT_VECT Complexity:

More information

However, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t

However, m pq is just an approximation of M pq. As it was pointed out by Lin [2], more precise approximation can be obtained by exact integration of t FAST CALCULATION OF GEOMETRIC MOMENTS OF BINARY IMAGES Jan Flusser Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodarenskou vez 4, 82 08 Prague 8, Czech

More information

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning Parallel sparse matrix-vector product Lay out matrix and vectors by rows y(i) = sum(a(i,j)*x(j)) Only compute terms with A(i,j) 0 P0 P1

More information

Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices

Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices Nerma Baščelija Sarajevo School of Science and Technology Department of Computer Science Hrasnicka Cesta 3a, 71000 Sarajevo

More information

for Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University

for Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication? Umit V. C atalyurek and Cevdet Aykanat Computer Engineering Department, Bilkent University 06533 Bilkent, Ankara, Turkey

More information

Hypercubes. (Chapter Nine)

Hypercubes. (Chapter Nine) Hypercubes (Chapter Nine) Mesh Shortcomings: Due to its simplicity and regular structure, the mesh is attractive, both theoretically and practically. A problem with the mesh is that movement of data is

More information
