High Performance Matrix Computations


High Performance Matrix Computations

J.-Y. L'Excellent (INRIA/LIP-ENS Lyon), Jean-Yves.L.Excellent@ens-lyon.fr
Prepared in collaboration with P. Amestoy, L. Giraud and M. Daydé (ENSEEIHT-IRIT)

Contents (lectures related to sparse direct methods)

1. Introduction to Sparse Matrix Computations
   - Motivation and main issues
   - Sparse matrices
   - Gaussian elimination
   - Symmetric matrices and graphs
2. Ordering sparse matrices
   - Objectives/outline
   - Fill-reducing orderings
   - Impact of the fill-reduction algorithm on the shape of the tree
   - Postorderings and memory usage
   - Equivalent orderings and elimination trees
   - Reordering unsymmetric matrices to special forms
   - Combining approaches
3. Factorization of sparse matrices
   - Introduction
   - Elimination tree and multifrontal approach
   - Comparison between approaches for LU factorization
   - Task mapping and scheduling
   - Distributed memory approaches
   - Some parallel solvers; case study: comparison of MUMPS and SuperLU
   - Concluding remarks
   - Related topics

Introduction to Sparse Matrix Computations

A selection of references

Books:
- Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford.
- Dongarra, Duff, Sorensen and van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM.
- George, Liu and Ng, Computer Solution of Sparse Positive Definite Systems, book to appear.

Articles:
- Gilbert and Liu, Elimination structures for unsymmetric sparse LU factors, SIMAX.
- Liu, The role of elimination trees in sparse factorization, SIMAX.
- Heath, Ng and Peyton, Parallel algorithms for sparse linear systems, SIAM Review.

Motivation and main issues

Motivations:
- The solution of linear systems of equations is a key algorithmic kernel.
- Main parameters: continuous problem -> discretization -> solution of a linear system Ax = b.
- Numerical properties of the linear system (symmetry, positive definiteness, conditioning, ...).
- Size and structure: large, square or rectangular, dense or sparse (structured or unstructured).
- Target computer (sequential/parallel).
Algorithmic choices are critical.

Motivations for designing efficient algorithms:
- Time-critical applications.
- Solve larger problems.
- Decrease elapsed time (parallelism?).
- Minimize the cost of computations (time, memory).

Difficulties:
- Access to data:
  - Computer: complex memory hierarchy (registers, multilevel cache, main memory (shared or distributed), disk).
  - Sparse matrix: large, irregular, dynamic data structures.
  - Exploit the locality of references to data on the computer (design algorithms providing such locality).
- Efficiency (time and memory):
  - The number of operations and the memory needed depend very much on the algorithm used and on the numerical and structural properties of the problem.
  - The algorithm depends on the target computer (vector, scalar, shared, distributed, clusters of Symmetric Multi-Processors (SMP), GRID).
Algorithmic choices are critical.

Sparse matrices

Example: a small system of three equations in the three unknowns x1, x2, x3 can be represented as Ax = b, where A holds the coefficients, x = (x1, x2, x3)^T and b is the right-hand side. Sparse matrix: only the nonzeros are stored.

What is a sparse matrix? Example: matrix dwt 9.rua (structural analysis of a submarine), shown together with its nonzero pattern.

Factorization process: solution of Ax = b.
- A unsymmetric: A is factorized as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix. Forward-backward substitution: solve Ly = b, then Ux = y.
- A symmetric: A = LDL^T or A = LL^T.
- A rectangular, m x n with m >= n, and min_x ||Ax - b||: A = QR, where Q is orthogonal (Q^{-1} = Q^T) and R is upper triangular. Solve y = Q^T b, then Rx = y.

Difficulties:
- Only nonzero values are stored.
- The factors L and U have far more nonzeros than A.
- Data structures are complex.
- Computations are only a small portion of the code (the rest is data manipulation).
- Memory size is a limiting factor -> out-of-core solvers.

Key numbers:
- Average size: matrices measured in MB, factors in GB, flop counts in Gflops.
- A bit more challenging (Lab. Géosciences Azur, Valbonne): a complex matrix whose storage is several GB (more with the factors), with tens of teraflops of computation.
- Typical performance (MUMPS): on the order of a Gflop/s on a Linux PC; parallel speed-ups and higher rates on a Cray T3E with many processors.

Typical test problems (MSC.Software):
- BMW car body: hundreds of thousands of unknowns and millions of nonzeros; factors with tens of millions of entries and operation counts in the billions.
- BMW crankshaft: over a hundred thousand unknowns and millions of nonzeros.

Sources of parallelism. Several levels of parallelism can be exploited:
- At problem level: the problem can be decomposed into sub-problems (e.g. domain decomposition).
- At matrix level: parallelism arising from the sparse structure.
- At submatrix level: within dense linear algebra computations (parallel BLAS, ...).

Data structures for sparse matrices. The storage scheme depends on the pattern of the matrix and on the type of access required:
- band or variable-band matrices;
- block bordered or block tridiagonal matrices;
- general matrices;
- row, column or diagonal access.

Data formats for a general sparse matrix A. What needs to be represented:
- Assembled matrices: an M x N matrix A with NNZ nonzeros.
- Elemental matrices (unassembled): an M x N matrix A with NELT elements.
- Arithmetic: real (4 or 8 bytes) or complex (8 or 16 bytes).
- Symmetric (or Hermitian): store only part of the data.
- Distributed format? Duplicate entries and/or out-of-range values?

Classical data formats for assembled matrices (illustrated on a small matrix with entries a1, ..., a5):
- Coordinate format: three arrays IRN[1:NNZ] (row indices), JCN[1:NNZ] (column indices), VAL[1:NNZ] (values).
- Compressed Sparse Column (CSC) format: arrays IRN[1:NNZ] (row indices), VAL[1:NNZ] (values) and COLPTR[1:N+1]; column J is stored in IRN/VAL locations COLPTR(J) ... COLPTR(J+1)-1.
- Compressed Sparse Row (CSR) format: similar to CSC, but row by row.
- Diagonal format: NDIAG stored diagonals, an array IDIAG of diagonal offsets, and VAL holding the diagonals (positions outside the matrix are not accessed).
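To make the CSC layout concrete, here is a minimal Python sketch (not from the course material; the array names IRN/JCN/VAL/COLPTR follow the slides, but 0-based indices are used instead of Fortran's 1-based ones) converting coordinate format to CSC:

    def coo_to_csc(n, irn, jcn, val):
        """Column j ends up stored in irn_csc/val_csc at positions
        colptr[j] .. colptr[j+1]-1 (0-based)."""
        nnz = len(val)
        colptr = [0] * (n + 1)
        for j in jcn:                    # count the entries of each column
            colptr[j + 1] += 1
        for j in range(n):               # prefix sums give the column pointers
            colptr[j + 1] += colptr[j]
        irn_csc = [0] * nnz
        val_csc = [0.0] * nnz
        nxt = colptr[:]                  # next free slot in each column
        for k in range(nnz):
            p = nxt[jcn[k]]
            irn_csc[p] = irn[k]
            val_csc[p] = val[k]
            nxt[jcn[k]] += 1
        return colptr, irn_csc, val_csc

    # Example: 3x3 matrix with entries in rows 0,2,1,0 and columns 0,0,1,2
    colptr, irn, val = coo_to_csc(3, [0, 2, 1, 0], [0, 0, 1, 2],
                                  [1.0, 2.0, 3.0, 4.0])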

Example of elemental matrix format. An N x N matrix assembled from NELT elemental matrices A_i,

    A = sum_{i=1}^{NELT} A_i,

is described by three arrays: ELTPTR[1:NELT+1] (pointers into the list of variables), ELTVAR[1:NVAR] (the variables of each element) and ELTVAL[1:NVAL] (the numerical values).
Remarks:
- NVAR = sum_i S_i, and NVAL = sum_i S_i^2 (unsymmetric case) or sum_i S_i (S_i + 1)/2 (symmetric case), with S_i = ELTPTR(i+1) - ELTPTR(i).
- The elements are stored in ELTVAL by columns.

File storage: Rutherford-Boeing.
- Standard ASCII format for files: header + data (CSC format).
- Key "xyz": x = [rcp] (real, complex, pattern); y = [suhzr] (symmetric, unsymmetric, Hermitian, skew symmetric, rectangular); z = [ae] (assembled, elemental). Examples: M T.RSA, SHIP.RSE.
- Supplementary files: right-hand sides, solution, permutations, ...
- A canonical format was introduced to guarantee a unique representation (order of entries in each column, no duplicates).
(Sample file: DNV-Ex, tubular joint M T, in RSA format; header and data lines omitted.)

Gaussian elimination

Set A^(1) = A and b^(1) = b. Eliminating x1 from equations 2, ..., n transforms A^(1) x = b^(1) into A^(2) x = b^(2), with

    a^(2)_ij = a_ij - (a_i1 / a_11) a_1j,    b^(2)_i = b_i - (a_i1 / a_11) b_1    (i, j >= 2),

and after n-1 such steps the system becomes A^(n) x = b^(n), with A^(n) upper triangular. The typical step k of Gaussian elimination is

    a^(k+1)_ij = a^(k)_ij - a^(k)_ik a^(k)_kj / a^(k)_kk .
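A minimal NumPy sketch of the elimination step above (dense, no pivoting; `lu_in_place` is an illustrative name, not a routine from any library discussed here):

    import numpy as np

    def lu_in_place(A):
        """Overwrite A: strict lower triangle holds L, upper triangle holds U."""
        n = A.shape[0]
        for k in range(n):
            A[k+1:, k] /= A[k, k]                              # l_ik = a_ik / a_kk
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # a_ij -= l_ik * a_kj
        return A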

Relation with the A = LU factorization

One step of Gaussian elimination can be written A^(k+1) = L^(k) A^(k), where L^(k) is the identity matrix except for column k, which holds the negated multipliers -l_{k+1,k}, ..., -l_{n,k}, with l_ik = a^(k)_ik / a^(k)_kk. Then A^(n) = U = L^(n-1) ... L^(1) A, which gives A = LU with

    L = [L^(1)]^{-1} ... [L^(n-1)]^{-1},

a unit lower triangular matrix whose entry (i, k), i > k, is l_ik. In dense codes, the entries of L and U overwrite the entries of A.

Furthermore, if A is symmetric, then A = LDL^T with d_kk = a^(k)_kk: A = LU and A = A^T imply U (L^T)^{-1} = L^{-1} U^T, which is both upper and lower triangular, hence a diagonal matrix D; thus U = D L^T and A = L (D L^T) = LDL^T.

Gaussian elimination and sparsity

Step k of the LU factorization (pivot a_kk):
- for i > k, compute l_ik = a_ik / a_kk (which overwrites a_ik);
- for i > k, j > k, update a_ij = a_ij - a_ik a_kj / a_kk, i.e. a_ij = a_ij - l_ik a_kj.
If a_ik != 0 and a_kj != 0, then a_ij != 0: if a_ij was zero, its new nonzero value must be stored. This is fill-in.

The same holds for Cholesky: for i > k, compute l_ik = a_ik / sqrt(a_kk), and for i > k, j > k, j <= i (lower triangular part), update a_ij = a_ij - a_ik a_jk / a_kk = a_ij - l_ik l_jk.
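The fill-in rule can be simulated on the pattern alone. A small Python sketch (a hypothetical helper, assuming a symmetric pattern given as an edge list) that eliminates vertices in order and counts the fill edges created:

    def symbolic_fill(n, edges):
        adj = {i: set() for i in range(n)}
        for i, j in edges:
            adj[i].add(j); adj[j].add(i)
        fill = 0
        for k in range(n):                    # eliminate vertex k
            nbrs = [v for v in adj[k] if v > k]
            for a in nbrs:                    # make the remaining neighbors a clique
                for b in nbrs:
                    if a < b and b not in adj[a]:
                        adj[a].add(b); adj[b].add(a)
                        fill += 1
        return fill

    # Star graph (arrow matrix): eliminating the center first fills everything
    print(symbolic_fill(5, [(0, i) for i in range(1, 5)]))   # -> 6 fill edges

Relabelling the center last (a star with center 4) gives zero fill, which is exactly the arrow-matrix example of the next section.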

Example. Take a matrix whose first row and first column are full (an arrow matrix): the matrix is full after the first step of elimination. After reordering (first row and column moved to the last row and column), the elimination produces no fill-in.

Ordering the variables has a strong impact on
- the fill-in,
- the number of operations.

Table 1 (Dongarra et al.) illustrates the benefits of sparsity on a test matrix: treating the system as full needs the most storage, flops and time; exploiting sparsity reduces all three substantially; exploiting sparsity together with a fill-reducing ordering reduces them further.

Efficient implementation of sparse solvers. Indirect addressing is often used in sparse calculations, e.g. the sparse SAXPY (a NumPy version is sketched at the end of this section):

    do i = 1, m
       A( ind(i) ) = A( ind(i) ) + alpha * w( i )
    enddo

Even if manufacturers provide hardware for improving indirect addressing, it penalizes performance; hence the idea of switching to dense calculations as soon as the matrix is not sparse enough.

Effect of the switch to dense calculations (matrix from a 5-point discretization of the Laplacian on a grid, Dongarra et al.): the sparse structure should be exploited only below a certain density.
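In NumPy, the same indirect-addressing kernel is a gather/scatter through an index array (a sketch with made-up data):

    import numpy as np

    A = np.zeros(10)
    ind = np.array([1, 4, 7])        # positions of the touched nonzeros
    w = np.array([1.0, 2.0, 3.0])
    alpha = 2.0
    A[ind] += alpha * w              # A(ind(i)) = A(ind(i)) + alpha * w(i)
    # Note: with duplicate indices in ind, np.add.at(A, ind, alpha*w)
    # would be needed to accumulate correctly.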

(Table: density thresholds at which switching to full code pays off, and factorization times with and without the switch; numerical values omitted.)

Symmetric matrices and graphs

Assumptions: A is symmetric and the pivots are chosen on the diagonal. The structure of A is represented by the undirected graph G = (V, E):
- vertices are associated to columns: V = {1, ..., n};
- edges are defined by (i, j) in E <=> a_ij != 0;
- G is undirected (by the symmetry of A).
Remarks:
- the number of off-diagonal nonzeros in column j is |Adj_G(j)|;
- a symmetric permutation of the matrix amounts to renumbering the graph.

The elimination graph model. Construction of the elimination graphs: let v_i denote the vertex of index i, and let G_0 = G(A). At each step i, delete v_i and its incident edges, and add edges so that the vertices of Adj(v_i) are pairwise adjacent in the resulting graph G_i = G(H_i), where H_i is the reduced matrix. The G_i are the so-called elimination graphs.

(Figures: a sequence of elimination graphs G_1, G_2, G_3, G_4 with the corresponding reduced matrices H_i.)

Introducing the filled graph G+(A). Let F = L + L^T be the filled matrix, and let G(F) be the filled graph of A, denoted G+(A).

Lemma (Parter). (v_i, v_j) ∈ G+ if and only if (v_i, v_j) ∈ G, or there exists k < min(i, j) such that (v_i, v_k) ∈ G+ and (v_k, v_j) ∈ G+.

Modeling elimination by reachable sets. A fill edge may be due to a path of length two in some elimination graph; but the edges of that path are themselves due to paths in earlier elimination graphs, so that, unfolding the process, a path in the original graph passing only through already-eliminated vertices is what is really responsible for the fill edge. This observation motivated George and Liu to introduce the notion of reachable sets.

Reachable sets

Let G = (V, E) be a graph and S a subset of V.

Definition. A vertex v is said to be reachable from a vertex w through S if there exists a path (w, v_1, ..., v_k, v) such that all v_i ∈ S (where k may be zero). Reach(w, S) is the set of vertices reachable from w through S.

Illustration: with S = {s1, s2, s3, s4} in the example graph, Reach(w, S) = {a, b, c}.

Fill-in and reachable sets. The filled graph G+(A) can be characterized with reachable sets. Let S_i = {v_1, ..., v_i} denote the set of vertices eliminated during the first i steps.

Theorem. For i < j, (v_i, v_j) ∈ G+(A) if and only if v_j ∈ Reach(v_i, S_{i-1}).

In matrix terms, Reach(v_i, S_{i-1}) is the set of row subscripts of the nonzero entries below the diagonal in column i of L: Reach(v_i, S_{i-1}) provides the column structure of column i of L.

Structure of the L factors.

Theorem. Let w be a vertex of G_i. The set of vertices adjacent to w in G_i (the column structure of w) is given by Reach(w, S_i), where the reach operator is applied to the original graph G_0(A).

The structure of L can thus be characterized in terms of the structure of A. The whole elimination process can be described implicitly by a sequence of reach operations: an implicit model of elimination, as opposed to the explicit model using elimination graphs.
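A small Python sketch of the reach operator (a hypothetical helper; `adj` maps each vertex to its neighbor set in the original graph G_0, and S is the set of eliminated vertices):

    def reach(adj, w, S):
        seen, stack, result = {w}, [w], set()
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in seen:
                    continue
                seen.add(v)
                if v in S:
                    stack.append(v)   # paths may pass through eliminated vertices
                else:
                    result.add(v)     # first non-eliminated vertex reached
        return result

By the theorem above, reach(adj, v_i, {v_1, ..., v_{i-1}}) returns exactly the row indices of the nonzeros below the diagonal in column i of L.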

This theorem also gives an indication of the impact of reordering on fill-in: suppose that a separator S splits G into two disconnected components C1 and C2. If the vertices of S are labelled after those of C1 and C2, then by the previous theorem there can be no fill edge between vertices of the two components.

A first definition of the elimination tree

A spanning tree of a connected graph G is a subgraph T of G such that if there is a path in G between i and j, then there is a path between i and j in T.

Let A be a symmetric positive definite matrix, A = LL^T its Cholesky factorization, and G+(A) its filled graph (the graph of F = L + L^T).

Definition. The elimination tree of A is the spanning tree of G+(A) satisfying the relation PARENT[j] = min{ i > j : l_ij != 0 }.

(Figure: a small example on vertices a, ..., j showing A, the filled matrix F with its fill-in entries and the entries belonging to T(A), the graphs G(A) and G+(A) = G(F), and the elimination tree T(A).)

Properties of the elimination tree. Another perspective also leads to the elimination tree:
- dependency between columns of L: column i > j depends on column j iff l_ij != 0;
- use a directed graph to express this dependency;
- simplify the redundant dependencies (a transitive reduction, in graph-theory terms).
The transitive reduction of the directed filled graph gives the elimination tree structure.
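Given the filled pattern, the elimination tree follows directly from the definition; a minimal Python sketch (hypothetical representation: lcols[j] is the set of row indices of the nonzeros of column j of L):

    def etree_from_filled(n, lcols):
        """parent[j] = min{ i > j : l_ij != 0 }; roots get None."""
        parent = [None] * n
        for j in range(n):
            below = [i for i in lcols[j] if i > j]
            parent[j] = min(below) if below else None
        return parent

In practice the tree is computed directly from A in near-linear time (Liu's algorithm), without forming the filled pattern first.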

(Figure: the directed filled graph of the example and its transitive reduction, which is the elimination tree T(A).)

Ordering sparse matrices: objectives/outline

- Reduce the fill-in and the number of operations during factorization (local and global heuristics).
- Increase parallelism (wide tree).
- Decrease memory usage (deep tree).
- Equivalent orderings: traverse the tree so as to minimize the working memory.
- Reorder unsymmetric matrices to special forms: block upper triangular matrix, with (large) nonzero entries on the diagonal (maximum transversal).
- Combining approaches.

Fill-reducing orderings

Three main classes of methods for minimizing fill-in during factorization:
- Global approach: the matrix is permuted into a matrix with a given pattern, and fill-in is restricted to occur within that structure. Examples: Cuthill-McKee (block tridiagonal matrix), nested dissection (block bordered matrix).
- Local heuristics: at each step of the factorization, select the pivot that is likely to minimize fill-in. The method is characterized by the way pivots are selected: Markowitz criterion (for a general matrix), minimum degree (for symmetric matrices).
- Hybrid approaches: once the matrix is permuted to obtain a block structure, local heuristics are used within the blocks.

Consider the following matrix and its graph:

(Figure: the example matrix A and its corresponding graph.)

Cuthill-McKee algorithm

Goal: reduce the profile/bandwidth of the matrix (the fill is restricted to the band structure). Level sets (as in breadth-first search) are built from the vertex of minimum degree (with priority to the vertex of smallest number) and the vertices are numbered level by level; on the example this yields level sets S1, ..., S4 and a reordered matrix that is banded (a sketch of the algorithm follows this section).

Reverse Cuthill-McKee

The ordering is the reverse of the one obtained with Cuthill-McKee. Reverse Cuthill-McKee is more efficient than Cuthill-McKee at reducing the envelope of the matrix.

Nested dissection

A recursive approach based on graph partitioning: a separator S splits the graph into independent subgraphs, which are numbered before S, and the process is applied recursively to the subgraphs. In the permuted matrix, the separator variables appear last and fill-in is confined to the resulting block bordered structure.
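A compact Python sketch of the level-set construction (one connected component per outer iteration; adj is a list of neighbor sets; ties broken by smallest label, as above):

    from collections import deque

    def cuthill_mckee(adj):
        n = len(adj)
        order, visited = [], [False] * n
        while len(order) < n:
            start = min((v for v in range(n) if not visited[v]),
                        key=lambda v: (len(adj[v]), v))   # min degree first
            visited[start] = True
            q = deque([start])
            while q:                                       # BFS level sets
                u = q.popleft()
                order.append(u)
                for v in sorted((w for w in adj[u] if not visited[w]),
                                key=lambda w: (len(adj[w]), w)):
                    visited[v] = True
                    q.append(v)
        return order

order[::-1] gives the Reverse Cuthill-McKee ordering.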

Local heuristics to reduce fill-in during factorization

Let G(A) be the graph associated with a matrix A that we want to order using local heuristics, and let Metric be such that Metric(v_i) < Metric(v_j) implies that v_i is a better pivot candidate than v_j.

Generic algorithm. Loop until all nodes are selected:
- Step 1: select the current node p (the so-called pivot) with minimum metric value;
- Step 2: update the elimination graph;
- Step 3: update Metric(v_j) for all non-selected nodes v_j.
Step 3 should only be applied to nodes whose Metric value might have changed.

Reordering unsymmetric matrices: Markowitz criterion

At step k of Gaussian elimination, let r_i^k be the number of nonzeros in row i of the active submatrix A^k, and c_j^k the number of nonzeros in column j of A^k. The pivot a_ij (i, j > k) must be large enough, and should minimize (r_i^k - 1)(c_j^k - 1).

Minimum degree: the Markowitz criterion specialized to symmetric, diagonally dominant matrices.

Minimum degree algorithm. Step 1: select the vertex with the smallest number of neighbors in G_0 (in the illustration — a sparse symmetric matrix and its elimination graph — the selected node/variable has minimum degree).

Notation for the elimination graph. Let G^k = (V^k, E^k) be the graph built at step k: G^k describes the structure of A^k after eliminating k pivots; G^k is undirected (A^k is symmetric); fill-in in A^k corresponds to adding edges in the graph.
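A sketch of the generic loop with Metric = current degree, i.e. basic minimum degree on explicit elimination graphs (none of the supervariables, multiple eliminations or approximate degrees introduced later):

    def minimum_degree(adj):
        """adj: dict mapping vertex -> set of neighbors; returns an ordering."""
        adj = {v: set(nb) for v, nb in adj.items()}
        order = []
        while adj:
            p = min(adj, key=lambda v: (len(adj[v]), v))   # Step 1: min metric
            nbrs = adj.pop(p)
            for v in nbrs:
                adj[v].discard(p)
            for a in nbrs:                                  # Step 2: clique update
                adj[a] |= (nbrs - {a})                      # these are fill edges
            order.append(p)
        return order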

Illustration. Step 1: elimination of the first pivot — (a) the elimination graph after the step; (b) the factors and active submatrix, showing the initial nonzeros, the nonzeros in the factors and the fill-in.

Minimum degree algorithm applied to the graph. Step k: select the node with the smallest number of neighbors; G^k is built from G^{k-1} by removing the pivot and adding the edges corresponding to fill-in.

Illustration (cont'd). (Figures: the successive elimination graphs and corresponding reduced matrices, distinguishing original nonzeros, modified original nonzeros, fill-in, and nonzeros in the factors.)

Minimum degree does not always minimize fill-in!

Consider the following matrix (remark: with the initial ordering, there is no fill-in) and its elimination graph. At the first step, minimum degree selects a pivot of minimum degree whose elimination adds a fill edge, whereas keeping the natural ordering would have produced none.

Reduce time complexity. Accelerate the selection of pivots and the update of the graph:
1. Supervariables: if several variables have the same adjacency structure in G^k, they can be eliminated simultaneously.
2. Multiple eliminations: two non-adjacent nodes of the same degree can be eliminated simultaneously.
3. Approximate Minimum Degree (AMD): the degree update of the neighbors of the pivot can be effected in an approximate way.

Reduce memory complexity. Decrease the size of the working space: using the elimination graph, the working space is of order O(#nonzeros in the factors).
Fill-in: let pivot be the pivot at step k. If i ∈ Adj_{G^k}(pivot), then the structure of the pivot column is included in the filled structure of column i. We can therefore use an implicit representation of fill-in by defining the notion of element (a variable already eliminated) and the quotient graph: a variable of the quotient graph is adjacent to variables and elements. One can show that for all k in [1 ... N], the size of the quotient graph is O(|G_0|).

Influence on the structure of the factors. Harwell-Boeing matrix dwt 9.rua (structural computing on a submarine): pattern of the LU factors with the natural ordering.

Structure of the factors after permutation: with Minimum Degree and MMD orderings the factors are much sparser, and the detection of supervariables allows more regularly structured factors to be built (easier factorization).

Comparison of implementations of Minimum Degree. Let:
- V0 be the initial algorithm (based on the elimination graph);
- MMD the version including supervariables, multiple eliminations and the quotient graph (Multiple Minimum Degree, Liu), used in MATLAB;
- AMD the version including supervariables, approximate degree update and the quotient graph (Approximate Minimum Degree, Amestoy, Davis, Duff).

Execution times on a SUN Sparc on matrices such as dwt, Wang and Orani (numerical values omitted) show that:
- the fill-in obtained is similar;
- the memory space for MMD and AMD is O(NZ) integers;
- V0 was not able to perform the reordering for the largest matrices (lack of memory after hours of computation).

Minimum fill-in heuristics

Recalling the generic algorithm: let G(A) be the graph associated with the matrix A to order, and let Metric be such that Metric(v_i) < Metric(v_j) implies that v_i is a better pivot candidate than v_j. Loop until all nodes are selected:
- Step 1: select the current node p (pivot) with minimum metric value;
- Step 2: update the elimination (or quotient) graph;
- Step 3: update Metric(v_j) for all non-selected nodes v_j (only those whose Metric value might have changed).

Minimum fill based algorithm. Metric(v_i) is the amount of fill-in that v_i would introduce if it were selected as pivot.

Illustration: a node r of degree d whose neighbors are pairwise non-adjacent has a fill-in metric of d(d-1)/2, whereas a node s of larger degree can have a much smaller fill-in metric when most of its neighbors are already pairwise adjacent.

Minimum fill-in properties. This situation typically occurs when the neighbors of s were adjacent to previously selected nodes (say e1 and e2), so that cliques have already been formed among them. The elimination of a node v_k affects the degrees of the nodes adjacent to v_k, but its fill-in metric also affects Adj(Adj(v_k)): selecting r changes the fill-in metric of nodes at distance two, because of the fill edges created among the neighbors of r.

How to compute the fill-in metric. Computing the exact minimum fill-in metric is too costly:
- only nodes adjacent to the current pivot are updated;
- only approximate metrics (using clique structures) are computed.
Let d_k be the degree of node k: d_k (d_k - 1)/2 is an upper bound on the fill. Several possibilities to sharpen it:
1. Deduct the clique area of the last selected pivot adjacent to k.
2. Deduct the largest clique area over all adjacent selected pivots.
3. If the approximate degree of AMD is used for d_k, then the cliques of all adjacent selected pivots can be deducted.

Impact of the fill-reduction algorithm on the shape of the tree

Reordering technique | Shape of the tree         | Observations
AMD                  | deep, well-balanced       | large frontal matrices on top
AMF                  | very deep, unbalanced     | small frontal matrices
PORD                 | deep, unbalanced          | small frontal matrices
SCOTCH               | very wide, well-balanced  | large frontal matrices
METIS                | wide, well-balanced       | smaller frontal matrices (than SCOTCH)

Importance of the shape of the tree. Suppose that each node in the tree corresponds to a task that consumes temporary data from its children and produces temporary data passed to its parent node. Then:
- wide tree: good parallelism, but many temporary blocks to store, hence large memory usage;
- deep tree: less parallelism, but smaller memory usage.

Postorderings and memory usage

Trees, topological orderings and postorderings. A rooted tree is a tree in which one node has been selected to be the root. A topological ordering of a rooted tree is an ordering that numbers children nodes before their parent. Postorderings are topological orderings that number the nodes of any subtree consecutively. (Figures: a connected graph G; a rooted spanning tree with a topological ordering; a rooted spanning tree with a postordering.)

Postorderings and memory usage. Assumptions:
- the tree is processed from the leaves to the root;
- a parent is processed as soon as all its children have completed (postorder of the tree);
- each node produces temporary data that is sent to and consumed by its parent.

Exercise: in what sense is a postorder-based tree traversal more interesting than a random topological ordering?

Furthermore, memory usage also depends on the postordering chosen:

(Figure: on an example tree with nodes a, ..., i, the best postordering is (a b c d e f g h i) and the worst is (h f d b a c e g i); leaves at the bottom, root at the top.)

Example 1: processing a wide tree. Memory snapshots show the unused, stack, factor and non-free memory spaces: the active memory grows with the number of simultaneously stacked contribution blocks.

Example 2: processing a deep tree.

Memory snapshots for each node: allocation of the frontal matrix, assembly step, factorization step, then stacking of the contribution block (unused, factor, stack and non-free memory spaces).

Modelization of the problem. Let:
- M_i be the memory peak for the complete subtree rooted at i,
- temp_i the temporary memory produced by node i,
- m_parent the memory for storing the parent.

    M_parent = max( max_{j=1..nbchildren} ( M_j + sum_{k=1}^{j-1} temp_k ),
                    m_parent + sum_{j=1}^{nbchildren} temp_j )            (1)

Objective: order the children so as to minimize M_parent.

Memory-minimizing schedules.

Theorem (Liu). The minimum of max_j ( x_j + sum_{i=1}^{j-1} y_i ) is obtained when the sequence (x_i, y_i) is sorted in decreasing order of x_i - y_i.

Corollary. An optimal child sequence is obtained by rearranging the children nodes in decreasing order of M_i - temp_i.

Interpretation: at each level of the tree, a child with a relatively large memory peak in its subtree (M_i large with respect to temp_i) should be processed first. Apply on the complete tree, starting from the leaves (or from the root, with a recursive approach).
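A Python sketch of Formula (1) combined with the corollary (hypothetical inputs: children[i], temp[i] and m[i] as defined above; the recursion is assumed shallow enough):

    def peak_memory(i, children, temp, m):
        """Stack peak for the subtree rooted at i, children in optimal order."""
        Ms = [peak_memory(c, children, temp, m) for c in children[i]]
        order = sorted(range(len(Ms)),
                       key=lambda k: Ms[k] - temp[children[i][k]],
                       reverse=True)                 # decreasing M_j - temp_j
        peak, stacked = 0, 0
        for k in order:
            peak = max(peak, stacked + Ms[k])        # child k runs on the stack
            stacked += temp[children[i][k]]          # its contribution stays
        return max(peak, stacked + m[i])             # then allocate the parent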

Optimal tree reordering (objective: minimize the peak of stack memory):

    Tree_Reorder(T):
       for all i in the set of root nodes do
          Process_Node(i)
       end for

    Process_Node(i):
       if i is a leaf then
          M_i = m_i
       else
          for j = 1 to nbchildren do
             Process_Node(j-th child)
          end for
          reorder the children of i in decreasing order of (M_j - temp_j)
          compute M_i at node i using Formula (1)
       end if

Equivalent orderings and elimination trees

Equivalent orderings of symmetric matrices. Let F be the filled matrix of a symmetric matrix A (that is, F = L + L^T, where A = LL^T), and G+(A) the associated filled graph.

Definition (equivalent orderings). P and Q are said to be equivalent orderings iff G+(P A P^T) = G+(Q A Q^T). By extension, a permutation P is said to be an equivalent ordering of a matrix A iff G+(P A P^T) = G+(A).

It can be shown that an equivalent reordering also preserves the amount of arithmetic operations of the sparse Cholesky factorization.

Relation with elimination trees. Let A be a reordered matrix and G+(A) its filled graph. Any traversal of the elimination tree that processes children before parents corresponds to an equivalent ordering P of A, and the elimination tree of P A P^T is identical to that of A. (Figure: a matrix on the variables u, v, w, x, y, z, its filled graph G+(A), and the elimination tree of A with a topological ordering.)

Tree rotations.

Definition. An ordering that does not introduce any fill is referred to as a perfect ordering. The natural ordering is a perfect ordering of the filled matrix F.

Theorem. For any node x of G+(A) = G(F), there exists a perfect ordering of G(F) such that x is numbered last.

Essence of tree rotations: the nodes in the clique of x in F are numbered last; the relative ordering of the other nodes is preserved.

Example of equivalent orderings: a tree rotation applied at w (the clique of w being {w, x}) renumbers that clique last while preserving the relative ordering of the other nodes with respect to the original tree; the rotated tree can then be given a postordering. Remark: tree rotations can help reduce the temporary memory usage!

Reordering unsymmetric matrices to special forms

Unsymmetric matrices and graphs. An unsymmetric matrix can be seen as:
- a bipartite graph G = (V_r, V_c, E ⊆ V_r x V_c), with (r, c) ∈ E iff there is an entry in row r, column c; or
- a directed graph (digraph) G = (V, E), with an oriented edge from the source r to the target c ((r, c) ∈ E) iff there is an entry in row r, column c.

Maximum matching/transversal orderings. Let A be a sparse matrix and G = (V_r, V_c, E ⊆ V_r x V_c) its associated bipartite graph. M ⊆ E is a matching iff for all distinct (r1, c1), (r2, c2) in M, r1 != r2 and c1 != c2.

Structural problem:
- Maximum transversal: how to permute the columns of A so as to have the maximum number of nonzero entries on the diagonal?
- Maximum matching: how to find a matching of maximum size?
Software: dmperm in MATLAB, MC21 in Fortran (HSL library).

Components of the maximum matching algorithm. Suppose that a matching of size k has been found (i.e. there exist permutations placing entries on the first k diagonal positions):
- augmenting paths: try to find a matching of size k + 1 using the matching of size k (see the sketch below);
- branch-and-bound techniques based on depth-first search.

Structurally singular matrices. If a maximal matching cannot be extended to a full transversal, the matrix is said to be structurally (or symbolically) singular. In the example, two columns have entries confined to the same rows, so no column permutation can complete the diagonal.

(Illustration of the algorithm: a sequence of bipartite graphs on rows r1, ..., r6 and columns c1, ..., c6 showing how augmenting paths successively grow the matching.)
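A Python sketch of the depth-first augmenting-path search (in the spirit of MC21-style algorithms, but not taken from any of them; cols[c] lists the rows with a nonzero in column c):

    def maximum_transversal(m, n, cols):
        match_row = [-1] * m            # column matched to each row, -1 if free

        def augment(c, seen):
            for r in cols[c]:
                if r not in seen:
                    seen.add(r)
                    if match_row[r] == -1 or augment(match_row[r], seen):
                        match_row[r] = c      # (re)match row r to column c
                        return True
            return False

        size = sum(augment(c, set()) for c in range(n))
        return size, match_row

If the returned size is smaller than n, the matrix is structurally singular.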

Maximum weighted matching. Structural + numerical problem: find a permutation P of the columns of A such that the product (or another metric: minimum, sum, ...) of the diagonal entries of AP is maximum. Same techniques as for maximum matching, but the branch and bound is more complicated and it is more efficient to do breadth-first search. Software: MC64 in Fortran 77 or 90 (HSL library).

Reduction to Block Triangular Form (BTF). Suppose that there exist permutation matrices P and Q such that PAQ is block upper triangular, with diagonal blocks B_11, ..., B_NN and off-diagonal blocks B_ij (j > i). If N > 1, A is said to be reducible (irreducible otherwise); each B_ii is supposed to be irreducible (otherwise a finer decomposition is possible).

Advantage: to solve Ax = b, only the B_ii need be factored:

    B_ii y_i = (Pb)_i - sum_{j=i+1}^{N} B_ij y_j,    i = N, ..., 1,    with y = Q^T x.

A two-stage approach to compute the reduction:
- Stage (1): compute a (column) permutation matrix Q1 such that AQ1 has nonzeros on the diagonal (find a maximum transversal of A);
- Stage (2): compute a (symmetric) permutation matrix P such that P (AQ1) P^T is BTF.
The diagonal blocks of the BTF are uniquely defined. Techniques exist to compute the BTF directly, but they present no advantage over this two-stage approach.

Main components of the algorithm. Objective: assuming AQ1 has nonzeros on the diagonal, compute P such that P (AQ1) P^T is BTF.
- Use the digraph (directed graph) associated with the matrix; symmetric permutations on the digraph amount to relabelling the nodes of the graph.
- If there is no closed path through all the nodes of the digraph, then the digraph can be subdivided into two parts.
- The strong components of the graph are the sets of nodes belonging to common closed paths; they are the diagonal blocks B_ii of the BTF.
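For stage (2), the strong components can be found with Tarjan's algorithm; a Python sketch (recursive for brevity; succ[v] lists the successors of node v in the digraph of the matrix with a zero-free diagonal):

    def strong_components(n, succ):
        index, low, onstk = {}, {}, set()
        stack, comps, counter = [], [], [0]

        def visit(v):
            index[v] = low[v] = counter[0]; counter[0] += 1
            stack.append(v); onstk.add(v)
            for w in succ[v]:
                if w not in index:
                    visit(w)
                    low[v] = min(low[v], low[w])
                elif w in onstk:
                    low[v] = min(low[v], index[w])
            if low[v] == index[v]:            # v is the root of a component
                comp = []
                while True:
                    u = stack.pop(); onstk.discard(u); comp.append(u)
                    if u == v:
                        break
                comps.append(comp)

        for v in range(n):
            if v not in index:
                visit(v)
        return comps

The components are emitted in reverse topological order of the condensation, so numbering them in reversed(comps) order yields the permutation P for a block upper triangular form.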

(Figure: an example matrix A, its digraph, and P A P^T in BTF form.)

Preamble: finding the triangular form of a matrix. Observations: a triangular matrix has a BTF form with blocks of size 1 (BTF can be considered a generalization of the triangular form, in which each diagonal entry becomes a strong component). Note that if the matrix has a triangular form, the digraph has no cycle (i.e. it is acyclic).

Sargent and Westerberg algorithm:
1. Select a starting node and trace a path until finding a node from which no path leaves.
2. Number the last node first, and eliminate it (with all edges pointing to it).
3. Continue from the previous node on the path (or choose a new starting node).

(Figure: the digraph of A, the path followed at each step, and A permuted to a triangular form — which is not unique.)

Generalizing the Sargent and Westerberg algorithm to compute a BTF. A composite node is a set of nodes belonging to a closed path. Start from any node and follow a path until:
(1) a closed path is found: collapse all the nodes of the closed path into a composite node, and continue the path from the composite node; or
(2) a node or composite node is reached from which no path leaves: that node or composite node is numbered next.

(Figure: the digraph of A, the paths and steps of the generalized algorithm, and P A P^T in BTF form.)

Hall property. Consider the bipartite representation of an unsymmetric matrix. A bipartite graph with m rows and n columns has the Hall property if every set of k column vertices is adjacent to at least k row vertices, for all k <= n.

Hall theorem: a bipartite graph has a column-complete matching iff it has the Hall property.

Strong Hall property. A bipartite graph with m rows and n columns has the strong Hall property if every set of k column vertices is adjacent to at least k+1 row vertices, for all k < n. Any matrix that is not strong Hall can be permuted to a block lower triangular form.

Combining approaches

Example (1) of a hybrid approach: top-down followed by bottom-up processing of the graph.
- Top-down: apply nested dissection (ND) on the complete graph.
- Bottom-up: apply a local heuristic on each subgraph.
This is generally better for large-scale irregular problems than pure nested dissection or pure local heuristics.

(Figure, hybrid approach (cont'd): nested dissection separators S1, S2, S3 partition the graph, minimum degree (MD) is applied within each of the four subgraphs, and the permuted matrix shows the resulting block bordered structure; ND on the elimination graph, MD inside the blocks.)

Example (2): combining maximum transversal and fill-in reduction. Consider the LU factorization A = LU of an unsymmetric matrix:
1. Compute the column permutation Q leading to a maximum numerical transversal of A: AQ has large (in some sense) numerical entries on the diagonal.
2. Find the best ordering of AQ that preserves the diagonal entries; this is equivalent to finding a symmetric permutation P such that the factorization of P (AQ) P^T has reduced fill-in.

Factorization of sparse matrices

Outline:
1. Introduction.
2. Elimination tree and multifrontal method.
3. Comparison between multifrontal, frontal and general approaches for LU factorization.
4. Task mapping and scheduling.
5. Distributed memory approaches: fan-in, fan-out, multifrontal.
6. Some parallel solvers; case study on MUMPS and SuperLU.
7. Concluding remarks.

Introduction

Recalling Gaussian elimination. Step k of the LU factorization (pivot a_kk):
- for i > k, compute l_ik = a_ik / a_kk (which overwrites a_ik);
- for i > k, j > k such that a_ik and a_kj are nonzero, update a_ij = a_ij - a_ik a_kj / a_kk.

If a_ik != 0 and a_kj != 0, then a_ij != 0: if a_ij was zero, its new nonzero value must be stored. Orderings (minimum degree, Cuthill-McKee, ND) limit the fill-in and the number of operations, and modify the task graph.

Three-phase scheme to solve Ax = b:
1. Analysis step: preprocessing of A (symmetric/unsymmetric orderings, scalings) and construction of the dependency graph (elimination tree, edag, ...).
2. Factorization (A = LU, LDL^T, LL^T, QR), with numerical pivoting.
3. Solution based on the factored matrices: triangular solves Ly = b, then Ux = y; improvement of the solution (iterative refinement), error analysis.

Control of numerical stability: numerical pivoting. In dense linear algebra, partial pivoting is commonly used (at each step, the largest entry in the column is selected). In sparse linear algebra, some flexibility is kept in order to preserve sparsity:
- Partial threshold pivoting: eligible pivots are those not too small with respect to the maximum in the column,

      set of eligible pivots = { r : |a^(k)_rk| >= u * max_i |a^(k)_ik| },   with 0 < u <= 1;

  among the eligible pivots, select one that best preserves sparsity. u is called the threshold parameter (u = 1 gives partial pivoting); it restricts the maximum possible growth of the updated entries a_ij = a_ij - a_ik a_kj / a_kk. u = 0.1 is often chosen in practice.
- Symmetric indefinite case: requires 2x2 pivots, e.g. with zeros on the diagonal and nonzero off-diagonal entries.

Threshold pivoting and numerical accuracy. Table 2 (Dongarra et al.) shows, on a test matrix, the effect of varying the threshold parameter u on the number of nonzeros in the LU factors and on the error: a larger u gives more accuracy at the price of more fill.
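A one-function Python sketch of the eligibility test (illustrative only; a real code would combine it with a Markowitz-style sparsity criterion to pick among the eligible rows):

    def eligible_pivots(col_entries, u=0.1):
        """col_entries: list of (row, value) in the active part of column k.
        Returns the rows that are numerically acceptable as pivot."""
        amax = max(abs(v) for _, v in col_entries)
        return [r for r, v in col_entries if abs(v) >= u * amax]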

Iterative refinement for linear systems. Suppose that a solver has computed A = LU (or LDL^T, or LL^T) and a solution x~ of Ax = b:
1. Compute r = b - A x~.
2. Solve LU δx = r.
3. Update x~ = x~ + δx.
4. Repeat if necessary/useful.
(A sketch is given at the end of this section.)

Elimination tree and multifrontal approach

We recall that the elimination tree expresses the dependencies between the various steps of the factorization, and that it also exhibits the parallelism arising from the sparse structure of the matrix.

Building the elimination tree:
- permute the matrix to reduce fill-in: P A P^T;
- build the filled matrix A_F = L + L^T, where P A P^T = LL^T;
- take the transitive reduction of the associated filled graph.
Each column corresponds to a node of the graph, and each node k of the tree corresponds to the factorization of a frontal matrix whose row structure is that of column k of A_F.

Illustration of the multifrontal factorization. We assume that the pivots are chosen down the diagonal, in order. (Figure: the filled matrix F and the elimination graph.) Treatment at each node:
- assembly of the frontal matrix using the contributions from the sons;
- Gaussian elimination on the frontal matrix.

Elimination of the first variable (first pivot): assemble the frontal matrix containing row and column 1 and the variables they involve; the elimination produces contributions a^(1)_ij = - a_i1 a_1j / a_11 on the affected entries i, j > 1.
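A NumPy sketch of the refinement loop above (`solve` stands for the forward/backward substitution with the already-computed factors; both names are placeholders):

    import numpy as np

    def refine(A, solve, b, steps=3):
        x = solve(b)
        for _ in range(steps):
            r = b - A @ x          # residual, ideally in higher precision
            x = x + solve(r)       # correction through the existing factors
        return x

    # Illustration only: refine(A, lambda rhs: np.linalg.solve(A, rhs), b)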

The terms a^(1)_ij = - a_i1 a_1j / a_11 of the contribution matrix are stored for later updates.

Elimination of the second variable: assemble the frontal matrix, updating the entries of the pivot row and column with the contributions from the previous updates (none here); the elimination produces contributions a^(2)_ij on the affected entries, again stored for later updates.

Elimination of the third variable: assemble the frontal matrix, updating with the previous contributions, e.g. a_33 = a_33 + a^(1)_33 + a^(2)_33; entries that receive contributions from only some of the sons remain partially summed, since contributions are transferred only between son and father. The elimination produces the contribution matrix for the parent.

Elimination of the last variable: the frontal matrix involves only the remaining entry, updated with the last contribution.

The multifrontal method (Duff, Reid). Memory is divided into two parts (that can overlap in time):
- the factors;
- the active memory.
The elimination tree represents the task dependencies.

(Figure: A = L + U with its fill-in; the factors, the active frontal matrix and the stack of contribution blocks, the latter two forming the active memory.)

Multifrontal method: at each node, the frontal matrix is partitioned as

    [ F11  F12 ]
    [ F21  F22 ]

Pivots can ONLY be chosen from the F11 block, since F22 is NOT fully summed. Eliminating the F11 variables updates F22, which becomes the contribution block passed to the parent (see the sketch at the end of this section).

Supernodal methods.

Definition. A supernode (or supervariable) is a set of contiguous columns in the factor L that share essentially the same sparsity structure.
- All algorithms (ordering, symbolic factorization, factorization, solve) generalize to blocked versions.
- Blocking allows the use of efficient matrix-matrix kernels (improved cache usage).
- It is the same concept as the supervariables of the elimination tree / minimum degree ordering.
- Supernodes and pivoting: pivoting inside a supernode does not increase fill-in.
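A NumPy sketch of the elementary multifrontal task on a dense front F with npiv fully summed variables (no numerical pivoting shown; a real code would restrict and possibly delay pivots within F11):

    import numpy as np

    def partial_factor(F, npiv):
        """Factor the leading npiv x npiv block of F in place and return the
        Schur complement (contribution block) for the parent front."""
        for k in range(npiv):                                   # pivots in F11 only
            F[k+1:, k] /= F[k, k]
            F[k+1:, k+1:] -= np.outer(F[k+1:, k], F[k, k+1:])
        # F[:npiv, :] and F[:, :npiv] now hold the factor entries;
        # F[npiv:, npiv:] is the contribution block to assemble into the parent.
        return F[npiv:, npiv:]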

Amalgamation.
- Goal: exploit a more regular structure in the original matrix, decrease the amount of indirect addressing, increase the size of the frontal matrices.
- How? Relax the number of nonzeros of the matrix: amalgamate nodes of the elimination tree.
- Consequences? An increase in the total amount of flops, but a decrease in indirect addressing, and an increase in performance.

Remark: if a node i = (i1, i2, ..., i_f) is a son of a node j = (j1, j2, ..., j_p) and {j1, j2, ..., j_p} ⊆ {i1, i2, ..., i_f}, then the amalgamation of i and j is without fill-in. In particular, the amalgamation of supernodes (same lower-diagonal structure) is without fill-in.

Illustration of amalgamation. (Figure: the original matrix and its elimination tree, from the leaves to the root; the structure of node i is the frontal matrix denoted (i1, i2, ..., i_f).)

Amalgamation and supervariables. Amalgamation of supervariables does not cause fill-in. (Example: on the initial graph, after reordering, the supervariables are the groups of variables with the same adjacency structure; each group can be merged into a single node of the tree without fill-in.)

(Example: amalgamation WITHOUT fill-in, when the father's variables are contained in the son's front, versus amalgamation WITH fill-in, where merging the fronts introduces new entries.)

Parallelization: two levels of parallelism.
- Arising from sparsity: between nodes of the elimination tree (first level of parallelism).
- Within each node: parallel dense LU factorization (BLAS) (second level of parallelism).
Going up the tree, tree parallelism decreases while node parallelism increases: exploiting the second level of parallelism is crucial.

Multifrontal factorization: performance summary on matrix BCSSTK on an Alliant FX/8, an IBM 3090J/VF, a CRAY-2 and a CRAY Y-MP. In column (1), only the parallelism of the tree is exploited; in column (2), the two levels of parallelism are combined, with clearly better Mflops rates and speed-ups.

Other features. Dynamic management of parallelism:
- a pool of tasks for exploiting the two levels of parallelism;
- assembly operations are also parallel (but involve indirect addressing).

Dynamic management of data: storage of the LU factors, frontal and contribution matrices; the amount of memory available may conflict with exploiting the maximum parallelism.

Comparison between approaches for LU factorization

We compare three general approaches for sparse LU factorization:
- the general technique,
- the frontal method,
- the multifrontal approach.
(Distributed-memory multifrontal and supernodal approaches will be compared in another section.)

Description of the approaches.

General technique:
- numerical and sparsity pivoting performed at the same time;
- dynamic sparse data structures;
- good preservation of sparsity: local decisions influenced by numerical choices.

Frontal method:
- extension of band or variable-band schemes;
- no indirect addressing is required in the innermost loop (data are stored in dense matrices);
- simple data structures, fast methods, easier to implement; very popular; easy out-of-core implementation;
- sequential by nature.

Multifrontal approach:
- can be seen as an extension of the frontal method;
- an analysis phase computes an ordering, which can then be perturbed by numerical pivoting;
- full matrices are used in the innermost loops;
- compared to frontal schemes: more complex to implement (assembly of dense matrices, management of numerical pivoting), but preserves the sparse structure much better.

Frontal method. Step 1 (elimination of a_11): the frontal matrix holds the values needed for the elimination of a_11. Step j (elimination of a_jj): the frontal matrix holds the updates from the previous steps, plus row j and column j of the original matrix; update values f_ij = (a_i1 * a_1j) / a_11 (and the analogous terms at later steps) are generated. Properties: the band is treated as full, so an efficient reordering to minimize the bandwidth is crucial.

Frontal vs multifrontal methods. In the multifrontal method, it is sufficient to consider the frontal matrices associated with the nodes of the tree (each considered as full); several fronts move ahead simultaneously (the treatment of independent branches is independent). Characteristics of the multifrontal method:
- more complex data structures;
- usually more efficient than frontal techniques at preserving sparsity;
- parallelism arising from sparsity.

Illustration: comparison between software for LU factorization.
- General approach (MA38, Davis and Duff): control of fill-in by the Markowitz criterion; numerical stability by partial pivoting; numerical and sparsity pivoting performed in one step.
- Multifrontal method (MA41, Amestoy and Duff):

  reordering before the numerical factorization (a minimum degree type of reordering); partial pivoting for numerical stability.
- Frontal method (MA42, Duff and Scott): reordering before the numerical factorization (reordering for decreasing bandwidth); partial pivoting for numerical stability.
All these codes (MA38, MA41, MA42) are available in the HSL library.

Test problems from the Harwell-Boeing and Tim Davis (Univ. Florida) collections (orders and nonzero counts omitted):

Matrix       | Description
Orani8.rua   | economic modelling
Onetone.rua  | harmonic balance method
Garon.rua    | 2D Navier-Stokes
Wang.rua     | 3D simulation of semiconductors
mhda.rua     | spectral problem in hydrodynamics
rim.rua      | CFD, nonlinear problem

Execution times on a SUN for the three methods (general, frontal, multifrontal) on Onetone, Garon and mhda were compared in terms of flops, size of the factors and time (numerical values omitted).

Task mapping and scheduling

Affect tasks to processors to achieve a goal: makespan minimization, memory minimization, ... There are many approaches:
- Static: build the schedule before the execution and follow it at run-time.
  Advantage: very efficient, since it has a global view of the system.

  Drawback: requires a very good modelization of the platform.
- Dynamic: take the scheduling decisions dynamically at run-time.
  Advantage: reactive to the evolution of the platform, and easy to use on several platforms.
  Drawback: decisions are taken with local criteria (a decision which seems good at time t can have very bad consequences at time t + 1).

Influence of scheduling on the makespan. Objective: assign processes/tasks to processors so that the completion time, also called the makespan, is minimized. (We may also say that we minimize the maximum total processing time on any processor.)

Task scheduling on shared memory computers. The data can be shared between processors without any communication. Dynamic scheduling of the tasks (pool of ready tasks): each processor selects a task from the pool (the order can influence the performance). An ordering that is good topologically (with respect to time) is not necessarily good in terms of working memory.

Static scheduling: subtree-to-subcube (or proportional) mapping. Main objective: reduce the volume of communication between the processors. Initially, all processors are assigned to the root node; the processors are then recursively partitioned equally between the children of each node. Good at localizing communication, but not so easy if the processor partitions do not overlap at each step.

Mapping of the tasks onto the processors. Objective: find a layer L0 of the tree such that the subtrees rooted in L0 can be mapped onto the processors with a good balance.

Construction and mapping of the initial level L0:

    begin
       L0 <- roots of the assembly tree
       repeat
          find the node q in L0 whose subtree has the largest computational cost
          set L0 <- (L0 \ {q}) ∪ {children of q}
          greedily map the nodes of L0 onto the processors
          estimate the load unbalance
       until load unbalance < threshold
    end
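A Python sketch of the construction above (hypothetical inputs: children[i], and cost[i] = computational cost of the subtree rooted at i; the greedy mapping assigns the heaviest subtrees first to the least loaded processor, and the threshold value is an arbitrary choice):

    def build_L0(roots, children, cost, nprocs, threshold=0.2):
        L0 = list(roots)
        while True:
            loads = [0.0] * nprocs
            for i in sorted(L0, key=lambda i: -cost[i]):   # greedy mapping
                loads[loads.index(min(loads))] += cost[i]
            imbalance = (max(loads) - min(loads)) / max(max(loads), 1e-30)
            if imbalance < threshold:
                return L0
            q = max(L0, key=lambda i: cost[i])             # expand largest subtree
            if not children[q]:
                return L0                                  # cannot refine further
            L0.remove(q)
            L0.extend(children[q])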

(Figure: decomposition of the tree into levels; determination of the level L0 based on the subtree costs; the subtree roots below L0.) The mapping of the top of the tree can be dynamic, which can be useful for both shared and distributed memory algorithms.

Distributed memory approaches

Computational strategies for parallel direct solvers. The parallel algorithm is characterized by its computational dependency graph and by its communication graph. Three classical approaches:
1. fan-in,
2. fan-out,
3. multifrontal.

Preamble: left-looking and right-looking approaches for Cholesky factorization. Let cmod(j, k) denote the modification of column j by column k (k < j), and cdiv(j) the division of column j by a scalar.

    Left-looking approach:
       for j = 1 to n do
          for k in Struct(row L_j,*) do
             cmod(j, k)
          end for
          cdiv(j)
       end for

    Right-looking approach:
       for k = 1 to n do
          cdiv(k)
          for j in Struct(col L_*,k) do
             cmod(j, k)
          end for
       end for
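A dense NumPy sketch of the two variants, with the cmod/cdiv operations written out (illustrative code, not from any of the solvers discussed here):

    import numpy as np

    def cholesky_left(A):
        L = np.tril(A.copy())
        n = L.shape[0]
        for j in range(n):
            for k in range(j):                      # cmod(j, k): gather updates
                L[j:, j] -= L[j, k] * L[j:, k]
            L[j:, j] /= np.sqrt(L[j, j])            # cdiv(j)
        return L

    def cholesky_right(A):
        L = np.tril(A.copy())
        n = L.shape[0]
        for k in range(n):
            L[k:, k] /= np.sqrt(L[k, k])            # cdiv(k)
            for j in range(k + 1, n):               # cmod(j, k): scatter updates
                L[j:, j] -= L[j, k] * L[j:, k]
        return L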

(Figure: left-looking — the columns used for modification lie to the left of the modified column; right-looking — the current column modifies the columns to its right.)

Assumptions and notations. We assume that each column of L (each node of the tree) is assigned to a single processor, and that each processor is in charge of computing cdiv(j) for the columns j it owns. Notation:
- mycols(p): the set of columns owned by processor p;
- map(j): the processor owning column j (or task j);
- procs(L_k) = { map(j) : j ∈ Struct(L_k) } (only the processors in procs(L_k) require updates from column k; they correspond to ancestors of k in the tree).

Fan-in variant (similar to left-looking). Demand-driven algorithm: the data required are aggregated update columns computed by the sending processor.

    Fan-in(p):
       for j = 1 to n do
          u = 0
          for all k in Struct(row L_j) ∩ mycols(p) do
             u = u + cmod(j, k)
          end for
          if map(j) != p then
             send u to processor map(j)
          end if
          if map(j) == p then
             incorporate u in column j
             receive all necessary aggregated updates for column j
                and incorporate them in column j
             cdiv(j)
          end if
       end for

(Illustration, fan-in / left-looking Cholesky: for j = 1 to n, for k in Struct(L_j,*) do cmod(j, k); cdiv(j). If map(1) = map(2) = map(3) = p and map(j) != p for the target column j, (only) one aggregated message is sent by p to update column j: fan-in exploits the data locality of the proportional mapping.)

Fan-out variant (similar to right-looking). Data-driven algorithm:

    Fan-out(p):
       for all leaf nodes j in mycols(p) do
          cdiv(j)
          send column L_j to procs(col L_j)
          mycols(p) = mycols(p) \ {j}
       end for
       while mycols(p) != ∅ do
          receive any column (say L_k) of L
          for j in Struct(col L_k) ∩ mycols(p) do
             cmod(j, k)
             if column j is completely updated then
                cdiv(j)
                send column L_j to procs(col L_j)
                mycols(p) = mycols(p) \ {j}
             end if
          end for
       end while

(Illustration, fan-out / right-looking Cholesky: for k = 1 to n, cdiv(k); for j in Struct(L_*,k) do cmod(j, k).)

In the example, if map(1) = map(2) = p and map(3) != p, then two messages (for columns 1 and 2) are sent by p to update column 3.

Fan-out variant: properties.
- Historically the first implemented.
- Incurs greater interprocessor communication than the fan-in (or multifrontal) approach, both in terms of total number of messages and of total volume.
- Does not exploit the data locality of the proportional mapping.
- Improved algorithm (local aggregation): send aggregated update columns instead of individual factor columns for columns mapped on a single processor. This improves the exploitation of the data locality of the proportional mapping, but the memory increase needed to store the aggregates can be critical (as in fan-in).

Multifrontal variant.

    for k = 1 to n do
       build the full frontal matrix with all indices in Struct(L_*,k)
       partial factorization
       send the contribution block to the father
    end for

(Figure: the elimination tree with fronts and contribution blocks — the "multifrontal method".)

Some parallel solvers

Shared memory sparse direct codes:

Code     | Technique          | Scope   | Availability
MA41     | multifrontal       | UNS     | cse.clrc.ac.uk/activity/hsl
MA49     | multifrontal QR    | RECT    | cse.clrc.ac.uk/activity/hsl
PanelLLT | left-looking       | SPD     | Ng
PARDISO  | left-right looking | UNS     | Schenk
PSL      | left-looking       | SPD/UNS | SGI product (object code only)
SPOOLES  | fan-in             | SYM/UNS | netlib.org/linalg/spooles
SuperLU  | left-looking       | UNS     | nersc.gov/~xiaoye/superlu
WSMP     | multifrontal       | SYM/UNS | IBM product


More information

BLAS and LAPACK + Data Formats for Sparse Matrices. Part of the lecture Wissenschaftliches Rechnen. Hilmar Wobker

BLAS and LAPACK + Data Formats for Sparse Matrices. Part of the lecture Wissenschaftliches Rechnen. Hilmar Wobker BLAS and LAPACK + Data Formats for Sparse Matrices Part of the lecture Wissenschaftliches Rechnen Hilmar Wobker Institute of Applied Mathematics and Numerics, TU Dortmund email: hilmar.wobker@math.tu-dortmund.de

More information

Outline. CS38 Introduction to Algorithms. Linear programming 5/21/2014. Linear programming. Lecture 15 May 20, 2014

Outline. CS38 Introduction to Algorithms. Linear programming 5/21/2014. Linear programming. Lecture 15 May 20, 2014 5/2/24 Outline CS38 Introduction to Algorithms Lecture 5 May 2, 24 Linear programming simplex algorithm LP duality ellipsoid algorithm * slides from Kevin Wayne May 2, 24 CS38 Lecture 5 May 2, 24 CS38

More information

Lecture 9. Introduction to Numerical Techniques

Lecture 9. Introduction to Numerical Techniques Lecture 9. Introduction to Numerical Techniques Ivan Papusha CDS270 2: Mathematical Methods in Control and System Engineering May 27, 2015 1 / 25 Logistics hw8 (last one) due today. do an easy problem

More information

Direct Solvers for Sparse Matrices X. Li September 2006

Direct Solvers for Sparse Matrices X. Li September 2006 Direct Solvers for Sparse Matrices X. Li September 2006 Direct solvers for sparse matrices involve much more complicated algorithms than for dense matrices. The main complication is due to the need for

More information

Technical Report TR , Computer and Information Sciences Department, University. Abstract

Technical Report TR , Computer and Information Sciences Department, University. Abstract An Approach for Parallelizing any General Unsymmetric Sparse Matrix Algorithm Tariq Rashid y Timothy A.Davis z Technical Report TR-94-036, Computer and Information Sciences Department, University of Florida,

More information

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering State of the art distributed parallel computational techniques in industrial finite element analysis Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering Ajaccio, France

More information

Lecture 27: Fast Laplacian Solvers

Lecture 27: Fast Laplacian Solvers Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall

More information

MUMPS. The MUMPS library. Abdou Guermouche and MUMPS team, June 22-24, Univ. Bordeaux 1 and INRIA

MUMPS. The MUMPS library. Abdou Guermouche and MUMPS team, June 22-24, Univ. Bordeaux 1 and INRIA The MUMPS library Abdou Guermouche and MUMPS team, Univ. Bordeaux 1 and INRIA June 22-24, 2010 MUMPS Outline MUMPS status Recently added features MUMPS and multicores? Memory issues GPU computing Future

More information

CS521 \ Notes for the Final Exam

CS521 \ Notes for the Final Exam CS521 \ Notes for final exam 1 Ariel Stolerman Asymptotic Notations: CS521 \ Notes for the Final Exam Notation Definition Limit Big-O ( ) Small-o ( ) Big- ( ) Small- ( ) Big- ( ) Notes: ( ) ( ) ( ) ( )

More information

Sparse matrices, graphs, and tree elimination

Sparse matrices, graphs, and tree elimination Logistics Week 6: Friday, Oct 2 1. I will be out of town next Tuesday, October 6, and so will not have office hours on that day. I will be around on Monday, except during the SCAN seminar (1:25-2:15);

More information

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method VLADIMIR ROTKIN and SIVAN TOLEDO Tel-Aviv University We describe a new out-of-core sparse Cholesky factorization

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Algorithm Design (8) Graph Algorithms 1/2

Algorithm Design (8) Graph Algorithms 1/2 Graph Algorithm Design (8) Graph Algorithms / Graph:, : A finite set of vertices (or nodes) : A finite set of edges (or arcs or branches) each of which connect two vertices Takashi Chikayama School of

More information

Null space computation of sparse singular matrices with MUMPS

Null space computation of sparse singular matrices with MUMPS Null space computation of sparse singular matrices with MUMPS Xavier Vasseur (CERFACS) In collaboration with Patrick Amestoy (INPT-IRIT, University of Toulouse and ENSEEIHT), Serge Gratton (INPT-IRIT,

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert

More information

F k G A S S1 3 S 2 S S V 2 V 3 V 1 P 01 P 11 P 10 P 00

F k G A S S1 3 S 2 S S V 2 V 3 V 1 P 01 P 11 P 10 P 00 PRLLEL SPRSE HOLESKY FTORIZTION J URGEN SHULZE University of Paderborn, Department of omputer Science Furstenallee, 332 Paderborn, Germany Sparse matrix factorization plays an important role in many numerical

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science and Center of Simulation of Advanced Rockets University of Illinois at Urbana-Champaign

More information

Sparse matrices: Basics

Sparse matrices: Basics Outline : Basics Bora Uçar RO:MA, LIP, ENS Lyon, France CR-08: Combinatorial scientific computing, September 201 http://perso.ens-lyon.fr/bora.ucar/cr08/ 1/28 CR09 Outline Outline 1 Course presentation

More information

c 2004 Society for Industrial and Applied Mathematics

c 2004 Society for Industrial and Applied Mathematics SIAM J. MATRIX ANAL. APPL. Vol. 26, No. 2, pp. 390 399 c 2004 Society for Industrial and Applied Mathematics HERMITIAN MATRICES, EIGENVALUE MULTIPLICITIES, AND EIGENVECTOR COMPONENTS CHARLES R. JOHNSON

More information

Outline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency

Outline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency 1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming

More information

Row ordering for frontal solvers in chemical process engineering

Row ordering for frontal solvers in chemical process engineering Computers and Chemical Engineering 24 (2000) 1865 1880 www.elsevier.com/locate/compchemeng Row ordering for frontal solvers in chemical process engineering Jennifer A. Scott * Computational Science and

More information

CS Data Structures and Algorithm Analysis

CS Data Structures and Algorithm Analysis CS 483 - Data Structures and Algorithm Analysis Lecture VI: Chapter 5, part 2; Chapter 6, part 1 R. Paul Wiegand George Mason University, Department of Computer Science March 8, 2006 Outline 1 Topological

More information

FAST COMPUTATION OF MINIMAL FILL INSIDE A GIVEN ELIMINATION ORDERING

FAST COMPUTATION OF MINIMAL FILL INSIDE A GIVEN ELIMINATION ORDERING FAST COMPUTATION OF MINIMAL FILL INSIDE A GIVEN ELIMINATION ORDERING PINAR HEGGERNES AND BARRY W. PEYTON Abstract. Minimal elimination orderings were introduced by Rose, Tarjan, and Lueker in 1976, and

More information

Lecture 3: Graphs and flows

Lecture 3: Graphs and flows Chapter 3 Lecture 3: Graphs and flows Graphs: a useful combinatorial structure. Definitions: graph, directed and undirected graph, edge as ordered pair, path, cycle, connected graph, strongly connected

More information

Sparse Matrix Algorithms

Sparse Matrix Algorithms Sparse Matrix Algorithms combinatorics + numerical methods + applications Math + X Tim Davis University of Florida June 2013 contributions to the field current work vision for the future Outline Math+X

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2013 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2013 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Sparse Linear Algebra

Sparse Linear Algebra Lecture 5 Sparse Linear Algebra The solution of a linear system Ax = b is one of the most important computational problems in scientific computing. As we shown in the previous section, these linear systems

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Mathematical and Algorithmic Foundations Linear Programming and Matchings Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

A Static Parallel Multifrontal Solver for Finite Element Meshes

A Static Parallel Multifrontal Solver for Finite Element Meshes A Static Parallel Multifrontal Solver for Finite Element Meshes Alberto Bertoldo, Mauro Bianco, and Geppino Pucci Department of Information Engineering, University of Padova, Padova, Italy {cyberto, bianco1,

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the

More information

CSCE 411 Design and Analysis of Algorithms

CSCE 411 Design and Analysis of Algorithms CSCE 411 Design and Analysis of Algorithms Set 4: Transform and Conquer Slides by Prof. Jennifer Welch Spring 2014 CSCE 411, Spring 2014: Set 4 1 General Idea of Transform & Conquer 1. Transform the original

More information

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models Prof. Daniel Cremers 4. Probabilistic Graphical Models Directed Models The Bayes Filter (Rep.) (Bayes) (Markov) (Tot. prob.) (Markov) (Markov) 2 Graphical Representation (Rep.) We can describe the overall

More information

Object-oriented Design for Sparse Direct Solvers

Object-oriented Design for Sparse Direct Solvers NASA/CR-1999-208978 ICASE Report No. 99-2 Object-oriented Design for Sparse Direct Solvers Florin Dobrian Old Dominion University, Norfolk, Virginia Gary Kumfert and Alex Pothen Old Dominion University,

More information

PACKAGE SPECIFICATION HSL 2013

PACKAGE SPECIFICATION HSL 2013 MA41 PACKAGE SPECIFICATION HSL 2013 1 SUMMARY To solve a sparse unsymmetric system of linear equations. Given a square unsymmetric sparse matrix A of order n T and an n-vector b, this subroutine solves

More information

GraphBLAS Mathematics - Provisional Release 1.0 -

GraphBLAS Mathematics - Provisional Release 1.0 - GraphBLAS Mathematics - Provisional Release 1.0 - Jeremy Kepner Generated on April 26, 2017 Contents 1 Introduction: Graphs as Matrices........................... 1 1.1 Adjacency Matrix: Undirected Graphs,

More information

A direct solver with reutilization of previously-computed LU factorizations for h-adaptive finite element grids with point singularities

A direct solver with reutilization of previously-computed LU factorizations for h-adaptive finite element grids with point singularities Maciej Paszynski AGH University of Science and Technology Faculty of Computer Science, Electronics and Telecommunication Department of Computer Science al. A. Mickiewicza 30 30-059 Krakow, Poland tel.

More information

A parallel direct/iterative solver based on a Schur complement approach

A parallel direct/iterative solver based on a Schur complement approach A parallel direct/iterative solver based on a Schur complement approach Gene around the world at CERFACS Jérémie Gaidamour LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix project) February 29th, 2008

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement

Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Exploiting Multiple GPUs in Sparse QR: Regular Numerics with Irregular Data Movement Tim Davis (Texas A&M University) with Sanjay Ranka, Mohamed Gadou (University of Florida) Nuri Yeralan (Microsoft) NVIDIA

More information

Algorithm 837: AMD, An Approximate Minimum Degree Ordering Algorithm

Algorithm 837: AMD, An Approximate Minimum Degree Ordering Algorithm Algorithm 837: AMD, An Approximate Minimum Degree Ordering Algorithm PATRICK R. AMESTOY, ENSEEIHT-IRIT, and TIMOTHY A. DAVIS University of Florida and IAIN S. DUFF CERFACS and Rutherford Appleton Laboratory

More information

SUPERFAST MULTIFRONTAL METHOD FOR STRUCTURED LINEAR SYSTEMS OF EQUATIONS

SUPERFAST MULTIFRONTAL METHOD FOR STRUCTURED LINEAR SYSTEMS OF EQUATIONS SUPERFAS MULIFRONAL MEHOD FOR SRUCURED LINEAR SYSEMS OF EQUAIONS S. CHANDRASEKARAN, M. GU, X. S. LI, AND J. XIA Abstract. In this paper we develop a fast direct solver for discretized linear systems using

More information

Dense Matrix Algorithms

Dense Matrix Algorithms Dense Matrix Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003. Topic Overview Matrix-Vector Multiplication

More information

A Sparse Symmetric Indefinite Direct Solver for GPU Architectures

A Sparse Symmetric Indefinite Direct Solver for GPU Architectures 1 A Sparse Symmetric Indefinite Direct Solver for GPU Architectures JONATHAN D. HOGG, EVGUENI OVTCHINNIKOV, and JENNIFER A. SCOTT, STFC Rutherford Appleton Laboratory In recent years, there has been considerable

More information

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static

More information

Preconditioning for linear least-squares problems

Preconditioning for linear least-squares problems Preconditioning for linear least-squares problems Miroslav Tůma Institute of Computer Science Academy of Sciences of the Czech Republic tuma@cs.cas.cz joint work with Rafael Bru, José Marín and José Mas

More information

ECEN 615 Methods of Electric Power Systems Analysis Lecture 11: Sparse Systems

ECEN 615 Methods of Electric Power Systems Analysis Lecture 11: Sparse Systems ECEN 615 Methods of Electric Power Systems Analysis Lecture 11: Sparse Systems Prof. Tom Overbye Dept. of Electrical and Computer Engineering Texas A&M University overbye@tamu.edu Announcements Homework

More information

Q. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R.

Q. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R. Progress In Electromagnetics Research Letters, Vol. 9, 29 38, 2009 AN IMPROVED ALGORITHM FOR MATRIX BANDWIDTH AND PROFILE REDUCTION IN FINITE ELEMENT ANALYSIS Q. Wang National Key Laboratory of Antenna

More information

MA41 SUBROUTINE LIBRARY SPECIFICATION 1st Aug. 2000

MA41 SUBROUTINE LIBRARY SPECIFICATION 1st Aug. 2000 MA41 SUBROUTINE LIBRARY SPECIFICATION 1st Aug. 2000 1 SUMMARY To solve a sparse unsymmetric system of linear equations. Given a square unsymmetric sparse matrix T A of order n and a matrix B of nrhs right

More information

CSCI5070 Advanced Topics in Social Computing

CSCI5070 Advanced Topics in Social Computing CSCI5070 Advanced Topics in Social Computing Irwin King The Chinese University of Hong Kong king@cse.cuhk.edu.hk!! 2012 All Rights Reserved. Outline Graphs Origins Definition Spectral Properties Type of

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 1 Don t you just invert

More information

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path.

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path. Chapter 3 Trees Section 3. Fundamental Properties of Trees Suppose your city is planning to construct a rapid rail system. They want to construct the most economical system possible that will meet the

More information

CSC Design and Analysis of Algorithms

CSC Design and Analysis of Algorithms CSC : Lecture 7 CSC - Design and Analysis of Algorithms Lecture 7 Transform and Conquer I Algorithm Design Technique CSC : Lecture 7 Transform and Conquer This group of techniques solves a problem by a

More information

Review on Storage and Ordering for Large Sparse Matrix

Review on Storage and Ordering for Large Sparse Matrix Review on Storage and Ordering for Large Sparse Matrix (Final Project, Math) Wei He (SID: 5763774) Introduction Many problems in science and engineering require the solution of a set of sparse matrix equations

More information

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning Parallel sparse matrix-vector product Lay out matrix and vectors by rows y(i) = sum(a(i,j)*x(j)) Only compute terms with A(i,j) 0 P0 P1

More information

Parallel resolution of sparse linear systems by mixing direct and iterative methods

Parallel resolution of sparse linear systems by mixing direct and iterative methods Parallel resolution of sparse linear systems by mixing direct and iterative methods Phyleas Meeting, Bordeaux J. Gaidamour, P. Hénon, J. Roman, Y. Saad LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht

More information

CS 770G - Parallel Algorithms in Scientific Computing

CS 770G - Parallel Algorithms in Scientific Computing CS 770G - Parallel lgorithms in Scientific Computing Dense Matrix Computation II: Solving inear Systems May 28, 2001 ecture 6 References Introduction to Parallel Computing Kumar, Grama, Gupta, Karypis,

More information

Reducing the total bandwidth of a sparse unsymmetric matrix

Reducing the total bandwidth of a sparse unsymmetric matrix RAL-TR-2005-001 Reducing the total bandwidth of a sparse unsymmetric matrix J. K. Reid and J. A. Scott March 2005 Council for the Central Laboratory of the Research Councils c Council for the Central Laboratory

More information

HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION

HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION SIAM J. SCI. COMPUT. Vol. 32, No. 6, pp. 3426 3446 c 2010 Society for Industrial and Applied Mathematics HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION LAURA GRIGORI,

More information

Parallel Implementations of Gaussian Elimination

Parallel Implementations of Gaussian Elimination s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n

More information

2 Fundamentals of Serial Linear Algebra

2 Fundamentals of Serial Linear Algebra . Direct Solution of Linear Systems.. Gaussian Elimination.. LU Decomposition and FBS..3 Cholesky Decomposition..4 Multifrontal Methods. Iterative Solution of Linear Systems.. Jacobi Method Fundamentals

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 4 Graphs Definitions Traversals Adam Smith 9/8/10 Exercise How can you simulate an array with two unbounded stacks and a small amount of memory? (Hint: think of a

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

Parallel Sparse LU Factorization on Different Message Passing Platforms

Parallel Sparse LU Factorization on Different Message Passing Platforms Parallel Sparse LU Factorization on Different Message Passing Platforms Kai Shen Department of Computer Science, University of Rochester Rochester, NY 1467, USA Abstract Several message passing-based parallel

More information