SUPERFAST MULTIFRONTAL METHOD FOR STRUCTURED LINEAR SYSTEMS OF EQUATIONS


S. CHANDRASEKARAN, M. GU, X. S. LI, AND J. XIA

Abstract. In this paper we develop a fast direct solver for discretized linear systems using the multifrontal method together with low-rank approximations. For linear systems arising from certain partial differential equations such as elliptic equations, we discover that during the Gaussian elimination of the matrices with proper ordering, the fill-in has a low-rank property: all off-diagonal blocks have small numerical ranks with a proper definition of off-diagonal blocks. Matrices with this low-rank property can be efficiently approximated with semiseparable structures, called hierarchically semiseparable (HSS) representations. We reveal the above low-rank property by ordering the variables with nested dissection and eliminating them with the multifrontal method. All matrix operations in the multifrontal method are performed in HSS form. We present efficient ways to build compact HSS structures along the elimination, and propose the HSS matrix operations necessary for the structured multifrontal method. By taking advantage of the low-rank property, the multifrontal method with HSS structures can be shown to have linear complexity and to require only linear storage. We therefore call this structured method a superfast multifrontal method. It is especially suitable for large problems, has natural adaptability to parallel computation, and has great potential to provide effective preconditioners. Numerical results demonstrate the efficiency.

Key words. direct solver, HSS matrix, low-rank property, superfast multifrontal method, nested dissection

AMS subject classifications. 15A23, 65F05, 65F30, 65F50

1. Introduction. In many computational and engineering problems it is critical to solve large structured linear systems of equations. Different structures come from the different natures of the original problems or from different techniques of discretization, linearization, or simplification. It is usually important to take advantage of the special structures, or to preserve them when necessary. Direct solvers often provide good chances to exploit the structures. Direct methods are also attractive because of their efficiency, reliability, and generality.

Here we are interested in the discretization of differential equations such as elliptic PDEs on a two-dimensional (2D) finite element mesh (grid) M [15, 2, 11], which is any mesh formed by partitioning a planar region with a boundary into subregions (elements) by some lines (edges). Edges interconnect at their end points (nodes). In this paper we consider a regular mesh, or more generally, a well-shaped mesh [28, 29, 33]; by a well-shaped mesh we simply mean a mesh where nested dissection or its generalizations [15, 25, 16, 18, 1, 33] can be used. A symmetric positive definite (SPD) system

    Ax = b                                                                 (1.1)

associated with M is called a finite element system if variables are associated with mesh nodes and the system has the property that A_{ij} ≠ 0 if and only if x_i and x_j are associated with nodes of the same element of M. Such matrices arise when we apply 5-point difference or finite element techniques to solve 2D linear boundary value problems such as elliptic boundary value problems on rectangular domains [14].

Author affiliations: Department of Electrical and Computer Engineering, University of California at Santa Barbara (shiv@ece.ucsb.edu); Department of Mathematics, University of California at Berkeley (mgu@math.berkeley.edu); Lawrence Berkeley National Laboratory, MS 50F-1650, One Cyclotron Rd., Berkeley, CA (xsli@lbl.gov); Department of Mathematics, University of California at Los Angeles (jxia@math.ucla.edu).

For the purpose of presentation we will sometimes refer to the following model problem.

Model Problem 1.1. We use the 2D discrete Laplacian on a 5-point stencil as our model problem for the numerical experiments, where A is a 5-diagonal n^2 x n^2 sparse matrix:

    A = \begin{pmatrix} T & -I & & \\ -I & T & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & T \end{pmatrix}, \qquad
    T = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix}.

Another example is the 9-point stencil.

To solve (1.1), people typically first order the mesh nodes and then factorize the discretized matrix. Direct factorization of such a matrix with a row-wise node ordering takes O(n^4) flops [15]. The nested dissection ordering of the nodes [15] gives an elimination scheme with O(n^3) cost, which is optimal for any ordering in exact arithmetic [22]. But O(n^3) is still large for a big n. Sometimes iterative methods are cheaper if effective preconditioners are available; but for hard problems good preconditioners can be difficult to find.

1.1. Superfast multifrontal method. Direct solvers have been considered expensive because of the large amount of fill-in, even if the original matrix is sparse. To handle the fill-in effectively, one can represent or approximate the dense matrices arising in complicated problems with structured matrices such as H-matrices [19, 21, 20], quasiseparable matrices [13], semiseparable matrices [7, 5, 6], etc. Similarly, when solving discretized PDEs such as elliptic equations, we can develop fast direct solvers by exploiting the numerical low-rank property and by approximating the dense matrices in these problems with compact semiseparable matrices, without compromising the accuracy. These approximations are feasible because the fill-in during the elimination with certain orderings is actually structured. The fill-in is closely related to the Green's functions via Schur complements; thus the off-diagonal blocks of the fill-in have small numerical ranks. Due to this low-rank property, we show that the structured approximations of the dense N x N subproblems in those problems can be solved with a cost of O(p^2 N), and this leads to a total complexity of O(pn^2), where p is a constant related to the PDE and to the accuracy of the matrix approximations, and n is the mesh size. To be more precise, our solver includes both the approximation stage for the dense matrices in the original problem and the direct solution stage for the structured approximations. We still call the overall procedure a direct solver.

One of the important direct methods for sparse matrix solutions is the multifrontal method by Duff and Reid [12]. Liu gave a detailed description of the multifrontal method for large sparse SPD systems in [27]. In our direct solver we order the mesh nodes with nested dissection and organize the elimination process with the multifrontal method. This procedure has been demonstrated to be effective for sparse problems. Moreover, we discover that it produces dense fill-in with the above low-rank property for many discretized PDEs. The method eliminates variables and accumulates updates according to an elimination tree or assembly tree whose nodes are the variables. Here we use a supernodal version of the multifrontal method: we treat each separator as a node in the supernodal elimination tree.

In this paper the semiseparable matrices we use to approximate the dense submatrices in the multifrontal method are called hierarchically semiseparable (HSS) matrices, introduced by Chandrasekaran, Gu, et al. in [9, 10] and further generalized in [8, 34].
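
As a concrete illustration of Model Problem 1.1, a minimal Python/SciPy sketch that assembles the matrix A from Kronecker products of the one-dimensional second-difference matrix is given below. It is only a schematic example (the function name model_problem is arbitrary) and is not part of the solver described in this paper.

```python
import numpy as np
import scipy.sparse as sp

def model_problem(n):
    """5-point Laplacian on an n x n grid (Model Problem 1.1), of order n^2."""
    T1 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    I = sp.identity(n, format="csr")
    # Block tridiagonal matrix with T = tridiag(-1, 4, -1) on the diagonal
    # and -I on the off-diagonal block positions.
    return (sp.kron(I, T1) + sp.kron(T1, I)).tocsr()

if __name__ == "__main__":
    n = 8
    A = model_problem(n)
    # Check the block structure displayed in the text.
    T = A[:n, :n].toarray()
    assert np.allclose(np.diag(T), 4.0)                      # diagonal blocks are T
    assert np.allclose(A[:n, n:2 * n].toarray(), -np.eye(n)) # off-diagonal blocks are -I
    print("average nonzeros per row:", A.nnz / A.shape[0])   # about 5 for the 5-point stencil
```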

Many fast algorithms exist for HSS structures. HSS matrices are characterized by hierarchical low-rank properties in the off-diagonal blocks (Figure 1.1), which are the block rows excluding the diagonal blocks at different levels of splittings of the matrix. We call them HSS off-diagonal blocks. Off-diagonal block columns can be considered similarly.

Fig. 1.1. Two levels of HSS off-diagonal blocks: (i) bottom level HSS blocks; (ii) upper level HSS blocks.

More specifically, we say that a matrix has the low-rank property if all its HSS off-diagonal blocks have small numerical ranks. We call the maximum numerical rank over all HSS blocks the HSS rank of the matrix. Here by numerical ranks we mean the ranks obtained by rank-revealing QR factorizations or by a τ-accurate SVD, in which all singular values less than a tolerance τ are discarded when τ is an absolute tolerance; similarly, τ can be a relative tolerance. Later we simply use ranks to mean numerical ranks.

Update matrices and frontal matrices in the multifrontal method for the model problem have the low-rank property, so HSS approximations can be used. In the multifrontal method all matrix operations are then conducted efficiently via HSS approximations. Some basic HSS operations can be found in [8, 9, 10], including structure generation, system solving, etc. In this paper we also provide additional HSS operations necessary for the structured multifrontal method. Both the multifrontal method and the HSS structure have nice hierarchical tree structures. They both take good advantage of dense matrix computations and have efficient storage, good data locality, and natural adaptability to parallel computation. The use of HSS matrices in the multifrontal method leads to an efficient structured multifrontal method. For problems such as the 5-point or 9-point finite difference Laplacian, this structured multifrontal method has nearly linear complexity. We thus call this method a superfast multifrontal method. The method is also memory efficient. By setting a relatively large tolerance in the matrix approximations, the method becomes even more efficient and can also work as an effective preconditioner. We have developed a software package for the solver and the various HSS operations.

1.2. Outline of the paper. This paper is organized as follows. In Section 2 we give an introduction to the multifrontal method, nested dissection, and HSS structures. Section 3 presents our new superfast multifrontal method. The development proceeds as follows. First, in Subsection 3.1 we investigate the low-rank properties in the multifrontal method. Certain rules are set up to guarantee small numerical ranks and the feasibility of low-rank structures. Subsection 3.3 discusses the superfast multifrontal method with HSS structures in detail; the necessary HSS operations are proposed so that all matrix operations are structured. Then in Subsection 3.4 we present some implementation issues and summarize the method in an algorithm. Section 4 demonstrates the efficiency of the superfast multifrontal method with numerical experiments. Section 5 gives some general remarks about the method.
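
The notions of τ-accurate numerical rank and HSS off-diagonal block rows can be illustrated with a minimal NumPy sketch, given below. It only measures the bottom-level block-row ranks of a dense matrix for an absolute tolerance τ, on an arbitrary smooth test matrix, and is unrelated to the software package mentioned above; the HSS rank would be the maximum of such ranks over all levels of the splitting.

```python
import numpy as np

def numerical_rank(block, tau):
    """Numerical rank with an absolute tolerance tau (tau-accurate SVD):
    singular values below tau are discarded."""
    if block.size == 0:
        return 0
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tau))

def hss_offdiag_ranks(A, block_size, tau):
    """Ranks of the bottom-level HSS off-diagonal block rows of A:
    for each block row, the block row excluding its diagonal block."""
    N = A.shape[0]
    ranks = []
    for start in range(0, N, block_size):
        stop = min(start + block_size, N)
        off = np.hstack([A[start:stop, :start], A[start:stop, stop:]])
        ranks.append(numerical_rank(off, tau))
    return ranks

if __name__ == "__main__":
    # A small dense test matrix with smooth off-diagonal interactions (illustration only).
    x = np.linspace(0.0, 1.0, 256)
    A = 1.0 / (1.0 + 50.0 * np.abs(x[:, None] - x[None, :]))
    print(hss_offdiag_ranks(A, block_size=32, tau=1e-6))
```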

2. Review of the multifrontal method and HSS representations. In this section we briefly review the multifrontal method and the HSS representations on which our superfast structured multifrontal method is built.

2.1. Multifrontal method with nested dissection ordering. The central idea of the multifrontal method is to reorganize the overall factorization of a large sparse matrix into partial updates and factorizations of small dense matrices [12, 27]. It has been widely used in numerical methods for PDEs, optimization, fluid dynamics, and other areas.

We take a look at the factorization of an N x N SPD matrix A. The elimination of j-1 variables in block form is

    A \equiv \begin{pmatrix} D_{j-1} & V_{j-1}^T \\ V_{j-1} & A_{j-1} \end{pmatrix}
      = \begin{pmatrix} L_{j-1} & 0 \\ V_{j-1} L_{j-1}^{-T} & I_{N-j+1} \end{pmatrix}
        \begin{pmatrix} L_{j-1}^T & L_{j-1}^{-1} V_{j-1}^T \\ 0 & A_j \end{pmatrix},                 (2.1)

where A_j = A_{j-1} - V_{j-1} D_{j-1}^{-1} V_{j-1}^T is the Schur complement. Cholesky factorizations can be represented by outer-product updates [27]. The matrix V_{j-1} D_{j-1}^{-1} V_{j-1}^T represents the update contributions from the first j-1 rows/columns to A_j. It is clear that V_{j-1} D_{j-1}^{-1} V_{j-1}^T can be written in terms of the first j-1 columns of the Cholesky factor L as

    V_{j-1} D_{j-1}^{-1} V_{j-1}^T = (V_{j-1} L_{j-1}^{-T})(V_{j-1} L_{j-1}^{-T})^T = \sum_{k=1}^{j-1} \tilde U_k,

where \tilde U_k = l_{j:N,k}\, l_{j:N,k}^T is an outer-product update from the k-th column of L.

To consider conveniently how the update contributions are passed, a powerful tool called the elimination tree is used. The following definitions can be found in [26, 27, 32], etc.

Definition 2.1. The elimination tree T(A) of the N x N matrix A is the tree structure with N nodes {1, ..., N} such that node p is the parent of j if and only if p = min{i > j : l_{ij} ≠ 0}.

Definition 2.2. The j-th frontal matrix F_j for A is

    F_j = \begin{pmatrix} a_{jj} & a_{j i_1} & \cdots & a_{j i_p} \\ a_{i_1 j} & & & \\ \vdots & & 0 & \\ a_{i_p j} & & & \end{pmatrix}
          - \sum_{k:\ k \text{ a proper descendant of } j} \begin{pmatrix} l_{jk} \\ l_{i_1 k} \\ \vdots \\ l_{i_p k} \end{pmatrix}
            \begin{pmatrix} l_{jk} & l_{i_1 k} & \cdots & l_{i_p k} \end{pmatrix},                   (2.2)

where j, i_1, ..., i_p are the row subscripts of the nonzeros in L_{:,j}. We call the first matrix on the right-hand side of (2.2) the initial frontal matrix and denote it by F_j^0. The update matrix U_j is the Schur complement obtained from one step of elimination of F_j to get the j-th column of L, that is,

    F_j = \begin{pmatrix} l_{jj} & 0 \\ \begin{pmatrix} l_{i_1 j} \\ \vdots \\ l_{i_p j} \end{pmatrix} & I \end{pmatrix}
          \begin{pmatrix} l_{jj} & \begin{pmatrix} l_{i_1 j} & \cdots & l_{i_p j} \end{pmatrix} \\ 0 & U_j \end{pmatrix}.

The frontal matrix F_j carries contributions from the matrix A and the outer-product contributions from those preceding nodes that are proper descendants of j. For each child c_k of j, the update matrix U_{c_k} carries contributions from the subtree with root c_k.
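
A minimal dense sketch of the relation between a frontal matrix and its update matrix is given below: it eliminates the leading pivot block of a symmetric frontal matrix and returns the corresponding columns of the Cholesky factor together with the Schur complement, which plays the role of the update matrix. It is only illustrative and assumes a generic SPD frontal matrix.

```python
import numpy as np

def eliminate_pivot_block(F, k):
    """One (supernodal) elimination step on a symmetric frontal matrix F:
    factor the leading k x k pivot block and return (L_cols, U), where L_cols
    holds the first k columns of the Cholesky factor of F and U is the
    Schur complement, i.e., the update matrix."""
    F11, F21, F22 = F[:k, :k], F[k:, :k], F[k:, k:]
    L11 = np.linalg.cholesky(F11)
    L21 = np.linalg.solve(L11, F21.T).T          # L21 = F21 * L11^{-T}
    U = F22 - L21 @ L21.T                        # Schur complement / update matrix
    return np.vstack([L11, L21]), U

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((10, 10))
    F = M @ M.T + 10 * np.eye(10)                # SPD stand-in for a frontal matrix
    L_cols, U = eliminate_pivot_block(F, k=4)
    # U equals the Schur complement F22 - F21 F11^{-1} F12.
    print(np.allclose(U, F[4:, 4:] - F[4:, :4] @ np.linalg.inv(F[:4, :4]) @ F[:4, 4:]))
```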

The contributions from all children c_k are assembled together by an operation called extend-add, denoted ⊕ below. The extend-add operator aligns the subscript sets of two matrices by padding zero entries, and then adds the matrices. For example, if two matrices

    U_{c_i} = \begin{pmatrix} a_i & b_i \\ c_i & d_i \end{pmatrix}, \quad i = 1, 2,

have subscript sets {1, 2} and {1, 3}, respectively, then

    U_{c_1} ⊕ U_{c_2} = \begin{pmatrix} a_1 & b_1 & 0 \\ c_1 & d_1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
                      + \begin{pmatrix} a_2 & 0 & b_2 \\ 0 & 0 & 0 \\ c_2 & 0 & d_2 \end{pmatrix}
                      = \begin{pmatrix} a_1 + a_2 & b_1 & b_2 \\ c_1 & d_1 & 0 \\ c_2 & 0 & d_2 \end{pmatrix}.

The relationship between frontal matrices and update matrices is revealed by

    F_j = F_j^0 ⊕ U_{c_1} ⊕ U_{c_2} ⊕ \cdots ⊕ U_{c_q},

where nodes c_1, c_2, ..., c_q are the children of j.

The manipulation of frontal and update matrices is related to the ordering of the nodes. Different orderings can lead to highly different performance of the method. Nested dissection, proposed by George [15], is one of the most attractive methods for finding an elimination ordering when solving sparse systems. Nested dissection orders mesh nodes with the aid of separators. A separator is a small set of nodes whose removal divides the rest of the mesh or submesh into two disjoint pieces. Different levels of separators are used to order the nodes; that is, lower level nodes are ordered before upper level ones. See Figure 2.1(i). Note that here by a level we mean a set of separators in the same level of division, which are either all horizontal or all vertical. It can easily be verified that the total number of levels in nested dissection is l = O(log_2 n). After the nodes are ordered, we compute the Cholesky factorization A = LL^T by Gaussian elimination, which corresponds to the elimination of the mesh nodes (unknowns) from lower levels to upper levels.

Fig. 2.1. (i) Partition with separators; (ii) node connections during the elimination, after eliminating the bottom levels.

Nodes may get connected during the elimination (Figure 2.1(ii)). Consider the matrix A_j as in (2.1) and variables x_k and x_i (j < i, k) associated with nodes k and i. We say x_k and x_i are connected if their corresponding off-diagonal entry in A_j is nonzero. Assume no zeros are created through cancellation during the elimination process. Then the elimination of x_j connects pairwise all x_i (i > j) which were previously connected to x_j [31, 15].

We use a supernodal version of the multifrontal method with the nested dissection ordering to solve the discretized problems. Each separator in nested dissection is considered to be a node in the postordered elimination tree. The separators are put into different levels of the tree (Figure 2.2). During the elimination, the separators are eliminated following the nested dissection ordering of the mesh nodes. The elimination of separator i will connect all its neighbor separators, which, in the 2D mesh, we assume to be p_1, p_2, p_3, p_4 satisfying p_1 < p_2 < p_3 < p_4 (some of them may be empty).
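
A minimal sketch of the extend-add operator is given below; it aligns two update matrices on the union of their index sets by zero padding and adds them, reproducing the small example above with subscript sets {1, 2} and {1, 3}. The function name extend_add is arbitrary.

```python
import numpy as np

def extend_add(U1, idx1, U2, idx2):
    """Extend-add of two update matrices with global index sets idx1 and idx2:
    align both onto the union of the index sets by zero padding, then add."""
    union = sorted(set(idx1) | set(idx2))
    pos = {g: k for k, g in enumerate(union)}
    out = np.zeros((len(union), len(union)))
    for U, idx in ((U1, idx1), (U2, idx2)):
        p = [pos[g] for g in idx]
        out[np.ix_(p, p)] += U
    return out, union

if __name__ == "__main__":
    U1 = np.array([[1.0, 2.0], [3.0, 4.0]])       # a1 b1 / c1 d1, indices {1, 2}
    U2 = np.array([[10.0, 20.0], [30.0, 40.0]])   # a2 b2 / c2 d2, indices {1, 3}
    S, idx = extend_add(U1, [1, 2], U2, [1, 3])
    print(idx)   # [1, 2, 3]
    print(S)     # [[a1+a2, b1, b2], [c1, d1, 0], [c2, 0, d2]]
```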

Fig. 2.2. (i) Ordering separators; (ii) separator tree (nested dissection elimination tree).

Here we say that the separators {i, p_1, p_2, p_3, p_4} form an element, and that i is the pivot separator of this element. Figure 2.3(i) shows an example. Later, when no confusion is caused, we also call i a node of the elimination tree. Here we have two cases, depending on whether i is a bottom level separator in the mesh (a leaf node in the elimination tree) or not.

Fig. 2.3. Examples of elements: (i) leaf node i; (ii) non-leaf node i. The ordering of p_1, p_2, p_3, p_4 may not necessarily follow their order in the elimination tree.

If separator i is a bottom level separator in nested dissection, or a leaf node in the elimination tree (Figure 2.3(i)), the frontal matrix F_i is simply the block form initial frontal matrix F_i^0 associated with node i. The elimination of separator i provides the block column in L corresponding to i, and the Schur complement is the i-th update matrix:

    F_i \equiv F_i^0 = \begin{pmatrix} A_{ii} & A_{iB} \\ A_{Bi} & 0 \end{pmatrix}, \qquad
    U_i = -A_{Bi} A_{ii}^{-1} A_{iB},                                                             (2.3)

where A_{iB} = ( A_{i p_1}  A_{i p_2}  A_{i p_3}  A_{i p_4} ) and A_{Bi} = A_{iB}^T.

If separator i is not a leaf node (Figure 2.3(ii)), we assume it has two children c_1 and c_2. Then F_i is obtained by assembling the initial frontal matrix and the update matrices of its children with extend-add:

    F_i = F_i^0 ⊕ U_{c_1} ⊕ U_{c_2} = \begin{pmatrix} L_{ii} & 0 \\ L_{Bi} & I \end{pmatrix}
          \begin{pmatrix} L_{ii}^T & L_{Bi}^T \\ 0 & U_i \end{pmatrix},                            (2.4)

where B denotes the element boundary {p_1, p_2, p_3, p_4}, and the elimination of separator i yields U_i.

Note that U_{c_1} carries contributions from the subtree rooted at c_1; in other words, it reflects the connections among the nodes in the separators {p_1^L, p_2, p_3^L, i}. Here we say that the separators {c_1, p_1^L, p_2, p_3^L, i} form the left element, and {c_2, p_1^R, p_2, p_3^R, i} form the right element. In some situations separators may have empty neighbors (for example, on mesh boundaries). We say an element is full if no separator is empty. Recursive application of the above procedure to all levels of the separator tree leads to a supernodal multifrontal method.

The multifrontal method is just another way to carry out Gaussian elimination, and it has the same complexity as nested dissection. That is, the number of multiplicative operations required to factorize A in (1.1) is O(n^3), and the number of nonzeros in the Cholesky factor is O(n^2 log_2 n). A stack structure is used for storing the update matrices. If the elimination tree is a full and balanced tree, then the storage required for the update matrix stack is O(n^2).

2.2. Semiseparable structures and low-rank property. In our superfast multifrontal method, frontal matrices and update matrices are represented in HSS forms. A block 4 x 4 HSS matrix with block sizes m_1, m_2, m_3, m_4 looks like

    \begin{pmatrix}
    D_1 & U_1 B_1 V_2^T & U_1 R_1 B_3 W_4^T V_4^T & U_1 R_1 B_3 W_5^T V_5^T \\
    U_2 B_2 V_1^T & D_2 & U_2 R_2 B_3 W_4^T V_4^T & U_2 R_2 B_3 W_5^T V_5^T \\
    U_4 R_4 B_6 W_1^T V_1^T & U_4 R_4 B_6 W_2^T V_2^T & D_4 & U_4 B_4 V_5^T \\
    U_5 R_5 B_6 W_1^T V_1^T & U_5 R_5 B_6 W_2^T V_2^T & U_5 B_5 V_4^T & D_5
    \end{pmatrix},                                                                                  (2.5)

where we use the postordering HSS notation introduced in [8], which is a generalized and also simplified version of the notation in [9, 10]. The matrices D_i, U_i, V_i, R_i, W_i, B_i are called generators. An HSS matrix can be conveniently represented by a binary tree structure, called the HSS tree (Figure 2.4). In an HSS tree, generators are associated with the nodes and edges. We may also say that generators are solely associated with nodes, since edges can be viewed as attached to nodes. We can identify the matrix blocks based on the paths connecting the nodes. For example, the (2, 3) block of (2.5) can be read off from the path connecting nodes 2 and 4 at the bottom level.

Fig. 2.4. HSS tree for (2.5).

The hierarchical structure of HSS matrices can be illustrated by writing (2.5) as

    \begin{pmatrix} D_3 & U_3 B_3 V_6^T \\ U_6 B_6 V_3^T & D_6 \end{pmatrix},

where the new generators for nodes 3 and 6 are defined in terms of the generators associated with their child nodes.
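
A minimal NumPy sketch of this hierarchical relation is given below, with arbitrary random generators of small sizes: it assembles a matrix of the form (2.5) directly from the bottom-level generators, then from the merged generators of nodes 3 and 6 (whose explicit formulas follow in the next paragraph), and checks that the two descriptions agree.

```python
import numpy as np
rng = np.random.default_rng(1)

m, r = 6, 2                      # leaf block size and generator column size (arbitrary)
def rnd(*shape): return rng.standard_normal(shape)

# Bottom-level generators for leaves 1, 2 (children of node 3) and 4, 5 (children of node 6).
D = {i: rnd(m, m) for i in (1, 2, 4, 5)}
U = {i: rnd(m, r) for i in (1, 2, 4, 5)}
V = {i: rnd(m, r) for i in (1, 2, 4, 5)}
R = {i: rnd(r, r) for i in (1, 2, 4, 5)}
W = {i: rnd(r, r) for i in (1, 2, 4, 5)}
B = {i: rnd(r, r) for i in (1, 2, 3, 4, 5, 6)}

# The block 4 x 4 form (2.5), with leaves ordered 1, 2, 4, 5.
H4 = np.block([
  [D[1],                       U[1] @ B[1] @ V[2].T,        U[1] @ R[1] @ B[3] @ W[4].T @ V[4].T, U[1] @ R[1] @ B[3] @ W[5].T @ V[5].T],
  [U[2] @ B[2] @ V[1].T,       D[2],                        U[2] @ R[2] @ B[3] @ W[4].T @ V[4].T, U[2] @ R[2] @ B[3] @ W[5].T @ V[5].T],
  [U[4] @ R[4] @ B[6] @ W[1].T @ V[1].T, U[4] @ R[4] @ B[6] @ W[2].T @ V[2].T, D[4],              U[4] @ B[4] @ V[5].T],
  [U[5] @ R[5] @ B[6] @ W[1].T @ V[1].T, U[5] @ R[5] @ B[6] @ W[2].T @ V[2].T, U[5] @ B[5] @ V[4].T, D[5]],
])

# Merged (upper-level) generators for nodes 3 and 6.
D3 = np.block([[D[1], U[1] @ B[1] @ V[2].T], [U[2] @ B[2] @ V[1].T, D[2]]])
D6 = np.block([[D[4], U[4] @ B[4] @ V[5].T], [U[5] @ B[5] @ V[4].T, D[5]]])
U3 = np.vstack([U[1] @ R[1], U[2] @ R[2]]);  V3 = np.vstack([V[1] @ W[1], V[2] @ W[2]])
U6 = np.vstack([U[4] @ R[4], U[5] @ R[5]]);  V6 = np.vstack([V[4] @ W[4], V[5] @ W[5]])
H2 = np.block([[D3, U3 @ B[3] @ V6.T], [U6 @ B[6] @ V3.T, D6]])

print(np.allclose(H4, H2))       # the two descriptions agree
```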

For example,

    D_3 = \begin{pmatrix} D_1 & U_1 B_1 V_2^T \\ U_2 B_2 V_1^T & D_2 \end{pmatrix}, \qquad
    U_3 = \begin{pmatrix} U_1 R_1 \\ U_2 R_2 \end{pmatrix}, \qquad
    V_3 = \begin{pmatrix} V_1 W_1 \\ V_2 W_2 \end{pmatrix}.

Due to this structure we only store the R_i, W_i, and the bottom level D_i, U_i, and V_i generators. For an HSS matrix with HSS rank p, when p is small and the sizes of all B_i are close to p, we say that the HSS matrix is compact (a more formal definition is given in [8]). For a symmetric HSS matrix we can set U_i = V_i, R_i = W_i, and B_i = B_j^T, where j is the sibling of i. It is shown in [8, 10] that for a compact HSS matrix, linear complexity system solvers exist. Many other HSS matrix operations, such as structure generation, compression, etc., are also very efficient.

3. Superfast multifrontal method. Notice that in the multifrontal method for discretized equations the frontal and update matrices are generally dense because of the mutual connections among mesh nodes (Figure 2.1(ii)). The elimination together with the extend-add operation on such an N x N matrix typically takes O(N^3) floating point operations in exact arithmetic. Here we consider approximations of these dense matrices.

Approximations of dense matrices are feasible in solving linear systems derived from discretizations of certain PDEs such as elliptic equations, as we discover that low-rank properties exist in these problems. It turns out that we can reveal these low-rank properties by carefully organizing the elimination. In fact, if we use the multifrontal method and the nested dissection ordering of the nodes, the frontal and update matrices for those discretized problems (e.g., Model Problem 1.1) have low-rank properties. Therefore, we use low-rank structures to approximate those matrices.

The main idea is to employ HSS operations in the supernodal multifrontal method, due to the hierarchical/parallel tree structures of both HSS matrices and the multifrontal method. For discretized equations the HSS operations are generally complicated. We have developed a series of HSS operations to take advantage of the low-rank properties [8, 10]. Additional HSS operations necessary for our superfast multifrontal method are presented here. With these techniques we are able to reduce the total complexity of solving the problems from O(n^3) to O(pn^2), and the storage from O(n^2 log_2 n) to O(n^2 log_2 p), where p is a parameter related to the problem and to the tolerance for the matrix approximations.

3.1. Off-diagonal numerical ranks. The low-rank property has a certain physical background. For example, the paper [4] considers a 2D physical model consisting of a set of particles with pairwise interactions satisfying Coulomb's law. The authors define well-separated sets of particles to be sets that have strong interactions within sets but weak ones between different sets. When we consider 2D meshes for discretized problems we have a similar situation. In Figure 2.1(ii) the nodes in the current-level element are mutually connected, but their connections differ between separators. When we consider the connections within a separator, we can divide the separator into several pieces (the circles in Figure 3.1(i)): nodes inside each piece have stronger connections, while the connections between any two pieces are considered to be weak. When we consider the connections between separators, we can think of the nodes close to other separators (the circles in Figure 3.1(ii)) as having stronger connections.
Thus, if we order the nodes in certain ways, we can expect the off-diagonal blocks of the frontal/update matrices to have small numerical ranks. For example, we can impose the following rule.

Fig. 3.1. Well-separated sets in separators: (i) connections within separators; (ii) connections between separators.

Rule 3.1. Separators in an element are ordered counterclockwise as shown in Figure 3.2(i). The nodes inside each separator are ordered following the uniform ordering of mesh nodes (left-right and top-down). We may also order the nodes in the boundary separators counterclockwise.

Fig. 3.2. Counterclockwise ordering of separators and the resulting rank pattern of F_i: (i) counterclockwise ordering of separators; (ii) rank pattern of F_i.

Then the corresponding frontal matrix F_i has, say, the pattern shown in Figure 3.2(ii), where dark pieces denote high-rank blocks and the other blocks are low-rank. Variations of the pattern in Figure 3.2(i) can be considered similarly, for example, a transpose of the pattern, or one with certain separators missing (e.g., for boundary elements). Note that after the elimination of separator i we get the update matrix U_i (a Schur complement):

    F_i = \begin{pmatrix} A_{ii} & A_{iB} \\ A_{Bi} & A_{BB} \end{pmatrix}
        = \begin{pmatrix} L_{ii} & 0 \\ L_{Bi} & I \end{pmatrix}
          \begin{pmatrix} L_{ii}^T & L_{Bi}^T \\ 0 & U_i \end{pmatrix}.                              (3.1)

For the frontal/update matrices, we make the following critical rank observations.

Observations. In the supernodal multifrontal method for solving Model Problem 1.1, the frontal/update matrices in (3.1) have the following properties, with appropriate tolerances and block sizes (if applicable):
- The off-diagonal block A_{iB} is a numerically low-rank matrix.
- The off-diagonal blocks of A_{ii} (the leading pivot block of F_i) have small numerical ranks.
- The off-diagonal blocks of U_i have small numerical ranks.

Here a tolerance τ is used in the rank-revealing QR factorization or τ-accurate SVD to compute the numerical ranks. We present some rank results for the multifrontal method applied to Model Problem 1.1, the 5-point discretized Laplacian. The type of off-diagonal blocks is as shown in Figure 1.1. Again, by ranks we mean numerical ranks when not explicitly stated otherwise.

(a) Numerical rank of A_{iB}.

We choose some frontal matrices F_i from problems with different mesh sizes. Each F_i is associated with a complete element in the mesh (no empty separator). We compute the numerical rank of A_{iB} in each F_i (see (3.1)). Table 3.1 shows the rank results for four A_{iB} matrices, which come from mesh sizes n = 127, 255, 511, and 1023, respectively.

Table 3.1. Numerical ranks of some A_{iB} matrices with different absolute tolerances τ (rows: tolerance τ; columns: size of A_{iB} for each mesh size).

(b) Off-diagonal numerical ranks of A_{ii}. Similarly, we test the off-diagonal ranks of the A_{ii} matrices. As an example, we use the frontal matrix corresponding to the top level separator of a mesh; in this situation F_i ≡ A_{ii}, and its order equals the length of the top separator. We choose a fixed block size and make all bottom level HSS off-diagonal blocks have the same row dimension. These off-diagonal blocks are then ordered top-down as in Figure 1.1. Their ranks are computed and plotted against their block numbers. To better demonstrate the rank distribution, we scale the block numbering to lie within [0, 1] so that the plots for different choices of block sizes can be put into one graph. Figure 3.3 shows the ranks of these off-diagonal blocks. When we increase the block sizes, the ranks increase only slightly. This illustrates the low-rank property of these matrices.

Fig. 3.3. Scaled block numbering vs. ranks of the HSS off-diagonal block rows of an A_{ii} with different block row sizes (16, 32, 64, 128, ...): (i) τ = 10^{-6}; (ii) τ = 10^{-4}.

(c) Off-diagonal numerical ranks of the update matrix U_i. Figure 3.4 shows the numerical ranks of some HSS off-diagonal blocks of an update matrix U_i of dimension N for one of the test mesh sizes n. Similar situations hold for the frontal matrices, although their off-diagonal ranks are slightly larger; see [34]. For larger mesh sizes the low-rank property is more significant, as the numerical ranks do not increase much when the matrix dimensions increase. Also, larger tolerances lead to smaller numerical ranks. The low-rank property also exists in many other discretized problems, such as certain integral equations.
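
A minimal NumPy/SciPy experiment illustrating these observations, far smaller than the tests reported above, is given below: it forms the Schur complement of the 5-point Laplacian onto a middle separator column by eliminating all remaining nodes at once, and prints the numerical ranks of its bottom-level off-diagonal block rows for an absolute tolerance τ. The mesh size, block size, and tolerance are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_2d(n):
    """5-point Laplacian (Model Problem 1.1) on an n x n grid."""
    T1 = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T1) + sp.kron(T1, I)).tocsc()

def numerical_rank(M, tau):
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tau))

n = 47
A = laplacian_2d(n).toarray()
# Use the middle mesh column as a separator and eliminate all other nodes;
# the Schur complement S on the separator plays the role of a dense block
# whose off-diagonal ranks can be inspected.
sep = [row * n + n // 2 for row in range(n)]
rest = sorted(set(range(n * n)) - set(sep))
Ass, Asr, Arr = A[np.ix_(sep, sep)], A[np.ix_(sep, rest)], A[np.ix_(rest, rest)]
S = Ass - Asr @ np.linalg.solve(Arr, Asr.T)

tau, bs = 1e-6, 12
for b, start in enumerate(range(0, len(sep), bs)):
    stop = min(start + bs, len(sep))
    off = np.hstack([S[start:stop, :start], S[start:stop, stop:]])
    print("block", b, "size", stop - start, "numerical rank", numerical_rank(off, tau))
```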

Fig. 3.4. Ranks of the HSS off-diagonal blocks of a U_i with different block row sizes (16, 32, 64, 128, ...): (i) τ = 10^{-6}; (ii) τ = 10^{-4}.

3.2. Overview of structured extend-add and factorization. In Subsection 3.1 we have seen that the frontal/update matrices in the multifrontal method for some discretized problems have the low-rank property. We can take advantage of this property and replace the traditional Cholesky factorization and extend-add by a structured factorization and a structured extend-add. We will use low-rank structures (HSS) to represent the matrices in the factorization and extend-add. Accordingly, we should pay special attention to the operations on the elements and nodes in the mesh, because any major operation will likely affect the matrix structures.

Figure 3.5 shows two elements, each with a separator i, its four neighbors p_1, ..., p_4 in upper levels, and its children c_1 and c_2. Element (ii) can be viewed as a transpose of (i); thus it is sufficient to consider (i).

Fig. 3.5. Extend-add for one element: (i) left-right orientation; (ii) top-down orientation. Element (ii) can be viewed as a transpose of (i).

By the rule for ordering separators in Figure 3.2, we obtain the frontal matrix F_i from the extend-add operation (2.4):

    F_i = F_i^0 ⊕ U_{c_1} ⊕ U_{c_2} \equiv F_i^0 + \hat U_{c_1} + \hat U_{c_2},                      (3.2)

where the matrices take the nonzero patterns shown in Figure 3.6. There are three key points about how the separators in the child elements appear in their parent element during the extend-add.

Firstly, in the element box for the child c_1 (the left element), the neighbors corresponding to U_{c_1} follow the order {p_2, p_1^L, i, p_3^L} according to Rule 3.1, while in the element box for i (one level up), the separators corresponding to F_i follow the order {i, p_1, p_2, p_3, p_4}. That is, the blocks in U_{c_1} and U_{c_2} do not follow the natural order of those of F_i. Thus certain permutations of U_{c_1} and U_{c_2} are needed to align their blocks with the blocks of F_i.

Fig. 3.6. Matrices in the extend-add (3.2). The +-shaped bars represent the overlap in separator p_1 (between p_1^L and p_1^R).

Secondly, separators in one child element may not appear in the other. For example, p_2 appears in the left element but not in the right one. Thus we need to pad some zero blocks into U_{c_1} and U_{c_2}. We then need the following permutation-extension.

Rule 3.2. Left permutation-extension to obtain \hat U_{c_1}: if U_{c_1} is partitioned into four blocks conformally according to the separators {p_2, p_1^L, i, p_3^L} in this order, exchange p_2 and i to get {i, p_1^L, p_2, p_3^L}. Then insert appropriate zero blocks to get {i, {p_1^L, 0}, p_2, {p_3^L, 0}, 0}, where the blocks have sizes consistent with {i, p_1, p_2, p_3, p_4}. Similarly we can define a right permutation-extension to obtain \hat U_{c_2}.

Lastly, the child elements share parts of the separators. Here the left and right elements share the entire separator i, which becomes the pivot separator in the upper element. The union of separator p_1^L in the left element and p_1^R in the right one is separator p_1, but there are overlaps between p_1^L and p_1^R. A similar situation happens to p_3^L and p_3^R. During the extend-add the overlaps may not always correspond to an entire HSS block row/column, so certain blocks may need to be split and merged with other blocks. The techniques in Subsection 3.3.2 will be used. Figure 3.6 also shows the overlap situation for separator p_1, marked by the +-shaped bars.

All these operations are done with HSS structures. After F_i is formed, we eliminate the pivot separator i. This elimination and the computation of the Schur complement are also done with HSS matrices, using the algorithms in [8]. Thus the update matrix again remains structured. Before we go into the details of those HSS operations, we give an overview of the algorithm.

Algorithm 3.3. Multifrontal method with structured factorizations.
1. Use nested dissection to order the nodes. Build the postordered elimination tree of separators.
2. Do traditional Cholesky factorizations and extend-adds at certain bottom levels.
3. Do structured factorizations and extend-adds at the upper levels:
   (a) At the switching level, construct HSS representations for the update matrices and do structured extend-add.
   (b) At later levels, factorize the pivot blocks of the structured frontal matrices to obtain structured update matrices. Do structured extend-add.

3.3. Superfast multifrontal method with HSS structures. According to the previous two subsections, when Model Problem 1.1 is solved with the multifrontal method, the update matrices and frontal matrices can be represented by compact HSS matrices. The elimination of a separator of order N can be done by the generalized HSS Cholesky factorization in [8], which has complexity O(p^2 N), where p is an appropriate HSS rank.

The computations of the Schur complements (update matrices) are also done efficiently with HSS matrices; the details can be found in [8]. In this section we discuss some new HSS algorithms on which the structured HSS extend-add is built.

3.3.1. Permuting HSS blocks. It is convenient to permute a general HSS matrix by permuting its HSS tree. In many applications, such as our superfast multifrontal method, we can find the new HSS form for the permuted matrix by updating just a few generators. For example, consider permuting two neighboring block rows/columns at a certain level of the HSS matrix. This corresponds to the permutation of two neighboring HSS subtrees (subtrees whose roots are siblings); see Figure 3.7.

Fig. 3.7. Permuting two subtrees.

Thus we can simply exchange the generators associated with i and j and all their children, without updating the matrices. The new matrix is still in HSS form. However, if the two subtrees are not neighbors, we have different options. One way is to write the new matrix as a sum of certain HSS matrices and then compress it with the techniques in [8], but this is usually expensive. A more efficient way is to identify the path connecting the two subtrees and to update the matrices associated with the nodes in the path and those connected to the path. As this varies for different situations, we come back to it in Subsection 3.3.4, where the permutations are specifically designed for our superfast multifrontal method.

3.3.2. Merging and splitting HSS blocks.

Elementary merging and splitting. The hierarchical structure of HSS matrices makes it very convenient to merge and split blocks in many situations. For example, in Figure 3.8 we can merge the leaf nodes c_1 and c_2 in (i) and update node i by setting

    U_i = \begin{pmatrix} U_{c_1} R_{c_1} \\ U_{c_2} R_{c_2} \end{pmatrix}, \qquad
    V_i = \begin{pmatrix} V_{c_1} W_{c_1} \\ V_{c_2} W_{c_2} \end{pmatrix}, \qquad
    D_i = \begin{pmatrix} D_{c_1} & U_{c_1} B_{c_1} V_{c_2}^T \\ U_{c_2} B_{c_2} V_{c_1}^T & D_{c_2} \end{pmatrix}.                (3.3)

Fig. 3.8. Merging and splitting: (i) node i with children c_1, c_2; (ii) node i.
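
A minimal NumPy sketch of the elementary merging step (3.3) is given below, with arbitrary generator sizes; it only builds the parent generators D_i, U_i, V_i from two child leaves. The reverse operation, splitting, is discussed next.

```python
import numpy as np

def merge_children(gen_c1, gen_c2, B_c1, B_c2):
    """Merge two sibling HSS leaves c1, c2 into their parent i (cf. (3.3)).
    Each gen_ck is a dict with keys D, U, V, R, W."""
    D_i = np.block([
        [gen_c1["D"],                         gen_c1["U"] @ B_c1 @ gen_c2["V"].T],
        [gen_c2["U"] @ B_c2 @ gen_c1["V"].T,  gen_c2["D"]],
    ])
    U_i = np.vstack([gen_c1["U"] @ gen_c1["R"], gen_c2["U"] @ gen_c2["R"]])
    V_i = np.vstack([gen_c1["V"] @ gen_c1["W"], gen_c2["V"] @ gen_c2["W"]])
    return D_i, U_i, V_i

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    mk = lambda m, r: {"D": rng.standard_normal((m, m)), "U": rng.standard_normal((m, r)),
                       "V": rng.standard_normal((m, r)), "R": rng.standard_normal((r, r)),
                       "W": rng.standard_normal((r, r))}
    c1, c2 = mk(5, 2), mk(5, 2)
    B_c1, B_c2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
    D_i, U_i, V_i = merge_children(c1, c2, B_c1, B_c2)
    print(D_i.shape, U_i.shape, V_i.shape)   # (10, 10) (10, 2) (10, 2)
```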

On the other hand, if we want to split node i in (i) into c_1 and c_2, we need to find D_{c_k}, U_{c_k}, V_{c_k}, R_{c_k}, W_{c_k}, B_{c_k}, k = 1, 2, such that (3.3) is satisfied. First, we partition U_i, V_i, D_i conformally:

    U_i = \begin{pmatrix} U_{i;1} \\ U_{i;2} \end{pmatrix}, \qquad
    V_i = \begin{pmatrix} V_{i;1} \\ V_{i;2} \end{pmatrix}, \qquad
    D_i = \begin{pmatrix} D_{c_1} & D_{i;1,2} \\ D_{i;2,1} & D_{c_2} \end{pmatrix}.                   (3.4)

Then compute QR factorizations

    \begin{pmatrix} U_{i;1} & D_{i;1,2} \end{pmatrix} = U_{c_1} \begin{pmatrix} R_{c_1} & T_1 \end{pmatrix}, \qquad
    \begin{pmatrix} T_1^T & V_{i;2} \end{pmatrix} = V_{c_2} \begin{pmatrix} B_{c_1}^T & W_{c_2} \end{pmatrix},                      (3.5)

    \begin{pmatrix} U_{i;2} & D_{i;2,1} \end{pmatrix} = U_{c_2} \begin{pmatrix} R_{c_2} & T_2 \end{pmatrix}, \qquad
    \begin{pmatrix} T_2^T & V_{i;1} \end{pmatrix} = V_{c_1} \begin{pmatrix} B_{c_2}^T & W_{c_1} \end{pmatrix}.                      (3.6)

Equations (3.4)-(3.6) provide all the necessary new generators.

Advanced merging and splitting. There are more complicated situations that are very useful. Sometimes we need to maintain the tree structure during merging or splitting. For an N x N HSS matrix H, we consider splitting a piece from a leaf node i of the HSS tree of H and then merging that piece with a neighbor j. We consider a general situation where i and j are not siblings. Without loss of generality we make two simplifications: one is that the matrix is symmetric; the other is that block j is an empty or zero block whose size is to be set by the splitting. It suffices to look at the example in Figure 3.9, where i = 5 and j = 10.

Fig. 3.9. Splitting a block (node 5). c_1 and c_2 are virtual nodes.

We first split node i into two child blocks c_1 and c_2 by the techniques above; that is, we introduce two virtual nodes into the tree. We then move node c_2 to the position of node j, and after that we merge c_1 into i. Identify the path p(c_2, 10) connecting node c_2 and node 10 in the HSS tree.

We observe that in order to get a new HSS representation for H, denoted H̃, all the generators associated with the nodes in this path, and those directly connected to it, should be updated. Here we use tilded notation for the generators of H̃, with c_1 not yet merged into i. We call the subtrees rooted at nodes 9 and 14 the left and right subtrees, respectively. To get the HSS tree for H̃, we consider the following connections (here by a connection we mean the product of the generators associated with the edges connecting two nodes):
- The connection between node 15 and c_2 should be transferred to the connection between 15 and 10.
- The connection between node c_2 and each node k (= 1, 2, 3, 4, c_1) that is directly connected to the path p(c_2, 10) should be transferred to the connection between node 10 and k.
- The connection between any two nodes k_1, k_2 (= 1, 2, 3, 4, c_1) that are directly connected to the path p(c_2, 10) should remain the same.
- The connections between node 15 and the nodes k (= 1, 2, 3, 4, c_1) that are directly connected to the path p(c_2, 10) should remain the same.

All these connection changes can be reflected by appropriate products of generators associated with the nodes; see [34] for the details. We can assemble all the matrix products into one single block equation, (3.7), whose rows correspond to the nodes 1, 2, 3, 4, c_1 (plus a row for the transfer between node 15 and node 10). The left-hand side of (3.7) collects the products of the new (tilded) generators; for example, the last row, for node c_1, is

    \tilde R_{c_1} \begin{pmatrix} \tilde B_4 & \tilde R_5 \tilde B_3 & \tilde R_5 \tilde R_6 \tilde B_2 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde B_1 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde R_8 \tilde R_9 \tilde B_9 \tilde U_{14}^T \end{pmatrix},

and the right-hand side collects the corresponding products of the original generators R_1, ..., R_9, R_{c_2}, B_1, ..., B_4, B_{c_1}, and U_{c_2}, where \tilde U_{14} is built from \tilde U_{10} and \tilde R_{10}, \tilde R_{11}, \tilde R_{12}, \tilde R_{13} and collects the connections in the entire right subtree. Equation (3.7) is partitioned into four nonzero blocks corresponding to the above four types of connection changes.

It turns out that we can construct the generators on the left-hand side sequentially by using (3.7), as follows. First consider the last row in (3.7). Compute a QR factorization of its right-hand side, so that

    \tilde R_{c_1} \begin{pmatrix} \tilde B_4 & \tilde R_5 \tilde B_3 & \tilde R_5 \tilde R_6 \tilde B_2 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde B_1 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde R_8 \tilde R_9 \tilde B_9 \tilde U_{14}^T \end{pmatrix} = Q_1 T_1.

Then we partition T_1 = ( T_{1;1}  T_{1;2} ) such that T_{1;1} has the same column dimension as \tilde B_4. Thus we can let \tilde R_{c_1} = Q_1, \tilde B_4 = T_{1;1}, and

    \begin{pmatrix} \tilde R_5 \tilde B_3 & \tilde R_5 \tilde R_6 \tilde B_2 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde B_1 & \tilde R_5 \tilde R_6 \tilde R_7 \tilde R_8 \tilde R_9 \tilde B_9 \tilde U_{14}^T \end{pmatrix} = T_{1;2}.        (3.8)

In this way we remove one layer from the last row. Next, combine (3.8) with the fourth row of (3.7): the two rows share the common right factor ( \tilde B_3  \tilde R_6 \tilde B_2  \tilde R_6 \tilde R_7 \tilde B_1  \tilde R_6 \tilde R_7 \tilde R_8 \tilde R_9 \tilde B_9 \tilde U_{14}^T ), premultiplied by the stacked block ( \tilde R_4 ; \tilde R_5 ), and the stacked right-hand side consists of the fourth-row right-hand side of (3.7) (which involves B_4, R_{c_2}, and U_{c_2}) on top of T_{1;2}.

Again compute a QR factorization Q_2 T_2 of the right-hand side. Then partition Q_2 = ( Q_{2;1} ; Q_{2;2} ) such that Q_{2;1} has the same row size as \tilde B_4, and partition T_2 = ( T_{2;1}  T_{2;2} ) such that T_{2;1} has the same number of columns as \tilde B_3. We can set \tilde R_4 = Q_{2;1}, \tilde R_5 = Q_{2;2}, \tilde B_3 = T_{2;1}, and

    \begin{pmatrix} \tilde R_6 \tilde B_2 & \tilde R_6 \tilde R_7 \tilde B_1 & \tilde R_6 \tilde R_7 \tilde R_8 \tilde R_9 \tilde B_9 \tilde U_{14}^T \end{pmatrix} = T_{2;2}.        (3.9)

Now we can combine (3.9) with the third row of (3.7), and the above procedure repeats. Finally, it is trivial to merge node c_1 into i. The overall process costs O(N) flops.

3.3.3. Inserting and deleting HSS blocks. Sometimes we need to insert a block row/column into an HSS matrix, or remove a block row/column from it. Deleting a block is usually straightforward. For simplicity, in this subsection we consider symmetric HSS matrices. As an example, in Figure 3.10(i), if we remove node 1, we remove any generators associated with it and merge node 2 into 3 by setting U_3 = U_2 R_2.

To insert a block row/column into an HSS matrix, the result depends on the desired HSS structure. For example, suppose Figure 3.10(i) is the HSS tree for an N x N matrix H, and we insert a new node i between nodes 2 and 4 to get a new matrix H̃ with the tree structure in Figure 3.10(ii) or (iii).

Fig. 3.10. Inserting a node: (i) HSS tree example; (ii) inserting a node i; (iii) another way to insert i. Dark nodes and edges should be updated or created.

We only need to update a few generators, shown in dark in Figure 3.10. Again, we can consider the connection changes among the nodes and use QR factorizations to find the new generators. The details are similar to those in the previous subsection and can be found in [34]. Note that in Figure 3.10 the HSS tree can be more general, and the node i to be inserted can itself represent another HSS tree. In all cases, we only need to update a few generators to get the new matrix. Thus the total cost is no more than O(N) flops.

3.3.4. HSS extend-add. Recall from Section 3.2 the discussion of the structured extend-add with regard to an element and its two child elements, as shown in Figure 3.5 or also Figure 3.11. The frontal and update matrices in the extend-add

    F_i = F_i^0 ⊕ U_{c_1} ⊕ U_{c_2} \equiv F_i^0 + \hat U_{c_1} + \hat U_{c_2}

have the relationship shown in Figure 3.6. Here we consider using HSS structures in the extend-add; that is, all matrices are in HSS form. The block sizes of the HSS matrices follow this rule:

Fig. 3.11. Extend-add for one element. Separators in the solid-line ovals belong to the update matrix U_{c_1}, and those in the dashed-line ovals belong to the update matrix U_{c_2}.

Rule 3.4. In the multifrontal method with HSS structures we keep the HSS block sizes approximately the same, and avoid blocks that are too large or too small. Extend-add operations accumulate the numbers of blocks.

The reason for this is that, when the sizes of the frontal/update matrices increase, their HSS ranks increase very slowly. We can achieve high efficiency by setting the block sizes close to the HSS ranks, as seen in various HSS operations [10, 8].

The general HSS extend-add procedure is as follows. Assume that before the extend-add the frontal matrices F_{c_1} and F_{c_2} for the left and right elements, respectively, are already in HSS form. These HSS forms can come recursively from lower level separators, or from simple constructions at the starting level of the structured factorizations. For simplicity, assume each separator is represented by one leaf of the HSS tree, although each leaf can potentially be a subtree. There are then five leaf nodes in the HSS tree. As we use HSS trees with full siblings, we naturally use trees as shown in the first row of Figure 3.13; a tree like that has the minimum depth among all binary trees with full siblings. The two trees in the first row of Figure 3.13 are for F_{c_1} and F_{c_2}, respectively, with their leaf nodes marked by the separators in Figure 3.11. Those graphs also show the eliminations of c_1 and c_2 to get \hat U_{c_1} and \hat U_{c_2}, respectively. Separators in the figure are also marked with group numbers {0}, {1}, etc.

Typically, there are five steps for an extend-add operation to advance from the level of c_1 and c_2 to the level of i. For convenience, we also include the factorization of the frontal matrices here, at Step 0.
0. Eliminate c_1 and c_2 and get the update matrices U_{c_1} and U_{c_2} in HSS form.
1. Permute the trees for the update matrices as in Rule 3.2.
2. Insert appropriate zero nodes into the HSS trees of U_{c_1} and U_{c_2} to get \hat U_{c_1} and \hat U_{c_2}, respectively.
3. Split HSS blocks in \hat U_{c_1} and \hat U_{c_2} to handle the overlaps in separators. Update \hat U_{c_1} and \hat U_{c_2}.
4. Write the initial frontal matrix F_i^0 in HSS form based on the tree structure of U_{c_1} ⊕ U_{c_2} ≡ \hat U_{c_1} + \hat U_{c_2}.
5. Get F_i by adding the HSS matrices F_i^0, \hat U_{c_1}, \hat U_{c_2}. Compress F_i when necessary.

Step 0 can be done by applying the generalized HSS Cholesky factorization in [8] to only the leading principal blocks of F_{c_j} corresponding to separator c_j (group {0} in Figure 3.13), for j = 1, 2. When separator c_j is removed, all the remaining nodes in the HSS tree of F_{c_j} are updated using the Schur complement computation procedure of the HSS form traditional Cholesky factorization in [8]. The Schur complement corresponding to the updated tree is the update matrix U_{c_j}.

Step 1 is to permute some branches of the HSS tree of U_{c_j}. Note that even if the HSS tree has more levels, we still only need to change a few top level nodes, because we only need to permute the four separators.

This means the cost of the permutations in the superfast multifrontal method is O(1), even if the update matrix has dimension N. Specifically, to permute U_{c_1} (see Figure 3.13, left column; converting row 1 to row 2), we exchange group p_2{1} and i{3}, as discussed in Rule 3.2. We can set the new generators (in tilded notation) of the permuted tree to be

    \tilde B_5 = B_4, \quad \tilde R_8 = B_7 R_6, \quad \tilde B_2 = R_2 R_3 B_7, \quad \tilde R_2 = R_2 B_3, \quad \tilde B_3 = I.

Note that although these formulas look specific, they are sufficient for the general extend-add in the superfast multifrontal method. Similarly, to permute U_{c_2} (see Figure 3.13, right column; converting row 1 to row 2), we permute group p_4{3} and p_3^R{4}, as discussed in Rule 3.2. The new generators of the permuted tree should satisfy \tilde B_2 = R_2 R_3 R_4 and \tilde B_8 = B_7 R_6 R_5, and the remaining new generators (involving \tilde R_2, \tilde R_4, \tilde R_5, \tilde R_8, and \tilde B_3) satisfy a small matrix equation whose right-hand side is formed from products of R_2, R_3, R_4, R_5, R_6, B_4, and B_7; an SVD of the right-hand side of that equation provides all the generators on its left-hand side.

At Step 2 we insert zero blocks into the permuted update matrices to get \hat U_{c_1} and \hat U_{c_2}. For the left element, a zero block node is attached to the matrix tree; see Figure 3.13, left column, row 3. This operation is trivial in that only certain zero generators need to be added. For the right element, a zero node is inserted between group p_1^R{2} and p_3^R{4}; this was already discussed in Subsection 3.3.3.

After we get the HSS trees as shown in the last row of Figure 3.13, we use Step 3 to handle the overlaps so that \hat U_{c_1} and \hat U_{c_2} have the same HSS tree structure. As discussed in Section 3.2, overlaps occur in separators p_1 and p_3. As an example, the overlap p_1^L ∩ p_1^R may correspond to entire HSS blocks in both \hat U_{c_1} and \hat U_{c_2}, or, more likely, to parts of HSS blocks. For the latter case (Figure 3.12(i)) we need to cut p_1^L ∩ p_1^R from either p_1^L, at its right end, or from p_1^R, at its left end. As the separator p_1^L is followed by a zero block on the right in the HSS tree of \hat U_{c_1}, we can split the overlap from p_1^L and merge it into the zero block on the right; or we can handle p_1^R similarly. We use the splitting procedure of Subsection 3.3.2. Now \hat U_{c_1} and \hat U_{c_2} have the same HSS tree structure (Figure 3.12(ii)), which becomes the structure of \hat U_{c_1} + \hat U_{c_2} and also of F_i.

Then at Step 4 we convert the initial frontal matrix F_i^0 into an HSS form following the HSS structure of \hat U_{c_1} + \hat U_{c_2}. Note that since F_i^0 is often very sparse (because the original matrix A is), we are generally able to develop its HSS form symbolically in advance. For example, for Model Problem 1.1 with the nested dissection ordering, F_i^0 in (2.3) has a sparsity pattern where A_{ii} is tridiagonal, A_{i p_2} = 0, A_{i p_4} = 0, and each of A_{i p_1} and A_{i p_3} has only one nonzero entry. Such a matrix has HSS rank 2.

Now at Step 5 we are ready to get F_i by computing the HSS sum of F_i^0, \hat U_{c_1}, and \hat U_{c_2} with the formulas in [8]. The sizes of the generators of F_i increase quickly after this addition, although the actual HSS rank of F_i does not. Thus the HSS addition is usually followed by a compression step [8]. Now F_i is in compact HSS form and we can continue the factorization along the elimination tree.

3.4. Algorithm and performance. Based on the previous discussions, we now present the main algorithm for our superfast multifrontal method with HSS structures. Before that, we first clarify a few implementation issues for the HSS operations.

Fig. 3.12. (i) HSS partitions of p_1^L and p_1^R, showing the overlap p_1^L ∩ p_1^R (the dark band in the oval) and how it builds into p_1 = p_1^L ∪ p_1^R; (ii) HSS tree structure of F_i.

3.4.1. Implementation issues. One issue concerns the HSS factorization of the frontal matrices. The techniques of the traditional Cholesky factorization of an SPD HSS matrix in [8] can be used to compute the Schur complement U_i. This computation is efficient, since it corresponds to the removal of one single HSS tree node, the root of the HSS branch corresponding to separator i.

The second issue is related to the mesh boundary. For convenience, we can assume that the mesh boundary corresponds to empty separators. We may then have empty branches in the HSS trees. An empty branch is represented by one empty node, which is not associated with any actual separator or operation. Empty nodes do not accumulate or change.

Another issue is to pre-determine the HSS structures before the actual factorizations. Similar to the traditional multifrontal method, we can have a symbolic factorization stage after the nested dissection to decide in advance the HSS structures of the frontal/update matrices at all levels of the elimination.

Finally, we usually avoid HSS block sizes that are too large or too small. Because of this, at the bottom levels of the elimination we do not use HSS factorizations; instead, we use the traditional multifrontal method. Then, starting from a certain level l_s, we switch to HSS factorizations. We call this l_s the switching level.

3.4.2. Factorization algorithm. Our superfast multifrontal method with HSS structures can be summarized in an algorithm.

Algorithm 3.5. Superfast multifrontal method with HSS structures.
1. Use nested dissection to order the nodes in the n x n mesh. Build the elimination tree with the separator ordering. Assume the total number of separators is s and the total number of levels is l = log_2 n.
2. Decide l_s, the switching level from traditional Cholesky factorizations to structured factorizations (see Theorem 3.6 below).
3. For separator i = 1, ..., s:
   (a) If separator i is at a level l_i > l_s, do a traditional Cholesky factorization and extend-add.
      i. If i is a leaf node in the elimination tree, obtain the frontal matrix F_i from A and compute U_i as in (2.3). Push U_i onto an update


More information

Approaches to Parallel Implementation of the BDDC Method

Approaches to Parallel Implementation of the BDDC Method Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Performance Evaluation of a New Parallel Preconditioner

Performance Evaluation of a New Parallel Preconditioner Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller Marco Zagha School of Computer Science Carnegie Mellon University 5 Forbes Avenue Pittsburgh PA 15213 Abstract The

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

Scanning Real World Objects without Worries 3D Reconstruction

Scanning Real World Objects without Worries 3D Reconstruction Scanning Real World Objects without Worries 3D Reconstruction 1. Overview Feng Li 308262 Kuan Tian 308263 This document is written for the 3D reconstruction part in the course Scanning real world objects

More information

Downloaded 04/17/18 to Redistribution subject to SIAM license or copyright; see

Downloaded 04/17/18 to Redistribution subject to SIAM license or copyright; see SIAM J. MATRIX ANAL. APPL. Vol. 37, No. 1, pp. 338 353 c 2016 Society for Industrial and Applied Mathematics A FAST MEMORY EFFICIENT CONSTRUCTION ALGORITHM FOR HIERARCHICALLY SEMI-SEPARABLE REPRESENTATIONS

More information

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY

BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY BACKGROUND: A BRIEF INTRODUCTION TO GRAPH THEORY General definitions; Representations; Graph Traversals; Topological sort; Graphs definitions & representations Graph theory is a fundamental tool in sparse

More information

Chapter 4. Matrix and Vector Operations

Chapter 4. Matrix and Vector Operations 1 Scope of the Chapter Chapter 4 This chapter provides procedures for matrix and vector operations. This chapter (and Chapters 5 and 6) can handle general matrices, matrices with special structure and

More information

Lecture 3: Camera Calibration, DLT, SVD

Lecture 3: Camera Calibration, DLT, SVD Computer Vision Lecture 3 23--28 Lecture 3: Camera Calibration, DL, SVD he Inner Parameters In this section we will introduce the inner parameters of the cameras Recall from the camera equations λx = P

More information

On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme

On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme Emmanuel Agullo 1,6,3,7, Abdou Guermouche 5,4,8, and Jean-Yves L Excellent 2,6,3,9 1 École Normale Supérieure de

More information

Least-Squares Fitting of Data with B-Spline Curves

Least-Squares Fitting of Data with B-Spline Curves Least-Squares Fitting of Data with B-Spline Curves David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

2D/3D Geometric Transformations and Scene Graphs

2D/3D Geometric Transformations and Scene Graphs 2D/3D Geometric Transformations and Scene Graphs Week 4 Acknowledgement: The course slides are adapted from the slides prepared by Steve Marschner of Cornell University 1 A little quick math background

More information

Lecture 3: Art Gallery Problems and Polygon Triangulation

Lecture 3: Art Gallery Problems and Polygon Triangulation EECS 396/496: Computational Geometry Fall 2017 Lecture 3: Art Gallery Problems and Polygon Triangulation Lecturer: Huck Bennett In this lecture, we study the problem of guarding an art gallery (specified

More information

Organizing Spatial Data

Organizing Spatial Data Organizing Spatial Data Spatial data records include a sense of location as an attribute. Typically location is represented by coordinate data (in 2D or 3D). 1 If we are to search spatial data using the

More information

Null space computation of sparse singular matrices with MUMPS

Null space computation of sparse singular matrices with MUMPS Null space computation of sparse singular matrices with MUMPS Xavier Vasseur (CERFACS) In collaboration with Patrick Amestoy (INPT-IRIT, University of Toulouse and ENSEEIHT), Serge Gratton (INPT-IRIT,

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

SCALABLE ALGORITHMS for solving large sparse linear systems of equations

SCALABLE ALGORITHMS for solving large sparse linear systems of equations SCALABLE ALGORITHMS for solving large sparse linear systems of equations CONTENTS Sparse direct solvers (multifrontal) Substructuring methods (hybrid solvers) Jacko Koster, Bergen Center for Computational

More information

Hypercubes. (Chapter Nine)

Hypercubes. (Chapter Nine) Hypercubes (Chapter Nine) Mesh Shortcomings: Due to its simplicity and regular structure, the mesh is attractive, both theoretically and practically. A problem with the mesh is that movement of data is

More information

Homework # 1 Due: Feb 23. Multicore Programming: An Introduction

Homework # 1 Due: Feb 23. Multicore Programming: An Introduction C O N D I T I O N S C O N D I T I O N S Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.86: Parallel Computing Spring 21, Agarwal Handout #5 Homework #

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression Xiaoye Sherry Li, xsli@lbl.gov Liza Rebrova, Gustavo Chavez, Yang Liu, Pieter Ghysels Lawrence Berkeley National

More information

Journal of Computational Physics

Journal of Computational Physics Journal of Computational Physics 23 (22) 34 338 Contents lists available at SciVerse ScienceDirect Journal of Computational Physics journal homepage: www.elsevier.com/locate/jcp A fast direct solver for

More information

Lecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions

Lecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions U.C. Berkeley CS70 : Algorithms Midterm 2 Solutions Lecturers: Sanjam Garg and Prasad aghavra March 20, 207 Midterm 2 Solutions. (0 points) True/False Clearly put your answers in the answer box in front

More information

Mathematics. 2.1 Introduction: Graphs as Matrices Adjacency Matrix: Undirected Graphs, Directed Graphs, Weighted Graphs CHAPTER 2

Mathematics. 2.1 Introduction: Graphs as Matrices Adjacency Matrix: Undirected Graphs, Directed Graphs, Weighted Graphs CHAPTER 2 CHAPTER Mathematics 8 9 10 11 1 1 1 1 1 1 18 19 0 1.1 Introduction: Graphs as Matrices This chapter describes the mathematics in the GraphBLAS standard. The GraphBLAS define a narrow set of mathematical

More information

Methods of solving sparse linear systems. Soldatenko Oleg SPbSU, Department of Computational Physics

Methods of solving sparse linear systems. Soldatenko Oleg SPbSU, Department of Computational Physics Methods of solving sparse linear systems. Soldatenko Oleg SPbSU, Department of Computational Physics Outline Introduction Sherman-Morrison formula Woodbury formula Indexed storage of sparse matrices Types

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations

Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations Load Balancing for Problems with Good Bisectors, and Applications in Finite Element Simulations Stefan Bischof, Ralf Ebner, and Thomas Erlebach Institut für Informatik Technische Universität München D-80290

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method

The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method The Design and Implementation Of A New Out-of-Core Sparse Cholesky Factorization Method VLADIMIR ROTKIN and SIVAN TOLEDO Tel-Aviv University We describe a new out-of-core sparse Cholesky factorization

More information

Discrete mathematics

Discrete mathematics Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many

More information

CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus

CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus Tim Davis last modified September 23, 2014 1 Catalog Description CSCE 689. Special

More information

Mathematics and Computer Science

Mathematics and Computer Science Technical Report TR-2006-010 Revisiting hypergraph models for sparse matrix decomposition by Cevdet Aykanat, Bora Ucar Mathematics and Computer Science EMORY UNIVERSITY REVISITING HYPERGRAPH MODELS FOR

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear

More information

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION DESIGN AND ANALYSIS OF ALGORITHMS Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION http://milanvachhani.blogspot.in EXAMPLES FROM THE SORTING WORLD Sorting provides a good set of examples for analyzing

More information

I. INTRODUCTION. 2 matrix, integral-equation-based methods, matrix inversion.

I. INTRODUCTION. 2 matrix, integral-equation-based methods, matrix inversion. 2404 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 59, NO. 10, OCTOBER 2011 Dense Matrix Inversion of Linear Complexity for Integral-Equation-Based Large-Scale 3-D Capacitance Extraction Wenwen

More information

Framework for Design of Dynamic Programming Algorithms

Framework for Design of Dynamic Programming Algorithms CSE 441T/541T Advanced Algorithms September 22, 2010 Framework for Design of Dynamic Programming Algorithms Dynamic programming algorithms for combinatorial optimization generalize the strategy we studied

More information

High Performance Computing: Tools and Applications

High Performance Computing: Tools and Applications High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:

More information

Sparse Linear Systems

Sparse Linear Systems 1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite

More information

THE preceding chapters were all devoted to the analysis of images and signals which

THE preceding chapters were all devoted to the analysis of images and signals which Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to

More information

CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees

CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees CSE 417 Dynamic Programming (pt 4) Sub-problems on Trees Reminders > HW4 is due today > HW5 will be posted shortly Dynamic Programming Review > Apply the steps... 1. Describe solution in terms of solution

More information

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

Adaptive-Mesh-Refinement Pattern

Adaptive-Mesh-Refinement Pattern Adaptive-Mesh-Refinement Pattern I. Problem Data-parallelism is exposed on a geometric mesh structure (either irregular or regular), where each point iteratively communicates with nearby neighboring points

More information

An efficient algorithm for sparse PCA

An efficient algorithm for sparse PCA An efficient algorithm for sparse PCA Yunlong He Georgia Institute of Technology School of Mathematics heyunlong@gatech.edu Renato D.C. Monteiro Georgia Institute of Technology School of Industrial & System

More information

Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms

Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms By:- Nitin Kamra Indian Institute of Technology, Delhi Advisor:- Prof. Ulrich Reude 1. Introduction to Linear

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

AMS527: Numerical Analysis II

AMS527: Numerical Analysis II AMS527: Numerical Analysis II A Brief Overview of Finite Element Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 1 / 25 Overview Basic concepts Mathematical

More information

CS473-Algorithms I. Lecture 11. Greedy Algorithms. Cevdet Aykanat - Bilkent University Computer Engineering Department

CS473-Algorithms I. Lecture 11. Greedy Algorithms. Cevdet Aykanat - Bilkent University Computer Engineering Department CS473-Algorithms I Lecture 11 Greedy Algorithms 1 Activity Selection Problem Input: a set S {1, 2,, n} of n activities s i =Start time of activity i, f i = Finish time of activity i Activity i takes place

More information

Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices

Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices Nerma Baščelija Sarajevo School of Science and Technology Department of Computer Science Hrasnicka Cesta 3a, 71000 Sarajevo

More information

Preconditioning for linear least-squares problems

Preconditioning for linear least-squares problems Preconditioning for linear least-squares problems Miroslav Tůma Institute of Computer Science Academy of Sciences of the Czech Republic tuma@cs.cas.cz joint work with Rafael Bru, José Marín and José Mas

More information

Data mining with sparse grids

Data mining with sparse grids Data mining with sparse grids Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Data mining with sparse grids p.1/40 Overview What is Data mining? Regularization networks

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

A Comparison of Data Structures for Dijkstra's Single Source Shortest Path Algorithm. Shane Saunders

A Comparison of Data Structures for Dijkstra's Single Source Shortest Path Algorithm. Shane Saunders A Comparison of Data Structures for Dijkstra's Single Source Shortest Path Algorithm Shane Saunders November 5, 1999 Abstract Dijkstra's algorithm computes the shortest paths between a starting vertex

More information

Sparse Matrices and Graphs: There and Back Again

Sparse Matrices and Graphs: There and Back Again Sparse Matrices and Graphs: There and Back Again John R. Gilbert University of California, Santa Barbara Simons Institute Workshop on Parallel and Distributed Algorithms for Inference and Optimization

More information

Scheduling Strategies for Parallel Sparse Backward/Forward Substitution

Scheduling Strategies for Parallel Sparse Backward/Forward Substitution Scheduling Strategies for Parallel Sparse Backward/Forward Substitution J.I. Aliaga M. Bollhöfer A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ. Jaume I (Spain) {aliaga,martina,quintana}@icc.uji.es

More information

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem L. De Giovanni M. Di Summa The Traveling Salesman Problem (TSP) is an optimization problem on a directed

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

MA27 1 SUMMARY 2 HOW TO USE THE PACKAGE

MA27 1 SUMMARY 2 HOW TO USE THE PACKAGE PAKAGE SPEIFIATION HSL ARHIVE 1 SUMMARY To solve a sparse symmetric system of linear equations. Given a sparse symmetric matrix A = {a ij} n-vector b, this subroutine solves the system Ax=b. The matrix

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra Probably the simplest kind of problem. Occurs in many contexts, often as part of larger problem. Symbolic manipulation packages can do linear algebra "analytically" (e.g. Mathematica,

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

An Out-of-Core Sparse Cholesky Solver

An Out-of-Core Sparse Cholesky Solver An Out-of-Core Sparse Cholesky Solver JOHN K. REID and JENNIFER A. SCOTT Rutherford Appleton Laboratory 9 Direct methods for solving large sparse linear systems of equations are popular because of their

More information

Chapter 13. Boundary Value Problems for Partial Differential Equations* Linz 2002/ page

Chapter 13. Boundary Value Problems for Partial Differential Equations* Linz 2002/ page Chapter 13 Boundary Value Problems for Partial Differential Equations* E lliptic equations constitute the third category of partial differential equations. As a prototype, we take the Poisson equation

More information

CS 770G - Parallel Algorithms in Scientific Computing

CS 770G - Parallel Algorithms in Scientific Computing CS 770G - Parallel lgorithms in Scientific Computing Dense Matrix Computation II: Solving inear Systems May 28, 2001 ecture 6 References Introduction to Parallel Computing Kumar, Grama, Gupta, Karypis,

More information

Matrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation

Matrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation Chapter 7 Introduction to Matrices This chapter introduces the theory and application of matrices. It is divided into two main sections. Section 7.1 discusses some of the basic properties and operations

More information

Dimension reduction : PCA and Clustering

Dimension reduction : PCA and Clustering Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental

More information

Space-filling curves for 2-simplicial meshes created with bisections and reflections

Space-filling curves for 2-simplicial meshes created with bisections and reflections Space-filling curves for 2-simplicial meshes created with bisections and reflections Dr. Joseph M. Maubach Department of Mathematics Eindhoven University of Technology Eindhoven, The Netherlands j.m.l.maubach@tue.nl

More information

A Note on Scheduling Parallel Unit Jobs on Hypercubes

A Note on Scheduling Parallel Unit Jobs on Hypercubes A Note on Scheduling Parallel Unit Jobs on Hypercubes Ondřej Zajíček Abstract We study the problem of scheduling independent unit-time parallel jobs on hypercubes. A parallel job has to be scheduled between

More information

PARDISO Version Reference Sheet Fortran

PARDISO Version Reference Sheet Fortran PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly

More information