Tools and Libraries for Parallel Sparse Matrix Computations. Edmond Chow and Yousef Saad. University of Minnesota. Minneapolis, MN
Department of Computer Science, and Minnesota Supercomputer Institute
University of Minnesota, Minneapolis, MN
June 1994

Abstract
This paper describes two portable packages for general-purpose sparse matrix computations: SPARSKIT and P SPARSLIB. Their emphasis is on iterative techniques, with the latter also emphasizing parallel computation. The packages are collections of tools which may be used either as a library, or as templates for the development of specialized codes. The majority of this paper describes the key components of the parallel iterative solution of linear systems with P SPARSLIB.

Key words: sparse matrix computations, parallel computing, matrix-vector product, partitioning, iterative methods, preconditioning, tools and libraries.

1 Introduction
The complexity of parallel software makes development tools and software reuse particularly necessary. Tools and libraries for sparse matrix computations are scarce compared to packages such as LAPACK that are available for dense matrix computations. Two reasons for this are the high complexity of sparse matrix routines, and the need for different solution techniques and data structures to obtain good performance on various architectures. It is arguable that current numerical problems (the problems that are worth solving) are so difficult that high-performance, customized solution procedures are required, and therefore portable, high-level library routines are not useful. At the same time, it is clear that libraries are globally economical in the sense that the gains obtained from their overall use, however partial, can redeem their development price several times over. We wish to point out that a successful compromise may be the use of library codes as templates or as a source of algorithms for developing machine-specific codes. In the current environment of quickly changing hardware and increasingly difficult problems, this approach to software reuse may not only be viable, but also unavoidable.

Work supported by the NSF under grant NSF/CCR , and by ARPA under grant NIST 60NANB2D1272.
We also mention here that what many researchers need is not necessarily high-performance routines, but a useful collection of routines on their platform for experimenting with algorithms. The sparse matrix support in MATLAB has been invaluable to this end, but unfortunately it is neither flexible nor efficient enough for larger, more realistic problems.

This paper describes SPARSKIT and P SPARSLIB, two FORTRAN 77 packages for sparse matrix computations. SPARSKIT is not designed to run on a parallel machine, but contains essential tools for developing research or specialized application codes, and is often used as a source of templates as described above. SPARSKIT contains conversion routines between 16 different storage formats, and has more than 200 routines for operating on sparse matrices, covering matrix addition, reordering, iterative solution, matrix generation, and plotting. The tools work closely with matrices stored externally in the Harwell-Boeing format. Version 2 of SPARSKIT was recently released.

P SPARSLIB is a parallel sparse matrix computations library. For generality, parallelism is extracted using a domain decomposition approach on the matrix rather than on the physical problem. The code is flexible enough to handle, for example, overlapping domains. P SPARSLIB uses message passing and runs portably on top of PVM. This layered solution for portability takes advantage of future improvements in the underlying communication library or hardware. P SPARSLIB provides useful kernels and tools such as parallel sparse matrix-vector multiplication, parallel preconditioning, iterative solution of linear systems, partitioning, multicoloring, and reordering.

2 SPARSKIT
Because of the complexity of sparse matrix routines, a common set of tools shared among researchers should dramatically reduce the time to implement sparse matrix research codes.
SPARSKIT is a package developed for this purpose, providing routines for tasks such as extracting submatrices, matrix addition, and matrix multiplication. The package also alleviates the problem of the wide variety of sparse matrix storage formats by providing conversion routines between them, and facilitates the exchange of data through the Harwell-Boeing format and through matrix generators. In the following, we briefly describe each module of SPARSKIT. See [7] for a complete description.

FORMATS. This module contains two sets of routines. The first set is composed of routines which convert the storage format of a matrix to and from the basic Compressed Sparse Row format. Thus one can translate between any of the supported formats with two transformations at the most. The formats currently supported are the following.
DNS  Dense format
BND  Linpack Banded format
CSR  Compressed Sparse Row format
CSC  Compressed Sparse Column format
COO  Coordinate format
ELL  Ellpack-Itpack generalized diagonal format
DIA  Diagonal format
BSR  Block Sparse Row format
MSR  Modified Compressed Sparse Row format
SSK  Symmetric Skyline format
NSK  Nonsymmetric Skyline format
LNK  Linked list storage format
JAD  Jagged Diagonal format
SSS  Symmetric Sparse Skyline format
USS  Unsymmetric Sparse Skyline format
VBR  Variable Block Row format
The second set contains a number of routines that perform simple manipulation functions on sparse matrices, such as extracting a particular diagonal, permuting a matrix, computing norms, or filtering out small elements. For reasons of space we cannot list these routines here.

BLASSM. This module contains a number of routines for performing basic linear algebra with sparse matrices. It is also composed of two sets of routines. The first set consists of matrix-matrix operations (e.g., multiplication of matrices) and the second consists of matrix-vector operations. The first set allows one to perform the following operations with sparse matrices, where $A$, $B$, $C$ are sparse matrices, $D$ is a diagonal matrix, and $\sigma$ is a scalar: $C = AB$, $C = A + B$, $C = A + \sigma B$, $C = A \pm B^T$, $C = A + \sigma B^T$, $A = A + \sigma I$, $C = A + D$. The second set contains various routines for performing matrix-vector products and solving sparse triangular linear systems in different storage formats.

INOUT. This module consists of routines to read and write matrices in the Harwell-Boeing format. For more information on this format and the Harwell-Boeing collection, see [2]. This module also provides routines for printing the pattern of the matrix in PostScript, or simply dumping the nonzeros in a readable format.

INFO. The purpose of this module is to provide as many statistics as possible on a matrix at little cost. For example, the code analyzes the diagonal dominance of the matrix (row and column), its degree of symmetry (structural as well as numerical), its block structure, its diagonal structure, etc. Functionality for estimating information about the spectrum of the matrix may be added later.
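To make the storage formats and kernels listed so far concrete, the following sketch converts coordinate (COO) triplets to Compressed Sparse Row (CSR) arrays and then forms a matrix-vector product, the kind of work handled by the FORMATS and BLASSM modules. It is an illustrative Python sketch and not SPARSKIT code: the function names coo_to_csr and csr_matvec are hypothetical, and indices are 0-based here whereas FORTRAN 77 arrays are 1-based by default.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate (COO) triplets to CSR arrays (0-based indices)."""
    # Count the nonzeros in each row, then prefix-sum to get row pointers.
    row_ptr = [0] * (n_rows + 1)
    for i in rows:
        row_ptr[i + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    # Scatter each entry into the next free slot of its row.
    next_slot = row_ptr[:-1].copy()
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    for i, j, v in zip(rows, cols, vals):
        k = next_slot[i]
        col_idx[k] = j
        data[k] = v
        next_slot[i] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A x for a matrix held in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += data[k] * x[col_idx[k]]
    return y


# 3 x 3 example, A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]], entries given out of order.
rows = [2, 0, 1, 0, 2]
cols = [0, 0, 1, 2, 2]
vals = [2.0, 4.0, 3.0, 1.0, 5.0]
row_ptr, col_idx, data = coo_to_csr(3, rows, cols, vals)
y = csr_matvec(row_ptr, col_idx, data, [1.0, 1.0, 1.0])
```

Because CSR serves as the hub format, a conversion between any two other formats can go through it, which is why at most two transformations are needed.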
MATGEN. The set of routines in this module allows one to generate test matrices. There are generators for several different types of matrices: five-point and seven-point matrices on rectangular regions discretizing a general elliptic partial differential equation, block forms of these (several degrees of freedom per grid point in the PDE), finite element matrices for the convection-diffusion problem using various domains (including user-provided ones), Markov chain matrices arising from a random walk on a triangular grid, and some others.

ORDERINGS. This module provides matrix reorderings based on level sets (including Cuthill-McKee implemented with breadth-first search), coloring (including a greedy algorithm for multicolor ordering), and strongly connected components. The latter two are useful for extracting parallelism from sparse matrices.

ITSOL. This module currently contains four preconditioners and nine Krylov-subspace iterative methods. The preconditioners are ILUT, a robust preconditioner which uses a dual threshold for dropping elements; ILUTP, a variant with column pivoting; ILU(0); and MILU(0). The iterative solvers include popular ones such as CG, CGNR, BiCG, BiCGSTAB, TFQMR, and GMRES, and are implemented with reverse communication to make them independent of the matrix storage format and preconditioner. See Section 3.4 for more details.

UNSUPP. As suggested by its name, this module contains various unsupported software tools that are not necessarily portable or do not fit in any of the previous modules. This module currently contains routines for plotting matrices and routines related to matrix exponentials.

3 P SPARSLIB
Many sparse matrices arise from the discretization of partial differential equations. In these applications, domain decomposition has been a successful general approach for extracting parallelism.
In essence, the domain of interest is partitioned into a number of subdomains and some technique is used to recover the global solution. For generality, P SPARSLIB begins with a matrix rather than a differential equation, and partitioning is performed on the adjacency graph of the matrix, which is the same as the discretization mesh if this was indeed the source of the matrix. However, this broader concept of "graph partitioning" does not use the geometric information that may be available in the problem, and which may be necessary when solving more difficult problems. General descriptions of data structures and algorithms in P SPARSLIB may be found in [4, 6].

P SPARSLIB has the structure shown in Figure 3.1. We have implemented a number of routines in each of the modules shown. P SPARSLIB uses message passing for flexibility and, in the figure, B-COMS is a temporary name for the communication library provided by the manufacturer, possibly augmented with a few high-level communication primitives required for sparse problems.

[Figure 3.1: General block diagram of P SPARSLIB. Blocks: basic kernels (B-COMS, D-BLAS); preprocessing tools (partition, color, setup, ...); matrix primitives (Matvec, Tsolve, ...); preconditioners (D-ILU, D-SOR, ...); iterative solvers.]

P SPARSLIB, as well as SPARSKIT described above, is primarily geared towards iterative solvers because of the growing importance of these techniques and the limitation of direct solvers both in terms of their potential for high parallel efficiency and in terms of their unmanageable requirements for realistic 3-dimensional problems. In the remainder of this paper, we will describe the key components of the parallel iterative solution of linear systems. These components are the matrix-vector product, the iterative solvers themselves, parallel preconditioning, and partitioning. Before we begin, we discuss the data structures used for distributed sparse matrices.

3.1 Distributed sparse matrices
Assume that we have a convenient partitioning of the graph and, without any loss of generality, we can think of the matrix under consideration as originating from the discretization of a partial differential equation on a certain domain, as is illustrated in Figure 3.2. We need to set up a local data structure in each processor (or subdomain, or subgraph) which will allow us to perform basic operations such as global matrix-vector products and preconditioning operations. We will assume that the rows and associated unknowns are mapped to the same processor, i.e., the matrix is distributed row-wise to the processors according to the distribution of the variables. Note that if there is an obvious blocking which may come from several unknowns associated with the same grid point, then this should be exploited and all the unknowns should be mapped together. In other words, in our mapping algorithms we should deal with the reduced adjacency graph corresponding to a physical grid rather than with the original adjacency graph. Another assumption we will make here is that the graph is undirected, i.e., the matrix has a symmetric pattern.
This restriction is only made for simplicity and because we would like to use exchange of information across boundaries (swaps) rather than one-way sends and receives.
[Figure 3.2: Decomposition of the domain (or adjacency graph) and classification of nodal points into internal points, local interface points, and external interface points.]

The first part of the local data structure consists of a list of all other processors with which a given processor must exchange information when performing matrix-vector products. Although the processors on this list are not necessarily physical neighbors, they hold subdomains that are adjacent to the subdomain mapped to the given processor. The information needed to find these neighboring processors is a global node-to-processor mapping, described by an array map, where map(j) is the processor to which node j is mapped. For simplicity we will assume for this description that there is no overlap, i.e., any node j belongs to only one processor, namely processor map(j). The local rows are inspected one by one and for each nonzero $a_{ij}$ with map(j) $\neq$ myproc, where myproc is the label of the current processor, we add map(j) to the list of neighboring processors if it is not already listed. We store the labels of the neighboring processors in an array proc(1 : nproc) where nproc is the number of neighboring processors.

In this initial phase, each processor myproc will also determine for each of its neighboring processors the list of nodes that are coupled with nodes of that processor. We refer to these nodes as local interface nodes. When performing a matrix-vector product, neighboring processors must exchange values of their adjacent interface nodes. In order to perform this data exchange operation efficiently, it is important to group these nodes processor by processor. Thus, we list first all those nodes that must be sent to proc(1), followed by those to be sent to proc(2), etc. Two arrays are used for this purpose, one called ix which lists the nodes as indicated above and a pointer array ipr which points to the beginning of the list for proc(i).

[Figure 3.3: The distributed sparse matrix $A$, showing for one processor the block $A_{loc}$ acting on local nodes and the blocks $A_{ext}$ acting on external nodes.]

Once the boundary exchange information is determined, we need to set up the distributed matrices in each processor, using a suitable data structure. In order to perform a matrix-vector product with a matrix that is distributed in the manner described earlier, we need to multiply the matrix consisting of rows that are local to a given processor by some global vector $x$. Some components of this vector will be local, and some components must be moved to the current processor for the operation to complete. Let $x_{loc}$ be the local components of vector $x$ for a given processor and let $x_{ext}$ be the external components that are required. The vector $x_{loc}$ itself is composed of strictly internal nodes $x_{int}$ and boundary, or local interface, nodes $x_{bnd}$. Let $A_0$ be the local matrix, i.e., the rectangular matrix consisting of all the rows that are mapped to myproc. We will call $A_{loc}$ the "diagonal block" of $A$ located in $A_0$, i.e., the submatrix of $A_0$ whose nonzero elements $a_{ij}$ are such that $j$ is a local variable. Similarly, we will call $A_{ext}$ the "off-diagonal" block, i.e., the submatrix of $A_0$ whose nonzero elements $a_{ij}$ are such that $j$ is not a local variable.
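The setup phase just described can be sketched for a single processor as follows. This is a minimal Python illustration under stated assumptions, not P SPARSLIB code: the function names setup_local and split_rows are hypothetical, the matrix is held as a dictionary of rows for readability, indices and processor labels are 0-based, and ipr carries one extra trailing entry marking the end of the last list. The names proc, ix, ipr, and the array called map in the text (node_map here, since map is a Python built-in) follow the description above.

```python
def setup_local(adj, node_map, myproc):
    """Build the boundary-exchange lists for processor myproc.

    adj[i] lists the column indices j of the nonzeros a_ij in row i.
    """
    local = [i for i in range(len(adj)) if node_map[i] == myproc]
    send = {}  # neighboring processor -> local interface nodes coupled to it
    for i in local:
        for j in adj[i]:
            q = node_map[j]
            if q != myproc:
                nodes = send.setdefault(q, [])
                if i not in nodes:
                    nodes.append(i)
    proc = sorted(send)      # labels of the neighboring processors
    ix, ipr = [], [0]
    for q in proc:           # group interface nodes processor by processor
        ix.extend(send[q])
        ipr.append(len(ix))
    return local, proc, ix, ipr


def split_rows(rows, node_map, myproc):
    """Split the local rows into A_loc (local columns) and A_ext (external columns)."""
    a_loc, a_ext = {}, {}
    for i, row in rows.items():
        if node_map[i] != myproc:
            continue
        a_loc[i] = {j: v for j, v in row.items() if node_map[j] == myproc}
        a_ext[i] = {j: v for j, v in row.items() if node_map[j] != myproc}
    return a_loc, a_ext


# Tridiagonal 6 x 6 test matrix (2 on the diagonal, -1 beside it), two nodes per processor.
rows = {i: {j: (2.0 if j == i else -1.0) for j in (i - 1, i, i + 1) if 0 <= j < 6}
        for i in range(6)}
adj = [sorted(rows[i]) for i in range(6)]
node_map = [0, 0, 1, 1, 2, 2]
local, proc, ix, ipr = setup_local(adj, node_map, myproc=1)
a_loc, a_ext = split_rows(rows, node_map, myproc=1)
```

For the middle processor of this chain, both local nodes are interface nodes: node 2 is sent to processor 0 and node 3 to processor 2, so ix holds one node per neighbor.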
3.2 Matrix-vector product
Let us consider the simple case illustrated in Figure 3.3, in which the rows are assigned to 5 given processors in block order. To perform a matrix-vector product, we start by multiplying the diagonal block $A_{loc}$ by the local variables. We then multiply $A_{ext}$ by the external variables. Notice that since the external interface points are not coupled with local internal points, only the rows corresponding to the boundary nodes in $A_{ext}$ will have nonzero elements. Thus, we can separate the matrix-vector product into two such operations, one involving only the local variables and the other involving external variables. We need to construct these two matrices and define a local numbering of the local variables in order to perform the two matrix-vector products efficiently each time. For convenience the local interface points are labeled last in each processor. This is illustrated in Figure 3.4.

[Figure 3.4: The local matrix data structure for each subdomain. $A_{loc}$ acts on the internal points ($x_{int}$) followed by the local interface points ($x_{bnd}$); $A_{ext}$ is the external interface matrix.]

The algorithm for the matrix-vector product is then as follows:

Algorithm 3.1 Distributed sparse matrix-vector product
    Exchange interface data: scatter $x_{bnd}$ to neighbors and gather $x_{ext}$ from neighbors
    Local matrix-vector product: $y = A_{loc} x_{loc}$
    External matrix-vector product: $y = y + A_{ext} x_{ext}$
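The three steps of Algorithm 3.1 can be imitated in a single Python process, which is enough to check that the split product agrees with the global one. This is a sketch under stated assumptions: the function name distributed_matvec is hypothetical, the loop over p stands in for processors that would run concurrently, and the "exchange" is a plain lookup where a real code would send and receive messages.

```python
def distributed_matvec(rows, node_map, x, nprocs):
    """Serially simulate the distributed product y = A x.

    Processor p owns every row i with node_map[i] == p.
    rows[i] maps column index j to the nonzero a_ij.
    """
    y = [0.0] * len(x)
    for p in range(nprocs):
        local = [i for i in range(len(x)) if node_map[i] == p]
        # Step 1: exchange interface data. The gather of x_ext is simulated
        # by reading the needed external components directly.
        ext_cols = {j for i in local for j in rows[i] if node_map[j] != p}
        x_ext = {j: x[j] for j in ext_cols}
        for i in local:
            # Step 2: local product, y = A_loc x_loc
            y[i] = sum(v * x[j] for j, v in rows[i].items() if node_map[j] == p)
            # Step 3: external product, y = y + A_ext x_ext
            y[i] += sum(v * x_ext[j] for j, v in rows[i].items() if node_map[j] != p)
    return y


# Tridiagonal 6 x 6 test matrix (2 on the diagonal, -1 beside it), two rows per processor.
rows = {i: {j: (2.0 if j == i else -1.0) for j in (i - 1, i, i + 1) if 0 <= j < 6}
        for i in range(6)}
node_map = [0, 0, 1, 1, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = distributed_matvec(rows, node_map, x, nprocs=3)

# Reference: the same product computed without any partitioning.
y_ref = [sum(v * x[j] for j, v in rows[i].items()) for i in range(6)]
```

Only rows of interface nodes pick up a contribution in step 3, which matches the observation that $A_{ext}$ has nonzeros only in the boundary rows.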
3.3 Graph partitioning
We now outline an approach for partitioning the vertices of the matrix graph. It is clear from the above that, to minimize communication costs, we need to minimize the number of neighbors of each subdomain and the number of interface points (or the number of edges connecting the domains). Load balancing, by making the subdomains roughly equal in size or by some other criterion, is also necessary.

For vertex partitioning when geometric information is not available, we are developing parallel algorithms based on the simultaneous level-set expansion from a set of center points, one for each subdomain [3]. This procedure is inherently parallel, but requires unbiased arbitration in case of conflicts. The challenging aspect of the method is determining a good set of center points, and for this we have developed heuristic algorithms using cost functions. From an initial, perhaps random, set of centers, the cost of this set of centers is computed, for example, as the sum of the inverses of the distances between the centers. The centers are then moved in such a way that the cost is decreased. When the cost cannot be further decreased, the algorithm terminates. Many criteria such as those described at the beginning of this section may be built into the cost function.

We do not consider the mapping of subdomains to processors, since this is obviously architecture dependent, and many computer manufacturers are striving to make the difference between best-case and worst-case communication as small as possible.

3.4 Iterative solvers
The iterative solvers module in P SPARSLIB is the same as that in SPARSKIT. We note that a flexible variant of GMRES called FGMRES [5], which allows the preconditioner to change at each step, is especially useful for parallel computation, as will be seen in Section 3.5.
The iterative methods will not be described here, except to say that the four basic operations required are the matrix-vector product, the preconditioning operation, SAXPY, and dot product. The matrix-vector product was described in Section 3.2, and the preconditioning operation will be described in Section 3.5. SAXPY involves no communication if the vectors are partitioned the same way across the processors, and the dot product is a global reduction operation, often provided by the hardware or underlying software.

The iterative solvers are implemented in a way so that they are independent of:
1. the environment, whether it is parallel or serial
2. the storage format of the matrix and the preconditioner
3. the preconditioning operation
This could be achieved, for example, by using callback functions for the four basic operations described above. However, this is not entirely flexible for the matrix-vector product and preconditioning operations without using global parameters, since the callback functions must have their calling sequences fixed beforehand. P SPARSLIB uses a reverse-communication mechanism for these two operations to achieve the same effect. Here, the iterative solver exits back to the caller and indicates through an output variable which operation needs to be performed on its output vector. A typical code follows. See [8] for more details.
c     Reverse-communication loop: each time fgmres returns, icode names
c     the operation it needs (1 = preconditioning, 2 = matrix-vector
c     product). Any other value falls through and ends the loop.
      icode = 0
 1    continue
      call fgmres(n,im,rhs,sol,i,vv,w,wk1,wk2,eps,maxits,iout,icode)
      if (icode .eq. 1) then
         call precon(n,wk1,wk2)
         goto 1
      else if (icode .eq. 2) then
         call matvec(n,wk1,wk2)
         goto 1
      endif

3.5 Distributed preconditioners
Krylov subspace methods generally work very poorly if no preconditioning is used. Unfortunately, the traditional and most effective preconditioners are extremely difficult to parallelize. In this section, we describe multicolor SOR and a new technique based on approximate inverses.

We begin by describing multicoloring, a useful tool for extracting parallelism from sparse matrices. Multicoloring assigns each subdomain a "color" such that no two adjacent subdomains have the same color. The standard heuristic method to multicolor a graph is to first select an order in which to color the nodes. Each node in turn then receives the smallest color not already assigned to one of its neighbors. Consider the natural order $1, 2, \ldots, n$. If we want to execute this algorithm in a parallel environment, then we observe that a given node never needs to examine the nodes whose labels are larger than its own label. Although this seems to establish a sequential procedure, there is actually a substantial amount of parallelism because of the sparsity of the graph. Typically, the degree of parallelism is of the order of the diameter of the graph. Since the coloring process is essentially a preprocessing task, this amount of parallelism is sufficient for most practical situations, even for a large number of processors. A parallel implementation, with each processor holding one domain, would be as follows:

Algorithm 3.2 Parallel multicoloring
    Start with one colored region
    From each neighbor proc(k) with proc(k) < myproc do
        receive color(proc(k))
    Compute color(myproc) = min{valid colors}
    To each neighbor proc(k) with proc(k) > myproc do
        send color(myproc)

Once the subdomains have been colored, one can use a form of multicolor SOR or SSOR relaxation as a preconditioner.
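The coloring produced by Algorithm 3.2 can be reproduced serially by visiting the subdomains in increasing label order, since each processor only waits for neighbors with smaller labels. The sketch below does exactly that in Python; it is an illustration rather than P SPARSLIB code, the function name multicolor and the 2 x 3 grid of subdomains are assumptions made for the example, and the message passing is replaced by dictionary lookups.

```python
def multicolor(neighbors):
    """Greedy coloring in increasing label order.

    neighbors[p] lists the subdomains adjacent to subdomain p.
    Returns a dict mapping each subdomain to a color 0, 1, 2, ...
    """
    color = {}
    for p in sorted(neighbors):
        # Colors that would be 'received' from neighbors with a smaller label.
        taken = {color[q] for q in neighbors[p] if q < p}
        c = 0
        while c in taken:  # smallest valid color
            c += 1
        color[p] = c
    return color


# 2 x 3 grid of subdomains, numbered row by row:
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
neighbors = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
             3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
color = multicolor(neighbors)
```

On this grid the greedy rule yields a two-color (red-black) pattern, so half of the subdomains can be relaxed at the same time in each color sweep.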
In each processor, a one-step SOR iteration takes the form:

Algorithm 3.3 Multicolor SOR preconditioning
    For $k = 1, 2, \ldots,$ ncolrs do
        Exchange interface values
        If ($k$ = mycol) then
            $x_{loc} := x_{loc} + \omega A_{loc}^{-1} [b_{loc} - A_{ext} x_{ext}]$
        endif
    enddo

In this algorithm, myproc represents the processor number in which the code is being executed, and mycol is the color of myproc. An s-step SOR iteration would simply consist of adding an outer loop to the above algorithm.

The solution of a system with the local matrix $A_{loc}$, indicated by the $A_{loc}^{-1}$ operation in the above algorithm, is typical of parallel preconditioners such as this one and, for example, block Jacobi. A preconditioned iterative method may be used, such as ILUT preconditioned GMRES, which is available in SPARSKIT. Since the preconditioning operation is then an iterative process, FGMRES is required for the outer iterations. Sparse direct solution methods are also useful in this case. Since the local systems $A_{loc}$ are small and must be solved with several right-hand sides, it may be worthwhile to factor $A_{loc}$ exactly.

Another class of preconditioners is explicit: these use matrix-vector multiplication as the preconditioning step, which can be parallelized as described in Section 3.2. We have investigated approaches to approximating the inverse of the system matrix $A$ directly by minimizing the Frobenius norm of the residual matrix
    $F(M) = \| I - AM \|_F$
where $M$ is the approximate right inverse. In practice, this is achieved by solving approximately
    $A m_j = e_j, \quad j = 1, \ldots, n$
where $m_j$ is the $j$-th column of $M$ and $e_j$ is the $j$-th coordinate vector. The approximate solution of these linear systems is an iterative procedure itself and thus involves the operations already described. An important difference is that we must keep $M$ sparse, and the Krylov basis which is used to construct the solution is also kept sparse. If it is possible for each processor to access the entire matrix $A$, then an alternative parallel implementation is to compute the $n$ individual columns of $M$ simultaneously, with no communication required.
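The column-by-column structure follows from the identity $\| I - AM \|_F^2 = \sum_{j=1}^{n} \| e_j - A m_j \|_2^2$, so each column can be improved independently. The Python sketch below illustrates this with a few minimal-residual steps per column. The choice of iteration is an assumption made for the example and is not stated in the text above; the function names are hypothetical, the matrix is stored densely for brevity, and no entries are dropped, so the sketch does not enforce the sparsity of $M$ that the actual method relies on.

```python
def matvec(A, v):
    """Dense matrix-vector product with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def approx_inverse_column(A, j, steps):
    """Approximately solve A m_j = e_j by minimal-residual steps from m_j = 0."""
    n = len(A)
    m = [0.0] * n
    for _ in range(steps):
        Am = matvec(A, m)
        r = [(1.0 if i == j else 0.0) - Am[i] for i in range(n)]  # r = e_j - A m_j
        Ar = matvec(A, r)
        denom = dot(Ar, Ar)
        if denom == 0.0:
            break
        alpha = dot(r, Ar) / denom  # step length minimizing ||r - alpha * A r||_2
        m = [mi + alpha * ri for mi, ri in zip(m, r)]
    return m


def frobenius_residual(A, M_cols):
    """Return ||I - A M||_F, where M_cols[j] is the j-th column of M."""
    total = 0.0
    for j, m in enumerate(M_cols):
        Am = matvec(A, m)
        total += sum(((1.0 if i == j else 0.0) - Am[i]) ** 2 for i in range(len(A)))
    return total ** 0.5


A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
# Each column depends only on A and j, so the n solves could run in parallel.
res_1 = frobenius_residual(A, [approx_inverse_column(A, j, 1) for j in range(3)])
res_5 = frobenius_residual(A, [approx_inverse_column(A, j, 5) for j in range(3)])
```

Starting from $M = 0$ the residual norm is $\sqrt{3}$ for this 3 x 3 example, and it decreases as more steps are taken per column.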
We have found that this approximate inverse preconditioner is advantageous in many cases where the matrix $A$ is nonsymmetric or indefinite [1].

4 Conclusion
We have described some implementations and algorithms for basic sparse matrix computations which may be used as part of a library or, because of their simple and open design, as templates for the development of research or application codes. We have argued that a portable parallel sparse matrix computations package is timely, although it may not immediately be extremely efficient. Such a code, for example, may be used as a benchmark for testing the suitability of various parallel architectures for sparse matrix computations.

Acknowledgements
The authors wish to thank Sandra Carney, Todd Goehring, Kesheng Wu, and Mike Heroux for many helpful discussions.
References
[1] E. Chow and Y. Saad. Approximate inverse preconditioners for general sparse matrices. UMSI 94/101, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, Minnesota.
[2] I. S. Duff, R. G. Grimes and J. G. Lewis. Users' guide for the Harwell-Boeing sparse matrix collection. TR/PA/92/86, CERFACS, Toulouse.
[3] T. Goehring and Y. Saad. Heuristic algorithms for automatic graph partitioning. UMSI 94/29, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, Minnesota.
[4] Y. Saad. Data structures and algorithms for domain decomposition and distributed sparse matrix computations. In preparation, Army High Performance Computing Research Center, Minneapolis, Minnesota.
[5] Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Comput., 14 (1993).
[6] Y. Saad. Krylov subspace methods in distributed computing environments. AHPCRC report, Army High Performance Computing Research Center, Minneapolis, Minnesota.
[7] Y. Saad. SPARSKIT: a basic tool kit for sparse matrix computations, Version 2. Manuscript, University of Minnesota, Minneapolis, Minnesota.
[8] Y. Saad and K. Wu. Parallel sparse matrix library (P SPARSLIB): the iterative solvers module. AHPCRC report, Army High Performance Computing Research Center, Minneapolis, Minnesota, 1994.
More informationarxiv: v1 [cs.ms] 2 Jun 2016
Parallel Triangular Solvers on GPU Zhangxin Chen, Hui Liu, and Bo Yang University of Calgary 2500 University Dr NW, Calgary, AB, Canada, T2N 1N4 {zhachen,hui.j.liu,yang6}@ucalgary.ca arxiv:1606.00541v1
More informationLecture 17: More Fun With Sparse Matrices
Lecture 17: More Fun With Sparse Matrices David Bindel 26 Oct 2011 Logistics Thanks for info on final project ideas. HW 2 due Monday! Life lessons from HW 2? Where an error occurs may not be where you
More informationTHE application of advanced computer architecture and
544 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 45, NO. 3, MARCH 1997 Scalable Solutions to Integral-Equation and Finite-Element Simulations Tom Cwik, Senior Member, IEEE, Daniel S. Katz, Member,
More informationScaling Strategy. Compress. Strategy. Background/ Scaling. Expose. Code Inputs User Interface Inputs Matrix Data Inputs Outputs.
EMILY: A VISUALIZATION TOOL FOR LARGE SPARSE MATRICES T. LOOS AND R. BRAMLEY Abstract. A visualization tool for large sparse matrices and its usage is described. Because such matrices come from a wide
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationCSE 599 I Accelerated Computing - Programming GPUS. Parallel Pattern: Sparse Matrices
CSE 599 I Accelerated Computing - Programming GPUS Parallel Pattern: Sparse Matrices Objective Learn about various sparse matrix representations Consider how input data affects run-time performance of
More informationStorage Formats for Sparse Matrices in Java
Storage Formats for Sparse Matrices in Java Mikel Luján, Anila Usman, Patrick Hardie, T.L. Freeman, and John R. Gurd Centre for Novel Computing, The University of Manchester, Oxford Road, Manchester M13
More informationSolving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems
AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination
More information1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3
6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require
More informationLecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1
CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanfordedu) February 6, 2018 Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1 In the
More informationMatrices. D. P. Koester, S. Ranka, and G. C. Fox. The Northeast Parallel Architectures Center (NPAC) Syracuse University
Parallel LU Factorization of Block-Diagonal-Bordered Sparse Matrices D. P. Koester, S. Ranka, and G. C. Fox School of Computer and Information Science and The Northeast Parallel Architectures Center (NPAC)
More informationCS 542G: Solving Sparse Linear Systems
CS 542G: Solving Sparse Linear Systems Robert Bridson November 26, 2008 1 Direct Methods We have already derived several methods for solving a linear system, say Ax = b, or the related leastsquares problem
More informationFlow simulation. Frank Lohmeyer, Oliver Vornberger. University of Osnabruck, D Osnabruck.
To be published in: Notes on Numerical Fluid Mechanics, Vieweg 1994 Flow simulation with FEM on massively parallel systems Frank Lohmeyer, Oliver Vornberger Department of Mathematics and Computer Science
More informationPARDISO Version Reference Sheet Fortran
PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly
More informationWei Shu and Min-You Wu. Abstract. partitioning patterns, and communication optimization to achieve a speedup.
Sparse Implementation of Revised Simplex Algorithms on Parallel Computers Wei Shu and Min-You Wu Abstract Parallelizing sparse simplex algorithms is one of the most challenging problems. Because of very
More informationDistributed Schur Complement Solvers for Real and Complex Block-Structured CFD Problems
Distributed Schur Complement Solvers for Real and Complex Block-Structured CFD Problems Dr.-Ing. Achim Basermann, Dr. Hans-Peter Kersken German Aerospace Center (DLR) Simulation- and Software Technology
More informationOptimizing the operations with sparse matrices on Intel architecture
Optimizing the operations with sparse matrices on Intel architecture Gladkikh V. S. victor.s.gladkikh@intel.com Intel Xeon, Intel Itanium are trademarks of Intel Corporation in the U.S. and other countries.
More information1 INTRODUCTION The LMS adaptive algorithm is the most popular algorithm for adaptive ltering because of its simplicity and robustness. However, its ma
MULTIPLE SUBSPACE ULV ALGORITHM AND LMS TRACKING S. HOSUR, A. H. TEWFIK, D. BOLEY University of Minnesota 200 Union St. S.E. Minneapolis, MN 55455 U.S.A fhosur@ee,tewk@ee,boley@csg.umn.edu ABSTRACT. The
More informationChapter 3 SPARSE MATRICES. 3.1 Introduction
Chapter 3 SPARSE MATRICES As described in the previous chapter, standard discretizations of Partial Differential Equations typically lead to large and sparse matrices. A sparse matrix is defined, somewhat
More informationFigure 6.1: Truss topology optimization diagram.
6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.
More informationImproving Performance of Sparse Matrix-Vector Multiplication
Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science and Center of Simulation of Advanced Rockets University of Illinois at Urbana-Champaign
More informationSequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices
Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices Nerma Baščelija Sarajevo School of Science and Technology Department of Computer Science Hrasnicka Cesta 3a, 71000 Sarajevo
More informationPorting the NAS-NPB Conjugate Gradient Benchmark to CUDA. NVIDIA Corporation
Porting the NAS-NPB Conjugate Gradient Benchmark to CUDA NVIDIA Corporation Outline! Overview of CG benchmark! Overview of CUDA Libraries! CUSPARSE! CUBLAS! Porting Sequence! Algorithm Analysis! Data/Code
More informationChapter 1. Reprinted from "Proc. 6th SIAM Conference on Parallel. Processing for Scientic Computing",Norfolk, Virginia (USA), March 1993.
Chapter 1 Parallel Sparse Matrix Vector Multiplication using a Shared Virtual Memory Environment Francois Bodin y Jocelyne Erhel y Thierry Priol y Reprinted from "Proc. 6th SIAM Conference on Parallel
More informationHypergraph Partitioning for Parallel Iterative Solution of General Sparse Linear Systems
Hypergraph Partitioning for Parallel Iterative Solution of General Sparse Linear Systems Masha Sosonkina Bora Uçar Yousef Saad February 1, 2007 Abstract The efficiency of parallel iterative methods for
More informationImplicit schemes for wave models
Implicit schemes for wave models Mathieu Dutour Sikirić Rudjer Bo sković Institute, Croatia and Universität Rostock April 17, 2013 I. Wave models Stochastic wave modelling Oceanic models are using grids
More informationPreconditioner updates for solving sequences of linear systems in matrix-free environment
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2000; 00:1 6 [Version: 2002/09/18 v1.02] Preconditioner updates for solving sequences of linear systems in matrix-free environment
More informationSparse Matrices and Graphs: There and Back Again
Sparse Matrices and Graphs: There and Back Again John R. Gilbert University of California, Santa Barbara Simons Institute Workshop on Parallel and Distributed Algorithms for Inference and Optimization
More informationΩ2
CACHE BASED MULTIGRID ON UNSTRUCTURED TWO DIMENSIONAL GRIDS CRAIG C. DOUGLAS, JONATHAN HU y, ULRICH R UDE z, AND MARCO BITTENCOURT x. Abstract. High speed cache memory is commonly used to address the disparity
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 4 Sparse Linear Systems Section 4.3 Iterative Methods Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationPerformance Evaluation of a New Parallel Preconditioner
Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller October 994 CMU-CS-94-25 Marco Zagha School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 This
More informationMemory Hierarchy Management for Iterative Graph Structures
Memory Hierarchy Management for Iterative Graph Structures Ibraheem Al-Furaih y Syracuse University Sanjay Ranka University of Florida Abstract The increasing gap in processor and memory speeds has forced
More informationIterative Sparse Triangular Solves for Preconditioning
Euro-Par 2015, Vienna Aug 24-28, 2015 Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt, Edmond Chow and Jack Dongarra Incomplete Factorization Preconditioning Incomplete LU factorizations
More informationPreconditioning for linear least-squares problems
Preconditioning for linear least-squares problems Miroslav Tůma Institute of Computer Science Academy of Sciences of the Czech Republic tuma@cs.cas.cz joint work with Rafael Bru, José Marín and José Mas
More informationAccelerating a Simulation of Type I X ray Bursts from Accreting Neutron Stars Mark Mackey Professor Alexander Heger
Accelerating a Simulation of Type I X ray Bursts from Accreting Neutron Stars Mark Mackey Professor Alexander Heger The goal of my project was to develop an optimized linear system solver to shorten the
More informationPROGRESS IN NEWTON-KRYLOV METHODS FOR AERODYNAMIC CALCULATIONS. University of Toronto Institute for Aerospace Studies
PROGRESS IN NEWTON-KRYLOV METHODS FOR AERODYNAMIC CALCULATIONS Alberto Pueyo David W. Zingg y University of Toronto Institute for Aerospace Studies 4925 Duerin Street, Downsview, Ontario M3H 5T6 Canada
More informationGraphBLAS Mathematics - Provisional Release 1.0 -
GraphBLAS Mathematics - Provisional Release 1.0 - Jeremy Kepner Generated on April 26, 2017 Contents 1 Introduction: Graphs as Matrices........................... 1 1.1 Adjacency Matrix: Undirected Graphs,
More informationNAG Fortran Library Routine Document F11DSF.1
NAG Fortran Library Routine Document Note: before using this routine, please read the Users Note for your implementation to check the interpretation of bold italicised terms and other implementation-dependent
More information2 do i = 1,n sum = 0.0D0 do j = rowptr(i), rowptr(i+1)-1 sum = sum + a(jp) * x(colind(jp)) end do y(i) = sum end do Fig. 1. A sparse matrix-vector mul
Improving Memory-System Performance of Sparse Matrix-Vector Multiplication Sivan Toledo y Abstract Sparse matrix-vector multiplication is an important kernel that often runs ineciently on superscalar RISC
More informationEFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI
EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,
More informationEfficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes
Efficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes (Addendum to IEEE Visualization 1999 paper) Hugues Hoppe Steve Marschner June 2000 Technical Report MSR-TR-2000-64
More informationOutline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency
1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming
More informationHighly Parallel Multigrid Solvers for Multicore and Manycore Processors
Highly Parallel Multigrid Solvers for Multicore and Manycore Processors Oleg Bessonov (B) Institute for Problems in Mechanics of the Russian Academy of Sciences, 101, Vernadsky Avenue, 119526 Moscow, Russia
More informationMathematics and Computer Science
Technical Report TR-2006-010 Revisiting hypergraph models for sparse matrix decomposition by Cevdet Aykanat, Bora Ucar Mathematics and Computer Science EMORY UNIVERSITY REVISITING HYPERGRAPH MODELS FOR
More informationnag sparse nsym sol (f11dec)
f11 Sparse Linear Algebra f11dec nag sparse nsym sol (f11dec) 1. Purpose nag sparse nsym sol (f11dec) solves a real sparse nonsymmetric system of linear equations, represented in coordinate storage format,
More informationAn Ecient Parallel Algorithm. for Matrix{Vector Multiplication. Albuquerque, NM 87185
An Ecient Parallel Algorithm for Matrix{Vector Multiplication Bruce Hendrickson 1, Robert Leland 2 and Steve Plimpton 3 Sandia National Laboratories Albuquerque, NM 87185 Abstract. The multiplication of
More informationEfficient Multi-GPU CUDA Linear Solvers for OpenFOAM
Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,
More informationOptimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning
Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning Michael M. Wolf 1,2, Erik G. Boman 2, and Bruce A. Hendrickson 3 1 Dept. of Computer Science, University of Illinois at Urbana-Champaign,
More informationSCALABLE ALGORITHMS for solving large sparse linear systems of equations
SCALABLE ALGORITHMS for solving large sparse linear systems of equations CONTENTS Sparse direct solvers (multifrontal) Substructuring methods (hybrid solvers) Jacko Koster, Bergen Center for Computational
More informationNative mesh ordering with Scotch 4.0
Native mesh ordering with Scotch 4.0 François Pellegrini INRIA Futurs Project ScAlApplix pelegrin@labri.fr Abstract. Sparse matrix reordering is a key issue for the the efficient factorization of sparse
More informationA High Performance Sparse Cholesky Factorization Algorithm For. University of Minnesota. Abstract
A High Performance Sparse holesky Factorization Algorithm For Scalable Parallel omputers George Karypis and Vipin Kumar Department of omputer Science University of Minnesota Minneapolis, MN 55455 Technical
More informationXinyu Dou Acoustics Technology Center, Motorola, Inc., Schaumburg, Illinois 60196
A unified boundary element method for the analysis of sound and shell-like structure interactions. II. Efficient solution techniques Shaohai Chen and Yijun Liu a) Department of Mechanical Engineering,
More informationFOR P3: A monolithic multigrid FEM solver for fluid structure interaction
FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany
More informationParallel ILU Ordering and Convergence Relationships: Numerical Experiments
NASA/CR-00-2119 ICASE Report No. 00-24 Parallel ILU Ordering and Convergence Relationships: Numerical Experiments David Hysom and Alex Pothen Old Dominion University, Norfolk, Virginia Institute for Computer
More informationESPRESO ExaScale PaRallel FETI Solver. Hybrid FETI Solver Report
ESPRESO ExaScale PaRallel FETI Solver Hybrid FETI Solver Report Lubomir Riha, Tomas Brzobohaty IT4Innovations Outline HFETI theory from FETI to HFETI communication hiding and avoiding techniques our new
More informationpaper, we focussed on the GMRES(m) which is improved GMRES(Generalized Minimal RESidual method), and developed its library on distributed memory machi
Performance of Automatically Tuned Parallel GMRES(m) Method on Distributed Memory Machines Hisayasu KURODA 1?, Takahiro KATAGIRI 12, and Yasumasa KANADA 3 1 Department of Information Science, Graduate
More informationPerformance Evaluation of a New Parallel Preconditioner
Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller Marco Zagha School of Computer Science Carnegie Mellon University 5 Forbes Avenue Pittsburgh PA 15213 Abstract The
More information(recursive) `Divide and Conquer' strategies hierarchical data and solver structures, but also hierarchical (!) `matrix structures' ScaRC as generaliza
SOME BASIC CONCEPTS OF FEAST M. Altieri, Chr. Becker, S. Kilian, H. Oswald, S. Turek, J. Wallis Institut fur Angewandte Mathematik, Universitat Heidelberg Im Neuenheimer Feld 294, 69120 Heidelberg, Germany
More informationMinimal Equation Sets for Output Computation in Object-Oriented Models
Minimal Equation Sets for Output Computation in Object-Oriented Models Vincenzo Manzoni Francesco Casella Dipartimento di Elettronica e Informazione, Politecnico di Milano Piazza Leonardo da Vinci 3, 033
More informationSparse Matrices Introduction to sparse matrices and direct methods
Sparse Matrices Introduction to sparse matrices and direct methods Iain Duff STFC Rutherford Appleton Laboratory and CERFACS Summer School The 6th de Brùn Workshop. Linear Algebra and Matrix Theory: connections,
More informationAim. Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity. Structure and matrix sparsity: Overview
Aim Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity Julian Hall School of Mathematics University of Edinburgh jajhall@ed.ac.uk What should a 2-hour PhD lecture on structure
More information8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 48 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More information8. Hardware-Aware Numerics. Approaching supercomputing...
Approaching supercomputing... Numerisches Programmieren, Hans-Joachim Bungartz page 1 of 22 8.1. Hardware-Awareness Introduction Since numerical algorithms are ubiquitous, they have to run on a broad spectrum
More informationParallel Implementations of Gaussian Elimination
s of Western Michigan University vasilije.perovic@wmich.edu January 27, 2012 CS 6260: in Parallel Linear systems of equations General form of a linear system of equations is given by a 11 x 1 + + a 1n
More informationSparse Linear Algebra
Lecture 5 Sparse Linear Algebra The solution of a linear system Ax = b is one of the most important computational problems in scientific computing. As we shown in the previous section, these linear systems
More informationNetwork. Department of Statistics. University of California, Berkeley. January, Abstract
Parallelizing CART Using a Workstation Network Phil Spector Leo Breiman Department of Statistics University of California, Berkeley January, 1995 Abstract The CART (Classication and Regression Trees) program,
More informationUsing Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning
Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning Edmond Chow a,, Hartwig Anzt b,c, Jennifer Scott d, Jack Dongarra c,e,f a School of
More informationNAG Library Function Document nag_sparse_nsym_sol (f11dec)
f11 Large Scale Linear Systems NAG Library Function Document nag_sparse_nsym_sol () 1 Purpose nag_sparse_nsym_sol () solves a real sparse nonsymmetric system of linear equations, represented in coordinate
More informationSivan Toledo Coyote Hill Road. Palo Alto, CA November 25, Abstract
Improving Memory-System Performance of Sparse Matrix-Vector Multiplication Sivan Toledo Xerox Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 9434 November 25, 1996 Abstract Sparse Matrix-Vector
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationCpu time [s] BICGSTAB_RPC CGS_RPC BICGSTAB_LPC BICG_RPC BICG_LPC LU_NPC
Application of Non-stationary Iterative Methods to an Exact Newton-Raphson Solution Process for Power Flow Equations Rainer Bacher, Eric Bullinger Swiss Federal Institute of Technology (ETH), CH-809 Zurich,
More informationChallenges and Advances in Parallel Sparse Matrix-Matrix Multiplication
Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication Aydin Buluc John R. Gilbert University of California, Santa Barbara ICPP 2008 September 11, 2008 1 Support: DOE Office of Science,
More informationTHE DEVELOPMENT OF THE POTENTIAL AND ACADMIC PROGRAMMES OF WROCLAW UNIVERISTY OF TECH- NOLOGY ITERATIVE LINEAR SOLVERS
ITERATIVE LIEAR SOLVERS. Objectives The goals of the laboratory workshop are as follows: to learn basic properties of iterative methods for solving linear least squares problems, to study the properties
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More information