F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain: a metal sheet
Contents

F10: Parallel Sparse Matrix Computations
(Figures mainly from Kumar et al., Introduction to Parallel Computing, 1st ed., Chap. 11)
Bo Kågström et al. (RG, EE, MR)

- Sparse matrices and storage schemes (formats)
- Parallel algorithms for basic sparse operations
  - Inner product (sdot)
  - Matrix-vector multiply (structured and unstructured sparseness)
- Iterative methods for solving Ax = b: parallel aspects
  - Jacobi, Gauss-Seidel, conjugate gradients (CG), [preconditioning]
- Ordering and reordering of rows in sparse A and grid graphs

Things that are not covered: direct methods for sparse systems (+ robust and general, - more memory)
- Reordering (faster and more accurate/stable solution)
- Symbolic factorization (sets up the data structure for the matrix factorization, allocates memory, accounts for fill-in)
- Numeric factorization

Parallel algorithms for sparse systems Ax = b

Discretized domain: a metal sheet
- A physical system is represented by a mathematical model (e.g. a PDE).
- The continuous domain (surface, space, etc.) is discretized by a mesh with a finite number of grid points.
- The relations between variables in the discretized model give only a small number of nonzero elements in the system matrices.
- Computations are much more efficient and require much less storage space if the sparseness is exploited; the memory cost of storing the sparse matrices as dense n x n arrays is avoided altogether.
- The metal sheet's surface temperature is modeled by computing its values at grid points 0-48.
Storage schemes for sparse matrices

Coordinate format
- q = # nonzero elements << n² (A is n x n).
- Three q x 1 arrays: the nonzero elements in arbitrary order, their row coordinates and their column coordinates.

Compressed sparse row (CSR) format
- Nonzero elements stored in row order (VAL, q x 1) with the corresponding column index for each element (J, q x 1).
- A pointer array gives the position of the i:th row's first element in VAL and J.
- Also compressed sparse column (CSC): rows and columns exchange roles in the storage layout (also called the Harwell-Boeing format).
- A small construction sketch follows below.

Diagonal storage format
- d = # nonzero diagonals (= 4 in the example).
- Nonzero elements are stored diagonal-wise, one diagonal per column of an n x d array; a d x 1 array holds each diagonal's distance to the main diagonal.

Ellpack-Itpack format (E-I format)
- Good when m = max(#nz) in any row is not much larger than mean(#nz) per row.
- Nonzero elements stored row by row in an n x m array, with the corresponding column indices; -1 signals end of row.
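As a concrete illustration (a minimal Python sketch, not taken from the slides), the following snippet builds the coordinate and CSR arrays for a small matrix by hand; the names val, col_idx and row_ptr are illustrative stand-ins for VAL, J and the row-pointer array above.

```python
import numpy as np

# Small example matrix (n = 4, q = 10 nonzeros):
A = np.array([[ 4., -1.,  0.,  0.],
              [-1.,  4., -1.,  0.],
              [ 0., -1.,  4., -1.],
              [ 0.,  0., -1.,  4.]])

# Coordinate format: three arrays of length q
rows, cols = np.nonzero(A)
coo_val = A[rows, cols]

# Compressed sparse row (CSR) format: values and column indices in row
# order, plus a pointer to each row's first element (length n + 1).
val, col_idx, row_ptr = [], [], [0]
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i, j] != 0.0:
            val.append(A[i, j])
            col_idx.append(j)
    row_ptr.append(len(val))

print("COO:", list(zip(rows, cols, coo_val)))
print("CSR val    :", val)
print("CSR col_idx:", col_idx)
print("CSR row_ptr:", row_ptr)
```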
Jagged-diagonal format
- First, the matrix rows are ordered by decreasing number of nonzero entries.
- The nonzero elements of the reordered matrix are stored in jagged-diagonal order in a q x 1 array (q = # nonzeros), together with the corresponding column indices and a pointer to the start index of each jagged diagonal.

Parallel dot (inner) product
- A common operation in algorithms for sparse matrices (the vectors are dense!).
- If x and y (n x 1) are uniformly partitioned among p processors, each processor performs n/p multiplications and n/p - 1 additions, followed by a global sum:
  T_p = 2n/p + (t_s + t_w) log p

Sparse matrix-vector multiply (GEMV)
- y = Ax, A sparse n x n, x dense n x 1, y dense.
- The most costly operation in most (iterative) algorithms for solving linear systems of equations; a CSR-based sketch follows below.
- Examples of sparse matrix structures:
  - a few diagonals close to the main diagonal
  - unstructured (almost random locations of the matrix elements)
  - band matrices: nonzero elements confined to a band around the main diagonal (but can be unstructured within the band)
  - symmetric
- Make use of the matrix structure(s) if possible!

Block-tridiagonal matrices: Laplace PDE
- Assume a discretized mesh for a PDE (Laplace), with the grid points numbered row by row from 0 to n-1.
- The finite difference approximation of the derivatives of u(x,y) gives, in general for row i:
  a_i x[i - √n] + b_i x[i - 1] + c_i x[i] + d_i x[i + 1] + e_i x[i + √n] = f_i
- The coefficients with index i represent the elements of matrix A (at most 5 elements per row).
- The vector x[0 : n-1] keeps the approximations to u(x,y) for the n grid points.
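A minimal serial sketch of the CSR matrix-vector product, continuing the arrays from the previous sketch; in a parallel implementation the outer row loop is what gets distributed over the processors. The function name and example values are illustrative, not from the slides.

```python
def csr_matvec(val, col_idx, row_ptr, x):
    """y = A x for A stored in CSR format (val, col_idx, row_ptr)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):                      # each row is independent:
        s = 0.0                             # rows can be computed in parallel
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += val[k] * x[col_idx[k]]     # only the nonzero terms are touched
        y[i] = s
    return y

# Example with the 4 x 4 matrix from the previous sketch:
val     = [4., -1., -1., 4., -1., -1., 4., -1., -1., 4.]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
row_ptr = [0, 2, 5, 8, 10]
print(csr_matvec(val, col_idx, row_ptr, [1., 1., 1., 1.]))  # [3.0, 2.0, 2.0, 3.0]
```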
Block-tridiagonal matrices: Laplace example
- n = # grid points = 16; block size √n x √n (= 4 x 4).
- [Figure: the 16 x 16 matrix with entries a_i, b_i, c_i, d_i, e_i; c_i on the main diagonal, b_i and d_i on the first sub- and super-diagonals, a_i and e_i on the diagonals at distance ±√n.]
- √n blocks on the main diagonal; √n - 1 blocks on the sub- and super-diagonals.
- How do we do parallel matrix-vector multiply with this structure?

Matrix-vector multiply for a block-tridiagonal matrix
- y = Ax, diagonal storage for A (a serial sketch of the diagonal-storage product follows below).
- Assume block-striped partitioning of A and x, p ≤ √n (# elements per processor n/p ≥ √n).
- Each row of A requires 5 vector elements x[·] for its partial sum.
- The element x[i] needed for the main diagonal is placed right (same index i): trivial parallelization, no communication needed.
- [Elements x[i-1], x[i+1] on the sub- and super-diagonals sit on neighboring processors, comm. cost 2(t_s + t_w), but only if p > √n.]
- Elements on the distant diagonals (±√n) must be exchanged with the nearest neighbors: comm. cost 2(t_s + t_w √n).
- Computation cost: 5 t_a n/p, so
  T_p = 5 t_a n/p + 2(t_s + t_w √n)
- Isoefficiency function: Θ(p²)

Better partitioning of a block-tridiagonal matrix
- For p > √n the following partition of the grid is much better (n = 36, p = 9): each processor owns a √(n/p) x √(n/p) (= 2 x 2) block of grid points.
- The matrix row elements that belong to points within a given partition are on the same processor; similarly for the vector (a block of n/p rows plus the corresponding vector elements).
- Vector elements corresponding to boundary points of the partition are exchanged with the logical neighbors: 4(t_s + t_w √(n/p)), so
  T_p = 5 t_a n/p + 4(t_s + t_w √(n/p))
- Isoefficiency function: Θ(p²)

Matrix-vector multiply for an unstructured matrix
- n x n matrix, m = avg(#nz) per row, so mn/p elements per processor.
- E-I format: row blocking of VAL and J.
- T_p = t_a mn/p + t_s log p + t_w n = Θ(T_s)! Similar problem!!
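To make the diagonal-storage idea concrete, here is a small serial sketch that builds the five diagonals of the 5-point Laplacian on a √n x √n grid and multiplies with them directly; the function names and the zero-padding at the grid-row ends are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def laplace_diagonals(k):
    """Five diagonals of the 5-point Laplacian on a k x k grid (n = k*k points),
    stored diagonal-wise as in the diagonal storage format."""
    n = k * k
    c = 4.0 * np.ones(n)          # main diagonal
    b = -1.0 * np.ones(n)         # couples to x[i-1]
    d = -1.0 * np.ones(n)         # couples to x[i+1]
    a = -1.0 * np.ones(n)         # couples to x[i-k]
    e = -1.0 * np.ones(n)         # couples to x[i+k]
    for i in range(n):            # no coupling across the ends of grid rows
        if i % k == 0:     b[i] = 0.0
        if i % k == k - 1: d[i] = 0.0
    return a, b, c, d, e, k

def diag_matvec(a, b, c, d, e, k, x):
    """y = A x using only the five stored diagonals (offsets -k, -1, 0, +1, +k)."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        s = c[i] * x[i]
        if i - 1 >= 0: s += b[i] * x[i - 1]
        if i + 1 < n:  s += d[i] * x[i + 1]
        if i - k >= 0: s += a[i] * x[i - k]
        if i + k < n:  s += e[i] * x[i + k]
        y[i] = s
    return y

a, b, c, d, e, k = laplace_diagonals(4)        # n = 16 as in the slide example
print(diag_matvec(a, b, c, d, e, k, np.ones(16)))
```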
Faster algorithm for unstructured matrices
- 2D blocking of A; the vector belongs to the last processor column.
- Alignment operation of x, one-to-all broadcast of x in the processor columns, single-node accumulation of the sub-results.
- T_p = t_a mn/p + t_s log p + (3/2) t_w (n/√p) log p = Θ(T_s)!
- Not very scalable or cost optimal!

Sparse matrix and its graph representation
- Dependences between matrix elements are shown by the adjacency-matrix graph.
- In order to minimize communication, the graph is partitioned using good heuristics (no details here!).
- For better algorithms some structure is required (e.g. symmetry).

Matrix-vector multiply: unstructured band matrix
- w = band width; Ellpack-Itpack storage (a sketch of this format follows below).
- Exchange of the vector elements needed for doing the computations.
- T_p = t_a mn/p + t_s wp/n + t_w w
- Cost optimal for p = O(mn/w). Conclusion: worse scalability for large band width!

Iterative methods for solving Ax = b
- Start with an initial guess x_0 and generate a sequence of approximations x_k to the solution x.
- In each iteration the matrix A is used in one or several matrix-vector multiply operations (sparse GEMV).
- The number of iterations needed depends on the method used, the properties of A and the required accuracy of the solution.
- In practice, the iteration terminates when the residual norm(b - A x_k), or some other measure of error, is as small as desired (<= tol).
- Other common operations: inner products (sdot), saxpy.
- Performance analysis is typically done per iteration.
- Unlike direct methods for solving Ax = b (LU), no fill-in is incurred.
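Since the band-matrix algorithm above stores its rows in Ellpack-Itpack form, here is a minimal serial sketch of that storage scheme and the corresponding matrix-vector product, using the -1 end-of-row convention from the format description; the function names are illustrative assumptions.

```python
import numpy as np

def to_ellpack(A):
    """Ellpack-Itpack storage: an n x m array of values and an n x m array of
    column indices, m = max nonzeros per row; -1 marks end of row (padding)."""
    n = A.shape[0]
    m = max(np.count_nonzero(A[i]) for i in range(n))
    val = np.zeros((n, m))
    col = -np.ones((n, m), dtype=int)
    for i in range(n):
        k = 0
        for j in range(A.shape[1]):
            if A[i, j] != 0.0:
                val[i, k], col[i, k] = A[i, j], j
                k += 1
    return val, col

def ell_matvec(val, col, x):
    """y = A x from the Ellpack-Itpack arrays; padded entries (col == -1) are skipped."""
    n, m = val.shape
    y = np.zeros(n)
    for i in range(n):
        for k in range(m):
            if col[i, k] < 0:
                break
            y[i] += val[i, k] * x[col[i, k]]
    return y

A = np.array([[ 4., -1.,  0.,  0.],
              [-1.,  4., -1.,  0.],
              [ 0., -1.,  4., -1.],
              [ 0.,  0., -1.,  4.]])
val, col = to_ellpack(A)
print(ell_matvec(val, col, np.ones(4)))   # [3. 2. 2. 3.]
```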
Jacobi method
- Consider A = D + M, where D is the diagonal of A and M the rest. Jacobi's method:
  x_{k+1} = D^{-1}(b - M x_k)
- Strict diagonal dominance of A, |a_ii| > Σ_{j≠i} |a_ij| for all i, guarantees convergence.

Parallel Jacobi
- Every component of x_{k+1} depends only on the old vector x_k, so the componentwise updates are performed in parallel without communication!
- Gathering the elements of x_k needed for M x_k and the global convergence test require communication.
- A small serial sketch follows below.

Parallel Gauss-Seidel
- Gauss-Seidel reuses new values as soon as they have been computed and effectively performs two Jacobi-like steps in each iteration:
  A = D - L - M, with D = diagonal, L = strict lower triangular, M = the rest, and
  x_{k+1} = (D - L)^{-1}(M x_k + b)
- Intuitively the computation must now be done sequentially, since x[i] depends on the new values x[0], x[1], ..., x[i-1].
- But if A is sparse there are not dependences on all preceding x-values.

The row ordering impacts the dependences
- In the Laplace example, the Gauss-Seidel update of x[i] depends on the new values x[i-1] and x[i-√n].
- Since x[i-1] is computed immediately before x[i] in the sweep, these computations cannot be parallelized (depending on the row ordering!).
- Solution: compute independent x-values (non-neighbors) in parallel!
- If A[i,j] = 0 then x[i] has no dependence on x[j]; x[i] can be computed as soon as all x[j] with j < i and A[i,j] ≠ 0 have been computed, so several x-values can be computed in parallel.
- [Figure: the 16 x 16 block-tridiagonal Laplace matrix again, showing which entries create dependences.]
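A minimal serial sketch of the Jacobi iteration x_{k+1} = D^{-1}(b - M x_k) with a residual-based stopping test; in a parallel version the componentwise update is the part distributed over processors. The dense NumPy formulation is for clarity only; a real solver would use one of the sparse formats above.

```python
import numpy as np

def jacobi(A, b, tol=1e-8, max_iter=500):
    """Jacobi iteration x_{k+1} = D^{-1} (b - M x_k), with A = D + M."""
    D = np.diag(A)                      # diagonal of A
    M = A - np.diag(D)                  # the rest
    x = np.zeros_like(b)
    for k in range(max_iter):
        # every new component depends only on the old x,
        # so this update is trivially parallel over the rows
        x_new = (b - M @ x) / D
        if np.linalg.norm(b - A @ x_new) <= tol:   # residual test
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Strictly diagonally dominant example (the 4 x 4 matrix used earlier):
A = np.array([[ 4., -1.,  0.,  0.],
              [-1.,  4., -1.,  0.],
              [ 0., -1.,  4., -1.],
              [ 0.,  0., -1.,  4.]])
b = np.array([3., 2., 2., 3.])
x, iters = jacobi(A, b)
print(x, "after", iters, "iterations")   # x ≈ [1, 1, 1, 1]
```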
Red-black ordering
- Gauss-Seidel uses an implicit red-black reordering: first the red points are computed, then the black points.

Multi-colored ordering of matrices
- A generalization of the red-black ordering used in Gauss-Seidel.

Conjugate gradient (CG) method
- The most used method for iterative solution of Ax = b when A is symmetric (A = A^T) and positive definite (x^T A x > 0 for all vectors x ≠ 0).
- Finds the minimum of q(x) = (1/2) x^T A x - x^T b.
- The gradient (derivative) of q(x) is Ax - b (= 0 in the minimum point).
- In iteration k the search direction p_k and a step length σ_k that minimizes q along p_k are computed.
- The new x-vector is computed: x_k = x_{k-1} + σ_k p_k
- The residual is updated: r_k = r_{k-1} - σ_k A p_k
- The iteration is finished when the residual is small enough!
- A serial sketch follows below.

Parallel conjugate gradient (PCG) method
- Ingredients in PCG:
  - SAXPY (single precision a·x plus y)
  - Inner products
  - Matrix-vector multiply operations
  - Solution of linear systems (preconditioned CG)
- Parallelization is done with the methods mentioned (or not mentioned!) above.
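A minimal serial sketch of unpreconditioned CG showing the ingredients named above, SAXPY, inner products and matrix-vector multiplies; the variable names follow the usual textbook formulation (sigma is the step length σ_k), and the dense NumPy matrix is for illustration only.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                 # residual (a sparse GEMV in practice)
    p = r.copy()                  # initial search direction
    rs = r @ r                    # inner product
    for k in range(max_iter):
        Ap = A @ p                # matrix-vector multiply
        sigma = rs / (p @ Ap)     # step length minimizing q along p
        x = x + sigma * p         # SAXPY: new iterate
        r = r - sigma * Ap        # SAXPY: updated residual
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:
            return x, k + 1
        p = r + (rs_new / rs) * p # new A-conjugate search direction
        rs = rs_new
    return x, max_iter

A = np.array([[ 4., -1.,  0.,  0.],
              [-1.,  4., -1.,  0.],
              [ 0., -1.,  4., -1.],
              [ 0.,  0., -1.,  4.]])
b = np.array([3., 2., 2., 3.])
x, iters = conjugate_gradient(A, b)
print(x, "in", iters, "iterations")   # x ≈ [1, 1, 1, 1]
```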
Finite element method (FEM)
- Computes approximate numerical solutions to PDEs over a discretized domain.
- Unlike in the finite difference (FD) grid, a grid point exchanges information with all grid points with which it shares an element (in total 9, including itself).
- In most applications the graph is quite irregular => unstructured sparse matrix.

Stiffness matrix
- The stiffness matrix A is derived by computing a set of integrals over the elements of the finite element graph (A[i,j] ≠ 0 if grid points i and j share an element).
- Ax = b, where b is the force vector.
- Computing the stiffness matrix A and the force vector b is relatively cheap and can be done locally by the processor that owns the respective grid points, so the computation of A is trivial to parallelize.
- The linear system is large and sparse => solving it is the most computationally expensive phase of the FEM.
- Assume Ax = b is solved with the CG method (p = # processors used):
  - SAXPY: no communication overhead
  - Dot product: O(log p) with CT (cut-through) routing
  - Matrix-vector multiply: communication depends on the number of grid points a processor holds that share element(s) with another processor.
- Key issue: minimize load imbalance and maximize computational intensity (# flops per memory transfer).
- Performance is determined by how the computational grid is partitioned (a back-of-the-envelope sketch follows below)!

1-dimensional partitioning of grid graphs
- Optimal partitioning is NP-hard!

2-dimensional partitioning of grid graphs
- Vertical and horizontal partitionings are overlapped.
- This does not automatically give the same number of nodes per partition.
- Balance the load between partitions: transfer nodes from heavily loaded to lightly loaded partitions.
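As a back-of-the-envelope sketch (an illustration under assumptions, not from the slides), the following compares how many vector elements a processor exchanges per multiply under strip (1D) and block (2D) partitioning of a square grid; it mirrors the t_w terms in the T_p expressions earlier and assumes the grid divides evenly among the processors.

```python
import math

def words_exchanged_1d(n, p):
    """Block-striped (1D) partitioning of a sqrt(n) x sqrt(n) grid:
    each processor exchanges up to two full grid rows with its neighbours.
    The volume is independent of p (p is kept only for a symmetric interface)."""
    k = math.isqrt(n)
    return 2 * k                          # Θ(sqrt(n))

def words_exchanged_2d(n, p):
    """Block (2D) partitioning: each processor exchanges up to four block
    sides of sqrt(n/p) points with its logical neighbours."""
    return 4 * math.isqrt(n // p)         # Θ(sqrt(n/p)), shrinks as p grows

n, p = 36, 9
print(words_exchanged_1d(n, p))   # 12 words per processor
print(words_exchanged_2d(n, p))   # 8 words per processor
```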
Block partitioning of arbitrary graphs
- The graph is partitioned into levels.
- Processors are assigned nodes according to the level partition (an illustrative sketch follows below).
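The slides do not spell out how the levels are computed; a common choice is a breadth-first traversal from some start vertex, as in this illustrative sketch, where the function names and the greedy level-to-processor assignment are assumptions.

```python
def bfs_levels(adj, start=0):
    """Partition the vertices of a graph (adjacency lists) into BFS levels."""
    levels, seen, frontier = [], {start}, [start]
    while frontier:
        levels.append(frontier)
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return levels

def assign_levels(levels, p):
    """Assign whole levels to p processors greedily:
    always give the next level to the currently least-loaded processor."""
    parts = [[] for _ in range(p)]
    for lvl in levels:
        target = min(range(p), key=lambda i: len(parts[i]))
        parts[target].extend(lvl)
    return parts

# Tiny example graph: a path 0-1-2-3-4-5 with a branch 2-6
adj = {0: [1], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4], 4: [3, 5], 5: [4], 6: [2]}
levels = bfs_levels(adj, start=0)
print(levels)                     # [[0], [1], [2], [3, 6], [4], [5]]
print(assign_levels(levels, 2))   # two partitions built from whole levels
```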