CS 770G - Parallel Algorithms in Scientific Computing


1 CS 770G - Parallel Algorithms in Scientific Computing
Dense Matrix Computation II: Solving Linear Systems
Lecture 6, May 28, 2001

2 References
Introduction to Parallel Computing - Kumar, Grama, Gupta, Karypis; Benjamin Cummings.
Numerical Linear Algebra for High-Performance Computers - Dongarra, Duff, Sorensen, van der Vorst; SIAM.
A portion of the notes comes from Prof. J. Demmel's CS267 course at UC Berkeley.

3 Review of Gaussian Elimination (GE) for Solving A*x = b
Add multiples of each row to later rows to make A upper triangular.
Solve the resulting triangular system U*x = c by substitution.

... for each column i,
... zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        ... add a multiple of row i to row j
        for k = i to n
            A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)

4 Refine GE Algorithm (1)
Initial version:
for i = 1 to n-1
    for j = i+1 to n
        for k = i to n
            A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)

Remove computation of the constant A(j,i)/A(i,i) from the inner loop:
for i = 1 to n-1
    for j = i+1 to n
        m = A(j,i)/A(i,i)
        for k = i to n
            A(j,k) = A(j,k) - m * A(i,k)

5 Refine GE Algorithm (2)
Last version:
for i = 1 to n-1
    for j = i+1 to n
        m = A(j,i)/A(i,i)
        for k = i to n
            A(j,k) = A(j,k) - m * A(i,k)

Don't compute what we already know: the zeros below the diagonal in column i.
for i = 1 to n-1
    for j = i+1 to n
        m = A(j,i)/A(i,i)
        for k = i+1 to n
            A(j,k) = A(j,k) - m * A(i,k)

6 Refine GE Algorithm (3)
Last version:
for i = 1 to n-1
    for j = i+1 to n
        m = A(j,i)/A(i,i)
        for k = i+1 to n
            A(j,k) = A(j,k) - m * A(i,k)

Store the multipliers m below the diagonal in the zeroed entries for later use:
for i = 1 to n-1
    for j = i+1 to n
        A(j,i) = A(j,i)/A(i,i)
        for k = i+1 to n
            A(j,k) = A(j,k) - A(j,i) * A(i,k)

7 Refine GE Algorithm (4)
Last version:
for i = 1 to n-1
    for j = i+1 to n
        A(j,i) = A(j,i)/A(i,i)
        for k = i+1 to n
            A(j,k) = A(j,k) - A(j,i) * A(i,k)

Express using matrix operations (BLAS):
for i = 1 to n-1
    A(i+1:n, i) = A(i+1:n, i) / A(i,i)
    A(i+1:n, i+1:n) = A(i+1:n, i+1:n) - A(i+1:n, i) * A(i, i+1:n)
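
As a concrete (non-authoritative) illustration of this final form, here is a minimal NumPy sketch of the same vectorized loop, using 0-based indexing and no pivoting; the function and variable names are ours, not part of the course code:

    import numpy as np

    def lu_in_place(A):
        # In-place LU without pivoting: multipliers overwrite the strict lower
        # triangle of A, U overwrites the upper triangle (0-based indexing).
        n = A.shape[0]
        for i in range(n - 1):
            A[i+1:, i] /= A[i, i]                               # BLAS 1: scale the multiplier column
            A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])   # BLAS 2: rank-1 update of trailing matrix
        return A

For example, lu_in_place(np.array([[2., 1.], [4., 5.]])) leaves U = [[2, 1], [0, 3]] in the upper triangle and the multiplier 2 below the diagonal.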

8 What GE Really Computes
for i = 1 to n-1
    A(i+1:n, i) = A(i+1:n, i) / A(i,i)
    A(i+1:n, i+1:n) = A(i+1:n, i+1:n) - A(i+1:n, i) * A(i, i+1:n)

Call the strictly lower triangular matrix of multipliers M, and let L = I + M. Call the upper triangle of the final matrix U.
Lemma (LU Factorization): If the above algorithm terminates (does not divide by zero), then A = L*U.

Solving A*x = b using GE:
    Factorize A = L*U using GE (cost 2/3 n^3 flops).
    Solve L*y = b for y, using substitution (cost n^2 flops).
    Solve U*x = y for x, using substitution (cost n^2 flops).
Thus A*x = (L*U)*x = L*(U*x) = L*y = b, as desired.
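
To make the two substitution steps concrete, here is a minimal sketch that consumes the packed factors produced by the lu_in_place sketch above (names are ours; pivoting is not handled):

    def solve_with_lu(LU, b):
        # Solve A*x = b from the packed LU factors: forward substitution with
        # L (implicit unit diagonal, multipliers below), then back substitution with U.
        n = LU.shape[0]
        y = b.astype(float).copy()
        for i in range(n):                        # L*y = b, about n^2 flops
            y[i] -= LU[i, :i] @ y[:i]
        x = y
        for i in range(n - 1, -1, -1):            # U*x = y, about n^2 flops
            x[i] = (x[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
        return x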

9 Problems with the Basic GE Algorithm
What if some A(i,i) is zero? Or very small?
    The result may not exist, or may be unstable, so we need to pivot.
The current computation is all BLAS 1 or BLAS 2, but we know that BLAS 3 (matrix multiply) is fastest:
for i = 1 to n-1
    A(i+1:n, i) = A(i+1:n, i) / A(i,i)                               ... BLAS 1 (scale a vector)
    A(i+1:n, i+1:n) = A(i+1:n, i+1:n) - A(i+1:n, i) * A(i, i+1:n)    ... BLAS 2 (rank-1 update)
(Performance chart: Peak > BLAS 3 > BLAS 2 > BLAS 1.)

10 Pivoting in Gaussian Elimination
A = [ 0 1 ] fails completely, even though A is easy.
    [ 1 0 ]

Illustrate the problem in 3-decimal-digit arithmetic:
A = [ 1e-4 1 ]  and  b = [ 1 ];  the correct answer to 3 places is x = [ 1 ]
    [ 1    1 ]           [ 2 ]                                         [ 1 ]

Result of LU decomposition:
L = [ 1          0 ] = [ 1   0 ]    ... no roundoff error yet
    [ fl(1/1e-4) 1 ]   [ 1e4 1 ]

U = [ 1e-4 1           ] = [ 1e-4  1   ]    ... error in the 4th decimal place
    [ 0    fl(1-1e4*1) ]   [ 0    -1e4 ]

Check: L*U = [ 1e-4 1 ]    ... (2,2) entry entirely wrong
             [ 1    0 ]

The algorithm "forgets" the (2,2) entry: it gets the same L and U for all |A(2,2)| < 5.
Numerical instability: the computed solution x is totally inaccurate.
Cure: pivot (swap rows of A) so that the entries of L and U stay bounded.
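
The same failure can be reproduced in ordinary double precision by shrinking the pivot; the following demo (with an assumed pivot of 1e-20 instead of the slide's 1e-4, so that the cancellation happens at machine precision) is purely illustrative:

    import numpy as np

    A = np.array([[1e-20, 1.0],
                  [1.0,   1.0]])
    b = np.array([1.0, 2.0])

    # GE without pivoting on the 2x2 system
    m = A[1, 0] / A[0, 0]              # huge multiplier, 1e20
    u22 = A[1, 1] - m * A[0, 1]        # fl(1 - 1e20) = -1e20: the original A(2,2) is lost
    x2 = (b[1] - m * b[0]) / u22       # = 1.0 (fine)
    x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
    print(x1, x2)                      # 0.0 1.0 -- but x1 should be about 1.0

    print(np.linalg.solve(A, b))       # LAPACK, with partial pivoting: [1. 1.]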

11 Gaussian Elimination with Partial Pivoting (GEPP)
Partial pivoting: swap rows so that each multiplier satisfies |L(j,i)| = |A(j,i)/A(i,i)| <= 1.

for i = 1 to n-1
    find and record k where |A(k,i)| = max over i <= j <= n of |A(j,i)|
        ... i.e. the largest entry in the rest of column i
    if |A(k,i)| = 0
        exit with a warning that A is singular, or nearly so
    elseif k != i
        swap rows i and k of A
    end if
    A(i+1:n, i) = A(i+1:n, i) / A(i,i)                               ... each quotient lies in [-1,1]
    A(i+1:n, i+1:n) = A(i+1:n, i+1:n) - A(i+1:n, i) * A(i, i+1:n)

Lemma: This algorithm computes A = P*L*U, where P is a permutation matrix.
Since each entry satisfies |L(i,j)| <= 1, the algorithm is considered numerically stable.
For details see the LAPACK code and Dongarra's book.
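
A minimal NumPy sketch of the loop above (0-based; the pivot order is returned as a vector rather than an explicit permutation matrix P; names are ours):

    import numpy as np

    def gepp(A):
        # In-place GEPP: on return, A holds L (multipliers, below the diagonal)
        # and U (on and above), and piv satisfies A_original[piv] = L*U.
        n = A.shape[0]
        piv = np.arange(n)
        for i in range(n - 1):
            k = i + np.argmax(np.abs(A[i:, i]))       # largest entry in the rest of column i
            if A[k, i] == 0:
                raise ValueError("matrix is singular, or nearly so")
            if k != i:
                A[[i, k]] = A[[k, i]]                 # swap rows i and k
                piv[[i, k]] = piv[[k, i]]
            A[i+1:, i] /= A[i, i]                     # each multiplier lies in [-1, 1]
            A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])
        return piv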

12 Summary
You have seen an example of how a typical (and important) matrix operation can be reduced to calls to lower-level BLAS routines that have been optimized for a given hardware platform.
Any other matrix operation you may think of (matrix-matrix multiply, Cholesky factorization, QR, Householder method, etc.) can be constructed from BLAS subprograms in a similar fashion, and in fact has been, in the package called LAPACK.
Note that only Level 1 and Level 2 BLAS routines have been used in the LU decomposition so far.
Efficiency considerations?

13 Overview of LAPACK
Standard library for dense/banded linear algebra:
    Linear systems: A*x = b
    Least squares problems: min_x ||A*x - b||_2
    Eigenvalue problems: A*x = λx, A*x = λB*x
    Singular value decomposition (SVD): A = U*Σ*V^T
Algorithms reorganized to use BLAS 3 as much as possible.
Basis of the math libraries on many computers.
Many algorithmic innovations remain.
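
In practice these routines are usually reached through a wrapper; for instance, SciPy's lu_factor/lu_solve call LAPACK's GEPP routines (getrf/getrs). A small usage sketch, with arbitrary sizes chosen only for illustration:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 1000))
    b = rng.standard_normal(1000)

    lu, piv = lu_factor(A)        # blocked GEPP (dgetrf), ~2/3 n^3 flops, BLAS 3 rich
    x = lu_solve((lu, piv), b)    # two triangular solves (dgetrs), ~2 n^2 flops
    print(np.linalg.norm(A @ x - b))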

14 Performance of LAPACK (n = 1000)  (performance chart)

15 Performance of LAPACK (n = 100)  (performance chart)

16 Summary, Cont'd
We need to devise algorithms that can make use of Level 3 BLAS (matrix-matrix) routines, for several reasons:
    Level 3 routines run much more efficiently because of their larger ratio of computation to memory references/communication.
    Parallel algorithms on distributed-memory machines require that we decompose the original matrix into blocks that reside on each processor (similar to HW1).
    Parallel algorithms require that we minimize the surface-to-volume ratio of our decompositions, and blocking is the natural way to do this.

17 Converting BLAS 2 to BLAS 3 in GEPP
Blocking:
    Used to optimize matrix multiplication.
    Harder here because of the data dependencies in GEPP.
Delayed updates:
    Save the updates to the trailing matrix from several consecutive BLAS 2 updates.
    Apply many saved updates simultaneously in one BLAS 3 operation.
The same idea works for much of dense linear algebra; open questions remain.
Need to choose a block size b (k in Dongarra's book):
    The algorithm will save and apply b updates.
    b must be small enough that the active submatrix consisting of b columns of A fits in cache.
    b must be large enough to make BLAS 3 fast.

18 Blocked Algorithms - LU Factorization
(This slide sets up the block form of the factorization: partitioning A, L, and U conformally,
    A = [ A11 A12 ]    L = [ L11  0  ]    U = [ U11 U12 ]
        [ A21 A22 ]        [ L21 L22 ]        [ 0   U22 ]
equating blocks of A = L*U gives A11 = L11*U11, A21 = L21*U11, A12 = L11*U12, and A22 = L21*U12 + L22*U22.)

19 Blocked Algorithms - LU Factorization
With these relationships, we can develop different algorithms by choosing the order in which the operations are performed.
The block size b needs to be chosen carefully: b = 1 produces the usual algorithm, while b > 1 improves performance on a single processor.
Three natural variants: Left-Looking, Right-Looking, Crout.

20 Blocked Algorithms - LU Factorization
Assume you have already done the first row and column of the GEPP, and you have the remaining sub-block (below and to the right) left to work on.
Notice that the LU decomposition of this sub-block is independent of the portion you have already completed.

21 Blocked Algorithms - LU Factorization
For simplicity, change notation and call the sub-block A, partitioned as above:
    P*A = [ L11  0  ] [ U11 U12 ]
          [ L21 L22 ] [ 0   U22 ]
Notice that, once you have done Gaussian elimination on A11 and A21 (the first block column), you have already obtained P, L11, U11, and L21.
Now you can re-arrange the block equations by substituting (writing Ã = P*A for the row-permuted sub-block):
    U12 = L11^(-1) * Ã12
    Ã22 - L21*U12 = L22*U22
and repeat the procedure recursively on Ã22 - L21*U12.

22 Blocked Algorithms - LU Factorization
A graphical view of what is going on is given by the accompanying figure (not reproduced in the transcription).

23 Blocked Algorithms - LU Factorization
Left-Looking, Right-Looking, Crout.
The variations in the algorithm are due to the order in which the sub-matrix operations are performed.
There are slight advantages to Crout's algorithm (a hybrid of the first two).
(Figure legend: pre-computed sub-blocks vs. sub-blocks currently being operated on.)

24 Review: BLAS 3 (Blocked) GEPP
for ib = 1 to n-1 step b          ... process the matrix b columns at a time
    end = ib + b-1                ... point to the end of the block of b columns
    apply the BLAS 2 version of GEPP to get A(ib:n, ib:end) = P' * L' * U'
    let LL denote the strict lower triangular part of A(ib:end, ib:end) + I
    A(ib:end, end+1:n) = LL^(-1) * A(ib:end, end+1:n)        ... update the next b rows of U
    A(end+1:n, end+1:n) = A(end+1:n, end+1:n)
                          - A(end+1:n, ib:end) * A(ib:end, end+1:n)
        ... apply the delayed updates with a single matrix multiply (BLAS 3) with inner dimension b
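
A rough NumPy/SciPy sketch of this blocked loop (right-looking; we assume the row swaps from each panel are also applied to the columns outside the panel, and the function names, default block size, and panel routine are ours):

    import numpy as np
    from scipy.linalg import solve_triangular

    def gepp_panel(P):
        # Unblocked (BLAS 2) GEPP on a tall-skinny panel P, in place.
        # Returns the row permutation it applied to P.
        m, bw = P.shape
        perm = np.arange(m)
        for i in range(min(bw, m - 1)):
            k = i + np.argmax(np.abs(P[i:, i]))
            if k != i:
                P[[i, k]] = P[[k, i]]
                perm[[i, k]] = perm[[k, i]]
            P[i+1:, i] /= P[i, i]
            P[i+1:, i+1:] -= np.outer(P[i+1:, i], P[i, i+1:])
        return perm

    def blocked_gepp(A, b=32):
        # Right-looking blocked LU with partial pivoting: panel factorization (BLAS 2),
        # triangular solve for the next b rows of U, then one BLAS 3 delayed update.
        n = A.shape[0]
        piv = np.arange(n)
        for ib in range(0, n, b):
            end = min(ib + b, n)
            p = gepp_panel(A[ib:, ib:end])              # factor the panel in place
            A[ib:, :ib] = A[ib:, :ib][p]                # apply its row swaps to the
            A[ib:, end:] = A[ib:, end:][p]              # columns outside the panel
            piv[ib:] = piv[ib:][p]
            if end < n:
                # next b rows of U: solve with the unit lower triangle (LL) of the panel block
                A[ib:end, end:] = solve_triangular(A[ib:end, ib:end], A[ib:end, end:],
                                                   lower=True, unit_diagonal=True)
                # delayed update: one matrix multiply with inner dimension b
                A[end:, end:] -= A[end:, ib:end] @ A[ib:end, end:]
        return piv

Here b plays the role of the block size discussed above: too small and the update degenerates to BLAS 2, too large and the panel no longer fits in cache.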

25 Parallel Algorithms for Dense Matrices
All that follows is applicable to dense (full) matrices only.
Square matrices are discussed, but the arguments are valid for rectangular matrices as well.
Typical parallelization steps:
    Decomposition: identify the parallel work and its partitioning.
    Mapping: which procs execute which portion of the work.
    Assignment: load-balance the work among procs.
    Organization: communication and synchronization.

26 Parallel Algorithms for Dense Matrices
The proc that owns a given portion of the matrix is responsible for doing all of the computation that involves that portion.
This is the sensible thing to do, since it minimizes communication (although, because of data dependencies within the matrix, some communication will still be necessary).
The question is: how should we subdivide the matrix so that parallel efficiency is maximized? There are various options.

27 Different Data Layouts for Parallel GE
(Figure captions for the candidate layouts:)
    Column blocked: bad load balance - P0 is idle after the first n/4 steps.
    Column cyclic: load balanced, but can't easily use BLAS 2 or BLAS 3.
    Column block cyclic: can trade off load balance and BLAS 2/3 performance by choosing b, but the factorization of a block column is a bottleneck.
    Row and column block cyclic: the winner!
    (A skewed variant: complicated addressing.)

28 Row and Column Block Cyclic Layout
The matrix is composed of brow-by-bcol blocks.
Procs are distributed in a 2D array indexed by (pi, pj), 0 <= pi < Prow, 0 <= pj < Pcol.
Element (i, j) is mapped to proc (pi, pj) using the formulae:
    pi = floor(i / brow) mod Prow
    pj = floor(j / bcol) mod Pcol
In the figure, p = 4 and Prow = Pcol = brow = bcol = 2.
Pcol-fold parallelism in any column, and the calls to BLAS 2 and BLAS 3 operate on matrices of size brow-by-bcol.
The serial bottleneck is eased.
The layout need not be symmetric in rows and columns.
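
A tiny sketch of the mapping formula, with default parameters matching the slide's example (the function name is ours):

    def owner(i, j, brow=2, bcol=2, Prow=2, Pcol=2):
        # 2D block cyclic layout: which proc (pi, pj) owns element (i, j).
        pi = (i // brow) % Prow
        pj = (j // bcol) % Pcol
        return pi, pj

    # With the defaults, elements (0,0), (0,2), (2,0), (2,2) are owned by
    # procs (0,0), (0,1), (1,0), (1,1) respectively.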

29 Row and Column Block Cyclic Layout
In the LU factorization, the distribution of work becomes uneven as the computation progresses.
Larger block sizes result in greater load imbalance but reduce the frequency of communication between procs. The block size controls these tradeoffs.
Some procs need to do more work between synchronization points than others (e.g. partial pivoting over the rows of a single block column while the other procs stay idle; the computation of each block row of U requires the solution of a lower triangular system across the procs in a single row). The processor decomposition controls this type of tradeoff.

30 Parallel GE with a 2D Block Cyclic Layout
The block size b in the algorithm and the block sizes brow and bcol in the layout satisfy b = brow = bcol.
Shaded regions indicate busy processors or communication being performed.
It is unnecessary to have a barrier between each step of the algorithm; e.g. steps 9 and 10 can be pipelined. See Dongarra's book for more details.

31 (figure)

32 (figure) Matrix multiply: green = green - blue * pink

33 (figure)

34 Parallel Matrix Transpose
The transpose of a matrix A is defined by
    A^T(i, j) = A(j, i),   0 <= i, j < n.
All elements below the diagonal move above the diagonal, and vice versa.
Assume it takes 1 unit of time to exchange a pair of matrix elements. The sequential time of transposing an n x n matrix is then
    T_s = (n^2 - n) / 2.
Consider parallel architectures organized as both 2D meshes and hypercubes.
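
The sequential count is just the number of pairwise swaps, one per strictly-upper-triangular entry; a minimal sketch:

    def transpose_in_place(A):
        # (n^2 - n)/2 pairwise exchanges of a square NumPy array, in place.
        n = A.shape[0]
        for i in range(n):
            for j in range(i + 1, n):
                A[i, j], A[j, i] = A[j, i], A[i, j]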

35 Parallel Matrix Transpose - 2D Mesh
(Figure: a 4x4 mesh of procs P0..P15, showing the initial matrix and the final, transposed placement of the blocks.)
Elements/blocks in the lower-left part of the matrix move up to the diagonal, and then right to their final location. Each step taken requires communication.
Elements/blocks in the upper-right part of the matrix move down to the diagonal, and then left to their final location. Each step taken requires communication.

36 Parallel Matrix Transpose - 2D Mesh
(Same figure as the previous slide.)
If each of the p procs contains a single element, then after all of these communication steps the matrix has been transposed.
However, if each proc contains a sub-block of the matrix, then after all the blocks have been communicated to their final locations they still need to be locally transposed.
Each sub-block contains (n/√p) x (n/√p) elements, and the cost of communication is higher than before.
The cost of communication is dominated by the elements/blocks that reside in the top-right and bottom-left corners: they have to take approximately 2√p hops.

37 Parallel Matrix Transpose - 2D Mesh
Each block contains n^2/p elements, so it takes at most
    2 (t_s + t_w n^2/p) √p
for all blocks to move to their final destinations. After that, the local blocks need to be transposed, which takes time approximately equal to n^2/(2p).
Thus the total wall-clock time is
    T_P = n^2/(2p) + 2 t_s √p + 2 t_w n^2/√p.
Summing over all p processors, the total time consumed by the parallel algorithm is of order
    T_TOT = Θ(n^2 √p),
which is higher than the sequential complexity Θ(n^2). This algorithm, on a 2D mesh, is therefore not cost-optimal. The same is true regardless of whether store-and-forward or cut-through routing is used.

38 Parallel Matrix Transpose - Hypercube
(Figure: the matrix is split into quadrants; the off-diagonal quadrants are exchanged, and the subdivision is repeated inside each quadrant.)
This algorithm is called recursive subdivision and maps naturally onto a hypercube.
After all the blocks have been exchanged, the elements inside each block (local to a proc) still need to be transposed.
Wall-clock time:
    T_P = n^2/(2p) + (t_s + t_w n^2/p) log p
Total time:
    T_TOT = Θ(n^2 log p)
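
A serial sketch of the recursive-subdivision idea (assuming n is a power of two; on a hypercube, each quadrant swap is a pairwise block exchange between procs, and the deepest level corresponds to the local transpose):

    import numpy as np

    def recursive_transpose(A):
        # Swap the two off-diagonal quadrants, then recurse into each quadrant.
        n = A.shape[0]
        if n <= 1:
            return
        h = n // 2
        tmp = A[:h, h:].copy()          # exchange step (communication on a hypercube)
        A[:h, h:] = A[h:, :h]
        A[h:, :h] = tmp
        for rows in (slice(0, h), slice(h, n)):
            for cols in (slice(0, h), slice(h, n)):
                recursive_transpose(A[rows, cols])

    A = np.arange(16.0).reshape(4, 4)
    B = A.copy()
    recursive_transpose(B)
    assert np.array_equal(B, A.T)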

39 Parallel Matrix Transpose - Hypercube
With cut-through routing, the timing improves slightly, but the total time is still
    T_TOT = Θ(n^2 log p),
which is suboptimal.
Using striped partitioning (a.k.a. a column blocked layout) and cut-through routing on a hypercube, the transpose becomes an all-to-all personalized communication and the total time becomes cost-optimal:
    T_P = n^2/(2p) + (t_s + t_w n^2/p^2)(p - 1) + (1/2) t_h p log p
    T_TOT = Θ(n^2)
Note that this type of partitioning may be cost-optimal for the transpose operation, but not necessarily for other matrix operations, such as LU factorization and matrix-matrix multiply.
