Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication


1 Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication. Penporn Koanantakool 1,2, Ariful Azad 2, Aydin Buluç 2, Dmitriy Morozov 2, Sang-Yun Oh 2,3, Leonid Oliker 2, Katherine Yelick 1,2. 1 Computer Science Division, University of California, Berkeley; 2 Lawrence Berkeley National Laboratory; 3 University of California, Santa Barbara. May 26, 2016

2 Outline: Background and Motivation; Algorithms (2D SUMMA, 3D SUMMA, 1.5D variants); Experimental Results: 100x speedup; Iterative Multiplication; Conclusions

3 Parallel Algorithm Cost Model: p distributed, homogeneous processors connected through a network. Per-processor costs are measured along the critical path: time = #flops × time_per_flop + #messages × time_per_message + #words × time_per_word, where the first term is computation and the remaining two are communication. time_per_flop, time_per_message, and time_per_word are machine-specific constants. Goal: minimize #messages (S) and #words (W). [Figure: CPU+DRAM nodes connected by a network; image courtesy of James Demmel]
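To make the model concrete, here is a tiny Python helper implementing this cost formula; the machine constants below are illustrative placeholders, not measurements of any particular system:

def model_time(flops, messages, words,
               time_per_flop=1e-10, time_per_message=1e-6, time_per_word=1e-9):
    """time = #flops*t_flop + #messages*t_msg + #words*t_word."""
    computation = flops * time_per_flop
    communication = messages * time_per_message + words * time_per_word
    return computation + communication

# Example: same flops and words, but fewer, larger messages wins on latency.
print(model_time(flops=1e9, messages=10_000, words=1e8))  # latency-heavy schedule
print(model_time(flops=1e9, messages=100,    words=1e8))  # latency-avoiding schedule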

4 Sparse-Dense Matmul: A (sparse) × B (dense) = C (dense), with C_i,j += A_i,k · B_k,j; each output element C_i,j accumulates the product of row A_i,: with column B_:,j.
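For reference, the same sparse-dense product in serial Python with SciPy (toy sizes; this plays roughly the role that the local mkl_dcsrmm kernel plays in the distributed code later in the talk):

import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
A = sparse.random(1000, 1000, density=0.01, format="csr", random_state=0)  # sparse A
B = rng.standard_normal((1000, 64))                                        # dense B

C = A @ B                  # C[i,j] = sum_k A[i,k] * B[k,j]; the result is dense
print(type(C), C.shape)    # <class 'numpy.ndarray'> (1000, 64)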

5 Matrix Multiplication Algorithms: a spectrum of data layouts from 1D through 1.5D, 2D (SUMMA), 2.5D, to 3D. The 2D and 3D algorithms are optimal for dense-dense/sparse-sparse matmul [Solomonik and Demmel 2011] [Ballard et al. 2013]. [Figure: 1D, 2D, and 3D decompositions of A (sparse), B (dense), and C (dense); 2D and 3D images courtesy of Grey Ballard]

6 Existing Algorithms

7 1D Ring Algorithm: p=4, 1D layout. [Figure: A, B, and C distributed across the 4 processors]

8 1D Ring Algorithm: p=4, 1D layout. Shifts the A matrix around cyclically.

9 1D Ring Algorithm: p=4, 1D layout. Shifts the A matrix around cyclically (next step of the rotation).

10 1D Ring Algorithm: p=4, 1D layout. Shifts the A matrix around cyclically. Costs: #msgs: p; msg size: nnz(A)/p; #words: nnz(A).
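A runnable toy of this ring schedule using mpi4py; a dense block stands in for the sparse A, and every rank redundantly builds the same global matrices only so the local result can be checked at the end (ring_spdmm.py is an assumed file name):

# ring_spdmm.py -- run with: mpiexec -n 4 python ring_spdmm.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, p = comm.Get_rank(), comm.Get_size()
n = 8 * p                                  # toy global dimension, divisible by p
nb = n // p

rng = np.random.default_rng(42)            # same seed on every rank -> same A, B
A = rng.standard_normal((n, n))            # sparse in the talk; dense toy here
B = rng.standard_normal((n, n))

A_cur = A[:, rank*nb:(rank+1)*nb].copy()   # the block column of A I hold right now
B_loc = B[:, rank*nb:(rank+1)*nb].copy()   # my resident block column of B
C_loc = np.zeros((n, nb))                  # my block column of C

for r in range(p):
    kb = (rank - r) % p                    # global index of the A block I hold
    C_loc += A_cur @ B_loc[kb*nb:(kb+1)*nb, :]
    if r < p - 1:                          # p-1 cyclic shifts, each of size nnz(A)/p
        A_cur = comm.sendrecv(A_cur, dest=(rank + 1) % p, source=(rank - 1) % p)

assert np.allclose(C_loc, (A @ B)[:, rank*nb:(rank+1)*nb])

Only the A blocks ever travel; B and C stay put, which is the point when A is the small sparse operand.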

11 2D SUMMA Algorithm: p=16, 2D layout on a 4×4 process grid. [Figure: A, B, and C each tiled into blocks across the grid]

12 2D SUMMA Algorithm: p=16, 2D layout, outer products on a 4×4 grid.

13 2D SUMMA Algorithm: p=16, 2D layout, outer products. A broadcasts row-wise.

14 2D SUMMA Algorithm: p=16, 2D layout, outer products. A broadcasts row-wise; B broadcasts column-wise.

15–16 2D SUMMA Algorithm: p=16, 2D layout, outer products (the broadcasts proceed stage by stage across the grid).

17 2D SUMMA Algorithm: p=16, 2D layout, outer products. √p broadcasts of size nnz(A)/p and √p broadcasts of size nnz(B)/p.
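A serial numpy simulation of the SUMMA schedule (each value of k below corresponds to one round of row/column broadcasts; the triple loop just plays every processor's role in turn, and dense blocks stand in for the sparse A):

import numpy as np

n, p = 8, 16
q = int(p ** 0.5)                  # q x q process grid
nb = n // q
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))    # sparse in the talk; dense toy here
B = rng.standard_normal((n, n))
C = np.zeros((n, n))

def blk(M, i, j):                  # block (i, j) of a q x q tiling of M
    return M[i*nb:(i+1)*nb, j*nb:(j+1)*nb]

for k in range(q):                 # stage k: blocks A[:,k] broadcast along rows,
    for i in range(q):             #          blocks B[k,:] broadcast along columns
        for j in range(q):         # processor (i,j) accumulates one outer product
            C[i*nb:(i+1)*nb, j*nb:(j+1)*nb] += blk(A, i, k) @ blk(B, k, j)

assert np.allclose(C, A @ B)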

18 3D SUMMA Algorithm: p=32, c=2 (c is #copies); 4×4×2 process grid (teams of size c). [Figure: A, B, and C replicated across the two layers]

19 3D SUMMA Algorithm: p=32, c=2, 4×4×2 grid. The first team member calculates the first half of the outer products; the second team member calculates the latter half.

20 3D SUMMA Algorithm: p=32, c=2, 4×4×2 grid. Each team member calculates 1/c of all outer products: √(p/c³) broadcasts of size nnz(A)·c/p and √(p/c³) broadcasts of size nnz(B)·c/p.
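A serial sketch of the replication idea under the slide's toy sizes: each of the c layers owns a replica and runs 1/c of the SUMMA stages, and C is then reduced across layers. The contiguous stage split below is one concrete choice, not necessarily the exact schedule from the talk:

import numpy as np

n, p, c = 8, 32, 2
q = int((p // c) ** 0.5)           # each of the c layers is a q x q grid
nb = n // q
rng = np.random.default_rng(2)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

def blk(M, i, j):
    return M[i*nb:(i+1)*nb, j*nb:(j+1)*nb]

partials = []
for l in range(c):                 # each layer holds a replica of A and B
    Cl = np.zeros((n, n))
    for k in range(l * (q // c), (l + 1) * (q // c)):   # 1/c of the SUMMA stages
        for i in range(q):
            for j in range(q):
                Cl[i*nb:(i+1)*nb, j*nb:(j+1)*nb] += blk(A, i, k) @ blk(B, k, j)
    partials.append(Cl)

C = sum(partials)                  # "collection": reduce C across the c layers
assert np.allclose(C, A @ B)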

21 3D SUMMA: Costs, organized into three phases, each with its #messages and #words: Replication (replicate A, replicate B), Propagation (broadcast A, broadcast B), and Collection (reduce C). [The per-phase cost formulas were rendered as images and are not recoverable from the transcription.]

22 1.5D Algorithms

23 1.5D ColA: p=4, c=2; B has p/c block columns. [Figure: Round 1, showing which A blocks each processor holds]

24 1.5D ColA: p=4, c=2; B has p/c block columns; A is shifted cyclically over p/c rounds. Costs: #msgs: p/c; msg size: nnz(A)·c/p; #words: nnz(A). [Figure: Rounds 1 and 2 of the shift, with A blocks 0,1 and 2,3 trading places]
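A serial numpy simulation of one schedule consistent with these costs, assuming A is replicated c-fold and shifted c block columns at a time over p/c rounds while each processor keeps a slim column slice of B and C; the block-to-round assignment is one concrete choice consistent with the slide's pictures:

import numpy as np
from scipy import sparse

n, p, c = 8, 4, 2
A = sparse.random(n, n, density=0.25, format="csr", random_state=0)  # sparse A
B = np.random.default_rng(0).standard_normal((n, n))                 # dense B
C = np.zeros((n, n))

nb = n // p                            # width of one block column of A
wb = n // p                            # width of the B/C slice a processor owns

for j in range(p):                     # play each processor's role in turn
    for r in range(p // c):            # p/c shift rounds
        for m in range(c):             # the c A blocks held during this round
            kb = ((j + r) * c + m) % p
            Ak = A[:, kb*nb:(kb+1)*nb]
            Bk = B[kb*nb:(kb+1)*nb, j*wb:(j+1)*wb]
            C[:, j*wb:(j+1)*wb] += Ak @ Bk

assert np.allclose(C, A @ B)

Over the p/c rounds each processor sees every block of A exactly once, so only the sparse matrix moves while the dense B and C never do.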

25 1.5D ColABC: p=8, c=2; A and B both have p/c block columns. Each team member does 1/c of the work. [Figure: Round 1 block assignment]

26 1.5D ColABC: p=8, c=2; A and B both have p/c block columns. Each team member does 1/c of the work. Costs: #msgs: p/c. [The msg size and #words formulas were rendered as images and are not recoverable from the transcription. Figure: Rounds 1 and 2]

27 1.5D InnerABC: p=8, c=2; A has p/c block rows, B has p/c block columns. Each team member does 1/c of the work. [Figure: Round 1, with paired processor block assignments]

28 1.5D InnerABC: p=8, c=2; A has p/c block rows, B has p/c block columns. Each team member does 1/c of the work. Costs: #msgs: p/c². [The msg size and #words formulas were rendered as images and are not recoverable from the transcription. Figure: Rounds 1 and 2]

29 Cost Comparison: latency costs of all algorithms; append the corresponding word counts for the bandwidth cost. [Comparison table rendered as images; not recoverable.] Even replication and collection can be the bottleneck if nnz(A) ≪ nnz(B), nnz(C).

30 Experimental Results

31 Test environments. Comparison: the 1.5D variants (ColA, ColABC, and InnerABC) against a baseline of the best of 2D/2.5D/3D SUMMA ABC. C++ with MPI, hybrid (12 threads per MPI process); MKL's multithreaded routine for local computation (mkl_dcsrmm, double precision); CSR + row-major storage. Machine: 2.57 PFLOPS Cray XC30, two 12-core sockets per node, Cray Aries interconnect with Dragonfly topology.

32 Cost Breakdown: p=3,072, n=64k × 64k, 41 nnz per row, Cray XC30. [Chart: time breakdown for the existing (2D/2.5D/3D SUMMA) vs. new (1.5D) algorithms]

33 Moving the Dense Matrix is Costly: p=3,072, n=64k × 64k, 41 nnz per row, Cray XC30. [Chart: existing vs. new algorithms]

34 Replication might not pay off: p=3,072, n=64k × 64k, 41 nnz per row, Cray XC30. [Chart: existing vs. new algorithms, with a 17x improvement marked]

35 Different Computation Efficiency: p=3,072, n=64k × 64k, 41 nnz per row, Cray XC30. [Chart: existing vs. new algorithms, with a 17x improvement marked]

36 100x Improvement: A is 66k × 172k, B is 172k × 66k, …% nnz, Cray XC30. [Chart, up is good: a 100x gap between the new and existing algorithms]

37 When is 1.5D better? Map of the regions where each algorithm has the best theoretical bandwidth cost, assuming nnz(A) = nnz(B); also applicable to dense-dense and sparse-sparse. Most sparse matrices have < 5% nonzeros.

38 Iterative Multiplication

39 Sparse ICM Estimation [S. Oh et al. 2014]. ICM = inverse covariance matrix; it captures direct pairwise dependency. Data could be brain fMRI scans or gene expression. S is usually rank-deficient or too expensive to invert directly, so estimate the inverse by setting up an objective function and optimizing. [Figure: X is the observed data with r observations of n variables/features; Xᵀ times X gives S, the sample covariance, which is rank-deficient]

40 CONCORD-ISTA. CONCORD objective function: Ω_D denotes the diagonal elements, Ω_X the off-diagonal elements, and Ω the estimated sparse inverse of S. Iteratively compute products of the form ΩS or ΩXᵀ. The replication (and sometimes collection) costs can be amortized across iterations.

41 Conclusions. 1.5D is beneficial when the #nnz of the matrices are imbalanced; it is also applicable to dense-dense/sparse-sparse. Observed up to 100x speedup. 1.5D can handle moderately imbalanced load with greedy partitioning. Iterative multiplication gains even more benefit. Future work: parallel CONCORD-ISTA, new communication lower bounds, trying a recursive approach, and accounting for different computation efficiency.

42 References I. P. Koanantakool, A. Azad, A. Buluç, D. Morozov, S.-Y. Oh, L. Oliker, K. Yelick. Communication-avoiding parallel sparse-dense matrix-matrix multiplication. In IPDPS 2016. E. Solomonik and J. Demmel. Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms. In Euro-Par 2011, LNCS vol. 6853. G. Ballard, A. Buluç, J. Demmel, L. Grigori, B. Lipshitz, O. Schwartz, S. Toledo. Communication optimal parallel multiplication of sparse random matrices. In Proceedings of the 25th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2013.

43 References II. M. Driscoll, E. Georganas, P. Koanantakool, E. Solomonik, K. Yelick. A communication-optimal N-body algorithm for direct interactions. In IPDPS 2013. S.-Y. Oh, O. Dalal, K. Khare, B. Rajaratnam. Optimization methods for sparse pseudo-likelihood graphical model selection. In NIPS 27, 2014.

44 Thank you! Questions? Suggestions/Applications?

45 Communication Costs: 3 phases. Replication: to have c copies. Propagation: to multiply. Collection: of the C matrix. (The slide's table also lists how each phase's cost changes as c increases.) Non-replicating versions only have the propagation cost. Replication and collection happen between team members, i.e., only c processors, so they are usually cheaper than propagation.

46 CONCORD-ISTA. The CONCORD objective function is a smooth part plus a non-smooth part; Ω is the estimated sparse inverse of S, Ω_D its diagonal elements, Ω_X its off-diagonal elements, and the non-smooth part is the λ-weighted ℓ1 penalty on Ω_X. Proximal gradient method: the smooth part is handled by gradient descent, the non-smooth part by soft-thresholding (ISTA: Iterative Soft-Thresholding Algorithm). Does not assume a Gaussian distribution.

47 Pseudocode (Ω: estimated sparse inverse of S):

S = XᵀX / r
Ω = I
do
    Ω₀ = Ω
    compute gradient G
    try τ = {1, 1/2, 1/4, …}               // find appropriate step size
        Ω = soft_thresholding(Ω₀ − τG, τλ)
    until descent condition satisfied
while max(|Ω₀ − Ω|) > ε                     // until no more progress

Distributed operations: S = XᵀX/r is a dense-dense matrix multiplication; SΩ+ΩS is a sparse-dense matrix multiplication.
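A minimal serial numpy sketch of this loop. It assumes one standard form of the CONCORD smooth part, -sum_i log(omega_ii) + 1/2 tr(Omega S Omega), whose gradient contains the SΩ+ΩS product named above, and a conventional backtracking descent condition; the distributed version would replace the two matrix products with the algorithms from this talk:

import numpy as np

def soft_threshold_offdiag(M, t):
    # elementwise soft-thresholding; the l1 penalty applies to off-diagonals only
    out = np.sign(M) * np.maximum(np.abs(M) - t, 0.0)
    np.fill_diagonal(out, np.diag(M))
    return out

def smooth_obj(Omega, S):
    d = np.diag(Omega)
    if np.any(d <= 0):
        return np.inf                      # outside the domain of the log term
    return -np.sum(np.log(d)) + 0.5 * np.sum(Omega * (S @ Omega))

def concord_ista(X, lam=0.1, eps=1e-6, max_iter=200):
    r, n = X.shape
    S = X.T @ X / r                        # dense-dense matmul
    Omega = np.eye(n)
    for _ in range(max_iter):
        Omega0 = Omega
        # assumed smooth-part gradient: -diag(1/omega_ii) + (S@Omega + Omega@S)/2;
        # S@Omega + Omega@S is the sparse-dense matmul in the distributed code
        G = 0.5 * (S @ Omega0 + Omega0 @ S) - np.diag(1.0 / np.diag(Omega0))
        f0, tau = smooth_obj(Omega0, S), 1.0
        while True:                        # try tau = 1, 1/2, 1/4, ...
            Omega = soft_threshold_offdiag(Omega0 - tau * G, tau * lam)
            D = Omega - Omega0
            # standard sufficient-decrease condition for proximal gradient
            if smooth_obj(Omega, S) <= f0 + np.sum(G * D) + np.sum(D * D) / (2 * tau):
                break
            tau /= 2.0
        if np.max(np.abs(Omega0 - Omega)) <= eps:   # until no more progress
            break
    return Omega

# toy usage: 200 observations of 20 variables
Omega_hat = concord_ista(np.random.default_rng(0).standard_normal((200, 20)), lam=0.2)
print(np.count_nonzero(Omega_hat), "nonzeros in the 20x20 estimate")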
