Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA


1 Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
Shirley Moore, CPS5401 Fall 2012, svmoore.pbworks.com, November 8

2 Learning Objectives
After completing this lesson, you should be able to:
- List and describe advantages of using linear algebra libraries
- List types of computations performed by linear algebra libraries
- Describe functionality of the BLAS
- Locate and use documentation on linear algebra libraries for your platform
- Insert calls to linear algebra library routines into your program, then compile and run the resulting program
- Describe current research on numerical linear algebra for multicore and heterogeneous architectures

3 Numerical Linear Algebra
- Algorithms for performing matrix operations on computers
- Widely used in scientific, engineering, and financial applications
- Fundamental algorithms:
  - Basic matrix and vector operations
  - LU decomposition
  - QR decomposition
  - Singular value decomposition
  - Eigenvalues

4 BLAS
- Basic Linear Algebra Subprograms
- De facto standard (all implementations use the same calling interface)
- First published in 1979
- http://
- BLAS Quick Reference Guide: http://
- Tuned versions implemented by vendors (Intel MKL, AMD ACML, Cray LibSci, IBM ESSL)
- Routines to perform basic operations such as vector and matrix multiplication

5 BLAS Functionality and Levels
- Level 1: vector operations of the form $y \leftarrow \alpha x + y$, as well as scalar dot products and vector norms, among other things.
- Level 2: matrix-vector operations of the form $y \leftarrow \alpha A x + \beta y$, as well as solving $T x = y$ for $x$ with $T$ being triangular, among other things.
- Level 3: matrix-matrix operations of the form $C \leftarrow \alpha A B + \beta C$, as well as solving $B \leftarrow \alpha T^{-1} B$ for triangular matrices $T$, among other things. This level contains the widely used General Matrix Multiply (GEMM) operation.
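To make the Level 1 and Level 2 operations concrete, here is a minimal sketch in plain C of what routines such as DAXPY, DDOT, and DGEMV compute. The names ref_axpy, ref_dot, and ref_gemv are hypothetical and are not part of any BLAS library. The column-major layout follows the Fortran convention. A tuned BLAS would produce the same results much faster.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y <- alpha*x + y, the operation DAXPY performs.
   ref_axpy is a hypothetical name used only for illustration. */
void ref_axpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 1: dot product of x and y, the operation DDOT performs. */
double ref_dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Level 2 (matrix-vector): y <- alpha*A*x + beta*y, the operation DGEMV
   performs when A is not transposed. A is m-by-n, stored column-major
   with leading dimension lda >= m, so A(i,j) lives at a[i + j*lda]. */
void ref_gemv(size_t m, size_t n, double alpha, const double *a, size_t lda,
              const double *x, double beta, double *y)
{
    assert(lda >= m);
    for (size_t i = 0; i < m; i++)
        y[i] *= beta;
    for (size_t j = 0; j < n; j++)          /* walk A one column at a time */
        for (size_t i = 0; i < m; i++)
            y[i] += alpha * a[i + j * lda] * x[j];
}
```

Level 1 performs O(n) arithmetic on O(n) data and Level 2 performs O(n^2) arithmetic on O(n^2) data, so both are limited by memory bandwidth. Level 3 performs O(n^3) arithmetic on O(n^2) data, which is why it benefits most from cache blocking.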

6 General Matrix Multiply (GEMM)
xGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC) computes $C \leftarrow \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta C$, where:
- TRANSA and TRANSB determine whether the matrices A and B are to be transposed, so op(X) is either X or its transpose.
- M is the number of rows in matrix C and, depending on TRANSA, the number of rows in the original matrix A or its transpose.
- N is the number of columns in matrix C and, depending on TRANSB, the number of columns in the matrix B or its transpose.
- K is the number of columns in matrix A (or its transpose) and rows in matrix B (or its transpose).
- LDA, LDB and LDC specify the size of the first dimension of the matrices as laid out in memory, that is, the memory distance between the start of each row or column, depending on the storage order.
- Precision (x): S for single, D for double, C for complex single, Z for complex double.
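The roles of TRANSA, TRANSB, M, N, K, and the leading dimensions may be easier to see in a straightforward triple loop. The sketch below is an unoptimized illustration that assumes column-major storage and supports only 'N' (no transpose) and 'T' (transpose). The names ref_dgemm and op_elem are hypothetical, and this is not how a tuned DGEMM is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Element (i, j) of op(X), where X is stored column-major with leading
   dimension ldx and op(X) is X ('N') or its transpose ('T'). */
static double op_elem(char trans, const double *x, size_t ldx,
                      size_t i, size_t j)
{
    return (trans == 'N') ? x[i + j * ldx] : x[j + i * ldx];
}

/* C <- alpha*op(A)*op(B) + beta*C, where op(A) is m-by-k, op(B) is k-by-n
   and C is m-by-n. Illustrative reference loop only (hypothetical name). */
void ref_dgemm(char transa, char transb, size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    assert(transa == 'N' || transa == 'T');
    assert(transb == 'N' || transb == 'T');
    assert(ldc >= m);
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += op_elem(transa, a, lda, i, p) *
                       op_elem(transb, b, ldb, p, j);
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```

Because the leading dimension is passed separately from M, N, and K, the same routine can operate on a submatrix of a larger array: pass a pointer to the submatrix's first element and the leading dimension of the enclosing array.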

7 LAPACK
- Linear Algebra PACKage
- De facto standard
- Successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK
- Routines for solving systems of linear equations, linear least squares, eigenvalue problems, and singular value decomposition
- Routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition
- Handles real and complex matrices in both single and double precision
- Depends on the BLAS to effectively exploit caches on modern cache-based architectures
- Tuned versions implemented in vendor libraries (e.g., AMD ACML, Intel MKL, Cray LibSci, IBM ESSL)

8 LAPACK Naming Scheme
A LAPACK subroutine name is in the form pmmaaa, where:
- p is a one-letter code denoting the type of numerical constants used. S and D stand for real floating-point arithmetic in single and double precision, respectively, while C and Z stand for complex arithmetic in single and double precision, respectively.
- mm is a two-letter code denoting the kind of matrix expected by the algorithm. The actual data are stored in a different format depending on the specific kind; e.g., when the code DI is given, the subroutine expects a vector of length n containing the elements on the diagonal, while when the code GE is given, the subroutine expects an n × n array containing the entries of the matrix.
- aaa is a one- to three-letter code describing the actual algorithm implemented in the subroutine; e.g., SV denotes a subroutine to solve a linear system, while R denotes a rank-1 update.
For example, the subroutine to solve a linear system with a general (non-structured) matrix using real double-precision arithmetic is called DGESV.
For details, see the LAPACK Users' Guide.
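The pmmaaa pattern can be checked mechanically. The following sketch splits a routine name into its three parts. The struct lapack_name and the function split_lapack_name are hypothetical helpers written for this illustration and are not part of LAPACK. Only names that follow the pattern described above are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The three parts of a LAPACK routine name of the form pmmaaa. */
struct lapack_name {
    char precision;     /* S, D, C or Z */
    char matrix[3];     /* two-letter matrix kind, e.g. "GE" */
    char algorithm[4];  /* one to three letters, e.g. "SV" */
};

/* Splits a name such as "DGESV" into its parts (upper-cased).
   Returns 0 on success, -1 if the name does not fit the pmmaaa pattern.
   Hypothetical helper for illustration only. */
int split_lapack_name(const char *name, struct lapack_name *out)
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name);
    if (len < 4 || len > 6)
        return -1;
    char p = (char)toupper((unsigned char)name[0]);
    if (p != 'S' && p != 'D' && p != 'C' && p != 'Z')
        return -1;
    out->precision = p;
    out->matrix[0] = (char)toupper((unsigned char)name[1]);
    out->matrix[1] = (char)toupper((unsigned char)name[2]);
    out->matrix[2] = '\0';
    size_t i;
    for (i = 0; i + 3 < len; i++)   /* copy the 1 to 3 algorithm letters */
        out->algorithm[i] = (char)toupper((unsigned char)name[i + 3]);
    out->algorithm[i] = '\0';
    return 0;
}
```

Applied to "DGESV", this yields precision D, matrix kind GE, and algorithm SV, matching the example in the slide.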

9 ACML
- AMD Core Math Library
- http://developer.amd.com/tools/cpu-development/amd-core-math-library-acml/
- ACML consists of the following main components:
  - A full implementation of Level 1, 2 and 3 Basic Linear Algebra Subprograms (BLAS), with optimizations for AMD Opteron processors
  - A full suite of Linear Algebra (LAPACK) routines
  - A comprehensive suite of Fast Fourier Transforms (FFTs) in single, double, single-complex and double-complex data types
  - Fast scalar, vector, and array math transcendental library routines
  - Random number generators in both single and double precision
- /shared/acml on Griffin

10 Class Exercises
- Use the ACML DGEMM routine to do the matrix multiplication in untuned.c
- Solve the following system Ax = b by hand using Gaussian elimination with partial pivoting: (system not reproduced here)
- Run one of the ACML xGETRF examples to solve a linear system of equations.
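As a way to check a hand calculation for the second exercise, here is a minimal sketch of Gaussian elimination with partial pivoting in plain C. It is not the ACML or LAPACK implementation. The function name solve_partial_pivot and the row-major layout are assumptions made for this illustration, and any test system you feed it is your own, since the slide's system is not available here.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without pulling in the math library. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Solves A x = b for an n-by-n matrix stored row-major in a.
   Both a and b are overwritten; on success b holds x and 0 is returned.
   Returns -1 if every candidate pivot in some column is exactly zero.
   Hypothetical helper for illustration only. */
int solve_partial_pivot(size_t n, double *a, double *b)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 0; k < n; k++) {
        /* Partial pivoting: pick the row with the largest |a(i,k)|, i >= k. */
        size_t piv = k;
        for (size_t i = k + 1; i < n; i++)
            if (abs_val(a[i * n + k]) > abs_val(a[piv * n + k]))
                piv = i;
        if (a[piv * n + k] == 0.0)
            return -1;
        if (piv != k) {                     /* swap rows k and piv */
            for (size_t j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[piv * n + j];
                a[piv * n + j] = t;
            }
            double t = b[k];
            b[k] = b[piv];
            b[piv] = t;
        }
        /* Eliminate column k from the rows below the pivot row. */
        for (size_t i = k + 1; i < n; i++) {
            double f = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; j++)
                a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];
        }
    }
    /* Back substitution on the upper-triangular system. */
    for (size_t i = n; i-- > 0; ) {
        double sum = b[i];
        for (size_t j = i + 1; j < n; j++)
            sum -= a[i * n + j] * b[j];
        b[i] = sum / a[i * n + i];
    }
    return 0;
}
```

The xGETRF routines perform the same pivoted elimination but store the multipliers and the pivot order, so the factorization can be reused for several right-hand sides.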

11 ScaLAPACK
- Scalable Linear Algebra PACKage
- Library of high-performance linear algebra routines for parallel distributed-memory machines
- Solves dense and banded linear systems, least squares problems, eigenvalue problems, and singular value problems
- Key ideas:
  - Block cyclic data distribution for dense matrices and a block data distribution for banded matrices, parameterizable at runtime
  - Block-partitioned algorithms to ensure high levels of data reuse
- Efficient low-level communication implemented by BLACS (Basic Linear Algebra Communication Subprograms)
- Will run on any machine with BLAS, LAPACK, and BLACS
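The block cyclic distribution can be illustrated in one dimension: indices are grouped into blocks of nb, and the blocks are dealt out to the processes in round-robin order. The sketch below maps a global index to its owning process and local index, assuming the first block lives on process 0. The names bc_location and block_cyclic_map are hypothetical and are not ScaLAPACK routines. For a dense matrix the same mapping would be applied independently to row and column indices over a two-dimensional process grid.

```c
#include <assert.h>
#include <stddef.h>

/* Owner process and local index of one global index. */
struct bc_location {
    size_t proc;   /* process that owns the element */
    size_t local;  /* index within that process's local storage */
};

/* 1-D block cyclic mapping of global index g with block size nb over
   nprocs processes, assuming block 0 is stored on process 0.
   Hypothetical helper for illustration only. */
struct bc_location block_cyclic_map(size_t g, size_t nb, size_t nprocs)
{
    assert(nb > 0 && nprocs > 0);
    size_t block = g / nb;               /* global block number */
    struct bc_location loc;
    loc.proc = block % nprocs;           /* blocks are dealt round-robin */
    loc.local = (block / nprocs) * nb    /* full local blocks before it */
              + g % nb;                  /* offset inside the block */
    return loc;
}
```

With nb = 2 and three processes, global indices 0 and 1 land on process 0, indices 2 and 3 on process 1, indices 4 and 5 on process 2, and indices 6 and 7 return to process 0 as its second local block. Every process therefore keeps a share of the work as a factorization advances through the matrix.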

12 Current Efforts
- Parallel Linear Algebra Software for Multicore Architectures (PLASMA): icl.cs.utk.edu/plasma/
- Matrix Algebra on GPU and Multicore Architectures (MAGMA): icl.cs.utk.edu/magma/
