FFPACK: Finite Field Linear Algebra Package

Jean-Guillaume Dumas, Pascal Giorgi and Clément Pernet

Introduction
Motivation: integer linear algebra.
- Sparse or structured matrix: specific methods (blackbox, ...)
- Otherwise: dense methods
- Limit the growth of intermediate results: several computations over distinct finite fields, then reconstruction using Chinese remaindering
- Applications: integer polynomial factorization, Gröbner basis computation, integer system solving, ...
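
The reconstruction step mentioned above can be illustrated by a minimal, self-contained sketch (not FFPACK or LinBox code; all names are illustrative): the images of an integer result modulo several word-size primes are recombined by incremental Chinese remaindering, assuming the true result is smaller than the product of the primes and that the compiler supports __int128 (gcc/clang).

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Modular inverse by extended Euclid (p prime, 0 < a < p).
int64_t inv_mod(int64_t a, int64_t p) {
    int64_t old_r = a, r = p, old_s = 1, s = 0;
    while (r != 0) {
        int64_t q = old_r / r;
        int64_t tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_s - q * s; old_s = s; s = tmp;
    }
    return ((old_s % p) + p) % p;
}

int main() {
    // Pretend these residues came from independent computations mod each prime.
    std::vector<int64_t> primes = {1009, 2003, 3001};
    int64_t hidden = 123456789;                  // the integer result we want to rebuild
    std::vector<int64_t> residues;
    for (int64_t p : primes) residues.push_back(hidden % p);

    // Incremental CRT: x is the solution modulo the product M of the primes seen so far.
    __int128 x = residues[0], M = primes[0];
    for (size_t i = 1; i < primes.size(); ++i) {
        int64_t p = primes[i];
        int64_t diff = ((residues[i] - (int64_t)(x % p)) % p + p) % p;
        int64_t t = (diff * inv_mod((int64_t)(M % p), p)) % p;
        x += M * t;
        M *= p;
    }
    std::printf("%lld\n", (long long)(int64_t)x);   // prints 123456789
}
```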

Exact Dense Linear Algebra Routines
FFLAS: Finite Field Linear Algebra Subroutines
- Based on a matrix multiplication kernel
- Using numerical BLAS through conversions
- Fast matrix multiplication algorithm
[Figure: matrix-matrix classic multiplication on a PIII 735 MHz, speed (Mop/s) vs. matrix order, for (1) ATLAS, (2) GF(19) delayed with 32 bits, (3) GF(19) prime field on top of ATLAS, (4) GF(3^2) Galois field on top of ATLAS.]
FFPACK: Finite Field Linear Algebra Package
- Higher level (cf. LAPACK)
- Based on matrix triangularization
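
The conversion-based multiplication can be sketched as follows (an illustration of the idea only, not the actual FFLAS interface): convert the operands to double, call a numerical dgemm, and reduce the result modulo p. The sketch assumes a CBLAS implementation (ATLAS, OpenBLAS, ...) is linked, and is valid only while k(p-1)^2 < 2^53, so that the floating-point accumulation stays exact.

```cpp
#include <cblas.h>
#include <cmath>
#include <cstdio>
#include <vector>

// C := A*B mod p, with A m x k, B k x n, C m x n, row-major, entries already in [0, p).
void mul_mod_p(int m, int n, int k, const std::vector<double>& A,
               const std::vector<double>& B, std::vector<double>& C, double p) {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A.data(), k, B.data(), n, 0.0, C.data(), n);
    for (double& c : C) c = std::fmod(c, p);     // one reduction after the BLAS call
}

int main() {
    double p = 19;
    int m = 2, k = 3, n = 2;
    std::vector<double> A = {1, 2, 3, 4, 5, 6};          // 2x3
    std::vector<double> B = {7, 8, 9, 10, 11, 12};       // 3x2
    std::vector<double> C(m * n);
    mul_mod_p(m, n, k, A, B, C, p);
    for (double c : C) std::printf("%g ", c);            // entries of A*B mod 19
    std::printf("\n");
}
```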

Contents
1. Base field representations
2. Triangular system solve
   (a) Three implementations
   (b) Two cascade algorithms and comparison
3. Triangularizations
   (a) Three implementations and comparison
   (b) Dealing with data locality
4. Conclusions and perspectives

Base field representations
Modular<double>: based on the machine double floating-point representation
- Only the mantissa is used
- Exact representation of integers up to $2^{53}$
- Avoids conversions and extra memory storage when using FFLAS
Givaro-ZpZ: based on machine integers (16, 32 or 64 bits)
- Specialized dot product (using a delayed modulus)
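
A minimal sketch of the idea behind such a field type (hypothetical code, not the actual Modular<double> of FFLAS or Givaro): elements live in the mantissa of a double, so they can be handed to the BLAS without any conversion, and arithmetic stays exact as long as products fit below $2^{53}$.

```cpp
#include <cmath>
#include <cstdio>

struct ModularDouble {
    double p;                                   // the prime modulus
    double add(double a, double b) const { double r = a + b; return r >= p ? r - p : r; }
    double mul(double a, double b) const { return std::fmod(a * b, p); } // exact if (p-1)^2 < 2^53
    double neg(double a) const { return a == 0 ? 0 : p - a; }
};

int main() {
    ModularDouble F{19};
    std::printf("%g %g\n", F.add(15, 7), F.mul(15, 7));   // 3 and 10 over Z/19Z
}
```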

Triangular System Solve: trsm
Compute a matrix $X \in K^{m \times n}$ such that $AX = B$.
- Used for numerical computations (in the BLAS)
- Building block for block triangularization algorithms
Three different approaches for exact computation over a finite field:
1. A block recursive algorithm
2. A wrapping of the BLAS dtrsm
3. A matrix-vector based routine
Cascade algorithms as a solution.

1. The block recursive algorithm
$\begin{bmatrix} A_1 & A_2 \\ 0 & A_3 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}$
- $X_2 :=$ recursive call on $(A_3, B_2)$
- $B_1 := B_1 - A_2 X_2$
- $X_1 :=$ recursive call on $(A_1, B_1)$
- Reduces to matrix multiplication
- $O(n^\omega)$ algebraic time complexity
- Benefits from the efficiency of FFLAS
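
A self-contained sketch of this recursion over Z/pZ (illustrative code, not the FFPACK routine): the upper triangular matrix is split in four blocks, the two half-size systems are solved recursively, and the update $B_1 := B_1 - A_2 X_2$ is a matrix multiplication, done naively here where FFLAS would use its BLAS-based multiplication kernel.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

using Mat = std::vector<int64_t>;                  // row-major, entries in [0, p)

int64_t inv_mod(int64_t a, int64_t p) {            // p prime, Fermat's little theorem
    int64_t r = 1, e = p - 2, b = a % p;
    while (e) { if (e & 1) r = r * b % p; b = b * b % p; e >>= 1; }
    return r;
}

// C -= A*B (mod p), with A: r x s, B: s x n, C: r x n, row-major with the given strides.
void sub_mul(const int64_t* A, int lda, const int64_t* B, int ldb,
             int64_t* C, int ldc, int r, int s, int n, int64_t p) {
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < n; ++j) {
            int64_t acc = 0;
            for (int t = 0; t < s; ++t) acc = (acc + A[i*lda+t] * B[t*ldb+j]) % p;
            C[i*ldc+j] = ((C[i*ldc+j] - acc) % p + p) % p;
        }
}

// Solve A X = B with A upper triangular m x m (stride lda), B m x n (stride ldb); X overwrites B.
void trsm(const int64_t* A, int lda, int64_t* B, int ldb, int m, int n, int64_t p) {
    if (m == 1) {
        int64_t d = inv_mod(A[0], p);
        for (int j = 0; j < n; ++j) B[j] = B[j] * d % p;
        return;
    }
    int m1 = m / 2, m2 = m - m1;
    // A = [A1 A2; 0 A3], B = [B1; B2]
    const int64_t* A1 = A;            const int64_t* A2 = A + m1;
    const int64_t* A3 = A + m1*lda + m1;
    int64_t* B1 = B;                  int64_t* B2 = B + m1*ldb;
    trsm(A3, lda, B2, ldb, m2, n, p);                    // X2 := A3^{-1} B2
    sub_mul(A2, lda, B2, ldb, B1, ldb, m1, m2, n, p);    // B1 := B1 - A2 X2
    trsm(A1, lda, B1, ldb, m1, n, p);                    // X1 := A1^{-1} B1
}

int main() {
    int64_t p = 101;
    // A upper triangular 3x3 and B = A * X0 with X0 = [[1,2],[3,4],[5,6]] (mod p).
    Mat A  = {2, 1, 4,  0, 3, 5,  0, 0, 7};
    Mat X0 = {1, 2,  3, 4,  5, 6};
    Mat B(3 * 2, 0);
    for (int i = 0; i < 3; ++i) for (int j = 0; j < 2; ++j)
        for (int t = 0; t < 3; ++t) B[i*2+j] = (B[i*2+j] + A[i*3+t] * X0[t*2+j]) % p;
    trsm(A.data(), 3, B.data(), 2, 3, 2, p);
    for (int64_t x : B) std::printf("%lld ", (long long)x);   // recovers 1 2 3 4 5 6
    std::printf("\n");
}
```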

2. Wrapping the BLAS dtrsm
Same approach as for the matrix multiplication in FFLAS:
- Conversion: finite field -> real (double)
- Computation over the reals (using the BLAS dtrsm)
- Conversion: real (double) -> finite field
Two constraints:
- No division must occur during the BLAS computation
- No overflow

2. Wrapping the BLAS dtrsm
First constraint: divisions must be exact in
$x_i = \frac{1}{a_{i,i}} \Big( b_i - \sum_{j=i+1}^{n} a_{i,j} x_j \Big)$,
so $A$ must have a unit diagonal. Precondition $A$: $U = A D_A^{-1}$, solve $U Y = B$, then $X = D_A^{-1} Y$, where $D_A$ is the diagonal of $A$.
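
The preconditioning can be written as a small helper (hypothetical names, not FFPACK's interface): every column $j$ of $A$ is scaled by $a_{j,j}^{-1}$ so the triangular solve never divides, and the returned inverses are used afterwards to rescale the solution rows ($X = D_A^{-1} Y$).

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int64_t inv_mod(int64_t a, int64_t p) {                 // p prime, Fermat's little theorem
    int64_t r = 1, e = p - 2, b = a % p;
    while (e) { if (e & 1) r = r * b % p; b = b * b % p; e >>= 1; }
    return r;
}

// Scale column j of the upper triangular A (m x m, row-major, entries in [0, p))
// by a_{j,j}^{-1} so the diagonal becomes 1; the inverses are returned so the
// caller can rescale the solution rows: X[i] = inv[i] * Y[i] mod p.
std::vector<int64_t> make_unit_diagonal(std::vector<int64_t>& A, int m, int64_t p) {
    std::vector<int64_t> inv(m);
    for (int j = 0; j < m; ++j) {
        inv[j] = inv_mod(A[j*m + j], p);
        for (int i = 0; i <= j; ++i) A[i*m + j] = A[i*m + j] * inv[j] % p;
    }
    return inv;
}

int main() {
    int64_t p = 19;
    std::vector<int64_t> A = {2, 3, 0, 5};               // upper triangular 2x2 over Z/19Z
    std::vector<int64_t> d = make_unit_diagonal(A, 2, p);
    std::printf("U = %lld %lld / %lld %lld, diagonal inverses = %lld %lld\n",
                (long long)A[0], (long long)A[1], (long long)A[2], (long long)A[3],
                (long long)d[0], (long long)d[1]);       // U now has a unit diagonal
}
```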

2. Wrapping the BLAS dtrsm
Second constraint: no overflow during the BLAS computation. Must bound the growth of the coefficients:
- Naive: $(p-1)\,p^{n-1} < 2^m$
- Using a classical prime field representation, $0 \le x \le p-1$: $\frac{p-1}{2}\left[p^{n-1} + (p-2)^{n-1}\right] < 2^m$
- Using a centered prime field representation, $-\frac{p-1}{2} \le x \le \frac{p-1}{2}$: $\frac{p-1}{2}\left(\frac{p+1}{2}\right)^{n-1} < 2^m$

2. Wrapping the BLAS dtrsm
This limits the matrix order $n$ for a given prime $p$. For $m = 53$:
- $p = 2$: $n \le 55$
- $p = 9739$: $n \le 4$
- $p = $ ...: $n \le 2$
[Figure: maximal matrix order $n$ as a function of $p$, for the positive and the centered representations.]

3. Using matrix-vector products
Solve for one row of $X$ at a time, from the bottom up: $X_i = a_{i,i}^{-1}\big(B_i - A_{i,\,i+1..n}\,X_{i+1..n}\big)$, where the update is a matrix-vector product with the already computed rows $X_{i+1..n}$.
Different implementations:
- Using Modular<double>: BLAS gemv and modulo
- Using integral representations: design of specialized dot-product routines [Dumas, CASC'04]
Advantages:
- Delayed modulus (one reduction for each row of $X$)
- Nicer bound: $n(p-1)^2 < 2^m$
Drawback: less efficient for large matrices ($n \ge 100$).
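
A sketch of such a specialized dot product (illustrative, not the actual Givaro-ZpZ code): with entries reduced in $[0, p-1]$ and $n(p-1)^2$ fitting in the accumulator, all reductions can be delayed to a single modulo at the end. Here the accumulator is a 64-bit word, so the condition becomes $n(p-1)^2 < 2^{64}$.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Dot product over Z/pZ with a delayed modulus: no reduction inside the loop.
uint64_t dot_mod_p(const std::vector<uint64_t>& x, const std::vector<uint64_t>& y, uint64_t p) {
    uint64_t acc = 0;                       // exact as long as n*(p-1)^2 < 2^64
    for (size_t i = 0; i < x.size(); ++i) acc += x[i] * y[i];
    return acc % p;                         // a single modulo per dot product
}

int main() {
    uint64_t p = 65521;
    std::vector<uint64_t> x = {1, 2, 3, 4}, y = {5, 6, 7, 8};
    std::printf("%llu\n", (unsigned long long)dot_mod_p(x, y, p));   // 70 mod p
}
```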

Two cascade algorithms
Idea: while the dimension is large, apply trsm-rec and halve it ($n := n/2$); below the threshold, switch to the base case:
- either switch to trsm-blas once $n \le n_{blas}$,
- or switch to trsm-delayed once $n \le n_{delayed}$.
[Figure: timings (Mfops) of the cascade algorithm over Z/5Z on a P4 2.4 GHz, comparing trsm-rec, trsm-blas and trsm-delayed as a function of the matrix order, with modular<double> (left) and Givaro-ZpZ (right), both for $p = 5$.]

Conclusion
1. trsm-blas highly depends on the prime; trsm-delayed does not.
2. If the field representation can be chosen: use Modular<double> with trsm-blas.
3. Otherwise: in some cases, a specialization of the dot product can slightly outperform trsm-blas.

Triangularization
Specific issues:
- Have to deal with singular matrices
- Memory requirements
Provide a better analysis of the algebraic time complexity: it improves the constant of the dominant term, giving $T = \frac{2}{3}n^3 + O(n^2)$ in the nonsingular case with classic matrix multiplication (a short derivation is given below).
We will compare three implementations:
- LSP: a block recursive algorithm [Ibarra et al.]
- LUdivine: LSP with lower memory requirements
- LQUP: fully in-place triangularization
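
The $\frac{2}{3}n^3$ constant is the classical operation count of Gaussian elimination, which the block recursive factorization matches when classical matrix multiplication is used. As a sanity check: step $k$ of the elimination performs a rank-one update of the trailing $(n-k)\times(n-k)$ block, costing about $2(n-k)^2$ field operations, hence

$$T(n) = \sum_{k=1}^{n-1} 2(n-k)^2 = 2\sum_{j=1}^{n-1} j^2 = \frac{2}{3}n^3 + O(n^2).$$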

LSP algorithms
LSP [Ibarra et al.]: split the row dimension.
- Recursive call on $A_1$
- $G := A_{21} U_1^{-1}$
- $A_{22} := A_{22} - G V$
- Recursive call on $A_{22}$
[Figure: block decomposition of the LSP factorization, showing the blocks $L_1$, $G$, $L_2$ of $L$ and $S_1$, $S_2$ of $S$.]

LSP algorithms
LUdivine: the result is stored in place. Split the row dimension.
- Recursive call on $A_1$
- $G := A_{21} U_1^{-1}$
- $A_{22} := A_{22} - G W$
- Recursive call on $A_{22}$
[Figure: block decomposition of the LUdivine variant, showing the blocks $S_1$, $S_2$, $L_1$, $L_2$, $G$ and $V$.]

LSP algorithms
LQUP: fully in place. Split the row dimension.
- Recursive call on $A_1$
- $G := A_{21} U_1^{-1}$
- $A_{22} := A_{22} - G V$
- Recursive call on $A_{22}$
- Row permutations
[Figure: block decomposition of the LQUP factorization, showing the blocks $L_1$, $U_1$, $G$, $L_2$ and $U_2$.]
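
For comparison, here is a plain iterative row-echelon elimination over Z/pZ that returns the rank (an illustrative sketch only; the recursive LSP/LQUP algorithms above reach the same result through matrix multiplication): it searches a pivot in each column, so singular and rectangular matrices are handled.

```cpp
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

int64_t inv_mod(int64_t a, int64_t p) {              // p prime, Fermat's little theorem
    int64_t r = 1, e = p - 2, b = a % p;
    while (e) { if (e & 1) r = r * b % p; b = b * b % p; e >>= 1; }
    return r;
}

// Rank of an m x n matrix over Z/pZ (row-major; works on a copy of A).
int rank_mod_p(std::vector<int64_t> A, int m, int n, int64_t p) {
    int rank = 0;
    for (int col = 0; col < n && rank < m; ++col) {
        int piv = -1;                                 // first row with a nonzero entry in this column
        for (int i = rank; i < m; ++i) if (A[i*n + col] % p) { piv = i; break; }
        if (piv < 0) continue;                        // singular column: no pivot, rank unchanged
        for (int j = 0; j < n; ++j) std::swap(A[rank*n + j], A[piv*n + j]);
        int64_t d = inv_mod(A[rank*n + col] % p, p);
        for (int i = rank + 1; i < m; ++i) {          // eliminate below the pivot
            int64_t f = A[i*n + col] % p * d % p;
            for (int j = col; j < n; ++j)
                A[i*n + j] = ((A[i*n + j] - f * A[rank*n + j]) % p + p) % p;
        }
        ++rank;
    }
    return rank;
}

int main() {
    int64_t p = 101;
    std::vector<int64_t> A = {1, 2, 3,  2, 4, 6,  0, 1, 1};   // rank 2 over Z/101Z
    std::printf("%d\n", rank_mod_p(A, 3, 3, p));
}
```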

Comparisons
[Table: timings of LSP, LUdivine and LQUP for increasing matrix order $n$ (values not recovered).]
- Similar timings when the matrix fits in RAM; LQUP is slightly faster.
- LQUP is fully in-place: no swapping for $n = 8000$.

Dealing with data locality
- Applications: parallelism, out-of-core computations
- Use a square recursive blocked data structure
- A triangularization algorithm: TURBO
[Figure: TURBO vs. LQUP for rank computation over Z/101Z on a P4 2.4 GHz with 512 MB RAM, speed (Mfops) vs. matrix order, for (1) TURBO using Givaro-ZpZ and (2) LQUP using Givaro-ZpZ.]
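
A toy illustration of a square recursive blocked layout (not TURBO's actual storage): storing element $(i, j)$ at its Morton (Z-order) index makes each quadrant, and recursively each sub-quadrant, contiguous in memory, which is the kind of locality exploited above.

```cpp
#include <cstdint>
#include <cstdio>

// Interleave the bits of (i, j): row bits go to odd positions, column bits to even ones.
uint64_t morton_index(uint32_t i, uint32_t j) {
    uint64_t z = 0;
    for (int b = 0; b < 32; ++b) {
        z |= (uint64_t)((i >> b) & 1) << (2 * b + 1);
        z |= (uint64_t)((j >> b) & 1) << (2 * b);
    }
    return z;
}

int main() {
    // For a 4x4 matrix: the top-left 2x2 quadrant occupies offsets 0..3,
    // the top-right quadrant 4..7, bottom-left 8..11, bottom-right 12..15.
    for (uint32_t i = 0; i < 4; ++i, std::printf("\n"))
        for (uint32_t j = 0; j < 4; ++j)
            std::printf("%2llu ", (unsigned long long)morton_index(i, j));
}
```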

Conclusion
Results:
- Approach the timings of the numerical routines: 6.5 s for a numerical LUP of a matrix; 7.6 s for a symbolic LQUP of a matrix.
- Improved memory management of the LSP factorization.
- Further analysis of the LSP time complexity.
- Optimal bounds for the coefficient growth in trsm.
- Part of the LinBox library.

Conclusion
- Again: wrapping numerical routines as much as possible appears to be the best choice; when this is not possible (e.g. LSP), use block recursive algorithms.
- BLAS: no modulo, so the coefficient growth must be controlled (bounds for correctness).
- Cascade structure: switches between algorithms due to correctness constraints (theoretical thresholds) and performance constraints (experimental thresholds).

Further developments
- Self-adapting software: automatic setup of optimal experimental thresholds.
- Apply the factorization to other applications: characteristic polynomial, null space, ...
