Prof. Dr. Stefan Funken, Prof. Dr. Alexander Keller, Prof. Dr. Karsten Urban, 11. Januar 2007. Scientific Computing: Parallele Algorithmen
Parallel Numerical Algorithms

Iterative Methods

Flowchart with suggestions for the selection of iterative methods:

Is the matrix symmetric?
  yes: Is the matrix definite?
    yes: Are the outer eigenvalues known?
      yes: try Chebyshev or CG
      no:  try CG
    no:  try MinRES or CG
  no: Is the transpose available?
    yes: try QMR
    no:  Is storage at a premium?
      yes: try CGS or Bi-CGSTAB
      no:  try GMRES with long restart
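The flowchart above can be sketched as a small chooser function. This is an illustration only: the function name and boolean flags are made up here, and the branch order follows one plausible reading of the flowchart.

```python
def suggest_solver(symmetric, definite=False, outer_eigenvalues_known=False,
                   transpose_available=False, storage_at_premium=False):
    """Return the iterative method suggested by the selection flowchart."""
    if symmetric:
        if definite:
            # symmetric positive definite branch
            return "Chebyshev or CG" if outer_eigenvalues_known else "CG"
        return "MinRES or CG"            # symmetric indefinite
    if transpose_available:
        return "QMR"
    if storage_at_premium:
        return "CGS or Bi-CGSTAB"
    return "GMRES with long restart"
```

For example, a symmetric positive definite matrix with unknown eigenvalues yields plain CG, while a general nonsymmetric matrix with plenty of storage yields GMRES with a long restart.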
Iterative Methods

This book (the Templates book) is also available in PostScript from ftp.netlib.org/templates/templates.ps.
Iterative Methods for Algebraic Eigenvalue Problems

There is also a similar book for algebraic eigenvalue problems. It is likewise available as an online document at dongarra/etemplates/book.html.
Preconditioner

The convergence rate of iterative methods depends on spectral properties of the coefficient matrix. Example: for the CG method,

    \| x - x^{(k)} \|_A \le 2 \rho^k \| x - x^{(0)} \|_A
    with \rho := \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1}
    and \| x - x^{(k)} \|_A^2 := \langle x - x^{(k)}, A(x - x^{(k)}) \rangle.

Note: the number of iterations needed to reach a relative reduction of \epsilon in the error is proportional to \sqrt{\kappa_2(A)}.
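The bound above can be evaluated numerically. This is a sketch, not code from the slides; the helper names are made up.

```python
import math

def cg_error_bound(kappa, k):
    """Upper bound 2 * rho**k on ||x - x^(k)||_A / ||x - x^(0)||_A."""
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return 2.0 * rho ** k

def iterations_needed(kappa, eps):
    """Smallest k with 2 * rho**k <= eps; grows like sqrt(kappa)."""
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return math.ceil(math.log(eps / 2.0) / math.log(rho))
```

Quadrupling the condition number roughly doubles the iteration count, which is the \sqrt{\kappa_2(A)} dependence stated above.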
Hence one may attempt to transform the linear system into one that is equivalent, in the sense that it has the same solution, but that has more favorable spectral properties. A preconditioner is a matrix that effects such a transformation. For instance, if a matrix W approximates the coefficient matrix A in some way, the transformed system W^{-1} A x = W^{-1} b has the same solution as the original system A x = b, but the spectral properties of its coefficient matrix W^{-1} A may be more favorable.
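A tiny check of the "same solution" claim, using W = diag(A) (the Jacobi preconditioner). The 2x2 example matrix is chosen here for illustration and is not from the slides.

```python
# Solve a small 2x2 system directly, then verify that the Jacobi-
# preconditioned system W^{-1} A x = W^{-1} b has the same solution.
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]

# direct solve via Cramer's rule
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
     (A[0][0] * b[1] - b[0] * A[1][0]) / det]

# preconditioned system: each row scaled by 1 / a_ii
WA = [[A[i][j] / A[i][i] for j in range(2)] for i in range(2)]
Wb = [b[i] / A[i][i] for i in range(2)]

# x also satisfies the preconditioned system: the residual vanishes
residual = [sum(WA[i][j] * x[j] for j in range(2)) - Wb[i] for i in range(2)]
```

The transformed matrix W^{-1} A has unit diagonal, which for many problems already improves the spectrum.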
In devising a preconditioner, we are faced with a choice between finding a matrix W that approximates A, and for which solving a system is easier than solving one with A, or finding a matrix W that approximates A^{-1}, so that only multiplication by W is needed. The majority of preconditioners fall into the first category.

On parallel machines there is a further trade-off between the efficacy of a preconditioner in the classical sense and its parallel efficiency. Many of the traditional preconditioners have a large sequential component.
We consider the following parallel preconditioners:
1. Richardson method,
2. Jacobi method,
3. non-overlapping domain decomposition,
4. and the parallelization of the Gauß-Seidel and SOR methods with
   - wavefront numbering,
   - red-black numbering.
Wavefront Numbering Algorithm

1. On each diagonal, each component can be computed separately.
2. The work load is unbalanced.
3. The maximal possible speed-up in a P x P mesh is P/2.
4. What about more general (non-square) meshes?
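Point 1 can be checked for the 5-point stencil with a short script (the grid size n is arbitrary; this is an illustration, not code from the lecture). Under lexicographic Gauß-Seidel, updating node (i, j) needs the already-updated values at (i-1, j) and (i, j-1), and both of these lie on the previous anti-diagonal, so all nodes of one anti-diagonal are independent.

```python
# Wavefront (anti-diagonal) ordering of a Gauss-Seidel sweep for the
# 5-point stencil on an n x n grid.
n = 4
diagonals = [[(i, d - i) for i in range(n) if 0 <= d - i < n]
             for d in range(2 * n - 1)]

def new_deps(i, j):
    """Already-updated neighbours of (i, j) in a lexicographic sweep."""
    return [(a, b) for (a, b) in [(i - 1, j), (i, j - 1)]
            if 0 <= a < n and 0 <= b < n]

# every such dependency lies on the previous anti-diagonal d - 1
ok = all(a + b == d - 1
         for d, diag in enumerate(diagonals)
         for (i, j) in diag
         for (a, b) in new_deps(i, j))
```

The longest diagonal has n nodes while the first and last have one, which is the load imbalance noted in point 2.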
Wavefront Numbering Algorithm (general meshes)

1. Start at a node such that the number of layers is minimal.
2. Mark the next layer and update as many nodes as possible.
3. Update the remaining nodes before marking the next layer.
4. Continue with step 2.
Block-Strips Algorithm

1. The block strips are computed one after another.
2. The work load is balanced (optimal for kp x kp meshes).
3. The maximal possible speed-up is kp/(k + 1).
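A quick numeric check of the speed-up formula in point 3 (the helper name is made up): for a fixed number of processors p, the speed-up approaches p as the strip factor k grows.

```python
def block_strip_speedup(k, p):
    """Maximal possible speed-up for a kp x kp mesh with p processors."""
    return k * p / (k + 1)
```

With p = 4 processors, k = 1 gives only a speed-up of 2, while k = 9 already gives 3.6.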
Red-Black Numbering

What happens if we number all red nodes first?
[The slide shows the system matrix after red-black reordering; the matrix entries did not survive the transcription.]
What happens if we number all red nodes first? Properties:
1. We obtain the FEM matrix with swapped rows and columns.
2. It is a block matrix.
3. The diagonal blocks are diagonal matrices.
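Property 3 can be verified directly for the 5-point stencil: neighbouring grid nodes always receive different colours, so there is no coupling within one colour class and the diagonal blocks of the reordered matrix are diagonal. The grid size n below is arbitrary; this is an illustration, not code from the lecture.

```python
# Red-black colouring of an n x n grid under the 5-point stencil.
n = 4

def colour(i, j):
    return (i + j) % 2          # 0 = red, 1 = black

def neighbours(i, j):
    return [(a, b) for (a, b) in
            [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            if 0 <= a < n and 0 <= b < n]

# no stencil entry couples two nodes of the same colour
no_same_colour_coupling = all(colour(i, j) != colour(a, b)
                              for i in range(n) for j in range(n)
                              for (a, b) in neighbours(i, j))
```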
Jacobi / Gauß-Seidel Iteration

Consider the system A x = b and the decomposition A = L + D + U.

Sequential version of the Jacobi iteration:

    x^{(k+1)} := D^{-1} (b - L x^{(k)} - U x^{(k)})

If D^{-1} is available on each processor, communication is only necessary to exchange parts of x^{(k+1)} after updating.

Sequential version of the Gauß-Seidel iteration:

    x^{(k+1)} := D^{-1} (b - L x^{(k+1)} - U x^{(k)})

or

    x^{(k+1)} := D^{-1} (b - L x^{(k)} - U x^{(k+1)})
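The two iterations written component-wise, as a sketch for a small dense system (the example matrix is chosen here for illustration; this is not an optimised implementation):

```python
def jacobi_step(A, b, x):
    """One Jacobi step: every component uses only old values of x."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
            / A[i][i] for i in range(n)]

def gauss_seidel_step(A, b, x):
    """One Gauss-Seidel step: components are overwritten in place,
    so later components already see the new values."""
    n = len(b)
    x = list(x)
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) \
               / A[i][i]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(50):
    x = gauss_seidel_step(A, b, x)
# x converges to the exact solution (1/11, 7/11) of this system
```

The only structural difference is whether the inner sum reads old or freshly updated components, which is exactly the sequential dependency that the following slides try to break.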
Parallel Gauß-Seidel Iteration (Red-Black Numbering)

Let A = (a_{ij}) \in R^{n x n}. Assume we have two disjoint index sets I_red and I_black such that a_{ij} = 0 for all i \ne j with i, j \in I_red resp. i, j \in I_black.

Parallel version of the Gauß-Seidel iteration:

    x^{(k+1)}_red   := D^{-1}_red   (b_red   - (L_rb + U_rb) x^{(k)}_black)
    x^{(k+1)}_black := D^{-1}_black (b_black - (L_br + U_br) x^{(k+1)}_red)

If P >= 2 it is recommended to use a block version, such that blocks of the same color need no communication.
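A sketch of the two half-steps for the 1D model problem (tridiagonal A with diagonal 2 and off-diagonals -1; the example, size, and names are chosen here, not taken from the slides). Red (even-indexed) unknowns depend only on black neighbours and vice versa, so each half-step is fully parallel.

```python
# Red-black Gauss-Seidel for the 1D model problem with zero boundary
# values: 2 x_i - x_{i-1} - x_{i+1} = b_i.
n = 7
b = [1.0] * n

def half_step(x, indices):
    """Update the unknowns in `indices`; they are mutually independent,
    so this half-step could run in parallel."""
    new = list(x)
    for i in indices:
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        new[i] = (b[i] + left + right) / 2.0
    return new

red = range(0, n, 2)
black = range(1, n, 2)
x = [0.0] * n
for _ in range(200):
    x = half_step(x, red)      # reads only black values
    x = half_step(x, black)    # reads the freshly updated red values
```

For b = 1 the exact solution is x_i = (i + 1)(n - i) / 2 in 0-based indexing, which the iteration approaches.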
Non-overlapping Subdomains

Different index sets:
1. I: nodes in the interior of the subdomains [N_I = \sum_{j=1}^{p} N_{I,j}].
2. E: nodes in the interior of the subdomain edges [N_E = \sum_{j=1}^{n_e} N_{E,j}] (n_e = number of subdomain edges).
3. V: cross points, i.e. endpoints of subdomain edges [N_V].
Types of Vectors

Two types of vectors, depending on the storage scheme:
- Type I: u is stored on P_k as the restriction u_k = C_k u. The complete value is accessible on P_k.
- Type II: r is stored on P_k as r_k such that r = \sum_{k=1}^{p} C_k^T r_k. Nodes on the interface hold only a part of the full value.

How should we parallelize the Gauß-Seidel iteration if we have non-overlapping subdomains?

    x^{(k+1)} := D^{-1} (b - L x^{(k+1)} - U x^{(k)})

resp., written with the restriction operators,

    x^{(k+1)}_i := \Big( C_i \Big( \sum_{k=1}^{p} C_k^T \operatorname{diag}(d_k) C_k \Big)^{-1} \sum_{l=1}^{p} C_l^T \big( b - L x^{(k+1)} - U x^{(k)} \big)_l \Big)_i
Parallel Gauß-Seidel (Non-Overlapping Domains)

Consider the following ordering of the global index set: (V, E, I).

    [ A_VV  A_VE  A_VI ] [ x_V ]   [ b_V ]
    [ A_EV  A_EE  A_EI ] [ x_E ] = [ b_E ]
    [ A_IV  A_IE  A_II ] [ x_I ]   [ b_I ]
Parallel Gauß-Seidel (Non-Overlapping Domains, Draft)

Let d := {1/d_ii}_{i=1,...,n} and let \circ denote componentwise multiplication.

    r_V := b_V - A_VV x^k_V - A_VE x^k_E - A_VI x^k_I
    w_V := \sum_{l=1}^{p} C_{V,l}^T r_{V,l}        (communication)
    x^{k+1}_V := x^k_V + d_V \circ w_V

    r_E := b_E - A_EV x^{k+1}_V - A_EE x^k_E - A_EI x^k_I
    w_E := \sum_{l=1}^{p} C_{E,l}^T r_{E,l}        (communication; real Gauß-Seidel?)
    x^{k+1}_E := x^k_E + d_E \circ w_E

    r_I := b_I - A_IV x^{k+1}_V - A_IE x^{k+1}_E - A_II x^k_I
    w_I := \sum_{l=1}^{p} C_{I,l}^T r_{I,l}
    x^{k+1}_I := x^k_I + d_I \circ w_I             (no communication!)
Parallel Gauß-Seidel (Non-Overlapping Domains, Modified)

Assume at least one node on each coupling edge and no connection between different edges.

    r_V := b_V - A_VV x^k_V - A_VE x^k_E - A_VI x^k_I
    w_V := \sum_{l=1}^{p} C_{V,l}^T r_{V,l}        (communication)
    x^{k+1}_V := x^k_V + d_V \circ w_V

    r_E := b_E - A_EV x^{k+1}_V - A_EE x^k_E - A_EI x^k_I
    w_E := \sum_{l=1}^{p} C_{E,l}^T r_{E,l}
    x^{k+1}_E := x^k_E + A_EE^{-1} w_E             (A_EE block diagonal, each block tridiagonal)

    r_I := b_I - A_IV x^{k+1}_V - A_IE x^{k+1}_E - A_II x^k_I
    w_I := \sum_{l=1}^{p} C_{I,l}^T r_{I,l}
    x^{k+1}_I := x^k_I + A_II^{-1} w_I             (no communication!)
Gauß-Seidel via Jacobi

Definition: A matrix A \in R^{m x n} is called non-negative if all coefficients a_ij of A are non-negative.

Theorem [Stein and Rosenberg]: Let the iteration matrix C_J \in R^{n x n} of the Jacobi iteration be non-negative. Then exactly one of the following statements holds:
i)   \rho(C_J) = \rho(C_G) = 0,
ii)  \rho(C_J) = \rho(C_G) = 1,
iii) 0 < \rho(C_G) < \rho(C_J) < 1,
iv)  1 < \rho(C_J) < \rho(C_G).

Hence Gauß-Seidel is faster than Jacobi (for FEM matrices, also in 2D/3D)!
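The alternative iii) can be checked numerically. Since the slide's own example matrix did not survive the transcription, the matrix A = [[2, -1], [-1, 2]] is used here instead; its Jacobi iteration matrix is non-negative. The matrices C_J = -D^{-1}(L + U) and C_G = -(D + L)^{-1} U below were computed by hand for this A.

```python
import math

# Iteration matrices for A = [[2, -1], [-1, 2]], derived by hand:
C_J = [[0.0, 0.5],
       [0.5, 0.0]]
C_G = [[0.0, 0.5],
       [0.0, 0.25]]

def spectral_radius_2x2(M):
    """Spectral radius of a 2x2 matrix via its characteristic polynomial."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        r = math.sqrt(disc)
        return max(abs((tr + r) / 2.0), abs((tr - r) / 2.0))
    return math.sqrt(det)        # complex conjugate pair: |lambda|^2 = det

rho_J = spectral_radius_2x2(C_J)
rho_G = spectral_radius_2x2(C_G)
# case iii) holds: 0 < rho(C_G) < rho(C_J) < 1; here rho_G equals rho_J**2,
# i.e. one Gauss-Seidel step gains as much as two Jacobi steps
```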
Lecture 17: More Fun With Sparse Matrices David Bindel 26 Oct 2011 Logistics Thanks for info on final project ideas. HW 2 due Monday! Life lessons from HW 2? Where an error occurs may not be where you
More informationApproaches to Parallel Implementation of the BDDC Method
Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague
More informationAN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENETIC ALGORITHMS
AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENETIC ALGORITHMS Seyed Abolfazl Shahzadehfazeli 1, Zainab Haji Abootorabi,3 1 Parallel Processing Laboratory, Yazd University,
More informationImprovements of the Discrete Dipole Approximation method
arxiv:physics/0006064v1 [physics.ao-ph] 26 Jun 2000 Improvements of the Discrete Dipole Approximation method Piotr J. Flatau Scripps Institution of Oceanography, University of California, San Diego, La
More informationOverview of Trilinos and PT-Scotch
29.03.2012 Outline PT-Scotch 1 PT-Scotch The Dual Recursive Bipartitioning Algorithm Parallel Graph Bipartitioning Methods 2 Overview of the Trilinos Packages Examples on using Trilinos PT-Scotch The Scotch
More informationOutline. Parallel Algorithms for Linear Algebra. Number of Processors and Problem Size. Speedup and Efficiency
1 2 Parallel Algorithms for Linear Algebra Richard P. Brent Computer Sciences Laboratory Australian National University Outline Basic concepts Parallel architectures Practical design issues Programming
More informationChapter Introduction
Chapter 4.1 Introduction After reading this chapter, you should be able to 1. define what a matrix is. 2. identify special types of matrices, and 3. identify when two matrices are equal. What does a matrix
More informationA parallel direct/iterative solver based on a Schur complement approach
A parallel direct/iterative solver based on a Schur complement approach Gene around the world at CERFACS Jérémie Gaidamour LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix project) February 29th, 2008
More informationSocial-Network Graphs
Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities
More informationSpectral Graph Sparsification: overview of theory and practical methods. Yiannis Koutis. University of Puerto Rico - Rio Piedras
Spectral Graph Sparsification: overview of theory and practical methods Yiannis Koutis University of Puerto Rico - Rio Piedras Graph Sparsification or Sketching Compute a smaller graph that preserves some
More informationStudy and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou
Study and implementation of computational methods for Differential Equations in heterogeneous systems Asimina Vouronikoy - Eleni Zisiou Outline Introduction Review of related work Cyclic Reduction Algorithm
More informationLecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice
Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 11-10/09/013 Lecture 11: Randomized Least-squares Approximation in Practice Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these
More informationOverlapping Domain Decomposition Methods
Overlapping Domain Decomposition Methods X. Cai 1,2 1 Simula Research Laboratory 2 Department of Informatics, University of Oslo Abstract. Overlapping domain decomposition methods are efficient and flexible.
More informationGraph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen
Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static
More informationChapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition
Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition
More informationBig Data Analytics. Special Topics for Computer Science CSE CSE Feb 11
Big Data Analytics Special Topics for Computer Science CSE 4095-001 CSE 5095-005 Feb 11 Fei Wang Associate Professor Department of Computer Science and Engineering fei_wang@uconn.edu Clustering II Spectral
More informationMultigrid solvers M. M. Sussman sussmanm@math.pitt.edu Office Hours: 11:10AM-12:10PM, Thack 622 May 12 June 19, 2014 1 / 43 Multigrid Geometrical multigrid Introduction Details of GMG Summary Algebraic
More informationMATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix.
MATH 423 Linear Algebra II Lecture 17: Reduced row echelon form (continued). Determinant of a matrix. Row echelon form A matrix is said to be in the row echelon form if the leading entries shift to the
More informationNumerical Implementation of Overlapping Balancing Domain Decomposition Methods on Unstructured Meshes
Numerical Implementation of Overlapping Balancing Domain Decomposition Methods on Unstructured Meshes Jung-Han Kimn 1 and Blaise Bourdin 2 1 Department of Mathematics and The Center for Computation and
More informationSolving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems
AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination
More informationGeneralized trace ratio optimization and applications
Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationIntroduction to Parallel Computing
Introduction to Parallel Computing W. P. Petersen Seminar for Applied Mathematics Department of Mathematics, ETHZ, Zurich wpp@math. ethz.ch P. Arbenz Institute for Scientific Computing Department Informatik,
More informationIterative Solver Benchmark Jack Dongarra, Victor Eijkhout, Henk van der Vorst 2001/01/14 1 Introduction The traditional performance measurement for co
Iterative Solver Benchmark Jack Dongarra, Victor Eijkhout, Henk van der Vorst 2001/01/14 1 Introduction The traditional performance measurement for computers on scientic application has been the Linpack
More informationCG solver assignment
CG solver assignment David Bindel Nikos Karampatziakis 3/16/2010 Contents 1 Introduction 1 2 Solver parameters 2 3 Preconditioned CG 3 4 3D Laplace operator 4 5 Preconditioners for the Laplacian 5 5.1
More informationExam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3
UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis
More informationParallel Threshold-based ILU Factorization
A short version of this paper appears in Supercomputing 997 Parallel Threshold-based ILU Factorization George Karypis and Vipin Kumar University of Minnesota, Department of Computer Science / Army HPC
More informationChapter 13. Boundary Value Problems for Partial Differential Equations* Linz 2002/ page
Chapter 13 Boundary Value Problems for Partial Differential Equations* E lliptic equations constitute the third category of partial differential equations. As a prototype, we take the Poisson equation
More informationQ. Wang National Key Laboratory of Antenna and Microwave Technology Xidian University No. 2 South Taiba Road, Xi an, Shaanxi , P. R.
Progress In Electromagnetics Research Letters, Vol. 9, 29 38, 2009 AN IMPROVED ALGORITHM FOR MATRIX BANDWIDTH AND PROFILE REDUCTION IN FINITE ELEMENT ANALYSIS Q. Wang National Key Laboratory of Antenna
More informationMLR Institute of Technology
Course Name : Engineering Optimization Course Code : 56021 Class : III Year Branch : Aeronautical Engineering Year : 2014-15 Course Faculty : Mr Vamsi Krishna Chowduru, Assistant Professor Course Objective
More informationChapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition
Chapter 1 A New Parallel Algorithm for Computing the Singular Value Decomposition Nicholas J. Higham Pythagoras Papadimitriou Abstract A new method is described for computing the singular value decomposition
More informationAMath 483/583 Lecture 24. Notes: Notes: Steady state diffusion. Notes: Finite difference method. Outline:
AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90
More informationNumerical Algorithms
Chapter 10 Slide 464 Numerical Algorithms Slide 465 Numerical Algorithms In textbook do: Matrix multiplication Solving a system of linear equations Slide 466 Matrices A Review An n m matrix Column a 0,0
More informationAn Approximate Singular Value Decomposition of Large Matrices in Julia
An Approximate Singular Value Decomposition of Large Matrices in Julia Alexander J. Turner 1, 1 Harvard University, School of Engineering and Applied Sciences, Cambridge, MA, USA. In this project, I implement
More informationAMath 483/583 Lecture 24
AMath 483/583 Lecture 24 Outline: Heat equation and discretization OpenMP and MPI for iterative methods Jacobi, Gauss-Seidel, SOR Notes and Sample codes: Class notes: Linear algebra software $UWHPSC/codes/openmp/jacobi1d_omp1.f90
More informationMatrices 4: use of MATLAB
Matrices 4: use of MATLAB Anthony Rossiter http://controleducation.group.shef.ac.uk/indexwebbook.html http://www.shef.ac.uk/acse Department of Automatic Control and Systems Engineering Introduction The
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationSpline Curves. Spline Curves. Prof. Dr. Hans Hagen Algorithmic Geometry WS 2013/2014 1
Spline Curves Prof. Dr. Hans Hagen Algorithmic Geometry WS 2013/2014 1 Problem: In the previous chapter, we have seen that interpolating polynomials, especially those of high degree, tend to produce strong
More informationAn iterative solver benchmark 1
223 An iterative solver benchmark 1 Jack Dongarra, Victor Eijkhout and Henk van der Vorst Revised 31 August 2001 We present a benchmark of iterative solvers for sparse matrices. The benchmark contains
More informationSimulating tsunami propagation on parallel computers using a hybrid software framework
Simulating tsunami propagation on parallel computers using a hybrid software framework Xing Simula Research Laboratory, Norway Department of Informatics, University of Oslo March 12, 2007 Outline Intro
More informationLab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD
Lab # 2 - ACS I Part I - DATA COMPRESSION in IMAGE PROCESSING using SVD Goals. The goal of the first part of this lab is to demonstrate how the SVD can be used to remove redundancies in data; in this example
More informationSparse Linear Algebra
Lecture 5 Sparse Linear Algebra The solution of a linear system Ax = b is one of the most important computational problems in scientific computing. As we shown in the previous section, these linear systems
More informationHandling Parallelisation in OpenFOAM
Handling Parallelisation in OpenFOAM Hrvoje Jasak hrvoje.jasak@fsb.hr Faculty of Mechanical Engineering and Naval Architecture University of Zagreb, Croatia Handling Parallelisation in OpenFOAM p. 1 Parallelisation
More informationCoupled Finite Element Method Based Vibroacoustic Analysis of Orion Spacecraft
Coupled Finite Element Method Based Vibroacoustic Analysis of Orion Spacecraft Lockheed Martin Space Systems Company (LMSSC) Spacecraft and Launch Vehicle Dynamic Environments Workshop June 21 23, 2016
More informationThe clustering in general is the task of grouping a set of objects in such a way that objects
Spectral Clustering: A Graph Partitioning Point of View Yangzihao Wang Computer Science Department, University of California, Davis yzhwang@ucdavis.edu Abstract This course project provide the basic theory
More informationAll use is subject to licence. See For any commercial application, a separate license must be signed.
HSL HSL MI20 PACKAGE SPECIFICATION HSL 2007 1 SUMMARY Given an n n sparse matrix A and an n vector z, HSL MI20 computes the vector x = Mz, where M is an algebraic multigrid (AMG) v-cycle preconditioner
More informationGPU-Accelerated Algebraic Multigrid for Commercial Applications. Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA
GPU-Accelerated Algebraic Multigrid for Commercial Applications Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA ANSYS Fluent 2 Fluent control flow Accelerate this first Non-linear iterations Assemble
More information10/24/ Rotations. 2. // s left subtree s right subtree 3. if // link s parent to elseif == else 11. // put x on s left
13.2 Rotations MAT-72006 AA+DS, Fall 2013 24-Oct-13 368 LEFT-ROTATE(, ) 1. // set 2. // s left subtree s right subtree 3. if 4. 5. // link s parent to 6. if == 7. 8. elseif == 9. 10. else 11. // put x
More informationOptimizing Data Locality for Iterative Matrix Solvers on CUDA
Optimizing Data Locality for Iterative Matrix Solvers on CUDA Raymond Flagg, Jason Monk, Yifeng Zhu PhD., Bruce Segee PhD. Department of Electrical and Computer Engineering, University of Maine, Orono,
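The preconditioning idea from the slides — transforming Ax = b into W⁻¹Ax = W⁻¹b, which has the same solution but a coefficient matrix with more favorable spectral properties — can be illustrated with a small NumPy sketch. The matrix and the Jacobi (diagonal) choice of W below are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

# Illustrative example: a matrix with a strongly varying diagonal,
# where a Jacobi (diagonal) preconditioner W = diag(A) helps a lot.
n = 4
D = np.diag([1.0, 10.0, 100.0, 1000.0])
E = 0.1 * (np.ones((n, n)) - np.eye(n))   # weak off-diagonal coupling
A = D + E
b = np.ones(n)

W = np.diag(np.diag(A))                   # Jacobi preconditioner, W ~ A
M = np.linalg.solve(W, A)                 # transformed matrix W^{-1} A

# Same solution, much better conditioned coefficient matrix:
x_orig = np.linalg.solve(A, b)
x_prec = np.linalg.solve(M, np.linalg.solve(W, b))

print(np.linalg.cond(A))                  # large: dominated by diagonal spread
print(np.linalg.cond(M))                  # close to 1
print(np.allclose(x_orig, x_prec))        # both systems give the same x
```

In practice W⁻¹A is never formed explicitly; preconditioned iterations such as CG or GMRES only require one solve with W per step (this is, for instance, how the preconditioner argument of sparse iterative solvers is typically used).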
More information