Parallel resolution of sparse linear systems by mixing direct and iterative methods

1 Parallel resolution of sparse linear systems by mixing direct and iterative methods
Phyleas Meeting, Bordeaux
J. Gaidamour, P. Hénon, J. Roman, Y. Saad
LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix project), France
University of Minnesota, USA
May, Bordeaux, France
An hybrid direct/iterative solver 1 / 27

2 Outline
1 Introduction
2 Hybrid Solver
  Schur complement techniques
  Ordering and partitioning of the Schur complement
3 Parallelization
4 Experimental results
5 Conclusion

4 Motivation of this work
The most popular algebraic methods for solving a large sparse linear system A.x = r are:
- Direct methods (exact factorization)
  - Build a dense block structure of the factor (BLAS 3)
  - Solutions are highly accurate
  - High memory consumption (unable to solve very large 3D problems)
- Preconditioned iterative methods
  - Robustness depends on how much memory is allowed in the preconditioner
  - Based on scalar implementations (e.g. ILU(k) or ILUT)
  - Convergence is difficult on very ill-conditioned systems
We want a trade-off: a solver that can handle difficult problems and requires less memory than a direct solver.

5 Our approach
HIPS: Hierarchical Iterative Parallel Solver
Goals:
- Generic algebraic approach: no information about the problem is required (black box).
- Reuse direct solver technologies (BLAS, symbolic factorization).
- Take advantage of the parallelism of domain-decomposition-like methods.

6 Our approach
HIPS: Hierarchical Iterative Parallel Solver
Characteristics:
- Build a decomposition of the adjacency graph of the system into a set of subdomains with overlap.
- Direct methods inside the subdomains, iterative resolution on the interfaces.
- Robust preconditioner on the Schur complement: the number of iterations depends only weakly on the number of domains, so small subdomains can be used to reduce memory consumption.

8 Schur complement (1/2)
The matrix of the linear system $A x = r$ can be written in block form as:
$$\begin{pmatrix} A_B & F \\ E & A_C \end{pmatrix} = \begin{pmatrix} L_B & 0 \\ E U_B^{-1} & S \end{pmatrix} \begin{pmatrix} U_B & L_B^{-1} F \\ 0 & I \end{pmatrix}$$
The system $A x = r$ can then be solved in three steps:
$$\begin{cases} A_B z_B = r_B \\ S x_C = r_C - E z_B \\ A_B x_B = r_B - F x_C \end{cases}$$
with $S = A_C - E A_B^{-1} F = A_C - E U_B^{-1} L_B^{-1} F$.
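
The three steps can be illustrated on a tiny dense system. This is a minimal sketch in plain Python, not the HIPS implementation: the helper names (`solve`, `matvec`, `schur_solve`) and the 3 x 3 example matrix are assumptions made for illustration, and S is formed explicitly here only because the system is tiny.

```python
def solve(M, b):
    """Solve M x = b by dense Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def schur_solve(A_B, F, E, A_C, r_B, r_C):
    """Solve [[A_B, F], [E, A_C]] [x_B; x_C] = [r_B; r_C] in three steps."""
    nB, nC = len(A_B), len(A_C)
    # cols[j] = A_B^{-1} F[:, j], used to form S = A_C - E A_B^{-1} F
    cols = [solve(A_B, [F[i][j] for i in range(nB)]) for j in range(nC)]
    S = [[A_C[i][j] - sum(E[i][k] * cols[j][k] for k in range(nB))
          for j in range(nC)] for i in range(nC)]
    z_B = solve(A_B, r_B)                                   # A_B z_B = r_B
    x_C = solve(S, [rc - e for rc, e in zip(r_C, matvec(E, z_B))])  # S x_C = r_C - E z_B
    x_B = solve(A_B, [rb - f for rb, f in zip(r_B, matvec(F, x_C))])  # A_B x_B = r_B - F x_C
    return x_B, x_C


# Hypothetical example: two interior unknowns (B) and one interface unknown (C).
A_B = [[4.0, 1.0], [1.0, 3.0]]
F = [[1.0], [0.0]]
E = [[0.0, 1.0]]
A_C = [[5.0]]
r_B, r_C = [1.0, 2.0], [3.0]
x_B, x_C = schur_solve(A_B, F, E, A_C, r_B, r_C)
print(x_B, x_C)
```

In the solver described here, the middle step is handled iteratively and the two outer steps reuse the exact factors of $A_B$; the sketch uses a dense solve everywhere only to keep the example short.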

9 Schur complement (2/2)
Use of the Schur complement:
$$\begin{cases} A_B z_B = r_B & (1) \\ S x_C = r_C - E z_B & (2) \\ A_B x_B = r_B - F x_C & (3) \end{cases}$$
- $A_B = L_B U_B$: exact factorization, so subsystems (1) and (3) are solved directly. The interior of each subdomain can be computed independently.
- $S \approx L_S U_S$: incomplete factorization, so (2) is solved by a preconditioned Krylov subspace method (preconditioned GMRES on the Schur complement).
Iterative resolution: iterating on $S$ is numerically equivalent to iterating on the whole system $A$.
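
The equivalence claimed in the last line can be checked with a short derivation. This is a sketch under the assumption that, for any interface iterate $x_C$, the interior unknowns are recovered exactly through step (3); the residual notation $\rho$ is introduced here for illustration only.

```latex
% Given an iterate x_C, set x_B = A_B^{-1}(r_B - F x_C) and z_B = A_B^{-1} r_B.
\begin{align*}
\rho_B &= r_B - A_B x_B - F x_C = r_B - (r_B - F x_C) - F x_C = 0, \\
\rho_C &= r_C - E x_B - A_C x_C \\
       &= r_C - E A_B^{-1} r_B + E A_B^{-1} F x_C - A_C x_C \\
       &= (r_C - E z_B) - (A_C - E A_B^{-1} F)\, x_C \\
       &= (r_C - E z_B) - S x_C .
\end{align*}
```

The residual of the full system therefore vanishes on the interior block and coincides with the residual of the reduced system (2) on the interface block, so both have the same norm at every iteration.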

10 Ordering and partitioning of the Schur complement
We need a special ordering of the Schur complement to compute a block incomplete factorization.
The unknowns on the interface are ordered according to a Hierarchical Interface Decomposition (Hénon, Saad, SIAM SISC).
Figure: an 8 × 8 grid whose nodes are classified as interior points, domain edges, and cross-points, shown next to the reordered matrix.
We use the quotient graph induced by this partition to define block incomplete factorizations.
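
The classification into interior points, domain edges, and cross-points can be sketched on a small grid. This is a simplified illustration, not the HID algorithm of the cited paper: it assumes a hypothetical 7 × 7 grid cut by one separator row and one separator column, and it groups nodes into connectors by the set of subdomains whose closure contains them. The names `domains_1d` and `hid_connectors` are made up for this example.

```python
from collections import Counter, defaultdict


def domains_1d(i, s):
    """Indices of the 1D half-domains whose closure contains coordinate i."""
    if i < s:
        return (0,)
    if i > s:
        return (1,)
    return (0, 1)  # i lies on the separator, shared by both halves


def hid_connectors(n, s):
    """Group the nodes of an n x n grid by the set of subdomains they touch.

    The grid is split into four subdomains by separator row s and column s.
    Returns a dict mapping each subdomain set (the connector key) to its nodes.
    """
    connectors = defaultdict(list)
    for r in range(n):
        for c in range(n):
            key = frozenset(2 * dr + dc
                            for dr in domains_1d(r, s)
                            for dc in domains_1d(c, s))
            connectors[key].append((r, c))
    return connectors


connectors = hid_connectors(7, 3)
# Level of a connector = number of subdomains sharing it:
# 1 -> interior points, 2 -> domain edges, 4 -> cross-point.
nodes_per_level = Counter()
for key, nodes in connectors.items():
    nodes_per_level[len(key)] += len(nodes)
print(dict(nodes_per_level))
```

Ordering the unknowns level by level (interiors first, then edges, then cross-points) yields the block structure that the quotient graph describes.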

11 Preconditioning the Schur complement
Figure: non-zero pattern of the global factors obtained on a small matrix (fill-in allowed only in the local Schur complements).
$$\begin{pmatrix} L_B & 0 \\ E U_B^{-1} & S \end{pmatrix}$$
How to avoid the memory cost of $E U_B^{-1}$ and $S$ in 3D problems:
- ILUT: $E U_B^{-1}$ is numerically sparsified, and the factors of $S$ are sparsified during their computation (left-looking approach).
- We do not need to store $S$ to compute a product with the Schur complement; its implicit formulation is used instead: $(A_C - E U_B^{-1} L_B^{-1} F)\, x$.
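
The implicit product can be sketched as follows. This is a minimal dense illustration in plain Python, assuming an unpivoted LU factorization of $A_B$ is acceptable for the example matrix; the helper names (`lu_factor`, `lu_solve`, `schur_matvec`) and the data are hypothetical and do not come from HIPS.

```python
def lu_factor(M):
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L."""
    n = len(M)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [list(map(float, row)) for row in M]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def schur_matvec(L_B, U_B, F, E, A_C, x):
    """Return (A_C - E U_B^{-1} L_B^{-1} F) x without ever forming S."""
    t = lu_solve(L_B, U_B, matvec(F, x))  # t = U_B^{-1} L_B^{-1} F x
    return [a - b for a, b in zip(matvec(A_C, x), matvec(E, t))]


# Hypothetical 2 + 2 block example.
A_B = [[4.0, 1.0], [1.0, 3.0]]
F = [[1.0, 2.0], [0.0, 1.0]]
E = [[0.0, 1.0], [1.0, 0.0]]
A_C = [[5.0, 1.0], [1.0, 6.0]]
L_B, U_B = lu_factor(A_B)
x_C = [1.0, -2.0]
y = schur_matvec(L_B, U_B, F, E, A_C, x_C)
print(y)
```

Each product costs two sparse products and one pair of triangular solves with the factors that are already stored, which is the point of the implicit formulation: only $L_B$, $U_B$, $E$, $F$, and $A_C$ need to be kept in memory.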

12 Preconditioning the Schur complement
To reduce memory consumption and enhance parallelism, we also defined two block fill-in patterns for the factors:
- Locally consistent rules
- Strictly consistent rules
Strictly consistent rules: no fill-in is allowed between connectors of the same level (same block pattern as A), in order to keep the block diagonal pattern induced by the HID ordering.

14 Unknown elimination in parallel
We build a decomposition of the adjacency graph of the system into a set of small subdomains.
Communications between processors can be overlapped with the elimination of local subdomains.

15 Construction of the domain partition
Justification for choosing small subdomains:
- Low memory requirement (limited amount of direct factorization),
- Convergence independent of the number of processors,
- The number of subdomains becomes a parameter that controls the memory / convergence trade-off according to the difficulty of the problem,
- High potential parallelism (multiple domains per processor).

16 Load balancing
Distribution of the subdomains over the available processors:
- Load balancing with a graph partitioner (SCOTCH).
- The S.x computation (solve step) is balanced by using the symbolic factorization to compute the number of non-zeros (NNZ) in the interior of each subdomain.
- A processor is elected to be responsible for the computation of each piece of interface (connector).
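
The idea of balancing by predicted non-zero counts can be sketched with a simple heuristic. Note that this is a swapped-in technique: the slides state that SCOTCH, a graph partitioner, performs the balancing, whereas the sketch below uses a greedy "heaviest subdomain to least-loaded processor" rule that ignores the graph structure. The function name and the NNZ weights are hypothetical.

```python
import heapq


def balance(weights, n_proc):
    """Greedy assignment of subdomains to processors.

    weights maps a subdomain id to its predicted NNZ count (in the slides this
    number comes from the symbolic factorization). Subdomains are taken from
    heaviest to lightest and given to the currently least-loaded processor.
    Returns (owner, loads).
    """
    heap = [(0, p) for p in range(n_proc)]  # (current load, processor id)
    heapq.heapify(heap)
    owner = {}
    for dom in sorted(weights, key=weights.get, reverse=True):
        load, p = heapq.heappop(heap)
        owner[dom] = p
        heapq.heappush(heap, (load + weights[dom], p))
    loads = [0] * n_proc
    for dom, p in owner.items():
        loads[p] += weights[dom]
    return owner, loads


# Hypothetical NNZ counts for six subdomains, distributed over two processors.
nnz = {0: 50, 1: 40, 2: 30, 3: 30, 4: 20, 5: 10}
owner, loads = balance(nnz, 2)
print(owner, loads)
```

Having many small subdomains per processor is what makes this kind of balancing effective: the finer the weights, the closer the loads can be brought together.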

18 Test cases
Experimental conditions:
- 10 nodes, each with four dual-core 2.6 GHz Opteron processors (Myrinet interconnect)
- Partitioner: Scotch
- Stopping criterion: $\|b - A x\| / \|b\| < 10^{-7}$, no restart in GMRES
Test cases:
- Haltere, Amande (CEA/CESTA): symmetric complex matrices from 3D electromagnetism problems (Helmholtz operator)
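
The stopping criterion can be written as a small helper. This is a sketch assuming dense lists and the Euclidean norm (the slides do not name the norm); `relative_residual` and the 2 × 2 data are hypothetical. Using `abs` keeps the function valid for the complex matrices of the test cases.

```python
import math


def relative_residual(A, x, b):
    """Return ||b - A x|| / ||b|| in the Euclidean norm (real or complex data)."""
    r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
    norm_r = math.sqrt(sum(abs(ri) ** 2 for ri in r))
    norm_b = math.sqrt(sum(abs(bi) ** 2 for bi in b))
    return norm_r / norm_b


TOL = 1e-7  # threshold used in the experiments

# Hypothetical diagonal system whose exact solution is [1, 1].
A = [[2.0, 0.0], [0.0, 4.0]]
b = [2.0, 4.0]
good = relative_residual(A, [1.0, 1.0 + 1e-9], b)
bad = relative_residual(A, [1.0, 0.9], b)
print(good < TOL, bad < TOL)
```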

19 Test case: Haltere (sequential study)
Haltere (CEA/CESTA): n = 1,288,825; nnz(A) = 10,476,775; fill ratio: x
HIPS: ILUT (locally consistent, τ = 0.01, $10^{-7}$)
Table columns: # domains, Precond. (sec.), Solve (sec.), Total (sec.), Iter., Fill ratio

20 Test case: Haltere (sequential study)
Convergence over time for several parameters, with two different domain size parameters.
Figure: two panels, one with the domain size set to 1000 (1021 domains) and one with a larger domain size (119 domains). Each panel plots the relative residual norm against time (sec.) for the strictly consistent and locally consistent rules, each at t = 0.01 and at a second threshold. The preconditioning time appears as the offset of each curve.

21 Test case: Haltere (parallel study)
HIPS: ILUT (τ = 0.01, $10^{-7}$)
- 1021 domains of 1481 nodes
- Fill ratio in precond: 5.70 (peak)
- dim(S) = 14.26% of dim(A)
Strictly consistent: 21 iterations, fill ratio in solve: 5.52
Table columns: # proc, Precond. (sec.), Solve (sec.), Total (sec.)
Locally consistent: 13 iterations, fill ratio in solve: 5.69
Table columns: # proc, Precond. (sec.), Solve (sec.), Total (sec.)

22 Test case: Amande
Amande (CEA/CESTA): n = 6,994,683; nnz(A) = 58,477,383; fill ratio: x
HIPS: ILUT (locally consistent, τ = 0.001, $10^{-7}$)
- 2053 domains of 3770 nodes
- 77 iterations
- Fill ratio in precond / solve: (peak)
- dim(S) = 9.59% of dim(A)
Table columns: # proc, Precond. (sec.), Solve (sec.), Total (sec.), max nnz(P) ($\times 10^6$)

23 Test case: Amande
HIPS: ILUT (locally consistent, τ = 0.001, $10^{-7}$)
Figure: time (s) versus number of processors; curves: Precond., Solve, Total, Optimal total.
Time decomposition for one iteration of GMRES:
Table columns: # proc, Total for 1 iter. (sec.), Triangular solve (sec.), S.x (sec.), Other (sec.)

25 Conclusion
Conclusion:
- Generic algebraic approach that mixes direct and iterative methods through a Schur complement approach,
- The share of direct factorization is controlled by the size of the domains,
- Many different strategies are implemented (dense block ILU).
Perspectives (preprocessing):
- PT-Scotch integration,
- Parallel interface renumbering,
- Providing indications about good domain size parameters.
HIPS public release: March 2008 (CeCILL-C license)
Features: real (symmetric, unsymmetric), complex (symmetric)

27 The domain partition is constructed from a reordering based on nested-dissection-like algorithms (e.g. METIS, SCOTCH).
Figure: separators C1 to C7 and subdomains D produced by nested dissection.
Objectives: minimize the overlap between subdomains and preserve the quality of the interface.

29 We choose a level of the elimination tree of the direct method:
- Subtrees rooted at this level are the interiors of the subdomains.
- The upper part of the elimination tree corresponds to the interfaces.
This makes it possible to choose the direct/iterative ratio according to the difficulty of the problem or the accuracy needed.
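
Cutting the elimination tree at a chosen level can be sketched as follows. This is an illustrative sketch only: the tree is given as a hypothetical parent map, the level is counted as depth from the root, and `cut_tree` is a made-up name rather than a HIPS routine.

```python
def cut_tree(parent, level):
    """Split an elimination tree at a given depth.

    parent maps each node to its parent (None for the root). Nodes with
    depth < level form the interface; every subtree rooted at depth == level
    becomes the interior of one subdomain. Returns (interface, interiors),
    where interiors maps a subtree root to the list of nodes in that subtree.
    """
    depth = {}

    def node_depth(v):
        if v not in depth:
            depth[v] = 0 if parent[v] is None else node_depth(parent[v]) + 1
        return depth[v]

    interface, interiors = [], {}
    for v in parent:
        if node_depth(v) < level:
            interface.append(v)
        else:
            root = v
            while depth[root] > level:  # climb to the ancestor at the cut level
                root = parent[root]
            interiors.setdefault(root, []).append(v)
    return interface, interiors


# Hypothetical tree: 7 is the top separator, 3 and 6 are second-level
# separators, and 1, 2, 4, 5 are leaves.
parent = {7: None, 3: 7, 6: 7, 1: 3, 2: 3, 4: 6, 5: 6}
interface_1, interiors_1 = cut_tree(parent, 1)
interface_2, interiors_2 = cut_tree(parent, 2)
print(interface_1, interiors_1)
print(interface_2, interiors_2)
```

Cutting closer to the root gives fewer, larger interiors (more direct work), while cutting closer to the leaves moves more unknowns into the interface (more iterative work), which is the trade-off described on this slide.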
