HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach

Size: px

Start display at page:

Download "HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach"

Loren Cameron
5 years ago
Views:

1 HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach Mini-workshop PHyLeaS associated team J. Gaidamour, P. Hénon July 9, 28 HIPS : an hybrid direct/iterative solver / 35

2 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 2 / 35

3 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 3 / 35

4 Introduction HIPS : Hierarchical Iterative Parallel Solver Goals : Solve A.x = b Build algebraic preconditioners for a Krylov method : no information about the mathematical problem (black box). Parallelism of domain decomposition like methods (e.g. add. Schwarz methods) is appealing but convergence can decrease quickly with the number of domains. Build a global Schur complement preconditioner (ILU) from the local domain matrices only. HIPS : an hybrid direct/iterative solver 4 / 35

5 Introduction We want the smallest Schur complement vs nb domains : use decomposition of the adj. graph of the matrix with an overlap of one-vertex wide. Example : HIPS : an hybrid direct/iterative solver 5 / 35

6 Outlines HID Overview Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 6 / 35

7 HID Overview Incomplete factorization based on a HID [Hénon, Saad, 6] We want an incomplete factorization of the matrix without creating edge (fill-in) outside the local domain matrices (keep the parallelism). Problem : in a domain, we need an ordering of the interface compatible with neighboring domains. Use a hierarchical interface decomposition : partition the interface nodes according to the domains they belong to. respect some properties to ensure good parallelism and numerical behavior. HIPS : an hybrid direct/iterative solver 7 / 35

8 Simple case : a 2D grid HID Overview In this case the HID Interior Points Cross- Point Domain Edges Grid 8 8. The reordered matrix. We use the quotient graph induced by this partition to define block incomplete factorizations HIPS : an hybrid direct/iterative solver 8 / 35

9 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : HIPS : an hybrid direct/iterative solver 9 / 35

10 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : HIPS : an hybrid direct/iterative solver 9 / 35

11 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : The HID heuristics (NP problems) : minimize the number of nodes in the higher connector levels. minimize the number of connector levels. HIPS : an hybrid direct/iterative solver / 35

12 HID Overview Fill-in block pattern (viewed from a local matrix) Global domain partitioned into 6 subdomains Local blocked matrix for subdomain 2 2,2 2 2, , , ,5 2,5 2,5 Empty sparse matrix in intial matrix (fill in occurs during factorization) 2,2 2 2,5,2,4,5 2,5 Fill in in these blocks is allowed in the locally consistent strategy 2 2 2,3 2,5 2,5 2,3,5,6 HIPS : an hybrid direct/iterative solver / 35

13 HID Overview Fill-in block pattern (viewed from the global matrix) L S, U S : strictly consistent rule No fill-in is allowed outside the initial block pattern of A (keep the block diag. struct.) HIPS : an hybrid direct/iterative solver 2 / 35

14 HID Overview Fill-in block pattern (viewed from the global matrix) L S, U S : locally consistent rule Fill-in is allowed in any place of the local domain matrices. HIPS : an hybrid direct/iterative solver 3 / 35

15 HID Overview Elimination order (matters in locally consistent case) : The connectors are ordered locally according to a global ordering. We use a MIS algorithm to eliminate at a time a maximum number of connectors. HIPS : an hybrid direct/iterative solver 4 / 35

16 HID Overview Block fill-in pattern in a real case (bcsstk4) The strictly pattern is really cheaper than the locally pattern. We can use a ILUT (numerical threshold) to control the fill inside the block pattern. Locally consistent rules Strictly consistent rules HIPS : an hybrid direct/iterative solver 5 / 35

17 Overview of HIPS HID Overview HIPS : an hybrid direct/iterative solver 6 / 35

18 Multistage ILUT strategy HID Overview droptol >, droptole >, droptol > User can give a partition as an input. One domain per processor (default strategy) partitions given by a graph partitioner in this case (not from elim. tree). Iteration on the whole set of unknowns. HIPS : an hybrid direct/iterative solver 7 / 35

19 HYBRID strategy HID Overview droptol =, droptole >=, droptol > Iteration on the Schur complement only. Block computation when droptol{,e}= (BLAS). HIPS : an hybrid direct/iterative solver 8 / 35

20 Outlines Schur Complement Approach Results Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 9 / 35

21 Schur Complement Approach Results Hybrid direct/iterative : Schur complement approach The linear system A.x = b can be written as : ( ) ( ) AB A BC xb. = A CB A C x C ( yb The system A.x = B can be solved in three steps : A B.z B = y B S.x C = y C A CB.z B (2) A B.x B = y B A BC.x C y C ) () with S = A C A CB.A B.A BC = A C A CB.U B.L B.A BC HIPS : an hybrid direct/iterative solver 2 / 35

22 Schur Complement Approach Results Hybrid direct/iterative : Schur complement approach Schur Complement utilization : A B = L B.U B : exact factorization direct resolution of subsystems () and (3) Each interior of subdomains can be computed independently S L S.Ũ S : incomplete factorization (2) is solved by a preconditioned Krylov subspace method Solve the Schur complement by a preconditioned Krylov method (GMRES). 8 >< A B.z B = r B () S.x C = r C A CB.z B (2) >: A B.x B = r B A BC.x C (3) Iterative resolution : Iterate on S is numerically equivalent to iterate on the whole system A. HIPS : an hybrid direct/iterative solver 2 / 35

23 Precondition the Schur complement Schur Complement Approach Results Non-zero pattern of the global factors obtained on a small matrix : (Fill-in allowed only in local Schur complement) ( L B A CB U B S ) How to avoid memory cost of A CB U B and S in 3D problems [Gaidamour, Hénon, 8] : We do not need to store S, instead of S.x we use (A C A CB.U B.L B.A BC).x ILUT : A CB U B (resp. L B A BC ) is numerically sparsified along its computations (blockwise ILUC(t)= left looking) : L S. U S S = (A CB.U B ). (L B.A BC) S HIPS : an hybrid direct/iterative solver 22 / 35

24 Test cases Schur Complement Approach Results Test cases from grid-tlse collection ( MHD : Unsymmetric real matrix 3D magneto-hydrodynamic flow problem AUDI : Symmetric real matrix 3D structural mechanic problem Haltere, Amande : Symmetric complex matrix 3D electromagnetism problems (Maxwell) HIPS : an hybrid direct/iterative solver 23 / 35

25 Test cases Schur Complement Approach Results Fill-in and OPC are given for a direct method. Matrix unknowns non-zeros fill-in OPC MHD 485,597 24,233, AUDI 943,695 39,297, Haltere,288,825,476, Amande 6,994,683 58,477, HIPS : an hybrid direct/iterative solver 24 / 35

26 Test cases Schur Complement Approach Results Experimental conditions : nodes of 2.6 Ghz quadri dual-core Opteron (Myrinet) Partitionner : Scotch b A.x / b < 7, no restart in GMRES Thresholds : MHD, Audi, Amande =., Haltere =. The Amande test case does not converge with the strictly pattern and better results were obtained with using no threshold in coupling (L s, L t s are computed from the exact Schur complement). HIPS : an hybrid direct/iterative solver 25 / 35

27 Iterations/number of domains Schur Complement Approach Results MHD (485, 597) : nb domains from 33 to 32 2 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 2 AMANDE : Locally HIPS : an hybrid direct/iterative solver 26 / 35

28 Schur Complement Approach Results Seq. Times (prec+solve)/number of domains MHD (485, 597) : nb domains from 33 to 32 6 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 3 AMANDE : Locally HIPS : an hybrid direct/iterative solver 27 / 35

29 Schur Complement Approach Results Fill ratio/number of domains (Amande : Peak = +3.5) MHD (485, 597) : nb domains from 33 to 32 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 8 AMANDE : Locally AMANDE : Locally HIPS : an hybrid direct/iterative solver 28 / 35

30 Outlines Parallelization scheme Results Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 29 / 35

31 Unknown elimination in parallel Parallelization scheme Results We build a decomposition of the adjacency graph of the system into a set of small subdomains ( to 5 nodes). We can recover communications between processors by elimination of local subdomains HIPS : an hybrid direct/iterative solver 3 / 35

32 Workload and memory balance Parallelization scheme Results Subdomains distribution on the processors : Obtained by partitioning the weighted domain graph (SCOTCH or Metis) Balance of S.x computation (solving step) by using the symbolic factorization to compute the number of NNZ of the interiors of subdomains. Finer level of balance : election of the processor responsible for the computation of a piece of interface (connectors). HIPS : an hybrid direct/iterative solver 3 / 35

33 Parallelization scheme Results Parallel time (prec+solve) (logarithmic scale) MHD (485, 597) : 64 domains AUDI (943, 695) : 23 domains Strictly Ideal strictly Locally Ideal locally Strictly Ideal strictly Locally Ideal locally HALTERE (, 288, 825) : 62 domains Strictly Ideal strictly Locally Ideal locally AMANDE (6, 994, 683) : 262 domains Locally Ideal HIPS : an hybrid direct/iterative solver 32 / 35

34 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 33 / 35

35 Conclusion Conclusion : Generic algebraic approach, tight coupling between state of the art direct method (supernodal) and ILU(t) implementation, Trade-off performance/memory : easy tuning (choose thresholds, domain size) Prospects : Provide automatically a domain size parameter (based on memory). Partitioning is an open question...numerical weight, connectivity Coarse grid based on the HID (for elliptic problems) HIPS : an hybrid direct/iterative solver 34 / 35

Download HIPS http://hips.gforge.inria.fr Features : symmetric, unsymmetric (Cholesky/LU, ICC(t)/ILU(t)), real or complex systems.

36 Download HIPS Features : symmetric, unsymmetric (Cholesky/LU, ICC(t)/ILU(t)), real or complex systems. Method : Hybrid, multistage ICC(t) and ILU(t) (phidal). Compatible with the graph partitioners SCOTCH and METIS. Cecill-C license (LGPL-like licence) HIPS : an hybrid direct/iterative solver 35 / 35

Parallel resolution of sparse linear systems by mixing direct and iterative methods

Parallel resolution of sparse linear systems by mixing direct and iterative methods Phyleas Meeting, Bordeaux J. Gaidamour, P. Hénon, J. Roman, Y. Saad LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix