HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach
|
|
- Loren Cameron
- 5 years ago
- Views:
Transcription
1 HIPS : a parallel hybrid direct/iterative solver based on a Schur complement approach Mini-workshop PHyLeaS associated team J. Gaidamour, P. Hénon July 9, 28 HIPS : an hybrid direct/iterative solver / 35
2 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 2 / 35
3 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 3 / 35
4 Introduction HIPS : Hierarchical Iterative Parallel Solver Goals : Solve A.x = b Build algebraic preconditioners for a Krylov method : no information about the mathematical problem (black box). Parallelism of domain decomposition like methods (e.g. add. Schwarz methods) is appealing but convergence can decrease quickly with the number of domains. Build a global Schur complement preconditioner (ILU) from the local domain matrices only. HIPS : an hybrid direct/iterative solver 4 / 35
5 Introduction We want the smallest Schur complement vs nb domains : use decomposition of the adj. graph of the matrix with an overlap of one-vertex wide. Example : HIPS : an hybrid direct/iterative solver 5 / 35
6 Outlines HID Overview Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 6 / 35
7 HID Overview Incomplete factorization based on a HID [Hénon, Saad, 6] We want an incomplete factorization of the matrix without creating edge (fill-in) outside the local domain matrices (keep the parallelism). Problem : in a domain, we need an ordering of the interface compatible with neighboring domains. Use a hierarchical interface decomposition : partition the interface nodes according to the domains they belong to. respect some properties to ensure good parallelism and numerical behavior. HIPS : an hybrid direct/iterative solver 7 / 35
8 Simple case : a 2D grid HID Overview In this case the HID Interior Points Cross- Point Domain Edges Grid 8 8. The reordered matrix. We use the quotient graph induced by this partition to define block incomplete factorizations HIPS : an hybrid direct/iterative solver 8 / 35
9 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : HIPS : an hybrid direct/iterative solver 9 / 35
10 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : HIPS : an hybrid direct/iterative solver 9 / 35
11 Extend to irregular graphs HID Overview A HID respects two rules : Connectors of a same level are not connected 2 A connector of the level k is a separator for at least two connectors of the level k. Example : The HID heuristics (NP problems) : minimize the number of nodes in the higher connector levels. minimize the number of connector levels. HIPS : an hybrid direct/iterative solver / 35
12 HID Overview Fill-in block pattern (viewed from a local matrix) Global domain partitioned into 6 subdomains Local blocked matrix for subdomain 2 2,2 2 2, , , ,5 2,5 2,5 Empty sparse matrix in intial matrix (fill in occurs during factorization) 2,2 2 2,5,2,4,5 2,5 Fill in in these blocks is allowed in the locally consistent strategy 2 2 2,3 2,5 2,5 2,3,5,6 HIPS : an hybrid direct/iterative solver / 35
13 HID Overview Fill-in block pattern (viewed from the global matrix) L S, U S : strictly consistent rule No fill-in is allowed outside the initial block pattern of A (keep the block diag. struct.) HIPS : an hybrid direct/iterative solver 2 / 35
14 HID Overview Fill-in block pattern (viewed from the global matrix) L S, U S : locally consistent rule Fill-in is allowed in any place of the local domain matrices. HIPS : an hybrid direct/iterative solver 3 / 35
15 HID Overview Elimination order (matters in locally consistent case) : The connectors are ordered locally according to a global ordering. We use a MIS algorithm to eliminate at a time a maximum number of connectors. HIPS : an hybrid direct/iterative solver 4 / 35
16 HID Overview Block fill-in pattern in a real case (bcsstk4) The strictly pattern is really cheaper than the locally pattern. We can use a ILUT (numerical threshold) to control the fill inside the block pattern. Locally consistent rules Strictly consistent rules HIPS : an hybrid direct/iterative solver 5 / 35
17 Overview of HIPS HID Overview HIPS : an hybrid direct/iterative solver 6 / 35
18 Multistage ILUT strategy HID Overview droptol >, droptole >, droptol > User can give a partition as an input. One domain per processor (default strategy) partitions given by a graph partitioner in this case (not from elim. tree). Iteration on the whole set of unknowns. HIPS : an hybrid direct/iterative solver 7 / 35
19 HYBRID strategy HID Overview droptol =, droptole >=, droptol > Iteration on the Schur complement only. Block computation when droptol{,e}= (BLAS). HIPS : an hybrid direct/iterative solver 8 / 35
20 Outlines Schur Complement Approach Results Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 9 / 35
21 Schur Complement Approach Results Hybrid direct/iterative : Schur complement approach The linear system A.x = b can be written as : ( ) ( ) AB A BC xb. = A CB A C x C ( yb The system A.x = B can be solved in three steps : A B.z B = y B S.x C = y C A CB.z B (2) A B.x B = y B A BC.x C y C ) () with S = A C A CB.A B.A BC = A C A CB.U B.L B.A BC HIPS : an hybrid direct/iterative solver 2 / 35
22 Schur Complement Approach Results Hybrid direct/iterative : Schur complement approach Schur Complement utilization : A B = L B.U B : exact factorization direct resolution of subsystems () and (3) Each interior of subdomains can be computed independently S L S.Ũ S : incomplete factorization (2) is solved by a preconditioned Krylov subspace method Solve the Schur complement by a preconditioned Krylov method (GMRES). 8 >< A B.z B = r B () S.x C = r C A CB.z B (2) >: A B.x B = r B A BC.x C (3) Iterative resolution : Iterate on S is numerically equivalent to iterate on the whole system A. HIPS : an hybrid direct/iterative solver 2 / 35
23 Precondition the Schur complement Schur Complement Approach Results Non-zero pattern of the global factors obtained on a small matrix : (Fill-in allowed only in local Schur complement) ( L B A CB U B S ) How to avoid memory cost of A CB U B and S in 3D problems [Gaidamour, Hénon, 8] : We do not need to store S, instead of S.x we use (A C A CB.U B.L B.A BC).x ILUT : A CB U B (resp. L B A BC ) is numerically sparsified along its computations (blockwise ILUC(t)= left looking) : L S. U S S = (A CB.U B ). (L B.A BC) S HIPS : an hybrid direct/iterative solver 22 / 35
24 Test cases Schur Complement Approach Results Test cases from grid-tlse collection ( MHD : Unsymmetric real matrix 3D magneto-hydrodynamic flow problem AUDI : Symmetric real matrix 3D structural mechanic problem Haltere, Amande : Symmetric complex matrix 3D electromagnetism problems (Maxwell) HIPS : an hybrid direct/iterative solver 23 / 35
25 Test cases Schur Complement Approach Results Fill-in and OPC are given for a direct method. Matrix unknowns non-zeros fill-in OPC MHD 485,597 24,233, AUDI 943,695 39,297, Haltere,288,825,476, Amande 6,994,683 58,477, HIPS : an hybrid direct/iterative solver 24 / 35
26 Test cases Schur Complement Approach Results Experimental conditions : nodes of 2.6 Ghz quadri dual-core Opteron (Myrinet) Partitionner : Scotch b A.x / b < 7, no restart in GMRES Thresholds : MHD, Audi, Amande =., Haltere =. The Amande test case does not converge with the strictly pattern and better results were obtained with using no threshold in coupling (L s, L t s are computed from the exact Schur complement). HIPS : an hybrid direct/iterative solver 25 / 35
27 Iterations/number of domains Schur Complement Approach Results MHD (485, 597) : nb domains from 33 to 32 2 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 2 AMANDE : Locally HIPS : an hybrid direct/iterative solver 26 / 35
28 Schur Complement Approach Results Seq. Times (prec+solve)/number of domains MHD (485, 597) : nb domains from 33 to 32 6 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 3 AMANDE : Locally HIPS : an hybrid direct/iterative solver 27 / 35
29 Schur Complement Approach Results Fill ratio/number of domains (Amande : Peak = +3.5) MHD (485, 597) : nb domains from 33 to 32 MHD : Strictly MHD : Locally AUDI (943, 695) : nb domains from 53 to AUDI : Strictly AUDI : Locally HALTERE (, 288, 825) : nb domains from 9 to HALTERE : Strictly HALTERE : Locally AMANDE (6, 994, 683) : nb domains from 8 to 34 8 AMANDE : Locally AMANDE : Locally HIPS : an hybrid direct/iterative solver 28 / 35
30 Outlines Parallelization scheme Results Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 29 / 35
31 Unknown elimination in parallel Parallelization scheme Results We build a decomposition of the adjacency graph of the system into a set of small subdomains ( to 5 nodes). We can recover communications between processors by elimination of local subdomains HIPS : an hybrid direct/iterative solver 3 / 35
32 Workload and memory balance Parallelization scheme Results Subdomains distribution on the processors : Obtained by partitioning the weighted domain graph (SCOTCH or Metis) Balance of S.x computation (solving step) by using the symbolic factorization to compute the number of NNZ of the interiors of subdomains. Finer level of balance : election of the processor responsible for the computation of a piece of interface (connectors). HIPS : an hybrid direct/iterative solver 3 / 35
33 Parallelization scheme Results Parallel time (prec+solve) (logarithmic scale) MHD (485, 597) : 64 domains AUDI (943, 695) : 23 domains Strictly Ideal strictly Locally Ideal locally Strictly Ideal strictly Locally Ideal locally HALTERE (, 288, 825) : 62 domains Strictly Ideal strictly Locally Ideal locally AMANDE (6, 994, 683) : 262 domains Locally Ideal HIPS : an hybrid direct/iterative solver 32 / 35
34 Outlines Introduction 2 Incomplete factorization based on a HID 3 Hybrid Solver Experimental results 4 Parallelization Experimental results 5 Conclusion HIPS : an hybrid direct/iterative solver 33 / 35
35 Conclusion Conclusion : Generic algebraic approach, tight coupling between state of the art direct method (supernodal) and ILU(t) implementation, Trade-off performance/memory : easy tuning (choose thresholds, domain size) Prospects : Provide automatically a domain size parameter (based on memory). Partitioning is an open question...numerical weight, connectivity Coarse grid based on the HID (for elliptic problems) HIPS : an hybrid direct/iterative solver 34 / 35
36 Download HIPS Features : symmetric, unsymmetric (Cholesky/LU, ICC(t)/ILU(t)), real or complex systems. Method : Hybrid, multistage ICC(t) and ILU(t) (phidal). Compatible with the graph partitioners SCOTCH and METIS. Cecill-C license (LGPL-like licence) HIPS : an hybrid direct/iterative solver 35 / 35
Parallel resolution of sparse linear systems by mixing direct and iterative methods
Parallel resolution of sparse linear systems by mixing direct and iterative methods Phyleas Meeting, Bordeaux J. Gaidamour, P. Hénon, J. Roman, Y. Saad LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix
More informationA parallel direct/iterative solver based on a Schur complement approach
A parallel direct/iterative solver based on a Schur complement approach Gene around the world at CERFACS Jérémie Gaidamour LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix project) February 29th, 2008
More informationSolvers and partitioners in the Bacchus project
1 Solvers and partitioners in the Bacchus project 11/06/2009 François Pellegrini INRIA-UIUC joint laboratory The Bacchus team 2 Purpose Develop and validate numerical methods and tools adapted to problems
More informationMaPHyS, a sparse hybrid linear solver and preliminary complexity analysis
MaPHyS, a sparse hybrid linear solver and preliminary complexity analysis Emmanuel AGULLO, Luc GIRAUD, Abdou GUERMOUCHE, Azzam HAIDAR, Jean ROMAN INRIA Bordeaux Sud Ouest CERFACS Université de Bordeaux
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 20: Sparse Linear Systems; Direct Methods vs. Iterative Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 26
More informationCombinatorial problems in a Parallel Hybrid Linear Solver
Combinatorial problems in a Parallel Hybrid Linear Solver Ichitaro Yamazaki and Xiaoye Li Lawrence Berkeley National Laboratory François-Henry Rouet and Bora Uçar ENSEEIHT-IRIT and LIP, ENS-Lyon SIAM workshop
More informationPreconditioning Linear Systems Arising from Graph Laplacians of Complex Networks
Preconditioning Linear Systems Arising from Graph Laplacians of Complex Networks Kevin Deweese 1 Erik Boman 2 1 Department of Computer Science University of California, Santa Barbara 2 Scalable Algorithms
More informationGTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013
GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»
More informationNative mesh ordering with Scotch 4.0
Native mesh ordering with Scotch 4.0 François Pellegrini INRIA Futurs Project ScAlApplix pelegrin@labri.fr Abstract. Sparse matrix reordering is a key issue for the the efficient factorization of sparse
More informationS0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS
S0432 NEW IDEAS FOR MASSIVELY PARALLEL PRECONDITIONERS John R Appleyard Jeremy D Appleyard Polyhedron Software with acknowledgements to Mark A Wakefield Garf Bowen Schlumberger Outline of Talk Reservoir
More informationToward robust hybrid parallel sparse solvers for large scale applications
Toward robust hybrid parallel sparse solvers for large scale applications Luc Giraud (INPT/INRIA) joint work with Azzam Haidar (CERFACS-INPT/IRIT) and Jean Roman (ENSEIRB, LaBRI and INRIA) 1st workshop
More informationDistributed Schur Complement Solvers for Real and Complex Block-Structured CFD Problems
Distributed Schur Complement Solvers for Real and Complex Block-Structured CFD Problems Dr.-Ing. Achim Basermann, Dr. Hans-Peter Kersken German Aerospace Center (DLR) Simulation- and Software Technology
More informationEfficient Multi-GPU CUDA Linear Solvers for OpenFOAM
Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,
More informationDEVELOPMENT OF A RESTRICTED ADDITIVE SCHWARZ PRECONDITIONER FOR SPARSE LINEAR SYSTEMS ON NVIDIA GPU
INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, SERIES B Volume 5, Number 1-2, Pages 13 20 c 2014 Institute for Scientific Computing and Information DEVELOPMENT OF A RESTRICTED ADDITIVE SCHWARZ
More informationHypergraph Partitioning for Parallel Iterative Solution of General Sparse Linear Systems
Hypergraph Partitioning for Parallel Iterative Solution of General Sparse Linear Systems Masha Sosonkina Bora Uçar Yousef Saad February 1, 2007 Abstract The efficiency of parallel iterative methods for
More informationproblems on. Of course, additional interface conditions must be
Thirteenth International Conference on Domain Decomposition Methods Editors: N. Debit, M.Garbey, R. Hoppe, J. Périaux, D. Keyes, Y. Kuznetsov c 2001 DDM.org 53 Schur Complement Based Preconditioners for
More informationLecture 15: More Iterative Ideas
Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationIterative Sparse Triangular Solves for Preconditioning
Euro-Par 2015, Vienna Aug 24-28, 2015 Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt, Edmond Chow and Jack Dongarra Incomplete Factorization Preconditioning Incomplete LU factorizations
More informationBlock Distributed Schur Complement Preconditioners for CFD Computations on Many-Core Systems
Block Distributed Schur Complement Preconditioners for CFD Computations on Many-Core Systems Dr.-Ing. Achim Basermann, Melven Zöllner** German Aerospace Center (DLR) Simulation- and Software Technology
More informationPartitioning and Partitioning Tools. Tim Barth NASA Ames Research Center Moffett Field, California USA
Partitioning and Partitioning Tools Tim Barth NASA Ames Research Center Moffett Field, California 94035-00 USA 1 Graph/Mesh Partitioning Why do it? The graph bisection problem What are the standard heuristic
More informationME964 High Performance Computing for Engineering Applications
ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964
More informationScheduling Strategies for Parallel Sparse Backward/Forward Substitution
Scheduling Strategies for Parallel Sparse Backward/Forward Substitution J.I. Aliaga M. Bollhöfer A.F. Martín E.S. Quintana-Ortí Deparment of Computer Science and Engineering, Univ. Jaume I (Spain) {aliaga,martina,quintana}@icc.uji.es
More informationAccelerating the Iterative Linear Solver for Reservoir Simulation
Accelerating the Iterative Linear Solver for Reservoir Simulation Wei Wu 1, Xiang Li 2, Lei He 1, Dongxiao Zhang 2 1 Electrical Engineering Department, UCLA 2 Department of Energy and Resources Engineering,
More informationToward a supernodal sparse direct solver over DAG runtimes
Toward a supernodal sparse direct solver over DAG runtimes HOSCAR 2013, Bordeaux X. Lacoste Xavier LACOSTE HiePACS team Inria Bordeaux Sud-Ouest November 27, 2012 Guideline Context and goals About PaStiX
More informationApproaches to Parallel Implementation of the BDDC Method
Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague
More informationHigh Performance Computing: Tools and Applications
High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:
More informationA comparison of Algorithms for Sparse Matrix. Real-time Multibody Dynamic Simulation
A comparison of Algorithms for Sparse Matrix Factoring and Variable Reordering aimed at Real-time Multibody Dynamic Simulation Jose-Luis Torres-Moreno, Jose-Luis Blanco, Javier López-Martínez, Antonio
More informationGPU-Accelerated Algebraic Multigrid for Commercial Applications. Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA
GPU-Accelerated Algebraic Multigrid for Commercial Applications Joe Eaton, Ph.D. Manager, NVAMG CUDA Library NVIDIA ANSYS Fluent 2 Fluent control flow Accelerate this first Non-linear iterations Assemble
More informationSimulating tsunami propagation on parallel computers using a hybrid software framework
Simulating tsunami propagation on parallel computers using a hybrid software framework Xing Simula Research Laboratory, Norway Department of Informatics, University of Oslo March 12, 2007 Outline Intro
More informationA Parallel Implementation of the BDDC Method for Linear Elasticity
A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague
More informationPARDISO Version Reference Sheet Fortran
PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly
More informationDomain decomposition for 3D electromagnetic modeling
Earth Planets Space, 51, 1013 1018, 1999 Domain decomposition for 3D electromagnetic modeling Zonghou Xiong CRC AMET, Earth Science, Macquarie University, Sydney, NSW 2109, Australia (Received November
More informationHierarchical hybrid sparse linear solver for multicore platforms
Hierarchical hybrid sparse linear solver for multicore platforms Emmanuel Agullo, Luc Giraud, Stojce Nakov, Jean Roman To cite this version: Emmanuel Agullo, Luc Giraud, Stojce Nakov, Jean Roman. Hierarchical
More informationGPU-based Parallel Reservoir Simulators
GPU-based Parallel Reservoir Simulators Zhangxin Chen 1, Hui Liu 1, Song Yu 1, Ben Hsieh 1 and Lei Shao 1 Key words: GPU computing, reservoir simulation, linear solver, parallel 1 Introduction Nowadays
More informationOverview of Trilinos and PT-Scotch
29.03.2012 Outline PT-Scotch 1 PT-Scotch The Dual Recursive Bipartitioning Algorithm Parallel Graph Bipartitioning Methods 2 Overview of the Trilinos Packages Examples on using Trilinos PT-Scotch The Scotch
More informationMultigrid Pattern. I. Problem. II. Driving Forces. III. Solution
Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids
More informationImplicit schemes for wave models
Implicit schemes for wave models Mathieu Dutour Sikirić Rudjer Bo sković Institute, Croatia and Universität Rostock April 17, 2013 I. Wave models Stochastic wave modelling Oceanic models are using grids
More informationAccelerated ANSYS Fluent: Algebraic Multigrid on a GPU. Robert Strzodka NVAMG Project Lead
Accelerated ANSYS Fluent: Algebraic Multigrid on a GPU Robert Strzodka NVAMG Project Lead A Parallel Success Story in Five Steps 2 Step 1: Understand Application ANSYS Fluent Computational Fluid Dynamics
More informationPrimal methods of iterative substructuring
Primal methods of iterative substructuring Jakub Šístek Institute of Mathematics of the AS CR, Prague Programs and Algorithms of Numerical Mathematics 16 June 5th, 2012 Co-workers Presentation based on
More informationarxiv: v1 [cs.ms] 2 Jun 2016
Parallel Triangular Solvers on GPU Zhangxin Chen, Hui Liu, and Bo Yang University of Calgary 2500 University Dr NW, Calgary, AB, Canada, T2N 1N4 {zhachen,hui.j.liu,yang6}@ucalgary.ca arxiv:1606.00541v1
More informationDistributed NVAMG. Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs
Distributed NVAMG Design and Implementation of a Scalable Algebraic Multigrid Framework for a Cluster of GPUs Istvan Reguly (istvan.reguly at oerc.ox.ac.uk) Oxford e-research Centre NVIDIA Summer Internship
More informationTools and Libraries for Parallel Sparse Matrix Computations. Edmond Chow and Yousef Saad. University of Minnesota. Minneapolis, MN
Tools and Libraries for Parallel Sparse Matrix Computations Edmond Chow and Yousef Saad Department of Computer Science, and Minnesota Supercomputer Institute University of Minnesota Minneapolis, MN 55455
More informationRecent developments in the solution of indefinite systems Location: De Zwarte Doos (TU/e campus)
1-day workshop, TU Eindhoven, April 17, 2012 Recent developments in the solution of indefinite systems Location: De Zwarte Doos (TU/e campus) :10.25-10.30: Opening and word of welcome 10.30-11.15: Michele
More informationParallel FEM Computation and Multilevel Graph Partitioning Xing Cai
Parallel FEM Computation and Multilevel Graph Partitioning Xing Cai Simula Research Laboratory Overview Parallel FEM computation how? Graph partitioning why? The multilevel approach to GP A numerical example
More informationProf. Dr. Stefan Funken, Prof. Dr. Alexander Keller, Prof. Dr. Karsten Urban 11. Januar Scientific Computing Parallele Algorithmen
Prof. Dr. Stefan Funken, Prof. Dr. Alexander Keller, Prof. Dr. Karsten Urban 11. Januar 2007 Scientific Computing Parallele Algorithmen Page 2 Scientific Computing 11. Januar 2007 Funken / Keller / Urban
More informationCOMMUNICATION AVOIDING ILU0 PRECONDITIONER
SIAM J. SCI. COMPUT. Vol. 37, No. 2, pp. C217 C246 c 2015 Society for Industrial and Applied Mathematics COMMUNICATION AVOIDING ILU0 PRECONDITIONER LAURA GRIGORI AND SOPHIE MOUFAWAD Abstract. In this paper
More informationComparison of parallel preconditioners for a Newton-Krylov flow solver
Comparison of parallel preconditioners for a Newton-Krylov flow solver Jason E. Hicken, Michal Osusky, and David W. Zingg 1Introduction Analysis of the results from the AIAA Drag Prediction workshops (Mavriplis
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationSCALABLE ALGORITHMS for solving large sparse linear systems of equations
SCALABLE ALGORITHMS for solving large sparse linear systems of equations CONTENTS Sparse direct solvers (multifrontal) Substructuring methods (hybrid solvers) Jacko Koster, Bergen Center for Computational
More informationVBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems Liao, Jia
University of Groningen VBARMS: A variable block algebraic recursive multilevel solver for sparse linear systems Liao, Jia IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's
More informationSELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND
Student Submission for the 5 th OpenFOAM User Conference 2017, Wiesbaden - Germany: SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND TESSA UROIĆ Faculty of Mechanical Engineering and Naval Architecture, Ivana
More informationNumerical Implementation of Overlapping Balancing Domain Decomposition Methods on Unstructured Meshes
Numerical Implementation of Overlapping Balancing Domain Decomposition Methods on Unstructured Meshes Jung-Han Kimn 1 and Blaise Bourdin 2 1 Department of Mathematics and The Center for Computation and
More informationHYPERGRAPH PARTITIONING FOR SPARSE LINEAR SYSTEMS: A CASE STUDY WITH A SIMPLE DISCONTINUOUS PDE
HYPERGRAPH PARTITIONING FOR SPARSE LINEAR SYSTEMS: A CASE STUDY WITH A SIMPLE DISCONTINUOUS PDE MASHA SOSONKINA AND YOUSEF SAAD Abstract. A number of challenges need to be addressed when solving sparse
More informationFOR P3: A monolithic multigrid FEM solver for fluid structure interaction
FOR 493 - P3: A monolithic multigrid FEM solver for fluid structure interaction Stefan Turek 1 Jaroslav Hron 1,2 Hilmar Wobker 1 Mudassar Razzaq 1 1 Institute of Applied Mathematics, TU Dortmund, Germany
More informationLecture 17: More Fun With Sparse Matrices
Lecture 17: More Fun With Sparse Matrices David Bindel 26 Oct 2011 Logistics Thanks for info on final project ideas. HW 2 due Monday! Life lessons from HW 2? Where an error occurs may not be where you
More informationParallel Greedy Matching Algorithms
Parallel Greedy Matching Algorithms Fredrik Manne Department of Informatics University of Bergen, Norway Rob Bisseling, University of Utrecht Md. Mostofa Patwary, University of Bergen 1 Outline Background
More informationGPU Cluster Computing for FEM
GPU Cluster Computing for FEM Dominik Göddeke Sven H.M. Buijssen, Hilmar Wobker and Stefan Turek Angewandte Mathematik und Numerik TU Dortmund, Germany dominik.goeddeke@math.tu-dortmund.de GPU Computing
More informationElliptic model problem, finite elements, and geometry
Thirteenth International Conference on Domain Decomposition Methods Editors: N. Debit, M.Garbey, R. Hoppe, J. Périaux, D. Keyes, Y. Kuznetsov c 200 DDM.org 4 FETI-DP Methods for Elliptic Problems with
More informationGPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)
GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra)
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 5: Sparse Linear Systems and Factorization Methods Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 18 Sparse
More informationA Parallel Solver for Laplacian Matrices. Tristan Konolige (me) and Jed Brown
A Parallel Solver for Laplacian Matrices Tristan Konolige (me) and Jed Brown Graph Laplacian Matrices Covered by other speakers (hopefully) Useful in a variety of areas Graphs are getting very big Facebook
More informationSparse Linear Systems
1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 4 Sparse Linear Systems Section 4.3 Iterative Methods Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign
More informationImproved Algebraic Cryptanalysis of QUAD, Bivium and Trivium via Graph Partitioning on Equation Systems
Improved Algebraic of QUAD, Bivium and Trivium via on Equation Systems Kenneth Koon-Ho Wong 1, Gregory V. Bard 2 1 Information Security Institute Queensland University of Technology, Brisbane, Australia
More informationAdvanced Numerical Techniques for Cluster Computing
Advanced Numerical Techniques for Cluster Computing Presented by Piotr Luszczek http://icl.cs.utk.edu/iter-ref/ Presentation Outline Motivation hardware Dense matrix calculations Sparse direct solvers
More informationNonsymmetric Problems. Abstract. The eect of a threshold variant TPABLO of the permutation
Threshold Ordering for Preconditioning Nonsymmetric Problems Michele Benzi 1, Hwajeong Choi 2, Daniel B. Szyld 2? 1 CERFACS, 42 Ave. G. Coriolis, 31057 Toulouse Cedex, France (benzi@cerfacs.fr) 2 Department
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationTHE application of advanced computer architecture and
544 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 45, NO. 3, MARCH 1997 Scalable Solutions to Integral-Equation and Finite-Element Simulations Tom Cwik, Senior Member, IEEE, Daniel S. Katz, Member,
More informationTHE DEVELOPMENT OF THE POTENTIAL AND ACADMIC PROGRAMMES OF WROCLAW UNIVERISTY OF TECH- NOLOGY ITERATIVE LINEAR SOLVERS
ITERATIVE LIEAR SOLVERS. Objectives The goals of the laboratory workshop are as follows: to learn basic properties of iterative methods for solving linear least squares problems, to study the properties
More informationAdvances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing
Advances in Parallel Partitioning, Load Balancing and Matrix Ordering for Scientific Computing Erik G. Boman 1, Umit V. Catalyurek 2, Cédric Chevalier 1, Karen D. Devine 1, Ilya Safro 3, Michael M. Wolf
More informationA direct solver with reutilization of previously-computed LU factorizations for h-adaptive finite element grids with point singularities
Maciej Paszynski AGH University of Science and Technology Faculty of Computer Science, Electronics and Telecommunication Department of Computer Science al. A. Mickiewicza 30 30-059 Krakow, Poland tel.
More informationACCELERATING PRECONDITIONED ITERATIVE LINEAR SOLVERS ON GPU
INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, SERIES B Volume 5, Number 1-2, Pages 136 146 c 2014 Institute for Scientific Computing and Information ACCELERATING PRECONDITIONED ITERATIVE LINEAR
More informationTHE MORTAR FINITE ELEMENT METHOD IN 2D: IMPLEMENTATION IN MATLAB
THE MORTAR FINITE ELEMENT METHOD IN D: IMPLEMENTATION IN MATLAB J. Daněk, H. Kutáková Department of Mathematics, University of West Bohemia, Pilsen MECAS ESI s.r.o., Pilsen Abstract The paper is focused
More informationSolving Sparse Linear Systems. Forward and backward substitution for solving lower or upper triangular systems
AMSC 6 /CMSC 76 Advanced Linear Numerical Analysis Fall 7 Direct Solution of Sparse Linear Systems and Eigenproblems Dianne P. O Leary c 7 Solving Sparse Linear Systems Assumed background: Gauss elimination
More informationA Compiler for Parallel Finite Element Methods. with Domain-Decomposed Unstructured Meshes JONATHAN RICHARD SHEWCHUK AND OMAR GHATTAS
Contemporary Mathematics Volume 00, 0000 A Compiler for Parallel Finite Element Methods with Domain-Decomposed Unstructured Meshes JONATHAN RICHARD SHEWCHUK AND OMAR GHATTAS December 11, 1993 Abstract.
More informationHigh-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers
High-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers July 14, 1997 J Daniel S. Katz (Daniel.S.Katz@jpl.nasa.gov) Jet Propulsion Laboratory California Institute of Technology
More informationIlya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker)
Ilya Lashuk, Merico Argentati, Evgenii Ovtchinnikov, Andrew Knyazev (speaker) Department of Mathematics and Center for Computational Mathematics University of Colorado at Denver SIAM Conference on Parallel
More informationIterative methods for use with the Fast Multipole Method
Iterative methods for use with the Fast Multipole Method Ramani Duraiswami Perceptual Interfaces and Reality Lab. Computer Science & UMIACS University of Maryland, College Park, MD Joint work with Nail
More informationA High-Order Accurate Unstructured GMRES Solver for Poisson s Equation
A High-Order Accurate Unstructured GMRES Solver for Poisson s Equation Amir Nejat * and Carl Ollivier-Gooch Department of Mechanical Engineering, The University of British Columbia, BC V6T 1Z4, Canada
More informationGeneralized trace ratio optimization and applications
Generalized trace ratio optimization and applications Mohammed Bellalij, Saïd Hanafi, Rita Macedo and Raca Todosijevic University of Valenciennes, France PGMO Days, 2-4 October 2013 ENSTA ParisTech PGMO
More informationPreconditioner updates for solving sequences of linear systems in matrix-free environment
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2000; 00:1 6 [Version: 2002/09/18 v1.02] Preconditioner updates for solving sequences of linear systems in matrix-free environment
More informationMUMPS. The MUMPS library. Abdou Guermouche and MUMPS team, June 22-24, Univ. Bordeaux 1 and INRIA
The MUMPS library Abdou Guermouche and MUMPS team, Univ. Bordeaux 1 and INRIA June 22-24, 2010 MUMPS Outline MUMPS status Recently added features MUMPS and multicores? Memory issues GPU computing Future
More informationParallel Threshold-based ILU Factorization
A short version of this paper appears in Supercomputing 997 Parallel Threshold-based ILU Factorization George Karypis and Vipin Kumar University of Minnesota, Department of Computer Science / Army HPC
More informationThe GPU as a co-processor in FEM-based simulations. Preliminary results. Dipl.-Inform. Dominik Göddeke.
The GPU as a co-processor in FEM-based simulations Preliminary results Dipl.-Inform. Dominik Göddeke dominik.goeddeke@mathematik.uni-dortmund.de Institute of Applied Mathematics University of Dortmund
More informationSolution of 2D Euler Equations and Application to Airfoil Design
WDS'6 Proceedings of Contributed Papers, Part I, 47 52, 26. ISBN 8-86732-84-3 MATFYZPRESS Solution of 2D Euler Equations and Application to Airfoil Design J. Šimák Charles University, Faculty of Mathematics
More informationRandomized Algorithms
Randomized Algorithms Last time Network topologies Intro to MPI Matrix-matrix multiplication Today MPI I/O Randomized Algorithms Parallel k-select Graph coloring Assignment 2 Parallel I/O Goal of Parallel
More informationSolving large systems on HECToR using the 2-lagrange multiplier methods Karangelis, Anastasios; Loisel, Sebastien; Maynard, Chris
Heriot-Watt University Heriot-Watt University Research Gateway Solving large systems on HECToR using the 2-lagrange multiplier methods Karangelis, Anastasios; Loisel, Sebastien; Maynard, Chris Published
More informationThermomechanical and hydraulic industrial simulations using MUMPS at EDF
K u f Thermomechanical and hydraulic industrial simulations using MUMPS at EDF MUMPS User group meeting 15 april 2010 O.Boiteau, C.Denis, F.Zaoui MUltifrontal Massively Parallel sparse direct Solver 1
More informationTools and Primitives for High Performance Graph Computation
Tools and Primitives for High Performance Graph Computation John R. Gilbert University of California, Santa Barbara Aydin Buluç (LBNL) Adam Lugowski (UCSB) SIAM Minisymposium on Analyzing Massive Real-World
More informationPULP: Fast and Simple Complex Network Partitioning
PULP: Fast and Simple Complex Network Partitioning George Slota #,* Kamesh Madduri # Siva Rajamanickam * # The Pennsylvania State University *Sandia National Laboratories Dagstuhl Seminar 14461 November
More informationSparse Matrices and Graphs: There and Back Again
Sparse Matrices and Graphs: There and Back Again John R. Gilbert University of California, Santa Barbara Simons Institute Workshop on Parallel and Distributed Algorithms for Inference and Optimization
More informationLecture 11: Randomized Least-squares Approximation in Practice. 11 Randomized Least-squares Approximation in Practice
Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 11-10/09/013 Lecture 11: Randomized Least-squares Approximation in Practice Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these
More informationResearch Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity Problems
Mathematical Problems in Engineering Volume 2015, Article ID 147286, 7 pages http://dx.doi.org/10.1155/2015/147286 Research Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity
More informationThe effect of irregular interfaces on the BDDC method for the Navier-Stokes equations
153 The effect of irregular interfaces on the BDDC method for the Navier-Stokes equations Martin Hanek 1, Jakub Šístek 2,3 and Pavel Burda 1 1 Introduction The Balancing Domain Decomposition based on Constraints
More informationContents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet
Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage
More informationKeywords: Block ILU preconditioner, Krylov subspace methods, Additive Schwarz, Domain decomposition
BLOCK ILU PRECONDITIONERS FOR PARALLEL AMR/C SIMULATIONS Jose J. Camata Alvaro L. G. A. Coutinho Federal University of Rio de Janeiro, NACAD, COPPE Department of Civil Engineering, Rio de Janeiro, Brazil
More informationApplication of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures
Paper 6 Civil-Comp Press, 2012 Proceedings of the Eighth International Conference on Engineering Computational Technology, B.H.V. Topping, (Editor), Civil-Comp Press, Stirlingshire, Scotland Application
More informationSome progresses on hybrid solvers toward extreme scale
Inria Sophia-Antipolis Sept. 23, 2015 Some progresses on hybrid solvers toward extreme scale Luc Giraud joint work with Inria HiePACS project members HiePACS Inria Project INRIA Bordeaux Sud-Ouest Joint
More informationParallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)
Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication
More information