2 Fundamentals of Serial Linear Algebra
- Philip Lloyd
Contents

2.1 Direct Solution of Linear Systems
  2.1.1 Gaussian Elimination
  2.1.2 LU Decomposition and FBS
  2.1.3 Cholesky Decomposition
  2.1.4 Multifrontal Methods
2.2 Iterative Solution of Linear Systems
  2.2.1 Jacobi Method
  2.2.2 Preconditioned Conjugate Gradient Method (PCG)
2.3 Comparison of Direct and Iterative Methods
The solution of linear systems plays a major role in the FEM; in linear static analyses, for example, it is still the most expensive part of the whole analysis. The task is to solve a linear system of equations of the form A X = B, where

- A ∈ R^(n×n): coefficient matrix (e.g. stiffness matrix)
- B ∈ R^(n×m): right-hand side vectors (e.g. load vectors)
- X ∈ R^(n×m): solution vectors to be computed (e.g. displacement vectors)

The best solution technique depends on the properties of the linear system, for example:

- sparse or dense coefficient matrix A
- symmetric or unsymmetric A
- number of right-hand sides m
- size of the system n (small, medium, large, ...)
- nonzero pattern of the coefficient matrix A, for example banded matrix
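Since sparsity is the most important of these properties in practice, it helps to see how a sparse matrix is actually stored. The following sketch (illustrative Python with made-up helper names, not from any FE code; production solvers use tuned variants of the same idea) shows the widely used compressed sparse row (CSR) layout and a matrix-vector product over it:

```python
# Sketch: compressed sparse row (CSR) storage and y = A x over it.
# Only the nonzero terms are stored and touched.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_index, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)     # nonzero value
                col_index.append(j)  # its column
        row_ptr.append(len(values))  # end of this row's entries
    return values, col_index, row_ptr

def csr_matvec(values, col_index, row_ptr, x):
    """y = A x, costing roughly 2*nnz flops."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_index[k]]
    return y

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 2.0]]
vals, cols, ptr = to_csr(A)
print(csr_matvec(vals, cols, ptr, [1.0, 1.0, 1.0]))  # [5.0, 4.0, 2.0]
```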
Even linear systems arising from the FEM can have very different characteristics, depending on the application area, for example:

- in linear statics up to 6 dofs per grid, in heat transfer usually 1 dof per grid
- element types: denser matrices with solid elements (TETRA, HEXA, etc.) than with 2D elements (TRIA, QUAD, etc.), because more grids are connected with each other within each element

[Figure: nonzero pattern for a mesh of QUAD4 elements, 1 dof per grid; black: element ids, blue: grid ids; density 46%]
[Figure: nonzero pattern for a mesh of HEXA8 elements, 1 dof per grid; black: element ids, blue: grid ids; density 78%]
Linear systems arising from linear static analysis are usually:

- sparse
- symmetric
- solved for a small number of right-hand sides, often m = 1
- marked by a higher concentration of nonzero terms around the diagonal, but not necessarily of banded structure
- positive definite: v^T A v > 0 for all v ≠ 0. This holds due to physics: remember that u^T K u = ∫_V σ^T ε dV was the energy built up from strains and stresses (strain energy); if the model is properly defined and fixed, any displacement u ≠ 0 should result in a positive strain energy.

But even linear systems arising from linear static analysis can vary significantly, for example:

- in density, because of element types
- in size, because of model size (number of grids and elements)
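The positive-definiteness argument can be made concrete with a tiny sketch (illustrative Python; the two-element bar model and the helper name are hypothetical, chosen only to demonstrate the claim): for a properly fixed model, the quadratic form u^T K u is positive for every nonzero u.

```python
# Sketch: the quadratic form u^T K u (twice the strain energy) for a toy
# 1D bar stiffness matrix; positive for every nonzero u once the model
# is fixed (here: left end clamped, so its dof is eliminated).

def quad_form(K, u):
    """Return u^T K u."""
    n = len(u)
    return sum(u[i] * K[i][j] * u[j] for i in range(n) for j in range(n))

# Two-element bar with unit element stiffness, left end fixed:
K = [[2.0, -1.0],
     [-1.0, 1.0]]
for u in ([1.0, 0.0], [0.0, 1.0], [1.0, 2.0], [-3.0, 1.0]):
    assert quad_form(K, u) > 0.0  # strain energy is positive

print(quad_form(K, [1.0, 2.0]))  # 2.0
```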
Example: piston

  FE model:                          Stiffness matrix:
  # grids      9,90                  # rows       28,590
  # elements   43,084 TETRA4         # nonzeros   1,089,722
  # dofs       28,590                density      0.133 %
  # loads      1                     # RHSs       1
2.1 Direct Solution of Linear Systems

2.1.1 Gaussian Elimination
2.1.2 LU Decomposition and FBS
2.1.3 Cholesky Decomposition

- see derivation on board
- the basic dense algorithm is not acceptable in terms of run time and memory/disk requirements, even for this small example; exploitation of sparsity is needed
- possible solutions:
  - solvers exploiting bandedness. Idea: perform computations only within a band; a matrix can be transformed into pseudo-banded form by a suitable permutation matrix P (resequencing): P^T A P
  - multifrontal methods (see 2.1.4)
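The dense algorithm whose costs are criticized above (referred to below as procedure CHOLESKY; its derivation is on the board) can be sketched as follows — a plain O(n^3) factorization A = L L^T that ignores sparsity entirely. The Python rendering is illustrative, not the board's exact pseudocode:

```python
import math

def cholesky(A):
    """Dense Cholesky: factor a symmetric positive definite A into L L^T,
    L lower triangular. Every entry is visited, sparse or not."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):                      # eliminate row/column k
        s = A[k][k] - sum(L[k][j] ** 2 for j in range(k))
        L[k][k] = math.sqrt(s)              # fails (s <= 0) if A is not SPD
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][j] * L[k][j] for j in range(k))) / L[k][k]
    return L

A = [[4.0, 2.0],
     [2.0, 5.0]]
print(cholesky(A))  # [[2.0, 0.0], [1.0, 2.0]]
```

For an n × n dense matrix this needs about n^3/3 operations and n^2/2 stored terms, which is exactly what becomes unaffordable for FEM-sized systems.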
2.1.4 Multifrontal Methods

- idea: exploit sparsity (example matrix on slide)
- during the decomposition of a sparse matrix A it is frequently observed that some rows can be eliminated independently
- this is due to the fact that the elimination of a row k creates a contribution to a row i only if the term in row k and column i of the transposed Cholesky factor L^T is not equal to 0, that means if l_ik ≠ 0
- the resulting partial ordering of the rows is usually represented by an elimination tree
- in an elimination tree, each row k of the linear system to be solved is represented as a node
- if a node i is an ancestor of node k in the tree, then row k must be eliminated before row i
- formally, the elimination tree is defined as a directed graph: if A is an n×n matrix, we define

  T(A) = (V_T, E_T),  V_T: vertices, E_T: edges
  V_T = { v_k : k ∈ {1, ..., n} }
  E_T = { (v_i, v_k) ∈ V_T × V_T : i = min{ q ∈ {k+1, ..., n} : l_qk ≠ 0 } }

  where l_qk is again the term in row q and column k of the Cholesky factor L, and thus the term in row k and column q of L^T
- this definition means that v_i is the parent of v_k if and only if i is the column index of the first offdiagonal term in row k of the transposed Cholesky factor L^T
- this definition is intuitive: if column i contains the first offdiagonal term in row k of L^T, then row i is the first row below row k to which the elimination of row k creates contributions; therefore it makes sense to define v_i as the closest ancestor, which means parent, of v_k in the elimination tree
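As a sketch of this definition (illustrative Python with a hypothetical 5×5 pattern; production solvers build the tree directly from A by symbolic analysis, without forming L first), the parent relation can be read straight off the nonzero pattern of L:

```python
# Sketch: build the elimination tree from the nonzero pattern of the
# Cholesky factor L: parent(k) = min{ q > k : l_qk != 0 }, i.e. the row
# index of the first off-diagonal nonzero in column k of L.

def elimination_tree(L_pattern):
    """L_pattern[q] is the set of column indices j with l_qj != 0."""
    n = len(L_pattern)
    parent = [None] * n                    # None marks a root
    for k in range(n):
        below = [q for q in range(k + 1, n) if k in L_pattern[q]]
        if below:
            parent[k] = min(below)         # first row that row k feeds
    return parent

# Hypothetical factor pattern (diagonal entries always present):
L_pattern = [{0}, {1}, {0, 2}, {1, 3}, {2, 3, 4}]
print(elimination_tree(L_pattern))  # [2, 3, 4, 4, None]
```

Here rows 0 and 1 are leaves and can be eliminated independently, exactly the parallelism the multifrontal method exploits.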
[Figure: elimination tree for our 9×9 sample matrix]

- once the elimination tree has been created, the algorithm for the multifrontal matrix decomposition can be described
- the multifrontal decomposition executes a bottom-up traversal of the elimination tree
- for each node s we create a dense nfront(s)×nfront(s) submatrix, where nfront(s) is the number of nonzero terms in row s of L^T
- this submatrix is called front s; nfront(s) is the corresponding front size
- for symmetric matrices we store only the upper (or lower) triangle
- the created front is initialized with 0s
- for a leaf node, the next step is to fill the first row of front s with the terms a_{s,j} ≠ 0
- after that, the first row of front s is eliminated by applying an algorithm similar to procedure CHOLESKY to the front, except that the outermost loop is only executed for k = 1 (only the first row is eliminated)
- after this elimination, row 1 of front s is equal to row s of L^T (and column s of L, respectively)
- the remaining rows of front s have to be passed to the parent node as the contributions of row s to other matrix rows which have not been eliminated yet
- nonleaf nodes s are processed similarly, except that between the initialization of front s with the terms a_{s,j} ≠ 0 and its elimination, the contributions of the fronts of the children in the elimination tree have to be assembled into front s
[Figure: assembly and elimination of fronts 1 and 2 — row/column of the factor and the contributions from the elimination of rows 1 and 2 — followed by the assembly of front 3]

- this procedure is continued until the root is reached
- used in MSC/NASTRAN for the direct solution of linear systems
Example: multifrontal decomposition in MSC/NASTRAN for the piston model:

  3:36:50 : SEKRRS 7 DCMP BEGN
  *** USER INFORMATION MESSAGE 4157 (DFMSYN)
      PARAMETERS FOR SPARSE DECOMPOSITION OF DATA BLOCK KLL ( TYPE=RDP ) FOLLOW
      MATRIX SIZE                  =     28590 ROWS
      NUMBER OF NONZEROES          =           TERMS
      NUMBER OF ZERO COLUMNS       =         0
      NUMBER OF ZERO DIAGONAL TERMS=         0
      CPU TIME ESTIMATE            =        09 SEC
      I/O TIME ESTIMATE            =           SEC
      MINIMUM MEMORY REQUIREMENT   =       377 K WORDS
      MEMORY AVAILABLE             =      8560 K WORDS
      MEMORY REQR'D TO AVOID SPILL =       963 K WORDS
      EST. INTEGER WORDS IN FACTOR =       965 K WORDS
      EST. NONZERO TERMS           =      5452 K TERMS
      ESTIMATED MAXIMUM FRONT SIZE =       966 TERMS
      RANK OF UPDATE               =         6
  3:36:58 : SPDC BGN TE=09
  3:37:35 : SPDC END
  *** USER INFORMATION MESSAGE 6439 (DFMSA)
      ACTUAL MEMORY AND DISK SPACE REQUIREMENTS FOR SPARSE SYM. DECOMPOSITION
      SPARSE DECOMP MEMORY USED    =       963 K WORDS
      MAXIMUM FRONT SIZE           =       966 TERMS
      INTEGER WORDS IN FACTOR      =       262 K WORDS
      NONZERO TERMS IN FACTOR      =      5452 K TERMS
      SPARSE DECOMP SUGGESTED MEMORY =     905 K WORDS
  3:37:35 : SEKRRS DCMP END

- CPU time of decomposition: 8 seconds
- factor size: 5.5 mio nonzeros, 42.6 MB
- maximum front size: maximum number of nonzeros in a column of L
- number of FLOPs: decomp: , FBS:
- enormous savings if compared to dense algorithms
2.2 Iterative Solution of Linear Systems

2.2.1 Jacobi Method
2.2.2 Preconditioned Conjugate Gradient Method (PCG)

- belongs to the nonstationary methods
- nonstationary methods use projection or direction vectors or other search algorithms to obtain updated approximate solutions
- sketch of the derivation of the CG method: basic idea: try to find a new approximate solution vector x(i+1) which minimizes the functional F(x) = ½ x^T A x − x^T b; minimization of F will decrease the residual and make x(i+1) converge
CG algorithm: [listing on slide, not reproduced in the transcription]
PCG algorithm: [listing on slide, not reproduced in the transcription]
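The PCG listing itself did not survive the transcription; as a sketch (illustrative pure Python on a dense SPD matrix, not MSC/NASTRAN's implementation), a Jacobi-preconditioned CG iteration might look like this:

```python
# Sketch: PCG with Jacobi (diagonal) preconditioning, z = D^-1 r.

def pcg_jacobi(A, b, tol=1e-10, max_iter=200):
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = list(b)                                 # r = b - A x with x = 0
    z = [r[i] / A[i][i] for i in range(n)]      # preconditioning step
    p = list(z)                                 # first direction vector
    rz = dot(r, z)
    for it in range(max_iter):
        Ap = [dot(row, p) for row in A]         # matrix-vector product
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it + 1
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = pcg_jacobi(A, b)   # exact solution: x = [1/11, 7/11]
```

Each pass through the loop contains exactly the operations counted on the next slide: one matrix-vector product, three dot products, three scaled vector updates, and one diagonal preconditioning step.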
- the PCG method is the basis of almost every effective iterative solver found in commercial finite element programs today; they vary mainly in the applied preconditioning techniques
- example: running the iterative solver in MSC/NASTRAN with Jacobi preconditioning:
  - add NASTRAN ITER=YES on top of the data deck
  - add SMETHOD=<SID> in the case control section
  - add an ITER <SID> entry with PRECOND=J MSGFLG=YES in the bulk data section
- our piston with the iterative solver: nastran pist0000it mem=0m scr=yes
- convergence history in the f06 file:

  *** USER INFORMATION MESSAGE 6447 (SITDRV)
      ITERATIVE SOLVER DIAGNOSTIC OUTPUT
      MXY FITS INCORE
      EPS : E-06
      JACOBI PRECONDITIONING
      [table: iteration number / convergence ratio / norm of residual]
Convergence history in the f06 file (cont'd):

  [table cont'd: iteration number / convergence ratio / load number; the solver converges after 413 iterations]

- 413 iterations
- the effort in each iteration with Jacobi preconditioning is dominated by the matrix-vector multiplication:
  - 1 matrix-vector product: 2*nnz − n FLOPs
  - 3 dot products: 3*(2*n − 1) FLOPs
  - 3 scaled vector updates: 3*(2*n) FLOPs
  - Jacobi preconditioning step: n FLOPs
- for i iterations: approx. i*(2*nnz + 12*n) FLOPs
- in the piston example: number of FLOPs approx. 413*(2*1,089,722 + 12*28,590) = 1,041,802,412 ≈ 1.04e9 FLOPs
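The per-iteration counts above can be wrapped in a small helper (illustrative Python; the function name is made up). It uses the exact counts, which differ from the slide's rounded estimate 2*nnz + 12*n only by the constant −3:

```python
# Sketch: exact flop count of one Jacobi-preconditioned CG iteration,
# following the operation counts listed above.

def pcg_flops_per_iter(n, nnz):
    matvec = 2 * nnz - n          # one sparse matrix-vector product
    dots = 3 * (2 * n - 1)        # three dot products
    axpys = 3 * (2 * n)           # three scaled vector updates
    precond = n                   # Jacobi: one divide per unknown
    return matvec + dots + axpys + precond   # = 2*nnz + 12*n - 3

# Piston model: n = 28,590 rows, nnz = 1,089,722 terms
print(pcg_flops_per_iter(28590, 1089722))  # 2522521, i.e. about 2.5 MFLOP
```

Multiplying by 413 iterations reproduces the roughly 1.04e9 FLOPs quoted above.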
The f04 file pist0000it.f04 shows:

  :0:38 : STATRS 56 SOLVIT BEGN
  *** SYSTEM INFORMATION MESSAGE 4157 (SITDRV)
      PARAMETERS FOR THE ITERATIVE SOLUTION WITH DATA BLOCK KLL (TYPE = RDP ) FOLLOW
      MATRIX SIZE       =     28590 ROWS        DENSITY          =
      STRING LENGTH     =       4.9 AVG         NUMBER OF STRINGS =      59 K
      NONZERO TERMS     =      1089 K           FULL BAND WIDTH  =      548 AVG
      MEMORY AVAILABLE  =      8560 K WORDS     MIN MEMORY NEEDED =      48 K WORDS
      NUMBER OF RHS     =         1             NUMBER OF PASSES =        1
      OPTIMAL MEMORY    =         0 K WORDS     PREFACE CPU TIME =     0.00 SECONDS
      AVG. CPU/ITER     =           SECONDS
  ::5 : STATRS SOLVIT END

- 68.1 CPU seconds
- average CPU performance: 1,041,802,412 FLOP / 68.1 sec ≈ 15.3 MFLOP/sec
- why is the MFLOP/sec rate so low? The dominating operation is a sparse matrix-vector multiplication:
  - low data locality: the ratio of data transfer from/to memory to the number of operations is high
  - indexed operations (supported by special hardware in vector supercomputers!)
2.3 Comparison of Direct and Iterative Methods

Advantages of direct methods:

- robust: they deliver a solution for any properly defined finite element model
- easy to use: they can be used as black-box solvers, without the need to select special parameters
- if a linear system with multiple right-hand sides has to be solved, one (expensive) decomposition followed by multiple (cheap) FBSes is sufficient
- high data locality: the ratio of data transfer to the number of operations is low
  - good for modern computer architectures (like RISC with cache memory); highly tuned kernels can be used, e.g. BLAS
  - in the piston example, an average of 46.4 MFLOP/sec is achieved on an HP Omnibook for the multifrontal decomposition of the matrix
Disadvantages of direct methods:

- basic algorithms (e.g. dense Cholesky decomposition) are not suitable for the very large, sparse matrices arising from the FEM; sophisticated algorithms are required, for example multifrontal methods
- high number of operations, for example 30 MFLOP for the piston with the direct multifrontal solution (1,042 MFLOP for the iterative solution with simple Jacobi preconditioning!)
- the computed matrix factor (Cholesky factor) can grow very large. In the small piston example, the data for the matrix:
  - for each nonzero term we store the double precision numerical value (8 bytes) plus one integer for the row position (4 bytes)
  - storage of the upper (or lower) triangle including the diagonal is sufficient for a symmetric matrix
  - in total: (1,089,722 + 28,590)/2 · 12 bytes / (1,024 · 1,024) ≈ 6.4 MB
The data for the factor, from the f04 file (UIM 6439):

- integer words in factor: 262,000
- nonzero terms in factor: 5,452,000
- in total: (5,452,000 · 8 + 262,000 · 4) / (1,024 · 1,024) ≈ 42.6 MB
- approx. 6.7 times more than the amount of data in the matrix, due to fill-in

High amount of I/O:

- the factor is written to disk during the decomposition (absolutely necessary for large matrices)
- the factor is read twice in each FBS (once forward, once backward)
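The byte counting above (8-byte double value plus 4-byte row index per stored term, one triangle of a symmetric matrix) can be captured in a small helper (illustrative Python; the function name is an assumption, not from any tool):

```python
# Sketch: storage estimate for one stored triangle of a symmetric sparse
# matrix or factor: 8 bytes (double value) + 4 bytes (integer row index)
# per stored term, reported in MB (1 MB = 1024 * 1024 bytes).

def triangle_storage_mb(stored_terms):
    return stored_terms * (8 + 4) / (1024 * 1024)

# e.g. one million stored terms:
print(round(triangle_storage_mb(1_000_000), 1))  # 11.4
```

Applying it to the piston's (1,089,722 + 28,590)/2 = 559,156 stored matrix terms reproduces the roughly 6.4 MB quoted above; the factor's 5.45 million terms show directly why fill-in dominates the disk budget.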
Advantages of iterative methods:

- the number of operations is often lower than with direct methods; in the FEM this is in general true for solid models, i.e. models built from tetrahedra, hexahedra and wedges
- no fill-in, at least not for simple preconditioning techniques like Jacobi
- storage requirements are dominated by the memory for the matrix
- low or even no I/O traffic during the iterations if the matrix fits into memory
- iterative methods are usually the best method for solid models with quadratic elements (TETRA10, HEXA20, etc.)

Disadvantages of iterative methods:

- less robust: often convergence problems with shell models like car bodies (shell elements are, for example, quadrilateral and triangular elements)
Example: van body on IBM RS/ H

  # grids      9,066
  # elements   6,874 QUAD4; ,77 TRIA3; 57 BAR; 5 ELAS
  # dofs       47,07
  # nzts       ,336,904
  # loads      1

- with PCG and Jacobi preconditioning: 36,50 iterations, 38,96 seconds! (note the influence of round-off errors; in theory n = 47,07 iterations)
- direct multifrontal solver: 79 seconds

In FEM analysis, direct methods are therefore still preferred for shell element models like car bodies, planes, etc.
Disadvantages of iterative methods (cont'd):

- lower MFLOP/sec rates; therefore, in many cases where the number of operations would be lower than with direct methods, direct methods are still faster; this happens often with linear solid elements (TETRA4, HEXA8)
- careful selection of preconditioners is required:
  - the more elaborate the preconditioner, the lower the number of iterations
  - but the effort to compute the preconditioner and its storage requirements go up, and the preconditioning step gets more expensive
- example: block incomplete Cholesky preconditioner (BIC) in MSC/NASTRAN for the piston (P is computed by an incomplete decomposition of A; fill-in is partially ignored):

             #iterations   CPU time    memory
  PCG+J                         sec    0 KW = 8.4 MB
  PCG+BIC                       sec    3857 KW = 4.7 MB
- now faster than the direct solution!
- iterative solvers usually cannot be used as black-box solvers yet
- with a number of right-hand sides m > 1, the iterative algorithm usually has to be repeated for each RHS:
  - the number of operations increases by a factor of m
  - the increase in computation time is lower, since data locality is higher (the algorithms work on multiple vectors simultaneously)
  - note: so-called projection methods, which exploit the existence of multiple RHSs to find better direction vectors p (searching into multiple directions simultaneously), can improve the situation, but are not discussed here

Summing up: in the FEM, iterative methods often result in a lower number of operations for solid models and require less (disk) storage, but are more difficult to apply and require numerical background knowledge from the engineer.
More informationLecture 27: Fast Laplacian Solvers
Lecture 27: Fast Laplacian Solvers Scribed by Eric Lee, Eston Schweickart, Chengrun Yang November 21, 2017 1 How Fast Laplacian Solvers Work We want to solve Lx = b with L being a Laplacian matrix. Recall
More informationRobot Mapping. Least Squares Approach to SLAM. Cyrill Stachniss
Robot Mapping Least Squares Approach to SLAM Cyrill Stachniss 1 Three Main SLAM Paradigms Kalman filter Particle filter Graphbased least squares approach to SLAM 2 Least Squares in General Approach for
More informationGraphbased. Kalman filter. Particle filter. Three Main SLAM Paradigms. Robot Mapping. Least Squares Approach to SLAM. Least Squares in General
Robot Mapping Three Main SLAM Paradigms Least Squares Approach to SLAM Kalman filter Particle filter Graphbased Cyrill Stachniss least squares approach to SLAM 1 2 Least Squares in General! Approach for
More informationParallel resolution of sparse linear systems by mixing direct and iterative methods
Parallel resolution of sparse linear systems by mixing direct and iterative methods Phyleas Meeting, Bordeaux J. Gaidamour, P. Hénon, J. Roman, Y. Saad LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More information1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3
6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationPARDISO Version Reference Sheet Fortran
PARDISO Version 5.0.0 1 Reference Sheet Fortran CALL PARDISO(PT, MAXFCT, MNUM, MTYPE, PHASE, N, A, IA, JA, 1 PERM, NRHS, IPARM, MSGLVL, B, X, ERROR, DPARM) 1 Please note that this version differs significantly
More informationIterative Sparse Triangular Solves for Preconditioning
Euro-Par 2015, Vienna Aug 24-28, 2015 Iterative Sparse Triangular Solves for Preconditioning Hartwig Anzt, Edmond Chow and Jack Dongarra Incomplete Factorization Preconditioning Incomplete LU factorizations
More informationComputational Fluid Dynamics - Incompressible Flows
Computational Fluid Dynamics - Incompressible Flows March 25, 2008 Incompressible Flows Basis Functions Discrete Equations CFD - Incompressible Flows CFD is a Huge field Numerical Techniques for solving
More informationResearch Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity Problems
Mathematical Problems in Engineering Volume 2015, Article ID 147286, 7 pages http://dx.doi.org/10.1155/2015/147286 Research Article A PETSc-Based Parallel Implementation of Finite Element Method for Elasticity
More informationComputational issues in linear programming
Computational issues in linear programming Julian Hall School of Mathematics University of Edinburgh 15th May 2007 Computational issues in linear programming Overview Introduction to linear programming
More information(Sparse) Linear Solvers
(Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 1 Don t you just invert
More informationEfficient Use of Iterative Solvers in Nested Topology Optimization
Efficient Use of Iterative Solvers in Nested Topology Optimization Oded Amir, Mathias Stolpe and Ole Sigmund Technical University of Denmark Department of Mathematics Department of Mechanical Engineering
More informationA parallel direct/iterative solver based on a Schur complement approach
A parallel direct/iterative solver based on a Schur complement approach Gene around the world at CERFACS Jérémie Gaidamour LaBRI and INRIA Bordeaux - Sud-Ouest (ScAlApplix project) February 29th, 2008
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationSparse Direct Solvers for Extreme-Scale Computing
Sparse Direct Solvers for Extreme-Scale Computing Iain Duff Joint work with Florent Lopez and Jonathan Hogg STFC Rutherford Appleton Laboratory SIAM Conference on Computational Science and Engineering
More informationLeast-Squares Fitting of Data with B-Spline Curves
Least-Squares Fitting of Data with B-Spline Curves David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International
More informationFinite Element Modeling Techniques (2) دانشگاه صنعتي اصفهان- دانشكده مكانيك
Finite Element Modeling Techniques (2) 1 Where Finer Meshes Should be Used GEOMETRY MODELLING 2 GEOMETRY MODELLING Reduction of a complex geometry to a manageable one. 3D? 2D? 1D? Combination? Bulky solids
More informationPreconditioning for linear least-squares problems
Preconditioning for linear least-squares problems Miroslav Tůma Institute of Computer Science Academy of Sciences of the Czech Republic tuma@cs.cas.cz joint work with Rafael Bru, José Marín and José Mas
More informationSolid and shell elements
Solid and shell elements Theodore Sussman, Ph.D. ADINA R&D, Inc, 2016 1 Overview 2D and 3D solid elements Types of elements Effects of element distortions Incompatible modes elements u/p elements for incompressible
More informationScientific Computing. Some slides from James Lambers, Stanford
Scientific Computing Some slides from James Lambers, Stanford Dense Linear Algebra Scaling and sums Transpose Rank-one updates Rotations Matrix vector products Matrix Matrix products BLAS Designing Numerical
More informationTHE application of advanced computer architecture and
544 IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 45, NO. 3, MARCH 1997 Scalable Solutions to Integral-Equation and Finite-Element Simulations Tom Cwik, Senior Member, IEEE, Daniel S. Katz, Member,
More informationApplication of GPU-Based Computing to Large Scale Finite Element Analysis of Three-Dimensional Structures
Paper 6 Civil-Comp Press, 2012 Proceedings of the Eighth International Conference on Engineering Computational Technology, B.H.V. Topping, (Editor), Civil-Comp Press, Stirlingshire, Scotland Application
More informationHow to use FEKO with Altair HyperMesh
How to use FEKO with Altair HyperMesh This How To applies to: FEKO Suite 6.2, HyperMesh 11.0 Users who would like to make use of the benefits of the advanced meshing features of Altair HyperMesh while
More informationUppsala University Department of Information technology. Hands-on 1: Ill-conditioning = x 2
Uppsala University Department of Information technology Hands-on : Ill-conditioning Exercise (Ill-conditioned linear systems) Definition A system of linear equations is said to be ill-conditioned when
More informationEvaluation of sparse LU factorization and triangular solution on multicore architectures. X. Sherry Li
Evaluation of sparse LU factorization and triangular solution on multicore architectures X. Sherry Li Lawrence Berkeley National Laboratory ParLab, April 29, 28 Acknowledgement: John Shalf, LBNL Rich Vuduc,
More informationEmpirical Complexity of Laplacian Linear Solvers: Discussion
Empirical Complexity of Laplacian Linear Solvers: Discussion Erik Boman, Sandia National Labs Kevin Deweese, UC Santa Barbara John R. Gilbert, UC Santa Barbara 1 Simons Institute Workshop on Fast Algorithms
More informationAnalysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms
Analysis and Optimization of Power Consumption in the Iterative Solution of Sparse Linear Systems on Multi-core and Many-core Platforms H. Anzt, V. Heuveline Karlsruhe Institute of Technology, Germany
More informationCombinatorial problems in a Parallel Hybrid Linear Solver
Combinatorial problems in a Parallel Hybrid Linear Solver Ichitaro Yamazaki and Xiaoye Li Lawrence Berkeley National Laboratory François-Henry Rouet and Bora Uçar ENSEEIHT-IRIT and LIP, ENS-Lyon SIAM workshop
More informationExample 24 Spring-back
Example 24 Spring-back Summary The spring-back simulation of sheet metal bent into a hat-shape is studied. The problem is one of the famous tests from the Numisheet 93. As spring-back is generally a quasi-static
More informationSparse matrices, graphs, and tree elimination
Logistics Week 6: Friday, Oct 2 1. I will be out of town next Tuesday, October 6, and so will not have office hours on that day. I will be around on Monday, except during the SCAN seminar (1:25-2:15);
More informationBDDCML. solver library based on Multi-Level Balancing Domain Decomposition by Constraints copyright (C) Jakub Šístek version 1.
BDDCML solver library based on Multi-Level Balancing Domain Decomposition by Constraints copyright (C) 2010-2012 Jakub Šístek version 1.3 Jakub Šístek i Table of Contents 1 Introduction.....................................
More informationAMS527: Numerical Analysis II
AMS527: Numerical Analysis II A Brief Overview of Finite Element Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 1 / 25 Overview Basic concepts Mathematical
More informationLibraries for Scientific Computing: an overview and introduction to HSL
Libraries for Scientific Computing: an overview and introduction to HSL Mario Arioli Jonathan Hogg STFC Rutherford Appleton Laboratory 2 / 41 Overview of talk Brief introduction to who we are An overview
More informationPerformance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply
Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply University of California, Berkeley Berkeley Benchmarking and Optimization Group (BeBOP) http://bebop.cs.berkeley.edu
More informationA Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography
1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography
More informationComputational Methods CMSC/AMSC/MAPL 460. Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science
Computational Methods CMSC/AMSC/MAPL 460 Vectors, Matrices, Linear Systems, LU Decomposition, Ramani Duraiswami, Dept. of Computer Science Some special matrices Matlab code How many operations and memory
More information