2 Fundamentals of Serial Linear Algebra


2.1 Direct Solution of Linear Systems
2.1.1 Gaussian Elimination
2.1.2 LU Decomposition and FBS
2.1.3 Cholesky Decomposition
2.1.4 Multifrontal Methods
2.2 Iterative Solution of Linear Systems
2.2.1 Jacobi Method
2.2.2 Preconditioned Conjugate Gradient Method (PCG)
2.3 Comparison of Direct and Iterative Methods

The solution of linear systems plays a major role in the FEM. In linear static analyses, for example, it is still the most expensive part of the whole analysis.

The task is to solve a linear system of equations of the form

$A X = B$

where:
- $A \in \mathbb{R}^{n \times n}$: coefficient matrix (e.g. stiffness matrix)
- $B \in \mathbb{R}^{n \times m}$: right-hand side vectors (e.g. load vectors)
- $X \in \mathbb{R}^{n \times m}$: solution vectors to be computed (e.g. displacement vectors)

The best solution technique depends on the properties of the linear system, for example:
- sparse or dense coefficient matrix A
- symmetric or unsymmetric A
- number of right-hand sides m
- size of the system n (small, medium, large, ...)
- nonzero pattern of the coefficient matrix A, for example banded matrix

Even linear systems arising from the FEM can have very different characteristics, depending on:
- application area: for example, in linear statics up to 6 dofs per grid, in heat transfer usually 1 dof per grid
- element types: matrices are denser with solid elements (TETRA, HEXA, etc.) than with 2D elements (TRIA, QUAD, etc.), because more grids are connected with each other within each element

[Figure: mesh of QUAD4 elements, 1 dof per grid (black: element ids, blue: grid ids), with the nonzero pattern of the resulting matrix; density 46%]

[Figure: mesh of HEXA8 elements, 1 dof per grid (black: element ids, blue: grid ids), with the nonzero pattern of the resulting matrix; density 78%]
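The link between element connectivity and matrix density can be sketched in a few lines. The mesh below is an assumption made for illustration (two stacked HEXA8 elements sharing four grids, 12 grids in total, 1 dof per grid); the slide's actual mesh is not recoverable from the transcription, and the function names are hypothetical.

```python
from itertools import product


def nonzero_pattern(elements):
    """Return the set of (row, col) pairs coupled through at least one element.

    With 1 dof per grid, two grids produce a nonzero term whenever they
    belong to the same element (each grid is also coupled to itself).
    """
    pattern = set()
    for grids in elements:
        pattern.update(product(grids, repeat=2))
    return pattern


def density(elements, n_grids):
    """Fraction of nonzero terms in the n_grids x n_grids matrix."""
    return len(nonzero_pattern(elements)) / (n_grids * n_grids)


# Assumed mesh: two stacked HEXA8 elements sharing grids 5..8.
hexa_mesh = [tuple(range(1, 9)), tuple(range(5, 13))]
nnz = len(nonzero_pattern(hexa_mesh))
print(nnz, round(100 * density(hexa_mesh, 12)))  # 112 78
```

For this assumed mesh the pattern has 112 nonzero terms in a 12 x 12 matrix, which is consistent with the 78% density quoted for the HEXA8 example.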

Linear systems arising from linear static analysis are usually:
- sparse
- symmetric
- solved for a small number of right-hand sides, often m = 1
- characterized by a higher concentration of nonzero terms around the diagonal, but not necessarily by a banded structure
- positive definite: $v^T A v > 0 \;\; \forall v \neq 0$

Positive definiteness is due to physics: remember that $\int_V \sigma^T \varepsilon \, dV = u^T K u$ is, up to the factor 1/2, the energy built up from strains and stresses (strain energy). If the model is properly defined and fixed, any displacement $u \neq 0$ should result in a positive strain energy.

But even linear systems arising from linear static analysis can vary significantly, for example:
- in density, because of element types
- in size, because of model size (number of grids and elements)
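The role of "properly defined and fixed" can be illustrated with a toy model that is not part of the slides: a chain of unit springs, an assumption chosen only because its stiffness matrix is easy to write down. With the left end fixed, every nonzero displacement stores energy; without the support, a rigid-body translation gives $u^T K u = 0$ and the matrix is only positive semidefinite.

```python
def spring_chain_stiffness(n, fixed=True, k=1.0):
    """Stiffness matrix of a chain with n dofs connected by springs of stiffness k.

    If fixed is True, an additional spring ties dof 0 to a rigid support.
    """
    K = [[0.0] * n for _ in range(n)]
    if fixed:
        K[0][0] += k                      # spring between the support and dof 0
    for e in range(1, n):                 # spring between dof e-1 and dof e
        K[e - 1][e - 1] += k
        K[e][e] += k
        K[e - 1][e] -= k
        K[e][e - 1] -= k
    return K


def quadratic_form(K, u):
    """Evaluate u^T K u (twice the strain energy)."""
    n = len(u)
    return sum(u[i] * K[i][j] * u[j] for i in range(n) for j in range(n))


rigid_body = [1.0, 1.0, 1.0]              # all dofs translate by the same amount
print(quadratic_form(spring_chain_stiffness(3, fixed=True), rigid_body))   # 1.0
print(quadratic_form(spring_chain_stiffness(3, fixed=False), rigid_body))  # 0.0
```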

Example: piston

FE model:
- # grids: 9,90
- # elements: 43,084 TETRA4
- # dofs: 8,590
- # loads:

Stiffness matrix:
- # rows: 8,590
- # nonzeros: ,089,7
- density: 0.33 %
- # RHSs:

2.1 Direct Solution of Linear Systems

2.1.1 Gaussian Elimination

2.1.2 LU Decomposition and FBS

2.1.3 Cholesky Decomposition

- see derivation on board
- the basic dense algorithm is not acceptable in terms of run time and memory/disk requirements, even for this small example
- exploitation of sparsity is needed; possible solutions:
  - solvers exploiting bandedness; idea: perform computations only within a band. The matrix can be transformed into pseudo-banded form by a suitable permutation matrix P (resequencing): $P^T A P$
  - multifrontal methods (see 2.1.4)
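Since the derivation itself was given on the board, the following is only a minimal sketch of the textbook dense Cholesky decomposition $A = L L^T$ followed by forward-backward substitution (FBS); it is not necessarily the formulation used in the lecture, and the function names are chosen here for illustration. The triple loop nest makes the cost grow with $n^3$, which is why the dense algorithm is not acceptable for FEM-sized matrices.

```python
import math


def cholesky(A):
    """Dense Cholesky decomposition A = L L^T; returns the lower triangle L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        s = A[k][k] - sum(L[k][j] ** 2 for j in range(k))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[k][k] = math.sqrt(s)
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][j] * L[k][j] for j in range(k))) / L[k][k]
    return L


def fbs(L, b):
    """Forward-backward substitution: solve L y = b, then L^T x = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward pass
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward pass with L^T
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 2.0, 0.0], [2.0, 5.0, 2.0], [0.0, 2.0, 5.0]]
L = cholesky(A)
x = fbs(L, [6.0, 9.0, 7.0])   # right-hand side chosen so that x = [1, 1, 1]
```

Once L is available, each additional right-hand side only costs one more call to `fbs`.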

2.1.4 Multifrontal Methods

- idea: exploit sparsity
- [Figure: example matrix]
- during the decomposition of a sparse matrix A, it is frequently observed that some rows can be eliminated independently
- this is because the elimination of a row k creates a contribution to a row i only if the term in row k and column i of the transposed Cholesky factor $L^T$ is not equal to 0, that is, if $l_{ik} \neq 0$
- the resulting partial ordering of the rows is usually represented by an elimination tree
- in an elimination tree, each row k of the linear system to be solved is represented as a node

- if a node i is an ancestor of node k in the tree, then row k must be eliminated before row i
- formally, the elimination tree is defined as a directed graph; if A is an n x n matrix, we define

  $T(A) = (V_T, E_T)$, with $V_T$: vertices, $E_T$: edges

  $V_T = \{ v_k \mid k \in \{1, \dots, n\} \}$

  $E_T = \{ (v_i, v_k) \in V_T \times V_T \mid i = \min\{ q \in \{k+1, \dots, n\} \mid l_{qk} \neq 0 \} \}$

  where $l_{qk}$ is again the term in row q and column k of the Cholesky factor L, and thus the term in row k and column q of $L^T$
- this definition means that $v_i$ is the parent of $v_k$ if and only if i is the column index of the first off-diagonal term in row k of the transposed Cholesky factor $L^T$
- this definition is intuitive: if column i contains the first off-diagonal term in row k of $L^T$, then row i is the first row below row k to which the elimination of row k creates contributions; therefore it makes sense to define $v_i$ as the closest ancestor, that is the parent, of $v_k$ in the elimination tree
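The definition can be turned into a short routine that works on the nonzero pattern alone. This is a sketch under stated assumptions: indices are 0-based (the slides use 1-based indices), the input format (a list of strict lower-triangle positions of A) and the function name are chosen here for illustration, and fill-in is tracked symbolically by passing the remaining row indices of each column to its parent.

```python
def elimination_tree(n, lower_pattern):
    """Return parent[k] = min{q > k : l_qk != 0}, or None if k is a root.

    lower_pattern: iterable of (i, j) with i > j, the nonzero positions in the
    strict lower triangle of A (0-based indices).
    """
    struct = [set() for _ in range(n)]   # row indices below the diagonal, per column
    for i, j in lower_pattern:
        struct[j].add(i)
    parent = [None] * n
    for k in range(n):
        if struct[k]:
            p = min(struct[k])           # first off-diagonal term in column k of L
            parent[k] = p
            # fill-in: the remaining rows of column k also appear in column p of L
            struct[p] |= struct[k] - {p}
    return parent


# Rows 0 and 1 are independent leaves that both contribute to row 2.
print(elimination_tree(4, [(2, 0), (2, 1), (3, 2)]))   # [2, 2, 3, None]
# Here fill-in at (3, 1) and (3, 2) turns the tree into a chain.
print(elimination_tree(4, [(1, 0), (3, 0), (2, 1)]))   # [1, 2, 3, None]
```

In the first example, nodes 0 and 1 have no ancestor relation to each other, so rows 0 and 1 can be eliminated independently.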

[Figure: elimination tree for our 9x9 sample matrix]

- once the elimination tree has been created, the algorithm for the multifrontal matrix decomposition can be described
- the multifrontal decomposition executes a bottom-up traversal of the elimination tree
- for each node s we create a dense nfront(s) x nfront(s) submatrix, where nfront(s) is the number of nonzero terms in row s of $L^T$
- this submatrix is called front s; nfront(s) is the corresponding front size
- for symmetric matrices we store only the upper (or lower) triangle

- then the created front is initialized with 0s
- for a leaf node, the next step is to fill the first row of front s with the nonzero terms $a_{s,j} \neq 0$ of row s
- after that, the first row of front s is eliminated by applying an algorithm similar to procedure CHOLESKY to the front, except that the outermost loop is only executed for k = 1 (only the first row is eliminated)
- after this elimination, row 1 of front s is equal to row s of $L^T$ (and column s of L, respectively)
- the remaining rows of front s have to be passed to the parent node as the contributions of row s to other matrix rows which have not been eliminated yet
- nonleaf nodes s are processed similarly, except that between the initialization of front s with the terms $a_{s,j} \neq 0$ and its elimination, the contributions of the fronts of the children in the elimination tree have to be assembled into front s

[Figure: examples of the assembly and elimination of fronts, showing the row/column of the factor and the contributions from the elimination of each row]

- this procedure is continued until the root is reached
- used in MSC/NASTRAN for the direct solution of linear systems
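The procedure can be sketched as follows. This is a teaching sketch under several assumptions, not the MSC/NASTRAN implementation: the matrix is passed as a dense list of lists (so the sparsity is only exploited inside the fronts), full square fronts are stored instead of one triangle, indices are 0-based, and rows are processed in their natural order, which is a valid bottom-up traversal because every parent has a larger index than its children. The function name is hypothetical.

```python
import math


def multifrontal_cholesky(A):
    """Compute the Cholesky factor L of a symmetric positive definite matrix A
    by assembling and partially eliminating one dense front per row."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    waiting = [[] for _ in range(n)]          # contributions passed to each parent
    for s in range(n):
        # rows coupled to s through A or through contributions of the children
        below = {j for j in range(s + 1, n) if A[j][s] != 0.0}
        for child_rows, _ in waiting[s]:
            below.update(r for r in child_rows if r != s)
        rows = [s] + sorted(below)
        pos = {r: p for p, r in enumerate(rows)}
        m = len(rows)                         # front size nfront(s)
        F = [[0.0] * m for _ in range(m)]     # front initialized with 0s
        for r in rows:                        # first row/column taken from A
            F[pos[r]][0] = F[0][pos[r]] = A[r][s]
        for child_rows, U in waiting[s]:      # assemble the children's contributions
            for a, ra in enumerate(child_rows):
                for b, rb in enumerate(child_rows):
                    F[pos[ra]][pos[rb]] += U[a][b]
        pivot = math.sqrt(F[0][0])            # eliminate only the first row
        col = [F[p][0] / pivot for p in range(m)]
        for p, r in enumerate(rows):
            L[r][s] = col[p]                  # column s of L (row s of L^T)
        if m > 1:                             # remaining rows go to the parent
            U = [[F[a][b] - col[a] * col[b] for b in range(1, m)]
                 for a in range(1, m)]
            waiting[rows[1]].append((rows[1:], U))
    return L


A = [[4.0, 0.0, 2.0, 0.0],
     [0.0, 4.0, 2.0, 0.0],
     [2.0, 2.0, 6.0, 2.0],
     [0.0, 0.0, 2.0, 5.0]]
L = multifrontal_cholesky(A)
```

In this example rows 0 and 1 are leaves whose 2 x 2 fronts are eliminated independently; both pass a 1 x 1 contribution to their common parent, row 2.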

Example: multifrontal decomposition in MSC/NASTRAN for the piston model:

    3:36:50 : SEKRRS 7 DCMP BEGN
    *** USER INFORMATION MESSAGE 457 (DFMSYN)
    PARAMETERS FOR SPARSE DECOMPOSITION OF DATA BLOCK KLL ( TYPE=RDP ) FOLLOW
    MATRIX SIZE = 8590 ROWS
    NUMBER OF NONZEROES = TERMS
    NUMBER OF ZERO COLUMNS = 0
    NUMBER OF ZERO DIAGONAL TERMS = 0
    CPU TIME ESTIMATE = 09 SEC
    I/O TIME ESTIMATE = SEC
    MINIMUM MEMORY REQUIREMENT = 377 K WORDS
    MEMORY AVAILABLE = 8560 K WORDS
    MEMORY REQR'D TO AVOID SPILL = 963 K WORDS
    EST. INTEGER WORDS IN FACTOR = 965 K WORDS
    EST. NONZERO TERMS = 545 K TERMS
    ESTIMATED MAXIMUM FRONT SIZE = 966 TERMS
    RANK OF UPDATE = 6
    3:36:58 : SPDC BGN TE=09
    3:37:35 : # # SPDC END
    *** USER INFORMATION MESSAGE 6439 (DFMSA)
    ACTUAL MEMORY AND DISK SPACE REQUIREMENTS FOR SPARSE SYM. DECOMPOSITION
    SPARSE DECOMP MEMORY USED = 963 K WORDS
    MAXIMUM FRONT SIZE = 966 TERMS
    INTEGER WORDS IN FACTOR = 6 K WORDS
    NONZERO TERMS IN FACTOR = 545 K TERMS
    SPARSE DECOMP SUGGESTED MEMORY = 905 K WORDS
    *8** Module DMAP Matrix Cols Rows F T IBlks NBlks NumFrt FrtMax
    DCMP 7 LLL *8**
    *8** Module DMAP Matrix Cols Rows F T NzWds Density BlockT StrL NbrStr BndAvg BndMax NulCol
    DCMP 7 SCRATCH D *8**
    DCMP 7 SCRATCH D *8**
    3:37:35 : # # SEKRRS DCMP END

- CPU time of decomposition: 8 seconds
- factor size: 5.5 mio nonzeros, 4.6 MB
- maximum front size: maximum number of nonzeros in a column of L
- number of FLOPs (decomposition and FBS): enormous savings compared to dense algorithms

2.2 Iterative Solution of Linear Systems

2.2.1 Jacobi Method

2.2.2 Preconditioned Conjugate Gradient Method (PCG)

- belongs to the nonstationary methods
- nonstationary methods use projection or direction vectors or other search algorithms to obtain updated approximate solutions
- sketch of the derivation of the CG method:
  - basic idea: try to find a new approximate solution vector $x^{(i+1)}$ which minimizes the functional F
  - minimization of F will decrease the residual and make $x^{(i+1)}$ converge

[Algorithm listing: CG algorithm]

[Algorithm listing: PCG algorithm]
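The algorithm listings themselves did not survive the transcription. As a stand-in, here is a minimal sketch of the standard textbook PCG iteration with Jacobi (diagonal) preconditioning, which minimizes the usual functional $F(x) = \frac{1}{2} x^T A x - x^T b$; it is not necessarily identical to the formulation on the slides. The CSR-style storage (`indptr`, `indices`, `data`), the stopping test on the relative residual norm, and all function names are assumptions made for illustration.

```python
import math


def csr_matvec(n, indptr, indices, data, x):
    """Sparse matrix-vector product y = A x for A in CSR storage."""
    y = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            y[i] += data[p] * x[indices[p]]      # indexed access to x
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(n, indptr, indices, data, b, eps=1e-6, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    diag = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            if indices[p] == i:
                diag[i] = data[p]
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                  # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # Jacobi preconditioning step
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        q = csr_matvec(n, indptr, indices, data, p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        if math.sqrt(dot(r, r)) <= eps * b_norm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form; b = A [1, 2, 3]^T
indptr, indices = [0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
x, iterations = pcg_jacobi(3, indptr, indices, data, [6.0, 10.0, 8.0])
```

Each pass through the loop performs one sparse matrix-vector product, three vector products, three scaled vector updates, and one preconditioning step.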

- the PCG method is the basis of almost every effective iterative solver found in commercial finite element programs today; they differ mainly in the applied preconditioning techniques
- example: run the iterative solver in MSC/NASTRAN with Jacobi preconditioning:
  - add NASTRAN ITER=YES on top of the data deck
  - add SMETHOD=<SID> in the case control section
  - add ITER <SID> PRECOND=J MSGFLG=YES in the bulk data section
- our piston with the iterative solver: nastran pist0000it mem=0m scr=yes
- convergence history in the f06 file:

    *** USER INFORMATION MESSAGE 6447 (SITDRV)
    ITERATIVE SOLVER DIAGNOSTIC OUTPUT
    MXY FITS INCORE
    EPS : E-06
    JACOBI PRECONDITIONING
    ITERATION NUMBER CONVERGENCE RATIO NORM OF RESIDUAL
    E E E E E E+0

- convergence history in the f06 file (cont'd):

    E E E E-05
    ITERATION NUMBER CONVERGENCE RATIO LOAD NUMBER
    E

- effort in each iteration with Jacobi preconditioning, dominated by the matrix-vector multiplication:
  - 1 matrix-vector multiplication: 2*nnz - n FLOPs
  - 3 vector products: 3*(2n - 1) FLOPs
  - 3 scaled vector updates: 3*(2n) FLOPs
  - Jacobi preconditioning step: n FLOPs
- for i iterations: approx. i*(2*nnz + 12*n) FLOPs
- in the piston example: number of FLOPs approx. 43*(*,089,7+*8,590) =,04,80,4 =.04 e09 FLOPs
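The per-iteration total follows by summing the individual contributions. The sketch below assumes the standard operation counts (a sparse matrix-vector product with nnz nonzeros costs nnz multiplications and nnz - n additions, a vector product 2n - 1 FLOPs, a scaled vector update 2n FLOPs) and that nnz counts the nonzeros of the full symmetric matrix.

```latex
\begin{align*}
\text{FLOPs per iteration}
  &= \underbrace{(2\,\mathit{nnz} - n)}_{\text{matrix-vector product}}
   + \underbrace{3\,(2n - 1)}_{\text{3 vector products}}
   + \underbrace{3\,(2n)}_{\text{3 scaled vector updates}}
   + \underbrace{n}_{\text{Jacobi step}} \\
  &= 2\,\mathit{nnz} + 12\,n - 3 \;\approx\; 2\,\mathit{nnz} + 12\,n
\end{align*}
```

Because nnz is much larger than n for FEM matrices, the matrix-vector product dominates the total.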

F04 file pist0000it.f04 shows:

    :0:38 : STATRS 56 SOLVIT BEGN
    *** SYSTEM INFORMATION MESSAGE 457 (SITDRV)
    PARAMETERS FOR THE ITERATIVE SOLUTION WITH DATA BLOCK KLL (TYPE = RDP ) FOLLOW
    MATRIX SIZE = 8590 ROWS
    DENSITY =
    STRING LENGTH = 4.9 AVG
    NUMBER OF STRINGS = 59 K
    NONZERO TERMS = 089 K
    FULL BAND WIDTH = 548 AVG
    MEMORY AVAILABLE = 8560 K WORDS
    MIN MEMORY NEEDED = 48 K WORDS
    NUMBER OF RHS =
    NUMBER OF PASSES =
    OPTIMAL MEMORY = 0 K WORDS
    PREFACE CPU TIME = 0.00 SECONDS
    AVG. CPU/ITER = SECONDS
    *8** Module DMAP Matrix Cols Rows F T NzWds Density BlockT StrL NbrStr BndAvg BndMax NulCol
    SOLVIT 56 UL D *8**
    SOLVIT 56 RUL D *8**
    ::5 : # # STATRS SOLVIT END

- 68. CPU seconds
- average CPU performance: ,04,80,4 FLOPs in 68. sec, i.e. 5.3 MFLOP/sec

Why is the MFLOP/sec rate so low?
- the dominating operation is a sparse matrix-vector multiplication
- low data locality: the ratio of data transfer from/to memory to the number of operations is high
- indexed operations (supported by special hardware in vector supercomputers!)

2.3 Comparison of Direct and Iterative Methods

Advantages of direct methods:
- robust: they deliver a solution for any properly defined finite element model
- easy to use: they can be used as black box solvers, without the need for selecting special parameters
- if a linear system with multiple right-hand sides has to be solved, one (expensive) decomposition followed by multiple (cheap) FBSes is sufficient
- high data locality: the ratio of data transfer to number of operations is low; this is good for modern computer architectures (like RISC with cache memory), and highly tuned kernels can be used, e.g. BLAS
- in the piston example, an average of 46.4 MFLOP/sec can be achieved on an HP Omnibook for the multifrontal decomposition of the matrix

Disadvantages of direct methods:
- basic algorithms (e.g. Cholesky decomposition) are not suitable for the very large, sparse matrices arising from the FEM; sophisticated algorithms are required, for example multifrontal methods
- high number of operations, for example 30 MFLOP for the piston with the direct multifrontal solution (04 MFLOP for the iterative solution with simple Jacobi preconditioning!)
- the computed matrix factor (Cholesky factor) can grow very large; in the small piston example:
  - data for matrix: for each nonzero term we store a double precision numerical value (8 bytes) plus one integer for the row position (4 bytes)
  - storage of the upper (or lower) triangle including the diagonal is sufficient for a symmetric matrix
  - in total: (,089,7 + 8,590) 6.4 MB,04,04

  - data for factor, from the f04 file (UIM 6439):
    - integer words in factor: 6,000
    - nonzero terms in factor: 5,45,000
    - in total: ( 5,45, ,000) MB,04,04
    - appr. 6.7 times more than the amount of data in the matrix, due to fill-in
- high amount of I/O:
  - the factor is written to disk in the decomposition (absolutely necessary for large matrices)
  - the factor is read twice in each FBS (once forward, once backward)
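The storage estimates can be written as two small helper functions. This is a sketch under assumptions that the transcription does not fully confirm: the nonzero count of the matrix is taken for the full symmetric matrix (so half of the off-diagonal terms plus the diagonal are stored), the factor needs 8 bytes per nonzero term and 4 bytes per integer word, and 1 MB = 1,024 x 1,024 bytes. The function names and the sample input values are hypothetical and are not the piston data.

```python
BYTES_PER_MB = 1024 * 1024


def matrix_storage_mb(nnz_full, n, value_bytes=8, index_bytes=4):
    """Storage for one triangle (including the diagonal) of a symmetric matrix.

    nnz_full: number of nonzero terms of the full matrix, n: number of rows.
    Every stored term needs one value plus one integer row position.
    """
    stored_terms = (nnz_full + n) // 2
    return stored_terms * (value_bytes + index_bytes) / BYTES_PER_MB


def factor_storage_mb(nonzero_terms, integer_words, value_bytes=8, index_bytes=4):
    """Storage for a Cholesky factor given its nonzero terms and integer words."""
    return (nonzero_terms * value_bytes + integer_words * index_bytes) / BYTES_PER_MB


# Hypothetical sizes, for illustration only.
matrix_mb = matrix_storage_mb(nnz_full=25_000, n=1_000)
factor_mb = factor_storage_mb(nonzero_terms=100_000, integer_words=10_000)
print(round(matrix_mb, 3), round(factor_mb, 3), round(factor_mb / matrix_mb, 1))
```

The ratio of the two results quantifies the growth caused by fill-in.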

Advantages of iterative methods:
- the number of operations is often lower than with direct methods; in the FEM this is in general true for solid models, i.e. models built from tetrahedrons, hexahedrons and wedges
- no fill-in, at least not for simple preconditioning techniques like Jacobi; storage requirements are dominated by the memory for the matrix
- low or even no I/O traffic during the iterations if the matrix fits into memory
- iterative methods are usually the best method for solid models with quadratic elements (TETRA10, HEXA20, etc.)

Disadvantages of iterative methods:
- less robust; convergence problems are frequent with shell models like car bodies (shell elements are, for example, quadrilateral and triangular elements)

Example: van body on IBM RS/ H

- # grids: 9,066
- # elements: 6,874 QUAD4, ,77 TRIA3, 57 BAR, 5 ELAS
- # dofs: 47,07
- # nzts: ,336,904
- # loads:

Results:
- with PCG and Jacobi preconditioning: 36,50 iterations, 38,96 seconds! (note the influence of round-off errors; in theory at most n = 47,07 iterations)
- direct multifrontal solver: 79 seconds

In FEM analysis, direct methods are still preferred for shell element models like car bodies, planes, etc.

Disadvantages of iterative methods (cont'd):
- lower MFLOP/sec rates; therefore, in many cases where the number of operations would be lower than with direct methods, direct methods are still faster; this happens often with linear solid elements (TETRA4, HEXA8)
- careful selection of preconditioners is required: the more elaborate the preconditioner, the lower the number of iterations, but the effort to compute the preconditioner and its storage requirements go up, and the preconditioning step gets more expensive
- example: block incomplete Cholesky preconditioner (BIC) in MSC/NASTRAN for the piston (P is computed by an incomplete decomposition of A, fill-in is partially ignored):

    Method    #iterations   CPU time   Memory
    PCG+J                   sec        0 KW = 8.4 MB
    PCG+BIC                 sec        3857 KW = 4.7 MB

- with the BIC preconditioner, the iterative solution is now faster than the direct solution!
- iterative solvers usually cannot be used as black box solvers yet
- with a number of right-hand sides m > 1, the iterative algorithm usually has to be repeated for each RHS:
  - the number of operations increases by a factor of m
  - the increase in computation time is lower, since data locality is higher (the algorithms work on multiple vectors simultaneously)
  - note: so-called projection methods, which exploit the existence of multiple RHSs to find better direction vectors p (search in multiple directions simultaneously), can improve the situation, but are not discussed here

Summing up: in the FEM, iterative methods often result in a lower number of operations for solid models and require less (disk) storage, but they are more difficult to apply and require numerical background knowledge from the engineer.

Contents. I The Basic Framework for Stationary Problems 1

Contents. I The Basic Framework for Stationary Problems 1 page v Preface xiii I The Basic Framework for Stationary Problems 1 1 Some model PDEs 3 1.1 Laplace s equation; elliptic BVPs... 3 1.1.1 Physical experiments modeled by Laplace s equation... 5 1.2 Other

More information

Sparse Matrices. This means that for increasing problem size the matrices become sparse and sparser. O. Rheinbach, TU Bergakademie Freiberg

Sparse Matrices. This means that for increasing problem size the matrices become sparse and sparser. O. Rheinbach, TU Bergakademie Freiberg Sparse Matrices Many matrices in computing only contain a very small percentage of nonzeros. Such matrices are called sparse ( dünn besetzt ). Often, an upper bound on the number of nonzeros in a row can

More information

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet

Contents. F10: Parallel Sparse Matrix Computations. Parallel algorithms for sparse systems Ax = b. Discretized domain a metal sheet Contents 2 F10: Parallel Sparse Matrix Computations Figures mainly from Kumar et. al. Introduction to Parallel Computing, 1st ed Chap. 11 Bo Kågström et al (RG, EE, MR) 2011-05-10 Sparse matrices and storage

More information

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE

HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER PROF. BRYANT PROF. KAYVON 15618: PARALLEL COMPUTER ARCHITECTURE HYPERDRIVE IMPLEMENTATION AND ANALYSIS OF A PARALLEL, CONJUGATE GRADIENT LINEAR SOLVER AVISHA DHISLE PRERIT RODNEY ADHISLE PRODNEY 15618: PARALLEL COMPUTER ARCHITECTURE PROF. BRYANT PROF. KAYVON LET S

More information

6 Implementation of Parallel FE Systems

6 Implementation of Parallel FE Systems 6 Implementation of Parallel FE Systems 6.1 Implementation of Domain Decomposition in MSC.NASTRAN V70.7 6.2 Further Parallel Features of MSC.NASTRAN V70.7 6.2.1 Parallel Normal Modes Analysis 6.2.2 Parallel

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 20: Sparse Linear Systems; Direct Methods vs. Iterative Methods Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 26

More information

Report of Linear Solver Implementation on GPU

Report of Linear Solver Implementation on GPU Report of Linear Solver Implementation on GPU XIANG LI Abstract As the development of technology and the linear equation solver is used in many aspects such as smart grid, aviation and chemical engineering,

More information

SCALABLE ALGORITHMS for solving large sparse linear systems of equations

SCALABLE ALGORITHMS for solving large sparse linear systems of equations SCALABLE ALGORITHMS for solving large sparse linear systems of equations CONTENTS Sparse direct solvers (multifrontal) Substructuring methods (hybrid solvers) Jacko Koster, Bergen Center for Computational

More information

Sparse Matrices Direct methods

Sparse Matrices Direct methods Sparse Matrices Direct methods Iain Duff STFC Rutherford Appleton Laboratory and CERFACS Summer School The 6th de Brùn Workshop. Linear Algebra and Matrix Theory: connections, applications and computations.

More information

Performance Evaluation of a New Parallel Preconditioner

Performance Evaluation of a New Parallel Preconditioner Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller Marco Zagha School of Computer Science Carnegie Mellon University 5 Forbes Avenue Pittsburgh PA 15213 Abstract The

More information

A Parallel Implementation of the BDDC Method for Linear Elasticity

A Parallel Implementation of the BDDC Method for Linear Elasticity A Parallel Implementation of the BDDC Method for Linear Elasticity Jakub Šístek joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík Institute of Mathematics of the AS CR, Prague

More information

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI

EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI EFFICIENT SOLVER FOR LINEAR ALGEBRAIC EQUATIONS ON PARALLEL ARCHITECTURE USING MPI 1 Akshay N. Panajwar, 2 Prof.M.A.Shah Department of Computer Science and Engineering, Walchand College of Engineering,

More information

Chapter 4. Matrix and Vector Operations

Chapter 4. Matrix and Vector Operations 1 Scope of the Chapter Chapter 4 This chapter provides procedures for matrix and vector operations. This chapter (and Chapters 5 and 6) can handle general matrices, matrices with special structure and

More information

Sparse Linear Systems

Sparse Linear Systems 1 Sparse Linear Systems Rob H. Bisseling Mathematical Institute, Utrecht University Course Introduction Scientific Computing February 22, 2018 2 Outline Iterative solution methods 3 A perfect bipartite

More information

High-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers

High-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers High-Performance Computational Electromagnetic Modeling Using Low-Cost Parallel Computers July 14, 1997 J Daniel S. Katz (Daniel.S.Katz@jpl.nasa.gov) Jet Propulsion Laboratory California Institute of Technology

More information

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013

GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS. Kyle Spagnoli. Research EM Photonics 3/20/2013 GTC 2013: DEVELOPMENTS IN GPU-ACCELERATED SPARSE LINEAR ALGEBRA ALGORITHMS Kyle Spagnoli Research Engineer @ EM Photonics 3/20/2013 INTRODUCTION» Sparse systems» Iterative solvers» High level benchmarks»

More information

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering State of the art distributed parallel computational techniques in industrial finite element analysis Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering Ajaccio, France

More information

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology

Exploiting GPU Caches in Sparse Matrix Vector Multiplication. Yusuke Nagasaka Tokyo Institute of Technology Exploiting GPU Caches in Sparse Matrix Vector Multiplication Yusuke Nagasaka Tokyo Institute of Technology Sparse Matrix Generated by FEM, being as the graph data Often require solving sparse linear equation

More information

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators

On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC

More information

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs

Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Efficient Finite Element Geometric Multigrid Solvers for Unstructured Grids on GPUs Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND

SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND Student Submission for the 5 th OpenFOAM User Conference 2017, Wiesbaden - Germany: SELECTIVE ALGEBRAIC MULTIGRID IN FOAM-EXTEND TESSA UROIĆ Faculty of Mechanical Engineering and Naval Architecture, Ivana

More information

Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms

Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms Iterative Algorithms I: Elementary Iterative Methods and the Conjugate Gradient Algorithms By:- Nitin Kamra Indian Institute of Technology, Delhi Advisor:- Prof. Ulrich Reude 1. Introduction to Linear

More information

Performance Evaluation of a New Parallel Preconditioner

Performance Evaluation of a New Parallel Preconditioner Performance Evaluation of a New Parallel Preconditioner Keith D. Gremban Gary L. Miller October 994 CMU-CS-94-25 Marco Zagha School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 This

More information

Aim. Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity. Structure and matrix sparsity: Overview

Aim. Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity. Structure and matrix sparsity: Overview Aim Structure and matrix sparsity: Part 1 The simplex method: Exploiting sparsity Julian Hall School of Mathematics University of Edinburgh jajhall@ed.ac.uk What should a 2-hour PhD lecture on structure

More information

Matrix-free IPM with GPU acceleration

Matrix-free IPM with GPU acceleration Matrix-free IPM with GPU acceleration Julian Hall, Edmund Smith and Jacek Gondzio School of Mathematics University of Edinburgh jajhall@ed.ac.uk 29th June 2011 Linear programming theory Primal-dual pair

More information

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers Markus Geveler, Dirk Ribbrock, Dominik Göddeke, Peter Zajac, Stefan Turek Institut für Angewandte Mathematik TU Dortmund,

More information

ME964 High Performance Computing for Engineering Applications

ME964 High Performance Computing for Engineering Applications ME964 High Performance Computing for Engineering Applications Outlining Midterm Projects Topic 3: GPU-based FEA Topic 4: GPU Direct Solver for Sparse Linear Algebra March 01, 2011 Dan Negrut, 2011 ME964

More information

Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager

Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager Intel Math Kernel Library (Intel MKL) Sparse Solvers Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager Copyright 3, Intel Corporation. All rights reserved. Sparse

More information

GPU COMPUTING WITH MSC NASTRAN 2013

GPU COMPUTING WITH MSC NASTRAN 2013 SESSION TITLE WILL BE COMPLETED BY MSC SOFTWARE GPU COMPUTING WITH MSC NASTRAN 2013 Srinivas Kodiyalam, NVIDIA, Santa Clara, USA THEME Accelerated computing with GPUs SUMMARY Current trends in HPC (High

More information

Parallel solution for finite element linear systems of. equations on workstation cluster *

Parallel solution for finite element linear systems of. equations on workstation cluster * Aug. 2009, Volume 6, No.8 (Serial No.57) Journal of Communication and Computer, ISSN 1548-7709, USA Parallel solution for finite element linear systems of equations on workstation cluster * FU Chao-jiang

More information

Techniques for Optimizing FEM/MoM Codes

Techniques for Optimizing FEM/MoM Codes Techniques for Optimizing FEM/MoM Codes Y. Ji, T. H. Hubing, and H. Wang Electromagnetic Compatibility Laboratory Department of Electrical & Computer Engineering University of Missouri-Rolla Rolla, MO

More information

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager Intel Math Kernel Library (Intel MKL) BLAS Victor Kostin Intel MKL Dense Solvers team manager Intel MKL BLAS/Sparse BLAS Original ( dense ) BLAS available from www.netlib.org Additionally Intel MKL provides

More information

Lecture 17: More Fun With Sparse Matrices

Lecture 17: More Fun With Sparse Matrices Lecture 17: More Fun With Sparse Matrices David Bindel 26 Oct 2011 Logistics Thanks for info on final project ideas. HW 2 due Monday! Life lessons from HW 2? Where an error occurs may not be where you

More information

Efficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes

Efficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes Efficient Minimization of New Quadric Metric for Simplifying Meshes with Appearance Attributes (Addendum to IEEE Visualization 1999 paper) Hugues Hoppe Steve Marschner June 2000 Technical Report MSR-TR-2000-64

More information

CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus

CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus CSCE 689 : Special Topics in Sparse Matrix Algorithms Department of Computer Science and Engineering Spring 2015 syllabus Tim Davis last modified September 23, 2014 1 Catalog Description CSCE 689. Special

More information

Reckoning With The Limits Of FEM Analysis

Reckoning With The Limits Of FEM Analysis Special reprint from CAD CAM 9-10/2008 Reckoning With The Limits Of FEM Analysis 27. Jahrgang 11,90 N 9-10 September/Oktober 2008 TRENDS - TECHNOLOGIEN - BEST PRACTICE DIGITALE FABRIK: VIRTUELLE PRODUKTION

More information

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3 UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis

More information

Figure 6.1: Truss topology optimization diagram.

Figure 6.1: Truss topology optimization diagram. 6 Implementation 6.1 Outline This chapter shows the implementation details to optimize the truss, obtained in the ground structure approach, according to the formulation presented in previous chapters.

More information

fspai-1.1 Factorized Sparse Approximate Inverse Preconditioner

fspai-1.1 Factorized Sparse Approximate Inverse Preconditioner fspai-1.1 Factorized Sparse Approximate Inverse Preconditioner Thomas Huckle Matous Sedlacek 2011 09 10 Technische Universität München Research Unit Computer Science V Scientific Computing in Computer

More information

Lecture 15: More Iterative Ideas

Lecture 15: More Iterative Ideas Lecture 15: More Iterative Ideas David Bindel 15 Mar 2010 Logistics HW 2 due! Some notes on HW 2. Where we are / where we re going More iterative ideas. Intro to HW 3. More HW 2 notes See solution code!

More information

Finite Element Implementation

Finite Element Implementation Chapter 8 Finite Element Implementation 8.1 Elements Elements andconditions are the main extension points of Kratos. New formulations can be introduced into Kratos by implementing a new Element and its

More information

Approaches to Parallel Implementation of the BDDC Method

Approaches to Parallel Implementation of the BDDC Method Approaches to Parallel Implementation of the BDDC Method Jakub Šístek Includes joint work with P. Burda, M. Čertíková, J. Mandel, J. Novotný, B. Sousedík. Institute of Mathematics of the AS CR, Prague

More information

(Sparse) Linear Solvers

(Sparse) Linear Solvers (Sparse) Linear Solvers Ax = B Why? Many geometry processing applications boil down to: solve one or more linear systems Parameterization Editing Reconstruction Fairing Morphing 2 Don t you just invert

More information
