Sparse Matrices. O. Rheinbach, TU Bergakademie Freiberg


Sparse Matrices. Many matrices in computing contain only a very small percentage of nonzeros. Such matrices are called sparse (German: dünn besetzt). Often an upper bound on the number of nonzeros per row can be given (e.g., 100), independent of the matrix size. This means that for increasing problem size the matrices become sparser and sparser. [Figure: spy plot of the sparsity pattern of a sparse matrix]

Solution of Systems of Equations. Iterative methods such as the (Preconditioned) Conjugate Gradient method (PCG), GMRES, and others use only matrix-vector multiplications and need little additional memory. Direct sparse or banded solvers such as LU decomposition (Gaussian elimination) start with a symbolic step that allocates additional memory for the fill-in created during elimination, followed by a second step in which the numerical elimination takes place.

Preconditioned CG Method (PCG). The PCG method is an iterative method to solve linear systems Ax = b where A is symmetric positive definite. It works by minimizing the energy functional 1/2 x^T A x - x^T b. The minimum is the solution of Ax = b, since the necessary condition is 0 = ∇(1/2 x^T A x - x^T b) = Ax - b. PCG needs only multiplications with the matrix A and with an (optional) preconditioner matrix or operator M^{-1}. Simple preconditioners are, e.g., Jacobi or Gauß-Seidel (not optimal). Optimal preconditioners result in a constant number of iterations (for a given error tolerance). Optimal or almost-optimal preconditioners are much more sophisticated algorithms than Jacobi or Gauß-Seidel.

Preconditioned CG Method (PCG) - Algorithm

/* Preconditioned CG method for Ax = b */
i = 0
r = b - A*x                 /* initial residual */
d = M^{-1}*r                /* preconditioned residual, first search direction */
delta_new = <r, d>
while delta_new > eps do
    q = A*d                 /* one matrix-vector product per iteration */
    alpha = delta_new / <d, q>
    x = x + alpha*d
    r = r - alpha*q
    s = M^{-1}*r            /* apply the preconditioner */
    delta_old = delta_new
    delta_new = <r, s>
    beta = delta_new / delta_old
    d = s + beta*d
    i = i + 1
done
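As an illustration of the loop above, a minimal C sketch is given below. The matrix-vector product and the preconditioner are passed in as function pointers; the names pcg, matvec_fn, and the ctx parameter are chosen for this sketch only and are not taken from the slides.

#include <stdlib.h>

/* Both A*x and M^{-1}*x are supplied by the caller as y = f(x). */
typedef void (*matvec_fn)(int n, const double *x, double *y, void *ctx);

static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Preconditioned CG for A x = b; x holds the initial guess on entry
   and the approximate solution on return. Returns the iteration count. */
int pcg(int n, matvec_fn A, matvec_fn Minv, void *ctx,
        const double *b, double *x, double eps, int maxit) {
    double *r = malloc(n * sizeof *r), *d = malloc(n * sizeof *d);
    double *q = malloc(n * sizeof *q), *s = malloc(n * sizeof *s);

    A(n, x, r, ctx);                                /* r = b - A*x */
    for (int i = 0; i < n; i++) r[i] = b[i] - r[i];
    Minv(n, r, d, ctx);                             /* d = M^{-1}*r */
    double delta_new = dot(n, r, d);

    int it = 0;
    while (delta_new > eps && it < maxit) {
        A(n, d, q, ctx);                            /* q = A*d */
        double alpha = delta_new / dot(n, d, q);
        for (int i = 0; i < n; i++) { x[i] += alpha * d[i]; r[i] -= alpha * q[i]; }
        Minv(n, r, s, ctx);                         /* s = M^{-1}*r */
        double delta_old = delta_new;
        delta_new = dot(n, r, s);
        double beta = delta_new / delta_old;
        for (int i = 0; i < n; i++) d[i] = s[i] + beta * d[i];
        it++;
    }
    free(r); free(d); free(q); free(s);
    return it;
}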

Naive Coordinate Format for Sparse Matrices. Use three arrays or linked lists (row, column, value). Insertion/deletion of entries (+). Matrix operations (?)
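A minimal C sketch of such a coordinate (COO) container is shown below; the struct and field names are chosen for illustration only.

/* Coordinate (COO) storage: one (row, column, value) triple per nonzero. */
typedef struct {
    int     nnz;  /* number of stored nonzeros      */
    int    *row;  /* row index of each entry        */
    int    *col;  /* column index of each entry     */
    double *val;  /* numerical value of each entry  */
} coo_matrix;

Inserting an entry only means appending one more triple, which is why insertion and deletion are cheap in this format.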

Compressed Sparse Row Format (CSR) for Sparse Matrices. Use three arrays:
val stores the nonzero entries of the matrix in row-wise order,
cols stores the column index of each entry,
rowstart stores the indices of the start of the rows in val and cols.
The diagonal is (often) stored first within each row to allow fast access to the diagonal entries.

Example (5 x 5 matrix, 12 nonzeros):
 4 -1 -1  0  0
-1  4  0 -1 -1
-1  0  4  0  0
 0 -1  0  4  0
 0  0  0  0  4

index     0  1  2  3  4  5  6  7  8  9 10 11
val       4 -1 -1  4 -1 -1 -1  4 -1  4 -1  4
cols      0  1  2  1  0  3  4  2  0  3  1  4
rowstart  0  3  7  9 11 12
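In C, the example could be stored as follows (a minimal sketch; the array names mirror those on the slide):

/* CSR storage of the 5x5 example matrix (12 nonzeros),
   with the diagonal entry stored first in each row. */
double val[12]     = { 4, -1, -1,   4, -1, -1, -1,   4, -1,   4, -1,   4 };
int    cols[12]    = { 0,  1,  2,   1,  0,  3,  4,   2,  0,   3,  1,   4 };
int    rowstart[6] = { 0, 3, 7, 9, 11, 12 };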

Compressed Row Format
+ compact and efficient
+ very general
+ fast for the most important operations
+ sorting the entries within each row allows fast access to individual entries by binary search (see the sketch below)
- insertion of entries is very inefficient (this has to be accepted; if frequent insertion is necessary, use a different format first and then convert)
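A minimal C sketch of such a lookup is given below; it assumes the column indices within each row are sorted in increasing order (i.e., without the diagonal-first convention mentioned earlier), and the function name csr_get is chosen for this sketch.

/* Returns A(i,j) for a CSR matrix whose column indices are sorted
   within each row; structural zeros are returned as 0.0. */
double csr_get(const int *rowstart, const int *cols, const double *val,
               int i, int j) {
    int lo = rowstart[i], hi = rowstart[i + 1] - 1;
    while (lo <= hi) {                 /* binary search within row i */
        int mid = lo + (hi - lo) / 2;
        if (cols[mid] == j) return val[mid];
        if (cols[mid] < j)  lo = mid + 1;
        else                hi = mid - 1;
    }
    return 0.0;                        /* entry is not stored */
}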

Compressed Sparse Column Format (CSC) for Sparse Matrices. Equivalent to compressed row storage of A^T. For the same 5 x 5 example matrix:

index     0  1  2  3  4  5  6  7  8  9 10 11
val       4 -1 -1  4 -1 -1  4 -1  4 -1  4 -1
rows      0  1  2  1  0  3  2  0  3  1  4  1
colstart  0  3  6  8 10 12
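In C (a minimal sketch, mirroring the CSR example above; array names follow the slide):

/* CSC storage of the same 5x5 example matrix (12 nonzeros),
   with the diagonal entry stored first in each column. */
double val[12]     = { 4, -1, -1,   4, -1, -1,   4, -1,   4, -1,   4, -1 };
int    rows[12]    = { 0,  1,  2,   1,  0,  3,   2,  0,   3,  1,   4,  1 };
int    colstart[6] = { 0, 3, 6, 8, 10, 12 };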

Knuth Scheme. A combination of CSR and CSC. The fields nextr and nextc allow fast traversal of columns AND rows: they are links to the next entry in the same row or column (linked lists), and the last entry in a row/column links back to the first. The diagonal is not sorted to the front. To find the first entry in a row or column, both rowstart and colstart are present. For the same 5 x 5 example matrix:

index     0  1  2  3  4  5  6  7  8  9 10 11
val       4 -1 -1 -1  4 -1 -1 -1  4 -1  4  4
row       0  0  0  1  1  1  1  2  2  3  3  4
col       0  1  2  0  1  3  4  0  2  1  3  4
nextr     1  2  0  4  5  6  3  8  7 10  9 11
nextc     3  4  8  7  9 10 11  0  2  1  5  6
rowstart  0  3  7  9 11
colstart  0  1  2  5  6
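A minimal C sketch of traversing one (nonempty) column in this scheme; the function name visit_column is chosen for this sketch, the array names follow the slide.

/* Visit all stored entries of column j in the Knuth scheme.
   colstart[j] is the index of the first entry of column j;
   nextc links the entries of a column in a circular list. */
void visit_column(int j, const int *colstart, const int *row,
                  const double *val, const int *nextc,
                  void (*visit)(int i, double a_ij)) {
    int first = colstart[j];
    int k = first;
    do {
        visit(row[k], val[k]);   /* entry A(row[k], j) */
        k = nextc[k];
    } while (k != first);        /* the circular list ends where it started */
}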

Hypersparse Matrices. In typical sparse matrices the number of elements in each row and column is bounded. Hypersparse matrices are matrices where almost all rows and columns are entirely zero. Examples are restriction operators, adjacency matrices of hypersparse graphs, ... Compressed Row and Compressed Column storage are inefficient for hypersparse matrices: in CSR, rowstart has as many entries as there are rows; in CSC, colstart has as many entries as there are columns; both can be much larger than the number of nonzeros in the matrix. To avoid this, rowstart or colstart, respectively, can be compressed to save space, e.g., by a run-length encoding. Remark: This is also important when implementing an efficient sparse matrix-matrix multiplication.
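One common way to compress the row pointer, sketched below, is to store pointers only for the nonempty rows (a doubly compressed, DCSR-style layout); this is an assumption about how the compression could be realized, not the exact scheme from the slides.

/* DCSR-style storage: rowstart entries are kept only for nonempty rows.
   For a hypersparse matrix, nrows_nonempty is far smaller than the total
   number of rows, so the row pointer no longer dominates the memory. */
typedef struct {
    int     nrows_nonempty; /* number of rows that contain nonzeros    */
    int    *rowidx;         /* indices of the nonempty rows            */
    int    *rowstart;       /* start of each nonempty row in val/cols  */
    int    *cols;           /* column indices, as in CSR               */
    double *val;            /* values, as in CSR                       */
} dcsr_matrix;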

Example of Sparse Operations: matrix-vector multiplication in compressed row format and in compressed column format.

Matrix-Vector Multiplication in Compressed Row Format

/* Computes v = A*w, where A is stored in compressed row format */
for i = 0..n-1
    v(i) = 0
    left  = rowstart(i)
    right = rowstart(i+1) - 1
    for j = left..right
        v(i) = v(i) + val(j)*w(cols(j))
    end
end

index     0  1  2  3  4  5  6  7  8  9 10 11
val       4 -1 -1  4 -1 -1 -1  4 -1  4 -1  4
cols      0  1  2  1  0  3  4  2  0  3  1  4
rowstart  0  3  7  9 11 12
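The same loop written as plain C (a minimal sketch; the function name csr_matvec is not taken from the slides):

/* v = A*w for an n-row matrix A stored in CSR format. */
void csr_matvec(int n, const int *rowstart, const int *cols,
                const double *val, const double *w, double *v) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = rowstart[i]; j < rowstart[i + 1]; j++)
            sum += val[j] * w[cols[j]];   /* gather from w along row i */
        v[i] = sum;
    }
}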

Matrix-Vector Multiplication in Compressed Column Format

/* Computes v = A*w, where A is stored in compressed column format */
for i = 0..n-1
    v(i) = 0
end
for i = 0..n-1                  /* loop over the columns of A */
    top    = colstart(i)
    bottom = colstart(i+1) - 1
    for j = top..bottom
        v(rows(j)) = v(rows(j)) + val(j)*w(i)
    end
end

index     0  1  2  3  4  5  6  7  8  9 10 11
val       4 -1 -1  4 -1 -1  4 -1  4 -1  4 -1
rows      0  1  2  1  0  3  2  0  3  1  4  1
colstart  0  3  6  8 10 12

Banded LU-Decomposition, Symbolic Step: Find the Skyline
[Figure: the example matrix with its skyline (band) structure marked]

Banded LU-Decomposition, Symbolic Step. Step: allocate memory, copy, sort.
[Figure: the example matrix with explicit zeros allocated inside the skyline]
Data structure after the symbolic step (explicit zeros are allocated as space for the fill-in):
val       4 -1 -1   4 -1  0 -1 -1   4 -1  0  0  0   4 -1  0  0   4 -1  0  0
cols      0  1  2   1  0  2  3  4   2  0  1  3  4   3  1  2  4   4  1  2  3
rowstart  0  3  8  13 17 21
Banded LU decomposition: complexity is of order n * (bandwidth)^2.