A Comparison of Algorithms for Sparse Matrix Factoring and Variable Reordering Aimed at Real-time Multibody Dynamic Simulation


A Comparison of Algorithms for Sparse Matrix Factoring and Variable Reordering Aimed at Real-time Multibody Dynamic Simulation
Jose-Luis Torres-Moreno, Jose-Luis Blanco, Javier López-Martínez, Antonio Giménez-Fernández
ECCOMAS Multibody Dynamics 2013, Zagreb (Croatia), 1-4 July 2013

Goal of this work
Problem under study: the multibody dynamic formulation leads to a linear system Ax = b, solved as x = A\b at every time step. Which numerical solver algorithm is best? It depends.

Talk outline
1. Problem introduction
2. Tested algorithms
3. Benchmark
4. Results

Problem introduction
Modeling decisions: coordinates, formulation, etc.
Dependent coordinates: $\ddot{q}(t) = f(q(t), \dot{q}(t), t)$
Independent coordinates: $\ddot{z}(t) = f(z(t), \dot{z}(t), t)$
Our choices for this work:
- Coordinate type for modeling: natural (dependent) coordinates
- Dynamic formulation: index-1 Lagrange with Baumgarte stabilization

Problem introduction
Structure of the problem Ax = b: with N dependent coordinates and M constraints, A is (N+M) x (N+M) and x, b are (N+M) x 1. A is square, symmetric, positive semidefinite, and sparse, with a few dense columns.

Structure of matrix A
Most important features:
- M is constant
- Static non-zero structure
- Sparsity

Why sparsity matters
Naive matrix inversion:
$$\underbrace{\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}}_{A} \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = \begin{bmatrix} Q \\ c \end{bmatrix} \quad\Longrightarrow\quad \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = A^{-1} \begin{bmatrix} Q \\ c \end{bmatrix}$$

Why sparsity matters
[Sparsity-pattern plots of the system matrix A and its inverse.] A has nnz = 528 (93.76% zeros); A^{-1} has nnz = 8266 (only 2.34% zeros). In general, the inverse of a sparse matrix is NOT sparse! Cost of inversion: $O((N+M)^3)$.

Why sparsity matters
How to keep matrices sparse? Avoid direct matrix inversion. The alternative: sparse matrix factorization algorithms, which involve two choices:
1) Factorization method: LU, Cholesky.
2) Variable ordering.
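The contrast above can be sketched numerically. The following is a hypothetical illustration in Python/SciPy, not the authors' C++ benchmark: inverting a sparse matrix produces an essentially dense result, while a sparse LU factorization solves the same system without densifying it.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A random sparse, well-conditioned matrix standing in for the multibody A.
n = 200
rng = np.random.default_rng(0)
A = sp.random(n, n, density=0.02, random_state=rng, format="csc")
A = A + A.T + n * sp.identity(n, format="csc")   # symmetric, diagonally dominant
b = rng.standard_normal(n)

# Naive route: the explicit inverse is (almost) fully dense.
A_inv = np.linalg.inv(A.toarray())
density = np.count_nonzero(A_inv) / n**2         # close to 1.0

# Sparse route: factorize instead of inverting, then solve.
lu = spla.splu(A)
x = lu.solve(b)
residual_ok = np.allclose(A @ x, b)              # True
print(density, residual_ok)
```

The factorization object can also be reused for many right-hand sides, which matters in a time-stepping simulation.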

Why variable reordering matters
[Sparsity-pattern plots.] With the natural ordering, A (nnz = 528, ~94% zeros) factors as A = LU with considerable fill-in: L has nnz = 2599 (69.29% zeros) and U has nnz = 2456 (70.98% zeros), i.e. only ~70% zeros remain in the factors.

Why variable reordering matters
Reordering is a permutation of the variable indices: replace A(:, :) with A(p, p) for a permutation vector p. The number of non-zeros is unchanged (nnz = 528, 93.76% zeros in both), only their positions move.
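The effect of the ordering on fill-in can be reproduced with SciPy's SuperLU wrapper. This is a sketch on a contrived arrowhead matrix, not the paper's mechanism; `permc_spec` selects the column ordering.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Arrowhead matrix: one dense row/column plus a diagonal. Natural ordering
# eliminates the dense variable first, filling the factors almost completely;
# COLAMD pushes it to the end and keeps fill-in close to nnz(A).
n = 100
A = sp.lil_matrix((n, n))
A[0, :] = 1.0
A[:, 0] = 1.0
A.setdiag(np.arange(2.0, n + 2.0))
A = A.tocsc()

lu_nat = spla.splu(A, permc_spec="NATURAL")
lu_amd = spla.splu(A, permc_spec="COLAMD")

nnz_nat = lu_nat.L.nnz + lu_nat.U.nnz   # grows like n^2 here
nnz_amd = lu_amd.L.nnz + lu_amd.U.nnz   # stays proportional to n
print(nnz_nat, nnz_amd)
```

The same idea is behind the AMD, COLAMD, and METIS orderings benchmarked in this talk.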

Why variable reordering matters
[Sparsity-pattern plots.] With an optimized ordering, the same matrix A (~94% zeros) factors with far less fill-in: L has nnz = 244 (97.12% zeros) and U has nnz = 620 (92.67% zeros).


Tested algorithms
Approaches: LU decomposition and Cholesky decomposition.

Tested algorithms
LU decomposition: factor A = LU, so that Ax = b becomes (LU)x = b, solved in two triangular steps:
1) Ly = b, solve for y;
2) Ux = y, solve for x.
[Sparsity-pattern plots of L (nnz = 244, 97.12% zeros) and U (nnz = 620, 92.67% zeros).]
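The two triangular solves can be written out explicitly. A dense-LU sketch with SciPy follows; with partial pivoting the factorization is A = PLU, so step 1 uses Pᵀb.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([10.0, 12.0])

P, L, U = la.lu(A)                                # A = P @ L @ U
y = la.solve_triangular(L, P.T @ b, lower=True)   # step 1: L y = P^T b
x = la.solve_triangular(U, y, lower=False)        # step 2: U x = y
print(np.allclose(A @ x, b))                      # True
```

The factorization cost is paid once; each additional right-hand side only costs the two cheap triangular solves.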

Tested algorithms
Benchmarked LU implementations (6 variations):
1) Dense LU: Eigen C++ library.
2) Sparse LU: SuiteSparse's KLU, with (a) AMD ordering or (b) COLAMD ordering.
3) Sparse LU: SuiteSparse's UMFPACK, with (a) natural ordering, (b) AMD ordering, or (c) METIS ordering.
Note the distinction between symbolic and numeric factorization!

Tested algorithms
Cholesky decomposition (only applicable when M is positive definite):
$$\underbrace{\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}}_{A} \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = \begin{bmatrix} Q \\ c \end{bmatrix}$$
Apply Cholesky to M, then use the Schur complement to build two reduced systems of equations. [R. von Schwerin, Mathematical Methods for MBS in Descriptor Form, 1999]
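A numerical sketch of the Schur-complement route (Python/NumPy; matrix names follow the slides, the data is random and purely illustrative): factor M once, solve a small system for the multipliers, then back-substitute for the accelerations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 6, 2                            # N coordinates, m constraints
M = rng.standard_normal((N, N))
M = M @ M.T + N * np.eye(N)            # SPD mass matrix
Phi_q = rng.standard_normal((m, N))    # constraint Jacobian
Q = rng.standard_normal(N)
c = rng.standard_normal(m)

L = np.linalg.cholesky(M)              # M = L @ L.T

def M_solve(v):
    # Apply M^-1 via two triangular solves with the Cholesky factor.
    return np.linalg.solve(L.T, np.linalg.solve(L, v))

# Schur system for the multipliers: (Phi_q M^-1 Phi_q^T) lam = Phi_q M^-1 Q - c
S = Phi_q @ M_solve(Phi_q.T)
lam = np.linalg.solve(S, Phi_q @ M_solve(Q) - c)
qdd = M_solve(Q - Phi_q.T @ lam)       # back-substitute: M qdd = Q - Phi_q^T lam

# Check against the full (N+m)x(N+m) saddle-point system.
A = np.block([[M, Phi_q.T], [Phi_q, np.zeros((m, m))]])
ok = np.allclose(A @ np.concatenate([qdd, lam]), np.concatenate([Q, c]))
print(ok)  # True
```

The two reduced systems have sizes N and m instead of N+m, and only the m x m Schur system changes its factorization per step if M is constant.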

Tested algorithms
Benchmarked Cholesky implementations (3 variations): sparse Cholesky via SuiteSparse's CHOLMOD, with (a) AMD, (b) COLAMD, or (c) METIS ordering.


Our benchmark model
Test mechanism: a parameterized 2D lattice of Nx x Ny articulated quadrangles. Example: Nx = 5, Ny = 4.

Our benchmark model
Node locations are made irregular to avoid singular configurations.

Benchmark C++ implementation
The MB model is assembled, and the solver-specific tasks to be profiled are: preprocessing, then per time step:
1. Update Jacobians
2. Build RHS
3. Triplet -> CCS conversion (*)
4. Numeric factoring
5. Solve Ax = b (N.I.S.)

The benchmark
For each solver, for each ordering, for N = 1..32:
- Build a mechanism with Nx = N, Ny = N.
- Run thousands of time steps for each configuration.
- Measure the average execution time of each calculation.
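The protocol above can be sketched as a nested timing loop. This is a hypothetical skeleton (function and parameter names are ours, not the paper's C++ code):

```python
import time

def run_benchmark(solvers, build_mechanism, step, sizes=range(1, 33), n_steps=1000):
    """Average per-step time for each (solver, lattice size) pair.

    `build_mechanism(Nx, Ny)` and `step(model, solver)` are caller-supplied
    stand-ins for the model assembly and one simulation time step.
    """
    results = {}
    for solver in solvers:
        for N in sizes:
            model = build_mechanism(Nx=N, Ny=N)
            t0 = time.perf_counter()
            for _ in range(n_steps):
                step(model, solver)
            results[(solver, N)] = (time.perf_counter() - t0) / n_steps
    return results
```

Averaging over many steps smooths out timer resolution and cache warm-up effects, which matters when individual solves take well under a millisecond.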


Video: http://www.youtube.com/watch?v=1red2bqbnys


Results: Preprocessing (symbolic factorization)
[Log-log plot of preprocessing time (ms) vs. number of redundant coordinates (10^1 to 10^4) for all nine solver/ordering combinations: Dense LU; UMFPACK with natural, AMD, or METIS ordering; KLU with AMD or COLAMD; CHOLMOD with AMD, COLAMD, or METIS.]


Results: Distribution of time (KLU+COLAMD)
Numeric factoring dominates at 81%; the remainder splits among Update Jacobians, Build RHS, Solve Ax=b, and Triplet->CCS (4%, 7%, 4%, and 4%, respectively). The Triplet -> CCS conversion can be eliminated with smarter coding!
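The profiled "Triplet -> CCS" step is a pure format conversion: assembly naturally produces (row, column, value) triplets, but the factorization libraries consume compressed column storage (CCS, called CSC in SciPy). A minimal sketch:

```python
import numpy as np
import scipy.sparse as sp

# Triplet (COO) form, as it would come out of matrix assembly.
rows = np.array([0, 2, 1, 0])
cols = np.array([0, 0, 1, 2])
vals = np.array([4.0, 1.0, 3.0, 2.0])

# Convert to compressed column storage (CCS/CSC).
A = sp.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsc()

print(A.indptr)   # column pointers: [0 2 3 4]
print(A.indices)  # row index of each stored entry: [0 2 1 0]
```

Because the non-zero pattern is static in this application, the conversion could be done once and later assemblies could write values straight into the CCS `data` array, which is the "smarter coding" alluded to above.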

Results: numeric factoring + solve
[Linear-scale plot of total numeric factorization & solve time (0-300 ms) vs. number of redundant coordinates (500-4000) for all nine solver/ordering combinations; the UMFPACK curves cluster together across orderings, as do the KLU curves.]

A comment on the Cholesky approach
Even though the matrix E involved is sparse, the Cholesky factorization of the product E·Eᵀ turns out to be dense!

Results: numeric factoring + solve (log scale)
[Log-log plot of time (ms, 10^-2 to 10^2) vs. number of redundant coordinates (10^1 to 10^3) for all nine solver/ordering combinations; Dense LU, UMFPACK+AMD, and KLU+COLAMD are highlighted.]

Results: numeric factoring + solve (log scale)
[Same plot, annotated with the crossover points: Dense LU is fastest up to ~50 coordinates; KLU+COLAMD up to ~2000 coordinates; UMFPACK+AMD beyond that.]

Results: numeric factoring + solve (log scale)
[Same plot, with a horizontal reference line at T = 1 ms marking the real-time budget.]

Conclusions
MBS dynamics leads to very sparse systems, and exploiting sparsity is crucial for performance except for very small problems:
- N < ~50: use dense LU.
- ~50 < N < ~2000: use KLU + COLAMD.
- N > ~2000: use UMFPACK + AMD.
In any case: reuse symbolic factorizations!
Slides available online: http://www.ual.es/~jlblanco/ Publications
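The decision rule above can be stated as a one-screen heuristic. The thresholds are the approximate crossover points reported in this talk; the function name is ours.

```python
def pick_solver(n: int) -> str:
    """Solver choice vs. problem size n (number of redundant coordinates),
    following the approximate crossovers measured in this benchmark."""
    if n < 50:
        return "Dense LU (Eigen)"   # sparsity overhead not yet amortized
    elif n < 2000:
        return "KLU + COLAMD"
    else:
        return "UMFPACK + AMD"

print(pick_solver(30), pick_solver(500), pick_solver(5000))
```

The exact crossover points depend on hardware and implementation details, so they should be re-measured on the target platform.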

A comparison of Algorithms for Sparse Matrix Factoring and Variable Reordering aimed at Real-time Multibody Dynamic Simulation Thanks for your attention! Slides available online: http://www.ual.es/~jlblanco/ Publications