A Comparison of Algorithms for Sparse Matrix Factoring and Variable Reordering Aimed at Real-time Multibody Dynamic Simulation
Jose-Luis Torres-Moreno, Jose-Luis Blanco, Javier López-Martínez, Antonio Giménez-Fernández
ECCOMAS Multibody Dynamics 2013, 1-4 July 2013, Zagreb (Croatia)
Goal of this work
Problem under study: given the multibody dynamic formulation, which numerical solver algorithm is best for $A x = b$ (i.e., $x = A \backslash b$)? It depends.
Talk outline
1. Problem introduction
2. Tested algorithms
3. Benchmark
4. Results
Problem introduction
Modeling decisions: coordinates, formulation, etc.
Dependent coordinates: $\ddot{q}(t) = f(q(t), \dot{q}(t), t)$  vs.  independent coordinates: $\ddot{z}(t) = f(z(t), \dot{z}(t), t)$
Our choices for this work:
- Coordinate type for modeling: natural coordinates (dependent)
- Dynamic formulation: index-1 Lagrange with Baumgarte stabilization
Problem introduction
$A x = b$ — what is the problem structure? With N dependent coordinates and M constraints:
A is (N+M) x (N+M); x and b are (N+M) x 1.
A is square, symmetric, psd and sparse, apart from a few dense columns.
Structure of matrix A
Most important features:
- M is constant
- Static non-zero structure
- SPARSITY
Why sparsity matters?
Naive matrix inversion:
$\underbrace{\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}}_{A} \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = \begin{bmatrix} Q \\ c \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = \underbrace{\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}^{-1}}_{A^{-1}} \begin{bmatrix} Q \\ c \end{bmatrix}$
Why sparsity matters?
[Figure: sparsity patterns of A, with nnz=528 (93.76% zeros), and of its inverse, with nnz=8266 (only 2.34% zeros).]
In general, the inverse of a sparse matrix is NOT sparse! Cost of the dense inversion: $O((N+M)^3)$.
Why sparsity matters?
How to keep matrices sparse? Avoid direct matrix inversion. Alternatives?
Sparse matrix factorization algorithms:
1) Method: LU, Cholesky.
2) Variable ordering.
Why variable reordering matters?
[Figure: with the natural ordering, A has nnz=528 (~94% zeros), but the factors A = L U fill in heavily: L has nnz=2599 (69.29% zeros) and U has nnz=2456 (70.98% zeros), i.e. only ~70% zeros.]
Why variable reordering matters?
Reordering = permutation of variable indices: A(:,:) → A(p,p).
[Figure: both A and the permuted A(p,p) have nnz=528 (93.76% zeros); reordering changes the pattern, not the number of non-zeros.]
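The symmetric permutation A(p,p) is cheap to apply before factoring. A minimal sketch, on a triplet-format sparse matrix (names and the triplet layout are illustrative, not from the paper's code):

```cpp
// Symmetric reordering of a triplet-format sparse matrix: entry (i, j, v)
// of A becomes (q[i], q[j], v) of A(p,p), where q is the inverse of p.
#include <cstddef>
#include <vector>

struct Triplet { int row, col; double val; };

// Permutation convention: new index k holds old index p[k].
std::vector<Triplet> symmetric_permute(const std::vector<Triplet>& A,
                                       const std::vector<int>& p) {
    std::vector<int> q(p.size());              // inverse permutation: q[p[k]] = k
    for (std::size_t k = 0; k < p.size(); ++k) q[p[k]] = static_cast<int>(k);
    std::vector<Triplet> B;
    B.reserve(A.size());
    for (const Triplet& t : A)                 // nnz is preserved exactly
        B.push_back({q[t.row], q[t.col], t.val});
    return B;
}
```

As the slide's figure shows (nnz=528 before and after), the permutation itself never changes the non-zero count; it only changes how much fill-in the subsequent factorization produces.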
Why variable reordering matters?
[Figure: with an optimized ordering, A still has nnz=528 (~94% zeros), and now L has nnz=244 (97.12% zeros) and U has nnz=620 (92.67% zeros), i.e. ~97% and ~93% zeros.]
Tested algorithms
Approaches: LU decomposition, Cholesky decomposition.
Tested algorithms — LU decomposition
$A x = b$, with $A = L U$, so $(L U) x = b$:
1) $L y = b$, solve for y (forward substitution)
2) $U x = y$, solve for x (back substitution)
[Figure: sparsity patterns of the factors, L with nnz=244 (97.12% zeros) and U with nnz=620 (92.67% zeros).]
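The two triangular solves above can be sketched as follows. This is a dense, illustrative version; the sparse solvers benchmarked here (KLU, UMFPACK) perform the same two steps on compressed sparse factors:

```cpp
// Given A = L U, solve L y = b (forward substitution), then U x = y
// (back substitution). Dense sketch for illustration only.
#include <vector>
using Mat = std::vector<std::vector<double>>;

std::vector<double> lu_solve(const Mat& L, const Mat& U,
                             const std::vector<double>& b) {
    const int n = static_cast<int>(b.size());
    std::vector<double> y(n), x(n);
    for (int i = 0; i < n; ++i) {              // 1) L y = b
        double s = b[i];
        for (int j = 0; j < i; ++j) s -= L[i][j] * y[j];
        y[i] = s / L[i][i];
    }
    for (int i = n - 1; i >= 0; --i) {         // 2) U x = y
        double s = y[i];
        for (int j = i + 1; j < n; ++j) s -= U[i][j] * x[j];
        x[i] = s / U[i][i];
    }
    return x;
}
```

Each triangular solve costs O(nnz) on sparse factors, which is why keeping L and U sparse (via reordering) pays off directly at run time.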
Tested algorithms
Benchmarked LU implementations (6 variations):
1) Dense LU: Eigen C++ library
2) Sparse LU: SuiteSparse's KLU
   a) AMD ordering
   b) COLAMD ordering
3) Sparse LU: SuiteSparse's UMFPACK
   a) Natural ordering
   b) AMD ordering
   c) METIS ordering
Note the distinction between symbolic and numeric factorization!
Tested algorithms — Cholesky decomposition (only when M is positive definite)
Starting from
$\underbrace{\begin{bmatrix} M & \Phi_q^T \\ \Phi_q & 0 \end{bmatrix}}_{A} \begin{bmatrix} \ddot{q} \\ \lambda \end{bmatrix} = \begin{bmatrix} Q \\ c \end{bmatrix}$
apply Cholesky to M and use the Schur complement to build two reduced systems of equations.
[R. von Schwerin, Mathematical Methods for MBS in Descriptor Form, 1999]
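One standard form of this reduction (the exact variant used in von Schwerin, 1999, may differ in details) eliminates $\ddot{q}$ from the saddle-point system:

```latex
\text{From } M\ddot{q} + \Phi_q^T \lambda = Q \text{ and } \Phi_q \ddot{q} = c,
\text{ with } M = L L^T \text{ (sparse Cholesky):}
\begin{aligned}
\left(\Phi_q M^{-1} \Phi_q^T\right) \lambda &= \Phi_q M^{-1} Q - c
  && \text{(reduced system for } \lambda\text{)} \\
M \ddot{q} &= Q - \Phi_q^T \lambda
  && \text{(recover accelerations)}
\end{aligned}
```

Every application of $M^{-1}$ becomes a pair of sparse triangular solves with $L$; only the (M x M) reduced system for $\lambda$ needs a second factorization.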
Tested algorithms
Approaches: LU decomposition, Cholesky decomposition (only when M is positive definite).
Benchmarked Cholesky implementations (3 variations):
1) Sparse Cholesky: SuiteSparse's CHOLMOD
   a) AMD ordering
   b) COLAMD ordering
   c) METIS ordering
Our benchmark model
Test mechanism: parameterized 2D lattice of $N_x \times N_y$ articulated quadrangles.
Example: $N_x = 5$, $N_y = 4$.
Our benchmark model
Irregular node locations to avoid singular configurations.
Benchmark C++ implementation
MB model → Assemble → Assembled model → Solver.
Solver-specific tasks to be profiled: preprocessing, then per time step:
1. Update Jacobians
2. Build RHS
3. Triplet -> CCS (*)
4. Numeric factoring
5. Solve Ax=b
(N.I.S.)
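Step 3 of the pipeline, converting a triplet (COO) list into Compressed Column Storage (CCS/CSC, the format the SuiteSparse solvers consume), can be sketched as below. The struct layout and names are illustrative; for simplicity, duplicate entries are not summed:

```cpp
// Triplet (COO) -> Compressed Column Storage (CCS) via counting sort.
#include <vector>

struct Entry { int row, col; double val; };

struct CCS {
    std::vector<int> colptr;   // size ncols+1: column j spans [colptr[j], colptr[j+1])
    std::vector<int> rowind;   // row index of each stored value
    std::vector<double> values;
};

CCS triplet_to_ccs(const std::vector<Entry>& T, int ncols) {
    CCS A;
    A.colptr.assign(ncols + 1, 0);
    for (const Entry& e : T) ++A.colptr[e.col + 1];                 // count per column
    for (int j = 0; j < ncols; ++j) A.colptr[j + 1] += A.colptr[j]; // prefix sum
    A.rowind.resize(T.size());
    A.values.resize(T.size());
    std::vector<int> next(A.colptr.begin(), A.colptr.end() - 1);
    for (const Entry& e : T) {                                      // scatter into place
        int k = next[e.col]++;
        A.rowind[k] = e.row;
        A.values[k] = e.val;
    }
    return A;
}
```

Since the non-zero structure of A is static (see "Structure of matrix A"), the triplet-to-CCS index mapping can be computed once and only the values refreshed each step, which is what makes this step avoidable.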
The benchmark
For each solver, for each ordering, for N = 1…32:
- Build mechanism with N_x = N, N_y = N.
- Run thousands of time steps for each configuration.
- Measure the average execution time of each calculation.
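The timing scheme in the loop above amounts to averaging wall-clock cost over many repetitions. A minimal sketch (names are illustrative, not taken from the paper's benchmark code):

```cpp
// Average wall-clock cost (in ms) of one step, measured over many runs.
#include <chrono>
#include <functional>

double average_ms(const std::function<void()>& step, int reps) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    for (int i = 0; i < reps; ++i) step();     // e.g. one dynamics time step
    const std::chrono::duration<double, std::milli> dt = clock::now() - t0;
    return dt.count() / reps;
}
```

Timing the whole batch and dividing, rather than timing each step, keeps the clock-read overhead out of the measurement of very fast steps.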
Video demo: http://www.youtube.com/watch?v=1red2bqbnys
Results: Preprocessing (symbolic factorization)
[Figure: log-log plot of preprocessing time (ms) vs. number of redundant coordinates, for Dense LU, UMFPACK (Natural/AMD/METIS), KLU (AMD/COLAMD) and CHOLMOD (AMD/COLAMD/METIS).]
Results: Distribution of time (KLU+COLAMD)
[Pie chart over: Update Jacobians, Build RHS, Solve Ax=b, Triplet->CCS, Numeric factoring; the dominant slice takes 81%, the rest 4-7% each.]
The Triplet -> CCS step can be eliminated with smarter coding!
Results: numeric factoring + solve
[Figure: total numeric factorization & solve time (0-300 ms, linear scale) vs. number of redundant coordinates (500-4000), for the same nine solver/ordering combinations; the UMFPACK and KLU curves cluster by solver across their different orderings.]
A comment on the Cholesky approach
[Figure: even for the sparse matrix E shown, chol(E * E) turns out dense!]
Results: numeric factoring + solve (log scale)
[Figure: log-log plot of time (ms) vs. number of redundant coordinates for all solvers; Dense LU, UMFPACK+AMD and KLU+COLAMD are highlighted.]
Results: numeric factoring + solve (log scale)
[Figure: same log-log plot, annotated with the crossover points at ~50 coordinates (Dense LU vs. KLU+COLAMD) and ~2000 coordinates (KLU+COLAMD vs. UMFPACK+AMD).]
Results: numeric factoring + solve (log scale)
[Figure: same log-log plot, with the real-time budget T = 1 ms marked as a horizontal line.]
Conclusions
- MBS dynamics leads to very sparse systems.
- Exploiting sparsity is crucial for performance unless N < 50 (approx.).
- 50 < N < 2000: use KLU+COLAMD.
- 2000 < N: use UMFPACK+AMD.
- In any case: reuse symbolic factorizations!
Slides available online: http://www.ual.es/~jlblanco/ → Publications
Thanks for your attention!
Slides available online: http://www.ual.es/~jlblanco/ → Publications