Direct Algorithms for Sparse Schur Complements and Inverses



1 Direct Algorithms for Sparse Schur Complements and Inverses
Dr. Ryan Chilton, MyraMath

2 Outline
Examine some less common sparse direct algorithms: partial linear solution, Schur complements, and sampling the inverse operator.
Apply them as frontends for low-rank skeletonization: cross approximation, range estimation, and Ritz projection.
Motivation: fast direct solvers for FE-BIs and FE-DDMs.

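3 Refresher: Factor $A = LL^T$
(Figure: FEM mesh split into Left (0), Separator (2), and Right (1) regions.)
Reorder the unknowns as Left (0), Right (1), Separator (2). The separator induces the zero blocks $A_{01} = A_{10} = 0$, and these blocks cannot fill in:
$$A = \begin{bmatrix} A_{00} & 0 & A_{02} \\ 0 & A_{11} & A_{12} \\ A_{20} & A_{21} & A_{22} \end{bmatrix}$$
Algorithm steps (right-looking):
Factor $A_{00} = L_{00}L_{00}^T$ and $A_{11} = L_{11}L_{11}^T$.
Solve $L_{20} = A_{20}L_{00}^{-T}$ and $L_{21} = A_{21}L_{11}^{-T}$.
Schur downdate $A_{22} \leftarrow A_{22} - L_{20}L_{20}^T - L_{21}L_{21}^T$.
Factor the downdated $A_{22} = L_{22}L_{22}^T$.
Note that $A_{00}$ and $A_{11}$ are also sparse, so the idea applies recursively. This leads to a tree of operations, eliminating from the bottom up.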
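The one-level step above can be sketched in NumPy. This is a minimal illustration, not MyraMath's implementation: it assumes dense NumPy blocks in place of sparse supernodes, uses a 1D chain as the "mesh" (the middle node is the separator), and the helper names `laplacian_1d` and `separator_cholesky` are hypothetical.

```python
import numpy as np

def laplacian_1d(n):
    """Dense 1D Dirichlet Laplacian tridiag(-1, 2, -1); a toy SPD matrix."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def separator_cholesky(A, left, right, sep):
    """One nested-dissection level: returns L with L L^T = A reordered as (Left, Right, Sep).

    Assumption: dense blocks stand in for the sparse fronts of a real solver.
    """
    A00, A11 = A[np.ix_(left, left)], A[np.ix_(right, right)]
    A20, A21 = A[np.ix_(sep, left)], A[np.ix_(sep, right)]
    A22 = A[np.ix_(sep, sep)]
    # The separator decouples Left from Right, so A01 = A10 = 0.
    assert not A[np.ix_(left, right)].any()
    L00 = np.linalg.cholesky(A00)                 # factor A00
    L11 = np.linalg.cholesky(A11)                 # factor A11
    L20 = np.linalg.solve(L00, A20.T).T           # solve: L20 = A20 L00^{-T}
    L21 = np.linalg.solve(L11, A21.T).T           # solve: L21 = A21 L11^{-T}
    S22 = A22 - L20 @ L20.T - L21 @ L21.T         # Schur downdate of A22
    L22 = np.linalg.cholesky(S22)                 # factor the downdated separator block
    n0, n1, n2 = len(left), len(right), len(sep)
    L = np.zeros((n0 + n1 + n2,) * 2)
    L[:n0, :n0] = L00
    L[n0:n0 + n1, n0:n0 + n1] = L11               # the (Left, Right) blocks stay zero
    L[n0 + n1:, :n0] = L20
    L[n0 + n1:, n0:n0 + n1] = L21
    L[n0 + n1:, n0 + n1:] = L22
    return L
```

For example, with `A = laplacian_1d(9)`, `left = [0, 1, 2, 3]`, `right = [5, 6, 7, 8]`, and `sep = [4]`, the returned `L` satisfies `L @ L.T == A[np.ix_(perm, perm)]` (to rounding) for `perm = left + right + sep`. Calling the same routine on the Left and Right blocks would produce the elimination tree.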


4 Selected profiling data
Example problem under study: an I x J x K brick (N = IJK), discretized with the 7-point discrete graph Laplacian, which has a well-understood spectrum. The structured grid is easy to reorder using nested dissection.

N = $80^3$ = 512K   N = $100^3$ = 1M   N = $128^3$ = 2.1M   GEMM
12 s                42 s                161 s               35 s
105 s               367 s               1559 s              405 s

Observed scaling: 3D = $O(n^{1.87})$, 2D = $O(n^{1.53})$, 1D = $O(n^{1.08})$.
Platform: Intel Xeon (x8 = 16 cores) at 2.4 GHz, MKL.
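The example problem can be assembled with SciPy as a Kronecker sum of 1D second-difference matrices. This is a minimal sketch, assuming Dirichlet-style boundary rows so that every diagonal entry is 6 and the matrix is SPD (a pure graph Laplacian, degree minus adjacency, would be singular). The helper names are hypothetical, and the lexicographic ordering produced here is only a starting point that nested dissection would then permute.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_1d(n):
    """1D Dirichlet second-difference matrix tridiag(-1, 2, -1), sparse CSR."""
    return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

def laplacian_7pt(I, J, K):
    """7-point Laplacian on an I x J x K brick (N = IJK) as a Kronecker sum.

    Assumption: Dirichlet-style boundary, so the diagonal is 6 everywhere.
    Unknown (i, j, k) is stored at index (i*J + j)*K + k.
    """
    Ii, Ij, Ik = sp.identity(I), sp.identity(J), sp.identity(K)
    A = (sp.kron(sp.kron(laplacian_1d(I), Ij), Ik)
         + sp.kron(sp.kron(Ii, laplacian_1d(J)), Ik)
         + sp.kron(sp.kron(Ii, Ij), laplacian_1d(K)))
    return A.tocsr()
```

For instance, `laplacian_7pt(80, 80, 80)` gives the N = 512K case; interior rows sum to zero and boundary rows keep a positive row sum.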

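5 Partial solution: $x = R_i^T A^{-1} R_j b$
In plain English: only b(j) is nonzero, and only x(i) is needed. Many engineering quantities of interest (QoIs) use only boundary-valued b and x.
(Figure: a full solve, forward sweep $Ly = b$ then backward sweep $L^T x = y$, compared with a partial solve that starts from j and ends at i.)
A partial solve takes $O(n^{4/3})$ time, like a full solve $x = A^{-1} b$, but needs only $O(n^{2/3})$ space per right-hand side instead of $O(n)$.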
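The pruning behind a partial solve is visible already in the one-level Left/Right/Separator factorization from the refresher. This sketch assumes dense NumPy blocks, hypothetical helper names, and the special case where both j and i lie in the Left block: the Right block's factor is never read, because its forward-sweep output is zero and its backward-sweep output is not requested. A real multifrontal solver would apply the same reasoning across the whole elimination tree.

```python
import numpy as np

def factor_blocks(A, left, right, sep):
    """One-level nested-dissection Cholesky blocks (dense stand-ins for sparse fronts)."""
    L00 = np.linalg.cholesky(A[np.ix_(left, left)])
    L11 = np.linalg.cholesky(A[np.ix_(right, right)])
    L20 = np.linalg.solve(L00, A[np.ix_(sep, left)].T).T
    L21 = np.linalg.solve(L11, A[np.ix_(sep, right)].T).T
    S22 = A[np.ix_(sep, sep)] - L20 @ L20.T - L21 @ L21.T
    return L00, L11, L20, L21, np.linalg.cholesky(S22)

def partial_solve_left(blocks, b0):
    """x0 = R_0^T A^{-1} R_0 b0: b is supported on, and x is wanted on, the Left block only.

    L11 and L21 (the Right subtree) are never used.
    """
    L00, _L11, L20, _L21, L22 = blocks
    y0 = np.linalg.solve(L00, b0)                  # forward: L00 y0 = b0
    # forward on the separator: L22 y2 = b2 - L20 y0 - L21 y1, with b2 = 0 and y1 = 0
    y2 = np.linalg.solve(L22, -L20 @ y0)
    x2 = np.linalg.solve(L22.T, y2)                # backward: L22^T x2 = y2
    x0 = np.linalg.solve(L00.T, y0 - L20.T @ x2)   # backward: L00^T x0 = y0 - L20^T x2
    return x0                                      # x1 (Right block) is never formed
```

The result agrees with restricting a full solve: scatter `b0` into a zero vector on `left`, solve with all of A, and keep the `left` entries.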

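6 Schur complement $S = B^T A^{-1} B$
Concept: form the saddle-point system $\begin{bmatrix} A & B \\ B^T & 0 \end{bmatrix}$, then quit early: once the A block has been eliminated, the trailing block holds $0 - B^T A^{-1} B = -S$.
Schur complements arise from FE-BI hybrids, e.g. scattering from apertures.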
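The "quit early" idea can be checked on a small dense example. This minimal sketch (hypothetical function name, dense NumPy, no pivoting, which is safe here because A is assumed SPD) runs right-looking Gaussian elimination over the first n pivots of the saddle system only and then reads $-S$ off the untouched trailing block. A sparse solver would do the same elimination with fill-reducing ordering.

```python
import numpy as np

def schur_by_partial_elimination(A, B):
    """Return S = B^T A^{-1} B by eliminating only the A block of [[A, B], [B^T, 0]].

    Assumption: A is SPD, so the first n pivots are positive and no pivoting is needed.
    """
    n, m = B.shape
    K = np.block([[A, B], [B.T, np.zeros((m, m))]]).astype(float)
    for p in range(n):                       # eliminate the A-block pivots only
        piv = K[p, p]
        K[p + 1:, p + 1:] -= np.outer(K[p + 1:, p], K[p, p + 1:]) / piv
    # quit early: the trailing m x m block now holds 0 - B^T A^{-1} B
    return -K[n:, n:]
```

Stopping after n pivots leaves the trailing block unfactored, which is exactly what makes the Schur complement available without ever forming $A^{-1}$.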

7 Sampling the inverse: Z(i,j), $Z = A^{-1}$
Closely related to the Schur complement: $Z(i,j) = R_i^T A^{-1} R_j$.
Arises in FETI/DDM, which iterate and exchange fields at boundaries: scatter, solve, gather.
Tabulating Z(i,j) opens up reuse and preconditioning options.
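The scatter, solve, gather pattern can be written directly against a sparse factorization. In this minimal sketch SciPy's `splu` (sparse LU) stands in for a sparse $LL^T$ solver, and the function name `sample_inverse` is hypothetical. It factors once, scatters one unit vector per index in j, solves all columns together, and gathers the rows in i.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def sample_inverse(A, i, j):
    """Tabulate Z(i,j) = R_i^T A^{-1} R_j by scatter, solve, gather.

    Assumption: SciPy's sparse LU stands in for a sparse LL^T solver.
    """
    lu = splu(sp.csc_matrix(A, dtype=float))   # factor once
    n = A.shape[0]
    X = np.zeros((n, len(j)))
    X[j, np.arange(len(j))] = 1.0              # scatter: X = R_j, one unit vector per index in j
    Y = lu.solve(X)                            # solve: A Y = R_j
    return Y[i, :]                             # gather: R_i^T Y
```

Once tabulated, the small dense Z(i,j) can be reused across iterations without touching the sparse factor again.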

8 Cross Approximating Z(i,j) [1/2]
Alternately sample the row and the column with the largest error modulus.
(Figure: $\log_{10}$ of the error $Z - UV^T$, estimated vs. actual, alongside SVD(Z).)
Key idea: partialsolve() can efficiently extract a column or a row of Z:
c = Z([i],j) = solver.partialsolve([i],j,x=1.0,'left')
r = Z(i,[j]) = solver.partialsolve(i,[j],x=1.0,'right')
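Cross approximation needs only a way to fetch single rows and columns, which is what the partialsolve() calls provide. The sketch below is standard adaptive cross approximation with partial pivoting in NumPy; `get_row` and `get_col` are hypothetical callbacks standing in for the partialsolve() calls, and the stopping test uses the norm of the newest rank-1 term as the error estimate.

```python
import numpy as np

def aca(get_row, get_col, m, n, tol=1e-8, max_rank=None):
    """Adaptive cross approximation (partial pivoting): Z ~= U @ V.T.

    get_row(i) / get_col(j) return row i / column j of the m x n matrix Z.
    Assumption: these callbacks are hypothetical stand-ins for partialsolve().
    """
    max_rank = min(m, n) if max_rank is None else max_rank
    U, V = np.zeros((m, 0)), np.zeros((n, 0))
    used = np.zeros(m, dtype=bool)              # rows already used as pivots
    norm2 = 0.0                                 # running ||U V^T||_F^2
    i = 0
    for _ in range(max_rank):
        used[i] = True
        r = get_row(i) - U[i, :] @ V.T          # residual row i
        j = int(np.argmax(np.abs(r)))           # column with largest error modulus
        if r[j] == 0.0:
            break                               # residual row vanished; stop (simplification)
        v = r / r[j]
        u = get_col(j) - U @ V[j, :]            # residual column j
        # update ||U V^T||_F^2 for the new rank-1 term u v^T
        norm2 += 2.0 * np.sum((U.T @ u) * (V.T @ v)) + (u @ u) * (v @ v)
        U, V = np.column_stack([U, u]), np.column_stack([V, v])
        # estimated error: size of the term just added
        if np.linalg.norm(u) * np.linalg.norm(v) <= tol * np.sqrt(norm2):
            break
        # next row: largest |u| among rows not yet used
        i = int(np.argmax(np.where(used, 0.0, np.abs(u))))
    return U, V
```

Each iteration costs one row and one column of Z, so for a rank-k block only about k of each are ever extracted.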

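9 Cross Approximating Z(i,j) [2/2]
(Figure panels: 8, 6, and 4 digits of accuracy.)
Cross approximation beats solver.inverse() at large N, especially at low rank or loose tolerance. In parallel the gap narrows because of BLAS3 vs. BLAS1 effects: inverse() works on dense blocks, while cross approximation extracts one row or column at a time.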

10 Range estimation of Z(i,j) [1/2]
Apply the action of Z to random vectors X to form the image $Y = ZX$. If the singular values σ of Z decay rapidly, Y probably spans range(Z).
Pass 1, find Q spanning range(Z):
X = rand(Z.cols, k)
Y = Z.apply(X)
[Q, R, π] = QR(Y, 0)
Pass 2, build a k-SVD from Q:
W = Z^T Q   (so W^T = Q^T Z)
[U, Σ, V] = svd(W^T, 0)
Z ≈ (Q U) Σ V^T
(Figure: SVD(Z) vs. SVD(UV^T) for k = 4, 8, 16, 32.)
Key idea: partialsolve() can efficiently apply $Y = Z(i,j) X$:
Y = Z([i],[j])*X = solver.partialsolve([i],[j],X,'left')
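The two passes above are the usual randomized range finder followed by a small SVD. This minimal NumPy sketch assumes two hypothetical callbacks, `apply_Z` and `apply_ZT`, in place of partialsolve() applying Z and its transpose to a block of vectors; it also uses unpivoted `np.linalg.qr` rather than the pivoted QR on the slide.

```python
import numpy as np

def randomized_svd(apply_Z, apply_ZT, n_cols, k, seed=0):
    """Two-pass range estimation: Z ~= (Q U) diag(s) V^T with k columns.

    Assumption: apply_Z(X) = Z @ X and apply_ZT(Q) = Z.T @ Q are hypothetical
    stand-ins for blocked partialsolve() calls.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_cols, k))      # random test vectors
    Y = apply_Z(X)                            # pass 1: image Y = Z X
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for range(Y)
    W = apply_ZT(Q)                           # pass 2: W = Z^T Q, so W^T = Q^T Z
    U, s, Vt = np.linalg.svd(W.T, full_matrices=False)   # small k x n_cols SVD
    return Q @ U, s, Vt                       # Z ~= (Q U) diag(s) V^T
```

Because all k random vectors are known before the first solve, both passes can hand the solver a whole block of right-hand sides at once.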

11 Range estimation of Z(i,j) [2/2]
Same problem instances as before (sizes, shapes).
(Figure panels: 8, 6, and 4 digits of accuracy.)
Having all forcing data (the random vectors X) available up front leads to a speedup. Range estimation can be faster than a parallel solver.inverse(), even at modest N.

12 Ritz Projection of Z(i,j) [1/3]
What about approximating more than just one block? Candidate formats are (B)lock (L)ow (R)ank and (H)ierarchical matrices. Optimization (BLR) and amortization (H) opportunities do exist.

13 Ritz Projection of Z(i,j) [2/3]
(Figure: all of the exterior, partitioned into leaf groups G0, G1, G2, G3.)
First pass: find the row and column spans using a fat partialsolve() (k right-hand sides at once):
Y(3,0) = colspan Z(3,0)
Y(0,3) = colspan Z(0,3) = rowspan Z(3,0)
Second pass: Ritz projection using solver.schur(), then a k-SVD of the small k x k matrix R:
R = Y(3,0)^T Z(3,0) Y(0,3) = solver.schur(Y(3,0), Y(0,3))
[U, Σ, V] = svd(R)
Z(3,0) ≈ (Y(3,0) U) Σ (Y(0,3) V)^T
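The two passes can be sketched for a single block with NumPy. Assumptions: a dense array `Z30` stands in for Z(3,0), matrix products stand in for the fat partialsolve() of pass 1 and for solver.schur() in pass 2 (neither is called here), and the helper names `range_basis` and `ritz_block` are hypothetical. The bases are orthonormalized, so $Y_{30} Y_{30}^T Z_{30} Y_{03} Y_{03}^T$ reproduces the block once both spans are captured.

```python
import numpy as np

def range_basis(apply, n_in, k, rng):
    """Orthonormal basis for the image of k random vectors (pass 1)."""
    Q, _ = np.linalg.qr(apply(rng.standard_normal((n_in, k))))
    return Q

def ritz_block(Z30, k, seed=0):
    """Two-pass Ritz projection of one block: Z30 ~= (Y30 U) diag(s) (Y03 V)^T.

    Assumption: dense products stand in for partialsolve() and solver.schur().
    """
    rng = np.random.default_rng(seed)
    m, n = Z30.shape
    # pass 1: column span of Z(3,0), and column span of Z(0,3) = Z(3,0)^T
    Y30 = range_basis(lambda X: Z30 @ X, n, k, rng)
    Y03 = range_basis(lambda X: Z30.T @ X, m, k, rng)
    # pass 2: k x k Ritz matrix R = Y30^T Z(3,0) Y03, then its SVD
    R = Y30.T @ Z30 @ Y03
    U, s, Vt = np.linalg.svd(R)
    return Y30 @ U, s, Y03 @ Vt.T             # left factor, singular values, right factor
```

The expensive sparse work is confined to forming the two bases and the k x k matrix R; the SVD itself is tiny, which is what makes repeating this over many admissible blocks affordable.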

14 Ritz Projection of Z(i,j) [3/3]
Fill an H-matrix representation of Z restricted to the boundary.
(Figure: time per stage: Factor; Form Y [partialsolve]; Form B [schur, qr, svd]; Form Z [inverse]; one bar is labeled 1385 s.)
The algorithm quickly furnishes all (admissible) blocks. An H-matrix of $S = B^T A^{-1} B$ can be formed with a few minor changes.

15 Wrapping Up
Examined several uncommon sparse direct algorithms:
Partial linear solution: $x = R_i^T A^{-1} R_j b$ (sparse b, sifted x).
Schur complements: $B^T A^{-1} B$ and $B^T A^{-1} C$, all sparse.
Sampling the inverse operator: $Z(i,j) = R_i^T A^{-1} R_j$.
Used them as frontends for low-rank skeletonization:
Cross approximation: partialsolve() can extract rows and columns.
Range estimation: partialsolve() can apply Z(i,j) quickly.
Ritz projection: schur() + partialsolve(), with amortization over blocks.
These are essential tools for FE-BI/DDM methods (sparsity + low rank).

16 Contact
MyraMath: sparse factor/solve/schur/inverse/partialsolve.
MyraKL: BLAS/LAPACK API for MyraMath, or use MKL.
Free software (GPL) or dual license.
