Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager

2 Sparse Solvers component

- Intel MKL PARDISO: routine for solving a system of linear equations Ax = f with a sparse coefficient matrix A
- DSS: simplified interface to Intel MKL PARDISO
- CG: Conjugate Gradient iterative solver
- FGMRES: Flexible Generalized Minimum RESidual method
- Extended Eigensolver Routines (available since Intel MKL 11.0.2, not shown in the list)

Copyright 2013, Intel Corporation. All rights reserved.

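3 PARDISO* basics

Columns and rows of the matrix are permuted:
$$\tilde{A} = P A Q,$$
where P and Q are permutation matrices ($Q = P^{t}$ in the symmetric case). The permutation P is chosen to decrease the so-called fill-in of the factors during the factorization step.

Factorization:
$$\tilde{A} = L U \quad \text{or} \quad \tilde{A} = L L^{t} \quad \text{or} \quad \tilde{A} = L D L^{t}.$$

Solving triangular systems: with $g = P f$ and $y = Q^{t} x$, the system $A x = f$ becomes $\tilde{A} y = g$, which for $\tilde{A} = L U$ is solved as
$$L z = g, \qquad U y = z.$$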
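The effect of the permutation on fill-in can be illustrated with a small symbolic elimination. The sketch below is not how Intel MKL PARDISO chooses its ordering; the function `count_fill_in` and the "arrow" test pattern are hypothetical illustrations. It only counts how many new nonzeros appear when a symmetric pattern is eliminated in a given order.

```python
def count_fill_in(pattern, order):
    """Count new nonzero pairs created by symmetric elimination in the given order."""
    n = len(order)
    # Adjacency sets of the symmetric sparsity pattern (diagonal ignored).
    adj = {i: {j for j in range(n) if j != i and pattern[i][j]} for i in range(n)}
    eliminated = set()
    fill = 0
    for v in order:
        nbrs = [u for u in adj[v] if u not in eliminated]
        # Eliminating v couples all of its remaining neighbours pairwise.
        for idx, p in enumerate(nbrs):
            for q in nbrs[idx + 1:]:
                if q not in adj[p]:
                    adj[p].add(q)
                    adj[q].add(p)
                    fill += 1
        eliminated.add(v)
    return fill


n = 6
# Arrow pattern: dense first row and column plus the diagonal.
arrow = [[1 if (i == j or i == 0 or j == 0) else 0 for j in range(n)]
         for i in range(n)]

natural = list(range(n))               # dense row/column eliminated first
reordered = list(range(1, n)) + [0]    # dense row/column eliminated last

fill_natural = count_fill_in(arrow, natural)
fill_reordered = count_fill_in(arrow, reordered)
print(fill_natural, fill_reordered)
```

Eliminating the dense row and column first fills the whole remaining block, while moving it to the end creates no fill at all, which is the kind of gain a fill-reducing permutation aims for.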

5 Intel MKL PARDISO parameters: handle and matrix

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

- pt: handle to the Intel MKL PARDISO internal data structure
- n: size of the matrix
- a, ia, ja: arrays describing the coefficient matrix in the CSR storage format
  - a(nnz): one-dimensional array of the nonzero entries of the matrix
  - ia(n+1), ja(nnz): arrays of indices (see the description of the CSR format)
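As a rough illustration of the three CSR arrays, the following sketch builds a, ia and ja from a small dense matrix, using one-based indices as in the Fortran interface. The helper name `dense_to_csr` and the example matrix are hypothetical; they are not part of Intel MKL.

```python
def dense_to_csr(matrix):
    """Return (a, ia, ja) for a dense matrix, with one-based indices."""
    a, ia, ja = [], [1], []
    for row in matrix:
        for col, value in enumerate(row, start=1):
            if value != 0:
                a.append(value)   # nonzero value
                ja.append(col)    # its column index
        ia.append(len(a) + 1)     # position where the next row starts
    return a, ia, ja


matrix = [
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [2.0, 0.0, 5.0],
]
a, ia, ja = dense_to_csr(matrix)
print(a)   # nonzero values, row by row
print(ia)  # n+1 row pointers
print(ja)  # column index of each nonzero
```

Here ia has n+1 entries: ia(i) points at the first nonzero of row i, and the last entry equals nnz+1.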

6 Intel MKL PARDISO parameters: support for many matrices and RHS

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

- maxfct: maximum number of matrices with the same matrix structure (skeleton, portrait) to be solved by Intel MKL PARDISO
- mnum: number of the current matrix to work with
- nrhs: number of right-hand sides

7 Intel MKL PARDISO parameters: matrix types

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

mtype   description
  1     real and structurally symmetric
  2     real and symmetric positive definite
 -2     real and symmetric indefinite
  3     complex and structurally symmetric
  4     complex and Hermitian positive definite
 -4     complex and Hermitian indefinite
  6     complex and symmetric
 11     real and nonsymmetric
 13     complex and nonsymmetric

Important: depending on mtype, the matrix is provided in different formats. In particular, for a symmetric matrix only its upper triangle is required.
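For the symmetric matrix types, only the upper triangle (including the diagonal) is passed. A minimal sketch of extracting it from full one-based CSR arrays could look as follows; `upper_triangle_csr` is a hypothetical helper written for this illustration, not an Intel MKL routine.

```python
def upper_triangle_csr(a, ia, ja):
    """Keep only entries with column >= row from one-based CSR arrays."""
    n = len(ia) - 1
    ua, uia, uja = [], [1], []
    for row in range(1, n + 1):
        # Entries of this row occupy positions ia(row) .. ia(row+1)-1 (one-based).
        for k in range(ia[row - 1] - 1, ia[row] - 1):
            if ja[k] >= row:
                ua.append(a[k])
                uja.append(ja[k])
        uia.append(len(ua) + 1)
    return ua, uia, uja


# Full CSR storage of the symmetric matrix
# [4 1 0]
# [1 3 2]
# [0 2 5]
a = [4.0, 1.0, 1.0, 3.0, 2.0, 2.0, 5.0]
ia = [1, 3, 6, 8]
ja = [1, 2, 1, 2, 3, 2, 3]

ua, uia, uja = upper_triangle_csr(a, ia, ja)
print(ua, uia, uja)
```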

8 Intel MKL PARDISO parameters: solution phases

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

phase   Solver execution step
 11     Analysis
 12     Analysis, numerical factorization
 13     Analysis, numerical factorization, solve, iterative refinement
 22     Numerical factorization
 23     Numerical factorization, solve, iterative refinement
 33     Solve, iterative refinement
331     like phase=33, but only forward substitution
332     like phase=33, but only diagonal substitution (if available)
333     like phase=33, but only backward substitution
  0     Release internal memory for L and U of matrix number mnum
 -1     Release all internal memory for all matrices

Analysis means fill-reduction analysis and symbolic factorization.

NB: There is no direct access to the elements of the factors after the factorization step.
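The two-digit phase codes follow a simple pattern: the first digit is the first stage to run and the second digit is the last one. The sketch below decodes that pattern; the function `stages_for_phase` and the `STAGES` table are hypothetical helpers for illustration, and they deliberately do not cover the three-digit codes or the memory-release codes.

```python
STAGES = {
    1: "analysis",
    2: "numerical factorization",
    3: "solve, iterative refinement",
}


def stages_for_phase(phase):
    """List the stages run for a two-digit phase code (11, 12, 13, 22, 23, 33)."""
    first, last = divmod(phase, 10)
    if first not in STAGES or last not in STAGES or first > last:
        raise ValueError("not a two-digit stage range: %r" % (phase,))
    return [STAGES[s] for s in range(first, last + 1)]


print(stages_for_phase(13))
print(stages_for_phase(23))
```

A typical usage pattern that follows from the table is to run phase 11 once per sparsity structure, phase 22 once per set of numerical values, and phase 33 once per batch of right-hand sides.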

9 Intel MKL PARDISO parameters: permutations

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

perm: array of size n that contains a permutation vector. Depending on the values of some input parameters, it is
- an output parameter that contains information about the global permutations of the matrix A,
- an input parameter that lets the user apply his or her own permutation to the matrix A, or
- not used at all.

10 Intel MKL PARDISO parameters: RHS and solution

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

- b: array of (nrhs) right-hand sides. Depending on the values of some parameters, this array might be used to store the solution.
- x: array of (nrhs) solution vectors. Even if the solution is to be stored in the b array, x is needed as a work array.

11 Intel MKL PARDISO parameters: statistical info

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

- msglvl: message level
  - msglvl=0: silent mode
  - msglvl=1: the solver prints statistical info to the screen
- error: error flag. If error=0 on return, the routine worked correctly.

12 Intel MKL PARDISO parameters: iparm

call pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

iparm: input/output integer array of size 64.
- On input, it contains values of parameters that Intel MKL PARDISO needs to work correctly.
  - If iparm(1) = 0, all other values are set to their defaults.
  - If iparm(1) != 0, the user must supply all values of iparm(2) through iparm(64).
- On output, it contains some useful info.

13 Intel MKL PARDISO parameters: iparm(2)

iparm(2): fill-in reordering for the input matrix
- = 0: Minimum Degree Algorithm
- = 2: Nested Dissection Algorithm (default)
- = 3: Parallel Nested Dissection Algorithm

Important: The serial and parallel versions of the Nested Dissection Algorithm might provide different results!

14 Intel MKL PARDISO parameters: iparm(10)

iparm(10): pivoting perturbation. The parameter instructs Intel MKL PARDISO how to handle small or zero pivots during factorization. Small pivots are replaced with
$$\mathrm{eps} = \mathrm{sign}(l_{ii}) \cdot 10^{-\mathrm{iparm}(10)} \cdot \|A\|.$$
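The perturbation rule can be sketched in a few lines. This is only an illustration of the formula on the slide: the function name `perturb_pivot`, the test "|pivot| < eps" used to decide that a pivot is small, and the sample value 13 for iparm(10) are assumptions made for this example, not a description of the solver's internals.

```python
import math


def perturb_pivot(pivot, iparm10, norm_a):
    """Replace a small pivot by sign(pivot) * 10**(-iparm10) * ||A||."""
    eps = 10.0 ** (-iparm10) * norm_a
    # Assumption: a pivot counts as "small" when its magnitude is below eps.
    if abs(pivot) < eps:
        return math.copysign(eps, pivot)
    return pivot


tiny_positive = perturb_pivot(1e-20, 13, 1.0)
tiny_negative = perturb_pivot(-1e-20, 13, 2.0)
regular = perturb_pivot(0.5, 13, 1.0)
print(tiny_positive, tiny_negative, regular)
```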

15 Intel MKL PARDISO parameters: iparm(60)

iparm(60): switching between In-Core (IC) and Out-Of-Core (OOC) modes
- = 0: IC mode (default)
- = 1: Intel MKL PARDISO determines whether it is beneficial to switch to OOC mode
- = 2: OOC mode is turned on; the factors are stored on the hard disk, and the amount of RAM needed by Intel MKL PARDISO decreases significantly

Important: The efficiency of the solver may decrease significantly in OOC mode because of the data exchange between RAM and the hard disk. For this reason, let the factors go to disk only when they do not fit in RAM.

16 Direct Sparse Solver (DSS) interface (1/2)

Initialize the solver:
    call dss_create(handle, opt)
- handle (integer*8): pointer to the data structure storing internal DSS results
- opt (integer): parameter to pass DSS options

Delete all of the data structures:
    call dss_delete(handle, opt)

Define the locations of the non-zero elements in the matrix:
    call dss_define_structure(handle, opt, rowindex, nrows, ncols, columns, nnonzeros)
- rowindex, columns: define the locations of the non-zero elements in CSR format
- opt, values for real matrices: MKL_DSS_SYMMETRIC_STRUCTURE, MKL_DSS_SYMMETRIC, MKL_DSS_NON_SYMMETRIC
- opt, values for complex matrices: MKL_DSS_SYMMETRIC_STRUCTURE_COMPLEX, MKL_DSS_SYMMETRIC_COMPLEX, MKL_DSS_NON_SYMMETRIC_COMPLEX

17 Direct Sparse Solver (DSS) interface (2/2)

Compute or apply permutations to reduce fill-in:
    call dss_reorder(handle, opt, perm)
- opt: contains MKL_DSS_AUTO_ORDER, MKL_DSS_MY_ORDER or MKL_DSS_METIS_OPENMP_ORDER
- perm: contains the permutation vector (input or output depending on opt)

Compute the factorization of the matrix:
    call dss_factor_real(handle, opt, rvalues)
    call dss_factor_complex(handle, opt, cvalues)
- rvalues, cvalues: contain the non-zero matrix elements

NB: As in Intel MKL PARDISO, there is no direct access to the elements of the factors.

Compute the solution:
    call dss_solve_real(handle, opt, rrhsvalues, nrhs, rsolvalues)
    call dss_solve_complex(handle, opt, crhsvalues, nrhs, csolvalues)
- nrhs: number of right-hand sides
- rrhsvalues, crhsvalues: contain the components of the right-hand side vectors
- rsolvalues, csolvalues: contain the components of the solution vectors

NB: Partial solutions (forward or backward substitution) can be obtained, as in Intel MKL PARDISO, via special values of opt.

The DSS interface covers all main use cases of Intel MKL PARDISO, but the Intel MKL PARDISO interface is more flexible and allows fine tuning.

Intel MKL Extended Eigensolver Routines Prototype: http://www.ecs.umass.

19 FEAST* functionality

Solves the symmetric standard eigenproblem
$$A u = \lambda u, \quad A = A^{t}, \quad \lambda \in [\lambda_{\min}, \lambda_{\max}],$$
or the symmetric generalized eigenproblem
$$A u = \lambda B u, \quad A = A^{t}, \quad B = B^{t}, \quad \lambda \in [\lambda_{\min}, \lambda_{\max}].$$
Complex Hermitian matrices are also supported.

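20 FEAST basic math

For a contour in the complex z-plane that encloses the interval $[\lambda_{\min}, \lambda_{\max}]$, the contour integral
$$\rho(A) = \frac{1}{2\pi i} \oint (zI - A)^{-1}\, dz$$
is the orthogonal projection onto the invariant subspace of A corresponding to the eigenvalues within the contour.

Given a rectangular matrix X of M columns, compute
$$Y = \frac{1}{2\pi i} \oint (zI - A)^{-1} X\, dz = \rho(A) X.$$
If M is the dimension of the invariant subspace and X is in general position, orthogonalizing Y,
$$Y = Q R, \quad Q^{t} Q = I_{M},$$
provides a matrix Q whose columns form an orthonormal basis of that invariant subspace. The matrix
$$C = Q^{t} A Q$$
is a small M×M matrix, and its eigenvalues are the eigenvalues of A within the interval $[\lambda_{\min}, \lambda_{\max}]$.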

21 FEAST basic idea (1/2)

- Some value of M can be tried (supposing that M << N).
- X can be chosen randomly.
- The contour integral
  $$Y = \frac{1}{2\pi i} \oint (zI - A)^{-1} X\, dz$$
  can be approximated by a quadrature formula over points $z_1, \dots, z_k$ on the contour:
  $$Y \approx \frac{1}{2\pi i} \sum_{j=1}^{k} w_j\, (z_j I - A)^{-1} X,$$
  where $w_j$ are the weights.
- Computing $Z_j = (z_j I - A)^{-1} X$ is just solving the systems of linear equations $(z_j I - A) Z_j = X$ with M right-hand sides. However, the coefficient matrix becomes complex-valued even if A is real-valued.
- For a sparse matrix, PARDISO can solve these problems efficiently, using parallelism with respect to the right-hand sides and at the LAPACK and BLAS level.
- Moreover, the problems at each point on the contour can be solved independently (additional parallelism).
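To see why a handful of quadrature points is enough, it helps to apply the quadrature to a single eigenvalue: the result is close to 1 for eigenvalues inside the contour and close to 0 outside. The sketch below assumes a circular contour through λ_min and λ_max and a trapezoidal rule with equally spaced points; both choices, and the name `contour_filter`, are assumptions made for this illustration and need not match the rule used by the Intel MKL routines.

```python
import cmath
import math


def contour_filter(lam, lam_min, lam_max, points=8):
    """Quadrature approximation of (1/(2*pi*i)) * contour integral of dz / (z - lam)."""
    center = 0.5 * (lam_min + lam_max)
    radius = 0.5 * (lam_max - lam_min)
    total = 0j
    for j in range(points):
        # Point on the circle; the half-step offset keeps points off the real axis.
        e = cmath.exp(2j * math.pi * (j + 0.5) / points)
        z = center + radius * e
        # Trapezoidal weight with dz = i*radius*e*dtheta and the 1/(2*pi*i) factor folded in.
        w = radius * e / points
        total += w / (z - lam)
    return total.real


inside = contour_filter(2.2, 1.5, 2.5)
outside = contour_filter(3.5, 1.5, 2.5)
print(inside, outside)
```

Applied to a matrix, the same weights and points turn each term into one linear solve with the shifted matrix z_j I - A, which is where the sparse direct solver comes in.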

22 FEAST basic idea (2/2)

- Orthogonalize the columns of Y:
  $$Y = Q R, \quad Q^{t} Q = I_{M}.$$
  This is a LAPACK QR factorization.
- Compute the M×M matrix
  $$C = Q^{t} A Q.$$
  These are Sparse BLAS and BLAS operations.
- Compute its eigendecomposition
  $$C = U \Lambda U^{t}.$$
  This is LAPACK symmetric eigensolver functionality.
- Form the new matrix $X = Q U$ and repeat the steps, starting from computing Y by the quadrature formula, until convergence is achieved.

The stopping criterion is stabilization of the trace of the matrix C. It is also necessary to check that all eigenvalues are within the search interval. A change of M might be needed.
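The whole loop can be sketched for the simplest case M = 1, where the QR step reduces to normalizing a vector and C is the 1×1 Rayleigh quotient. This is a toy illustration under several assumptions, not the Intel MKL implementation: a circular contour with a trapezoidal rule, a dense pure-Python Gaussian elimination in place of PARDISO, a fixed number of sweeps instead of the trace-based stopping test, and hypothetical function names.

```python
import cmath
import math


def solve_complex(mat, rhs):
    """Solve mat * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(mat[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def feast_single_vector(a, lam_min, lam_max, x, points=8, sweeps=4):
    """FEAST-style iteration with M = 1 for a small dense symmetric matrix."""
    n = len(a)
    center = 0.5 * (lam_min + lam_max)
    radius = 0.5 * (lam_max - lam_min)
    for _ in range(sweeps):
        y = [0j] * n
        for j in range(points):
            e = cmath.exp(2j * math.pi * (j + 0.5) / points)
            z = center + radius * e
            shifted = [[(z if r == c else 0) - a[r][c] for c in range(n)]
                       for r in range(n)]
            sol = solve_complex(shifted, x)   # one linear solve per contour point
            w = radius * e / points           # weight including the 1/(2*pi*i) factor
            for r in range(n):
                y[r] += w * sol[r]
        y_real = [v.real for v in y]          # imaginary parts cancel for real symmetric a
        norm = math.sqrt(sum(v * v for v in y_real))
        x = [v / norm for v in y_real]        # QR step for a single column
    ax = [sum(a[r][c] * x[c] for c in range(n)) for r in range(n)]
    lam = sum(x[r] * ax[r] for r in range(n))  # 1x1 matrix C = x^t A x
    return lam, x


a = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
lam, vec = feast_single_vector(a, 1.5, 2.5, [1.0, 0.3, -0.2])
print(lam, vec)
```

The test matrix has eigenvalues 2 - sqrt(2), 2 and 2 + sqrt(2), so only the eigenvalue 2 lies in the search interval [1.5, 2.5], and the iteration should settle on it and on its eigenvector.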

24 Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright 2013, Intel Corporation. All rights reserved. Intel, the Intel logo, Intel Core, Intel Inside, the Intel Inside logo, Itanium, Itanium Inside, Pentium, Pentium Inside, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

26 Parameters: iparm(18), iparm(19)

iparm(18) reports the number of nonzero elements in the factors:
- < 0: enables reporting (default = -1)
- >= 0: disables reporting

iparm(19) reports the number of floating point operations necessary to factor the matrix:
- < 0: enables reporting; this may increase the reordering time
- >= 0: disables reporting

27 Parameters: iparm(28)

iparm(28): single or double precision
- = 0: the input arrays (a, x, b) must be presented in double precision (default). All internal computations are done in double precision.
- = 1: the input arrays (a, x, b) must be presented in single precision. All internal computations are done in single precision.

28 Parameters: iparm(5)

iparm(5): user permutation
- = 0: the user permutation in the perm array is ignored (default)
- = 1: Intel MKL PARDISO uses the user-supplied permutation from the perm array
- = 2: Intel MKL PARDISO returns the permutation vector computed at phase 1 in the perm array

Important: The efficiency of phases 2 and 3 depends greatly on the permutation. Think twice before providing your own permutation!

29 Parameters: iparm(6)

iparm(6) defines where to store the solution:
- = 0: the solution is stored in vector x (default); the right-hand side b is kept unchanged
- = 1: the solution is stored in vector b

Important: Even if iparm(6) = 1, array x is used.

30 Parameters: iparm(11)

iparm(11): scaling vectors. Intel MKL PARDISO uses a Maximum Weight Matching Algorithm to permute large elements onto the diagonal and to scale the matrix so that the diagonal elements are equal to 1 and the absolute values of the off-diagonal entries are less than or equal to 1.
- = 0: disable scaling (default for symmetric indefinite matrices)
- = 1: enable scaling (default for nonsymmetric matrices)

Important: If you use scaling, you must provide the numerical values of A in the analysis phase (phase=11).

31 Parameters: iparm(27)

iparm(27): matrix checker
- = 0: Intel MKL PARDISO does not check the sparse matrix representation for errors (default)
- = 1: Intel MKL PARDISO checks the integer arrays ia and ja. In particular, it checks whether the column indices are sorted in increasing order within each row.
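The kind of consistency check described here can be sketched independently of the library. The function below is a hypothetical illustration (`check_csr` is not an Intel MKL routine, and the set of checks is an assumption beyond the sorted-columns test named on the slide); it inspects one-based ia and ja arrays and returns a list of problems.

```python
def check_csr(n, ia, ja):
    """Return a list of problems found in one-based CSR index arrays."""
    problems = []
    if len(ia) != n + 1 or ia[0] != 1:
        return ["ia must have n+1 entries and start at 1"]
    if ia[n] - 1 != len(ja):
        problems.append("ia(n+1)-1 must equal the number of entries in ja")
    for row in range(n):
        start, end = ia[row] - 1, ia[row + 1] - 1
        if end < start:
            problems.append("row %d: ia is decreasing" % (row + 1))
            continue
        cols = ja[start:end]
        if any(c < 1 or c > n for c in cols):
            problems.append("row %d: column index out of range" % (row + 1))
        # Column indices must be strictly increasing within the row.
        if any(cols[k] >= cols[k + 1] for k in range(len(cols) - 1)):
            problems.append("row %d: column indices not sorted" % (row + 1))
    return problems


good = check_csr(3, [1, 3, 4, 6], [1, 3, 2, 1, 3])
bad = check_csr(3, [1, 3, 4, 6], [3, 1, 2, 1, 3])
print(good)
print(bad)
```

Running such a check once while debugging the matrix assembly is cheap compared with tracking down a failure inside the factorization.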
