Intel Math Kernel Library


1 Intel Math Kernel Library Release 7.0 March 2005

2 Intel MKL Purpose
Performance, performance, performance!
- Intel's scientific and engineering floating-point math library
- Initially only basic linear algebra subroutines (BLAS) and fast Fourier transforms (FFTs)
- Addresses:
  - Solvers, such as the linear algebra package (LAPACK) and BLAS
  - Eigenvector/eigenvalue solvers (BLAS, LAPACK)
  - Some quantum chemistry needs (dgemm)
  - PDEs, signal processing, seismic, solid-state physics (FFTs)
  - General scientific and financial work: vector transcendental functions (Vector Math Library, VML)
- Tuned for current and future Intel processors
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

3 Intel MKL Purpose: Don'ts
- Don't use Intel MKL to apply a 4x4 transformation matrix to an (X, Y, Z, W) vector
  - For geometric transformations you could use Intel IPP (Intel Integrated Performance Primitives) instead
- Don't use Intel MKL on small counts
- Don't call vector math functions on small n

4 Intel MKL Contents: BLAS (basic linear algebra subroutines)
- Level 1 BLAS, vector-vector operations: 15 function types, 48 functions
- Level 2 BLAS, matrix-vector operations: 26 function types, 66 functions
- Level 3 BLAS, matrix-matrix operations: 9 function types, 30 functions
- Extended BLAS, level 1 BLAS for sparse vectors: 8 function types, 24 functions

5 Intel MKL Contents
- LAPACK (linear algebra package)
  - Solvers and eigensolvers, hundreds of routines
  - More than 1000 user-callable and support routines
- FFTs (fast Fourier transforms)
  - One- and two-dimensional
  - With and without frequency ordering (bit reversal)
- VML (vector math library)
  - Set of vectorized transcendental functions
  - Most of the libm functions, but faster
- Direct sparse solver (PARDISO*)

6 Intel MKL Contents
- Most of Intel MKL has a Fortran interface
  - A legacy of high-performance computation
  - BLAS and LAPACK are both Fortran and make up most of the library
- The CBLAS interface is more convenient for C/C++ programmers calling BLAS

7 Intel MKL Contents: Environment
- Supports the cdecl and CVF default interfaces
- Supports the Intel and CVF Fortran compilers (this support matters mainly for the runtime libraries)
- Supports Linux* and Windows* OS
- Static and dynamically linked libraries
- Supports all processors, 32-bit and 64-bit
- Large set of tests and examples
- Extensive documentation

8 Threading
- Most of Intel MKL could be threaded, but the limited resource is memory bandwidth
- Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) )
- Numerous opportunities for threading:
  - Level 3 BLAS ( O(n^3) )
  - LAPACK ( O(n^3) )
  - FFTs ( O(n log n) )
  - VML? Depends on processor and function
- All threading uses OpenMP*
- All of Intel MKL is designed and compiled for thread safety
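The memory-bandwidth point on the threading slide can be illustrated with a rough count of floating-point operations per word of memory traffic. The sketch below is not part of Intel MKL: the function names are hypothetical, and the counts assume square n x n matrices, length-n vectors, and that each operand is read once and each result written once.

```python
def intensity(flops, words_moved):
    """Floating-point operations per word moved to or from memory."""
    return flops / words_moved


def blas_intensity(n):
    """Idealised flops-per-word ratio for one routine from each BLAS level.

    Assumption (not from the slides): every operand is read exactly once
    and every result is written exactly once, with no cache effects.
    """
    return {
        # y := alpha*x + y : 2n flops; read x, read y, write y
        "daxpy (level 1)": intensity(2 * n, 3 * n),
        # y := alpha*A*x + beta*y : about 2n^2 flops; A dominates the traffic
        "dgemv (level 2)": intensity(2 * n * n, n * n + 3 * n),
        # C := alpha*A*B + beta*C : about 2n^3 flops; read A, B, C, write C
        "dgemm (level 3)": intensity(2 * n ** 3, 4 * n * n),
    }
```

Under these assumptions, levels 1 and 2 stay below two operations per word for any n, so extra threads mostly wait on memory, while the level 3 ratio grows with n, which is consistent with the slide's claim that level 3 BLAS and LAPACK are the better threading targets.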

9 How to Link With MKL on Itanium
Set a path to the installation directory, e.g.
    export MKLPATH=/opt/intel/mkl
Static sample:
    ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a -L$MKLPATH -lguide -lpthread
  Static linking of LAPACK and the kernels for Itanium-based processors. The processor dispatcher calls the appropriate kernel for the system at run time.
Dynamic sample:
    ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
  Dynamic linking on Itanium-based platforms of the LAPACK library (double-precision functions) and the Itanium-based processor kernels. The shared-object dispatcher dynamically loads the shared object with the appropriate kernel for the system at run time.

10 BLAS Review
- 3 levels of functions + sparse
  - Level 1: vector-vector operations
  - Level 2: matrix-vector operations
  - Level 3: matrix-matrix operations
  - Sparse: level 1 operations on sparse vectors
- Levels follow history
  - Level 1 in the 1970s
  - Level 2 in the mid-1980s, followed shortly by level 3
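As an illustration of the three levels, here is one representative operation per level written as plain Python over lists. This is a reference sketch only, with none of the blocking or tuning a library such as Intel MKL applies. The function names echo the BLAS routines axpy, gemv and gemm, but these versions are hypothetical stand-ins that return new lists, whereas the real routines overwrite y or C in place.

```python
def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha*x + y. O(n) work."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, a, x, beta, y):
    """Level 2 (matrix-vector): return alpha*A*x + beta*y. O(n^2) work.

    `a` is a list of rows.
    """
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]


def gemm(alpha, a, b, beta, c):
    """Level 3 (matrix-matrix): return alpha*A*B + beta*C. O(n^3) work.

    `a`, `b` and `c` are lists of rows with compatible shapes.
    """
    cols_b = list(zip(*b))  # columns of B, so each dot product is row-by-column
    return [[alpha * sum(aik * bkj for aik, bkj in zip(row_a, col_b)) + beta * cij
             for col_b, cij in zip(cols_b, row_c)]
            for row_a, row_c in zip(a, c)]
```

For example, `gemv(1.0, [[1, 2], [3, 4]], [1, 1], 0.0, [5, 5])` multiplies the 2x2 matrix by the vector of ones and ignores y because beta is zero.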

11 BLAS Naming Conventions
General scheme: <precision><name><modifier>
- precision: one or two letters
  - 1 letter: input and output are the same type
    s = single, d = double, c = single complex, z = double complex
  - 2 letters: the routine mixes real and complex types
    sc, dz: complex input, real result (e.g. scasum, dznrm2)
    cs, zd: complex vector combined with a real scalar (e.g. csscal, zdscal)

12 BLAS Naming Conventions
Level 1 BLAS: <precision><name><modifier>, where the modifiers are
- c: conjugated (cdotc)
- u: unconjugated (cdotu)
- g: Givens (srotg)
Level 2, 3 BLAS <name>:
- g, general: ge = general; gb = band
- s, symmetric: sy = symmetric; sp = packed; sb = band
- h, Hermitian: he = Hermitian; hp = packed; hb = band
- t, triangular: tr = triangular; tp = packed; tb = band

13 BLAS Naming Conventions
Level 2 <modifier>
- mv: matrix-vector; sv: solve (vector operations); r: rank update; r2: rank-2 update
- Example, dger: double-precision general rank update, A := alpha * x * y^T + A
Level 3 <modifier>
- mm: matrix-matrix; sm: solve (matrix operations); rk: rank-k update; r2k: rank-2k update
- Example, dsyr2k: double-precision symmetric rank-2k update
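The naming scheme can be exercised with a small decoder. This is a sketch under stated assumptions, not an Intel MKL utility: the function name decode and the wording of the descriptions are hypothetical, and it handles only Level 2 and Level 3 names built from a one-letter precision, a two-letter matrix code, and one of the modifiers listed above. Level 1 names (daxpy, srotg) and conjugate variants such as zgerc are deliberately rejected.

```python
PRECISION = {
    "s": "single", "d": "double",
    "c": "single complex", "z": "double complex",
}

MATRIX = {
    "ge": "general", "gb": "general band",
    "sy": "symmetric", "sp": "symmetric packed", "sb": "symmetric band",
    "he": "Hermitian", "hp": "Hermitian packed", "hb": "Hermitian band",
    "tr": "triangular", "tp": "triangular packed", "tb": "triangular band",
}

OPERATION = {
    # Level 2 modifiers
    "mv": "matrix-vector product",
    "sv": "solve (vector right-hand side)",
    "r": "rank-1 update",
    "r2": "rank-2 update",
    # Level 3 modifiers; rk / r2k are the spellings in names like dsyrk, dsyr2k
    "mm": "matrix-matrix product",
    "sm": "solve (matrix right-hand side)",
    "rk": "rank-k update",
    "r2k": "rank-2k update",
}


def decode(name):
    """Describe a Level 2/3 BLAS name such as 'dgemv' or 'dsyr2k'.

    Raises ValueError for names outside the simplified scheme.
    """
    name = name.lower()
    precision, matrix, operation = name[:1], name[1:3], name[3:]
    if (precision not in PRECISION or matrix not in MATRIX
            or operation not in OPERATION):
        raise ValueError(f"not a recognised Level 2/3 BLAS name: {name!r}")
    return f"{PRECISION[precision]} {MATRIX[matrix]} {OPERATION[operation]}"
```

Calling `decode("dger")` splits the name into d, ge and r, matching the slide's reading of dger as a double-precision general rank update.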

14 RNG Functions
- Gaussian (RPM, Box-Muller methods)
- Exponential
- Laplace
- Uniform (a,b), (-a,a)
- Weibull
- Rayleigh
- Cauchy
- Lognormal
- Discrete Uniform [a,b)
- Geometric
- Bernoulli
- Others

15 DGEMM on IPF2: N = 1024
[Chart: DGEMM performance as % of peak on a 1.0 GHz Itanium 2 processor, 6.0 update. Data points not recoverable.]

16 LINPACK on 1.0 GHz IPF2
[Chart: MFLOPS versus number of equations for 1, 2 and 4 CPUs. Data points not recoverable.]

17 2D DFTs* on 900 MHz IPF2
[Chart: MFLOPS versus transform size; series labelled "P" and "2P" in the extracted text. MKL 6.0 β update. Data points not recoverable.]
*Single-precision complex

18 1D DFTs* on 900 MHz IPF
[Chart: MFLOPS versus transform size; series labelled "P" in the extracted text. MKL 6.0 β update. Data points not recoverable.]
*Single-precision complex

19 MKL Status, Plans
- Current production release is 7.2, available in 2 versions:
  - Standard MKL
  - Cluster MKL (Standard MKL + ScaLAPACK)
- Version to be released in Q1/2005: improvements on Itanium
  - BLAS DGEMM: 1-3% improvement for TN and TT cases
  - BLAS *TRMV, ZGERC, ZGERU: 20-30% improvement
  - VML vdpowx: improved for special cases
- To be released in Q3/2004

20 Future Releases of MKL
- New capabilities
  - C++ wrappers
  - Iterative sparse solver
  - LAPACK 4.0 support
  - Additional statistical functions
  - Support for upcoming Intel processors
- More Intel Cluster MKL
  - Distributed-memory DFTs
  - Distributed-memory sparse solver
  - Additional ScaLAPACK performance optimizations

21 MKL Summary
- An easy path to portable code for all Intel architectures, on Linux* and Windows*
- MKL for the Itanium processor: a path to easy high performance for applications
- Technical computation support
  - Linear algebra (BLAS, LAPACK)
  - FFTs
  - Vector transcendentals (VML)
- Cluster computing being added

22 Backup

23 ScaLAPACK Overview
What is ScaLAPACK?
- The ScaLAPACK (Scalable Linear Algebra PACKage) library includes a subset of LAPACK routines redesigned for distributed-memory parallel computers
- It lets numerical computing applications take advantage of compute power across the multiple nodes of a cluster
ScaLAPACK in Intel MKL 7.0
- Performance version of ScaLAPACK for clusters using Intel Pentium 4, Xeon and Itanium 2 processors
- API V1.7 (available on Linux* only)
- Supports MPICH and Myrinet* MPI (Message Passing Interface)
*Other names and brands may be claimed as the property of others.

24 Calling ScaLAPACK
- Syntax similar to LAPACK
- Conversion from LAPACK to ScaLAPACK:
    DGETRF(M, N, A(IA,JA), LDA, IPIV, INFO)
  becomes
    PDGETRF(M, N, A, IA, JA, DESCA, IPIV, INFO)
- DESCA is an integer array with 9 elements that describes how the matrix is distributed, including:
  - Cluster context
  - Size of the matrix
  - Size of the matrix blocks
  - Node on which the top-left element of the matrix is located
  - Leading dimension of the matrix fragment on that node
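The nine descriptor entries can be sketched in Python. This is an illustration only and does not call ScaLAPACK. The helper names numroc and make_desc are hypothetical stand-ins modelled on the ScaLAPACK tool routines NUMROC and DESCINIT, and the entry order (descriptor type, context, M, N, MB, NB, RSRC, CSRC, LLD) assumes the usual dense block-cyclic descriptor.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of a block-cyclically distributed dimension that
    land on process `iproc`.

    n: global extent, nb: block size, isrcproc: process holding the first
    block, nprocs: number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source
    nblocks = n // nb                              # number of whole blocks
    count = (nblocks // nprocs) * nb               # full rounds of blocks
    extra = nblocks % nprocs                       # leftover whole blocks
    if mydist < extra:
        count += nb        # this process gets one more whole block
    elif mydist == extra:
        count += n % nb    # this process gets the trailing partial block
    return count


def make_desc(ictxt, m, n, mb, nb, rsrc, csrc, lld):
    """Build a 9-element descriptor list for an m x n distributed matrix.

    Assumed layout: [type, context, M, N, MB, NB, RSRC, CSRC, LLD],
    with type 1 denoting a dense block-cyclic matrix.
    """
    if lld < 1:
        raise ValueError("local leading dimension must be at least 1")
    return [1, ictxt, m, n, mb, nb, rsrc, csrc, lld]
```

A process in grid row 0 of a 2-row grid holding a 10 x 10 matrix in 2 x 2 blocks would use `lld = max(1, numroc(10, 2, 0, 0, 2))` as the leading dimension of its local fragment and pass it to `make_desc`.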
