Intel Performance Libraries

Similar documents
Intel Math Kernel Library 10.3

Sergey Maidanov. Software Engineering Manager for Intel Distribution for Python*

Chao Yu, Technical Consulting Engineer, Intel IPP and MKL Team

Fastest and most used math library for Intel -based systems 1

Maximizing performance and scalability using Intel performance libraries

Intel Math Kernel Library ( Intel MKL )

Intel Math Kernel Library (Intel MKL) Overview. Hans Pabst Software and Services Group Intel Corporation

Intel Math Kernel Library (Intel MKL) Latest Features

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Intel Math Kernel Library. Getting Started Tutorial: Using the Intel Math Kernel Library for Matrix Multiplication

Using Intel Math Kernel Library with MathWorks* MATLAB* on Intel Xeon Phi Coprocessor System

Intel Math Kernel Library Perspectives and Latest Advances. Noah Clemons Lead Technical Consulting Engineer Developer Products Division, Intel

Intel Parallel Studio XE 2015

Intel Math Kernel Library

Scaling Out Python* To HPC and Big Data

Getting Reproducible Results with Intel MKL

Brief notes on setting up semi-high performance computing environments. July 25, 2014

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Achieving High Performance. Jim Cownie Principal Engineer SSG/DPD/TCAR Multicore Challenge 2013

Intel tools for High Performance Python 데이터분석및기타기능을위한고성능 Python

Intel Distribution for Python* и Intel Performance Libraries

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation

Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Some notes on efficient computing and high performance computing environments

Munara Tolubaeva Technical Consulting Engineer. 3D XPoint is a trademark of Intel Corporation in the U.S. and/or other countries.

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

High Performance Computing The Essential Tool for a Knowledge Economy

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit

Sarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018

Intel Math Kernel Library

Issues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

April 2 nd, Bob Burroughs Director, HPC Solution Sales

Introduction to Intel Xeon Phi programming techniques. Fabio Affinito Vittorio Ruggiero

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

FAST FORWARD TO YOUR <NEXT> CREATION

Investigation of Intel MIC for implementation of Fast Fourier Transform

Linear Algebra libraries in Debian. DebConf 10 New York 05/08/2010 Sylvestre

Kevin O Leary, Intel Technical Consulting Engineer

Intel Advisor XE Future Release Threading Design & Prototyping Vectorization Assistant

Intel Advisor XE. Vectorization Optimization. Optimization Notice

Hans Pabst, January 24 th 2013 Software and Services Group Intel Corporation

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015

Achieving Peak Performance on Intel Hardware. Intel Software Developer Conference London, 2017

Intel Many Integrated Core (MIC) Architecture

Intel Visual Fortran Compiler Professional Edition 11.0 for Windows* In-Depth

Graphics Performance Analyzer for Android

Running Docker* Containers on Intel Xeon Phi Processors

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Get Ready for Intel MKL on Intel Xeon Phi Coprocessors. Zhang Zhang Technical Consulting Engineer Intel Math Kernel Library

Bei Wang, Dmitry Prohorov and Carlos Rosales

Using the Intel Math Kernel Library (Intel MKL) and Intel Compilers to Obtain Run-to-Run Numerical Reproducible Results

Enabling Performance-per-Watt Gains in High-Performance Cluster Computing

IXPUG 16. Dmitry Durnov, Intel MPI team

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Intel MPI Library Conditional Reproducibility

PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures

CUDA Accelerated Compute Libraries. M. Naumov

Intel Software Development Products for High Performance Computing and Parallel Programming

INTEL MKL Vectorized Compact routines

Reusing this material

What s New August 2015

Evaluation of Intel Memory Drive Technology Performance for Scientific Applications

Speedup Altair RADIOSS Solvers Using NVIDIA GPU

Intel Xeon Processor E v3 Family

Intel Math Kernel Library

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Overview of Data Fitting Component in Intel Math Kernel Library (Intel MKL) Intel Corporation

Speeding up numerical calculations in Python *

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor

H.J. Lu, Sunil K Pandey. Intel. November, 2018

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

Intel Math Kernel Library

Installation Guide and Release Notes

What s P. Thierry

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

Intel MKL Sparse Solvers. Software Solutions Group - Developer Products Division

Intel Xeon Phi Coprocessor

Becca Paren Cluster Systems Engineer Software and Services Group. May 2017

Klaus-Dieter Oertel, May 28 th 2013 Software and Services Group Intel Corporation

Intel C++ Compiler Professional Edition 11.0 for Windows* In-Depth

Parallelism V. HPC Profiling. John Cavazos. Dept of Computer & Information Sciences University of Delaware

Intel Math Kernel Library for Windows*

Intel Math Kernel Library Getting Started Tutorial: Using the Intel Math Kernel Library for Matrix Multiplication

Intel C++ Compiler Professional Edition 11.1 for Mac OS* X. In-Depth

Intel Cluster Checker 3.0 webinar

How to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić

Achieve Better Performance with PEAK on XSEDE Resources

LS-DYNA Performance on Intel Scalable Solutions

Message Passing Interface (MPI) on Intel Xeon Phi coprocessor

Vectorization Advisor: getting started

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Double Rewards of Porting Scientific Applications to the Intel MIC Architecture

Introducing the Intel Xeon Phi Coprocessor Architecture for Discovery

HPCG Results on IA: What does it tell about architecture?

Scientific Computing. Some slides from James Lambers, Stanford

Intel C++ Compiler Professional Edition 11.0 for Linux* In-Depth

Intel Software Development Products Licensing & Programs Channel EMEA

Using Intel VTune Amplifier XE for High Performance Computing

Transcription:

Intel Performance Libraries

Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation Speeds math processing in scientific, engineering and financial applications Functionality for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, summary statistics and more Provides scientific programmers and domain scientists Interfaces to de-facto standard APIs from C++, Fortran, C#, Python and more Support for Linux*, Windows* and OS X* operating systems The ability to extract great parallel performance with minimal effort Unleash the performance of Intel Core, Intel Xeon and Intel Xeon Phi product families Optimized for single core vectorization and cache utilization Coupled with automatic OpenMP*-based parallelism for multi-core, manycore and coprocessors Scales to PetaFlop (1015 floating-point operations/second) clusters and beyond Included in Intel Parallel Studio XE Suites **http://www.top500.org Used on the World s Fastest Supercomputers** 2

Intel Math Kernel Library is a Computational Math Library Mathematical problems arise in many scientific disciplines Energy Signal Processing These scientific applications areas typically involve mathematics... Differential equations Linear algebra Fourier transforms Statistics Financial Analytics Engineering Design u2 x 2 u2 y 2 + q u = f x, y Digital Content Creation Intel MKL can help solve your computational challenges Science & Research 3

Optimized Mathematical Building Blocks Intel Math Kernel Library Linear Algebra BLAS LAPACK Sparse Solvers Iterative Pardiso* ScaLAPACK Vector RNGs Congruential Wichmann-Hill Mersenne Twister Sobol Neiderreiter Non-deterministic Fast Fourier Transforms Multidimensional FFTW interfaces Cluster FFT Summary Statistics Kurtosis Variation coefficient Order statistics Min/max Variance-covariance Vector Math Trigonometric Hyperbolic Exponential, Log Power / Root And More Splines Interpolation Trust Region Fast Poisson Solver 4

Intel Math Kernel Library is a Performance Library We go to extremes to get the most performance from the available resources Core: vectorization, prefetching, cache utilization Multicore (processor/socket) level parallelization Multi-socket (node) level parallelization Cluster scaling Data locality is key Sequential Intel MKL MKL + OpenMP MKL + Intel MPI Many Core Intel Xeon Phi TM Coprocessor Automatic scaling from multicore to manycore and beyond 5

Performance (GFlops) Immediate Performance Benefit to Applications Intel Math Kernel Library 350 300 250 200 150 100 50 0 Significant LAPACK Performance Boost using Intel Math Kernel Library versus ATLAS* DGETRF on Intel Xeon E5-2690 Processor 2000 3000 4000 5000 10000 15000 20000 25000 30000 35000 40000 45000 Matrix Size Intel MKL provides significant performance boost over ATLAS* Intel MKL - 16 threads Intel MKL - 8 threads ATLAS - 16 threads ATLAS - 8 threads Configuration: Hardware: CPU: Dual Intel Xeon E5-2697v2@2.70Ghz; 64 GB RAM. Interconnect: Mellanox Technologies* MT27500 Family [ConnectX*-3] FDR.. Software: RedHat* RHEL 6.2; OFED 3.5-2; Intel MPI Library 5.0 Intel MPI Benchmarks 3.2.4 (default parameters; built with Intel C++ Compiler XE 13.1.1 for Linux*); Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation : Intel s compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804. The latest version of Intel MKL unleashes the performance benefits of Intel architectures 6

Notable New Features Intel Math Kernel Library As always, optimized for the latest Intel Xeon processors Intel Xeon Phi Coprocessor support native compiler assisted offload automatic offload Conditional Numerical Reproducibility consistent results from run-to-run controls provided to ensure consistency across processor families LAPACKE Interfaces extends the de-factor Fortran LAPACK APIs callable from C and with support for rowmajor storage Extended Eigensolvers solves standard and generalized sparse symmetric/hermitian Eigenproblems based on and compatible with FEAST* http://www.ecs.umass.edu/~polizzi/feast / Available since Intel MKL 11.0 7

Intel Xeon Phi Coprocessor Support Intel Math Kernel Library (Intel MKL) Automatic Offload No code changes required Automatically uses both host and target Transparent data transfer and execution management Compiler Assisted Offload Explicit controls for data transfer and remote execution using compiler pragma offload Can be used together with Automatic Offload Native Execution Uses the coprocessors as independent nodes Input data and binaries are copied to targets in advance 8

Conditional Numerical Reproducibility Intel Math Kernel Library (Intel MKL) The cause for a variation in results With floating-point numbers, the order of computation matters! Remember that associativity does not always hold, that is, (a+b)+c a+(b+c) 2-63 + 1 + -1= 2-63 (infinitely precise result) ( 2-63 + 1 ) + -1= 0 (correct IEEE double precision result) 2-63 + ( 1 + -1) = 2-63 (correct IEEE double precision result) Intel MKL 11.0 brought fixed code path options for aligned data and deterministic scheduling Example: attain identical results on every Intel processor supporting AVX instructions or later function call: mkl_cbwr_set(mkl_cbwr_avx) environment variable: set MKL_CBWR_BRANCH= AVX Intel MKL 11.1 removed the data alignment restriction 9

New Features Intel Math Kernel Library (Intel MKL) 11.2 The Cluster Direct Sparse Solver extends the capabilities of Intel MKL PARDISO, enabling users to solve large distributed sparse systems of equations on clusters Small Matrix Multiply performance improvements deliver performance boosts of 1.3 times on average for small problem sizes (less than 20x20) Support for the next generation Intel Advanced Vector Extensions 512 (Intel AVX- 512) instruction set with optimizations in BLAS, DFT and VML A Intel MKL cookbook that provides step-by-step recipes to solve common mathematical problems using existing library functions Verbose mode allows users to understand how Intel MKL is used in your program provides detailed Intel MKL versioning information identifies the library functions called and the parameters passed to them returns the amount of time spent in each function call 10

Futures Intel Math Kernel Library (Intel MKL) Technology Previews Available Intel MKL Sparse Matrix Vector Multiply Prototype Package improved sparse matrix-vector multiply for Intel Xeon Phi coprocessor 2-stage API for sparse BLAS (analyze and execute) support for a new Ellpack Sparse Block (ESB) format Intel Optimized Technology Preview for High Performance Conjugate Gradient Benchmark proposed to supplement the current High Performance Linpack Benchmark designed to be more representative of common application workloads Packages available contact intel.mkl@intel.com We seek your insights into defining new Intel MKL features 11

DGEMM Small Matrix Improvements Intel Math Kernel Library 11.2 12

Future Directions: SpMV Prototype Format Package Intel Math Kernel Library (Intel MKL) Improved sparse matrix-vector multiply for Intel Xeon Phi 2-stage API for Sparse BLAS (analyze and execute) Ellpack Sparse Block (ESB) format Package available contact intel.mkl@intel.com Seek industry feedback on API and performance 13