Optimizing the operations with sparse matrices on Intel architecture

Similar documents
PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

Intel Cache Acceleration Software for Windows* Workstation

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

High Performance Dense Linear Algebra in Intel Math Kernel Library (Intel MKL)

LED Manager for Intel NUC

Software Evaluation Guide for WinZip* esources-performance-documents.html

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Intel vpro Technology Virtual Seminar 2010

Intel vpro Technology Virtual Seminar 2010

How to Create a.cibd File from Mentor Xpedition for HLDRC

INTEL PERCEPTUAL COMPUTING SDK. How To Use the Privacy Notification Tool

Bitonic Sorting Intel OpenCL SDK Sample Documentation

How to Create a.cibd/.cce File from Mentor Xpedition for HLDRC

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

Intel Desktop Board DZ68DB

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Installation Guide and Release Notes

Installation Guide and Release Notes

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Intel & Lustre: LUG Micah Bhakti

Intel Stereo 3D SDK Developer s Guide. Alpha Release

Intel Desktop Board DP55SB

Intel Desktop Board DG41CN

Intel Atom Processor E3800 Product Family Development Kit Based on Intel Intelligent System Extended (ISX) Form Factor Reference Design

Theory and Practice of the Low-Power SATA Spec DevSleep

IEEE1588 Frequently Asked Questions (FAQs)

Data Center Energy Efficiency Using Intel Intelligent Power Node Manager and Intel Data Center Manager

Intel RealSense Depth Module D400 Series Software Calibration Tool

Intel Cluster Toolkit Compiler Edition 3.2 for Linux* or Windows HPC Server 2008*

Intel Desktop Board DG31PR

Introduction. How it works

Drive Recovery Panel

Intel Desktop Board DG41RQ

Software Evaluation Guide for WinZip 15.5*

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Intel Desktop Board D945GCLF2

Intel Desktop Board D975XBX2

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel RealSense D400 Series Calibration Tools and API Release Notes

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

Software Evaluation Guide for Photodex* ProShow Gold* 3.2

Overview of Intel MKL Sparse BLAS. Software and Services Group Intel Corporation

Intel Desktop Board D945GCCR

Intel 945(GM/GME)/915(GM/GME)/ 855(GM/GME)/852(GM/GME) Chipsets VGA Port Always Enabled Hardware Workaround

Intel Desktop Board D946GZAB

Installation Guide and Release Notes

Software Evaluation Guide for CyberLink MediaEspresso *

Intel Atom Processor D2000 Series and N2000 Series Embedded Application Power Guideline Addendum January 2012

Intel Core TM i7-4702ec Processor for Communications Infrastructure

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012

Solid-State Drive System Optimizations In Data Center Applications

Intel Desktop Board DH55TC

Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003

The Intel Processor Diagnostic Tool Release Notes

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Intel Desktop Board DP45SG

Intel Core TM Processor i C Embedded Application Power Guideline Addendum

Intel Desktop Board DQ35JO

Intel vpro Technology Virtual Seminar 2010

Intel Cache Acceleration Software - Workstation

Intel USB 3.0 extensible Host Controller Driver

Upgrading Intel Server Board Set SE8500HW4 to Support Intel Xeon Processors 7000 Sequence

Software Evaluation Guide Adobe Premiere Pro CS3 SEG

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors

Intel G31/P31 Express Chipset

Device Firmware Update (DFU) for Windows

Intel Desktop Board D945GCLF

Version 1.0. Intel-powered classmate PC Arcsoft WebCam Companion 3* Training Foils. *Other names and brands may be claimed as the property of others.

Intel Platform Controller Hub EG20T

Intel Desktop Board DH61SA

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3

Intel Many Integrated Core (MIC) Architecture

Intel Desktop Board D845PT Specification Update

Intel s Architecture for NFV

Installation Guide and Release Notes

Software Evaluation Guide for Sony Vegas Pro 8.0b* Blu-ray Disc Image Creation Burning HD video to Blu-ray Disc

Intel and Badaboom Video File Transcoding

Krzysztof Laskowski, Intel Pavan K Lanka, Intel

SDK API Reference Manual for VP8. API Version 1.12

Intel Desktop Board DP67DE

Mobility: Innovation Unleashed!

Product Change Notification

Intel Platform Controller Hub EG20T

Migration Guide: Numonyx StrataFlash Embedded Memory (P30) to Numonyx StrataFlash Embedded Memory (P33)

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Product Change Notification

2013 Intel Corporation

Product Change Notification

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Intel Desktop Board DH61CR

Intel IT Director 1.7 Release Notes

Configuring Intel Compute Stick STK2MV64CC/L for Intel AMT

Intel Open Source HD Graphics Programmers' Reference Manual (PRM)

Intel Parallel Amplifier Sample Code Guide

Transcription:

Optimizing the operations with sparse matrices on Intel architecture Gladkikh V. S. victor.s.gladkikh@intel.com Intel Xeon, Intel Itanium are trademarks of Intel Corporation in the U.S. and other countries. 1/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Technical Collateral Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site 2/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Agenda Overview of sparse BLAS Performance of different formats Performance factors Summary 3/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Sparse Matrix Storage formats Intel Math Kernel Library (Intel MKL) supports 6 sparse matrix storage format (CSR, CSC, COO, DIA, SKY, BSR): COO: The coordinate format is the most flexible and simplest format for the sparse matrix representation. Only nonzero entries are provided, and the coordinates of each nonzero entry are given explicitly. CSR: Compress Sparse Row format - The compression of the non-zeros of a sparse matrix A into a linear array is done by walking across each row in order, and writing the non-zero elements to a linear array. BSR: This format is similar to the CSR format. Nonzero entries in the BSR are square dense blocks. 4/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Sparse BLAS functionality Intel MKL Sparse BLAS is a group of computational kernels optimized for Intel architectures that perform operations: 1. Sparse matrix-vector/matrix-matrix product: C <- alpha* op(a)* B + beta * C C <- op(a)* B (Level 2 only) 2. Solving a triangular system or C <- alpha*inverse of ( op(a) ) B or C <- inverse of ( op(a) ) B (Level 2 only) where C and B are dense matrices or vectors, A is a sparse matrix, alpha and beta are scalars, op(a) is one of the possible operations: op(a)=a or op(a)= transpose of A. 5/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Scaling of sparse matrix-matrix multiply Ratio MKL_DCSRMM (MKL 10.0 Gold) 8 7 6 5 4 3 2 1 0 airfoil_2d.mtx Pres_Poisson.mtx FEM_3D_thermal1.mtx bcsstk36.mtx std1_jac2.mtx ct20stif.mtx venkat01.mtx matrix nd6k.mtx Dual-Core Intel Itanium 2 processor 9050 1.6GHz, 24MB L3 Cache laminar_duct3d.mtx time 1th/2th time 1th/4th time 1 th/8th Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations 6/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Performance of symmetric matrix-vector multiplication COO vs CSR vs BSR (MKL 10.0 Gold) 2 x Quad-Core Intel Xeon 5300 Series Processor 2.4GHz, 8MB L2 Cache 7 6 time mkl_dcoosymv/mkl_dcsrsymv time mkl_dcoosymv/mkl_dbsrsymv 5 Ratio 4 3 2 1 0 50676-block-2 101384-block-2 151940-block-2 76014-block-3 152076-block-3 227910-block-3 126015-block-5 253800-block-5 503625-block-5 126690-block-5 253460-block-5 505940-block-5 Matrix Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations 7/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Performance of symmetric matrix-vector multiplication (+converters) 3 2.5 COO vs CSR vs BSR (MKL 10.0 Gold) time mkl_dcoosymv/mkl_dcsrsymv time mkl_dcoosymv/mkl_bsrsymv 2 x Quad-Core Intel Xeon 5300 Series Processor 2.4GHz, 8MB L2 Cache 2 Ratio 1.5 1 0.5 0 33691 56197 86983 127345 178579 241981 318847 410473 786871 1830361 3534451 6061141 9572431 14230321 20196811 27633901 36703591 47567881 60388771 75328261 92548351 number of nonzeros Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations 8/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Performance factors Hardware characteristics (cache size, FSB bandwidth and etc) impact on the performance of Sparse BLAS routines. Proper choice of sparse matrix format can increase performance and scalability of your application. Memory bandwidth plays important role and impacts on performance and scalability of your application. Good matrix structure (matrix bandwidth, sparsity, ) can increase the performance of application. 9/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Thus, Intel MKL Sparse BLAS: Support for wide range of sparse formats and operations Optimization for Intel platforms Open MP support Static and dynamic Efficient and fast solution for high- performance application design 10/11 5-12 Oct 2008 Differential Equations. Function Spaces.

Q&A 11/11 5-12 Oct 2008 Differential Equations. Function Spaces.