HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing. Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF

Similar documents
Opportunities and Challenges in Sparse Linear Algebra on Many-Core Processors with High-Bandwidth Memory

HPCG Results on IA: What does it tell about architecture?

Optimizing the operations with sparse matrices on Intel architecture

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Installation Guide and Release Notes

Intel Many Integrated Core (MIC) Architecture

Bitonic Sorting. Intel SDK for OpenCL* Applications Sample Documentation. Copyright Intel Corporation. All Rights Reserved

Bitonic Sorting Intel OpenCL SDK Sample Documentation

IXPUG 16. Dmitry Durnov, Intel MPI team

What s P. Thierry

PARDISO - PARallel DIrect SOlver to solve SLAE on shared memory architectures

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Installation Guide and Release Notes

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

INTEL PERCEPTUAL COMPUTING SDK. How To Use the Privacy Notification Tool

Intel Desktop Board DZ68DB

Intel Parallel Studio XE 2011 for Windows* Installation Guide and Release Notes

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

Drive Recovery Panel

Using the Intel VTune Amplifier 2013 on Embedded Platforms

Intel Cache Acceleration Software for Windows* Workstation

Installation Guide and Release Notes

Intel Core TM i7-4702ec Processor for Communications Infrastructure

Intel Atom Processor E6xx Series Embedded Application Power Guideline Addendum January 2012

Desktop 4th Generation Intel Core, Intel Pentium, and Intel Celeron Processor Families and Intel Xeon Processor E3-1268L v3

Intel Stereo 3D SDK Developer s Guide. Alpha Release

Intel Core TM Processor i C Embedded Application Power Guideline Addendum

Software Evaluation Guide for WinZip* esources-performance-documents.html

LIBXSMM Library for small matrix multiplications. Intel High Performance and Throughput Computing (EMEA) Hans Pabst, March 12 th 2015

Intel Atom Processor D2000 Series and N2000 Series Embedded Application Power Guideline Addendum January 2012

Intel Parallel Studio XE 2011 SP1 for Linux* Installation Guide and Release Notes

Ravindra Babu Ganapathi

Intel Desktop Board DG41CN

Intel SDK for OpenCL* - Sample for OpenCL* and Intel Media SDK Interoperability

How to Create a.cibd File from Mentor Xpedition for HLDRC

High Performance Computing The Essential Tool for a Knowledge Economy

How to Create a.cibd/.cce File from Mentor Xpedition for HLDRC

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

Intel Desktop Board DG31PR

Intel Desktop Board D945GCCR

Intel Parallel Studio XE 2015 Composer Edition for Linux* Installation Guide and Release Notes

Intel Xeon Processor E v3 Family

Intel Desktop Board DG41RQ

Intel Parallel Studio XE 2011 for Linux* Installation Guide and Release Notes

Software Evaluation Guide for Photodex* ProShow Gold* 3.2

Intel Desktop Board D946GZAB

Intel Desktop Board D975XBX2

LED Manager for Intel NUC

Intel Desktop Board D945GCLF2

Using Intel Inspector XE 2011 with Fortran Applications

SELINUX SUPPORT IN HFI1 AND PSM2

Intel 945(GM/GME)/915(GM/GME)/ 855(GM/GME)/852(GM/GME) Chipsets VGA Port Always Enabled Hardware Workaround

Vectorization Advisor: getting started

Software Evaluation Guide for CyberLink MediaEspresso *

Sarah Knepper. Intel Math Kernel Library (Intel MKL) 25 May 2018, iwapt 2018

Ernesto Su, Hideki Saito, Xinmin Tian Intel Corporation. OpenMPCon 2017 September 18, 2017

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003

Intel Desktop Board DP55SB

Software Evaluation Guide for WinZip 15.5*

Intel s Architecture for NFV

Intel Direct Sparse Solver for Clusters, a research project for solving large sparse systems of linear algebraic equation

Small File I/O Performance in Lustre. Mikhail Pershin, Joe Gmitter Intel HPDD April 2018

Enabling DDR2 16-Bit Mode on Intel IXP43X Product Line of Network Processors

Installation Guide and Release Notes

LNet Roadmap & Development. Amir Shehata Lustre * Network Engineer Intel High Performance Data Division

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Expand Your HPC Market Reach and Grow Your Sales with Intel Cluster Ready

IFS RAPS14 benchmark on 2 nd generation Intel Xeon Phi processor

H.J. Lu, Sunil K Pandey. Intel. November, 2018

Contributors: Surabhi Jain, Gengbin Zheng, Maria Garzaran, Jim Cownie, Taru Doodi, and Terry L. Wilmarth

Krzysztof Laskowski, Intel Pavan K Lanka, Intel

Intel technologies, tools and techniques for power and energy efficiency analysis

Theory and Practice of the Low-Power SATA Spec DevSleep

Upgrading Intel Server Board Set SE8500HW4 to Support Intel Xeon Processors 7000 Sequence

Intel Desktop Board D945GCLF

Intel vpro Technology Virtual Seminar 2010

ECC Handling Issues on Intel XScale I/O Processors

Intel Desktop Board DP45SG

Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel Xeon Phi Processor

Innovating and Integrating for Communications and Storage

The Intel Processor Diagnostic Tool Release Notes

Intel Desktop Board DH61SA

Software Evaluation Guide Adobe Premiere Pro CS3 SEG

Fastest and most used math library for Intel -based systems 1

Intel Desktop Board D845PT Specification Update

Product Change Notification

Intel Integrated Native Developer Experience 2015 Build Edition for OS X* Installation Guide and Release Notes

Intel Desktop Board DQ35JO

Software Evaluation Guide for Sony Vegas Pro 8.0b* Blu-ray Disc Image Creation Burning HD video to Blu-ray Disc

Intel Math Kernel Library (Intel MKL) Sparse Solvers. Alexander Kalinkin Intel MKL developer, Victor Kostin Intel MKL Dense Solvers team manager

Intel Desktop Board DH55TC

Techniques for Lowering Power Consumption in Design Utilizing the Intel EP80579 Integrated Processor Product Line

Intel & Lustre: LUG Micah Bhakti

Intel vpro Technology Virtual Seminar 2010

OpenMP * 4 Support in Clang * / LLVM * Andrey Bokhanko, Intel

Software Occlusion Culling

Intel Desktop Board DP67DE

Transcription:

HPCG on Intel Xeon Phi 2 nd Generation, Knights Landing Alexander Kleymenov and Jongsoo Park Intel Corporation SC16, HPCG BoF 1

Outline KNL results Our other work related to HPCG 2

~47 GF/s per KNL ~10 GF/s per HSW 3

Single-Node KNL 72c Xeon Phi 7290 68c Xeon Phi 7250 64c Xeon Phi 7210 Perf. (GFLOP/s) 51.3 (flat mode) 49.4 (flat mode), 13.8 (DDR) 46.7 (flat mode) Cache mode provides a similar performance (~3% drop) MCDRAM provides >3.5x performance than DDR Easier to use than KNC Less reliance on software prefetching 2 threads per core is enough to get the best performance Smaller gap between SpMV using CSR and SELLPACK for U of Florida Matrix Collection n=192 usually gives the best results. All results are measured with quad cluster mode and code from https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite 4

Multi-Node KNL Each node in flat/quad mode connected with Omni-Path fabric (OPA) 5

Outline IA result updates Our other work related to HPCG 6

Related Work (1) Library MKL inspector-executor sparse BLAS routines https://software.intel.com/en-us/articles/intel-math-kernel-libraryinspector-executor-sparse-blas-routines SpMP open source library (https://github.com/jspark1105/spmp ) BFS/RCM reordering, task graph construction of SpTrSv and ILU, Optimizing AMG in HYPRE library Included from HYPRE 2.11.0 7

Related Work (2) Compiler Automating Wavefront Parallelization for Sparse Matrix Computations, Venkat et al., SC 16 12-core Xeon E5-2695 v2, ILU0 pre-conditioner, speedups include inspection overhead time 8

Related Work (3) Script Language Sparso: Context-driven Optimizations of Sparse Linear Algebra, Rong et al., PACT 16, https://github.com/intellabs/sparso 14-core Xeon E5-2697 v3, Julia with Sparso package 9

Notice and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. All products, dates, and figures are preliminary for planning purposes and are subject to change without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. The Intel products discussed herein may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com. Intel Itanium, Intel Xeon, Xeon Phi, Pentium, Intel SpeedStep and Intel NetBurst, Intel, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.. 10

Notice and Disclaimers Continued Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Intel's compilers may or may not optimize to the same degree for non-intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 11

12 Q&A