What's new @ Intel. P. Thierry

Transcription:

What's new @ Intel
P. Thierry, Principal Engineer, Intel Corp, philippe.thierry@intel.com
Agenda (30 min): CPU trend, Memory update, Software Characterization

10,000-foot view
CPU: range of a few TF/s and <200 GB/s per node. Large memory footprint. Standard programming model (PM).
MIC: bootable. Hundreds of threads. 3 TF/s DP, 500 GB/s HBM. Standard PM.
FPGA: discrete and with Xeon CPU. 1.5 TF/s A10 (now).
GenGraphics: single-socket CPU. 45 W. 1.5 GF/s SP. DDR4 and eDRAM. OpenMP and OpenCL.
ASIC: > 50 Tops/s, HBM, low-level PM for now.
+ SSD/NVM + parallel file system + HPC software stack, compilers, math libraries

10,000-foot view: 2017 to 2018 [roadmap figure]
Workload optimized:
  Nervana processor: Nervana, Buena Vista, Crest; Nervana Engine (https://www.nervanasys.com/technology/engine/)
  Altera FPGA: Arria 10 FPGA; Stratix 10 FPGA
General-purpose infrastructure:
  Intel Xeon Phi: Knights Landing (Groveport platform); Knights Mill (Groveport platform)
  Intel Xeon SP (2S): Skylake (Purley platform)
Patented FlexPoint precision for maximum throughput and high accuracy
Over Tera b/s of inter/intra connectivity for optimal scaling
Standard PCIe Gen3 x16

Storage evolution [figure: memory and storage hierarchy, today vs. tomorrow]
Tiers shown: caches; on-package memory (HBM); memory (DRAM); storage class memory (3D XPoint, 3D XPoint DIMM); storage (PCIe SSD, SSD, HDD)
NAND-based Intel P3700 (Fultondale) for NVMe

3D XPoint: DIMM form factor
Flexible, usage-specific partitions
Server non-volatile memory pool [figure: IMC channels populated with Apache Pass* and DDR4 DRAM*]
DRAM, or DRAM as cache
Memory, Storage, App Direct
DDR4 electrical & physical
Close to DRAM latency

Software: compiler, MKL, IPP, MPI, OpenMP, OpenCL; VTune, Advisor, ITAC, Inspector...
Agenda (5 min): CPU trend, Memory update, Software Characterization

Free download: http://www.intel.com/performance-snapshot. Also included with Intel Parallel Studio Cluster Edition.
Intel Confidential, for use with customers under NDA. Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright 2015, Intel Corporation.

Storage performance snapshot

Intel Advisor - Vectorization Advisor
The data and guidance you need:
  Compiler diagnostics + performance data + SIMD efficiency
  Detect problems & recommend fixes
  Loop-carried dependency analysis
  Memory access patterns analysis

Roofline model: high-level characterization
The roofline model gives the maximum achievable performance on a given platform as the lower of the FP roof and the bandwidth roof:
  $$P_{\max}(AI) = \min\left(p_f,\; AI \times p_b\right)$$
See where your application stands and what you can expect by comparing the application's arithmetic intensity, $AI_{Application}$, with the platform's:
  $$AI_{Platform} = p_f / p_b$$
AI: arithmetic intensity; $p_f$: peak FP; $p_b$: peak bandwidth
D. Lazowska, J. Zahorjan, G. Graham, K. Sevcik, Quantitative System Performance (1984)
S. Williams, Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures (2009)
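The roofline bound can be sketched in a few lines of Python. This is only an illustration of the min() formula and the platform AI (ridge point); the function names and the peak values (1000 GF/s, 100 GB/s) are hypothetical and do not come from the slides.

```python
def attainable_gflops(ai, peak_gflops, peak_gbs):
    """Roofline bound min(p_f, AI * p_b), in GF/s."""
    return min(peak_gflops, ai * peak_gbs)


def platform_ai(peak_gflops, peak_gbs):
    """Ridge point AI_Platform = p_f / p_b, in flops per byte."""
    return peak_gflops / peak_gbs


def regime(ai, peak_gflops, peak_gbs):
    """Classify a kernel by comparing its AI with the platform AI."""
    if ai < platform_ai(peak_gflops, peak_gbs):
        return "bandwidth bound"
    return "CPU bound"


if __name__ == "__main__":
    # Hypothetical node: 1000 GF/s peak FP, 100 GB/s peak bandwidth.
    P_F, P_B = 1000.0, 100.0
    for ai in (0.5, 10.0, 40.0):
        print(ai, attainable_gflops(ai, P_F, P_B), regime(ai, P_F, P_B))
```

A kernel whose AI lies below the platform AI sits under the sloped (bandwidth) part of the roof; above it, the flat FP roof applies.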

Performance characterization [figure: roofline regions]
Bandwidth bound / in between / CPU bound

Temporal roofline: application phase identification
An application with two phases at the extremes, on E5-2697 v2:
  Bandwidth bound: STREAM
  CPU bound: HPL
The scalar AI and flops are averaged over the run and are not representative of how the application evolves. The temporal roofline identifies these phases distinctly.
Courtesy of A. Mrabet et al., 2015
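Why the averaged AI hides the phases can be shown with a small sketch. The per-interval samples below are made up for illustration (they are not measurements from the slide), and the helper names are hypothetical.

```python
def interval_ai(samples):
    """AI of each sampling interval; samples are (flops, bytes) pairs."""
    return [flops / nbytes for flops, nbytes in samples]


def aggregate_ai(samples):
    """Single AI for the whole run: total flops over total bytes."""
    total_flops = sum(flops for flops, _ in samples)
    total_bytes = sum(nbytes for _, nbytes in samples)
    return total_flops / total_bytes


# Hypothetical run: three bandwidth-bound intervals (STREAM-like),
# followed by three compute-bound intervals (HPL-like).
samples = [(1.0e9, 16.0e9)] * 3 + [(32.0e9, 1.0e9)] * 3

if __name__ == "__main__":
    print("per-interval AI:", interval_ai(samples))
    print("aggregate AI:   ", aggregate_ai(samples))
```

The aggregate value falls between the two phases and matches neither, which is the situation the temporal view is meant to expose.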

Cache aware roofline

Cache aware roofline
Know where you are vs. the peaks and what to expect
Per-block view for the whole application
Linked to source and assembly

How to collect flops and bytes (AI definition)
Flops:
  SDE: possible for future architectures; average over execution time
  Hardware counters (VTune, PCM, LIKWID, PAPI): not always available
  By hand: not always possible
Bytes:
  Hardware counters (VTune, PCM, LIKWID, PAPI)
  By hand: not always possible
How to define memory demand without cache impact or latency?
  DRAM demand: how many DRAM transactions a workload "wants" to do
  versus
  DRAM BW: the number of DRAM transactions completed per unit time
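Counting "by hand" can be sketched for a simple loop. The example below assumes a STREAM-triad-style kernel, a[i] = b[i] + s * c[i], in double precision (2 flops, 2 loads, 1 store per iteration); the helper name and the write-allocate switch are illustrative assumptions, not part of any tool named above.

```python
def kernel_ai(flops_per_iter, loads_per_iter, stores_per_iter,
              bytes_per_element=8, write_allocate=False):
    """Flops per byte of memory traffic for one loop iteration."""
    # With write-allocate caches, a stored line is first read from memory,
    # so each store costs one extra element of traffic.
    store_cost = 2 if write_allocate else 1
    elements = loads_per_iter + store_cost * stores_per_iter
    return flops_per_iter / (elements * bytes_per_element)


# Triad: one multiply and one add; loads b[i] and c[i]; stores a[i].
triad = kernel_ai(2, 2, 1)                          # 2 flops / 24 bytes
triad_wa = kernel_ai(2, 2, 1, write_allocate=True)  # 2 flops / 32 bytes

if __name__ == "__main__":
    print(triad, triad_wa)
```

The count ignores cache reuse entirely, which is why it is only practical for streaming kernels like this one.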

Programming model
Vectorization options, ordered from ease of use to programmer control:
  Compiler: auto-vectorization (no change of code)
  Compiler: auto-vectorization hints (#pragma vector, ...)
  Compiler: Intel Cilk Plus array notation extensions
  SIMD intrinsic class (e.g.: F32vec, F64vec, ...)
  Vector intrinsic (e.g.: _mm_fmadd_pd(...), _mm_add_ps(...), ...)
  Assembly code (e.g.: [v]addps, [v]addss, ...)
Languages along the same axis: Python, Matlab; C, F90, OMP; OCL; Cuda; Assembly

Questions?

Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life-saving, life-sustaining, critical control or safety systems, or in nuclear facility applications. Intel products may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel may make changes to dates, specifications, product descriptions, and plans referenced in this document at any time, without notice. This document may contain information on products in the design phase of development. The information herein is subject to change without notice. Do not finalize a design with this information. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. Intel Corporation or its subsidiaries in the United States and other countries may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Wireless connectivity and some features may require you to purchase additional software, services or external hardware. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. Copyright 2015 Intel Corporation. All rights reserved.

Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright 2015, Intel Corporation. All rights reserved. Intel, the Intel logo, Atom, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804