April 2nd, Bob Burroughs, Director, HPC Solution Sales


April 2nd, 2019
Bob Burroughs, Director, HPC Solution Sales

Today: Introducing 2nd Generation Intel Xeon Scalable Processors

How Intel Speeds HPC Performance

Diagram labels: Work, Time, System Peak, Efficiency, Software Optimized.

2nd Generation Intel Xeon Scalable Processors speed HPC performance in ways that matter.

System Peak: AVX-512, Memory Bandwidth, Intel DL Boost, Intel Optane DC persistent memory. New technologies to exceed performance boundaries.

Efficiency: Architecture Performance, Multi-node Scaling. Strong efficiency through strong hardware.

Software Optimized: Software Tools, Performance Libraries, Development Tools. Realize value with minimum work.

2nd Gen Intel Xeon Scalable Processors: Architected for HPC & AI

Architecture Performance (Boost Applications): Amdahl's Law requires hardware improvement for real workload performance increases. Up to 1.7x better floating-point perf/core (1).

Intel AVX-512 (Operation Efficiency): Special instructions optimized for HPC workloads and integrated into Intel Xeon Scalable Processors. 2x better than AVX2 (2).

Intel Xeon Platinum 9200 Processors (Compute Density): New class of Intel Xeon CPUs to boost performance for memory-intensive HPC workloads. 2x memory channels per CPU.

Intel DL Boost (AI Convergence): Converge HPC & AI solutions to execute AI on HPC infrastructure, combined with radical AI performance increases on Intel Xeon. 14x over 1st Gen Intel Xeon SP (3).

Intel Optane DC persistent memory for HPC (Lower Cost, More Memory): Use Intel Optane DC persistent memory over DDR to solve larger problems at lower cost; discover new use cases with Intel Optane technology.

(1) 1-copy SPECrate2017_fp_base*, 2-socket Intel 8280 vs 2-socket AMD EPYC 7601; config details on slide 12.
(2) Theoretical performance when comparing AVX-512 to AVX2. Details on slide 11.
(3) Platinum 8210 vs Plat 8180 at launch; config details on slide 12.

Strong Compute Architecture Matters

Amdahl's Law: a program can only be accelerated as far as its serial component allows.

Single Node: Real Performance Impact
CosmoFlow: application predicting cosmological parameters by analyzing dark matter distributions in the universe.

[Chart: Single Node Breakdown (1), Time (%), shown for Master Node and Worker Node. Legend: Wait Time, ML Compute, Active Compute, Other. Extracted values: 20%, 36%, 32%, 16%, 22%, 61%, 13%; the mapping of values to bars was not recoverable.]

[Diagram: Multi-Node MPI, with S and P segments per node.] Speeding up each node reduces the amount of waiting time.

(1) Profile of time spent in various stages of the CosmoFlow application on a single Intel Xeon Phi Processor 7250 (KNL) node. The stages are OpenMP spin time and overhead, non-convolutional computational time, 3D convolutions, CPE ML Plugin, other time, Linux kernel time, and TensorFlow framework time. Configurations: see slide 11.
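The Amdahl's Law statement above can be made concrete with a short calculation. The following is a minimal sketch, not taken from the slides: the function names and the example parallel fraction of 0.9 are hypothetical, chosen only for illustration, and are not CosmoFlow measurements.

```python
def amdahl_speedup(parallel_fraction: float, n: float) -> float:
    """Speedup predicted by Amdahl's Law.

    parallel_fraction: share of runtime that can be accelerated (0..1).
    n: factor by which the parallel part is sped up (e.g. number of workers).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on speedup as n grows without limit: 1 / serial fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the runtime is parallelizable
    for workers in (2, 10, 100, 1000):
        print(f"{workers:>5} workers: {amdahl_speedup(p, workers):.2f}x")
    print(f"limit: {max_speedup(p):.2f}x")
```

With the hypothetical 90% parallel fraction, adding workers approaches but never exceeds a 10x speedup. This is why the slide argues that shrinking the serial part on each node matters for real workloads.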

Intel AVX-512: Accelerating HPC & AI Applications

CPU Performance: # of CPU Cores, Core Frequency, Operation Efficiency

AVX2 (256-bit wide vector, 1 FMA):
FP64: 8 FLOPs/cycle
FP32: 16 FLOPs/cycle

AVX-512 (512-bit double-wide vector, 2 FMA), optimized for HPC & AI:
FP64: 32 FLOPs/cycle
FP32: 64 FLOPs/cycle
Int8: 256 OPs/cycle

Double-wide vector, double the FMAs: more operations per cycle. Up to 4x.

Data types by workload: DP for HPC, SP for DL Training, Int8 for DL Inference.

[Chart: AVX2 vs AVX-512 real performance gain, 8260L running FSI apps (higher is better). Extracted values: 1.5, 1.8, 1.2, 1.9, 1.3. The only workload label recovered is Monte Carlo; the mapping of values to workloads was not recoverable.]

Performance results are based on testing as of 02/28/2019 and may not reflect all publicly available security updates. See configuration disclosure on slide 11. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
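The per-cycle figures on this slide follow from a simple counting rule: lanes per vector, times two operations per fused multiply-add, times the number of FMA units. The sketch below reproduces them under that assumption. The function names are hypothetical, and the core count and clock frequency in the usage example are illustrative placeholders, not figures from the slides.

```python
def ops_per_cycle(vector_bits: int, element_bits: int, fma_units: int) -> int:
    """Theoretical operations per cycle per core.

    Each FMA counts as two operations (one multiply, one add) per lane.
    """
    lanes = vector_bits // element_bits
    return lanes * 2 * fma_units


def peak_gops(cores: int, ghz: float, per_cycle: int) -> float:
    """Theoretical peak in giga-operations/s: cores x frequency x ops/cycle."""
    return cores * ghz * per_cycle


if __name__ == "__main__":
    # AVX2: 256-bit vectors, 1 FMA unit (as labeled on the slide)
    print("AVX2    FP64:", ops_per_cycle(256, 64, 1))
    print("AVX2    FP32:", ops_per_cycle(256, 32, 1))
    # AVX-512: 512-bit vectors, 2 FMA units (as labeled on the slide)
    print("AVX-512 FP64:", ops_per_cycle(512, 64, 2))
    print("AVX-512 FP32:", ops_per_cycle(512, 32, 2))
    print("AVX-512 Int8:", ops_per_cycle(512, 8, 2))
    # Hypothetical part: 28 cores at an assumed 2.0 GHz sustained frequency
    print("Peak FP64 GFLOPS:", peak_gops(28, 2.0, ops_per_cycle(512, 64, 2)))
```

The FP64 ratio between the two configurations is 32 / 8 = 4, which matches the slide's "up to 4x". These are theoretical ceilings; the slide's own chart of measured gains shows smaller factors.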

A New Class of Advanced Performance: Intel Xeon Platinum 9200 Processors for Your Most Demanding Workloads

Extending Intel Xeon processor leadership for high-performance workloads

Performance leadership: 9200 series with up to 56 cores
Unprecedented memory bandwidth: 12 DDR4 memory channels per CPU at 2933 MT/s
Optimized multi-chip package: high-speed interconnect for increased compute density
80 PCIe Gen3 lanes per node
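The memory-bandwidth claim can be turned into a theoretical peak with basic arithmetic. This sketch assumes the standard 64-bit (8-byte) DDR4 data bus per channel, which the slide does not state. The 6-channel comparison point is inferred from the deck's "2x memory channels per CPU" claim, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal, 1 GB = 1e9 bytes).

    bytes_per_transfer defaults to 8, assuming a 64-bit DDR4 data bus.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    twelve = peak_bandwidth_gbs(12, 2933)  # Platinum 9200, per the slide
    six = peak_bandwidth_gbs(6, 2933)      # inferred half-channel baseline
    print(f"12 channels: {twelve:.3f} GB/s per CPU")
    print(f" 6 channels: {six:.3f} GB/s per CPU")
    print(f"ratio: {twelve / six:.1f}x")
```

Sustained bandwidth in practice is lower than this ceiling; the calculation only shows where the doubling of channels comes from.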

HPC and AI Convergence: Increasing AI Performance on Intel Xeon Processors

Intel Optimizations for Caffe ResNet-50 Inference Throughput Performance (1)
BASE: SKX launch, July 2017
5.7x vs. BASE
14x vs. BASE
30x vs. BASE

Systems shown:
2S Intel Xeon Platinum 8180 processor (28 cores/socket), 1st Generation Intel Xeon Scalable Processor
2S Intel Xeon Platinum 8280 processor (28 cores/socket), 2nd Generation Intel Xeon Scalable Processor
2S Intel Xeon Platinum 9282 processor (56 cores/socket), 2nd Generation Intel Xeon Scalable Processor

Intel DL Boost: Theoretical Throughput per Core over 1st Generation Intel Xeon Scalable Processors
1st Gen Xeon-SP FP32
1st Gen Xeon-SP Int8: up to 1.3x faster throughput, but inefficient. Uses 3 instructions per operation: VPMADDUBSW, VPMADDWD, VPADDD.
2nd Gen Xeon-SP Int8 with Intel DL Boost: up to 3x. DL Boost fixes this by combining the 3 instructions into 1 instruction: VPDPBUSD.

(1) Based on Intel internal testing: 1x, 5.7x, 14x and 30x performance improvement based on Intel Optimization for Caffe ResNet-50 inference throughput performance on Intel Xeon Scalable Processor. See configuration details on slide 13. Performance results are based on testing as of 7/11/2017 (1x), 11/8/2018 (5.7x), 2/20/2019 (14x) and 2/26/2019 (30x) and may not reflect all publicly available security updates. No product can be absolutely secure.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance
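The DL Boost slide describes replacing a three-instruction Int8 sequence (VPMADDUBSW, VPMADDWD, VPADDD) with a single VPDPBUSD. The sketch below is a scalar Python emulation of one 32-bit accumulator lane, written only to illustrate the arithmetic. It is not intrinsics code, the function names are hypothetical, and it assumes VPMADDWD is used with a vector of ones, which is the usual way this sequence is built.

```python
def _sat16(x: int) -> int:
    """Clamp to the signed 16-bit range, as VPMADDUBSW does."""
    return max(-32768, min(32767, x))


def _wrap32(x: int) -> int:
    """Wrap to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def three_instruction_lane(acc: int, a_u8: list, b_s8: list) -> int:
    """Emulate VPMADDUBSW + VPMADDWD(ones) + VPADDD on one 32-bit lane.

    a_u8: four unsigned bytes (0..255); b_s8: four signed bytes (-128..127).
    """
    # VPMADDUBSW: u8 x s8 products, adjacent pairs added with 16-bit saturation
    p01 = _sat16(a_u8[0] * b_s8[0] + a_u8[1] * b_s8[1])
    p23 = _sat16(a_u8[2] * b_s8[2] + a_u8[3] * b_s8[3])
    # VPMADDWD against ones: the two 16-bit results are summed into 32 bits
    s32 = p01 * 1 + p23 * 1
    # VPADDD: add to the running 32-bit accumulator
    return _wrap32(acc + s32)


def vpdpbusd_lane(acc: int, a_u8: list, b_s8: list) -> int:
    """Emulate VPDPBUSD on one 32-bit lane: four products summed into acc."""
    return _wrap32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, -6, 7, -8]
    print(three_instruction_lane(100, a, b), vpdpbusd_lane(100, a, b))
    # Large inputs: the 16-bit intermediate of the older sequence saturates
    a, b = [255, 255, 0, 0], [127, 127, 0, 0]
    print(three_instruction_lane(0, a, b), vpdpbusd_lane(0, a, b))
```

For typical inputs both paths give the same dot product. For large inputs the older sequence can saturate its 16-bit intermediate, whereas the fused instruction accumulates the four products directly into 32 bits.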

Recently Announced: Intel to Build First Exascale Supercomputer for U.S. DOE

Intel partnered with Argonne National Laboratory, driving the convergence of HPC and AI.
Aurora will accelerate the convergence of traditional HPC, data analytics, and AI.
Intel's data-centric portfolio is at the heart of Aurora, integrated with Cray's Shasta system.

"Achieving Exascale is imperative not only to better the scientific community, but also to better the lives of everyday Americans. Aurora and the next-generation of Exascale supercomputers will apply HPC and AI technologies to areas such as cancer research, climate modeling, and veterans' health treatments. The innovative advancements that will be made with Exascale will have an incredibly significant impact on our society."
- Rick Perry, US Secretary of Energy

Leading research and academia programs are already engaged to harness Aurora and enable the software ecosystem.

Notices & Disclaimers

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.

Intel Advanced Vector Extensions (Intel AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

*Other names and brands may be claimed as property of others. Intel, the Intel logo, Xeon, and Optane are trademarks of Intel Corporation in the United States and other countries. 2019 Intel Corporation.