Dr Christopher Dahnken. SSG DRD EMEA Datacenter
2 Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright 2017, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #
3 Agenda
- Xeon Phi: Knights Landing
- Xeon: Sky Lake
- Short outlook
- Non-Volatile Memory/Optane
5 KNL Architecture Overview
- ISA: Intel Xeon processor binary-compatible (w/ Broadwell)
- On-package memory: up to 16GB, ~500 GB/s STREAM at launch
- Platform memory: up to 384GB (6ch MHz)
- Fixed bottlenecks: 2D mesh architecture; out-of-order cores, 3x single-thread vs. KNC
- I/O: x4 DMI2 to PCH; 36 lanes PCIe* Gen3 (x16, x16, x4)
- KNL package: up to 36 tiles; each tile holds two cores with 2 VPUs each, a hub, and 1MB L2
- Cores: enhanced Intel Atom cores based on Silvermont microarchitecture
- Diagram legend: EDC (embedded DRAM controller), IMC (integrated memory controller), IIO (integrated I/O controller)
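The tile and VPU counts above give a quick upper bound on floating-point work per clock cycle. The sketch below is only back-of-the-envelope arithmetic: it assumes two cores per tile, 512-bit vector units, and one fused multiply-add per VPU per cycle counted as two operations. The function name and its parameters are illustrative, not taken from the slides.

```python
# Peak floating-point operations per clock cycle for a tiled many-core chip.
# Assumptions (not stated on the slide): 2 cores per tile, 512-bit vectors,
# and one FMA per VPU per cycle, counted as 2 floating-point operations.

def flops_per_cycle(tiles, cores_per_tile, vpus_per_core,
                    vector_bits=512, element_bits=64, flops_per_fma=2):
    lanes = vector_bits // element_bits  # elements processed per vector op
    return tiles * cores_per_tile * vpus_per_core * lanes * flops_per_fma

knl_dp = flops_per_cycle(tiles=36, cores_per_tile=2, vpus_per_core=2)
knl_sp = flops_per_cycle(tiles=36, cores_per_tile=2, vpus_per_core=2,
                         element_bits=32)

print(knl_dp)  # 2304 double-precision flops per cycle
print(knl_sp)  # 4608 single-precision flops per cycle
```

Multiplying by a clock frequency would turn this into a peak FLOP/s figure; the frequency is not given here, so the sketch stops at per-cycle numbers.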
6 KNL Mesh Interconnect
Mesh of Rings
- Every row and column is a (half) ring
- YX routing: Go in Y, Turn, Go in X
- Messages arbitrate at injection and on turn
Cache Coherent Interconnect
- MESIF protocol (F = Forward)
- Distributed directory to filter snoops
Three Cluster Modes
(1) All-to-All
(2) Quadrant
(3) Sub-NUMA Clustering
[Diagram: tile mesh bordered by OPIO, PCIe, EDC, IIO, iMC and Misc blocks]
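The YX routing rule (go in Y, turn, go in X) can be sketched on a plain grid of (x, y) coordinates. This is a simplified illustration: it ignores the half-ring wiring and arbitration, and the function name and coordinate convention are assumptions made for the example.

```python
def yx_route(src, dst):
    """Return the (x, y) mesh stops from src to dst: travel in Y first, then X."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]

    # Phase 1: move along the column until the destination row is reached.
    step = 1 if dst_y > y else -1
    while y != dst_y:
        y += step
        path.append((x, y))

    # Phase 2: a single turn, then move along the row to the destination.
    step = 1 if dst_x > x else -1
    while x != dst_x:
        x += step
        path.append((x, y))

    return path

route = yx_route((1, 0), (3, 2))
print(route)           # [(1, 0), (1, 1), (1, 2), (2, 2), (3, 2)]
print(len(route) - 1)  # 4 hops
```

Because every message turns at most once, the route between two tiles is fixed by their coordinates, which is why arbitration is only needed at injection and at the turn.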
7 Cluster Mode: All-to-All
- Address uniformly hashed across all distributed directories
- No affinity between Tile, Directory and Memory
- Lower performance mode, compared to other modes. Mainly for fall-back
Typical Read L2 miss
1. L2 miss encountered
2. Send request to the distributed directory
3. Miss in the directory. Forward to memory
4. Memory sends the data to the requestor
[Diagram: tile mesh with the path 1) L2 miss, 2) Directory access, 3) Memory access, 4) Data return]
8 Cluster Mode: Quadrant
- Chip divided into four virtual Quadrants
- Address hashed to a Directory in the same quadrant as the Memory
- Affinity between the Directory and Memory
- Lower latency and higher BW than all-to-all. SW Transparent.
[Diagram: tile mesh with the path 1) L2 miss, 2) Directory access, 3) Memory access, 4) Data return]
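The difference between all-to-all and quadrant mode comes down to where the directory for an address lives relative to its memory. The toy model below makes that concrete. Everything in it is hypothetical: the line interleaving, the multiplicative hash, and the count of nine directories per quadrant are stand-ins chosen for illustration, not the hardware's actual mapping.

```python
QUADRANTS = 4
DIRS_PER_QUADRANT = 9      # hypothetical: 36 directories spread over 4 quadrants
LINE_BYTES = 64
HASH_MULT = 2654435761     # arbitrary odd multiplier for a toy hash

def memory_quadrant(addr):
    # Hypothetical interleaving: consecutive cache lines rotate over quadrants.
    return (addr // LINE_BYTES) % QUADRANTS

def directory_all_to_all(addr):
    # Hash over every directory on the chip, ignoring where the memory is.
    d = ((addr // LINE_BYTES) * HASH_MULT) % (QUADRANTS * DIRS_PER_QUADRANT)
    return d // DIRS_PER_QUADRANT, d % DIRS_PER_QUADRANT  # (quadrant, index)

def directory_quadrant(addr):
    # Hash only among the directories in the memory's own quadrant.
    index = ((addr // LINE_BYTES) * HASH_MULT) % DIRS_PER_QUADRANT
    return memory_quadrant(addr), index

def affinity(directory_fn, addrs):
    """Fraction of addresses whose directory sits in the memory's quadrant."""
    same = sum(directory_fn(a)[0] == memory_quadrant(a) for a in addrs)
    return same / len(addrs)

addrs = [i * LINE_BYTES for i in range(10_000)]
print(affinity(directory_quadrant, addrs))    # 1.0 by construction
print(affinity(directory_all_to_all, addrs))  # well below 1.0
```

In the quadrant variant, steps 2 and 3 of the read-miss sequence stay inside one quadrant, which is the model's version of the lower latency claimed on the slide.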
9 Cluster Mode: Sub-NUMA Clustering (SNC)
- Each Quadrant (Cluster) exposed as a separate NUMA domain to OS.
- Looks analogous to 4-Socket Xeon
- Affinity between Tile, Directory and Memory
- Local communication. Lowest latency of all modes.
- SW needs to NUMA optimize to get benefit.
[Diagram: tile mesh with the path 1) L2 miss, 2) Directory access, 3) Memory access, 4) Data return]
10 KNL Core and VPU
- Out-of-order core w/ 4 SMT threads
- VPU tightly integrated with core pipeline
- 2-wide decode/rename/retire
- 2x 64B load & 1 64B store port for D$
- L1 prefetcher and L2 prefetcher
- Fast unaligned and cache-line split support
- Fast gather/scatter support
11 KNL Memory Modes
- Mode selected at boot
- Cache Model: MCDRAM cache covers all DDR
- Hybrid Model
- Flat Models
[Diagram: the three models laid out along the physical address space]
12 MCDRAM: Cache vs Flat Mode
Configurations (one is marked Recommended on the slide): DDR Only; MCDRAM as Cache; MCDRAM Only; Flat DDR + MCDRAM; Hybrid
Software Effort
- No software changes required
- Change allocations for bandwidth-critical data.
Performance
- Not peak performance.
- Best performance.
- Limited memory capacity
- Optimal HW utilization + opportunity for new algorithms
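In flat mode the work is deciding which data structures deserve the limited on-package capacity. One possible heuristic is sketched below: fill the 16GB of fast memory greedily, highest traffic per GB of footprint first. The function, the array names, and the traffic numbers are all made up for the example. This is a planning aid, not an allocator.

```python
FAST_CAPACITY_GB = 16  # on-package memory capacity from the architecture overview

def place_arrays(arrays, fast_capacity_gb=FAST_CAPACITY_GB):
    """arrays: list of (name, size_gb, traffic_gb_per_step) tuples.

    Returns (fast, slow): names placed in on-package memory and in DDR.
    """
    fast, slow = [], []
    free = fast_capacity_gb
    # Greedy order: most memory traffic per GB of footprint first.
    by_density = sorted(arrays, key=lambda a: a[2] / a[1], reverse=True)
    for name, size_gb, _traffic in by_density:
        if size_gb <= free:
            fast.append(name)
            free -= size_gb
        else:
            slow.append(name)
    return fast, slow

# Hypothetical application footprint.
arrays = [
    ("grid", 10, 400),
    ("halo", 2, 120),
    ("checkpoint", 40, 40),
    ("lookup", 6, 30),
]
fast, slow = place_arrays(arrays)
print(fast)  # ['halo', 'grid']
print(slow)  # ['lookup', 'checkpoint']
```

The arrays that end up in `fast` are the candidates for changed allocations; everything else stays in DDR with no source changes.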
13 KNL Instruction Set
Processor                     Instruction sets
Xeon 5600 (Nehalem)           SSE*
Xeon E (Sandy Bridge)         SSE*, AVX
Xeon E5-2600v3 (Haswell)      SSE*, AVX, AVX2, TSX
Xeon Phi (Knights Landing)    SSE*, AVX, AVX2, AVX-512 F, CDI, ER, PF
Xeon (Sky Lake)               SSE*, AVX, AVX2, TSX, AVX-512 F, CDI, BW, DQ
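Reading the instruction-set stack as one set per processor makes it easy to see what a binary can use on both Knights Landing and Sky Lake. The per-processor assignment below is reconstructed from the flattened slide text and should be treated as an assumption. The dictionary and variable names are illustrative.

```python
# Instruction-set extensions per processor, as read from the slide
# (the column assignment is a reconstruction, not an authoritative list).
ISA = {
    "Nehalem": {"SSE*"},
    "Sandy Bridge": {"SSE*", "AVX"},
    "Haswell": {"SSE*", "AVX", "AVX2", "TSX"},
    "Knights Landing": {"SSE*", "AVX", "AVX2", "AVX-512 F", "AVX-512 CDI",
                        "AVX-512 ER", "AVX-512 PF"},
    "Sky Lake": {"SSE*", "AVX", "AVX2", "TSX", "AVX-512 F", "AVX-512 CDI",
                 "AVX-512 BW", "AVX-512 DQ"},
}

common = ISA["Knights Landing"] & ISA["Sky Lake"]    # usable on both
knl_only = ISA["Knights Landing"] - ISA["Sky Lake"]  # Xeon Phi specific
skl_only = ISA["Sky Lake"] - ISA["Knights Landing"]  # Xeon specific

print(sorted(common))    # ['AVX', 'AVX-512 CDI', 'AVX-512 F', 'AVX2', 'SSE*']
print(sorted(knl_only))  # ['AVX-512 ER', 'AVX-512 PF']
print(sorted(skl_only))  # ['AVX-512 BW', 'AVX-512 DQ', 'TSX']
```

Code restricted to the `common` set runs on both lines; using ER/PF or BW/DQ ties a binary to one of them.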
15 2-socket+ Intel Xeon Roadmap
Platform    Microarchitecture (codename)    Processors (process)
Thurley     Nehalem                         Nehalem (45nm), Westmere (32nm)
Romley      Sandy Bridge                    Sandy Bridge (32nm), Ivy Bridge (22nm)
Grantley    Haswell                         Haswell (22nm), Broadwell (14nm)
Purley      Skylake                         Skylake (14nm), Future (14nm)
- The first processor on each platform (Nehalem, Sandy Bridge, Haswell, Skylake) is a new microarchitecture
- Brickland Platform is Ivy Bridge-EX, Haswell-EX, and Broadwell-EX
- Skylake microarchitecture delivers ~10% (geomean) IPC improvement v. Broadwell
16 New Skylake Uncore Interconnect Architecture
- Broadwell Server: 24-core die, dual-ring interconnect
- Skylake (or Cascade Lake) Server: 28-core die, mesh interconnect
- Mesh interconnect (Skylake Server) replaces dual-ring interconnect (BDW E5/E7)
Legend: CHA = Caching & Home Agent; SF = Snoop Filter
[Diagram: Broadwell die with QPI links, PCIe x16/x16/x8/x4 (ESI), home agents and memory controllers; Skylake die with 2x UPI x20 and 1x UPI x20, PCIe x16 blocks, DMI x4, CBDMA, on-package PCIe x16 and two memory controllers]
17 Microarchitecture Enhancements
[Diagram: core pipeline with front end (32KB L1 I$, pre-decode, instruction queue, decoders, branch prediction unit, μop queue), allocate/rename/retire, reorder buffer, load and store buffers, scheduler, execution ports 0 to 7 (INT and VEC units including FMA), fill buffers, 32KB L1 D$ and 1MB L2$]
Comparison rows, Broadwell uarch vs. Skylake uarch:
- Out-of-order Window
- In-flight Loads + Stores
- Scheduler Entries
- Registers, Integer + FP
- Allocation Queue: 56 vs. 64/thread
- L1D BW (B/Cyc), Load + Store
- L2 Unified TLB 4K+2M: K+2M: G: 16
Enhancements:
- Larger and improved branch predictor, higher throughput decoder, larger window to extract ILP
- Improved scheduler and execution engine, improved throughput and latency of divide/sqrt
- More load/store bandwidth, deeper load/store buffers, improved prefetcher
- Data center specific enhancements: Intel AVX-512 with 2 FMAs per core, larger 1MB MLC
- About 10% performance improvement per core on integer applications at same frequency
18 Intel Xeon Scalable Processor Feature Overview
Feature                                   Details
Socket                                    Socket P
Scalability                               2S, 4S, 8S, and >8S (with node controller support)
CPU TDP                                   70W to 205W
Chipset                                   Intel C620 Series (code name Lewisburg)
Networking                                Intel Omni-Path Fabric (integrated or discrete); 4x10GbE (integrated w/ chipset); 100G/40G/25G discrete options
Compression and Crypto Acceleration       Intel QuickAssist Technology to support 100Gb/s comp/decomp/crypto; 100K RSA2K public key
Storage                                   Integrated QuickData Technology, VMD, and NTB; Intel Optane SSD, Intel 3D-NAND NVMe & SATA SSD
Security                                  CPU enhancements (MBE, PPK, MPX); Manageability Engine; Intel Platform Trust Technology; Intel Key Protection Technology
Manageability                             Innovation Engine (IE); Intel Node Manager; Intel Datacenter Manager
Abbreviations: BMC: Baseboard Management Controller; PCH: Intel Platform Controller Hub; IE: Innovation Engine; Intel OPA: Intel Omni-Path Architecture; Intel QAT: Intel QuickAssist Technology; ME: Manageability Engine; NIC: Network Interface Controller; VMD: Volume Management Device; NTB: Non-Transparent Bridge
[Diagram: two Skylake-SP CPUs linked by 2 or 3 Intel UPI, each with 3x16 PCIe* Gen3 and 1x 100Gb OPA Fabric; Lewisburg PCH attached via DMI with Intel QAT, ME, IE, 4x10GbE NIC, high speed IO (USB3, PCIe3, SATA3), GPIO, SPI, eSPI/LPC, TPM, BMC and firmware]
19 Platform Topologies
- 2S Configurations (2S-2UPI & 2S-3UPI shown)
- 4S Configurations (4S-2UPI & 4S-3UPI shown)
- 8S Configuration
- Intel Xeon Scalable Processor supports configurations ranging from 2S-2UPI to 8S
[Diagram: sockets linked by Intel UPI, each with 3x16 PCIe* and 1x100G Intel OP Fabric; LBG attached via DMI]
More informationKirill Rogozhin. Intel
Kirill Rogozhin Intel From Old HPC principle to modern performance model Old HPC principles: 1. Balance principle (e.g. Kung 1986) hw and software parameters altogether 2. Compute Density, intensity, machine
More informationBecca Paren Cluster Systems Engineer Software and Services Group. May 2017
Becca Paren Cluster Systems Engineer Software and Services Group May 2017 Clusters are complex systems! Challenge is to reduce this complexity barrier for: Cluster architects System administrators Application
More informationIntroduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes
Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Multi-core today: Intel Xeon 600v4 (016) Xeon E5-600v4 Broadwell
More informationTechnologies and application performance. Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017
Technologies and application performance Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017 The landscape is changing We are no longer in the general purpose era the argument of
More informationReal World Development examples of systems / iot
Real World Development examples of systems / iot Intel Software Developer Conference Seoul 2017 Jon Kim Software Consulting Engineer Contents IOT end-to-end Scalability with Intel x86 Architect Real World
More informationIntel optane memory as platform accelerator. Vladimir Knyazkin
Intel optane memory as platform accelerator Vladimir Knyazkin 2 Legal Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service
More informationMICHAL MROZEK ZBIGNIEW ZDANOWICZ
MICHAL MROZEK ZBIGNIEW ZDANOWICZ Legal Notices and Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
More informationEmbree Ray Tracing Kernels: Overview and New Features
Embree Ray Tracing Kernels: Overview and New Features Attila Áfra, Ingo Wald, Carsten Benthin, Sven Woop Intel Corporation Intel, the Intel logo, Intel Xeon Phi, Intel Xeon Processor are trademarks of
More informationHPC. Accelerating. HPC Advisory Council Lugano, CH March 15 th, Herbert Cornelius Intel
15.03.2012 1 Accelerating HPC HPC Advisory Council Lugano, CH March 15 th, 2012 Herbert Cornelius Intel Legal Disclaimer 15.03.2012 2 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS.
More informationISA-L Performance Report Release Test Date: Sept 29 th 2017
Test Date: Sept 29 th 2017 Revision History Date Revision Comment Sept 29 th, 2017 1.0 Initial document for release 2 Contents Audience and Purpose... 4 Test setup:... 4 Intel Xeon Platinum 8180 Processor
More informationBenchmarking Software Data Planes Intel Xeon Skylake vs. Broadwell 1. Maciek Konstantynowicz
Benchmarking Software Data Planes Intel Xeon Skylake vs. Broadwell 1 March 7 th, 2019 Georgii Tkachuk georgii.tkachuk@intel.com Maciek Konstantynowicz mkonstan@cisco.com Shrikant M. Shah shrikant.m.shah@intel.com
More informationDPDK Performance Report Release Test Date: Nov 16 th 2016
Test Date: Nov 16 th 2016 Revision History Date Revision Comment Nov 16 th, 2016 1.0 Initial document for release 2 Contents Audience and Purpose... 4 Test setup:... 4 Intel Xeon Processor E5-2699 v4 (55M
More informationJackson Marusarz Intel
Jackson Marusarz Intel Agenda Motivation Threading Advisor Threading Advisor Workflow Advisor Interface Survey Report Annotations Suitability Analysis Dependencies Analysis Vectorization Advisor & Roofline
More informationModern CPU Architectures
Modern CPU Architectures Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2014 16.04.2014 1 Motivation for Parallelism I CPU History RISC Software GmbH Johannes
More informationIN-PERSISTENT-MEMORY COMPUTING WITH JAVA ERIC KACZMAREK INTEL CORPORATION
IN-PERSISTENT-MEMORY COMPUTING WITH JAVA ERIC KACZMAREK INTEL CORPORATION LEGAL DISCLAIMER & OPTIMIZATION NOTICE INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL
More informationComputer Architecture and Structured Parallel Programming James Reinders, Intel
Computer Architecture and Structured Parallel Programming James Reinders, Intel Parallel Computing CIS 410/510 Department of Computer and Information Science Lecture 17 Manycore Computing and GPUs Computer
More information