Arm Processor Technology Update and Roadmap


Cavium: Giri Chukkapalli, Distinguished Engineer, Data Center Group (DCG)
This talk introduces the ARM architecture for HPC deployment and presents the rationale for the design point of the ThunderX2 core and SoC in the single-thread performance versus throughput space. The focus is on sustained performance per unit of TCO and on the ease of exploiting the spectral parallelism of HPC applications. Preliminary experience porting, running, and analyzing the performance of HPC applications will also be discussed.
© 2016 Cavium, Inc. Confidential and Proprietary Information

ThunderX2 in HPC

Cavium Corporate Overview
- Segments: enterprise, mobile infrastructure, data center and cloud, service provider cloud
- Products: multi-core MIPS and ARM processors, security, SDN switches, and server/storage connectivity
- ~$10B TAM

ARM Servers & ARM for HPC
Most widely used:
- Over 90B ARM-based chips shipped in 25 years
- Outships x86 by 20X per year
Licensing model:
- Anyone can build
- Innovate and optimize for targeted applications
ARM for HPC:
- ARM = choice and a path to more optimized solutions
- The march to exascale is opening the door for a new ISA
- Massive parallelism requires software changes
- ARM HPC projects are active worldwide
- HPC has a large open-source component
- Thriving ARM ecosystem for HPC

Cavium's Proven Leadership in Silicon Design
- #1 in security and wireless infrastructure, #2 in embedded multicore
- CPU expertise: THE CPU company for infrastructure; ARMv8 architectural licensee; high-performance custom cores
- Performance (2S config): highest-performance, most widely supported dual-socket ARMv8 servers in production
- Complete portfolio: 2 to 48 cores at a variety of price points and TDPs; power, performance, and area optimized; common software architecture
Optimizing ARM64 servers for HPC and the cloud data center

World's Highest Performance Xeon-Class ARM Server
ThunderX2: 2nd-generation product from Cavium, with several firsts for ARM processors:
- Multithreaded, fully out-of-order, high-performance ARMv8 custom cores
- Single- and dual-socket support
- Highest memory bandwidth and capacity
- Server-class virtualization
- Server-class RAS
- Extensive power management
- Rich I/O configurations
Core- and socket-level performance competitive with next-generation incumbent server CPUs, backed by a comprehensive hardware and software ecosystem.

ThunderX2 Differentiation (vs. incumbent server CPU)
- Cores: higher core count delivers higher throughput
- Total threads: higher thread count = larger number of vCPUs
- Memory bandwidth: more memory bandwidth for memory-intensive workloads
- Memory capacity: more memory capacity for in-memory workloads
- PCIe lanes: rich I/O connectivity options; direct attach to NVMe devices

Thriving HPC Ecosystem
- Industry-leading operating systems, including SUSE Linux Enterprise (SLE12)
- Debuggers, profilers, and cluster management
- Open-source and community focus
- Standards-based system management and firmware
- Optimized compilers and development environments

ThunderX Momentum in HPC Continues to Grow
- Relative to a 1.0 baseline: 2.2X memory bandwidth, 2.5X floating point, 3X integer, and 4X vectors, for 2-4X better HPC performance
- Server platforms at the world's premier HPC labs
- Significant HPC engagements
- Early press announcements

Early Performance for HPC Applications

ThunderX2 Delivers Compelling Memory Bandwidth
[Figure: STREAM scaling, % of peak bandwidth vs. % of CPU load, for the Copy, Scale, Add, and Triad kernels]
The highest memory bandwidth enables memory-bound applications to scale better.
Details: ThunderX2 CPU; Linux kernel 4.8.0-32-generic (4K pages); STREAM compiled with GCC 5.4.0-6 (Ubuntu 16.04.4) at -O3.
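The "% of peak bandwidth" axis is measured bandwidth divided by theoretical DRAM peak. As a rough sketch of that arithmetic, the Python below uses the bytes STREAM conventionally counts per iteration for 8-byte elements (write-allocate traffic is not counted). The DDR4 channel count, transfer rate, and timings in the example are illustrative assumptions, not figures from the slide.

```python
# Bytes STREAM counts per loop iteration for 8-byte (double) arrays.
# Copy:  a[i] = b[i]            -> 1 read + 1 write
# Scale: a[i] = q * b[i]        -> 1 read + 1 write
# Add:   a[i] = b[i] + c[i]     -> 2 reads + 1 write
# Triad: a[i] = b[i] + q * c[i] -> 2 reads + 1 write
STREAM_BYTES_PER_ITER = {"copy": 16, "scale": 16, "add": 24, "triad": 24}


def stream_bandwidth_gbs(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth in GB/s (1 GB = 1e9 bytes) for one pass of a kernel over n elements."""
    return STREAM_BYTES_PER_ITER[kernel] * n / seconds / 1e9


def ddr_peak_gbs(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical DRAM bandwidth: channels x transfers/s x bytes per transfer (64-bit bus)."""
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


def percent_of_peak(measured_gbs: float, peak_gbs: float) -> float:
    return 100.0 * measured_gbs / peak_gbs


if __name__ == "__main__":
    # Hypothetical socket: 8 channels of DDR4-2666 (assumed for illustration).
    peak = ddr_peak_gbs(channels=8, mt_per_s=2666)
    # Hypothetical timing: Triad over 80M elements in 0.015 s.
    triad = stream_bandwidth_gbs("triad", 80_000_000, 0.015)
    print(f"peak {peak:.1f} GB/s, triad {triad:.1f} GB/s, "
          f"{percent_of_peak(triad, peak):.1f}% of peak")
```

Sweeping the thread count (the "% of CPU load" axis) and recomputing `percent_of_peak` at each point gives the shape of a scaling curve like the one described above.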

OpenBLAS DGEMM
Efficient cores, capable of achieving close to theoretical peak performance.
Details: ThunderX2 CPU; Linux kernel 4.8.0-32-generic (4K pages); OpenBLAS compiled with GCC 5.4.0-6ubuntu1~16.04.4 at -O3.

OpenBLAS SGEMM
Efficient cores, capable of achieving close to theoretical peak performance.
Details: ThunderX2 CPU; Linux kernel 4.8.0-32-generic (4K pages); OpenBLAS compiled with GCC 5.4.0-6ubuntu1~16.04.4 at -O3.
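"Close to theoretical peak" on the DGEMM and SGEMM slides means the achieved FLOP rate divided by the peak FLOP rate of the cores. The sketch below shows that calculation using the standard 2·m·n·k FLOP count for GEMM. The vector configuration (two 128-bit FMA pipes per core), core count, and clock are hypothetical parameters chosen for illustration, not specifications quoted from these slides. Halving the element size doubles the lanes per vector, which is why single-precision peak is twice double-precision peak on the same hardware.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k), B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def flops_per_cycle(fma_pipes: int, vector_bits: int, elem_bytes: int) -> int:
    """Peak FLOPs/cycle per core: pipes x lanes per vector x 2 (an FMA counts as 2 FLOPs)."""
    lanes = vector_bits // (8 * elem_bytes)
    return fma_pipes * lanes * 2


def peak_gflops(cores: int, ghz: float, fpc: int) -> float:
    return cores * ghz * fpc


def gemm_efficiency(m: int, n: int, k: int, seconds: float, peak: float) -> float:
    """Fraction of peak achieved by one GEMM call that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e9 / peak


if __name__ == "__main__":
    # Hypothetical core: 2 FMA pipes operating on 128-bit vectors.
    dp = flops_per_cycle(fma_pipes=2, vector_bits=128, elem_bytes=8)  # DGEMM
    sp = flops_per_cycle(fma_pipes=2, vector_bits=128, elem_bytes=4)  # SGEMM
    # Hypothetical socket: 32 cores at 2.0 GHz.
    dp_peak = peak_gflops(32, 2.0, dp)
    print(f"DP {dp} FLOPs/cycle, SP {sp} FLOPs/cycle, DP peak {dp_peak:.0f} GFLOPS")
```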

ThunderX2 Delivers Best-In-Class HPL Performance
[Figure: HPL scaling, % of peak GFLOPS vs. % of system load, with test points at 3, 13, 19, 25, 28, 38, 50, 63, 75, 78, 94, and 100% of system load]
Efficient cores, capable of achieving close to theoretical peak performance.
Details: ThunderX2 CPU; Linux kernel 4.8.0-32-generic (4K pages); HPL compiled with GCC 5.4.0-6ubuntu1~16.04.4 (default options); MPICH v3.2, no OpenMP; single-socket test; process grid for each test case based on the number of cores in the test.
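The HPL details say the process grid was chosen from the number of cores in each test, but not the exact rule. A common heuristic, assumed here, is the factorization P × Q of the rank count that is closest to square with P ≤ Q. The sketch also shows the operation count HPL uses when it reports GFLOPS for an N × N system, 2/3·N³ + 3/2·N², which is the numerator of a "% of peak" figure.

```python
import math


def hpl_process_grid(nprocs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == nprocs and P <= Q, as close to square as possible.

    This near-square rule is an assumed heuristic, not necessarily the rule used on the slide.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be >= 1")
    p = math.isqrt(nprocs)
    while nprocs % p != 0:  # walk down to the largest divisor <= sqrt(nprocs)
        p -= 1
    return p, nprocs // p


def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOPS as HPL reports it: (2/3 N^3 + 3/2 N^2) operations over wall time."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9


if __name__ == "__main__":
    for ranks in (1, 4, 8, 16, 28, 32):
        p, q = hpl_process_grid(ranks)
        print(f"{ranks:>2} ranks -> P={p}, Q={q}")
```

With one MPI rank per core and no OpenMP, as in the test setup above, the rank count equals the number of cores in the test.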

ThunderX2 Performance Scaling on Real Applications
- High memory throughput benefits simple simulations.
- A large count of high-performance cores, combined with high memory throughput, benefits complex simulations.
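One standard way to reason about these two claims is the roofline model, used here as an explanatory aid rather than something taken from the slide. Attainable performance is the lower of peak compute and arithmetic intensity (FLOPs per byte of DRAM traffic) times memory bandwidth. Assuming "simple" simulations are those with low arithmetic intensity, such as STREAM Triad at 2 FLOPs per 24 bytes, they sit under the bandwidth roof; high-intensity kernels are capped by peak compute, which scales with core count. The peak and bandwidth values below are hypothetical.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity x memory bandwidth).

    intensity is FLOPs per byte moved to or from DRAM.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Intensity (FLOPs/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    PEAK, BW = 512.0, 170.0  # hypothetical socket: GFLOPS and GB/s
    triad_ai = 2 / 24        # Triad: 1 multiply + 1 add per 24 bytes
    for name, ai in (("triad", triad_ai), ("high-intensity kernel", 20.0)):
        bound = attainable_gflops(PEAK, BW, ai)
        print(f"{name}: AI={ai:.3f} FLOP/B -> bound {bound:.1f} GFLOPS")
    print(f"ridge point: {ridge_point(PEAK, BW):.2f} FLOP/B")
```

Raising bandwidth lifts the sloped part of the roof (memory-bound codes), while adding cores lifts the flat part (compute-bound codes); a complex simulation that mixes both kinds of kernels benefits from both.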