Arm's Latest CPU for Laptop-Class Performance


Arm's Latest CPU for Laptop-Class Performance
Aditya Bedi, Arm Tech Symposia India
2018 Arm Limited

Untethered. Connected. Immersive.
Innovation continues to drive growth and performance demands on our compute devices: small to large screens, multi-day battery life, 5G transformation, and security.

Last year we announced new DynamIQ processors
>20% more mobile performance vs. Cortex-A73; same sustained performance as Cortex-A73; +40% infrastructure performance vs. Cortex-A72.
Performance leadership in mobile; best possible power profile; improved performance in infrastructure.
Up to 2x more performance; up to 15% better power efficiency; up to 10x more configurable.
For advanced use cases; higher sustained performance; edge to cloud scalability.

Designing for user experience from small screens to large: system responsiveness
High single-thread performance for user-facing metrics: app launch and closing, web browsing, productivity applications.
Scalability with an area-efficient octa-core solution: CPU availability for responsiveness to performance demand.

Designing for user experience from small screens to large: energy savings and power efficiency
Efficient scheduling: power-efficient cores for low-demand tasks and background services.
Power optimization: finer-grained speed control, autonomous memory power management, fast power on/sleep/off management.

Arm Cortex-A portfolio (Armv7-A and Armv8-A)
Cortex-A7x series:
Cortex-A15/A17 (Armv7-A, 32-bit): infrastructure performance; mobile efficiency
Cortex-A57 (Armv8-A): proven infrastructure performance
Cortex-A72 (Armv8-A): for all applications
Cortex-A73 (Armv8-A): for mobile and consumer
Cortex-A75 (Armv8-A): ground-breaking performance for all markets
Cortex-A76 (Armv8-A): laptop-class performance with smartphone efficiency
Cortex-A5x series:
Cortex-A9 (Armv7-A, 32-bit): well-established, mid-range processor
Cortex-A53 (Armv8-A): balanced performance and efficiency
Cortex-A55 (Armv8-A): highest efficiency midrange processor
Cortex-A3x series:
Cortex-A5/A7 (Armv7-A, 32-bit): smallest and lowest power Armv7-A
Cortex-A35 (Armv8-A): smallest, lowest power Armv8-A
Cortex-A32 (Armv8-A, 32-bit): smallest, lowest power 32-bit Armv8-A
Timeline axis: 2008, 2013, 2014, 2015, 2016, 2017, 2018 (year of IP release; volume devices in the subsequent year). Legend: Arm big.LITTLE compatible.


Let's take a closer look

Arm Cortex-A76 CPU: laptop-class performance, smartphone experience
Built from the ground up with new microarchitecture capabilities. Built on innovative DynamIQ technology. Battery life that can outlast your work day.
Longer battery life: better energy efficiency.
Intelligent computing: increased machine learning performance.
Increasing productivity: performance without compromise.

Cortex-A76: Performance efficiency, with a focus on the user
The Cortex-A76 CPU is focused on performance and performance efficiency.
Performance efficiency means extracting significantly more performance than any other microarchitecture at similar complexity.
This requires intense focus on every aspect of the microarchitecture: more performance from every logic block.
Focus on the end user: enable sustained full-speed performance.
Yes, we also do well on benchmarks.

Cortex-A76: Front-end
Front-end built to hide latency at high bandwidth.
Multi-level branch-target caches.
Hybrid indirect predictor: unparalleled prediction capability.
[Block diagram labels: Front-end (Branch prediction, Instruction Fetch, 64K I-Cache, L1-ITLB); Decode/Rename/Commit]

Cortex-A76: Decode/Rename/Commit
4-instruction/cycle, power-optimized decode.
High-density decode/rename.
Dispatch to out-of-order core and commit unit.
[Block diagram labels: 4-8 instructions/cycle; 4 Mops/cycle; 8 uops/cycle dispatch; Decode, DQ, Register rename, Dispatch, Commit; Decode/Rename/Commit; Execution core; L1 Data cache / MMU]

Cortex-A76: Execution core
Uops dispatched to 120-entry issue queue capacity.
Dual 128-bit ASIMD/FP execution pipelines.
State-of-the-art latency-optimized VX data paths.
[Block diagram labels: Integer side with IQs feeding Branch, ALU, ALU, ALU/MAC/DIV; ASIMD side with an IQ feeding FMUL/FADD/FDIV/ALU/IMAC and FMUL/FADD/ALU; Execution core; L1 Data cache / MMU]

Cortex-A76: Cache hierarchy and performance
Full cache hierarchy is co-optimized for latency and bandwidth.
Sophisticated 4th-generation prefetcher.
256KB-512KB private L2 with 9-cycle LD-use (load-to-use) latency.
2MB-4MB DynamIQ L3 with 26-31 cycle LD-use latency.
[Chart: memory hierarchy bandwidth, Cortex-A76 vs. Cortex-A75, for L1 cache, L2 cache, L3 cache, and DRAM; vertical axis from 0 to 3]
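To put the LD-use (load-to-use) cycle counts on this slide into wall-clock terms, the short sketch below converts cycles to nanoseconds. It assumes the cycle counts are core clock cycles and picks a 3.0 GHz clock purely for illustration (the deck only states "3+ GHz in 7nm"); the function and constant names are hypothetical and not part of any Arm tool.

```python
# Illustrative conversion of the slide's load-to-use latencies to nanoseconds.
# ASSUMPTIONS (not from the deck): latencies are counted in core clock cycles,
# and the core runs at exactly 3.0 GHz.

ASSUMED_FREQ_GHZ = 3.0          # assumption; the deck says "3+ GHz in 7nm"
L2_LOAD_USE_CYCLES = 9          # from the slide: private L2, 9-cycle LD-use
L3_LOAD_USE_CYCLES = (26, 31)   # from the slide: DynamIQ L3, 26-31 cycle LD-use


def cycles_to_ns(cycles: float, freq_ghz: float) -> float:
    """Convert a cycle count to nanoseconds: one cycle lasts 1/freq_ghz ns."""
    return cycles / freq_ghz


l2_ns = cycles_to_ns(L2_LOAD_USE_CYCLES, ASSUMED_FREQ_GHZ)
l3_ns = tuple(cycles_to_ns(c, ASSUMED_FREQ_GHZ) for c in L3_LOAD_USE_CYCLES)

print(f"L2 load-to-use: {l2_ns:.2f} ns")
print(f"L3 load-to-use: {l3_ns[0]:.2f} to {l3_ns[1]:.2f} ns")
```

At a higher clock the same cycle counts translate to proportionally shorter times, so these figures are only as good as the assumed frequency.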

Accelerating the performance curve in any workload
Pushing single-thread performance: +25% more integer IPC than the Cortex-A75 CPU; +35% higher ASIMD/FP performance; +90% higher memory bandwidth.
Boosting mobile experience: +28% more Geekbench performance; +35% more JavaScript performance.
Enabling intelligence at the edge: 3.9x more AI performance.
Baseline IPC: frequency upside from here.
[Chart: IPC comparison at iso-process and iso-frequency for Cortex-A73, Cortex-A75, and Cortex-A76. Labeled values: SPECINT2K6 1.58x, SPECFP2K6 1.79x, Geekbench v4 1.56x, JavaScript 1.77x, LMBench memcpy 2.44x, GEMM lowp 9.7x]

Building for the premium experience on an advanced process
High-performance Cortex-A76 implementation: 3+ GHz in 7nm.
Increasing Cortex-A55 CPU private L2 cache.
Implementing 4MB L3 cache.
Optimized memory system.
Integrated TrustZone technology.
[Block diagram labels: Cortex-A76, Cortex-A55, DynamIQ Shared Unit, CoreLink CCI-550, Other IPs, DMC, DMC, LPDDR4x Memory System]

Cortex-A76 CPU delivers premium performance
2x performance improvement.
[Chart: performance (relative scores based on AArch64 SpecInt2K6) for Cortex-A73 16nm, Cortex-A75 10nm, and Cortex-A76 7nm. Labeled values: 2.1x and 1.9x. Categories: peak single-thread performance; big.LITTLE performance at 5W]
Configuration: Cortex-A73 2.45 GHz, L1 64KB, L3 2MB; Cortex-A75 2.8 GHz, L1 64KB, L2 512KB, L3 2MB; Cortex-A76 3.3 GHz, L1 64KB, L2 512KB, L3 4MB.
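As a rough sanity check on the figures above, single-thread speedup can be approximated as the IPC ratio multiplied by the clock-frequency ratio. The sketch below applies that to the configuration frequencies on this slide. It assumes, without confirmation from the deck, that the 1.58x SPECINT2K6 value on the earlier IPC comparison slide is Cortex-A76 relative to Cortex-A73, and that the 2.1x label here refers to peak single-thread performance; all names in the code are hypothetical.

```python
# Back-of-the-envelope decomposition: speedup ~= IPC ratio * frequency ratio.
# ASSUMPTION (not confirmed by the deck): the 1.58x SPECINT2K6 IPC figure is
# Cortex-A76 relative to Cortex-A73 at the same process and frequency.

FREQ_GHZ = {  # from the slide's configuration footnote
    "Cortex-A73": 2.45,
    "Cortex-A75": 2.8,
    "Cortex-A76": 3.3,
}
ASSUMED_IPC_RATIO_A76_VS_A73 = 1.58  # assumption, see comment above


def frequency_ratio(new_core: str, old_core: str) -> float:
    """Ratio of the configured clock frequencies of two cores."""
    return FREQ_GHZ[new_core] / FREQ_GHZ[old_core]


def estimated_speedup(ipc_ratio: float, freq_ratio: float) -> float:
    """First-order estimate that ignores memory-system and thermal effects."""
    return ipc_ratio * freq_ratio


freq_ratio = frequency_ratio("Cortex-A76", "Cortex-A73")
speedup = estimated_speedup(ASSUMED_IPC_RATIO_A76_VS_A73, freq_ratio)

print(f"Frequency ratio A76/A73: {freq_ratio:.2f}x")
print(f"Estimated single-thread speedup: {speedup:.2f}x")
```

Under these assumptions the estimate lands near the 2.1x label on the chart, which suggests the uplift combines the IPC gain with the higher clock. This is an illustration only, not a reproduction of Arm's measurement method.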

It starts with an ecosystem

Always On, Always Connected PCs powered by Snapdragon
Pace of innovation.
[Figure labels: Yoga C630, Miix 630, Envy x2, 835, NovaGo, and more]
*Requires network connection and will support up to 20 hours of battery life.
Credit: Qualcomm Technologies, Inc.

The evolution of the always on, always connected PC
Image created by Arm based on Shrout Research data; download the full whitepaper at www.shroutresearch.com

Delivering on promises
Improvement across all benchmarks, relative to previous generation Cortex-A system.
Over 25% minimum performance uplift.
[Chart: relative performance, vertical axis from 0 to 1.4. Labeled values in the order shown: Speedometer 1.4x, Geekbench single-core 1.3x, Geekbench multi-core 1.2x, WebXPRT 3 1.3x, TouchXPRT16 1.4x, MotionMark 1.0 1.4x]
Source: Shrout Research, measured on Lenovo C630 and HP Envy x2 devices

Extended battery life and thermal constraints on a real system
Running your apps longer: 1.3x web browsing battery life improvement (relative to previous system).
Multi-day battery life: 1.3x improvement in time between charges (relative to previous system).
Source: Shrout Research, measured on Lenovo C630 and HP Envy x2 devices

The journey ahead

Client Compute CPU roadmap
Cortex-A73 (16nm), Cortex-A75 (10nm), Cortex-A76 (7nm), Deimos (7nm), Hercules (7nm and 5nm).
Timeline axis: 2017, 2018, 2019, 2020.

Path to compute performance leadership with efficiency
A performance trajectory surpassing Moore's law.
Unmatched year-over-year Arm CPU performance gains.
[Chart: single-core performance estimates based on SPECINT2k6, starting in 2013. Intel Core i5 U-series: Core i5-4300U (22nm), Core i5-6300U (14nm), Core i5-7300U (14nm). Arm Compute, 2.5x increase: Cortex-A15 (28nm), Cortex-A57 (20nm), Cortex-A72 (16nm), Cortex-A73 (16nm), Cortex-A75 (10nm), Cortex-A76 (7nm), Deimos (7nm), Hercules (5nm)]

Expanding the mobile experience
Innovation on mobile, from small screens to large, is changing the user experience and continues to push growth.
The new premium IP delivers laptop-class performance.
Arm, with its ecosystem, is aligning itself to meet customer needs and get ready for the 5G evolution for truly connected experiences.

Thank You, Danke, Merci, 谢谢, ありがとう, Gracias, Kiitos, 감사합니다, धन्यवाद, תודה