ARM instruction sets and CPUs for wide-ranging applications
Chris Turner, Director, CPU Technology Marketing
ARM Tech Forum Taipei, July 4th 2017
ARM computing is everywhere
- Mali is the #1 shipping GPU in the world
- More than 5Bn people use ARM-based mobile phones
- 6.6Bn ARM-based embedded chips shipped in 2016
- 100Bn ARM-based chips shipped to date
Total computing, from sensors to servers
- Performance, efficiency and security
- Efficiency enabling distributed compute
- Efficiency delivering TCO benefits
- Demand for more performance, with greater emphasis on efficiency and power management
ARM CPU architecture for total computing
- Cortex-A: highest performance, for high-level operating systems
- Cortex-R: fast response, for high-performance, hard real-time applications
- Cortex-M: smallest, lowest power, for discrete processing and microcontrollers
- SecurCore: tamper resistant, for physical security
Total computing in mobile
The right combination of CPUs (mobile, consumer, automotive, IoT)
- Cortex-A: rich UI and OS, open system, high performance
- Cortex-R: safety, performance and real-time control
- Cortex-M: low power, deterministic sensing and control
ARM architecture profiled for the application (mobile, consumer, automotive, IoT)
- ARMv8-A: virtual memory; programmable exception model; TrustZone and virtualization; A64, A32 and T32 instruction sets; NEON SIMD
- ARMv8-R: protected memory; programmable exception model; virtualization; A32 and T32 instruction sets; NEON SIMD
- ARMv8-M: protected memory; automated exceptions; TrustZone; T32 instruction set; DSP
Evolving architecture that supports an ecosystem of applications, software vendors, development tools and silicon partners
[Figure: architecture evolution timeline, including ARM9 and ARM11]
ARM ecosystem: widest choice, most innovation
ARM application processors are everywhere
- Cortex-A CPUs cover a wide variety of markets: mobile and consumer; automotive and industrial; servers and networking; IoT and embedded
- They scale efficiently to substantially higher performance
- They fit even more compute into a smaller footprint with less power
ARM Cortex-A portfolio
High performance (A7x series):
- Cortex-A15/A17 (ARMv7-A, 32-bit): infrastructure performance; mobile efficiency
- Cortex-A57 (ARMv8-A, 64/32-bit): proven infrastructure performance
- Cortex-A72 (ARMv8-A, 64/32-bit): for all applications
- Cortex-A73 (ARMv8-A, 64/32-bit): for mobile and consumer
- Cortex-A75 (ARMv8-A, 64/32-bit): ground-breaking performance for all markets
High efficiency (A5x series):
- Cortex-A9 (ARMv7-A, 32-bit): well-established, mid-range processor
- Cortex-A53 (ARMv8-A, 64/32-bit): balanced performance and efficiency
- Cortex-A55 (ARMv8-A, 64/32-bit): highest-efficiency mid-range processor
Ultra high efficiency (A3x series):
- Cortex-A5/A7 (ARMv7-A, 32-bit): smallest and lowest power
- Cortex-A35 (ARMv8-A, 64/32-bit): smallest, lowest-power ARMv8-A
- Cortex-A32 (ARMv8-A, 32-bit): smallest, lowest-power 32-bit ARMv8-A
Timeline: 2008 to 2017, by year of IP release, with volume devices in the subsequent year.
Meeting performance needs for any application
[Figure: Cortex-A portfolio performance comparisons*, in three panels. High performance, relative performance at target frequency: Cortex-A9 @ 1.5GHz, Cortex-A15 @ 1.9GHz, Cortex-A17 @ 1.6GHz, Cortex-A57 @ 2.1GHz, Cortex-A72 @ 2.5GHz, Cortex-A73 @ 2.7GHz, Cortex-A75 @ 3GHz. High efficiency, relative iso-frequency performance: Cortex-A9, Cortex-A35, Cortex-A53, Cortex-A55. Ultra high efficiency, relative iso-frequency performance: Cortex-A5, Cortex-A7, Cortex-A32/Cortex-A35.]
*Performance comparison using the SPECINT2000 benchmark suite.
big.LITTLE: performance and efficiency for mobile
- Higher performance: faster, more responsive systems; higher, longer sustained performance; higher compute capacity
- Superior power efficiency: increased battery life
- Tuned for mobile, consumer and embedded
[Figure: a big cluster and a LITTLE cluster, each with its own cache, connected by a cache coherent interconnect and sharing an interrupt controller]
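DynamIQ plus big.LITTLE cluster architecture
[Figure: a DynamIQ cluster of 1 to 8 CPUs (CPU1 to CPU8), each with private L1 and L2 caches, connected through asynchronous bridges to the DynamIQ Shared Unit (DSU), which contains the snoop filter, L3 cache, bus interface, ACP and peripheral port interface, and power manager]
Example big+LITTLE configurations: 1b+7L, 2b+6L, 4b+4L, 1b+2L, 1b+3L, 1b+4L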
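The "1b+7L" notation counts the big and LITTLE CPUs that share one DynamIQ cluster, and the figure caps a cluster at 8 CPUs. As a minimal sketch, assuming only those two rules from the slide, the hypothetical C helper `parse_dynamiq_config` below (not an ARM API) reads the notation and rejects configurations outside 1 to 8 CPUs; it does not model any core-specific restrictions a real design might have.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical illustration (not an ARM API): a DynamIQ cluster
 * configuration written as "<big>b+<LITTLE>L", e.g. "2b+6L". */
typedef struct {
    int big;    /* number of big CPUs */
    int little; /* number of LITTLE CPUs */
} cluster_config;

/* Parses the slide's notation and accepts it only if the cluster holds
 * 1 to 8 CPUs in total, the range shown for a single DynamIQ cluster.
 * Returns true and fills *out on success, false otherwise. */
bool parse_dynamiq_config(const char *spec, cluster_config *out)
{
    int big = 0, little = 0;
    int consumed = -1; /* set by %n only if the whole pattern matched */

    if (sscanf(spec, "%db+%dL%n", &big, &little, &consumed) != 2)
        return false;
    if (consumed < 0 || spec[consumed] != '\0') /* missing 'L' or trailing text */
        return false;
    if (big < 0 || little < 0 || big + little < 1 || big + little > 8)
        return false;

    out->big = big;
    out->little = little;
    return true;
}
```

For example, `parse_dynamiq_config("2b+6L", &cfg)` returns true with `cfg.big == 2` and `cfg.little == 6`, while `"4b+5L"` (9 CPUs) is rejected.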
NEON: flexible, high-performance data computing
- Delivers performance: enhanced multimedia user experience; intensive data processing
- Seamless development: parallel computing for AI, machine learning (ML) and computer vision (CV)
- Wide software support for faster time-to-market (TTM)
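To make "parallel computing" concrete: a NEON register holds several data elements, and one instruction operates on all of its lanes. The sketch below assumes a compiler that follows the ARM C Language Extensions (ACLE), where `__ARM_NEON` is defined and `<arm_neon.h>` is available on NEON targets; it adds four floats per iteration with the `vld1q_f32`, `vaddq_f32` and `vst1q_f32` intrinsics. The function name `add_f32` is hypothetical, and on non-NEON targets the same code falls back to a scalar loop.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise add of two float arrays: out[i] = a[i] + b[i].
 * Where the compiler defines __ARM_NEON, the main loop uses NEON
 * intrinsics to process four floats at a time; elsewhere (and for any
 * leftover elements) it uses plain scalar C. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);     /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);     /* load 4 floats from b */
        vst1q_f32(out + i, vaddq_f32(va, vb)); /* add lane-wise, store 4 results */
    }
#endif
    for (; i < n; ++i) /* scalar tail, or the whole array without NEON */
        out[i] = a[i] + b[i];
}
```

Compilers such as GCC and Clang can often auto-vectorize the scalar loop for NEON on their own at higher optimization levels; the intrinsics make the vectorization explicit.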
ARM Compute Library leverages NEON for CV/ML
- Optimized low-level functions for CPU and GPU
- The most popular CV and ML functions
- Common functions underpinning popular ML frameworks
- Faster performance and development cycle
- Publicly available as open source
Key function categories: basic arithmetic, convolutions, colour manipulation, feature detection, neural network, GEMM, pyramids, filters, image reshaping, mathematical functions
CPU version tested on Huawei Mate 8 (single-threaded)
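GEMM (general matrix multiply) is one of the listed function categories and sits at the core of many neural-network layers. As a hedged illustration of what such a kernel computes, the scalar reference below implements C = alpha * A * B + beta * C for row-major matrices; `sgemm_ref` is a hypothetical name, and this is neither the Compute Library's API nor its optimized NEON implementation.

```c
#include <stddef.h>

/* Plain scalar reference for single-precision GEMM (row-major):
 * C = alpha * A * B + beta * C, where A is M x K, B is K x N, C is M x N.
 * Hypothetical helper for illustration only. */
void sgemm_ref(size_t M, size_t N, size_t K, float alpha,
               const float *A, const float *B, float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f; /* dot product of row i of A and column j of B */
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

With A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], alpha = 1 and beta = 0, C becomes [[19, 22], [43, 50]]. An optimized library blocks these loops for cache reuse and vectorizes the inner products, but computes the same operation.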
ARM TrustZone: protecting billions of devices
Use cases: authentication, lifecycle security, mobile payment, content protection, enterprise security, communications security, device security
[Figure: non-trusted software alongside trusted software running on trusted hardware, with crypto, root of trust, secure system and secure storage]
ARMv8-M: TrustZone for IoT microcontrollers (Cortex-M, CryptoCell, CoreLink)
- Simplified development
- Real-time response
- Efficient security
- Functional safety
ARM Cortex-M portfolio (25Bn total units shipped*)
- High performance: Cortex-M7 (ARMv7-M): maximum performance, control and DSP
- Performance efficiency: Cortex-M3 (ARMv7-M): performance efficiency; Cortex-M4 (ARMv7-M): mainstream control and DSP; Cortex-M33 (ARMv8-M): flexibility, control and DSP with TrustZone
- Lowest power and area: Cortex-M0 (ARMv6-M): lowest cost, low power; Cortex-M0+ (ARMv6-M): highest energy efficiency; Cortex-M23 (ARMv8-M): TrustZone in the smallest area, lowest power
*Data as of Dec. 2016
Cortex-M23: TrustZone in the smallest footprint
- Smallest footprint: 75% smaller than Cortex-M33
- Maximum efficiency: 50% more efficient than Cortex-M33, with the same ultra-high efficiency as Cortex-M0+
- For constrained applications: secure, ultra efficient, ubiquitous
- Making energy-harvesting IoT viable
Example applications: smart lock, safe, smart bandage, medical nanorobot, asset tracking
Cortex-M33: efficiency, security and flexibility
- Extremely compact: 80% smaller than Cortex-A5 (Cortex-A5 and Cortex-M33 sizes based on 40nm)
- Configurable and extensible: base core with TrustZone, DSP, FPU and coprocessor interface options
- Widely applicable
Cortex-R: real-time, high performance, safety
Market-leading real-time compute across many markets:
- HDD and SSD storage
- 3G, 4G and 5G modems
- Automotive functional safety
- Industrial control
- Communications and networking
- SoC real-time controllers
More than 4.5Bn units shipped to date
ARM Cortex-R portfolio
Storage and modem (ARMv7-R):
- Cortex-R4: real-time performance
- Cortex-R5: real-time performance and peripheral control
- Cortex-R7: high-performance 4G modem and storage
- Cortex-R8: highest-performance 5G modem and storage
Functional safety:
- Cortex-R5 (ARMv7-R): real-time performance with functional safety
- Cortex-R52 (ARMv8-R): most advanced processor for functional safety
Cortex-R8: next generation mobile and storage
- Market-leading performance: best-in-class hard real-time performance and power efficiency
- Spanning performance needs, including 5G
- Scalable and efficient: software workload can be spread across up to 4 cores
- Software compatible: reduces time-to-market and protects software investment
[Figure: CoreMarks* for Cortex-R4, Cortex-R5, Cortex-R7 and Cortex-R8, on an axis from 0 to 30000]
*Total MP CoreMarks using 28nm HPM, maximum multiprocessor configuration.
Cortex-R52: ARM's most advanced processor for safety
- Simplifying functional safety, providing enhanced safety features and safety support
- ARM's highest-performance real-time processor for safety applications
- Enabling partner choice through the standardized ARM architecture and the #1 ecosystem
ARM SecurCore for physical security
The easiest and most proven path to meeting physical security requirements: ultra low power, anti-tampering, proven solution, performance
- SC000 (ARMv6-M): optimized area, anti-tampering; a small 32-bit embedded secure CPU for constrained applications; the de facto standard for SIM and identification
- SC300 (ARMv7-M): performance, anti-tampering; a 32-bit embedded, high-performance CPU with anti-tampering; the de facto standard for secure elements
ARM SecurCore portfolio (2.5Bn SecurCore shipments in 2016)
- High performance: Cortex-M7 (ARMv7-M): maximum performance, control and DSP
- Performance efficiency: Cortex-M3 (ARMv7-M): performance efficiency; Cortex-M4 (ARMv7-M): mainstream control and DSP
- Lowest power and area: Cortex-M0 (ARMv6-M): lowest cost, low power; Cortex-M0+ (ARMv6-M): highest energy efficiency
- Anti-tampering: SC300 (ARMv7-M): performance, anti-tampering; SC000 (ARMv6-M): optimized area, anti-tampering
Summary
- ARM provides the world's most power-efficient processors
- ARM's diverse portfolio has solutions for a wide range of applications
- The ARM ecosystem is innovating for total computing
Performance and scalability for a diverse range of applications
- Previous: ARMv4: ARM7TDMI, ARM920T; ARMv5: ARM968E-S, ARM946E-S, ARM926EJ-S; ARMv6: ARM11MPCore, ARM1176JZ(F)-S, ARM1136J(F)-S, ARM1156T2(F)-S
- Cortex-A: ARMv7-A: Cortex-A17, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5; ARMv8-A: Cortex-A75, Cortex-A73, Cortex-A72, Cortex-A57 (high performance), Cortex-A55, Cortex-A53 (high efficiency), Cortex-A35, Cortex-A32 (ultra high efficiency)
- Cortex-R (real time): ARMv7-R: Cortex-R8, Cortex-R7, Cortex-R5, Cortex-R4; ARMv8-R: Cortex-R52
- Cortex-M: ARMv6-M: Cortex-M0+, Cortex-M0; ARMv7-M: Cortex-M7, Cortex-M4, Cortex-M3; ARMv8-M: Cortex-M33, Cortex-M23 (tiers: high performance, performance efficiency, lowest power and area)
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright 2017 ARM Limited