ARM big.LITTLE Technology Unleashed: An Improved User Experience Delivered


ARM big.LITTLE Technology Unleashed: An Improved User Experience Delivered
Govind Wathan, Product Specialist, Cortex-A Mobile & Consumer CPU Products

Agenda
Introduction to big.LITTLE Technology
Benefits of big.LITTLE Technology
Future big.LITTLE systems
Summary
Questions

Mobile Application Workloads
Mobile users spend a high amount of time on a range of mobile applications*: 38% on web browsing and Facebook, 32% on gaming, and 16% on audio, video and utility.
Common building blocks in workloads: short bursts of high intensity, long periods of sustained high intensity, and low intensity.
[Figure: power vs. time traces for Web Browsing, Gaming and Audio Playback, measured on a quad Cortex-A7 symmetric multiprocessing platform]
* Source: Flurry Analytics

Mobile Application Workloads
Applications require a mix of performance levels. Mobile users want a better user experience, but not at the cost of reduced battery life.
Category 1: bursts of high-intensity workloads (example: Web Browsing)
Category 2: sustained performance at the thermal limit (example: Castlemaster)
Category 3: long-use, low-intensity workloads (example: Audio Playback)
[Figure: power for each category relative to the sustained performance envelope]

Mobile Application Workload Profiles
Applications require a mix of performance levels. Mobile users want a better user experience, but not at the cost of reduced battery life.
Category 1: bursts of high-intensity workloads (example: Web Browsing)
Category 2: sustained performance at the thermal limit (example: Castlemaster)
Category 3: long-use, low-intensity workloads (example: Audio Playback)
[Figure: percentage of time spent in DVFS states (High, Mid, Low, WFI, Idle / Power Down) for each category, measured on a quad Cortex-A7 symmetric multiprocessing platform]
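The residency percentages in a profile like this can be computed from per-frequency time counters. The following is a minimal sketch, assuming counters in the two-column "frequency time" text layout that Linux cpufreq statistics use for time_in_state. The sample values and the function name dvfs_residency are made up for illustration and are not the measurements from the slide.

```python
def dvfs_residency(time_in_state_text):
    """Return {frequency_khz: percent_of_time} from 'freq time' lines."""
    times = {}
    for line in time_in_state_text.strip().splitlines():
        freq_khz, ticks = line.split()
        times[int(freq_khz)] = int(ticks)
    total = sum(times.values())
    if total == 0:
        # No time recorded yet: report zero residency everywhere.
        return {freq: 0.0 for freq in times}
    return {freq: 100.0 * ticks / total for freq, ticks in times.items()}


# Hypothetical sample, not data from the presentation.
SAMPLE = """
200000 6000
600000 3000
1200000 1000
"""

residency = dvfs_residency(SAMPLE)
for freq, pct in sorted(residency.items()):
    print(f"{freq / 1000:.0f} MHz: {pct:.1f}%")
```

A workload dominated by the lowest frequency, as in this sample, would look like the Category 3 profile; one dominated by the highest state would look like Category 2.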

big.LITTLE Technology: Heterogeneous Computing
2x higher performance vs. LITTLE only
Up to 75% CPU power savings vs. big only
Up to 40% SoC power savings*
Architecturally identical processors: high-performance tuned big cores and low-power tuned LITTLE cores
Hardware coherency: Cache Coherent Interconnect (CCI), with L1 and L2 snooping between clusters
Seamless and automatic task allocation: the right task on the right core
[Diagram: big cluster and LITTLE cluster, each with its own L2 cache, plus interrupt control, connected through the Cache Coherent Interconnect]
* Measured across a set of casual games and common use cases on an ARM Partner 4x Cortex-A15 + 4x Cortex-A7 big.LITTLE device
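One way to picture "the right task on the right core" is a load-threshold policy with hysteresis. The sketch below is an illustration only, not ARM's scheduler code: the threshold values, the 0 to 1 load scale, and the names UP_THRESHOLD, DOWN_THRESHOLD and place_task are all assumptions.

```python
UP_THRESHOLD = 0.7    # assumed: load above this moves a task to a big core
DOWN_THRESHOLD = 0.3  # assumed: load below this moves it back to LITTLE


def place_task(current_cluster, load):
    """Pick 'big' or 'LITTLE' for a task given its tracked load (0.0-1.0).

    The gap between the two thresholds is hysteresis: a task whose load
    hovers in the middle stays where it is instead of bouncing between
    clusters.
    """
    if current_cluster == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current_cluster == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current_cluster


cluster = "LITTLE"
history = []
for load in [0.1, 0.5, 0.9, 0.5, 0.2]:
    cluster = place_task(cluster, load)
    history.append(cluster)
print(history)
```

Because the caches are kept coherent in hardware, a policy like this can move a task between clusters without software having to flush or copy its data.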


big.LITTLE MP Software Evolution: Improving Performance and Efficiency
Software models, in the order shown on the slide: Cluster Migration, big.LITTLE CPU Migration, Global Task Scheduling (big.LITTLE MP). Timeline labels: 2012, H1 2013, H2 2013.
[Diagram: task-to-core assignments under each software model]
[Figure: measured power and performance on big.LITTLE devices, big.LITTLE MP relative to Cluster Migration, for Web Browsing and Intensive Gaming. Power: -29% and -38% (lower is better). Performance: +20% and +60% (higher is better).]

big.LITTLE MP
Delivers higher power efficiency
Extends battery life
Improves user experience
[Figure: measured power and performance on big.LITTLE devices, big.LITTLE MP relative to Cluster Migration, for Web Browsing and Intensive Gaming. Power: -29% and -38% (lower is better). Performance: +60% and +20% (higher is better).]
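The power and performance deltas can be combined into a relative efficiency figure (performance per unit power). This is a minimal sketch. The transcript does not make clear which power figure belongs to which performance figure, so the pairing used below is an assumption for illustration, as is the function name relative_efficiency.

```python
def relative_efficiency(perf_delta, power_delta):
    """Performance-per-watt ratio versus the baseline.

    Deltas are fractions: +0.20 means 20% more, -0.29 means 29% less.
    """
    return (1.0 + perf_delta) / (1.0 + power_delta)


# Assumed pairing, for illustration only: +20% performance at -29% power.
gain = relative_efficiency(0.20, -0.29)
print(f"{gain:.2f}x performance per watt relative to baseline")
```

A ratio above 1.0 means more work is done per unit of power than under the baseline (here, Cluster Migration).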

big.LITTLE MP Improves User Experience (UX)
LITTLE cores handle background tasks and audio.
Short bursts of performance on big cores enable sustained levels of smooth user experience.
[Figure: DVFS states for Web Browsing with Audio, per CPU (CPU0 to CPU5, LITTLE cluster and big cluster): idle, low, mid and high frequency residency]
[Figure: normalized jank* (less is better), LITTLE only vs. big.LITTLE, for Asphalt 7, Dungeon Defenders and Video Playback. UX improvement labels: 58%, 65%, 47%.]
* Jank is a measure of variance in frame rate. Measurements were conducted on the same big.LITTLE platform.
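The slide defines jank only as a measure of variance in frame rate. As a minimal sketch of one way to compute such a number, the code below takes frame presentation timestamps and returns the variance of the instantaneous frame rate. The exact metric ARM used is not given, so the function frame_rate_variance and the sample timestamps are assumptions.

```python
from statistics import pvariance


def frame_rate_variance(timestamps_s):
    """Population variance of instantaneous frame rate (frames per second)."""
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if len(intervals) < 1 or any(dt <= 0 for dt in intervals):
        raise ValueError("need at least two strictly increasing timestamps")
    rates = [1.0 / dt for dt in intervals]
    return pvariance(rates)


# Hypothetical traces: steady 60 fps frames vs. the same trace with one
# 100 ms stall inserted halfway through.
smooth = [i / 60 for i in range(61)]
janky = smooth[:30] + [t + 0.1 for t in smooth[30:]]

print(frame_rate_variance(smooth) < frame_rate_variance(janky))
```

Dividing the variance of one configuration by that of a reference configuration would give a normalized figure comparable in spirit to the chart.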

big.LITTLE MP Delivers Higher Power Efficiency
4x4 big.LITTLE MP vs. 4x4 Cluster Migration: 35% average improvement in power efficiency across single-thread and multi-thread workloads.
Frequency residency profile while running Antutu CPU:
Cluster Migration: the Cortex-A7 cores are not running due to cluster migration. The SoC thermal budget constrains the Cortex-A15 cores to a lower frequency, resulting in lower benchmark performance.
big.LITTLE MP: the Cortex-A15 and Cortex-A7 clusters run at peak performance within the thermal budget.
[Figure: power efficiency (axis 0.00 to 2.00) and frequency residency of Cortex-A15 MP4 and Cortex-A7 MP4 under Cluster Migration and big.LITTLE MP. Frequency labels: 1.2GHz, 1.4GHz, 1.7GHz, 1.2GHz, 1.1GHz, 1.3GHz.]

big.LITTLE MP Extends Battery Life
Single-thread performance on highly efficient LITTLE cores enables increased power savings.
Cores in the big cluster are powered down.
[Figure: DVFS states for Temple Run, per CPU (A7 CPU0 to CPU3 in the LITTLE cluster, A15 CPU4 and CPU5 in the big cluster): idle, low, medium and high frequency residency]
[Figure: relative battery life, Cluster Migration vs. big.LITTLE MP (axis 0% to 200%)]

big.LITTLE MP Support and Services
Available big.LITTLE MP software: http://git.linaro.org/gitweb?p=arm/big.little/mp.git
Linaro Landing Teams for Club and Core Members: provides software support under NDA, with exclusive Landing Teams for each Member company.
Services and support offered through ARM Active Assist:
Design review of the big.LITTLE system
Technical support and application notes
big.LITTLE MP integration and tuning guides
On-site software training


ARMv8-A Enables 64-bit big.LITTLE
Improved performance on big.LITTLE with ARMv8:
Cortex-A57: highest-performance big CPU in the thermal envelope
Cortex-A53: most energy-efficient LITTLE CPU
[Figure: SpecInt2000 power (mW) vs. performance* for Cortex-A15 (ARMv7-A big), Cortex-A7 (ARMv7-A LITTLE), Cortex-A57 (ARMv8-A big) and Cortex-A53 (ARMv8-A LITTLE). Annotations: higher performance at the same power, extended range of efficiency.]
* SpecInt2000 on iso-process and 32-bit

Extending big.LITTLE MP for Thermal Management: ARM Intelligent Power Allocation (IPA)
Power transforms to heat.
Elements: proactive temperature control, power estimation, dynamic power allocation.
Dynamic allocation by: performance required and thermal headroom.
[Diagram: device and SoC temperatures (Tdie, Tskin); real-time CPU and GPU performance requests from big, LITTLE and GPU feed into IPA, which returns the allocated performance for big, LITTLE and GPU]
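To make the IPA elements concrete, here is a simplified sketch of dynamic power allocation: a power budget is derived from thermal headroom, then split among big, LITTLE and GPU in proportion to their requests. It is not ARM's implementation. The proportional control law, the constants SUSTAINABLE_POWER_MW and GAIN_MW_PER_C, and all sample numbers are assumptions chosen for illustration.

```python
SUSTAINABLE_POWER_MW = 2500.0  # assumed power that holds the target temperature
GAIN_MW_PER_C = 200.0          # assumed budget change per degree of headroom


def power_budget(target_temp_c, current_temp_c):
    """Proportional controller: more thermal headroom gives a larger budget."""
    headroom_c = target_temp_c - current_temp_c
    return max(0.0, SUSTAINABLE_POWER_MW + GAIN_MW_PER_C * headroom_c)


def allocate(requests_mw, budget_mw):
    """Split the budget in proportion to requests, never exceeding a request."""
    total = sum(requests_mw.values())
    if total <= budget_mw:
        return dict(requests_mw)  # no constraint: every actor gets its request
    return {actor: budget_mw * req / total for actor, req in requests_mw.items()}


# Hypothetical requests in milliwatts, with the GPU asking for the most.
requests = {"big": 2000.0, "LITTLE": 500.0, "GPU": 2500.0}

cool = allocate(requests, power_budget(target_temp_c=75.0, current_temp_c=60.0))
hot = allocate(requests, power_budget(target_temp_c=75.0, current_temp_c=80.0))
print("cool:", cool)
print("hot: ", hot)
```

In the cool case every actor receives what it asked for. In the hot case the budget shrinks and the actor with the largest request still receives the largest share.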

Intelligent Power Allocation in Action
Three consecutive runs of GLB TRex.
[Figure: running frequency over time for big, LITTLE and GPU, each plotted against its maximum frequency; median-filtered for clarity]

Device temperature is below threshold: there are no constraints on power or performance, and every actor runs at its maximum required frequency.

High load on GPU and low load on CPU: the GPU gets allocated most of the power.

High load on CPU and low load on GPU: the CPU gets allocated most of the power.

Device temperature gets hotter: IPA reduces the power available to actors, which maintains temperature control.

IPA vs. Traditional (relative performance): 13% improvement on the 1st run, 34% on the 2nd run, 36% on the 3rd run, 28% on average.

big.LITTLE Mobile 2015
[Block diagram: Cortex-A57 and Cortex-A53 with GIC-400; Mali T720 GPU; Display; I/O coherent masters; MMU-400 and NIC-400 instances; CoreLink CCI-400; TZC-400; DMC-400; DRAM (2 x x32 DDR3-1600); Peripherals]

ARM big.LITTLE Mobile Roadmap
ARM IP, present: Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A57, Cortex-A53; CCI-400
ARM IP, future: next-gen high-performance big CPUs; next-gen power-efficient LITTLE CPUs; next-gen cache coherent interconnects
ARM software: Global Task Scheduling + Intelligent Power Allocation + 64-bit Android L support


Summary
big.LITTLE is fast becoming the de facto power optimization technology in mobile.
big.LITTLE processing technology delivers best-in-class performance and energy efficiency in devices today.
Improved user experience and prolonged battery life have been measured on real smartphone devices.
Devices are transitioning to advanced big.LITTLE technology with additional features and IP support.

Thank You