ARM big.LITTLE Technology Unleashed
An Improved User Experience Delivered
Govind Wathan, Product Specialist, Cortex-A Mobile & Consumer CPU Products

Agenda
- Introduction to big.LITTLE Technology
- Benefits of big.LITTLE Technology
- Future big.LITTLE systems
- Summary
- Questions

Mobile Application Workloads
Mobile users spend a large share of their time across a range of mobile applications*:
- 38% on web browsing and Facebook
- 32% on gaming
- 16% on audio, video and utility
Common building blocks in workloads:
- Short bursts of high intensity
- Long periods of sustained high intensity
- Low intensity
[Figure: power vs. time for web browsing, gaming and audio playback, measured on a quad Cortex-A7 symmetric multiprocessing platform]
* Source: Flurry Analytics

Mobile Application Workloads
- Applications require a mix of performance levels
- Mobile users want a better user experience, but not at the cost of reduced battery life
Workload categories:
- Category 1: bursts of high-intensity work (example: web browsing)
- Category 2: sustained performance at the thermal limit (example: Castlemaster)
- Category 3: long-use, low-intensity workloads (example: audio playback)
[Figure: power over time for each category, shown against the sustained performance envelope]

Mobile Application Workload Profiles
- Applications require a mix of performance levels
- Mobile users want a better user experience, but not at the cost of reduced battery life
[Figure: percentage of time spent in DVFS states (High, Mid, Low, WFI, Idle / Power Down) for each workload category, measured on a quad Cortex-A7 symmetric multiprocessing platform]
- Category 1: burst high-intensity workloads (example: web browsing)
- Category 2: sustained performance at the thermal limit (example: Castlemaster)
- Category 3: long-use, low-intensity workloads (example: audio playback)

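A residency profile like the one on this slide can be derived from a trace of time spent in each state. The following is a minimal sketch, assuming a hypothetical trace format of (state, duration in ms) pairs. The state names mirror the slide legend (treating WFI and Idle / Power Down as separate states is an assumption), and the sample values are invented for illustration, not measurements.

```python
from collections import defaultdict

# State labels mirror the slide legend; splitting them this way is an assumption.
STATES = ("High", "Mid", "Low", "WFI", "Idle/Power Down")

def dvfs_residency(trace):
    """Return the percentage of total time spent in each DVFS state.

    trace: iterable of (state, duration_ms) tuples (hypothetical format).
    """
    totals = defaultdict(float)
    for state, duration_ms in trace:
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        totals[state] += duration_ms
    total_time = sum(totals.values())
    if total_time == 0:
        return {state: 0.0 for state in STATES}
    return {state: 100.0 * totals[state] / total_time for state in STATES}

# Illustrative values only (not measured data): a bursty workload that
# spends most of its time waiting between short high-frequency bursts.
example_trace = [("High", 100), ("WFI", 600), ("Low", 200), ("High", 100)]
residency = dvfs_residency(example_trace)
```

A bursty Category 1 workload would show most of its time in WFI or idle with a small High share, while a Category 2 workload would concentrate in the upper states.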
big.LITTLE Technology: Heterogeneous Computing
- 2x higher performance vs. LITTLE only
- Up to 75% CPU power savings vs. big only
- Up to 40% SoC power savings*
Architecturally identical processors:
- High-performance-tuned big cores
- Low-power-tuned LITTLE cores
Hardware coherency:
- Cache Coherent Interconnect (CCI)
- L1 and L2 snooping between clusters
Seamless and automatic task allocation:
- Right task on the right core
[Diagram: big cluster and LITTLE cluster, each with an L2 cache, with interrupt control and the Cache Coherent Interconnect]
* Measured across a set of casual games and common use cases on an ARM Partner 4x Cortex-A15 / 4x Cortex-A7 big.LITTLE device

big.LITTLE MP Software Evolution
Three software models, each improving performance and efficiency (timeline on the slide: 2012, H1 2013, H2 2013):
- Cluster Migration
- CPU Migration
- Global Task Scheduling (big.LITTLE MP)
Measured power and performance on big.LITTLE devices (big.LITTLE MP relative to Cluster Migration), for web browsing and intensive gaming:
- Power (lower is better): -29% and -38%
- Performance (higher is better): +20% and +60%
[Figure: core-usage diagrams for the three models, and bar charts of relative power and performance]

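Global Task Scheduling places individual tasks on big or LITTLE cores rather than migrating whole clusters or CPUs. The following is a minimal sketch of that idea, assuming a simple two-threshold policy on a per-task load value. The thresholds, the load scale and the Task type are hypothetical and are not taken from the big.LITTLE MP software.

```python
from dataclasses import dataclass

# Hypothetical thresholds on a 0.0-1.0 tracked-load scale (assumptions).
UP_THRESHOLD = 0.7    # above this, a task on a LITTLE core moves to big
DOWN_THRESHOLD = 0.3  # below this, a task on a big core moves to LITTLE

@dataclass
class Task:
    name: str
    load: float              # tracked load: 0.0 (idle) to 1.0 (always running)
    cluster: str = "LITTLE"  # cluster the task currently runs on

def place_task(task: Task) -> str:
    """Pick a cluster for one task.

    The gap between the two thresholds adds hysteresis, so a task with a
    moderate load stays where it is instead of bouncing between clusters.
    """
    if task.cluster == "LITTLE" and task.load > UP_THRESHOLD:
        task.cluster = "big"
    elif task.cluster == "big" and task.load < DOWN_THRESHOLD:
        task.cluster = "LITTLE"
    return task.cluster

# Illustrative tasks only: light audio work, a heavy render thread, and a
# moderately loaded UI thread that is already on a big core.
tasks = [Task("audio", 0.1), Task("render", 0.9), Task("ui", 0.5, cluster="big")]
placement = {t.name: place_task(t) for t in tasks}
```

Because each task is placed independently, big and LITTLE cores can be busy at the same time, which is the difference from Cluster Migration that the slide illustrates.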
big.LITTLE MP
- Delivers higher power efficiency
- Extends battery life
- Improves user experience
Measured power and performance on big.LITTLE devices (big.LITTLE MP relative to Cluster Migration), for web browsing and intensive gaming:
- Power (lower is better): -29% and -38%
- Performance (higher is better): +20% and +60%

big.LITTLE MP Improves User Experience (UX)
- LITTLE cores handle background tasks and audio
- Short bursts of performance on big cores enable sustained levels of smooth user experience
[Figure: DVFS states for web browsing with audio, per core (CPU0 to CPU5, LITTLE cluster and big cluster); states shown are idle, low, mid and high frequency for LITTLE and big cores]
[Figure: normalized jank* (less is better), LITTLE only vs. big.LITTLE, for Asphalt 7, Dungeon Defenders and Video Playback; UX improvement labels on the chart: 58%, 65%, 47%]
* Measure of variance in frame rate. Measurements conducted on the same big.LITTLE platform.

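The slide defines jank as a measure of variance in frame rate. The following is a minimal sketch of one way to compute such a metric, assuming frame timestamps in milliseconds and population variance of the instantaneous frame rate. The exact formula and normalization used for the chart are not given in the slides, so the function names and the choice of variance are assumptions, and the sample timestamps are invented.

```python
from statistics import pvariance

def frame_rates(timestamps_ms):
    """Instantaneous frame rate (fps) between consecutive frame timestamps."""
    return [1000.0 / (b - a) for a, b in zip(timestamps_ms, timestamps_ms[1:])]

def jank(timestamps_ms):
    """Jank as the population variance of the instantaneous frame rate."""
    return pvariance(frame_rates(timestamps_ms))

def normalized_jank(candidate_ms, baseline_ms):
    """Candidate jank relative to a baseline run (1.0 means no change)."""
    baseline = jank(baseline_ms)
    if baseline == 0:
        raise ValueError("baseline run has zero jank; cannot normalize")
    return jank(candidate_ms) / baseline

# Illustrative timestamps only (not measured data).
steady = [0, 20, 40, 60, 80]     # constant 50 fps
uneven = [0, 20, 60, 80, 120]    # alternates between 50 fps and 25 fps
smoother = [0, 20, 45, 65, 90]   # alternates between 50 fps and 40 fps
ratio = normalized_jank(smoother, uneven)
```

A ratio below 1.0 means the candidate run has a steadier frame rate than the baseline, which matches the "less is better" reading of the chart.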
big.LITTLE MP Delivers Higher Power Efficiency
- 35% average improvement in power efficiency across single-thread and multi-thread workloads (4x4 big.LITTLE MP vs. 4x4 Cluster Migration)
Frequency residency profile while running Antutu CPU:
- Cluster Migration: the Cortex-A7 MP4 cores are not running because of cluster migration, and the SoC thermal budget constrains the Cortex-A15 MP4 cores to a lower frequency, resulting in lower benchmark performance
- big.LITTLE MP: the Cortex-A15 MP4 and Cortex-A7 MP4 clusters run at peak performance within the thermal budget
[Figure: power-efficiency bar chart (axis 0.00 to 2.00) and frequency residency charts; frequency labels on the slide: 1.2GHz, 1.4GHz, 1.7GHz, 1.2GHz, 1.1GHz, 1.3GHz]

big.LITTLE MP Extends Battery Life
- Single-thread performance on highly efficient LITTLE cores enables increased power savings
- Cores in the big cluster are powered down
[Figure: DVFS states for Temple Run, per core (A7 CPU0 to CPU3 in the LITTLE cluster, A15 CPU4 and CPU5 in the big cluster); states shown are idle, low, med and high frequency for LITTLE and big cores]
[Figure: relative battery life, Cluster Migration vs. big.LITTLE MP (axis 0% to 200%)]

big.LITTLE MP Support and Services Available
big.LITTLE MP software:
- http://git.linaro.org/gitweb?p=arm/big.little/mp.git
Linaro Landing Teams for Club and Core Members:
- Provide software support under NDA
- Exclusive Landing Teams for each Member company
Services and support offered through ARM Active Assist:
- Design review of big.LITTLE systems
- Technical support and application notes
- big.LITTLE MP integration and tuning guides
- On-site software training

ARMv8-A Enables 64-bit big.LITTLE
Improved performance on big.LITTLE ARMv8:
- Cortex-A57: highest-performance big CPU in thermal envelope
- Cortex-A53: most energy-efficient LITTLE CPU
[Figure: SpecInt2000 power (mW) vs. performance* for Cortex-A15 (ARMv7-A big), Cortex-A7 (ARMv7-A LITTLE), Cortex-A57 (ARMv8-A big) and Cortex-A53 (ARMv8-A LITTLE); annotations: "Higher performance at same power" and "Extended range of efficiency"]
* SpecInt2000 on iso-process & 32-bit

Extending big.LITTLE MP for Thermal Management
ARM Intelligent Power Allocation (IPA)
- Power transforms to heat in the device and SoC (temperatures shown: Tdie, Tskin)
- Input: real-time CPU and GPU performance requests (big, LITTLE, GPU)
- Elements: proactive temperature control, power estimation, dynamic power allocation
- Output: allocated performance for big, LITTLE and GPU
- Dynamic allocation by: performance required, thermal headroom

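The allocation step described on this slide can be illustrated with a small sketch. It assumes a proportional controller that turns thermal headroom into a power budget, and a proportional split of that budget across actors when their requests exceed it. All names, parameters and numbers here are hypothetical. This is not ARM's IPA implementation.

```python
def power_budget_mw(temp_c, target_c, sustainable_mw, gain_mw_per_c):
    """Proportional controller: more thermal headroom allows a bigger budget.

    All four parameters are illustrative assumptions, not IPA tunables.
    """
    headroom_c = target_c - temp_c
    return max(0.0, sustainable_mw + gain_mw_per_c * headroom_c)

def allocate_power(requests_mw, budget_mw):
    """Split the budget across actors in proportion to requested power."""
    total = sum(requests_mw.values())
    if total <= budget_mw:
        return dict(requests_mw)  # no constraint: every request is granted
    scale = budget_mw / total
    return {actor: req * scale for actor, req in requests_mw.items()}

# Illustrative GPU-heavy workload (values are invented, in mW).
requests = {"big": 500, "LITTLE": 100, "GPU": 1400}

# Below the target temperature the budget exceeds the total request.
cool = allocate_power(requests, power_budget_mw(60, 75, 1500, 50))

# Above the target temperature the budget shrinks and requests are scaled.
hot = allocate_power(requests, power_budget_mw(80, 75, 1500, 50))
```

In the cool case every actor gets what it asked for. In the hot case the GPU still receives the largest share because it requested the most, which is the behavior the "dynamic allocation by performance required and thermal headroom" bullet describes.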
Intelligent Power Allocation in Action
[Figure, repeated across this sequence of slides: running frequency over time for three consecutive runs of GLB TRex, showing maximum and running frequency for big, LITTLE and GPU; median-filtered chart for clarity]
- Device temperature is below threshold: there are no constraints on power or performance, and every actor runs at its maximum required frequency
- High load on GPU and low load on CPU: the GPU is allocated most of the power
- High load on CPU and low load on GPU: the CPU is allocated most of the power
- Device temperature gets hotter: IPA reduces the power available to actors, which maintains temperature control
IPA vs. traditional (relative performance): 13% improvement on the 1st run, 34% on the 2nd run, 36% on the 3rd run, 28% on average

big.LITTLE Mobile 2015
[Block diagram. Components shown: Cortex-A57, Cortex-A53, Mali T720 GPU, GIC-400, Display, I/O Coherent Masters, NIC-400, MMU-400, CoreLink CCI-400, TZC-400, DMC-400, DRAM (2 x x32 DDR3-1600), Peripherals]

ARM big.LITTLE Mobile Roadmap
ARM IP, present:
- Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A57, Cortex-A53
- CCI-400
ARM IP, future:
- Next-gen high-performance big CPUs
- Next-gen power-efficient LITTLE CPUs
- Next-gen cache coherent interconnects
ARM software:
- Global Task Scheduling + Intelligent Power Allocation + 64-bit Android L support

Summary
- big.LITTLE is fast becoming the de facto power optimization technology in mobile
- big.LITTLE processing technology delivers best-in-class performance and energy efficiency in devices today
- Improved user experience and prolonged battery life have been measured on real smartphone devices
- Devices are transitioning to advanced big.LITTLE technology with additional features and IP support

Thank You