MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency

In July 2013, MediaTek delivered the industry's first mobile system on a chip with Heterogeneous Multi-Processing. The MT8135 chipset for Android tablets features CorePilot technology that maximizes performance and power saving with interactive power management, adaptive thermal management and advanced scheduler algorithms.

Table of Contents

ARM big.LITTLE Architecture
big.LITTLE Implementation Models
  Cluster Migration
  CPU Migration
  Heterogeneous Multi-Processing
MediaTek CorePilot Heterogeneous Multi-Processing Technology
  Interactive Power Management
  Adaptive Thermal Management
  Scheduler Algorithms
    The MediaTek HMP Scheduler
    The RT Scheduler
Task Scheduling & Performance
  CPU-Intensive Benchmarks
  Web Browsing
Task Scheduling & Power Efficiency
Summary

ARM big.LITTLE Architecture

ARM brought heterogeneous computing to mobile System on a Chip (SoC) designers with its big.LITTLE architecture. ARM big.LITTLE combines high-performance big CPUs with energy-efficient LITTLE CPUs on the same SoC to reduce energy consumption (and hence preserve battery power) while still delivering peak performance. Because both CPU types are architecturally compatible, workloads can be allocated to either on demand to suit performance needs. High-intensity tasks such as games are allocated to the big CPUs, for example, while less demanding tasks such as email and audio playback are allocated to the LITTLE CPUs. The load balancing happens quickly enough to be completely transparent to the user and can reduce energy consumption by up to 70% for common tasks.

Figure 1: Asymmetric computing with the ARM big.LITTLE architecture

big CPU                              LITTLE CPU
Performance driven                   Energy efficient
High performance                     Moderate performance
High power consumption               Low power consumption
Ideal for compute-intensive tasks    Ideal for routine tasks

Table 1: ARM big.LITTLE CPU comparison

big.LITTLE Implementation Models

There are three software models for implementing heterogeneous computing with the ARM big.LITTLE architecture: Cluster Migration, CPU Migration and Heterogeneous Multi-Processing.

Cluster Migration

Cluster Migration groups the big CPUs into one cluster and the LITTLE CPUs into another, and shifts tasks between the two clusters as required. Only one cluster can be active at a time.

Figure 2: The Cluster migration model

CPU Migration

CPU Migration pairs each big CPU with a LITTLE CPU. A task can be allocated to one or more of these pairs, but only one CPU in each pair can be active at a time.

Figure 3: The CPU migration model
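The constraints of the two migration models can be made concrete with a small sketch. It assumes a hypothetical 2+2 layout (two big and two LITTLE CPUs); the CPU names and helper functions are illustrative and not part of any ARM or MediaTek API. Each function checks whether a given set of simultaneously active CPUs is allowed under one model.

```python
from itertools import combinations

# Hypothetical 2+2 layout; the names are illustrative only.
BIG = ["big0", "big1"]
LITTLE = ["little0", "little1"]


def cluster_migration_ok(active):
    """Cluster Migration: every active CPU must belong to the same cluster."""
    active = set(active)
    return active <= set(BIG) or active <= set(LITTLE)


def cpu_migration_ok(active):
    """CPU Migration: BIG[i] and LITTLE[i] form a pair; at most one CPU per pair is active."""
    active = set(active)
    return all(not (b in active and l in active) for b, l in zip(BIG, LITTLE))


def max_active(rule):
    """Largest number of CPUs that may run at once under the given rule."""
    cpus = BIG + LITTLE
    return max(
        size
        for size in range(len(cpus) + 1)
        for combo in combinations(cpus, size)
        if rule(combo)
    )


print(cluster_migration_ok(["big0", "little1"]))  # mixing clusters is not allowed
print(cpu_migration_ok(["big0", "little1"]))      # CPUs from different pairs may mix
print(max_active(cluster_migration_ok), max_active(cpu_migration_ok))
```

Under these assumptions both migration models cap the number of simultaneously active CPUs at two out of four. That cap is the limitation Heterogeneous Multi-Processing is meant to remove.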

Heterogeneous Multi-Processing

Heterogeneous multi-processing removes these limitations and allows a task to be allocated to any combination of big and LITTLE CPUs. Heterogeneous multi-processing is extremely flexible and inherently superior to cluster migration and CPU migration, but performance and energy consumption are highly dependent on the SoC's task scheduler.

Figure 4: The Heterogeneous multi-processing model

                        Cluster Migration          CPU Migration              Heterogeneous Multi-Processing
Switching Granularity   Cluster (low flexibility)  CPU (medium flexibility)   Unrestricted (high flexibility)
Maximum Performance     Medium: 2x big             Medium: 2x big             High: 2x big + 2x LITTLE
Average Power Saving    Low (coarse granularity)   Medium                     High (fine granularity)

Table 2: Comparison of different heterogeneous computing models

MediaTek CorePilot Heterogeneous Multi-Processing Technology

MediaTek is a leader in heterogeneous computing and is actively shaping its future as a founding member of the Heterogeneous System Architecture Foundation (www.hsafoundation.com), a not-for-profit consortium of SoC and software vendors, OEMs, and academia. In July 2013, MediaTek also delivered the industry's first mobile SoC to feature heterogeneous multi-processing (HMP): the MT8135 chipset with CorePilot technology for Android tablets.

Based on open-source HMP technology derived from Linaro (www.linaro.org), CorePilot maximizes the performance and power-saving potential of heterogeneous computing with interactive power management, adaptive thermal management and advanced scheduler algorithms.

Figure 5: Power-efficiency methods

Figure 6: MediaTek CorePilot overview

1. Interactive Power Management

CorePilot's Interactive Power Manager reduces the power consumed and the heat generated by the CPUs via two main modules. The Dynamic Voltage and Frequency Scaling module automatically adjusts the frequency and voltage of CPUs on the fly, while the CPU Hot Plug module switches CPUs on and off on demand.

Dynamic Voltage & Frequency Scaling: Traditional symmetric multi-processors apply a unified Dynamic Voltage and Frequency Scaling (DVFS) policy to all CPUs. CorePilot's Interactive Power Management applies different DVFS policies to big and LITTLE cores to maximize power and thermal efficiency.

CPU Hot Plug: Interactive Power Management monitors CPU load and seamlessly switches cores on or off to save power or to increase performance. CPUs can also be switched off when the running tasks are not CPU-bound, to reduce power consumption.

Table 3: Key components of Interactive Power Management

2. Adaptive Thermal Management

The thermal limits of a mobile device can be exceeded when its CPU or GPU runs at peak performance. This, in turn, can be detrimental to the user experience. Adaptive Thermal Management (ATM) addresses this by monitoring device temperatures and dynamically adjusting the power budget to keep them within a specified range, while minimizing the impact on performance. Compared with traditional dynamic power management designs that throttle at fixed temperature points, ATM can achieve a 10% performance increase.

3. Scheduler Algorithms

Performance is the usual goal of operating system scheduling, and scheduling technology has evolved accordingly. With Symmetric Multi-Processing (SMP), the Completely Fair Scheduler (CFS) is currently the most common scheduling algorithm, and it distributes the workload equally among CPU cores. With Heterogeneous Multi-Processing, however, CFS can result in performance degradation, since tasks are not efficiently matched to CPU core capabilities.
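A small worked example shows why capability-blind placement can hurt on asymmetric cores. The relative speeds and task sizes below are hypothetical round numbers chosen for illustration; they are not MT8135 measurements, and the helper is not part of any scheduler API.

```python
# Hypothetical relative speeds in work units per second; not MT8135 measurements.
SPEED = {"big": 2.0, "LITTLE": 1.0}


def finish_time(placement, work):
    """Time until every task has finished, with one task per CPU.

    placement maps task name -> CPU type; work maps task name -> work units.
    """
    return max(work[task] / SPEED[cpu] for task, cpu in placement.items())


work = {"heavy": 8.0, "light": 2.0}

# A capability-blind scheduler may land the heavy task on the LITTLE CPU.
mismatched = finish_time({"heavy": "LITTLE", "light": "big"}, work)

# A capability-aware scheduler puts the heavy task on the big CPU.
matched = finish_time({"heavy": "big", "light": "LITTLE"}, work)

print(mismatched, matched)  # 8.0 4.0
```

With these assumed numbers, both placements keep one task on each core, yet the mismatched placement takes twice as long to complete the same work.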
MediaTek CorePilot, on the other hand, delivers a true heterogeneous compute model by using a scheduling algorithm that assigns tasks to one of two schedulers according to their priority: the Heterogeneous Multi-Processing (HMP) scheduler or the Real-Time (RT) scheduler.

The MediaTek HMP Scheduler

The HMP scheduler is responsible for assigning normal-priority tasks to the big.LITTLE CPU clusters and performs four main functions.

Load Tracking: By tracking the status of each task, the HMP scheduler determines which tasks are heavy and which are relatively light.

CPU Capacity Estimation: The HMP scheduler is aware of the available compute capacity of each processor in the big.LITTLE clusters, and so is able to make the most appropriate scheduling decisions.

Intelligent Load-Balancing: Load tracking and CPU capacity estimation are used in concert for rapid load balancing, assigning and reassigning tasks to performance-driven or energy-efficient CPUs as required.

Task Packing: The HMP scheduler consolidates as many light-load tasks as possible and matches them to the most appropriate CPUs. CPUs without active tasks can then be switched off via CPU Hot Plug, or put into an idle state.

Table 4: Key components of the MediaTek HMP scheduler

The RT Scheduler

The RT scheduler assigns high-priority real-time tasks that require a fast CPU response to the big.LITTLE cluster. The RT scheduler has priority over the HMP scheduler, and MediaTek has further modified its design so that the highest-priority tasks are assigned to performance-driven CPUs. Lower-priority real-time tasks are then assigned to other available CPUs.

Figure 7: Process flow for the HMP and RT schedulers
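The HMP scheduler functions in Table 4 can be illustrated with a toy sketch. It is not MediaTek's or Linaro's implementation: the decayed-average load tracker, the threshold, the fixed LITTLE capacity and the first-fit packing are all assumptions made for illustration, and the RT scheduler is left out.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not MediaTek's values.
DECAY = 0.5            # weight given to the previous load estimate
UP_THRESHOLD = 0.7     # tracked load above this counts as "heavy"
LITTLE_CAPACITY = 1.0  # total load one LITTLE CPU is assumed to absorb


@dataclass
class Task:
    name: str
    load: float = 0.0  # tracked load in [0, 1]

    def track(self, busy_fraction: float) -> None:
        """Load tracking: blend the latest busy fraction into a decayed average."""
        self.load = DECAY * self.load + (1 - DECAY) * busy_fraction


def place(tasks):
    """Send heavy tasks to big CPUs; pack light tasks onto as few LITTLE CPUs as fit."""
    heavy = [t.name for t in tasks if t.load > UP_THRESHOLD]
    light = sorted((t for t in tasks if t.load <= UP_THRESHOLD), key=lambda t: -t.load)
    little_cpus = []  # each entry: [remaining capacity, [task names]]
    for t in light:
        for cpu in little_cpus:
            if cpu[0] >= t.load:  # first LITTLE CPU with enough room
                cpu[0] -= t.load
                cpu[1].append(t.name)
                break
        else:  # no existing CPU has room, so bring another one into use
            little_cpus.append([LITTLE_CAPACITY - t.load, [t.name]])
    return heavy, [names for _, names in little_cpus]


tasks = [Task("game"), Task("mail"), Task("audio")]
for _ in range(3):  # three sampling periods with made-up busy fractions
    for task, busy in zip(tasks, (1.0, 0.2, 0.3)):
        task.track(busy)

heavy, packed = place(tasks)
print(heavy, packed)
```

In this run the game's tracked load rises past the threshold, so it is marked for a big CPU, while mail and audio fit together on a single LITTLE CPU. Any LITTLE CPU that receives no tasks is a candidate for CPU Hot Plug or an idle state.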

Task Scheduling & Performance

Together, CorePilot's HMP and RT schedulers give software simultaneous access to all processors in the big.LITTLE clusters, and tasks are assigned selectively in real time to manage resources and increase power efficiency. The following real-world examples illustrate the effectiveness of CorePilot with regard to performance.

CPU-Intensive Benchmarks

With a CPU-intensive benchmark such as Antutu v3.3.2 (www.antutu.com), CorePilot spreads the load by assigning one task to each CPU core. The screenshot below, captured using the ARM Development Studio 5 (DS-5) application, shows this in action, with a full load on each core.

Figure 8: Scheduling on the MT8135 SoC with the Antutu benchmark (data profiled by ARM DS-5)

The Linpack (www.netlib.org/benchmark/hpl) and Vellamo (www.quicinc.com/vellamo) benchmarks also clearly illustrate the performance difference between the standard SMP CFS scheduler and the CorePilot HMP scheduler. The benchmarks were run on the MediaTek MT8135 big.LITTLE heterogeneous computing platform.

Linpack is a multi-threaded benchmark in which the number of test tasks equals the number of CPU cores. Both the CFS and CorePilot schedulers assign one task to each CPU core, but since CorePilot is aware of the computing capability of each CPU, it can assign the heavy tasks to the performance-driven CPUs first, with the remainder assigned to the energy-efficient CPUs.

Vellamo ("Metal Chapter") is a single-threaded benchmark. The standard CFS scheduler assigns the benchmark task to a performance-driven or an energy-efficient CPU on a random basis, but CorePilot always assigns the task to a performance-driven CPU.

Figure 9: MT8135 benchmark performance comparison

Web Browsing

Web browsing involves a mix of heavy and light tasks, so the CorePilot scheduler assigns tasks to performance-driven or energy-efficient CPUs accordingly. The screenshot below, captured using the ARM Development Studio 5 (DS-5) application, shows this in action.

Figure 10: Web browsing on the MT8135 SoC (data profiled by ARM DS-5)

Task Scheduling & Power Efficiency

CorePilot's Interactive Power Management not only limits CPU resources for power-efficient performance with light task loads, but also dynamically adjusts the CPU frequency to handle heavier loads. Figure 11 shows the difference in power consumption between two otherwise identical platforms, the MediaTek MT8135 with an unmodified CFS scheduler and the MT8135 with CorePilot, running the Antutu (www.antutu.com) and Quadrant (www.aurorasoftworks.com) benchmarks.

Figure 11: MT8135 energy consumption comparison with benchmarks

Figures 12 and 13 show the difference in energy consumption between a standard ARM Cortex-A15 SMP quad-core platform with CFS and the MediaTek MT8135 HMP platform with CorePilot. CorePilot's interactive power management and adaptive thermal management help reduce energy consumption considerably with both benchmarks and real-world tasks such as multimedia playback. Note that the ARM Cortex-A15 quad-core energy consumption figures are extrapolated from dual-core data.

Figure 12: ARM Cortex-A15 & MT8135 energy consumption comparison with benchmarks

Figure 13: ARM Cortex-A15 & MT8135 energy consumption comparison with multimedia playback

SUMMARY

Mobile SoCs have a limited power consumption budget. With ARM big.LITTLE, SoC platforms are capable of asymmetric computing, whereby tasks can be allocated to CPU cores in line with their processing needs. Of the three available software models for configuring big.LITTLE SoC platforms, Heterogeneous Multi-Processing offers the best performance. MediaTek CorePilot technology is designed to deliver the maximum compute performance from big.LITTLE mobile SoC platforms with low power consumption. The MediaTek CorePilot MT8135 chipset for Android is the industry's first Heterogeneous Multi-Processing implementation. MediaTek leads in the heterogeneous computing space and will release further CorePilot innovations in 2014.