Multicore for mobile: The More the Merrier?
Roger Shepherd, Chipless Ltd

Topics
- The Mobile Computing Platform
- The Application Processor
- CMOS Power Model
- Multicore
- Software: Complexity & Scaling
- Conclusion

The Mobile Computing Platform

Software Is Eating The World
- A smartphone is a computer which can:
  - make phone calls
  - take notes
  - tell the time
  - take photos
- Physical devices have become icons

What makes a smartphone?
- A computer with a touchscreen (and some other special I/O)
- Sophisticated software
  - Unix, GUI, TCP/IP, Browser, SatNav, Speech Recognition, ...

Samsung Galaxy S5
- A ~30 core heterogeneous multiprocessor in your shirt pocket (Ian Philips, ARM)
- http://www.techinsights.com/teardown.com/samsung-galaxy-s5-teardown/

Samsung Galaxy S5
- Functionally decomposed into many chips
- Different semiconductor manufacturing processes
- Different suppliers
- Several contain processor cores
- Where does the software (operating system and applications) run?

The Application Processor

Exynos 5 - Octa-Core Applications Processor
- Exynos 5422 Application Processor
- 4 ARM A15 + 4 ARM A7
- 9 core GPU

The Challenge
The application processor must achieve high computational performance and responsiveness under strong constraints on:
- Cost
- Energy
- Power

Power and Energy
- Simple relationship: $e = \int p(t)\,dt$
- Energy: battery life
- Power: thermal issues
  - burning and discomfort
  - degradation of circuitry
- Battery capacity:
  - 2007 MacBook: 55,000 mWh
  - 2007 iPhone: 1,400 mWh
  - 2014 iPhone 6: 1,810 mWh
  - 2014 iPhone 6+: 2,915 mWh
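The relationship between power and energy on this slide can be sketched numerically. The following is a minimal illustration, not part of the original talk: the function names, the sampled power trace and the 600 mW average draw are hypothetical; only the 1,810 mWh capacity is the figure quoted on the slide.

```python
def energy_mwh(times_s, power_mw):
    """Approximate e = integral of p(t) dt with the trapezoidal rule.

    times_s: sample times in seconds (ascending); power_mw: power in mW.
    Returns energy in mWh.
    """
    if len(times_s) != len(power_mw):
        raise ValueError("times_s and power_mw must have the same length")
    total_mw_s = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total_mw_s += 0.5 * (power_mw[i] + power_mw[i - 1]) * dt
    return total_mw_s / 3600.0  # mW*s -> mWh


def battery_life_hours(capacity_mwh, average_power_mw):
    """Hours a battery lasts at a constant average power draw."""
    return capacity_mwh / average_power_mw


# Hypothetical trace: one hour, ramping between 200 mW and 800 mW.
times = [0.0, 1200.0, 2400.0, 3600.0]
power = [200.0, 800.0, 800.0, 200.0]
print(energy_mwh(times, power), "mWh used in one hour")

# 1,810 mWh is the capacity quoted on the slide; 600 mW is a hypothetical draw.
print(battery_life_hours(1810.0, 600.0), "hours of battery life")
```

Energy bounds battery life, while the instantaneous values in the trace are what drive the thermal issues.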

Engineering Relationships
- Computational capability depends on: microarchitecture, frequency and software
- Frequency (maximum) depends on: microarchitecture, process, implementation and voltage
- Power depends on: process, voltage, frequency, microarchitecture, implementation and software

Simple CMOS power model
- $p = p_{switching} + p_{leakage}$
- $p_{switching} \propto C V^2 f$
- $e_{switching} \propto C V^2 f t = C V^2 n$, where $n = f t$ cycles
- It appears that $e_{switching}$ depends on cycle count, not frequency
- But (maximum) $f$ and $V$ are dependent: $f$ increases with $V$
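A small sketch of the switching terms of this model may help. It assumes a proportionality constant `k` and hypothetical values for capacitance, voltage and cycle count; none of these numbers come from the talk. It shows that, at a fixed voltage, the same number of cycles costs the same switching energy at either frequency.

```python
def switching_power(c, v, f, k=1.0):
    """p_switching = k * C * V^2 * f (k is an assumed proportionality constant)."""
    return k * c * v ** 2 * f


def switching_energy(c, v, f, t, k=1.0):
    """e_switching = p_switching * t = k * C * V^2 * n, with n = f * t cycles."""
    return switching_power(c, v, f, k) * t


C = 1e-9  # hypothetical switched capacitance, farads
V = 1.0   # hypothetical supply voltage, volts
N = 2e9   # hypothetical cycle count

# Run the same N cycles at 2 GHz and at 1 GHz, both at the same voltage.
fast = switching_energy(C, V, f=2e9, t=N / 2e9)
slow = switching_energy(C, V, f=1e9, t=N / 1e9)
print(fast, slow)  # equal: energy follows cycle count, not frequency
```

The saving only appears once the lower frequency allows a lower voltage, which is the point of the last bullet.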

Voltage-Frequency Scaling
- Consider two combinations, $(V_1, f_1)$ and $(V_2, f_2)$, with $V_1 < V_2$ and $f_1 < f_2$
- Compare switching energy over $n$ cycles:
  - $e_1/e_2 = C V_1^2 n / C V_2^2 n = (V_1/V_2)^2 < 1$, because $V_1/V_2 < 1$
- Compare switching power:
  - $p_1/p_2 = C V_1^2 f_1 / C V_2^2 f_2 = (V_1/V_2)^2 (f_1/f_2) = (e_1/e_2)(f_1/f_2) < 1$
- Running slower lowers power and saves energy
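The two ratios on this slide can be checked with a few lines of code. This is a sketch only; the two operating points below are hypothetical values chosen to satisfy V1 < V2 and f1 < f2.

```python
def energy_ratio(v1, v2):
    """e1/e2 over the same number of cycles: (V1/V2)^2."""
    return (v1 / v2) ** 2


def power_ratio(v1, f1, v2, f2):
    """p1/p2 = (V1/V2)^2 * (f1/f2) = (e1/e2) * (f1/f2)."""
    return energy_ratio(v1, v2) * (f1 / f2)


# Hypothetical operating points with V1 < V2 and f1 < f2.
v1, f1 = 0.8, 1.0e9
v2, f2 = 1.0, 2.0e9

e = energy_ratio(v1, v2)
p = power_ratio(v1, f1, v2, f2)
print("energy ratio:", e)
print("power ratio:", p)
```

Because f1/f2 is below 1, the power ratio is always smaller than the energy ratio, which is itself below 1.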

Frequency, energy, power v. Voltage
[Figure: energy relative to 1 V, power relative to 1 V, and frequency (GHz), plotted against voltage from 0.4 V to 1.6 V]
Source: ST Shanghai SOI Summit Oct 2013

But you've forgotten leakage
- At realistic operating voltages and smartphone temperatures, dynamic power/energy dominates
- From Intel, for a Pentium-like x86:

  Voltage   Frequency   Power
  0.5 V     100 MHz     17 mW
  0.8 V     500 MHz     174 mW
  1.2 V     915 MHz     737 mW

Source: "A 280mV-to-1.2V Wide-Operating-Range IA-32 Processor in 32nm CMOS", Intel, ISSCC 2012
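The Intel figures quoted above can be turned into energy per cycle by dividing power by frequency. The sketch below does that and compares the result with the pure V^2 scaling of the simple switching model. The function names are illustrative. The quoted powers are taken here to be total power, so an exact match with V^2 is not expected.

```python
# (voltage in V, frequency in MHz, power in mW), as quoted on the slide.
OPERATING_POINTS = [
    (0.5, 100.0, 17.0),
    (0.8, 500.0, 174.0),
    (1.2, 915.0, 737.0),
]


def energy_per_cycle_nj(freq_mhz, power_mw):
    """Energy per cycle in nJ: mW / MHz = 1e-3 W / 1e6 Hz = 1e-9 J."""
    return power_mw / freq_mhz


# Use the highest-voltage point as the reference for relative figures.
ref_v, ref_f, ref_p = OPERATING_POINTS[-1]
ref_energy = energy_per_cycle_nj(ref_f, ref_p)

for v, f, p in OPERATING_POINTS:
    measured = energy_per_cycle_nj(f, p)
    print(
        f"{v:.1f} V: {measured:.3f} nJ/cycle, "
        f"relative {measured / ref_energy:.2f}, "
        f"V^2 model {(v / ref_v) ** 2:.2f}"
    )
```

Energy per cycle falls steadily as voltage and frequency are lowered, in line with the claim that running slower saves energy.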

What's all this to do with multicore?

1: Pushing against performance limits
- If you are up against the performance limits of a uniprocessor, your options are limited
- Multicore sounds like a winner, if software scales

2: Power efficiency
- Assume workload scales
- Options: one processor at $f$ or two processors at $f/2$
- Power ratio: $C V_1^2 f : 2 \times C V_2^2 (f/2) = (V_1/V_2)^2$, where $V_1$ is the voltage at $f$ and $V_2$ the voltage at $f/2$
- ST results show 25% power at 50% performance
- Two processors halve power and energy
Source: ST Shanghai SOI Summit Oct 2013
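The arithmetic behind "two processors halve power" can be sketched as follows, assuming (as the slide does) that the workload scales perfectly. The function and constant names are illustrative; the 25% figure is the ST result quoted on the slide.

```python
def n_core_power_fraction(n_cores, per_core_power_fraction):
    """Total power of n cores, relative to one core at full frequency f.

    per_core_power_fraction: power of ONE core at the reduced frequency,
    relative to one core at f.
    """
    return n_cores * per_core_power_fraction


# ST figure from the slide: 25% power at 50% performance.
ST_POWER_AT_HALF_PERFORMANCE = 0.25

two_core = n_core_power_fraction(2, ST_POWER_AT_HALF_PERFORMANCE)
print("two cores at f/2 vs one core at f:", two_core)

# With p proportional to C * V^2 * f, a core at f/2 draws (V2/V1)^2 * 0.5,
# so the ST figure implies this squared voltage ratio:
implied_v_ratio_squared = ST_POWER_AT_HALF_PERFORMANCE / 0.5
print("implied (V2/V1)^2:", implied_v_ratio_squared)
```

Since both options deliver the same throughput and therefore take the same time, the energy ratio equals the power ratio.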

Limits to voltage-frequency scaling
- There are limits to how far voltage can be reduced
- If 4 processors at $f/4$ run at the same voltage as 2 processors at $f/2$, there is no benefit, even if software scales
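The effect of a voltage floor can be illustrated with a toy model. Everything numeric here is an assumption made for the sketch: a linear voltage-frequency relation, a nominal voltage of 1.0 V and a minimum of 0.5 V. Real silicon will behave differently; the point is only the shape of the result.

```python
V_NOMINAL = 1.0  # hypothetical voltage needed to run at full frequency f
V_MIN = 0.5      # hypothetical minimum operating voltage


def voltage_for(freq_fraction):
    """Assumed V-f relation: V scales linearly with f, clamped at V_MIN."""
    return max(V_MIN, V_NOMINAL * freq_fraction)


def relative_power(n_cores):
    """Switching power of n cores at f/n, relative to one core at f.

    n * C * V(f/n)^2 * (f/n) divided by C * V_NOMINAL^2 * f
    simplifies to (V(f/n) / V_NOMINAL)^2.
    """
    v = voltage_for(1.0 / n_cores)
    return (v / V_NOMINAL) ** 2


for n in (1, 2, 4):
    print(n, "cores:", relative_power(n))
```

Under these assumptions, going from one core to two lowers power, but going from two to four does not, because the voltage is already at its floor.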

3: Big-Little
- Microarchitectures and implementations have different power-performance characteristics
- Typically, slower (compute) processors have lower power
- Idea: use ISA-compatible processors with differing characteristics
  - fast/high-power for heavy workloads
  - slow/low-power for light workloads

A15-A7 Illustration
- EE Times: A15 is 5x the area and power of A7, with 2-3x the performance
- Constant-voltage power ratio:
  - A15 @ f/3.0 : A7 = $5 C V^2 (f/3.0) : C V^2 f = 1.7$
- Voltage-frequency scaling power ratio:
  - A15 @ f/2.0 : A7 = $5 k C V_{15}^2 (f/2) : k C V_7^2 f = 2.5 \times (V_{15}/V_7)^2 = 2.5 \times 0.5 = 1.25$
  - A15 @ f/2.3 : A7 = $5 k C V_{15}^2 (f/2.3) : k C V_7^2 f = 2.18 \times (V_{15}/V_7)^2 = 2.18 \times 0.36 = 0.78$
- Results depend critically on scaling of $f$ with $V$ and limits of $V$ scaling
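The three ratios on this slide follow from one expression: the A15's relative power (5x), divided by how much it is slowed down to match the A7, times the squared voltage ratio. The sketch below reproduces them; the function name and parameter names are illustrative, and the inputs are the figures quoted on the slide.

```python
def a15_to_a7_power_ratio(slowdown, v_ratio_squared=1.0, relative_power=5.0):
    """Power of an A15 slowed down by `slowdown`, relative to an A7 at f.

    relative_power: A15 power relative to A7 at equal V and f (5x, per EE Times).
    v_ratio_squared: (V15 / V7)^2; 1.0 means both run at the same voltage.
    """
    return relative_power / slowdown * v_ratio_squared


# Constant voltage, A15 slowed 3.0x to match A7 performance.
print(round(a15_to_a7_power_ratio(3.0), 2))

# With voltage-frequency scaling, using the slide's squared voltage ratios.
print(round(a15_to_a7_power_ratio(2.0, v_ratio_squared=0.5), 2))
print(round(a15_to_a7_power_ratio(2.3, v_ratio_squared=0.36), 2))
```

A ratio above 1 means the slowed-down A15 still draws more power than the A7 for the same work; only the last case, which relies on the deepest voltage reduction, falls below 1.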

Software: Complexity & Scaling

Power Management Software
The operating system has to manage:
- Voltage-frequency scaling
- Choice of core, where power/performance characteristics differ
and all the other operating system stuff.
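One way to picture the two decisions listed above is as a lookup over operating points: pick the core type and frequency that meet the current demand at the lowest power. The sketch below is purely illustrative. The table values, the names and the selection policy are all hypothetical, and it is not how any particular operating system implements power management.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    core: str           # "little" or "big"
    freq_mhz: int
    performance: float  # arbitrary units
    power_mw: float


# Hypothetical operating-point table; none of these numbers are measured.
POINTS = [
    OperatingPoint("little", 500, 1.0, 50.0),
    OperatingPoint("little", 1000, 2.0, 150.0),
    OperatingPoint("big", 800, 3.0, 400.0),
    OperatingPoint("big", 1600, 6.0, 1500.0),
]


def choose_operating_point(demand, points=POINTS):
    """Return the lowest-power point that meets `demand`.

    If no point is fast enough, fall back to the fastest one available.
    """
    candidates = [p for p in points if p.performance >= demand]
    if not candidates:
        return max(points, key=lambda p: p.performance)
    return min(candidates, key=lambda p: p.power_mw)


for demand in (0.5, 1.5, 2.5, 10.0):
    print(demand, choose_operating_point(demand))
```

Even this toy policy has to combine frequency selection with core selection, which hints at the complexity the slide is pointing to.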

Software doesn't scale: 1
- 2005: Herb Sutter's "The Free Lunch Is Over", a call to arms
- 2010: Geoffrey Blake et al., "Evolution of thread level parallelism in desktop applications" *
  - dual processors improved responsiveness
  - most software (games, office applications, multimedia playback) can use only two processors effectively; very few applications (e.g. video authoring) can use more
- Comment: GPUs may eat the low-hanging parallel fruit
* Evolution of thread level parallelism in desktop applications, ISCA-10, 2010

Software doesn't scale: 2
- Multicore Web Browsing (ST-Ericsson)
- Page load time for two popular Android browsers:
  - Single to Dual: ~1.3x faster
  - Dual to Quad: ~1.1x faster
- This is without taking into account a probable frequency drop when moving from dual-core to quad-core
http://etn.se/images/expert/fd-soi-equad-white-paper.pdf
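The two step-wise speedups quoted above can be combined into an overall figure and a per-core efficiency. This sketch uses only the approximate numbers from the slide; "parallel efficiency" (speedup divided by core count) is a standard measure brought in here for illustration and is not from the talk.

```python
# Approximate step-wise speedups quoted on the slide.
SINGLE_TO_DUAL = 1.3
DUAL_TO_QUAD = 1.1


def parallel_efficiency(speedup, n_cores):
    """Speedup achieved per core, relative to ideal linear scaling."""
    return speedup / n_cores


dual_speedup = SINGLE_TO_DUAL
quad_speedup = SINGLE_TO_DUAL * DUAL_TO_QUAD  # single to quad overall

print("dual:", dual_speedup, "efficiency", parallel_efficiency(dual_speedup, 2))
print("quad:", quad_speedup, "efficiency", parallel_efficiency(quad_speedup, 4))
```

Four cores deliver well under half of ideal scaling on this workload, before any frequency drop is considered.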

Multicore: the best use of silicon?
- NVIDIA states that the move from ARM Cortex A9 r3 to r4 gains 25% on web browsing
- This is cheaper than doubling the number of cores

Conclusion

Multicore
- The mobile phone is a heterogeneous multicore, for good reasons
- A dual-core application processor seems to be a good choice
  - Performance, Power, Responsiveness
- Software scalability limits exploitation beyond two cores
  - not to mention the complexity of managing lots of processors
- If the software complexity issues of big.LITTLE can be overcome.

Thank you