Multicore for mobile: The More the Merrier?
  Roger Shepherd, Chipless Ltd
Topics
  The Mobile Computing Platform
  The Application Processor
  CMOS Power Model
  Multicore
  Software: Complexity & Scaling
  Conclusion
The Mobile Computing Platform
Software Is Eating The World
  A smartphone is a computer which can make phone calls, take notes, tell the time and take photos.
  Physical devices have become icons.
What makes a smartphone?
  A computer with a touchscreen (and some other special I/O)
  Sophisticated software, including Unix, GUI, TCP/IP, browser, SatNav and speech recognition
Samsung Galaxy S5
  A ~30 core heterogeneous multiprocessor in your shirt pocket (Ian Philips, ARM)
  http://www.techinsights.com/teardown.com/samsung-galaxy-s5-teardown/
Samsung Galaxy S5
  Functionally decomposed into many chips
  Different semiconductor manufacturing processes
  Different suppliers
  Several contain processor cores
  Where does the software (operating system and applications) run?
The Application Processor
Exynos 5 - Octa-Core Applications Processor
  Exynos 5422 Application Processor
  4 ARM A15 + 4 ARM A7
  9 core GPU
The Challenge
  The application processor must achieve high computational performance and responsiveness under strong constraints on cost, energy and power.
Power and Energy
  Simple relationship: $e = \int p(t)\,dt$
  Energy: battery life
  Power: thermal issues (burning and discomfort, degradation of circuitry)
  Battery capacity:
    2007 MacBook: 55,000 mWh
    2007 iPhone: 1,400 mWh
    2014 iPhone 6: 1,810 mWh
    2014 iPhone 6+: 2,915 mWh
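The energy relationship can be made concrete numerically. The sketch below is an illustration only: the function name `energy_mwh` and the sample power trace are assumptions, not measurements from the talk. It integrates a sampled power trace with the trapezoidal rule and reports energy in mWh, the unit used for the battery capacities on this slide.

```python
def energy_mwh(times_s, power_mw):
    """Approximate e = integral of p(t) dt with the trapezoidal rule.

    times_s  -- sample times in seconds (increasing)
    power_mw -- power at each sample time in milliwatts
    Returns energy in milliwatt-hours.
    """
    if len(times_s) != len(power_mw):
        raise ValueError("times_s and power_mw must have the same length")
    milliwatt_seconds = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        milliwatt_seconds += 0.5 * (power_mw[i] + power_mw[i - 1]) * dt
    return milliwatt_seconds / 3600.0  # 3600 seconds per hour


# Hypothetical trace (assumption): one hour ramping 500 mW -> 1500 mW -> 500 mW.
trace_times = [0, 1800, 3600]
trace_power = [500, 1500, 500]
used = energy_mwh(trace_times, trace_power)
print(f"energy used: {used:.0f} mWh")
```

This assumed trace averages 1000 mW, so it uses 1000 mWh in the hour; at that rate the 1,810 mWh capacity quoted for the iPhone 6 would last a little under two hours.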
Engineering Relationships
  Computational capability depends on microarchitecture, frequency and software.
  Maximum frequency depends on microarchitecture, process, implementation and voltage.
  Power depends on process, voltage, frequency, microarchitecture, implementation and software.
Simple CMOS power model
  $p = p_{switching} + p_{leakage}$
  $p_{switching} \propto C V^2 f$
  $e_{switching} \propto C V^2 f t = C V^2 n$, where $n = f t$ cycles
  It appears that $e_{switching}$ depends on cycle count, not frequency.
  But (maximum) $f$ and $V$ are dependent: $f \propto V$
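A minimal sketch of the switching model, taking the constant of proportionality as 1. The capacitance value and the function names are hypothetical, chosen only for illustration. It shows that, at a fixed voltage, the switching energy for a fixed number of cycles is the same whether the work is done quickly or slowly.

```python
import math


def switching_power(c_farads, volts, freq_hz):
    """Switching power under the simple model p = C * V^2 * f (watts)."""
    return c_farads * volts ** 2 * freq_hz


def switching_energy(c_farads, volts, freq_hz, seconds):
    """Switching energy e = p * t = C * V^2 * n, with n = f * t cycles (joules)."""
    return switching_power(c_farads, volts, freq_hz) * seconds


C_EFF = 1e-9   # hypothetical effective switched capacitance in farads (assumption)
CYCLES = 2e9   # fixed amount of work, in clock cycles

# Same voltage and cycle count, two different clock frequencies.
fast = switching_energy(C_EFF, 1.0, 2e9, CYCLES / 2e9)
slow = switching_energy(C_EFF, 1.0, 1e9, CYCLES / 1e9)
print(math.isclose(fast, slow))
```

The energy only changes once the lower frequency is used to lower the voltage as well, which is the dependence between f and V noted on the slide.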
Voltage-Frequency Scaling
  Consider two combinations, $(V_1, f_1)$ and $(V_2, f_2)$, with $V_1 < V_2$ and $f_1 < f_2$.
  Comparing switching energy over $n$ cycles:
    $e_1/e_2 = C V_1^2 n / C V_2^2 n = (V_1/V_2)^2 < 1$ because $V_1/V_2 < 1$
  Comparing switching power:
    $p_1/p_2 = C V_1^2 f_1 / C V_2^2 f_2 = (V_1/V_2)^2 (f_1/f_2) = (e_1/e_2)(f_1/f_2) < 1$
  Running slower lowers power and saves energy.
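The two ratios on this slide can be evaluated directly. In the sketch below the function name and the two operating points are assumptions for illustration, not data from the talk.

```python
import math


def scaling_ratios(v1, f1, v2, f2):
    """Return (energy_ratio, power_ratio) of operating point 1 relative to point 2.

    Energy ratio over equal cycle counts: (V1/V2)^2
    Power ratio: (V1/V2)^2 * (f1/f2)
    """
    energy_ratio = (v1 / v2) ** 2
    power_ratio = energy_ratio * (f1 / f2)
    return energy_ratio, power_ratio


# Hypothetical operating points (assumptions): 0.8 V at 1.0 GHz vs 1.0 V at 1.5 GHz.
e_ratio, p_ratio = scaling_ratios(v1=0.8, f1=1.0e9, v2=1.0, f2=1.5e9)
print(f"energy ratio {e_ratio:.2f}, power ratio {p_ratio:.2f}")
```

For these assumed points the slower setting needs 64% of the switching energy for the same work and draws about 43% of the switching power.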
Frequency, energy, power v. Voltage
  [Chart: energy relative to 1 V, power relative to 1 V and frequency (GHz), plotted against voltage from 0.4 V to 1.6 V]
  Source: ST Shanghai SOI Summit Oct 2013
But you've forgotten leakage
  At realistic operating voltages and smartphone temperatures, dynamic power/energy dominates.
  From Intel, for a Pentium-like x86:
    0.5 V, 100 MHz: 17 mW
    0.8 V, 500 MHz: 174 mW
    1.2 V, 915 MHz: 737 mW
  Source: A 280mV-to-1.2V Wide-Operating-Range IA-32 Processor in 32nm CMOS, Intel, ISSCC 2012
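Dividing each Intel power figure by its frequency gives energy per clock cycle, which can be set beside what pure V^2 scaling would predict. This is a sketch under an assumption: the quoted powers are treated as total power (switching plus leakage), and the V^2 comparison is an added illustration rather than part of the Intel data.

```python
# (voltage in V, frequency in MHz, power in mW) as quoted on this slide.
INTEL_POINTS = [
    (0.5, 100, 17),
    (0.8, 500, 174),
    (1.2, 915, 737),
]


def energy_per_cycle_nj(freq_mhz, power_mw):
    """Energy per clock cycle in nanojoules (mW divided by MHz gives nJ)."""
    return power_mw / freq_mhz


base_v, base_f, base_p = INTEL_POINTS[0]
base_e = energy_per_cycle_nj(base_f, base_p)
for volts, freq_mhz, power_mw in INTEL_POINTS:
    e = energy_per_cycle_nj(freq_mhz, power_mw)
    measured = e / base_e               # relative to the 0.5 V point
    v_squared = (volts / base_v) ** 2   # prediction from CV^2 alone
    print(f"{volts} V: {e:.3f} nJ/cycle, x{measured:.2f} measured, x{v_squared:.2f} from V^2")
```

Energy per cycle rises from about 0.17 nJ at 0.5 V to about 0.35 nJ at 0.8 V and 0.81 nJ at 1.2 V, so the faster settings cost more energy for the same work.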
What's all this to do with multicore?
1: Pushing against performance limits
  If you are up against the performance limits of a uniprocessor, your options are limited.
  Multicore sounds like a winner, if software scales.
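2: Power efficiency
  Assume the workload scales.
  Options: one processor at $f$ (voltage $V_1$) or two processors at $f/2$ (voltage $V_2$)
  Power ratio: $C V_1^2 f : 2 \times C V_2^2 (f/2) = (V_1/V_2)^2$
  ST results show 25% power at 50% performance, so two processors at $f/2$ together draw 2 x 25% = 50% of the power of one processor at $f$.
  Two processors halve power and energy.
  Source: ST Shanghai SOI Summit Oct 2013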
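The comparison can be worked through with the ST figure. This is a sketch under the simple switching model only; the function name is hypothetical, and the voltage ratio is derived from the quoted 25% figure, not read from ST data.

```python
import math


def relative_power(n_cores, voltage_ratio):
    """Power of n cores at f/n and voltage V_n, relative to one core at f and V_1.

    Under p = C * V^2 * f:
        n * C * V_n^2 * (f/n) / (C * V_1^2 * f) = (V_n / V_1)^2
    voltage_ratio is V_n / V_1.
    """
    return n_cores * voltage_ratio ** 2 * (1.0 / n_cores)


# ST figure quoted on the slide: one core at 50% performance draws 25% of the
# power, so (V_2/V_1)^2 * 0.5 = 0.25, which gives (V_2/V_1)^2 = 0.5.
v_ratio = math.sqrt(0.25 / 0.5)
dual = relative_power(2, v_ratio)
print(f"V_2/V_1 = {v_ratio:.3f}; two cores at f/2 draw {dual:.2f} of one core at f")
```

Under these assumptions the half-speed cores run at roughly 71% of the full-speed voltage, and the pair delivers the same throughput for half the power.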
Limits to voltage-frequency scaling
  There are limits to how far voltage can be reduced.
  If 4 processors at f/4 run at the same voltage as 2 processors at f/2, there is no benefit, even if software scales.
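A voltage floor can be added to the model to show where extra cores stop helping. Everything here is an assumption made for illustration: the square-root V(f) curve is chosen so that two cores reproduce the 50% figure, and the floor `v_min` is placed at the voltage needed for f/2.

```python
import math


def core_voltage(freq_fraction, v_nominal=1.0, v_min=math.sqrt(0.5)):
    """Hypothetical V(f) model: V scales as sqrt(f) but never drops below v_min."""
    return max(v_min, v_nominal * math.sqrt(freq_fraction))


def total_relative_power(n_cores):
    """Power of n cores, each at f/n, relative to one core at f and v_nominal."""
    v = core_voltage(1.0 / n_cores)
    return n_cores * v ** 2 * (1.0 / n_cores)


for n in (1, 2, 4, 8):
    print(n, round(total_relative_power(n), 3))
```

Once the floor is reached, the total stays at the two-core level however many cores share the work, because the per-core frequency reduction is exactly cancelled by the core count.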
3: big.LITTLE
  Microarchitectures and implementations have different power-performance characteristics.
  Typically, slower (compute) processors have lower power.
  Idea: use ISA-compatible processors with differing characteristics:
    fast/high-power for heavy workloads
    slow/low-power for light workloads
A15-A7 Illustration
  EE Times: A15 is 5x the area and power of A7, with 2-3x the performance.
  Constant-voltage power ratio:
    A15 @ $f/3.0$ : A7 = $5 C V^2 (f/3.0) : C V^2 f = 1.7$
  Voltage-frequency scaling power ratio:
    A15 @ $f/2.0$ : A7 = $5 k C V_{15}^2 (f/2) : k C V_7^2 f = 2.5 \times (V_{15}/V_7)^2 = 2.5 \times 0.5 = 1.25$
    A15 @ $f/2.3$ : A7 = $5 k C V_{15}^2 (f/2.3) : k C V_7^2 f = 2.17 \times (V_{15}/V_7)^2 = 2.17 \times 0.36 = 0.78$
  Results depend critically on scaling of $f$ with $V$ and limits of $V$ scaling.
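The three ratios on this slide follow from one expression, which the sketch below evaluates. The function name is hypothetical; the 5x power factor and the (V15/V7)^2 values of 0.5 and 0.36 are the ones quoted on the slide.

```python
def a15_to_a7_power_ratio(slowdown, v_ratio_squared=1.0, power_factor=5.0):
    """Power of an A15 clocked at f/slowdown relative to an A7 at f.

    power_factor    -- A15 power relative to A7 at equal V and f (5x, per EE Times)
    v_ratio_squared -- (V15/V7)^2; 1.0 means no voltage scaling
    """
    return power_factor / slowdown * v_ratio_squared


cases = [
    ("constant voltage, f/3.0", 3.0, 1.0),
    ("scaled voltage,   f/2.0", 2.0, 0.5),
    ("scaled voltage,   f/2.3", 2.3, 0.36),
]
for label, slowdown, v_sq in cases:
    print(f"{label}: {a15_to_a7_power_ratio(slowdown, v_sq):.2f}")
```

A ratio below 1 means that, under these figures, the slowed-down A15 would draw less power than the A7 it is being compared with; a ratio above 1 favours the A7.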
Software: Complexity & Scaling
Power Management Software
  The operating system has to manage:
    voltage-frequency scaling
    choice of core, where power/performance characteristics differ
  and all the other operating system stuff.
Software doesn't scale: 1
  2005: Herb Sutter's "The Free Lunch Is Over", a call to arms
  2010: Geoffrey Blake et al, "Evolution of thread level parallelism in desktop applications" *
    dual-processors improved responsiveness
    most software (games, office applications, multimedia playback) can use only two processors effectively; very few applications (e.g. video authoring) can use more
  Comment: GPUs may eat the low-hanging parallel fruit
  * Evolution of thread level parallelism in desktop applications, ISCA-10, 2010
Software doesn't scale: 2
  Multicore Web Browsing (ST-Ericsson)
  Page load time for two popular Android browsers:
    Single to dual: ~1.3x faster
    Dual to quad: ~1.1x faster
  This is without taking into account a probable frequency drop when moving from dual-core to quad-core.
  http://etn.se/images/expert/fd-soi-equad-white-paper.pdf
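The two step-by-step figures can be combined into speedup relative to a single core and into speedup per core. This sketch uses only the approximate ~1.3x and ~1.1x values quoted on this slide; the function names and the "efficiency" measure (speedup divided by core count) are added for illustration.

```python
def cumulative_speedups(step_speedups):
    """Turn step-by-step speedups into speedups relative to a single core."""
    total = 1.0
    result = []
    for step in step_speedups:
        total *= step
        result.append(total)
    return result


def parallel_efficiency(speedup, n_cores):
    """Speedup divided by core count; 1.0 would be perfect scaling."""
    return speedup / n_cores


# Approximate figures from the slide: single->dual ~1.3x, dual->quad ~1.1x.
dual, quad = cumulative_speedups([1.3, 1.1])
print(f"dual: {dual:.2f}x, efficiency {parallel_efficiency(dual, 2):.2f}")
print(f"quad: {quad:.2f}x, efficiency {parallel_efficiency(quad, 4):.2f}")
```

On these figures four cores load pages about 1.43x faster than one, which is roughly 36% of perfect scaling, compared with 65% for two cores.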
Multicore: the best use of silicon?
  NVIDIA state that the move from ARM Cortex A9 r3 to r4 gains 25% on web browsing.
  This is cheaper than doubling the number of cores.
Conclusion
Multicore
  The mobile phone is a heterogeneous multicore, for good reasons.
  A dual-core application processor seems to be a good choice: performance, power, responsiveness.
  Software scalability limits exploitation beyond two cores, not to mention the complexity of managing lots of processors.
  If the software complexity issues of big.LITTLE can be overcome.
Thank you