LOWERING POWER CONSUMPTION OF HEVC DECODING. Chi Ching Chi Techinische Universität Berlin - AES PEGPUM 2014

Size: px
Start display at page:

Download "LOWERING POWER CONSUMPTION OF HEVC DECODING. Chi Ching Chi Techinische Universität Berlin - AES PEGPUM 2014"

Transcription

1 LOWERING POWER CONSUMPTION OF HEVC DECODING Chi Ching Chi Techinische Universität Berlin - AES PEGPUM 2014

2 Introduction How to achieve low power HEVC video decoding? Modern processors expose many low power techniques Clock gating, power gating, DVFS, asymmetric cores Increase application performance SIMD, multithreading Two (contradicting) general approaches: Exploit slack Race to idle What combination is best for video decoding? Slide 2

3 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 3

4 Architectural Techniques Power consumption processor: P cpu = αnkv dd I leak }{{} P static + (1 α)ncfv 2 dd }{{} P dynamic (1) Modern architecture incorporate various low power techniques to reduce the active and idle power Active power Dynamic Voltage Frequency Scaling (DVFS) Asymmetric cores Idle power Clock gating Power gating Slide 4

5 Low Power Active DVFS lowers V dd, I leak, f P static P dynamic Asymmetric ARM big.little extend DVFS in lower performance regions In Linux cpufreq subsystem controls both DVFS and asymmetric core switching Governors change frequency depending on load exynos_cpufreq driver maps Little cores to lower frequencies In general responds to load quickly to avoid performance regression Slide 5

6 Low Power Idle I Clock gating and power gating used to implement idle states (C-states) ACPI defines C0 - C3, Intel extends up to C10 Intel core C-states C0 Core is execution code C1 C1E C3 Core is halted most clocks are stopped Voltage reduced to Pn Core L1/L2 flushed to L3, PLL stopped C6-C10 Core state saved and power gated Slide 6

7 Low Power Idle II Intel package C-states C3 Mem self-refresh, some uncore clocks C6 C7 C8 stopped, some voltages reduced All uncore clocks stopped L3 power gated, more voltages reduced Most uncore power gated C9-C10 FIVR in low power/off state In Linux cpuidle subsystem controls C-states Trade off power saving in lower C-state vs. transition cost Predicts duration of next idle period to select C-state C-state, transition time, transition energy model specific Vendor supplies this in model/series specific drivers Slide 7

8 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 8

9 Power Optimization HEVC Decoder Two usage models HEVC decoding : offline and real-time Offline decoding Improve application performance General optimization ISA/Architecture specific (e.g. SIMD) Multithreading Additionally for real-time decoding Decrease worst-case complexity Coalesce computation Slide 9

10 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10

11 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10

12 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10

13 Real-time Video Decoding II Improve C-state residency with burst decoding 60 Ready buffer Frame Perform decoding bursts to improve deeper C-state residency Improves effectiveness of race to idle in theory In practice has no significant benefit except rare configurations Slide 11

14 Real-time Video Decoding II Improve C-state residency with burst decoding 60 Ready buffer s Frame Perform decoding bursts to improve deeper C-state residency Improves effectiveness of race to idle in theory In practice has no significant benefit except rare configurations Slide 11

15 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 12

16 Platforms I Used 4 platform representative of modern PC and consumer electronics processors Power measurable but not completely comparable Series Model Process Die size Measurable power Cores Uncore DRAM Haswell i5-4670t 22nm 177mm 2 Haswell ULT i5-4200u 22nm 134mm 2 Baytrail-T Z nm 102mm 2 Exynos nm 122mm 2 (GPU) Slide 13

17 Platforms II Vdd Haswell 4670T Haswell 4200U BayTrail Z3740* Exynos A15 Exynos A MHz 500 Slide 14

18 Haswell i5-4670t Haswell 4 cores TDP 45 W Low power desktop/ all-in-one Source: Intel Uncore Dram Cores 20 C1E Slide 15 C tu 00 rb o tu 00 rb o C tu 00 rb o tu 00 rb o C tu 00 rb o tu 00 rb o tu 00 rb o tu 00 rb o Watt 30 1-core 2-core 3-core 4-core

19 Haswell ULT i5-4200u Haswell 2 cores ( SMT ) TDP 15 W Ulltrabook/convertibles Source: Intel Uncore Cores Dram Watt turbo turbo turbo turbo turbo turbo turbo C7 C3 C1E C1 1-core 2-core 2c4t Slide 16

20 Baytrail-T Z3740 Silvermont 4 cores SDP 2 W/ TDP 5 W? Tablets 6 Uncore Source: Intel, tweakers.net Cores Watt C6 C1 1-core 2-core 3-core 4-core Slide 17

21 Exynos 5410 A15 + A7 big.little cores TDP 6 W? Smartphones/Tablets 6 Cores Dram Watt L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 C2 C1 1-core 2-core 3-core 4-core Slide 18

22 Methodology Use highly optimized HEVC Decoder developed by AES TUB All experiments executed under Linux (3.12+ for x86, 3.4 arm) GCC In total 100 FHD and UHD short 10 sec test sequences 1 second intra period, GOP size 8 FHD 24, 50, 60 fps with 40 to 200 kb/frame UHD 50 fps with 100 to 500 kb/frame Multithreading using wavefronts (WPP) Real-time execution simulated with 8 output picture buffer Slide 19

23 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 20

24 SIMD and Multithreading (1080p) Scalar SIMD mj/frame mj/frame Kb/frame Kb/frame SIMD+MT mj/frame Haswell 4670T Haswell 4200U BayTrail Z3740 Exynos A15 Exynos A Kb/frame Slide 21

25 SIMD and Multithreading (1080p) Scalar SIMD mj/frame mj/frame Kb/frame Kb/frame SIMD+MT mj/frame Haswell 4670T Haswell 4200U BayTrail Z3740 Exynos A15 Exynos A Kb/frame Slide 21

26 DVFS in Offline Decoding (1080p) Haswell 4670T Haswell 4200U Baytrail mj/frame MHz MHz MHz Cortex A15 Cortex A7 mj/frame T1 E tot T2 E tot T4 E tot MHz MHz Slide 22

27 DVFS in Offline Decoding (1080p) Haswell 4670T Haswell 4200U Baytrail mj/frame MHz MHz MHz Cortex A15 Cortex A7 mj/frame T1 E tot T2 E tot T4 E tot T1 E core T2 E core T4 E core MHz MHz Slide 22

28 DVFS in Real-time Decoding (1080p50 4.5Mbps) Haswell 4670T Haswell 4200U Watt T1 P tot T2 P tot T4 P tot T1 P core T2 P core T4 P core Baytrail Exynos T4 P tot T4 P core Watt MHz MHz Slide 23

29 DVFS in Real-time Decoding (1080p50 4.5Mbps) Haswell 4670T Haswell 4200U Watt T1 P tot T2 P tot T4 P tot T1 P core T2 P core T4 P core Ondemand Baytrail Exynos 5410 Watt T4 P tot T4 P core Ondemand MHz MHz Slide 23

30 Efficiency Out-of-the-box Haswell 4670T Haswell 4200U (no SMT) Watt p ondem. 2160p ondem Baytrail Z Exynos p ondem. Watt Activity (MCycles/s) Activity (MCycles/s) Slide 24

31 Efficiency Out-of-the-box Haswell 4670T Haswell 4200U (no SMT) Watt p ondem. 2160p ondem. 1080p manual 2160p manual Baytrail Z Exynos 5410 Watt p ondem. 1080p manual 1080p man. A Activity (MCycles/s) Activity (MCycles/s) Slide 24

32 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 25

33 Conclusions I How to achieve low power HEVC decoding: Offline decoding Application optimization (SIMD+MT) DVFS only when cores are consuming most of energy Real-time decoding Exploiting slack preferred over race to idle Multithreading only helpful when lowering clock < 2W for 1080p50 in software Limitations low power modes: OS not selecting lower frequencies Not able to determine when exploiting slack is desired Slide 26

34 Conclusions II Around 30%! power savings possible Architectures not efficient at low activity DVFS targets mainly cores, but cores consume little power at low frequency big.little solves reduced effectiveness DVFS closer to V th, but not main problem (for Intel) Deeper idle states + race to idle not a replacement Slide 27

35 Future Work Include rendering power in analysis Extend towards more OS (Windows, Android, IOS, etc..) Investigate towards application driven DFVS control Slide 28

POWER MANAGEMENT AND ENERGY EFFICIENCY

POWER MANAGEMENT AND ENERGY EFFICIENCY POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption

More information

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015

MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video

More information

Power Management for Embedded Systems

Power Management for Embedded Systems Power Management for Embedded Systems Minsoo Ryu Hanyang University Why Power Management? Battery-operated devices Smartphones, digital cameras, and laptops use batteries Power savings and battery run

More information

Towards Power Management for FreeBSD

Towards Power Management for FreeBSD Towards Power Management for FreeBSD Robin Randhawa robin.randhawa@arm.com FreeBSD Developer Summit Computer Laboratory University of Cambridge August 2015 Agenda An overview of Energy Aware Scheduling

More information

Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability

Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Liangzhen Lai, Vikas Chandra* and Puneet Gupta UCLA Electrical Engineering Department ARM Research* Radiation-Induced

More information

Silvermont. Introducing Next Generation Low Power Microarchitecture: Dadi Perlmutter

Silvermont. Introducing Next Generation Low Power Microarchitecture: Dadi Perlmutter Introducing Next Generation Low Power Microarchitecture: Silvermont Dadi Perlmutter Executive Vice President General Manager, Intel Architecture Group Chief Product Officer Risk Factors Today s presentations

More information

Power and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur

Power and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,

More information

Power and Energy Management

Power and Energy Management Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,

More information

POWER-AWARE SOFTWARE ON ARM. Paul Fox

POWER-AWARE SOFTWARE ON ARM. Paul Fox POWER-AWARE SOFTWARE ON ARM Paul Fox OUTLINE MOTIVATION LINUX POWER MANAGEMENT INTERFACES A UNIFIED POWER MANAGEMENT SYSTEM EXPERIMENTAL RESULTS AND FUTURE WORK 2 MOTIVATION MOTIVATION» ARM SoCs designed

More information

Embedded Linux Conference San Diego 2016

Embedded Linux Conference San Diego 2016 Embedded Linux Conference San Diego 2016 Linux Power Management Optimization on the Nvidia Jetson Platform Merlin Friesen merlin@gg-research.com About You Target Audience - The presentation is introductory

More information

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM

IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information

More information

Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems

Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems Thomas Ilsche, Marcus Hähnel, Robert Schöne, Mario Bielert, and Daniel Hackenberg Technische Universität Dresden Observation

More information

QoS Handling with DVFS (CPUfreq & Devfreq)

QoS Handling with DVFS (CPUfreq & Devfreq) QoS Handling with DVFS (CPUfreq & Devfreq) MyungJoo Ham SW Center, 1 Performance Issues of DVFS Performance Sucks w/ DVFS! Battery-life Still Matters More Devices (components) w/ DVFS More Performance

More information

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto. Embedded processors Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.fi Comparing processors Evaluating processors Taxonomy of processors

More information

Benchmarking of Dynamic Power Management Solutions. Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007

Benchmarking of Dynamic Power Management Solutions. Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007 Benchmarking of Dynamic Power Management Solutions Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007 Why Benchmarking?! From Here to There, 2000whatever Vendor NXP

More information

AMD Fusion APU: Llano. Marcello Dionisio, Roman Fedorov Advanced Computer Architectures

AMD Fusion APU: Llano. Marcello Dionisio, Roman Fedorov Advanced Computer Architectures AMD Fusion APU: Llano Marcello Dionisio, Roman Fedorov Advanced Computer Architectures Outline Introduction AMD Llano architecture AMD Llano CPU core AMD Llano GPU Memory access management Turbo core technology

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores

Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores Tomohiro Hirano 1, Hideo Yamamoto 1, Shuhei Iizuka 1, Kohei Muto 1, Takashi Goto 1, Tamami Wake

More information

CPU Clock Ratio, CPU Frequency The settings above are synchronous to those under the same items on the Advanced Frequency Settings menu.

CPU Clock Ratio, CPU Frequency The settings above are synchronous to those under the same items on the Advanced Frequency Settings menu. Advanced CPU Core Features CPU Clock Ratio, CPU Frequency The settings above are synchronous to those under the same items on the Advanced Frequency Settings menu. CPU PLL Selection Allows you to set the

More information

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management

More information

28x 29x 30x [ 24x] 3.20GHz ( 133x24) CPU Clock Ratio CPU Frequency. CPU Host Clock Control [ Enable] CPU Host Frequency ( MHz ) 133

28x 29x 30x [ 24x] 3.20GHz ( 133x24) CPU Clock Ratio CPU Frequency. CPU Host Clock Control [ Enable] CPU Host Frequency ( MHz ) 133 Intel Core i7 is a brand new architecture featuring the QPI bus which replaces the FSB bus. So, how does this affect overclocking? The Core i7 processor s frequency is Bclk * CPU multiplier. For ex. Intel

More information

A Framework for Modeling GPUs Power Consumption

A Framework for Modeling GPUs Power Consumption A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January

More information

SoC Idling & CPU Cluster PM

SoC Idling & CPU Cluster PM SoC Idling & CPU Cluster PM Presented by Ulf Hansson Lina Iyer Kevin Hilman Date BKK16-410 March 10, 2016 Event Linaro Connect BKK16 SoC Idling & CPU Cluster PM Idle management of devices via runtime PM

More information

Utilization-based Power Modeling of Modern Mobile Application Processor

Utilization-based Power Modeling of Modern Mobile Application Processor Utilization-based Power Modeling of Modern Mobile Application Processor Abstract Power modeling of a modern mobile application processor (AP) is challenging because of its complex architectural characteristics.

More information

HIGH PERFORMANCE VIDEO CODECS

HIGH PERFORMANCE VIDEO CODECS Spin Digital Video Technologies GmbH Specialists on video codecs Spin-off of the Technical University of Berlin In-house developed software IP & products Based in Germany Innovative B2B company with decades

More information

Embedded Systems Architecture

Embedded Systems Architecture Embedded System Architecture Software and hardware minimizing energy consumption Conscious engineer protects the natur M. Eng. Mariusz Rudnicki 1/47 Software and hardware minimizing energy consumption

More information

Embedded System Architecture

Embedded System Architecture Embedded System Architecture Software and hardware minimizing energy consumption Conscious engineer protects the natur Embedded Systems Architecture 1/44 Software and hardware minimizing energy consumption

More information

Introduction to Energy-Efficient Software 2 nd life talk

Introduction to Energy-Efficient Software 2 nd life talk Introduction to Energy-Efficient Software 2 nd life talk Intel Software and Solutions Group Bob Steigerwald Nov 8, 2007 Taylor Kidd Nov 15, 2007 Agenda Demand for Mobile Computing Devices What is Energy-Efficient

More information

Heterogeneous Architecture. Luca Benini

Heterogeneous Architecture. Luca Benini Heterogeneous Architecture Luca Benini lbenini@iis.ee.ethz.ch Intel s Broadwell 03.05.2016 2 Qualcomm s Snapdragon 810 03.05.2016 3 AMD Bristol Ridge Departement Informationstechnologie und Elektrotechnik

More information

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor

More information

PACKAGE AND PLATFORM VIEW OF INTEL S FULLY INTEGRATED VOLTAGE REGULATORS (FIVR) Edward (Ted) Burton

PACKAGE AND PLATFORM VIEW OF INTEL S FULLY INTEGRATED VOLTAGE REGULATORS (FIVR) Edward (Ted) Burton PACKAGE AND PLATFORM VIEW OF INTEL S FULLY INTEGRATED VOLTAGE REGULATORS (FIVR) Edward (Ted) Burton Ivy Bridge Platform Haswell Platform Core VR 0V-1.2V Graphics VR 0V-1.2V PLL VR 1.8V I/O VR 1.0V System

More information

ECE 486/586. Computer Architecture. Lecture # 2

ECE 486/586. Computer Architecture. Lecture # 2 ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:

More information

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

The LPGPU2 Project. Ben Juurlink, TU Berlin

The LPGPU2 Project. Ben Juurlink, TU Berlin The LPGPU2 Ben Juurlink, TU Berlin LPGPU2 Consortium LPGPU2 = Low-Power Parallel Processing on GPUs 2 LPGPU2 Objectives 1. To improve the power efficiency of compute and graphics LPGPU2 applications running

More information

Dongjun Shin Samsung Electronics

Dongjun Shin Samsung Electronics 2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer

More information

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012

CPU Architecture Overview. Varun Sampath CIS 565 Spring 2012 CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations

More information

Power Efficiency for Software Algorithms running on Graphics Processors. Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller

Power Efficiency for Software Algorithms running on Graphics Processors. Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller 1 Power Efficiency for Software Algorithms running on Graphics Processors Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller Overview 2 Motivation Goal Project Applications Methodology Results

More information

TDT 4260 lecture 2 spring semester 2015

TDT 4260 lecture 2 spring semester 2015 1 TDT 4260 lecture 2 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Chapter 1: Fundamentals of Quantitative Design and Analysis, continued

More information

Sustainable Computing: Informatics and Systems

Sustainable Computing: Informatics and Systems Sustainable Computing: Informatics and Systems 3 (2013) 194 206 Contents lists available at SciVerse ScienceDirect Sustainable Computing: Informatics and Systems jo u r n al hom epa ge: www.elsevier.com/locate/suscom

More information

A Study on C-group controlled big.little Architecture

A Study on C-group controlled big.little Architecture A Study on C-group controlled big.little Architecture Renesas Electronics Corporation New Solutions Platform Business Division Renesas Solutions Corporation Advanced Software Platform Development Department

More information

Intel Architecture for Software Developers

Intel Architecture for Software Developers Intel Architecture for Software Developers 1 Agenda Introduction Processor Architecture Basics Intel Architecture Intel Core and Intel Xeon Intel Atom Intel Xeon Phi Coprocessor Use Cases for Software

More information

Parallel H.264/AVC Motion Compensation for GPUs using OpenCL

Parallel H.264/AVC Motion Compensation for GPUs using OpenCL Parallel H.264/AVC Motion Compensation for GPUs using OpenCL Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January

More information

The Mont-Blanc approach towards Exascale

The Mont-Blanc approach towards Exascale http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are

More information

Modeling CPU Energy Consumption for Energy Efficient Scheduling

Modeling CPU Energy Consumption for Energy Efficient Scheduling Modeling CPU Energy Consumption for Energy Efficient Scheduling Abhishek Jaiantilal, Yifei Jiang, Shivakant Mishra University of Colorado - Boulder GCM '10 Proceedings of the 1st Workshop on Green Computing

More information

Energy Efficiency in Operating Systems

Energy Efficiency in Operating Systems Devices Timer interrupts CPU idling CPU frequency scaling Energy-aware scheduling Energy Efficiency in Operating Systems Björn Brömstrup Arbeitsbereich Wissenschaftliches Rechnen Fachbereich Informatik

More information

Power Measurement Using Performance Counters

Power Measurement Using Performance Counters Power Measurement Using Performance Counters October 2016 1 Introduction CPU s are based on complementary metal oxide semiconductor technology (CMOS). CMOS technology theoretically only dissipates power

More information

F28HS Hardware-Software Interface: Systems Programming

F28HS Hardware-Software Interface: Systems Programming F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has

More information

Energy Models for DVFS Processors

Energy Models for DVFS Processors Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July

More information

Managing Hardware Power Saving Modes for High Performance Computing

Managing Hardware Power Saving Modes for High Performance Computing Managing Hardware Power Saving Modes for High Performance Computing Second International Green Computing Conference 2011, Orlando Timo Minartz, Michael Knobloch, Thomas Ludwig, Bernd Mohr timo.minartz@informatik.uni-hamburg.de

More information

Outline Marquette University

Outline Marquette University COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

I/O Systems (4): Power Management. CSE 2431: Introduction to Operating Systems

I/O Systems (4): Power Management. CSE 2431: Introduction to Operating Systems I/O Systems (4): Power Management CSE 2431: Introduction to Operating Systems 1 Outline Overview Hardware Issues OS Issues Application Issues 2 Why Power Management? Desktop PCs Battery-powered Computers

More information

User s Guide. Alexandra Yates Kristen C. Accardi

User s Guide. Alexandra Yates Kristen C. Accardi User s Guide Kristen C. Accardi kristen.c.accardi@intel.com Alexandra Yates alexandra.yates@intel.com PowerTOP is a Linux* tool used to diagnose issues related to power consumption and power management.

More information

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM

Integrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot Heterogeneous Multi-Processing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on

More information

Samsung System LSI Business

Samsung System LSI Business Samsung System LSI Business NS (Stephen) Woo, Ph.D. President & GM of System LSI Samsung Electronics 0/32 Disclaimer The materials in this report include forward-looking statements which can generally

More information

Design and Implementation of an AHB SRAM Memory Controller

Design and Implementation of an AHB SRAM Memory Controller Design and Implementation of an AHB SRAM Memory Controller 1 Module Overview Learn the basics of Computer Memory; Design and implement an AHB SRAM memory controller, which replaces the previous on-chip

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Parallelism in Hardware

Parallelism in Hardware Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law

More information

XEEMU: An Improved XScale Power Simulator

XEEMU: An Improved XScale Power Simulator Department of Software Engineering (1), University of Szeged Department of Electrical Engineering (2), University of Kaiserslautern XEEMU: An Improved XScale Power Simulator Z. Herczeg(1), Á. Kiss (1),

More information

Power Capping Linux. Len Brown, Jacob Pan, Srinivas Pandruvada

Power Capping Linux. Len Brown, Jacob Pan, Srinivas Pandruvada Power Capping Linux Len Brown, Jacob Pan, Srinivas Pandruvada Agenda Context System Power Management Issues Power Capping Overview Power capping participants Recommendation Linux Power Capping Framework

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 21

ECE 571 Advanced Microprocessor-Based Design Lecture 21 ECE 571 Advanced Microprocessor-Based Design Lecture 21 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 9 April 2013 Project/HW Reminder Homework #4 comments Good job finding references,

More information

Leveraging Mobile GPUs for Flexible High-speed Wireless Communication

Leveraging Mobile GPUs for Flexible High-speed Wireless Communication 0 Leveraging Mobile GPUs for Flexible High-speed Wireless Communication Qi Zheng, Cao Gao, Trevor Mudge, Ronald Dreslinski *, Ann Arbor The 3 rd International Workshop on Parallelism in Mobile Platforms

More information

Lecture 2. Systems Programming with the Raspberry Pi

Lecture 2. Systems Programming with the Raspberry Pi F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2015/16 0 No proprietary software has

More information

Enabling Arm DynamIQ support. Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro)

Enabling Arm DynamIQ support. Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro) Enabling Arm DynamIQ support Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro) Agenda DynamIQ introduction DynamIQ and Arm Trusted Firmware OS Power Management with DynamIQ L3 partial power-down

More information

45-year CPU Evolution: 1 Law -2 Equations

45-year CPU Evolution: 1 Law -2 Equations 4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

MICROPROCESSOR TECHNOLOGY

MICROPROCESSOR TECHNOLOGY MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside

More information

Quantifying the Energy Cost of Data Movement for Emerging Smartphone Workloads on Mobile Platforms

Quantifying the Energy Cost of Data Movement for Emerging Smartphone Workloads on Mobile Platforms Quantifying the Energy Cost of Data Movement for Emerging Smartphone Workloads on Mobile Platforms Arizona State University Dhinakaran Pandiyan(dpandiya@asu.edu) and Carole-Jean Wu(carole-jean.wu@asu.edu

More information

Tips and Tricks: Designing low power Native and WebApps. Harita Chilukuri and Abhishek Dhanotia

Tips and Tricks: Designing low power Native and WebApps. Harita Chilukuri and Abhishek Dhanotia Tips and Tricks: Designing low power Native and WebApps Harita Chilukuri and Abhishek Dhanotia Acknowledgements William Baughman for his help with the browser analysis Ross Burton & Thomas Wood for information

More information

Seahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor

Seahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor Seahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor PD Marketing ARM 1 Introduction to Cortex-A15 & Seahawk ARM Cortex-A15 is a high performance engine for superphones,

More information

Hyperthreading Technology

Hyperthreading Technology Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?

More information

Core 2 vs I-series. How Far Have We Really Come?

Core 2 vs I-series. How Far Have We Really Come? Core 2 vs I-series How Far Have We Really Come? Appendix 1. Introduction 2. Road map 3. General specifications 4. Hardware subtleties 5. Technology difference 6. Advantages of the new architecture 7. Conclusion

More information

COL862 - Low Power Computing

COL862 - Low Power Computing COL862 - Low Power Computing Power Measurements using performance counters and studying the low power computing techniques in IoT development board (PSoC 4 BLE Pioneer Kit) and Arduino Mega 2560 Submitted

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information

Technology Insight: Intel s Next Generation Microarchitecture Code Name Skylake

Technology Insight: Intel s Next Generation Microarchitecture Code Name Skylake Technology Insight: Intel s Next Generation Microarchitecture Code Name Skylake Julius Mandelblat, Senior Principal Engineer, Intel SPCS001 AGENDA Introduction and Overview Core Microarchitecture Interconnect

More information

Frequency and Voltage Scaling Design. Ruixing Yang

Frequency and Voltage Scaling Design. Ruixing Yang Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008 Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages

More information

Real-Time Dynamic Energy Management on MPSoCs

Real-Time Dynamic Energy Management on MPSoCs Real-Time Dynamic Energy Management on MPSoCs Tohru Ishihara Graduate School of Informatics, Kyoto University 2013/03/27 University of Bristol on Energy-Aware COmputing (EACO) Workshop 1 Background Low

More information

Computer Architecture

Computer Architecture Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor

More information

Philippe Thierry Sr Staff Engineer Intel Corp.

Philippe Thierry Sr Staff Engineer Intel Corp. HPC@Intel Philippe Thierry Sr Staff Engineer Intel Corp. IBM, April 8, 2009 1 Agenda CPU update: roadmap, micro-μ and performance Solid State Disk Impact What s next Q & A Tick Tock Model Perenity market

More information

Dell Dynamic Power Mode: An Introduction to Power Limits

Dell Dynamic Power Mode: An Introduction to Power Limits Dell Dynamic Power Mode: An Introduction to Power Limits By: Alex Shows, Client Performance Engineering Managing system power is critical to balancing performance, battery life, and operating temperatures.

More information

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones

Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones HOT CHIPS 2009 August 24 2009 Rajesh Patel Lead Architect, Lincroft SoC Intel Corporation Legal Disclaimer INFORMATION

More information

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

User s Guide. Alexandra Yates Kristen C. Accardi

User s Guide. Alexandra Yates Kristen C. Accardi User s Guide Kristen C. Accardi kristen.c.accardi@intel.com Alexandra Yates alexandra.yates@intel.com PowerTOP is a Linux* tool used to diagnose issues related to power consumption and power management.

More information

THE UNIVERSITY OF CHICAGO MANAGING DIVERSITY IN PERFORMANCE AND ENERGY CHARACTERISTICS ON EMBEDDED SYSTEMS A DISSERTATION SUBMITTED TO

THE UNIVERSITY OF CHICAGO MANAGING DIVERSITY IN PERFORMANCE AND ENERGY CHARACTERISTICS ON EMBEDDED SYSTEMS A DISSERTATION SUBMITTED TO THE UNIVERSITY OF CHICAGO MANAGING DIVERSITY IN PERFORMANCE AND ENERGY CHARACTERISTICS ON EMBEDDED SYSTEMS A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY

More information

Phase-Aware Web Browser Power Management on HMP Platforms

Phase-Aware Web Browser Power Management on HMP Platforms Phase-Aware Web Browser Management on HMP Platforms N. Peters, S. Park, D. Clifford, S. Kyostila, R. McIlroy, B. Meurer, H. Payer, S. Chakraborty Technical University of Munich, Google Inc {nadja.peters,sangyoung.park,samarjit}@tum.de,{danno,skyostil,rmcilroy,bmeurer,hpayer}@google.com

More information

Multicore Hardware and Parallelism

Multicore Hardware and Parallelism Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES

FPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES , suitable for DFA on AES Jonas Krautter, Dennis R.E. Gnad, Mehdi B. Tahoori 10.09.2018 INSTITUTE OF COMPUTER ENGINEERING CHAIR OF DEPENDABLE NANO COMPUTING KIT Die Forschungsuniversität in der Helmholtz-Gemeinschaft

More information

Energy-Aware CPU Frequency Scaling for Mobile Video Streaming

Energy-Aware CPU Frequency Scaling for Mobile Video Streaming 1 Energy-Aware CPU Frequency Scaling for Mobile Video Streaming Yi Yang, Student Member, IEEE, Wenjie Hu, Student Member, IEEE, Xianda Chen, Student Member, IEEE, Guohong Cao, Fellow, IEEE, Abstract The

More information

Workload Prediction for Adaptive Power Scaling Using Deep Learning. Steve Tarsa, Amit Kumar, & HT Kung Harvard, Intel Labs MRL May 29, 2014 ICICDT 14

Workload Prediction for Adaptive Power Scaling Using Deep Learning. Steve Tarsa, Amit Kumar, & HT Kung Harvard, Intel Labs MRL May 29, 2014 ICICDT 14 Workload Prediction for Adaptive Power Scaling Using Deep Learning Steve Tarsa, Amit Kumar, & HT Kung Harvard, Intel Labs MRL May 29, 2014 ICICDT 14 In these slides Machine learning (ML) is applied to

More information

ARM big.little Technology Unleashed An Improved User Experience Delivered

ARM big.little Technology Unleashed An Improved User Experience Delivered ARM big.little Technology Unleashed An Improved User Experience Delivered Govind Wathan Product Specialist Cortex -A Mobile & Consumer CPU Products 1 Agenda Introduction to big.little Technology Benefits

More information

Evaluating the Effectiveness of Model Based Power Characterization

Evaluating the Effectiveness of Model Based Power Characterization Evaluating the Effectiveness of Model Based Power Characterization John McCullough, Yuvraj Agarwal, Jaideep Chandrashekhar (Intel), Sathya Kuppuswamy, Alex C. Snoeren, Rajesh Gupta Computer Science and

More information

Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors

Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors 1 Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti +, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor

More information

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip

More information