LOWERING POWER CONSUMPTION OF HEVC DECODING. Chi Ching Chi Techinische Universität Berlin - AES PEGPUM 2014
|
|
- Veronica Bridges
- 5 years ago
- Views:
Transcription
1 LOWERING POWER CONSUMPTION OF HEVC DECODING Chi Ching Chi Techinische Universität Berlin - AES PEGPUM 2014
2 Introduction How to achieve low power HEVC video decoding? Modern processors expose many low power techniques Clock gating, power gating, DVFS, asymmetric cores Increase application performance SIMD, multithreading Two (contradicting) general approaches: Exploit slack Race to idle What combination is best for video decoding? Slide 2
3 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 3
4 Architectural Techniques Power consumption processor: P cpu = αnkv dd I leak }{{} P static + (1 α)ncfv 2 dd }{{} P dynamic (1) Modern architecture incorporate various low power techniques to reduce the active and idle power Active power Dynamic Voltage Frequency Scaling (DVFS) Asymmetric cores Idle power Clock gating Power gating Slide 4
5 Low Power Active DVFS lowers V dd, I leak, f P static P dynamic Asymmetric ARM big.little extend DVFS in lower performance regions In Linux cpufreq subsystem controls both DVFS and asymmetric core switching Governors change frequency depending on load exynos_cpufreq driver maps Little cores to lower frequencies In general responds to load quickly to avoid performance regression Slide 5
6 Low Power Idle I Clock gating and power gating used to implement idle states (C-states) ACPI defines C0 - C3, Intel extends up to C10 Intel core C-states C0 Core is execution code C1 C1E C3 Core is halted most clocks are stopped Voltage reduced to Pn Core L1/L2 flushed to L3, PLL stopped C6-C10 Core state saved and power gated Slide 6
7 Low Power Idle II Intel package C-states C3 Mem self-refresh, some uncore clocks C6 C7 C8 stopped, some voltages reduced All uncore clocks stopped L3 power gated, more voltages reduced Most uncore power gated C9-C10 FIVR in low power/off state In Linux cpuidle subsystem controls C-states Trade off power saving in lower C-state vs. transition cost Predicts duration of next idle period to select C-state C-state, transition time, transition energy model specific Vendor supplies this in model/series specific drivers Slide 7
8 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 8
9 Power Optimization HEVC Decoder Two usage models HEVC decoding : offline and real-time Offline decoding Improve application performance General optimization ISA/Architecture specific (e.g. SIMD) Multithreading Additionally for real-time decoding Decrease worst-case complexity Coalesce computation Slide 9
10 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10
11 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10
12 Real-time Video Decoding I Decrease worst-case complexity using playback buffers Frametime (ms) 50 T buf Frame Lower peaks allow for execution on lower frequency/slower cores Improves possibilities for exploiting slack Slide 10
13 Real-time Video Decoding II Improve C-state residency with burst decoding 60 Ready buffer Frame Perform decoding bursts to improve deeper C-state residency Improves effectiveness of race to idle in theory In practice has no significant benefit except rare configurations Slide 11
14 Real-time Video Decoding II Improve C-state residency with burst decoding 60 Ready buffer s Frame Perform decoding bursts to improve deeper C-state residency Improves effectiveness of race to idle in theory In practice has no significant benefit except rare configurations Slide 11
15 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 12
16 Platforms I Used 4 platform representative of modern PC and consumer electronics processors Power measurable but not completely comparable Series Model Process Die size Measurable power Cores Uncore DRAM Haswell i5-4670t 22nm 177mm 2 Haswell ULT i5-4200u 22nm 134mm 2 Baytrail-T Z nm 102mm 2 Exynos nm 122mm 2 (GPU) Slide 13
17 Platforms II Vdd Haswell 4670T Haswell 4200U BayTrail Z3740* Exynos A15 Exynos A MHz 500 Slide 14
18 Haswell i5-4670t Haswell 4 cores TDP 45 W Low power desktop/ all-in-one Source: Intel Uncore Dram Cores 20 C1E Slide 15 C tu 00 rb o tu 00 rb o C tu 00 rb o tu 00 rb o C tu 00 rb o tu 00 rb o tu 00 rb o tu 00 rb o Watt 30 1-core 2-core 3-core 4-core
19 Haswell ULT i5-4200u Haswell 2 cores ( SMT ) TDP 15 W Ulltrabook/convertibles Source: Intel Uncore Cores Dram Watt turbo turbo turbo turbo turbo turbo turbo C7 C3 C1E C1 1-core 2-core 2c4t Slide 16
20 Baytrail-T Z3740 Silvermont 4 cores SDP 2 W/ TDP 5 W? Tablets 6 Uncore Source: Intel, tweakers.net Cores Watt C6 C1 1-core 2-core 3-core 4-core Slide 17
21 Exynos 5410 A15 + A7 big.little cores TDP 6 W? Smartphones/Tablets 6 Cores Dram Watt L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 L-500 L-800 L-1200 b-800 b-1200 b-1600 C2 C1 1-core 2-core 3-core 4-core Slide 18
22 Methodology Use highly optimized HEVC Decoder developed by AES TUB All experiments executed under Linux (3.12+ for x86, 3.4 arm) GCC In total 100 FHD and UHD short 10 sec test sequences 1 second intra period, GOP size 8 FHD 24, 50, 60 fps with 40 to 200 kb/frame UHD 50 fps with 100 to 500 kb/frame Multithreading using wavefronts (WPP) Real-time execution simulated with 8 output picture buffer Slide 19
23 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 20
24 SIMD and Multithreading (1080p) Scalar SIMD mj/frame mj/frame Kb/frame Kb/frame SIMD+MT mj/frame Haswell 4670T Haswell 4200U BayTrail Z3740 Exynos A15 Exynos A Kb/frame Slide 21
25 SIMD and Multithreading (1080p) Scalar SIMD mj/frame mj/frame Kb/frame Kb/frame SIMD+MT mj/frame Haswell 4670T Haswell 4200U BayTrail Z3740 Exynos A15 Exynos A Kb/frame Slide 21
26 DVFS in Offline Decoding (1080p) Haswell 4670T Haswell 4200U Baytrail mj/frame MHz MHz MHz Cortex A15 Cortex A7 mj/frame T1 E tot T2 E tot T4 E tot MHz MHz Slide 22
27 DVFS in Offline Decoding (1080p) Haswell 4670T Haswell 4200U Baytrail mj/frame MHz MHz MHz Cortex A15 Cortex A7 mj/frame T1 E tot T2 E tot T4 E tot T1 E core T2 E core T4 E core MHz MHz Slide 22
28 DVFS in Real-time Decoding (1080p50 4.5Mbps) Haswell 4670T Haswell 4200U Watt T1 P tot T2 P tot T4 P tot T1 P core T2 P core T4 P core Baytrail Exynos T4 P tot T4 P core Watt MHz MHz Slide 23
29 DVFS in Real-time Decoding (1080p50 4.5Mbps) Haswell 4670T Haswell 4200U Watt T1 P tot T2 P tot T4 P tot T1 P core T2 P core T4 P core Ondemand Baytrail Exynos 5410 Watt T4 P tot T4 P core Ondemand MHz MHz Slide 23
30 Efficiency Out-of-the-box Haswell 4670T Haswell 4200U (no SMT) Watt p ondem. 2160p ondem Baytrail Z Exynos p ondem. Watt Activity (MCycles/s) Activity (MCycles/s) Slide 24
31 Efficiency Out-of-the-box Haswell 4670T Haswell 4200U (no SMT) Watt p ondem. 2160p ondem. 1080p manual 2160p manual Baytrail Z Exynos 5410 Watt p ondem. 1080p manual 1080p man. A Activity (MCycles/s) Activity (MCycles/s) Slide 24
32 Outline Introduction Architectural Low Power Modes Power Optimization HEVC Decoder Platforms and Methodology Results Offline Decoding Real-time Decoding Efficiency Out-of-the-box Conclusions Slide 25
33 Conclusions I How to achieve low power HEVC decoding: Offline decoding Application optimization (SIMD+MT) DVFS only when cores are consuming most of energy Real-time decoding Exploiting slack preferred over race to idle Multithreading only helpful when lowering clock < 2W for 1080p50 in software Limitations low power modes: OS not selecting lower frequencies Not able to determine when exploiting slack is desired Slide 26
34 Conclusions II Around 30%! power savings possible Architectures not efficient at low activity DVFS targets mainly cores, but cores consume little power at low frequency big.little solves reduced effectiveness DVFS closer to V th, but not main problem (for Intel) Deeper idle states + race to idle not a replacement Slide 27
35 Future Work Include rendering power in analysis Extend towards more OS (Windows, Android, IOS, etc..) Investigate towards application driven DFVS control Slide 28
POWER MANAGEMENT AND ENERGY EFFICIENCY
POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption
More informationMAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES. Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015
MAPPING VIDEO CODECS TO HETEROGENEOUS ARCHITECTURES Mauricio Alvarez-Mesa Techische Universität Berlin - Spin Digital MULTIPROG 2015 Video Codecs 70% of internet traffic will be video in 2018 [CISCO] Video
More informationPower Management for Embedded Systems
Power Management for Embedded Systems Minsoo Ryu Hanyang University Why Power Management? Battery-operated devices Smartphones, digital cameras, and laptops use batteries Power savings and battery run
More informationTowards Power Management for FreeBSD
Towards Power Management for FreeBSD Robin Randhawa robin.randhawa@arm.com FreeBSD Developer Summit Computer Laboratory University of Cambridge August 2015 Agenda An overview of Energy Aware Scheduling
More informationEvaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability
Evaluating and Exploiting Impacts of Dynamic Power Management Schemes on System Reliability Liangzhen Lai, Vikas Chandra* and Puneet Gupta UCLA Electrical Engineering Department ARM Research* Radiation-Induced
More informationSilvermont. Introducing Next Generation Low Power Microarchitecture: Dadi Perlmutter
Introducing Next Generation Low Power Microarchitecture: Silvermont Dadi Perlmutter Executive Vice President General Manager, Intel Architecture Group Chief Product Officer Risk Factors Today s presentations
More informationPower and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur
Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,
More informationPower and Energy Management
Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,
More informationPOWER-AWARE SOFTWARE ON ARM. Paul Fox
POWER-AWARE SOFTWARE ON ARM Paul Fox OUTLINE MOTIVATION LINUX POWER MANAGEMENT INTERFACES A UNIFIED POWER MANAGEMENT SYSTEM EXPERIMENTAL RESULTS AND FUTURE WORK 2 MOTIVATION MOTIVATION» ARM SoCs designed
More informationEmbedded Linux Conference San Diego 2016
Embedded Linux Conference San Diego 2016 Linux Power Management Optimization on the Nvidia Jetson Platform Merlin Friesen merlin@gg-research.com About You Target Audience - The presentation is introductory
More informationIMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM
IMPROVING ENERGY EFFICIENCY THROUGH PARALLELIZATION AND VECTORIZATION ON INTEL R CORE TM I5 AND I7 PROCESSORS Juan M. Cebrián 1 Lasse Natvig 1 Jan Christian Meyer 2 1 Depart. of Computer and Information
More informationPowernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems
Powernightmares: The Challenge of Efficiently Using Sleep States on Multi-Core Systems Thomas Ilsche, Marcus Hähnel, Robert Schöne, Mario Bielert, and Daniel Hackenberg Technische Universität Dresden Observation
More informationQoS Handling with DVFS (CPUfreq & Devfreq)
QoS Handling with DVFS (CPUfreq & Devfreq) MyungJoo Ham SW Center, 1 Performance Issues of DVFS Performance Sucks w/ DVFS! Battery-life Still Matters More Devices (components) w/ DVFS More Performance
More informationEmbedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.
Embedded processors Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.fi Comparing processors Evaluating processors Taxonomy of processors
More informationBenchmarking of Dynamic Power Management Solutions. Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007
Benchmarking of Dynamic Power Management Solutions Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007 Why Benchmarking?! From Here to There, 2000whatever Vendor NXP
More informationAMD Fusion APU: Llano. Marcello Dionisio, Roman Fedorov Advanced Computer Architectures
AMD Fusion APU: Llano Marcello Dionisio, Roman Fedorov Advanced Computer Architectures Outline Introduction AMD Llano architecture AMD Llano CPU core AMD Llano GPU Memory access management Turbo core technology
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationEvaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores
Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores Tomohiro Hirano 1, Hideo Yamamoto 1, Shuhei Iizuka 1, Kohei Muto 1, Takashi Goto 1, Tamami Wake
More informationCPU Clock Ratio, CPU Frequency The settings above are synchronous to those under the same items on the Advanced Frequency Settings menu.
Advanced CPU Core Features CPU Clock Ratio, CPU Frequency The settings above are synchronous to those under the same items on the Advanced Frequency Settings menu. CPU PLL Selection Allows you to set the
More informationThe mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management
Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management
More information28x 29x 30x [ 24x] 3.20GHz ( 133x24) CPU Clock Ratio CPU Frequency. CPU Host Clock Control [ Enable] CPU Host Frequency ( MHz ) 133
Intel Core i7 is a brand new architecture featuring the QPI bus which replaces the FSB bus. So, how does this affect overclocking? The Core i7 processor s frequency is Bclk * CPU multiplier. For ex. Intel
More informationA Framework for Modeling GPUs Power Consumption
A Framework for Modeling GPUs Power Consumption Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January
More informationSoC Idling & CPU Cluster PM
SoC Idling & CPU Cluster PM Presented by Ulf Hansson Lina Iyer Kevin Hilman Date BKK16-410 March 10, 2016 Event Linaro Connect BKK16 SoC Idling & CPU Cluster PM Idle management of devices via runtime PM
More informationUtilization-based Power Modeling of Modern Mobile Application Processor
Utilization-based Power Modeling of Modern Mobile Application Processor Abstract Power modeling of a modern mobile application processor (AP) is challenging because of its complex architectural characteristics.
More informationHIGH PERFORMANCE VIDEO CODECS
Spin Digital Video Technologies GmbH Specialists on video codecs Spin-off of the Technical University of Berlin In-house developed software IP & products Based in Germany Innovative B2B company with decades
More informationEmbedded Systems Architecture
Embedded System Architecture Software and hardware minimizing energy consumption Conscious engineer protects the natur M. Eng. Mariusz Rudnicki 1/47 Software and hardware minimizing energy consumption
More informationEmbedded System Architecture
Embedded System Architecture Software and hardware minimizing energy consumption Conscious engineer protects the natur Embedded Systems Architecture 1/44 Software and hardware minimizing energy consumption
More informationIntroduction to Energy-Efficient Software 2 nd life talk
Introduction to Energy-Efficient Software 2 nd life talk Intel Software and Solutions Group Bob Steigerwald Nov 8, 2007 Taylor Kidd Nov 15, 2007 Agenda Demand for Mobile Computing Devices What is Energy-Efficient
More informationHeterogeneous Architecture. Luca Benini
Heterogeneous Architecture Luca Benini lbenini@iis.ee.ethz.ch Intel s Broadwell 03.05.2016 2 Qualcomm s Snapdragon 810 03.05.2016 3 AMD Bristol Ridge Departement Informationstechnologie und Elektrotechnik
More informationAge nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications
Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor
More informationPACKAGE AND PLATFORM VIEW OF INTEL S FULLY INTEGRATED VOLTAGE REGULATORS (FIVR) Edward (Ted) Burton
PACKAGE AND PLATFORM VIEW OF INTEL S FULLY INTEGRATED VOLTAGE REGULATORS (FIVR) Edward (Ted) Burton Ivy Bridge Platform Haswell Platform Core VR 0V-1.2V Graphics VR 0V-1.2V PLL VR 1.8V I/O VR 1.0V System
More informationECE 486/586. Computer Architecture. Lecture # 2
ECE 486/586 Computer Architecture Lecture # 2 Spring 2015 Portland State University Recap of Last Lecture Old view of computer architecture: Instruction Set Architecture (ISA) design Real computer architecture:
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationHigh performance, power-efficient DSPs based on the TI C64x
High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research
More informationFundamentals of Quantitative Design and Analysis
Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature
More informationThe LPGPU2 Project. Ben Juurlink, TU Berlin
The LPGPU2 Ben Juurlink, TU Berlin LPGPU2 Consortium LPGPU2 = Low-Power Parallel Processing on GPUs 2 LPGPU2 Objectives 1. To improve the power efficiency of compute and graphics LPGPU2 applications running
More informationDongjun Shin Samsung Electronics
2014.10.31. Dongjun Shin Samsung Electronics Contents 2 Background Understanding CPU behavior Experiments Improvement idea Revisiting Linux I/O stack Conclusion Background Definition 3 CPU bound A computer
More informationCPU Architecture Overview. Varun Sampath CIS 565 Spring 2012
CPU Architecture Overview Varun Sampath CIS 565 Spring 2012 Objectives Performance tricks of a modern CPU Pipelining Branch Prediction Superscalar Out-of-Order (OoO) Execution Memory Hierarchy Vector Operations
More informationPower Efficiency for Software Algorithms running on Graphics Processors. Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller
1 Power Efficiency for Software Algorithms running on Graphics Processors Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller Overview 2 Motivation Goal Project Applications Methodology Results
More informationTDT 4260 lecture 2 spring semester 2015
1 TDT 4260 lecture 2 spring semester 2015 Lasse Natvig, The CARD group Dept. of computer & information science NTNU 2 Lecture overview Chapter 1: Fundamentals of Quantitative Design and Analysis, continued
More informationSustainable Computing: Informatics and Systems
Sustainable Computing: Informatics and Systems 3 (2013) 194 206 Contents lists available at SciVerse ScienceDirect Sustainable Computing: Informatics and Systems jo u r n al hom epa ge: www.elsevier.com/locate/suscom
More informationA Study on C-group controlled big.little Architecture
A Study on C-group controlled big.little Architecture Renesas Electronics Corporation New Solutions Platform Business Division Renesas Solutions Corporation Advanced Software Platform Development Department
More informationIntel Architecture for Software Developers
Intel Architecture for Software Developers 1 Agenda Introduction Processor Architecture Basics Intel Architecture Intel Core and Intel Xeon Intel Atom Intel Xeon Phi Coprocessor Use Cases for Software
More informationParallel H.264/AVC Motion Compensation for GPUs using OpenCL
Parallel H.264/AVC Motion Compensation for GPUs using OpenCL Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink Embedded Systems Architecture Technische Universität Berlin Berlin, Germany January
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationModeling CPU Energy Consumption for Energy Efficient Scheduling
Modeling CPU Energy Consumption for Energy Efficient Scheduling Abhishek Jaiantilal, Yifei Jiang, Shivakant Mishra University of Colorado - Boulder GCM '10 Proceedings of the 1st Workshop on Green Computing
More informationEnergy Efficiency in Operating Systems
Devices Timer interrupts CPU idling CPU frequency scaling Energy-aware scheduling Energy Efficiency in Operating Systems Björn Brömstrup Arbeitsbereich Wissenschaftliches Rechnen Fachbereich Informatik
More informationPower Measurement Using Performance Counters
Power Measurement Using Performance Counters October 2016 1 Introduction CPU s are based on complementary metal oxide semiconductor technology (CMOS). CMOS technology theoretically only dissipates power
More informationF28HS Hardware-Software Interface: Systems Programming
F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2017/18 0 No proprietary software has
More informationEnergy Models for DVFS Processors
Energy Models for DVFS Processors Thomas Rauber 1 Gudula Rünger 2 Michael Schwind 2 Haibin Xu 2 Simon Melzner 1 1) Universität Bayreuth 2) TU Chemnitz 9th Scheduling for Large Scale Systems Workshop July
More informationManaging Hardware Power Saving Modes for High Performance Computing
Managing Hardware Power Saving Modes for High Performance Computing Second International Green Computing Conference 2011, Orlando Timo Minartz, Michael Knobloch, Thomas Ludwig, Bernd Mohr timo.minartz@informatik.uni-hamburg.de
More informationOutline Marquette University
COEN-4710 Computer Hardware Lecture 1 Computer Abstractions and Technology (Ch.1) Cristinel Ababei Department of Electrical and Computer Engineering Credits: Slides adapted primarily from presentations
More informationLow-power Architecture. By: Jonathan Herbst Scott Duntley
Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media
More informationI/O Systems (4): Power Management. CSE 2431: Introduction to Operating Systems
I/O Systems (4): Power Management CSE 2431: Introduction to Operating Systems 1 Outline Overview Hardware Issues OS Issues Application Issues 2 Why Power Management? Desktop PCs Battery-powered Computers
More informationUser s Guide. Alexandra Yates Kristen C. Accardi
User s Guide Kristen C. Accardi kristen.c.accardi@intel.com Alexandra Yates alexandra.yates@intel.com PowerTOP is a Linux* tool used to diagnose issues related to power consumption and power management.
More informationIntegrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM
Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of
More informationA Simple Model for Estimating Power Consumption of a Multicore Server System
, pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of
More informationMediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency
MediaTek CorePilot Heterogeneous Multi-Processing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on
More informationSamsung System LSI Business
Samsung System LSI Business NS (Stephen) Woo, Ph.D. President & GM of System LSI Samsung Electronics 0/32 Disclaimer The materials in this report include forward-looking statements which can generally
More informationDesign and Implementation of an AHB SRAM Memory Controller
Design and Implementation of an AHB SRAM Memory Controller 1 Module Overview Learn the basics of Computer Memory; Design and implement an AHB SRAM memory controller, which replaces the previous on-chip
More informationIntroduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes
Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel
More informationParallelism in Hardware
Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law
More informationXEEMU: An Improved XScale Power Simulator
Department of Software Engineering (1), University of Szeged Department of Electrical Engineering (2), University of Kaiserslautern XEEMU: An Improved XScale Power Simulator Z. Herczeg(1), Á. Kiss (1),
More informationPower Capping Linux. Len Brown, Jacob Pan, Srinivas Pandruvada
Power Capping Linux Len Brown, Jacob Pan, Srinivas Pandruvada Agenda Context System Power Management Issues Power Capping Overview Power capping participants Recommendation Linux Power Capping Framework
More informationECE 571 Advanced Microprocessor-Based Design Lecture 21
ECE 571 Advanced Microprocessor-Based Design Lecture 21 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 9 April 2013 Project/HW Reminder Homework #4 comments Good job finding references,
More informationLeveraging Mobile GPUs for Flexible High-speed Wireless Communication
0 Leveraging Mobile GPUs for Flexible High-speed Wireless Communication Qi Zheng, Cao Gao, Trevor Mudge, Ronald Dreslinski *, Ann Arbor The 3 rd International Workshop on Parallelism in Mobile Platforms
More informationLecture 2. Systems Programming with the Raspberry Pi
F28HS Hardware-Software Interface: Systems Programming Hans-Wolfgang Loidl School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh Semester 2 2015/16 0 No proprietary software has
More informationEnabling Arm DynamIQ support. Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro)
Enabling Arm DynamIQ support Dan Handley (Arm) Ionela Voinescu (Arm) Vincent Guittot (Linaro) Agenda DynamIQ introduction DynamIQ and Arm Trusted Firmware OS Power Management with DynamIQ L3 partial power-down
More information45-year CPU Evolution: 1 Law -2 Equations
4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationMICROPROCESSOR TECHNOLOGY
MICROPROCESSOR TECHNOLOGY Assis. Prof. Hossam El-Din Moustafa Lecture 20 Ch.10 Intel Core Duo Processor Architecture 2-Jun-15 1 Chapter Objectives Understand the concept of dual core technology. Look inside
More informationQuantifying the Energy Cost of Data Movement for Emerging Smartphone Workloads on Mobile Platforms
Quantifying the Energy Cost of Data Movement for Emerging Smartphone Workloads on Mobile Platforms Arizona State University Dhinakaran Pandiyan(dpandiya@asu.edu) and Carole-Jean Wu(carole-jean.wu@asu.edu
More informationTips and Tricks: Designing low power Native and WebApps. Harita Chilukuri and Abhishek Dhanotia
Tips and Tricks: Designing low power Native and WebApps Harita Chilukuri and Abhishek Dhanotia Acknowledgements William Baughman for his help with the browser analysis Ross Burton & Thomas Wood for information
More informationSeahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor
Seahawk Power-optimized implementation of High Performance Quad-core Cortex-A15 Processor PD Marketing ARM 1 Introduction to Cortex-A15 & Seahawk ARM Cortex-A15 is a high performance engine for superphones,
More informationHyperthreading Technology
Hyperthreading Technology Aleksandar Milenkovic Electrical and Computer Engineering Department University of Alabama in Huntsville milenka@ece.uah.edu www.ece.uah.edu/~milenka/ Outline What is hyperthreading?
More informationCore 2 vs I-series. How Far Have We Really Come?
Core 2 vs I-series How Far Have We Really Come? Appendix 1. Introduction 2. Road map 3. General specifications 4. Hardware subtleties 5. Technology difference 6. Advantages of the new architecture 7. Conclusion
More informationCOL862 - Low Power Computing
COL862 - Low Power Computing Power Measurements using performance counters and studying the low power computing techniques in IoT development board (PSoC 4 BLE Pioneer Kit) and Arduino Mega 2560 Submitted
More informationComputer Architecture!
Informatics 3 Computer Architecture! Dr. Vijay Nagarajan and Prof. Nigel Topham! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors
More informationTechnology Insight: Intel s Next Generation Microarchitecture Code Name Skylake
Technology Insight: Intel s Next Generation Microarchitecture Code Name Skylake Julius Mandelblat, Senior Principal Engineer, Intel SPCS001 AGENDA Introduction and Overview Core Microarchitecture Interconnect
More informationFrequency and Voltage Scaling Design. Ruixing Yang
Frequency and Voltage Scaling Design Ruixing Yang 04.12.2008 Outline Dynamic Power and Energy Voltage Scaling Approaches Dynamic Voltage and Frequency Scaling (DVFS) CPU subsystem issues Adaptive Voltages
More informationReal-Time Dynamic Energy Management on MPSoCs
Real-Time Dynamic Energy Management on MPSoCs Tohru Ishihara Graduate School of Informatics, Kyoto University 2013/03/27 University of Bristol on Energy-Aware COmputing (EACO) Workshop 1 Background Low
More informationComputer Architecture
Informatics 3 Computer Architecture Dr. Vijay Nagarajan Institute for Computing Systems Architecture, School of Informatics University of Edinburgh (thanks to Prof. Nigel Topham) General Information Instructor
More informationPhilippe Thierry Sr Staff Engineer Intel Corp.
HPC@Intel Philippe Thierry Sr Staff Engineer Intel Corp. IBM, April 8, 2009 1 Agenda CPU update: roadmap, micro-μ and performance Solid State Disk Impact What s next Q & A Tick Tock Model Perenity market
More informationDell Dynamic Power Mode: An Introduction to Power Limits
Dell Dynamic Power Mode: An Introduction to Power Limits By: Alex Shows, Client Performance Engineering Managing system power is critical to balancing performance, battery life, and operating temperatures.
More informationMoorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones
Moorestown Platform: Based on Lincroft SoC Designed for Next Generation Smartphones HOT CHIPS 2009 August 24 2009 Rajesh Patel Lead Architect, Lincroft SoC Intel Corporation Legal Disclaimer INFORMATION
More information8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2
CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.
More informationUser s Guide. Alexandra Yates Kristen C. Accardi
User s Guide Kristen C. Accardi kristen.c.accardi@intel.com Alexandra Yates alexandra.yates@intel.com PowerTOP is a Linux* tool used to diagnose issues related to power consumption and power management.
More informationTHE UNIVERSITY OF CHICAGO MANAGING DIVERSITY IN PERFORMANCE AND ENERGY CHARACTERISTICS ON EMBEDDED SYSTEMS A DISSERTATION SUBMITTED TO
THE UNIVERSITY OF CHICAGO MANAGING DIVERSITY IN PERFORMANCE AND ENERGY CHARACTERISTICS ON EMBEDDED SYSTEMS A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY
More informationPhase-Aware Web Browser Power Management on HMP Platforms
Phase-Aware Web Browser Management on HMP Platforms N. Peters, S. Park, D. Clifford, S. Kyostila, R. McIlroy, B. Meurer, H. Payer, S. Chakraborty Technical University of Munich, Google Inc {nadja.peters,sangyoung.park,samarjit}@tum.de,{danno,skyostil,rmcilroy,bmeurer,hpayer}@google.com
More informationMulticore Hardware and Parallelism
Multicore Hardware and Parallelism Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationSystem-on-Chip Architecture for Mobile Applications. Sabyasachi Dey
System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution
More informationLecture 1: Introduction
Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline
More informationFPGAhammer: Remote Voltage Fault Attacks on Shared FPGAs, suitable for DFA on AES
, suitable for DFA on AES Jonas Krautter, Dennis R.E. Gnad, Mehdi B. Tahoori 10.09.2018 INSTITUTE OF COMPUTER ENGINEERING CHAIR OF DEPENDABLE NANO COMPUTING KIT Die Forschungsuniversität in der Helmholtz-Gemeinschaft
More informationEnergy-Aware CPU Frequency Scaling for Mobile Video Streaming
1 Energy-Aware CPU Frequency Scaling for Mobile Video Streaming Yi Yang, Student Member, IEEE, Wenjie Hu, Student Member, IEEE, Xianda Chen, Student Member, IEEE, Guohong Cao, Fellow, IEEE, Abstract The
More informationWorkload Prediction for Adaptive Power Scaling Using Deep Learning. Steve Tarsa, Amit Kumar, & HT Kung Harvard, Intel Labs MRL May 29, 2014 ICICDT 14
Workload Prediction for Adaptive Power Scaling Using Deep Learning Steve Tarsa, Amit Kumar, & HT Kung Harvard, Intel Labs MRL May 29, 2014 ICICDT 14 In these slides Machine learning (ML) is applied to
More informationARM big.little Technology Unleashed An Improved User Experience Delivered
ARM big.little Technology Unleashed An Improved User Experience Delivered Govind Wathan Product Specialist Cortex -A Mobile & Consumer CPU Products 1 Agenda Introduction to big.little Technology Benefits
More informationEvaluating the Effectiveness of Model Based Power Characterization
Evaluating the Effectiveness of Model Based Power Characterization John McCullough, Yuvraj Agarwal, Jaideep Chandrashekhar (Intel), Sathya Kuppuswamy, Alex C. Snoeren, Rajesh Gupta Computer Science and
More informationParallelization Techniques for Implementing Trellis Algorithms on Graphics Processors
1 Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti +, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor
More informationMediaTek CorePilot 2.0. Delivering extreme compute performance with maximum power efficiency
MediaTek CorePilot 2.0 Heterogeneous Computing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on a chip
More information