Energy-centric DVFS Controlling Method for Multi-core Platforms

Shin-gyu Kim, Chanho Choi, Hyeonsang Eom, Heon Y. Yeom
Seoul National University, Korea
MuCoCoS 2012, Salt Lake City, Utah

Abstract
Goal: To minimize total energy consumption by taking workload characteristics into account. We focus on virtualized environments on multi-core platforms, such as clouds.
Contribution: We propose a DVFS controlling method (edvfs) that achieves this goal by exploiting an energy efficiency factor.

Energy Saving Technologies
Processor: C-state, P-state, T-state, etc.
Data center: power capping, cooling system design, etc.
Source of energy: solar power, wind power, etc.

Dynamic Voltage and Frequency Scaling
P-state: changes the operating frequency while the processor can still execute instructions (a C-state, by contrast, is an idle state). P-state control is also called DVFS control.
Intel Sandy Bridge: per-socket DVFS control; frequency range of 40% to 100% of the base frequency, in 100 MHz or 133 MHz intervals.
AMD Bulldozer: per-core DVFS control; 5 levels.

Test Environments
Server specification
CPU: 2-way Intel 8-core Xeon processor, 15 DVFS levels
RAM: 128 GB (16 GB × 8)
Virtualization environment: KVM
Each VM has 1 virtual CPU and 2 GB of memory.
Each VM is pinned to a physical core (1:1 mapping between VMs and physical cores).

CPU Power Consumption vs. DVFS

Memory Power Consumption vs. DVFS

Energy Saving via DVFS
Low frequency means low power. Power (watts) = energy (joules) per second.
How about total energy consumption? Total energy consumption = Power × Time.
The trade-off: slow processing at low frequency vs. fast processing at high frequency.

Motivation - Energy Consumption vs. DVFS
[Figure panels: lbm, omnetpp, povray]
The DVFS level should be varied according to the workload.

Energy Saving by DVFS
Saving energy with DVFS is not a simple two-option choice between low and high frequency. We should be able to select the most energy-saving DVFS level from among more than 10 levels. Moreover, the best level varies with the characteristics of the workload.

Energy Consumption of Two Benchmarks
[Figure panels: lbm (memory-intensive), povray (CPU-intensive)]

Memory Traffic of lbm and povray
[Figure annotation: attainable peak memory bandwidth]

Energy Efficiency
Total energy consumption: E
Average power: P
Runtime: T
Instructions per second: IPS
Total number of instructions: W
Energy efficiency: eff
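The relations among these quantities can be written out as follows. The first line restates "Total energy consumption = Power × Time" from the earlier slide, and the second is the usual definition of instructions per second. Reading eff as instructions per joule is an assumption made here for illustration; the slide text itself does not give the formula.

```latex
\begin{align}
E &= P \cdot T \\
\mathrm{IPS} &= \frac{W}{T} \\
\mathit{eff} &= \frac{W}{E} = \frac{\mathrm{IPS}}{P}
  \quad \text{(assumed definition: instructions per joule)}
\end{align}
```

Under this reading, for a fixed amount of work W, a higher eff corresponds directly to a lower E.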

Energy Efficiency (cont'd)
Energy-centric DVFS control (edvfs) is a periodic controlling method.
Minimizing energy consumption per control period (t) leads to minimizing total energy consumption; e.g., in the case of n control periods, the total energy is the sum of the energy consumed in each of the n periods.
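In symbols, the total over n control periods can be sketched as below. The per-period energy $E_i$ and per-period average power $P_i$ are notation introduced here for illustration, and every period is assumed to last the same time $t$.

```latex
\[
E_{\text{total}} \;=\; \sum_{i=1}^{n} E_i \;=\; \sum_{i=1}^{n} P_i \, t
\]
```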

Energy-centric DVFS Control
Control policies
Policy 1: If memory traffic exceeds a given threshold, decrease the CPU frequency.
Policy 2: If memory traffic is under the threshold, move the CPU frequency in the direction in which eff increases.
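The two policies can be sketched as a single decision function. This is a minimal illustration, not the authors' implementation: the function name, the integer level encoding (larger index means higher frequency), the one-level step size, and the `eff_direction` argument are all assumptions. The default range of 0 to 14 simply mirrors the 15 DVFS levels of the test server.

```python
def next_dvfs_level(level, mem_traffic, threshold, eff_direction,
                    min_level=0, max_level=14):
    """Return the DVFS level to use in the next control period.

    Hypothetical encoding: levels are integers and a larger index means a
    higher CPU frequency. eff_direction is +1 or -1, the direction in which
    eff was last observed to increase.
    """
    if mem_traffic > threshold:
        # Policy 1: memory traffic exceeds the threshold, so lower the frequency.
        step = -1
    else:
        # Policy 2: move in the direction in which eff increases.
        # (Traffic exactly equal to the threshold is treated as Policy 2 here;
        # the slides do not specify that case.)
        step = eff_direction
    # Clamp to the valid range of DVFS levels.
    return max(min_level, min(max_level, level + step))
```

For example, `next_dvfs_level(5, mem_traffic=10.0, threshold=8.0, eff_direction=+1)` lowers the level to 4 under Policy 1, while the same call with `mem_traffic=3.0` raises it to 6 under Policy 2.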

Implementation Overview
Components: CPU, Performance Counter Monitor (PCM), edvfs controller
1. PCM reads the power consumption and the number of instructions from the CPU.
2. The edvfs controller gets the required information from PCM every 5 seconds.
3. The edvfs controller adjusts the CPU DVFS level according to the edvfs policies.

Applying Control Policies
Applying Policy 1 is straightforward, but applying Policy 2 is not: estimating eff at a different CPU frequency is a very complex problem. To deal with this problem, edvfs changes the CPU frequency to a different value at every control period.
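One way to act on "change the frequency every period" without predicting eff is a perturb-and-observe loop: move one level, measure eff over the period, and reverse direction if eff got worse. The sketch below assumes that strategy; the slides do not say how edvfs picks the next value, and the class name, level encoding, and starting direction are hypothetical.

```python
class ProbingController:
    """Perturb-and-observe sketch (assumed strategy, not the authors' code).

    Levels are integers; a larger index means a higher CPU frequency.
    """

    def __init__(self, level, min_level=0, max_level=14):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.direction = -1    # arbitrary choice: probe a lower frequency first
        self.prev_eff = None   # eff measured in the previous control period

    def step(self, eff):
        """Take the eff measured over the period that just ended and return
        the DVFS level to use for the next period."""
        if self.prev_eff is not None and eff < self.prev_eff:
            # The last move reduced eff, so probe in the other direction.
            self.direction = -self.direction
        self.prev_eff = eff
        nxt = self.level + self.direction
        if nxt < self.min_level or nxt > self.max_level:
            # Bounce off the end of the range so the level still changes.
            self.direction = -self.direction
            nxt = self.level + self.direction
        self.level = nxt
        return self.level
```

Because the level moves every period, each measurement of eff is taken at a frequency different from the previous one, which is what lets the controller compare the two without a model.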

Evaluation
Experimental environments
HW: 2-way Intel Xeon 8-core processors, 128 GB memory
SW: KVM virtualization framework. Each VM has 1 virtual CPU and 2 GB of memory.
Workload (benchmarks)
Case 1, high memory traffic: 2 × lbm, 2 × libquantum, soplex, omnetpp, mcf, bzip2
Case 2, low memory traffic: lbm, libquantum, gobmk, hmmer, perlbench, sjeng, 2 × povray

Experimental Results - Memory Traffic vs. DVFS Level
[Figure panels: Case 1, Case 2]
When memory traffic is high, the CPU frequency is kept at a low level.

Experimental Results - Memory Traffic vs. Power Consumption
[Figure panels: Case 1, Case 2]
When memory traffic is high, power consumption is largely affected by memory traffic. Otherwise, it is largely affected by the active CPU cores.

Experimental Results - Total Energy Consumption and Execution Time
[Case 1: high memory traffic, Case 2: low memory traffic]

Configuration      Case 1                  Case 2
Static, 2.6 GHz    193,156 J, 2,138 sec    113,571 J, 1,570 sec
Static, 1.9 GHz    160,385 J, 2,389 sec    103,799 J, 1,945 sec
Static, 1.4 GHz    156,889 J, 2,856 sec    106,182 J, 2,497 sec
Dynamic DVFS       148,584 J, 2,593 sec    102,184 J, 1,830 sec

Our edvfs consumes less energy than the lowest static CPU frequency, and it is not the slowest configuration.
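The comparison can be made concrete with a short calculation over the reported numbers. The energy and time values below are copied from the results above; the helper names are illustrative, and labelling the "Dynamic DVFS" row as `edvfs` follows the slide's own summary sentence.

```python
# Energy (J) and execution time (s), copied from the results listed above.
RESULTS = {
    "Case 1": {
        "2.6 GHz": (193_156, 2_138),
        "1.9 GHz": (160_385, 2_389),
        "1.4 GHz": (156_889, 2_856),
        "edvfs": (148_584, 2_593),   # the "Dynamic DVFS" row
    },
    "Case 2": {
        "2.6 GHz": (113_571, 1_570),
        "1.9 GHz": (103_799, 1_945),
        "1.4 GHz": (106_182, 2_497),
        "edvfs": (102_184, 1_830),   # the "Dynamic DVFS" row
    },
}


def average_power(case, config):
    """Average power in watts: total energy divided by execution time."""
    energy_j, time_s = RESULTS[case][config]
    return energy_j / time_s


def energy_saving(case, baseline, candidate="edvfs"):
    """Fraction of the baseline's energy that the candidate saves."""
    base = RESULTS[case][baseline][0]
    cand = RESULTS[case][candidate][0]
    return (base - cand) / base


if __name__ == "__main__":
    for case, rows in RESULTS.items():
        for config in rows:
            print(f"{case}  {config:8s} {average_power(case, config):6.1f} W")
        print(f"{case}  edvfs saving vs 1.4 GHz: "
              f"{energy_saving(case, '1.4 GHz'):.1%}")
```

On these numbers, edvfs uses roughly 5% less energy than the static 1.4 GHz setting in Case 1 and roughly 4% less in Case 2, while finishing sooner in both cases.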

Conclusion
Low CPU frequency does not always mean low energy consumption.
We proposed an energy-centric DVFS controlling method that adjusts the DVFS level in consideration of workload characteristics.
We showed that total energy consumption with our method is lower than that with the lowest CPU frequency.