ptop: A Process-level Power Profiling Tool

Similar documents
Fine-Grained Power Management Using Process-level Profiling

I/O Systems (4): Power Management. CSE 2431: Introduction to Operating Systems

Power and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur

Power and Energy Management

Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM

Sedef Akinli Kocak, Ryerson University

Analysis of offloading mechanisms in edge environment for an embedded application

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

VM Power Prediction in Distributed Systems for Maximizing Renewable Energy Usage

CHAPTER 6 STATISTICAL MODELING OF REAL WORLD CLOUD ENVIRONMENT FOR RELIABILITY AND ITS EFFECT ON ENERGY AND PERFORMANCE

Evaluations of Target Tracking in Wireless Sensor Networks

Display Management: Outline

CHAPTER 7 IMPLEMENTATION OF DYNAMIC VOLTAGE SCALING IN LINUX SCHEDULER

References. K. Sohraby, D. Minoli, and T. Znati. Wireless Sensor Networks: Technology, Protocols, and

POWER MANAGEMENT AND ENERGY EFFICIENCY

Power Optimization a Reality Check

19: I/O Devices: Clocks, Power Management

ARCHITECTURAL SOFTWARE POWER ESTIMATION SUPPORT FOR POWER AWARE REMOTE PROCESSING

A Programming Environment with Runtime Energy Characterization for Energy-Aware Applications

Energy Management Issue in Ad Hoc Networks

Monitoring Agent for Unix OS Version Reference IBM

Implementation and Analysis of Large Receive Offload in a Virtualized System

POWER-AWARE SOFTWARE ON ARM. Paul Fox

Energy Management Issue in Ad Hoc Networks

Feng Chen and Xiaodong Zhang Dept. of Computer Science and Engineering The Ohio State University

Last Time. Making correct concurrent programs. Maintaining invariants Avoiding deadlocks

Apprehending Joule Thieves With Cinder

COL862: Low Power Computing Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques

Lecture 2: September 9

CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT

A Power and Temperature Aware DRAM Architecture

CHAPTER 3 GRID MONITORING AND RESOURCE SELECTION

Portable Power/Performance Benchmarking and Analysis with WattProf

RAPIDIO USAGE IN A BIG DATA ENVIRONMENT

Lecture 15. Power Management II Devices and Algorithms CM0256

Real-Time & Embedded Operating Systems

A Simple Model for Estimating Power Consumption of a Multicore Server System

Chapter 8 & Chapter 9 Main Memory & Virtual Memory

Enosis: Bridging the Semantic Gap between

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

Fine Grained Power Modeling For Smartphones Using System Call Tracing. Y. Charlie Hu Paramvir Bahl Yi-Min Wang

Enhancing cloud energy models for optimizing datacenters efficiency.

DPDK Roadmap. Tim O Driscoll & Chris Wright Open Networking Summit 2017

Parallelizing TCP/IP Offline Log Analysis and Processing Exploiting Multiprocessor Functionality

Dynamic Translator-Based Virtualization

Swapping. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

May 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to

Operating Systems. Operating Systems

A Cool Scheduler for Multi-Core Systems Exploiting Program Phases

SysGauge SYSTEM MONITOR. User Manual. Version 3.8. Oct Flexense Ltd.

JavaScript and Flash Overhead in the Web Browser Sandbox

NC Education Cloud Feasibility Report

FlexFetch: A History-Aware Scheme for I/O Energy Saving in Mobile Computing

Precision Time Protocol, and Sub-Microsecond Synchronization

Meltdown or "Holy Crap: How did we do this to ourselves" Meltdown exploits side effects of out-of-order execution to read arbitrary kernelmemory

The Top Five Reasons to Deploy Software-Defined Networks and Network Functions Virtualization

ECE 486/586. Computer Architecture. Lecture # 2

10 Steps to Virtualization

File System Forensics : Measuring Parameters of the ext4 File System

ENGR 3950U / CSCI 3020U Midterm Exam SOLUTIONS, Fall 2012 SOLUTIONS

Virtual Memory Outline

Adapting Mixed Workloads to Meet SLOs in Autonomic DBMSs

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

COSC3330 Computer Architecture Lecture 20. Virtual Memory

System Requirements E 23 rd, Hutchinson KS (866)

EaSync: A Transparent File Synchronization Service across Multiple Machines

Understanding the performance of an X user environment

Operating Systems. Computer Science & Information Technology (CS) Rank under AIR 100

Process Scheduling Part 2

Energy-Aware Writes to Non-Volatile Main Memory


Operating Systems and Protection CS 217

Dynamic Classification of Repetitive Jobs In Linux For Energy-Aware Scheduling: A Feasibility Study

OVERHEADS ENHANCEMENT IN MUTIPLE PROCESSING SYSTEMS BY ANURAG REDDY GANKAT KARTHIK REDDY AKKATI

CC MPI: A Compiled Communication Capable MPI Prototype for Ethernet Switched Clusters

Benchmarking of Dynamic Power Management Solutions. Frank Dols CELF Embedded Linux Conference Santa Clara, California (USA) April 19, 2007

OPTIMIZING MOBILITY MANAGEMENT IN FUTURE IPv6 MOBILE NETWORKS

Design of Elastic Application for Seamless Cloud Computing

Lecture 12. Motivation. Designing for Low Power: Approaches. Architectures for Low Power: Transmeta s Crusoe Processor

Improving Real-Time Performance on Multicore Platforms Using MemGuard

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

VARIABILITY IN OPERATING SYSTEMS

CPS221 Lecture: Operating System Functions

Dynamic Power Optimization for Higher Server Density Racks A Baidu Case Study with Intel Dynamic Power Technology

A Novel Priority-based Channel Access Algorithm for Contention-based MAC Protocol in WBANs

Operating- System Structures

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Managing Battery Lifetime with Energy-Aware Adaptation

instruction is 6 bytes, might span 2 pages 2 pages to handle from 2 pages to handle to Two major allocation schemes

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Memory management. Requirements. Relocation: program loading. Terms. Relocation. Protection. Sharing. Logical organization. Physical organization

CSE 120 Principles of Operating Systems

Multiprocessor Systems. Chapter 8, 8.1

A unified multicore programming model

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

ENERGY has become increasingly important for resource

Energy efficient mapping of virtual machines

Optimal Algorithm. Replace page that will not be used for longest period of time Used for measuring how well your algorithm performs

Adaptive Middleware for Distributed Sensor Environments

Transcription:

ptop: A Process-level Power Profiling Tool Thanh Do, Suhib Rawshdeh, and Weisong Shi Wayne State University {thanh, suhib, weisong}@wayne.edu ABSTRACT We solve the problem of estimating the amount of energy consumed by each application in the system by presenting the design and implementation of ptop, a simple and efficient process-level power profiling tool. Being a service of the operating system, ptop provides real-time information about the energy consumed by each process in terms of different resource components. ptop also supports a set of well-defined energy-aware application programming interfaces (API) helping the system and application developers build energy optimization policies. Different from hardware-based measurement tools, ptop is software-based, requires no additional hardware, and therefore, it is easy to apply in various platforms. Using ptop APIs, we developed an application adaptation scheme that extends the battery lifetime to meet certain time constraints. Experiment results showed that ptop is lightweight and easy to use. 1. INTRODUCTION Nowadays, power management has become increasingly important in both data centers and mobile devices. Over the past five years, there has been a significant increase in the number of data centers, in addition to an estimated doubling of energy used by servers and cooling systems [1]. Moreover, the emergence of new technologies and a wide range of new mobile revolutionary products and applications have introduced a need for more power resources in mobile devices. The existing power resources Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 29 ACM X-XXXXX-XX-X/XX/XX...$5.. are limited and still do not meet the requirements of longer battery life time. Therefore, improving energy-efficiency in both data centers and mobile devices has become more and more demanding. In this paper, we focus on energy optimization in the application level, particularly application energy profiling. Some of the current solutions lack accuracy and exhibit some drawbacks making them difficult to use widely. First, they are hardwarebased [2, 3], making it expensive, inflexible, and difficult to deploy widely. Second, they provide only offline information, therefore making it difficult for applications to adapt their behaviour at runtime. Finally, energy profiling is not offered as a service in the system, causing difficulty for system developers to implement energy-aware adaptation protocols to obtain energy optimization. We present the design and implementation of ptop, a process-level energy profiling tool, with many helpful features. Being a service of the operating system, ptop runs at the kernel level and provides energy consumption data from all applications and processes running in the system. ptop is softwarebased, requires no additional hardware, therefore, easy to be applied in various platforms. Evaluation results showed that on average ptop consumes 3% of the CPU and.15 percent of memory. ptop is light-weight, thus, will not affect the performance of other applications. Moreover, ptop provides a set of well-defined energy-aware application programming interfaces (APIs) which will play an important role in making more energy real-time adapting and scheduling decisions in both mobile devices and cloud computing environments. With these APIs, we have developed an application adaptation framework (Section 4) for mobile devices that could extend the battery life time considerably, as well as making energy-aware QoS for application sessions across multiple domains [4]. Finally, we present future work and conclude in Section 6.

Display Utility Energy profiling Daemon CPU Disk NIC Memory APIs In-memory Data User space Kernel space Figure 1: The architecture of ptop. 2. PTOP DESIGN 2.1 Energy model Our tool uses a simple and feasible energy model. An application s energy consumption is estimated indirectly through its resource utilization. The quantity of energy consumed by an application E appi over any time interval t can be calculated as follow: E appi = U ij E resourcej + E interaction where U ij is the usage of application i on resource j, E resourcej is the amount of energy consumed by resource j, and E interaction is the indirect amount of energy consumed by the application because of the interaction among system resources, in the time interval t. Energy consumption of a particular resource is a function of its states(read, write, etc) and transitions. Given a particular resource, (e.g. The CPU, memory, network interface, etc), by knowing its set of states S and transitions T, its energy consumption in a time interval t is estimated as followed: E resourcej = P j t j + n k E k jins kint We believe that this approach is general enough and can easily be applied to different resources. Estimating E interaction is more difficult, since this energy is often marked by system-level policies such as caching and buffering. We left this for future work. 2.2 System architecture The system architecture of ptop is illustrated in Figure 1. ptop consists of an energy profiling daemon running in the background, continuously profiling resource utilization of each application process. It also tracks statistic data of states and transitions of each system resource if possible. The amount of energy consumed by each application in each time interval t is then calculated and stored temporarily in memory. Running at the kernel space, the energy profiling daemon is able to track all available system activity information. The second component of ptop is a display utility. This component is similar to that of top utility, except that it provides additional information about the break-down energy consumed by the application on different resources such as the CPU, network interface, memory and disk drive. Users of battery-based mobile devices such as PDAs and laptops can use this utility to figure out and kill unnecessary and energy-consuming applications, in order to save battery energy for other more important ones. The last but most interesting component of ptop is a set of well-defined energy-aware (APIs). Any process can call these APIs to acquire the energy consumption in terms of different energy consuming components in the last specified time period. The definition of these APIs is articulated next. 2.3 APIs In this section, we define a set of energy-aware APIs that provide current and previous energy consumption information of each application in terms of different resources such as the CPU, network interface, memory, hard disk and display. We envision that these primitive and general APIs are very useful for system developers to write energy optimization middleware such as energy-aware application adapters and energy-aware schedulers. For example, the following API defines the interface through which an application can query the energy consumed by the CPU in a time interval. int CPUEnergy( int PID, int length) The inputs consist of process id (PID) and the length of time in seconds (length) back from the time when this API is called. The output is the energy consumed by the CPU to serve the process in terms of Joule. We should set the threshold of length parameter appropriately, depending on the amount of available memory. ptop only keeps an energy profile of running processes for the last T seconds. If a long term energy profile is needed, it is the application s task to sample every T seconds to keep the record. The APIs for other resources are similar by replacing the CPU with their corresponding names. Based on these primitive and general APIs, developers can write more complex and specific ones. 3. IMPLEMENTATION We implemented ptop using C++ programming language in Linux (i.e., Fedora Core 1, kernel 2.6.28),

and we believe our ideas can be implemented in other operating systems easily. The energy profiling daemon runs in the kernel space to guarantee sufficient privileges to access to data of system activities. Like top utility, this daemon maintains a dynamic list of running processes, which contains information about each process resource utilization. This data is obtained by accessing the /proc directory. Ideally, the operating system and hardware vendor should provide interfaces to access power consumption data of hardware at different states. Unfortunately, such interfaces are not widely available; therefore, ptop has a configuration file specifying power data from hardware vendors. In the next subsection below, due to space limit, we briefly present how we estimate the energy consumed by each of most energy-consuming resources, namely the CPU, wireless card and hard disk. Note that we are currently working on the memory energy consumption, which will be included in ptop soon. 3.1 CPU energy We estimate the processor energy consumption by tracking the amount of time the processor running at different frequencies and the number of transitions between different frequencies in each sampling interval. This information is available in the interface /sys/devices/system/cpu/. Based on this data, total energy consumed by the processor during the sampling interval T is calculated as follows: E CP U = P j t j + n k E k j k where P j and t j are the power consumption and the time the processor running at a particular frequency, respectively; n k is the number of times transition k occurs, and E k is corresponding energy of that transition. We attribute the total energy of the CPU proportionally to the process s total CPU time. 3.2 Network interface energy Since a wired network interface consumes a very small amount of energy, in comparison with a wireless network interface, ptop only estimates the energy process consumes in the wireless network interface. Energy spent on the wireless network interface of a process i is calculated as follows: E Neti = t sendi P send + t recvi P recv where t sendi and t recvi are the amount of time process i sends and receives packet; P send and P recv are the power consumptions of the wireless card at sending and receiving states. We used kernel patch from atop [5] to get information about network activities per process. 3.3 Hard disk energy Energy spent on hard disk of a process i is calculated as follows: E Diski = t readi P read + t writei P write where t writei and t readi are the amount of time process i writes to the disk and reads from the disk; P write and P read are the power consumptions of the disk writing and reading states. We do not consider the disk transition states caused by an application, since such information is currently not available with the operating system. 3.4 Discussions Our energy estimation approach is based on resource consumptions of application and power specification information from hardware vendors. However, hardware vendors usually specify the Thermal Design Power (TDP), i.e., the maximum amount of power of the cooling system can dissipate. This value is not usually equal the actual maximum power consumed by the hardware component. We believe that TDP is good enough for our purpose. Moreover, we also assume that energy consumed by a hardware component is proportional to its current usage. This assumption holds for most of the computer hardware components. We do realize that the accuracy of our approach is mainly marked by the interaction between resource activities and system policies. For example, a page fault can lead to disk access, or different sizes in the network buffer can lead to different estimation results on network performance. We plan to investigate this interaction in the future. 4. PERFORMANCE EVALUATION In order to evaluate the accuracy of our tool, we used Watts Up Pro meter [6] to profile the energy usage on the client machine, Watts Up Pro can sample the energy usage on a client machine with approximately one time per second. Ideally ptop accuracy can be evaluated using special hardware on the motherboard like in [3], we are currently working on this. Figure 2 shows the power usage measured using Watts Up and our ptop tool under a random workload samples taken every 1 seconds from our case study applications, ptop is proved to be fine-grained, highly responsive and accurate with less than 2 Watts median error in power measurement according to our case study. 4.1 ptop Overhead To evaluate the overhead of ptop, we ran ptop in a laptop with the same configurations in Section 3. We used ptop to monitor resource usage info

Power (W) 18 15 12 9 6 ptop Watts Up Resource monitor User applications Demand predictor Adaptation manager 3 API calls User space 1 2 3 4 5 6 7 8 9 Time (S) ptop tool Kernel space Figure 2: ptop evaluation. Figure 3: Adaptation framework desgin. of all and more than 6 processes running in the systems. For each process, we monitored CPU usage, Network usage and Hard Disk usage. We set the sampling interval to 1 second. We measured the overhead of ptop using top commands. Experiment results showed that in average ptop consumes 3% of the CPU and.15 percent of memory. This illustrates that ptop is light weight and may not affect the performance of other applications. 4.2 Case Study To show how important and feasible our ptop tool is, we implemented a case study experiment on a computer laptop, we used three Applications, a sorter, downloader and image viewer, the goal of the experiment is to extend battery life time to meet the time requirement of the downloader application, to download 5 MB of data before getting interrupted because of the battery s energy. 4.3 Adaptation Framework In order to achieve our case study goal, a userlevel Adaptation model that works between user applications and our energy profiling tool APIs has been designed and implemented. Figure 3 shows the overall framework design of our adaptation model. Our adaptation model resides in the middle between user and kernel spaces, it consists of three main modules, resource monitor, demand predictor and adaptation manager. The resource monitor module is responsible of polling the underlying ptop API layer regarding power information of processes the user is running, then it forwards this information to the demand predictor module which uses a simple prediction algorithm to predict the power consumption of each process during the next time period. It then forwards this information to the adaptation manager which uses them along with applications priority information to decide which application to be adapted and to what extent. 4.4 Experiment Setup In this section, we implemented our case study experiment on an IBM Thinkpad T42 laptop 1.7 GHz with 1 GB of memory based on Linux kernel 2.6.28 operating system, we installed our adaptation framework and case study applications, a brief description of each application is shown in Table 1. Each application has specific rules among the energy saving techniques available, for the sorter application the energy consumption scheme can be reduced by gradually increasing the time interval between successive sorting operations, in our experiment we set different running modes for the sorter application, those include, 2, 4, 8 and up to 16 seconds interval between two successive sorting operations, For the image viewer application we believe that reducing the backlight brightness level of the LCD screen will decrease its energy consumption and the consumption of other applications as well. The LCD screen brightness of the IBM ThinkPad can be adapted based on 7 different modes ranging between % to 1% of brightness. In the experiment and for ease of calculation we normalized the current battery charge capacity to 3k Joules. The experiment was stopped when this amount of energy was consumed, The experiment s goal was to allow the downloader application to finish downloading 5 MB of data during the time specified. According to the downloading speed of the wireless network at the time we ran the experiment; it would take 2 minutes (12 s) to finish downloading the data from our server, this amount of time was set as the target running time we wanted to achieve. To help the downloader process finish on time, a series of adaptation decisions have been done by our framework to both the sorter application and the display brightness.

Energy (KJ) 35 3 25 2 15 1 5 Application Source Code Modification Power Consumption Due Priority Adaptation Modes Sorter Yes CPU 2, 2, 4, 8, 16 s intervals Downloader No Network + Disk 1 - Image Viewer No Disk + Display 3 % - 1% brightness Table 1: Applications description. Energy Profile With Adaptation 144 288 432 576 72 864 18 121 Time (S) Energy profile Estimation lines Figure 4: Adaptation decisions by time. 4.5 Experiment Results We ran the experiment with the adaptation decisions being made every 1.2 minutes, during this time and with a sampling period of 3 seconds, the demand predictor module collects 24 samples of the energy consumption of each process; Figure 4 shows the sequence of adaptation decisions performed after each data sampling process. The system s target lifetime has been successfully achieved at the end. 5. RELATED WORK A lot of energy optimization techniques have been applied so far in all three levels of computer systems. In computer architecture level, the notion of energy-proportional hardware has been proposed [7], low power hardware has also been introduced. In the system layer, a lot of techniques were introduced such as dynamic voltage scheduling, management of devices power states, workload localization and thermal-aware scheduling. And in the application level, research efforts have been focusing on techniques such as energy-adaptation by trading off application fidelity [2], controlling dynamic voltage and frequency scaling (DVFS) setting from within applications [8], energy complexity estimation, and energy-aware protocols [4, 2]. Many other studies and solutions have been done in the area of energy profiling [2, 8, 9, 1]; however, some of these solutions lack accuracy and exhibit some drawbacks making them difficult to use widely. Hardware-based energy profiling methods [2, 3] require multimeters for power sampling, multimeters are expensive, bulky and difficult to be deployed, other than that, energy profiling information is provided offline, which make it difficult for applications to adapt their behaviour at run-time. In ECOSystem [1] however, energy profiling is offered online. But we found that ECOSystem does not model clock frequency scaling, which is an important feature supported by most recent CPUs. Our profiling tool takes this into consideration and profiles the CPU energy usage based on different frequencies supported by the hardware. 6. SUMMARY AND FUTURE WORK In this paper, we propose a process-level power profiling tool ptop, which aims to provide real-time information about the energy consumed by each process in terms of different resource components. ptop also supports a set of APIs, whose powerfulness is demonstrated by an adaptation case study. Our future work includes three main directions. First, we plan to improve the accuracy of ptop and add support for memory. Second, we believe that future operating systems should support our APIs. Finally, we intend to investigate more effective adaptation mechanisms by leveraging our APIs. 7. REFERENCES [1] Report to congress on server and data center energy efficiency, Environment Protection Agency, 27. [2] J. Flinn and M. Satyanarayanan, Energy-aware adaptation for mobile applications, SOSP, 1999. [3] Y. L. C Xian, Le Cai, Power measurement of software programs on computers with multiple i/o components, IEEE Transactions on Instrumentation and Measurement, oct 27. [4] H. Lufei and W. Shi, Energy-aware qos for application sessions across multiple protocol domains in mobile computing, Comput. Netw., 27. [5] An advanced monitor for linux-systems, http://www.atcomputing.nl/tools/atop/index.html. [6] Watts up pro., http://www.wattsupmeters.com. [7] L. A. Barroso and U. Holzle., The case of energy proportional computing, IEEE Computer, sep 27. [8] X. Liu, P. Shenoy, and M. D. Corner, Chameleon: Application-level power management, IEEE Transactions on Mobile Computing, 28. [9] A. Kansal and F. Zhao, Fine-grained energy profiling for power-aware application design, SIGMETRICS Perform. Eval. Rev., 28. [1] H. Zeng, C. S. Ellis, A. R. Lebeck, and A. Vahdat, Ecosystem: managing energy as a first class operating system resource, SIGPLAN Not., 22.