Operating System Services for Task-Specific Power Management

Submitted to the Technical Faculty of the University of Erlangen-Nürnberg in fulfillment of the requirements for the degree of DOKTOR-INGENIEUR

by Andreas Weißel

Erlangen, 2006

Date of submission:
Date of the doctoral examination:
Approved as a dissertation by the Technical Faculty of the University of Erlangen-Nürnberg
Dean: Prof. Dr.-Ing. Alfred Leipertz
Reviewers: Prof. Dr.-Ing. Wolfgang Schröder-Preikschat, Prof. Dr.-Ing. Frank Bellosa

Acknowledgments

Many people have supported and encouraged me from my first steps into operating system research to the final version of this dissertation; I am very indebted to all of them. First of all, I would like to thank Prof. Dr.-Ing. Schröder-Preikschat and Prof. Dr.-Ing. Bellosa for supervising this dissertation and for their generous time and commitment. I owe special thanks to Prof. Bellosa for his sustained interest in my academic progress and for the opportunity to pursue this thesis at his department at the University of Karlsruhe during the summer semester.

Special acknowledgments go to my colleagues at the department for fruitful discussions, a pleasant working atmosphere and, most importantly, a lot of fun. I am indebted to the talented students who contributed to various power management projects in the field of this dissertation with their study and diploma theses. These include Björn Beutel, who worked on Cooperative-I/O; Martin Waitz, Simon Kellner and Florian Fruth, who studied approaches to energy accounting; and Matthias Faerber and Thomas Weinlein, who were involved in user-guided power management.

I owe a lot of thanks to my family and friends for their patience and continuous support. I am very indebted to Marcus Meyerhöfer for valuable last-minute proof-reading, even on the day before his wedding. Finally, I would like to thank one very special person. Annette, thank you for being so patient with me during the last stages of the dissertation. You provided a lot of support and motivation, more than you could ever imagine.


Abstract

Mobile computing systems have to provide sufficient operating time in spite of limited battery capacity. Therefore, they rely on energy-efficient management of system resources. This issue is addressed by system components with low-power operating modes which reduce the power consumption considerably. However, power management mechanisms can cause increased latencies and may affect application quality negatively. While this may be tolerated for specific applications as long as energy is saved, the user will expect maximum performance for other tasks. Consequently, one important insight is that algorithms controlling low-power operating modes have to make application-specific trade-offs between performance and energy savings. Furthermore, contemporary power management policies are often based on heuristics and implicit assumptions that do not consider this trade-off and cannot be modified or adapted to the performance requirements of the specific application. In this context, the terms performance and quality have to be understood as synonyms, related to speed, usability or other runtime properties of a task.

The goal of this thesis is to provide system services that allow application-specific trade-offs to be made between energy savings and application performance. Different approaches to power management are presented that consider task-specific performance requirements and take the effects of low-power modes on application quality into account. First, system services are introduced that determine the energy consumption and monitor runtime parameters related to application performance. With this information, power management policies obtain feedback on the consequences of their decisions. Thus, they can react to insufficient energy savings and avoid violations of application-specific performance requirements. It will be demonstrated that an adaptive management of low-power modes is feasible for interactive applications. As a second approach, an extended system interface to be used by energy-aware programs is presented. The application developer can specify which device operations are time-critical and for which operations a performance degradation is tolerated. The granted flexibility can be exploited by the operating system to maximize energy savings without violating the performance requirements of specific operations. Finally, an approach is presented that enables the user to train the system to make optimum, application-specific trade-offs between performance and energy savings at runtime. To this end, methods from machine learning are applied to system power management. With this approach, the individual user's preferred power/performance trade-off can be taken into account. It is shown how to realize a hierarchical energy management that distinguishes certain applications and switches dynamically between different, specialized power management policies. Prototype implementations for Linux are presented and evaluated with energy measurements, proving the feasibility of task-specific power management.

Parts of the material presented in this thesis have previously been published as:

1. Andreas Weißel and Frank Bellosa. Process Cruise Control: Event-driven clock scaling for dynamic power management. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 02), October 2002.
2. Andreas Weißel, Björn Beutel, and Frank Bellosa. Cooperative-I/O: A novel I/O semantics for energy-aware applications. In Proceedings of the Fifth Symposium on Operating System Design and Implementation (OSDI 02), December 2002.
3. Frank Bellosa, Simon Kellner, Martin Waitz, and Andreas Weißel. Event-driven energy accounting for dynamic thermal management. In Proceedings of the Workshop on Compilers and Operating Systems for Low Power (COLP 03), September 2003.
4. Andreas Weißel, Matthias Faerber, and Frank Bellosa. Application characterization for wireless network power management. In Proceedings of the International Conference on Architecture of Computing Systems (ARCS 04), March 2004.

Table of Contents

1 Introduction
   Motivation
   Objectives
   Outline
2 Background: Power Management at the Component Level
   ipaq Power Breakdown
   Processor and Memory
   DFVS Policies
   Clock Throttling
   Memory Power Management
   Hard Disk
   Break-Even Time
   Spin-Down Policies
   Wireless Network Interface Card
   Summary and Discussion
3 Feedback-Driven Power Management
   Resource Containers
   Implementation
   Handling Client/Server Relationships
   Summary
   Feedback on Energy Consumption
   CPU and Memory Energy Accounting
   Energy Accounting of I/O Devices
   Energy Limits
   Evaluation
   Related Work on System Infrastructures for Energy Control
   Summary
   Influence of Power Management on Application Performance
   Process Cruise Control
   Performance of Interactive Applications
   Response Time and User Think Time
   Interactive Response Times on the ipaq Handheld
   Related Work on Power Management for Interactive Workloads
   Discussion
   Summary
4 Energy-Aware Applications
   Overview
   Design
   Cooperative File Operations
   Interactions Between Cooperative Operations and the Disk Cache
   Energy-Aware Caching & Update
   Device Control
   Implementation
   Cooperative File Operations
   Drive-Specific Cooperative Update
   Power Mode Control
   Evaluation
   A Cooperative Audio Player
   Synthetic Tests
   Varying the Number of Cooperative Processes
   Related Work
   Operating System Interfaces for Energy-Aware Applications
   Application-Aware Adaptation
   Source Code Transformation
   Summary and Discussion
5 User-Guided Power Management
   Principle of Operation
   Approaches to Supervised Learning
   Machine Learning for Operating System Power Management
   Case Study: Wireless Network Power Management
   Nearest Neighbor Algorithm
   Classification and Regression Trees
   Summary
   Case Study: CPU Frequency Scaling
   Implementation
   Evaluation
   Summary
   Related Work on Workload Classification
   Summary and Discussion
6 Conclusion
   Contributions
   Future Directions
Bibliography
Einleitung
Zusammenfassung


List of Figures

1.1 Power consumption of processors versus energy density of batteries
1.2 Exemplary trade-off between energy consumption and performance
2.1 Average power consumption and rel. execution time of MiBench benchmarks
2.2 Principle of operation of clock throttling
2.3 Transition of a Travelstar hard disk from idle to standby and back to idle mode
2.4 IEEE 802.11 wireless network power management
2.5 Power consumption of the Cisco Aironet wireless network interface
3.1 Example Resource Container hierarchy
3.2 Power consumption of test programs running on an Intel PXA 255 CPU
3.3 Resource Containers: refreshing of energy limits
3.4 Estimated and measured power consumption of the ipaq handheld
3.5 Measurement of the ipaq's power consumption
3.6 Execution times of different benchmarks on an Intel PXA 255 CPU
3.7 Frequency domains of the Intel XScale processor
3.8 Alternation of response times and user think times
3.9 Heuristics for determining interactive response times
3.10 Algorithm to derive response times: Resource Container being replaced
3.11 Algorithm to derive response times: new (next) Resource Container
3.12 Response times of different interactive applications
3.13 Response times of the web browsers dillo and minimo
3.14 CPU bursts and network communication of dillo and minimo
3.15 Adaptive control of wireless network power management
4.1 Clustering of I/O requests
4.2 Components of Cooperative-I/O
4.3 Amp switching between two buffers
4.4 Comparison of different hard disk power management policies
4.5 Intra-task clustering of hard disk accesses
4.6 Inter-task clustering of hard disk accesses
4.7 Reads with varying average period length
4.8 Writes with varying average period length
4.9 Varying the number of cooperative processes
4.10 Reordering of the process schedule to increase disk idle times
5.1 The process of training & classification
5.2 Training & classification for operating system power management
5.3 Power consumption of the wireless interface card during a run of vlc
5.4 The root of the classification tree for wireless network power management
5.5 Classification tree for CPU power management
Stromverbrauch von Prozessoren im Vergleich zur Energiedichte von Batterien
Modellhafte Abwägung zwischen Energieverbrauch und Performance

List of Tables

2.1 System components of the ipaq with a high variation in power consumption
2.2 IBM Travelstar 15 GN hard disk: operating modes and their properties
2.3 Definitions for computing the break-even time for hard disk power management
2.4 Power consumption of typical hard disk operating modes and transition overhead
2.5 Power consumption and transition overhead of the Cisco Aironet card
3.1 Intel PXA performance counter events
3.2 Subset of events that correlate with energy consumption
3.3 Energy estimation errors for different microbenchmarks
3.4 Energy estimation errors for different applications
3.5 Estimation errors when multiplexing between pairs of events
3.6 Response times of different applications at a CPU speed of 398 MHz
4.1 Time spent in different operating modes during a run of amp
4.2 Time spent in different operating modes during synthetic tests
5.1 Features used for classification (k-nearest neighbor algorithm)
5.2 Most significant features to distinguish different applications
5.3 Runtime parameters of network communication monitored by the OS
5.4 Energy consumption of different applications running on the ipaq
5.5 Energy consumption of MiBench running on a directory mounted over NFS


1 Introduction

This dissertation investigates energy management in mobile, battery-powered computing devices. Two often conflicting goals are addressed: increasing the system's runtime by saving energy and providing sufficient application quality. Operating system services are introduced that make it possible to monitor and control the power consumption and application quality. With a cooperative approach between the system's energy management and the application or the user, task-specific trade-offs between these two goals can be made.

1.1 Motivation

In recent years, one aspect of computing devices has gained more and more importance: mobility. Personal appliances like PDAs, cell phones or laptops have become an indispensable part of everyday life. The design and implementation of mobile devices faces several constraints, as computing power, memory, and energy are limited. As these systems are usually battery powered, the power and energy consumption directly affects the operating time and, consequently, the usability of the device. Constraints regarding the size and weight of batteries limit their available capacity. What makes the problem even harder is the constant need to add functionality and to advance computing power and performance, with the consequence of an ever-growing demand for energy. Battery capacity is improving by 5 to 10 % per year, according to optimistic studies, and cannot keep pace with the rapid growth of energy requirements. This phenomenon is illustrated in figure 1.1, which shows the widening gap between the power consumption of processors and the batteries' energy density (from Lahiri et al. [LRDP02]).

To address this issue, hardware manufacturers have developed system components with low-power operating modes. The management of these low-power modes at runtime with the goal of

maximizing energy savings is known as dynamic power management. Power management algorithms or policies are implemented at the hardware, system or application level.

Figure 1.1: The widening gap between power requirements of processors (W) and the energy density of batteries (Wh/kg)

Mobile devices often support wireless communication (e. g., via Infrared, Bluetooth or IEEE 802.11 wireless LAN) and are equipped with some kind of storage device (e. g., flash memory or a hard disk). My own experiments demonstrated that wireless network power management can increase the operating time of the popular ipaq 3970 handheld by up to 50 %. Modern hard disks allow the spindle motor to be stopped, reducing the idle power consumption of a 1-inch Microdrive hard disk by over 80 %. The power consumption of an IBM Thinkpad T43 laptop featuring an Intel Pentium M CPU at high load can be reduced from 43 to 31 W, i. e., by almost 30 %, if the frequency and voltage of the processor are scaled down.

At first glance, power management techniques seem to be able to bridge the growing gap between the limited capacity of contemporary battery technology and the ever-increasing demand for energy. However, a closer look at the effects of power management reveals the following observations:

Energy savings do not come for free. System components operate at a reduced speed, and transitions between active and low-power modes can cause latencies and may affect the quality of an application. For instance, power management can reduce the throughput of an I/O transfer or cause lost frames and jitter in multimedia playback. There is a trade-off between energy savings and, in the broadest sense, quality. Consequently, there can be an influence on the performance of the system and of individual applications, possibly affecting usability.

Performance requirements are task-specific. Whether and to what degree the user is willing to tolerate a degradation in performance or quality depends on the specific task. Delays due to power management can frustrate the user, while for specific applications even higher energy savings may be favored. For instance, single keystrokes in a text editor should be processed without noticeable delays. However, loading a web page can take hundreds of

milliseconds, including delays due to network power management, without irritating the user.

Throughout this thesis, the terms performance and quality are used synonymously and may apply to different quality-of-service measures like the speed, usability or response times of a task.

Figure 1.2: Exemplary trade-off between energy consumption and performance for two applications A and B; the dotted lines mark the minimum expected quality, points (1) to (3) mark different operating modes

Figure 1.2 illustrates the influence of power management on application performance for two different, hypothetical scenarios. The curves represent possible trade-offs between energy consumption and performance, specific to two applications A and B. Three operating modes or settings are distinguished (points (1) to (3)), e. g., different CPU frequency/voltage configurations. It can be seen that for application A, energy savings come at the cost of reduced performance, while B is not affected significantly by power management. Provided that the dotted lines represent the minimum quality level the user is willing to tolerate for each application, setting (3) should not be used when running A. As a consequence, power management policies have to be aware of the effects of low-power modes and of the user's expectations on application quality. This insight reveals a fundamental aspect of power management: a low-power technique or policy will only be successful if it operates transparently or if the user is willing to pay for it.

Many power management algorithms found in today's soft- and hardware are based on heuristics and implicit assumptions. By observing the use of the device, these policies decide when to switch to which operating mode. The implemented rules are based on the assumption that there are workloads for which low-power modes are inappropriate and workloads for which energy management is tolerated. However, application scenarios can exist for which these heuristics reach wrong decisions or for which the implicit assumptions do not apply. As a consequence, an operating mode can be chosen that is either insufficient for the performance requirements of the current application or wastes energy. In these cases, an adaptation, i. e., a replacement or modification of the heuristics, is often not feasible. Application-specific performance demands are usually neglected. At best, these policies can be configured in some way in order to account for individual user preferences or platform-specific properties. A detailed analysis of power

management at the component level will be presented in chapter 2.

These observations raise the following questions:

- Which opportunities exist to derive the current power consumption and application quality at runtime?
- Is it feasible to control the power/performance trade-off with this information automatically?
- How can dynamic power management be guided to make appropriate trade-offs between energy savings and application quality?
- How can information on task-specific performance requirements be incorporated into operating system power management?

These questions are addressed in this thesis: With the support of appropriate system services, dynamic power management can save energy without violating task-specific performance requirements. With feedback on the effects of low-power modes, adaptive policies are feasible that limit the degradation of application quality and control the power consumption. A collaborative approach between the operating system and applications or the user enables the system to make optimum trade-offs between performance and energy.

1.2 Objectives

The goal of this thesis is the exploration of different approaches to task-specific power management. Different applications have different performance requirements and are influenced in different ways by power management policies. It will be investigated how energy savings can be traded for performance dynamically and with respect to the specific application. I will present three approaches to energy management that address this trade-off explicitly:

Let the system control the effects of power management on energy consumption and application quality. To facilitate the implementation of adaptive power management, services are introduced that monitor and control the energy consumption and determine certain runtime parameters of applications. This way, energy-aware policies or programs obtain feedback on the effects of dynamic power management. As a result, both the power consumption and the performance of specific applications can be controlled at runtime. Challenges of this approach are the runtime estimation of the system's power consumption and the quantification of changes in the performance as perceived by the user. For instance, power management should not affect the response times of interactive programs to user input negatively. With information on the energy consumption and certain system parameters that allow the performance of applications to be derived, dependencies and correlations between operating modes of different system components can be detected. Without knowledge of task-specific performance demands, this approach is restricted to detecting and limiting changes in the performance or, more generally, the behavior of certain application types.

Let the applications support system power management by specifying performance demands. The design and implementation of system services is based on the inherent assumption

that the user expects maximum performance. However, the application developer knows best which operations are time-critical and in which situations requests can be delayed without affecting application quality. Therefore, an extended interface to the operating system is proposed that enables energy-aware applications to guide power management policies. Feedback-controlled power management is limited to runtime information that can be monitored by the operating system. In contrast, programs using the proposed interface have the opportunity to allow the operating system to trade performance for energy savings when executing specific requests. I present Cooperative-I/O, a collaborative approach to energy management between applications and the operating system. System calls can be annotated with information on performance demands. If a specific request is not time-critical, the application can allow a flexible timing of its execution; the operating system is then not expected to execute the operation immediately. The granted flexibility can be exploited by power management policies. For instance, accesses to a hard disk can be deferred and clustered with other device requests in order to avoid costly transitions between low-power and active operating modes. (A minimal illustrative sketch of such a deferrable call is given at the end of this chapter.)

Let the user support system power management by specifying performance demands. A third approach is presented that allows the user (administrator, developer) to specify performance requirements of specific applications. This way, a cooperation between the operating system and applications is made possible, even for (legacy) programs that do not support system power management. During a training phase, characteristic properties of the resource consumption of individual tasks are learned. At runtime, the system monitors the resource usage, identifies active applications and remembers their appropriate power management policies or settings. In order to train the system, techniques from machine learning are applied to operating system power management.

These solutions are to some degree orthogonal to each other. They differ in the source of information used for reaching decisions regarding the runtime management of low-power modes. The first approach is restricted to on-line information that can be derived automatically at the system level. While this solution is immediately applicable, the operating system is not aware of individual, application-specific performance demands. This is the motivation for the second approach, which provides an extended interface to be used by energy-aware programs. This way, the operating system gains additional information regarding task-specific expectations on performance. With this infrastructure, a fine-grained control of the energy/performance trade-off is feasible. On the other hand, applications are required to make use of the new interface, possibly restricting its applicability and acceptance. To close this gap and to be able to take individual user preferences into account, a third approach is investigated: the system can be trained to identify preferred power management policies or task-specific performance requirements, specified by the user or administrator, at runtime.

The focus of this thesis is on operating system services that form the infrastructure for adaptive, task-specific power management on general-purpose, non-real-time systems. The operating system is the entity that has control and knowledge both of hardware components, their states

and characteristic properties, and of the applications which access them. Only at the level of the operating system can detailed information on the use of available resources, the power consumption and the effects on application performance be obtained. As energy consumption is an aspect of the whole system, the kernel is the appropriate entity to manage it, as argued by Vahdat et al. [VLE00]. Power management policies are presented that make use of the proposed kernel services. In contrast to other studies on low-power systems, minimizing the energy consumption is not the sole or primary goal of this thesis, as the influence on application performance and the specific power/performance trade-off have to be taken into account. Additionally, the focus of this research is not on new and better energy-saving algorithms, but on system services that form an indispensable infrastructure for adaptive, application-specific power management.

1.3 Outline

This dissertation is organized as follows. First, I discuss techniques to save power at the component level (chapter 2). In chapter 3, an approach to quantify application performance and to estimate and control the power consumption at runtime is introduced. An extended system interface to be used by energy-aware applications is presented in chapter 4. Next, the process of training the system to identify workloads and their specific performance demands is discussed (chapter 5). Finally, the thesis is concluded.
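Before chapter 4 develops the actual Cooperative-I/O interface, a minimal sketch may help to make the idea of annotated, deferrable operations concrete. The function write_coop, its timeout parameter and the fallback to an ordinary write() are illustrative assumptions for this overview, not the interface defined later in this thesis.

```c
/*
 * Illustrative sketch only: a "cooperative" write call in the spirit
 * of the interface discussed in chapter 4. The caller declares that
 * the request may be deferred for up to timeout_s seconds, so an
 * energy-aware kernel could delay it and cluster it with other disk
 * accesses while the drive stays in a low-power mode. On a stock
 * kernel there is no such support, so the hint is simply ignored.
 */
#include <unistd.h>
#include <sys/types.h>

static ssize_t write_coop(int fd, const void *buf, size_t count,
                          unsigned int timeout_s)
{
    (void)timeout_s;               /* no cooperative support available */
    return write(fd, buf, count);  /* fall back to an ordinary write   */
}

/* Example: a log record is not time-critical, so the kernel may delay
 * the write by up to 30 seconds before the data has to reach the disk. */
ssize_t log_sample(int fd, const char *line, size_t len)
{
    return write_coop(fd, line, len, 30);
}
```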

2 Background: Power Management at the Component Level

As a prerequisite for task-specific energy management, power saving mechanisms and their implications have to be understood. Therefore, the characteristics of low-power operating modes of different system components are investigated in this chapter. System-wide, energy-aware policies that control these modes, applied in real systems as well as proposed in the literature, are discussed. First, I identify the system components that typically constitute large portions of the power consumed by a mobile, battery-powered device. In the following sections, these components, their energy characteristics and existing power management policies are investigated: the CPU and memory (section 2.2), hard disks (section 2.3) and the wireless network interface card (section 2.4).

2.1 ipaq Power Breakdown

To identify the components that contribute significantly to the power consumption of a typical mobile computer, I performed energy measurements of the popular ipaq 3970 handheld. This handheld will be used as the platform for experiments throughout this thesis. It is equipped with an Intel PXA 250 CPU featuring frequency scaling, 64 MB of SDRAM and an expansion pack with a Cisco Aironet wireless network interface card and a 4 GB Hitachi Microdrive hard disk (3K4-4). A data acquisition (DAQ) system was used to measure the voltage drop at a sense resistor in the power lines from the ipaq's internal battery. The expansion pack is powered by its own batteries; the power consumption of the network card and the hard disk was measured using an extender card. Table 2.1 shows the variation in power consumption (difference between low-power and active mode and maximum power savings) of different components of the ipaq.

component                            variation in active power consumption   maximum savings in low-power modes
CPU, memory                          0.87 W                                  0.23 W
LCD, backlight                       0.70 W
Expansion pack: wireless interface   1.2 W                                   0.76 W
Expansion pack: hard disk            1.0 W                                   0.51 W

Table 2.1: System components of the ipaq 3970 with a high variation in active power consumption. Memory and CPU could not be measured independently.

A minimum idle power consumption of 0.38 W and a maximum active power consumption of 2.03 W were determined for the ipaq without the expansion pack. Energy can be saved by scaling the CPU frequency, switching the disk to standby mode and periodically putting the wireless network interface to sleep. For these components, the table shows the maximum active power, i. e., the difference between idle operation in the deepest sleep mode and active mode with peak power consumption. In addition to that, the table lists the maximum power savings that can be achieved if low-power modes are used. Approaches to display power management exist (see, e. g., [CSC02, GABR02]), but were not investigated, as the high variation in display power consumption on the ipaq is solely due to different backlight brightness levels. It can be seen that CPU & memory and the I/O devices can contribute significantly to total power consumption. Consequently, the following analysis concentrates on processor, hard disk and wireless network power management.

2.2 Processor and Memory

The power consumption of processor and memory can be divided into a static part due to leakage current and a dynamic part that is mainly caused by the components with high switching frequencies and a large number of capacitors. The CPU's dynamic energy consumption depends on the type of instructions executed and the activity of the different functional units (e. g., the instruction fetch/decode unit) involved. Caches and the memory management unit (MMU) contribute significantly to total power as they are made up of associative memory. Dynamic random access memory (DRAM) has a high static power consumption as the capacitors that store information have to be recharged periodically. Depending on the frequency and pattern of memory requests, a major part of the dynamic power consumption is caused by the MMU (for address translation), the caches and the DRAM (due to several decode and multiplex stages). Finally, the dynamic power consumption is also influenced by the activity of the interconnection network. In this chapter, mechanisms to reduce the power consumption of the processor are presented. In section 2.2.3, low-power features of the memory system and their interaction

with CPU power management are discussed.

The energy consumption of the CPU is proportional to the clock frequency and proportional to the square of the operating voltage. Running the processor more slowly allows the voltage level to be lowered, which results in a quadratic reduction in energy consumption, at the cost of increased runtime. This trade-off can be used by dynamic frequency & voltage scaling (DFVS) algorithms to reduce the CPU speed as long as the deadlines of applications are still met. For instance, DFVS techniques are implemented in processors by Intel ("Intel SpeedStep Technology", supported by Pentium III-M, Pentium 4-M [Int04], Core Solo/Duo, and XScale CPUs) and AMD ("PowerNow!"). Many frequency scaling techniques do not scale the voltage.

To get an impression of the effects of DFVS, I performed measurements of the power and energy consumption of an evaluation board (a Triton LP from Ka-Ro electronics GmbH) equipped with the Intel XScale PXA 255 processor, which features frequency and voltage scaling. A similar version of this processor (the Intel PXA 250) is also found in the ipaq handheld. The board is equipped with 16 MB of low-power SDRAM. This approach was chosen as the ipaq does not allow measuring the power consumption of CPU and memory directly. Measurements were performed with the free, commercially representative embedded benchmark suite MiBench [GRE+01]: it consists of 21 test programs from the categories automotive and industrial control, network, security, consumer devices, office automation and telecommunication. The left graph of figure 2.1 shows the average active power consumption of a subset of the MiBench tests running at three different CPU speeds and voltage levels: 199 MHz (1.0 V), 299 MHz (1.1 V) and 398 MHz (1.3 V). The idle power (in the range of 320 to 380 mW) was subtracted from the measured power consumption. The right graph shows the execution time of each test relative to the runtime at 199 MHz. It can be seen that the average power consumption takes higher values for increased clock rates and voltages (left graph). However, the execution time is reduced if the CPU is run at a higher speed (right graph). The figure also demonstrates that the benchmarks differ in their average power consumption and that the performance degradation due to power management varies from test to test.

Figure 2.1: Average power consumption and relative execution time of MiBench benchmarks (left: average active power consumption in mW, right: execution time relative to 199 MHz, for basicmath, bitcount, qsort, susan, patricia, blowfish and rijndael at 199 MHz/1.0 V, 299 MHz/1.1 V and 398 MHz/1.3 V)

2.2.1 DFVS Policies

Grunwald et al. analyzed and compared different speed setting policies proposed by Weiser, Govil and Pering [GLM+00]. Among them are PAST and its generalized version AVG_N, which derive a prediction for the upcoming deadline based on the average load over a specific number of past periods [WWDS94]. The minimum CPU speed is selected for which the estimated deadline is not violated. The authors show that no scheduling policy they examined was able to achieve the goal of setting the optimal speed for MPEG playback, which is constant over the whole program run. This example motivates the need for application-specific power management. Or, in the words of the authors, "without information from the user level application, a kernel cannot accurately determine what deadlines an application operates under" [GLM+00].

Another speed setting policy is Processor Acceleration to Conserve Energy (PACE), proposed by Lorch et al. [LS01]: PACE does not change the performance; instead, the speed schedule

(the sequence of speed settings) is changed in order to reduce the energy consumption without affecting the performance distribution of workloads. The CPU speed is gradually increased as the task progresses and set to the maximum level if the deadline is reached. PACE is applied to a number of speed setting policies and prediction methods proposed by Pering et al. [PBB98], Grunwald et al. [GLM+00], Govil et al. [GCW95] and Weiser et al. [WWDS94]. The utilization of the upcoming interval is predicted to be:

- the last interval's utilization (PAST),
- an exponentially decreasing average of the utilization of all past intervals (Aged-α),
- the average of the 12 most recent intervals, with a higher weight for the three most recent (LongShort), or
- a constant value u ≤ 1 (Flat-u).

Speed setting methods either switch between minimum and maximum speed (PEG by Grunwald), gradually increase (decrease) the speed if the predicted utilization exceeds (falls below) a certain threshold (as proposed by Weiser), or compute the speed by multiplying the maximum speed with the utilization (as proposed by Chan).
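As an illustration of how such interval-based policies fit together, the following sketch combines an Aged-α-style exponentially decaying utilization average with the proportional speed-setting rule (speed = maximum speed times predicted utilization). The weighting factor, the 398 MHz maximum and the function names are illustrative assumptions, not parameters taken from the policies cited above.

```c
/*
 * Sketch of an interval-based DFVS policy: predict the utilization of
 * the next interval with an exponentially decaying average (in the
 * spirit of Aged-alpha) and scale the clock proportionally to the
 * prediction. All constants are illustrative.
 */
#include <stdio.h>

#define ALPHA      0.5       /* weight of the most recent interval     */
#define F_MAX_MHZ  398       /* highest available clock frequency      */

static double predicted;     /* decaying average of past utilizations  */

/* called at the end of every scheduling interval */
static int next_frequency(double last_utilization)
{
    /* exponentially decaying average: new = a*last + (1-a)*old */
    predicted = ALPHA * last_utilization + (1.0 - ALPHA) * predicted;

    /* proportional speed setting: f = f_max * predicted utilization;
     * a real system would round up to the next available setting     */
    return (int)(F_MAX_MHZ * predicted + 0.5);
}

int main(void)
{
    double load[] = { 0.9, 0.4, 0.2, 0.8 };
    for (int i = 0; i < 4; i++)
        printf("interval %d: load %.1f -> target %d MHz\n",
               i, load[i], next_frequency(load[i]));
    return 0;
}
```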

2.2.2 Clock Throttling

Besides frequency/voltage scaling, some processors support clock throttling (or clock modulation) to dynamically modify the performance of an active processor (for instance, the Intel Pentium 4, Pentium M and Xeon). The main clock is gated with a throttling signal but, in contrast to frequency scaling, kept at the original frequency. The throttling signal is used to deactivate the clock periodically for a short period of time; for an illustration, see figure 2.2.

Figure 2.2: Principle of operation of clock throttling (clock signal, gating signal and resulting throttled clock)

Usually, eight clock throttling levels (100 % to 12.5 %) are supported; these settings differ in the amount of time the clock is throttled during a time window of approximately 3 µs. For instance, if the throttling level is set to 62.5 %, the clock runs freely for the first 5/8 of the time window and is throttled for the remaining 3/8. The clock throttling level can be adjusted by software by writing into a model-specific register. As a result, the CPU is effectively slowed down as it receives fewer clock cycles per time unit. There is a linear relationship between the power consumption and the throttling level. Clock throttling is often used for temperature management to provide a fast response to thermal emergencies. While changing the frequency and voltage level can incur a stall latency of up to 10 µs, the throttling level can be adjusted instantaneously, i. e., without a stall.

Miyoshi et al. [MLH+02] compare different approaches to processor power management, namely clock throttling and frequency/voltage scaling, with respect to energy efficiency. They arrive at the conclusion that on a Pentium III-based system featuring clock throttling it is energy efficient to run only at maximum CPU speed, whereas with frequency scaling on a PowerPC 405 processor, the lowest speed setting maximizes energy efficiency. The authors generalize this observation and introduce the critical power slope. They assume a linear relationship between performance and CPU frequency and between active-state power and frequency, while idle-mode power is approximately constant over all frequencies. The critical power slope is the slope of the active power consumption for which the total energy usage (active and idle power) is constant over all speed settings. If the actual slope of a specific piece of hardware is below the critical power slope, it is energy efficient to run the system at a higher frequency in order to minimize the time spent in the active state. The slowdown due to clock throttling is determined by comparing the number of unhalted cycles, which can be recorded using performance counters, with the original clock frequency, available through the time stamp counter.
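The linear relationship between the throttling level, the effective clock rate and (to a first approximation) the dynamic power can be made explicit with a small calculation. The eight-level scheme follows the description above; the 2.0 GHz base clock is an illustrative assumption.

```c
/*
 * Illustration of clock throttling: at duty cycle d the CPU receives
 * d * f_clk cycles per second, and dynamic power scales roughly
 * linearly with d. Eight levels (12.5 % .. 100 %) as described above;
 * the 2.0 GHz base clock is an illustrative value.
 */
#include <stdio.h>

int main(void)
{
    const double f_clk_ghz = 2.0;            /* assumed base clock */
    for (int level = 1; level <= 8; level++) {
        double duty = level / 8.0;           /* 12.5 % ... 100 %   */
        printf("throttling level %5.1f %%: effective clock %.2f GHz, "
               "relative dynamic power ~%.0f %%\n",
               duty * 100.0, duty * f_clk_ghz, duty * 100.0);
    }
    return 0;
}
```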

2.2.3 Memory Power Management

DFVS algorithms are used to reduce the power consumption of the CPU. However, the actual energy savings of specific frequency and voltage settings depend on the application-specific memory activity and on the contribution of the memory system to total power consumption. Snowdon et al. demonstrate that, contrary to the assumptions behind frequency/voltage scaling, a higher clock speed can result in a reduction of energy consumption [SRH05]. This is the case if the memory base power is comparably high. As tasks are executed faster at a higher CPU frequency, the contribution of static memory power to total energy consumption is reduced.

Many research projects investigate the potential of memory systems that offer power management features. In particular, the discontinued Rambus Dynamic Random Access Memory (RDRAM) can switch single memory banks into one of two low-power modes with dramatically reduced power consumption. As a drawback, access latencies increase by a factor of 10 to 1000 if a powered-down memory chip has to be activated. Fan et al. [FEL01] use trace-driven simulation to derive the energy-delay product of different memory power management policies. They arrive at the conclusion that the simple policy of switching the DRAM chip to a low-power mode immediately after an access is more energy efficient than other, more sophisticated algorithms. The authors also investigate the interaction of power-aware memory systems and dynamic frequency/voltage scaling [FEL03]. As a result, they find that there is a trade-off between memory and processor energy consumption: at low frequencies, memory dominates overall power. If the CPU frequency is increased, total power is initially reduced but then rises as the power consumption of the processor becomes more and more dominant. A technique is presented to derive the trade-off between memory and CPU energy at runtime based on information from performance monitoring counters. This estimation can be used by a DFVS algorithm to select an appropriate frequency/voltage configuration dynamically.

Huang et al. [HPS03] also investigate the low-power modes of RDRAM chips. A NUMA abstraction is presented to organize and manage memory. Pages are allocated to a small number of memory banks in order to increase the number of banks that will not be accessed frequently and can therefore be kept in a low-power state. The scheduler determines the best and second-best process to run and activates the sleeping memory banks of both processes in order to reduce the impact of access latencies. For the platform used in the evaluation (a Pentium 4 at 1.6 GHz), the context-switching time can be utilized to hide the latency due to powering up bus and sense amplifiers and resynchronizing with the external clock. In order to reduce the number of active memory banks, the contents of memory can be compressed. Besides hardware support for memory compression [ABS+01, BBMM02], software techniques are investigated [BA03, LHW00].

2.3 Hard Disk

Hard disks feature several low-power modes which switch off parts of the electronics or mechanical components of the drive (e. g., the spindle motor). These operating modes have been available in hard disks since the early 1980s and have already been supported by the first ATA standard. Almost all drive models support the standby mode, which stops the spindle motor, and the sleep mode, which shuts down the device almost completely. The sleep mode is almost never used, as it requires a soft or hard reset to reactivate the hard disk. The drive automatically leaves the standby mode if a read or write request is issued. In addition to that, modern drives support several low-power idle modes. However, an interface to control transitions between these modes does not usually exist; they are managed by the drive's firmware.
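While the low-power idle modes are managed by the firmware, the standby mode can be requested explicitly from user space. On Linux, the hdparm utility does this by sending the ATA standby-immediate command through an ioctl; the following minimal sketch shows the same approach. The device path is only an example, and error handling is reduced to the bare minimum.

```c
/*
 * Minimal sketch: put an IDE/ATA disk into standby mode from user
 * space on Linux, in the same way as "hdparm -y" does, by issuing
 * the ATA STANDBY IMMEDIATE command via the HDIO_DRIVE_CMD ioctl.
 * The device path is an example; the drive wakes up automatically
 * on the next read or write request.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/hdreg.h>

int main(void)
{
    unsigned char args[4] = { WIN_STANDBYNOW1, 0, 0, 0 };
    int fd = open("/dev/hda", O_RDONLY | O_NONBLOCK);

    if (fd < 0 || ioctl(fd, HDIO_DRIVE_CMD, args) != 0) {
        perror("standby request failed");
        return 1;
    }
    close(fd);
    return 0;
}
```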

Figure 2.3: Transition of a Travelstar hard disk from idle to standby and back to idle mode (power in W over time in s)

Figure 2.3 shows the power consumption of an IBM Travelstar 15 GN drive during an idle-standby-idle turnaround.

t = 1 s: The disk receives a shutdown command. The shaded region shows the hard disk switching from low-power idle to standby mode.
t = 1.8 s: After stopping the spindle motor, the disk has reached standby mode and the power consumption drops to about 0.24 W.
t = 3.8 s: The drive receives a write command and starts to spin up. The shaded region shows the hard disk switching from standby mode to active mode. Starting the spindle motor is quite expensive in terms of energy consumption.
t = 4.8 s: After 1 s, the disk has spun up and may serve read or write requests. In this test scenario only a single disk block gets written. Then, the disk switches back to low-power idle mode.

The characteristics of the various operating modes of the Travelstar 15 GN hard disk were determined through power measurements. Due to the undocumented internal adaptive algorithm of the firmware, the time and energy values vary according to the recent access pattern. Average values of several measurements are shown in table 2.2. The table also shows the latencies when leaving a low-power mode. Resuming to the active state results in an overhead in time and energy which has to be accounted for by power management algorithms.

2.3.1 Break-Even Time

The time spent in, e. g., standby mode has to exceed the break-even time in order for the amount of energy saved to be higher than the energy needed to perform the transitions to and from standby mode. This threshold is in the order of 2 to 20 seconds for most drives. Using the definitions from table 2.3, the break-even time t_be is defined as follows:

    t_be * P_i = (t_be - t_sd - t_su) * P_s + E_sd + E_su

    t_be = (E_sd + E_su - P_s * (t_sd + t_su)) / (P_i - P_s)
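As a worked example, the following sketch evaluates this formula with the Travelstar values quoted above (0.66 W in low-power idle, about 0.24 W in standby, a 0.8 s spin-down and a 1 s spin-up); the transition energies E_sd and E_su are illustrative assumptions, since the measured values are not repeated here.

```c
/*
 * Worked example for the break-even time
 *   t_be = (E_sd + E_su - P_s * (t_sd + t_su)) / (P_i - P_s)
 * Power and latency values follow the Travelstar figures quoted in
 * the text; the transition energies are illustrative assumptions.
 */
#include <stdio.h>

int main(void)
{
    const double P_i  = 0.66;  /* low-power idle [W]             */
    const double P_s  = 0.24;  /* standby [W]                    */
    const double t_sd = 0.8;   /* idle-to-standby transition [s] */
    const double t_su = 1.0;   /* spin-up time [s]               */
    const double E_sd = 1.0;   /* spin-down energy [J] (assumed) */
    const double E_su = 5.0;   /* spin-up energy [J] (assumed)   */

    double t_be = (E_sd + E_su - P_s * (t_sd + t_su)) / (P_i - P_s);

    /* roughly 13 s with these numbers, i.e. within the 2 to 20 second
     * range stated above; standby only pays off if the idle period
     * t_i exceeds this threshold                                      */
    printf("break-even time t_be = %.1f s\n", t_be);
    return 0;
}
```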

mode               power     latency   properties
active                                 read, write, or seek operation
performance idle   1.85 W              All electronic components remain powered and the servo is operating at full frequency.
active idle        0.85 W    20 ms     Parts of the electronics are powered off; the heads are parked near the mid-diameter of the disk without servoing.
low-power idle     0.66 W    300 ms    The heads are unloaded on the ramp (i. e., parked); the spindle is still rotating at full speed.
standby            0.24 W              The spindle motor is switched off.
sleep              0.1 W               Almost the complete electronics are switched off; a drive reset is required to leave the sleep mode.

Table 2.2: IBM Travelstar 15 GN hard disk: operating modes and their properties

transition   latency   energy
spin-up      t_su      E_su
spin-down    t_sd      E_sd

mode      time   power
idle      t_i    P_i
standby   t_s    P_s

Table 2.3: Definitions for computing the break-even time for hard disk power management

A transition from idle to standby mode reduces the energy consumption only if t_i > t_be. Break-even times for other mode transitions, e. g., from standby to sleep mode, can be computed analogously. Table 2.4 shows the energy characteristics of different hard disks (an IBM Travelstar 15 GN, 10 GB, a Toshiba MK2023GAS, 20 GB, and a Hitachi Microdrive, 4 GB) and their break-even times (for standby mode).

In addition to that, the lifetime of a hard disk is affected by start/stop cycles, i. e., transitions between the idle and standby mode. Each spin-up and spin-down operation causes a small amount of wear to the heads, the spindle motor and the other components. Hard disk manufacturers specify the minimum number of start/stop cycles the drive is designed to withstand during its service life. This value ranges from 50,000 to 300,000 or more. The effects of mode transitions can be reduced by parking the drive's heads on special ramps when they are not used (e. g., the load/unload technology [AS99] used in former IBM hard disks). As a consequence, there is not only a trade-off between energy consumption and access latency, but also between energy and the lifetime of the drive.

2.3.2 Spin-Down Policies

Spin-down policies can be grouped into on-line and off-line policies. Off-line policies are assumed to be omniscient and optimal, having access to complete information on past and future hard disk accesses. Another classification is the distinction between adaptive and non-adaptive policies.
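A simple non-adaptive example of an on-line policy is a fixed-timeout rule: spin the disk down once it has been idle for a fixed threshold, often chosen close to the break-even time. The following sketch is only an illustration of this idea, not one of the specific policies discussed in this chapter; the threshold value and the function names are assumptions.

```c
/*
 * Sketch of a fixed-timeout spin-down policy: if no request has been
 * observed for at least SPINDOWN_TIMEOUT seconds, switch the drive to
 * standby. A timeout close to the break-even time is a common
 * (non-adaptive) choice. The callers of these functions are assumed
 * to be provided by the surrounding system.
 */
#include <stdbool.h>
#include <time.h>

#define SPINDOWN_TIMEOUT 13.0   /* seconds, e.g. near the break-even time */

static time_t last_request;     /* time of the most recent disk access */
static bool   spun_down;

void on_disk_request(void)      /* called for every read/write request */
{
    last_request = time(NULL);
    spun_down = false;          /* the drive spins up automatically */
}

void on_timer_tick(void)        /* called periodically, e.g. once per second */
{
    if (!spun_down &&
        difftime(time(NULL), last_request) >= SPINDOWN_TIMEOUT) {
        /* issue the standby command here, e.g. via HDIO_DRIVE_CMD as
         * in the earlier user-space sketch */
        spun_down = true;
    }
}
```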


COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 5. Large and Fast: Exploiting Memory Hierarchy COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

Read this before starting!

Read this before starting! Points missed: Student's Name: Total score: /100 points East Tennessee State University Department of Computer and Information Sciences CSCI 4717 Computer Architecture TEST 1 for Fall Semester, 2005 Section

More information

ONLINE CLOSED-LOOP OPTIMIZATION OF DISTRIBUTION NETWORKS

ONLINE CLOSED-LOOP OPTIMIZATION OF DISTRIBUTION NETWORKS ONLINE CLOSED-LOOP OPTIMIZATION OF DISTRIBUTION NETWORKS Werner FEILHAUER Michael HEINE Andreas SCHMIDT PSI AG, EE DE PSI AG, EE DE PSI AG, EE - DE wfeilhauer@psi.de mheine@psi.de aschmidt@psi.de ABSTRACT

More information

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more

More information

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe

More information

An introduction to SDRAM and memory controllers. 5kk73

An introduction to SDRAM and memory controllers. 5kk73 An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part

More information

Lecture 1: Introduction

Lecture 1: Introduction Contemporary Computer Architecture Instruction set architecture Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2016 Reading: Textbook, Ch. 1.1-1.7 Microarchitecture; examples: Pipeline

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline

Reminder. Course project team forming deadline. Course project ideas. Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Reminder Course project team forming deadline Friday 9/8 11:59pm You will be randomly assigned to a team after the deadline Course project ideas If you have difficulty in finding team mates, send your

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 24

ECE 571 Advanced Microprocessor-Based Design Lecture 24 ECE 571 Advanced Microprocessor-Based Design Lecture 24 Vince Weaver http://www.eece.maine.edu/ vweaver vincent.weaver@maine.edu 25 April 2013 Project/HW Reminder Project Presentations. 15-20 minutes.

More information

SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers

SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers SmartSaver: Turning Flash Drive into a Disk Energy Saver for Mobile Computers Feng Chen 1 Song Jiang 2 Xiaodong Zhang 1 The Ohio State University, USA Wayne State University, USA Disks Cost High Energy

More information

Energy-aware Reconfiguration of Sensor Nodes

Energy-aware Reconfiguration of Sensor Nodes Energy-aware Reconfiguration of Sensor Nodes Andreas Weissel Simon Kellner Department of Computer Sciences 4 Distributed Systems and Operating Systems Friedrich-Alexander University Erlangen-Nuremberg

More information

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB)

Views of Memory. Real machines have limited amounts of memory. Programmer doesn t want to be bothered. 640KB? A few GB? (This laptop = 2GB) CS6290 Memory Views of Memory Real machines have limited amounts of memory 640KB? A few GB? (This laptop = 2GB) Programmer doesn t want to be bothered Do you think, oh, this computer only has 128MB so

More information

Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI

Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI Advanced Multimedia Architecture Prof. Cristina Silvano June 2011 Amir Hossein ASHOURI 764722 IBM energy approach policy: One Size Fits All Encompass Software/ Firmware/ Hardware Power7 predecessors features

More information

Microdrive: High Capacity Storage for the Handheld Revolution

Microdrive: High Capacity Storage for the Handheld Revolution Microdrive: High Capacity Storage for the Handheld Revolution IBM Almaden Research Center San Jose, CA IBM Mobile Storage Development Fujisawa, Japan IBM Storage Systems Division San Jose, CA Recent History

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad

Architectural Differences nc. DRAM devices are accessed with a multiplexed address scheme. Each unit of data is accessed by first selecting its row ad nc. Application Note AN1801 Rev. 0.2, 11/2003 Performance Differences between MPC8240 and the Tsi106 Host Bridge Top Changwatchai Roy Jenevein risc10@email.sps.mot.com CPD Applications This paper discusses

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Operating Systems: Internals and Design Principles You re gonna need a bigger boat. Steven

More information

High performance, power-efficient DSPs based on the TI C64x

High performance, power-efficient DSPs based on the TI C64x High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research

More information

Crusoe Processor Model TM5800

Crusoe Processor Model TM5800 Model TM5800 Crusoe TM Processor Model TM5800 Features VLIW processor and x86 Code Morphing TM software provide x86-compatible mobile platform solution Processors fabricated in latest 0.13µ process technology

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1]

2 Improved Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers [1] EE482: Advanced Computer Organization Lecture #7 Processor Architecture Stanford University Tuesday, June 6, 2000 Memory Systems and Memory Latency Lecture #7: Wednesday, April 19, 2000 Lecturer: Brian

More information

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors

Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors Reducing Power Consumption for High-Associativity Data Caches in Embedded Processors Dan Nicolaescu Alex Veidenbaum Alex Nicolau Dept. of Information and Computer Science University of California at Irvine

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Computer Architecture Lecture 24: Memory Scheduling

Computer Architecture Lecture 24: Memory Scheduling 18-447 Computer Architecture Lecture 24: Memory Scheduling Prof. Onur Mutlu Presented by Justin Meza Carnegie Mellon University Spring 2014, 3/31/2014 Last Two Lectures Main Memory Organization and DRAM

More information

Traffic Analysis on Business-to-Business Websites. Masterarbeit

Traffic Analysis on Business-to-Business Websites. Masterarbeit Traffic Analysis on Business-to-Business Websites Masterarbeit zur Erlangung des akademischen Grades Master of Science (M. Sc.) im Studiengang Wirtschaftswissenschaft der Wirtschaftswissenschaftlichen

More information

4. Hardware Platform: Real-Time Requirements

4. Hardware Platform: Real-Time Requirements 4. Hardware Platform: Real-Time Requirements Contents: 4.1 Evolution of Microprocessor Architecture 4.2 Performance-Increasing Concepts 4.3 Influences on System Architecture 4.4 A Real-Time Hardware Architecture

More information

Advanced Computer Architecture (CS620)

Advanced Computer Architecture (CS620) Advanced Computer Architecture (CS620) Background: Good understanding of computer organization (eg.cs220), basic computer architecture (eg.cs221) and knowledge of probability, statistics and modeling (eg.cs433).

More information

Memory Hierarchy Y. K. Malaiya

Memory Hierarchy Y. K. Malaiya Memory Hierarchy Y. K. Malaiya Acknowledgements Computer Architecture, Quantitative Approach - Hennessy, Patterson Vishwani D. Agrawal Review: Major Components of a Computer Processor Control Datapath

More information

Multimedia Systems 2011/2012

Multimedia Systems 2011/2012 Multimedia Systems 2011/2012 System Architecture Prof. Dr. Paul Müller University of Kaiserslautern Department of Computer Science Integrated Communication Systems ICSY http://www.icsy.de Sitemap 2 Hardware

More information

Power and Energy Management

Power and Energy Management Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,

More information

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD

More information

Power and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur

Power and Energy Management. Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur Power and Energy Management Advanced Operating Systems, Semester 2, 2011, UNSW Etienne Le Sueur etienne.lesueur@nicta.com.au Outline Introduction, Hardware mechanisms, Some interesting research, Linux,

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Multi-Core Microprocessor Chips: Motivation & Challenges

Multi-Core Microprocessor Chips: Motivation & Challenges Multi-Core Microprocessor Chips: Motivation & Challenges Dileep Bhandarkar, Ph. D. Architect at Large DEG Architecture & Planning Digital Enterprise Group Intel Corporation October 2005 Copyright 2005

More information

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b

More information

Understanding the performance of an X user environment

Understanding the performance of an X user environment Understanding the performance of an X550 11-user environment Overview NComputing s desktop virtualization technology enables significantly lower computing costs by letting multiple users share a single

More information

Lecture 23. Finish-up buses Storage

Lecture 23. Finish-up buses Storage Lecture 23 Finish-up buses Storage 1 Example Bus Problems, cont. 2) Assume the following system: A CPU and memory share a 32-bit bus running at 100MHz. The memory needs 50ns to access a 64-bit value from

More information

A Programming Environment with Runtime Energy Characterization for Energy-Aware Applications

A Programming Environment with Runtime Energy Characterization for Energy-Aware Applications A Programming Environment with Runtime Energy Characterization for Energy-Aware Applications Changjiu Xian Department of Computer Science Purdue University West Lafayette, Indiana cjx@cs.purdue.edu Yung-Hsiang

More information

Energy Management Issue in Ad Hoc Networks

Energy Management Issue in Ad Hoc Networks Wireless Ad Hoc and Sensor Networks - Energy Management Outline Energy Management Issue in ad hoc networks WS 2010/2011 Main Reasons for Energy Management in ad hoc networks Classification of Energy Management

More information

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2)

The Memory Hierarchy. Cache, Main Memory, and Virtual Memory (Part 2) The Memory Hierarchy Cache, Main Memory, and Virtual Memory (Part 2) Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Cache Line Replacement The cache

More information

Cutting Power Consumption in HDD Electronics. Duncan Furness Senior Product Manager

Cutting Power Consumption in HDD Electronics. Duncan Furness Senior Product Manager Cutting Power Consumption in HDD Electronics Duncan Furness Senior Product Manager Situation Overview The industry continues to drive to lower power solutions Driven by: Need for higher reliability Extended

More information

Chapter 6. Storage and Other I/O Topics

Chapter 6. Storage and Other I/O Topics Chapter 6 Storage and Other I/O Topics Introduction I/O devices can be characterized by Behaviour: input, output, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections

More information

Energy Management Issue in Ad Hoc Networks

Energy Management Issue in Ad Hoc Networks Wireless Ad Hoc and Sensor Networks (Energy Management) Outline Energy Management Issue in ad hoc networks WS 2009/2010 Main Reasons for Energy Management in ad hoc networks Classification of Energy Management

More information

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 02. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 02 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 2.1 The levels in a typical memory hierarchy in a server computer shown on top (a) and in

More information

IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS

IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS Kevin Streeter Adobe Systems, USA ABSTRACT While HTTP adaptive streaming (HAS) technology has been very successful, it also generally introduces

More information

I/O CANNOT BE IGNORED

I/O CANNOT BE IGNORED LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.

More information

Performance Extrapolation for Load Testing Results of Mixture of Applications

Performance Extrapolation for Load Testing Results of Mixture of Applications Performance Extrapolation for Load Testing Results of Mixture of Applications Subhasri Duttagupta, Manoj Nambiar Tata Innovation Labs, Performance Engineering Research Center Tata Consulting Services Mumbai,

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

A practical dynamic frequency scaling scheduling algorithm for general purpose embedded operating system

A practical dynamic frequency scaling scheduling algorithm for general purpose embedded operating system A practical dynamic frequency scaling scheduling algorithm for general purpose embedded operating system Chen Tianzhou, Huang Jiangwei, Zheng Zhenwei, Xiang Liangxiang College of computer science, ZheJiang

More information

TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT

TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder {nosayba, ioan, gamvrosi, hwang, bianca}@cs.toronto.edu

More information

Low-power Architecture. By: Jonathan Herbst Scott Duntley

Low-power Architecture. By: Jonathan Herbst Scott Duntley Low-power Architecture By: Jonathan Herbst Scott Duntley Why low power? Has become necessary with new-age demands: o Increasing design complexity o Demands of and for portable equipment Communication Media

More information

Lecture 15. Power Management II Devices and Algorithms CM0256

Lecture 15. Power Management II Devices and Algorithms CM0256 Lecture 15 Power Management II Devices and Algorithms CM0256 Power Management Power Management is a way for the computer or other device to save power by turning off certain features of the computer such

More information

Subject Name: OPERATING SYSTEMS. Subject Code: 10EC65. Prepared By: Kala H S and Remya R. Department: ECE. Date:

Subject Name: OPERATING SYSTEMS. Subject Code: 10EC65. Prepared By: Kala H S and Remya R. Department: ECE. Date: Subject Name: OPERATING SYSTEMS Subject Code: 10EC65 Prepared By: Kala H S and Remya R Department: ECE Date: Unit 7 SCHEDULING TOPICS TO BE COVERED Preliminaries Non-preemptive scheduling policies Preemptive

More information

vsan 6.6 Performance Improvements First Published On: Last Updated On:

vsan 6.6 Performance Improvements First Published On: Last Updated On: vsan 6.6 Performance Improvements First Published On: 07-24-2017 Last Updated On: 07-28-2017 1 Table of Contents 1. Overview 1.1.Executive Summary 1.2.Introduction 2. vsan Testing Configuration and Conditions

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

LECTURE 1. Introduction

LECTURE 1. Introduction LECTURE 1 Introduction CLASSES OF COMPUTERS When we think of a computer, most of us might first think of our laptop or maybe one of the desktop machines frequently used in the Majors Lab. Computers, however,

More information

Feng Chen and Xiaodong Zhang Dept. of Computer Science and Engineering The Ohio State University

Feng Chen and Xiaodong Zhang Dept. of Computer Science and Engineering The Ohio State University Caching for Bursts (C-Burst): Let Hard Disks Sleep Well and Work Energetically Feng Chen and Xiaodong Zhang Dept. of Computer Science and Engineering The Ohio State University Power Management in Hard

More information

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management

The mobile computing evolution. The Griffin architecture. Memory enhancements. Power management. Thermal management Next-Generation Mobile Computing: Balancing Performance and Power Efficiency HOT CHIPS 19 Jonathan Owen, AMD Agenda The mobile computing evolution The Griffin architecture Memory enhancements Power management

More information