Utilization-controlled Task Consolidation for Power Optimization in Multi-Core Real-Time Systems


Xing Fu and Xiaorui Wang
Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville

Abstract

Since multi-core processors have become a primary trend in processor development, new scheduling algorithms are needed to minimize power consumption while achieving the desired timeliness guarantees for multi-core (and many-core) real-time embedded systems. Although various power/energy-efficient scheduling algorithms have recently been proposed, existing studies may have degraded run-time performance in terms of power/energy efficiency and real-time guarantees when applied to real-time embedded systems with uncertain execution times. In this paper, we propose a novel online solution that integrates core-level feedback control with processor-level optimization to minimize both the dynamic and leakage power consumption of a multi-core real-time embedded system. Our solution monitors the utilization of each CPU core in the system and dynamically responds to unpredictable execution time variations by conducting per-core voltage and frequency scaling. We then perform task consolidation on a longer timescale and shut down unused cores for maximized power savings. Both empirical results on a hardware multi-core testbed with well-known benchmarks and simulation results in many-core systems show that our solution provides the desired real-time guarantees while achieving more power savings than three state-of-the-art algorithms.

I. INTRODUCTION

Multi-core processors have become a primary trend in current processor development due to well-known technological barriers such as the Power Wall and the Instruction-level Parallelism Wall. As a result, future high-performance real-time embedded systems are anticipated to be equipped with multi-core processors, or even many-core processors (i.e., processors with tens or hundreds of cores).
However, power consumption remains a major constraint on further throughput improvements of multi-core processors. For example, the peak power consumption of a multi-core processor often needs to be capped, as high power consumption may result in high die temperatures that can affect the reliability and performance of the processor [28]. The power problem is also exacerbated by the rapidly increasing level of core integration in multi-core processor design. Therefore, new scheduling algorithms must be developed to minimize power consumption while achieving the desired timeliness guarantees for multi-core (and many-core) real-time embedded systems. Although various power/energy-efficient scheduling algorithms have recently been proposed for multi-core real-time embedded systems (e.g., [25]), existing studies focus on open-loop solutions such as static speed scheduling and offline DVFS (dynamic voltage and frequency scaling) configurations. For example, many existing speed scheduling algorithms optimize energy/power based on worst-case execution times (WCETs) and thus may fail to meet the optimization goal at runtime, as the actual execution times can be much smaller than the WCETs. Recent projects propose scheduling algorithms with the assumption that execution times follow certain probability distributions (e.g., []). While those open-loop solutions can work effectively for traditional real-time embedded systems deployed in closed execution environments, they may incur degraded performance in terms of power/energy efficiency and real-time guarantees when applied to real-time embedded systems that execute in open and unpredictable environments, in which workloads (e.g., WCETs) are unknown and may vary significantly at runtime.
Therefore, in order to achieve runtime power optimization and real-time guarantees, novel online strategies must be designed to dynamically respond to execution time variations for multi-core real-time embedded systems running in unpredictable environments. Recently, feedback control techniques have been demonstrated to be a valid tool in providing timeliness guarantees for real-time embedded systems by adapting to workload variations based on dynamic feedback. In particular, feedback-based CPU utilization control [2] has been shown to be an effective way of providing real-time guarantees for soft real-time systems. The goal of utilization control is to enforce appropriate schedulable utilization bounds on all CPU cores in a real-time embedded system, despite significant uncertainties in system workloads. As a result, utilization control can guarantee all the real-time deadlines of the system without accurate knowledge of the workload, such as task execution times. However, existing utilization control algorithms are not designed to provide online power minimization for multi-core real-time systems. A recent study [27] proposes a power-aware utilization control approach that adopts DVFS to achieve utilization control and power efficiency. While this solution can effectively reduce dynamic power consumption, it cannot minimize static (leakage) power consumption because it does not minimize the number of active CPU cores in response to workload variations. As chip feature sizes continue to shrink, it becomes increasingly important to minimize leakage power, since leakage power consumption is becoming a major contributor to the total power consumption of a multi-core processor [6]. To minimize the number of active CPU cores, it is necessary to migrate tasks among the cores for consolidation. In traditional multiprocessor real-time systems, tasks are often assigned to processors in a static way, at design time, due to the large overheads of online task migrations.
A key advantage of the shared L2 caches in many multi-core real-time systems is that the overhead of migrating a task among cores is less than 4 microseconds, which is sufficiently small in many real systems [3][3]. This feature allows multi-core real-time systems to be more power-efficient, since the leakage power consumption can be minimized by dynamic task consolidation. Although task migrations in multi-core processors may cause L1 cache misses, the typical penalty of an L1 cache miss is only 10-30 CPU cycles. In contrast, in traditional multiprocessor real-time systems, task migrations can be expensive due to frequent L2 cache misses, whose penalty is approximately 100-300 CPU cycles [3]. In this paper, we propose a novel online solution that integrates feedback control with optimization strategies to minimize (both dynamic and leakage) power consumption and guarantee timeliness for multi-core real-time embedded systems. Our solution monitors the utilization of each CPU core in the system and dynamically responds to execution time variations by conducting per-core DVFS and task consolidation among the cores in a multi-core processor. In our solution, each CPU core has a utilization controller that throttles the DVFS level of the core so that its utilization stays slightly below the schedulable bound for minimized dynamic power with real-time guarantees. To minimize leakage power, we dynamically consolidate real-time tasks onto a few of the most power-efficient cores on a longer timescale, exploiting the small overhead of migrating tasks among different cores within a multi-core processor. The migration is subject to the schedulable utilization bounds of the active cores. We then shut down unused CPU cores for minimized leakage power. The main contributions of this paper are three-fold. First, we propose a control-theoretic solution for timeliness guarantees that minimizes both dynamic and leakage power consumption.
Compared with traditional open-loop solutions, our solution can achieve better power efficiency and real-time performance when task execution times vary significantly at runtime in unpredictable environments. Second, we design a two-level power optimization architecture that analytically integrates core-level utilization control with processor-level task consolidation to eliminate the complexity of one-level hybrid model-predictive control. The task consolidation problem is formulated as a bin-packing problem and several solutions are comparatively studied. Third, while the majority of existing work relies solely on simulations for evaluation, we present empirical results on a hardware multi-core testbed to demonstrate the efficacy of our integrated solution with the MiBench benchmarks [5]. Extensive simulation results also show that our solution can achieve more power savings than state-of-the-art algorithms in many-core systems.

The rest of this paper is organized as follows. Section II reviews the related work. Section III presents the integrated system architecture. Section IV introduces the core-level utilization control loop, while Section V discusses the processor-level online task consolidation strategy. Section VI introduces the implementation of the integrated solution. Section VII presents our empirical results on our testbed and simulation results. Finally, Section VIII summarizes the paper.

II. RELATED WORK

Several projects have addressed scheduling problems for multi-core real-time embedded systems. Anderson et al. proposed a cache-aware scheduling technique to avoid cache thrashing for real-time tasks on multi-core platforms [3]. Guan et al. presented test conditions for non-preemptive EDF and fixed-priority scheduling [4]. However, these studies do not migrate tasks for power optimization. Sarkar et al. studied the impact of task migrations on the WCETs of real-time tasks [24]. Their work focused on WCET analysis instead of real-time scheduling.
In addition, they assume non-shared L2 caches, which can incur a much higher task migration overhead. Chattopadhyay et al. studied WCET analysis for multi-core processors with a unified cache [7]. None of these studies addresses the power optimization problem in multi-core real-time systems. In contrast, we attempt to minimize the power consumption of multi-core real-time systems in addition to providing real-time guarantees. Power management is an important problem for real-time embedded systems. Multiple projects have studied real-time scheduling with power management for uniprocessor systems (e.g., [8]). Aydin et al. considered the energy-aware partitioning of real-time tasks for multiprocessor systems [5]. However, the power models of [5] did not consider leakage power consumption. Chen et al. extended the power models adopted in [5] and proposed a real-time scheduling method that minimizes both dynamic and leakage energy consumption [9]. However, these studies focus on multiprocessor real-time systems where task migration can be expensive due to state maintenance. Seo et al. studied energy-efficient multi-core real-time scheduling [25]. Their assumption is that all cores must run at the same frequency (chip-wide DVFS). In contrast, we utilize the availability of per-core DVFS for further power savings. As a result, the problem formulation is significantly different. All the aforementioned studies assume that task execution times are known a priori. While these studies can optimize the system power consumption when execution times do not change dynamically, the optimality is not guaranteed under execution time variations. Although much work on feedback control scheduling exists, to the best of our knowledge, our work is the first that integrates utilization control with dynamic power management (DPM). Recently, [2] proposed to dynamically partition the shared last-level caches of multi-core processors to control utilization while reducing power consumption.
[2] is complementary to this work and can be integrated to further reduce power consumption while guaranteeing real-time performance.

III. SYSTEM ARCHITECTURE

In this section, we present our system architecture. As shown in Figure 1, our system architecture features a task

[Fig. 1. System architecture: a processor-level task consolidation (TC) manager coordinates per-core utilization control loops (UCL); each loop on core C_i consists of a utilization monitor, a core-level utilization controller, and a frequency modulator performing frequency scaling on the multi-core processor, with power measurement and real-time tasks.]

consolidation manager for the entire multi-core processor and a utilization control loop for each core in the processor. First, for every core, a utilization controller exists that controls the CPU utilization of the core by scaling the core frequency. The controller is a Single-Input-Single-Output (SISO) controller, since the change of core frequency only affects the utilization of that core. This control loop works as follows: (1) the utilization monitor on each core sends the utilization of the core to the local controller; (2) the controller computes a new CPU frequency and sends it to the frequency modulator on the core; and (3) the frequency modulator then changes the core frequency using DVFS. Second, the processor-level task consolidation manager dynamically allocates tasks among the cores for task consolidation. It works as follows: (1) the task consolidation manager monitors all the tasks {T_i | 1 <= i <= m} and measures their CPU utilizations at run-time; (2) the task consolidation manager computes an optimized new task allocation for the cores and sends the task migration requests to the operating system. The OS then redistributes the following releases (i.e., jobs) of the periodic tasks to the cores to enforce the migration of the periodic tasks; and (3) the OS changes the affinity of the tasks to the cores accordingly. The overhead of migrating a task among cores is less than 4 us, which is sufficiently small in the majority of practical real-time systems. The detailed overhead measurement results can be found in [3].
In a real system, similar to the power management unit implemented in POWER7 [29], our control architecture can be implemented in service processor firmware to interact with the main processor and OS. Our solution can also be implemented in the OS as a periodic task with the highest priority. It is important to note that without effective integration, the processor-level task consolidation manager and the core-level utilization control loops may conflict with each other. The task consolidation manager may cause the core-level utilization control loops to become unstable, as it changes the system models used by the utilization controllers. As a result, the utilization control loops need to be configured with the proper controller parameters after each task migration. One of our main contributions is to solve the design challenges of integrating processor-level task migration and core-level utilization control. We discuss the details of the integration in Section V-C.

IV. CORE-LEVEL UTILIZATION CONTROL

In this section, we model, design, and analyze the core-level utilization control loop.

A. Task Model

To maximize the throughput of a multi-core system, an application assigned to run on multi-core processors typically consists of multiple tasks running in parallel; thus, we adopt a commonly used independent periodic task model (e.g., in [3]). A system is comprised of m periodic tasks {T_i | 1 <= i <= m} executing on n cores {C_i | 1 <= i <= n} in a multi-core processor. Task T_i can be migrated among different cores. A core may host one or more tasks. Each task T_i has a soft deadline that is equal to its period. r_i is the inverse of the period of task T_i. A well-known approach for meeting the deadlines on a core is to ensure that its CPU utilization remains below its schedulable utilization bound (e.g., the Liu and Layland bound for RMS scheduling) [9]. Note that our task model can be extended to support aperiodic tasks by using the corresponding schedulable utilization bound.
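As a quick illustration of the schedulable-bound check used throughout the paper, the Liu and Layland bound can be computed directly. This is a minimal Python sketch with made-up task utilizations, not code from the paper:

```python
def rms_bound(m):
    """Liu and Layland schedulable utilization bound for m tasks under RMS."""
    return m * (2 ** (1.0 / m) - 1)

def meets_deadlines(utilizations):
    """A core's tasks meet their deadlines under RMS if their aggregate
    utilization stays below the bound for that number of tasks."""
    m = len(utilizations)
    return m > 0 and sum(utilizations) <= rms_bound(m)

# Three tasks totalling 0.6 fit under the 3-task bound (about 0.780).
print(meets_deadlines([0.2, 0.2, 0.2]))  # True
```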
For example, a utilization bound has been derived for systems with aperiodic tasks in [2]. Task rate adaptation can also be used for utilization control in some real-time systems [2]. We focus on DVFS and task migration for a more general solution, since the rates of many real-time tasks cannot be adapted. Our task model has two important properties. First, while each task T_i has an estimated execution time c_i available at design time, a real-time task's actual execution time may differ from its estimation and vary at run-time for two reasons: core frequency scaling by DVFS and workload uncertainties. Modeling such uncertainties is important for systems operating in unpredictable environments. The estimated execution time can be an approximate estimation and is not necessarily the WCET. Second, the core frequency of each core C_i can be dynamically adjusted on a per-core basis within a range [F_min, F_max]. This assumption is based on the fact that more energy savings can be achieved with per-core DVFS when compared to conventional chip-wide DVFS [7], and many of today's microprocessors already support per-core DVFS (e.g., AMD Independent Dynamic Core Technology). Note that our solution does not rely on WCET estimation, which is a key advantage of our solution, because WCETs are often unavailable or misestimated in real-time embedded systems running in open execution environments. In contrast, a fundamental limitation of open-loop power optimization solutions is that they may fail to meet the optimization goal at runtime when the actual execution times are significantly different from the WCETs used in the optimization. The frequency ranges are assumed to be continuous, because a continuous value can be approximated by a series of discrete frequency levels supported by a processor, as we explain in Section VI.

B. System Modeling

We first introduce the following notation.
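To illustrate how a continuous frequency value can be realized with discrete levels (the approximation referred to above), one possible sketch time-shares the two adjacent supported levels. The normalized levels below are hypothetical, and matching the time-average frequency (rather than, say, the average cycle time) is a simplifying choice of this sketch:

```python
def split_between_levels(f_target, levels):
    """Approximate a continuous normalized frequency f_target by running
    a fraction w of each period at the adjacent lower level and (1 - w)
    at the adjacent higher level, so the time-average frequency equals
    f_target (after clamping it into the supported range)."""
    levels = sorted(levels)
    f_target = min(max(f_target, levels[0]), levels[-1])
    for lo, hi in zip(levels, levels[1:]):
        if lo <= f_target <= hi:
            w = (hi - f_target) / (hi - lo) if hi > lo else 1.0
            return lo, hi, w
    return levels[-1], levels[-1], 1.0

# Hypothetical 4-level processor, normalized to the highest level.
levels = [2.0 / 3, 2.33 / 3, 2.67 / 3, 1.0]
lo, hi, w = split_between_levels(0.8, levels)
print(lo, hi, round(w, 3))
```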
- T_s, the control period, is selected such that multiple jobs of each task may be released during a control period. The utilization control loop is invoked every T_s seconds.
- u_i(k) is the utilization of core C_i in the k-th control period, i.e., the fraction of time that C_i is not idle during the time interval [(k-1)T_s, kT_s).
- B_i is the desired utilization set point (i.e., schedulable bound) on C_i.

- S_i(k) is the set of tasks located on core C_i in the k-th control period.
- f_i(k) is the normalized CPU frequency (i.e., a value relative to the highest level F_max) of core C_i in the k-th control period.

Following a control-theoretic methodology, we establish a dynamic model that characterizes the relationship between the controlled variable u_i(k) and the manipulated variables S_i(k) and f_i(k). As observed in [23], since the frequencies of real microprocessors can be scaled only within limited ranges, the execution times of computation-intensive tasks on core C_i can be approximately estimated to be proportional to C_i's relative core frequency. Therefore, when core C_i runs at f_i(k), the estimated execution time of task T_i on C_i in the k-th control period can be modeled as c_i / f_i(k). The estimated CPU utilization of core C_i can be modeled as

b_i(k) = ( SUM_{T_j in S_i(k)} c_j r_j ) / f_i(k)    (1)

We then define the estimated utilization change of C_i, delta_b_i(k), as

delta_b_i(k) = ( SUM_{T_j in S_i(k+1)} c_j r_j ) / f_i(k+1) - ( SUM_{T_j in S_i(k)} c_j r_j ) / f_i(k)    (2)

Note that delta_b_i(k) is based on the estimated execution times c_j. Since the actual execution times may differ from their estimations due to workload variations, we model the actual utilization of C_i as the following difference equation

u_i(k+1) = u_i(k) + g_i * delta_b_i(k)    (3)

where the utilization gain g_i represents the ratio between the change to the actual utilization and its estimation delta_b_i(k). For example, g_i = 2 means that the actual change to the utilization is twice the estimated change. Also note that the exact value of g_i is unknown at design time due to the unpredictability of the tasks' execution times. The system models (2) and (3) show that the actual utilization is determined by both the frequency and the task allocation. Since S_i(k) contains a discrete number of tasks, the system model introduces a significant challenge, which usually requires hybrid model-predictive control [2].
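Model (1) can be made concrete in a few lines of Python; the task parameters below are hypothetical, chosen only to show how the estimated utilization scales with frequency:

```python
def estimated_utilization(tasks, f):
    """Model (1): estimated utilization of a core at normalized frequency f,
    where each task is (estimated execution time at F_max, rate)."""
    return sum(c * r for c, r in tasks) / f

# Two hypothetical tasks: 10 ms at 50 Hz and 5 ms at 20 Hz.
tasks = [(0.010, 50.0), (0.005, 20.0)]
print(round(estimated_utilization(tasks, 1.0), 3))   # 0.6 at full speed
print(round(estimated_utilization(tasks, 0.75), 3))  # 0.8 when slowed to 75%
```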
In a model-predictive controller, the control problem is translated to a constrained least-squares problem [2]. The hybrid model-predictive control problem is translated to a mixed integer non-linear programming (MINLP) problem, and no existing MINLP solver is a polynomial-time algorithm. To address this challenge, we adopt an integrated optimization and control approach. First, we determine the task allocation based on an optimization strategy introduced in Section V. The goal is to minimize the power consumption of the multi-core system. Second, a feedback controller is designed for each core to achieve the desired utilization. (In general, the execution times of tasks with intensive memory access and I/O operations may include frequency-independent parts that do not scale proportionally with the core frequency [4]. We plan to model frequency-independent parts in our future work.) Based on our control architecture, the core-level utilization control loop can be designed separately from the task migration optimization strategy. As a result, model (3) can be simplified by treating S_i(k) in (2) as a constant S_i. This avoids designing a controller based on (2) to handle discrete changes of the task allocation. As a result, model (3) becomes

u_i(k+1) = u_i(k) + g_i * delta_d_i(k) * SUM_{T_j in S_i} c_j r_j    (4)

where delta_d_i(k) = 1/f_i(k+1) - 1/f_i(k). The model cannot be directly used to design the controller, because the system gain g_i models the uncertainties in task execution times and is unknown at design time. Therefore, we design the controller based on an approximate system model of (4) with g_i = 1. In a real system where the task execution times differ from their estimations, the actual value of g_i may not equal 1. As a result, the closed-loop system may behave differently. However, we show that a system controlled by a controller designed with g_i = 1 can remain stable when the variation of g_i is within a certain range.
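The role of the utilization gain can be seen in a toy closed-loop simulation of model (4): with a deadbeat proportional controller designed for g_i = 1, the loop evolves as u(k+1) = u(k) + g_i (B_i - u(k)). This is an illustrative sketch of the closed-loop behavior, not the paper's controller implementation:

```python
def simulate(g, B=0.69, u0=0.95, steps=30):
    """Closed-loop utilization under model (4) when the controller picks
    delta_d(k) = (B - u(k)) / sum(c_j * r_j): the plant then evolves as
    u(k+1) = u(k) + g * (B - u(k)). Deadbeat when g = 1; stable for
    any 0 < g < 2; divergent outside that range."""
    u = u0
    for _ in range(steps):
        u = u + g * (B - u)
    return u

for g in (0.5, 1.0, 1.9):
    print(g, round(simulate(g), 3))  # each approaches the set point 0.69
```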
This range is established using a stability analysis of the closed-loop system by considering model variations.

C. Controller Design and Analysis

Because of our novel control architecture, model (3) is simplified to model (4), and we can borrow the controller design in [27]. The Z-transform of the P controller [27] is

C(z) = 1 / SUM_{T_j in S_i} c_j r_j    (5)

The transfer function of the closed-loop system controlled by controller (5) is

G(z) = z^(-1)    (6)

It is easy to prove that the controlled system is stable and has zero steady-state error when g_i = 1. When the designed P controller is used on a system with g_i != 1, the system remains stable when 0 < g_i < 2, which means that the actual utilization change cannot be more than twice the estimated utilization change. We have also proven that the system achieves zero steady-state error whenever it is stable.

V. PROCESSOR-LEVEL TASK CONSOLIDATION

In this section, we first formulate the problem of power optimization with uncertain execution times. Second, we show that the problem is NP-complete and investigate four heuristics. Finally, we analyze the coordination between the processor-level task consolidation and the core-level utilization control.

A. Problem Formulation

In addition to the notation defined in Section IV-B, we introduce more notation. T_op is the time interval between two consecutive optimization invocations. The value of T_op can be selected based on the trade-off between the system's response speed to workload variations and the overhead of task migration in the system. T_op is normally longer than T_s such that the

core-level utilization control loops can settle down within two consecutive invocations of the task consolidation algorithm. In multi-core systems, the processor power consumption P(k) is commonly modeled [22] as follows:

P(k) = P_s + SUM_{i=1}^{n} x_i(k) [P_ind^i + P_d^i(k)]    (7)

where P_s denotes the static power of all power-consuming components (except the cores). This is approximately a constant and can only be removed by powering off the entire processor. x_i represents the state of core C_i. If a core is active, x_i = 1; otherwise, x_i = 0 and C_i is turned off. The frequency-dependent active power of the core is defined as P_d^i(k) = alpha_i * f_i(k)^beta_i, where both alpha_i and beta_i are system-dependent parameters. P_ind^i is the static power of core C_i and does not depend on the supply voltage and frequency. Given a utilization set-point vector B = [B_1 ... B_n]^T and frequency constraints [F_min, F_max] for each core C_i, the optimization goal at the k-th point (time kT_op) is to dynamically choose a task allocation {S_i(k) | 1 <= i <= n} and core frequencies {f_i(k) | 1 <= i <= n} to minimize the processor power consumption P(k):

min SUM_{i=1}^{n} x_i(k) [P_ind^i + alpha_i * f_i(k)^beta_i]    (8)

where x_i(k) is defined as a function of S_i(k):

x_i(k) = 0 if S_i(k) = phi; x_i(k) = 1 if S_i(k) != phi    (9)

subject to the following constraint:

SUM_{T_j in S_i(k)} U_j <= B_i    (10)

where U_j is the average CPU utilization of task T_j, which can be obtained from the operating system. Constraint (10) ensures that the aggregate utilization of all the tasks on core C_i is smaller than the schedulable utilization bound [9]. As a result, all the tasks on each core can meet their real-time deadlines. f_i(k) in (8) can also be computed as a function of S_i(k) based on the following equation:

f_i(k) = SUM_{T_j in S_i(k)} U_j / B_i    (11)

Note that f_i(k) is enforced by constraint (10). If f_i(k) <= F_min, we set f_i(k) = F_min and the real-time guarantees will not be violated, because the core frequency is higher than required.
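The optimization (7)-(11) can be sketched in a few lines of Python to show why consolidation pays off under this power model. All parameter values here (P_s, P_ind, alpha, beta, F_min) are illustrative placeholders, not measurements from the paper:

```python
def core_power(tasks_util, B=0.69, f_min=2.0 / 3, p_ind=2.0, alpha=8.0, beta=3.0):
    """Power of one active core under (7)-(11): the frequency is the
    lowest value keeping utilization at the bound B, per equation (11)
    (clamped to f_min), and power is P_ind + alpha * f^beta.
    All numeric parameters are illustrative, not measured."""
    f = max(sum(tasks_util) / B, f_min)
    return p_ind + alpha * f ** beta

def processor_power(allocation, p_s=5.0, **kw):
    """Total power (7): static power plus each active core's power;
    empty cores are shut down and contribute nothing."""
    return p_s + sum(core_power(s, **kw) for s in allocation if s)

# Consolidating two lightly loaded cores onto one saves the idle core's
# P_ind even though the remaining core may have to run faster.
spread = [[0.2], [0.2]]
packed = [[0.2, 0.2], []]
print(processor_power(spread) > processor_power(packed))  # True
```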
Therefore, the design goal of the task consolidation algorithm is to determine a task allocation S_i(k) that can minimize the power consumption P(k). Task consolidation and idle-core shutdown can lead to more power savings than simply using DVFS to lower core frequencies, because as feature sizes decrease below 65 nm, the leakage power consumption becomes a major contributor to the total power consumption of a processor [6][26]. For example, in 23nm processors, the leakage power consumption accounts for approximately 8% of the total power consumption. For multi-core and many-core systems, the leakage power of idle cores can constitute a significant portion of the total power. The experiments on our hardware testbed (as shown in Figure 2) also demonstrate that task consolidation and idle-core shutdown result in more power savings than a DVFS-only solution. Clearly, the optimization problem formulated in (8) for dynamic task consolidation can be transformed into a bin-packing problem [9], as it needs to pack all tasks onto the cores based on their CPU utilizations, and the capacity of a core shrinks when the number of its tasks increases. In the following, we discuss solutions to the formulated bin-packing problem.

B. Heuristics

Since the bin-packing problem is known to be NP-complete, so that an optimal solution is not suitable for online use in multi-core and many-core systems with many tasks, most existing work focuses on heuristics. Several suboptimal heuristics with different complexities have been proposed. In this work, we evaluate and compare several heuristics in terms of both overhead and solution quality. Note that for real-time embedded systems, run-time overhead is often a more serious concern than solution quality. High run-time overheads may impact the schedulability of real-time tasks and cause deadline misses. We test First-Fit, Best-Fit, and rt-MBS, an advanced bin-packing heuristic based on MBS (Minimum Bin Slack) [].
To further reduce the overhead of First-Fit, we design iFF (Incremental First-Fit). In this section, we compare the overheads of all heuristics theoretically. In Section VII-B, we evaluate the four heuristics presented in Section V-B in terms of both overhead and solution quality using realistic workloads. First-Fit places each task, in succession, into the first core into which it fits. Best-Fit places each task, in succession, into the most nearly full core in which it fits. Incremental First-Fit keeps two arrays that hold the task allocation in the last control period and the task allocation in the current control period, respectively. Incremental First-Fit also employs First-Fit to assign each task to a core in every control period. However, in contrast to First-Fit, which assigns every task to a core by calling a system call, Incremental First-Fit stores the assignment of every task in an array instead of calling the system call immediately; it then compares the new task allocation with the task allocation in the last control period and calls the system call only for the tasks whose core affinity has changed. The key observation behind iFF is that the order in which tasks are packed into a core is irrelevant; what matters is that the total number of cores and the total utilization of each core are the same as under First-Fit. The system call overhead is up to 4 microseconds [3]. First-Fit needs to invoke the system call once for every task, so for a large number of tasks the overhead may be significant. Incremental First-Fit eliminates those unnecessary system calls and provides the same solution quality as First-Fit. MBS is bin-focused: in each step, MBS attempts to find a set of tasks (a packing) that makes the core as full as possible.
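The First-Fit heuristic and the system-call-saving diff step of Incremental First-Fit described above can be sketched as follows. The task names and utilizations are made up; in a real system, the final loop would issue one affinity system call per changed task:

```python
def first_fit(utils, bound=0.69):
    """First-Fit: place each task, in order, on the first core whose load
    still fits under the schedulable bound (constraint (10)); open a new
    core when the task fits nowhere. Returns a task -> core map."""
    loads, placement = [], {}
    for task, u in utils.items():
        for core, load in enumerate(loads):
            if load + u <= bound:
                loads[core] += u
                placement[task] = core
                break
        else:
            loads.append(u)
            placement[task] = len(loads) - 1
    return placement

def migrations(old, new):
    """Incremental First-Fit's key step: diff the new allocation against
    the previous one so affinity system calls are issued only for tasks
    whose core actually changed."""
    return {t: c for t, c in new.items() if old.get(t) != c}

old = first_fit({"t1": 0.3, "t2": 0.3, "t3": 0.3})   # t3 spills to core 1
new = first_fit({"t1": 0.3, "t2": 0.05, "t3": 0.3})  # t2 shrank: all fit on core 0
print(migrations(old, new))  # only t3 needs a system call
```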

Algorithm 1: rt-MBS
q: an index of tasks not yet assigned to cores
n: the number of tasks not yet assigned to cores
A: the current assignment

Minimum-Bin-Slack(q)
begin
1:  for all indices i from q to n do
2:    get the i-th task not yet assigned to cores;
3:    if the i-th task can be assigned to the core under A then
4:      add the i-th task into A;
5:      Minimum-Bin-Slack(i + 1);
6:      remove the i-th task from A;
7:      if no free space exists under the current optimal assignment then
8:        exit;
9:      end if
10:   end if
11:   if A is better than the current optimal assignment then
12:     set A as the current optimal assignment;
13:   end if
14: end for
end

Building a packing for each core is implemented recursively. The detailed algorithm for applying MBS to processor-level task consolidation is shown in Algorithm 1. The algorithm is invoked repeatedly until all tasks are assigned. The procedure is invoked with q = 1, while the current assignment and the current optimal assignment are initialized to be null sets. Note that the allocation in each step is subject to the utilization constraint (10), which is enforced by line 3 of Algorithm 1. The utilization constraint is checked in each step when a task is allocated to a core, to guarantee the real-time execution of tasks. We now analyze the complexity of the four heuristics. First-Fit and Best-Fit are among the simplest heuristics. MBS, in the worst case, has the same complexity as an exhaustive search. The complexity of Incremental First-Fit, First-Fit, Best-Fit, and MBS is O(m log m), O(m log m), O(m log m), and O(m^(u+1)), respectively, where m is the total number of tasks in the system and u is the maximum number of tasks that can be placed in one core. The overhead of Incremental First-Fit is smaller than that of First-Fit because of fewer system calls. The improved time complexity is achieved by using two more arrays with a space complexity of O(m).

C.
Integration of Optimization and Feedback Control

We now analyze the integration needed for the core-level utilization control loops to work with task consolidation. Specifically, we need to ensure that the stability of the core-level utilization control is not affected when the processor-level task consolidation reallocates the tasks to each core. Equation (5) reveals that the controller parameter is determined by the task set on the core. We now analyze the stability of the core-level utilization control when the controller parameter is incorrectly configured. First, we define gamma as

gamma = SUM_{T_j in X_i} c_j r_j / SUM_{T_j in S_i} c_j r_j    (12)

where SUM_{T_j in S_i} c_j r_j is the utilization controller parameter and X_i is the task allocation determined by the processor-level task consolidation. The transfer function of the closed-loop system controlled by controller (5) becomes:

G(z) = gamma / (z - (1 - gamma))    (13)

From (13), if 0 < gamma < 2, the system is stable, since the pole is inside the unit circle in the Z-plane. The processor-level task consolidation manager checks task migrations against the derived stability criterion. If the system would become unstable after a migration, the task migration is prohibited.

VI. SYSTEM IMPLEMENTATION

We first introduce the physical testbed used in our experiments. Next, we introduce our simulation environment.

A. Physical Testbed

Our testbed is an Intel Xeon X5365 quad-core processor with an 8MB on-die L2 cache and a 1,333 MHz front-side bus. The processor supports four DVFS levels: 3 GHz, 2.67 GHz, 2.33 GHz, and 2 GHz. According to Intel, the processor has Core 0 and Core 1 fabricated on one die and Core 2 and Core 3 on a separate die. We must change the DVFS levels of the two cores on each die in order to have a real impact on the processor power consumption. Therefore, we use this processor to emulate a dual-core processor that supports per-core DVFS. The operating system is Fedora Core 7 with a Linux kernel and a real-time-preempt kernel patch.
The default Linux kernel may migrate real-time tasks by itself, which can cause deadline misses because the kernel does not guarantee the core utilizations during migration. To disable task migration by the Linux kernel, we use the standard system call sched_setaffinity [6], which is a portable approach across different platforms. The overhead of the system call for task migration among cores is less than 4 μs, which is acceptable in many real-time embedded systems. The detailed overhead results are in [3].

We adopt the MiBench benchmarks [5], designed for embedded systems, as our tasks. Our experiments run a medium-sized workload of 10 tasks drawn from the MiBench benchmarks. Both cores initially have five periodic tasks with a total utilization of 0.3. The task parameters such as periods are configured according to a real real-time application []. The tasks on each core are scheduled by the RMS algorithm [9]. Note that our solution can also be used with other scheduling approaches, such as EDF, as long as the corresponding schedulable utilization bound is adopted. We use RMS as an example in this paper because RMS usually has a smaller runtime overhead in real systems. The deadline of each task T_i equals its period 1/r_i. The utilization set point of every core is set to its RMS schedulable utilization bound [9], i.e., B_i = m(2^(1/m) − 1), where m is the number of tasks on C_i. Since the number of tasks may change with the processor-level task consolidation, the set point can be set to 0.69, which is the limit of B_i = m(2^(1/m) − 1) as m → ∞. All tasks meet their deadlines if the desired utilization on every core is enforced.

We now introduce the implementation details of each component in our system architecture.

Utilization Monitor: The utilization monitor uses the /proc/stat file in Linux to estimate the core utilization in each sampling period. The /proc/stat file records the number of jiffies (usually 1 ms to 10 ms in Linux) each core has spent in user mode, user mode with low priority (nice), system mode, and in the idle task since the system started. The utilization of each task can be calculated from the number of jiffies consumed by the task's process in each control period.

Core-level Utilization Controller: The controller is implemented as a single-thread process with the highest priority running on each core. With a control period of 3 seconds, the controller periodically reads the core utilization, executes the control algorithm presented in Section IV-C to compute the desired core frequency, and sends the new frequency to the frequency modulator on the core.

Frequency Modulator: We use Intel's Enhanced SpeedStep Technology to enforce the new CPU frequency. To change the core frequency, one needs to install the cpufreq package and then use root privilege to write the new frequency level into the corresponding system file. A routine periodically checks this file and resets the core frequency accordingly. The average overhead (i.e., transition latency) of changing the frequency in Intel processors is approximately 10 μs.
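The RMS set point used above, and one step of the core-level frequency controller, can be sketched as follows. This is an illustrative sketch: rmsBound follows the bound stated in the text, while the simple proportional rescaling in nextFrequency is an assumption for illustration, not the paper's controller (which is defined in Section IV-C):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// RMS schedulable utilization bound for m tasks: B(m) = m * (2^(1/m) - 1).
// As m grows the bound approaches ln(2) ~= 0.693, hence the 0.69 set point.
double rmsBound(int m) {
    return m * (std::pow(2.0, 1.0 / m) - 1.0);
}

// One control step in an assumed proportional-rescaling form: scale the core
// frequency by the ratio of measured utilization to the set point, clamped to
// the DVFS range [fMin, fMax] supported by the core.
double nextFrequency(double freq, double measuredUtil, double setPoint,
                     double fMin, double fMax) {
    double f = freq * (measuredUtil / setPoint);
    return std::min(fMax, std::max(fMin, f));
}
```

With this form, a core running below the set point is slowed down (saving dynamic power) and one running above it is sped up, until the utilization converges to the RMS bound.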
Power Monitor: To measure the power consumption of the processor, an Agilent 34410A digital multimeter (DMM) is used with a Fluke i410 current probe to measure the current running through the 12 V power lines that supply the processor. The probe is clamped to the 12 V lines and produces a voltage signal proportional to the current running through the lines, with a coefficient of 1 mV/A. The resulting voltage signal is then measured with the multimeter. The accuracy of the probe is 0.5% of the reading + 0.5 A.

B. Simulation Environment

Our simulation environment is an event-driven simulator implemented in 3,546 lines of C++ code. The simulator implements a multi-core real-time system with our control architecture, utilization monitor, frequency modulator, and task consolidation manager. The synthetic real-time applications are randomly generated, with each task set containing 8-64 tasks whose initial execution times follow a uniform distribution. The period of each task is also randomly generated within a range. In our simulations, we consider processors with 16, 32, 64, 128, and 256 cores. The power model parameters used in our simulations are based on the power models of many-core processors used in [6][8]. For the parameters in the power model (8), we set α_i = 1 and β_i = 3, and a normalized frequency is used with F_max = 1. The normalized maximum frequency-dependent active power on each core is P_i^d = 1. Moreover, the static power is configured as P_s = 0.1, normalized with respect to P_i^d. According to [6], as feature sizes shrink below 0.1 micron, P_i^ind can become comparable to P_i^d in the near future. Thus, the normalized frequency-independent active power for one core is configured to be P_i^ind = 1. The power model is implemented using numerical methods.

VII. EVALUATION

In this section, we first compare the four heuristics of Section V-B, then present our empirical results conducted on the hardware multi-core testbed.
To stress test our solution, we finally describe our simulation results in many-core systems.

A. Baselines

We use three baselines for comparison in this paper. Dynamic core scaling is a state-of-the-art algorithm [25] that adjusts both the core frequencies and the number of active cores of a multi-core system, reducing dynamic and leakage power consumption through task migration. The fundamental difference between Dynamic core scaling and the proposed solution is that Dynamic core scaling makes task migration decisions based on the WCET of the task to be migrated. For systems operating in unpredictable environments, the WCETs have to be conservative to guarantee timeliness, and the actual execution time of the task to be migrated is usually much smaller than the overestimated WCET. In contrast, the proposed solution makes task migration decisions based on the average CPU utilization, which can be easily monitored at runtime in a lightweight way. In addition, Dynamic core scaling uses chip-wide DVFS while the proposed solution uses per-core DVFS, which is already supported by many microprocessors. We demonstrate in Section VII-D that the proposed solution significantly outperforms Dynamic core scaling in terms of power savings.

The second baseline, DVFS-Only, is the frequency scaling loop proposed in [27]. It relies only on DVFS to throttle the core frequency to manage the power consumption of a core, subject to the utilization constraints, without turning off any cores. DVFS-Only has a utilization controller design similar to that of the proposed solution, but does not perform task consolidation. No-Power-Management is a classical open-loop scheduling solution that partitions the tasks statically [9] and fixes the frequencies of all cores in the multi-core system at the maximum level.
While No-Power-Management can initially guarantee the timeliness, it may fail when task execution times change at runtime, and it wastes energy when the system is underutilized.

B. Comparison of Different Heuristics

A scalability requirement for a multi-core or many-core power optimization heuristic is low run-time overhead. In this experiment, we evaluate the four heuristics presented in Section V-B in terms of both overhead and performance by simulations. Different workloads including 16 to 256 tasks are randomly generated to stress test all heuristics.

Fig. 2. Typical runs of three solutions (Proposed, DVFS-Only, and No-Power-Management) on the hardware testbed: (a)-(c) core utilization and (d)-(f) processor power consumption under Proposed, DVFS-Only, and No-Power-Management, respectively. The solutions are activated at the 10th control period and handle a 20% execution time reduction at the 20th control period.

Fig. 3. Comparison among the heuristics: (a) the overhead of the four optimization heuristics (iFFD, FFD, BFD, MBS; microseconds, logarithmic scale) versus the number of cores; (b) normalized power consumption using the four optimization heuristics.

To estimate the overhead of the heuristics, we measure the execution time of each heuristic on a 2.5-GHz Intel Core 2 Duo PC with 2 GB of RAM. To obtain high-resolution measurements, we use the Windows API QueryPerformanceCounter and report the average of multiple runs. As shown in Figure 3(a), the overhead of Incremental First-Fit is the smallest, while the overhead of MBS is significantly higher than the others. Figure 3(b) shows that under realistic workloads, the processor power consumption under all heuristics is very close. Based on these simulations, we adopt Incremental First-Fit for online power reduction in the following experiments because of its low overhead.

C. Empirical Results on Hardware Testbed

In this experiment, we first disable the proposed solution from the 1st to the 10th control period. As shown in Figure 2(a), the initial utilizations of Core 1 and Core 2 are both 0.3. The core utilizations are lower than the RMS bound, resulting in an undesired underutilized system.
We then enable the proposed solution at the 10th control period. As shown in Figure 2(a), all tasks on Core 2 are migrated to Core 1. Core 2 becomes idle and is then turned off. As shown in Figure 2(d), the power consumption of the CPU is consequently reduced by approximately 9%. As shown in Figure 2(a), at the 20th control period, the execution time of all the tasks is suddenly decreased by 20%, resulting in a sharp drop in the utilization of Core 1. This decrease is implemented by reducing the number of loop iterations in the MiBench benchmarks. The proposed solution responds to the utilization drop by dynamically decreasing the frequency of the core. Since the settling time of the utilization controller is just several control periods, the utilization quickly converges to the RMS bound again. As shown in Figure 2(d), the power consumption of the CPU is further reduced. The experiment demonstrates the effectiveness of the proposed solution under uncertain task execution times.

We then examine the power efficiency of the two baselines, DVFS-Only and No-Power-Management. To make a fair comparison, we adopt the same workload and scenario used for the proposed solution. For DVFS-Only, Figure 2(b) shows that at the 10th control period the utilization of all the cores increases because DVFS-Only throttles the frequencies of both cores to their lowest levels. As a result, Figure 2(e) shows that the processor power drops at the 10th control period. However, the power consumption is still much higher than that of the proposed solution, because DVFS-Only cannot consolidate tasks to reduce the leakage power consumption of the processor. At the 20th control period, even though the execution times of all the tasks are decreased by 20%, DVFS-Only achieves only slight further power savings because both cores are already at their lowest frequencies. This experiment demonstrates the necessity of task consolidation.
For No-Power-Management, as shown in Figure 2(c), at the 20th control period the execution times of all tasks are decreased by 20%. Since No-Power-Management does not decrease the core frequencies in response to the lower workload, Figure 2(f) shows that the processor power is only slightly reduced and remains much higher than that of the proposed solution. Since none of the three solutions violates the RMS schedulable utilization bound during its entire run, no deadline miss is observed in this experiment for any of the solutions.

D. Simulation Results

In this section, we first compare the proposed solution with Dynamic core scaling on a quad-core system. We then test the effectiveness of the proposed solution in many-core systems. We have also evaluated the proposed solution in heterogeneous multi-core systems; those results are omitted due to space limitations but can be found in [3].

1) Comparison with Dynamic Core Scaling: In this section, we compare the power efficiency of the proposed solution and Dynamic core scaling in unpredictable environments. The WCETs of tasks often have to be conservative in unpredictable environments, as the actual execution time may vary across a wide range at runtime. The majority of the time, the actual execution times can be much smaller than the pessimistic WCETs. The inverse execution-time factor (ietf) denotes the ratio of the estimated execution time to the actual execution time of a periodic task. The greater the ietf, the more conservative the estimated execution time of a task. For Dynamic core scaling, the ietf is determined by the predictability of the environment.

Fig. 4. Comparison between the proposed solution and Dynamic core scaling: (a) core utilizations under the proposed solution (activated at 5 s); (b) core utilizations under Dynamic core scaling (activated at 5 s); (c) power consumption under different ietf values.

In the first experiment, we randomly generate a small-scale task set of 5 tasks in a quad-core system. The ietf of the tasks is 1.5. Figure 4(a) shows that all tasks are consolidated onto two cores (Cores 1 and 2) under the proposed solution. In contrast, Figure 4(b) shows that all tasks are consolidated onto three cores (Cores 1 to 3). The reason is that Dynamic core scaling relies on WCETs to decide whether or not to migrate a task. Because of the overestimated WCETs (i.e., ietf = 1.5), Dynamic core scaling may prevent task migrations. Dynamic core scaling cannot make task migration decisions based on actual execution times; otherwise, schedulable bounds might be violated after migrations and deadline misses would occur. In contrast, the proposed solution relies on the feedback of the average task CPU utilizations, so tasks can be consolidated onto fewer cores. Note that when the actual execution time of a task approaches the WCET, the proposed solution can still guarantee the timeliness by dynamically enforcing the schedulable utilization bounds. Figure 4(b) also shows that only Core 1 reaches the utilization bound under Dynamic core scaling, due to its assumption of chip-wide DVFS.
If the workload is not perfectly balanced, which is common in a real system, chip-wide DVFS cannot allow all cores to reach the RMS bound at the same time, resulting in an undesirably underutilized system and unnecessary extra power consumption. In this experiment, after the activation at 5 s, the normalized power consumption of the proposed solution is reduced from 8.0 to 3.6, while Dynamic core scaling achieves a smaller reduction and consumes about 5.5% more power than the proposed solution. The reason is that the proposed solution can consolidate tasks to reduce leakage power and utilize per-core DVFS to save more dynamic power.

We then compare the normalized power consumption of the proposed solution and Dynamic core scaling as the ietf varies from 1 to 1.8. Figure 4(c) shows that when the ietf is 1, which means the actual execution times are equal to the WCETs, the normalized power consumption of Dynamic core scaling is approximately the same as that of the proposed solution; the slight difference is because Dynamic core scaling does not utilize per-core DVFS. When the ietf increases from 1 to 1.2, the normalized power consumption of Dynamic core scaling increases to approximately 5.5% more than that of the proposed solution. The reason is that when the ietf is 1, both solutions consolidate tasks onto two cores, but when the task WCETs increase to 1.2 times the actual execution times (i.e., ietf = 1.2), Dynamic core scaling uses three cores while the proposed solution still uses only two. When the ietf increases from 1.2 to 1.6, the difference between the two solutions changes only marginally because Dynamic core scaling still utilizes three cores in this range. However, when the ietf further increases to 1.8, Dynamic core scaling begins to use all four cores. As a result, it consumes 39.6% more power than the proposed solution.
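The normalized power numbers in this comparison come from the per-core power model of Section VI-B: shut-down cores consume nothing, while active cores pay static, frequency-independent, and frequency-dependent terms. A minimal sketch, using the normalized parameter settings given there (the struct and function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Normalized per-core power model of the assumed form
//   P(f) = P_static + P_ind + alpha * f^beta   when the core is active,
//   P(f) = 0                                   when the core is shut down.
// Terms are normalized so that alpha * Fmax^beta = 1 at Fmax = 1.
struct CorePowerModel {
    double pStatic;  // leakage (static) power, e.g. 0.1
    double pInd;     // frequency-independent active power, e.g. 1.0
    double alpha;    // coefficient of the frequency-dependent term, e.g. 1.0
    double beta;     // exponent of the dynamic-power term, typically 3

    double power(double freq, bool active) const {
        if (!active) return 0.0;  // core turned off by task consolidation
        return pStatic + pInd + alpha * std::pow(freq, beta);
    }
};
```

This structure explains the trend in Figure 4(c): each extra active core adds its pStatic and pInd terms even at the lowest frequency, so a solution forced onto more cores by conservative WCETs pays a fixed power penalty that DVFS alone cannot recover.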
Since both the proposed solution and Dynamic core scaling can dynamically enforce the CPU utilization on each core, both solutions achieve a zero deadline miss ratio in all runs. This experiment demonstrates that the proposed solution significantly improves the power efficiency of real-time systems in unpredictable environments.

2) Many-Core Systems: In this set of experiments, we evaluate the proposed solution in four many-core systems with 16, 32, 64, and 128 cores. The workload of each many-core system is randomly generated using the tool introduced in Section VI-B. We first compare the real-time performance under execution time uncertainty between the proposed solution and No-Power-Management, which is an open-loop solution that cannot adapt to runtime execution time variations. The utilization set point for all the cores is 0.69 (i.e., the RMS bound). In the middle of each run, the execution times of all the tasks on C1 are increased by 100%. As shown in Figure 5(a), the proposed solution effectively controls the CPU utilization of C1 back to the set point and so, on average, has a zero steady-state error with a maximal deviation of less than 0.1. In contrast, the average utilization of C1 scheduled by No-Power-Management is much higher than the set point due to the significant execution time increase at runtime. Figure 5(b) plots the deadline miss ratios of the two solutions. The deadline miss ratio is calculated as the total deadline misses of all the tasks divided by the total number of released jobs. As shown in Figure 5(b), the deadline miss ratio of the proposed solution is much lower than that of No-Power-Management. The reason is that the utilization of C1 under the proposed solution is controlled below the schedulable bound the majority of the time, except during the short interval the controller takes to bring the utilization back to the set point after


More information

Modification and Evaluation of Linux I/O Schedulers

Modification and Evaluation of Linux I/O Schedulers Modification and Evaluation of Linux I/O Schedulers 1 Asad Naweed, Joe Di Natale, and Sarah J Andrabi University of North Carolina at Chapel Hill Abstract In this paper we present three different Linux

More information

Abstract. Testing Parameters. Introduction. Hardware Platform. Native System

Abstract. Testing Parameters. Introduction. Hardware Platform. Native System Abstract In this paper, we address the latency issue in RT- XEN virtual machines that are available in Xen 4.5. Despite the advantages of applying virtualization to systems, the default credit scheduler

More information

Chapter 5: CPU Scheduling. Operating System Concepts Essentials 8 th Edition

Chapter 5: CPU Scheduling. Operating System Concepts Essentials 8 th Edition Chapter 5: CPU Scheduling Silberschatz, Galvin and Gagne 2011 Chapter 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Thread Scheduling Multiple-Processor Scheduling Operating

More information

vsphere Resource Management Update 1 11 JAN 2019 VMware vsphere 6.7 VMware ESXi 6.7 vcenter Server 6.7

vsphere Resource Management Update 1 11 JAN 2019 VMware vsphere 6.7 VMware ESXi 6.7 vcenter Server 6.7 vsphere Resource Management Update 1 11 JAN 2019 VMware vsphere 6.7 VMware ESXi 6.7 vcenter Server 6.7 You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/

More information

Scheduling the Intel Core i7

Scheduling the Intel Core i7 Third Year Project Report University of Manchester SCHOOL OF COMPUTER SCIENCE Scheduling the Intel Core i7 Ibrahim Alsuheabani Degree Programme: BSc Software Engineering Supervisor: Prof. Alasdair Rawsthorne

More information

Evaluating a DVS Scheme for Real-Time Embedded Systems

Evaluating a DVS Scheme for Real-Time Embedded Systems Evaluating a DVS Scheme for Real-Time Embedded Systems Ruibin Xu, Daniel Mossé, Rami Melhem Computer Science Department, University of Pittsburgh {xruibin,mosse,melhem}@cs.pitt.edu Abstract Dynamic voltage

More information

A Fuzzy-based Multi-criteria Scheduler for Uniform Multiprocessor Real-time Systems

A Fuzzy-based Multi-criteria Scheduler for Uniform Multiprocessor Real-time Systems 10th International Conference on Information Technology A Fuzzy-based Multi-criteria Scheduler for Uniform Multiprocessor Real-time Systems Vahid Salmani Engineering, Ferdowsi University of salmani@um.ac.ir

More information

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling Uniprocessor Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Three level scheduling 2 1 Types of Scheduling 3 Long- and Medium-Term Schedulers Long-term scheduler Determines which programs

More information

A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers *

A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers * A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers * Chenyang Lu Tarek F. Abdelzaher John A. Stankovic Sang H. Son Department of Computer Science, University of Virginia Charlottesville,

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

6.1 Motivation. Fixed Priorities. 6.2 Context Switch. Real-time is about predictability, i.e. guarantees. Real-Time Systems

6.1 Motivation. Fixed Priorities. 6.2 Context Switch. Real-time is about predictability, i.e. guarantees. Real-Time Systems Real-Time Systems Summer term 2017 6.1 Motivation 6.1 Motivation Real-Time Systems 6 th Chapter Practical Considerations Jafar Akhundov, M.Sc. Professur Betriebssysteme Real-time is about predictability,

More information

Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms

Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms Energy-aware Scheduling for Frame-based Tasks on Heterogeneous Multiprocessor Platforms Dawei Li and Jie Wu Department of Computer and Information Sciences Temple University Philadelphia, USA {dawei.li,

More information

CS307: Operating Systems

CS307: Operating Systems CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn

More information

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 370: SYSTEM ARCHITECTURE & SOFTWARE [CPU SCHEDULING] Shrideep Pallickara Computer Science Colorado State University OpenMP compiler directives

More information

CS3733: Operating Systems

CS3733: Operating Systems CS3733: Operating Systems Topics: Process (CPU) Scheduling (SGG 5.1-5.3, 6.7 and web notes) Instructor: Dr. Dakai Zhu 1 Updates and Q&A Homework-02: late submission allowed until Friday!! Submit on Blackboard

More information

Real-Time Internet of Things

Real-Time Internet of Things Real-Time Internet of Things Chenyang Lu Cyber-Physical Systems Laboratory h7p://www.cse.wustl.edu/~lu/ Internet of Things Ø Convergence of q Miniaturized devices: integrate processor, sensors and radios.

More information

Global Scheduling in Multiprocessor Real-Time Systems

Global Scheduling in Multiprocessor Real-Time Systems Global Scheduling in Multiprocessor Real-Time Systems Alessandra Melani 1 Global vs Partitioned scheduling Single shared queue instead of multiple dedicated queues Global scheduling Partitioned scheduling

More information

vsphere Resource Management

vsphere Resource Management Update 1 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.

More information

Introduction to Real-time Systems. Advanced Operating Systems (M) Lecture 2

Introduction to Real-time Systems. Advanced Operating Systems (M) Lecture 2 Introduction to Real-time Systems Advanced Operating Systems (M) Lecture 2 Introduction to Real-time Systems Real-time systems deliver services while meeting some timing constraints Not necessarily fast,

More information

Chapter 5: CPU Scheduling

Chapter 5: CPU Scheduling COP 4610: Introduction to Operating Systems (Fall 2016) Chapter 5: CPU Scheduling Zhi Wang Florida State University Contents Basic concepts Scheduling criteria Scheduling algorithms Thread scheduling Multiple-processor

More information

A Comparison of Scheduling Latency in Linux, PREEMPT_RT, and LITMUS RT. Felipe Cerqueira and Björn Brandenburg

A Comparison of Scheduling Latency in Linux, PREEMPT_RT, and LITMUS RT. Felipe Cerqueira and Björn Brandenburg A Comparison of Scheduling Latency in Linux, PREEMPT_RT, and LITMUS RT Felipe Cerqueira and Björn Brandenburg July 9th, 2013 1 Linux as a Real-Time OS 2 Linux as a Real-Time OS Optimizing system responsiveness

More information

CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT

CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT CHAPTER 6 ENERGY AWARE SCHEDULING ALGORITHMS IN CLOUD ENVIRONMENT This chapter discusses software based scheduling and testing. DVFS (Dynamic Voltage and Frequency Scaling) [42] based experiments have

More information

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency

MediaTek CorePilot. Heterogeneous Multi-Processing Technology. Delivering extreme compute performance with maximum power efficiency MediaTek CorePilot Heterogeneous Multi-Processing Technology Delivering extreme compute performance with maximum power efficiency In July 2013, MediaTek delivered the industry s first mobile system on

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

POWER MANAGEMENT AND ENERGY EFFICIENCY

POWER MANAGEMENT AND ENERGY EFFICIENCY POWER MANAGEMENT AND ENERGY EFFICIENCY * Adopted Power Management for Embedded Systems, Minsoo Ryu 2017 Operating Systems Design Euiseong Seo (euiseong@skku.edu) Need for Power Management Power consumption

More information

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition,

Chapter 5: CPU Scheduling. Operating System Concepts 8 th Edition, Chapter 5: CPU Scheduling Operating System Concepts 8 th Edition, Hanbat National Univ. Computer Eng. Dept. Y.J.Kim 2009 Chapter 5: Process Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms

More information

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)

More information

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications

Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications Multi-threading technology and the challenges of meeting performance and power consumption demands for mobile applications September 2013 Navigating between ever-higher performance targets and strict limits

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2019 Lecture 8 Scheduling Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ POSIX: Portable Operating

More information

Fast on Average, Predictable in the Worst Case

Fast on Average, Predictable in the Worst Case Fast on Average, Predictable in the Worst Case Exploring Real-Time Futexes in LITMUSRT Roy Spliet, Manohar Vanga, Björn Brandenburg, Sven Dziadek IEEE Real-Time Systems Symposium 2014 December 2-5, 2014

More information

Let s look at each and begin with a view into the software

Let s look at each and begin with a view into the software Power Consumption Overview In this lesson we will Identify the different sources of power consumption in embedded systems. Look at ways to measure power consumption. Study several different methods for

More information

Cluster scheduling for real-time systems: utilization bounds and run-time overhead

Cluster scheduling for real-time systems: utilization bounds and run-time overhead Real-Time Syst (2011) 47: 253 284 DOI 10.1007/s11241-011-9121-1 Cluster scheduling for real-time systems: utilization bounds and run-time overhead Xuan Qi Dakai Zhu Hakan Aydin Published online: 24 March

More information

Improving Virtual Machine Scheduling in NUMA Multicore Systems

Improving Virtual Machine Scheduling in NUMA Multicore Systems Improving Virtual Machine Scheduling in NUMA Multicore Systems Jia Rao, Xiaobo Zhou University of Colorado, Colorado Springs Kun Wang, Cheng-Zhong Xu Wayne State University http://cs.uccs.edu/~jrao/ Multicore

More information

Multi-Mode Virtualization for Soft Real-Time Systems

Multi-Mode Virtualization for Soft Real-Time Systems Multi-Mode Virtualization for Soft Real-Time Systems Haoran Li, Meng Xu, Chong Li, Chenyang Lu, Christopher Gill, Linh Phan, Insup Lee, Oleg Sokolsky Washington University in St. Louis, University of Pennsylvania

More information

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems

Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Efficient Evaluation and Management of Temperature and Reliability for Multiprocessor Systems Ayse K. Coskun Electrical and Computer Engineering Department Boston University http://people.bu.edu/acoskun

More information

ECE 172 Digital Systems. Chapter 15 Turbo Boost Technology. Herbert G. Mayer, PSU Status 8/13/2018

ECE 172 Digital Systems. Chapter 15 Turbo Boost Technology. Herbert G. Mayer, PSU Status 8/13/2018 ECE 172 Digital Systems Chapter 15 Turbo Boost Technology Herbert G. Mayer, PSU Status 8/13/2018 1 Syllabus l Introduction l Speedup Parameters l Definitions l Turbo Boost l Turbo Boost, Actual Performance

More information

Constructing and Verifying Cyber Physical Systems

Constructing and Verifying Cyber Physical Systems Constructing and Verifying Cyber Physical Systems Mixed Criticality Scheduling and Real-Time Operating Systems Marcus Völp Overview Introduction Mathematical Foundations (Differential Equations and Laplace

More information

SHIP: A Scalable Hierarchical Power Control Architecture for Large-Scale Data Centers Supplementary File

SHIP: A Scalable Hierarchical Power Control Architecture for Large-Scale Data Centers Supplementary File 1 SHIP: A Scalable Hierarchical Power Control Architecture for Large-Scale Data Centers Supplementary File Xiaorui Wang, Ming Chen Charles Lefurgy, Tom W. Keller Department of Electrical Engineering and

More information

DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems

DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems DVFS Space Exploration in Power-Constrained Processing-in-Memory Systems Marko Scrbak and Krishna M. Kavi Computer Systems Research Laboratory Department of Computer Science & Engineering University of

More information

Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW

Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW Department of Computer Science Institute for System Architecture, Operating Systems Group REAL-TIME MICHAEL ROITZSCH OVERVIEW 2 SO FAR talked about in-kernel building blocks: threads memory IPC drivers

More information

Utilization based Spare Capacity Distribution

Utilization based Spare Capacity Distribution Utilization based Spare Capacity Distribution Attila Zabos, Robert I. Davis, Alan Burns Department of Computer Science University of York Abstract Flexible real-time applications have predefined temporal

More information

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun

EECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,

More information

Power Management for Embedded Systems

Power Management for Embedded Systems Power Management for Embedded Systems Minsoo Ryu Hanyang University Why Power Management? Battery-operated devices Smartphones, digital cameras, and laptops use batteries Power savings and battery run

More information

Partitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions

Partitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions Partitioned Fixed-Priority Scheduling of Parallel Tasks Without Preemptions *, Alessandro Biondi *, Geoffrey Nelissen, and Giorgio Buttazzo * * ReTiS Lab, Scuola Superiore Sant Anna, Pisa, Italy CISTER,

More information

An Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic

An Oracle White Paper September Oracle Utilities Meter Data Management Demonstrates Extreme Performance on Oracle Exadata/Exalogic An Oracle White Paper September 2011 Oracle Utilities Meter Data Management 2.0.1 Demonstrates Extreme Performance on Oracle Exadata/Exalogic Introduction New utilities technologies are bringing with them

More information

4/6/2011. Informally, scheduling is. Informally, scheduling is. More precisely, Periodic and Aperiodic. Periodic Task. Periodic Task (Contd.

4/6/2011. Informally, scheduling is. Informally, scheduling is. More precisely, Periodic and Aperiodic. Periodic Task. Periodic Task (Contd. So far in CS4271 Functionality analysis Modeling, Model Checking Timing Analysis Software level WCET analysis System level Scheduling methods Today! erformance Validation Systems CS 4271 Lecture 10 Abhik

More information

Power Measurement Using Performance Counters

Power Measurement Using Performance Counters Power Measurement Using Performance Counters October 2016 1 Introduction CPU s are based on complementary metal oxide semiconductor technology (CMOS). CMOS technology theoretically only dissipates power

More information

A Design Framework for Real-Time Embedded Systems with Code Size and Energy Constraints

A Design Framework for Real-Time Embedded Systems with Code Size and Energy Constraints A Design Framework for Real-Time Embedded Systems with Code Size and Energy Constraints SHEAYUN LEE Samsung Electronics Co., Ltd. INSIK SHIN University of Pennsylvania WOONSEOK KIM Samsung Electronics

More information

A Study of the Effectiveness of CPU Consolidation in a Virtualized Multi-Core Server System *

A Study of the Effectiveness of CPU Consolidation in a Virtualized Multi-Core Server System * A Study of the Effectiveness of CPU Consolidation in a Virtualized Multi-Core Server System * Inkwon Hwang and Massoud Pedram University of Southern California Los Angeles CA 989 {inkwonhw, pedram}@usc.edu

More information

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5

OPERATING SYSTEMS CS3502 Spring Processor Scheduling. Chapter 5 OPERATING SYSTEMS CS3502 Spring 2018 Processor Scheduling Chapter 5 Goals of Processor Scheduling Scheduling is the sharing of the CPU among the processes in the ready queue The critical activities are:

More information

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s)

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) 1/32 CPU Scheduling The scheduling problem: - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) When do we make decision? 2/32 CPU Scheduling Scheduling decisions may take

More information

Diffusion TM 5.0 Performance Benchmarks

Diffusion TM 5.0 Performance Benchmarks Diffusion TM 5.0 Performance Benchmarks Contents Introduction 3 Benchmark Overview 3 Methodology 4 Results 5 Conclusion 7 Appendix A Environment 8 Diffusion TM 5.0 Performance Benchmarks 2 1 Introduction

More information

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture

Computer and Information Sciences College / Computer Science Department CS 207 D. Computer Architecture Computer and Information Sciences College / Computer Science Department CS 207 D Computer Architecture The Computer Revolution Progress in computer technology Underpinned by Moore s Law Makes novel applications

More information

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP

A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP A Study on Optimally Co-scheduling Jobs of Different Lengths on CMP Kai Tian Kai Tian, Yunlian Jiang and Xipeng Shen Computer Science Department, College of William and Mary, Virginia, USA 5/18/2009 Cache

More information

Scheduling. Scheduling 1/51

Scheduling. Scheduling 1/51 Scheduling 1/51 Scheduler Scheduling Scheduler allocates cpu(s) to threads and processes. This action is known as scheduling. The scheduler is a part of the process manager code that handles scheduling.

More information

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s)

CPU Scheduling. The scheduling problem: When do we make decision? - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) CPU Scheduling The scheduling problem: - Have K jobs ready to run - Have N 1 CPUs - Which jobs to assign to which CPU(s) When do we make decision? 1 / 31 CPU Scheduling new admitted interrupt exit terminated

More information