Adaptive Online Cache Reconfiguration for Low Power Systems


Andre Costi Nacul and Tony Givargis
Department of Computer Science, University of California, Irvine
Center for Embedded Computer Systems
Technical Report #03-01, April 23, 2003

Abstract

Given a set of real-time tasks scheduled using earliest deadline first (EDF), we propose an online algorithm for dynamically reconfiguring the cache subsystem of a system-on-a-chip (SOC) platform to meet timing requirements while minimizing power consumption. Our online algorithm gradually constructs a set of pseudo-Pareto-optimal cache configurations for each task, which it then uses to determine a low power operating point meeting timing requirements. We have evaluated the quality of our algorithm by considering the overall energy saving of an SOC platform, including the time and energy overhead of the scheduler and our online algorithm, as well as the cache reconfiguration penalties. Our results show savings of approximately 20% in overall system energy usage.

Keywords: Adaptive caches, dynamic reconfiguration, low-power design, system-on-a-chip

Table of Contents

1 Introduction
2 Related Work
3 Problem Description
4 Proposed Solution
  4.1 Overview
  4.2 The Pareto Discovery Phase
  4.3 Configuration Selection Phase
5 Experiments
6 Conclusion
References

List of Figures

Figure 1 - Runtime behavior
Figure 2 - The online algorithm
Figure 3 - Scheduler skeleton
Figure 4 - Target SOC Platform
Figure 5 - Experimental Results

1 Introduction

Minimizing energy consumption of electronic devices has become a first class system design concern [1], especially in the areas of embedded and portable devices, since such devices draw their current from batteries that place a limited amount of energy at the system's disposal. On the other hand, in recent years, increased application demand for functionality [2][3], market pressures, and shortening design cycles have led to a new system-on-a-chip (SOC) platform based design methodology [4]. A platform is a computing system composed of artifacts such as general-purpose processors, a hierarchy of caches, on-chip main memory, I/O peripherals, co-processors, and possibly FPGA fabric for post-fabrication customization. These platforms are generally targeted toward a large number of applications from a specific domain (e.g., networking or multimedia). To address the need for energy efficiency, the artifacts within these SOC platforms are often designed to be dynamically configurable. Features such as processor and memory power modes [5], dynamic voltage scaling [6], and run-time cache reconfiguration (e.g., Motorola's M*CORE [7][8]) have been commercially introduced. Dynamic reconfiguration of the platform provides an opportunity for the operating system (OS) and/or application tasks to carry out strategic high-level resource management and achieve energy savings.

In this work, we propose an online algorithm for dynamically adapting the cache subsystem to the workload requirements for the purpose of saving energy. The workload is considered to be a set of tasks with real-time deadlines. Our online algorithm is invoked as part of the OS scheduler, which first performs standard earliest deadline first (EDF) task scheduling. Then, our online algorithm determines an ideal cache configuration for the task that is about to be executed. In our experiments, we account for the overhead of the OS scheduler and our online algorithm, as well as the cache reconfiguration time and energy penalties. Furthermore, we evaluate the quality of our technique by measuring the global power savings (i.e., considering processor, memories, and buses in addition to caches). When invoked, our algorithm first performs an incremental search of the cache configuration space and updates a set of pseudo-Pareto-optimal points for the task that is about to be executed. Subsequently, the pseudo-Pareto-optimal set is used to select a configuration that meets the task deadline while minimizing power consumption.

The remainder of this paper is organized as follows. In Section 2, we outline related work. In Section 3, we formulate the problem and state our assumptions. In Section 4, we introduce our online algorithm. In Section 5, we present our experiments. In Section 6, we give our concluding remarks.

2 Related Work

One similar approach that has gained popularity is dynamic voltage scaling (DVS), where one can save energy with minor performance degradation by reducing the operating supply voltage of the processor, or even of the whole system [9][10][11]. The premise of all DVS techniques is to achieve a steady, even processor speed while meeting all task deadlines. This is often accomplished by appropriately scheduling tasks and selecting voltage settings that eliminate the slack.

Our approach is completely complementary to DVS. In our approach, we do not perform task scheduling. Instead, we assume an already scheduled task set. The premise of our work is to tune the cache down to the working set of each task that is to be executed, thus saving on cache power consumption. Our aim is not to disturb the task timings. Thus, the advantage is the possibility of combining a DVS scheduler with our approach for added benefit.

A great amount of previous work has shown that statically tuning the cache subsystem to the running task can result in significant energy savings [12][13]. For example, Motorola's recent version of the M*CORE processor IC has a configurable 4-way set associative unified cache, in which each way can be disabled, or used for instructions, data, or both. Malik et al. [8] have shown that the best cache configuration depends heavily on the particular running task. Likewise, the analysis by Zhang et al. [14] shows that a dynamically configurable line size architecture can have a significant (up to 50%) energy saving potential in embedded systems. Tang et al. [15] have proposed an architectural scheme for dynamic cache line sizing. Their approach is to introduce a hardware unit along with a memory and cache protocol for fine grained tuning of the line size. In contrast, our approach is a software technique that allows the OS to take charge of cache reconfiguration, taking into account a dynamic workload and application requirements. In a similar effort, Dropsho et al. [16] have considered disabling cache ways (i.e., reducing associativity) dynamically to achieve low power. They propose cache architectures intended for dynamic reconfiguration. Further, they provide a hardware solution for adaptivity. As with the previous technique, our approach is a software technique performing adaptation at the task and OS levels.

3 Problem Description

Our problem formulation is as follows. The system is composed of N tasks, T1, T2, ..., TN. Each task Ti has a deadline Di and a period Pi. To generalize the solution, a non-periodic or sporadic task Ti is assumed to have Pi = 0. Tasks are non-preemptive. One of the tasks running on the platform is the scheduler Ts. The scheduler task Ts has no deadline and no period, and is activated every time a task finishes execution to perform the context switch. As stated previously, the scheduler selects the next task Tj to be executed based on EDF. Then, our online algorithm, running as part of the scheduler, selects an appropriate cache configuration that maintains the timing of task Tj while saving as much energy as possible.

The platform's cache subsystem is assumed to have a finite number of possible configurations C1, C2, ..., Cn. Each configuration Ci differs from any other configuration Cj in at least one of the configurable parameters: cache size, line size, or associativity. Among all valid configurations, one is the so-called reference configuration Cr. The reference configuration is assumed to be the default system configuration, or the configuration to be used if dynamic cache reconfiguration is not employed. For schedulability testing, we assume that the worst-case execution time of each task under the reference configuration is known ahead of time (e.g., obtained via offline simulation).
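To make this formulation concrete, the following C sketch shows one possible representation of the task and cache-configuration records described above. It is only an illustration: all type names, field names, and the fixed bound on the set size are our own assumptions, not part of the platform or the algorithm as specified.

#include <stddef.h>

/* One cache configuration Ci: differs from every other configuration in at
 * least one of cache size, line size, or associativity. */
typedef struct {
    unsigned size_bytes;   /* total cache size in bytes   */
    unsigned line_bytes;   /* cache line size in bytes    */
    unsigned assoc;        /* degree of set associativity */
} cache_config_t;

/* One measured operating point for a task: the execution time and average
 * power observed when the task ran under a particular configuration. */
typedef struct {
    cache_config_t config;
    double exec_time;      /* measured execution time, in seconds */
    double power;          /* measured average power, in watts    */
} pareto_point_t;

#define MAX_PARETO 16      /* assumed bound on the pseudo-Pareto set size */

/* One task Ti with its deadline Di, period Pi (0 for sporadic or
 * non-periodic tasks), and the pseudo-Pareto-optimal set found so far. */
typedef struct {
    double deadline;                 /* Di                           */
    double period;                   /* Pi; 0 marks a sporadic task  */
    pareto_point_t P[MAX_PARETO];    /* pseudo-Pareto-optimal points */
    size_t num_points;
    double start_time;               /* bookkeeping used by the scheduler */
    double start_energy;
} task_t;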

Figure 1 - Runtime behavior. (Timeline, not to scale: task Ti runs under configuration Ci, the scheduler Ts runs, the reconfiguration penalty PT(Ci, Cj) is paid, then task Tj runs under configuration Cj.)

We assume a time penalty for cache reconfiguration. This penalty accounts for writing dirty data back to memory. The time penalty is captured by a function PT(Ci, Cj) of the current configuration Ci and the new configuration Cj. This function can either be hard coded statically or learned by our online algorithm during run time. Figure 1 depicts the runtime behavior proposed here. Task Ti runs with cache configuration Ci, while Tj runs with Cj. Between these two tasks, the scheduler Ts executes with the same cache configuration as the last running task. Our algorithm runs as part of the scheduler task Ts. Note that the time penalty PT(Ci, Cj) of cache reconfiguration is also depicted in Figure 1. We note that being able to select the next cache configuration without any prior knowledge of the task is especially important, as we assume that the workload is dynamic and not necessarily known at design time. The main advantage of our algorithm is that it learns about task behavior under different cache configurations in a dynamic setting.

We assume that there is a way to measure the power consumption of each task just executed on the platform, and that this measurement includes the power penalty for cache reconfiguration. The power consumption can be obtained by reading cache access counters and applying appropriate power models. Alternatively, a platform may provide direct measurements of the power consumption of its components.

We allow for some missed deadlines while the online algorithm is learning about task behavior under different cache configurations. This is a reasonable assumption for soft real-time applications, where an occasional missed deadline is not as critical as in a hard real-time application. Despite this timing relaxation, soft real-time applications (e.g., multimedia, videoconferencing, etc.) are very common in the embedded systems domain.
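As a minimal sketch of the counter-based power measurement mentioned above, the C fragment below estimates a task's average power from cache event counters and per-event energy costs. The counter structure, the helper name, and the energy constants are hypothetical placeholders, not an actual platform API.

/* Hypothetical per-event energy costs, in joules; in practice these would
 * come from a power model of the particular cache configuration. */
typedef struct {
    double e_hit;          /* energy per cache hit                     */
    double e_miss;         /* energy per miss, incl. the memory access */
    double e_writeback;    /* energy per dirty-line writeback          */
} energy_model_t;

/* Hypothetical hardware counters sampled at task start and task end. */
typedef struct {
    unsigned long hits, misses, writebacks;
} cache_counters_t;

/* Estimate average power over an interval of dtime seconds by applying the
 * per-event energy model to the counter deltas accumulated by the task. */
static double estimate_power(const cache_counters_t *start,
                             const cache_counters_t *end,
                             const energy_model_t *m, double dtime)
{
    double energy =
        (double)(end->hits       - start->hits)       * m->e_hit +
        (double)(end->misses     - start->misses)     * m->e_miss +
        (double)(end->writebacks - start->writebacks) * m->e_writeback;
    return (dtime > 0.0) ? energy / dtime : 0.0;   /* watts = joules / second */
}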

4 Proposed Solution

4.1 Overview

Dynamic cache reconfiguration poses a trade-off between power and performance. Larger caches are expected to reduce the number of misses, allowing a task to execute faster. On the other hand, the energy needed for a read or write in a bigger cache is larger than in a smaller cache, leading to higher energy consumption. This is clearly a multi-objective problem: we want to minimize power while still meeting a time constraint. In a multi-objective problem, it is usually the case that one specific solution is good for one objective but not so good for the others. In the universe of different configurations, we can identify some configurations that are better than all the others for at least one of the performance criteria. These are the so-called Pareto-optimal solutions.

However, computing the exact set of Pareto-optimal configurations is a challenging problem, as the configuration space is likely to be large. Instead, we aim at computing an estimated Pareto-optimal set (i.e., a near-optimal set), which we refer to as the pseudo-Pareto-optimal cache configurations. We dynamically discover the pseudo-Pareto-optimal cache configurations for each task, which are then used to determine the best cache configuration for low power.

Our online algorithm operates in two phases. The first phase discovers the performance of each task under different cache configurations; this is the Pareto-discovery phase. The second phase is the cache configuration selector. Figure 2 depicts the parts of the task scheduler, which includes the two phases of our online algorithm.

Figure 2 - The online algorithm. (The scheduler runs EDF to pick the next task; Phase I (Pareto discovery) and Phase II (cache selector) share a task database and produce the next cache configuration.)

Note that the execution of the two phases of our online algorithm is interleaved, and every time the scheduler is activated, one iteration of each phase is completed. A common task configuration database is shared between discovery and selection. The database is built incrementally by the Pareto discovery algorithm and keeps information about the pseudo-Pareto-optimal configurations known so far. After a finite number of iterations, the discovery process is considered finished and the database is stable. The configuration selection algorithm consults the database in order to pick the best cache configuration for the current activation of the task.
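To illustrate the pseudo-Pareto bookkeeping, the following C sketch (reusing the hypothetical task_t and pareto_point_t types from the earlier sketch) shows a standard Pareto-dominance test and the corresponding set update. The scheduler skeleton shown later in Figure 3 performs the same kind of bookkeeping inline, with slightly different tie-breaking.

#include <stdbool.h>

/* Point a dominates point b if a is no worse in both objectives and
 * strictly better in at least one of them. */
static bool dominates(const pareto_point_t *a, const pareto_point_t *b)
{
    return a->exec_time <= b->exec_time && a->power <= b->power &&
           (a->exec_time < b->exec_time || a->power < b->power);
}

/* Insert a newly measured point into a task's pseudo-Pareto set: ignore it
 * if an existing point dominates (or exactly matches) it; otherwise add it
 * and drop every existing point that it dominates. */
static void update_pareto(task_t *t, const pareto_point_t *new_pt)
{
    size_t i, j;

    for (i = 0; i < t->num_points; i++)
        if (dominates(&t->P[i], new_pt) ||
            (t->P[i].exec_time == new_pt->exec_time && t->P[i].power == new_pt->power))
            return;                        /* dominated or duplicate: not kept */

    for (i = 0, j = 0; i < t->num_points; i++)
        if (!dominates(new_pt, &t->P[i]))
            t->P[j++] = t->P[i];           /* keep only non-dominated points */
    t->num_points = j;

    if (t->num_points < MAX_PARETO)
        t->P[t->num_points++] = *new_pt;   /* finally, record the new point */
}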

The complete scheduler, along with the skeleton of our online algorithm, is given in Figure 3.

SCHEDULER:
Input:  current task and config. (Ti, Ci)
Output: next task and config. (Tj, Cj)

// compute delta time and power
dtime = time() - Ti.start_time
dpower = (energy() - Ti.start_energy) / dtime

// introduce new Pareto points
is_pareto = true
for each pk in Ti.P
  if( pk.time < dtime && pk.power < dpower )
    is_pareto = false
if( is_pareto ) {
  Ti.P = Ti.P ∪ { Ci }
  for each pk in Ti.P
    if( pk.time > dtime && pk.power > dpower )
      Ti.P = Ti.P − { pk }
}

// perform standard scheduling
Tj = EDF()

// explore or select
if( need_to_explore(Tj) )
  Cj = discover_pareto(Tj)    // Section 4.2
else
  Cj = pick_best_config(Tj)   // Section 4.3

// prepare for next execution
Tj.start_time = time()
Tj.start_energy = energy()
return (Tj, Cj)

Figure 3 - Scheduler skeleton

4.2 The Pareto Discovery Phase

The main objective of the Pareto discovery phase is to converge to a reasonable approximation of the actual Pareto-optimal set for each task. The discovery procedure starts with the reference configuration as the only member of the pseudo-Pareto-optimal set. Gradually, each of the cache size, line size, and associativity parameters is varied individually (i.e., one change per scheduler invocation) in a greedy search process. Specifically, in a first stage, starting from the reference configuration, the cache size parameter is changed until all possible settings have been explored, or the task timing is affected beyond a certain threshold. Then, in a similar fashion, during the second and third stages, the line size and associativity parameters are varied.

A new point pi is introduced into the pseudo-Pareto-optimal set P if it has a better time or power measure than every other point pj ∈ P. The newly added point pi invalidates any existing point pj ∈ P that has inferior time and power measures compared to pi. Invalidated points are removed from the set P.
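The following C sketch shows one plausible realization of need_to_explore() and discover_pareto() under the staged search described above. The per-task exploration state, the candidate parameter lists, and the handling of the timing threshold are assumptions on our part; the report does not prescribe these details.

#include <stdbool.h>
#include <stddef.h>

/* Candidate settings explored in each stage (assumed to be powers of two). */
static const unsigned SIZES[]  = {256, 512, 1024, 2048, 4096, 8192};
static const unsigned LINES[]  = {4, 8, 16};
static const unsigned ASSOCS[] = {1, 2, 4};

enum stage { STAGE_SIZE, STAGE_LINE, STAGE_ASSOC, STAGE_DONE };

/* Hypothetical per-task exploration state kept in the task database.
 * The field 'current' starts out equal to the reference configuration Cr. */
typedef struct {
    enum stage stage;
    size_t idx;               /* next candidate index within the stage */
    cache_config_t current;   /* configuration being varied            */
} explore_state_t;

static bool need_to_explore(const explore_state_t *s)
{
    return s->stage != STAGE_DONE;
}

/* Return the next configuration to try: one parameter change per scheduler
 * invocation, advancing to the next stage once a stage is exhausted.
 * Cutting a stage short when the task timing degrades beyond a threshold
 * would be done by the caller, which observes the measured execution times. */
static cache_config_t discover_pareto(explore_state_t *s)
{
    switch (s->stage) {
    case STAGE_SIZE:
        s->current.size_bytes = SIZES[s->idx++];
        if (s->idx == sizeof SIZES / sizeof SIZES[0]) { s->stage = STAGE_LINE;  s->idx = 0; }
        break;
    case STAGE_LINE:
        s->current.line_bytes = LINES[s->idx++];
        if (s->idx == sizeof LINES / sizeof LINES[0]) { s->stage = STAGE_ASSOC; s->idx = 0; }
        break;
    case STAGE_ASSOC:
        s->current.assoc = ASSOCS[s->idx++];
        if (s->idx == sizeof ASSOCS / sizeof ASSOCS[0]) s->stage = STAGE_DONE;
        break;
    default:
        break;   /* STAGE_DONE: the caller should use pick_best_config() instead */
    }
    return s->current;
}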

4.3 Configuration Selection Phase

The configuration selection phase is based on the utilization rate of the processor. With smaller utilization rates, there is additional slack, which allows the current cache configuration to be changed to a lower-energy, higher-execution-time configuration. The utilization rate of the processor is calculated every time a task finishes execution, or whenever a task is added to or removed from the system. At any moment, given that the tasks are sorted according to EDF, the utilization rate can be calculated as follows:

util = \max_{i} \frac{\sum_{j=1}^{i} exec\_time_j}{deadline_i - current\_time}

For the utilization calculation, the best-case (but not necessarily most energy efficient) execution time of each task is used. Given this utilization rate, we calculate the target execution time for the next task Tj as shown below:

target\_exec\_time = \frac{exec\_time_j}{util}

Given the target execution time, our online algorithm selects the pseudo-Pareto configuration whose time is less than, but closest to, the target time.

Note that a utilization rate of 1.0 means that there is no slack available, so the fastest configuration must be used to meet the deadline. At the other extreme, a low utilization rate of 0.1 means that only 1/10 of the available processing is committed to task execution, and thus the system can shift to a much lower-energy (and slower) cache configuration, increasing the execution time and saving power.
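As an illustrative sketch of the selection phase (again reusing the hypothetical task_t and pareto_point_t types from the earlier sketches), the C fragment below computes the utilization of an EDF-sorted task set and picks the pseudo-Pareto point whose time is below, but closest to, the target time. The best-case execution time helper and all other names are our own assumptions.

#include <stddef.h>

/* Hypothetical helper: fastest execution time observed for a task, taken
 * from its pseudo-Pareto set (the reference configuration is always in it). */
static double best_case_exec_time(const task_t *t)
{
    double best = t->P[0].exec_time;
    for (size_t i = 1; i < t->num_points; i++)
        if (t->P[i].exec_time < best) best = t->P[i].exec_time;
    return best;
}

/* util = max over i of (sum of best-case exec times of tasks 1..i) divided
 * by (deadline_i - current_time), with tasks already sorted by EDF. */
static double utilization(const task_t *tasks[], size_t n, double now)
{
    double cum_exec = 0.0, util = 0.0;
    for (size_t i = 0; i < n; i++) {
        cum_exec += best_case_exec_time(tasks[i]);
        double window = tasks[i]->deadline - now;
        if (window > 0.0 && cum_exec / window > util)
            util = cum_exec / window;
    }
    return util;
}

/* Pick the pseudo-Pareto point whose execution time is below, but closest
 * to, the target time exec_time / util; fall back to the fastest point. */
static const pareto_point_t *pick_best_config(const task_t *t, double util)
{
    if (util < 1.0e-6) util = 1.0e-6;            /* guard for an idle system */
    double target = best_case_exec_time(t) / util;
    const pareto_point_t *best = NULL, *fastest = NULL;

    for (size_t i = 0; i < t->num_points; i++) {
        const pareto_point_t *p = &t->P[i];
        if (!fastest || p->exec_time < fastest->exec_time)
            fastest = p;
        if (p->exec_time <= target && (!best || p->exec_time > best->exec_time))
            best = p;
    }
    return best ? best : fastest;   /* nothing meets the target: run fastest */
}

Note that when util is 1.0 the target equals the task's best-case time, so only the fastest point qualifies, matching the observation above that a fully loaded system must run under the fastest configuration.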

5 Experiments

Figure 4 - Target SOC Platform (MIPS processor, unified cache, timer, main memory, power monitor)

The target SOC platform used in our experiments is shown in Figure 4. The Platune simulator was modified for our experiments [17]. Our platform included a MIPS processor with a unified cache, main memory, and the associated buses. A timer peripheral was used by the scheduler for interval timing. Likewise, a hardware power monitor, based on models developed in Platune, was incorporated into the platform for task power measurements.

For our platform, the reference configuration Cr was set to an 8K-byte cache with a line size of 4 bytes and 2-way set associativity. The possible cache sizes ranged from 256 bytes to 8K bytes, in powers of 2. The possible line sizes ranged from 4 to 16 bytes, in powers of 2. Finally, the possible degree of associativity ranged from 1 to 4. An operating system scheduler was also implemented, so that (i) the real system could be tested and evaluated and (ii) the scheduling overhead could be taken into account. Our online algorithm was incorporated into this scheduler.

For our experiments, we used typical embedded system tasks from the PowerStone benchmark suite [17]. These tasks included a Unix compression utility called compress, a CRC checksum algorithm called crc, an encryption algorithm called des, an engine controller called engine, an FIR filter called fir, a fax decoder called g3fax, a sorting algorithm called ucbqsort, an image rendering algorithm called blit, a POCSAG communication protocol called pocsag, and a JPEG decoder called jpeg.

We performed three sets of experiments, corresponding to high, medium, and low processor utilization. In other words, we selected a mix of tasks from the PowerStone set, along with appropriate deadlines, to result in processor utilizations of 90%, 50%, and 20%. Figure 5 depicts our results for the three processor utilization experiments. In the figure, the flat line gives the power consumption of the overall system when configured to execute with the reference cache configuration. The varying plot depicts the power consumption of the system, as a function of time, as it adapts using our approach. Note that during the early stages, the power profile oscillates. This is due to the algorithm discovering the pseudo-Pareto-optimal points. During this time, we note that some deadlines are missed. For example, in the low processor utilization experiment, the final power consumption (e.g., at the 25-second marker) is higher than that of configurations seen earlier (e.g., at the 5-second marker); however, the earlier configuration is not used because it caused a deadline miss. Also, note that when the processor utilization is very high, most of the tasks need to run under the fastest configuration, which in our case is the reference configuration, in order to meet their deadlines. In this case, there is as little as 1% opportunity for saving power. However, as the utilization goes down to medium and low, significant energy savings, namely 19% and 22% respectively, can be observed. We note that this is the overall SOC platform power saving and not just that of the cache subsystem.

Figure 5 - Experimental Results (system power in watts vs. time in seconds for the adaptive approach and the reference configuration, at loads of 90%, 50%, and 20%)

6 Conclusion

We have proposed an online algorithm for dynamically reconfiguring the cache subsystem of a system-on-a-chip (SOC) platform to meet timing requirements while minimizing power consumption. Our online algorithm gradually constructs a set of pseudo-Pareto-optimal cache configurations for each task, which it then uses to determine a low power operating point meeting timing requirements. We have evaluated the quality of our algorithm by considering the overall energy saving of an SOC platform, including the time and energy overhead of the scheduler and our online algorithm, as well as the cache reconfiguration penalties. Our results show savings of approximately 20% in overall system energy usage.

References

[1] T. Mudge. Power: A First Class Architectural Design Constraint. IEEE Computer, vol. 34, no. 4.
[2] International Technology Roadmap for Semiconductors (ITRS).
[3] D. Wingard, R. Fordham, J. Ready, F. Romeo, A. de Oliveira. Embedded system design: the real story. Proceedings of the Design Automation Conference.
[4] F. Vahid, T. Givargis. The case for a configure-and-execute paradigm. Proceedings of the International Workshop on Hardware/Software Codesign.
[5] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, M.J. Irwin. Hardware and Software Techniques for Controlling DRAM Power Modes. IEEE Transactions on Computers, vol. 50, no. 11.
[6] L. Geppert, T.S. Perry. Transmeta's magic show. IEEE Spectrum, vol. 37, no. 5, May.
[7] Motorola M*CORE Product Page.
[8] A. Malik, B. Moyer, D. Cermak. A Low Power Unified Cache Architecture Providing Power and Performance Flexibility. International Symposium on Low Power Electronics and Design, June.
[9] T.D. Burd, T.A. Pering, A.J. Stratakos, R.W. Brodersen. A Dynamic Voltage Scaled Microprocessor System. IEEE International Solid-State Circuits Conference, November.
[10] A. Rae, S. Parameswaran. Voltage Reduction of Application-Specific Heterogeneous Multiprocessor Systems for Power Minimization. ASP-DAC, 2000.
[11] T. Pering, T. Burd, R. Brodersen. The simulation and evaluation of dynamic voltage scaling algorithms. International Symposium on Low Power Electronics and Design, August.
[12] P. Petrov, A. Orailoglu. Towards Effective Embedded Processors in Codesigns: Customizable Partitioned Caches. International Workshop on Hardware/Software Codesign.
[13] C. Su, A.M. Despain. Cache Design Trade-offs for Power and Performance Optimization: A Case Study. International Symposium on Low Power Electronics and Design.

[14] C. Zhang, F. Vahid, W. Najjar. Energy Benefits of a Configurable Line Size Cache for Embedded Systems. Proceedings of the IEEE Computer Society Annual Symposium on VLSI.
[15] W. Tang, A. Veidenbaum, R. Gupta. Architectural Adaptation for Power and Performance. Proceedings of the International Conference on Supercomputing.
[16] S. Dropsho, et al. Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques.
[17] T. Givargis, F. Vahid. Platune: A Tuning Framework for System-on-a-chip Platforms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 11, 2002.
