A hardware/software partitioning and scheduling approach for embedded systems with low-power and high performance requirements


Javier Resano, Daniel Mozos, Elena Pérez, Hortensia Mecha, Julio Septién
Dept. de Arquitectura de Computadores, Facultad de Informática, UCM, Madrid
{javier1, mozos, eperez, horten,

Abstract. Hardware/software (hw/sw) partitioning strongly affects system cost, performance, and power consumption. Most previous hw/sw partitioning approaches focus on optimising either the hw area or the performance, and thus ignore the influence of the partitioning process on the energy consumption. However, this is the stage at which the designer still has maximum flexibility, so it is clearly the best moment to analyse the energy consumption. We have developed a new hw/sw partitioning and scheduling tool that reduces the energy consumption of an embedded system while meeting high performance constraints. We have applied it to two current multimedia applications, saving up to 30% of the system energy without reducing the performance.

1. Introduction

Low power has become one of the major design concerns. First of all, the designer must guarantee that the design does not exceed the power constraints of the target platform, since exceeding them causes heating problems. Moreover, due to the proliferation of portable, battery-dependent devices, low energy consumption has become one of the key features for the success of a design.

The current trend for portable embedded systems is to create heterogeneous systems with one or more low-power processors, some additional hardware (hw) logic (ASICs and/or FPGAs), and some memory hierarchy. Current technologies allow the whole system to be built on a single chip (SoC). One of the most important steps in implementing an application on such a system is partitioning the application functionality among the different processing elements.
This process drastically influences both the energy consumption and the performance of the system. Figure 1 presents a simple example where the partitioning process can lead to energy savings. If the designer selects the fastest solution (sch1), the execution time is 139 time-units and the energy is 21 energy-units. However, if the deadline for the application is 150, the designer can look for a slower solution that still meets this constraint while consuming less energy. In this case sch2 would be selected, since its execution time is less than the deadline and its energy consumption is 16. Thus, the energy consumption decreases by roughly 24% (from 21 to 16 energy-units).

Fig. 1. Partitioning example. Two nodes must be partitioned between two Processing Elements (PE). T means time. E means energy. Sch1 and Sch2 are two selected solutions.

Since our partitioning tool is still under construction, it currently supports only a software (sw) processor, an FPGA, a system bus and one or several memory blocks. However, partitioning an application onto such a system is still an NP-complete problem. Moreover, several existing prototype and commercial platforms follow this scheme, providing a sw processor and some reconfigurable hw resources, e.g. Garp [1], Morphosys [2] and the Virtex II-Pro XC2VP4 and VP7 [3].

The system bus and the memory blocks require a careful study, since both elements can significantly affect the system performance and energy consumption, especially because hw and sw performance are improving much faster than that of communication channels and memories. To estimate accurately the impact of the memories and buses on the system performance and energy consumption, their physical features must be taken into account. Ideally the vendor should provide either estimators or at least time and power models, but unfortunately this is not always the case, and then separate time and power models are needed. Some examples of useful existing models are [4] for USB and PCI buses (timing considerations only), and [5, 6] for memories. However, even after accurately estimating all the tasks, communications and memory accesses, computing the overall execution time is not trivial, since it involves a scheduling that must take into account data and control dependencies as well as the accesses to the shared resources.
Thus, we have developed a tool that schedules the tasks and the accesses to the system bus and the shared memories during the partitioning process. This scheduling is the only way to evaluate a solution accurately, since otherwise it is impossible to determine the impact of the communications or the delays introduced by conflicts in the accesses to shared resources (this problem is explained in detail in [7]). In addition, this scheduling removes the need for arbitration logic in the bus controllers. Since the scheduler is integrated in a partitioning tool that must evaluate a large number of different partitions, one of our major concerns was to achieve near-optimal scheduling without significantly increasing the execution time of the partitioning tool.

The rest of the paper is structured as follows: section 2 presents an overview of the related work; section 3 explains in detail the format of the initial specification for our partitioning tool; section 4 describes the cost function that steers the design space exploration; sections 5, 6, and 7 explain how the energy, execution time and hardware area are estimated for a given partitioning. Section 8 presents the experimental results and finally section 9 draws some conclusions and outlines future work.

2. Related Work

Hardware/software partitioning is a very well-known problem. Several partitioning tools have been proposed in the literature (e.g. [8, 9]). Most of these previous approaches tackle the partitioning problem at a high abstraction level, adding the platform low-level details and scheduling the tasks on the processing elements (PEs) in a subsequent step called co-synthesis. Moreover, even during co-synthesis the communications between different PEs are often neglected, so these communications are handled in a following step called communication synthesis. After these three steps the resulting solution is co-simulated and, likely, the results will not be as expected, so the process has to start again with another solution. The main problem of this approach is that some of the features neglected during partitioning are critical for the system performance. Thus, it is almost impossible to find near-optimal solutions when communications are neglected during the partitioning process.

Another shortcoming of most existing approaches is that they consider only hardware area or execution time minimization. However, as mentioned in the introduction, minimizing the energy consumption is now often one of the designer's most important concerns. Recently several scheduling and/or partitioning approaches for multiprocessors have been presented. They attempt to minimize the system consumption either by applying Dynamic Voltage Scheduling (DVS) or by applying a different supply voltage to each processor; some of the most relevant are [10, 11, 12]. DVS techniques schedule the voltage supplied to each processor during its execution. This is a powerful way to achieve power savings, since in CMOS technologies the dynamic power consumption decreases quadratically with the supply voltage.
However, most commercial processors currently offer no support for DVS and, to the best of our knowledge, there is no support at all for DVS in FPGA platforms. Hence, nowadays, this is not a feasible approach for hw/sw co-design.

The first hw/sw partitioning tool for low power that we have found is [13]. It starts from a full sw implementation on a microprocessor (µP) and reduces the energy consumption by migrating part of the functionality to hw; the energy savings are achieved by turning off the µP (in addition, clock gating is applied in the hw partition). This approach does not perform a full exploration of the partitioning design space. Moreover, it expects some data from the designer, such as the number of ALUs, multipliers, shifters, etc., based on previous design experience, so the results of the partitioning depend heavily on the designer's skill.

PAP [14] is a recent partitioning tool that attempts to minimize the hardware area while meeting the timing and power constraints. Thus it does not minimize the overall energy consumption, but it ensures that infeasible solutions (those that consume more power than the platform allows) are not selected.

Finally, [15] presents a scheduling technique for dynamically reconfigurable FPGAs with support for partial reconfiguration. The scheduling process attempts to minimize the energy consumption by optimising the number of partial reconfigurations. However, this scheduling is carried out after the partitioning process, hence most of the flexibility is lost, since the partition has already been fixed. According to that paper, dynamic reconfiguration of FPGAs is currently extremely power-inefficient, since in its experiments up to 50% of the FPGA energy consumption was due to these reconfigurations.

Although substantial work has been devoted to partitioning and scheduling for low power, we believe that our approach is the first one that carries out a deep design space exploration of the partitioning and scheduling process for hardware/software low-power embedded systems, attempting to meet the real-time constraints while minimising the overall system energy consumption, and including the system bus and memories in the performance and energy consumption estimations.

3. Initial Specification

The initial specification is described as a Directed Acyclic Graph (DAG), where each node represents a computational task or an access to the shared memory, and the edges correspond to dependencies among the nodes. Three kinds of dependencies are considered, namely communication, internal, and temporal dependencies. A communication dependency edge (CDE) either connects two nodes allocated to different PEs or corresponds to a memory access; therefore, it represents a data transfer that must be carried out using the system bus. An internal dependency edge (IDE) connects two nodes allocated to the same PE; thus, it also represents a data transfer, but in this case there is no access to the system bus. A temporal dependency edge (TDE) represents a dependency between two nodes in the same PE that has been imposed by the scheduler. Each node of the graph must be characterized by its execution time, power and area estimations for every possible platform. Each CDE is tagged with the amount of data to be transferred and with its execution time and energy consumption estimations.
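The specification format can be illustrated with a small data-structure sketch. This is not the tool's actual input format: the class names, the field names and the convention of mapping shared-memory access nodes to a pseudo-PE called "mem" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeKind(Enum):
    CDE = "communication"  # data transfer over the system bus
    IDE = "internal"       # data transfer inside one PE, no bus access
    TDE = "temporal"       # ordering imposed by the scheduler


@dataclass
class Estimate:
    time: float   # execution time, in time-units
    power: float  # average power
    area: float   # hw area (0 for a sw implementation)


@dataclass
class Node:
    name: str
    estimates: dict  # hypothetical layout: PE name -> Estimate


@dataclass
class Edge:
    src: str
    dst: str
    data: int = 0            # amount of data to transfer
    bus_time: float = 0.0    # only meaningful when the edge is a CDE
    bus_energy: float = 0.0  # only meaningful when the edge is a CDE


def classify_edge(edge, mapping):
    """Return CDE if the endpoints are mapped to different PEs, else IDE.

    Assumption: shared-memory access nodes are mapped to a pseudo-PE "mem",
    so every edge touching them is classified as a CDE.
    """
    if mapping[edge.src] != mapping[edge.dst]:
        return EdgeKind.CDE
    return EdgeKind.IDE


# Usage: n1 runs in sw, n2 and n3 in hw, m is a shared-memory access node.
mapping = {"n1": "sw", "n2": "hw", "n3": "hw", "m": "mem"}
print(classify_edge(Edge("n1", "n2", data=64), mapping))  # EdgeKind.CDE
```

TDEs are not produced by this classification because they do not exist in the initial specification; the scheduler adds them later.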
The CDE estimations must include both the access to the system bus and, when needed, the access to the shared memory.

4. Cost Function

The cost function of a codesign system typically includes different elements such as the hw area, the execution time, the energy consumption, or the amount of communications. One of the most difficult issues when designing a partitioning system is how to combine all these completely different magnitudes into a cost function that can lead the design space exploration in a near-optimal fashion. In the literature, several codesign approaches build cost functions like the following:

$$F = c_a \sum_{i=0}^{n} Area_i + c_t \sum_{i=0}^{n} Time_i + c_e \sum_{i=0}^{n} Energy_i \qquad (1)$$
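As a minimal sketch of a weighted-sum cost function of the form of Eq. (1), the snippet below evaluates two hypothetical partitions under two coefficient choices. All numbers and names are made up for illustration; the point is only that the preferred partition depends on how the coefficients are fixed.

```python
def weighted_cost(nodes, c_a, c_t, c_e):
    """Eq. (1): F = c_a*sum(Area_i) + c_t*sum(Time_i) + c_e*sum(Energy_i)."""
    area = sum(n["area"] for n in nodes)
    time = sum(n["time"] for n in nodes)
    energy = sum(n["energy"] for n in nodes)
    return c_a * area + c_t * time + c_e * energy


# Two hypothetical partitions of the same two-node DAG (made-up estimates).
# Partition A places the first node in hw: more area, less time.
partition_a = [{"area": 40, "time": 5, "energy": 9},
               {"area": 0, "time": 20, "energy": 4}]
# Partition B keeps both nodes in sw: no area, more time.
partition_b = [{"area": 0, "time": 30, "energy": 6},
               {"area": 0, "time": 20, "energy": 4}]

for coeffs in [(1.0, 1.0, 1.0), (0.1, 1.0, 1.0)]:
    f_a = weighted_cost(partition_a, *coeffs)
    f_b = weighted_cost(partition_b, *coeffs)
    print(coeffs, "prefers", "A" if f_a < f_b else "B")
```

With equal coefficients partition B has the lower cost; reducing the area coefficient to 0.1 makes partition A the preferred one.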

Thus, for a given partition, each node of the DAG is characterized with a number for every magnitude considered (three in this example). The cost function is then easily computed by adding these numbers and multiplying the sums by some coefficients. Often the user must fix these coefficients, and therefore has to establish an equivalence between a second, a Joule, and a mm². There is no evident criterion for fixing these coefficients, so these heterogeneous cost functions often lead to inefficient design-space explorations.

To avoid this problem, our partitioning tool is led by a straightforward cost function that can be identified with either the energy consumption, the hw area or the execution time. Thus, the tool supports three different design-space explorations. The first one attempts to find the solution that consumes the least energy while meeting three restrictions, namely maximum execution time, maximum hardware area and maximum power consumption. The first restriction guarantees that the application meets its real-time deadline; the second guarantees that there are enough hw resources to implement the hw partition; and the third prevents heating problems. If the system is not battery-dependent, the cost function can be identified with either the execution time or the area. When the execution time is selected as cost function, the tool attempts to find the fastest solution that meets the given area and power restrictions; when area is selected, the tool tries to find the solution with the least hw area that meets the execution time and power restrictions. It is up to the designer to decide the goal of the design-space exploration. Table 1 shows all the possibilities.

Table 1. Cost functions and restrictions that can steer the design space exploration

Available Cost Functions    Energy    Time      Area
Available Restrictions      Time      Energy    Area    Power

5. Energy Consumption Estimations

First of all, each node and each edge of the DAG must be characterized with its energy consumption for every possible processing element. These estimations should be carried out with the tools provided by the vendors if possible; otherwise generic power models must be applied. In addition to the energy consumed by the execution of the nodes (including those nodes that represent accesses to the shared memory) and by the communications, we assume that the PEs also consume energy when they are idle. If the PE is a processor, the power consumption in the idle state is commonly provided in the data sheet, and the energy can be computed by multiplying this power by the idle time. The same approach is used for the memory blocks. If the PE is implemented in the FPGA and clock gating is applied to it, the power it consumes when idle is just the device quiescent power. Otherwise, if clock gating is not implemented, the logic dissipates additional power on top of the quiescent power, since the clock signal continues switching. This case is estimated from the power consumption of the circuit when the toggle rate of the inputs is set to 0; that is, we assume that all the inputs are fixed while the circuit is idle. If this assumption does not hold, a proper toggle rate should be estimated by profiling the system.

Besides the energy considerations, the partitioning tool must check whether a given partition meets the power dissipation constraints of the platform. To this end, the average power consumption of each node and each communication is included in the DAG.

6. Execution Time Estimations

The execution time estimator receives as input a given partitioning in which the execution time of each node and each access to the system bus has been previously estimated (we assume cycle-accurate estimations). Nodes representing accesses to the shared memory always have an execution time of 0 time-units, since the latency of accessing the shared memory is considered part of the communication delay. With this input the estimator schedules the execution of every node as well as all the accesses to the system bus. This scheduling is an NP-complete problem; however, the estimation must be done as fast as possible, since it has to be computed for every explored partition. Thus, we have developed a fast heuristic, based on list scheduling techniques, which provides a near-optimal scheduling with low computational complexity (O(N²)). Fig. 2 depicts the scheduling pseudo-code.

A) Assign a weight to each node.
B) Choose the execution order for the SW nodes.
C) Recalculate the weights taking into account the new dependencies.
D) Schedule those nodes that are not waiting for a communication.
E) While there is a communication waiting for execution do:
   E1) Choose one communication and schedule it.
   E2) Schedule those nodes that are not waiting for a communication.

Fig. 2. Scheduling heuristic pseudo-code

Step A: The weights are used to steer the scheduling process, trying to minimize the global execution time. The weight of a node is the maximum time-distance from that node to the end of the execution in the initial graph. This distance is computed by carrying out an ALAP scheduling that takes into account all the dependencies. Thus, the nodes on the critical path of the DAG have the highest weights.

Steps B and C: The initial DAG allows its nodes to execute in parallel, but the nodes assigned to sw must be executed sequentially. The sw execution order is decided by sorting the nodes by their weights. To impose this order, new TDE dependencies are added to the initial DAG. It is easy to prove that this sw execution order does not allow the new dependencies to create cycles in the graph: with positive execution times a node always has a greater weight than any of its successors, so ordering by decreasing weight respects the existing dependencies. Since these new dependencies can significantly affect the system performance, a new weight is assigned to each node. These weights are computed in the same way as in step A, but considering the new dependencies.
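Steps A to C can be sketched as follows. This is a simplified illustration and not the tool's implementation: it assumes positive execution times, ignores communication delays when computing the weights, and uses hypothetical function and node names.

```python
from functools import lru_cache


def node_weights(exec_time, succs):
    """Weight of a node: longest time-distance from its start to the DAG end."""
    @lru_cache(maxsize=None)
    def weight(n):
        longest_tail = max((weight(s) for s in succs.get(n, ())), default=0)
        return exec_time[n] + longest_tail
    return {n: weight(n) for n in exec_time}


def order_sw_nodes(exec_time, succs, sw_nodes):
    """Serialize the sw nodes by decreasing weight and re-weight the graph."""
    weights = node_weights(exec_time, succs)                      # step A
    order = sorted(sw_nodes, key=lambda n: weights[n], reverse=True)  # step B
    new_succs = {n: set(s) for n, s in succs.items()}
    for first, second in zip(order, order[1:]):
        new_succs.setdefault(first, set()).add(second)            # add a TDE
    return order, node_weights(exec_time, new_succs)              # step C


# Usage: n1 and n2 both feed n3, which feeds n4; n1 and n2 run in sw.
exec_time = {"n1": 3, "n2": 2, "n3": 4, "n4": 1}
succs = {"n1": ["n3"], "n2": ["n3"], "n3": ["n4"]}
order, new_weights = order_sw_nodes(exec_time, succs, ["n2", "n1"])
print(order)              # ['n1', 'n2']
print(new_weights["n1"])  # 10: n1 must now also wait for n2, n3 and n4
```

In the example, n1 initially has weight 8 (3 + 4 + 1); after the TDE from n1 to n2 is added, its weight grows to 10 because the serialized path n1, n2, n3, n4 becomes the critical path.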

Steps D and E: An enhanced list-scheduling heuristic that attempts to minimize the global execution time has been developed for the scheduling process. This heuristic decides when each node and each communication is executed, assigning to each of them a start time t_start and an end time t_end. The motivation of the heuristic is to detect the conflicts in the accesses to the system bus and the delays they create. The scheduling starts by assigning t_start = 0 and t_end = t_ex to the first node, where t_ex is its execution time in the partition to which it has been assigned. Then the algorithm continues by scheduling the successors of the first node. A greedy policy is followed to schedule nodes while there is no need for hw/sw communications. When a scheduled node requests a hw/sw communication with another node, this request is stored in a list. Once all the nodes that do not need a hw/sw communication have been scheduled, one of the requested communications is selected and scheduled. There are two selection criteria (E1). If at a given time t the system bus is not carrying out any communication and there is just one pending request, the communication channel is assigned to this request, and the bus is tagged as busy until this communication ends. Otherwise, if there is more than one request, the one with the greatest weight is selected. The weight of a communication is computed as the weight of the destination node plus the time needed to execute the communication. Once the selected communication has been scheduled, the graph is examined (E2) and all the nodes that can start their execution without waiting for another hw/sw communication are also scheduled. The loop continues until all the communications are scheduled.

7. Area Estimation

We apply the following equation to estimate the area needed to implement the nodes assigned to hw in the FPGA:

$$Area = \sum_{i=0}^{N-1} A_i + A_{driver} + A_{control} + A_{storage} \qquad (2)$$

A_i is the area of node i. A_driver is the area needed to implement the communication driver. A_i and A_driver are estimated from a core library. When a new core is added to the library, its area is estimated using a synthesis tool. A_control is the area needed for the control logic that schedules the communications. In this approach the scheduling control is handled by a state machine, so the required area is estimated as a function of the number of communications. A_storage is the area needed to store the data to be transferred until a communication is executed. This storage space is computed during the communication scheduling: during this process a record keeps the maximum storage space required.
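Equation (2) can be sketched as below. The paper only states that A_control is a function of the number of communications and that A_storage follows from the maximum storage space recorded during scheduling; the linear models, the per-unit coefficients and the event format used here are assumptions for illustration.

```python
def max_storage_required(events):
    """Peak amount of buffered data seen during communication scheduling.

    events: (time, delta) pairs, where delta > 0 means data starts waiting
    for the bus and delta < 0 means its transfer has completed.
    """
    peak = current = 0
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak


def estimate_hw_area(node_areas, a_driver, n_comms, max_storage,
                     ctrl_area_per_comm=1.0, area_per_stored_unit=1.0):
    """Eq. (2): Area = sum(A_i) + A_driver + A_control + A_storage."""
    a_control = ctrl_area_per_comm * n_comms        # assumed linear model
    a_storage = area_per_stored_unit * max_storage  # assumed linear model
    return sum(node_areas) + a_driver + a_control + a_storage


# Usage with made-up numbers: two hw nodes and three scheduled communications.
events = [(0, 4), (2, 8), (5, -4), (7, -8)]
peak = max_storage_required(events)
print(peak)  # 12
print(estimate_hw_area([100, 250], a_driver=30, n_comms=3, max_storage=peak,
                       ctrl_area_per_comm=2.0, area_per_stored_unit=0.5))
```

Sorting the events means that, at equal timestamps, completed transfers are released before new data is buffered, which keeps the recorded peak from being overestimated.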

8. Results and Analysis

All the estimators have been integrated into a partitioning tool based on genetic algorithms (GA) [16]. This tool creates a random initial population of valid solutions. A solution is valid if it meets the given area, time and power constraints. Invalid solutions are rejected to save computation time, as well as to prevent the algorithm from converging to an invalid region of the design space. During the design space exploration, solutions evolve by reproducing themselves, generating new offspring. The crossover and mutation operators carry out the reproduction process. The population size is kept constant by deleting the surplus solutions: 80% of the survivors are the best solutions, and the remaining 20% are selected at random in order to prevent premature convergence. The designer can set the population size and the crossover and mutation probabilities. In addition, the designer can select the cost function (time, energy, or area) and fix the area, time and power restrictions.

The partitioning tool allows the designer to select between two scheduling modes. The first implements our heuristic, while the second carries out a full search of the design space by applying a branch&bound (b&b) algorithm; hence this mode guarantees that the best schedule is always found. As a first experiment, in order to validate our heuristic, we have run the partitioning tool in these two modes for a set of 100 randomly generated DAGs. These DAGs were created using the TGFF tool [17], and their sizes range between 10 and 20 nodes (for greater sizes it is not feasible to apply the b&b algorithm). The results show that the b&b algorithm finds slightly better schedules (on average 10% less execution time), but at the price of increasing by 800 times the computation time needed to carry out the partitioning process (which is reasonable, since it performs a full search of the design space).
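The survivor selection used by the GA (80% best solutions, 20% chosen at random) can be sketched as follows. The function name, its parameters and the representation of a solution are hypothetical, and the population is assumed to contain only valid solutions.

```python
import random


def select_survivors(population, cost, size, elite_fraction=0.8, rng=random):
    """Keep `size` solutions: the best ones by cost plus a random remainder."""
    ranked = sorted(population, key=cost)
    n_elite = int(round(elite_fraction * size))
    elite = ranked[:n_elite]
    rest = ranked[n_elite:]
    # Draw the remaining survivors at random to delay premature convergence.
    lucky = rng.sample(rest, min(size - n_elite, len(rest)))
    return elite + lucky


# Usage: 20 candidate solutions, each represented here only by its cost.
population = list(range(20))
survivors = select_survivors(population, cost=lambda s: s, size=10,
                             rng=random.Random(0))
print(len(survivors))  # 10
print(survivors[:8])   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing explorations that use different cost functions.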
The comparison with the b&b mode confirms that our scheduling heuristic finds near-optimum schedules with an almost negligible overhead. In this experiment the average time needed to schedule one of the DAGs with our heuristic was less than 2.5 s on a Pentium II running at 350 MHz.

In our second experiment we compare the results obtained when using the energy and the execution time as cost function. To this end, we have analyzed two current multimedia applications, namely a JPEG decoder and a pattern recognition application that computes the Hough Transform of a matrix of pixels in order to find simple geometric patterns. The Hough Transform is commonly applied in robotics and astronomical data analysis. It is very simple to reduce the energy consumption when it is also possible to reduce the performance. Therefore, in this experiment we check whether it is possible to reduce the energy consumption while keeping almost the highest performance. Hence, we first ran the partitioning tool using the execution time as cost function to find the fastest solution. Then we reran it using the energy as cost function, this time imposing that the solutions must be at most 10% slower than the fastest solution found in the previous step. Therefore, the tool finds the solution that consumes the least energy while keeping almost the highest performance.

For this experiment we have estimated the energy, execution time, area and power consumption of the application using the XILINX Foundation 5.i tool for the FPGA and the system bus, an ARM processor simulator for the sw processor and a 128 MB MICRON SRAM memory datasheet for the shared memory. Each application has been partitioned onto a platform composed of a XILINX Virtex FPGA, an ARM processor running at 233 MHz, a 128 MB memory block and a 16-bit system bus clocked at 33 MHz. The measurements were repeated 5 times, for 5 different FPGA sizes. The results are shown in Table 2. It is remarkable that we can decrease the energy consumption by up to 30% (17% on average), whereas the execution time remains almost the same (it increases by less than 3% on average).

Table 2. Results for the Pattern Recognition application (a) and the JPEG decoder (b). T1 and E1 are the execution time and the energy consumption of the fastest solution, whereas T2 and E2 correspond to the solution found using the energy as cost function.

a) Pat. Rec.    T1    T2    Time %    E1    E2    Energy %
   [five rows, one per FPGA size; values not recoverable]
   Average                  +2%                   -15%

b) JPEG         T1    T2    Time %    E1    E2    Energy %
   [five rows, one per FPGA size; values not recoverable]
   Average                  +3%                   -19%

9. Conclusions and Future Work

We have presented the first (to the best of our knowledge) hw/sw partitioning tool that can steer the design space exploration of the partitioning process to minimize the energy, the execution time or the area. In addition, this is one of the few tools that carries out a full scheduling during the partitioning process, including the accesses to the system bus and shared memories. This scheduling is the only way to estimate accurately the quality of a given partition. We believe that this tool can be especially useful to decrease the energy consumption of a given application while meeting hard real-time constraints. Thus, we have applied our tool to two current multimedia applications, saving up to 30% of the energy consumption, whereas the performance remains almost constant.
Moreover, it must be remarked that a slight decrease in performance is unimportant as long as the timing constraints are met.

Although our tool fulfills the requirements to partition an application onto several existing platforms, several extensions are needed to apply it to platforms with multiple processors and more complex interconnection networks.

Acknowledgements

This work has been partially supported by Spanish Government research grant TIC

References

1. J. R. Hauser and J. Wawrzynek, Garp: A MIPS processor with a reconfigurable coprocessor, IEEE Workshop on FPGAs for Custom Computing Machines.
2. H. Singh et al., MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications, IEEE Trans. on Computers, Vol. 49, No. 5.
4. M. Gasteier, M. Munich, M. Glesner, Generation of Interconnect Topologies for Communication Synthesis, DATE 98.
5. K. Itoh et al., Trends in Low-Power RAM Circuit Technologies, Proc. IEEE, 83(4), Apr
6. M. Kamble and K. Ghose, Analytical Energy Dissipation Models for Low Power Caches, Proc. Int'l Sym. Low Power Electronics and Design, p. 143, Aug
7. J. Resano et al., Analyzing Communication Overheads during Hardware/Software Partitioning, ESCODES 02.
8. R.P. Dick and N.K. Jha, CORDS: Hardware-Software Co-Synthesis of Reconfigurable Real-Time Distributed Embedded Systems, ICCAD 98.
9. J. Noguera, R.M. Badía, A HW/SW partitioning algorithm for dynamically reconfigurable architectures, DATE 01.
10. P. Yang et al., Energy-Aware Runtime Scheduling for Embedded-Multiprocessors SOCs, IEEE Journal on Design&Test of Computers.
11. G. Qu et al., Power Minimization using System-Level Partitioning of Applications with Quality of Services Requirements, Proc. of Int. Conf. on CAD.
12. I. Hong et al., Power Optimization of Variable-Voltage Core-Based System, IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 18, no. 12.
13. J. Henkel, A low power hardware/software partitioning approach for core-based embedded systems, DAC 99.
14. R. Mahapatra and P. Vijay, PAP: Power Aware Partitioning for Reconfigurable System, to be published in Proc. of HPCA Workshop 2003, Feb
15. L. Shang et al., Hw/Sw Co-synthesis of Low Power Real-Time Distributed Embedded Systems with Dynamically Reconfigurable FPGAs, ASP-DAC 02.
16. J. Holland, Adaptation in natural and artificial systems, MIT Press.
17. R.P. Dick et al., TGFF: Task Graphs for Free, Int'l Workshop HW/SW Codesign, 1998


Hardware-Software Codesign. 1. Introduction Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2

More information

A Replacement Technique to Maximize Task Reuse in Reconfigurable Systems

A Replacement Technique to Maximize Task Reuse in Reconfigurable Systems A Replacement echnique to Maximize ask Reuse in Reconfigurable Systems Abstract Dynamically reconfigurable hardware is a promising technology that combines in the same device both the high performance

More information

ENERGY EFFICIENT SCHEDULING SIMULATOR FOR DISTRIBUTED REAL-TIME SYSTEMS

ENERGY EFFICIENT SCHEDULING SIMULATOR FOR DISTRIBUTED REAL-TIME SYSTEMS I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 409-414 ENERGY EFFICIENT SCHEDULING SIMULATOR FOR DISTRIBUTED REAL-TIME SYSTEMS SANTHI BASKARAN 1, VARUN KUMAR P. 2, VEVAKE B. 2 & KARTHIKEYAN A. 2 1 Assistant

More information

Modeling Arbitrator Delay-Area Dependencies in Customizable Instruction Set Processors

Modeling Arbitrator Delay-Area Dependencies in Customizable Instruction Set Processors Modeling Arbitrator Delay-Area Dependencies in Customizable Instruction Set Processors Siew-Kei Lam Centre for High Performance Embedded Systems, Nanyang Technological University, Singapore (assklam@ntu.edu.sg)

More information

Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing

Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

Pilot: A Platform-based HW/SW Synthesis System

Pilot: A Platform-based HW/SW Synthesis System Pilot: A Platform-based HW/SW Synthesis System SOC Group, VLSI CAD Lab, UCLA Led by Jason Cong Zhong Chen, Yiping Fan, Xun Yang, Zhiru Zhang ICSOC Workshop, Beijing August 20, 2002 Outline Overview The

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Published in: Proceedings of the 2010 International Conference on Field-programmable

More information

Computer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM 5:30PM in Gates B01

Computer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM 5:30PM in Gates B01 Adapting Systems by Evolving Hardware Computer Systems Colloquium (EE380) Wednesday, 4:15-5:30PM 5:30PM in Gates B01 Jim Torresen Group Department of Informatics University of Oslo, Norway E-mail: jimtoer@ifi.uio.no

More information

A Novel Deadlock Avoidance Algorithm and Its Hardware Implementation

A Novel Deadlock Avoidance Algorithm and Its Hardware Implementation A ovel Deadlock Avoidance Algorithm and Its Hardware Implementation + Jaehwan Lee and *Vincent* J. Mooney III Hardware/Software RTOS Group Center for Research on Embedded Systems and Technology (CREST)

More information

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study

Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study William Fornaciari Politecnico di Milano, DEI Milano (Italy) fornacia@elet.polimi.it Donatella Sciuto Politecnico

More information

Hardware/Software Codesign

Hardware/Software Codesign Hardware/Software Codesign 3. Partitioning Marco Platzner Lothar Thiele by the authors 1 Overview A Model for System Synthesis The Partitioning Problem General Partitioning Methods HW/SW-Partitioning Methods

More information

System Verification of Hardware Optimization Based on Edge Detection

System Verification of Hardware Optimization Based on Edge Detection Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

A Modified Genetic Algorithm for Process Scheduling in Distributed System

A Modified Genetic Algorithm for Process Scheduling in Distributed System A Modified Genetic Algorithm for Process Scheduling in Distributed System Vinay Harsora B.V.M. Engineering College Charatar Vidya Mandal Vallabh Vidyanagar, India Dr.Apurva Shah G.H.Patel College of Engineering

More information

A Reconfigurable Multifunction Computing Cache Architecture

A Reconfigurable Multifunction Computing Cache Architecture IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 9, NO. 4, AUGUST 2001 509 A Reconfigurable Multifunction Computing Cache Architecture Huesung Kim, Student Member, IEEE, Arun K. Somani,

More information

Hardware Software Codesign of Embedded Systems

Hardware Software Codesign of Embedded Systems Hardware Software Codesign of Embedded Systems Rabi Mahapatra Texas A&M University Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on Codesign of Embedded System

More information

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,

More information

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs

A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs A Methodology and Tool Framework for Supporting Rapid Exploration of Memory Hierarchies in FPGAs Harrys Sidiropoulos, Kostas Siozios and Dimitrios Soudris School of Electrical & Computer Engineering National

More information

Real-Time Dynamic Energy Management on MPSoCs

Real-Time Dynamic Energy Management on MPSoCs Real-Time Dynamic Energy Management on MPSoCs Tohru Ishihara Graduate School of Informatics, Kyoto University 2013/03/27 University of Bristol on Energy-Aware COmputing (EACO) Workshop 1 Background Low

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

MOGAC: A Multiobjective Genetic Algorithm for the Co-Synthesis of Hardware-Software Embedded Systems

MOGAC: A Multiobjective Genetic Algorithm for the Co-Synthesis of Hardware-Software Embedded Systems MOGAC: A Multiobjective Genetic Algorithm for the Co-Synthesis of Hardware-Software Embedded Systems Robert P. Dick and Niraj K. Jha Department of Electrical Engineering Princeton University Princeton,

More information

An adaptive genetic algorithm for dynamically reconfigurable modules allocation

An adaptive genetic algorithm for dynamically reconfigurable modules allocation An adaptive genetic algorithm for dynamically reconfigurable modules allocation Vincenzo Rana, Chiara Sandionigi, Marco Santambrogio and Donatella Sciuto chiara.sandionigi@dresd.org, {rana, santambr, sciuto}@elet.polimi.it

More information

RECONFIGURABLE computing (RC) [5] is an interesting

RECONFIGURABLE computing (RC) [5] is an interesting 730 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 7, JULY 2006 System-Level Power-Performance Tradeoffs for Reconfigurable Computing Juanjo Noguera and Rosa M. Badia Abstract

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

A Complete Data Scheduler for Multi-Context Reconfigurable Architectures

A Complete Data Scheduler for Multi-Context Reconfigurable Architectures A Complete Data Scheduler for Multi-Context Reconfigurable Architectures M. Sanchez-Elez, M. Fernandez, R. Maestre, R. Hermida, N. Bagherzadeh, F. J. Kurdahi Departamento de Arquitectura de Computadores

More information

Scheduling tasks in embedded systems based on NoC architecture

Scheduling tasks in embedded systems based on NoC architecture Scheduling tasks in embedded systems based on NoC architecture Dariusz Dorota Faculty of Electrical and Computer Engineering, Cracow University of Technology ddorota@pk.edu.pl Abstract This paper presents

More information

Mobile Robot Path Planning Software and Hardware Implementations

Mobile Robot Path Planning Software and Hardware Implementations Mobile Robot Path Planning Software and Hardware Implementations Lucia Vacariu, Flaviu Roman, Mihai Timar, Tudor Stanciu, Radu Banabic, Octavian Cret Computer Science Department, Technical University of

More information

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA Akash Kumar,, Shakith Fernando, Yajun Ha, Bart Mesman and Henk Corporaal Eindhoven University of Technology, Eindhoven,

More information

Hardware Software Codesign of Embedded System

Hardware Software Codesign of Embedded System Hardware Software Codesign of Embedded System CPSC489-501 Rabi Mahapatra Mahapatra - Texas A&M - Fall 00 1 Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on

More information

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems International Journal of Information and Education Technology, Vol., No. 5, December A Level-wise Priority Based Task Scheduling for Heterogeneous Systems R. Eswari and S. Nickolas, Member IACSIT Abstract

More information

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience

Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience Mapping Multi-Million Gate SoCs on FPGAs: Industrial Methodology and Experience H. Krupnova CMG/FMVG, ST Microelectronics Grenoble, France Helena.Krupnova@st.com Abstract Today, having a fast hardware

More information

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter M. Bednara, O. Beyer, J. Teich, R. Wanka Paderborn University D-33095 Paderborn, Germany bednara,beyer,teich @date.upb.de,

More information

Low-Power Data Address Bus Encoding Method

Low-Power Data Address Bus Encoding Method Low-Power Data Address Bus Encoding Method Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu Dept. of Computer Science and Information Engineering, National Chao Tung University,

More information

Long Term Trends for Embedded System Design

Long Term Trends for Embedded System Design Long Term Trends for Embedded System Design Ahmed Amine JERRAYA Laboratoire TIMA, 46 Avenue Félix Viallet, 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr Abstract. An embedded system is an application

More information

Adaptive Online Cache Reconfiguration for Low Power Systems

Adaptive Online Cache Reconfiguration for Low Power Systems Adaptive Online Cache Reconfiguration for Low Power Systems Andre Costi Nacul and Tony Givargis Department of Computer Science University of California, Irvine Center for Embedded Computer Systems {nacul,

More information

RED: A Reconfigurable Datapath

RED: A Reconfigurable Datapath RED: A Reconfigurable Datapath Fernando Rincón, José M. Moya, Juan Carlos López Universidad de Castilla-La Mancha Departamento de Informática {frincon,fmoya,lopez}@inf-cr.uclm.es Abstract The popularity

More information

Hardware-Software Co-Design of Embedded Reconfigurable Architectures

Hardware-Software Co-Design of Embedded Reconfigurable Architectures Hardware-Software Co-Design of Embedded Reconfigurable Architectures Yanbing Li, Tim Callahan *, Ervan Darnell **, Randolph Harr, Uday Kurkure, Jon Stockwood Synopsys Inc., 700 East Middlefield Rd. Mountain

More information

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture

Introduction Warp Processors Dynamic HW/SW Partitioning. Introduction Standard binary - Separating Function and Architecture Roman Lysecky Department of Electrical and Computer Engineering University of Arizona Dynamic HW/SW Partitioning Initially execute application in software only 5 Partitioned application executes faster

More information

ECE 448 Lecture 15. Overview of Embedded SoC Systems

ECE 448 Lecture 15. Overview of Embedded SoC Systems ECE 448 Lecture 15 Overview of Embedded SoC Systems ECE 448 FPGA and ASIC Design with VHDL George Mason University Required Reading P. Chu, FPGA Prototyping by VHDL Examples Chapter 8, Overview of Embedded

More information

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB. Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics

More information

Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm. Santos and Mateus (2007)

Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm. Santos and Mateus (2007) In the name of God Crew Scheduling Problem: A Column Generation Approach Improved by a Genetic Algorithm Spring 2009 Instructor: Dr. Masoud Yaghini Outlines Problem Definition Modeling As A Set Partitioning

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

An Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling

An Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling An Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling Keigo Mizotani, Yusuke Hatori, Yusuke Kumura, Masayoshi Takasu, Hiroyuki Chishiro, and Nobuyuki Yamasaki Graduate

More information

FeRAM Circuit Technology for System on a Chip

FeRAM Circuit Technology for System on a Chip FeRAM Circuit Technology for System on a Chip K. Asari 1,2,4, Y. Mitsuyama 2, T. Onoye 2, I. Shirakawa 2, H. Hirano 1, T. Honda 1, T. Otsuki 1, T. Baba 3, T. Meng 4 1 Matsushita Electronics Corp., Osaka,

More information

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems

Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Bi-Objective Optimization for Scheduling in Heterogeneous Computing Systems Tony Maciejewski, Kyle Tarplee, Ryan Friese, and Howard Jay Siegel Department of Electrical and Computer Engineering Colorado

More information

I. INTRODUCTION DYNAMIC reconfiguration, often referred to as run-time

I. INTRODUCTION DYNAMIC reconfiguration, often referred to as run-time IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 11, NOVEMBER 2006 1189 Integrating Physical Constraints in HW-SW Partitioning for Architectures With Partial Dynamic Reconfiguration

More information

Performance Improvements of Microprocessor Platforms with a Coarse-Grained Reconfigurable Data-Path

Performance Improvements of Microprocessor Platforms with a Coarse-Grained Reconfigurable Data-Path Performance Improvements of Microprocessor Platforms with a Coarse-Grained Reconfigurable Data-Path MICHALIS D. GALANIS 1, GREGORY DIMITROULAKOS 2, COSTAS E. GOUTIS 3 VLSI Design Laboratory, Electrical

More information

Optimal Cache Organization using an Allocation Tree

Optimal Cache Organization using an Allocation Tree Optimal Cache Organization using an Allocation Tree Tony Givargis Technical Report CECS-2-22 September 11, 2002 Department of Information and Computer Science Center for Embedded Computer Systems University

More information

Hardware Software Partitioning of Multifunction Systems

Hardware Software Partitioning of Multifunction Systems Hardware Software Partitioning of Multifunction Systems Abhijit Prasad Wangqi Qiu Rabi Mahapatra Department of Computer Science Texas A&M University College Station, TX 77843-3112 Email: {abhijitp,wangqiq,rabi}@cs.tamu.edu

More information

Evaluation of Runtime Task Mapping Heuristics with rsesame - A Case Study

Evaluation of Runtime Task Mapping Heuristics with rsesame - A Case Study Evaluation of Runtime Task Mapping Heuristics with rsesame - A Case Study Kamana Sigdel Mark Thompson Carlo Galuzzi Andy D. Pimentel Koen Bertels Computer Engineering Laboratory EEMCS, Delft University

More information

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain Massively Parallel Computing on Silicon: SIMD Implementations V.M.. Brea Univ. of Santiago de Compostela Spain GOAL Give an overview on the state-of of-the- art of Digital on-chip CMOS SIMD Solutions,

More information

Mapping a group of jobs in the error recovery of the Grid-based workflow within SLA context

Mapping a group of jobs in the error recovery of the Grid-based workflow within SLA context Mapping a group of jobs in the error recovery of the Grid-based workflow within SLA context Dang Minh Quan International University in Germany School of Information Technology Bruchsal 76646, Germany quandm@upb.de

More information

THIS PAPER describes algorithms to synthesize lowpower

THIS PAPER describes algorithms to synthesize lowpower 508 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 3, MARCH 2007 SLOPES: Hardware Software Cosynthesis of Low-Power Real-Time Distributed Embedded Systems With

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection

QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection QUKU: A Fast Run Time Reconfigurable Platform for Image Edge Detection Sunil Shukla 1,2, Neil W. Bergmann 1, Jürgen Becker 2 1 ITEE, University of Queensland, Brisbane, QLD 4072, Australia {sunil, n.bergmann}@itee.uq.edu.au

More information

Introduction to Embedded Systems

Introduction to Embedded Systems Introduction to Embedded Systems Outline Embedded systems overview What is embedded system Characteristics Elements of embedded system Trends in embedded system Design cycle 2 Computing Systems Most of

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Design Space Exploration Using Parameterized Cores

Design Space Exploration Using Parameterized Cores RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR Design Space Exploration Using Parameterized Cores Ian D. L. Anderson M.A.Sc. Candidate March 31, 2006 Supervisor: Dr. M. Khalid 1 OUTLINE

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS

HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS HYBRID GENETIC ALGORITHM WITH GREAT DELUGE TO SOLVE CONSTRAINED OPTIMIZATION PROBLEMS NABEEL AL-MILLI Financial and Business Administration and Computer Science Department Zarqa University College Al-Balqa'

More information

Lossless Compression using Efficient Encoding of Bitmasks

Lossless Compression using Efficient Encoding of Bitmasks Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA

More information

Energy-Constrained Scheduling of DAGs on Multi-core Processors

Energy-Constrained Scheduling of DAGs on Multi-core Processors Energy-Constrained Scheduling of DAGs on Multi-core Processors Ishfaq Ahmad 1, Roman Arora 1, Derek White 1, Vangelis Metsis 1, and Rebecca Ingram 2 1 University of Texas at Arlington, Computer Science

More information

Verification and Validation of X-Sim: A Trace-Based Simulator

Verification and Validation of X-Sim: A Trace-Based Simulator http://www.cse.wustl.edu/~jain/cse567-06/ftp/xsim/index.html 1 of 11 Verification and Validation of X-Sim: A Trace-Based Simulator Saurabh Gayen, sg3@wustl.edu Abstract X-Sim is a trace-based simulator

More information

Synthetic Benchmark Generator for the MOLEN Processor

Synthetic Benchmark Generator for the MOLEN Processor Synthetic Benchmark Generator for the MOLEN Processor Stephan Wong, Guanzhou Luo, and Sorin Cotofana Computer Engineering Laboratory, Electrical Engineering Department, Delft University of Technology,

More information

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

More information

Real-Time Mixed-Criticality Wormhole Networks

Real-Time Mixed-Criticality Wormhole Networks eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks

More information

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic

A Novel Design of High Speed and Area Efficient De-Multiplexer. using Pass Transistor Logic A Novel Design of High Speed and Area Efficient De-Multiplexer Using Pass Transistor Logic K.Ravi PG Scholar(VLSI), P.Vijaya Kumari, M.Tech Assistant Professor T.Ravichandra Babu, Ph.D Associate Professor

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture

Energy Aware Optimized Resource Allocation Using Buffer Based Data Flow In MPSOC Architecture ISSN (Online) : 2319-8753 ISSN (Print) : 2347-6710 International Journal of Innovative Research in Science, Engineering and Technology Volume 3, Special Issue 3, March 2014 2014 International Conference

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

Delay Estimation for Technology Independent Synthesis

Delay Estimation for Technology Independent Synthesis Delay Estimation for Technology Independent Synthesis Yutaka TAMIYA FUJITSU LABORATORIES LTD. 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, JAPAN, 211-88 Tel: +81-44-754-2663 Fax: +81-44-754-2664 E-mail:

More information

A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management

A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management H. Tan and R. F. DeMara Department of Electrical and Computer Engineering University of Central Florida

More information

Systems Development Tools for Embedded Systems and SOC s

Systems Development Tools for Embedded Systems and SOC s Systems Development Tools for Embedded Systems and SOC s Óscar R. Ribeiro Departamento de Informática, Universidade do Minho 4710 057 Braga, Portugal oscar.rafael@di.uminho.pt Abstract. A new approach

More information

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom

ECE 4514 Digital Design II. Spring Lecture 22: Design Economics: FPGAs, ASICs, Full Custom ECE 4514 Digital Design II Lecture 22: Design Economics: FPGAs, ASICs, Full Custom A Tools/Methods Lecture Overview Wows and Woes of scaling The case of the Microprocessor How efficiently does a microprocessor

More information

System on Chip (SoC) Design

System on Chip (SoC) Design System on Chip (SoC) Design Moore s Law and Technology Scaling the performance of an IC, including the number components on it, doubles every 18-24 months with the same chip price... - Gordon Moore - 1960

More information

Real-Time Dynamic Voltage Hopping on MPSoCs

Real-Time Dynamic Voltage Hopping on MPSoCs Real-Time Dynamic Voltage Hopping on MPSoCs Tohru Ishihara System LSI Research Center, Kyushu University 2009/08/05 The 9 th International Forum on MPSoC and Multicore 1 Background Low Power / Low Energy

More information

An FPGA Based Adaptive Viterbi Decoder

An FPGA Based Adaptive Viterbi Decoder An FPGA Based Adaptive Viterbi Decoder Sriram Swaminathan Russell Tessier Department of ECE University of Massachusetts Amherst Overview Introduction Objectives Background Adaptive Viterbi Algorithm Architecture

More information

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip 1 Mythili.R, 2 Mugilan.D 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information