Design Space Exploration in System Level Synthesis under Memory Constraints


Radoslaw Szymanek and Krzysztof Kuchcinski
Dept. of Computer and Information Science, Linköping University, Sweden

(This work was supported by the Foundation for Strategic Research, Integrated Electronic Systems program.)

Abstract

This paper addresses the problem of component selection, task assignment and task scheduling for distributed embedded computer systems. Such systems are subject to a large number of constraints of different nature, such as cost, execution time, memory capacity and limitations on resource usage. Previous approaches have concentrated on a specific class of requirements and thus limit the number of constraints which can be handled in the design process. This very often results in infeasible or too expensive solutions. The system presented in this paper, CLASS (Constraint Logic based System Synthesis), makes it possible to impose different design constraints and thus model the design more realistically. It is also efficient in finding good solutions or, in some cases, optimal solutions even for nontrivial problems.

1. Introduction

Embedded systems are needed in more and more areas, and the expectations concerning their cost, reliability and functionality grow constantly. The gap between the technology and the ability of design methods to address system synthesis problems also increases. Synthesis of embedded systems is usually decomposed into a chain of problems which must be solved to obtain the final solution. The first task is to decide which components should be used in the target architecture. In the next step, all tasks are assigned to the selected processing units. The last step is scheduling, defined as the assignment of particular time slots for the execution of tasks and for data transmissions between processing units. All these decisions influence each other, which makes finding (near) optimal solutions for industrial designs very difficult or even impossible. The properties of the final design are determined by the quality of the solutions to these three main tasks. The complexity of designed systems and the number of heterogeneous constraints imposed on these designs increase, which makes the synthesis challenging. The development of new modeling techniques and heuristics is needed to address this challenge. The aim of this development is to explore as much as possible of the whole design space while not becoming trapped in local optima.

As complexity grows, there is a shift from single-processor systems to heterogeneous multiprocessor systems. These systems have the potential to achieve superior results in terms of cost and performance over homogeneous systems, since they can match the problem more closely. The multiprocessor systems considered in this paper consist of three types of units. Processing units, such as processor cores or ASICs, belong to the first group. The next group consists of communication devices, such as buses and links, used to transfer data between processing units. Last, but not least important, is memory, which can be divided into at least two groups according to its usage. Data memory is used to store temporary data which are produced and consumed during the execution of an algorithm. The need for this memory can vary dramatically and does not depend directly on the choice of the processing unit. The second group is code memory, for which the requirement is fixed during task execution.
Assignment of a task to a different processor can change its code and data memory requirements. Memory constraints are difficult to model, but neglecting them can lead to a considerable waste of memory components. The main contribution of this paper is the inclusion of memory constraints in the design process of distributed embedded systems.

CLASS uses the Constraint Logic Programming (CLP) paradigm. Optimization methods based on CLP have several advantages over other approaches. The CLP framework provides an elegant formalism for modeling system level synthesis problems. Already defined models can easily be extended by adding new types of constraints. Finally, the process of creating new synthesis heuristics is easy, since it can be based on the primitive constructs provided by the CLP framework. CLASS supports interactive exploration of the design space. This interaction guides the designer towards the part of the design space which has the highest potential of containing (near) optimal solutions.

Designer knowledge, gained during this interaction, is usually very hard to express in a formal framework, but this knowledge is very important in tailoring the architecture to a specific application.

CLASS assumes that the functional description of a system is given as a set of cooperating tasks. This description has to be compiled into a task graph which captures the data dependencies between tasks. Each task is characterized by its estimated execution time and memory requirements. Real-time DSP and image processing applications belong to the class of problems which can be modeled using the above assumptions. They are fairly deterministic, which makes them suitable for static scheduling. In these applications, an important aspect is also the scheme of data memory usage. Very often, unbalanced use of data memory results in too high data memory requirements. The amount of data memory used can, however, be decreased by taking data memory into account during scheduling. The importance of memory considerations was also indicated, for example, in [1].

In this paper we focus on the new synthesis system based on the CLP methodology. In Section 2, we outline related work in this area. Section 3 defines the model of the architecture and the model of the system. Section 4 describes the synthesis process. Section 5 gives an example to better illustrate our system and the complexity of the synthesis problem. Section 6 describes the system synthesis tool and possible designer interaction with the system. Section 7 presents experimental results. Finally, the last section concludes the paper and gives some directions for future work.

2. Related Work

The synthesis problem encompasses a large number of subproblems which can be studied in isolation. In our work we try to integrate these problems and include as many heterogeneous constraints as possible. The system closest to ours was described in [1], and its previous version in [2]. That approach uses Mixed Integer Linear Programming, which results in many inequalities and binary decision variables. Since the system aims at finding optimal solutions, the runtimes are prohibitively large even for examples consisting of nine tasks. The architecture of the synthesized systems consists of processors and buses or links only. Other components, such as ASICs, are not considered. The approach was extended by the inclusion of a simple memory model in [1]. The memory cost was directly included in the cost function.

Another system synthesis approach which guarantees optimal solutions is described in [8]. The presented algorithm makes it possible to compute the same task multiple times on several processing units in order to remove some communications from the buses. The target architecture is restricted by assuming that buses have the same transmission rate and that tasks assigned to ASICs are executed sequentially. A global memory is used to store input and output data of the whole computation, but intermediate data cannot be stored there.

Clustering is the main way of dealing with the complexity of co-synthesis of embedded systems. The COSYN system and its extension CASPER, described in [6] and [7], use this method. The algorithm presented there can cope with large industrial-size problems, but memory requirements are not considered. During allocation of clusters, the architecture is gradually extended by adding new components when deadlines are not met.
In [12], clustering is guided by detailed information about communication requirements in order to merge transfers which do not interfere with each other too much. In [9], multiprocessor task assignment is modeled as a vector packing problem. The target architecture consists of an arbitrary number of heterogeneous processing units that communicate over one bus. This often results in a bus bottleneck for data-dominated applications, and this assumption causes the best heuristic to produce solutions with the lowest bus utilization. That work was extended in [10], where a configuration selection problem is also solved. The goal is to minimize the cost of the system while a correct assignment and schedule can still be found.

Evolutionary algorithms for system synthesis are used in [11, 15]. Those approaches give good solutions for middle-size problems while not limiting the target architecture. The target architecture consists of general purpose processors, ASICs, buses and memories. Their advantage is that the obtained result defines a set of solutions which is often close to the Pareto set. The main problem of that method is the difficulty of adding new types of constraints to the model.

Our work is built on the research presented in [3]. That work uses CLP to represent the system synthesis problem by a set of finite domain constraints defining different timing requirements. For small problems optimal solutions can be obtained, while heuristics are used for larger design problems. The system can minimize the design cost for a given execution time or vice versa. The efficiency of the CLP approach is compared with other approaches, and the comparison comes out in favor of CLP.

Our approach has a more general target architecture than other approaches. It considers processors, ASICs, buses, links, and local memories. The memory is divided into two groups, code memory and data memory. The designer has the freedom to influence the final design by making decisions concerning the final architecture, task assignment, and scheduling. He can also supply the system with a partial solution for the selection of components and task assignments. This makes it possible to guide the synthesis process in a clean manner and still use the full power of automatic synthesis methods. We implemented an optimization heuristic which gives good results for large designs.

3. System Modeling

In our approach, CLP is used to model the system architecture and the design problem. Therefore we first briefly introduce the concept of finite domain variables and constraints over these variables, and then present both models and the relations between them in two consecutive subsections.

Each finite domain variable (FDV) is initially defined by a set of integer values which constitute its domain. Constraints specify relations among these variables; therefore restricting the domain of one variable usually results in restricting the domains of other FDVs. The CLP model consists of FDVs and constraints imposed on these variables. We use the Constraint Handling in Prolog (CHIP) system, version 5.1. The CHIP system implements basic and global constraints. Basic constraints are equality, inequality and conditional constraints. The problem can be described using these constraints only. To avoid an exponential growth of the number of constraints when the complexity of problems increases, global constraints are also used. Global constraints usually impose restrictions on cumulative use of resources, rectangle placement or partitioning of graphs [18]. Modeling the problem using global constraints gives a clean and understandable description of the problem.

3.1. Architecture Model

The target architecture, in our approach, consists of processing units, such as processors and ASICs, and communication devices, such as buses and links. Each processor has two local memories: one for data and one for code. The architecture is described by specifying processors, ASICs, buses and the interconnections between them. Each processor can be described by the following tuple:

P = (λ, β, κ, ϕ)   (1)

where λ is an integer value and denotes the cost of the processor, β is a 0/1 FDV denoting whether the processor is used or not, κ is also an integer and denotes the amount of data memory, and finally the amount of code memory is denoted by ϕ. In the case of an ASIC, ϕ = 0.

The data memory is used to store data computed by the tasks. Each task requires data memory which is reserved from the start time of the task until all communications from this task are completed. The amount of data memory needed on each processor changes during the schedule because of data transfers between tasks. A processor can compute and send or receive data concurrently, which makes the data memory usage scheme even more dynamic. During synthesis we have to ensure that the maximal usage of data memory does not exceed the memory size.

ASICs consist of any number of parts. The ASIC parts operate independently, making parallel execution of tasks possible. The ASIC cost is fixed regardless of the number of tasks assigned to it. All tasks assigned to an ASIC have access to local data memory. ASICs do not have code memory.

Each bus or link is described by the following tuple:

B = (λ, β, ϖ)   (2)

where λ and ϖ are numbers and denote the cost and the speed of the bus/link respectively, and β is a 0/1 FDV which denotes whether the bus or link is used in the final architecture.

The processing units and communication devices have an associated cost. The cost of processing components includes the cost of their memory. This suits the situation when a designer is creating the system from off-the-shelf components, where all features of the components are fixed. Differentiating the cost of a processor with different amounts of memory available for it can be done by creating a set of processors with the same performance but different cost and memory capacity.
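To make the notation concrete, the sketch below encodes the tuples (1) and (2) and the architecture cost that the synthesis later minimizes, i.e. the sum of the costs of components whose selection variable β equals 1. It is plain Python rather than the CHIP model used by CLASS; the class names, fields and numeric values are ours and only illustrate the intended meaning of the tuples.

```python
from dataclasses import dataclass

@dataclass
class Processor:
    """Processor tuple P = (lambda, beta, kappa, phi) of (1)."""
    cost: int          # lambda: component cost (includes its memories)
    used: int          # beta: 0/1 selection FDV in the CLP model
    data_memory: int   # kappa: available data memory
    code_memory: int   # phi: available code memory (0 for an ASIC)

@dataclass
class Bus:
    """Bus/link tuple B = (lambda, beta, varpi) of (2)."""
    cost: int    # lambda: component cost
    used: int    # beta: 0/1 selection FDV
    speed: int   # varpi: transfer speed

def architecture_cost(components):
    """Cost of the selected architecture: only components with beta = 1 count."""
    return sum(c.cost * c.used for c in components)

# Placeholder values, only to show the intended use; an ASIC has no code memory.
p1 = Processor(cost=5, used=1, data_memory=16, code_memory=8)
a1 = Processor(cost=7, used=1, data_memory=16, code_memory=0)
b1 = Bus(cost=2, used=1, speed=1)
print(architecture_cost([p1, a1, b1]))   # -> 14
```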
There is no restriction on the nature of the interconnection structure. The designer has to specify the possible connections between processing and communication devices. This specification is used to impose constraints on bus or link selection for transferring data between two cooperating tasks when they are executed on different processors. An example of an interconnection structure is presented in Figure 1b, where one bus, B1, and two links, L1 and L2, are used.

3.2. Problem Definition

In our approach, the functional description of the system contains a number of cooperating tasks. This description, together with estimated execution times and memory requirements, is compiled into a task graph. The task graph is an acyclic graph, such as the one presented in Figure 1a. The nodes of this graph represent computational tasks. Each task is described by the following tuple of FDVs:

T = (τ, ρ, δ, µ)   (3)

where τ denotes the start time of the task execution, ρ denotes the resource on which the task is executed, δ denotes the task duration, and finally µ denotes the amount of code memory needed for the task execution.

Figure 1. Task data flow graph and target architecture: a) a task graph example, b) an example of the system architecture.

The execution time and code memory required by a task depend on the processor. Tasks must always be scheduled on one of the processing units and they cannot be preempted. This is modeled by imposing constraints which define finite relations between the FDVs of (3) representing different tasks [3].

The arcs in the task graph represent data transfers between tasks. Each arc is described by a tuple of FDVs:

C = (τ, ρ, δ, α)   (4)

where τ denotes the start time of the communication, ρ denotes the resource which is used to transfer the data, δ denotes the duration of the communication, and α denotes the amount of transferred data.

Each arc in the task graph imposes a constraint which defines an execution order between two tasks. When task Ti communicates data (communication Cc) to task Tj, this is modeled by a constraint which is a conjunction of the two following inequalities:

τi + δi ≤ τc,   τc + δc ≤ τj   (5)

There are two possible scenarios for exchanging data between two cooperating tasks. In the first one, the two tasks are executed on different processors. The communication must then be assigned to and scheduled on a communication device which transfers the data between the two tasks involved in this communication. In the second case, both tasks are executed on the same processor and they communicate using the local memory. In this case the FDV δc = 0 and the previous constraint reduces to τi + δi ≤ τj. All these constraints together create a partial ordering of the tasks.

Data memory has to be allocated for each task assigned to a processing device. This memory needs to be reserved during the time interval spanning from the task start to the end of the related communication. More formally, if we have a task Ti and a communication Cj from this task, then the data memory is reserved for the time interval [τi, τj + δj]. Note that the time interval during which the data memory is allocated depends on both task assignment and scheduling. The constraints on data memory allocation are imposed using cumulative and conditional constraints. These constraints limit the allocation of data memory on each processor to the level of the available data memory. The code memory constraints are simpler. The code memory requirements do not change during execution, so these constraints can be expressed using a conditional sum.

Example: Consider two cooperating tasks and the communication between them as depicted in Figure 2a. T1 is executed on processor P1 and T2 is executed on processor P2. The communication C1 is scheduled on bus B1. This is depicted in Figure 2b as a Gantt diagram. The data transfer can appear freely between the finish time of T1 and the start time of T2, which is expressed by the following inequalities:

τt1 + δt1 ≤ τc1,   τc1 + δc1 ≤ τt2

Processor P1 must reserve data memory for task T1, denoted by D1, from τt1 until τc1. Processor P2 reserves data memory for task T2, denoted by D2, from τc1 until τt2. The memory sizes of D1 and D2 are the same. This constraint is represented for each processing unit using FDVs and cumulative constraints. Figure 2c depicts the possible memory usage and the situation when there is some data memory left on both processors. This memory can be used to store data from other tasks.

Figure 2. Data memory requirements: a) two cooperating tasks, b) schedule for two cooperating tasks, c) data memory usage for the processors executing these tasks.
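These relations can be illustrated with a small, self-contained Python check. It verifies constraint (5) for each communication and computes the peak data memory reserved on a processor over the interval [τi, τj + δj]. In CLASS these conditions are finite domain constraints propagated by CHIP during the search rather than checked after the fact; the data layout and the numeric values below are ours, not taken from the paper.

```python
# Records mirror tuples (3) and (4); values are illustrative only.
tasks = {
    "T1": {"tau": 0, "rho": "P1", "delta": 2, "mu": 3},
    "T2": {"tau": 4, "rho": "P2", "delta": 2, "mu": 2},
}
comms = {
    # C1 carries 4 units of data from T1 to T2 over bus B1.
    "C1": {"src": "T1", "dst": "T2", "tau": 2, "rho": "B1", "delta": 2, "alpha": 4},
}
data_capacity = {"P1": 8, "P2": 8}   # kappa of each processor

def precedence_ok(c):
    """Constraint (5): the source task finishes before the transfer starts and
    the transfer finishes before the destination task starts."""
    s, d = tasks[c["src"]], tasks[c["dst"]]
    return s["tau"] + s["delta"] <= c["tau"] and c["tau"] + c["delta"] <= d["tau"]

def peak_data_memory(proc):
    """Peak data memory reserved on proc: each outgoing communication keeps its
    data alive over [tau_i, tau_j + delta_j], i.e. from the source task start
    until the transfer ends.  (The receiving side, D2 in the example, would be
    handled analogously.)"""
    events = []
    for c in comms.values():
        if tasks[c["src"]]["rho"] == proc:
            events.append((tasks[c["src"]]["tau"], +c["alpha"]))  # reserve
            events.append((c["tau"] + c["delta"], -c["alpha"]))   # release
    peak = level = 0
    for _, change in sorted(events):
        level += change
        peak = max(peak, level)
    return peak

assert all(precedence_ok(c) for c in comms.values())
assert all(peak_data_memory(p) <= cap for p, cap in data_capacity.items())
```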
4. System Synthesis

The synthesis problem is to find an architecture with a minimal cost which can execute all tasks while fulfilling the timing and memory constraints. The architecture is created from a set of components specified by the designer. The whole process is guided by the constraint system, which enforces the correctness of the solution by rejecting all decisions which violate constraints. The synthesis process assigns to each FDV one of the values from its domain. All domain variables introduced in the previous section, such as τi, ρi, δi, µi and βi, must be assigned to one specific value. After each assignment, the correctness of the partial solution is checked by the constraint engine. In case of inconsistency, the last decision is withdrawn and another value is tried.

In our approach, CLP is used for modeling the synthesis problem and finding solutions. Since adding new constraints is a relatively easy task compared to other approaches, the designer has the possibility to add his own constraints and influence the solution. Designer constraints concern the deadline for the execution of the whole task graph and deadlines for selected tasks. He also has the possibility to guide our synthesis system by specifying the components which should be used in the final architecture and the assignment of tasks to these components. This kind of guidance can lead to good solutions more easily and, in consequence, results in better exploration of the design space.
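The assign-check-backtrack loop just described, together with the cost-tightening restart used later in this section, can be pictured with the schematic Python search below. It is a plain depth-first labeling with a user-supplied consistency test, not the credit-search heuristic of CLASS or any CHIP primitive; all function names and the toy data are ours.

```python
def label(variables, domains, consistent, assignment=None):
    """Depth-first labeling: assign a value to each FDV, check the partial
    solution, and withdraw the last decision (trying the next value) whenever
    the check fails."""
    assignment = assignment if assignment is not None else {}
    if len(assignment) == len(variables):
        yield dict(assignment)
        return
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            yield from label(variables, domains, consistent, assignment)
        del assignment[var]

def minimize_cost(variables, domains, consistent, cost):
    """Each time a solution is found, add the constraint that the next one must
    be cheaper, and restart the search."""
    best, bound = None, float("inf")
    while True:
        check = lambda a: consistent(a) and (
            len(a) < len(variables) or cost(a) < bound)
        sol = next(label(variables, domains, check), None)
        if sol is None:
            return best
        best, bound = sol, cost(sol)

# Toy use: map two tasks to processors, minimizing cost, under one constraint.
doms = {"T1": ["P1", "P2"], "T2": ["P1", "P2"]}
proc_cost = {"P1": 5, "P2": 3}
no_overload = lambda a: list(a.values()).count("P1") <= 1   # at most one task on P1
total_cost = lambda a: sum(proc_cost[p] for p in a.values())
print(minimize_cost(list(doms), doms, no_overload, total_cost))
# -> {'T1': 'P2', 'T2': 'P2'}
```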

The ordering of FDVs in the assignment influences how efficiently the search space is pruned. A good heuristic should wisely choose the next variable and the value assigned to this variable, thus obtaining good solutions faster. We have implemented several heuristics for finding solutions which are based on domain-specific knowledge. In this paper, we use one of these heuristics; its decision flow is the following:

1. assignment of tasks to resources,
2. assignment of execution intervals to each task,
3. assignment of time slots for executing tasks.

The first step of the heuristic is to assign tasks to processing units and communications to communication devices. The assignment tries to select the cheapest processor for a given task while minimizing the code memory usage. When a task cannot be assigned to any of the processing units which are already present in the architecture, a new processing unit is added.

In the second step, we assign an execution interval to each task and communication. The number of intervals depends on the duration of the task; a larger task duration gives a smaller number of intervals. Tasks and communications are then divided into three groups depending on their position in the graph. For tasks which are close to the start of the task graph, the intervals with the smallest starting times are tried first. For tasks positioned in the middle of the execution period, the middle intervals are selected first, and finally, for tasks from the end of the execution period, we assign the intervals with the largest starting times. This approach makes it possible to scatter tasks and communications evenly in the time domain. (A small sketch illustrating this interval ordering is given at the end of this section.) The search for a correct assignment is done using a heuristic method for traversing the search space called credit search [17].

In the previous step, the intervals for task execution were decided. The third step assigns the actual time slots within the previously decided intervals. Since the search space is very restricted, an exhaustive search is performed to find the first correct assignment.

The heuristic backtracks whenever it cannot find a correct assignment at any step. For example, if no assignment is found during the credit search over intervals, then our heuristic finds a new assignment of tasks and communications to resources and the credit search is performed again. After finding a solution, a new constraint is added which restricts the cost of the next solution to be smaller than the one just obtained, and the heuristic is restarted.

5. Example

In this section, we use the simple task graph and target architecture depicted in Figure 1. The task graph consists of five tasks and five communications. The target architecture can consist of at most three processors, one ASIC, one bus, and two links. The costs of the processors P1, P2 and P3, the ASIC A1, the bus B1, and the links L1 and L2 are given in Table 1. The execution times of the tasks on the different processing units and their code memory requirements are given in Table 2.

Table 1: Characteristics of the processing units and communication devices (cost, data memory and speed of P1, P2, P3, A1, B1, L1 and L2).

Table 2: Execution time and code memory requirements of T1–T5 on P1, P2, P3 and A1.

For example, T3 can be executed on processors P1, P2 or P3. Its execution time on P3 is 2 time units and its code memory requirement is 3 units.
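As announced above, the following sketch illustrates only the value-ordering idea of the second heuristic step: candidate execution intervals, sorted by start time, are tried earliest-first for tasks near the beginning of the graph, middle-first for tasks in the middle, and latest-first for tasks near the end. How CLASS measures a task's position and how many intervals it generates are not shown here; the position metric and the one-third thresholds are our assumptions.

```python
def interval_order(intervals, position):
    """Value ordering for step two of the heuristic.  `intervals` is the list
    of candidate execution intervals sorted by start time; `position` is an
    assumed measure in [0, 1] of how far the task lies from the start of the
    task graph (0 = start, 1 = end)."""
    if position < 1 / 3:                      # early tasks: earliest intervals first
        return list(intervals)
    if position > 2 / 3:                      # late tasks: latest intervals first
        return list(reversed(intervals))
    mid = (len(intervals) - 1) / 2            # middle tasks: middle intervals first
    return sorted(intervals, key=lambda iv: abs(intervals.index(iv) - mid))

print(interval_order([(0, 3), (4, 7), (8, 11)], position=0.9))
# -> [(8, 11), (4, 7), (0, 3)]
```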
The amount of data transferred between tasks is given in Table 3.

Table 3: Data transfer characteristics (amount of data transferred by C1–C5).

For this small example there are five different solutions, depending on the deadline of the whole schedule. These solutions, generated by our synthesis system, are presented in Table 4.

Table 4: Synthesis results (solution number, deadline, cost and architecture of the five solutions).

Solution 5, for the deadline 7, is depicted in Figure 3. This solution consists of two processors, P1 and P2, one ASIC A1, one bus B1 and one link L2. The Gantt diagram produced by our synthesis system for this solution is depicted in Figure 4. For this schedule, the requirements for code and data memory are presented in Table 5.

Figure 3. Solution 5 for deadline 7.

Figure 4. Gantt diagram of the solution with deadline 7.

Table 5: Code and data memory requirements for solution 5.

6. Synthesis System Tool

The synthesis system was implemented using the Constraint Handling in Prolog (CHIP) package, version 5.1 [18]. This synthesis system allows the designer to specify the synthesis problem and to find solutions interactively. The system guides the designer in the process of design space exploration. The designer has the possibility of saving solutions and of finding new solutions while keeping some decisions made during previous design exploration steps.

There are several ways of influencing the final design by making decisions regarding component selection, task assignment, and scheduling. The designer can specify which components must be used in the final architecture, thus partially creating the final architecture. This helps to reduce the design space during the task assignment step. The decisions of the designer can also influence the task assignment. The designer can explicitly assign all or selected tasks and communications to chosen processing or communication units. Finally, the designer can specify deadlines for specific tasks and help the synthesis system to cut the search during scheduling.

The overall view of the system is presented in Figure 6. The main window is divided into two parts. The first one is used to display one of the four spreadsheets which specify the characteristics of the architecture components or the characteristics of the computational and communication tasks. The second window, the bottom one, is a Gantt diagram representing the solution graphically to make evaluation easier. The graphical interface allows better interaction with the system. The system also gives the possibility to display the current task graph; an example is presented in Figure 5.

Figure 5. The tool's presentation of the task graph.

7. Experimental Results

The efficiency of the synthesis tool has been evaluated on six randomly generated examples. They represent relatively small synthesis problems as well as large ones. All experiments were targeted at the same type of architecture as depicted in Figure 1b, but instead of the ASIC A1 we use processor P4. Since the number of tasks and communications grows, the deadlines were extended and the processors were enhanced by adding additional data and code memory. The possible components of the architecture are presented in Table 6. Bus B1 connects all processors, link L1 connects processors P1 and P2, and link L2 connects processors P3 and P4.

Table 6: Architecture components (cost, data memory and speed of P1–P4, B1, L1 and L2).

Figure 6. The system synthesis tool.

The main characteristics of the examples are presented in Table 7. All tasks and communications in each example form a single task graph. All examples were run on a Pentium 200 MHz processor using the heuristic presented in Section 4. The results are shown in Table 8.

Table 7: Example task graphs (number of tasks, number of communications, code memory and data memory).

Our objective was to minimize the cost of the architecture, not the utilization of the processors and buses, which are therefore not utilized very much, up to 79%. However, the utilization of code memory and data memory is very high, always around 90%. The runtimes for the large examples are less than 12 minutes. All these results were obtained without designer interaction.

The tool allows the designer to express a wide variety of constraints on the designs and helps to prune the enormous design space. The designer can work iteratively on the design by saving the currently obtained design into a file and improving it later. Improvements can usually be made by enforcing good decisions from the old design in a new one and leaving other decisions to the synthesis system. An example of a possible scenario for iterative improvements is presented in Table 9. We start with design 6.1, which is identical to design 6 from Table 8. The synthesis of this example was done without interaction from the designer. Design 6.2 was obtained by specifying that the architecture in the new design is the same as in 6.1. With this information the system can improve the assignment and the schedule. Finally, design 6.3, which is depicted in Figure 6, was obtained by fixing the architecture as well as the task and communication assignment. The final design has very good characteristics in terms of the utilization of the processors and memories, which is very high, all above 90%.

Table 8: Experimental results (cost, deadline, components, code and data memory utilization, CPU and bus/link utilization, and runtime in seconds for the six examples).
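For reference, the sketch below shows one plausible way to compute the utilization figures reported above: busy time over schedule length for processors and buses, and required memory over available memory for code and data memories. The paper does not spell out its exact definitions, so these formulas and the numbers are assumptions.

```python
def resource_utilization(busy_intervals, schedule_length):
    """Fraction of the schedule during which a processor or bus is busy."""
    busy = sum(end - start for start, end in busy_intervals)
    return busy / schedule_length

def memory_utilization(required, available):
    """Fraction of a code or data memory that the design actually needs."""
    return required / available

# Hypothetical numbers, only to show the intended use.
print(resource_utilization([(0, 3), (5, 9)], schedule_length=10))   # -> 0.7
print(memory_utilization(required=29, available=32))                # -> 0.90625
```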

Table 9: Interactive improvements of design 6 (cost, deadline, components, code and data memory utilization, CPU and bus/link utilization, and runtime in seconds for designs 6.1–6.3).

8. Conclusions and Future Work

CLASS copes with heterogeneous constraints in system level design. The main contribution is the inclusion of data memory constraints. The peak data memory usage varies greatly with the assignment and schedule. Incorporating timing constraints together with data memory constraints gives better propagation and better decisions during component selection, assignment and scheduling. We can cope with large examples using the developed heuristic. The presented synthesis system has been evaluated on large task graph examples. It has been shown that it provides good quality results while having acceptable runtimes. By providing more information from the designer, the design quality can be significantly improved. The future directions of this research work are to create better heuristics, to extend the model of the problem, and to add the possibility of pipelined designs, as indicated in [4, 13].

9. References

[1] S. Prakash and A. C. Parker, "Synthesis of Application-Specific Multiprocessor Systems Including Memory Components", VLSI Signal Processing, 1994.
[2] S. Prakash and A. C. Parker, "SOS: Synthesis of Application-Specific Heterogeneous Multiprocessor Systems", Parallel and Distributed Computing, 1992.
[3] K. Kuchcinski, "Embedded System Synthesis by Timing Constraint Solving", ISSS, 1997.
[4] K. Kuchcinski, "Integrated Resource Assignment and Scheduling of Task Graphs Using Finite Domain Constraints", DATE Conference, 1999.
[5] R. Beckmann and J. Herrmann, "Synthesis for General Purpose Computers by Use of Constraint Logic Programming", University of Dortmund, Department of Computer Science, Research Report 684, 1998.
[6] B. P. Dave, G. Lakshminarayana and N. K. Jha, "COSYN: Hardware-Software Co-Synthesis of Embedded Systems", DAC, 1997.
[7] B. P. Dave and N. K. Jha, "CASPER: Concurrent Hardware-Software Co-Synthesis of Hard Real-Time Aperiodic and Periodic Specifications of Embedded System Architecture", DATE Conference, 1998.
[8] A. Bender, "MILP Based Task Mapping for Heterogeneous Multiprocessor Systems", European Design Automation Conference with Euro-VHDL, 1996.
[9] J. Beck and D. P. Siewiorek, "Modeling Multicomputer Task Allocation as a Vector Packing Problem", International Symposium on System Synthesis, 1996.
[10] J. E. Beck and D. P. Siewiorek, "Automatic Configuration of Embedded Multicomputer Systems", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 2, Feb. 1998.
[11] T. Blickle, J. Teich and L. Thiele, "System-Level Synthesis Using Evolutionary Algorithms", Design Automation for Embedded Systems, 1998.
[12] M. Gasteier and M. Glesner, "Bus-Based Communication Synthesis on System Level", 9th International Symposium on System Synthesis, 1996.
[13] S. Bakshi and D. D. Gajski, "A Scheduling and Pipelining Algorithm for Hardware/Software Systems", 10th International Symposium on System Synthesis, 1997.
[14] C. Lee, M. Potkonjak and W. Wolf, "System-Level Synthesis of Application Specific Systems Using A* Search and Generalized Force-Directed Heuristics", 9th International Symposium on System Synthesis, 1996.
[15] R. P. Dick and N. K. Jha, "MOGAC: A Multiobjective Genetic Algorithm for Hardware-Software Cosynthesis of Distributed Embedded Systems", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 10, Oct. 1998.
[16] B. P. Dave and N. K. Jha, "COHRA: Hardware-Software Cosynthesis of Hierarchical Heterogeneous Distributed Embedded Systems", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, Oct. 1998.
[17] N. Beldiceanu, E. Bourreau, H. Simonis and P. Chan, "Partial Search Strategy in CHIP", presented at the 2nd Metaheuristics International Conference (MIC97), Sophia Antipolis, France, July 1997.
[18] CHIP System Documentation, COSYTEC, 1996.


More information

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao

Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Bus Encoding Technique for hierarchical memory system Anne Pratoomtong and Weiping Liao Abstract In microprocessor-based systems, data and address buses are the core of the interface between a microprocessor

More information

System partitioning. System functionality is implemented on system components ASICs, processors, memories, buses

System partitioning. System functionality is implemented on system components ASICs, processors, memories, buses System partitioning System functionality is implemented on system components ASICs, processors, memories, buses Two design tasks: Allocate system components or ASIC constraints Partition functionality

More information

Buffer Minimization in Pipelined SDF Scheduling on Multi-Core Platforms

Buffer Minimization in Pipelined SDF Scheduling on Multi-Core Platforms Buffer Minimization in Pipelined SDF Scheduling on Multi-Core Platforms Yuankai Chen and Hai Zhou Electrical Engineering and Computer Science, Northwestern University, U.S.A. Abstract With the increasing

More information

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering

Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Efficient Test Compaction for Combinational Circuits Based on Fault Detection Count-Directed Clustering Aiman El-Maleh, Saqib Khurshid King Fahd University of Petroleum and Minerals Dhahran, Saudi Arabia

More information

A New Exam Timetabling Algorithm

A New Exam Timetabling Algorithm A New Exam Timetabling Algorithm K.J. Batenburg W.J. Palenstijn Leiden Institute of Advanced Computer Science (LIACS), Universiteit Leiden P.O. Box 9512, 2300 RA Leiden, The Netherlands {kbatenbu, wpalenst}@math.leidenuniv.nl

More information

CELLULAR automata (CA) are mathematical models for

CELLULAR automata (CA) are mathematical models for 1 Cellular Learning Automata with Multiple Learning Automata in Each Cell and its Applications Hamid Beigy and M R Meybodi Abstract The cellular learning automata, which is a combination of cellular automata

More information

A Multiobjective Optimization Model for Exploring Multiprocessor Mappings of Process Networks

A Multiobjective Optimization Model for Exploring Multiprocessor Mappings of Process Networks A Multiobjective Optimization Model for Exploring Multiprocessor Mappings of Process Networks Cagkan Erbas Dept. of Computer Science University of Amsterdam Kruislaan 43, 198 SJ Amsterdam, The Netherlands

More information

Word-Level Equivalence Checking in Bit-Level Accuracy by Synthesizing Designs onto Identical Datapath

Word-Level Equivalence Checking in Bit-Level Accuracy by Synthesizing Designs onto Identical Datapath 972 PAPER Special Section on Formal Approach Word-Level Equivalence Checking in Bit-Level Accuracy by Synthesizing Designs onto Identical Datapath Tasuku NISHIHARA a), Member, Takeshi MATSUMOTO, and Masahiro

More information

Chapter 2 Designing Crossbar Based Systems

Chapter 2 Designing Crossbar Based Systems Chapter 2 Designing Crossbar Based Systems Over the last decade, the communication architecture of SoCs has evolved from single shared bus systems to multi-bus systems. Today, state-of-the-art bus based

More information

Mathematical Programming Formulations, Constraint Programming

Mathematical Programming Formulations, Constraint Programming Outline DM87 SCHEDULING, TIMETABLING AND ROUTING Lecture 3 Mathematical Programming Formulations, Constraint Programming 1. Special Purpose Algorithms 2. Constraint Programming Marco Chiarandini DM87 Scheduling,

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

Static Compaction Techniques to Control Scan Vector Power Dissipation

Static Compaction Techniques to Control Scan Vector Power Dissipation Static Compaction Techniques to Control Scan Vector Power Dissipation Ranganathan Sankaralingam, Rama Rao Oruganti, and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer

More information

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders 770 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 48, NO. 8, AUGUST 2001 FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders Hyeong-Ju

More information

Determination of the Minimum Break Point Set of Directional Relay Networks based on k-trees of the Network Graphs

Determination of the Minimum Break Point Set of Directional Relay Networks based on k-trees of the Network Graphs IEEE TPWRD 2011. THIS IS THE AUTHORS COPY. THE DEFINITIVE VERSION CAN BE FOUND AT IEEE. 1 Determination of the Minimum Break Point Set of Directional Relay Networks based on k-trees of the Network Graphs

More information

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup

A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington

More information

Introduction VLSI PHYSICAL DESIGN AUTOMATION

Introduction VLSI PHYSICAL DESIGN AUTOMATION VLSI PHYSICAL DESIGN AUTOMATION PROF. INDRANIL SENGUPTA DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Introduction Main steps in VLSI physical design 1. Partitioning and Floorplanning l 2. Placement 3.

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Hardware/Software Codesign

Hardware/Software Codesign Hardware/Software Codesign SS 2016 Prof. Dr. Christian Plessl High-Performance IT Systems group University of Paderborn Version 2.2.0 2016-04-08 how to design a "digital TV set top box" Motivating Example

More information

COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS

COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS COMPARATIVE STUDY OF CIRCUIT PARTITIONING ALGORITHMS Zoltan Baruch 1, Octavian Creţ 2, Kalman Pusztai 3 1 PhD, Lecturer, Technical University of Cluj-Napoca, Romania 2 Assistant, Technical University of

More information

Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems

Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Downloaded from orbit.dtu.dk on: Dec 16, 2017 Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Eles, Petru; Kuchcinski, Krzysztof; Peng, Zebo; Doboli, Alex; Pop,

More information

Efficient Symbolic Multi Objective Design Space Exploration

Efficient Symbolic Multi Objective Design Space Exploration This is the author s version of the work. The definitive work was published in Proceedings of the 3th Asia and South Pacific Design Automation Conference (ASP-DAC 2008), pp. 69-696, 2008. The work is supported

More information

Using Dynamic Voltage Scaling to Reduce the Configuration Energy of Run Time Reconfigurable Devices

Using Dynamic Voltage Scaling to Reduce the Configuration Energy of Run Time Reconfigurable Devices Using Dynamic Voltage Scaling to Reduce the Configuration Energy of Run Time Reconfigurable Devices Yang Qu 1, Juha-Pekka Soininen 1 and Jari Nurmi 2 1 Technical Research Centre of Finland (VTT), Kaitoväylä

More information

Lossless Compression using Efficient Encoding of Bitmasks

Lossless Compression using Efficient Encoding of Bitmasks Lossless Compression using Efficient Encoding of Bitmasks Chetan Murthy and Prabhat Mishra Department of Computer and Information Science and Engineering University of Florida, Gainesville, FL 326, USA

More information

Mapping pipeline skeletons onto heterogeneous platforms

Mapping pipeline skeletons onto heterogeneous platforms Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon January 2007 Yves.Robert@ens-lyon.fr January 2007 Mapping pipeline skeletons

More information