BRNO UNIVERSITY OF TECHNOLOGY Faculty of Information Technology Department of Computer Systems. Ing. Azeddien M. Sllame


DESIGN SPACE EXPLORATION OF HIGH-PERFORMANCE DIGITAL SYSTEMS
PROZKOUMÁVÁNÍ PROSTORU NÁVRHU PRO VYSOCE VÝKONNÉ ČÍSLICOVÉ SYSTÉMY
SHORT VERSION OF PHD THESIS
Study field: Information Technology
Supervisor: Doc. Ing. Vladimír Drábek, CSc.
Opponents: Prof. Ing. Jaromír Krejčíček, CSc.; Doc. Ing. Jiří Douša, CSc.; Doc. Ing. Karel Vlček, CSc.
Presentation date:

KEY WORDS
digital design, pipelining, synthesis, design space, module selection
KLÍČOVÁ SLOVA
návrh číslicových obvodů, zřetězení, syntéza, prostor návrhu, výběr modulů
PLACE OF DEPOSIT
Department of Computer Systems, FIT, Brno University of Technology (Ústav počítačových systémů FIT VUT v Brně)
Azeddien M. Sllame, 2003
ISBN
ISSN

CONTENTS
1 Introduction
2 The component-based approach
3 Thesis contributions
4 The approach
5 The results
5.1 A new list-based scheduling algorithm
5.2 New evolutionary-based module selection algorithms
5.2.1 Module selection algorithm without resource sharing
5.2.2 Module selection algorithm with resource sharing
5.3 New pipeline-scheduling algorithm
5.4 A design space exploration methodology
Reusable component model
Conclusions
Future work

Abstract
As digital systems become increasingly complex, a higher abstraction level is required to describe them. Consequently, searching the corresponding large design space in a manageable time and finding the best possible implementation efficiently is becoming a critical factor in the design process. The design space can be defined as a multidimensional space measured by different design characteristics such as performance, area and architecture style. A point in that space defines one possible implementation of a given design that exploits some design features. To manage recent advances in semiconductor technologies, which offer millions of transistors on a single chip, the design flow employed (after the system-level partitioning process) in current computer-aided design tools has evolved into three distinct phases: behavioral, logic and physical synthesis. The behavioral level takes as input the system blocks that are intended to be realized as hardware and that represent the most critical parts at system level. Here a block means a complex hardware component such as a discrete cosine transform cell, which is one of the building blocks of image processing systems. The component is constructed from a set of sub-components (we call them modules) such as adders and multipliers. Resource usage can be used to characterize the design space at this level, because the circuit objectives (area, delay) and the exploitation of design features such as performance or architecture style depend on resource usage. Therefore, in this thesis we propose an efficient design space exploration methodology based on a component point of view. The component is described behaviorally in VHDL and then, to reach the final implementation, the design process goes through architecture selection, scheduling, pipelining and module selection. As the design enters each phase, it is explored by a local exploration scheme incorporated within that phase.
Inclusion of architecture selection enables designers to allocate the proper modules to realize the design efficiently. Hence, a suitable design structure is assured, while pipelining at the functional level increases design performance. Moreover, module selection adds another level of exploration, which permits the use of slow (cheaper) modules on non-critical paths, while faster (more expensive) modules are used on critical paths and only when necessary. In addition, the pipelining and scheduling processes are supported by resource sharing to decrease the design cost whenever possible. In the scheduling phase, we have developed list-based scheduling algorithms that use different priority selection techniques for the nodes to be scheduled next. In the pipelining phase, these scheduling algorithms are extended to handle pipelining at the functional level of the component. Novel evolutionary-based module selection algorithms have been developed to further refine the design cost, either with or without resource sharing and with or without functional pipelining. The algorithms applied to solve the successive problems of the methodology form the basis of a prototyping tool that aims to support the design of high-performance digital systems. To illustrate the efficiency of the proposed methodology, the set of developed algorithms has been tested with standard benchmarks. Moreover, the assumptions needed to generalize the presented methodology to system-level designs are highlighted. Furthermore, a virtual component model is proposed in order to make the methodology useful to producers of IP cores.
Using the proposed methodology, which reflects the current state-of-the-art structure of behavioral synthesis, the designer can explore the design space by varying the design architecture, pipelining the design in different ways and into a different number of stages, selecting different modules configuration sets to implement the design, and applying resource sharing in different ways. At minimum, a 3D design space exploration is always guaranteed.

1 INTRODUCTION
Digital design can be defined as the process of converting an abstract specification of a system into a detailed implementation in a way that best satisfies the specified design constraints on performance, cost, power dissipation, testability and so on. Although the capabilities of current general-purpose processors allow most digital functions to be implemented as software (SW) programs, pure SW implementations of a system design are often too slow to meet the imposed performance constraints. Therefore, dedicated hardware (HW) chips are often needed to complement or assist the re-programmable components on certain performance-critical tasks. This approach gives the system behavior flexibility through SW reprogrammability, while reducing the size of the synthesis task by using application-specific chips only in the critical parts of the system. Thus, the final implementation of such systems always contains interacting HW cores and SW components, such as application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) and processors. In addition, the complexity of such systems and the short time-to-market require the use of automated techniques during specification, determination of the boundary between HW and SW components, synthesis of HW blocks, implementation of the HW/SW interface, and testing. As a result, such systems make the codesign of HW and SW a major topic in the design automation of embedded systems and impose the use of reusable cores in current design flows. System-on-a-chip (SOC) is a recent typical case of such a design paradigm.
To manage the recent advances in semiconductor technologies, which offer millions of transistors on a single chip, the design flow employed in current computer-aided design (CAD) tools has evolved into three distinct phases: behavioral, logic and physical synthesis. Each of these processes has its own design space, and the higher the process level, the larger the corresponding design space. Consequently, searching the corresponding large design space in a manageable time and finding the best possible implementation efficiently is becoming a critical factor in the design process. To achieve this goal, the design space needs to be properly characterized. Characterization is the process of identifying the most important features of the design in order to guide designers (or expert systems) in exploring the design space systematically. In this thesis, we are interested in behavioral synthesis only. The system-level design space can be characterized by the partitioning process and the system-level components, which include FPGAs, ASICs, digital signal processors (DSPs) and intellectual property (IP) cores. The design space size depends upon the selected components and the main system architecture as one complete unit. The behavioral level takes as input the system blocks that are intended to be realized as HW and that represent the most critical parts at system level. Specifically, designers at this level look for correct choices and efficient implementations of HW cores (components). Typically, at this level, each component is composed of a set of sub-components called modules. Different measures can be given to the size of the design space. Resource usage can be used to characterize the design space at this level, because the circuit objectives (area, delay) and the exploitation of design features such as performance or architecture style depend on resource usage. Therefore, for efficient design space exploration, the set of modules that makes up the component first needs to be scheduled in an efficient order. Second, the modules that compose the component must be selected in such a way that the component meets the imposed throughput requirements. Third, the area occupied by the component is minimized by assigning costly modules to critical paths and less costly modules to non-critical paths. Finally, modules need to be selected with efficient implementation styles (such as pipelining) to produce a high-performance system component. Consequently, any tool that is used to estimate a performance figure for a system component at this level must incorporate scheduling, module selection and structure style selection. In this thesis, we propose and describe an efficient design space exploration methodology oriented to realizing the high-performance HW components that are needed to support systems in their critical parts.
Since this methodology works at the behavioral level, we have decided to use the high-level synthesis (HLS) [Gajski94] structure, so that the results of the methodology can be integrated with any system-level design methodology and remain consistent with current trends in the digital design process.

2 THE COMPONENT-BASED APPROACH
The proposed methodology follows the component-based approach, a well-known approach that allows a natural way of problem decomposition and enhances component reusability. Fundamental principles of component reusability are discussed in [Keating99]. The reasons behind the use of the component-based approach include (but are not limited to) the following:
- It excludes the partitioning process complexity from the design path, which is still highly influenced by designers' knowledge. Consequently, this allows designers to concentrate on the component-based design space, which can be defined and explored according to component characteristics; hence, it enables managing system complexity.
- It reduces the design risks, since system integrators deal with verifiable and documented components, every component being created and tested separately. Moreover, any component can be redesigned alone (in the worst case) or replaced without affecting other system components (upgradability in the reuse context). Therefore it increases the designer's productivity and shortens the time-to-market.
- Describing the component at the behavioral level allows it to benefit from developments in the field of HLS and system synthesis. Furthermore, it enables switching the implementation from HW to SW if a new processor is capable of doing this.

3 THESIS CONTRIBUTIONS
The work presented in this thesis makes the following contributions:
- It discusses the reusability features of VHDL as a system design specification language.
- It proposes a new reusable virtual component model.
- It proposes an efficient scheduling algorithm that is useful for some classes of data flow graphs, such as those found in DSP applications.
- It proposes a scheduling algorithm for functional pipelining.
- It proposes novel evolutionary-based module selection algorithms: one for general solutions with no resource sharing, and another that considers resource sharing.
- It proposes a well-structured design space exploration methodology for high-performance, cost-efficient HW components.

4 THE APPROACH
The work presented in this thesis concentrates on design space exploration techniques and algorithms at the behavioral level, since the input specification of the HW component is described behaviorally using VHDL. The structure of the methodology reflects the state-of-the-art structure of the behavioral synthesis phase in the current digital design flow.
These algorithms and techniques are employed in a design space exploration methodology aimed at designing high-performance and cost-efficient reusable HW cores. Specifically, we target signal and image processing systems. In designing the methodology, we have followed the specify-explore-refine design paradigm [Gajski95], in which the component is described behaviorally in VHDL and then translated into an internal data representation, i.e. a data flow graph (DFG) structure that captures all control and data flow dependencies of the given behavioral description. To reach the final implementation, the design process goes through scheduling, pipelining and module selection. As the design enters each phase, it is explored by a local exploration scheme incorporated within that phase. In the scheduling phase, we have developed list-based scheduling algorithms that use different selection processes for the nodes to be scheduled next. In the pipelining phase, these scheduling algorithms are extended to handle pipelining at the functional level of the component. Moreover, the pipelining and scheduling processes are supported by resource sharing to decrease the design cost whenever possible. Novel evolutionary-based module selection algorithms have been developed to further refine the design cost, either with or without resource sharing and with or without pipelining.

5 THE RESULTS
5.1 A NEW LIST-BASED SCHEDULING ALGORITHM
List-based scheduling techniques [Gajski94] are adapted to HLS systems to solve the resource-constrained scheduling (RCS) problem. In the RCS problem we specify resource constraints for each operation type that exists in the DFG, and the objective of the employed algorithms is to minimize the total execution time. List scheduling processes the control steps (c-steps) sequentially and, at each c-step, subject to the resource constraints, tries to choose the best operations from all the candidate operations to place into the current c-step.
List scheduling uses a ready-list, which keeps all nodes whose predecessors have already been scheduled and is always sorted with respect to a priority function. The priority function resolves the resource contention among operations, i.e. operations with lower priority are deferred to the next or later c-steps. List-based scheduling algorithms depend predominantly on their priority function, and in some cases, especially in DSP-like algorithms (see Figure 1), the priority function produces equal priority values for some of the nodes in the ready-list. These equal priority values complicate the selection process, because they do not guide the scheduler to select the proper operation to be scheduled first in the current c-step.
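The ready-list mechanism described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis implementation: the function name `list_schedule`, the dictionary-based DFG encoding, the unit latency of every operation and the convention that a smaller priority value is scheduled first are all assumptions made for the example.

```python
from collections import defaultdict

def list_schedule(ops, resources, priority):
    """Resource-constrained list scheduling, assuming unit-latency operations.

    ops:       {name: (op_type, [predecessor names])}  (hypothetical encoding)
    resources: {op_type: units available in every c-step}, each at least 1
    priority:  {name: value}; a smaller value is scheduled first (e.g. mobility)
    Returns {name: c-step}, with c-steps numbered from 1.
    """
    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        step += 1
        # Ready-list: unscheduled operations whose predecessors were all
        # scheduled in an earlier c-step.
        ready = [n for n, (_, preds) in ops.items()
                 if n not in step_of
                 and all(step_of.get(p, step) < step for p in preds)]
        # Equal priorities fall back to the operation name, i.e. an arbitrary order.
        ready.sort(key=lambda n: (priority[n], n))
        used = defaultdict(int)
        for n in ready:
            op_type = ops[n][0]
            if used[op_type] < resources[op_type]:
                used[op_type] += 1
                step_of[n] = step
            # Otherwise the operation is deferred to a later c-step.
    return step_of
```

Calling `list_schedule(ops, {"*": 2, "+": 1}, priority)` with all priorities equal makes the name-based fallback decide every contention, which is the kind of uninformed choice the equal-priority case leads to.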

Such incorrect node ordering forces the scheduler to make decision errors, which translate into a sub-optimal schedule, i.e. a longer schedule (more c-steps) in the case of an RCS problem.

    A := (X1 * X2) + (X3 * X4);
    B := (X5 * X6) + (X7 * X8);
    C := (X1 * X4) + (X2 * X3);
    D := (X5 * X8) + (X6 * X7);
    F := A - C;
    G := B - D;

[Figure 1: (a) Code, (b) DFG, (c) Schedule with mobility, (d) Improved schedule.]

To overcome this problem, we have proposed a new list-based scheduling algorithm which exploits some inherent features of data-driven digital systems (i.e. signal and image processing systems). The DFGs of these kinds of algorithms exhibit regularity and symmetry, for example the butterfly computational structure found in the discrete cosine transform (DCT) algorithm, as well as the regular structures that form the essential computational cores of a wide variety of filtering algorithms. Regularity in a DFG means the existence of sub-graphs (sub-DFGs) called templates, which have multiple instances in the DFG. In other words, the DFG can be decomposed into several similar sub-DFGs that, when suitably replicated, form the complete DFG. By symmetry, we mean that the template uses a set of functionally equivalent operations.
The proposed scheduling algorithm starts with a preprocessing phase in which it reads the HDL description code (e.g. VHDL) and constructs the corresponding DFG structure. The data structure used in this phase stores all valuable data about the given design behavior. This information is included in the data structure of every node, so that the scheduler has more information about the node during the selection process (such as successor, predecessor, number of successors, tree-id(s), depth of tree, and other nodes contributing to the same successor). Therefore, after construction of the DFG data structure, every node knows its successors and its predecessors. The algorithm then chooses the nodes that have no successors (the last operations in the DFG) and constructs trees with these nodes as roots. Each tree is constructed in a simple way: starting from the root, each node passes a tree-id to its predecessors, and the distance (tree-depth) is accumulated with every node until an input operation is reached. This tree-like graph contains all nodes reachable from that root. Some nodes may therefore be included in more than one tree, which allows them to be considered as critical nodes or to be given the tree-id of the largest tree, since distances are accumulated from each tree root.
The scheduler uses another priority function (mobility in our case) as the main priority function to generate priority values for all operations in the ready-list. The ready-list is then sorted as follows: among the operations that have an equal main priority value (the same mobility), the scheduler selects those that belong to the same tree, i.e. those contributing to the same path, using the tree-id value stored in each node data structure. Then, among the operations that have the same tree-id value, the scheduler chooses those contributing to the same successor, i.e. the same subtree (same-successor). This simple technique is able to guide the scheduler to select the proper operation and to produce a correct schedule more quickly and efficiently than the approach described in [Govind97], which is based on graph clustering techniques. Table 1 presents the scheduling results for the DFG shown in Figure 1. The proposed algorithm supports variable execution times of functional units (multicycling, chaining) and the use of pipelined functional units.
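The tree construction and the three-level ordering of the ready-list can be illustrated with the following Python sketch. It is a simplified reading of the description above rather than the thesis code: the names `build_trees` and `ready_key`, the successor-dictionary encoding, the use of node count to decide which tree is "largest", and the static sort key that merely groups operations of the same tree and the same successor are assumptions.

```python
def build_trees(succs):
    """Assign a tree-id and a depth to every node of a DFG.

    succs: {node: [successor nodes]}; every node must appear as a key.
    Returns (tree_id, depth): tree_id[n] is the root of the tree that n is
    assigned to, depth[n] is the accumulated distance from n to that root.
    """
    preds = {n: [] for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)
    roots = [n for n, ss in succs.items() if not ss]   # last operations

    reach = {}                                          # root -> {node: depth}
    for r in roots:
        depth = {r: 0}
        stack = [r]
        while stack:                                    # walk towards the inputs
            n = stack.pop()
            for p in preds[n]:
                if depth.get(p, -1) < depth[n] + 1:
                    depth[p] = depth[n] + 1
                    stack.append(p)
        reach[r] = depth

    tree_id, depth_of = {}, {}
    # Smaller trees are written first, so a node shared by several trees
    # ends up with the id of the largest one (assumed: most nodes).
    for r in sorted(roots, key=lambda root: len(reach[root])):
        for n, d in reach[r].items():
            tree_id[n], depth_of[n] = r, d
    return tree_id, depth_of


def ready_key(n, mobility, tree_id, succs):
    """Sort key: mobility first, then tree-id, then common successor."""
    return (mobility[n], str(tree_id[n]), sorted(map(str, succs[n])))
```

For a DFG shaped like the one in Figure 1 (eight multiplications feeding A, B, C, D, which in turn feed F and G), sorting the eight multiplications with `ready_key` places the four that belong to the tree rooted at F next to each other, pair by pair, before those of the tree rooted at G.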
The main results of the presented list-based scheduling algorithm are:
- The schedules produced by the proposed algorithm are always structured in such a way that all operations which contribute to the same path are scheduled as close together as possible, respecting the availability of resources.
- Results of the new algorithm are given in comparison with other well-known algorithms. In the worst case, the algorithm produces schedules that are similar to those obtained using mobility alone as a priority function.
- Different variants of the algorithm have been developed.
- The proposed algorithm enhances the design space exploration process at the scheduling level significantly, since it is able to produce optimal schedules for a set of DFGs such as the one illustrated in Figure 1.
- Finally, the approach demonstrates how application-specific synthesis can benefit from exploiting the underlying structure of the DFG being synthesized: the more information about the underlying DFG structure of the given design behavior is incorporated, the more accurate and optimal or near-optimal the scheduling results are.

Table 1: Results of the DFG illustrated in Figure 1
Res. set (+ - *) | No. of c-steps: List-based (mobility) | Kollig's algorithm (results taken from [Govind97]) | List-based (new approach) | Optimal

5.2 NEW EVOLUTIONARY-BASED MODULE SELECTION ALGORITHMS
The module selection problem is an optimization problem and can be formulated using different optimization methods [Eles98], [Gajski94], or it can be solved using heuristic techniques. Evolutionary algorithms have, in recent years, been successfully applied to optimization [Back96]. They are inspired by and based upon evolution in nature, and they consider a large collection of solutions at once instead of working with one solution at a time in the search space. We have defined the module selection process as a performance-driven problem, since we perform module selection on ready schedules produced by RCS-type algorithms, such as those described in section 5.1. Therefore, the objective of both proposed algorithms is to search for the modules configuration set which has the minimum implementation cost (design area) for the specified design delay. The design cost (area) is estimated only through the cost of the modules in the modules configuration set, and the performance of the final design is measured as the total design delay of the produced implementation. A real component library (CL) is employed with the proposed algorithms. The CL contains different alternative implementations for each resource type, characterized by different area and latency estimates. The term modules configuration set means the complete set of modules that are selected from the CL to implement the design schedule such that it satisfies the required design delay. This set may include no instance, one instance or many instances of any module that exists in the CL.

The algorithms are designed to produce the upper and lower bounds of the design cost in the initial population. The upper-bound design is constructed from the fastest (most expensive) modules, while the lower-bound design is constructed from the slowest (cheapest) modules. This allows designers to explore the design space in between and shows them the size of the design space of the design under development. One-point crossover is applied with probability p_c = 60 %. If crossover is not used, two randomly selected genes are mutated per chromosome. Both operators produce implementations that are correct with respect to the schedule. In addition, tournament selection with base 2 and elitism are employed. The initial population is generated from a 50:50 combination of the fastest and the slowest alternatives, as we have found from experiments that this combination yields better convergence than starting from the fastest combination or from the slowest one alone. The fitness function assigns higher values to chromosomes that exhibit a design delay (L) equal to the required design delay (RL) and that minimize the area (number of gates) (A) needed for the implementation:

\[ \text{Fitness value} = \begin{cases} 1 & \text{if } L > RL, \\ MAX - A - 5\,|L - RL| & \text{otherwise,} \end{cases} \tag{1} \]

where MAX is a sufficiently high value. The design delay L is calculated as the sum of the latencies l_i of the slowest modules in each scheduled c-step used in the chromosome; that is, the module with the maximum latency (the slowest) determines the delay of the corresponding c-step in the schedule:

\[ L = \sum_{i=1}^{n} l_i \tag{2} \]

where n is the number of scheduled c-steps.

5.2.1 MODULE SELECTION ALGORITHM WITHOUT RESOURCE SHARING
This algorithm provides a general solution to the module selection process in such a way that it produces implementations which have the minimum design cost while meeting RL, with no resource sharing. The algorithm starts by reading the following inputs: (i) an initial schedule produced by an RCS-type algorithm, e.g. the list-based scheduler described in section 5.1; (ii) the required delay RL for the final implementation; and (iii) the CL, which is used by the algorithm to search for the best modules configuration set. The algorithm outputs a modules configuration set whose area A is estimated from all the resources of the produced implementation, while the design delay L is calculated according to formula (2).
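The delay and fitness evaluation and the two variation operators can be sketched as follows. This is a hedged illustration only: the chromosome encoding (one module name per scheduled operation), the function names, the value chosen for `MAX` and the `alternatives` table are assumptions, and the fitness expression follows formula (1) as written above, where the subtraction signs and the absolute value are a reading of the printed formula.

```python
import random

MAX = 10**6  # stands for the "sufficiently high value"; the magnitude is assumed

def design_delay(chromosome, schedule, latency):
    """Formula (2): sum over c-steps of the slowest module used in each step.

    chromosome: list of module names, one gene per scheduled operation
    schedule:   schedule[k] is the c-step of the operation encoded by gene k
    latency:    {module name: latency}
    """
    slowest = {}
    for module, step in zip(chromosome, schedule):
        slowest[step] = max(slowest.get(step, 0), latency[module])
    return sum(slowest.values())

def fitness(chromosome, schedule, latency, area, required_delay):
    """Formula (1), evaluated without resource sharing."""
    L = design_delay(chromosome, schedule, latency)
    if L > required_delay:
        return 1
    A = sum(area[m] for m in chromosome)   # one module instance per operation
    return MAX - A - 5 * abs(L - required_delay)

def vary(parent_a, parent_b, alternatives, p_c=0.6, rng=random):
    """One-point crossover with probability p_c, otherwise mutate two genes.

    alternatives[k] lists the library modules able to implement gene k, so
    both operators keep the child consistent with the schedule.
    """
    child = list(parent_a)
    if rng.random() < p_c:
        cut = rng.randrange(1, len(child))
        child[cut:] = parent_b[cut:]
    else:
        for k in rng.sample(range(len(child)), 2):
            child[k] = rng.choice(alternatives[k])
    return child
```

With a hypothetical library in which a fast multiplier costs more area than a slow one, a chromosome that mixes fast and slow modules so that L equals RL scores higher than the all-fast chromosome, which wastes area, and than the all-slow chromosome, which violates RL and receives the fitness value 1.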

5.2.2 MODULE SELECTION ALGORITHM WITH RESOURCE SHARING
This algorithm provides a solution to the module selection problem with resource sharing. Resource sharing is always employed in HLS systems to reduce the design cost as much as possible, provided that performance and other design constraints can be satisfied. Different datapath operations can share the same resource if they are not executed during the same clock cycle. The main advantage of the second algorithm over the first is its ability to evolve both the module types taken from the CL and their exact positions in the final schedule. In other words, it can decide automatically which of the already selected modules will implement an operation in the schedule when a given c-step needs fewer modules than are available. To do that, the algorithm requires as an input the number of modules of each type that will appear in the modules configuration set of the implementation, in addition to the inputs specified above for the first algorithm. The term resource set, written for example as resource set (+2, *3), means that the maximum allowable number of adders and multipliers in every c-step of the given schedule is 2 adders and 3 multipliers. The resource set is used by RCS algorithms and represents the maximum number of modules of each type that will appear in the final implementation when resource sharing is employed. In this case, the total implementation area A is the sum of the areas of the modules in the modules configuration set, which reflects the numbers given in the schedule resource set, while the design delay L is calculated according to formula (2).
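The effect of resource sharing on the area estimate can be made concrete with a small Python sketch. The helper names, the schedule encoding and the module names in the usage note are hypothetical; the sketch only checks a schedule against a resource set and compares the two area estimates described in sections 5.2.1 and 5.2.2.

```python
from collections import Counter

def peak_usage(schedule):
    """Largest number of operations of each type found in any single c-step.

    schedule: {operation name: (op_type, c-step)}  (hypothetical encoding)
    """
    per_step = Counter((step, op_type) for op_type, step in schedule.values())
    peak = Counter()
    for (step, op_type), count in per_step.items():
        peak[op_type] = max(peak[op_type], count)
    return dict(peak)

def respects_resource_set(schedule, resource_set):
    """True if no c-step needs more modules of a type than the set allows."""
    return all(count <= resource_set.get(op_type, 0)
               for op_type, count in peak_usage(schedule).items())

def shared_area(config_set, area):
    """With sharing: each module of the configuration set is counted once."""
    return sum(area[m] for m in config_set)

def unshared_area(binding, area):
    """Without sharing: one module instance per operation.

    binding: {operation name: module name}
    """
    return sum(area[m] for m in binding.values())
```

For a schedule with four multiplications and two additions that never uses more than two multipliers and one adder in a c-step, resource set (+1, *2) is respected, and `shared_area` sums three modules where `unshared_area` sums six.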
The main results of both evolutionary-based algorithms are:
- As the number of modules in the CL was increased, the design space tended to become larger and the possibility of producing high-quality designs in terms of design cost increased, since adding more modules to the CL increased the possibility of making a proper tradeoff during module selection.
- The cheapest designs are those obtained by using a complete CL and including resource sharing in the design process.
- The obtained results clearly demonstrate the suitability of evolutionary algorithms for solving the module selection problem in the HLS process.

5.3 NEW PIPELINE-SCHEDULING ALGORITHM
A pipeline-scheduling algorithm has been developed on the basis of the list-based scheduling algorithm described in section 5.1. The input to the algorithm consists of the DFG, the CL, the clock cycle, the pipe stage delay and the design constraints, specified either as a required resource set or as a data introduction interval (DII). The output of the algorithm consists of a mapped and partitioned DFG, where each node is mapped to a module of the corresponding type and the DFG is partitioned into the minimum number of pipe stages, each with a delay no larger than the specified pipe stage delay. Time constraints in this algorithm are specified as a constraint on the DII, while the resource set represents the design area constraints. The proposed algorithm has two different pipelining strategies, forward scheduling and backward scheduling, each with a different priority function. The scheduling priority of operations used with the backward approach is based on urgency measures of operations. Urgency is based on the critical paths starting from each node, i.e. on the length of the computation path from the node (inclusive) toward the DFG input nodes, since the selection of the modules configuration set is made before the pipelining process [Park88]. The priority function of the forward approach is based on the graph construction technique presented in section 5.1. The algorithm is supported by a function that uses a real CL to choose a proper modules configuration set which is able to perform the DFG under the specified pipe stage delay. Following this, the pipelining (forward or backward) and scheduling iteration is performed, which partitions the DFG into stages, each having at most the specified pipe stage delay. Scheduling is performed concurrently with the help of an allocation table. The scheduling process uses the following rule: schedule the current node (which is selected from the ready-list) in the current pipe stage if adding its latency does not violate the pipe stage delay and if a free resource is available to execute it without any resource conflicts with other nodes in concurrently running stages.
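The scheduling rule can be sketched for the forward strategy as follows. This is a minimal sketch under stated assumptions, not the algorithm of the thesis: the function name `pipeline_schedule`, the measurement of the DII in pipe stages, the reading that two stages run concurrently when their indices are equal modulo the DII, and the pre-sorted `order` argument standing in for the ready-list priority are all assumptions.

```python
from collections import defaultdict

def pipeline_schedule(order, ops, latency, stage_delay, dii, resource_set):
    """Partition a DFG into pipe stages (forward strategy, chaining allowed).

    order:        operation names in priority order, predecessors first
    ops:          {name: (op_type, [predecessor names])}
    latency:      {name: delay of the module selected for the operation}
    stage_delay:  maximum delay of one pipe stage
    dii:          data introduction interval, counted in stages (assumed)
    resource_set: {op_type: number of modules available}
    Returns {name: stage}, with stages numbered from 0.
    """
    stage_of, finish = {}, {}
    alloc = defaultdict(int)  # allocation table: (stage mod dii, type) -> modules in use
    for n in order:
        op_type, preds = ops[n]
        if latency[n] > stage_delay:
            raise ValueError(f"{n} does not fit into one pipe stage")
        if all(alloc[(s, op_type)] >= resource_set[op_type] for s in range(dii)):
            raise ValueError(f"no free {op_type} module left for {n}")
        stage = max((stage_of[p] for p in preds), default=0)
        while True:
            # Chaining: start after the predecessors placed in the same stage.
            start = max((finish[p] for p in preds if stage_of[p] == stage), default=0)
            fits = start + latency[n] <= stage_delay
            free = alloc[(stage % dii, op_type)] < resource_set[op_type]
            if fits and free:
                break
            stage += 1  # otherwise try the next pipe stage
        stage_of[n], finish[n] = stage, start + latency[n]
        alloc[(stage % dii, op_type)] += 1
    return stage_of
```

Lowering the DII or shrinking the resource set in this sketch either pushes operations into later stages or makes the request infeasible, which mirrors the area-performance tradeoff that the DII and resource set constraints express.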
The pipelining and scheduling iteration is repeated using that rule until the end of all DFG nodes. The main result of the presented pipeline-scheduling algorithm is: The choice between doing forward / backward pipelining and resource sharing combined with clock cycle selection, pipe stage delay determination and module selection allow designers to make efficient area-performance tradeoffs by using the different strategies employed in the flexible algorithm procedure. 5.4 A DESIGN SPACE EXPLORATION METHODOLOGY The possible design space boundaries of any design are depicted in Figure 2. This figure illustrates the tradeoff process, which is governed by either maximum allowed design cost or minimum required performance. Exploring such a large 14

15 design space randomly takes up a lot of a designer s time and will produce inefficient designs. Expensive Possible design space A Max. allowed cost C Design area D Feasible design space with constraints Cheap B Slow Design delay Min. required performance Fast Figure 2: Design space boundaries: A: The fastest design; B: The cheapest design; C: The fastest design within cost constraint; and D: The cheapest design satisfying minimum required performance. However, we propose a design space exploration methodology which will, at minimum, operate in a 3D space. The individual algorithms that construct different phases of the methodology have been described in previous sections; here we are doing the unifying process. The intention of the methodology is to explore the design space systematically with respect to different design constraints. As a result, fast and sufficiently accurate statements concerning a possible implementation can be obtained. Moreover, using the proposed techniques, designers are guided toward the next steps without making bad design decisions. We assume that the system specification is first spatially partitioned into HW blocks and SW components and a HW implementation is required for each of the HW blocks. After this, the process represented by our design space exploration methodology is started for each system HW oriented block. The methodology is constructed from one preliminary step and three main steps. In preliminary step, designers explore the component s initial specification by altering the component s VHDL construction and verify this by using a test bench until the specification which best captures the component behavior is found. The initial specification is then translated into an internal data representation i.e. DFG structure. Thus, other phases of the methodology can operate on such a DFG. For 15

example, scheduling partitions the DFG into sub-DFGs so that each sub-DFG is executed in one c-step. The three main steps are scheduling, pipelining and module selection. Scheduling is considered as a trivial design space. Pipelining is used to seek high-performance implementations, while module selection is applied to reduce the implementation cost. The outline structure of the methodology is given in Figure 3.

[Figure 3: block diagram. The initial specification of the HW component feeds three blocks. Scheduling: support of allocation with scheduling, resource-constraints, time-constraint, structural pipelining and multicycled operations. Pipelining: support of allocation with scheduling, forward pipelining, backward pipelining and resource sharing. Module selection: evolutionary approach is used, support of resource sharing. Outputs: implementations with pipelining and scheduling, implementations with pipelining and module selection, and implementations with scheduling and module selection.]

Figure 3: Conceptual structure of the design methodology.

The designer using the proposed methodology can explore the design space in one or a combination of four approaches: (1) Varying the architecture of the design and changing the corresponding resource set; (2) Selecting different modules configuration sets to implement the design; (3) Pipelining the design in different ways and into a different number of stages with different modules configuration sets; and (4) Sharing resources in different ways.

Scheduling phase

The methodology, as seen in Figure 3, contains a set of scheduling algorithms, which can start to explore the selected VHDL behavior of the HW component specification either from the time axis (starting from point D of Figure 2) or from

the area axis (ending at point C of Figure 2) of the prescribed trivial design space by using TCS or RCS approaches respectively. The set of scheduling algorithms integrated in the first phase of the methodology includes: as soon as possible (ASAP), as late as possible (ALAP), force-directed scheduling (FDS) [Paulin89b], the static-list scheduling algorithm and the list-based scheduling algorithm. The FDS algorithm is included for comparison purposes only.

The list-based scheduling algorithm was created in five different variants, each with a distinct priority function. The priority functions associated with the list-based scheduler are mobility alone, number-of-successors alone, mobility+number-of-successors, mobility+tree-structuring, and mobility+treeid+same_successor. The purpose of using different priority functions with a list-based scheduling algorithm is to further explore the underlying structure of the schedules produced by each selection method employed in those priority functions. For instance, the priority functions mobility+tree-structuring and mobility+treeid+same_successor are able to produce efficient structured schedules, as has been explained in section 5.1. The employed algorithms support pipelined functional units, multicycling and resource sharing.

At this phase, the resource set size has a large impact on the scheduling results. The larger the resource set, the more parallel execution of operations can be exploited, so that higher performance can be achieved at the expense of higher area cost. By adjusting the design constraints and the resource set, designers at this level can quickly evaluate multiple implementation alternatives with different scheduling algorithms.
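The effect of the resource set on the schedule length can be illustrated with a minimal Python sketch. It is not the thesis implementation: the DFG below is hypothetical, every operation is assumed to take exactly one c-step, and only the "mobility alone" priority function is modeled (mobility = ALAP step minus ASAP step).

```python
from collections import defaultdict

# Hypothetical DFG: node -> (operation type, list of predecessor nodes).
# Assumption: every operation takes exactly one c-step.
DFG = {
    "m1": ("*", []),
    "m2": ("*", []),
    "m3": ("*", []),
    "a1": ("+", ["m1", "m2"]),
    "a2": ("+", ["a1", "m3"]),
}

def asap(dfg):
    """Earliest c-step (1-based) of each node with unlimited resources."""
    step = {}
    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in dfg[n][1]), default=0)
        return step[n]
    for n in dfg:
        visit(n)
    return step

def alap(dfg, latency):
    """Latest c-step of each node such that the schedule ends by `latency`."""
    succs = defaultdict(list)
    for n, (_, preds) in dfg.items():
        for p in preds:
            succs[p].append(n)
    step = {}
    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succs[n]), default=latency + 1) - 1
        return step[n]
    for n in dfg:
        visit(n)
    return step

def list_schedule(dfg, resources):
    """Resource-constrained list scheduling; lower mobility = higher priority."""
    missing = {op for op, _ in dfg.values() if resources.get(op, 0) < 1}
    if missing:
        raise ValueError(f"no module allocated for: {sorted(missing)}")
    early = asap(dfg)
    late = alap(dfg, max(early.values()))
    mobility = {n: late[n] - early[n] for n in dfg}
    schedule, cstep = {}, 0
    while len(schedule) < len(dfg):
        cstep += 1
        free = dict(resources)
        # Ready nodes: unscheduled, with all predecessors in earlier c-steps.
        ready = [n for n in dfg if n not in schedule
                 and all(schedule.get(p, cstep) < cstep for p in dfg[n][1])]
        for n in sorted(ready, key=lambda n: (mobility[n], n)):
            op = dfg[n][0]
            if free[op] > 0:
                free[op] -= 1
                schedule[n] = cstep
    return schedule
```

With one multiplier and one adder, `list_schedule(DFG, {"*": 1, "+": 1})` needs four c-steps on this toy graph, while allocating a second multiplier brings it down to the three c-steps of the ASAP schedule, at the price of a larger resource set.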
For example, ASAP/ALAP schedules are used to define the upper bound of the design cost in the design space exploration process, point A in Figure 2, while a list-based schedule with one module for each operation type produces a lower bound for the design cost, point B in Figure 2. The output of this exploration step is a set of tables comparing the results of distinct allocated resource sets with different scheduling algorithms. Hence, designers can select those schedules which satisfy the design objective function for further module selection or pipelining exploration.

Pipelining phase

Pipelining algorithms are developed as extensions to the scheduling algorithms described above in such a way that all the algorithms support the design process with or without pipelining, as described in section 5.2. For every algorithm, forward and backward pipelining strategies are incorporated, each applicable with time-constrained and resource-constrained pipelining. Resource sharing is supported in order to allow designers to reduce the design cost, while the pipelining algorithms allow execution overlap. Moreover, the module selection

process is still applicable with pipelining, either by using the largest stage as a design input to the module selection algorithms or by doing a simple preselection phase in which a function lists all modules from CL that can work correctly with the defined clock cycle and the corresponding specified pipe stage delay. A local exploration scheme at this phase is provided by varying the modules configuration set, clock cycle, pipe stage delay, resource sharing and the DII value. The result of this exploration step is a set of schedules, each with a different pipe stage delay and each corresponding to a different modules configuration set with a distinct DII value.

Module selection phase

Evolutionary-based algorithms, as described in section 5.3, are used in the proposed methodology to perform module selection with or without resource sharing. This phase uses initial schedules as inputs, which were selected by designers from the pipelined or nonpipelined schedules produced by the previous two phases. Then, a local design exploration scheme is employed to evaluate a large number of implementations by varying the required design delay and using the module selection process with or without resource sharing to find the best modules configuration set that satisfies the specified design delay. The result of this exploration step is a set of implementations, each with a distinct design delay and each corresponding to a different modules configuration set. The clock cycle, which will drive the selected modules configuration set, can be selected using the technique described in [Chaudh97] or by our clock cycle exploration scheme, which is guided automatically by the latencies of the elements of the selected modules configuration set.
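The goal of the module selection step, finding the cheapest modules configuration set that satisfies a required design delay, can be sketched in Python as follows. This is a toy model under loud assumptions: the component library entries, areas and latencies are invented for illustration, every operation is assumed to occupy one c-step so that the slowest selected module sets the clock period, and a plain exhaustive search stands in for the evolutionary algorithms of section 5.3.

```python
from itertools import product

# Hypothetical component library (CL):
# op type -> [(module name, area in gates, latency in ns)]
CL = {
    "+": [("ripple_add", 100, 20.0), ("cla_add", 180, 10.0)],
    "*": [("array_mul", 900, 40.0), ("wallace_mul", 1500, 25.0)],
}

def select_modules(resource_set, num_csteps, required_delay_ns, library=CL):
    """Return (area, delay, configuration) of the cheapest configuration
    meeting the delay bound, or None if no configuration meets it.

    resource_set maps each op type to the number of allocated modules."""
    op_types = sorted(resource_set)
    best = None
    for choice in product(*(library[op] for op in op_types)):
        # Toy timing model: one c-step per operation, so the clock period
        # is set by the slowest selected module.
        clock = max(latency for _, _, latency in choice)
        delay = num_csteps * clock
        area = sum(resource_set[op] * mod_area
                   for op, (_, mod_area, _) in zip(op_types, choice))
        if delay <= required_delay_ns:
            config = {op: name for op, (name, _, _) in zip(op_types, choice)}
            if best is None or area < best[0]:
                best = (area, delay, config)
    return best
```

Sweeping `required_delay_ns` over a range of values and recording the returned area gives one area versus delay curve for a fixed architecture, which is the kind of local exploration described above. An exhaustive search only scales to tiny libraries, which is one reason a heuristic such as an evolutionary algorithm is attractive for realistic ones.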
The benefit of postponing the clock selection process until after the scheduling and module selection processes is that more modules can be used during the module selection phase, rather than constraining the module selection process to only the few candidates which agree with an a priori selected clock cycle. In other words, selecting the clock cycle before the module selection process restricts the design space to only a small subset of modules, which in turn creates the possibility of producing inefficient designs. If we select the clock cycle after the module selection phase, we can find an efficient clock cycle that is able to utilize the chosen subset of modules that already satisfies the resource-constraints and time-constraints, which are the main design goals in the design process.
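A clock cycle exploration guided by the latencies of the already selected modules could look like the following Python sketch. It is only an assumption about how such a scheme might rank candidates, not the scheme of the thesis or of [Chaudh97]: each module latency is tried as a candidate clock period, slower modules are multicycled, and candidates are ranked by the total slack they leave. The module names and latencies are hypothetical.

```python
def explore_clock(latencies_ns, candidates_ns=None):
    """Rank candidate clock periods by total slack over the selected modules.

    latencies_ns maps module name -> latency as an integer number of ns.
    Returns a list of (total slack, clock period, cycles per module),
    best candidate first."""
    if candidates_ns is None:
        # Assumption: try each selected module latency as a clock period.
        candidates_ns = sorted(set(latencies_ns.values()))
    ranked = []
    for clk in candidates_ns:
        # Multicycling: a module occupies ceil(latency / clk) clock cycles.
        cycles = {m: -(-lat // clk) for m, lat in latencies_ns.items()}
        slack = sum(cycles[m] * clk - lat for m, lat in latencies_ns.items())
        ranked.append((slack, clk, cycles))
    # Prefer less wasted time; break ties in favor of the longer period.
    ranked.sort(key=lambda r: (r[0], -r[1]))
    return ranked
```

For a selected set such as `{"cla_add": 10, "wallace_mul": 25}`, a 10 ns clock with a three-cycle multiplier wastes 5 ns in total, whereas a 25 ns clock wastes 15 ns on the adder. Slack alone favors short periods, so a fuller model would also weigh the extra control steps that a short clock implies.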

Some illustrative results

Figure 4 describes an experiment carried out to demonstrate the efficiency and quality of the designs produced by the proposed methodology for the DCT benchmark. In this experiment, the design space was explored by using three different architectures. Then, for every architecture, a module selection exploration was performed with and without resource sharing.

Figure 5 describes an experiment carried out with the finite impulse response filter (FIR) benchmark to point out the capability of combining module selection based on an evolutionary approach with a pipelining exploration process. This experiment was executed by first producing a pipelined schedule with the pipelining process and then running the module selection process using only the modules of the largest pipe stage. The design space was explored for three architectures, and for each architecture the pipe stage delay was set to the delay of one multiplier or of one multiplier plus one adder.

Figure 6 shows an experiment performed with the second order differential equation solver (Diffeq) benchmark to show the process of combining different exploration paths in one exploration figure for comparison. In this experiment, the design space is explored by using three different architectures. Then, for every architecture, the design is explored by performing module selection without resource sharing, module selection with resource sharing, and pipelining + module selection with resource sharing. Figure 6 demonstrates that the high-performance and cost-efficient designs are those produced by pipelining plus module selection with resource sharing.

Exploration time

Design exploration time is a very important factor in any design process. In our presented methodology, each schedule for each architecture, with or without pipelining, is produced in less than a second using list-based scheduling algorithms.
Therefore, the designer can explore any architecture with pipelining in a few seconds. Exploration times of the module selection process for different benchmarks are reported in Table 2. Consider the exploration time (see Table 2) of the module selection process for the largest benchmark used in the presented experiments, i.e., the DCT benchmark, neglecting the tuning process of the evolutionary algorithms. The worst CPU time (Pentium III 700 MHz machine with 128 MB RAM) for the ten runs for each design point of the DCT (+2, *3) design (without resource sharing) was about 110 seconds (i.e., each design point could be obtained within 11 seconds), while it was 101 seconds for the DCT (+3, *2) design and 79 seconds for the DCT (+4, *3) design. However, if we consider the DCT (+2, *3)

design with a single run, exploring a design space curve with 16 implementations will take less than 3 minutes to complete. Consequently, if we consider that the design point could be obtained within 11 seconds on average for the DCT benchmark, the designer can explore the design space shown in Figure 4 within 19 minutes, which is a reasonable time for such a large benchmark. Other benchmark exploration times are reported in Table 2.

Table 2: Exploration time for the module selection process with/without resource sharing (Pop. size: population size; Exe. time: execution time in seconds; EWF: fifth order elliptic wave filter benchmark). No. of runs = 10.

Columns: Design name; then, for the module selection without resource sharing algorithm and for the module selection with resource sharing algorithm respectively: Pop. size, Maximum number of generations, Exe. time (worst case).

Rows (design names): DCT (+2, *3); DCT (+3, *2); DCT (+4, *3); EWF (+2, *1); EWF (+2, *2); EWF (+3, *2); FIR (+2, *1); FIR (+3, *2); FIR (+5, *3); Diffeq (+1, -1, 1<>, *1); Diffeq (+1, -1, 1<>, *2); Diffeq (+1, -1, 1<>, *3).
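The per-point and per-curve times quoted in the text for the DCT benchmark follow from simple arithmetic, reproduced here as a short Python check. Only the figures stated in the text are used (worst-case seconds for ten runs, 16 implementations per curve); the variable names are illustrative.

```python
RUNS_PER_POINT = 10

# Worst-case CPU time in seconds for ten runs of one design point
# (module selection without resource sharing), as quoted in the text.
worst_ten_runs_s = {"DCT (+2, *3)": 110, "DCT (+3, *2)": 101, "DCT (+4, *3)": 79}

# Time for a single run of one design point.
per_point_s = {name: total / RUNS_PER_POINT
               for name, total in worst_ten_runs_s.items()}

# One design space curve with 16 implementations, single run per point.
curve_s = 16 * per_point_s["DCT (+2, *3)"]
curve_min = curve_s / 60
```

Here `per_point_s["DCT (+2, *3)"]` is 11 seconds and `curve_s` is 176 seconds, which is just under the 3 minutes stated for one curve.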

[Figure 4: plot of design area (gates) versus design delay (ns) for the DCT (+3, *2), DCT (+2, *3) and DCT (+4, *3) architectures, each with module selection with resource sharing and module selection without resource sharing.]

Figure 4: Module selection design space exploration process for DCT benchmark.

[Figure 5: plot of design area (gates) versus pipe stage delay (ns) (1/throughput) for the FIR (+5, *3), FIR (+3, *2) and FIR (+2, *1) architectures, each with pipe stage delay = multiplier latency and with pipe stage delay = multiplier + adder latency.]

Figure 5: Pipelining + module selection design space exploration process for FIR benchmark.
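When area versus delay curves from several exploration paths are combined, as in these figures, a designer typically keeps only the non-dominated design points and then applies the constraints of Figure 2. The Python sketch below shows one way to do this; the design points are invented placeholders, not measurements from the thesis, and the function names are hypothetical.

```python
# Hypothetical design points: (area in gates, delay in ns, label).
POINTS = [
    (3000, 400, "P1"),
    (3500, 300, "P2"),
    (4200, 310, "P3"),
    (5000, 200, "P4"),
    (5200, 260, "P5"),
]

def pareto_front(points):
    """Keep the points that no other point beats in both area and delay."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

def cheapest_meeting_delay(points, max_delay):
    """Point D of Figure 2: cheapest design meeting the required performance."""
    feasible = [p for p in points if p[1] <= max_delay]
    return min(feasible) if feasible else None

def fastest_within_cost(points, max_area):
    """Point C of Figure 2: fastest design within the allowed cost."""
    feasible = [p for p in points if p[0] <= max_area]
    return min(feasible, key=lambda p: (p[1], p[0])) if feasible else None
```

On the placeholder data, P3 and P5 are dominated and drop out of the front, while a 300 ns delay bound and a 4500 gate cost bound both select P2.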

[Figure 6: plot of design area (gates) versus design delay (ns) for the Diffeq (+1, -1, 1<>, *1), Diffeq (+1, -1, 1<>, *2) and Diffeq (+1, -1, 1<>, *3) architectures. Curves: pipelined designs with DII=6, DII=3 and DII=2 respectively; module selection with resource sharing; and module selection without resource sharing.]

Figure 6: Diffeq benchmark design space exploration: pipelining + module selection with resource sharing, module selection only with resource sharing, module selection only without resource sharing.

DISCUSSION: APPLICABILITY TO HANDLE SYSTEM LEVEL DESIGNS

We have presented a component-based design space exploration technique. However, the proposed methodology can be generalized to handle system level design processes as well. Usually, the system level specification is given in terms of interacting concurrent processes from a behavioral point of view, since the current trend in the digital design process is that the initial specification describes the system level functionality without any details of how it is to be implemented. Hence, in the simplest case, the partitioning process divides the system level specification into an SW part and an HW part. We assume that the goal of the partitioning process is to satisfy the design timing constraints while reducing the HW cost of the design. As a result, the HW design procedure presented in the previous section is applied only to those critical parts of a system which have no available HW cores to implement them, i.e., a behavior will be executed in HW only if a processor is unable to satisfy the timing constraints of that behavior.
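The partitioning rule stated above can be written down as a small Python sketch. The `Behavior` fields, the timing estimates and the core library contents are hypothetical; a real partitioner would also weigh HW cost and communication, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass
class Behavior:
    name: str
    sw_time_us: float   # estimated execution time on the processor (assumed known)
    deadline_us: float  # timing constraint of the behavior

def partition(behaviors, hw_core_library):
    """Map each behavior to SW, an existing HW core, or a custom HW design."""
    mapping = {}
    for b in behaviors:
        if b.sw_time_us <= b.deadline_us:
            # The processor meets the timing constraint: keep it in SW.
            mapping[b.name] = "SW"
        elif b.name in hw_core_library:
            # Critical part with an available core: reuse the core.
            mapping[b.name] = "HW core"
        else:
            # Critical part without a core: candidate for the HW design
            # space exploration methodology.
            mapping[b.name] = "custom HW"
    return mapping
```

For instance, with `hw_core_library = {"fir"}`, a behavior named `"dct"` that misses its deadline in software is mapped to `"custom HW"`, while a late `"fir"` is mapped to `"HW core"`.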
Assuming that concurrent processes which have been divided into SW and HW represent system level tasks [Eles98], the following assumptions are needed for such tasks: (i) every task is a non-preemptive task; (ii) a task may be scheduled

only on one processor; (iii) a processor can execute only one task at a given time; and (iv) the task may begin its execution only after all its data inputs are available. Remember that the system contains a set of components and each component contains a set of modules. Pipelining is a general technique which can be applied hierarchically to any system design by partitioning the system into concurrently running stages, using pipelined components to perform some system tasks and using pipelined modules inside the pipelined components.

In order to allow component selection at system level for SW parts, we need to use a system level SW component library. Such a library contains different components able to execute SW tasks, such as processors and DSPs, each with different implementations. The elements of the SW component library are characterized by speed, power consumption and dollar cost. To allow component selection at system level for HW parts, the component library (which is used by our methodology) is extended with different system level components with different implementations, such as HW cores for MPEG and DSP filters, memories and buses. HW components are characterized by area, latency and pipe stages. Hence, the design space exploration methodology for system level could be seen as: (A) perform hierarchical pipelining; (B) create any needed custom HW cores using our proposed methodology; (C) schedule and perform component selection for SW parts; (D) schedule and perform component selection for HW parts; and (E) perform communication synthesis to integrate the whole system.

5.5 REUSABLE COMPONENT MODEL

The model is designed based on the knowledge gained by the author while using VHDL to model, simulate and design different systems, as well as from the design experience of using Synopsys Behavioral Compiler. The aim is to allow the reuse of a HW component in as many applications as possible.
Therefore, we intended to specify the component at the behavioral level using the VHDL language, because a component that is specified at the behavioral level has a wider reusability domain than a component which is specified at the RTL level. The former can be reused in different applications with different constraints. The use of the VHDL language allows parameterized designs, easy management of large designs and enhanced readability, and permits the writing of designs independently of the technologies used for their final realization. In addition, test designers, when using cores specified in VHDL, have sufficient knowledge about the internal structure of the core, which enables them to develop a correct test strategy which smoothly admits core insertion. Furthermore, the design space

exploration process using HLS tools will enlarge the reusability domain of the component, since it permits obtaining different implementations from the same specification. We assume that the designer is the person who will create the reusable component, while the user (system integrator) is the person who will reuse it. In addition, we assume that different communication units are available in the design library that is used by the user.

Some design-for-reuse requirements for HW IPs are provided in [Keating99]. These principles are adopted for our proposed reusable component, which we will call a behavioral component (or a component for simplicity). The design-for-reuse requirements are listed below:

- The behavioral component has to be of sufficiently general use, such as a DCT component.
- The behavioral component has to be fully documented and its function (what to do) properly characterized to ease system integration.
- The specification of the behavioral component has to be easily configurable, easy to modify and independent of the implementation technology.
- The behavioral component has to be implementable on multiple technologies.
- The behavioral component specification has to be executable on a variety of platforms and simulatable with a variety of tools.
- The behavioral component has to be verified independently of the application in which it will be used.
- The behavioral component needs to be provided with a standard interface.
- The behavioral component has to be specified using a uniform design methodology to ensure proper synthesizability of the component.

To satisfy the listed requirements we have organized the component model in such a way that the component's main characteristics are provided to the user in the first level of the component structure.
In addition, we have separated the computation core of the component from the communication part, in such a way that different interfacing circuits may be inserted by the user according to the technology and design requirements. Furthermore, we intend to use the design methodology based on design space exploration techniques that was described in the previous sections. Figure 7 illustrates the structure of our proposed reusable component model. As seen in Figure 7, generators are used to create final implementations according to user constraints and the libraries which are available with the synthesis tools to be employed by the user. The complexity of a generator differs from one implementation to another, since the generator can include different design exploration steps such as definition of the clock cycle, structure style (e.g. with/without pipelining), pipelined functional units, use of RAM as a communication unit, etc. However, each optimized behavioral code of any

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,

More information

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden High Level Synthesis with Catapult MOJTABA MAHDAVI 1 Outline High Level Synthesis HLS Design Flow in Catapult Data Types Project Creation Design Setup Data Flow Analysis Resource Allocation Scheduling

More information

High-Level Synthesis (HLS)

High-Level Synthesis (HLS) Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

High-Level Synthesis

High-Level Synthesis High-Level Synthesis 1 High-Level Synthesis 1. Basic definition 2. A typical HLS process 3. Scheduling techniques 4. Allocation and binding techniques 5. Advanced issues High-Level Synthesis 2 Introduction

More information

NISC Application and Advantages

NISC Application and Advantages NISC Application and Advantages Daniel D. Gajski Mehrdad Reshadi Center for Embedded Computer Systems University of California, Irvine Irvine, CA 92697-3425, USA {gajski, reshadi}@cecs.uci.edu CECS Technical

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 8 HW/SW Co-Design Sources: Prof. Margarida Jacome, UT Austin Andreas Gerstlauer Electrical and Computer Engineering University of Texas at Austin gerstl@ece.utexas.edu

More information

HIGH-LEVEL SYNTHESIS

HIGH-LEVEL SYNTHESIS HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

Design Space Exploration Using Parameterized Cores

Design Space Exploration Using Parameterized Cores RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR Design Space Exploration Using Parameterized Cores Ian D. L. Anderson M.A.Sc. Candidate March 31, 2006 Supervisor: Dr. M. Khalid 1 OUTLINE

More information

High Level Synthesis

High Level Synthesis High Level Synthesis Design Representation Intermediate representation essential for efficient processing. Input HDL behavioral descriptions translated into some canonical intermediate representation.

More information

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN Xiaoying Li 1 Fuming Sun 2 Enhua Wu 1, 3 1 University of Macau, Macao, China 2 University of Science and Technology Beijing, Beijing, China

More information

Synthesis at different abstraction levels

Synthesis at different abstraction levels Synthesis at different abstraction levels System Level Synthesis Clustering. Communication synthesis. High-Level Synthesis Resource or time constrained scheduling Resource allocation. Binding Register-Transfer

More information

Lecture 7: Introduction to Co-synthesis Algorithms

Lecture 7: Introduction to Co-synthesis Algorithms Design & Co-design of Embedded Systems Lecture 7: Introduction to Co-synthesis Algorithms Sharif University of Technology Computer Engineering Dept. Winter-Spring 2008 Mehdi Modarressi Topics for today

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

Hardware Software Codesign of Embedded Systems

Hardware Software Codesign of Embedded Systems Hardware Software Codesign of Embedded Systems Rabi Mahapatra Texas A&M University Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on Codesign of Embedded System

More information

Hardware-Software Codesign

Hardware-Software Codesign Hardware-Software Codesign 4. System Partitioning Lothar Thiele 4-1 System Design specification system synthesis estimation SW-compilation intellectual prop. code instruction set HW-synthesis intellectual

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Synthesis and Optimization of Digital Circuits

Synthesis and Optimization of Digital Circuits Synthesis and Optimization of Digital Circuits Dr. Travis Doom Wright State University Computer Science and Engineering Outline Introduction Microelectronics Micro economics What is design? Techniques

More information

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis*

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* R. Ruiz-Sautua, M. C. Molina, J.M. Mendías, R. Hermida Dpto. Arquitectura de Computadores y Automática Universidad Complutense

More information

Hardware Software Codesign of Embedded System

Hardware Software Codesign of Embedded System Hardware Software Codesign of Embedded System CPSC489-501 Rabi Mahapatra Mahapatra - Texas A&M - Fall 00 1 Today s topics Course Organization Introduction to HS-CODES Codesign Motivation Some Issues on

More information

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER

CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 84 CHAPTER 3 ASYNCHRONOUS PIPELINE CONTROLLER 3.1 INTRODUCTION The introduction of several new asynchronous designs which provides high throughput and low latency is the significance of this chapter. The

More information

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING

DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: OUTLINE APPLICATIONS OF DIGITAL SIGNAL PROCESSING 1 DSP applications DSP platforms The synthesis problem Models of computation OUTLINE 2 DIGITAL VS. ANALOG SIGNAL PROCESSING Digital signal processing (DSP) characterized by: Time-discrete representation

More information

Advanced Design System DSP Synthesis

Advanced Design System DSP Synthesis Advanced Design System 2002 DSP Synthesis February 2002 Notice The information contained in this document is subject to change without notice. Agilent Technologies makes no warranty of any kind with regard

More information

Hardware-Software Codesign. 1. Introduction

Hardware-Software Codesign. 1. Introduction Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2

More information

Lecture 20: High-level Synthesis (1)

Lecture 20: High-level Synthesis (1) Lecture 20: High-level Synthesis (1) Slides courtesy of Deming Chen Some slides are from Prof. S. Levitan of U. of Pittsburgh Outline High-level synthesis introduction High-level synthesis operations Scheduling

More information

VHDL simulation and synthesis

VHDL simulation and synthesis VHDL simulation and synthesis How we treat VHDL in this course You will not become an expert in VHDL after taking this course The goal is that you should learn how VHDL can be used for simulation and synthesis

More information

High Level Abstractions for Implementation of Software Radios

High Level Abstractions for Implementation of Software Radios High Level Abstractions for Implementation of Software Radios J. B. Evans, Ed Komp, S. G. Mathen, and G. Minden Information and Telecommunication Technology Center University of Kansas, Lawrence, KS 66044-7541

More information

RTL Coding General Concepts

RTL Coding General Concepts RTL Coding General Concepts Typical Digital System 2 Components of a Digital System Printed circuit board (PCB) Embedded d software microprocessor microcontroller digital signal processor (DSP) ASIC Programmable

More information

A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment

A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment JAVIER RESANO, ELENA PEREZ, DANIEL MOZOS, HORTENSIA MECHA, JULIO SEPTIÉN Departamento de Arquitectura de Computadores

More information
