THIS paper describes a new algorithm for performance

Size: px
Start display at page:

Download "THIS paper describes a new algorithm for performance"

Transcription

1 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE Unifiable Scheduling and Allocation for Minimizing System Cycle Time Steve C.-Y. Huang and Wayne H. Wolf, Senior Member, IEEE Abstract This paper describes a new scheduling and allocation algorithm which optimizes a datapath-controller system for clock cycle time. The cycle time of a VLSI system depends not only on the characteristics of the datapath and controller in isolation but also on the interactions between them. A datapath may impose both arrival time constraints on controller inputs and departure time constraints on controller outputs. Late-arriving controller inputs may be generated by complex datapath functions, such as ALU carry-out, while early-departure controller outputs may be required to control slow datapath units. If the controller is not designed taking into account arrival and departure times, it may unnecessarily put control logic on the critical timing path. Our synthesis heuristic, which can be used in conjunction with other scheduling heuristics, identifies critical interactions between datapath and controller and reallocates/reschedules them to reduce system cycle time during high-level synthesis. Experimental results show that a unifiable scheduling and allocation (USA) can substantially improve system cycle time with only small area penalties. Index Terms Allocation, clock period, high-level synthesis, scheduling. I. INTRODUCTION THIS paper describes a new algorithm for performance optimization of datapath-controller systems. The algorithm minimizes the cycle time of the system by analyzing interactions between the datapath and the controller and using scheduling and allocation to avoid creating long critical delay paths which involve both the datapath and the controller. Because most high-level synthesis algorithms consider only the controller or datapath in isolation, this algorithm is an important tool for high-level performance optimization. Fig. 1(a) is a behavioral description to be executed in a control step. If we do not use two multipliers to precompute the ( ) and ( ) operations, we would just need one subtracter and one multiplier to execute the description, as illustrated in Fig. 1(b). Because there are no data dependencies between the subtraction and the multiplication, ideally these two operations could be executed in parallel if we do not consider the controller s effect on the total system cycle time. Suppose that the subtraction has a delay of ns and the multiplication has a delay of ns, then the system cycle Manuscript received September 13, 1994; revised October This work was supported by the National Science Foundation under Grant MIP S. C.-Y. Huang was with the Department of Electrical Engineering, Princeton University, Princeton, NJ USA. He is now with Synopsys, Mountain View, CA USA. W. H. Wolf is with the Department of Electrical Engineering, Princeton University, Princeton, NJ USA. Publisher Item Identifier S (97) Fig. 1. The ideal design: subtraction and multiplication can be executed in parallel. time would be the maximum of and. In this case the estimated cycle time would be, because subtraction usually has a smaller delay than multiplication. However, the controller also plays an important role in determining the system cycle time. In the past, interactions between datapath and controller have not been well considered in system design or estimation. To simplify synthesis, traditional approaches usually treat the controller and datapath separately and cannot take into account their interactions. Although the controller usually occupies less area than datapath components, Fig. 1(b) is still an idealization which oversimplifies the controller s influence on system cycle time. In contrast, Fig. 2 shows how datapath and controller interactions can substantially increase the cycle time. In the controller of Fig. 2, represents the condition, which is a primary input from the datapath into the controller; PS and NS denote present state and next state respectively; and are the controller s output signals, which are /97$ IEEE

2 198 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 Fig. 2. The worst-case design: controller implementation might introduce undesired dependencies between datapath components Sub and Mul. also the multiplexer select signals for the multiplier component in the datapath. In this configuration, datapath components may impose both arrival time constraints on controller input ( ) and departure time constraints on controller outputs ( and ). The simplified controller can be encoded by a state assignment program JEDI [1] and implemented in a multilevel logic form using SIS [2]. Equation (1) shows the logic functions of and An finite state machine (FSM) output is dependent on input if is a function of.if has no dependency on, is independent of. From (1), we notice that output signals and are dependent on the primary input. Therefore, the functional unit can not select the proper multiplication operands to perform the multiplication until the functional unit is done with the subtraction. In other words, the controller implementation introduces undesired dependencies between datapath components and. The multiplexer (MUX) select signals have to wait for the late-arriving input signals to continue the following datapath operations. As a result, operations and are actually not executed in parallel as predicted in Fig. 1(b). In fact, they would be executed in a serial manner due to the undesired dependencies introduced by the controller implementation as described in Fig. 2. The estimated system cycle time would be ( ) ns rather max( ) ns. Traditional high-level synthesis methods emphasize either the datapath or the controller, but make it difficult to study the interactions between them. As this example shows, datapathcontroller interactions can be subtle yet have a profound influence on the system cycle time. These problems occur at the cycle on which the conditional occurs path-based scheduling algorithms try to share resources across conditionals, but do not consider the interaction between the conditional logic itself with the operations performed immediately afterward. One way to eliminate these long delay paths is to require (1) every controller to be Moore machine. However, that solution introduces unnecessary memory elements. Our scheduling algorithm adds additional clock cycle delays only where required to minimize cycle time. In previous work [3], we demonstrated that in many cases the total time required for a schedule can be reduced by trading cycle time for the number of cycles. Reducing the cycle time of the controller-datapath system requires analyzing the timing of the control signals which flow between the two components. The undesired dependencies from the controller are usually related to the controller structure. Scheduling and allocation affect the design of the controller and its interaction with the datapath, helping to determine the delay of the controllerdatapath system. Therefore, to minimize system cycle time, we must simultaneously schedule the controller and datapath. In many cases, the undesired dependencies can be minimized by generating a unifiable controller structure [3], which we will briefly review in Section II-C. The idea is to minimize the controller output s dependencies on late-arriving input signals from datapath components, so that controller output s departure time can be minimized to continue performing the remaining datapath operations. In this paper, we propose a unifiable scheduling and allocation (USA) method, which is a mixed scheduling and allocation approach, to generate a unifiable controller. This approach is particularly useful when a system has both limited hardware and tight performance constraints other approaches may slow down the circuit unnecessarily. Our ultimate goal is to minimize the unnecessary dependencies between datapath and controller so that system cycle time can be improved with very small to no area overhead. Our heuristics can be used in conjunction with other scheduling and allocation heuristics our presentation of USA will be based on force-directed list scheduling and left-edge allocation. The USA heuristics, when used in conjunction with other heuristics, ensure that system cycle time and controllerdatapath operations will be taken into account during scheduling. The next section reviews previous work in scheduling and allocation and dependency reduction in register-transfer machines. Section III describes our input representation and target architecture. Section IV uses an example to illustrate the ideas in unifiable scheduling and allocation and Section V explains the algorithms in detail. Experimental results are discussed in Section VI. Finally, conclusions are presented in Section VII. II. PREVIOUS WORK Given a behavioral description, we can generate a corresponding internal design representation suitable for further processing. A basic block is defined as a straight-line sequence of operations without branches. A data flow graph (DFG) describes the basic block s data dependencies: a vertex corresponds to an operation and a directed edge corresponds to a variable and its associated data dependency. We refer to a data flow graph for whom a schedule has been found a scheduled data flow graph (SDFG). A control-data flow

3 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 199 Fig. 3. List scheduling. graph (CDFG) is a DFG with control flows. Scheduling and allocation algorithms usually work on design representations similar to DFG s and CDFG s. We will discuss CDFG s in more detail in Section III. A. Scheduling Scheduling approaches can be divided into time-constrained and resource-constrained scheduling [5], [6]. Linear programming (LP) and integer linear programming (ILP) approaches have been used to solve the general scheduling problem by [7], [8]. However, since both time-and resource-constrained scheduling are NP-complete [9], most synthesis systems use heuristic scheduling methods. Most scheduling algorithms concentrate on scheduling of datapath operations and model delay only as the number of clock cycles required to execute a resource (which may be fractional in the case of chained resources). This simple model ignores the importance of the clock cycle time in determining the total time required to execute a sequence of operations and, as a result, do not consider the influence of control signals and controller delays on the system cycle time. List scheduling, summarized in Fig. 3, keeps a list of ready operations to be scheduled by a priority function. The algorithm schedules the nodes in the data flow graph starting from the sources and working toward the sinks, assigning the operations to a control step Cstep. The prioritize() function chooses which of the currently-ready operations are to be scheduled at the current control step. The exact function used by prioritize() to choose operators for scheduling can be modified to implement different scheduling heuristics: Splicer used the mobility of an operation as the priority function [10]; Cathedral-II [11] and ELF [12] used urgency of an operation as the priority function; The HAL system s forcedirected list scheduling (FDLS) prioritizes based on force computations [13]. We will see in Section V that USA is based on force-directed list scheduling. While the details of the force computations are not essential to an understanding of USA, FDLS s forces model both the uniformity with which operations are allocated to resources over time and the effects of data dependencies on that allocation. USA augments the FDLS force computations with unifiability heuristics to determine which operations will be scheduled at each call to prioritize(). The work which has most closely considered logic delays in scheduling is by Kuehlmann and Bergamaschi, who added to HIS a timing-driven scheduling algorithm based on timing modules and behavioral timing networks [14]. Control flow timing network and data flow timing network were generated separately. Then these two networks were connected to model the interactions between the control flow and the data flow. However, a one-hot encoding model was used to represent each state in the control flow timing network. Onehot encoding usually produces designs which are inefficient in both area and testability; if a different encoding is used for the implementation, the results of the one-hot analysis are invalidated. Also, the delay estimation for scheduling was based on incremental addition of worst case delays on a limited number of timing modules. The IBM high-level synthesis system HIS used a pathbased scheduling algorithm to minimize the number of control steps under given constraints such as timing and area [15]. Wakabayashi and Yoshimura [16] developed a different algorithm with similar optimization goals. These algorithms schedule operations on the legs of a conditional to maximize the commonality of the datapaths required for the conditionals, minimizing hardware requirements. These algorithms do not change the scheduling of operations across the conditionals. And as we will see in Section IV, some of the decisions made by these algorithms to minimize hardware cost have the side effect of decreasing system performance. Parallel compiler optimizations, such as program dependency graph (PDG) operations [17], can be used to reorder operations in nested conditionals. Other research into scheduling has attacked different aspects of the problem. Ku and De Micheli used a relative scheduling formulation to support operations with fixed and unbounded delays [18]. It emphasized interface synthesis whose delays are unknown at compile time but did not consider the interaction between datapath and controller delays. In controller synthesis, Nestor and Tamas presented a heuristic algorithm that exploits

4 200 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 scheduling freedom to minimize the controller area [19]. Delay was not addressed in that work. They used Espresso [20] as a predictor, it was difficult to deduce dependency properties of the logic implementation from the schedule in this approach. Chaudhuri et al. [21] developed an algorithm which selects a clock cycle time during scheduling and allocation to optimize performance. Their work considers only datapath delays, not controller contributions to clock cycle time. In the scheduling approaches discussed above, arrival/ departure timing constraints between the datapath and controller have not been taken into account to minimize dependencies on late-arriving signals and thus to improve system cycle time. The first scheduling work which considered this kind of controller-datapath interactions was MDS, which applied the minimal dependency concept to schedule controller events [3]. This work showed the first cycle time-number of cycles tradeoff. By increasing the number of cycles while reducing the clock period, the algorithm could in many cases reduced the total execution time (cycle time number of cycles) required for a computation. B. Allocation In general, allocation approaches can be categorized as decomposition, greedy constructive, or iterative refinement approaches [5]. Most previous work in allocation has concentrated on optimizing some combination of datapath resources, registers, functional units, and interconnect. The REAL system introduced the left-edge algorithm, similar to the left-edge algorithm used in channel routing, for register allocation [22]. The left-edge algorithm considers the variables in order of the time step in which they are first defined. A register is considered available at Cstep if the last variable which was assigned to it has ended its lifetime at. Variables are assigned to registers greedily previously defined and available registers are used first, then new registers are created. This greedy algorithm is optimal for acyclic scheduled data flow graphs, as shown by Kurdahi and Parker. As we will see in Section V, USA uses a modification of the left-edge algorithm, in which a variable is allocated to a register only if it surpasses a unifiability threshold. The EMUCS systems used a global selection criterion to allocate the next element for minimal number of registers, modules and multiplexers [23]. The STAR package used branch and bound search for subtask space and performs a constructive binding followed by an iterative refinement for minimal hardware resources [24]. The OAS synthesizer applied integer programming model for scheduling and allocation for embedded VLSI chips [25]. A method for resource sharing and minimal control sequence based on condition vector has been presented in Cyber [26]. These approaches did not try to predict the controller structure. As a result, allocation decisions could have adverse affects on the controller s delay and, therefore the entire cycle time of the entire controller-datapath system. MCD, developed by Huang and Wolf, first applied the minimal dependency concept in datapath allocation to reduce the controller delay on the critical path [27]. TABLE I FSM-1, z0 IS DISTINCT FOR S1 AND z1 IS UNIFIABLE FOR S1 C. Controller Unifiability and Minimum Dependency In earlier work, we introduced the unifiability optimization, which uses don t-care conditions in the controller to eliminate unnecessary dependencies of primary outputs on primary inputs. We developed an algorithm for minimum-dependency state encoding [28]. For state in a Mealy machine, ifthe value of output has at least one 1 and at least one 0 on the transitions out of, we say that output is distinct for state. An output is unifiable for state if is not distinct for state. For instance, if the value of is either 1 or don tcare in transitions with state as the source state, then is unifiable for. is called a unifiable controller if every primary output is unifiable for each state. We want to build a unifiable controller with respect to late-arriving input signals, but not necessarily to all input signals. This is different from building a Moore machine in which output values depend only on the present state. For simplicity, we will assume that all the input signals to the finite state machine are late-arriving signals throughout this paper. Therefore, having a unifiable output corresponds to having a unifiable output with respect to late-arriving input signals in this context. For example, for FSM-1 in Table I, there are two transitions associated with. One transition specifies that binary output is 1 when input is 0, where PO in Table I means primary output. The other transition indicates that becomes 0 when input is 1. Therefore is distinct (not unifiable) for state. However, is unifiable for, because is either 0 or don t-care for the two transitions associated with. Suppose that is a latearriving input signal and other input signals are not shown for clarity. If we assign the don t-care as 0, then s value is unified to be 0 for branch state with respect to late-arriving signal. A unifiable output s logic function can eventually be independent of the late primary input. For instance, if we treat the symbolic present state input as an input in addition to primary input and assign the don t-care in the last row of Table I to be 0, we can write s function as independent of. However, the nonunifiable output s logic function will depend on primary input. We developed the PDS algorithm [28] to perform this minimum-dependency-driven don t-care assignment. After generating the symbolic state transition table, we encoded the states FSM-1 using standard methods and implemented it in multilevel logic form using scripts in SIS [2]. (SIS is a sequential logic optimization system which includes features from the combinational optimization system misii.) The relationship between unifiability and dependency is illustrated by the equations, where

5 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 201 Fig. 4. (a) (b) (a) Part of a CDFG derived from Sehwa and (b) an annotated CDFG. is a binary present state variable arriving around time zero. The unifiability of a binary controller output can be verified by performing XOR operation on its care output values with respect to branch states. That is, we need only look at the ones and zeros of a controller output values at conditional states. If the XOR result is zero, then the output values must be either all ones or all zeros in addition to don t-care values. In this case, the output is unifiable and it will lead to a minimal dependency structure. On the other hand, if the XOR result is one, the output is not unifiable and the resulting controller structure is not desirable. Scheduling for minimal dependency and allocation for minimal dependency have been applied separately in our previous work [3], [27]. In this paper, we present a mixed scheduling and allocation approach and use a symbolic PO controller rather than a binary PO controller to investigate unifiability even earlier in the processing stage. III. INPUT REPRESENTATION AND TARGET ARCHITECTURE A control data flow graph (CDFG) is a representation for a behavioral description with conditional branches. The vertices in a CDFG can be categorized as operation node, conditional node, and join node. Operation node denotes functional unit operation. Conditional node specifies the beginning of the operations following a branch condition, and join node denotes the end of the conditional operations. Variable in a CDFG is associated with the edge between vertices. Usually is a temporary value to be stored in a register or other storage unit. For instance, Fig. 4(a) is part of a CDFG derived from Sehwa [29]. In Fig. 4(a), 7 and 8 are operation nodes; is a conditional node;, and are join nodes, and and are variables. Conditional nodes and join nodes are important features which distinguish a CDFG from a data flow graph (DFG). For example in Fig. 4(a), conditional node and join node imply that operations 7 and 8 are mutually exclusive due to the conditional branch introduced. However, some CDFG benchmark representations do not give specific detail of the operation involved with conditional Fig. 5. A system with a nonpipelined controller. nodes. For instance, it is not clear to us what kind of operations are associated with the conditional node in Fig. 4(a). By simply looking at the join nodes and above the conditional node, it is also difficult to determine what operands are associated with the operations following the conditional node. To overcome the problem, some previous work assumed that the conditional signal is an external signal generated from some operation not specified in the CDFG [30], [31]. Because our goal is to generate both the datapath and the controller for a given CDFG benchmark, we would like to attach some more information to the conditional node operation. Fig. 4(b) is an annotated CDFG derived from Fig. 4(a). In Fig. 4(b), a?5 node is used to denote the conditional node operation, where in this case it is a chained subtraction and comparison operation. The left branch is executed when the condition is false. The right branch is executed when the condition is true. The left and right operands will flow to the operation following the conditional node accordingly. With the modified configuration, we will be able to generate a consistent datapath and controller for a given CDFG. Our target architecture is a configuration with a nonpipelined controller (as shown in Fig. 5) and a multiplexer-based datapath [5]. The dotted lines in Fig. 5 denote the signal communication between the controller and the datapath. Multiplexers are used at the inputs of both registers and functional units. We will discuss the extensions required to synthesize pipelined and multiple FSM controllers in Section VII. Given an unscheduled CDFG, the synthesis goal is to find a scheduling and allocation method to minimize the system cycle time, including both the datapath and the controller. Minimal overhead in the number of control steps and area should be achieved. The cycle time and area are to be measured after the technology mapping of the logic design. IV. AN EXAMPLE USA is a mixed scheduling and allocation algorithm because it uses a model to predict the resulting allocation configuration in the course of the unifiable scheduling. We will use an example to introduce our unifiable scheduling and allocation

6 202 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 Fig. 6. A nonunifiable allocated SCDFG. (USA) approach in this section. Section V will describe the USA algorithm in detail. A scheduled CDFG (SCDFG) is a CDFG where each operation has been assigned to a specific control step. Fig. 6 is part of an example SCDFG. For simplicity, we only show the conditional branch part in detail. In the following discussion, we will first assume that there is only one adder available to schedule and allocate the whole CDFG, and inputs, and are stored separately. In Fig. 6, both and are bound to by conditional resource sharing. Because Fig. 6 also includes some allocation information, we call it an allocated SCDFG to differentiate that from a traditional SCDFG. The figure also shows a partial state transition table for the controller around the conditional. and are the left operand and the right operand of, respectively. The figure also shows the symbolic PO controller in which the primary outputs are still in symbolic form. The concept is similar to the symbolic control table in Bridge [32]. According to Fig. 6, if is false, will be executed. Therefore in the state transition table, and will have and as the input, respectively. This is shown in the first row of state transition where primary input is 0 and present state is. Similarly if is true, and will become the inputs to. Fig. 6 also shows the multiplexer configuration for. The signal is used to select the appropriate input to. Similarly, is for. The controller for this datapath operates such that, when is false, and will become 0 to select and into and accordingly. Therefore, ( ) is performed on. Similarly, if is true, ( ) is performed on. In Section II-C, we discussed the concept of a unifiable controller. This scheduling and allocation shown in Fig. 6 produces a nonunifiable controller and

7 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 203 Fig. 7. A unifiable allocated design from USA. are not unifiable for with respect to the controller input. We can write formulas and, where present state variables are denoted in a symbolic form. The primary input of the controller is, which is a late-arriving input from the datapath into the controller. The multiplexer control outputs have undesired dependencies on the late arriving input. Therefore, it will slow down the operation to select the appropriate operands for the functional unit, and hence delay the addition operation. The total system cycle time will potentially be increased. The concept is similar to that of Fig. 2 in Section I. If we use the symbolic PO table in Fig. 6 to investigate why the nonunifiable allocation of the SCDFG creates a nonunifiable controller, we will realize the reason is that and require different input operands for different primary input conditions with respect to a branch condition ( and in this case). Therefore, MUX select signals generate a unique value for every specific input operand in the binary PO controller. In this case, at least one distinct (i.e., nonunifiable) binary primary output is produced, as shown in the binary controller. The discussion above explains the relationship between a unifiable allocated SCDFG, a unifiable symbolic PO controller and a unifiable binary PO controller. We define a few useful terms. For a symbolic PO controller, if the value of symbolic primary output has at least two different symbolic inputs on the transitions out of, we say that symbolic output is distinct for state. An symbolic output is unifiable for state if is not distinct for state. is called a unifiable symbolic PO controller if every symbolic output is unifiable for each state. A unifiable symbolic PO controller generates only unifiable binary PO controller. A unifiable allocated SCDFG is an allocated SCDFG which generates a unifiable controller. As we can see, unifiability can be determined by checking the existence of a unifiable allocated SCDFG or a unifiable symbolic PO controller, which can be detected earlier in the processing stage than using a binary PO controller. A binary PO controller cannot be generated until the multiplexer tree for interconnection is built. In Fig. 7, the operation has been rescheduled to control step. As a result, the symbolic controller symbolic is unifiable. The multiplexer configuration is the same as in Fig. 6. However, the corresponding binary PO controller becomes unifiable because the allocation of operations to function units has changed. Simultaneous consideration of scheduling effects on the controller and allocation effects on the datapath were required to synthesize a high-performance implementation. V. ALGORITHMS Having explained the idea of unifiable scheduling and allocation, we will describe the algorithms in detail. The unifiable concept can be embedded in existing scheduling and allocation methods. Here some well-known heuristic approaches will be chosen as the foundation of the USA program. The scheduling part of USA is based on the force-directed list scheduling (FDLS) method and left-edge allocation algorithm, both of which were described in Section II. USA could be adapted to other scheduling methods as well we illustrate the USA heuristics using these well-known algorithms for specificity. The unifiability of an operation node or a variable can be determined by the resulting controller structure s unifiability, which is related to both scheduling and allocation.

8 204 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 Fig. 8. USA s approach to assigning a ready operation list. USA defines constants unifiability, high, and low can be real numbers from zero to one where high low. The values of these constants can be set by the user to guide synthesis. We index the values high and low with subscripts to differentiate various allocated SCDFG s according to the cost of preserving controller unifiability. A threshold value is used in the functions unifiable_allocated_scdfg() of Fig. 8 and allocate_variable_list() of Fig. 10. If threshold low, USA acts like the force-directed list scheduling with left-edge allocation. If high threshold low, USA performs unifiable scheduling and allocation. Unlike FDLS, USA tries to allocate the current operation only if its unifiability passes the threshold. The thresholds are also computed such that, when scheduling outside the range at which unifiability affects clock period, the algorithm makes the same choices as FDLS or left-edge algorithm. In USA, operations with lower forces and higher unifiability are scheduled first. Fig. 8 describes USA s assign_ready_list() function, which performs the selection operation of FDLS. The function first sorts the ready list readylist by force value, Fig. 9. Part of a unifiable allocated SCDFG. then determines s unifiability; the unifiability value, in turn, determines whether the operation will be scheduled on this control step or the next. There are several possible cases: 1) is executed unconditionally (the 1 case of Fig. 9), in which case it is given a high unifiability value; 2) is executed conditionally but another resource of the proper type is available in this control step so that s unifiability is set high;

9 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 205 Fig. 10. USA s approach to assigning a variable list. 3) is conditionally executed and there are no unused resources available on this cycle (see Fig. 7), in which case s unifiability is set to low. The second and third cases require the algorithm to recheck the available resources to find another resource of the same type which is not involved in the conditional. The function unifiable_allocated_scdfg() actually allocates to the control step. But unlike standard FDLS, it schedules only if s unifiability is above threshold. If not, as in case 3 above, will be deferred to a later schedule in order to make the control signals unifiable. The complexity of the list scheduling with conditional resource sharing is where is the number of operation nodes in a CDFG. The while loops in the functions Listscheduling() and assign_ready_list() run at most iterations, respectively. The mutual-exclusivity test in the function assign_ready_list() takes at most comparisons. The complexity of USA s function assign_ready_list() in Fig. 8 is the same as the corresponding operation in FDLS. Because USA has the same structure as FDLS, its complexity is as well. Register allocation in USA starts with lifetime analysis of the variables in SCDFG. It is based on the left-edge algorithm, which performs register allocation in polynomial time by using the channel-routing analogy of packing horizontal segments into as few tracks as possible [22]. USA modifies the leftedge algorithm and allocates variables with high unifiability to the same register. USA s allocation algorithm is shown in Fig. 10. The register allocation algorithm of USA in Fig. 10 is similar to the standard left-edge allocation algorithm; however, register_unifiability() takes unifiability into account in TABLE II DESIGN CHARACTERISTICS OF SEVERAL BENCHMARKS deciding when to allocate a variable. To determine the unifiability of a variable, the function first checks if is mutually-exclusive with the set of variables assigned to the current register. If not, it implies that would be associated with a unconditional operation and a unifiable allocated SCDFG could be preserved. In this case, will be assigned a high unifiability value and be assigned to if and have nonoverlapping lifetime intervals. On the other hand, if is mutually-exclusive with, register_unifiability() will check if the operation node preceding and operation nodes preceding are operations next to conditional nodes. If yes, USA will assign a low unifiability value to prevent from being assigned to if and are of different functional unit types. This step tries to avoid creating a nonunifiable allocated SCDFG. However, if and are of the same type and same resource-id, a high unifiability will be assigned to. register_unifiability() is called by the outer loop of the register allocation algorithm; it has a user-definable threshold

10 206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 TABLE III DETAILED RESULTS ON THE BENCHMARKS which determines how aggressively register allocation takes unifiability into account. If the threshold is set very low, allocation operates exactly like the left-edge algorithm; as the threshold is raised, more decisions are made which take unifiability into account. Left-edge allocation, while optimal for pure CFG s, is not optimal for specifications with conditionals. We have used leftedge allocation as the basis of our discussion and experiments due to its simplicity and clarity. As we have mentioned, the unifiability heuristic can be adapted to other allocation algorithms as well. The complexity of our USA extension to the left-edge algorithm is the same as for standard left-edge allocation:, where is the number of variables in the CDFG and is the number of variables allocated. The variables can be sorted in time, The mutual-exclusivity test in the function register_sharable() takes at most comparisons, since there will be at most registers created during allocation. VI. EXPERIMENTAL RESULTS A. Generation of Interconnected Controller and Datapath Given a CDFG, USA can generate its corresponding allocated SCDFG and a system netlist consisting of interconnected controller and datapath. USA collects the allocated SCDFG for each control step and updates the symbolic PO controller according to all possible transitions in every control step. Multiplexer binding for functional units and registers is then created based on the symbolic PO controller; a point-to-point model for interconnection is used in this case. A balanced tree consisting of two-input multiplexers are generated for minimal area and balanced delay distribution for all multiplexer inputs. The inputs to the controller are the output values of the conditional operations in the allocated SCDFG. USA generates MUX-select signals for functional units and registers and load signals for registers as the primary outputs of the controller. The final controller is in KISS format [33] acceptable by JEDI [1]. The datapath part is generated based on PDL++ type input format [34]. USA then combines the controller and datapath in blif format to be optimized by SIS [2]. B. Comparison of Results We compare the results from USA with those from traditional force-directed list scheduling and left-edge algorithms. We call the latter as the BASE algorithm in the following discussion. The benchmarks we used are from Sehwa [29], Kim [35], Maha [36], and HAL [37]. They all have conditional operations. Given an unscheduled CDFG and optional resource constraints, we generate the allocated SCDFG s and the corresponding controller and datapath netlists for both USA and BASE. Scripts script.rugged and script.delay in SIS are used to perform logic optimization. The system cycle time are measured with library model after technology mapping using MCNC libraries [33]. Table II shows the number of control steps and functional units used in both USA and BASE methods. Basically, USA can usually use the same amount of functional units to generate a unifiable allocated SCDFG with minimal control steps, whereas BASE generates nonunifiable controllers in all cases. Table III describes some other statistics in detail. denotes the total number in bits of the MUX select lines for functional units. Similarly, is for register s MUX select lines. is the total number of 2-to- 1 multiplexers in the balanced two-input multiplexer tree for the functional unit. Similarly for. For these test cases, USA tends to use more multiplexers than BASE. We think that is because USA would usually generate a more balanced allocated SCDFG to available resources, whereas traditional resource sharing tends to be greedy and happens to use a few less multiplexers in these cases. For instance, suppose we have five inputs to bind to two resources and. USA would generate a binding with three inputs to and two inputs to, whereas BASE would produce a binding with four inputs to and one input to. In the former case, the total bit width of MUX select lines needed would be ; for the latter case, it would be. However, we think that the multiplexer overhead can be reduced and we will see later that the cycle time improvement is more significant than the area overhead. We extract the cycle time (in nanoseconds) after technology mapping and define delay ratio as the ratio of (USA/BASE). Similarly area ratio is defined as the (USA/BASE) ratio in lambda area using MCNC standard cell libraries. Fig. 11 shows the mean delay and area ratio for the test cases above when bit width of the resources equal to 4, 8, and 16, respectively. According to the experimental data, the delay improvement increases as bit width increases. We think it contributes to the fact that as bit width increases, the datapath

11 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 207 Fig. 11. Average delay ratio and area ratio as a function of bit width. Fig. 12. Delay ratio and area ratio comparison. functional units take more time to generate control signals for the controller. Therefore, for a nonunifiable controller with unnecessary dependencies, it further delays the generation of BASE controller s primary output signals to control the remaining operations in the datapath. Fig. 11 also shows that delay improvement ratio is larger than area overhead ratio for in all cases. The area and delay ratio data are shown in Fig. 12 for bit width 4, 8, and 16. Two lines, delay ratio 1 and area ratio 1, are also shown as reference lines. If a datum point is toward the origin (0, 0), it means that USA achieves both cycle time improvement and area reduction for that case. The data distribution shows that in most cases, USA produces cycle time improvement with small to no area overhead.

12 208 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 5, NO. 2, JUNE 1997 Fig. 13. The Branch CDFG. TABLE IV DETAILED MULTICYCLING RESULTS ON THE BRANCH CDFG C. Multicycled Operations So far it has been assumed that each operation node in the CDFG takes one control step to be executed. In fact, operations with long execution time can be split into more than one cycle to further reduce system cycle time. This technique is called multicycling. USA can handle multicycled operations for a unifiable allocated SCDFG. The basic algorithms are the same except that lifetime analysis for resources needs to be updated accordingly [38]. Fig. 13 is a Branch CDFG example derived from the Cond CDFG in Section VI-B. This CDFG will be used to compare experimental results on multicycling. and are hardwired constants. The table compares variations on the design: Branch- is the same as the Branch CDFG except that the multiplier is multicycled over two cycles; Branchuses two multicycled multipliers instead of one, meaning that Branch- needs one more control step than Branch-. Fig. 14 shows the results from the BASE and the USA approaches. Now the multiplication operations span two cycles. Conditional resource sharing are applied for and. Because there is only one adder available, USA will defer the 2 operation to create a unifiable allocated SCDFG. Table IV shows some statistics in detail when bit width 4. USA has smaller system cycle time in all cases. There are two cases where USA achieves area reduction as well. VII. CONCLUSIONS A mixed scheduling and allocation approach is presented in this paper to generate a unifiable allocated scheduled control data flow graph. USA takes a control data flow graph and optional resource constraints and applies the unifiable scheduling and allocation approach to generate the controller and datapath netlist for the entire system. Experimental results show that by generating a unifiable allocated SCDFG, unnecessary dependencies between datapath and controller can be reduced, and hence most of the datapath components can be executed in parallel rather than in serial with respect to conditional operations. The system cycle time is hence improved with small to no area overhead. In our target architecture, we assume that the controller architecture is a single FSM. For a system with multiple FSM s, USA can be used to produce a unifiable structure for each FSM. By minimizing the dependencies between FSM s and its corresponding datapath operations, the entire system will have minimal dependencies as well. As for a pipelined datapath, the original datapath operation will be divided into several stages with storage units between them. In this case, USA will check the unifiability with respect to the branch state associated with the first stage of the pipeline operations. A unifiable allocated SCDFG can hence be generated.

13 HUANG AND WOLF: MINIMIZING SYSTEM CYCLE TIME 209 (a) (b) Fig. 14. Results using two multicycled multipliers from (a) the BASE approach and (b) the USA approach. The unifiable concept can be extended to system cycle time estimation and be applied to a synthesis system with cycle time constraints in addition to resource and control step constraints. Power consumption can also be reduced by scaling the voltage due to minimized cycle time [39]. ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their helpful comments. REFERENCES [1] B. Lin and A. R. Newton, Synthesis of multiple level logic from symbolic high-level description languages, in Proc. VLSI Conf., Munich, Germany, Aug [2] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, Sequential circuit design using synthesis and optimization, in Proc. ICCD-92, Oct. 1992, pp [3] S. C.-Y. Huang and W. H. Wolf, Scheduling for minimal dependency in FSM s, in Proc. ICCAD-93, Nov. 1993, pp [4] E. A. Snow, Automation of Module Set Independent Register-Transfer Level Design, Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, Apr [5] D. Gajski, A. Wu, N. Dutt, and S. Lin, High-Level Synthesis: Introduction to Chip and System Design. Boston, MA: Kluwer Academic, [6] G. De Micheli, Synthesis and Optimization of Digital Circuits. New York: McGraw-Hill, [7] C. A. Papachristou and H. Honuk, A linear program driven scheduling and allocation method, in Proc. DAC-90, 1990, pp [8] C.-T. Hwang, J.-H. Lee, and Y.-C. Hsu, A formal approach to the scheduling problem in high-level synthesis, IEEE Trans. Computer- Aided Design, vol. 10, pp , Apr [9] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: W. H. Freeman, [10] B. M. Pangrle, A behavior compiler for intelligent silicon compilation, Ph.D. dissertation, Univ. Illinois at Urbana-Champaign, [11] H. DeMan, J. Rabaey, P. Six, and L. Claesen, Cathedral-II: A silicon compiler for digital signal processing, IEEE Design & Test, vol. 3, pp , Dec [12] E. F. Girczyc, R. J. Buhr, and J. P. Knight, Application of a subset of ADA as an algorithmic hardware description language for graph-based hardware compilation, IEEE Trans. Computer-Aided Design, vol. 4, pp , Apr [13] P. G. Paulin and J. P. Knight, Force-directed scheduling for the behavioral synthesis of ASIC s, IEEE Trans. Computer-Aided Design, vol. 8, pp , June [14] A. Kuehlmann and R. A. Bergamaschi, Timing analysis in high-level synthesis, in Proc. ICCAD-92, 1992, pp [15] R. Camposano, Path scheduling for synthesis, IEEE Trans. Computer- Aided Design, vol. 10, pp , Jan [16] K. Wakabayashi and T. Yoshimura, A resource sharing and control synthesis method for conditional branches, in Proc. ICCAD-89, IEEE Computer Society Press, 1989, pp [17] J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Trans. Programming Languages and Syst.,, vol. 9, no. 3, pp , July [18] D. Ku and G. De Micheli, Relative scheduling under timing constraints: Algorithms for high-level synthesis of digital circuits, IEEE Trans. Computer-Aided Design, vol. 11, pp , June [19] J. Nestor and V. Tamas, Exploiting scheduling freedom in controller synthesis, in Proc. Sixth Int. Symp. High-Level Synth., Nov. 1992, pp [20] R. Rudell and A. L. Sangiovanni-Vincentelli, ESPRESSO-MV: Algorithms for multivalued logic minimization, in Proc. Custom Integr. Circ. Conf., Portland, OR, May [21] S. Chaudhuri, S. A. Blythe, and R. A. Walker, An exact methodology for scheduling in a 3d design space, in Proc., 8th Int. Symp. Sys. Synth., IEEE Computer Society Press, 1995, pp [22] F. J. Kurdahi and A. C. Parker, REAL: A program for register allocation, in Proc. DAC-87, June 1987, pp [23] D. E. Thomas, E. D. Lagnese, R. A. Walker, J. A. Nestor, J. V. Rajan, and R. L. Blackburn, Algorithmic and Register-Transfer Level Synthesis: The System Architect s Workbench. Boston, MA: Kluwer Academic, [24] F.-S. Tsai and Y.-C. Hsu, STAR: An automatic data path allocator, IEEE Trans. Computer-Aided Design, vol. 11, pp , Sept [25] C. H. Gebotys, Optimal scheduling and allocation of embedded VLSI chips, in Proc. DAC-92, June 1992, pp [26] K. Wakabayashi, Cyber: High level synthesis system from software into ASIC, High-Level VLSI Synthesis, R. Camposano and W. H. Wolf, Eds. Kluwer Academic Publishers, [27] S. C.-Y. Huang and W. H. Wolf, How datapath allocation affects controller delay, in Proc. Seventh Int. Symp. High-Level Synth., IEEE Computer Society Press, May 1994, pp [28], Performance-driven synthesis in controller-datapath systems, IEEE Trans. VLSI Syst., vol. 2, pp , Mar [29] N. Park and A. C. Parker, Sehwa: A software package for synthesis of pipelines from behavioral specification, IEEE Trans. Computer-Aided Design, pp , Mar [30] J. J. Kim, F. J. Kurdahi, and N. Park, Automatic synthesis of timestationary controllers for pipelined data paths, in Proc. ICCAD-91, Nov. 1991, pp [31] T.-C. Lee, N. K. Jha, and W. H. Wolf, A conditional resource sharing method for behavioral synthesis of highly testable data paths, in Int. Test Conf., Baltimore, MD, Oct [32] C.-J. Tseng, R.-S. Wei, S. G. Rothweiler, M. M. Tong, and A. K. Bose, Bridge: A versatile behavioral synthesis system, in Proc. DAC-88, 1988, pp [33] R. Lisanke, Logic synthesis benchmark circuits, in Int. Workshop on Logic Synthesis, May [34] R. J. Lipton, D. N. Serpanos, and W. H. Wolf, PDL++: An optimizing generator language for register-transfer design, in Proc. ISCAS-90, May 1990, IEEE Circuits and Systems Society, pp [35] T. Kim, J. W. S. Liu, and C. L. Liu, A scheduling algorithm for conditional resource sharing, in Proc. ICCAD-91, Nov. 1991, pp [36] A. C. Parker, J. T. Pizarro, and M. Mlinar, MAHA: A program for datapath synthesis, in Proc. DAC-86, June 1986, pp [37] P. G. Paulin, Global scheduling and allocation algorithms in the HAL system, in High-Level VLSI Synthesis, R. Camposano and W. H. Wolf, Eds. New York: Kluwer Academic, [38] S. Bhatia and N. K. Jha. Behavioral synthesis for hierarchical testability of controller/datapath circuits with conditional branches, in Proc. ICCD-94, Sept [39] D. Liu and C. Svensson, Trading speed for low power by choice of supply and threshold voltages, IEEE J. Solid-State Circuits, vol. 28, Jan

How Datapath Allocation Affects Controller Delay

How Datapath Allocation Affects Controller Delay How Datapath Allocation Affects Controller Delay Steve C.-Y. Huang and Wayne H. Wolf Dept. of Electrical Engineering Princeton University Princeton, NJ 08544 Abstract We present in this paper an allocation

More information

Data Path Allocation using an Extended Binding Model*

Data Path Allocation using an Extended Binding Model* Data Path Allocation using an Extended Binding Model* Ganesh Krishnamoorthy Mentor Graphics Corporation Warren, NJ 07059 Abstract Existing approaches to data path allocation in highlevel synthesis use

More information

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca

Datapath Allocation. Zoltan Baruch. Computer Science Department, Technical University of Cluj-Napoca Datapath Allocation Zoltan Baruch Computer Science Department, Technical University of Cluj-Napoca e-mail: baruch@utcluj.ro Abstract. The datapath allocation is one of the basic operations executed in

More information

An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling

An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.1, January 2009 81 An Intelligent Priority Decision Making Algorithm for Competitive Operators in List-based Scheduling Jun-yong

More information

High-Level Synthesis

High-Level Synthesis High-Level Synthesis 1 High-Level Synthesis 1. Basic definition 2. A typical HLS process 3. Scheduling techniques 4. Allocation and binding techniques 5. Advanced issues High-Level Synthesis 2 Introduction

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas

FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS. Waqas Akram, Cirrus Logic Inc., Austin, Texas FILTER SYNTHESIS USING FINE-GRAIN DATA-FLOW GRAPHS Waqas Akram, Cirrus Logic Inc., Austin, Texas Abstract: This project is concerned with finding ways to synthesize hardware-efficient digital filters given

More information

MOST computations used in applications, such as multimedia

MOST computations used in applications, such as multimedia IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 9, SEPTEMBER 2005 1023 Pipelining With Common Operands for Power-Efficient Linear Systems Daehong Kim, Member, IEEE, Dongwan

More information

High-level Variable Selection for Partial-Scan Implementation

High-level Variable Selection for Partial-Scan Implementation High-level Variable Selection for Partial-Scan Implementation FrankF.Hsu JanakH.Patel Center for Reliable & High-Performance Computing University of Illinois, Urbana, IL Abstract In this paper, we propose

More information

High Level Synthesis

High Level Synthesis High Level Synthesis Design Representation Intermediate representation essential for efficient processing. Input HDL behavioral descriptions translated into some canonical intermediate representation.

More information

Eliminating False Loops Caused by Sharing in Control Path

Eliminating False Loops Caused by Sharing in Control Path Eliminating False Loops Caused by Sharing in Control Path ALAN SU and YU-CHIN HSU University of California Riverside and TA-YUNG LIU and MIKE TIEN-CHIEN LEE Avant! Corporation In high-level synthesis,

More information

Unit 2: High-Level Synthesis

Unit 2: High-Level Synthesis Course contents Unit 2: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 2 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica A New Register Allocation Scheme for Low Power Data Format Converters Kala Srivatsan, Chaitali Chakrabarti Lori E. Lucke Department of Electrical Engineering Minnetronix, Inc. Arizona State University

More information

Coordinated Resource Optimization in Behavioral Synthesis

Coordinated Resource Optimization in Behavioral Synthesis Coordinated Resource Optimization in Behavioral Synthesis Jason Cong Bin Liu Junjuan Xu Computer Science Department, University of California, Los Angeles Email: {cong, bliu, irene.xu}@cs.ucla.edu Abstract

More information

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi Incorporating the Controller Eects During Register Transfer Level Synthesis Champaka Ramachandran and Fadi J. Kurdahi Department of Electrical & Computer Engineering, University of California, Irvine,

More information

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Hardware Design Environments Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

More information

Register Binding for FPGAs with Embedded Memory

Register Binding for FPGAs with Embedded Memory Register Binding for FPGAs with Embedded Memory Hassan Al Atat and Iyad Ouaiss iyad.ouaiss@lau.edu.lb Department of Computer Engineering Lebanese American University Byblos, Lebanon Abstract * The trend

More information

Minimization of Multiple-Valued Functions in Post Algebra

Minimization of Multiple-Valued Functions in Post Algebra Minimization of Multiple-Valued Functions in Post Algebra Elena Dubrova Yunjian Jiang Robert Brayton Department of Electronics Dept. of Electrical Engineering and Computer Sciences Royal Institute of Technology

More information

BEHAVIORAL SYNTHESIS: AN OVERVIEW

BEHAVIORAL SYNTHESIS: AN OVERVIEW CHAPTER 6 BEHAVIORAL SYNTHESIS: AN OVERVIEW REINALDO A. BERGAMASCHI IBM Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA., Tel: 1 914 945 3903, Fax: 1 914 945 4469, e-mail:

More information

A New Optimal State Assignment Technique for Partial Scan Designs

A New Optimal State Assignment Technique for Partial Scan Designs A New Optimal State Assignment Technique for Partial Scan Designs Sungju Park, Saeyang Yang and Sangwook Cho The state assignment for a finite state machine greatly affects the delay, area, and testabilities

More information

HIGH-LEVEL SYNTHESIS

HIGH-LEVEL SYNTHESIS HIGH-LEVEL SYNTHESIS Page 1 HIGH-LEVEL SYNTHESIS High-level synthesis: the automatic addition of structural information to a design described by an algorithm. BEHAVIORAL D. STRUCTURAL D. Systems Algorithms

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

Set Manipulation with Boolean Functional Vectors for Symbolic Reachability Analysis

Set Manipulation with Boolean Functional Vectors for Symbolic Reachability Analysis Set Manipulation with Boolean Functional Vectors for Symbolic Reachability Analysis Amit Goel Department of ECE, Carnegie Mellon University, PA. 15213. USA. agoel@ece.cmu.edu Randal E. Bryant Computer

More information

Low Power Bus Binding Based on Dynamic Bit Reordering

Low Power Bus Binding Based on Dynamic Bit Reordering Low Power Bus Binding Based on Dynamic Bit Reordering Jihyung Kim, Taejin Kim, Sungho Park, and Jun-Dong Cho Abstract In this paper, the problem of reducing switching activity in on-chip buses at the stage

More information

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation

Introduction to Electronic Design Automation. Model of Computation. Model of Computation. Model of Computation Introduction to Electronic Design Automation Model of Computation Jie-Hong Roland Jiang 江介宏 Department of Electrical Engineering National Taiwan University Spring 03 Model of Computation In system design,

More information

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks

Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department

More information

High-Level Synthesis (HLS)

High-Level Synthesis (HLS) Course contents Unit 11: High-Level Synthesis Hardware modeling Data flow Scheduling/allocation/assignment Reading Chapter 11 Unit 11 1 High-Level Synthesis (HLS) Hardware-description language (HDL) synthesis

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

Scheduling with Bus Access Optimization for Distributed Embedded Systems

Scheduling with Bus Access Optimization for Distributed Embedded Systems 472 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Scheduling with Bus Access Optimization for Distributed Embedded Systems Petru Eles, Member, IEEE, Alex

More information

Worst Case Execution Time Analysis for Synthesized Hardware

Worst Case Execution Time Analysis for Synthesized Hardware Worst Case Execution Time Analysis for Synthesized Hardware Jun-hee Yoo ihavnoid@poppy.snu.ac.kr Seoul National University, Seoul, Republic of Korea Xingguang Feng fengxg@poppy.snu.ac.kr Seoul National

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017 Design of Low Power Adder in ALU Using Flexible Charge Recycling Dynamic Circuit Pallavi Mamidala 1 K. Anil kumar 2 mamidalapallavi@gmail.com 1 anilkumar10436@gmail.com 2 1 Assistant Professor, Dept of

More information

A New Decomposition of Boolean Functions

A New Decomposition of Boolean Functions A New Decomposition of Boolean Functions Elena Dubrova Electronic System Design Lab Department of Electronics Royal Institute of Technology Kista, Sweden elena@ele.kth.se Abstract This paper introduces

More information

Reclocking for High Level Synthesis

Reclocking for High Level Synthesis Reclocking for High Level Synthesis Pradip Jha Sri Parameswaran Nikil Dutt Information and Computer Science Dept of ECE Information and Computer Science University of California, Irvine The University

More information

TEST FUNCTION SPECIFICATION IN SYNTHESIS

TEST FUNCTION SPECIFICATION IN SYNTHESIS TEST FUNCTION SPECIFICATION IN SYNTHESIS Vishwani D. Agrawal and Kwang-Ting Cbeng AT&T Bell Laboratories Murray Hill, New Jersey 07974 ABSTRACT - We present a new synthesis for testability method in which

More information

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path Michalis D. Galanis, Gregory Dimitroulakos, and Costas E. Goutis VLSI Design Laboratory, Electrical and Computer Engineering

More information

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech) K.Prasad Babu 2 M.tech (Ph.d) hanumanthurao19@gmail.com 1 kprasadbabuece433@gmail.com 2 1 PG scholar, VLSI, St.JOHNS

More information

An Area-Efficient BIRA With 1-D Spare Segments

An Area-Efficient BIRA With 1-D Spare Segments 206 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 1, JANUARY 2018 An Area-Efficient BIRA With 1-D Spare Segments Donghyun Kim, Hayoung Lee, and Sungho Kang Abstract The

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

Performance Analysis and Optimization of Schedules for Conditional and Loop-Intensive Specifications

Performance Analysis and Optimization of Schedules for Conditional and Loop-Intensive Specifications Performance Analysis and Optimization of Schedules for Conditional and Loop-Intensive Specifications Subhrajit Bhattacharya Sujit Dey Franc Brglez y Dept. of Computer Science C&C Research Labs CBL, Dept.

More information

Topics. Verilog. Verilog vs. VHDL (2) Verilog vs. VHDL (1)

Topics. Verilog. Verilog vs. VHDL (2) Verilog vs. VHDL (1) Topics Verilog Hardware modeling and simulation Event-driven simulation Basics of register-transfer design: data paths and controllers; ASM charts. High-level synthesis Initially a proprietary language,

More information

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics

System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics System-Level Synthesis of Application Specific Systems using A* Search and Generalized Force-Directed Heuristics Chunho Lee, Miodrag Potkonjak, and Wayne Wolf Computer Science Department, University of

More information

2 <3> <2> <1> (5,6) 9 (5,6) (4,5) <1,3> <1,2> <1,1> (4,5) 6 <1,1,4> <1,1,3> <1,1,2> (5,7) (5,6) (5,5)

2 <3> <2> <1> (5,6) 9 (5,6) (4,5) <1,3> <1,2> <1,1> (4,5) 6 <1,1,4> <1,1,3> <1,1,2> (5,7) (5,6) (5,5) A fast approach to computing exact solutions to the resource-constrained scheduling problem M. NARASIMHAN and J. RAMANUJAM 1 Louisiana State University This paper presents an algorithm that substantially

More information

Layout Modeling and Design Space Exploration in PSS1 System

Layout Modeling and Design Space Exploration in PSS1 System VLSI DESIGN 1997, Vol. 5, No. 2, pp. 211-221 Reprints available directly from the publisher Photocopying permitted by license only (C) 1997 OPA (Overseas Publishers Association) Amsterdam B.V. Published

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Twiddle Factor Transformation for Pipelined FFT Processing

Twiddle Factor Transformation for Pipelined FFT Processing Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,

More information

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead

Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Near Optimal Repair Rate Built-in Redundancy Analysis with Very Small Hardware Overhead Woosung Lee, Keewon Cho, Jooyoung Kim, and Sungho Kang Department of Electrical & Electronic Engineering, Yonsei

More information

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip

Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip Optimization of Task Scheduling and Memory Partitioning for Multiprocessor System on Chip 1 Mythili.R, 2 Mugilan.D 1 PG Student, Department of Electronics and Communication K S Rangasamy College Of Technology,

More information

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders

FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders 770 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 48, NO. 8, AUGUST 2001 FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders Hyeong-Ju

More information

EE382M.20: System-on-Chip (SoC) Design

EE382M.20: System-on-Chip (SoC) Design EE82M.20: SystemonChip (SoC) Design Lecture 1 Resource Allocation and Binding Source: G. De Micheli, Integrated Systems Center, EPFL Synthesis and Optimization of Digital Circuits, McGraw Hill, 2001. Andreas

More information

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis*

Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* Behavioural Transformation to Improve Circuit Performance in High-Level Synthesis* R. Ruiz-Sautua, M. C. Molina, J.M. Mendías, R. Hermida Dpto. Arquitectura de Computadores y Automática Universidad Complutense

More information

RTL Power Estimation and Optimization

RTL Power Estimation and Optimization Power Modeling Issues RTL Power Estimation and Optimization Model granularity Model parameters Model semantics Model storage Model construction Politecnico di Torino Dip. di Automatica e Informatica RTL

More information

Symbolic Modeling and Evaluation of Data Paths*

Symbolic Modeling and Evaluation of Data Paths* Symbolic Modeling and Evaluation of Data Paths* Chuck Monahan Forrest Brewer Department of Electrical and Computer Engineering University of California, Santa Barbara, U.S.A. 1. Introduction Abstract We

More information

Constraint Analysis and Heuristic Scheduling Methods

Constraint Analysis and Heuristic Scheduling Methods Constraint Analysis and Heuristic Scheduling Methods P. Poplavko, C.A.J. van Eijk, and T. Basten Eindhoven University of Technology, Department of Electrical Engineering, Eindhoven, The Netherlands peter@ics.ele.tue.nl

More information

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

THE orthogonal frequency-division multiplex (OFDM)

THE orthogonal frequency-division multiplex (OFDM) 26 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 1, JANUARY 2010 A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors Chen-Fong Hsiao, Yuan Chen, Member, IEEE,

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

Sequential Logic Synthesis

Sequential Logic Synthesis Sequential Logic Synthesis Logic Circuits Design Seminars WS2010/2011, Lecture 9 Ing. Petr Fišer, Ph.D. Department of Digital Design Faculty of Information Technology Czech Technical University in Prague

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

Register Binding and Port Assignment for Multiplexer Optimization

Register Binding and Port Assignment for Multiplexer Optimization Register Binding and Port Assignment for Multiplexer Optimization Deming Chen and Jason Cong Computer Science Department University of California, Los Angeles {demingc, cong}@cs.ucla.edu Abstract - Data

More information

Single-Path Programming on a Chip-Multiprocessor System

Single-Path Programming on a Chip-Multiprocessor System Single-Path Programming on a Chip-Multiprocessor System Martin Schoeberl, Peter Puschner, and Raimund Kirner Vienna University of Technology, Austria mschoebe@mail.tuwien.ac.at, {peter,raimund}@vmars.tuwien.ac.at

More information

Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders

Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders International Workshop on Logic and Architecture Synthesis (IWLAS, Grenoble, ecember Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders Reto Zimmermann Integrated Systems Laboratory Swiss

More information

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

More information

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient

Digital Filter Synthesis Considering Multiple Adder Graphs for a Coefficient Digital Filter Synthesis Considering Multiple Graphs for a Coefficient Jeong-Ho Han, and In-Cheol Park School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea jhhan.kr@gmail.com,

More information

Algorithm for Determining Most Qualified Nodes for Improvement in Testability

Algorithm for Determining Most Qualified Nodes for Improvement in Testability ISSN:2229-6093 Algorithm for Determining Most Qualified Nodes for Improvement in Testability Rupali Aher, Sejal Badgujar, Swarada Deodhar and P.V. Sriniwas Shastry, Department of Electronics and Telecommunication,

More information

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction

Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction Synthesis of DSP Systems using Data Flow Graphs for Silicon Area Reduction Rakhi S 1, PremanandaB.S 2, Mihir Narayan Mohanty 3 1 Atria Institute of Technology, 2 East Point College of Engineering &Technology,

More information

Analysis of Conditional Resource Sharing using a Guard-based Control Representation *

Analysis of Conditional Resource Sharing using a Guard-based Control Representation * Analysis of Conditional Resource Sharing using a Guard-based Control Representation * Ivan P. Radivojevi c orrest Brewer Department of Electrical and Computer Engineering University of California, Santa

More information

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 8, AUGUST

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 23, NO. 8, AUGUST IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL 23, NO 8, AUGUST 2004 1175 Register Binding-Based RTL Power Management for Control-Flow Intensive Designs Jiong Luo, Lin

More information

Cycle-accurate RTL Modeling with Multi-Cycled and Pipelined Components

Cycle-accurate RTL Modeling with Multi-Cycled and Pipelined Components Cycle-accurate RTL Modeling with Multi-Cycled and Pipelined Components Rainer Dömer, Andreas Gerstlauer, Dongwan Shin Technical Report CECS-04-19 July 22, 2004 Center for Embedded Computer Systems University

More information

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries

Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Efficient Computation of Canonical Form for Boolean Matching in Large Libraries Debatosh Debnath Dept. of Computer Science & Engineering Oakland University, Rochester Michigan 48309, U.S.A. debnath@oakland.edu

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

Functional extension of structural logic optimization techniques

Functional extension of structural logic optimization techniques Functional extension of structural logic optimization techniques J. A. Espejo, L. Entrena, E. San Millán, E. Olías Universidad Carlos III de Madrid # e-mail: { ppespejo, entrena, quique, olias}@ing.uc3m.es

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Target Architecture Oriented High-Level Synthesis for Multi-FPGA Based Emulation *

Target Architecture Oriented High-Level Synthesis for Multi-FPGA Based Emulation * Target Architecture Oriented High-Level Synthesis for Multi-FPGA Based Emulation * Oliver Bringmann,2, Carsten Menn, Wolfgang Rosenstiel,2 FZI, Haid-und-Neu-Str. 0-4, 763 Karlsruhe, Germany 2 Universität

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION

OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION OPTIMIZATION OF FIR FILTER USING MULTIPLE CONSTANT MULTIPLICATION 1 S.Ateeb Ahmed, 2 Mr.S.Yuvaraj 1 Student, Department of Electronics and Communication/ VLSI Design SRM University, Chennai, India 2 Assistant

More information

Delay and Power Optimization of Sequential Circuits through DJP Algorithm

Delay and Power Optimization of Sequential Circuits through DJP Algorithm Delay and Power Optimization of Sequential Circuits through DJP Algorithm S. Nireekshan Kumar*, J. Grace Jency Gnannamal** Abstract Delay Minimization and Power Minimization are two important objectives

More information

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point International Journal of Electrical and Computer Engineering (IJECE) Vol. 3, No. 6, December 2013, pp. 805~814 ISSN: 2088-8708 805 Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating

More information

A Controller Testability Analysis and Enhancement Technique

A Controller Testability Analysis and Enhancement Technique A Controller Testability Analysis and Enhancement Technique Xinli Gu Erik Larsson, Krzysztof Kuchinski and Zebo Peng Synopsys, Inc. Dept. of Computer and Information Science 700 E. Middlefield Road Linköping

More information

TODAY, new applications, e.g., multimedia or advanced

TODAY, new applications, e.g., multimedia or advanced 584 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 5, MAY 1998 A Formal Technique for Hardware Interface Design Adel Baganne, Jean-Luc Philippe, and Eric

More information

High-Level Synthesis of Programmable Hardware Accelerators Considering Potential Varieties

High-Level Synthesis of Programmable Hardware Accelerators Considering Potential Varieties THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE.,, (VDEC) 113 8656 7 3 1 CREST E-mail: hiroaki@cad.t.u-tokyo.ac.jp, fujita@ee.t.u-tokyo.ac.jp SoC Abstract

More information

RPUSM: An Effective Instruction Scheduling Method for. Nested Loops

RPUSM: An Effective Instruction Scheduling Method for. Nested Loops RPUSM: An Effective Instruction Scheduling Method for Nested Loops Yi-Hsuan Lee, Ming-Lung Tsai and Cheng Chen Department of Computer Science and Information Engineering 1001 Ta Hsueh Road, Hsinchu, Taiwan,

More information

Functional Test Generation for Delay Faults in Combinational Circuits

Functional Test Generation for Delay Faults in Combinational Circuits Functional Test Generation for Delay Faults in Combinational Circuits Irith Pomeranz and Sudhakar M. Reddy + Electrical and Computer Engineering Department University of Iowa Iowa City, IA 52242 Abstract

More information

Delay Estimation for Technology Independent Synthesis

Delay Estimation for Technology Independent Synthesis Delay Estimation for Technology Independent Synthesis Yutaka TAMIYA FUJITSU LABORATORIES LTD. 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, JAPAN, 211-88 Tel: +81-44-754-2663 Fax: +81-44-754-2664 E-mail:

More information

TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS

TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS TECHNOLOGY MAPPING FOR THE ATMEL FPGA CIRCUITS Zoltan Baruch E-mail: Zoltan.Baruch@cs.utcluj.ro Octavian Creţ E-mail: Octavian.Cret@cs.utcluj.ro Kalman Pusztai E-mail: Kalman.Pusztai@cs.utcluj.ro Computer

More information

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h)

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h) HAI ZHOU Electrical and Computer Engineering Northwestern University 2535 Happy Hollow Rd. Evanston, IL 60208-3118 Glenview, IL 60025 haizhou@ece.nwu.edu www.ece.nwu.edu/~haizhou (847) 491-4155 (o) (847)

More information

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VII /Issue 2 / OCT 2016 NEW VLSI ARCHITECTURE FOR EXPLOITING CARRY- SAVE ARITHMETIC USING VERILOG HDL B.Anusha 1 Ch.Ramesh 2 shivajeehul@gmail.com 1 chintala12271@rediffmail.com 2 1 PG Scholar, Dept of ECE, Ganapathy Engineering

More information

Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units

Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units Distributed Synchronous Control Units for Dataflow Graphs under Allocation of Telescopic Arithmetic Units Euiseok Kim, Hiroshi Saito Jeong-Gun Lee Dong-Ik Lee Hiroshi Nakamura Takashi Nanya Dependable

More information

Symbolic Manipulation of Boolean Functions Using a Graphical Representation. Abstract

Symbolic Manipulation of Boolean Functions Using a Graphical Representation. Abstract Symbolic Manipulation of Boolean Functions Using a Graphical Representation Randal E. Bryant 1 Dept. of Computer Science Carnegie-Mellon University Abstract In this paper we describe a data structure for

More information

Lecture 20: High-level Synthesis (1)

Lecture 20: High-level Synthesis (1) Lecture 20: High-level Synthesis (1) Slides courtesy of Deming Chen Some slides are from Prof. S. Levitan of U. of Pittsburgh Outline High-level synthesis introduction High-level synthesis operations Scheduling

More information

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Power-Mode-Aware Buffer Synthesis for Low-Power

More information

Increasing Parallelism of Loops with the Loop Distribution Technique

Increasing Parallelism of Loops with the Loop Distribution Technique Increasing Parallelism of Loops with the Loop Distribution Technique Ku-Nien Chang and Chang-Biau Yang Department of pplied Mathematics National Sun Yat-sen University Kaohsiung, Taiwan 804, ROC cbyang@math.nsysu.edu.tw

More information

Logic Synthesis of Multilevel Circuits with Concurrent Error Detection

Logic Synthesis of Multilevel Circuits with Concurrent Error Detection IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 16, NO. 7, JULY 1997 783 [16] Layout synthesis benchmark set, Microelectronics Center of North Carolina, Research Triangle

More information

Controller Synthesis for Hardware Accelerator Design

Controller Synthesis for Hardware Accelerator Design ler Synthesis for Hardware Accelerator Design Jiang, Hongtu; Öwall, Viktor 2002 Link to publication Citation for published version (APA): Jiang, H., & Öwall, V. (2002). ler Synthesis for Hardware Accelerator

More information

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs

Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Saving Power by Mapping Finite-State Machines into Embedded Memory Blocks in FPGAs Anurag Tiwari and Karen A. Tomko Department of ECECS, University of Cincinnati Cincinnati, OH 45221-0030, USA {atiwari,

More information

SCHEDULING II Giovanni De Micheli Stanford University

SCHEDULING II Giovanni De Micheli Stanford University SCHEDULING II Giovanni De Micheli Stanford University Scheduling under resource constraints Simplified models: Hu's algorithm. Heuristic algorithms: List scheduling. Force-directed scheduling. Hu's algorithm

More information

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori The Computer Journal, 46(6, c British Computer Society 2003; all rights reserved Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes Tori KEQIN LI Department of Computer Science,

More information

High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs

High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs High-Level Power Estimation and Low-Power Design Space Exploration for FPGAs Deming Chen Department of ECE University of Illinois, Urbana-Champaign dchen@uiuc.edu Jason Cong, Yiping Fan, Zhiru Zhang Computer

More information

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Design and Implementation of VLSI 8 Bit Systolic Array Multiplier Khumanthem Devjit Singh, K. Jyothi MTech student (VLSI & ES), GIET, Rajahmundry, AP, India Associate Professor, Dept. of ECE, GIET, Rajahmundry,

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry

High Performance Memory Read Using Cross-Coupled Pull-up Circuitry High Performance Memory Read Using Cross-Coupled Pull-up Circuitry Katie Blomster and José G. Delgado-Frias School of Electrical Engineering and Computer Science Washington State University Pullman, WA

More information