Retiming and Clock Scheduling for Digital Circuit Optimization

Size: px
Start display at page:

Download "Retiming and Clock Scheduling for Digital Circuit Optimization"

Transcription

1 184 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Retiming and Clock Scheduling for Digital Circuit Optimization Xun Liu, Student Member, IEEE, Marios C. Papaefthymiou, Member, IEEE, and Eby G. Friedman, Fellow, IEEE Abstract This paper investigates the application of simultaneous retiming and clock scheduling for optimizing synchronous circuits under setup and hold constraints. Two optimization problems are explored: 1) clock period minimization and 2) tolerance maximization to clock-signal delay variations. Exact mixed-integer linear programming formulations and efficient heuristics are given for both problems. When both long and short paths are considered, circuits optimized by the combined application of retiming and clock scheduling can achieve shorter clock periods or demonstrate greater tolerance to clock-signal delay variations than circuits optimized by retiming or clock scheduling. Experiments with benchmark circuits demonstrate the effectiveness of the combined optimization. In comparison with the best result obtained by either of the two optimizations, the joint application of retiming and clock scheduling increased operating speeds by more than 8% on the average. It also increased tolerance to clock delay variations by an average of 12% over a broad range of target clock frequencies. Larger relative improvements were achieved for shorter clock periods, thus suggesting that simultaneous retiming and clock scheduling can play an important role in high-speed design. Index Terms Delay tolerance, digital circuits, optimization delay variations, retiming, scheduling, timing. I. INTRODUCTION PERIODIC clock signals play a central role in the design and performance of synchronous digital systems. They provide a synchronization mechanism that enables the straightforward implementation of finite state machines. They also determine the throughput of the computation performed by these machines, since data propagation through a digital system proceeds at the rate specified by the clock period. As clocking rates increase, variations in the propagation delays of the clock signals across a chip are becoming a significant fraction of the clock period and begin posing a fundamental challenge to the design of high-performance systems. Such variations are typically due to process parameter variations, temperature or environmental variations, and power supply variations and become particularly pronounced in submicron design. Consequently, in addition to clock period minimization, tolerance Manuscript received August 16, 2000; revised April 27, This work was supported in part by the National Science Foundation under Grants CCR , MIP , and CCR , a grant from the New York State Science and Technology Foundation, and by grants from the Xerox, IBM, and Intel Corporations. This paper was recommended by Associate Editor L. Stok. X. Liu and M. C. Papaefthymiou are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI USA ( xunliu@eecs.umich.edu, marios@eecs.umich.edu). E. G. Friedman is with the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY USA ( friedman@ece.rochester.edu). Publisher Item Identifier S (02) maximization to clock-signal delay variations has become of paramount importance to the design of high-performance digital circuits. Retiming and clock scheduling are two elegant and powerful approaches to digital circuit performance optimization. Although completely different from an implementation standpoint, they both improve circuit performance by redistributing timing slack among the combinational paths of a circuit. Retiming decreases the critical delays of a circuit by relocating its registers so that combinational path delays are evenly balanced across the entire design. Instead of transforming a circuit, clock scheduling increases the timing slack along critical paths by introducing appropriate clock delays that differentiate, or skew, the arrival times of the clock signals at the circuit registers. Due to their common operating principle, retiming and clock scheduling are often referred to as being equivalent, although they do not always achieve the same clock period when applied separately. Significant effort has been devoted to the individual study of the two optimizations. The investigation of their combined application has been limited, however. This paper investigates the simultaneous application of retiming and clock scheduling for digital circuit optimization under setup and hold constraints. When both long and short paths are considered, the combined application of the two optimizations is more powerful than either of the two optimizations by itself, resulting in faster circuits or circuits that are more tolerant to variations in the delays of their clock signals. Moreover, the combined application of retiming and clock scheduling adds flexibility during circuit synthesis, providing alternative ways for achieving a target performance specification. The effectiveness of simultaneous retiming and clock scheduling in minimizing the clock period of a digital circuit is demonstrated in the example of Fig. 1. In addition to demonstrating the speedup potential of simultaneous retiming and clock scheduling, Fig. 1 shows that the shortest clock period achievable by retiming and clock scheduling may not be obtainable by an intuitive application of the two optimizations in sequence, that is, first performing retiming for maximum speed (under zero skew) and then applying clock scheduling. In this figure, digital circuits are represented as directed graphs. Each vertex denotes a block of combinational logic, and each edge between a vertex pair denotes a wire between the corresponding logic blocks. The rectangles represent edge-triggered registers. The pair inside each vertex denotes the minimum and maximum propagation delays of the signals through the corresponding logic block /02$ IEEE

2 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 185 Fig. 1. Clock period improvement by simultaneous retiming and clock scheduling. (a) Original circuit. (b) Fastest retimed circuit with zero skew. (c) Fastest retimed circuit with nonzero skew. When all clock skews are zero, the minimum period at which the circuit can be clocked is the longest delay among its combinational paths. For example, the original circuit in Fig. 1(a) achieves a clock period of tu (time units) with zero clock skew. If nonzero clock skew is introduced, it is possible to operate the circuit at a shorter clock period that equals the largest difference between the maximum and minimum register-to-register delays over all paths in the circuit. Accordingly, if register sees the clock edge by 4 tu earlier than register, the circuit can function correctly with a clock period of tu. This clock period is the shortest one that clock scheduling can achieve without introducing any signal races, since the difference in the maximum and minimum propagation delays along the path BCD is 19 tu. The retimed circuit in Fig. 1(b) is obtained by first retiming the original circuit to minimize its clock period with zero skew and subsequently adjusting its clock schedule to further reduce its clock period. The new register locations result from a backward shift of register across block D and a forward shift of register across block B. With zero clock skew, this circuit is optimally retimed and achieves a clock period of tu. If the clock edge arrives at register by 1 tu earlier than at register, this retimed circuit can achieve an even shorter clock period of tu. No further clock period improvements are possible, however, because the propagation delay difference along path DAB is 15 tu. Fig. 1(c) shows another retimed version of the original circuit that is obtained by shifting register across block B. For this circuit, the shortest clock period that results in no timing violations under zero clock skew is tu. If the clock edge arrives at register by 4 tu later than register, however, this circuit can function correctly with a clock period tu. This clock period is the minimum that can be possibly achieved by applying both retiming and clock scheduling, since the total propagation delay around the cycle is 28 tu and must be distributed between two combinational paths. The effectiveness of simultaneous retiming and clock scheduling in increasing the tolerance of a circuit to variations in the propagation delays of its clock signals is demonstrated by the example in Fig. 2. In this example, the clock skew between the input/output registers and is assumed to be zero. The setup and hold constraints along each of the two combinational paths yield a range [ ] of permissible clock delays [18] for the arrival of the clock signal at register. The permissible skew range of is obtained by intersecting the two possible ranges. For the original circuit in Fig. 2(a) and a target clock period of 12 tu, the range of possible delays for the clock signal ar- Fig. 2. Tolerance improvement by simultaneous retiming and clock scheduling. (a) Original and (b) retimed circuit. riving at is [ 2, 4]. When clock delay is centered at zero, therefore, the permissible skew range of is [ 2, 2], assuming symmetric clock delay variations, and the tolerance of the circuit is 4 tu. When clock signals are designed to arrive at with a delay, however, the permissible skew range is [ 2, 4], and delay tolerance increases to 6 tu. A retimed version of the original circuit that is obtained by shifting forward is shown in Fig. 2(b). In this case, the intersection of the two skew ranges is [ 1, 7]. When clock skew is zero, the permissible range of is [ 1, 1], and tolerance drops to 2 tu. When the arrival time of the clock signal at is delayed by, however, the permissible range becomes [ 1, 7], and tolerance increases to 8 tu. This value is the maximum tolerance that can be achieved by simultaneous retiming and clock scheduling for tu. An interesting observation about the retimed circuit in Fig. 2(b) is that when skews are zero, the delay tolerance of this circuit is smaller than that in the original one. Nevertheless, when clock skews are nonzero, this circuit exhibits maximum tolerance to delay variations. It may thus be impossible to compute a maximum tolerance configuration by simply performing retiming for maximum tolerance, followed by clock scheduling for maximum tolerance. This paper investigates the simultaneous application of retiming and clock scheduling for increasing the performance of a digital circuit. We first study the retiming and clock scheduling problem. In this problem, given a synchronous digital circuit, a target clock period, and a target tolerance on the variation of the clock arrival times at the circuit s registers, we wish to compute a retiming and a clock schedule so that the optimized circuit meets its performance specification. We then turn to the two optimization variants of the basic retiming and clock scheduling problem. In the minimum-period retiming and clock scheduling problem, given a synchronous circuit and a target tolerance, we wish to compute a retiming and a clock schedule such that the optimized circuit satisfies the tolerance constraints while achieving the minimum possible clock period. In the maximum-tolerance retiming and clock scheduling problem, given a synchronous circuit and a target clock period, we wish to compute a retiming and a clock schedule such that the optimized circuit meets its clock period constraint while achieving the maximum possible tolerance to variations in the clock signal arrival times.

3 186 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 We give an exact mixed-integer linear programming (MILP) formulation for the retiming and clock scheduling problem under setup and hold constraints. Our program has constraints, where is the number of wires in the circuit and can be solved exactly using general MILP solvers. Although MILP algorithms have worst case exponential complexity, our MILP formulation is valuable from a number of standpoints. Most notably, it gives a mathematically accurate set of necessary and sufficient conditions for the problem, thus providing a way for calculating the optimal solution. For relatively small circuits, the MILP formulation is thus useful in evaluating the effectiveness of heuristics. In addition to the MILP formulation of retiming and clock scheduling, we give practical heuristics for the minimum-period and the maximum-tolerance optimization problems. Our heuristics run faster than MILP-based optimization by several orders of magnitude with little or no sacrifice in accuracy. To complement our analysis, we present extensive experimental results on modified LGSynth93 and ISCAS89 benchmark circuits that demonstrate the speed and tolerance improvements which can be achieved by combining retiming and clock scheduling. Our results also indicate the circuit structures that are more amenable to speed-up by retiming and clock scheduling. In our experiments, clock periods improve by 8% on the average and by as much as 42% in the best case over separate retiming or clock scheduling. Sequential application of retiming and clock scheduling is found to be effective for most benchmark circuits when optimizing for speed. For about a quarter of the circuits, however, simultaneous retiming and clock scheduling can further improve the clock frequency by up to 28%. With respect to delay variations, our heuristic outperforms the best of retiming, scheduling, or sequential retiming and scheduling by an average of 12% across a broad range of target clock periods. In certain cases, improvements of up to a multiplicative factor of 3 are achieved. In our experiments, the effectiveness of retiming and clock scheduling in improving tolerance to clock signal delay variations generally increases with the target operating frequency. Such a property is especially desirable in high-performance design, since it provides flexibility in achieving high operating speeds with high tolerance to delay variations. The remainder of this paper has six sections. Background material is given in Section II. In Section III, we present a set of necessary and sufficient conditions for correct timing in a digital circuit that is optimized by retiming and clock scheduling. A novel MILP formulation for the retiming and clock scheduling problem is given in Section IV. In Section V, we describe a heuristic scheme for retiming and clock scheduling that can be adapted to solve the minimum-period and maximumtolerance variants of the basic retiming and clock scheduling problem. The results of extensive experiments using LGSynth93 and ISCAS89 benchmark circuits are presented in Section VI. A summary of our contributions and directions for future research are given in Section VII. II. BACKGROUND Section II-A presents the circuit and delay models we assume in this paper. Background material for retiming and clock scheduling is given in Section II-B and Section II-C, respectively. Previous research related to retiming and clock scheduling is discussed in Section II-D. A. Circuit and Delay Model An edge-triggered circuit is modeled as a directed multigraph. The vertices correspond to the combinational logic elements in the circuit. The directed edges model the interconnections between the combinational blocks. Each edge corresponds to a wire that connects an output of a combinational block to the input of another combinational block, possibly through one or more globally clocked, edge-triggered registers. For each edge, the register count of the corresponding wire is given by a nonnegative, integer edgeweight. To avoid races, every directed cycle of includes an edge with strictly positive register count. Throughout this paper, we assume a data-independent delay model that considers timing constraints on long and short paths. Specifically, each vertex is associated with a pair of nonnegative weights and which give the longest and shortest propagation delay, respectively, through the corresponding combinational logic block. All registers are assumed to have the same propagation delay, setup time and hold time, regardless of their positions in the circuit. Given a target clock period, we say that a circuit achieves if all setup and hold constraints in hold. Intuitively, contains no setup violations if for every combinational path from a register to a register in, data have sufficient time to propagate along before the arrival of the latching clock edge at register. Moreover, contains no hold violations (also known as races) if the propagation of clock signals between and occurs faster than data propagation along. A precise mathematical formulation of the setup and hold constraints that must hold to ensure correct timing is given in Section III. B. Retiming A retiming of an edge-triggered circuit is an integer-valued vertex-labeling that denotes a transformation of the original circuit into a functionally equivalent circuit. For each edge in, is defined by The retiming transformation for a vertex in is shown in Fig. 3. The output of s computation in is generated clock cycles later than in. The retimed circuit is well formed if for all edges,wehave (1) 0 (2) Equation (1) implies that for every vertex pair in, the change in the register count along any path depends solely on its two endpoints (3)

4 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 187 Fig. 3. Retiming of a vertex u by r(u)=1. where. Thus, the maximum decrease in the register count of any path is Accordingly, given a retiming, the number of registers along all paths with is. The only paths that can become combinational (and possibly lead to a timing violation) in are those for which in. For each of the vertex pairs in, the quantities where, represent the longest and shortest propagation delays along the path. Therefore, with zero skew, the clock period of any retimed circuit is always some element in the -size set of. When only setup constraints are considered, a retimed circuit that achieves a given clock period can be computed in. A retimed circuit that achieves the minimum possible clock period can be computed in steps [11]. C. Clock Scheduling In synchronous digital circuits, a clock signal provides a global time reference that synchronizes the flow of data between storage elements. Clock signals are delivered throughout the circuit by a clock distribution network which introduces delays in their propagation [8]. A clock schedule of an edge-triggered circuit is a real-valued edge-labeling. This labeling gives the propagation delay of each clock signal from the global clock source to the corresponding wire in. Timing violations can be fixed (or created) by adjusting these delays. For example, consider the path, where and are wires with registers, and is a path of combinational logic blocks. The difference is referred to as the clock skew between the registers and.if, then the time available for the propagation of data from to decreases by. This adjustment may cause to become a critical path in or eliminate a race condition along. Conversely, if, then the time available for data to propagate from to increases by. This increase may remove a long path violation or introduce a short path violation. In the clock scheduling problem, we wish to adjust the clock skews by appropriately selecting a clock schedule so that no setup or hold violations exist in the circuit. Furthermore, in the clock scheduling optimization problem, (4) (5) (6) clock skews are adjusted to implement the optimal circuit performance, for example, minimal clock period, without any setup or hold violations. A variety of factors, including differences in interconnect delay, parasitic impedances, process parameter variations, and temperature variations, affect the delays in the propagation of the clock signals to their registers in the circuit. After fabrication, therefore, the actual delay of a clock signal with nominal propagation delay varies within a range around. For every value of the clock signal delay in this range, a correctly designed circuit should meet all its timing constraints. The minimum and the maximum delays for the propagation of each clock signal to the corresponding wire in are denoted by labelings and, respectively. For each wire, the values and define a range for. Given a clock period, we say that a circuit achieves with tolerance to the propagation delays of its clock signals if and only if there exist labelings and such that is timed correctly with clock period and for all wires,wehave The maximum tolerance of is defined as (7) such that achieves and Inequality (7) holds (8) Given and, it is possible to compute a clock schedule with tolerance if the distribution of the delay variations is known. Throughout this paper we assume that delay variations are distributed symmetrically. In this case, for each edge in, the quantity can be computed straightforwardly by setting. D. Previous Research Retiming has been investigated for a variety of clocking disciplines [9], [11], [14], [20], delay models [10], [21], and optimization objectives [2], [6], [17], [19]. In the context of edge-triggered circuits, a linear programming formulation of the clock scheduling problem was first described in [7], and a graph-theoretic approach to clock scheduling was presented in [5]. In both papers, the relative placement of the storage elements was assumed to be fixed. Algorithms for scheduling local clocks to improve the tolerance of an edge-triggered circuit to process parameter variations were described in [18]. The combined application of retiming and clock scheduling was first discussed in [16]. A mathematical framework for retiming and clock scheduling for maximum tolerance to clock delay variations under setup and hold constraints was investigated in [13]. A two-step procedure for minimizing the clock period of a synchronous circuit by combining retiming and clock scheduling was proposed in [3]. That work considered only setup violations. When both setup and hold constraints are considered, however, the solution space expands. The optimization of clock period in this broader solution space is discussed in [12]. Level-clocked circuits have the potential to achieve greater tolerance than edge-triggered ones with the same clock period.

5 188 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Fig. 4. A two-phase level-clocked implementation of the circuit in Fig. 2. Algorithms for maximizing the tolerance of level-clocked circuits to clock signal delay variations were investigated in [1] and [15]. The work in [1] investigates clock period minimization and tolerance maximization using clock scheduling with fixed latch locations. In that paper, the timing slacks of all existing datapaths and the tolerance of all clock tree paths are improved by as much as possible, without using retiming to relocate registers and balance the timing slacks of different paths. On the other end of the spectrum, the work in [15] applies retiming with fixed clock skew values that are assumed to be given and are not free parameters to be optimized. Moreover, clocking schemes are constrained to avoid hold violations. In our proposed method, both register locations and clock skew values are variables that are adjusted to optimize circuit performance. A direct comparison between our approach and the methods described in [1] and [15] is not possible because the quality of the results obtained with those methods can vary significantly depending on the given conditions, that is, the fixed locations of the latches in [1] and the fixed skew values and clocking scheme in [15]. Depending on the values of these parameters, our approach can outperform [1] and [15]. For example, Fig. 4 shows a two-phase level-clocked implementation of the edge-triggered circuit from Fig. 2(a). For the edge-triggered version of this circuit, combined retiming and clock scheduling can achieve a target clock period 12 tu with a tolerance of 8 tu, as shown in Section I. The clock scheduling algorithm in [1] cannot achieve a tolerance greater than 6 tu, however, since the timing constraints in that paper provide an upper bound of for the tolerance. For tu, the upper bound of 6 tu is obtained for and with tu and tu. Furthermore, the retiming algorithm in [15] cannot achieve tolerance greater than 5 tu. Specifically, to achieve a clock period of 12 tu, retiming must place a latch on an edge other than the interface, since and [15] constrains the two clock phases to be nonoverlapping. Such a latch cannot have tolerance greater than tu, however. In general, retiming and clock scheduling with level-clocked circuits is at least as effective as with edge-triggered circuits, assuming no skew variations among latches on the same wire. The combined application of the two optimizations on level-clocked circuits has yet to be investigated, however, and constitutes an interesting future research direction. III. RETIMING AND CLOCK SCHEDULING CONSTRAINTS In this section, we give a set of necessary and sufficient conditions for correct timing in a digital circuit that is optimized by retiming and clock scheduling. We first give the setup and hold constraints that must be satisfied by a correctly timed circuit when the placement of its registers is given and all clock skews are zero. We also give the constraints for correct timing with nonzero clock skew and a lower bound on the circuit s tolerance. The conditions for correct timing with retiming and clock scheduling are then derived by adapting these constraints to include the relocation of registers. For simplicity, we assume that register propagation delays are zero. It is straightforward to extend our analysis to encompass registers with nonzero propagation delay. When all clock skews are zero, the arrival times of the clock signals at the registers are all equal. In this case, the maximum time allowable for the propagation of data along every combinational path between any two registers in the circuit is equal to the clock period minus the register setup time. The shortest clock period that will not result in setup violations is thus bounded from below by the sum of the register setup time and the delay of the longest combinational path in the circuit. To avoid races, the minimum data propagation time should not be shorter than the register hold time. The following proposition gives a mathematical description of these setup and hold constraints that must hold for correct timing. Proposition 1: Let be an edge-triggered circuit, and let be a constant. Then, can be correctly timed by a zero-skew clock with period if and only if for every edge pair, in such that,, and,wehave (9) (10) When clock scheduling is introduced, the arrival times of the clock signals can be adjusted to balance the maximum allowable propagation times between different register pairs. The following theorem from [7] gives setup and hold constraints that must hold for a circuit to be correctly timed with a clock schedule and a target clock period. Theorem 2: Let be an edge-triggered circuit and a constant. Moreover, let be a clock scheduling function. Then, the circuit is timed correctly if and only if for every edge pair, in such that,, and,wehave (11) (12) Inequalities (11) and (12) can be rewritten as difference constraints in the form of a shortest paths problem [4]. A clock schedule that satisfies these constraints can be computed efficiently in time by solving a shortest paths problem on an appropriately constructed edge-weighted graph. For each edge, the set includes a vertex. For each inequality (11), the set includes an edge with weight. Moreover, for each inequality (12), includes an edge with weight. Due to variations in the physical parameters of a circuit, the delay values of its clock signals may change. In deep-submicron VLSI technologies, where interconnect delays dominate,

6 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 189 such variations can be significant compared with gate delays. A correctly designed circuit should be able to tolerate this variation, which means, when the clock scheduling variable assumes any value in the interval, the constraints in Theorem 2 should always hold. Therefore, they should also hold under the worst case assumptions, as stated in the following lemma. Lemma 3: Let be an edge-triggered circuit, and let be a given constant. Moreover, let and be assignments of minimum and maximum clock delays, respectively. Then, is timed correctly if and only if for every edge pair, in such that,, and,wehave (13) (14) where and are hold and setup time, respectively. Proof: We will show that (13) holds if and only if all setup constraints in the circuit hold. The proof for the hold constraints is similar. ( ) Assume (13) holds. For any clock schedule,wehave since,, and (13) holds. From (11) it follows that there are no setup violations. ( ) if (13) is violated, then the setup constraint between the edges and is violated. The setup and hold constraints for correct timing under nonzero clock skew and a specified tolerance to the variation of clock delays follow immediately from Lemma 3. Corollary 4: Let be an edge-triggered circuit. Moreover, let and be given real constants. Then, achieves a clock period with tolerance if and only if there exist functions and such that for each edge (15) and for every edge pair, in such that,, and (16) (17) By relying on Corollary 4 we can give a precise mathematical formulation for the retiming and clock scheduling problem. In this problem, we wish to compute a retiming, a clock schedule, and, so that the optimized circuit achieves a clock period with no timing violations and with tolerance. The following theorem formulates this problem as a set of constraints. Theorem 5: Let be a synchronous circuit, and let and be given constants. Moreover, let be a retiming function, let be an assignment of maximum clock delays, and let be an assignment of minimum clock delays. Then the retimed circuit is well formed and achieves a clock period with tolerance if and only if for every edge,wehave and for every pair of edges,wehave (18) (19) or or (20) or or (21) where and for the setup and hold constraints, respectively. Proof: Inequality (18) follows from (7) and ensures that the tolerance of is. Inequality (19) follows from (1) and (2) and, therefore, the resulting circuit is well formed. It remains to show that the timing constraints are satisfied. We will prove that (20) holds if and only there are no setup violations in. The proof for the hold constraints is similar. ( ) We show that if the setup constraints hold then (20) holds. The proof will be by contradiction. Assume that for every vertex pair and in, (13) holds. Moreover, let be two edges in such that,,, and. It follows that for and,wehave, which contradicts (13). ( ) We now show that if some setup constraint in is violated then (20) does not hold. The violation of (13) implies that for some edge pair, in such that,, and,wehave. By the definition of and a straightforward algebraic manipulation, it follows that. Therefore, (20) does not hold for the edges and in. IV. EXACT MILP FORMULATION The constraints in the statement of Theorem 5 do not appear to be amenable to efficient algorithmic solutions. In this section we recast them in the form of an equivalent exact mixed-integer linear program, thus enabling the application of powerful MILP solvers to the retiming and clock scheduling problem. We express the retiming and clock scheduling problem as a set of linear inequalities with integer and real unknowns. This set is obtained by restricting the solution space of the constraints in Theorem 5 while maintaining their feasibility. First, for each, we introduce two new integer variables and such that (22) 0 (23) 0 (24) 1 (25) (26)

7 190 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 The parameter in (26) is the maximum number of registers that retiming can place on any edge in and equals (27) where, and, and PI and PO are the sets of the primary inputs and outputs, respectively, in. This upper bound follows directly from the fact that retiming does not change the register count of cycles or I/O paths in a circuit graph. The following lemma proves that is an indicator variable for. Intuitively, maps each edge to the set 0 1, indicating whether the retimed register count is positive or not. Accordingly, gives the number of registers on in excess of 1, after retiming. Lemma 6: Assuming that the conditions in the statement of Theorem 5 hold, we have 0 0 (28) Proof: ( ) Since and are nonnegative and their sum equals,if then. ( ) If, (26) and (24) yield. Therefore, their sum,, is zero. Using and, (20) and (21) can be simplified by eliminating the implication and reducing the number of disjunctions it includes. The following lemma gives equivalent relations with only one disjunction. Lemma 7: Assuming that the conditions in the statement of Theorem 5 hold, (20) and (21) are equivalent to the disjunction or (29) or (30) where and for the setup and hold constraints, respectively. Proof: Since the predicate is equivalent to, (20) and (21) can be rewritten in the equivalent form or or or (31) or or or (32) Therefore, to prove the equivalence of relations (20) and (21) with relations (29) and (30), it remains to show that holds if and only if 1or 0or 0 (33) 1 (34) holds. ( ) Assuming that (33) holds, we prove that (34) holds by a straightforward case analysis on the values of,, and. Let. Since and. Therefore 1 1 and (34) holds. Now, assume that. From Lemma 6, we have. Since and,wehave and therefore (34) holds. The proof for the case is similar. ( ) We prove the contrapositive. Suppose that (33) does not hold. Then, from (2), (25), and (28), we have 0 and 1 and 1 (35) It follows that, and therefore (34) does not hold. In the following lemma we show how (29) and (30) can be replaced by two linear inequalities, thus paving the way for the formulation of the retiming and clock scheduling problem as a mixed-integer linear program. To obtain these bounds, we assume that the parameter and are bounded from above by some known constant that depends on the chip die size and physical constraints on the chip realization. Lemma 8: Assuming that the conditions in the statement of Theorem 5 hold, (29) and (30) are equivalent to (36) (37) where and for the setup and hold constraints, respectively, and is an upper bound on and Proof: ( ) We first prove that (29) implies (36). Inequality (36) follows by a straightforward case analysis on the value of the parameter. If, since is well formed,.if, then (29) implies that since the sum is integer and. Therefore, (36) holds. The proof for (37) is similar by replacing with. ( ) We now prove that (36) implies (29) by a case analysis on the value of. If, then (29) trivially holds. If, then. Since is an integer, (36) yields, and therefore Relation (29) holds. Similarly, we can prove (37) implies (30). Based on Lemmas 7 and 8, the problem of retiming and clock scheduling problem can now be recast as a mixed-integer linear 1 2

8 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 191 program with linear inequalities, integer unknowns, and real unknowns. The complete statement of this program is given in the following theorem. Theorem 9: Let be an edge-triggered circuit, and let and be given constants. Then there exists a retiming function and clock schedules and such that the retimed circuit is well formed and achieves clock period with tolerance if and only if there exists a retiming function, clock schedules and, and functions and such that for every edge,wehave for every edge,we have and for every pair of edges (38) 0 (39) 0 (40) 1 (41) (42),wehave (43) (44) (45) 2 (46) 2 (47) where and for the setup and hold constraints, respectively. Proof: This theorem follows directly from Theorem 5 and Lemmas 7 and 8. Relation (38) and inequalities (39) (42) ensure that is well formed and is an indicator of. Inequality (43) ensures that the resulting schedule has tolerance. Inequalities (44) and (45) enforce the upper bound of the parameters and. Inequalities (46) and (47) enforce the setup and hold constraints for the clock period. The unknowns are the integers, the integers, the integers, the reals, and the reals. (Note that the variables are not independent, since (38) can be eliminated by substituting into the left-hand sides of (40) and (42).) All inequalities are linear in their unknowns. V. HEURISTIC OPTIMIZATION In this section we describe a heuristic approach for solving the minimum-period and the maximum-tolerance optimization versions of the retiming and clock scheduling problem. Even though the MILP constraints presented in Section IV can be used to compute an exact solution to the problem, the worst case exponential runtime of MILP solvers becomes evident even for circuits of relatively small size. Our heuristics run substantially faster than the exact MILP solvers. They rely on a common procedure that identifies critical paths based on clock scheduling information and generates a retimed circuit with improved clock period or tolerance to delay variations. To obtain an optimal circuit, this scheduling-guided retiming procedure is applied iteratively until a local optimum of the solution space is reached. Section V-A describes our Procedure SGR for scheduling-guided retiming. Section V-B proves that SGR correctly identifies and removes timing bottlenecks. Section V-C gives our heuristics for the problems of minimum-period retiming and clock scheduling and maximum-tolerance retiming and clock scheduling. A. Scheduling-Guided Retiming The scheduling-guided retiming procedure we describe in this section relies on three auxiliary slack graphs to encode and manipulate tight path constraints, that is, timing and tolerance constraints that hold with equality for given and. Specifically, a graph is used to encode the tight setup constraints, a graph is introduced to capture the tight hold constraints, and a graph is constructed to encode both the setup and the hold constraints in a single representation. The three slack graphs are constructed as follows. For each edge in such that, a vertex is introduced in all three graphs. Each vertex is associated with a weight equal to the register count of its corresponding edge. For simplicity, we overload to denote in each slack graph the vertex of a corresponding edge in. Similarly, we use to denote the corresponding vertex weight. All edges connected to a primary input or output of are assumed to have a register and are represented in each slack graph by a single vertex with. The register on the edge serves as the I/O interface of the circuit and is not relocated by retiming. The directed edges of the slack graphs correspond to combinational register-to-register paths in for which both the timing and the tolerance constraints are satisfied with equality. Specifically, for each pair of edges, in such that,, and, an edge is introduced in if (48) Equation (48) is derived by combining (15) and (16) and replacing the inequality by equality. Whenever (48) holds, the corresponding setup constraint is said to be tight and gives rise to a timing bottleneck. Similarly, an edge is added to if (49) This equation is derived by combining (15) and (17) and signifies a timing bottleneck due to a hold constraint between registers on and. The graph is the union of and with all the edges in inverted. This inversion is a mere convenience that enables the uniform handling of both the setup and the hold constraints. As it can be verified by bringing (17) into the form (50)

9 192 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Fig. 5. Circuit graph and associated slack graphs. Fig. 6. Algorithm SGR for scheduling-guided retiming. and subsequently comparing it with (16), from a mathematical standpoint the hold constraint from to is equivalent to a setup constraint from to with path delay and setup time. Thus, when the direction of the hold edges is inverted in, timing bottlenecks due to combinations of tight setup and tight hold constraints appear as directed paths, and timing optimality can be verified by identifying a directed cycle in. A formal proof of this statement is given in Theorem 10. Fig. 5 gives an example of how slack graphs are constructed. When tolerance is set to 1 tu, the original circuit in Fig. 5(a) can be correctly timed with a minimum clock period of 20 tu. The ranges [ ] of the optimal clock schedules are indicated next to each register, assuming that both and are zero. Fig. 5(b) shows the corresponding slack graphs 20 1 and The path from register to register in the original circuit satisfies both (48) and (49). An edge is therefore added between the corresponding vertices in 20 1 and The graph 20 1 combining 20 1 and 20 1 is shown in Fig. 5(c). The directed cycle in this graph indicates that for the specified tolerance and register placement, the given clock schedule is optimum and achieves the minimum possible clock period. Our algorithm SGR for scheduling-guided retiming is given in Fig. 6. The inputs to this procedure are a circuit graph, clock schedules and, a clock period, and a tolerance. The input graph may be an original or a retimed circuit. If the procedure succeeds in identifying and eliminating a timing bottleneck in by generating a retimed graph, it returns. Otherwise, it returns False. Initially, algorithm SGR constructs the slack graphs,, and. It then proceeds to detect timing bottlenecks by identifying directed cycles in and. Once discovered, a bottleneck is removed by retiming the original input graph. The for loop in lines 2 4 searches for the special case of timing bottlenecks comprising only setup constraints. If an edge with register count greater than 1 participates in such a constraint, one of its registers is shifted to a register-free edge along the bottleneck. A tight setup constraint is thus removed, creating the potential for a shorter clock period or greater tolerance. The for loop in lines 5 11 searches for timing bottlenecks comprising both setup and hold constraints. In lines 6 8, the procedure discovers a tight setup and a tight hold constraint that start from the same register. By shifting that register forward, it decreases the maximum delay of the long path and eliminates the corresponding setup constraint. It also broadens the range of clock

10 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 193 schedules that satisfy the hold constraint along the short path, thus creating opportunities for further clock period or tolerance optimization. In lines 9 11, a pair of tight setup and hold constraints leading to the same register is detected, and the register is shifted backward to relax them. If no register can be moved, line 12 reports the failure to build a new. The runtime of algorithm SGR is dominated by its three loops. Each loop iterates times, since each slack graph has edges. The runtime of each loop body execution depends on the implementations of the shifting transformations in lines 3, 7, and 10. A straightforward implementation of register shifting that propagates the perturbation to adjacent edges until no edge has a register deficit takes time. In this case, the total runtime of each loop is steps. More elaborate shifting schemes that also consider constraints on the register counts of specific edges require the solution of single-source shortest-paths problems with negative edge-weights. The construction of the three slack graphs in line 1 takes time, since may have edges with nonzero register counts. In lines 2 and 5, the detection of all edges that participate in some directed cycle of or can be performed in steps by running a breadth-first search that proceeds until it discovers levels. Algorithm SGR runs very fast in actual circuits because the number of vertices in the slack graphs equals to the number of registers in the circuit at most, which is usually far less than. B. Timing Bottleneck Elimination In this section we prove that algorithm SGR correctly identifies timing bottlenecks in a circuit, given a specified clock schedule, a clock period, and a tolerance. We then prove that it correctly retimes to remove a tight timing constraint that contributes to an identified bottleneck. Algorithm SGR identifies timing bottlenecks by detecting directed cycles in the slack graph. The following theorem proves that each timing bottleneck in does indeed correspond to a special directed cycle in. In addition to bottlenecks due to cyclic sequential paths in, the theorem encompasses bottlenecks due to acyclic sequential paths from a primary input to a primary output, since each such path corresponds to a closed path in that starts and ends with the vertex. Furthermore, it also considers possible bottlenecks due to acyclic sequential paths between any two registers since both setup and hold constraints are considered. Theorem 10: Let be an edge-triggered circuit, and let and be given reals. Moreover, let and be minimum and maximum clock schedules, respectively, such that achieves a clock period with tolerance. Then for the given placement of registers in, is the shortest clock period that clock scheduling can achieve with tolerance if and only if there exists a directed cycle in which contains a tight setup constraint. Proof: ( ) The proof is by contradiction. Specifically, we show that if the directed graph has no cycle containing a tight setup constraint, then there exist clock schedules and such that achieves a clock period with tolerance, thus contradicting the minimality of. To that end, we first describe a procedure for labeling the vertices in. We then Fig. 7. (a) T (; ) with a single strongly connected component c; g; h from edges corresponding to hold constraints. (b) Construction of T (; ). (c) Labeling of T (; ). rely on these labels to compute new clock schedules and that distribute timing slack among all tight timing constraints and yield a clock period with tolerance. The absence of tight setup constraints from all directed cycles in allows us to compute a labeling of its vertices such that for each edge,wehave 1 (51) (52) where and are the labels of and, respectively. The labels can be computed by performing a single-source longestpaths computation on a directed acyclic graph. This graph is derived from by grouping together into a single vertex all vertices that participate in the same strongly connected component formed by tight hold constraints. Fig. 7 shows the derivation of from a tight graph. The solid lines represent tight setup constraints, and the dashed lines represent tight hold constraints. The shaded nodes form a single strongly connected component from edges corresponding to tight hold constraints and are grouped together in. Since has no cycle containing a tight setup constraint, the graph is acyclic. It is thus possible to assign to each vertex in the length of the longest path from a source vertex to [4]. Fig. 7(c) shows the derivation of labels on. For each vertex that is also present in, wehave. For all vertices in the same strongly connected component represented by,wehave. In other words, all vertices in the same strongly connected component in have the same label. We now define the slack of a vertex in as the minimum timing margin over all hold constraints between and all upstream registers in but not in the same strongly connected component and over all setup constraints between and downstream registers in. Intuitively, this slack is the largest value by which the skew between and any can increase without violating any timing constraint except for hold violations within the same strongly connected component. If some of these timing constraints is tight, then. In mathematical

11 194 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 terms, for an edge in with, the margin is defined as follows:. In this case, a straightforward algebraic manipulation yields (53) The slack of the tight graph is defined as. Based on the acyclicity of, we can prove that. Indeed, since is acyclic, there exists a vertex, and a corresponding vertex, with no outgoing edge. Therefore, has no tight setup constraint with any downstream register in. Moreover, has no tight hold constraint with any upstream register in except for those within a strongly connected component. From the definition of it follows that, and therefore. We now rely on the labeling and the slack to compute new clock schedules and. For each vertex, the new clock schedule is defined as where schedule is defined as. Furthermore, the clock The following straightforward case analysis shows that the new clock schedules and satisfy the timing constraints in for clock period and tolerance. Let be a path in such that,, and. By relying on the definitions of and, the setup constraint along this path can be written as follows: (55) If the setup constraint between and is tight, it follows that and. Therefore (56) From (54) (56), and the definitions of and, it follows that and satisfy the setup constraint from to with clock period and tolerance. A similar case analysis shows that and satisfy the hold constraint between and. Moreover, in the case that and belong to the same connected component in, their clock schedules change by the same amount, and the hold constraints among them still hold tight. Therefore, there exists a clock schedule that achieves a clock period with tolerance. ( ) We now prove that if there exists a directed cycle in which contains at least one tight setup constraint, then is the shortest clock period that clock scheduling can achieve with tolerance. Consider an edge in that encodes a setup constraint on a combinational path from a register on edge to a register on edge in. From (15) and (16), this constraint yields (57) Alternatively, if encodes a hold constraint from to, then (15) and (17) yield (58) By rearranging the terms in (57) and (58), we obtain the equivalent inequalities Adding up (59) and (60) along, we obtain (59) (60) (54) The schedules and satisfy the setup constraint between and with and. If this constraint is not tight, then

12 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 195 Fig. 8. Result of register relocation for the original circuit in Fig. 5. (a) Register l is shifted forward. (b) Register k is shifted backward. where and denote the edges in that correspond to setup and hold constraints, respectively. The summation on the left-hand side of this inequality telescopes, yielding (61) where denotes the edge count of the set. Since there exists at least one tight setup constraint, we can now obtain the following lower bound on the clock period by rearranging (61) as follows: (62) where denotes the edge count of the sets. The right-hand side of (62) gives a lower bound for that is independent of the clock schedules and. If the choice of and does induce the directed cycle in, (57) and (58) are satisfied with equality along. Propagating this equality all the way to (62), we infer that the lower bound is attained. Therefore, the associated clock period is the shortest one achievable with tolerance. Theorem 10 implies that whenever has a directed cycle which contains at least a tight setup constraint, the clock period of a circuit cannot be improved by clock scheduling without reducing the tolerance to clock delay variations. Retiming can be used to break such cycles by relocating the registers along the critical paths of the circuit. The following theorem shows that the register relocations in lines 3, 7, and 10 of algorithm SGR break a directed cycle in. Theorem 11: Each execution of lines 3, 7, and 10 in algorithm SGR eliminates a tight timing constraint associated with the timing bottleneck detected in lines 2, 6, and 9, respectively. Proof: When line 3 executes, a register is placed on a register-free edge, between the edges and that give rise to a tight setup constraint. Since is a subpath of, it follows that and the tight setup constraint between and has thus been removed. When line 7 executes, the delay along the setup path that ends at is reduced. The corresponding tight setup constraint is thus removed. Similarly, when line 10 executes, the setup path starting at becomes shorter and the corresponding tight constraint is eliminated. The retiming transformations in lines 7 and 10 create the potential for further clock period and tolerance optimization. For example, consider the forward relocation of a register across a vertex under the conditions of the predicate in the if statement of line 6. In the transformed circuit, the maximum delays of the paths that end at and decrease by, whereas the minimum delays decrease by. Thus, clock scheduling can redistribute a slack of among other tight setup constraints downstream from. Moreover, it can redistribute a slack of among other tight hold constraints downstream from. It thus allows for the timing constraints to be met with a shorter clock period or greater tolerance. The reader should note that even though they promote optimization, the retiming transformations of algorithm SGR do not guarantee the improvement of the clock period or the tolerance in the resulting circuit. When a register is shifted forward to remove a tight setup constraint with a downstream register, for example, a setup violation with an upstream register may result. A careful implementation of the relocation steps in lines 7 and 10 of algorithm SGR should consider the violation of constraints that currently hold before deciding to move a register. Fig. 8 shows how register relocation can result in a faster or more delay-tolerant circuit. By examining the slack graphs in Fig. 5 for tu and tu, we infer that vertex satisfies the condition in line 6 of algorithm SGR and can thus be shifted forward. Furthermore, vertex satisfies the condition of line 9 and can be shifted backward. The circuits resulting from each of these possible transformations are shown in Fig. 8. With tolerance remaining at tu, clock scheduling results in a shorter clock period for both circuits. The retimed circuit in Fig. 8(a) achieves tu, whereas the circuit in Fig. 8(b)

13 196 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Fig. 9. Procedure RSMINP for simultaneous retiming and clock scheduling to minimize clock period. Fig. 10. Procedure RSMAXT for simultaneous retiming and clock scheduling to maximize clock delay tolerance. achieves tu. The intervals above the register names specify the ranges of the skewed clock arrival times. Alternatively, with the clock period fixed at tu, clock scheduling can increase the delay tolerance of both retimed circuits to tu. The ranges of the clock arrival times are given below the corresponding register names. C. Period and Tolerance Optimization This section describes two heuristic procedures for the optimization versions of the retiming and clock scheduling problem. These procedures are not guaranteed to find the optimum solution to their respective problems. As shown in Section VI, however, they outperform the MILP-based binary search algorithms by achieving a good tradeoff between program runtime and circuit performance improvement. Fig. 9 gives pseudocode for algorithm RSMINP that performs simultaneous retiming and clock scheduling to minimize clock period. Given an edge-triggered circuit and a target delay tolerance, algorithm RSMINP returns a retimed circuit, an optimal clock period, and clock schedules and such that achieves with the given tolerance. The main idea in this heuristic is to iteratively perform clock scheduling followed by scheduling guided retiming. In line 1, is initialized to a retimed version of the input circuit that operates with minimum clock period under zero clock skew. The registers in this circuit have been relocated so that maximum delays are distributed evenly along its paths. In line 2, algorithm SCHEDMINP solves the clock scheduling optimization problem for with tolerance. Based on Section III, this computation can be performed by binary searching the tolerance domain with a single-source shortest-paths algorithm. The while loop in lines 3 8 iterates until the clock period cannot be reduced any further for a predefined number of iterations. Each execution of line 3 invokes algorithm SGR with the objective to relax the tight setup and hold constraints for the clock schedules, and the corresponding optimal clock period returned by algorithm SCHEDMINP. In line 4, clock scheduling is performed on the retimed circuit returned by algorithm SGR. Lines 5 6 update the best solution found so far. The loop exits when no new is found or the clock period has not improved for a given number of times. Fig. 10 gives a pseudocode description of algorithm RSMAXT for maximum-tolerance retiming and clock scheduling. Similar to algorithm RSMINP, this procedure is based on scheduling-guided retiming to create a sequence of retimed circuits and converge toward an optimal tolerance. Given an edge-triggered circuit and a target clock period, this procedure returns a retimed circuit, an optimal clock tolerance, and corresponding clock schedules and such that achieves the given clock period with maximum tolerance. Procedure

14 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 197 SCHEDMAXT in line 2 performs a binary search to compute the maximal delay tolerance that can achieve with clock period. Next, the while loop in lines 3 8 iteratively retimes until tolerance cannot be improved any further or no new retiming can be found. TABLE I STATISTICS OF TEST CIRCUITS VI. EXPERIMENTAL RESULTS In this section, we discuss extensive experimental results from the application of retiming and clock scheduling to the optimization of the LGSynth93 and ISCAS89 benchmark circuits. The statistics of the circuits in our test suite are given in Table I. Section VI-A presents the results we obtained for the clock period minimization problem and Section IV-B gives our results for the tolerance maximization problem. In both cases, the combined application of the two optimizations resulted in significant improvements over the separate application of either of the two. We also compared the results of our heuristic with those of sequential retiming and clock scheduling. For the circuits in our test suite, our heuristic achieved marginally better speedups than retiming for maximum speed followed by scheduling for maximum speed. With respect to tolerance maximization, however, our heuristic achieved 12% higher tolerance, on the average, than sequential retiming and clock scheduling. Our proposed heuristic approach was orders of magnitude faster than the exact MILP-based optimizer with little sacrifice in accuracy. Our software was developed in C and ran on a Pentium Pro II with 128MB of main memory. A. Clock Period Minimization Simultaneous retiming and clock scheduling was applied to improve the clock period of LGSynth93 and ISCAS89 benchmark circuits. We first evaluated the extent to which the combination of retiming and clock scheduling is better than either optimization. We subsequently compared our heuristic with the sequential application of the two optimizations. The following experimental methodology was used to compare the joint retiming and clock scheduling with either of the two approaches. First, for each circuit, the shortest clock period under zero skew was computed. Subsequently, each benchmark circuit was optimized by retiming, clock scheduling, and simultaneous retiming and clock scheduling to achieve the minimum possible clock periods,, and, respectively. All clock periods were computed with zero tolerance. Simultaneous retiming and clock scheduling was performed using the heuristic procedure RSMINP. Fig. 11 gives the relative improvement achieved by heuristic retiming and clock scheduling over the best result obtained by either retiming or clock scheduling. Our results show that combined retiming and clock scheduling does improve over the separate application of the two optimizations. Improvements greater than 10% are achieved in only five cases, however, and most of the circuits show no improvement. This relatively unimpressive performance can be attributed to the structure of our original test circuits. Most of these circuits are finite state machines with single-register loops or with input/output combinational paths. Thus, they are not amenable to performance optimization by retiming or clock scheduling. To explore the potential of simultaneous retiming and clock scheduling on a test suite that is more representative of typical circuits, we slightly modified the original benchmark circuits by inserting an additional register into each single-register loop. Specifically, each register whose output was connected to its own input via a purely combinational path was replaced by a pair of back-to-back registers. This modification does not change the clock period of the unoptimized circuits. It nevertheless creates more potential for retiming and clock scheduling to improve circuit performance. The optimal clock periods,, and were computed on the modified test circuits. Relative improvements are given in Fig. 12. Simultaneous retiming and clock scheduling improved the clock period for more than half of the modified test suite. For one-third of

15 198 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Fig. 11. Relative clock period improvements with original circuits over retiming or clock scheduling. Fig. 12. Relative clock period improvements with modified circuits over retiming or clock scheduling. Fig. 13. Relative clock period improvements with modified circuits over the sequential retiming and clock scheduling. the test circuits, improvements exceeded 10%. In four cases, relative improvements exceeded 30%. The average improvement was 8%. These results show the superiority of combined retiming and clock scheduling to the separate application of the two optimizations. In addition to the separate application of retiming or clock scheduling, we compared our heuristic algorithm RSMINP with a relatively straightforward sequential heuristic. Specifically, for all modified circuits, we computed the minimum possible clock periods that can be obtained by performing minimum-period retiming followed by minimum-period scheduling. As with our heuristic, the tolerance was set to zero. Although this heuristic may fail to obtain the shortest possible clock period, as we demonstrated in Section I, it proved to be quite effective on our modified circuits with respect to clock period minimization. Fig. 13 shows the relative improvements obtained by our heuristic over the sequential heuristic on the modified circuits. Noticeable improvements were achieved for only a quarter of the circuits in the test suite. Average and maximum improvements were 1.6% and 28%, respectively. TABLE II CLOCK PERIOD MINIMIZATION WITH ALGORITHM RSMINP AND AN EXACT MILP-BASED BRANCH-AND-BOUND ALGORITHM

16 LIU et al.: RETIMING AND CLOCK SCHEDULING FOR DIGITAL CIRCUIT OPTIMIZATION 199 Fig. 14. Optimal tolerance with =. The runtime of Procedure RSMINP was always longer than that of the sequential heuristic, since RSMINP uses the sequential heuristic as a preprocessing step. If RSMINP could not find a better solution than the sequential heuristic, it terminated quickly and achieved comparable runtimes. Whenever Procedure RSMINP computed better solutions, however, its runtime could be several times longer then the sequential heuristic. To evaluate the relative speed and efficiency of our retiming and clock scheduling heuristic, we independently developed a MILP-based branch-and-bound solver. Table II compares the runtimes and output clock periods of the two programs for a subset of our test suite. In general, the CPU requirements of the MILP-based optimizer grow very fast, due to the high computational complexity of mixed-integer linear programming. With a 48-hour timeout, the MILP-solver runs out of time on most circuits, without having discovered a better solution than the heuristic scheme. The last column of Table II shows the relative clock period improvement achieved by the MILP-based solver over our heuristic. Except for daio, the fastest circuit computed by the heuristic is as good as that of the MILP solver. For daio, the heuristic solution comes within 1% of the MILP solver. In the case of s1423, the heuristic computes a better solution, because the exact solver reaches its timeout limit before discovering a better solution. Whenever the exact solver terminates with the optimal answer, it is one to five orders of magnitude slower than the heuristic. In all the experiments, gate delays were derived from the widely used linear delay formula. For each gate, we set and, where and were parameters obtained from the library iwls93.mis2lib in the LGSynth93 benchmark. The parameter was a random number uniformly distributed in the interval [0,1). The relatively small range of this parameter resulted in small variations between the maximum and minimum gate delays, thus limiting the effectiveness of combined retiming and clock scheduling over the sequential application of the two heuristics. The delays and were computed once and then were kept constant throughout the retiming process. Although fanout can change in zero-skew retiming, when clock skew is not zero, registers can only be merged when they have the same skew, an event that occurs very rarely. In this paper, we did not consider register sharing, and therefore the fanout did not change during retiming. B. Delay Tolerance Maximization In addition to clock period optimization, we applied retiming and clock scheduling to our modified benchmark circuits to maximize their tolerance to clock delay variations. In our experiments, target clock periods ranged from to. For each clock period, the test circuits were optimized by retiming, clock scheduling, sequential retiming and clock scheduling, and Procedure RSMAXT to achieve the maximum tolerance,,, and, respectively. The sequential heuristic performed minimal-period retiming followed by maximal tolerance scheduling with the given target clock period. Generally speaking, retiming had little effect on tolerance improvement in our experiments and was outperformed by clock scheduling for most circuits. The sequential heuristic achieved the highest tolerance among,, and. As shown in Figs , for most target clock periods and most test circuits, our heuristic RSMAXT achieved higher tolerance than the best achievable by the other three optimization methods. For, clock scheduling performs as well as the combined application of retiming and clock scheduling in almost all cases. When the target clock period is relatively long, each circuit has only a few critical paths and abundant timing slack that can be distributed with no need for register relocation. It is thus possible for clock scheduling to set the clock arrival times near the middle of their respective permissible ranges, thus achieving the same tolerance as the combination of retiming and clock scheduling. As the target clock period decreases, our retiming and clock scheduling heuristic outperforms the other three alternatives by an increasingly higher margin. Moreover, it is capable of achieving shorter clock periods than the alternatives. Fig. 19 gives the fraction of test circuits for which our heuristic and its three alternatives can achieve correct timing. For, Procedure RSMAXT can meet the required timing for about 20% more circuits than the combination of the other three methods. For each circuit and each target clock period, we calculated the normalized tolerance achieved by our heuristic RSMAXT and the best of the other optimizations

17 200 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Fig. 15. Optimal tolerance with = 0:95 2. Fig. 16. Optimal tolerance with = 0:85 2. Fig. 17. Optimal tolerance with = 0: Whenever the target clock period could not be achieved, we set tolerance to zero. Fig. 20 shows the average tolerance improvements with respect to the target clock period. As the target decreases, average relative improvements increase monotonically, exceeding 25% at and averaging 12% across the entire range. Our combined retiming and clock scheduling methodology is thus most promising for high-speed circuit design with a tight timing budget. Table III compares the efficiency and effectiveness of the heuristic solver and the MILP-based solver for a target clock of. With a timeout limit set to 48 hours, the MILP solver improves the tolerance of only a small number of circuits from our test suite. For every circuit omitted from the table, the MILP

Statistical Timing Analysis Using Bounds and Selective Enumeration

Statistical Timing Analysis Using Bounds and Selective Enumeration IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 9, SEPTEMBER 2003 1243 Statistical Timing Analysis Using Bounds and Selective Enumeration Aseem Agarwal, Student

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize. Cornell University, Fall 2017 CS 6820: Algorithms Lecture notes on the simplex method September 2017 1 The Simplex Method We will present an algorithm to solve linear programs of the form maximize subject

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions.

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions. CS 787: Advanced Algorithms NP-Hardness Instructor: Dieter van Melkebeek We review the concept of polynomial-time reductions, define various classes of problems including NP-complete, and show that 3-SAT

More information

MOST attention in the literature of network codes has

MOST attention in the literature of network codes has 3862 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 8, AUGUST 2010 Efficient Network Code Design for Cyclic Networks Elona Erez, Member, IEEE, and Meir Feder, Fellow, IEEE Abstract This paper introduces

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

Column Generation Method for an Agent Scheduling Problem

Column Generation Method for an Agent Scheduling Problem Column Generation Method for an Agent Scheduling Problem Balázs Dezső Alpár Jüttner Péter Kovács Dept. of Algorithms and Their Applications, and Dept. of Operations Research Eötvös Loránd University, Budapest,

More information

WE consider the gate-sizing problem, that is, the problem

WE consider the gate-sizing problem, that is, the problem 2760 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 55, NO 9, OCTOBER 2008 An Efficient Method for Large-Scale Gate Sizing Siddharth Joshi and Stephen Boyd, Fellow, IEEE Abstract We consider

More information

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks 60 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 3, NO. 1, MARCH 2002 Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

More information

Optimizing Computations for Effective Block-Processing

Optimizing Computations for Effective Block-Processing Optimizing Computations for Effective Block-Processing KUMAR N. LALGUDI Intel Corporation MARIOS C. PAPAEFTHYMIOU University of Michigan and MIODRAG POTKONJAK University of California Block-processing

More information

Retiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming.

Retiming. Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford. Outline. Structural optimization methods. Retiming. Retiming Adapted from: Synthesis and Optimization of Digital Circuits, G. De Micheli Stanford Outline Structural optimization methods. Retiming. Modeling. Retiming for minimum delay. Retiming for minimum

More information

Matching Theory. Figure 1: Is this graph bipartite?

Matching Theory. Figure 1: Is this graph bipartite? Matching Theory 1 Introduction A matching M of a graph is a subset of E such that no two edges in M share a vertex; edges which have this property are called independent edges. A matching M is said to

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Scheduling Algorithms to Minimize Session Delays

Scheduling Algorithms to Minimize Session Delays Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of

More information

Lecture 2 September 3

Lecture 2 September 3 EE 381V: Large Scale Optimization Fall 2012 Lecture 2 September 3 Lecturer: Caramanis & Sanghavi Scribe: Hongbo Si, Qiaoyang Ye 2.1 Overview of the last Lecture The focus of the last lecture was to give

More information

INTERLEAVING codewords is an important method for

INTERLEAVING codewords is an important method for IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 2, FEBRUARY 2005 597 Multicluster Interleaving on Paths Cycles Anxiao (Andrew) Jiang, Member, IEEE, Jehoshua Bruck, Fellow, IEEE Abstract Interleaving

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings

On the Relationships between Zero Forcing Numbers and Certain Graph Coverings On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,

More information

Rigidity, connectivity and graph decompositions

Rigidity, connectivity and graph decompositions First Prev Next Last Rigidity, connectivity and graph decompositions Brigitte Servatius Herman Servatius Worcester Polytechnic Institute Page 1 of 100 First Prev Next Last Page 2 of 100 We say that a framework

More information

Scan Scheduling Specification and Analysis

Scan Scheduling Specification and Analysis Scan Scheduling Specification and Analysis Bruno Dutertre System Design Laboratory SRI International Menlo Park, CA 94025 May 24, 2000 This work was partially funded by DARPA/AFRL under BAE System subcontract

More information

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Mathematical and Algorithmic Foundations Linear Programming and Matchings Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis

More information

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem

CS261: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem CS61: A Second Course in Algorithms Lecture #16: The Traveling Salesman Problem Tim Roughgarden February 5, 016 1 The Traveling Salesman Problem (TSP) In this lecture we study a famous computational problem,

More information

Discrete Optimization. Lecture Notes 2

Discrete Optimization. Lecture Notes 2 Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The

More information

1. Lecture notes on bipartite matching February 4th,

1. Lecture notes on bipartite matching February 4th, 1. Lecture notes on bipartite matching February 4th, 2015 6 1.1.1 Hall s Theorem Hall s theorem gives a necessary and sufficient condition for a bipartite graph to have a matching which saturates (or matches)

More information

Scheduling Unsplittable Flows Using Parallel Switches

Scheduling Unsplittable Flows Using Parallel Switches Scheduling Unsplittable Flows Using Parallel Switches Saad Mneimneh, Kai-Yeung Siu Massachusetts Institute of Technology 77 Massachusetts Avenue Room -07, Cambridge, MA 039 Abstract We address the problem

More information

LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS

LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS LATIN SQUARES AND THEIR APPLICATION TO THE FEASIBLE SET FOR ASSIGNMENT PROBLEMS TIMOTHY L. VIS Abstract. A significant problem in finite optimization is the assignment problem. In essence, the assignment

More information

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers , October 20-22, 2010, San Francisco, USA Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers I-Lun Tseng, Member, IAENG, Huan-Wen Chen, and Che-I Lee Abstract Longest-path routing problems,

More information

Theorem 2.9: nearest addition algorithm

Theorem 2.9: nearest addition algorithm There are severe limits on our ability to compute near-optimal tours It is NP-complete to decide whether a given undirected =(,)has a Hamiltonian cycle An approximation algorithm for the TSP can be used

More information

5. Lecture notes on matroid intersection

5. Lecture notes on matroid intersection Massachusetts Institute of Technology Handout 14 18.433: Combinatorial Optimization April 1st, 2009 Michel X. Goemans 5. Lecture notes on matroid intersection One nice feature about matroids is that a

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

THE TRANSITIVE REDUCTION OF A DIRECTED GRAPH*

THE TRANSITIVE REDUCTION OF A DIRECTED GRAPH* SIAM J. COMPUT. Vol. 1, No. 2, June 1972 THE TRANSITIVE REDUCTION OF A DIRECTED GRAPH* A. V. AHO, M. R. GAREY" AND J. D. ULLMAN Abstract. We consider economical representations for the path information

More information

Winning Positions in Simplicial Nim

Winning Positions in Simplicial Nim Winning Positions in Simplicial Nim David Horrocks Department of Mathematics and Statistics University of Prince Edward Island Charlottetown, Prince Edward Island, Canada, C1A 4P3 dhorrocks@upei.ca Submitted:

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A 4 credit unit course Part of Theoretical Computer Science courses at the Laboratory of Mathematics There will be 4 hours

More information

Scheduling with Bus Access Optimization for Distributed Embedded Systems

Scheduling with Bus Access Optimization for Distributed Embedded Systems 472 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 8, NO. 5, OCTOBER 2000 Scheduling with Bus Access Optimization for Distributed Embedded Systems Petru Eles, Member, IEEE, Alex

More information

Multi-Cluster Interleaving on Paths and Cycles

Multi-Cluster Interleaving on Paths and Cycles Multi-Cluster Interleaving on Paths and Cycles Anxiao (Andrew) Jiang, Member, IEEE, Jehoshua Bruck, Fellow, IEEE Abstract Interleaving codewords is an important method not only for combatting burst-errors,

More information

Abstract Path Planning for Multiple Robots: An Empirical Study

Abstract Path Planning for Multiple Robots: An Empirical Study Abstract Path Planning for Multiple Robots: An Empirical Study Charles University in Prague Faculty of Mathematics and Physics Department of Theoretical Computer Science and Mathematical Logic Malostranské

More information

THE CAPACITY demand of the early generations of

THE CAPACITY demand of the early generations of IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 3, MARCH 2007 605 Resequencing Worst-Case Analysis for Parallel Buffered Packet Switches Ilias Iliadis, Senior Member, IEEE, and Wolfgang E. Denzel Abstract

More information

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can

LOW-DENSITY PARITY-CHECK (LDPC) codes [1] can 208 IEEE TRANSACTIONS ON MAGNETICS, VOL 42, NO 2, FEBRUARY 2006 Structured LDPC Codes for High-Density Recording: Large Girth and Low Error Floor J Lu and J M F Moura Department of Electrical and Computer

More information

1 Linear programming relaxation

1 Linear programming relaxation Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Primal-dual min-cost bipartite matching August 27 30 1 Linear programming relaxation Recall that in the bipartite minimum-cost perfect matching

More information

LECTURES 3 and 4: Flows and Matchings

LECTURES 3 and 4: Flows and Matchings LECTURES 3 and 4: Flows and Matchings 1 Max Flow MAX FLOW (SP). Instance: Directed graph N = (V,A), two nodes s,t V, and capacities on the arcs c : A R +. A flow is a set of numbers on the arcs such that

More information

Copyright 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch.

Copyright 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. Iterative Improvement Algorithm design technique for solving optimization problems Start with a feasible solution Repeat the following step until no improvement can be found: change the current feasible

More information

Foundations of Computing

Foundations of Computing Foundations of Computing Darmstadt University of Technology Dept. Computer Science Winter Term 2005 / 2006 Copyright c 2004 by Matthias Müller-Hannemann and Karsten Weihe All rights reserved http://www.algo.informatik.tu-darmstadt.de/

More information

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

Network Topology Control and Routing under Interface Constraints by Link Evaluation

Network Topology Control and Routing under Interface Constraints by Link Evaluation Network Topology Control and Routing under Interface Constraints by Link Evaluation Mehdi Kalantari Phone: 301 405 8841, Email: mehkalan@eng.umd.edu Abhishek Kashyap Phone: 301 405 8843, Email: kashyap@eng.umd.edu

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Integer Programming Theory

Integer Programming Theory Integer Programming Theory Laura Galli October 24, 2016 In the following we assume all functions are linear, hence we often drop the term linear. In discrete optimization, we seek to find a solution x

More information

Generalized Network Flow Programming

Generalized Network Flow Programming Appendix C Page Generalized Network Flow Programming This chapter adapts the bounded variable primal simplex method to the generalized minimum cost flow problem. Generalized networks are far more useful

More information

Lecture 7. s.t. e = (u,v) E x u + x v 1 (2) v V x v 0 (3)

Lecture 7. s.t. e = (u,v) E x u + x v 1 (2) v V x v 0 (3) COMPSCI 632: Approximation Algorithms September 18, 2017 Lecturer: Debmalya Panigrahi Lecture 7 Scribe: Xiang Wang 1 Overview In this lecture, we will use Primal-Dual method to design approximation algorithms

More information

EXTREME POINTS AND AFFINE EQUIVALENCE

EXTREME POINTS AND AFFINE EQUIVALENCE EXTREME POINTS AND AFFINE EQUIVALENCE The purpose of this note is to use the notions of extreme points and affine transformations which are studied in the file affine-convex.pdf to prove that certain standard

More information

Hardness of Subgraph and Supergraph Problems in c-tournaments

Hardness of Subgraph and Supergraph Problems in c-tournaments Hardness of Subgraph and Supergraph Problems in c-tournaments Kanthi K Sarpatwar 1 and N.S. Narayanaswamy 1 Department of Computer Science and Engineering, IIT madras, Chennai 600036, India kanthik@gmail.com,swamy@cse.iitm.ac.in

More information

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs Advanced Operations Research Techniques IE316 Quiz 1 Review Dr. Ted Ralphs IE316 Quiz 1 Review 1 Reading for The Quiz Material covered in detail in lecture. 1.1, 1.4, 2.1-2.6, 3.1-3.3, 3.5 Background material

More information

3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 9, SEPTEMBER Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays

3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 9, SEPTEMBER Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays 3186 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 51, NO 9, SEPTEMBER 2005 Zero/Positive Capacities of Two-Dimensional Runlength-Constrained Arrays Tuvi Etzion, Fellow, IEEE, and Kenneth G Paterson, Member,

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

An Improved Measurement Placement Algorithm for Network Observability

An Improved Measurement Placement Algorithm for Network Observability IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper

More information

Precomputation Schemes for QoS Routing

Precomputation Schemes for QoS Routing 578 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 11, NO. 4, AUGUST 2003 Precomputation Schemes for QoS Routing Ariel Orda, Senior Member, IEEE, and Alexander Sprintson, Student Member, IEEE Abstract Precomputation-based

More information

MODERN automated manufacturing systems require. An Extended Event Graph With Negative Places and Tokens for Time Window Constraints

MODERN automated manufacturing systems require. An Extended Event Graph With Negative Places and Tokens for Time Window Constraints IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 2, NO. 4, OCTOBER 2005 319 An Extended Event Graph With Negative Places and Tokens for Time Window Constraints Tae-Eog Lee and Seong-Ho Park

More information

22 Elementary Graph Algorithms. There are two standard ways to represent a

22 Elementary Graph Algorithms. There are two standard ways to represent a VI Graph Algorithms Elementary Graph Algorithms Minimum Spanning Trees Single-Source Shortest Paths All-Pairs Shortest Paths 22 Elementary Graph Algorithms There are two standard ways to represent a graph

More information

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 35 Quadratic Programming In this lecture, we continue our discussion on

More information

Graph Theory Questions from Past Papers

Graph Theory Questions from Past Papers Graph Theory Questions from Past Papers Bilkent University, Laurence Barker, 19 October 2017 Do not forget to justify your answers in terms which could be understood by people who know the background theory

More information

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems Benjamin Grimmer bdg79@cornell.edu arxiv:1508.05567v2 [cs.ds] 20 Jul 2017 Abstract We consider a variety of NP-Complete network

More information

SANDRA SPIROFF AND CAMERON WICKHAM

SANDRA SPIROFF AND CAMERON WICKHAM A ZERO DIVISOR GRAPH DETERMINED BY EQUIVALENCE CLASSES OF ZERO DIVISORS arxiv:0801.0086v2 [math.ac] 17 Aug 2009 SANDRA SPIROFF AND CAMERON WICKHAM Abstract. We study the zero divisor graph determined by

More information

Number Theory and Graph Theory

Number Theory and Graph Theory 1 Number Theory and Graph Theory Chapter 6 Basic concepts and definitions of graph theory By A. Satyanarayana Reddy Department of Mathematics Shiv Nadar University Uttar Pradesh, India E-mail: satya8118@gmail.com

More information

Disjoint Support Decompositions

Disjoint Support Decompositions Chapter 4 Disjoint Support Decompositions We introduce now a new property of logic functions which will be useful to further improve the quality of parameterizations in symbolic simulation. In informal

More information

Algorithms for Provisioning Virtual Private Networks in the Hose Model

Algorithms for Provisioning Virtual Private Networks in the Hose Model IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 4, AUGUST 2002 565 Algorithms for Provisioning Virtual Private Networks in the Hose Model Amit Kumar, Rajeev Rastogi, Avi Silberschatz, Fellow, IEEE, and

More information

On the Verification of Sequential Equivalence

On the Verification of Sequential Equivalence 686 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL 22, NO 6, JUNE 2003 On the Verification of Sequential Equivalence Jie-Hong R Jiang and Robert K Brayton, Fellow, IEEE

More information

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem

Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem Methods and Models for Combinatorial Optimization Exact methods for the Traveling Salesman Problem L. De Giovanni M. Di Summa The Traveling Salesman Problem (TSP) is an optimization problem on a directed

More information

Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching. 1 Primal/Dual Algorithm for weighted matchings in Bipartite Graphs

Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching. 1 Primal/Dual Algorithm for weighted matchings in Bipartite Graphs CMPUT 675: Topics in Algorithms and Combinatorial Optimization (Fall 009) Lecture 4: Primal Dual Matching Algorithm and Non-Bipartite Matching Lecturer: Mohammad R. Salavatipour Date: Sept 15 and 17, 009

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Response Time Analysis of Asynchronous Real-Time Systems

Response Time Analysis of Asynchronous Real-Time Systems Response Time Analysis of Asynchronous Real-Time Systems Guillem Bernat Real-Time Systems Research Group Department of Computer Science University of York York, YO10 5DD, UK Technical Report: YCS-2002-340

More information

Adaptive Techniques for Improving Delay Fault Diagnosis

Adaptive Techniques for Improving Delay Fault Diagnosis Adaptive Techniques for Improving Delay Fault Diagnosis Jayabrata Ghosh-Dastidar and Nur A. Touba Computer Engineering Research Center Department of Electrical and Computer Engineering University of Texas,

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht

More information

An Approach to Task Attribute Assignment for Uniprocessor Systems

An Approach to Task Attribute Assignment for Uniprocessor Systems An Approach to ttribute Assignment for Uniprocessor Systems I. Bate and A. Burns Real-Time Systems Research Group Department of Computer Science University of York York, United Kingdom e-mail: fijb,burnsg@cs.york.ac.uk

More information

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS

REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS REDUCING THE CODE SIZE OF RETIMED SOFTWARE LOOPS UNDER TIMING AND RESOURCE CONSTRAINTS Noureddine Chabini 1 and Wayne Wolf 2 1 Department of Electrical and Computer Engineering, Royal Military College

More information

BCN Decision and Risk Analysis. Syed M. Ahmed, Ph.D.

BCN Decision and Risk Analysis. Syed M. Ahmed, Ph.D. Linear Programming Module Outline Introduction The Linear Programming Model Examples of Linear Programming Problems Developing Linear Programming Models Graphical Solution to LP Problems The Simplex Method

More information

Computer Science 280 Fall 2002 Homework 10 Solutions

Computer Science 280 Fall 2002 Homework 10 Solutions Computer Science 280 Fall 2002 Homework 10 Solutions Part A 1. How many nonisomorphic subgraphs does W 4 have? W 4 is the wheel graph obtained by adding a central vertex and 4 additional "spoke" edges

More information

Optimal Prescribed-Domain Clock Skew Scheduling

Optimal Prescribed-Domain Clock Skew Scheduling Optimal Prescribed-Domain Clock Skew Scheduling Li Li, Yinghai Lu, Hai Zhou Electrical Engineering and Computer Science Northwestern University 6B-4 Abstract Clock skew scheduling is an efficient technique

More information

Figure 1: An example of a hypercube 1: Given that the source and destination addresses are n-bit vectors, consider the following simple choice of rout

Figure 1: An example of a hypercube 1: Given that the source and destination addresses are n-bit vectors, consider the following simple choice of rout Tail Inequalities Wafi AlBalawi and Ashraf Osman Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV fwafi,osman@csee.wvu.edug 1 Routing in a Parallel Computer

More information

ACONCURRENT system may be viewed as a collection of

ACONCURRENT system may be viewed as a collection of 252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 3, MARCH 1999 Constructing a Reliable Test&Set Bit Frank Stomp and Gadi Taubenfeld AbstractÐThe problem of computing with faulty

More information

arxiv: v1 [math.co] 25 Sep 2015

arxiv: v1 [math.co] 25 Sep 2015 A BASIS FOR SLICING BIRKHOFF POLYTOPES TREVOR GLYNN arxiv:1509.07597v1 [math.co] 25 Sep 2015 Abstract. We present a change of basis that may allow more efficient calculation of the volumes of Birkhoff

More information

2 The Fractional Chromatic Gap

2 The Fractional Chromatic Gap C 1 11 2 The Fractional Chromatic Gap As previously noted, for any finite graph. This result follows from the strong duality of linear programs. Since there is no such duality result for infinite linear

More information

Primal Dual Schema Approach to the Labeling Problem with Applications to TSP

Primal Dual Schema Approach to the Labeling Problem with Applications to TSP 1 Primal Dual Schema Approach to the Labeling Problem with Applications to TSP Colin Brown, Simon Fraser University Instructor: Ramesh Krishnamurti The Metric Labeling Problem has many applications, especially

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

5.4 Pure Minimal Cost Flow

5.4 Pure Minimal Cost Flow Pure Minimal Cost Flow Problem. Pure Minimal Cost Flow Networks are especially convenient for modeling because of their simple nonmathematical structure that can be easily portrayed with a graph. This

More information

Bounds on the signed domination number of a graph.

Bounds on the signed domination number of a graph. Bounds on the signed domination number of a graph. Ruth Haas and Thomas B. Wexler September 7, 00 Abstract Let G = (V, E) be a simple graph on vertex set V and define a function f : V {, }. The function

More information

Drexel University Electrical and Computer Engineering Department. ECEC 672 EDA for VLSI II. Statistical Static Timing Analysis Project

Drexel University Electrical and Computer Engineering Department. ECEC 672 EDA for VLSI II. Statistical Static Timing Analysis Project Drexel University Electrical and Computer Engineering Department ECEC 672 EDA for VLSI II Statistical Static Timing Analysis Project Andrew Sauber, Julian Kemmerer Implementation Outline Our implementation

More information

Lecture 6: Faces, Facets

Lecture 6: Faces, Facets IE 511: Integer Programming, Spring 2019 31 Jan, 2019 Lecturer: Karthik Chandrasekaran Lecture 6: Faces, Facets Scribe: Setareh Taki Disclaimer: These notes have not been subjected to the usual scrutiny

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 4, APRIL IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 52, NO 4, APRIL 2006 1425 Tradeoff Functions for Constrained Systems With Unconstrained Positions T Lei Poo, Member, IEEE, Panu Chaichanavong, Member, IEEE,

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

Faster parameterized algorithms for Minimum Fill-In

Faster parameterized algorithms for Minimum Fill-In Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Abstract We present two parameterized algorithms for the Minimum Fill-In problem, also known as Chordal

More information

On the Maximum Throughput of A Single Chain Wireless Multi-Hop Path

On the Maximum Throughput of A Single Chain Wireless Multi-Hop Path On the Maximum Throughput of A Single Chain Wireless Multi-Hop Path Guoqiang Mao, Lixiang Xiong, and Xiaoyuan Ta School of Electrical and Information Engineering The University of Sydney NSW 2006, Australia

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007 3413 Relay Networks With Delays Abbas El Gamal, Fellow, IEEE, Navid Hassanpour, and James Mammen, Student Member, IEEE Abstract The

More information

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,

More information