PERFORMANCE optimization has always been a critical

Size: px
Start display at page:

Download "PERFORMANCE optimization has always been a critical"

Transcription

1 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER Buffer Insertion for Noise and Delay Optimization Charles J. Alpert, Member, IEEE, Anirudh Devgan, Member, IEEE, and Stephen T. Quay Abstract Interconnect-driven optimization is an increasingly important step in high-performance design. Algorithms for buffer insertion have been successfully utilized to reduce delay in global interconnect paths; however, existing techniques only optimize delay and timing slack. With the continually increasing ratio of coupling capacitance to total capacitance and the use of aggressive dynamic logic circuit families, noise analysis and avoidance is becoming a major design bottleneck. Hence, timing and noise must be simultaneously optimized to achieve maximum performance. This paper presents comprehensive buffer insertion techniques for noise and delay optimization. Three algorithms are presented, the first for noise avoidance for single sink trees, the second for avoidance for multiple sink trees, and the last for simultaneous noise and delay optimization. We prove the optimality of each algorithm (under various assumptions) and present other theoretical results as well. We ran experiments on a highperformance microprocessor design and show that our approach fixes all noise violations. Our approach was separately verified by a detailed, simulation-based noise analysis tool. Further, we show that optimizing delay alone cannot fix all of the noise violations and that the performance penalty induced by optimizing both delay and noise as opposed to only delay is less than 2%. Index Terms Buffer insertion, interconnect-synthesis, noise analysis, routing, Steiner-tree. I. INTRODUCTION PERFORMANCE optimization has always been a critical step in the design of integrated circuits. Process technology scaling into the deep submicron regime has made interconnect performance more dominant than transistor and logic performance. With the continued scaling of process technology, the resistance per unit length of the interconnect continues to increase, the capacitance per unit length remains roughly constant and transistor or logic delay continues to decrease. This trend has led to interconnect delay been more dominant than logic delay. Process technology options, such as use of copper wires instead of aluminum can only provide temporary relief. The trend of increasing interconnect dominance is expected to continue. Interconnect-driven timing optimization techniques, such as wire sizing, buffer insertion and gate sizing have gained widespread acceptance in deep submicron design (see Cong et al. [7] for an excellent survey on layout optimization and interconnect synthesis). In particular, buffer insertion techniques have been successful in reducing interconnect delay. To the first order, interconnect delay is proportional Manuscript received December 24, 1998; revised July 27, This paper was recommended by Associate Editor M. Pedram. C. J. Alpert is with the IBM Austin Research Laboratory, Austin, TX USA ( alpert@austin.ibm.com). A. Devgan and S. T. Quay are with the IBM Server Group, Austin, TX USA ( devgan@austin.ibm.com; quayst@austin.ibm.com). Publisher Item Identifier S (99)09466-X. to the square of the length of the wire. Inserting buffers effectively divides the wire into smaller segments, which makes the delay almost linear in terms of interconnect length (plus the buffer delays). Additional advantages of buffer insertion will make this optimization even more pervasive as the ratio of device to interconnect delay continues to decrease. Several works have studied the delay-driven buffer insertion problem. Closed formed solutions have been proposed by Alpert and Devgan [1], Bakoglu [2], Chu and Wong [5], and Dhar and Franklin [9], all of which find exact solutions (under different assumptions) for inserting buffers on a two-pin net. Approaches which simultaneously construct a routing tree and insert buffers have been proposed by Kang et al. [13], Lillis et al. [19], and Okamoto and Cong [23]; Berman et al. [3] showed this problem is NP-complete. The works of Kannan et al. [14] and Lin and Marek-Sadowska [20] insert buffers on a tree by iteratively finding the best location for a single buffer. Finally, a number of works [1], [17] [19], [23] can be classified as extensions and variants to Van Ginneken s dynamic programming algorithm [31] which finds the optimal buffer placement under the Elmore delay model [10]. Lillis et al. [18] extended the algorithm to simultaneously perform wire sizing and buffer insertion using a buffer library that contains both inverting and noninverting buffers. In addition, they show how to integrate skew into the gate delay model and optimize a power function (e.g., the total number of buffers), all while retaining optimality. Later, Lillis [17] extended this work to handle nets with multiple sources. Alpert and Devgan [1] proposed a wire segmenting preprocessing algorithm to handle the one buffer per wire limitation of Van Ginneken s algorithm, which results in a smooth tradeoff between solution quality and run time. Although timing optimization has always been critical in the design process, present day design techniques and process technologies are making noise analysis and avoidance as important, or in some cases, even more important than timing analysis and optimization. The shrinking of minimum distance between adjacent wires has caused an increase in the coupling capacitance of a net to its neighbors. Furthermore, a wire s thickness is typically greater than its width, which has increased the ratio of coupling to total capacitance. A large coupling capacitance can cause a switching net to induce significant noise onto a neighboring net, resulting in an incorrect functional response. Further, the widespread use of fast dynamic logic circuits and its derivative logic families has made noise avoidance and analysis even more critical since these logic families are more susceptible to noise failure. It is no longer sufficient or even acceptable to optimize only for delay. Noise avoidance techniques must become an integral /99$ IEEE

2 1634 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 (a) (b) Fig. 1. The noise effect on a victim net: (a) without a buffer and (b) with a buffer. part of the performance optimization environment. Buffer insertion provides a suitable environment for simultaneous optimization of timing and noise. Consider the circuit shown in Fig. 1. The amount of coupling capacitance from one net to another is proportional to the distance that the two nets run parallel to each other. Fig. 1(a) illustrates the noise effects that the aggressor net (top) can have on the victim net (bottom). The coupling capacitance may cause an input signal on the aggressor net to induce a noise pulse on the victim net. If the resulting noise is greater than the tolerable noise margin (NM) of the sink, then an electrical fault results. Fig. 1(b) shows how inserting a buffer can distribute the capacitive coupling between the two newly created wires, resulting in a smaller noise pulse on the input of the inserted buffer than the pulse in Fig. 1(a). Since the buffer is a restoring logic gate, ideally no noise is propagated to its output (if the noise induced on its input is less than the tolerable noise margin). The second wire will also receive a smaller noise pulse, and if the amplitude of this noise pulse is lower the sink s noise margin, the circuit will function correctly. Further, if the wire is long enough, insertion of the buffer will also reduce the total source to sink delay. Various noise analysis and avoidance techniques have been proposed over the last few years [4], [12], [24], [28], [29], [32]. Circuit simulation techniques, such as SPICE [22], can also be used for noise estimation. When the problem can be modeled as a linear circuit (which it generally can be for most coupled noise problems), specialized linear model reduction techniques (such as rapid interconnect circuit evaluation (RICE) [27] or asymptotic waveform evaluation (AWE) [25]) are typically applied [10], [26]. For linear circuits, model reduction is preferred to circuit simulation due to the computational cost savings; however, the cost may still be prohibitive. The runtime concurrents are even more critical if noise analysis is to be performed within a physical design system, such as a global router. Hence, model reduction and circuit simulation are not suitable for these applications. Geometric noise models, e.g., based on geometric distances between wires, are computationally efficient enough to be used within a physical design system (such as a global router). However, these models are not accurate since they ignore the electrical properties of the circuit. Simplified electrical noise metrics are particularly amenable for use within a physical design optimization tool since they can be efficient and accurate. Vittal and Marek-Sadowska [32] proposed a noise model for crosstalk and showed how to reduce crosstalk during channel routing. The authors of [32] model a coupled network as a simplified RC circuit in which the capacitances for the victim line, coupling, and the aggressor line are modeled as lumped capacitances, and the gates are modeled by linear resistors. They then derive analytical expressions for noise from the resultant RC circuit. However, the Vittal and Marek-Sadowska do not consider the resistances for the victim and aggressor lines. Stohr et al. [29] represent the victim and aggressor net as simplified T models circuits and compute analytical noise expressions similar that are similar to those in [32]. The recently proposed noise metric of Devgan [8] efficiently computes coupled noise while considering circuit effects such as input slew rate, line resistances, and coupling capacitances. A nice feature of the metric is that its computational complexity, structure, and incremental nature is the same as the famous Elmore delay metric (see Section II-B), which make it particularly amenable for noise avoidance. Further, like the Elmore metric for delay, the noise metric is a provable upper bound on coupled noise. Other advantages of the noise metric include the ability to incorporate multiple aggressor nets and handle general victim and aggressor net topologies. For these reasons, we adopt the Devgan noise metric in this work. A disadvantage of the Devgan metric is that it becomes more pessimistic as the ratio of the aggressor net s transition time (at the output of the driver) to its delay decreases. However, cases in which this ratio becomes very small are rare since a long net delay generally corresponds to a large load on the driver, which in turn causes a slower transition time. The metric also does not consider the duration of the noise pulse. In general, the noise margin of a gate is dependent on both the peak noise amplitude and the noise pulse width. However, when considering failure at a gate, peak amplitude dominates pulse width. In other words, the gate is much more sensitive to a change in peak amplitude than a change in pulse width. Consequently, the induced noise can be approximated by the peak noise amplitude, especially when the noise metric is an upper bound. In practice, we observe that the Devgan metric provides a computationally efficient and sufficiently accurate noise estimate, as illustrated through the realistic industrial examples studied in Section V. The following contributions are made in the paper. We propose a formula for the maximum length of a wire such that no noise violation is induced. We prove that any buffer insertion algorithm that optimizes only delay is insufficient for fixing noise violations. We present two algorithms for optimal buffer insertion for noise avoidance. The first is an optimal linear-time algorithm for two-pin nets, and the second is an optimal quadratic-time algorithm for multipin nets. We present a third algorithm for minimizing the maximum source to sink delay such that all noise constraints are satisfied. The algorithm is optimal under the condition that the buffer library contains a single buffer type. Experiments with an industrial noise analysis tool show that our algorithm eliminates all noise violations for a set of 500 nets from a modern microprocessor design, while optimizing for delay alone does not eliminate

3 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1635 all violations. Further, the average delay penalty for simultaneously optimizing noise and delay compared to optimizing delay alone is only 2%. The rest of our paper is as follows. Section II discusses presents notation, a review of Devgan s noise metric [8], and the problem formulations that we study. Section III presents optimal buffer insertion algorithms for noise avoidance, and Section IV presents an optimal algorithm for simultaneously optimizing noise and delay. Section V presents experimental results, and we conclude in Section VI. II. PRELIMINARIES We assume that the input routing tree topology is fixed or that a Steiner estimation has been computed for the given net. A routing tree contains a set of wires and a set of nodes where is the unique source node, is the set of sink nodes, and is the set of internal nodes. A wire with length is an ordered pair of nodes in which the signal propagates from to. Each node has a unique parent wire. The tree is assumed to be binary, i.e., each node can have at most two children. 1 Let the left and right children of be denoted by and, respectively. We assume that if has only one child, then it is. The path from node to, denoted by, is an ordered subset of wires of. A buffer library is also given. A buffer insertion solution is a mapping which either assigns a buffer or no buffer, denoted by, to each internal node of. 2 Let denote the number of internal nodes with inserted buffers from solution. For sufficiently long wires in which multiple buffers may need to be inserted, wires should first be segmented (e.g., as in [1]) to create as many internal nodes as necessary to form a reasonable set of potential locations for buffer insertion. 3 Assigning buffers to induces nets, and hence subtrees, each with no internally placed buffers. For each, let, the subtree rooted at, be the maximal subtree of such that is the source and contains no internal buffers. Observe that if, contains only one node. A. Delay Optimization As in [1], [18], [23], and [31], we adopt the Elmore delay 1 In general, a routed Steiner topology will never have more than three children. Such a topology is converted into a binary tree by considering each node v with three children, say a, b, and c. A dummy infeasible node w is inserted and two of the children, say a and b, are picked to be children of w. Node v now has two children, c and w, and wire (v; w) has zero length. Which pair of nodes are chosen to be children of w does not affect the solutions returned by any of the algorithms presented. 2 A buffer placed on an internal node with degree d is interpreted as having one input, one output, and d 0 1 fanouts. 3 Note that our algorithms are independent of the wire segmenting algorithm used. Solution quality increases as the number of internal nodes increases, albeit at the expense of runtime. Indeed, it may be appropriate to develop a new wire segmenting algorithm for the particular formulations we address. model 4 [10] for interconnect delays and a linear model for gate delays. For each gate, let denote the input capacitance, the intrinsic resistance and the intrinsic delay of. Let and, respectively, denote the lumped capacitance and resistance for each wire. The capacitive load seen at node is the total lumped capacitance of, i.e., The Elmore delay for a wire The delay through a gate is given by is given by If,, then. The total delay from to is given by (4) Each sink has a given required arrival time, and we assume that the input signal arrives at the source node at time zero. The condition must hold for the circuit to meet its timing requirements. We define the slack for every as where is the set of all sinks that are downstream from, i.e., is an ancestor of all sinks in. This definition differs slightly from the traditional definition of slack, but one can see that they are actually the same if is viewed as the root of the tree. Observe that the conditions in (5) holds if and only if. B. Noise Avoidance In [8], Devgan proposed a coupled noise estimation metric that is similar in spirit to Elmore delay. The noise metric is an upper bound for RC and overdamped RLC circuits. Hence, if a given net satisfies the noise constraints formed by this metric, it should also pass any reasonable noise analysis tool. The metric enables a quick estimation of the noise induced by simultaneous switching of multiple aggressor nets. The metric depends on the resistance of the victim net, the resistance of the gate driving the victim net, coupling capacitances to the aggressor nets, and the rise times and the slopes of the signals on the aggressor nets. For example, consider the four aggressor nets and the single two-pin victim net in Fig. 2. The wire in the victim net is segmented into nine new wires such that each 4 We note recent efforts [15], [21] to compute delays in constant times based on lookup tables. Although these methods hold promise, we adopt the Elmore delay because it has the additive property that a path delay is equal to the sum of the edge delays in the path. Without this property, our buffer insertion algorithms cannot be proven optimal. (1) (2) (3) (5)

4 1636 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 Fig. 3. Example noise computation. Fig. 2. Wire segmenting scheme for multiple aggressor nets. new wire is completely coupled to either zero, one, or two of the aggressor nets. The coupling capacitance from an aggressor net to a victim net can be modeled as some fraction of the wire capacitance of the victim net. For a given wire near wires from simultaneously switching aggressor nets, let be the ratios of coupling to wire capacitance from each net to, and let be the slopes (i.e., power supply voltage over input rise time) of the net signals. The total current induced by the aggressor nets on is Note that is the current through due to capacitive coupling. Often, information about neighboring aggressor nets is unavailable, especially if buffer insertion is performed before routing. In this case, a designer may wish to perform buffer insertion to improve performance while also avoiding future potential noise problems. When performing buffer insertion in estimation mode, one might assume that: 1) there is a single aggressor net which couples with each wire in the routing tree, 2) the slope of all aggressor nets is, and 3) some fixed ratio of the total capacitance of each wire is due to coupling capacitance. In this case for each wire. Let be defined as the total downstream current seen at, i.e., (6) (7) currents (induced from not shown aggressor nets) and resistances. To compute the cumulative noise seen at the sinks, we first compute the total downstream currents via (7) for each node:,,,, and. One can now compute the noise induced on each edge from (8), which yields Noise Noise Noise (10) Assuming the driver at has resistance, we can compute the noise seen at each sink via (9) Noise Noise Noise Noise Noise Noise Noise which evaluates to Noise Noise Each wire adds to the noise induced on the victim net. The amount of additional noise induced from a wire is given by Each node. The condition has a predetermined noise margin The total noise additional noise seen at a sink some upstream node is given by (8) starting at (9) Noise (11) must hold if the circuit is to have no electrical faults. In other words, the total noise propagated from every restoring gate to each its sinks must be less than the noise margin for. We define the noise slack for every as where 0 if there is no gate at. The path from to has no intermediate buffers. If an intermediate buffer is present, the noise computation begins from the output of the buffer, since the buffer is a restoring stage. Fig. 3 shows an example of a victim net with driver and sinks and. The wires are labeled with their corresponding (12) The purpose of noise slack is to serve as a noise margin for internal nodes of the tree. Observe that for each sink. The noise constraints for the downstream sinks in will be satisfied if and only if the noise slack at is

5 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1637 greater than the noise seen at. Observe that (11) holds if and only if for each. 5 C. Problem Formulations We study two different buffer insertion problems. The first formulation tries to optimize only noise while minimizing the total number of buffers; two algorithms for this problem are presented in Section III. The second formulation seeks a solution with no noise violations that optimizes delay, and this problem is solved by our third algorithm, which is an extension of Van Ginneken s algorithm [31]. Optimality for all three algorithms is guaranteed under the condition that the buffer library contains a single buffer type. The first problem seeks the minimum number of buffers that must be inserted into a tree so that no noise violations exist. Problem 1: Given a tree, a buffer library, and noise margins for each, find a solution which minimizes, such that for each. A solution to this problem formulation is certainly useful for handling noncritical nets with noise problems, for which delay optimization is unnecessary. One can integrate delay into the problem by instead optimizing performance (maximize the slack at the source) while simultaneously avoiding noise. Problem 2: Given a tree, a buffer library, and noise margins for each, find a solution which maximizes, 6 such that for each. Fig. 4. Van Ginneken s algorithm. D. Review of Van Ginneken s Algorithm Two of the algorithms we present rely on concepts introduced in Van Ginneken s buffer insertion algorithm [31]; we now present a review. The algorithm optimally solves the problem of minimizing delay for a single buffer type. The algorithm s main idea is to construct possible candidate solutions for each node in the tree. The candidates for node only can be computed after candidates for all nodes in.acandidate is a three-tuple where is the load seen at, is the slack at, and is the current solution for the subtree. Given a candidate for a node, the only information needed to compute the slack for the parent of is the load at, the slack at, and the parent wire capacitance and resistance. Hence, and are used to percolate new candidates up the tree, and is used to recover the final solution when the algorithm terminates. Fig. 4 shows Van Ginneken s algorithm which takes a routing tree and a single buffer as input and returns a candidate solution for the source. Step 1 calls the Find_Candidates procedure which is presented in Fig. 5. It returns a set of 5 Observe the similarities between the noise metric and Elmore delay metric. In particular, noise margin is analogous to require arrival time, noise slack is analogous to slack, current is analogous to capacitance, and noise is analogous to delay. 6 Observe that various formulations can be captured by manipulating the RAT (si) values. For example, if si is the only critical sink then RAT (w) =1 for all w 2 SI 0fsig. Alternatively, setting all slacks to be equal captures minimizing max si2si Delay(so 0 si). Fig. 5. Find_Candidates procedure. possible candidates for the source, but without accounting for the driver delay. Hence, Step 3 updates each candidate to include the driver delay, and Step 4 returns the candidate with largest slack. The complexity of the algorithm is. The Find_Candidates procedure shown in Fig. 5. It takes the node to be processed as input, recursively computes the lists of possible candidates for all the nodes in, and then returns the candidate list for node. The procedure can be broken into four main parts: Steps 1 4 examine the candidates for the children of and merge them together to form, the set of candidates for. Step 1 handles the base case in which is a sink, Step 2 handles the single child case, and Steps 3 and 4 handle the two children case. For the two children case, the two child candidate lists and are traversed, and the two candidates and are merged together by summing their downstream capacitances and currents, and taking the minimums of their timing and noise slacks. Observe that the number of candidates resulting from merging the two lists is only as opposed to.

6 1638 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 which implies that (15) Fig. 6. Uniform neighboring coupling capacitance for a single wire. Step 5 inserts buffers for some candidates, thereby creating new candidates to add to. Since it may be physically impossible to place a buffer at the current node, only nodes which are feasible are considered. The buffer is considered for insertion at, and the candidate in which produces the largest resulting slack is preserved. Step 6 computes the new load, slack, current and noise slack for each candidate induced by the parent wire of. Finally, Step 7 prunes inferior candidates from, using the pruning schemes of [18] and [31], not the scheme used in Algorithm 2. Here, solutions are sorted in increasing order of load and decreasing order of slack. Given two candidates and, is inferior to if and only if and. III. NOISE CONSTRAINED BUFFER INSERTION In this section, we study Problem 1, inserting the minimum number of buffers such that noise constraints are satisfied. A. Theoretical Results for Noise Avoidance We begin by studying the simplest case of a single wire with uniform width and neighboring coupling capacitance, as shown in Fig. 6. This case is instructive, since the wires in a routing tree can always be segmented into a set of several such wires, as seen in Fig. 2. For each wire, let be the wire resistance per unit length and be the current per unit length. Since current is a constant times wire capacitance (6), we can use a -model to represent its distribution. We now derive a new formula for finding the longest possible length of the wire, driven by buffer, such that there is not a noise violation. Theorem 1: For a given wire in a routing tree, a buffer needs to be inserted on to satisfy noise constraints if and only if and (13) Proof: If we cannot insert a buffer on wire, then the noise seen at is minimized when is inserted just above in the routing tree. From (11), for noise constraints to be satisfied, we must have (14) which is a quadratic in. Solving for yields the theorem. The constraint is needed to ensure that. More generally, given that is the unit wire capacitance of, we can substitute in (13) to yield (16) Theorem 1 leads to some interesting observations. First, if the noise slack equals, then the wire length is zero. A smaller noise slack would cause to be negative, a result which implies that a buffer should have been inserted on the subtree. If the noise slack is too small [i.e., less than ], then by the time node is reached, it is too late to insert a buffer on wire to satisfy noise constraints. Second, notice that as the resistance of the driving buffer increases, the negative term in (13) decreases faster than the positive square root term, which makes the length decrease. The maximum wire length is achieved when the buffer resistance and downstream current are zero, yielding a length of. Such a simple approximation could be useful for noise avoidance if the driver s properties are unknown or if the ratio of driver to wire resistance is close to zero. Finally, observe that in (16), the ratio of bad capacitance to total capacitance is inversely proportional to the distance separating the aggressor and victim nets, i.e.,, for some constant. If the wire is coupled to a single aggressor net, one can solve (16) for the separating distance, yielding (17) Theorem 2: A net that has been optimally buffered to minimize only delay may have noise violations. Proof: Consider a wire in which and are gates in the buffered net. Let be the slope of an aggressor net, and let be the ratio of coupling to total capacitance for. Alternatively, a path of wires could connect to but the analysis is basically the same. If there is a noise violation at, then the following equation [similar to (13)] must hold: Solving for yields (18) (19)

7 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1639 Fig. 7. Iterative application of Theorem 1. noise slack, then a buffer must be inserted. Step 4 computes the maximum length that this buffer may be inserted from, and inserts it there at a new internal node. Finally, Step 5 computes the noise slack and the source and inserts a buffer right after the source if there is a noise violation (which can only occur if ). Theorem 3: Algorithm 1 computes an optimal solution to Problem 1 for a single-sink tree in time. The proof of optimality follows from the fact that buffers are always inserted their maximal distance up the tree, according to Theorem 1. Observe that extending Algorithm 1 to accommodate a buffer library with multiple buffer types is trivial, since according to Theorem 1, the buffer with smallest resistance always yields the maximum spacing between buffers. Hence, one can equivalently obtain the optimal solution by creating a new buffer library consisting only of the buffer in the original library with smallest resistance. Fig. 8. Algorithm 1: noise avoidance for single-sink tree. For any fixed values for the parameters on the right side of the inequality, a noise violation will occur if the noise margin at is small enough. Even if is reasonably large, an aggressor net can have a very large slope and a high coupling ratio which would also cause a violation. B. Algorithm 1: Optimal Noise Avoidance for Single-Sink Trees Theorem 1 suggests a noise avoidance buffer insertion algorithm for trees with only one sink. Begin at the sink node and work up the tree, updating the total downstream current and noise slacks of visited internal nodes. At a given node, use Theorem 1 to find out if adding a buffer is necessary and to compute its furthest possible location up the parent wire. The algorithm terminates when the source node is reached. Fig. 7 illustrates the order in which buffers will be inserted by the algorithm. Fig. 8 presents the description for Algorithm 1, Noise Avoidance for Single-Sink Trees. The algorithm accepts a routing tree, and a single buffer type. Step 1 initializes the current and noise slack of the sink node, then Steps 2 4 climb up the tree visiting each node in turn. Step 3 examines whether or not a buffer needs to be inserted on the current wire, by computing the noise from placing a buffer at node. If this noise is less than the noise slack, then no buffer needs to be inserted, so the algorithm computes the downstream current and noise slack for node, then moves to the next wire. If the noise computed in Step 3 is larger then the C. Algorithm 2: Optimal Noise Avoidance for Multi-Sink Trees It may appear fairly easy to extend Algorithm 1 to trees with multiple sinks. However, some difficulty arises when processing an internal node with two children, under the following scenario. Let,, and, respectively, be the wire, current, and noise slack for the left (right) branch of. It may be that and, i.e., the noise constraints for the left and right branches are satisfied. However, we may have, which implies that merging the two branches would cause a noise violation. Thus, a buffer must be placed on either the left or right branch immediately following. One cannot immediately deduce which branch to choose since we may have and, i.e., the left branch is more tolerant of noise than the right branch, but the left branch has a larger downstream current which makes it more noise sensitive. The correct choice cannot be made without knowing the characteristics and location of the gate driving, but since the algorithm is bottom-up, this location of this gate has not yet been determined. To handle this additional complexity, we propose to generate a set of candidate solutions for each node and propagate these candidate solutions up the tree, in the same spirit as in Van Ginneken s algorithm [31]. This time, a candidate is defined as a three-tuple where is the downstream current seen at, is the noise slack for, and is the current solution for the subtree. and are used to determine which candidates to percolate up the tree, and stores the current solution seen so far. When a node with two children is encountered, we let denote the new solution that results from merging solutions for the left and for the right branch of. We assign if either or and otherwise. Whenever a node with two children is encountered and a buffer needs to be inserted on either the left or the right branch, candidates for each of these two options are generated. The candidates are stored in nondecreasing order by downstream current so that inferior solutions can be pruned in a linear pass of the current

8 1640 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 If there is a violation, then two new solutions, one with a new buffer on the left and one with a new buffer on the right, are generated and inserted into the current list of candidates. When the algorithm terminates, the solution(s) in with the fewest number of buffers is chosen. Theorem 4: Algorithm 2 returns an optimal solution to Problem 1 for a multi-sink tree in time. As in Algorithm 1, optimality follows from the fact that buffers are only inserted at their maximal distance up the tree, except for the case when the test in Step 5 holds. But in this case, both possible buffer insertions are preserved and propagated up the tree, and at least one of these must be optimal. The complexity is since each node may have as many as candidates. In practice, we believe that the algorithm s run time will be linear in the average case, since the test in Step 5 will rarely hold. As for Algorithm 1, one can equivalently obtain an optimal solution for a buffer library with multiple buffer types by selecting the buffer type with smallest resistance. IV. OPTIMIZING DELAY WHILE SATISFYING NOISE CONSTRAINTS Fig. 9. Algorithm 2: noise avoidance for multi-sink trees. candidate list. Given two candidates and for node, we say that is inferior to if and only if and. The complete description for Algorithm 2: Optimal Noise Avoidance for Multi-Sink Trees, is presented in Fig. 9. The algorithm is recursive and is initially passed the source for the tree. 7 Algorithm 2 is similar to Algorithm 1, except for Steps 4 7 which handle nodes with two children. Step 4 iterates through each left branch candidate and each right branch candidate using the Van Ginneken s linear merging technique [31]. Step 5 tests whether merging the two candidates results in a noise violation, 8 and if there is no violation, Step 7 merges the candidates for the two branches without inserting a buffer. 7 Note that it is inefficient to store M for each candidate. In the actual implementation, pointers are stored for the left and right candidates, and the final solution can be revealed by traversing these pointers. 8 We assume R so > R b for this test, otherwise one must test whether the current solution M will have no noise violations if no more buffers are inserted. If this test yields no noise violations, then Step 7 should be executed instead of Step 6. A. Algorithm 3: Noise Avoidance for Delay Minimization To address Problem 2, simultaneous optimization of noise and delay, we modify Van Ginneken s optimal delay algorithm (Figs. 4 and 5) to include noise avoidance. The essence of our modifications is the following: whenever Van Ginneken s algorithm considers inserting a buffer into a candidate, we check to see if the noise constraints have been violated; if they have, we do not permit a buffer to be inserted. Observe that our algorithm will actually generate fewer candidates than Van Ginneken s algorithm. Van Ginneken s algorithm generates a set of candidates which violate noise constraints and a set which satisfy noise constraints, while our algorithm only generates candidates belonging to the latter set. The other modifications are for bookkeeping purposes and to permit a buffer library with more than one buffer. Again, a list of candidates is computed for each node in the tree, except that this now a candidate is a fivetuple where is the load seen at, is the slack at, is the downstream current seen at, is the noise slack at, and is the current solution. Figs. 10 and 11 illustrate Algorithm 3, which is equivalent to Van Ginneken s algorithm except for the modifications for noise avoidance which are shown in boldface. Step 1 calls the Find_Noise_Candidates procedure which returns a list of candidate solutions that do not include the driver s effect on delay and noise. Step 2 adds the driver delay and computes the noise slack, then the candidate with the best slack, such that noise constraints are satisfied, is returned in Step 3. The Find_Noise_Candidates procedure shown in Fig. 11. It takes the node to be processed as input, recursively computes the lists of possible candidates for all the nodes in, and then returns the candidate list for node. Steps 1 4 are

9 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1641 Fig. 10. Algorithm 3: noise avoidance for delay minimization. Fig. 12. Illustrations for proof of Theorem 5. Fig. 11. Find_Noise_Candidates procedure. the same as in Fig. 5 except for additional bookkeeping. In Step 5, each buffer type in the library is considered for insertion at, and the candidate in which produces the largest resulting slack such that noise constraints are satisfied is chosen. The procedure will not insert a buffer if its causes a noise violation. This step is the fundamental modification made to Van Ginneken s algorithm. Step 6 computes the new load, slack, current and noise slack for each candidate induced by the parent wire of. Finally, Step 7 is the same as in Fig. 5. The modifications for noise avoidance do not increase the time complexity of Van Ginneken s algorithm; hence, Algorithm 3 has time complexity [18]. B. Optimality of Algorithm 3 Theorem 5: If and both and hold for each, then Algorithm 3 returns an optimal solution to Problem 2. Proof: First, observe that Algorithm 3 always returns a solution which satisfies the noise constraints (even when ) since neither a buffer nor the driver is added to a candidate if it has a noise violation. In Step 5 of the Find_Candidates procedure, a buffer is only added to a candidate if the addition of the buffer does not violate the noise margins of the sinks that it drives. Similarly, Step 4 of Fig. 10 will only return a solution with positive noise slack. It may seem at first that pruning candidates may cause the optimal solution to be eliminated, since pruning is performed based on load and timing slack, not current and noise slack. Consider candidates and for node where and. Since is inferior to, then will be pruned by Step 7, but perhaps will have a noise violation further up the tree while will not. Since subsequently becomes an illegal solution, may actually be the optimal solution to Problem 2, but the existence of causes it to be pruned. We now show that the optimal solution will never be pruned. There are three cases, illustrated in Fig. 12. Case 1: and both see a single gate downstream from. If this gate is a buffer for both candidates, then the wire capacitance component of will be greater than the wire capacitance component for since the gate loads are the same and. Hence, the total downstream wire length from will be longer for than for. Since current is monotone increasing in wire length, we have. And since the noise margins for the buffers are the same,. So, if fails to meet noise constraints further up the tree, will fail as well, i.e., pruning does not eliminate an optimal solution. Alternatively, if sees one of the original sinks downstream from, while sees a buffer, then its downstream wire length must be longer than that of,so. Clearly, it is impossible for to see a buffer downstream and for to see a sink because then (because the buffer

10 1642 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 has smaller input capacitance than each sink). By assumption, the sink s noise margin is no greater than the buffer s noise margin, so. As before, if fails to meet noise constraints further up the tree then will fail as well. Case 2: One candidate drives a single gate, and one drives multiple gates. Since, then must drive the single gate since the buffer must have smaller capacitance than either branch. Hence, this gate must be further up the tree then either of the two gates driven by. Analysis similar to that of Case 1 can be applied to conclude that and, which makes worse than in terms of both timing and noise, so pruning will not eliminate an optimal solution. Case 3: Each candidate sees multiple sinks downstream from. In this case, it is possible to have, since the gate(s) driven on the left branch for may be further up the tree than for. This means that unlike in the previous two cases, may violate noise constraints further up the tree, while would not. Thus, a legal solution may in effect become pruned by an illegal one, making it seem possible that a potentially optimal solution may be pruned. We now show that Step 4 in Find_Candidates constructs a third solution which will not be pruned, and that is inferior to both in terms of noise and timing. This implies that could not have been part of the optimal solution. Decompose the loads and of and into their left and right components: and.if, then we can use the same analyzes from Cases 1 and 2 to conclude that is inferior to in terms of both timing and noise. Thus, pruning it does not eliminate an optimal solution. If, then either or since downstream current is a monotone nondecreasing function of capacitance. If, then consists of the left branch of and the right branch of, otherwise, it consists of the right branch of and the left branch of. Thus,, and since the buffers seen at are further up the tree for than for and, we conclude that, and by using the same arguments as in Cases 1 and 2. Finally, to show that it is safe to prune because of the existence of, we must show that. Decompose the slacks and into their left and right components: and. Since is inferior to, and hence.if consists of the left branch of and the right branch of, then.if consists of the right branch of and the left branch of, then. In either case,, which means that is inferior to in terms of both noise and timing. C. Discussion For Theorem 5, the assumptions and are somewhat realistic since one generally would want to choose buffers with small input capacitance and high noise margin. We observe that if is sufficiently large, then any solution with buffer inserted would instantly be pruned since both its capacitance and slack would be worse that those for the zero-buffer solution. In this case, Algorithm 3 would fail to insert any buffers, even if the zero-buffer solution had noise violations. In practice, buffers are commonly used to reduce delay often by decoupling off-path load capacitance. Thus, a buffer library should have at least one buffer with small input capacitance relative to the wire and sink capacitances. As long as such a buffer exists in the library and there is sufficient wire segmenting, Algorithm 3 will find a legal solution. When there is more than one buffer in the buffer library, even if they all have small input capacitance, the optimality of Algorithm 3 is not guaranteed. The proof of Theorem 5 utilizes the fact that if one candidate sees more downstream capacitance than another candidate, then it must also see more wire capacitance, and hence, more downstream current. Since different buffer types have different input capacitances, this assertion cannot be made. We believe that even with multiple buffer types, Algorithm 3 will generally produce solutions that are very close to optimal; our experimental results in Section V strongly support this claim. In optimally solving Problem 2, more buffers may be added than necessary, e.g., six additional buffers might be inserted to squeeze out an extra 25 ps of performance. Alternatively, one may wish to minimize the number of inserted buffers such that noise and timing constraints are satisfied. Problem 3: Given a tree, a buffer library, and noise margins for each, find a solution which minimizes such that and for each. Since many possible solutions may exist, maximize as a secondary objective. We can address this formulation by incorporating an extension to Van Ginneken s algorithm proposed by Lillis et al. [18] into Algorithm 3. Lillis et al. showed that instead of storing a single candidate list for each node, one can store several lists in an array indexed by the total number of buffers inserted in the candidate solution. This allows one to generate the optimal solution in terms of delay for any desired number of buffers. We have incorporated this extension into Algorithm 3 and address Problem 3 by first finding the best solution in terms of timing for each possible number of buffers and then returning the solution with the fewest buffers such that both noise and timing constraints are satisfied. This algorithm has been incorporated into a tool called BuffOpt. We have theoretically shown that optimizing delay alone cannot possibly fix all noise problems. The experimental results in the next section shows that this phenomenon occurs in practice as well. V. EXPERIMENTAL RESULTS The techniques presented in the previous sections have been implemented in a tool called Buffopt. We refer to the algorithm which find the optimizes only delay [1], [18] as DelayOpt. DelayOpt is the same as Algorithm 3 in Fig. 10, without the boldface modifications. However, like BuffOpt, DelayOpt was extended to be able to trade off between the degree of timing optimization and the number of buffers inserted. This extension allows us to report the best result for any given

11 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1643 TABLE I SINK DISTRIBUTION OF THE 500 TEST NETS TABLE III NOISE AVOIDANCE COMPARISONS OF BuffOpt VERSUS DelayOpt(k) TABLE II NUMBER OF NOISE VIOLATIONS REPORTED BY 3dNOISE BEFORE AND AFTER RUNNING BuffOpt number of buffers k, and we denote this result as DelayOpt(k). For our experiments, we selected a set of 500 nets from a modern Power PC microprocessor design. Each gate in the design belongs to precharacterized cell library, and timing and capacitance data for each gate was obtained by simulation and extraction. The 500 nets with largest total capacitances were chosen for analysis, since these nets were most likely to have noise violations. Table I shows the distribution of the sizes of these nets. To verify BuffOpt, we separately ran a more detailed, simulation-based noise analysis tool, 3dnoise [26]. 3dnoise was run both before and after running BuffOpt and DelayOpt. To perform noise analysis, 3dnoise uses accurate momentmatching based techniques that are similar to RICE [27]. We ran BuffOpt, DelayOpt and 3dnoise all in estimation mode (see Section II-B), assuming a 0.7 coupling to total capacitance ratio from a single aggressor net with rise time 0.25 ns and a power supply voltage of 1.8 V (so 7.2). The tolerable noise margin for every gate in the design is 0.8 V. Our buffer library contained 5 inverting and 6 noninverting buffers of varying power levels. Our experiments show the following. BuffOpt eliminated every noise problem in the design data. DelayOpt was unable to solve identified noise problems. The average delay penalty from using BuffOpt instead of DelayOpt was less than 2%. A. BuffOpt Successfully Avoids Noise We ran BuffOpt on the 500 nets, and observed that BuffOpt identified noise violations for 423 of the nets and successfully inserted buffers to fix all of them. To verify BuffOpt s identification of noise critical nets, we ran 3dnoise on the test data before running BuffOpt. The accurate analysis of 3dnoise identified 386 nets with noise violations, all of which were also identified by BuffOpt. BuffOpt identified more nets with violations, which shows that the noise metric is more conservative. This is due to the fact that the noise metric is an upper bound. To verify the ability of BuffOpt to eliminate noise violations, we ran 3dnoise on the test data after running BuffOpt. 3dnoise identified no noise violations on the data after buffers had been inserted. This data is summarized in Table II. TABLE IV AVERAGE DELAY REDUCTION IN PICOSECONDS FROM BUFFER INSERTION B. Delay Alone is an Insufficient Optimization Our next set of experiments compared BuffOpt to DelayOpt (optimal delay-driven buffer insertion) in terms of how well the algorithms avoided noise. Since BuffOpt never inserted more than four buffers on any net, we ran DelayOpt four times in which no solution was allowed to have more than buffers, and ranged from 1 to 4. Table III compares the solutions generated by DelayOpt(k) with BuffOpt. Observe that running DelayOpt(4) causes the addition of 1126 more buffers than BuffOpt, and yet still has 13 noise violations. Limiting the total number of buffers inserted by DelayOpt only increases the number of noise violations and still causes more buffers to be inserted for. Thus, as we proved in Theorem 2, noise avoidance must be integrated into the problem formulation in order to avoid all noise. Finally, observe that the total CPU time for BuffOpt is actually less than DelayOpt(k) for. This occurs because BuffOpt prunes candidate solutions which violate noise constraints, which gives BuffOpt fewer total candidates to analyze than DelayOpt. The total number of buffers inserted is given by nets with buffers. For example BuffOpt inserted (0 77) (1 161) (2 232) (3 84) (4 2) 717 buffers. C. The Delay Penalty Is Small Our final experiment compares DelayOpt to BuffOpt in terms of total delay. We first ran BuffOpt to see how many buffers were inserted, then ran DelayOpt for the same number of buffers in order to make an apples to apples comparison. We computed the reduction in total delay for each net, and averaged the results by the number of buffers inserted. The cumulative results are presented in Table IV. For example, there were 232 nets for which two buffers were inserted, and on average BuffOpt reduced delay by ps while DelayOpt reduced delay by ps. To find the average delay reduction

12 1644 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 over all 423 instances, we compute the total reduction in delay for all nets and divide by 423. For example, for BuffOpt, the weighted average is (( ) ( ) ( ) (45.5 2))/ From the last column in the table, we see that overall there was an average delay penalty of only 6 ps, or equivalently 1.99%, from avoiding noise. Thus, Buffopt is able to integrate noise into a delay-driven algorithm with virtually no loss in total delay. Recall that in Section IV, we proved the optimality of Algorithm 3 for Problem 2 under the assumption that the buffer library contained only a single buffer type, but that we could not guarantee optimality for a larger buffer library. The DelayOpt results in Table IV form an upper bound on the optimal solution to Problem 2 since it is an optimal algorithm for minimum delay without noise constraints. Thus, we see that even with a buffer library of size 11, BuffOpt returns virtually the optimal delay since it is on average within 2% of an upper bound. VI. CONCLUSIONS Noise is becoming an increasingly critical bottleneck in the design process. In this work, we presented comprehensive buffer insertion techniques for noise and delay optimization. Three algorithms were presented, the first for optimal noise avoidance for single-sink trees, the second for optimal noise avoidance for multi-sink trees, and the third for simultaneous noise and delay optimization. We successfully verified our approach on a modern PowerPC design. We also showed both theoretically and empirically that optimizing delay alone cannot solve all noise problems, and that the performance penalty of optimizing both noise and delay compared to delay alone was only 2% on average. ACKNOWLEDGMENT The authors would like to thank J. Rahmeh for his help with 3dnoise and the PowerPC design data. REFERENCES [1] C. J. Alpert and A. Devgan, Wire segmenting for improved buffer insertion, in Proc. 34th IEEE/ACM Design Automation Conf., 1997, pp [2] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Reading, MA: Addison-Wesley, [3] C. L. Berman, J. L. Carter, and K. F. Day, The fanout problem: From theory to practice, in Advanced Research in VLSI: Proc Decennial Caltech Conference, C. L. Seitz, Ed. Cambridge, MA; MIT Press, Mar. 1989, pp [4] I. Catt, Crosstalk (noise) in digital systems, IEEE Trans. Electron. Comput., vol. ED-16, pp , [5] C. C. N. Chu and D. F. Wong, Closed form solution to simultaneous buffer insertion/sizing and wire sizing, in Proc. Int. Symp. Physical Design, 1997, pp [6], A new approach to simultaneous buffer insertion and wire sizing, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [7] J. Cong, L. He, C.-K. Koh, and P. H. Madden, Performance optimization of VLSI interconnect layout, Integration: The VLSI J., vol. 21, pp. 1 94, [8] A. Devgan, Efficient coupled noise estimation for on-chip interconnects, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [9] S. Dhar and M. A. Franklin, Optimum buffer circuits for driving long uniform lines, IEEE J. Solid-State Circuits, vol. 26, pp , Jan [10] W. C. Elmore, The transient response of damped linear network with particular regard to wideband amplifiers, J. Appl. Phys., vol. 19, pp , [11] P. Feldmann and R. W. Fruend, Reduced-order modeling of large linear subcircuits via a block Lanczos algorithm, in Proc. ACM/IEEE Design Automation Conf., 1995, pp [12] L. Gal, On-chip crosstalk The new signal integrity challenge, in Proc. Custom Integrated Circuits Conf., 1995, pp [13] M. Z.-W. Kang, W. W.-M. Dai, T. Dillinger, and D. P. LaPotin, Delay bounded buffered tree construction for timing driven floorplanning, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [14] L. N. Kannan, P. R. Suaris, and H.-G. Fang, A methodology and algorithms for post-placement delay optimization, in Proc. 31st IEEE/ACM Design Automation Conf., 1994, pp [15] R. Kay and L. Pileggi, PRIMO: Probability interpretation of moments for delay calculation, in Proc. ACM/IEEE Design Automation Conf., 1998, pp [16] D. Kirkpatrick and A. Sangiovanni-Vincentelli, Techniques for crosstalk avoidance in design of high-performance digital systems, in Proc. IEEE Int. Conf. Computer-Aided Design, 1994, pp [17] J. Lillis, Timing optimization for multi-source nets: Characterization and optimal repeater insertion, in Proc. 34th IEEE/ACM Design Automation Conf., 1997, pp [18] J. Lillis, C.-K. Cheng, and T.-T. Y. Lin, Optimal wire sizing and buffer insertion for low power and a generalized delay model, IEEE J. Solid-State Circuits, vol. 31, pp , Mar [19], Simultaneous routing and buffer insertion for high-performance interconnect, in Proc. 6th Great Lakes Symp. Physical Design, 1996, pp [20] S. Lin and M. Marek-Sadowska, A fast and efficient algorithm for determining fanout trees in large networks, in Proc. European Conf. Design Automation, 1991, pp [21] T. Lin, E. Acar, and L. Pileggi, h-gamma: An RC delay metric based on a gamma distribution approximation of the homogeneous response, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1998, pp [22] L. W. Nagel, SPICE2, a computer program to simulate semiconductor circuits, Univ. California, Berkeley, CA, Tech. Rep. ERL-M520, May [23] T. Okamoto and J. Cong, Interconnect layout optimization by simultaneous Steiner tree construction and buffer insertion, in Proc. 5th ACM/SIGDA Physical Design Workshop, 1996, pp [24] P. Penfield and J. Rubinstein, Signal delay in RC tree networks, in Proc. ACM/IEEE Design Automation Conf., 1981, pp [25] L. T. Pillage and R. A. Rohrer, Asymptotic waveform evaluation for timing analysis, IEEE Trans. Computer-Aided Design, vol. 9, pp , Apr [26] J. Rahmeh, The 3d-noise user guide, IBM, Austin, TX, Internal Rep., [27] C. Ratzlaff and L. T. Pillage, RICE: Rapid interconnect circuit evaluator using asymptotic waveform evaluation, IEEE Trans. Computer- Aided Design, vol. 13, pp , June [28] K. Shephard and V. Narayan, Noise in submicron digital design, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1996, pp [29] T. Stohr, M. Alt, A. Hetzel, and J. Koehl, Analysis, reduction and avoidance of crosstalk on VLSI chips, in Proc. Int. Symp. Physical Design, 1998, pp [30] R. R Tummala and E. J. Ryamszewski, Microelectronics Packaging Handbook. New York: Van Nostrand, Reinhold, [31] L. P. P. P. van Ginneken, Buffer placement in distributed RC-tree networks for minimal Elmore delay, in Proc. Int. Symp. Circuits and Systems, 1990, pp [32] A. Vittal and M. Marek-Sadowska, Crosstalk reduction for VLSI, IEEE Trans. Computer-Aided Design, vol. 16, pp , Mar Charles J. Alpert (S 92 M 96) received the B.S. and the B.A. degrees from Stanford University, Stanford, CA, in He received the Ph.D. degree in computer science at the University of California at Los Angeles in He currently works as a Research Staff Member at the IBM Austin Research Laboratory in Austin, TX. Dr. Alpert received a Best Paper Award at both the 1994 and 1995 ACM/IEEE Design Automation Conferences. He also serves on the program committee for the International Symposium on Physical Design. His research interests include clock distribution, noise analysis, global routing, and timing optimization.

13 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1645 Anirudh Devgan (S 91 M 91) received the B.T. degree in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1990 and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1991 and 1993, respectively. He is currently the Manager of electrical analysis and physical design in the IBM Server Group in Austin, TX. From 1994 to 1999, he was a Research Staff Member in the IBM Research Division, at IBM Thomas J. Watson Research Center, Yorktown Heights, NY, and at IBM Austin Research Laboratory, Austin, TX. His research interests are in the area of design automation of integrated circuits, specifically transistor and interconnect level modeling, analysis, and optimization. Stephen T. Quay received the B.S. degree in electrical engineering and the B.S. degree in computer science from Washington University, St. Louis, MO, in Since 1983, he has worked in many areas of chip layout and analysis for IBM in Endicott, NY, and Austin, TX. Currently, he is an Advisory Engineer in the Server Development Group where he develops design automation applications for interconnect extraction and performance optimization.

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

Making Fast Buffer Insertion Even Faster Via Approximation Techniques 1A-3 Making Fast Buffer Insertion Even Faster Via Approximation Techniques Zhuo Li 1,C.N.Sze 1, Charles J. Alpert 2, Jiang Hu 1, and Weiping Shi 1 1 Dept. of Electrical Engineering, Texas A&M University,

More information

Time Algorithm for Optimal Buffer Insertion with b Buffer Types *

Time Algorithm for Optimal Buffer Insertion with b Buffer Types * An Time Algorithm for Optimal Buffer Insertion with b Buffer Types * Zhuo Dept. of Electrical Engineering Texas University College Station, Texas 77843, USA. zhuoli@ee.tamu.edu Weiping Shi Dept. of Electrical

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 8 (2013), pp. 907-912 Research India Publications http://www.ripublication.com/aeee.htm Circuit Model for Interconnect Crosstalk

More information

Receiver Modeling for Static Functional Crosstalk Analysis

Receiver Modeling for Static Functional Crosstalk Analysis Receiver Modeling for Static Functional Crosstalk Analysis Mini Nanua 1 and David Blaauw 2 1 SunMicroSystem Inc., Austin, Tx, USA Mini.Nanua@sun.com 2 University of Michigan, Ann Arbor, Mi, USA Blaauw@eecs.umich.edu

More information

Buffered Steiner Trees for Difficult Instances

Buffered Steiner Trees for Difficult Instances Buffered Steiner Trees for Difficult Instances C. J. Alpert 1, M. Hrkic 2, J. Hu 1, A. B. Kahng 3, J. Lillis 2, B. Liu 3, S. T. Quay 1, S. S. Sapatnekar 4, A. J. Sullivan 1, P. Villarrubia 1 1 IBM Corp.,

More information

An Interconnect-Centric Design Flow for Nanometer. Technologies

An Interconnect-Centric Design Flow for Nanometer. Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

Statistical Timing Analysis Using Bounds and Selective Enumeration

Statistical Timing Analysis Using Bounds and Selective Enumeration IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 9, SEPTEMBER 2003 1243 Statistical Timing Analysis Using Bounds and Selective Enumeration Aseem Agarwal, Student

More information

Simultaneous Shield and Buffer Insertion for Crosstalk Noise Reduction in Global Routing

Simultaneous Shield and Buffer Insertion for Crosstalk Noise Reduction in Global Routing Simultaneous Shield and Buffer Insertion for Crosstalk Noise Reduction in Global Routing Tianpei Zhang and Sachin S. Sapatnekar Department of Electrical and Computer Engineering, University of Minnesota

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Chapter 28: Buffering in the Layout Environment

Chapter 28: Buffering in the Layout Environment Chapter 28: Buffering in the Layout Environment Jiang Hu, and C. N. Sze 1 Introduction Chapters 26 and 27 presented buffering algorithms where the buffering problem was isolated from the general problem

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Renshen Wang Department of Computer Science and Engineering University of California, San Diego La Jolla,

More information

Basic Idea. The routing problem is typically solved using a twostep

Basic Idea. The routing problem is typically solved using a twostep Global Routing Basic Idea The routing problem is typically solved using a twostep approach: Global Routing Define the routing regions. Generate a tentative route for each net. Each net is assigned to a

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 1 Lecture 10: Repeater (Buffer) Insertion Introduction to Buffering Buffer Insertion

More information

A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment

A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment ABSTRACT Shiyan Hu Dept. of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan 49931

More information

A Novel Performance-Driven Topology Design Algorithm

A Novel Performance-Driven Topology Design Algorithm A Novel Performance-Driven Topology Design Algorithm Min Pan, Chris Chu Priyadarshan Patra Electrical and Computer Engineering Dept. Intel Corporation Iowa State University, Ames, IA 50011 Hillsboro, OR

More information

An Efficient Algorithm For RLC Buffer Insertion

An Efficient Algorithm For RLC Buffer Insertion An Efficient Algorithm For RLC Buffer Insertion Zhanyuan Jiang, Shiyan Hu, Jiang Hu and Weiping Shi Texas A&M University, College Station, Texas 77840 Email: {jerryjiang, hushiyan, jianghu, wshi}@ece.tamu.edu

More information

ARELAY network consists of a pair of source and destination

ARELAY network consists of a pair of source and destination 158 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 55, NO 1, JANUARY 2009 Parity Forwarding for Multiple-Relay Networks Peyman Razaghi, Student Member, IEEE, Wei Yu, Senior Member, IEEE Abstract This paper

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

10. Interconnects in CMOS Technology

10. Interconnects in CMOS Technology 10. Interconnects in CMOS Technology 1 10. Interconnects in CMOS Technology Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October

More information

Post-Route Gate Sizing for Crosstalk Noise Reduction

Post-Route Gate Sizing for Crosstalk Noise Reduction Post-Route Gate Sizing for Crosstalk Noise Reduction Murat R. Becer, David Blaauw, Ilan Algor, Rajendran Panda, Chanhee Oh, Vladimir Zolotov and Ibrahim N. Hajj Motorola Inc., Univ. of Michigan Ann Arbor,

More information

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations An Efficient Routing Tree Construction Algorithm with uffer Insertion, Wire Sizing and Obstacle Considerations Sampath Dechu Zion Cien Shen Chris C N Chu Physical Design Automation Group Dept Of ECpE Dept

More information

Pseudopin Assignment with Crosstalk Noise Control

Pseudopin Assignment with Crosstalk Noise Control 598 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 5, MAY 2001 Pseudopin Assignment with Crosstalk Noise Control Chin-Chih Chang and Jason Cong, Fellow, IEEE

More information

Theorem 2.9: nearest addition algorithm

Theorem 2.9: nearest addition algorithm There are severe limits on our ability to compute near-optimal tours It is NP-complete to decide whether a given undirected =(,)has a Hamiltonian cycle An approximation algorithm for the TSP can be used

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

Fast Algorithms For Slew Constrained Minimum Cost Buffering

Fast Algorithms For Slew Constrained Minimum Cost Buffering 18.3 Fast Algorithms For Slew Constrained Minimum Cost Buffering Shiyan Hu, Charles J. Alpert, Jiang Hu, Shrirang Karandikar, Zhuo Li, Weiping Shi, C. N. Sze Department of Electrical and Computer Engineering,

More information

Full-chip Routing Optimization with RLC Crosstalk Budgeting

Full-chip Routing Optimization with RLC Crosstalk Budgeting 1 Full-chip Routing Optimization with RLC Crosstalk Budgeting Jinjun Xiong, Lei He, Member, IEEE ABSTRACT Existing layout optimization methods for RLC crosstalk reduction assume a set of interconnects

More information

Interconnect Delay and Area Estimation for Multiple-Pin Nets

Interconnect Delay and Area Estimation for Multiple-Pin Nets Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation

More information

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology Introduction Chapter 4 Trees for large input, even linear access time may be prohibitive we need data structures that exhibit average running times closer to O(log N) binary search tree 2 Terminology recursive

More information

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM

A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 09, 2016 ISSN (online): 2321-0613 A Review Paper on Reconfigurable Techniques to Improve Critical Parameters of SRAM Yogit

More information

1 Introduction 2. 2 A Simple Algorithm 2. 3 A Fast Algorithm 2

1 Introduction 2. 2 A Simple Algorithm 2. 3 A Fast Algorithm 2 Polyline Reduction David Eberly, Geometric Tools, Redmond WA 98052 https://www.geometrictools.com/ This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy

More information

WE consider the gate-sizing problem, that is, the problem

WE consider the gate-sizing problem, that is, the problem 2760 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL 55, NO 9, OCTOBER 2008 An Efficient Method for Large-Scale Gate Sizing Siddharth Joshi and Stephen Boyd, Fellow, IEEE Abstract We consider

More information

Interconnect Design for Deep Submicron ICs

Interconnect Design for Deep Submicron ICs ! " #! " # - Interconnect Design for Deep Submicron ICs Jason Cong Lei He Kei-Yong Khoo Cheng-Kok Koh and Zhigang Pan Computer Science Department University of California Los Angeles CA 90095 Abstract

More information

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree Chapter 4 Trees 2 Introduction for large input, even access time may be prohibitive we need data structures that exhibit running times closer to O(log N) binary search tree 3 Terminology recursive definition

More information

Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects

Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects Determination of Worst-case Crosstalk Noise for Non-Switching Victims in GHz+ Interconnects Jun Chen ECE Department University of Wisconsin, Madison junc@cae.wisc.edu Lei He EE Department University of

More information

Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation

Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Hamid Shojaei, and Azadeh Davoodi University of Wisconsin 1415 Engineering Drive, Madison WI 53706 Email: {shojaei,

More information

Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles

Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles Ying Rao, Tianxiang Yang University of Wisconsin Madison {yrao, tyang}@cae.wisc.edu ABSTRACT Buffer

More information

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize. Cornell University, Fall 2017 CS 6820: Algorithms Lecture notes on the simplex method September 2017 1 The Simplex Method We will present an algorithm to solve linear programs of the form maximize subject

More information

Buffered Routing Tree Construction Under Buffer Placement Blockages

Buffered Routing Tree Construction Under Buffer Placement Blockages Buffered Routing Tree Construction Under Buffer Placement Blockages Abstract Interconnect delay has become a critical factor in determining the performance of integrated circuits. Routing and buffering

More information

Fast Interconnect Synthesis with Layer Assignment

Fast Interconnect Synthesis with Layer Assignment Fast Interconnect Synthesis with Layer Assignment Zhuo Li IBM Research lizhuo@us.ibm.com Tuhin Muhmud IBM Systems and Technology Group tuhinm@us.ibm.com Charles J. Alpert IBM Research alpert@us.ibm.com

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

Crosslink Insertion for Variation-Driven Clock Network Construction

Crosslink Insertion for Variation-Driven Clock Network Construction Crosslink Insertion for Variation-Driven Clock Network Construction Fuqiang Qian, Haitong Tian, Evangeline Young Department of Computer Science and Engineering The Chinese University of Hong Kong {fqqian,

More information

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model 446 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 4, APRIL 2000 Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE

More information

Retiming and Clock Scheduling for Digital Circuit Optimization

Retiming and Clock Scheduling for Digital Circuit Optimization 184 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2002 Retiming and Clock Scheduling for Digital Circuit Optimization Xun Liu, Student Member,

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

On Covering a Graph Optimally with Induced Subgraphs

On Covering a Graph Optimally with Induced Subgraphs On Covering a Graph Optimally with Induced Subgraphs Shripad Thite April 1, 006 Abstract We consider the problem of covering a graph with a given number of induced subgraphs so that the maximum number

More information

An Improved Measurement Placement Algorithm for Network Observability

An Improved Measurement Placement Algorithm for Network Observability IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 4, NOVEMBER 2001 819 An Improved Measurement Placement Algorithm for Network Observability Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper

More information

Crosstalk Aware Static Timing Analysis Environment

Crosstalk Aware Static Timing Analysis Environment Crosstalk Aware Static Timing Analysis Environment B. Franzini, C. Forzan STMicroelectronics, v. C. Olivetti, 2 20041 Agrate B. (MI), ITALY bruno.franzini@st.com, cristiano.forzan@st.com ABSTRACT Signals

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Linking Layout to Logic Synthesis: A Unification-Based Approach

Linking Layout to Logic Synthesis: A Unification-Based Approach Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

Repeater Insertion in Tree Structured Inductive Interconnect: Software Description

Repeater Insertion in Tree Structured Inductive Interconnect: Software Description Repeater Insertion in Tree Structured Inductive Interconnect: Software Description Yehea I. Ismail ECE Department Northwestern University Evanston, IL 60208 Eby G. Friedman ECE Department University of

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

Constructive floorplanning with a yield objective

Constructive floorplanning with a yield objective Constructive floorplanning with a yield objective Rajnish Prasad and Israel Koren Department of Electrical and Computer Engineering University of Massachusetts, Amherst, MA 13 E-mail: rprasad,koren@ecs.umass.edu

More information

Placement Algorithm for FPGA Circuits

Placement Algorithm for FPGA Circuits Placement Algorithm for FPGA Circuits ZOLTAN BARUCH, OCTAVIAN CREŢ, KALMAN PUSZTAI Computer Science Department, Technical University of Cluj-Napoca, 26, Bariţiu St., 3400 Cluj-Napoca, Romania {Zoltan.Baruch,

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Yuzhe Chen, Zhaoqing Chen and Jiayuan Fang Department of Electrical

More information

AS VLSI technology scales to deep submicron and beyond, interconnect

AS VLSI technology scales to deep submicron and beyond, interconnect IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL., NO. 1 TILA-S: Timing-Driven Incremental Layer Assignment Avoiding Slew Violations Derong Liu, Bei Yu Member, IEEE, Salim

More information

Crosstalk Noise Avoidance in Asynchronous Circuits

Crosstalk Noise Avoidance in Asynchronous Circuits Crosstalk Noise Avoidance in Asynchronous Circuits Alexander Taubin (University of Aizu) Alex Kondratyev (University of Aizu) J. Cortadella (Univ. Politecnica Catalunya) Luciano Lavagno (University of

More information

Iterative-Constructive Standard Cell Placer for High Speed and Low Power

Iterative-Constructive Standard Cell Placer for High Speed and Low Power Iterative-Constructive Standard Cell Placer for High Speed and Low Power Sungjae Kim and Eugene Shragowitz Department of Computer Science and Engineering University of Minnesota, Minneapolis, MN 55455

More information

Path-Based Buffer Insertion

Path-Based Buffer Insertion 1346 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 7, JULY 2007 Path-Based Buffer Insertion C. N. Sze, Charles J. Alpert, Jiang Hu, and Weiping Shi Abstract

More information

COPYRIGHTED MATERIAL. Architecting Speed. Chapter 1. Sophisticated tool optimizations are often not good enough to meet most design

COPYRIGHTED MATERIAL. Architecting Speed. Chapter 1. Sophisticated tool optimizations are often not good enough to meet most design Chapter 1 Architecting Speed Sophisticated tool optimizations are often not good enough to meet most design constraints if an arbitrary coding style is used. This chapter discusses the first of three primary

More information

Optimization Methods in Management Science

Optimization Methods in Management Science Problem Set Rules: Optimization Methods in Management Science MIT 15.053, Spring 2013 Problem Set 6, Due: Thursday April 11th, 2013 1. Each student should hand in an individual problem set. 2. Discussing

More information

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid

Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Efficient Second-Order Iterative Methods for IR Drop Analysis in Power Grid Yu Zhong Martin D. F. Wong Dept. of Electrical and Computer Engineering Dept. of Electrical and Computer Engineering Univ. of

More information

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions.

NP-Hardness. We start by defining types of problem, and then move on to defining the polynomial-time reductions. CS 787: Advanced Algorithms NP-Hardness Instructor: Dieter van Melkebeek We review the concept of polynomial-time reductions, define various classes of problems including NP-complete, and show that 3-SAT

More information

CAD Technology of the SX-9

CAD Technology of the SX-9 KONNO Yoshihiro, IKAWA Yasuhiro, SAWANO Tomoki KANAMARU Keisuke, ONO Koki, KUMAZAKI Masahito Abstract This paper outlines the design techniques and CAD technology used with the SX-9. The LSI and package

More information

ESE370 Fall ESE370, Fall 2015 Project 2: Memory Design Wednesday, Nov. 11

ESE370 Fall ESE370, Fall 2015 Project 2: Memory Design Wednesday, Nov. 11 ESE370 Fall 205 University of Pennsylvania Department of Electrical and System Engineering Circuit-Level Modeling, Design, and Optimization for Digital Systems ESE370, Fall 205 Project 2: Memory Design

More information

AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS

AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS www.arpnjournals.com AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS C. Uttraphan 1, N. Shaikh-Husin 2 1 Embedded Computing System (EmbCoS) Research Focus Group,

More information

4 Fractional Dimension of Posets from Trees

4 Fractional Dimension of Posets from Trees 57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm

Seminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of

More information

FPGA Power Management and Modeling Techniques

FPGA Power Management and Modeling Techniques FPGA Power Management and Modeling Techniques WP-01044-2.0 White Paper This white paper discusses the major challenges associated with accurately predicting power consumption in FPGAs, namely, obtaining

More information

Crosstalk Noise Optimization by Post-Layout Transistor Sizing

Crosstalk Noise Optimization by Post-Layout Transistor Sizing Crosstalk Noise Optimization by Post-Layout Transistor Sizing Masanori Hashimoto hasimoto@i.kyoto-u.ac.jp Masao Takahashi takahasi@vlsi.kuee.kyotou.ac.jp Hidetoshi Onodera onodera@i.kyoto-u.ac.jp ABSTRACT

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

Capacitive Coupling Noise in High-Speed VLSI Circuits

Capacitive Coupling Noise in High-Speed VLSI Circuits Capacitive Coupling Noise in High-Speed VLSI Circuits Payam Heydari Department of Electrical and Computer Engineering University of California Irvine, CA 92697 Massoud Pedram Department of Electrical Engineering-Systems

More information

PCP and Hardness of Approximation

PCP and Hardness of Approximation PCP and Hardness of Approximation January 30, 2009 Our goal herein is to define and prove basic concepts regarding hardness of approximation. We will state but obviously not prove a PCP theorem as a starting

More information

Module 2: Classical Algorithm Design Techniques

Module 2: Classical Algorithm Design Techniques Module 2: Classical Algorithm Design Techniques Dr. Natarajan Meghanathan Associate Professor of Computer Science Jackson State University Jackson, MS 39217 E-mail: natarajan.meghanathan@jsums.edu Module

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

Chip-Level Verification for Parasitic Coupling Effects in Deep-Submicron Digital Designs

Chip-Level Verification for Parasitic Coupling Effects in Deep-Submicron Digital Designs Chip-Level Verification for Parasitic Coupling Effects in Deep-Submicron Digital Designs Lun Ye 1 Foong-Charn Chang 1 Peter Feldmann 1 Nagaraj NS 2 Rakesh Chadha 1 Frank Cano 3 1 Bell Laboratories, Murray

More information

LECTURES 3 and 4: Flows and Matchings

LECTURES 3 and 4: Flows and Matchings LECTURES 3 and 4: Flows and Matchings 1 Max Flow MAX FLOW (SP). Instance: Directed graph N = (V,A), two nodes s,t V, and capacities on the arcs c : A R +. A flow is a set of numbers on the arcs such that

More information

Effects of FPGA Architecture on FPGA Routing

Effects of FPGA Architecture on FPGA Routing Effects of FPGA Architecture on FPGA Routing Stephen Trimberger Xilinx, Inc. 2100 Logic Drive San Jose, CA 95124 USA steve@xilinx.com Abstract Although many traditional Mask Programmed Gate Array (MPGA)

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

ABC basics (compilation from different articles)

ABC basics (compilation from different articles) 1. AIG construction 2. AIG optimization 3. Technology mapping ABC basics (compilation from different articles) 1. BACKGROUND An And-Inverter Graph (AIG) is a directed acyclic graph (DAG), in which a node

More information

A CSP Search Algorithm with Reduced Branching Factor

A CSP Search Algorithm with Reduced Branching Factor A CSP Search Algorithm with Reduced Branching Factor Igor Razgon and Amnon Meisels Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105, Israel {irazgon,am}@cs.bgu.ac.il

More information

CENG 4480 Lecture 11: PCB

CENG 4480 Lecture 11: PCB CENG 4480 Lecture 11: PCB Bei Yu Reference: Chapter 5 of Ground Planes and Layer Stacking High speed digital design by Johnson and Graham 1 Introduction What is a PCB Why we need one? For large scale production/repeatable

More information

Ranking Clustered Data with Pairwise Comparisons

Ranking Clustered Data with Pairwise Comparisons Ranking Clustered Data with Pairwise Comparisons Alisa Maas ajmaas@cs.wisc.edu 1. INTRODUCTION 1.1 Background Machine learning often relies heavily on being able to rank the relative fitness of instances

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

Concurrent Flip-Flop and Repeater Insertion for High Performance Integrated Circuits

Concurrent Flip-Flop and Repeater Insertion for High Performance Integrated Circuits Concurrent Flip-Flop and Repeater Insertion for High Performance Integrated Circuits Pasquale Cocchini pasquale.cocchini@intel.com, Intel Labs, CADResearch Abstract For many years, CMOS process scaling

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

Effects of Specialized Clock Routing on Clock Tree Timing, Signal Integrity, and Routing Congestion

Effects of Specialized Clock Routing on Clock Tree Timing, Signal Integrity, and Routing Congestion Effects of Specialized Clock Routing on Clock Tree Timing, Signal Jesse Craig IBM Systems & Technology Group jecraig@us.ibm.com Denise Powell Synopsys, Inc. dpowell@synopsys.com ABSTRACT Signal integrity

More information

A Hybrid Recursive Multi-Way Number Partitioning Algorithm

A Hybrid Recursive Multi-Way Number Partitioning Algorithm Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence A Hybrid Recursive Multi-Way Number Partitioning Algorithm Richard E. Korf Computer Science Department University

More information