PERFORMANCE optimization has always been a critical

Size: px

Start display at page:

Download "PERFORMANCE optimization has always been a critical"

Maria Spencer
6 years ago
Views:

1 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER Buffer Insertion for Noise and Delay Optimization Charles J. Alpert, Member, IEEE, Anirudh Devgan, Member, IEEE, and Stephen T. Quay Abstract Interconnect-driven optimization is an increasingly important step in high-performance design. Algorithms for buffer insertion have been successfully utilized to reduce delay in global interconnect paths; however, existing techniques only optimize delay and timing slack. With the continually increasing ratio of coupling capacitance to total capacitance and the use of aggressive dynamic logic circuit families, noise analysis and avoidance is becoming a major design bottleneck. Hence, timing and noise must be simultaneously optimized to achieve maximum performance. This paper presents comprehensive buffer insertion techniques for noise and delay optimization. Three algorithms are presented, the first for noise avoidance for single sink trees, the second for avoidance for multiple sink trees, and the last for simultaneous noise and delay optimization. We prove the optimality of each algorithm (under various assumptions) and present other theoretical results as well. We ran experiments on a highperformance microprocessor design and show that our approach fixes all noise violations. Our approach was separately verified by a detailed, simulation-based noise analysis tool. Further, we show that optimizing delay alone cannot fix all of the noise violations and that the performance penalty induced by optimizing both delay and noise as opposed to only delay is less than 2%. Index Terms Buffer insertion, interconnect-synthesis, noise analysis, routing, Steiner-tree. I. INTRODUCTION PERFORMANCE optimization has always been a critical step in the design of integrated circuits. Process technology scaling into the deep submicron regime has made interconnect performance more dominant than transistor and logic performance. With the continued scaling of process technology, the resistance per unit length of the interconnect continues to increase, the capacitance per unit length remains roughly constant and transistor or logic delay continues to decrease. This trend has led to interconnect delay been more dominant than logic delay. Process technology options, such as use of copper wires instead of aluminum can only provide temporary relief. The trend of increasing interconnect dominance is expected to continue. Interconnect-driven timing optimization techniques, such as wire sizing, buffer insertion and gate sizing have gained widespread acceptance in deep submicron design (see Cong et al. [7] for an excellent survey on layout optimization and interconnect synthesis). In particular, buffer insertion techniques have been successful in reducing interconnect delay. To the first order, interconnect delay is proportional Manuscript received December 24, 1998; revised July 27, This paper was recommended by Associate Editor M. Pedram. C. J. Alpert is with the IBM Austin Research Laboratory, Austin, TX USA ( alpert@austin.ibm.com). A. Devgan and S. T. Quay are with the IBM Server Group, Austin, TX USA ( devgan@austin.ibm.com; quayst@austin.ibm.com). Publisher Item Identifier S (99)09466-X. to the square of the length of the wire. Inserting buffers effectively divides the wire into smaller segments, which makes the delay almost linear in terms of interconnect length (plus the buffer delays). Additional advantages of buffer insertion will make this optimization even more pervasive as the ratio of device to interconnect delay continues to decrease. Several works have studied the delay-driven buffer insertion problem. Closed formed solutions have been proposed by Alpert and Devgan [1], Bakoglu [2], Chu and Wong [5], and Dhar and Franklin [9], all of which find exact solutions (under different assumptions) for inserting buffers on a two-pin net. Approaches which simultaneously construct a routing tree and insert buffers have been proposed by Kang et al. [13], Lillis et al. [19], and Okamoto and Cong [23]; Berman et al. [3] showed this problem is NP-complete. The works of Kannan et al. [14] and Lin and Marek-Sadowska [20] insert buffers on a tree by iteratively finding the best location for a single buffer. Finally, a number of works [1], [17] [19], [23] can be classified as extensions and variants to Van Ginneken s dynamic programming algorithm [31] which finds the optimal buffer placement under the Elmore delay model [10]. Lillis et al. [18] extended the algorithm to simultaneously perform wire sizing and buffer insertion using a buffer library that contains both inverting and noninverting buffers. In addition, they show how to integrate skew into the gate delay model and optimize a power function (e.g., the total number of buffers), all while retaining optimality. Later, Lillis [17] extended this work to handle nets with multiple sources. Alpert and Devgan [1] proposed a wire segmenting preprocessing algorithm to handle the one buffer per wire limitation of Van Ginneken s algorithm, which results in a smooth tradeoff between solution quality and run time. Although timing optimization has always been critical in the design process, present day design techniques and process technologies are making noise analysis and avoidance as important, or in some cases, even more important than timing analysis and optimization. The shrinking of minimum distance between adjacent wires has caused an increase in the coupling capacitance of a net to its neighbors. Furthermore, a wire s thickness is typically greater than its width, which has increased the ratio of coupling to total capacitance. A large coupling capacitance can cause a switching net to induce significant noise onto a neighboring net, resulting in an incorrect functional response. Further, the widespread use of fast dynamic logic circuits and its derivative logic families has made noise avoidance and analysis even more critical since these logic families are more susceptible to noise failure. It is no longer sufficient or even acceptable to optimize only for delay. Noise avoidance techniques must become an integral /99$ IEEE

2 1634 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 (a) (b) Fig. 1. The noise effect on a victim net: (a) without a buffer and (b) with a buffer. part of the performance optimization environment. Buffer insertion provides a suitable environment for simultaneous optimization of timing and noise. Consider the circuit shown in Fig. 1. The amount of coupling capacitance from one net to another is proportional to the distance that the two nets run parallel to each other. Fig. 1(a) illustrates the noise effects that the aggressor net (top) can have on the victim net (bottom). The coupling capacitance may cause an input signal on the aggressor net to induce a noise pulse on the victim net. If the resulting noise is greater than the tolerable noise margin (NM) of the sink, then an electrical fault results. Fig. 1(b) shows how inserting a buffer can distribute the capacitive coupling between the two newly created wires, resulting in a smaller noise pulse on the input of the inserted buffer than the pulse in Fig. 1(a). Since the buffer is a restoring logic gate, ideally no noise is propagated to its output (if the noise induced on its input is less than the tolerable noise margin). The second wire will also receive a smaller noise pulse, and if the amplitude of this noise pulse is lower the sink s noise margin, the circuit will function correctly. Further, if the wire is long enough, insertion of the buffer will also reduce the total source to sink delay. Various noise analysis and avoidance techniques have been proposed over the last few years [4], [12], [24], [28], [29], [32]. Circuit simulation techniques, such as SPICE [22], can also be used for noise estimation. When the problem can be modeled as a linear circuit (which it generally can be for most coupled noise problems), specialized linear model reduction techniques (such as rapid interconnect circuit evaluation (RICE) [27] or asymptotic waveform evaluation (AWE) [25]) are typically applied [10], [26]. For linear circuits, model reduction is preferred to circuit simulation due to the computational cost savings; however, the cost may still be prohibitive. The runtime concurrents are even more critical if noise analysis is to be performed within a physical design system, such as a global router. Hence, model reduction and circuit simulation are not suitable for these applications. Geometric noise models, e.g., based on geometric distances between wires, are computationally efficient enough to be used within a physical design system (such as a global router). However, these models are not accurate since they ignore the electrical properties of the circuit. Simplified electrical noise metrics are particularly amenable for use within a physical design optimization tool since they can be efficient and accurate. Vittal and Marek-Sadowska [32] proposed a noise model for crosstalk and showed how to reduce crosstalk during channel routing. The authors of [32] model a coupled network as a simplified RC circuit in which the capacitances for the victim line, coupling, and the aggressor line are modeled as lumped capacitances, and the gates are modeled by linear resistors. They then derive analytical expressions for noise from the resultant RC circuit. However, the Vittal and Marek-Sadowska do not consider the resistances for the victim and aggressor lines. Stohr et al. [29] represent the victim and aggressor net as simplified T models circuits and compute analytical noise expressions similar that are similar to those in [32]. The recently proposed noise metric of Devgan [8] efficiently computes coupled noise while considering circuit effects such as input slew rate, line resistances, and coupling capacitances. A nice feature of the metric is that its computational complexity, structure, and incremental nature is the same as the famous Elmore delay metric (see Section II-B), which make it particularly amenable for noise avoidance. Further, like the Elmore metric for delay, the noise metric is a provable upper bound on coupled noise. Other advantages of the noise metric include the ability to incorporate multiple aggressor nets and handle general victim and aggressor net topologies. For these reasons, we adopt the Devgan noise metric in this work. A disadvantage of the Devgan metric is that it becomes more pessimistic as the ratio of the aggressor net s transition time (at the output of the driver) to its delay decreases. However, cases in which this ratio becomes very small are rare since a long net delay generally corresponds to a large load on the driver, which in turn causes a slower transition time. The metric also does not consider the duration of the noise pulse. In general, the noise margin of a gate is dependent on both the peak noise amplitude and the noise pulse width. However, when considering failure at a gate, peak amplitude dominates pulse width. In other words, the gate is much more sensitive to a change in peak amplitude than a change in pulse width. Consequently, the induced noise can be approximated by the peak noise amplitude, especially when the noise metric is an upper bound. In practice, we observe that the Devgan metric provides a computationally efficient and sufficiently accurate noise estimate, as illustrated through the realistic industrial examples studied in Section V. The following contributions are made in the paper. We propose a formula for the maximum length of a wire such that no noise violation is induced. We prove that any buffer insertion algorithm that optimizes only delay is insufficient for fixing noise violations. We present two algorithms for optimal buffer insertion for noise avoidance. The first is an optimal linear-time algorithm for two-pin nets, and the second is an optimal quadratic-time algorithm for multipin nets. We present a third algorithm for minimizing the maximum source to sink delay such that all noise constraints are satisfied. The algorithm is optimal under the condition that the buffer library contains a single buffer type. Experiments with an industrial noise analysis tool show that our algorithm eliminates all noise violations for a set of 500 nets from a modern microprocessor design, while optimizing for delay alone does not eliminate

3 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1635 all violations. Further, the average delay penalty for simultaneously optimizing noise and delay compared to optimizing delay alone is only 2%. The rest of our paper is as follows. Section II discusses presents notation, a review of Devgan s noise metric [8], and the problem formulations that we study. Section III presents optimal buffer insertion algorithms for noise avoidance, and Section IV presents an optimal algorithm for simultaneously optimizing noise and delay. Section V presents experimental results, and we conclude in Section VI. II. PRELIMINARIES We assume that the input routing tree topology is fixed or that a Steiner estimation has been computed for the given net. A routing tree contains a set of wires and a set of nodes where is the unique source node, is the set of sink nodes, and is the set of internal nodes. A wire with length is an ordered pair of nodes in which the signal propagates from to. Each node has a unique parent wire. The tree is assumed to be binary, i.e., each node can have at most two children. 1 Let the left and right children of be denoted by and, respectively. We assume that if has only one child, then it is. The path from node to, denoted by, is an ordered subset of wires of. A buffer library is also given. A buffer insertion solution is a mapping which either assigns a buffer or no buffer, denoted by, to each internal node of. 2 Let denote the number of internal nodes with inserted buffers from solution. For sufficiently long wires in which multiple buffers may need to be inserted, wires should first be segmented (e.g., as in [1]) to create as many internal nodes as necessary to form a reasonable set of potential locations for buffer insertion. 3 Assigning buffers to induces nets, and hence subtrees, each with no internally placed buffers. For each, let, the subtree rooted at, be the maximal subtree of such that is the source and contains no internal buffers. Observe that if, contains only one node. A. Delay Optimization As in [1], [18], [23], and [31], we adopt the Elmore delay 1 In general, a routed Steiner topology will never have more than three children. Such a topology is converted into a binary tree by considering each node v with three children, say a, b, and c. A dummy infeasible node w is inserted and two of the children, say a and b, are picked to be children of w. Node v now has two children, c and w, and wire (v; w) has zero length. Which pair of nodes are chosen to be children of w does not affect the solutions returned by any of the algorithms presented. 2 A buffer placed on an internal node with degree d is interpreted as having one input, one output, and d 0 1 fanouts. 3 Note that our algorithms are independent of the wire segmenting algorithm used. Solution quality increases as the number of internal nodes increases, albeit at the expense of runtime. Indeed, it may be appropriate to develop a new wire segmenting algorithm for the particular formulations we address. model 4 [10] for interconnect delays and a linear model for gate delays. For each gate, let denote the input capacitance, the intrinsic resistance and the intrinsic delay of. Let and, respectively, denote the lumped capacitance and resistance for each wire. The capacitive load seen at node is the total lumped capacitance of, i.e., The Elmore delay for a wire The delay through a gate is given by is given by If,, then. The total delay from to is given by (4) Each sink has a given required arrival time, and we assume that the input signal arrives at the source node at time zero. The condition must hold for the circuit to meet its timing requirements. We define the slack for every as where is the set of all sinks that are downstream from, i.e., is an ancestor of all sinks in. This definition differs slightly from the traditional definition of slack, but one can see that they are actually the same if is viewed as the root of the tree. Observe that the conditions in (5) holds if and only if. B. Noise Avoidance In [8], Devgan proposed a coupled noise estimation metric that is similar in spirit to Elmore delay. The noise metric is an upper bound for RC and overdamped RLC circuits. Hence, if a given net satisfies the noise constraints formed by this metric, it should also pass any reasonable noise analysis tool. The metric enables a quick estimation of the noise induced by simultaneous switching of multiple aggressor nets. The metric depends on the resistance of the victim net, the resistance of the gate driving the victim net, coupling capacitances to the aggressor nets, and the rise times and the slopes of the signals on the aggressor nets. For example, consider the four aggressor nets and the single two-pin victim net in Fig. 2. The wire in the victim net is segmented into nine new wires such that each 4 We note recent efforts [15], [21] to compute delays in constant times based on lookup tables. Although these methods hold promise, we adopt the Elmore delay because it has the additive property that a path delay is equal to the sum of the edge delays in the path. Without this property, our buffer insertion algorithms cannot be proven optimal. (1) (2) (3) (5)

4 1636 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 Fig. 3. Example noise computation. Fig. 2. Wire segmenting scheme for multiple aggressor nets. new wire is completely coupled to either zero, one, or two of the aggressor nets. The coupling capacitance from an aggressor net to a victim net can be modeled as some fraction of the wire capacitance of the victim net. For a given wire near wires from simultaneously switching aggressor nets, let be the ratios of coupling to wire capacitance from each net to, and let be the slopes (i.e., power supply voltage over input rise time) of the net signals. The total current induced by the aggressor nets on is Note that is the current through due to capacitive coupling. Often, information about neighboring aggressor nets is unavailable, especially if buffer insertion is performed before routing. In this case, a designer may wish to perform buffer insertion to improve performance while also avoiding future potential noise problems. When performing buffer insertion in estimation mode, one might assume that: 1) there is a single aggressor net which couples with each wire in the routing tree, 2) the slope of all aggressor nets is, and 3) some fixed ratio of the total capacitance of each wire is due to coupling capacitance. In this case for each wire. Let be defined as the total downstream current seen at, i.e., (6) (7) currents (induced from not shown aggressor nets) and resistances. To compute the cumulative noise seen at the sinks, we first compute the total downstream currents via (7) for each node:,,,, and. One can now compute the noise induced on each edge from (8), which yields Noise Noise Noise (10) Assuming the driver at has resistance, we can compute the noise seen at each sink via (9) Noise Noise Noise Noise Noise Noise Noise which evaluates to Noise Noise Each wire adds to the noise induced on the victim net. The amount of additional noise induced from a wire is given by Each node. The condition has a predetermined noise margin The total noise additional noise seen at a sink some upstream node is given by (8) starting at (9) Noise (11) must hold if the circuit is to have no electrical faults. In other words, the total noise propagated from every restoring gate to each its sinks must be less than the noise margin for. We define the noise slack for every as where 0 if there is no gate at. The path from to has no intermediate buffers. If an intermediate buffer is present, the noise computation begins from the output of the buffer, since the buffer is a restoring stage. Fig. 3 shows an example of a victim net with driver and sinks and. The wires are labeled with their corresponding (12) The purpose of noise slack is to serve as a noise margin for internal nodes of the tree. Observe that for each sink. The noise constraints for the downstream sinks in will be satisfied if and only if the noise slack at is

5 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1637 greater than the noise seen at. Observe that (11) holds if and only if for each. 5 C. Problem Formulations We study two different buffer insertion problems. The first formulation tries to optimize only noise while minimizing the total number of buffers; two algorithms for this problem are presented in Section III. The second formulation seeks a solution with no noise violations that optimizes delay, and this problem is solved by our third algorithm, which is an extension of Van Ginneken s algorithm [31]. Optimality for all three algorithms is guaranteed under the condition that the buffer library contains a single buffer type. The first problem seeks the minimum number of buffers that must be inserted into a tree so that no noise violations exist. Problem 1: Given a tree, a buffer library, and noise margins for each, find a solution which minimizes, such that for each. A solution to this problem formulation is certainly useful for handling noncritical nets with noise problems, for which delay optimization is unnecessary. One can integrate delay into the problem by instead optimizing performance (maximize the slack at the source) while simultaneously avoiding noise. Problem 2: Given a tree, a buffer library, and noise margins for each, find a solution which maximizes, 6 such that for each. Fig. 4. Van Ginneken s algorithm. D. Review of Van Ginneken s Algorithm Two of the algorithms we present rely on concepts introduced in Van Ginneken s buffer insertion algorithm [31]; we now present a review. The algorithm optimally solves the problem of minimizing delay for a single buffer type. The algorithm s main idea is to construct possible candidate solutions for each node in the tree. The candidates for node only can be computed after candidates for all nodes in.acandidate is a three-tuple where is the load seen at, is the slack at, and is the current solution for the subtree. Given a candidate for a node, the only information needed to compute the slack for the parent of is the load at, the slack at, and the parent wire capacitance and resistance. Hence, and are used to percolate new candidates up the tree, and is used to recover the final solution when the algorithm terminates. Fig. 4 shows Van Ginneken s algorithm which takes a routing tree and a single buffer as input and returns a candidate solution for the source. Step 1 calls the Find_Candidates procedure which is presented in Fig. 5. It returns a set of 5 Observe the similarities between the noise metric and Elmore delay metric. In particular, noise margin is analogous to require arrival time, noise slack is analogous to slack, current is analogous to capacitance, and noise is analogous to delay. 6 Observe that various formulations can be captured by manipulating the RAT (si) values. For example, if si is the only critical sink then RAT (w) =1 for all w 2 SI 0fsig. Alternatively, setting all slacks to be equal captures minimizing max si2si Delay(so 0 si). Fig. 5. Find_Candidates procedure. possible candidates for the source, but without accounting for the driver delay. Hence, Step 3 updates each candidate to include the driver delay, and Step 4 returns the candidate with largest slack. The complexity of the algorithm is. The Find_Candidates procedure shown in Fig. 5. It takes the node to be processed as input, recursively computes the lists of possible candidates for all the nodes in, and then returns the candidate list for node. The procedure can be broken into four main parts: Steps 1 4 examine the candidates for the children of and merge them together to form, the set of candidates for. Step 1 handles the base case in which is a sink, Step 2 handles the single child case, and Steps 3 and 4 handle the two children case. For the two children case, the two child candidate lists and are traversed, and the two candidates and are merged together by summing their downstream capacitances and currents, and taking the minimums of their timing and noise slacks. Observe that the number of candidates resulting from merging the two lists is only as opposed to.

6 1638 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 which implies that (15) Fig. 6. Uniform neighboring coupling capacitance for a single wire. Step 5 inserts buffers for some candidates, thereby creating new candidates to add to. Since it may be physically impossible to place a buffer at the current node, only nodes which are feasible are considered. The buffer is considered for insertion at, and the candidate in which produces the largest resulting slack is preserved. Step 6 computes the new load, slack, current and noise slack for each candidate induced by the parent wire of. Finally, Step 7 prunes inferior candidates from, using the pruning schemes of [18] and [31], not the scheme used in Algorithm 2. Here, solutions are sorted in increasing order of load and decreasing order of slack. Given two candidates and, is inferior to if and only if and. III. NOISE CONSTRAINED BUFFER INSERTION In this section, we study Problem 1, inserting the minimum number of buffers such that noise constraints are satisfied. A. Theoretical Results for Noise Avoidance We begin by studying the simplest case of a single wire with uniform width and neighboring coupling capacitance, as shown in Fig. 6. This case is instructive, since the wires in a routing tree can always be segmented into a set of several such wires, as seen in Fig. 2. For each wire, let be the wire resistance per unit length and be the current per unit length. Since current is a constant times wire capacitance (6), we can use a -model to represent its distribution. We now derive a new formula for finding the longest possible length of the wire, driven by buffer, such that there is not a noise violation. Theorem 1: For a given wire in a routing tree, a buffer needs to be inserted on to satisfy noise constraints if and only if and (13) Proof: If we cannot insert a buffer on wire, then the noise seen at is minimized when is inserted just above in the routing tree. From (11), for noise constraints to be satisfied, we must have (14) which is a quadratic in. Solving for yields the theorem. The constraint is needed to ensure that. More generally, given that is the unit wire capacitance of, we can substitute in (13) to yield (16) Theorem 1 leads to some interesting observations. First, if the noise slack equals, then the wire length is zero. A smaller noise slack would cause to be negative, a result which implies that a buffer should have been inserted on the subtree. If the noise slack is too small [i.e., less than ], then by the time node is reached, it is too late to insert a buffer on wire to satisfy noise constraints. Second, notice that as the resistance of the driving buffer increases, the negative term in (13) decreases faster than the positive square root term, which makes the length decrease. The maximum wire length is achieved when the buffer resistance and downstream current are zero, yielding a length of. Such a simple approximation could be useful for noise avoidance if the driver s properties are unknown or if the ratio of driver to wire resistance is close to zero. Finally, observe that in (16), the ratio of bad capacitance to total capacitance is inversely proportional to the distance separating the aggressor and victim nets, i.e.,, for some constant. If the wire is coupled to a single aggressor net, one can solve (16) for the separating distance, yielding (17) Theorem 2: A net that has been optimally buffered to minimize only delay may have noise violations. Proof: Consider a wire in which and are gates in the buffered net. Let be the slope of an aggressor net, and let be the ratio of coupling to total capacitance for. Alternatively, a path of wires could connect to but the analysis is basically the same. If there is a noise violation at, then the following equation [similar to (13)] must hold: Solving for yields (18) (19)

7 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1639 Fig. 7. Iterative application of Theorem 1. noise slack, then a buffer must be inserted. Step 4 computes the maximum length that this buffer may be inserted from, and inserts it there at a new internal node. Finally, Step 5 computes the noise slack and the source and inserts a buffer right after the source if there is a noise violation (which can only occur if ). Theorem 3: Algorithm 1 computes an optimal solution to Problem 1 for a single-sink tree in time. The proof of optimality follows from the fact that buffers are always inserted their maximal distance up the tree, according to Theorem 1. Observe that extending Algorithm 1 to accommodate a buffer library with multiple buffer types is trivial, since according to Theorem 1, the buffer with smallest resistance always yields the maximum spacing between buffers. Hence, one can equivalently obtain the optimal solution by creating a new buffer library consisting only of the buffer in the original library with smallest resistance. Fig. 8. Algorithm 1: noise avoidance for single-sink tree. For any fixed values for the parameters on the right side of the inequality, a noise violation will occur if the noise margin at is small enough. Even if is reasonably large, an aggressor net can have a very large slope and a high coupling ratio which would also cause a violation. B. Algorithm 1: Optimal Noise Avoidance for Single-Sink Trees Theorem 1 suggests a noise avoidance buffer insertion algorithm for trees with only one sink. Begin at the sink node and work up the tree, updating the total downstream current and noise slacks of visited internal nodes. At a given node, use Theorem 1 to find out if adding a buffer is necessary and to compute its furthest possible location up the parent wire. The algorithm terminates when the source node is reached. Fig. 7 illustrates the order in which buffers will be inserted by the algorithm. Fig. 8 presents the description for Algorithm 1, Noise Avoidance for Single-Sink Trees. The algorithm accepts a routing tree, and a single buffer type. Step 1 initializes the current and noise slack of the sink node, then Steps 2 4 climb up the tree visiting each node in turn. Step 3 examines whether or not a buffer needs to be inserted on the current wire, by computing the noise from placing a buffer at node. If this noise is less than the noise slack, then no buffer needs to be inserted, so the algorithm computes the downstream current and noise slack for node, then moves to the next wire. If the noise computed in Step 3 is larger then the C. Algorithm 2: Optimal Noise Avoidance for Multi-Sink Trees It may appear fairly easy to extend Algorithm 1 to trees with multiple sinks. However, some difficulty arises when processing an internal node with two children, under the following scenario. Let,, and, respectively, be the wire, current, and noise slack for the left (right) branch of. It may be that and, i.e., the noise constraints for the left and right branches are satisfied. However, we may have, which implies that merging the two branches would cause a noise violation. Thus, a buffer must be placed on either the left or right branch immediately following. One cannot immediately deduce which branch to choose since we may have and, i.e., the left branch is more tolerant of noise than the right branch, but the left branch has a larger downstream current which makes it more noise sensitive. The correct choice cannot be made without knowing the characteristics and location of the gate driving, but since the algorithm is bottom-up, this location of this gate has not yet been determined. To handle this additional complexity, we propose to generate a set of candidate solutions for each node and propagate these candidate solutions up the tree, in the same spirit as in Van Ginneken s algorithm [31]. This time, a candidate is defined as a three-tuple where is the downstream current seen at, is the noise slack for, and is the current solution for the subtree. and are used to determine which candidates to percolate up the tree, and stores the current solution seen so far. When a node with two children is encountered, we let denote the new solution that results from merging solutions for the left and for the right branch of. We assign if either or and otherwise. Whenever a node with two children is encountered and a buffer needs to be inserted on either the left or the right branch, candidates for each of these two options are generated. The candidates are stored in nondecreasing order by downstream current so that inferior solutions can be pruned in a linear pass of the current

8 1640 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 If there is a violation, then two new solutions, one with a new buffer on the left and one with a new buffer on the right, are generated and inserted into the current list of candidates. When the algorithm terminates, the solution(s) in with the fewest number of buffers is chosen. Theorem 4: Algorithm 2 returns an optimal solution to Problem 1 for a multi-sink tree in time. As in Algorithm 1, optimality follows from the fact that buffers are only inserted at their maximal distance up the tree, except for the case when the test in Step 5 holds. But in this case, both possible buffer insertions are preserved and propagated up the tree, and at least one of these must be optimal. The complexity is since each node may have as many as candidates. In practice, we believe that the algorithm s run time will be linear in the average case, since the test in Step 5 will rarely hold. As for Algorithm 1, one can equivalently obtain an optimal solution for a buffer library with multiple buffer types by selecting the buffer type with smallest resistance. IV. OPTIMIZING DELAY WHILE SATISFYING NOISE CONSTRAINTS Fig. 9. Algorithm 2: noise avoidance for multi-sink trees. candidate list. Given two candidates and for node, we say that is inferior to if and only if and. The complete description for Algorithm 2: Optimal Noise Avoidance for Multi-Sink Trees, is presented in Fig. 9. The algorithm is recursive and is initially passed the source for the tree. 7 Algorithm 2 is similar to Algorithm 1, except for Steps 4 7 which handle nodes with two children. Step 4 iterates through each left branch candidate and each right branch candidate using the Van Ginneken s linear merging technique [31]. Step 5 tests whether merging the two candidates results in a noise violation, 8 and if there is no violation, Step 7 merges the candidates for the two branches without inserting a buffer. 7 Note that it is inefficient to store M for each candidate. In the actual implementation, pointers are stored for the left and right candidates, and the final solution can be revealed by traversing these pointers. 8 We assume R so > R b for this test, otherwise one must test whether the current solution M will have no noise violations if no more buffers are inserted. If this test yields no noise violations, then Step 7 should be executed instead of Step 6. A. Algorithm 3: Noise Avoidance for Delay Minimization To address Problem 2, simultaneous optimization of noise and delay, we modify Van Ginneken s optimal delay algorithm (Figs. 4 and 5) to include noise avoidance. The essence of our modifications is the following: whenever Van Ginneken s algorithm considers inserting a buffer into a candidate, we check to see if the noise constraints have been violated; if they have, we do not permit a buffer to be inserted. Observe that our algorithm will actually generate fewer candidates than Van Ginneken s algorithm. Van Ginneken s algorithm generates a set of candidates which violate noise constraints and a set which satisfy noise constraints, while our algorithm only generates candidates belonging to the latter set. The other modifications are for bookkeeping purposes and to permit a buffer library with more than one buffer. Again, a list of candidates is computed for each node in the tree, except that this now a candidate is a fivetuple where is the load seen at, is the slack at, is the downstream current seen at, is the noise slack at, and is the current solution. Figs. 10 and 11 illustrate Algorithm 3, which is equivalent to Van Ginneken s algorithm except for the modifications for noise avoidance which are shown in boldface. Step 1 calls the Find_Noise_Candidates procedure which returns a list of candidate solutions that do not include the driver s effect on delay and noise. Step 2 adds the driver delay and computes the noise slack, then the candidate with the best slack, such that noise constraints are satisfied, is returned in Step 3. The Find_Noise_Candidates procedure shown in Fig. 11. It takes the node to be processed as input, recursively computes the lists of possible candidates for all the nodes in, and then returns the candidate list for node. Steps 1 4 are

9 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1641 Fig. 10. Algorithm 3: noise avoidance for delay minimization. Fig. 12. Illustrations for proof of Theorem 5. Fig. 11. Find_Noise_Candidates procedure. the same as in Fig. 5 except for additional bookkeeping. In Step 5, each buffer type in the library is considered for insertion at, and the candidate in which produces the largest resulting slack such that noise constraints are satisfied is chosen. The procedure will not insert a buffer if its causes a noise violation. This step is the fundamental modification made to Van Ginneken s algorithm. Step 6 computes the new load, slack, current and noise slack for each candidate induced by the parent wire of. Finally, Step 7 is the same as in Fig. 5. The modifications for noise avoidance do not increase the time complexity of Van Ginneken s algorithm; hence, Algorithm 3 has time complexity [18]. B. Optimality of Algorithm 3 Theorem 5: If and both and hold for each, then Algorithm 3 returns an optimal solution to Problem 2. Proof: First, observe that Algorithm 3 always returns a solution which satisfies the noise constraints (even when ) since neither a buffer nor the driver is added to a candidate if it has a noise violation. In Step 5 of the Find_Candidates procedure, a buffer is only added to a candidate if the addition of the buffer does not violate the noise margins of the sinks that it drives. Similarly, Step 4 of Fig. 10 will only return a solution with positive noise slack. It may seem at first that pruning candidates may cause the optimal solution to be eliminated, since pruning is performed based on load and timing slack, not current and noise slack. Consider candidates and for node where and. Since is inferior to, then will be pruned by Step 7, but perhaps will have a noise violation further up the tree while will not. Since subsequently becomes an illegal solution, may actually be the optimal solution to Problem 2, but the existence of causes it to be pruned. We now show that the optimal solution will never be pruned. There are three cases, illustrated in Fig. 12. Case 1: and both see a single gate downstream from. If this gate is a buffer for both candidates, then the wire capacitance component of will be greater than the wire capacitance component for since the gate loads are the same and. Hence, the total downstream wire length from will be longer for than for. Since current is monotone increasing in wire length, we have. And since the noise margins for the buffers are the same,. So, if fails to meet noise constraints further up the tree, will fail as well, i.e., pruning does not eliminate an optimal solution. Alternatively, if sees one of the original sinks downstream from, while sees a buffer, then its downstream wire length must be longer than that of,so. Clearly, it is impossible for to see a buffer downstream and for to see a sink because then (because the buffer

10 1642 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 has smaller input capacitance than each sink). By assumption, the sink s noise margin is no greater than the buffer s noise margin, so. As before, if fails to meet noise constraints further up the tree then will fail as well. Case 2: One candidate drives a single gate, and one drives multiple gates. Since, then must drive the single gate since the buffer must have smaller capacitance than either branch. Hence, this gate must be further up the tree then either of the two gates driven by. Analysis similar to that of Case 1 can be applied to conclude that and, which makes worse than in terms of both timing and noise, so pruning will not eliminate an optimal solution. Case 3: Each candidate sees multiple sinks downstream from. In this case, it is possible to have, since the gate(s) driven on the left branch for may be further up the tree than for. This means that unlike in the previous two cases, may violate noise constraints further up the tree, while would not. Thus, a legal solution may in effect become pruned by an illegal one, making it seem possible that a potentially optimal solution may be pruned. We now show that Step 4 in Find_Candidates constructs a third solution which will not be pruned, and that is inferior to both in terms of noise and timing. This implies that could not have been part of the optimal solution. Decompose the loads and of and into their left and right components: and.if, then we can use the same analyzes from Cases 1 and 2 to conclude that is inferior to in terms of both timing and noise. Thus, pruning it does not eliminate an optimal solution. If, then either or since downstream current is a monotone nondecreasing function of capacitance. If, then consists of the left branch of and the right branch of, otherwise, it consists of the right branch of and the left branch of. Thus,, and since the buffers seen at are further up the tree for than for and, we conclude that, and by using the same arguments as in Cases 1 and 2. Finally, to show that it is safe to prune because of the existence of, we must show that. Decompose the slacks and into their left and right components: and. Since is inferior to, and hence.if consists of the left branch of and the right branch of, then.if consists of the right branch of and the left branch of, then. In either case,, which means that is inferior to in terms of both noise and timing. C. Discussion For Theorem 5, the assumptions and are somewhat realistic since one generally would want to choose buffers with small input capacitance and high noise margin. We observe that if is sufficiently large, then any solution with buffer inserted would instantly be pruned since both its capacitance and slack would be worse that those for the zero-buffer solution. In this case, Algorithm 3 would fail to insert any buffers, even if the zero-buffer solution had noise violations. In practice, buffers are commonly used to reduce delay often by decoupling off-path load capacitance. Thus, a buffer library should have at least one buffer with small input capacitance relative to the wire and sink capacitances. As long as such a buffer exists in the library and there is sufficient wire segmenting, Algorithm 3 will find a legal solution. When there is more than one buffer in the buffer library, even if they all have small input capacitance, the optimality of Algorithm 3 is not guaranteed. The proof of Theorem 5 utilizes the fact that if one candidate sees more downstream capacitance than another candidate, then it must also see more wire capacitance, and hence, more downstream current. Since different buffer types have different input capacitances, this assertion cannot be made. We believe that even with multiple buffer types, Algorithm 3 will generally produce solutions that are very close to optimal; our experimental results in Section V strongly support this claim. In optimally solving Problem 2, more buffers may be added than necessary, e.g., six additional buffers might be inserted to squeeze out an extra 25 ps of performance. Alternatively, one may wish to minimize the number of inserted buffers such that noise and timing constraints are satisfied. Problem 3: Given a tree, a buffer library, and noise margins for each, find a solution which minimizes such that and for each. Since many possible solutions may exist, maximize as a secondary objective. We can address this formulation by incorporating an extension to Van Ginneken s algorithm proposed by Lillis et al. [18] into Algorithm 3. Lillis et al. showed that instead of storing a single candidate list for each node, one can store several lists in an array indexed by the total number of buffers inserted in the candidate solution. This allows one to generate the optimal solution in terms of delay for any desired number of buffers. We have incorporated this extension into Algorithm 3 and address Problem 3 by first finding the best solution in terms of timing for each possible number of buffers and then returning the solution with the fewest buffers such that both noise and timing constraints are satisfied. This algorithm has been incorporated into a tool called BuffOpt. We have theoretically shown that optimizing delay alone cannot possibly fix all noise problems. The experimental results in the next section shows that this phenomenon occurs in practice as well. V. EXPERIMENTAL RESULTS The techniques presented in the previous sections have been implemented in a tool called Buffopt. We refer to the algorithm which find the optimizes only delay [1], [18] as DelayOpt. DelayOpt is the same as Algorithm 3 in Fig. 10, without the boldface modifications. However, like BuffOpt, DelayOpt was extended to be able to trade off between the degree of timing optimization and the number of buffers inserted. This extension allows us to report the best result for any given

11 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1643 TABLE I SINK DISTRIBUTION OF THE 500 TEST NETS TABLE III NOISE AVOIDANCE COMPARISONS OF BuffOpt VERSUS DelayOpt(k) TABLE II NUMBER OF NOISE VIOLATIONS REPORTED BY 3dNOISE BEFORE AND AFTER RUNNING BuffOpt number of buffers k, and we denote this result as DelayOpt(k). For our experiments, we selected a set of 500 nets from a modern Power PC microprocessor design. Each gate in the design belongs to precharacterized cell library, and timing and capacitance data for each gate was obtained by simulation and extraction. The 500 nets with largest total capacitances were chosen for analysis, since these nets were most likely to have noise violations. Table I shows the distribution of the sizes of these nets. To verify BuffOpt, we separately ran a more detailed, simulation-based noise analysis tool, 3dnoise [26]. 3dnoise was run both before and after running BuffOpt and DelayOpt. To perform noise analysis, 3dnoise uses accurate momentmatching based techniques that are similar to RICE [27]. We ran BuffOpt, DelayOpt and 3dnoise all in estimation mode (see Section II-B), assuming a 0.7 coupling to total capacitance ratio from a single aggressor net with rise time 0.25 ns and a power supply voltage of 1.8 V (so 7.2). The tolerable noise margin for every gate in the design is 0.8 V. Our buffer library contained 5 inverting and 6 noninverting buffers of varying power levels. Our experiments show the following. BuffOpt eliminated every noise problem in the design data. DelayOpt was unable to solve identified noise problems. The average delay penalty from using BuffOpt instead of DelayOpt was less than 2%. A. BuffOpt Successfully Avoids Noise We ran BuffOpt on the 500 nets, and observed that BuffOpt identified noise violations for 423 of the nets and successfully inserted buffers to fix all of them. To verify BuffOpt s identification of noise critical nets, we ran 3dnoise on the test data before running BuffOpt. The accurate analysis of 3dnoise identified 386 nets with noise violations, all of which were also identified by BuffOpt. BuffOpt identified more nets with violations, which shows that the noise metric is more conservative. This is due to the fact that the noise metric is an upper bound. To verify the ability of BuffOpt to eliminate noise violations, we ran 3dnoise on the test data after running BuffOpt. 3dnoise identified no noise violations on the data after buffers had been inserted. This data is summarized in Table II. TABLE IV AVERAGE DELAY REDUCTION IN PICOSECONDS FROM BUFFER INSERTION B. Delay Alone is an Insufficient Optimization Our next set of experiments compared BuffOpt to DelayOpt (optimal delay-driven buffer insertion) in terms of how well the algorithms avoided noise. Since BuffOpt never inserted more than four buffers on any net, we ran DelayOpt four times in which no solution was allowed to have more than buffers, and ranged from 1 to 4. Table III compares the solutions generated by DelayOpt(k) with BuffOpt. Observe that running DelayOpt(4) causes the addition of 1126 more buffers than BuffOpt, and yet still has 13 noise violations. Limiting the total number of buffers inserted by DelayOpt only increases the number of noise violations and still causes more buffers to be inserted for. Thus, as we proved in Theorem 2, noise avoidance must be integrated into the problem formulation in order to avoid all noise. Finally, observe that the total CPU time for BuffOpt is actually less than DelayOpt(k) for. This occurs because BuffOpt prunes candidate solutions which violate noise constraints, which gives BuffOpt fewer total candidates to analyze than DelayOpt. The total number of buffers inserted is given by nets with buffers. For example BuffOpt inserted (0 77) (1 161) (2 232) (3 84) (4 2) 717 buffers. C. The Delay Penalty Is Small Our final experiment compares DelayOpt to BuffOpt in terms of total delay. We first ran BuffOpt to see how many buffers were inserted, then ran DelayOpt for the same number of buffers in order to make an apples to apples comparison. We computed the reduction in total delay for each net, and averaged the results by the number of buffers inserted. The cumulative results are presented in Table IV. For example, there were 232 nets for which two buffers were inserted, and on average BuffOpt reduced delay by ps while DelayOpt reduced delay by ps. To find the average delay reduction

12 1644 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 over all 423 instances, we compute the total reduction in delay for all nets and divide by 423. For example, for BuffOpt, the weighted average is (( ) ( ) ( ) (45.5 2))/ From the last column in the table, we see that overall there was an average delay penalty of only 6 ps, or equivalently 1.99%, from avoiding noise. Thus, Buffopt is able to integrate noise into a delay-driven algorithm with virtually no loss in total delay. Recall that in Section IV, we proved the optimality of Algorithm 3 for Problem 2 under the assumption that the buffer library contained only a single buffer type, but that we could not guarantee optimality for a larger buffer library. The DelayOpt results in Table IV form an upper bound on the optimal solution to Problem 2 since it is an optimal algorithm for minimum delay without noise constraints. Thus, we see that even with a buffer library of size 11, BuffOpt returns virtually the optimal delay since it is on average within 2% of an upper bound. VI. CONCLUSIONS Noise is becoming an increasingly critical bottleneck in the design process. In this work, we presented comprehensive buffer insertion techniques for noise and delay optimization. Three algorithms were presented, the first for optimal noise avoidance for single-sink trees, the second for optimal noise avoidance for multi-sink trees, and the third for simultaneous noise and delay optimization. We successfully verified our approach on a modern PowerPC design. We also showed both theoretically and empirically that optimizing delay alone cannot solve all noise problems, and that the performance penalty of optimizing both noise and delay compared to delay alone was only 2% on average. ACKNOWLEDGMENT The authors would like to thank J. Rahmeh for his help with 3dnoise and the PowerPC design data. REFERENCES [1] C. J. Alpert and A. Devgan, Wire segmenting for improved buffer insertion, in Proc. 34th IEEE/ACM Design Automation Conf., 1997, pp [2] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Reading, MA: Addison-Wesley, [3] C. L. Berman, J. L. Carter, and K. F. Day, The fanout problem: From theory to practice, in Advanced Research in VLSI: Proc Decennial Caltech Conference, C. L. Seitz, Ed. Cambridge, MA; MIT Press, Mar. 1989, pp [4] I. Catt, Crosstalk (noise) in digital systems, IEEE Trans. Electron. Comput., vol. ED-16, pp , [5] C. C. N. Chu and D. F. Wong, Closed form solution to simultaneous buffer insertion/sizing and wire sizing, in Proc. Int. Symp. Physical Design, 1997, pp [6], A new approach to simultaneous buffer insertion and wire sizing, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [7] J. Cong, L. He, C.-K. Koh, and P. H. Madden, Performance optimization of VLSI interconnect layout, Integration: The VLSI J., vol. 21, pp. 1 94, [8] A. Devgan, Efficient coupled noise estimation for on-chip interconnects, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [9] S. Dhar and M. A. Franklin, Optimum buffer circuits for driving long uniform lines, IEEE J. Solid-State Circuits, vol. 26, pp , Jan [10] W. C. Elmore, The transient response of damped linear network with particular regard to wideband amplifiers, J. Appl. Phys., vol. 19, pp , [11] P. Feldmann and R. W. Fruend, Reduced-order modeling of large linear subcircuits via a block Lanczos algorithm, in Proc. ACM/IEEE Design Automation Conf., 1995, pp [12] L. Gal, On-chip crosstalk The new signal integrity challenge, in Proc. Custom Integrated Circuits Conf., 1995, pp [13] M. Z.-W. Kang, W. W.-M. Dai, T. Dillinger, and D. P. LaPotin, Delay bounded buffered tree construction for timing driven floorplanning, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1997, pp [14] L. N. Kannan, P. R. Suaris, and H.-G. Fang, A methodology and algorithms for post-placement delay optimization, in Proc. 31st IEEE/ACM Design Automation Conf., 1994, pp [15] R. Kay and L. Pileggi, PRIMO: Probability interpretation of moments for delay calculation, in Proc. ACM/IEEE Design Automation Conf., 1998, pp [16] D. Kirkpatrick and A. Sangiovanni-Vincentelli, Techniques for crosstalk avoidance in design of high-performance digital systems, in Proc. IEEE Int. Conf. Computer-Aided Design, 1994, pp [17] J. Lillis, Timing optimization for multi-source nets: Characterization and optimal repeater insertion, in Proc. 34th IEEE/ACM Design Automation Conf., 1997, pp [18] J. Lillis, C.-K. Cheng, and T.-T. Y. Lin, Optimal wire sizing and buffer insertion for low power and a generalized delay model, IEEE J. Solid-State Circuits, vol. 31, pp , Mar [19], Simultaneous routing and buffer insertion for high-performance interconnect, in Proc. 6th Great Lakes Symp. Physical Design, 1996, pp [20] S. Lin and M. Marek-Sadowska, A fast and efficient algorithm for determining fanout trees in large networks, in Proc. European Conf. Design Automation, 1991, pp [21] T. Lin, E. Acar, and L. Pileggi, h-gamma: An RC delay metric based on a gamma distribution approximation of the homogeneous response, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1998, pp [22] L. W. Nagel, SPICE2, a computer program to simulate semiconductor circuits, Univ. California, Berkeley, CA, Tech. Rep. ERL-M520, May [23] T. Okamoto and J. Cong, Interconnect layout optimization by simultaneous Steiner tree construction and buffer insertion, in Proc. 5th ACM/SIGDA Physical Design Workshop, 1996, pp [24] P. Penfield and J. Rubinstein, Signal delay in RC tree networks, in Proc. ACM/IEEE Design Automation Conf., 1981, pp [25] L. T. Pillage and R. A. Rohrer, Asymptotic waveform evaluation for timing analysis, IEEE Trans. Computer-Aided Design, vol. 9, pp , Apr [26] J. Rahmeh, The 3d-noise user guide, IBM, Austin, TX, Internal Rep., [27] C. Ratzlaff and L. T. Pillage, RICE: Rapid interconnect circuit evaluator using asymptotic waveform evaluation, IEEE Trans. Computer- Aided Design, vol. 13, pp , June [28] K. Shephard and V. Narayan, Noise in submicron digital design, in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1996, pp [29] T. Stohr, M. Alt, A. Hetzel, and J. Koehl, Analysis, reduction and avoidance of crosstalk on VLSI chips, in Proc. Int. Symp. Physical Design, 1998, pp [30] R. R Tummala and E. J. Ryamszewski, Microelectronics Packaging Handbook. New York: Van Nostrand, Reinhold, [31] L. P. P. P. van Ginneken, Buffer placement in distributed RC-tree networks for minimal Elmore delay, in Proc. Int. Symp. Circuits and Systems, 1990, pp [32] A. Vittal and M. Marek-Sadowska, Crosstalk reduction for VLSI, IEEE Trans. Computer-Aided Design, vol. 16, pp , Mar Charles J. Alpert (S 92 M 96) received the B.S. and the B.A. degrees from Stanford University, Stanford, CA, in He received the Ph.D. degree in computer science at the University of California at Los Angeles in He currently works as a Research Staff Member at the IBM Austin Research Laboratory in Austin, TX. Dr. Alpert received a Best Paper Award at both the 1994 and 1995 ACM/IEEE Design Automation Conferences. He also serves on the program committee for the International Symposium on Physical Design. His research interests include clock distribution, noise analysis, global routing, and timing optimization.

ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1645 Anirudh Devgan (S 91 M 91) received the B.T. degree in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1990 and the M.

He is currently the Manager of electrical analysis and physical design in the IBM Server Group in Austin, TX.

13 ALPERT et al.: BUFFER INSERTION FOR NOISE AND DELAY OPTIMIZATION 1645 Anirudh Devgan (S 91 M 91) received the B.T. degree in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1990 and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1991 and 1993, respectively. He is currently the Manager of electrical analysis and physical design in the IBM Server Group in Austin, TX. From 1994 to 1999, he was a Research Staff Member in the IBM Research Division, at IBM Thomas J. Watson Research Center, Yorktown Heights, NY, and at IBM Austin Research Laboratory, Austin, TX. His research interests are in the area of design automation of integrated circuits, specifically transistor and interconnect level modeling, analysis, and optimization. Stephen T. Quay received the B.S. degree in electrical engineering and the B.S. degree in computer science from Washington University, St. Louis, MO, in Since 1983, he has worked in many areas of chip layout and analysis for IBM in Endicott, NY, and Austin, TX. Currently, he is an Advisory Engineer in the Server Development Group where he develops design automation applications for interconnect extraction and performance optimization.

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

1A-3 Making Fast Buffer Insertion Even Faster Via Approximation Techniques Zhuo Li 1,C.N.Sze 1, Charles J. Alpert 2, Jiang Hu 1, and Weiping Shi 1 1 Dept. of Electrical Engineering, Texas A&M University,