

Journal of High Speed Electronics and Systems, pp. 65-81,
[14] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, Clock routing for high-performance ICs, 27th ACM/IEEE Design Automation Conference, pp ,
[15] A. Kahng, J. Cong and G. Robins, High-performance clock routing based on recursive geometric matching, 28th ACM/IEEE Design Automation Conference, pp ,
[16] N. A. Sherwani and B. Wu, Clock layout for high-performance ASIC based on weighted center algorithm, Proc. Fourth Annual IEEE Int. ASIC Conf. & Exhibit, Rochester, NY, September
[17] H. Kim and D. Zhou, Efficient Implementation of a Planar Clock Routing with the Treatment of Obstacles, submitted to IEEE Transactions on Computer-Aided Design, 1998.
[18] D. Zhou, W. Li and X. Zeng, An Effective Buffer Insertion Algorithm for High-Speed Clock Network, submitted to IEEE Design Automation Conference,

References
[1] B. Wu and N. Sherwani, Effective Buffer Insertion of Clock Tree for High-Speed VLSI Circuits, Microelectronics Journal, 23: , July
[2] J. Lillis, C.K. Cheng and T.T. Lin, Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model, Proc. IEEE Int. Conf. on Computer-Aided Design, pp , 1995.
[3] Joe G. Xi and Wayne W. M. Dai, Buffer Insertion and Sizing Under Process Variations for Low Power Clock Distribution, ACM/IEEE DAC, pp , 1995.
[4] W.C. Elmore, The Transient Response of Damped Linear Network with Particular Regard to Wideband Amplifier, J. Applied Physics, 19, pp ,
[5] J. Rubinstein, P. Penfield and M.A. Horowitz, Signal delay in RC tree networks, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 2, No. 3, pp , 1983.
[6] S. Pullela, N. Menezes, J. Omar and L. Pillage, Skew and Delay Optimization for Reliable Buffered Clock Trees, Proc. IEEE Int. Conf. on Computer-Aided Design, pp , 1993.
[7] J. Chung and C.K. Cheng, Skew Sensitivity Minimization of Buffered Clock Tree, Proc. Int. Conf. on Computer-Aided Design, pp ,
[8] L.P.P.P. van Ginneken, Buffer placement in distributed RC-tree networks for minimal Elmore delay, International Symposium on Circuits and Systems, pp ,
[9] Y. P. Chen and D.F. Wong, An algorithm for zero-skew clock tree routing with buffer insertion, Proc. European on Computer-Aided Design, pp ,
[10] J.D. Cho and M. Sarrafzadeh, A buffer distribution algorithm for high-speed clock routing, Proc. ACM/IEEE Design Automation Conf., pp , 1993.
[11] N. Sherwani, Algorithms for VLSI Physical Design Automation, second edition, Kluwer Academic Publishers, pp ,
[12] Jason Cong, Lei He, Cheng-Kok Koh and Patrick H. Madden, Performance optimization of VLSI interconnect layout, INTEGRATION, the VLSI Journal, 21:1-94,
[13] D. Zhou and X.Y. Liu, On the Optimal Drivers of High-Speed Low Power ICs, in International

Table 4: Results before and after buffer insertion.
Columns: Chip; Sink#; Longest Path Length (cm); Initially Inserted Buffer Layer; Before Buffer Insertion: Delay ($\tau_0$), Skew ($\tau_{s0}$); After Delay & Skew Minimization: Delay ($\tau$), Skew ($\tau_s$); Delay & Skew Improvement: Impv1, Impv2.
* All delay and skew data are in ps. $\mathrm{Impv1} = \tau_0 / \tau$, $\mathrm{Impv2} = \tau_{s0} / \tau_s$.

In this paper, we derived an optimal buffer placement theory for delay minimization based on the Elmore delay model. We showed that the optimal buffer placement for delay minimization is achieved when all delay functions have the same derivative values. Accordingly, we developed a delay minimization algorithm which gives the minimal clock delay. We further developed a skew minimization scheme to minimize the skew of clock signals. To achieve the minimal skew, we also derived a formula to calculate the optimal size of the inserted buffers. Since our approach works path by path, it can easily be used for the more general case where the skew of each individual sink is specified differently. Such a case will be encountered in ASIC and system-on-silicon designs.

Acknowledgment
The authors wish to thank Haksu Kim for his valuable discussions and for providing us with the test input examples.

candidate subpath.
Step 2. Apply the tree delay minimization algorithm to optimize the buffer positions on the candidate subpath.
Step 3. Apply the buffer sizing algorithm to optimize the sizes of the buffers on the subpath.

7. Results and Conclusions
The presented algorithm was implemented in C on a SPARC 20 workstation. The clock routing is generated using the algorithm in [17], where routing obstacles exist and the routing trees are neither well balanced nor of equal length. Table 3 shows the technology parameters used in our program. We tested a 6 mm x 6 mm chip and a 1 cm x 1 cm chip. In each test case, the routing trees have 50, 100, 300 and 500 sinks, respectively. Table 4 presents the delay and skew results before and after buffer insertion. Different numbers of inserted buffer layers were tested and the optimal number of buffer layers was found. In the table, the delay is measured as the signal propagation time from the source of the clock tree to the sinks, i.e., the summation of the Elmore delays of the subtrees and the buffer delays along a source-to-sink path. (Please note that our delay measurement is different from the slew time used in some publications.) The results show that the proposed method reduces both the delay and the skew by about one order of magnitude. For all of the tested cases the skew is reduced to within 100 ps, which is necessary for running the clock network at multi-GHz rates.

Table 3: Technology parameters.
Sheet Resistance 0.14 Ω/□      | MaxBuffer output R 5 Ω            | MinBuffer output R 100 Ω
Area Capacitance 0.08 fF/µm^2  | MaxBuffer input C 0.01 pF         | MinBuffer input C 0.01 pF
Fringe Capacitance 0.03 fF/µm  | MaxBuffer intrinsic delay 100 ps  | MinBuffer intrinsic delay 30 ps
Wire Width 10 µm               |                                   |
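As a rough worked example, the Table 3 wire parameters can be substituted into the unit-length resistance and capacitance of Eqns. (1) and (2). The 1 cm unbranched, unloaded wire used below is a hypothetical illustration and not one of the paper's test cases; it only indicates the scale of the unbuffered wire delay given by Eqn. (3) with $C(T_v) = 0$.

```latex
\begin{align*}
r_0 &= \frac{r_s}{w_e} = \frac{0.14\ \Omega/\square}{10\ \mu\mathrm{m}} = 0.014\ \Omega/\mu\mathrm{m} \\
c_0 &= c_a w_e + 2 c_f = 0.08 \times 10 + 2 \times 0.03 = 0.86\ \mathrm{fF}/\mu\mathrm{m} \\
\text{for } l_e = 10^4\ \mu\mathrm{m}:\quad
r_e &= r_0 l_e = 140\ \Omega, \qquad c_e = c_0 l_e = 8.6\ \mathrm{pF} \\
\tau(e_v) &= r_e \left( \frac{c_e}{2} + 0 \right) = 140\ \Omega \times 4.3\ \mathrm{pF} = 602\ \mathrm{ps}
\end{align*}
```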

j-th buffer $B_{ij}$, $r_{bj}$ denotes this buffer's output resistance, and $\tau_{skew}$ is the specified skew tolerance. If buffer sizing suffices, the buffer sizing algorithm given below is applied.

6.2 Buffer Sizing Procedure
Suppose there are $m$ candidate buffers in the candidate subpath. Proceeding along this subpath from the top down, the procedure increases the candidate buffers to their maximal size one by one until increasing the size of a buffer leads to a path delay smaller than the minimal delay of the tree. The size of this buffer is then determined by Eqn. (18),
$$ \sum_{j=1}^{n-1} ( r_{bj} - r_{bmin} ) C_j + ( r_{bn} - r'_{bn} ) C_n = \tau_{pi} - \tau_{min} \qquad (18) $$
where $n-1$ ($1 \le n \le m$) is the number of buffers that have been increased to the maximum size $r_{bmin}$, and $r'_{bn}$ is the size of the buffer under consideration (labeled as the $n$-th buffer).

Figure 8: Buffer sizing for skew minimization. (Path $P(s_0, s_q)$ is the optimized path; path $P(s_0, s_1)$ is the minimal delay path $P_{min}$; path $P(s_0, s_i)$ is the path under buffer sizing; buffers $B_{ij}$ to $B_{im}$ can change size for path $P(s_0, s_i)$.)

6.3 Adding Buffer Strategy
When all the buffers in the candidate subpath have been increased to their maximum size and the skew is still not within the specification, additional buffers are added to the subpath. The added buffers are placed below the first buffer (counted from the starting branching point of the subpath). The specific procedure is given in the following Subpath Buffer Adding Strategy.
Step 1. Apply the Layer Defined Initial Buffer Insertion algorithm to add more layer buffers on the

Reduce the delay of path $P_i$ with respect to the minimum delay $\tau_{min}$ by buffer sizing or by adding buffers.
Step 3. Repeat Step 1 and Step 2 until the skew specification is met.

The above algorithm minimizes skew path by path. Paths with smaller delay are optimized before paths with larger delay. Once a path is optimized, all the buffers on that path have their sizes and positions fixed, so that paths optimized later will not worsen the delay of the previously optimized paths. The algorithm starts by sorting the paths into a queue in ascending order of their delays. Each candidate path (except the minimal delay path) is selected sequentially from the queue for delay reduction. To ensure that buffer sizing or adding buffers in the current path does not worsen the delay of the previously optimized paths, the algorithm searches along the current path from the sink upward to the source until it meets the first branching point that lies on a path for which buffer sizing has already been done. The portion of the current path from this branching point to the sink is the candidate subpath. Buffers in the candidate subpath are the candidate buffers for sizing.

For example, in Figure 8, $P(s_0, s_1)$ is the minimal delay path and $P(s_0, s_q)$ is a path for which buffer sizing has been done. Suppose $P(s_0, s_i)$ is the current path under consideration for buffer sizing. Since buffer $B_{11}$ is on the minimal delay path and the size of buffer $B_{i1}$ has been optimized, their sizes can not be changed again. Only buffers $B_{ij}$ to $B_{im}$ on the candidate subpath $P(v_i, s_i)$ can change size for the skew optimization of the current path $P(s_0, s_i)$. Eqn. (17) tests whether the delay of the considered path can be reduced to satisfy the specified skew tolerance by setting all the candidate buffers to the maximum buffer size $r_{bmin}$.
$$ \sum_{j=1}^{m} ( r_{bj} - r_{bmin} ) C_j \ge \tau_{pi} - \tau_{min} - \tau_{skew} \qquad (17) $$
If Eqn. (17) holds, changing the buffer sizes is enough to meet the skew specification. Otherwise, more buffers need to be added to path $P(s_0, s_i)$. In Eqn. (17), we suppose there are $m$ candidate buffers for buffer sizing on path $P(s_0, s_i)$, as shown in Figure 8. Term $C_j$ denotes the lumped capacitance of the subtree rooted at the

Another example is the optimization of path $P(s_0, s_2)$. Buffers $B_{11}$ and $B_{12}$ can not be moved. Buffer $B_{23}$ can only be moved towards branching node $v_2$. Buffer $B_{24}$ can be moved between branching node $v_2$ and sink node $s_2$.

Figure 7: Delay minimization by buffer position optimization. (Tree with source $s_0$, sinks $s_1$, $s_2$, $s_i$, $s_q$, branching nodes $v_1$ to $v_5$, and buffers $B_{11}$, $B_{12}$, $B_{i1}$, $B_{i2}$, $B_{im}$, $B_{23}$, $B_{24}$; $P_{min}$ ends at $s_1$ and $P_i$ ends at $s_i$.)

The clock tree delay algorithm optimizes the tree delay path by path, using the single path delay minimization algorithm (SPDM). Clearly, the delay of the clock tree decreases monotonically owing to the constraints on adjusting buffer positions introduced above. After the tree delay minimization we obtain a buffered tree with minimal delay, called BTD.

6. Skew Minimization Strategy
Skew minimization is achieved by increasing buffer size or adding more buffers (if necessary) to the paths such that the delays on these paths decrease towards the value of the minimal delay. To ensure that such a skew minimization strategy does not increase the minimum delay, we keep the sizes and positions of the buffers on the minimum delay path in BTD unchanged. We also set up several constraints on buffer sizing and buffer adding to guarantee the convergence of the algorithm.

6.1 Skew Minimization Algorithm
Step 1. Sort all the paths in BTD in ascending order of their delay, $\tau_{p1} (= \tau_{min}) < \ldots < \tau_{pi} < \ldots < \tau_{pq}$.
Step 2. For $i = 2$ to $q$: in the current path $P_i$, find the subpath for buffer sizing or buffer adding.
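The sizing decision for one candidate subpath in Section 6 can be sketched as follows. This is a minimal illustration and not the authors' C implementation: the function names, the list-based inputs (buffers ordered top-down along the subpath) and the unit convention (ohms and pF, giving ps) are assumptions. `sizing_suffices` mirrors the test of Eqn. (17); `size_buffers` upsizes buffers one by one and solves the equality of Eqn. (18) for the last buffer touched.

```python
from typing import List, Optional


def sizing_suffices(r_b: List[float], c_sub: List[float], r_bmin: float,
                    tau_path: float, tau_min: float, tau_skew: float) -> bool:
    """Eqn. (17): can upsizing every candidate buffer to r_bmin meet the skew spec?"""
    max_reduction = sum((r - r_bmin) * c for r, c in zip(r_b, c_sub))
    return max_reduction >= tau_path - tau_min - tau_skew


def size_buffers(r_b: List[float], c_sub: List[float], r_bmin: float,
                 tau_path: float, tau_min: float) -> Optional[List[float]]:
    """Upsize buffers top-down; the last one gets the resistance solving Eqn. (18).

    Returns the new output resistances, or None when sizing alone cannot bring
    the path delay down to tau_min (more buffers would have to be added).
    """
    need = tau_path - tau_min            # delay reduction still required
    new_r = list(r_b)
    for n, (r, c) in enumerate(zip(r_b, c_sub)):
        if need <= 0:
            break
        gain = (r - r_bmin) * c          # reduction if buffer n goes to maximum size
        if gain < need:
            new_r[n] = r_bmin            # fully upsized, continue with next buffer
            need -= gain
        else:
            new_r[n] = r - need / c      # partial upsizing, Eqn. (18) solved for r'_bn
            need = 0.0
    return new_r if need <= 1e-12 else None
```

For instance, with two candidate buffers of 50 Ω driving 1 pF and 2 pF and an 80 ps gap to close, the first buffer is taken to `r_bmin` and the second is only partially upsized.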

$\tau_{min}$ denote the minimal delay. Let $p_i$ denote the $i$-th path to be minimized and $\tau_{pi}$ be the delay of path $p_i$. The algorithm is as follows.
Step 1. Sort all paths in the BT (buffered tree) in ascending order of their delay, $\tau_{p1} (= \tau_{min}) < \ldots < \tau_{pi} < \ldots < \tau_{pq}$.
Step 2. For $i = 1$ to $q$: minimize the delay of path $p_i$ using the single path delay minimization algorithm under the buffer moving constraints (discussed in Section 5.2.2).
Step 3. Repeat Step 1 and Step 2 until the minimal delay of the clock tree can no longer be reduced.

5.2.2 Moving Constraints for Monotonic Delay Reduction
Suppose path $p_i$ is the current path for delay minimization. To ensure that the delay decreases monotonically and that the algorithm converges, the following constraints are enforced in the buffer moving process.
A. If a buffer on path $p_i$ is also on path $p_{min}$ (the minimal delay path), it can not be moved.
B. On path $p_i$, from source to sink, the first buffer which is not on path $p_{min}$ can not be moved down along the path.
C. Other buffers on path $p_i$ which are not covered by cases A and B can be moved both up and down, but can not be moved up through branching points and will be split into two buffers when moved down through a branching point.

Figure 7 illustrates the above constraints. In this example, path $P(s_0, s_1)$ is the minimal delay path $P_{min}$. Suppose we optimize path $P(s_0, s_i)$ by moving buffers in this path. According to the buffer moving constraints, buffer $B_{11}$ can not be moved. Buffer $B_{i1}$ can only be moved towards branching node $v_1$, but can not be moved through $v_1$. Buffer $B_{i1}$ can not be moved towards branching node $v_4$, since such a movement would increase the capacitive load rooted at buffer $B_{11}$, which would increase the previously optimized minimal delay. Buffer $B_{i2}$ can be moved either towards $v_4$ or towards $v_5$, but can not be moved through $v_4$, and will be split into two buffers when moving through $v_5$.
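The single path delay minimization step used in Step 2 can be sketched as a simple coordinate search. This is an illustrative sketch under assumptions, not the authors' implementation: buffer positions are modeled as distances from the source (so "up" means a smaller value), `delay` is any caller-supplied path delay function, and the optional `allowed` callback is a hypothetical hook through which moving constraints such as A to C above could be enforced.

```python
from typing import Callable, List, Optional


def minimize_path_delay(positions: List[float],
                        delay: Callable[[List[float]], float],
                        step: float,
                        allowed: Optional[Callable[[int, float], bool]] = None) -> float:
    """Move buffers up or down by `step` while the path delay keeps decreasing.

    `positions` is modified in place; the final path delay is returned.
    """
    best = delay(positions)
    moved = True
    while moved:                              # repeat until a full pass moves nothing
        moved = False
        for i in range(len(positions)):
            for delta in (-step, +step):      # try moving up first, then down
                trial = positions.copy()
                trial[i] += delta
                if allowed is not None and not allowed(i, trial[i]):
                    continue                  # move forbidden by a constraint
                t = delay(trial)
                if t < best:                  # accept only strict improvements
                    positions[i] = trial[i]
                    best = t
                    moved = True
                    break
    return best
```

Because a move is accepted only when it strictly lowers the delay, the path delay never increases from one pass to the next, which is the monotonic behavior the constraints are meant to preserve across paths.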

Table 2: Algorithm of single path delay minimization.
Input: Path P(s_0, s_i) (there are k buffers B_1, B_2, ..., B_k on this path from source s_0 to sink s_i); X: buffer moving step;
Output: Optimized path P(s_0, s_i);
Procedure PathDelayMinimization(P(s_0, s_i), k)
  τ_min ← Calculate path delay;
  move_flag ← 1;
  while (move_flag = 1)
    move_flag ← 0;
    for i ← 1 to k do
      τ ← Calculate path delay, if buffer B_i is moved up X;
      if τ < τ_min then
        Move buffer B_i up X; τ_min ← τ; move_flag ← 1;
      else
        τ ← Calculate path delay, if buffer B_i is moved down X;
        if τ < τ_min then
          Move buffer B_i down X; τ_min ← τ; move_flag ← 1;
        end if;
      end if;
    end for;
  end while;
end Procedure;

5.2 Delay Minimization for Clock Tree
The algorithm for tree delay minimization by moving buffer positions in the tree is based on single path delay minimization. However, tree delay minimization is much more difficult than single path delay minimization, because minimizing the delay of one path affects the performance of other paths, which may already have been minimized. For example, in Figure 5, after we have done the minimization for $P(s_0, s_1)$, the next path to be optimized for delay is $P(s_0, s_2)$. If the minimization of the delay of $P(s_0, s_2)$ needs to move buffer $B_1$ down towards sink $s_2$, this movement will increase the delay of $P(s_0, s_1)$. To cope with this situation, an effective algorithm should be developed to ensure the convergence of the minimization process.

5.2.1 Tree Delay Minimization Algorithm
Suppose there are $q$ paths in the buffered clock tree. Let $p_{min}$ denote the minimal delay path and

applied to ensure the continuity of the delay function. In the following, we analyze this problem using circuit theory. We shall thereafter develop an optimization scheme based on this analysis.

Figure 5. Move buffer B from $p_2$ to $p_1$. (Tree with source $s_0$, sinks $s_1$ and $s_2$, buffers $B$ and $B_1$, branching node $v$, and subtree ST1.)

5.1.2 Buffer Splitting Theory
This subsection discusses how to split a buffer into two smaller buffers when it is moved down through a branching node, so as to maintain the continuity of the delay function. Consider the generic case shown in Figure 6, where buffer $B_j$ is located directly before a branching point $v_m$. If $f_1'(x_1) < f_2'(x_2)$, then according to Theorem 2 the buffer should be moved forward through the branching point to reduce the path delay.

Figure 6. Buffer splitting technique. (Path nodes $v_0, v_1, \ldots, v_{m-1}, v_m, v_{m+1}, \ldots, v_{m+q-1}, v_{m+q}$ with buffers $B_{j-1}$, $B_j$, and the split buffers $B_{j1}$, $B_{j2}$; wire lengths $x_1$ and $x_2$.)

Before buffer splitting, the portion of the delay $\tau(v_0, v_{m+q})$ contributed by the output resistance $r_{b_j}$ and input capacitance $c_{b_j}$ of buffer $B_j$ is given by
$$ \tau_1 = R_p c_{b_j} + r_{b_j} ( C_{p1} + C_{p2} ) \qquad (12) $$
where $R_p$ is the resistance of the path from $v_0$ to $v_m$, and $C_{p1}$ and $C_{p2}$ are the lumped capacitances of the two subtrees rooted at node $v_m$, respectively. After buffer $B_j$ is moved to branches 1 and 2 directly after branching point $v_m$, it is split into two buffers $B_{j1}$ and $B_{j2}$. The delay portion $\tau_2$ contributed by the input capacitance $c_{b_{j2}}$ and output resistance $r_{b_{j2}}$ of buffer $B_{j2}$ in the delay $\tau(v_0, v_{m+q})$ is
$$ \tau_2 = R_p ( c_{b_{j1}} + c_{b_{j2}} ) + r_{b_{j2}} C_{p2} \qquad (13) $$
where $c_{b_{j1}}$ is the input capacitance of buffer $B_{j1}$. To ensure continuity, $\tau_2$ should equal $\tau_1$, which results in the two conditions in Eqn. (14) and Eqn. (15), respectively.
$$ c_{b_j} = c_{b_{j1}} + c_{b_{j2}} \qquad (14) $$
$$ r_{b_j} ( C_{p1} + C_{p2} ) = r_{b_{j2}} C_{p2} \qquad (15) $$
Suppose $C_{p1} = \gamma C_{p2}$. We obtain the size relationship of the two split buffers in Eqn. (16).
$$ W_{b_{j1}} = \frac{\gamma}{1+\gamma} W_{b_j}, \qquad W_{b_{j2}} = \frac{1}{1+\gamma} W_{b_j}, \qquad L_{b_{j1}} = L_{b_{j2}} = L_{b_j} \qquad (16) $$
where $L_{b_j}$, $L_{b_{j1}}$, $L_{b_{j2}}$ denote the transistor gate lengths of buffers $B_j$, $B_{j1}$ and $B_{j2}$, and $W_{b_j}$, $W_{b_{j1}}$, $W_{b_{j2}}$ denote their gate widths.

Theorem 3 (Buffer Splitting Theorem): In a buffered clock tree, when a buffer is moved over a branching point, if it is split into two buffers according to the size requirement given by Eqn. (16), all path delays in the clock tree remain the same.

5.1.3 Algorithm for Single Path Delay Minimization
Using Theorems 2 and 3, we develop an iterative algorithm for single path delay minimization. The algorithm repeatedly selects a buffer along the path from source to sink and tests whether the path delay can be reduced by changing the buffer position. If the delay is reduced, the buffer is moved to the new position. Otherwise, it stays in its original position. When the last buffer in the path has been tested, the algorithm goes back to the first buffer and starts a new iteration, until the delay can not be further reduced. When a buffer is moved over a branching point, the sizes of the split buffers are determined by Eqn. (16). Formally, the single path delay minimization (SPDM) algorithm is given in Table 2.
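The buffer splitting rule can be checked numerically with a short sketch. It rests on assumptions that are not stated in this form in the paper: at a fixed gate length, output resistance is taken to scale as 1/W and input capacitance as W, and the function names are hypothetical. Under those assumptions the delay portion before splitting (Eqn. (12)) equals the portion after splitting (Eqn. (13)) on both branches.

```python
import math
from typing import Tuple


def split_buffer(r_b: float, c_b: float, gamma: float
                 ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Split a buffer (r_b, c_b) for branches 1 and 2, where C_p1 = gamma * C_p2.

    Width fractions follow Eqn. (16); assumes r ~ 1/W and c ~ W (illustrative).
    Returns ((r1, c1), (r2, c2)).
    """
    w1 = gamma / (1.0 + gamma)        # width fraction of the buffer on branch 1
    w2 = 1.0 / (1.0 + gamma)          # width fraction of the buffer on branch 2
    return (r_b / w1, c_b * w1), (r_b / w2, c_b * w2)


def delay_before(R_p: float, r_b: float, c_b: float, C_p1: float, C_p2: float) -> float:
    """Eqn. (12): contribution of the unsplit buffer placed before the branching point."""
    return R_p * c_b + r_b * (C_p1 + C_p2)


def delay_after(R_p: float, r_branch: float, c1: float, c2: float, C_branch: float) -> float:
    """Eqn. (13): contribution seen along one branch after the split."""
    return R_p * (c1 + c2) + r_branch * C_branch
```

With `gamma = 3`, for example, the branch carrying three quarters of the load receives three quarters of the original gate width.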

$$ f_1(x_1) = \tau(v_0, v_m) = \alpha_1 x_1^2 + \beta_1 x_1 + \Gamma_1 \qquad (7) $$
Similarly, the delay $\tau(v_m, v_{m+q})$ from buffer $B_j$ to buffer $B_{j+1}$ can be written as a function of $x_2$ in Eqn. (8).
$$ f_2(x_2) = \tau(v_m, v_{m+q}) = \alpha_2 x_2^2 + \beta_2 x_2 + \Gamma_2 \qquad (8) $$
Note that the parameters $\alpha_1, \alpha_2, \beta_1, \beta_2, \Gamma_1, \Gamma_2$ are constants for the given path. Combining Eqns. (7) and (8), we obtain the delay from $v_0$ to $v_{m+q}$, i.e., $\tau(v_0, v_{m+q})$ in Eqn. (9), which is a quadratic function of the variable $x_1$.
$$ f(x_1) = \tau(v_0, v_{m+q}) = \alpha_1 x_1^2 + \beta_1 x_1 + \Gamma_1 + \alpha_2 (L_j - x_1)^2 + \beta_2 (L_j - x_1) + \Gamma_2 \qquad (9) $$
The minimum of $f(x_1)$ is achieved when $f'(x_1) = 0$, i.e., when the following equation holds.
$$ f_1'(x_1) = f_2'(x_2) \qquad (10) $$
The position of buffer $B_j$ for minimal delay is therefore given by
$$ x_1 = \frac{\beta_2 - \beta_1 + 2 \alpha_2 L_j}{2 ( \alpha_1 + \alpha_2 )} \qquad (11) $$
We have the following optimal buffer position theory for delay minimization.

Theorem 2 (Optimal Buffer Position Theorem): For a single path from source to sink in a buffered clock tree, the delay between two adjacent buffers (or between the source and the first buffer, or between the last buffer and the sink) is a convex function of the variable which relates to the buffer position in the path. The path delay achieves its minimum when the derivatives of all delay functions are the same.

From Theorem 2, one can continuously move a buffer along the path to its optimal position for delay minimization. However, if the optimal position is not between the branching points $v_{m-1}$ and $v_{m+1}$, the buffer may need to be moved over branching points. In such a case, the delay function is no longer a continuous function of the buffer movement. In the example illustrated in Figure 5, $v$ is a branching node of tree $T$; when buffer $B$ is moved down over $v$ from position $p_2$ to $p_1$, the path delay of the subtree rooted at $B$ has a discontinuous jump. If $B$ is at position $p_2$, the capacitance of the subtree ST1 has to be taken into account in calculating the Elmore delay of the subtree rooted at $B$, while if $B$ is at $p_1$, the capacitance of ST1 is not included. We have seen that when a buffer is moved over a branching point, some special treatment should be

5. Delay Minimization by Buffer Position Optimization
Clock delay minimization by buffer placement is challenged by the fact that the buffer position in one path affects the performance of the other paths. We minimize the clock tree delay path by path. To illustrate the complexity of this problem, we start by examining the problem of buffer placement in a single path in a buffered tree. We first derive a theory of the optimal buffer position and then show the need for splitting a buffer when minimizing the path delay. Based on this theory, an iterative algorithm is then proposed to minimize the delay of a buffered clock tree.

5.1 Single Path Delay Minimization
We now investigate how to achieve the optimal position when one buffer is moved between two branching nodes and all other buffers are fixed at their current positions. In particular, we establish a theory for the optimal placement of buffers in a single path for achieving the minimal delay.

5.1.1 Buffer Position Optimization for Delay Minimization

Figure 4. A buffered subpath. (Path from $s_0$ to $s_i$ through nodes $v_0, v_1, \ldots, v_{m-1}, v_m, v_{m+1}, \ldots, v_{m+q-1}, v_{m+q}$ with buffers $B_{j-1}$, $B_j$, $B_{j+1}$ and wire lengths $x_1$, $x_2$.)

Without loss of generality, consider one portion of a path $P(s_0, s_i)$ from source $s_0$ to sink $s_i$ in a buffered clock tree, as shown in Figure 4. Between buffers $B_{j-1}$ and $B_j$ there are $m-1$ ($m \ge 1$) branching points, $v_1, v_2, \ldots, v_{m-1}$. Between buffers $B_j$ and $B_{j+1}$ there are $q-1$ ($q \ge 1$) branching nodes $v_{m+1}, v_{m+2}, \ldots, v_{m+q-1}$, where $v_0$, $v_m$ and $v_{m+q}$ denote the three buffer nodes. Suppose buffer $B_j$ is located between $v_{m-1}$ and $v_{m+1}$. Let $x_1$ denote the wire length of $e(v_{m-1}, v_m)$ and $x_2$ the wire length of $e(v_m, v_{m+1})$. The wire length of $e(v_{m-1}, v_{m+1})$ is a constant $L_j$ in the given routing tree, i.e.,
$$ x_1 + x_2 = L_j \qquad (6) $$
Using the delay model described in Section 2, the delay $\tau(v_0, v_m)$ from buffer $B_{j-1}$ to buffer $B_j$ can be written as a function of $x_1$ in Eqn. (7).
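Because $x_1 + x_2 = L_j$ (Eqn. (6)) and each buffer-to-buffer delay is quadratic in the wire length, the delay-minimizing position has a closed form. The sketch below assumes delay functions of the form $f_1(x_1) = \alpha_1 x_1^2 + \beta_1 x_1 + \Gamma_1$ and $f_2(x_2) = \alpha_2 x_2^2 + \beta_2 x_2 + \Gamma_2$ with positive leading coefficients, as in Eqns. (7) and (8); the coefficient values in any call are made up, and the function names are illustrative.

```python
def total_delay(x1: float, a1: float, b1: float, g1: float,
                a2: float, b2: float, g2: float, L: float) -> float:
    """Sum of the two stage delays with x2 = L - x1 substituted."""
    x2 = L - x1
    return (a1 * x1 ** 2 + b1 * x1 + g1) + (a2 * x2 ** 2 + b2 * x2 + g2)


def optimal_x1(a1: float, b1: float, a2: float, b2: float, L: float) -> float:
    """Stationary point of total_delay, obtained by setting its derivative to zero.

    d/dx1 = 2*a1*x1 + b1 - 2*a2*(L - x1) - b2 = 0. The result is unconstrained:
    a value outside [0, L] means the buffer would have to cross a branching point.
    """
    return (b2 - b1 + 2.0 * a2 * L) / (2.0 * (a1 + a2))
```

At the returned position the two stage derivatives are equal, which is the equal-derivative condition for the minimum.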

4. Initial Buffer Insertion
The initial buffer insertion procedure was presented in our previous paper [18]. In the following we briefly describe it to keep this paper self-contained. The procedure is a top-down (from source to sinks), level-by-level method that inserts k levels of buffers in each source-to-sink path. All buffers are given the same medium size. Table 1 shows the pseudocode of the initial buffer insertion algorithm.

Table 1: Algorithm of initial buffer insertion.
Initial Buffer Insertion Algorithm
Input: An unbuffered tree T_0 and buffer layer k.
Output: Buffered tree BT.
Procedure InitialInsertion(T, k)
  for i ← 0 to k-1 do
    BuffSetNew ← NULL;
    if i = 0 then
      T_p ← T; InsertBuffer();
    else
      for each buffer B ∈ BuffSetOld do
        T_p ← subtree rooted at B; InsertBuffer();
      end for;
    end if;
    BuffSetOld ← BuffSetNew;
  end for;
end Procedure;

Procedure InsertBuffer()
  repeat
    In T_p, find the path P(u,v) with the shortest path length l_min, excluding paths which already contain a buffer;
    Insert a buffer B_new into P(u,v) at l_min / (k + 1 - i);
    Add B_new to the set BuffSetNew;
  until there is one buffer inserted into every path of T_p;
end Procedure;

The following theorem was proved in [18] for this algorithm.
Theorem 1: The proposed level-by-level initial buffer insertion procedure inserts the same number of buffers in each source-to-sink path.
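To give a feel for the level-by-level placement, the sketch below applies the insertion rule to a single unbranched path. It is a simplification under stated assumptions: the divisor is read as `k + 1 - i` for level `i` (an interpretation of the insertion position in Table 1), the tree case of choosing the shortest unbuffered path in each subtree is not modeled, and the function name is hypothetical.

```python
from typing import List


def initial_buffer_positions(path_length: float, k: int) -> List[float]:
    """Distances from the source of k buffers inserted level by level on one path.

    At level i the remaining length below the previous buffer is divided by
    (k + 1 - i), which yields evenly spaced buffers along the path.
    """
    positions: List[float] = []
    start = 0.0                                # root of the current subtree (source or last buffer)
    for i in range(k):
        remaining = path_length - start        # length of the path below `start`
        start += remaining / (k + 1 - i)       # place the level-i buffer
        positions.append(start)
    return positions
```

On a path of length 6 with k = 2, this places the buffers at one third and two thirds of the path.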

Figure 3. Clock delay and skew minimization approach. (Flow: input $T_0$, $k$; initial buffer insertion, producing BT; delay minimization by buffer position optimization, producing BTD; skew minimization by adding buffers and changing buffer size; output BTDS.)

1) Initial Buffer Insertion
The initial buffer insertion algorithm inserts k levels of buffers in each source-to-sink path for a given unbuffered clock tree $T_0$. The algorithm generates a buffered clock tree called BT.

2) Delay Minimization
This stage minimizes the clock delay for an arbitrarily given buffered tree, without adding more buffers or increasing buffer sizes. We develop theories for delay minimization by moving buffer positions in the clock tree. The objective of the delay minimization algorithm is to generate a new buffered tree with the minimal delay, called BTD.

3) Skew Minimization
In this stage, the sizes and positions of the buffers in the minimum delay path are kept unchanged. We use buffer sizing and, if necessary, add more buffers to minimize skew. During this procedure, we keep the minimum delay unchanged. The result of this stage is a tree with minimal delay and skew, called BTDS.

In the following sections, we present the details of these three procedures.

Eqn. (4) is the well-known Elmore delay [4].
$$ \tau(v_0, v_n) = \sum_{i=1}^{n} \tau(e_{v_i}) \qquad (4) $$
The buffer delay $\tau_b$ is given by the following equation,
$$ \tau_b = d_b + r_b c_b \qquad (5) $$
where $d_b$, $r_b$ and $c_b$ are buffer $b$'s intrinsic delay, output resistance and load capacitance, respectively.

3. Delay and Skew Minimization by Buffer Insertion
In this section, we first formulate the buffer insertion optimization problem, and then present the three-stage approach to achieve both delay and skew minimization.

Buffer Insertion Problem: Given a routed clock tree $T_0$ and a buffer size range from the maximum value $r_{bmax}$ to the minimum value $r_{bmin}$, determine the number of inserted buffers, their positions and their sizes such that clock delay and skew are minimized.

In the above definition, we use the buffer output resistance $r_b$ as the measure of buffer size; a larger buffer has a smaller output resistance, so $r_{bmin}$ corresponds to the maximum buffer size. Since the buffer structure is a chain of cascaded inverters [13], when the buffer size is changed its input capacitance remains the same; only its delay and output resistance change. We assume the buffer size is limited to the range from $r_{bmax}$ to $r_{bmin}$, which reflects the constraints of real design settings.

To solve the buffer insertion problem defined above, one strategy is to fix the number of buffers first, say k buffers in each path, and then to find the corresponding buffer positions and sizes. We call this problem a k-level buffer insertion problem. We only need to consider this k-level buffer insertion problem, since we can change the value of k from 1 to a desired number and repeat the same procedure to find the optimal value of k. Since the number of buffers is bounded above by a very small number in practical applications, we do not need to run this procedure many times. In the following, we assume k is a fixed value.

Figure 3 sketches our proposed three-stage buffer insertion approach for the k-level buffer insertion problem, which consists of initial buffer insertion, delay minimization and skew minimization.
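The outer search over k around the three stages can be sketched as below. Everything here is a hypothetical scaffold: the three stage functions and the cost function are passed in as callables because their implementations are described in later sections, and how delay and skew are combined into a single cost is an assumption left to the caller.

```python
from typing import Any, Callable, Optional, Tuple


def best_buffer_level(tree: Any, k_max: int,
                      insert: Callable[[Any, int], Any],
                      minimize_delay: Callable[[Any], Any],
                      minimize_skew: Callable[[Any], Any],
                      cost: Callable[[Any], float]) -> Optional[Tuple[float, int, Any]]:
    """Run the three stages for k = 1..k_max and keep the cheapest result.

    Returns (cost, k, buffered_tree) for the best k, or None if k_max < 1.
    """
    best: Optional[Tuple[float, int, Any]] = None
    for k in range(1, k_max + 1):
        bt = insert(tree, k)             # stage 1: initial insertion, gives BT
        btd = minimize_delay(bt)         # stage 2: buffer position optimization, gives BTD
        btds = minimize_skew(btd)        # stage 3: sizing and adding buffers, gives BTDS
        c = cost(btds)
        if best is None or c < best[0]:
            best = (c, k, btds)
    return best
```

Since k is bounded by a small number in practice, the loop runs only a handful of times.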

tree $T(V,E)$, which consists of wire set $E$ and node set $V = \{\{s_0\} \cup SI \cup IN\}$, where $s_0$ is the unique source node, $IN$ is the set of branching nodes (or internal nodes) and $SI$ is the set of sink nodes.

Figure 1. A clock tree before and after buffer insertion. (a) Unbuffered routing tree; (b) Buffered routing tree.

Definition 2 (Buffered routing tree): After buffers have been inserted into the routing tree, the tree is called a buffered routing tree BT (Figure 1(b)), where the inserted buffers become new nodes, called buffer nodes, in the tree. The node set $V$ of BT is now $\{\{s_0\} \cup SI \cup IN \cup BN\}$, where $BN$ is the set of buffer nodes.

Definition 3 (Wire): A wire $e_v$ ($e_v \in E$) is a directed edge $(u,v)$ ($u, v \in V$), where the signal propagates from $u$ to $v$. Node $u$ is called the parent of node $v$, and node $v$ is called the child of node $u$.

Definition 4 (Path): A path $P(v_0, v_n)$ from node $v_0$ to node $v_n$ in clock tree $T$ is a sequence of nodes $v_0, v_1, v_2, \ldots, v_n \in V$ and wires $e_{v_1}, e_{v_2}, \ldots, e_{v_n} \in E$ such that $e_{v_i}$ links $v_{i-1}$ and $v_i$, and $e_{v_i} \ne e_{v_j}$ for all $i, j$ with $i \ne j$. Its length is denoted by $P_l(v_0, v_n)$.

Definition 5 (Buffer layer): In a clock tree $T$, if k buffers are inserted into a source-to-sink path $P(s_0, s_i)$, we say that there are k layers (k levels) of buffers on path $P(s_0, s_i)$. From source $s_0$ to sink $s_i$, the j-th ($j = 1, 2, \ldots, k$) buffer is called the j-th layer (level) buffer.

2.2 Delay Model
The delay model used in this paper is given in Figure 2. A wire in a clock tree is modeled by a distributed RC line (Figure 2(a)), and the buffer is modeled by a standard RC network (Figure 2(c,d)). The distributed RC line is further modeled by the equivalent π-model [6] as shown in Figure 2(b) (assuming it is properly segmented), where the resistance $r_e$ and capacitance $c_e$ of wire $e$ are given by Eqns. (1) and (2), respectively.

Figure 2: Delay model. (a) A distributed RC line; (b) The equivalent π-model, with resistance $r_e$ and capacitances $c_e/2$, $c_e/2$; (c) A clock buffer; (d) Buffer model, with $d_b$, $r_b$, $c_b$.

$$ r_e(l_e) = \frac{r_s l_e}{w_e} = r_0 l_e \qquad (1) $$
$$ c_e(l_e) = c_a l_e w_e + 2 c_f ( l_e + w_e ) \approx ( c_a w_e + 2 c_f ) l_e = c_0 l_e \qquad (2) $$
where $l_e \gg w_e$, and $l_e$ and $w_e$ are the length and width of wire $e$, respectively. The constant parameters $r_s$, $c_a$ and $c_f$ are the sheet resistance, unit-area capacitance and fringe capacitance for a unit-width, unit-length wire segment, respectively. From the above definitions, $r_0$ denotes the unit-length wire resistance and $c_0$ denotes the unit-length wire capacitance. To calculate the delay of the wire $e_v$ which enters node $v$ from its parent node, the approximation in Eqn. (3) is used, where $r_{e_v}$ and $c_{e_v}$ are the resistance and capacitance of wire $e_v$, and $C(T_v)$ is the lumped capacitance of the subtree $T_v$ rooted at node $v$.
$$ \tau(e_v) = r_{e_v} \left( \frac{c_{e_v}}{2} + C(T_v) \right) \qquad (3) $$
For a path $P(v_0, v_n)$ consisting of $n$ wires $e_{v_1}, e_{v_2}, \ldots, e_{v_n}$, the path delay of $P(v_0, v_n)$ is given by Eqn. (4).
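The wire part of the delay model in Section 2.2 can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the path is restricted to an unbranched, unbuffered chain of wires, and units are assumed to be µm, Ω and fF (so delays come out in Ω·fF, i.e. 10^-3 ps).

```python
from typing import List, Tuple


def wire_rc(length_um: float, width_um: float,
            r_s: float, c_a: float, c_f: float) -> Tuple[float, float]:
    """Lumped resistance and capacitance of a wire much longer than it is wide."""
    r = r_s * length_um / width_um                   # r_e = r_s * l_e / w_e
    c = (c_a * width_um + 2.0 * c_f) * length_um     # c_e ~ (c_a * w_e + 2 * c_f) * l_e
    return r, c


def wire_delay(r_e: float, c_e: float, c_subtree: float) -> float:
    """Pi-model delay of one wire driving a lumped downstream capacitance."""
    return r_e * (c_e / 2.0 + c_subtree)


def path_delay(wires: List[Tuple[float, float]], sink_cap: float) -> float:
    """Sum of wire delays along a chain of (r, c) wires listed from source to sink."""
    total = 0.0
    downstream = sink_cap                # capacitance below the wire being processed
    for r, c in reversed(wires):         # walk from the sink back to the source
        total += wire_delay(r, c, downstream)
        downstream += c                  # this wire now loads the wires above it
    return total
```

One property worth noting is that cutting a uniform wire into two segments leaves the chain delay unchanged, which is consistent with the text's remark that the π-model assumes proper segmentation.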

as shown by our optimal buffer position theory in this paper. Algorithms proposed in [9,10,11] insert the same number of buffers in each source-to-sink path. They further assume that the buffers at the same level have the same size. This buffer insertion strategy helps to reduce skew sensitivity to process variations [12], but it requires an almost balanced routing tree topology. Otherwise, it can not effectively reduce the skew to a specified tolerance. The balanced buffer insertion scheme [3] attempts to partition the clock tree into several subtrees so that every subtree has equal path length and all source-to-sink paths have an equal number of levels. This scheme requires an equal-path-length routing tree. In many practical design situations, it is impossible to construct an equal-path-length tree or a well-balanced tree in a routing plane if there exist pre-placed cell modules [17]. Furthermore, an equal-path-length clock tree does not guarantee the minimal delay and skew, because circuit delay depends not only on the path length but also on the circuit topology [18].

This paper aims to solve the optimal buffer insertion problem from both the theoretical and the practical design point of view. In contrast to previous research, we aim to establish our buffer insertion theories, methodologies and algorithms based on real circuit design settings, where the clock routing tree is neither well balanced nor equal-length.

The rest of the paper is organized as follows. Section 2 introduces the terminology and the delay model used throughout this paper. Section 3 presents the overall approach for delay and skew minimization by buffer insertion. Section 4 presents the initial buffer insertion strategy. Section 5 proposes the theories and algorithms for delay minimization by buffer position optimization. Section 6 develops the strategy for skew minimization. Section 7 shows the experimental results and concludes the paper.

2. Definitions and Delay Model
We start by giving a synopsis of the basic definitions which will be used throughout this paper.

2.1 Definition
A tree is denoted by $T(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges.

Definition 1 (Unbuffered routing tree): An unbuffered routing tree (Figure 1(a)) is a directed binary

Buffer Insertion for Clock Delay and Skew Minimization *

X. Zeng (1, 2), D. Zhou (1), Wei Li (1)
(1) Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, NC
(2) Department of Electronic Engineering, Fudan University, Shanghai, China

Abstract
Buffer insertion is an effective approach to achieve both minimal clock signal delay and skew in high speed VLSI circuit design. In this paper, we develop an optimal buffer insertion and sizing scheme. In particular, we show that buffer-to-buffer delay is a convex function of buffer positions in a clock tree. The minimal clock delay can be obtained by equalizing the derivatives of this convex function, and the minimal skew can be obtained by equalizing the delay functions of different source-to-sink paths. Based on this theory, we further develop a three-stage method to initially insert buffers in a given clock tree, minimize delay by optimizing buffer positions, and minimize skew by buffer level augmentation and buffer size refinement. The presented algorithm achieves both minimal delay and skew in real clock tree design.

1. Introduction
In the coming century, the clock rate of microprocessors will approach multi-GHz. This trend will last as process technology moves into the deep submicron level. The drastically increased requirement for high performance and high speed VLSI circuits has posed challenges in the design of high speed clock networks, where clock delay and skew minimization has been a critical problem. Clock delay and skew can be optimized by a good routing strategy [14,17] and effective buffer insertion [1,2]. Early research [14-16] proposed algorithms that minimize skew by properly balancing the lengths of the paths from the source to the sinks. To further reduce the clock signal delay and transmission line effects, buffer insertion is necessary. The objective of buffer insertion is to find the proper number, sizes and placement of buffers in a given clock tree. The fixed-position algorithms proposed in [6,7,8] insert buffers at branching points in a clock tree. However, the optimal buffer positions for minimal delay do not necessarily reside on the branching points,

* This research is supported in part by AirForce Office of Scientific Research grant F

Title: Buffer Insertion for Clock Delay and Skew Minimization
Contact Author: Name: Dian Zhou. Address: Electrical and Computer Engineering Dept., University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC. Telephone: (704). E-mail: zhou@uncc.edu
