

Journal of High Speed Electronics and Systems, pp. 65-81.
[14] M. A. B. Jackson, A. Srinivasan and E. S. Kuh, Clock routing for high-performance ICs, 27th ACM/IEEE Design Automation Conference.
[15] A. Kahng, J. Cong and G. Robins, High-performance clock routing based on recursive geometric matching, 28th ACM/IEEE Design Automation Conference.
[16] N. A. Sherwani and B. Wu, Clock layout for high-performance ASIC based on weighted center algorithm, Proc. Fourth Annual IEEE Int. ASIC Conf. & Exhibit, Rochester, NY, September.
[17] H. Kim and D. Zhou, Efficient Implementation of a Planar Clock Routing with the Treatment of Obstacles, submitted to IEEE Transactions on Computer-Aided Design, 1998.
[18] D. Zhou, W. Li and X. Zeng, An Effective Buffer Insertion Algorithm for High-Speed Clock Network, submitted to IEEE Design Automation Conference.

References

[1] B. Wu and N. Sherwani, Effective Buffer Insertion of Clock Tree for High-Speed VLSI Circuits, Microelectronics Journal, 23, July.
[2] J. Lillis, C.K. Chen and T.T. Lin, Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model, Proc. IEEE Int. Conf. on Computer-Aided Design, 1995.
[3] Joe G. Xi and Wayne W. M. Dai, Buffer Insertion and Sizing Under Process Variations for Low Power Clock Distribution, ACM/IEEE DAC, 1995.
[4] W.C. Elmore, The Transient Response of Damped Linear Network with Particular Regard to Wideband Amplifier, J. Applied Physics, 19.
[5] J. Rubinstein, P. Penfield and M.A. Horowitz, Signal delay in RC tree networks, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 2, No. 3, 1983.
[6] S. Pullela, N. Menezes, J. Omar and L. Pillage, Skew and Delay Optimization for Reliable Buffered Clock Trees, Proc. IEEE Int. Conf. on Computer-Aided Design, 1993.
[7] J. Chung and C.K. Cheng, Skew Sensitivity Minimization of Buffered Clock Tree, Proc. Int. Conf. on Computer-Aided Design.
[8] L.P.P.P. van Ginneken, Buffer placement in distributed RC-tree networks for minimal Elmore delay, International Symposium on Circuits and Systems.
[9] Y. P. Chen and D.F. Wong, An algorithm for zero-skew clock tree routing with buffer insertion, Proc. European on Computer-Aided Design.
[10] J.D. Cho and M. Sarrafzadeh, A buffer distribution algorithm for high-speed clock routing, Proc. ACM/IEEE Design Automation Conf., 1993.
[11] N. Sherwani, Algorithms for VLSI Physical Design Automation, second edition, Kluwer Academic Publishers.
[12] Jason Cong, Lei He, Cheng-Kok Koh and Patrick H. Madden, Performance optimization of VLSI interconnect layout, INTEGRATION, the VLSI Journal, 21:1-94.
[13] D. Zhou and X.Y. Liu, On the Optimal Drivers of High-Speed Low Power ICs, in International

Table 4: Results before and after buffer insertion.
Columns: Chip; Sink#; Longest Path Length (cm); Initially Inserted Buffer Layer; Before Buffer Insertion: Delay ($\tau_0$), Skew ($\tau_{s0}$); After Delay & Skew Minimization: Delay ($\tau$), Skew ($\tau_s$); Delay & Skew Improvement: Impv1, Impv2.
All delay and skew data are in ps. $\mathrm{Impv1} = \tau_0 / \tau$, $\mathrm{Impv2} = \tau_{s0} / \tau_s$.
(The table values are not preserved in this transcription.)

In this paper, we derived an optimal buffer placement theory for delay minimization based on the Elmore delay model. We showed that the optimal buffer placement for delay minimization is achieved when all delay functions have the same derivative values. Accordingly, we developed a delay minimization algorithm which gives the minimal clock delay. We further developed a skew minimization scheme to minimize the skew of clock signals. To achieve the minimal skew, we also derived a formula to calculate the optimal size of the inserted buffers. Since our approach works in a path by path fashion, it can easily be used for the more general case where the skew of each individual sink is specified differently. Such a case will be encountered in ASIC and system-on-silicon designs.

Acknowledgment

The authors wish to thank Haksu Kim for his valuable discussions and for providing us with the test input examples.

candidate subpath.
Step 2. Apply the tree delay minimization algorithm to optimize the buffer positions on the candidate subpath.
Step 3. Apply the buffer sizing algorithm to optimize the sizes of the buffers on the subpath.

7. Results and Conclusions

The presented algorithm was implemented in C on a SPARC 20 workstation. The clock routing is generated using the algorithm in [17], where routing obstacles exist and the routing trees are neither well balanced nor of equal length. Table 3 shows the technology parameters used in our program. We tested a 6 mm x 6 mm chip and a 1 cm x 1 cm chip. For each chip, routing trees with 50, 100, 300 and 500 sinks were tested. Table 4 presents the delay and skew results before and after buffer insertion. Different numbers of inserted buffer layers were tested and the optimal number of buffer layers was found. In the table, the delay is measured as the signal propagation time from the source of the clock tree to the sinks, i.e., the sum of the Elmore delays of the subtrees and the buffer delays along a source-to-sink path. (Please note that our delay measurement is different from the slew time used in some publications.) The results show that the proposed method reduces both the delay and the skew by about one order of magnitude. For all of the tested cases the skew is reduced to within 100 ps, which is necessary for running the clock network at multi-GHz rates.

Table 3: Technology parameters.
Sheet Resistance     0.14 Ω/□       MaxBuffer output R          5 Ω       MinBuffer output R          100 Ω
Area Capacitance     0.08 fF/µm²    MaxBuffer input C           0.01 pF   MinBuffer input C           0.01 pF
Fringe Capacitance   0.03 fF/µm     MaxBuffer intrinsic delay   100 ps    MinBuffer intrinsic delay   30 ps
Wire Width           10 µm
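As a worked illustration of the buffer delay model of Eqn. (5), $\tau_b = d_b + r_b c_b$, the following sketch plugs in the MaxBuffer and MinBuffer values of Table 3. The function name `buffer_delay` and the 1 pF example load are assumptions made only for this illustration; with resistance in Ω and capacitance in pF the product is in ps.

```python
# Sketch of Eqn. (5) with the Table 3 buffer parameters.
# `buffer_delay` and the 1 pF load are illustrative assumptions.

def buffer_delay(d_b, r_b, c_load):
    """Buffer delay in ps: intrinsic delay (ps) + output R (ohm) * load C (pF)."""
    return d_b + r_b * c_load

MAX_BUFFER = {"d_b": 100.0, "r_b": 5.0}    # largest buffer (Table 3)
MIN_BUFFER = {"d_b": 30.0, "r_b": 100.0}   # smallest buffer (Table 3)

load_pf = 1.0  # assumed example load
tau_max = buffer_delay(MAX_BUFFER["d_b"], MAX_BUFFER["r_b"], load_pf)
tau_min = buffer_delay(MIN_BUFFER["d_b"], MIN_BUFFER["r_b"], load_pf)

# Load at which both buffers give the same delay:
# d_max + r_max * c = d_min + r_min * c
crossover_pf = (MAX_BUFFER["d_b"] - MIN_BUFFER["d_b"]) / (
    MIN_BUFFER["r_b"] - MAX_BUFFER["r_b"]
)

print(tau_max, tau_min, crossover_pf)
```

Under these numbers the largest buffer only pays off once the driven capacitance exceeds roughly 0.74 pF, which is one way to see why buffer size has to be chosen per path rather than globally.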

$j$-th buffer $B_{ij}$, $r_{b_j}$ denotes this buffer's output resistance, and $\tau_{skew}$ is the specified skew tolerance. If buffer sizing suffices, the buffer sizing procedure given below is applied.

6.2 Buffer Sizing Procedure

Suppose there are $m$ candidate buffers in the candidate subpath. Proceeding along this subpath from the top down, the procedure increases the candidate buffers to their maximal size one by one, until enlarging a buffer would lead to a path delay smaller than the minimal delay of the tree. The size of this buffer is then determined by Eqn. (18),

$$\sum_{j=1}^{n-1} (r_{b_j} - r_{bmin})\, C_j + (r_{b_n} - r'_{b_n})\, C_n = \tau_{p_i} - \tau_{min} \qquad (18)$$

where $n-1$ ($1 \le n \le m$) is the number of buffers that have been increased to the maximum size $r_{bmin}$, and $r'_{b_n}$ is the size of the buffer under consideration (labeled as the $n$-th buffer).

Figure 8: Buffer sizing for skew minimization. Path $P(s_0, s_q)$ is the optimized path. Path $P(s_0, s_1)$ is the minimal delay path $P_{min}$. Path $P(s_0, s_i)$ is the path under buffer sizing. Buffers $B_{ij}$ to $B_{im}$ can change size for path $P(s_0, s_i)$.

6.3 Adding Buffer Strategy

When all the buffers in the candidate subpath have been increased to their maximum size and the skew is still not within the specification, additional buffers are added to the subpath. The added buffers are placed below the first buffer (counted from the starting branching point of the subpath). The specific procedure is given in the following Subpath Buffer Adding Strategy.

Step 1. Apply the Layer Defined Initial Buffer Insertion algorithm to add more layers of buffers on the

Reduce the delay of path $P_i$ with respect to the minimum delay $\tau_{min}$ by buffer sizing or by adding buffers.
Step 3. Repeat Step 1 and Step 2 until the skew specification is met.

The above algorithm minimizes skew path by path. Paths with smaller delay are optimized before paths with larger delay. Once a path is optimized, all the buffers on that path have their sizes and positions fixed, so that paths optimized later cannot worsen the delay of the previously optimized paths.

The algorithm starts by sorting the paths into a queue in ascending order of their delays. Candidate paths (except the minimal delay path) are selected sequentially from the queue for delay reduction. To ensure that buffer sizing or adding buffers in the current path does not worsen the delay of the previously optimized paths, the algorithm searches along the current path from the sink upward to the source until it meets the first branching point that lies on a path whose buffer sizing has already been done. The portion of the current path from this branching point to the sink is the candidate subpath. Buffers in the candidate subpath are the candidate buffers for sizing.

For example, in Figure 8, $P(s_0, s_1)$ is the minimal delay path and $P(s_0, s_q)$ is a path on which buffer sizing has been done. Suppose $P(s_0, s_i)$ is the current path under consideration for buffer sizing. Since buffer $B_{11}$ is on the minimal delay path and the size of buffer $B_{i1}$ has already been optimized, their sizes cannot be changed again. Only buffers $B_{ij}$ to $B_{im}$ on the candidate subpath $P(v_i, s_i)$ can change size for the skew optimization of the current path $P(s_0, s_i)$.

Eqn. (17) tests whether the delay of the considered path can be reduced to satisfy the specified skew tolerance by setting all the candidate buffers to the maximum buffer size $r_{bmin}$ (the largest buffer has the smallest output resistance).

$$\sum_{j=1}^{m} (r_{b_j} - r_{bmin})\, C_j \ge \tau_{p_i} - \tau_{min} - \tau_{skew} \qquad (17)$$

If Eqn. (17) holds, changing buffer sizes is enough to meet the skew specification. Otherwise, more buffers need to be added to path $P(s_0, s_i)$. In Eqn. (17), we suppose there are $m$ candidate buffers for buffer sizing on path $P(s_0, s_i)$, as shown in Figure 8. Term $C_j$ denotes the lumped capacitance of the subtree rooted at the

Another example is the optimization of path $P(s_0, s_2)$. Buffers $B_{11}$ and $B_{12}$ cannot be moved. Buffer $B_{23}$ can only be moved towards branching node $v_2$. Buffer $B_{24}$ can be moved between branching node $v_2$ and sink node $s_2$.

Figure 7: Delay minimization by buffer position optimization.

The clock tree delay algorithm optimizes the tree delay in a path by path fashion, using the single path delay minimization algorithm (SPDM). The delay of the clock tree decreases monotonically because of the constraints on adjusting buffer positions introduced above. After the tree delay minimization we obtain a buffered tree with minimal delay, called BTD.

6. Skew Minimization Strategy

Skew minimization is achieved by increasing buffer sizes or adding more buffers (if necessary) to the paths, such that the delays on these paths decrease towards the value of the minimal delay. To ensure that such a skew minimization strategy does not increase the minimum delay, we keep the sizes and positions of the buffers on the minimum delay path in BTD unchanged. We also set up several constraints on buffer sizing and buffer adding to guarantee the convergence of the algorithm.

6.1 Skew Minimization Algorithm

Step 1. Sort all the paths in BTD in ascending order of their delay, $\tau_{p_1} (= \tau_{min}) < \dots < \tau_{p_i} < \dots < \tau_{p_q}$.
Step 2. For $i = 2$ to $q$: in the current path $P_i$, find the subpath for buffer sizing or buffer adding.

$\tau_{min}$ denote the minimal delay. Let $p_i$ denote the $i$-th path to be minimized and $\tau_{p_i}$ be the delay of path $p_i$. The algorithm is as follows.

Step 1. Sort all paths in the BT (buffered tree) in ascending order of their delay, $\tau_{p_1}$ ($= \tau_{min}$, on $p_{min}$) $< \dots < \tau_{p_i} < \dots < \tau_{p_q}$.
Step 2. For $i = 1$ to $q$: minimize the delay of path $p_i$ using the single path delay minimization algorithm under the buffer moving constraints (discussed in Section 5.2.2).
Step 3. Repeat Step 1 and Step 2 until the minimal delay of the clock tree can no longer be reduced.

5.2.2 Moving Constraints for Monotonous Delay Reduction

Suppose path $p_i$ is the current path for delay minimization. To ensure that the delay decreases monotonically and that the algorithm converges, the following constraints are enforced in the buffer moving process.

A. If a buffer on path $p_i$ is also on path $p_{min}$ (the minimal delay path), it cannot be moved.
B. On path $p_i$, from source to sink, the first buffer which is not on path $p_{min}$ cannot be moved down along the path.
C. Other buffers on path $p_i$ which are not in case A or B can be moved both up and down, but cannot be moved up through branching points, and are split into two buffers when moved down through a branching point.

Figure 7 illustrates the above constraints. In this example, path $P(s_0, s_1)$ is the minimal delay path $P_{min}$. Suppose we optimize path $P(s_0, s_i)$ by moving buffers in this path. According to the buffer moving constraints, buffer $B_{11}$ cannot be moved. Buffer $B_{i1}$ can only be moved towards branching node $v_1$, but cannot be moved through $v_1$. Buffer $B_{i1}$ cannot be moved towards branching node $v_4$, since such a movement would increase the capacitive load seen by buffer $B_{11}$, which would increase the previously optimized minimal delay. Buffer $B_{i2}$ can be moved either towards $v_4$ or towards $v_5$, but cannot be moved through $v_4$, and will be split into two buffers when moving through $v_5$.
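The three moving constraints can be read as a small decision rule. The sketch below encodes only the up/down part of constraints A to C; the function name `allowed_moves` and its two boolean flags are hypothetical, and the branching-point rules of constraint C (no upward move through a branching point, splitting on the way down) are deliberately left out.

```python
# Hypothetical encoding of moving constraints A-C (directions only).
# "up" means towards the source, "down" means towards the sink.

def allowed_moves(on_min_path, first_off_min_path):
    """Return the set of directions in which a buffer may be moved."""
    if on_min_path:            # Constraint A: buffers on p_min are frozen
        return set()
    if first_off_min_path:     # Constraint B: first buffer off p_min may not move down
        return {"up"}
    return {"up", "down"}      # Constraint C: all remaining buffers

# The buffers of path P(s0, si) in the Figure 7 example.
figure7 = {
    "B11": allowed_moves(on_min_path=True, first_off_min_path=False),
    "Bi1": allowed_moves(on_min_path=False, first_off_min_path=True),
    "Bi2": allowed_moves(on_min_path=False, first_off_min_path=False),
}
print(figure7)
```

The result matches the text: $B_{11}$ is fixed, $B_{i1}$ may only move up towards $v_1$, and $B_{i2}$ may move in both directions.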

Table 2: Algorithm of single path delay minimization.

Input: Path $P(s_0, s_i)$ (there are $k$ buffers $B_1, B_2, \dots, B_k$ on this path from source $s_0$ to sink $s_i$); $\Delta X$: buffer moving step.
Output: Optimized path $P(s_0, s_i)$.

Procedure PathDelayMinimization(P(s_0, s_i), k)
    τ_min ← Calculate path delay;
    move_flag ← 1;
    while (move_flag = 1)
        move_flag ← 0;
        for i ← 1 to k do
            τ ← Calculate path delay, if buffer B_i is moved up ΔX;
            if τ < τ_min then
                Move buffer B_i up ΔX; τ_min ← τ; move_flag ← 1;
            else
                τ ← Calculate path delay, if buffer B_i is moved down ΔX;
                if τ < τ_min then
                    Move buffer B_i down ΔX; τ_min ← τ; move_flag ← 1;
                end if;
            end if;
        end for;
    end while;
end Procedure;

5.2 Delay Minimization for Clock Tree

The algorithm for tree delay minimization by moving buffer positions in the tree is based on single path delay minimization. However, tree delay minimization is much more difficult than single path delay minimization, because minimizing the delay of one path affects the performance of other paths which may already have been minimized. For example, in Figure 5, after we have minimized $P(s_0, s_1)$, the next path to be optimized for delay is $P(s_0, s_2)$. If minimizing the delay of $P(s_0, s_2)$ requires moving buffer $B_1$ down towards sink $s_2$, this movement will increase the delay of $P(s_0, s_1)$. To cope with this situation, an effective algorithm is needed to ensure the convergence of the minimization process.

5.2.1 Tree Delay Minimization Algorithm

Suppose there are $q$ paths in the buffered clock tree. Let $p_{min}$ denote the minimal delay path and

tance $c_{b_j}^{2}$ and output resistance $r_{b_j}^{2}$ of buffer $B_{j2}$ in delay $\tau(v_0, v_{m+q})$ is

$$\tau_2 = R_p \left( c_{b_j}^{1} + c_{b_j}^{2} \right) + r_{b_j}^{2} C_{p2} \qquad (13)$$

where $c_{b_j}^{1}$ is the input capacitance of buffer $B_{j1}$. To ensure continuity, $\tau_2$ should equal $\tau_1$, which results in the following two conditions, Eqn. (14) and Eqn. (15), respectively.

$$c_{b_j} = c_{b_j}^{1} + c_{b_j}^{2} \qquad (14)$$

$$r_{b_j} \left( C_{p1} + C_{p2} \right) = r_{b_j}^{2} C_{p2} \qquad (15)$$

Suppose $C_{p1} = \gamma C_{p2}$. We obtain the size relationship of the two split buffers in Eqn. (16).

$$W_{b_j}^{1} = \frac{\gamma}{1+\gamma} W_{b_j}, \qquad W_{b_j}^{2} = \frac{1}{1+\gamma} W_{b_j}, \qquad L_{b_j}^{1} = L_{b_j}^{2} = L_{b_j} \qquad (16)$$

where $L_{b_j}$, $L_{b_j}^{1}$, $L_{b_j}^{2}$ denote the transistor gate lengths of the buffers $B_j$, $B_{j1}$ and $B_{j2}$, and $W_{b_j}$, $W_{b_j}^{1}$, $W_{b_j}^{2}$ denote their gate widths.

Theorem 3 (Buffer Splitting Theorem): In a buffered clock tree, when a buffer is moved over a branching point, if it is split into two buffers according to the size requirement given by Eqn. (16), all path delays in the clock tree remain the same.

5.1.3 Algorithm for Single Path Delay Minimization

Using Theorems 2 and 3, we develop an iterative algorithm for single path delay minimization. The algorithm repeatedly selects a buffer along the path from source to sink and tests whether the path delay can be reduced by changing the buffer position. If the delay is reduced, the buffer is moved to the new position. Otherwise, it stays in its original position. When the last buffer in the path has been tested, the algorithm goes back to the first buffer and starts a new iteration, until the delay cannot be reduced further. When a buffer is moved over a branching point, the sizes of the split buffers are determined by Eqn. (16). Formally, the single path delay minimization (SPDM) algorithm is given in Table 2.
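The iterative procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an unbranched path (so no buffer splitting is needed), identical buffers, and normalized units, and the names `path_delay` and `spdm` are hypothetical. The delay function follows the wire model of Eqn. (3) and the buffer model of Eqn. (5).

```python
# Sketch of SPDM on an unbranched path (assumption: no branching points,
# identical buffers, normalized units). Names are illustrative only.

def path_delay(pos, length, r0, c0, r_src, c_sink, d_b, r_b, c_b):
    """Source-to-sink delay with buffers at sorted positions `pos`."""
    nodes = [0.0] + list(pos) + [length]
    last = len(nodes) - 2
    total = 0.0
    for s in range(last + 1):
        seg = nodes[s + 1] - nodes[s]
        drive_r = r_src if s == 0 else r_b       # source or buffer output resistance
        load_c = c_sink if s == last else c_b    # sink or next buffer input capacitance
        wire_r, wire_c = r0 * seg, c0 * seg
        if s > 0:
            total += d_b                         # buffer intrinsic delay
        total += drive_r * (wire_c + load_c)     # driver charging wire and load
        total += wire_r * (wire_c / 2 + load_c)  # wire delay as in Eqn. (3)
    return total


def spdm(pos, step, length, delay):
    """Move buffers up or down by `step` while the path delay decreases."""
    pos = list(pos)
    best = delay(pos)
    moved = True
    while moved:
        moved = False
        for i in range(len(pos)):
            for delta in (-step, step):          # try up (towards source), then down
                trial = pos[:i] + [pos[i] + delta] + pos[i + 1:]
                lo = trial[i - 1] if i > 0 else 0.0
                hi = trial[i + 1] if i + 1 < len(trial) else length
                if not (lo <= trial[i] <= hi):   # keep buffer order and stay on the path
                    continue
                t = delay(trial)
                if t < best:                     # accept only strict improvements
                    pos, best, moved = trial, t, True
                    break
    return pos, best


def example_delay(p):
    # Symmetric example: source R equals buffer R, sink C equals buffer input C.
    return path_delay(p, 10.0, 1.0, 1.0, 2.0, 0.5, 0.0, 2.0, 0.5)


positions, best = spdm([2.0], 0.5, 10.0, example_delay)
print(positions, best)
```

In this symmetric example the single buffer walks from position 2.0 to the midpoint of the wire, which is where the derivatives of the two segment delay functions are equal.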

applied to ensure the continuity of the delay function. In the following, we analyze this problem using circuit theory. We then develop an optimization scheme based on this analysis.

Figure 5. Move buffer $B$ from $p_2$ to $p_1$.

5.1.2 Buffer Splitting Theory

This subsection discusses how to split a buffer into two smaller buffers when it is moved down through a branching node, in order to maintain the continuity of the delay function. Consider the generic case shown in Figure 6, where buffer $B_j$ is located directly before a branching point $v_m$. If $f_1'(x_1) < f_2'(x_2)$, according to Theorem 2 the buffer should be moved forward through the branching point to reduce the path delay.

Figure 6. Buffer splitting technique.

Before buffer splitting, the portion of the delay $\tau(v_0, v_{m+q})$ contributed by the output resistance $r_{b_j}$ and input capacitance $c_{b_j}$ of buffer $B_j$ is given by

$$\tau_1 = R_p c_{b_j} + r_{b_j} \left( C_{p1} + C_{p2} \right) \qquad (12)$$

where $R_p$ is the resistance of the path from $v_0$ to $v_m$, and $C_{p1}$ and $C_{p2}$ are the lumped capacitances of the two subtrees rooted at node $v_m$, respectively. After buffer $B_j$ is moved to branches 1 and 2 directly after branching point $v_m$, it is split into two buffers $B_{j1}$ and $B_{j2}$. The delay portion $\tau_2$ contributed by the input capaci-

$$f_1(x_1) = \tau(v_0, v_m) = \alpha_1 x_1^2 + \beta_1 x_1 + \Gamma_1 \qquad (7)$$

Similarly, the delay $\tau(v_m, v_{m+q})$ from buffer $B_j$ to buffer $B_{j+1}$ can be written as a function of $x_2$ in Eqn. (8).

$$f_2(x_2) = \tau(v_m, v_{m+q}) = \alpha_2 x_2^2 + \beta_2 x_2 + \Gamma_2 \qquad (8)$$

Note that the parameters $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, $\Gamma_1$, $\Gamma_2$ are constants for the given path. Combining Eqns. (6), (7) and (8), we obtain the delay from $v_0$ to $v_{m+q}$, i.e., $\tau(v_0, v_{m+q})$ in Eqn. (9), which is a quadratic function of the variable $x_1$.

$$f(x_1) = \tau(v_0, v_{m+q}) = \alpha_1 x_1^2 + \beta_1 x_1 + \Gamma_1 + \alpha_2 (L_j - x_1)^2 + \beta_2 (L_j - x_1) + \Gamma_2 \qquad (9)$$

The minimum of $f(x_1)$ is achieved when $f'(x_1) = 0$, i.e., when the following equation holds.

$$f_1'(x_1) = f_2'(x_2) \qquad (10)$$

The position of buffer $B_j$ for minimal delay is therefore given by

$$x_1 = \frac{\beta_2 - \beta_1 + 2 \alpha_2 L_j}{2 (\alpha_1 + \alpha_2)} \qquad (11)$$

We have the following optimal buffer position theory for delay minimization.

Theorem 2 (Optimal Buffer Position Theorem): For a single path from source to sink in a buffered clock tree, the delay between two adjacent buffers (or between the source and the first buffer, or between the last buffer and the sink) is a convex function of the variable which describes the buffer position in the path. The path delay achieves its minimum when the derivatives of all delay functions are the same.

From Theorem 2, one can continuously move a buffer along the path to its optimal position for delay minimization. However, if the optimal position is not between the branching points $v_{m-1}$ and $v_{m+1}$, the buffer may need to be moved over branching points. In such a case, the delay function is no longer a continuous function of the buffer movement. In the example illustrated in Figure 5, $v$ is a branching node of tree $T$; when buffer $B$ is moved down over $v$ from position $p_2$ to $p_1$, the path delay of the subtree rooted at $B$ has a discontinuous jump. If $B$ is at position $p_2$, the capacitance of the subtree ST1 has to be taken into account in calculating the Elmore delay of the subtree rooted at $B$, while if $B$ is at $p_1$, the capacitance of ST1 is not included. We have seen that when a buffer is moved over a branching point, some special treatment should be

5. Delay Minimization by Buffer Position Optimization

Clock delay minimization by buffer placement is made difficult by the fact that a buffer position in one path affects the performance of the other paths. We minimize clock tree delay in a path by path fashion. To illustrate the complexity of this problem, we start by examining the problem of buffer placement in a single path of a buffered tree. We first derive a theory of the optimal buffer position and then show the need for splitting a buffer when minimizing the path delay. Based on this theory, an iterative algorithm is proposed to minimize the delay of a buffered clock tree.

5.1 Single Path Delay Minimization

We now investigate how to achieve the optimal position when one buffer is moved between two branching nodes and all other buffers are fixed at their current positions. In particular, we establish a theory for the optimal placement of buffers in a single path for achieving the minimal delay.

5.1.1 Buffer Position Optimization for Delay Minimization

Figure 4. A buffered subpath.

Without loss of generality, consider one portion of a path $P(s_0, s_i)$ from source $s_0$ to sink $s_i$ in a buffered clock tree, as shown in Figure 4. Between buffers $B_{j-1}$ and $B_j$ there are $m-1$ ($m \ge 1$) branching points $v_1, v_2, \dots, v_{m-1}$. Between buffers $B_j$ and $B_{j+1}$ there are $q-1$ ($q \ge 1$) branching nodes $v_{m+1}, v_{m+2}, \dots, v_{m+q-1}$, where $v_0$, $v_m$ and $v_{m+q}$ denote the three buffer nodes. Suppose buffer $B_j$ is located between $v_{m-1}$ and $v_{m+1}$. Let $x_1$ denote the wire length of $e(v_{m-1}, v_m)$ and $x_2$ the wire length of $e(v_m, v_{m+1})$. The wire length of $e(v_{m-1}, v_{m+1})$ is a constant $L_j$ in the given routing tree, i.e.,

$$x_1 + x_2 = L_j \qquad (6)$$

Using the delay model described in Section 2, the delay $\tau(v_0, v_m)$ from buffer $B_{j-1}$ to buffer $B_j$ can be written as a function of $x_1$ in Eqn. (7).
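A small numerical check can make the optimal-position condition concrete. The sketch below takes the two quadratic delay functions of Eqns. (7) and (8), uses the constraint of Eqn. (6), and solves $f'(x_1) = 0$ for $x_1$, which gives $x_1 = (\beta_2 - \beta_1 + 2\alpha_2 L_j) / (2(\alpha_1 + \alpha_2))$ as derived here directly from Eqn. (9). The coefficient values and the function names are assumptions chosen only for illustration.

```python
# Worked check of the optimal buffer position, under assumed coefficients.
# f1(x1) = a1*x1^2 + b1*x1 + g1, f2(x2) = a2*x2^2 + b2*x2 + g2, x1 + x2 = L.

def optimal_position(a1, b1, a2, b2, L):
    """Stationary point of f(x1) = f1(x1) + f2(L - x1)."""
    return (b2 - b1 + 2.0 * a2 * L) / (2.0 * (a1 + a2))

def total_delay(x1, a1, b1, g1, a2, b2, g2, L):
    x2 = L - x1
    return a1 * x1**2 + b1 * x1 + g1 + a2 * x2**2 + b2 * x2 + g2

# Illustrative (made-up) coefficients.
a1, b1, g1 = 1.0, 2.0, 0.0
a2, b2, g2 = 3.0, 4.0, 0.0
L = 10.0

x1 = optimal_position(a1, b1, a2, b2, L)
x2 = L - x1
slope1 = 2.0 * a1 * x1 + b1   # f1'(x1)
slope2 = 2.0 * a2 * x2 + b2   # f2'(x2)
print(x1, slope1, slope2)
```

At the computed position the two derivatives coincide, which is exactly the equal-derivative condition stated for the minimum; if the result falls outside the interval between the neighbouring branching points, the buffer would have to cross a branching point instead.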

4. Initial Buffer Insertion

The initial buffer insertion procedure was presented in our previous paper [18]. In the following we briefly describe it so that this paper is self-contained. The procedure is a top-down (from source to sinks), level-by-level method that inserts $k$ levels of buffers in each source-to-sink path. All the buffers are given the same middle size. Table 1 shows the pseudocode of the initial buffer insertion algorithm.

Table 1: Algorithm of initial buffer insertion.

Initial Buffer Insertion Algorithm
Input: An unbuffered tree T_0 and buffer layer k.
Output: Buffered tree BT.

Procedure InitialInsertion(T, k)
    for i ← 0 to k-1 do
        BuffSetNew ← NULL;
        if i = 0 then
            T_p ← T; InsertBuffer();
        else
            for each buffer B ∈ BuffSetOld do
                T_p ← subtree rooted at B; InsertBuffer();
            end for;
        end if;
        BuffSetOld ← BuffSetNew;
    end for;
end Procedure;

Procedure InsertBuffer()
    repeat
        In T_p, find the path P(u,v) with the shortest path length l_min, excluding paths which contain a buffer;
        Insert a buffer B_new into P(u,v) at l_min / (k + 1 - i);
        Add B_new to the set BuffSetNew;
    until there is one buffer inserted into every path of T_p;
end Procedure;

The following theorem was proved in [18] for this algorithm.

Theorem 1: The proposed level-by-level initial buffer insertion procedure inserts the same number of buffers in each source-to-sink path.
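To see what the level-by-level rule does on one path, the sketch below follows a single source-to-sink path and assumes that, at level $i$, the buffer is inserted at a distance of $l_{min}/(k+1-i)$ from the root of the current subtree, where $l_{min}$ is the remaining length of that path. Both this reading of the insertion point and the function name `layer_positions` are assumptions for illustration; the full algorithm of Table 1 works on the whole tree rather than on one path.

```python
# Sketch: buffer positions produced along one path by the level-by-level rule,
# assuming the insertion point at level i is l_min / (k + 1 - i) below the
# current subtree root and that this path is the shortest one at every level.

def layer_positions(path_length, k):
    """Distances from the source of the k buffers inserted on one path."""
    positions = []
    root = 0.0                 # position of the current subtree root
    remaining = path_length    # length from the current root to the sink
    for i in range(k):
        offset = remaining / (k + 1 - i)
        root += offset         # the new buffer becomes the next subtree root
        positions.append(root)
        remaining -= offset
    return positions

print(layer_positions(12.0, 3))
```

Under these assumptions the $k$ buffers end up evenly spaced along the path, which is a reasonable starting point before the positions are refined by delay minimization.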

Figure 3. Clock delay and skew minimization approach: Input $T_0$, $k$ → Initial buffer insertion → BT → Delay minimization by buffer position optimization → BTD → Skew minimization by adding buffers and changing buffer size → Output BTDS.

1) Initial Buffer Insertion
The initial buffer insertion algorithm inserts $k$ levels of buffers in each source-to-sink path of a given unbuffered clock tree $T_0$. The algorithm generates a buffered clock tree called BT.

2) Delay Minimization
This stage minimizes the clock delay for an arbitrarily given buffered tree, without adding more buffers or increasing buffer sizes. We develop theories for delay minimization by moving buffer positions in the clock tree. The objective of the delay minimization algorithm is to generate a new buffered tree with the minimal delay, called BTD.

3) Skew Minimization
In this stage, the sizes and positions of the buffers in the minimum delay path are kept unchanged. We use buffer sizing and, if necessary, add more buffers to minimize skew. During this procedure, we keep the minimum delay unchanged. The result of this stage is a tree with minimal delay and skew, called BTDS.

In the following sections, we present the details of these three procedures.

Eqn. (4) is the well known Elmore delay [4].

$$\tau(v_0, v_n) = \sum_{i=1}^{n} \tau(e_{v_i}) \qquad (4)$$

The buffer delay $\tau_b$ is given by the following equation,

$$\tau_b = d_b + r_b c_b \qquad (5)$$

where $d_b$, $r_b$ and $c_b$ are buffer $b$'s intrinsic delay, output resistance and load capacitance, respectively.

3. Delay and Skew Minimization by Buffer Insertion

In this section, we first formulate the buffer insertion optimization problem, and then present the three-stage approach to achieve both delay and skew minimization.

Buffer Insertion Problem: Given a routed clock tree $T_0$ and a buffer size range from the maximum value $r_{bmax}$ to the minimum value $r_{bmin}$, determine the number of inserted buffers, their positions and their sizes such that clock delay and skew are minimized.

In the above definition, we have used the buffer output resistance $r$ as the measure of buffer size. Since the buffer structure is a chain of cascaded inverters [13], when the buffer size is changed its input capacitance remains the same; only its delay and output resistance change. We assume the buffer size is limited to a range from the maximum value $r_{bmax}$ to the minimal value $r_{bmin}$, which is a constraint of real design settings.

To solve the buffer insertion problem defined above, one strategy is to fix the number of buffers first, say $k$ buffers in each path, and then to find the corresponding buffer positions and sizes. We call this problem a $k$ level buffer insertion problem. We only need to consider this $k$ level buffer insertion problem, since we can change the value of $k$ from 1 to a desired number and repeat the same procedure to find the optimal value of $k$. Since the number of buffers per path is bounded by a small number in practical applications, the procedure does not need to be run many times. In the following, we assume $k$ is a fixed value.

Figure 3 sketches our proposed three-stage buffer insertion approach for the $k$ level buffer insertion problem, which consists of initial buffer insertion, delay minimization and skew minimization.
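For readers who want to experiment with the delay model, the following sketch evaluates the wire delay of Eqn. (3), the path delay of Eqn. (4) and the buffer delay of Eqn. (5). The function names and the numerical inputs are illustrative assumptions; resistances are taken in Ω and capacitances in pF, so delays come out in ps.

```python
# Minimal sketch of Eqns. (3)-(5); names and numbers are illustrative only.

def wire_delay(r_e, c_e, c_subtree):
    """Eqn. (3): delay of wire e_v, with C(T_v) the lumped subtree capacitance."""
    return r_e * (c_e / 2.0 + c_subtree)

def path_delay(wires):
    """Eqn. (4): sum of wire delays along a path.

    `wires` is a list of (r_e, c_e, c_subtree) tuples, one per wire."""
    return sum(wire_delay(r, c, c_sub) for r, c, c_sub in wires)

def buffer_delay(d_b, r_b, c_b):
    """Eqn. (5): intrinsic delay plus output resistance times load capacitance."""
    return d_b + r_b * c_b

# Two wires in series: the second wire ends at a node with no further load.
example = [(2.0, 4.0, 3.0), (1.0, 2.0, 0.0)]
print(path_delay(example), buffer_delay(30.0, 100.0, 0.5))
```

A source-to-sink delay in a buffered tree is then the sum of such path delays between consecutive buffers plus the delay of each buffer on the path.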

uted RC line (Figure 2(a)), and the buffer is modeled by a standard RC network (Figure 2(c,d)). The distributed RC line is further modeled by the equivalent π-model [6], as shown in Figure 2(b) (assuming it is properly segmented), where the resistance $r_e$ and capacitance $c_e$ of wire $e$ are given by Eqns. (1) and (2), respectively.

Figure 2: Delay model. (a) A distributed RC line; (b) the equivalent π-model; (c) a clock buffer; (d) buffer model.

$$r_e(l_e) = \frac{r_s l_e}{w_e} = r_0 l_e \qquad (1)$$

$$c_e(l_e) = c_a l_e w_e + 2 c_f (l_e + w_e) \approx (c_a w_e + 2 c_f)\, l_e = c_0 l_e \qquad (2)$$

where $l_e \gg w_e$, and $l_e$ and $w_e$ are the length and width of wire $e$, respectively. The constant parameters $r_s$, $c_a$ and $c_f$ are the sheet resistance, the unit-area capacitance and the fringe capacitance of a unit-width, unit-length wire segment, respectively. From the above definitions, $r_0$ denotes the unit-length wire resistance and $c_0$ denotes the unit-length wire capacitance.

To calculate the delay of the wire $e_v$ which enters node $v$ from its parent node, the approximation in Eqn. (3) is used, where $r_{e_v}$ and $c_{e_v}$ are the resistance and capacitance of the wire $e_v$, and $C(T_v)$ is the lumped capacitance of the subtree $T_v$ rooted at node $v$.

$$\tau(e_v) = r_{e_v} \left( \frac{c_{e_v}}{2} + C(T_v) \right) \qquad (3)$$

For a path $P(v_0, v_n)$ consisting of $n$ wires $e_{v_1}, e_{v_2}, \dots, e_{v_n}$, the path delay of $P(v_0, v_n)$ is given by Eqn. (4).
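As a quick numerical illustration of Eqns. (1) and (2), the sketch below computes the unit-length resistance $r_0$ and capacitance $c_0$ from the technology parameters listed in Table 3 (sheet resistance 0.14 Ω/□, area capacitance 0.08 fF/µm², fringe capacitance 0.03 fF/µm, wire width 10 µm). The variable names and the 1000 µm example length are assumptions for illustration.

```python
# Unit-length wire parameters from Eqns. (1) and (2), using Table 3 values.
R_SHEET = 0.14   # ohm per square
C_AREA = 0.08    # fF per um^2
C_FRINGE = 0.03  # fF per um
WIDTH = 10.0     # um

r0 = R_SHEET / WIDTH                 # ohm per um, Eqn. (1)
c0 = C_AREA * WIDTH + 2 * C_FRINGE   # fF per um, Eqn. (2) with l_e >> w_e

def wire_capacitance_exact(length_um):
    """Full expression of Eqn. (2), before the l_e >> w_e approximation."""
    return C_AREA * length_um * WIDTH + 2 * C_FRINGE * (length_um + WIDTH)

length = 1000.0  # assumed example wire length in um
exact = wire_capacitance_exact(length)
approx = c0 * length
print(r0, c0, exact, approx)
```

For this wire the approximation differs from the full expression by well under 0.1 percent, which supports dropping the $2 c_f w_e$ term when $l_e \gg w_e$.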

tree $T(V,E)$, which consists of wire set $E$ and node set $V = \{\{s_0\} \cup SI \cup IN\}$, where $s_0$ is the unique source node, $IN$ is the set of branching nodes (or internal nodes) and $SI$ is the set of sink nodes.

Figure 1. A clock tree before and after buffer insertion. (a) Unbuffered routing tree; (b) buffered routing tree.

Definition 2 (Buffered routing tree): After buffers have been inserted into the routing tree, the tree is called a buffered routing tree BT (Figure 1(b)), where the inserted buffers become new nodes, called buffer nodes, in the tree. The node set $V$ of BT is now $\{\{s_0\} \cup SI \cup IN \cup BN\}$, where $BN$ is the set of buffer nodes.

Definition 3 (Wire): A wire $e_v$ ($e_v \in E$) is a directed edge $(u,v)$ ($u, v \in V$), where the signal propagates from $u$ to $v$. Node $u$ is called the parent of node $v$, and node $v$ is called the child of node $u$.

Definition 4 (Path): A path $P(v_0, v_n)$ from node $v_0$ to node $v_n$ in clock tree $T$ is a sequence of nodes $v_0, v_1, v_2, \dots, v_n \in V$ and wires $e_{v_1}, e_{v_2}, \dots, e_{v_n} \in E$ such that $e_{v_i}$ links $v_{i-1}$ and $v_i$, and $e_{v_i} \ne e_{v_j}$ for all $i, j$ with $i \ne j$. Its length is denoted by $P_l(v_0, v_n)$.

Definition 5 (Buffer layer): In a clock tree $T$, if $k$ buffers are inserted into a source-to-sink path $P(s_0, s_i)$, we say that there are $k$ layers (levels) of buffers on path $P(s_0, s_i)$. From source $s_0$ to sink $s_i$, the $j$-th ($j = 1, 2, \dots, k$) buffer is called the $j$-th layer (level) buffer.

2.2 Delay Model

The delay model used in this paper is given in Figure 2. A wire in a clock tree is modeled by a distrib-

as shown by our optimal buffer position theory in this paper. Algorithms proposed in [9,10,11] insert the same number of buffers in each source-to-sink path. They further assume that the buffers at the same level have the same size. This buffer insertion strategy helps to reduce skew sensitivity to process variations [12], but it requires an almost balanced routing tree topology. Otherwise, it cannot effectively reduce the skew to a specified tolerance. The balanced buffer insertion scheme [3] attempts to partition the clock tree into several subtrees so that every subtree has equal path length and all source-to-sink paths have an equal number of levels. This scheme requires an equal-path-length routing tree. In many practical design situations, it is impossible to construct an equal-path-length tree or a well-balanced tree in a routing plane if there exist pre-placed cell modules [17]. Furthermore, an equal-path-length clock tree does not guarantee minimal delay and skew, because circuit delay depends not only on the path length but also on the circuit topology [18].

This paper aims to solve the optimal buffer insertion problem from both the theoretical and the practical design point of view. In contrast to previous research, we aim to establish our buffer insertion theories, methodologies and algorithms based on real circuit design settings, where the clock routing tree is neither well balanced nor equal-length.

The rest of the paper is organized as follows. Section 2 introduces the terminology and the delay model used throughout this paper. Section 3 presents the overall approach for delay and skew minimization by buffer insertion. Section 4 presents the initial buffer insertion strategy. Section 5 proposes the theories and algorithms for delay minimization by buffer position optimization. Section 6 develops the strategy for skew minimization. Section 7 shows the experimental results and concludes the paper.

2. Definitions and Delay Model

We start by giving a synopsis of the basic definitions which will be used throughout this paper.

2.1 Definition

A tree is denoted by $T(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges.

Definition 1 (Unbuffered routing tree): An unbuffered routing tree (Figure 1(a)) is a directed binary

Buffer Insertion for Clock Delay and Skew Minimization *

X. Zeng 1,2, D. Zhou 1, Wei Li 1
1 Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, NC
2 Department of Electronic Engineering, Fudan University, Shanghai, China

Abstract

Buffer insertion is an effective approach to achieving both minimal clock signal delay and minimal skew in high-speed VLSI circuit design. In this paper, we develop an optimal buffer insertion and sizing scheme. In particular, we show that buffer-to-buffer delay is a convex function of the buffer positions in a clock tree. The minimal clock delay can be obtained by equalizing the derivatives of this convex function, and the minimal skew can be obtained by equalizing the delay functions of different source-to-sink paths. Based on this theory, we further develop a three-stage method that initially inserts buffers in a given clock tree, minimizes delay by optimizing buffer positions, and minimizes skew by buffer level augmentation and buffer size refinement. The presented algorithm achieves both minimal delay and minimal skew in real clock tree design.

1. Introduction

In the coming century, the clock rate of microprocessors will approach multi-GHz. This trend will continue as process technology moves into the deep submicron level. The drastically increased demand for high-performance, high-speed VLSI circuits poses challenges in the design of high-speed clock networks, where clock delay and skew minimization is a critical problem. Clock delay and skew can be optimized by a good routing strategy [14,17] and effective buffer insertion [1,2]. Early research [14-16] proposed algorithms that minimize skew by properly balancing the path lengths from the source to the sinks. To further reduce the clock signal delay and transmission line effects, buffer insertion is necessary. The objective of buffer insertion is to find the proper number, sizes and placement of buffers in a given clock tree.
The fixed-position algorithms proposed in [6,7,8] insert buffers at branching points in a clock tree. However, the optimal buffer positions for minimal delay do not necessarily reside on the branching points,

* This research is supported in part by Air Force Office of Scientific Research grant F
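The abstract's claim that delay is a convex function of buffer position can be illustrated on the simplest possible case. The sketch below is not the paper's formulation: it assumes the Elmore delay model [4] applied to a single uniform wire of length L with one buffer at position x, and every name and parameter value in it (`stage_delay`, `total_delay`, `optimal_position`, `r_src`, `r_buf`, `c_buf`, `c_sink`, and the per-unit-length wire constants `r` and `c`) is hypothetical. Under these assumptions the total delay is quadratic in x with second derivative 2rc > 0, so it is convex, and setting the derivatives of the two stage delays equal in magnitude gives the minimizer in closed form.

```python
def stage_delay(r_drv, length, c_load, r=0.1, c=0.2):
    """Elmore delay of one stage (assumed model): a driver with output
    resistance r_drv drives a uniform wire of the given length
    (r, c per unit length) terminated by the capacitance c_load."""
    return r_drv * (c * length + c_load) + r * length * (c * length / 2 + c_load)


def total_delay(x, L, r_src, r_buf, c_buf, c_sink, t_buf=0.0, r=0.1, c=0.2):
    """Source-to-sink delay with a single buffer placed at distance x
    from the source on a wire of total length L."""
    upstream = stage_delay(r_src, x, c_buf, r, c)         # source -> buffer input
    downstream = stage_delay(r_buf, L - x, c_sink, r, c)  # buffer output -> sink
    return upstream + t_buf + downstream


def optimal_position(L, r_src, r_buf, c_buf, c_sink, r=0.1, c=0.2):
    """Position where d(upstream)/dx equals -d(downstream)/dx, clamped
    to the wire. Because the total delay is convex in x, this stationary
    point is the global minimum on [0, L]."""
    x = L / 2 + (r_buf - r_src) / (2 * r) + (c_sink - c_buf) / (2 * c)
    return min(max(x, 0.0), L)


if __name__ == "__main__":
    # Hypothetical values in arbitrary consistent units.
    params = dict(L=100.0, r_src=10.0, r_buf=5.0, c_buf=1.0, c_sink=3.0)
    x_opt = optimal_position(**params)
    print("optimal buffer position:", x_opt)
    print("delay at optimum:", total_delay(x_opt, **params))
```

In this toy setting the best position depends on the driver resistances and load capacitances and is generally not the midpoint or an endpoint of the wire, which is consistent with the remark that optimal buffer positions need not coincide with branching points.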

Contact Author: Dian Zhou
Address: Electrical and Computer Engineering Dept., University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC
Telephone: (704)
E-mail: zhou@uncc.edu

Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles

Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles Routing Tree Construction with Buffer Insertion under Buffer Location Constraints and Wiring Obstacles Ying Rao, Tianxiang Yang University of Wisconsin Madison {yrao, tyang}@cae.wisc.edu ABSTRACT Buffer

More information

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Renshen Wang Department of Computer Science and Engineering University of California, San Diego La Jolla,

More information

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Zhigang Pan Department of Computer Science University of California, Los Angeles, CA 90095 Email: fcong,pang@cs.ucla.edu

More information

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations An Efficient Routing Tree Construction Algorithm with uffer Insertion, Wire Sizing and Obstacle Considerations Sampath Dechu Zion Cien Shen Chris C N Chu Physical Design Automation Group Dept Of ECpE Dept

More information

University of California at Berkeley. Berkeley, CA the global routing in order to generate a feasible solution

University of California at Berkeley. Berkeley, CA the global routing in order to generate a feasible solution Post Routing Performance Optimization via Multi-Link Insertion and Non-Uniform Wiresizing Tianxiong Xue and Ernest S. Kuh Department of Electrical Engineering and Computer Sciences University of California

More information

Time Algorithm for Optimal Buffer Insertion with b Buffer Types *

Time Algorithm for Optimal Buffer Insertion with b Buffer Types * An Time Algorithm for Optimal Buffer Insertion with b Buffer Types * Zhuo Dept. of Electrical Engineering Texas University College Station, Texas 77843, USA. zhuoli@ee.tamu.edu Weiping Shi Dept. of Electrical

More information

Interconnect Design for Deep Submicron ICs

Interconnect Design for Deep Submicron ICs ! " #! " # - Interconnect Design for Deep Submicron ICs Jason Cong Lei He Kei-Yong Khoo Cheng-Kok Koh and Zhigang Pan Computer Science Department University of California Los Angeles CA 90095 Abstract

More information

Li Minqiang Institute of Systems Engineering Tianjin University, Tianjin , P.R. China

Li Minqiang Institute of Systems Engineering Tianjin University, Tianjin , P.R. China Multi-level Genetic Algorithm (MLGA) for the Construction of Clock Binary Tree Nan Guofang Tianjin University, Tianjin 07, gfnan@tju.edu.cn Li Minqiang Tianjin University, Tianjin 07, mqli@tju.edu.cn Kou

More information

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 8 (2013), pp. 907-912 Research India Publications http://www.ripublication.com/aeee.htm Circuit Model for Interconnect Crosstalk

More information

PERFORMANCE optimization has always been a critical

PERFORMANCE optimization has always been a critical IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 1633 Buffer Insertion for Noise and Delay Optimization Charles J. Alpert, Member, IEEE, Anirudh

More information

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations XXVII IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, OCTOBER 5, 2009 Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Renshen Wang 1 Takumi Okamoto 2

More information

Variation Tolerant Buffered Clock Network Synthesis with Cross Links

Variation Tolerant Buffered Clock Network Synthesis with Cross Links Variation Tolerant Buffered Clock Network Synthesis with Cross Links Anand Rajaram David Z. Pan Dept. of ECE, UT-Austin Texas Instruments, Dallas Sponsored by SRC and IBM Faculty Award 1 Presentation Outline

More information

Interconnect Delay and Area Estimation for Multiple-Pin Nets

Interconnect Delay and Area Estimation for Multiple-Pin Nets Interconnect Delay and Area Estimation for Multiple-Pin Nets Jason Cong and David Z. Pan UCLA Computer Science Department Los Angeles, CA 90095 Sponsored by SRC and Avant!! under CA-MICRO Presentation

More information

Crosslink Insertion for Variation-Driven Clock Network Construction

Crosslink Insertion for Variation-Driven Clock Network Construction Crosslink Insertion for Variation-Driven Clock Network Construction Fuqiang Qian, Haitong Tian, Evangeline Young Department of Computer Science and Engineering The Chinese University of Hong Kong {fqqian,

More information

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model

Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE Model 446 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 4, APRIL 2000 Algorithms for Non-Hanan-Based Optimization for VLSI Interconnect under a Higher-Order AWE

More information

AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS

AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS www.arpnjournals.com AN OPTIMIZED ALGORITHM FOR SIMULTANEOUS ROUTING AND BUFFER INSERTION IN MULTI-TERMINAL NETS C. Uttraphan 1, N. Shaikh-Husin 2 1 Embedded Computing System (EmbCoS) Research Focus Group,

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

An Interconnect-Centric Design Flow for Nanometer. Technologies

An Interconnect-Centric Design Flow for Nanometer. Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled

More information

VERY large scale integration (VLSI) design for power

VERY large scale integration (VLSI) design for power IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 1, MARCH 1999 25 Short Papers Segmented Bus Design for Low-Power Systems J. Y. Chen, W. B. Jone, Member, IEEE, J. S. Wang,

More information

High-Speed Clock Routing. Performance-Driven Clock Routing

High-Speed Clock Routing. Performance-Driven Clock Routing High-Speed Clock Routing Performance-Driven Clock Routing Given: Locations of sinks {s 1, s,,s n } and clock source s 0 Skew Bound B >= 0 If B = 0, zero-skew routing Possibly other constraints: Rise/fall

More information

Routability-Driven Bump Assignment for Chip-Package Co-Design

Routability-Driven Bump Assignment for Chip-Package Co-Design 1 Routability-Driven Bump Assignment for Chip-Package Co-Design Presenter: Hung-Ming Chen Outline 2 Introduction Motivation Previous works Our contributions Preliminary Problem formulation Bump assignment

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

Making Fast Buffer Insertion Even Faster Via Approximation Techniques 1A-3 Making Fast Buffer Insertion Even Faster Via Approximation Techniques Zhuo Li 1,C.N.Sze 1, Charles J. Alpert 2, Jiang Hu 1, and Weiping Shi 1 1 Dept. of Electrical Engineering, Texas A&M University,

More information

Linking Layout to Logic Synthesis: A Unification-Based Approach

Linking Layout to Logic Synthesis: A Unification-Based Approach Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

IN recent years, interconnect delay has become an increasingly

IN recent years, interconnect delay has become an increasingly 436 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 4, APRIL 1999 Non-Hanan Routing Huibo Hou, Jiang Hu, and Sachin S. Sapatnekar, Member, IEEE Abstract This

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 1 Lecture 10: Repeater (Buffer) Insertion Introduction to Buffering Buffer Insertion

More information

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing Umadevi.S #1, Vigneswaran.T #2 # Assistant Professor [Sr], School of Electronics Engineering, VIT University, Vandalur-

More information

An algorithm for Performance Analysis of Single-Source Acyclic graphs

An algorithm for Performance Analysis of Single-Source Acyclic graphs An algorithm for Performance Analysis of Single-Source Acyclic graphs Gabriele Mencagli September 26, 2011 In this document we face with the problem of exploiting the performance analysis of acyclic graphs

More information

Bottleneck Steiner Tree with Bounded Number of Steiner Vertices

Bottleneck Steiner Tree with Bounded Number of Steiner Vertices Bottleneck Steiner Tree with Bounded Number of Steiner Vertices A. Karim Abu-Affash Paz Carmi Matthew J. Katz June 18, 2011 Abstract Given a complete graph G = (V, E), where each vertex is labeled either

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

10. Interconnects in CMOS Technology

10. Interconnects in CMOS Technology 10. Interconnects in CMOS Technology 1 10. Interconnects in CMOS Technology Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October

More information

A buffer planning algorithm for chip-level floorplanning

A buffer planning algorithm for chip-level floorplanning Science in China Ser. F Information Sciences 2004 Vol.47 No.6 763 776 763 A buffer planning algorithm for chip-level floorplanning CHEN Song 1, HONG Xianlong 1, DONG Sheqin 1, MA Yuchun 1, CAI Yici 1,

More information

A Novel Net Weighting Algorithm for Timing-Driven Placement

A Novel Net Weighting Algorithm for Timing-Driven Placement A Novel Net Weighting Algorithm for Timing-Driven Placement Tim (Tianming) Kong Aplus Design Technologies, Inc. 10850 Wilshire Blvd., Suite #370 Los Angeles, CA 90024 Abstract Net weighting for timing-driven

More information

FOUR EDGE-INDEPENDENT SPANNING TREES 1

FOUR EDGE-INDEPENDENT SPANNING TREES 1 FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem

More information

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation

Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Optimum Placement of Decoupling Capacitors on Packages and Printed Circuit Boards Under the Guidance of Electromagnetic Field Simulation Yuzhe Chen, Zhaoqing Chen and Jiayuan Fang Department of Electrical

More information

Fast Delay Estimation with Buffer Insertion for Through-Silicon-Via-Based 3D Interconnects

Fast Delay Estimation with Buffer Insertion for Through-Silicon-Via-Based 3D Interconnects Fast Delay Estimation with Buffer Insertion for Through-Silicon-Via-Based 3D Interconnects Young-Joon Lee and Sung Kyu Lim Electrical and Computer Engineering, Georgia Institute of Technology email: yjlee@gatech.edu

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Name: ESE370 Fall 2012

Name: ESE370 Fall 2012 University of Pennsylvania Department of Electrical and System Engineering Circuit-Level Modeling, Design, and Optimization for Digital Systems ESE370, Fall 2012 Final Friday, December 14 Problem weightings

More information

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization This article has been accepted and published on J-STAGE in advance of copyediting. Content is final as presented. IEICE Electronics Express, Vol.* No.*,*-* Power-Mode-Aware Buffer Synthesis for Low-Power

More information

Gate Sizing by Lagrangian Relaxation Revisited

Gate Sizing by Lagrangian Relaxation Revisited Gate Sizing by Lagrangian Relaxation Revisited Jia Wang, Debasish Das, and Hai Zhou Electrical Engineering and Computer Science Northwestern University Evanston, Illinois, United States October 17, 2007

More information

Satisfiability Modulo Theory based Methodology for Floorplanning in VLSI Circuits

Satisfiability Modulo Theory based Methodology for Floorplanning in VLSI Circuits Satisfiability Modulo Theory based Methodology for Floorplanning in VLSI Circuits Suchandra Banerjee Anand Ratna Suchismita Roy mailnmeetsuchandra@gmail.com pacific.anand17@hotmail.com suchismita27@yahoo.com

More information

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs

Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Beyond the Combinatorial Limit in Depth Minimization for LUT-Based FPGA Designs Jason Cong and Yuzheng Ding Department of Computer Science University of California, Los Angeles, CA 90024 Abstract In this

More information

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches

On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches On The Complexity of Virtual Topology Design for Multicasting in WDM Trees with Tap-and-Continue and Multicast-Capable Switches E. Miller R. Libeskind-Hadas D. Barnard W. Chang K. Dresner W. M. Turner

More information

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Zhi-Wei Chen 1 and Jin-Tai Yan 2 1 College of Engineering, 2 Department of Computer Science and Information Engineering Chung-Hua University,

More information

FUTURE communication networks are expected to support

FUTURE communication networks are expected to support 1146 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 13, NO 5, OCTOBER 2005 A Scalable Approach to the Partition of QoS Requirements in Unicast and Multicast Ariel Orda, Senior Member, IEEE, and Alexander Sprintson,

More information

A Novel Performance-Driven Topology Design Algorithm

A Novel Performance-Driven Topology Design Algorithm A Novel Performance-Driven Topology Design Algorithm Min Pan, Chris Chu Priyadarshan Patra Electrical and Computer Engineering Dept. Intel Corporation Iowa State University, Ames, IA 50011 Hillsboro, OR

More information

THE technology mapping and synthesis problem for field

THE technology mapping and synthesis problem for field 738 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 9, SEPTEMBER 1998 An Efficient Algorithm for Performance-Optimal FPGA Technology Mapping with Retiming Jason

More information

Floorplan considering interconnection between different clock domains

Floorplan considering interconnection between different clock domains Proceedings of the 11th WSEAS International Conference on CIRCUITS, Agios Nikolaos, Crete Island, Greece, July 23-25, 2007 115 Floorplan considering interconnection between different clock domains Linkai

More information

Non-Rectangular Shaping and Sizing of Soft Modules for Floorplan Design Improvement

Non-Rectangular Shaping and Sizing of Soft Modules for Floorplan Design Improvement Non-Rectangular Shaping and Sizing of Soft Modules for Floorplan Design Improvement Chris C.N. Chu and Evangeline F.Y. Young Abstract Many previous works on floorplanning with non-rectangular modules [,,,,,,,,,,,

More information

Buffered Routing Tree Construction Under Buffer Placement Blockages

Buffered Routing Tree Construction Under Buffer Placement Blockages Buffered Routing Tree Construction Under Buffer Placement Blockages Abstract Interconnect delay has become a critical factor in determining the performance of integrated circuits. Routing and buffering

More information

Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation

Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Trace Signal Selection to Enhance Timing and Logic Visibility in Post-Silicon Validation Hamid Shojaei, and Azadeh Davoodi University of Wisconsin 1415 Engineering Drive, Madison WI 53706 Email: {shojaei,

More information

Timing-Driven Maze Routing

Timing-Driven Maze Routing 234 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 19, NO. 2, FEBRUARY 2000 Timing-Driven Maze Routing Sung-Woo Hur, Ashok Jagannathan, and John Lillis Abstract This

More information

An Efficient Data Structure for Maxplus Merge in Dynamic Programming

An Efficient Data Structure for Maxplus Merge in Dynamic Programming 3004 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 12, DECEMBER 2006 [12] L. P. Carloni and A. L. Sangiovanni-Vincentelli, Performance analysis and optimization

More information

Postgrid Clock Routing for High Performance Microprocessor Designs

Postgrid Clock Routing for High Performance Microprocessor Designs IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 2, FEBRUARY 2012 255 Postgrid Clock Routing for High Performance Microprocessor Designs Haitong Tian, Wai-Chung

More information

Buffered Steiner Trees for Difficult Instances

Buffered Steiner Trees for Difficult Instances Buffered Steiner Trees for Difficult Instances C. J. Alpert 1, M. Hrkic 2, J. Hu 1, A. B. Kahng 3, J. Lillis 2, B. Liu 3, S. T. Quay 1, S. S. Sapatnekar 4, A. J. Sullivan 1, P. Villarrubia 1 1 IBM Corp.,

More information

An Efficient Algorithm For RLC Buffer Insertion

An Efficient Algorithm For RLC Buffer Insertion An Efficient Algorithm For RLC Buffer Insertion Zhanyuan Jiang, Shiyan Hu, Jiang Hu and Weiping Shi Texas A&M University, College Station, Texas 77840 Email: {jerryjiang, hushiyan, jianghu, wshi}@ece.tamu.edu

More information

FPGA Clock Network Architecture: Flexibility vs. Area and Power

FPGA Clock Network Architecture: Flexibility vs. Area and Power FPGA Clock Network Architecture: Flexibility vs. Area and Power Julien Lamoureux and Steven J.E. Wilton Department of Electrical and Computer Engineering University of British Columbia Vancouver, B.C.,

More information

Optimal Prescribed-Domain Clock Skew Scheduling

Optimal Prescribed-Domain Clock Skew Scheduling Optimal Prescribed-Domain Clock Skew Scheduling Li Li, Yinghai Lu, Hai Zhou Electrical Engineering and Computer Science Northwestern University 6B-4 Abstract Clock skew scheduling is an efficient technique

More information

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori

Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes and Tori The Computer Journal, 46(6, c British Computer Society 2003; all rights reserved Speed-up of Parallel Processing of Divisible Loads on k-dimensional Meshes Tori KEQIN LI Department of Computer Science,

More information

Chapter 6. CMOS Functional Cells

Chapter 6. CMOS Functional Cells Chapter 6 CMOS Functional Cells In the previous chapter we discussed methods of designing layout of logic gates and building blocks like transmission gates, multiplexers and tri-state inverters. In this

More information

DESIGN OF AN FFT PROCESSOR

DESIGN OF AN FFT PROCESSOR 1 DESIGN OF AN FFT PROCESSOR Erik Nordhamn, Björn Sikström and Lars Wanhammar Department of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract In this paper we present a structured

More information

Floorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion

Floorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion Floorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion Chen Li, Cheng-Kok Koh School of ECE, Purdue University West Lafayette, IN 47907, USA {li35, chengkok}@ecn.purdue.edu Patrick

More information

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects

More information

On Constructing Lower Power and Robust Clock Tree via Slew Budgeting

On Constructing Lower Power and Robust Clock Tree via Slew Budgeting 1 On Constructing Lower Power and Robust Clock Tree via Slew Budgeting Yeh-Chi Chang, Chun-Kai Wang and Hung-Ming Chen Dept. of EE, National Chiao Tung University, Taiwan 2012 年 3 月 29 日 Outline 2 Motivation

More information

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis

Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis 2013 IEEE Computer Society Annual Symposium on VLSI Whitespace-Aware TSV Arrangement in 3D Clock Tree Synthesis Xin Li, Wulong Liu, Haixiao Du, Yu Wang, Yuchun Ma, Huazhong Yang Tsinghua National Laboratory

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

Lecture 3: Art Gallery Problems and Polygon Triangulation

Lecture 3: Art Gallery Problems and Polygon Triangulation EECS 396/496: Computational Geometry Fall 2017 Lecture 3: Art Gallery Problems and Polygon Triangulation Lecturer: Huck Bennett In this lecture, we study the problem of guarding an art gallery (specified

More information

Clock Skew Optimization Considering Complicated Power Modes

Clock Skew Optimization Considering Complicated Power Modes Clock Skew Optimization Considering Complicated Power Modes Chiao-Ling Lung 1,2, Zi-Yi Zeng 1, Chung-Han Chou 1, Shih-Chieh Chang 1 National Tsing-Hua University, HsinChu, Taiwan 1 Industrial Technology

More information

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem

An Evolutionary Algorithm for the Multi-objective Shortest Path Problem An Evolutionary Algorithm for the Multi-objective Shortest Path Problem Fangguo He Huan Qi Qiong Fan Institute of Systems Engineering, Huazhong University of Science & Technology, Wuhan 430074, P. R. China

More information

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h)

HAI ZHOU. Evanston, IL Glenview, IL (847) (o) (847) (h) HAI ZHOU Electrical and Computer Engineering Northwestern University 2535 Happy Hollow Rd. Evanston, IL 60208-3118 Glenview, IL 60025 haizhou@ece.nwu.edu www.ece.nwu.edu/~haizhou (847) 491-4155 (o) (847)

More information

Connected Components of Underlying Graphs of Halving Lines

Connected Components of Underlying Graphs of Halving Lines arxiv:1304.5658v1 [math.co] 20 Apr 2013 Connected Components of Underlying Graphs of Halving Lines Tanya Khovanova MIT November 5, 2018 Abstract Dai Yang MIT In this paper we discuss the connected components

More information

Practical Techniques to Reduce Skew and Its Variations in Buffered Clock Networks

Practical Techniques to Reduce Skew and Its Variations in Buffered Clock Networks Practical Techniques to Reduce Skew and Its Variations in Buffered Clock Networks Ganesh Venkataraman, Nikhil Jayakumar, Jiang Hu, Peng Li, Sunil Khatri Dept. of Electrical Engineering, Texas A&M University,

More information

Probability-Based Approach to Rectilinear Steiner Tree Problems

Probability-Based Approach to Rectilinear Steiner Tree Problems 836 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 6, DECEMBER 2002 Probability-Based Approach to Rectilinear Steiner Tree Problems Chunhong Chen, Member, IEEE, Jiang Zhao,

More information

Fast Algorithms For Slew Constrained Minimum Cost Buffering

Fast Algorithms For Slew Constrained Minimum Cost Buffering 18.3 Fast Algorithms For Slew Constrained Minimum Cost Buffering Shiyan Hu, Charles J. Alpert, Jiang Hu, Shrirang Karandikar, Zhuo Li, Weiping Shi, C. N. Sze Department of Electrical and Computer Engineering,

More information

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,

More information

An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation*

An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation* An Enhanced Perturbing Algorithm for Floorplan Design Using the O-tree Representation* Yingxin Pang Dept.ofCSE Univ. of California, San Diego La Jolla, CA 92093 ypang@cs.ucsd.edu Chung-Kuan Cheng Dept.ofCSE

More information
