Fast Delay Estimation with Buffer Insertion for Through-Silicon-Via-Based 3D Interconnects


Young-Joon Lee and Sung Kyu Lim
Electrical and Computer Engineering, Georgia Institute of Technology

Abstract: For successful adoption of through-silicon-via-based 3D ICs, delay estimation techniques for 3D interconnects are required in early design stages. 3D nets may connect gates/macros placed far apart, and through-silicon-vias (TSVs) have large parasitic capacitances. Thus, buffers are inserted to reduce interconnect delay. To make good decisions in early design stages, the estimation of buffered delay should be fast and reasonably accurate. However, there has been no buffered delay estimation work for 3D ICs that considers proper delay models and TSV RC parasitics. In this work, we investigate several analytical delay models for 3D net delay estimation. Then, based on analytical formulas and our heuristic algorithm, we propose how to estimate the buffered delay for movable TSV cases and fixed TSV cases. The effectiveness of our delay estimation technique is demonstrated with various 3D nets. Compared with van Ginneken buffer insertion based delay estimation, our estimation provides solutions about 750 times faster with almost the same estimated delay.

Index Terms: 3D IC, through-silicon-via, delay estimation, buffer insertion.

I. INTRODUCTION

As the physical limit for technology scaling approaches and the cost of IC fabrication soars, the 3D IC is considered a viable way to preserve Moore's law. The benefits of 3D ICs have been advocated, and much research has been done on materials, fabrication, design methodology, testing, and so on, for successful commercialization. For fast adoption, designers may choose to partition existing designs into blocks and then place them on different dies along with IP blocks.
To enable this so-called block-level 3D IC design methodology, a reasonably good timing estimation technique is required in early design stages, such as architectural design space exploration, floorplanning, or timing-driven placement.

Buffer insertion for 2D ICs was studied in closed analytical formulations [4]. After the pioneering work of van Ginneken [16], which adopted dynamic programming, efforts were made toward generalization [9], speed-up [15], and higher accuracy [2]. However, the van Ginneken algorithm usually takes considerable computation time, especially when wires are segmented for candidate buffer locations [1]. In early design stages, we need a fast yet reasonably good delay estimation. The buffer locations do not have to be very accurate, because the actual buffers may be placed at different locations as more layout information becomes available in later design stages.

For 3D ICs, a buffer planning algorithm at the floorplanning stage was recently proposed in [5]. However, buffer delay was assumed to be constant, irrespective of RC load, and TSV parasitics were not considered. Since TSVs have large parasitic capacitance, it is natural to consider it in the delay estimation. Furthermore, depending on the design flow, design constraints, and manufacturing issues, we may or may not be allowed to move TSVs in the 3D nets. In this work, we demonstrate how to perform buffered delay estimation for TSV-based 3D interconnects, for both movable TSV and fixed TSV cases. Our buffered delay estimation technique takes two steps. First, we quickly determine the number of buffers and their locations using simple analytical delay models. Then we evaluate the buffered delay with more elaborate delay models.

This material is based upon work supported by the Semiconductor Research Corporation (SRC) under the Integrated Circuit & Systems Sciences (ICSS, Task ID: & ) and the Interconnect Focus Center (IFC, Theme ID: ).
The major contributions of this work are as follows:

• We compare existing analytical delay models for various 3D nets with TSVs. For gate delay, the linear model and the k-factor equation based model are examined, along with lumped and effective capacitance models. For net delay, the Elmore model, a moment based model (WED), and a technique for using step-input-based net delay models for ramp inputs (PERI) are explored. We discuss which models are suitable for buffered delay estimation.

• We provide fast delay estimation techniques for buffered 3D nets in TSV-based 3D ICs. Two different cases of 3D nets are discussed: the movable TSV case and the fixed TSV case. For the movable TSV case, buffer locations and TSV locations are determined, whereas for the fixed TSV case, buffer locations are determined with consideration of the given TSV locations. Our delay estimation technique can be used for architecture exploration, floorplanning, or timing-driven placement.

• We demonstrate the effectiveness of our delay estimation technique with layout experiments on various 3D nets. Compared with the widely used van Ginneken style buffer insertion based delay estimation method, our estimation provides solutions about 750 times faster with almost the same or a few percent higher delay.

The remainder of this paper is organized as follows. Our 3D IC structure and TSV parasitics are introduced, and gate and net delay models are investigated, in Section II. Then analytical buffer insertions for movable and fixed TSV cases, as well as our heuristic buffer insertion algorithm for delay estimation, are presented in Section III. Experimental results are shown in Section IV, followed by conclusions in Section V.

II. ANALYTICAL DELAY MODELS

Our 3D IC structure is shown in Fig. 1(a). Although only two dies are depicted, the whole chip may have multiple dies stacked, in which TSVs go through thinned substrates. Based on the Nangate 45nm standard cell library [12], our TSV macro occupies four standard cell rows as shown in Fig. 1(b). The TSV diameter is 2µm.
It is well known that TSVs have large parasitics that affect timing. Depending on TSV dimensions, materials, and manufacturing process, the magnitude of the TSV parasitics may vary [8]. Each TSV has a parasitic capacitance ($C_{TSV}$) and a resistance ($R_{TSV}$) and is represented by a π-model with two capacitors (each $C_{TSV}/2$) and a resistor, as shown in Fig. 1(c). The inductance of the TSV is ignored because it is not dominant at signal speeds below a few GHz. Due to the TSV parasitics, 1) gate and net delays are affected by the number of TSVs and their locations, and 2) nets with TSVs no longer have uniform unit-length R and C along the path, which is different from the uniform net RC assumption in previous 2D analytical buffer insertion works [4][1]. For multi-fanout nets, assuming that non-critical paths can be decoupled by inserting infinitesimally small offloading buffers [3], we have two-pin 3D nets that connect the source gate to the critical sink gate, with TSVs along the path. Thus, for the rest of this work, we focus on two-pin nets.

(IEEE, Int'l Symposium on Quality Electronic Design.)

Fig. 1. (a) Side view of the 3D IC, (b) top view of a TSV, and (c) π-model of TSV parasitics. PP (M1) and LP (M6) represent pin pad on metal1 and landing pad on metal6, respectively. Backside metallization on Die 0 is hidden for simplicity. Dashed lines in (b) denote standard cell row boundaries. Dimensions are in µm.

TABLE I. PARAMETERS USED IN THIS WORK. THE BUFFER CELL BUF X8 IS USED. $C_g$ IS GATE INPUT CAPACITANCE.
  $K_g$        [value missing] ps
  $R_g$        0.303 kΩ
  $C_g$        6.585 fF
  $k_1$–$k_5$  [values missing; only the fragments E-5 and E-8 survive]
  $C$          0.102 fF/µm
  $R$          1.5 Ω/µm
  $C_{TSV}$    25 fF
  $R_{TSV}$    1 Ω

A. Gate Delay Model

The linear gate delay model has been used extensively in timing optimization works. Linear gate delay is expressed as follows:

$$D_{g,linear} = K_g + R_g C_L$$

where $K_g$ and $R_g$ are the intrinsic delay and intrinsic resistance, and $C_L$ is the lumped load capacitance at the output pin of the gate. We fit the above equation to the actual gate delay and obtain the parameters $K_g$ and $R_g$. Note that the parameters change with different slew values (= transition times) at the input pin of the gate; however, this dependence is usually ignored. As discussed in [2], the linear gate delay model is inaccurate because 1) due to resistive shielding [13], the lumped load capacitance is an overestimate of the effective capacitance [14] seen at the gate output, and 2) gate delay is not a linear function of load capacitance.
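As a minimal sketch of the linear model above, the following Python snippet evaluates $D_{g,linear}$ with a lumped load. The function names and the unit scaling (kΩ × fF = ps) are our own choices, and the $K_g$ value is a labeled placeholder because the Table I entry did not survive; only $R_g$, $C_g$, $C$, and $C_{TSV}$ are taken from Table I.

```python
# Table I values, scaled so that kOhm * fF = ps.
R_G_KOHM = 0.303           # R_g, intrinsic resistance of BUF X8
C_G_FF = 6.585             # C_g, gate input capacitance
C_WIRE_FF_PER_UM = 0.102   # C, unit-length wire capacitance
C_TSV_FF = 25.0            # C_TSV, TSV capacitance


def lumped_load_ff(wirelength_um: float, n_tsv: int) -> float:
    """Lumped load: wire, TSV, and sink-gate capacitance simply added up."""
    return C_WIRE_FF_PER_UM * wirelength_um + n_tsv * C_TSV_FF + C_G_FF


def linear_gate_delay_ps(k_g_ps: float, c_load_ff: float) -> float:
    """D_g,linear = K_g + R_g * C_L."""
    return k_g_ps + R_G_KOHM * c_load_ff


# Hypothetical intrinsic delay, NOT the Table I value (which is missing).
K_G_PLACEHOLDER_PS = 30.0

c_load = lumped_load_ff(1000.0, n_tsv=1)
print(c_load, linear_gate_delay_ps(K_G_PLACEHOLDER_PS, c_load))
```

Because the lumped load ignores resistive shielding, this estimate is expected to sit above the delay obtained with an effective capacitance, which is the first inaccuracy listed above.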
The first problem can be solved by adopting an effective capacitance model, while the second problem is dealt with by the k-factor equation gate delay model [14]. In the effective capacitance calculation, the RC network is reduced to a π-model ($C_n$, $R_\pi$, $C_f$) in which $R_\pi$ models the resistive shielding effect. Then the effective capacitance ($C_{eff}$) at the gate output is computed as in [14]. The k-factor equation for gate delay is:

$$D_{g,k\text{-}factor} = (k_1 + k_2 C_{eff})\,S_{g,in} + k_3 C_{eff}^3 + k_4 C_{eff} + k_5$$

where $S_{g,in}$ is the signal slew at the gate input and $k_1$ to $k_5$ are curve-fitting parameters.

B. Net Delay Model

The basic net delay model is given by the Elmore delay equation [6], which corresponds to the first moment of the impulse response. The Elmore delay has been used in various timing optimization works because it is easy to compute and the delay is additive, meaning that the delay from A to C is the delay from A to B plus the delay from B to C. This additive characteristic supports the optimal substructure of dynamic programming in the original van Ginneken buffer insertion [16]. The shortcoming of Elmore delay is that it may deviate from the actual delay by orders of magnitude [2].

For higher accuracy, we may use a moment-matching based delay metric. In this study, we evaluate WED [11], because it requires a small amount of computation and is reported to be more accurate than the Elmore delay model. Two moments are computed in a bottom-up traversal in van Ginneken dynamic programming as in [2], then two simple 1D table lookups are needed to obtain the estimated net delay. The problem with WED is that the model works only for step inputs. In real circuits, an input signal has a finite slew, which makes WED significantly underestimate the actual delay. Thus we also adopt the PERI method [7], which converts the delay from delay models for step inputs to the delay for ramp inputs.
However, since the WED and PERI methods depend on signal slew, and in early design stages we usually cannot determine signal slews on nets, it may not be appropriate to use WED and PERI in early design stages.

C. Experiments for Delay Models

We perform two experiments to compare the delay metrics: (1) for a 3D net of wirelength L = 1000µm (= distance from source to sink gate) with one TSV, vary the TSV location, and (2) for 3D nets of various wirelengths with two TSVs, vary the TSV locations. For all experiments we use the Nangate 45nm standard cell library. For the same reason as in [3], we assume that all gates (sources, sinks, and buffers) are the same (BUF X8). We also assume that the router uses metal5 with unit-length capacitance C and resistance R. The parameters used in this work are summarized in Table I. As a reference, delay values from PrimeTime are shown. The layout of each net is performed in Cadence Encounter, followed by RC extraction using Cadence QRC. Then, combined with the RC modeling of TSVs in Fig. 1, we run 3D static timing analysis in Synopsys PrimeTime to obtain the delay values.

1) A 3D Net with One TSV: For a 3D net of wirelength L = 1000µm, we vary the TSV location F (measured from the source gate) from 0 to 1000µm. As shown in Fig. 2, when the TSV is moved away from the source gate (= increasing F), the gate delay decreases while the net delay increases. Gate delay decreases because the effective load capacitance decreases, as shown on the bottom right of Fig. 2. Due to the resistive shielding effect on the TSV (the resistance of the wire between the source gate and the TSV), the capacitance seen by the source gate decreases as the TSV moves towards the sink gate. The linear gate delay model with lumped capacitance fails to follow this trend. Note that the gate intrinsic delay ($K_g$ in Table I) is comparable to the gate delay, which suggests that ignoring the intrinsic delay may incur a large error in the estimated delay. Elmore follows the PrimeTime net delay closely, while WED constantly underestimates it.
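The Elmore net delay trend in this single-TSV experiment can be reproduced with a short sketch. The code below assumes a distributed wire on each side of the TSV and the π-model of Fig. 1(c); the function name and unit scaling (kΩ × fF = ps) are our own, and the numbers come from Table I only, not from the PrimeTime runs.

```python
# Table I values, scaled so that kOhm * fF = ps.
R_WIRE = 1.5e-3   # R, kOhm per um
C_WIRE = 0.102    # C, fF per um
R_TSV = 1.0e-3    # R_TSV, kOhm
C_TSV = 25.0      # C_TSV, fF
C_GATE = 6.585    # C_g, fF (sink gate input capacitance)


def elmore_net_delay_ps(total_len_um: float, tsv_pos_um: float) -> float:
    """Elmore delay from gate output to sink for a net with one TSV.

    tsv_pos_um is F, the TSV distance from the source gate.
    """
    front = tsv_pos_um
    back = total_len_um - tsv_pos_um
    cap_after_tsv = C_WIRE * back + C_GATE
    # Wire before the TSV drives half of its own cap plus everything downstream.
    wire_front = R_WIRE * front * (C_WIRE * front / 2 + C_TSV + cap_after_tsv)
    # TSV resistor sees the far half of C_TSV plus the downstream wire and gate.
    tsv = R_TSV * (C_TSV / 2 + cap_after_tsv)
    wire_back = R_WIRE * back * (C_WIRE * back / 2 + C_GATE)
    return wire_front + tsv + wire_back


for f in range(0, 1001, 250):
    print(f, round(elmore_net_delay_ps(1000.0, f), 3))
```

Under these assumptions the net delay grows linearly with F, with slope $R\,C_{TSV} - R_{TSV}\,C$ per µm, which matches the direction of the net delay trend described for Fig. 2.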
For total delay, linear gate delay with lumped capacitance and Elmore net delay (lin+clu+elm) significantly overestimates the delay, while k-factor gate delay with effective capacitance and Elmore net delay (kfa+cef+elm) follows the PrimeTime delay closely.

2) 3D Nets with Two TSVs: For various 3D nets with different wirelengths and two TSVs, we pick the TSV locations randomly. Table II summarizes the delay from the analytical delay models and PrimeTime. Again, the same trend as in the previous subsection is observed. The linear+clump model significantly overestimates the gate delay, while kfactor+ceff shows values close to the PrimeTime results. This is mainly because of the difference between Clump and Ceff, shown in the rightmost column. Depending on L, F, and G, Elmore under- or overestimates the net delay (compared with the PrimeTime delay), while WED+PERI always underestimates it.

From the above experiments, it may seem that the simple linear gate delay with lumped capacitance and Elmore net delay are not suitable for buffered delay estimation. However, because of their fidelity, we claim that these models are still useful for determining a buffer solution, as discussed in the following section.

Fig. 2. Source gate delay ($D_g$), net delay ($D_n$), segment delay ($D_{seg} = D_g + D_n$), and load capacitance ($C_{load}$) for the 3D net with a single TSV. Clump/Clu and Ceff/Cef mean lumped capacitance and effective capacitance. WE, PE, and PT stand for WED, PERI, and PrimeTime.

TABLE II. COMPARISON OF SOURCE GATE AND NET DELAY VALUES BY ANALYTICAL DELAY MODELS AND PRIMETIME. THE LUMPED AND EFFECTIVE CAPACITANCES OF THE NET ($C_n$) ARE ALSO SHOWN. L, F, AND G REPRESENT WIRELENGTH, DISTANCE FROM SOURCE GATE TO THE FIRST TSV, AND DISTANCE FROM THE FIRST TSV TO THE SECOND TSV. THE LENGTH, DELAY, AND CAPACITANCE VALUES ARE IN µm, ps, AND fF, RESPECTIVELY.
  Columns: L, F, G | $D_g$: linear+clump, kfactor+ceff, PT | $D_n$: Elmore, WED+PERI, PT | $D_{seg} = D_g + D_n$: lin+clu+elm, kfa+cef+elm, PT | $C_n$: Clump, Ceff
  Last row: ave

III. ANALYTICAL BUFFER INSERTION

Our buffered delay estimation technique takes two steps. First, we quickly determine the number of buffers and their locations with simple analytical delay models. Then we evaluate the buffered delay with more elaborate delay models. This section explains the first step. Note that the buffers are not actually inserted into the netlist; we assume the buffers are temporarily inserted for evaluating the buffered delay.
The final outcome of our technique is the estimated buffered delay, not the exact buffer locations.

Our 3D buffer insertion problem is described as follows: During an early design stage (e.g., floorplanning), for given unbuffered 3D nets with estimated routes (before detailed routing), we estimate the buffered delay of the 3D nets. The estimated 3D routes are constructed by adopting the 3D rectilinear Steiner tree [10] or the 3D rectilinear minimum spanning tree algorithm. Depending on the design flow and other design constraints, we may be allowed to move TSVs in the 3D nets. Thus we categorize the problem into two cases:

1) Movable TSV case: We determine/suggest the number of buffers and their locations as well as the optimal TSV locations to estimate the lowest achievable buffered 3D net delay.
2) Fixed TSV case: Given the TSV locations, we carefully insert buffers to estimate a reasonably low buffered 3D net delay.

In this section, we provide the optimal buffer insertion solutions for several 3D net cases by solving analytical formulations. An analytical buffer insertion solution provides the optimal number of buffers and their locations. Furthermore, from analytical solutions we gain insights into how the buffers should be inserted for optimal buffered delay. Note that the analytical formulations are based on linear gate delay with lumped load capacitance and Elmore net delay.

We define the terminology used in our analytical formulations. As shown in Fig. 3, a segment is a buffer and the wire to the downstream buffer, and a TSV segment is a segment with TSVs in it. A segment length is defined as the wirelength of the segment, excluding the TSV size. A TSV interval is the interval between two adjacent TSVs, or the interval between the leftmost (or rightmost) TSV and the source (or sink) gate. Considering a buffer (shown in shaded blue), left is towards upstream and right is towards downstream. The current interval is the TSV interval that the buffer belongs to, and the left/right interval is the TSV interval to the left/right. The left/right TSV is the left/right TSV of the current interval.

Fig. 3. Terminologies for analytical formulations.

Fig. 4. A TSV segment and its RC-tree model: (a) a TSV segment, (b) RC-tree model.

A. Movable TSV Case

For movable TSV cases, our suggestions are as follows. Perform 2D buffer insertion without considering TSVs, then place the TSVs close to any buffer output. When the number of TSVs is more than the number of buffers, place some of the TSVs back to back near a buffer output. Now we explain the reasons for these suggestions.
Theorem 1: The delay of a TSV segment is minimized by placing the TSV right after the driving buffer.

Proof: Consider the following problem shown in Fig. 4: Given the wirelength between two buffers (L), determine the TSV location (x) so that the delay is minimized. The segment delay is formulated as follows:

$$D_{seg} = K_g + R_g\big(Cx + C_{TSV} + C(L-x) + C_g\big) + Rx\big(Cx/2 + C_{TSV} + C(L-x) + C_g\big) + R_{TSV}\big(C_{TSV}/2 + C(L-x) + C_g\big) + R(L-x)\big(C(L-x)/2 + C_g\big)$$

To find the optimal x, we differentiate $D_{seg}$ with respect to x:

$$dD_{seg}/dx = R\,C_{TSV} - R_{TSV}\,C$$

Since $R\,C_{TSV} \gg R_{TSV}\,C$, $dD_{seg}/dx > 0$. Thus, $D_{seg}$ is minimum when x = 0, i.e., the TSV is placed right after the driving buffer, regardless of L.

Theorem 2: The path delay is optimal when non-TSV segments are of the same length.

Proof: Consider the path delay excluding the TSV segments. This path delay is optimal when the distances between buffers are all the same, as shown in [1]. Thus, the non-TSV segments are of the same length.

Theorem 3: The path delay does not change regardless of the TSV segment locations.

Proof: From Theorem 2, we may assume that all non-TSV segments have the same length ($= l_s$), and the TSV segments have a different length ($= l_t$). Let the delay of a non-TSV segment and a TSV segment be $D_s$ and $D_t$, respectively. When k buffers are inserted for a 3D net with n TSVs, there are $(k-n+1)$ non-TSV segments and n TSV segments. The path delay is the sum of the delays of the segments, i.e., $D_{path} = nD_t + (k-n+1)D_s$, regardless of the TSV segment locations.

From Theorem 1, the optimal solution should place the TSV right after a buffer. Furthermore, by Theorem 3, the optimal delay does not change regardless of the TSV segment location. Now we find the optimal buffer insertion solution in a similar way as in [1]. With the assumptions in the proof of Theorem 3, when the total length is L, $l_t = (L - (k-n+1)\,l_s)/n$.
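For readers who want to check the derivative used in the proof of Theorem 1, the following worked expansion differentiates each term of $D_{seg}$ separately. It uses only the symbols of the segment delay equation above; the term-by-term grouping is our own presentation.

```latex
\begin{align*}
\frac{d}{dx}\Big[R_g\big(Cx + C_{TSV} + C(L-x) + C_g\big)\Big] &= 0 \\
\frac{d}{dx}\Big[Rx\big(\tfrac{Cx}{2} + C_{TSV} + C(L-x) + C_g\big)\Big]
  &= R\,(C_{TSV} + CL + C_g) - RCx \\
\frac{d}{dx}\Big[R_{TSV}\big(\tfrac{C_{TSV}}{2} + C(L-x) + C_g\big)\Big] &= -R_{TSV}\,C \\
\frac{d}{dx}\Big[R(L-x)\big(\tfrac{C(L-x)}{2} + C_g\big)\Big] &= -RC(L-x) - R\,C_g \\[4pt]
\frac{dD_{seg}}{dx} &= R\,C_{TSV} - R_{TSV}\,C
\end{align*}
```

With the Table I values, $R\,C_{TSV} = 1.5 \times 25 = 37.5$ Ω·fF/µm while $R_{TSV}\,C = 1 \times 0.102 = 0.102$ Ω·fF/µm, so the derivative is positive for every x and L.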
Then, the path delay is:

$$D_{path}(k) = nD_t + (k-n+1)D_s = n\{K_g + R_g(C_{TSV} + Cl_t + C_g) + R_{TSV}(C_{TSV}/2 + Cl_t + C_g) + Rl_t(Cl_t/2 + C_g)\} + (k-n+1)\{K_g + R_g(Cl_s + C_g) + Rl_s(Cl_s/2 + C_g)\}$$

which is quadratic in $l_s$ when $l_t$ is substituted in. To find the optimal $l_s$, we differentiate it and set the derivative to zero: $dD_{path}/dl_s = 0$. The solution is:

$$l_s = \frac{1}{k+1}\Big(L + n\frac{R_{TSV}}{R}\Big), \qquad l_t = \frac{1}{k+1}\Big(L - (k-n+1)\frac{R_{TSV}}{R}\Big)$$

Substituting $l_s$ and $l_t$ into $D_{path}(k)$, we find the optimal k from the condition $D_{path}(k-1) > D_{path}(k)$, i.e., k is the largest buffer count for which adding the k-th buffer still reduces the path delay. The solution is:

$$k = \sqrt{\frac{RC\,(L + nR_{TSV}/R)^2}{2\,(K_g + R_g C_g)}}$$

Since $R_{TSV}/R \ll L$, by approximation we get:

$$l_s = l_t \approx \frac{L}{k+1}, \qquad k \approx \sqrt{\frac{RCL^2}{2\,(K_g + R_g C_g)}}$$

Note that there is no TSV-related term in the solution equations. In fact, the approximated solution is the same as the 2D solution without TSVs. Thus, we can insert buffers with the above equations and then relocate TSVs to minimize delay.

B. Fixed TSV Case

For fixed TSV cases, it is not easy to find the optimal solution. We may formulate the delay equations in matrix form using the linear gate delay model and the Elmore net delay model and solve them for delay minimization. However, this is computationally expensive and, as shown in Section II, the actual delay differs from the simple delay models, so the calculation effort is not worthwhile. In the early design stage, we prefer a quick buffer insertion solution that gives a delay reasonably close to the optimum delay. We first show the analytical solutions for several example cases, then we propose a heuristic algorithm that is simple and fast yet gives reasonably good delay estimates.

Fig. 5. A TSV right after source, and the experimental results with wirelength (L) = 2, 4mm. $D_{path}$ is the delay from the source to the sink gate. PT means PrimeTime.

Fig. 6. A TSV right before sink.

1) A TSV Right After Source: As one extreme, we assume the fixed TSV location is right after the source gate. We find the number of buffers (k) and their locations between the TSV and the sink gate. The path delay from the source gate to the sink gate is:

$$D_{path}(k) = K_g + R_g\big(C_{TSV} + C(L-kd) + C_g\big) + R_{TSV}\big(C_{TSV}/2 + C(L-kd) + C_g\big) + R(L-kd)\big(C(L-kd)/2 + C_g\big) + k\{K_g + R_g(Cd + C_g) + Rd(Cd/2 + C_g)\}$$

This is quadratic in terms of d. To find the optimal d, we differentiate it and set the derivative to zero: $dD_{path}/dd = 0$. The solution is:

$$d = \frac{L + R_{TSV}/R}{k+1}$$

We substitute d into $D_{path}(k)$. The delay with k buffers is smaller than the delay with $k-1$ buffers when $D_{path}(k) < D_{path}(k-1)$. That is, $D_{path}(k)$ decreases with larger k up to:

$$k < \sqrt{\frac{RC\,(L + R_{TSV}/R)^2}{2\,(K_g + R_g C_g)}}$$

Since $L \gg R_{TSV}/R$, we may ignore the $R_{TSV}/R$ term in the above solution. Then,

$$k < \sqrt{\frac{RCL^2}{2\,(K_g + R_g C_g)}}, \qquad d = \frac{L}{k+1} \qquad (1)$$

which is the same as the 2D solution [1]. Thus, when a TSV (or stacked TSVs) is placed right after the source gate, we can insert k buffers from the above equations at equal distances, separated by d.

As shown on the right of Fig. 5, we performed layout experiments for the case of a TSV right after the source gate. For 3D nets of wirelength (L) 2 and 4mm, we increase the number of buffers (k) and place them at equal distances. For L = 2mm, the minimum delay occurs at k = 2 for PrimeTime, linear+clump+elmore, and kfactor+ceff+elmore. The linear+clump+elmore model overestimates the buffered delay; however, the optimal number of buffers and their locations are the same as those by PrimeTime.
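The closed-form spacing and buffer count for the TSV-right-after-source case can be cross-checked by evaluating $D_{path}(k)$ directly. The sketch below is only an illustration: the function names are ours, units are scaled so that kΩ × fF = ps, and the intrinsic delay passed in as `k_g_ps` is a hypothetical placeholder because the Table I value of $K_g$ is not available here.

```python
import math

# Table I values, scaled so that kOhm * fF = ps.
R_G = 0.303       # R_g, kOhm
C_GATE = 6.585    # C_g, fF
R_WIRE = 1.5e-3   # R, kOhm per um
C_WIRE = 0.102    # C, fF per um
R_TSV = 1.0e-3    # R_TSV, kOhm
C_TSV = 25.0      # C_TSV, fF


def spacing_um(total_len_um: float, k: int) -> float:
    """Optimal buffer spacing d = (L + R_TSV/R) / (k + 1)."""
    return (total_len_um + R_TSV / R_WIRE) / (k + 1)


def path_delay_ps(total_len_um: float, k: int, k_g_ps: float) -> float:
    """D_path(k) with the TSV right after the source and k buffers spaced by d."""
    d = spacing_um(total_len_um, k)
    first = total_len_um - k * d  # wire length of the TSV segment
    tsv_seg = (k_g_ps + R_G * (C_TSV + C_WIRE * first + C_GATE)
               + R_TSV * (C_TSV / 2 + C_WIRE * first + C_GATE)
               + R_WIRE * first * (C_WIRE * first / 2 + C_GATE))
    plain_seg = (k_g_ps + R_G * (C_WIRE * d + C_GATE)
                 + R_WIRE * d * (C_WIRE * d / 2 + C_GATE))
    return tsv_seg + k * plain_seg


def best_buffer_count(total_len_um: float, k_g_ps: float, k_max: int = 50) -> int:
    """Buffer count in [0, k_max] that minimizes D_path(k)."""
    return min(range(k_max + 1), key=lambda k: path_delay_ps(total_len_um, k, k_g_ps))


def k_upper_bound(total_len_um: float, k_g_ps: float) -> float:
    """Square-root bound on k from the closed-form solution."""
    a = total_len_um + R_TSV / R_WIRE
    return math.sqrt(R_WIRE * C_WIRE * a * a / (2 * (k_g_ps + R_G * C_GATE)))


K_G_PLACEHOLDER_PS = 30.0  # hypothetical, NOT the Table I value
for length in (2000.0, 4000.0):
    print(length, best_buffer_count(length, K_G_PLACEHOLDER_PS),
          k_upper_bound(length, K_G_PLACEHOLDER_PS))
```

The brute-force minimizer always falls below the square-root bound, since the exact condition from $D_{path}(k) < D_{path}(k-1)$ is on $k(k+1)$ and the bound approximates it by $k^2$. The printed counts depend entirely on the placeholder $K_g$ and should not be read as results from the paper.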
For L = 4mm, the optimal k for PrimeTime and linear+clump+elmore is 5, while that of kfactor+ceff+elmore is 6. Yet, the PrimeTime delay difference between k = 5 and 6 is small. In fact, k from 3 to 7 are all good solutions in terms of buffered delay. In sum, although linear+clump+elmore overestimates the delay, it gives the same optimal k as PrimeTime does. This suggests that the simple linear+clump+elmore model has good fidelity in terms of buffer count and location. On the other hand, kfactor+ceff+elmore may give a slightly higher k than PrimeTime does. However, with the optimal k, the estimated delay is very close to the PrimeTime delay. From this experiment, we conclude that (1) for determining the number of buffers and their locations, we may use the linear+clump+elmore delay models for simplicity, and (2) for delay calculation after a buffer solution is obtained, we need to use more elaborate delay models (such as kfactor+ceff+elmore) for accurate delay estimation.

2) A TSV Right Before Sink: As another extreme, we assume the fixed TSV location is right before the sink gate, as shown in Fig. 6. The path delay from the source gate to the sink gate is:

$$D_{path}(k) = k\{K_g + R_g(Cd + C_g) + Rd(Cd/2 + C_g)\} + K_g + R_g\big(C(L-kd) + C_{TSV} + C_g\big) + R(L-kd)\big(C(L-kd)/2 + C_{TSV} + C_g\big) + R_{TSV}\big(C_{TSV}/2 + C_g\big)$$

By the same method as in the previous subsection, we get:

$$k < \sqrt{\frac{RC\,(L + C_{TSV}/C)^2}{2\,(K_g + R_g C_g)}}, \qquad d = \frac{L + C_{TSV}/C}{k+1}$$

The $C_{TSV}/C \approx 123$µm is not negligible when compared with L (usually around 700µm after buffer insertion). Thus the optimal k and d are different from the 2D analytical solution. In fact, as shown in Section II, the delay of a TSV segment increases as the TSV moves downstream. With this extreme TSV location, the delay of the TSV segment is relatively larger than that of the other segments, thus it is intuitive to move buffers towards the TSV.
Comparing the k and d to the 2D analytical solution in the previous subsection, we see that it is equivalent to the solution for a 2D net with wirelength $L + C_{TSV}/C$, except for the last segment.

C. 3D Heuristic Buffer Insertion for Fixed TSV Case

Since it is hard to find analytical solutions for general 3D nets with multiple fixed TSVs, we propose the 3D heuristic buffer insertion algorithm (3Dheu), which is outlined in Algorithm 1. Our algorithm starts from the sink and performs a single bottom-up traversal to determine the locations of buffers. For explanation we use the terminology defined at the beginning of Section III. As we traverse up towards the source gate, each buffer location is determined. The current buffer location serves as an anchor for the next buffer upstream. For determining the current buffer location, we first use the wirelength from the right (downstream, already determined) buffer to the source gate to determine d in Eq. 1 (Algorithm 1, Line 4). Note that we ignore TSVs in calculating this d. Then the current buffer is temporarily placed with a segment length of d (v in Algorithm 1, Line 9). Then, depending on how many TSVs are between the current buffer and the right buffer, we adjust the current buffer location. How much we need to adjust is discussed below. After adjustment, we continue the process until no more buffers are needed (Algorithm 1, Line 6).

Algorithm 1: The proposed 3Dheu algorithm.
Input: a two-pin net with locations of sink gate and TSVs
Output: buffer location list BLlist
 1  u ← sink gate location;
 2  while u > 0 do
 3      L ← u;
 4      calculate d using Eq. 1;
 5      if d = u then
 6          break;  // no more buffers needed
 7      end
 8      M ← 2·d;
 9      v ← u − M + d;
10      count N_rTSV between u and v;
11      calculate x using Eq. 2;
12      w ← u − M + x;
13      while v and w are in different TSV intervals do
14          if w is in right interval of v then
15              move v to the center of the right interval;
16              w ← w − C_TSV/(2C);
17          end
18          else if w is in left interval of v then  // move direction changed
19              move v to the rightmost of the left interval;
20              w ← w + C_TSV/(2C);
21              if v and w are in different TSV intervals then
22                  w ← v;
23              end
24              break;  // no further adjustment needed
25          end
26      end
27      u ← w;
28      add u into BLlist;
29  end

We now provide the details on how to adjust the current buffer location. In Fig. 7(a), a TSV exists on the right of the current buffer. Here, M is the distance spanned by three adjacent buffers in 2D analytical buffer insertion (M = 2d, where d is from Eq. 1). The delay from the left (yet to be determined) buffer to the right (already determined) buffer is:

$$D_{buf} = K_g + R_g(Cx + C_g) + Rx(Cx/2 + C_g) + K_g + R_g\big(C(M-x-F) + C_{TSV} + CF + C_g\big) + R(M-x-F)\big(C(M-x-F)/2 + C_{TSV} + CF + C_g\big) + R_{TSV}\big(C_{TSV}/2 + CF + C_g\big) + RF\big(CF/2 + C_g\big)$$

This is quadratic in terms of x. To find the optimal x, we differentiate it and set the derivative to zero: $dD_{buf}/dx = 0$.
The solution is:

$$x = M/2 + C_{TSV}/(2C)$$

That is, we need to move the current buffer from the temporary location (x = M/2) towards the right by $C_{TSV}/(2C)$. In Fig. 7(b), two TSVs exist on the right of the current buffer. By a similar method, the optimal $x = M/2 + (C_{TSV}/(2C)) \cdot 2$. In general, when there are $N_{rTSV}$ TSVs between the current buffer and the right buffer, the optimal location of the current buffer is:

$$x = M/2 + \big(C_{TSV}/(2C)\big) \cdot N_{rTSV} \qquad (2)$$

Fig. 7. Adjusting the current buffer location with (a) a TSV and (b) two TSVs on the right.

In Fig. 8(a), a TSV exists on the left of the current buffer. The delay from the left buffer to the right buffer is:

$$D_{buf} = K_g + R_g\big(CF + C_{TSV} + C(x-F) + C_g\big) + RF\big(CF/2 + C_{TSV} + C(x-F) + C_g\big) + R(x-F)\big(C(x-F)/2 + C_g\big) + K_g + R_g\big(C(M-x) + C_g\big) + R(M-x)\big(C(M-x)/2 + C_g\big)$$

The optimal x = M/2. That is, the TSV on the left (= upstream) does not affect the optimal location of the current buffer. This is because the current buffer does not see the (upstream) TSV as load. In Fig. 8(b), we show a general case where both left and right TSVs exist. In determining the current buffer location, we just need to count the TSVs on the right of the current buffer, then multiply the count by $C_{TSV}/(2C)$ to get the adjustment distance, as in Eq. 2.

Fig. 8. Adjusting the current buffer location with (a) a TSV on the left. In (b), a more general case is shown.

Depending on the amount of adjustment, the current buffer may cross the right TSV, sometimes multiple TSVs. In that case, we do not move it to the target location at once, to avoid possible ping-pong situations. Instead, we move the current buffer across one TSV at a time, as shown in Fig. 9. When the current buffer is moved to the neighboring interval, the buffer sees a different number of TSVs on the right, so the optimal location changes. If it moved towards the right interval, $N_{rTSV}$ in Eq. 2 decreases by one, thus the optimal location moves towards the left by $C_{TSV}/(2C)$ (Algorithm 1, Line 16). On the other hand, if it moved towards the left interval, the optimal location moves towards the right by $C_{TSV}/(2C)$ (Algorithm 1, Line 20). If the adjusted optimal location is in the new interval, the buffer location is finalized (exit condition of the while loop in Algorithm 1, Line 13). If the new location is outside the new interval and the move direction is the same, we move the temporary buffer location into the new interval (Algorithm 1, Line 15). However, if the new location is outside the new interval and the move direction changes, i.e., the new optimal buffer location is in the right (previously visited) interval, we push the buffer towards the right extreme of the interval (Algorithm 1, Line 22) so that the distance between the buffer and the TSV is minimum, which reduces the delay of the TSV segment.

Fig. 9. Adjusting the current buffer location across (a) the right TSV and (b) the left TSV.

TABLE IV. COMPARISON OF THE ESTIMATED DELAY WITH FIXED TSV CASE AND MOVABLE TSV CASE. THE DELAY IS IN ps.
  Columns: net | Fixed TSV: #buf, delay | Movable TSV: #buf, delay
  Rows: five nets (n...) and average
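The adjustment rule of Eq. 2 can be checked numerically for the single-TSV configuration of Fig. 7(a). The following sketch compares the closed-form location with a grid search over the two-segment delay $D_{buf}$; the function names, the grid step, and the example values of M and F are our own assumptions, and $K_g$ is left at zero because it is a constant offset that does not move the minimum.

```python
# Table I values, scaled so that kOhm * fF = ps.
R_G = 0.303       # R_g, kOhm
C_GATE = 6.585    # C_g, fF
R_WIRE = 1.5e-3   # R, kOhm per um
C_WIRE = 0.102    # C, fF per um
R_TSV = 1.0e-3    # R_TSV, kOhm
C_TSV = 25.0      # C_TSV, fF


def two_segment_delay_ps(x: float, m: float, f: float, k_g: float = 0.0) -> float:
    """D_buf for Fig. 7(a): left segment of length x, then a segment whose
    single TSV sits f um upstream of the already placed right buffer."""
    left = k_g + R_G * (C_WIRE * x + C_GATE) + R_WIRE * x * (C_WIRE * x / 2 + C_GATE)
    gap = m - x - f                      # wire between current buffer and TSV
    after_tsv = C_WIRE * f + C_GATE      # load downstream of the TSV
    right = (k_g + R_G * (C_WIRE * gap + C_TSV + after_tsv)
             + R_WIRE * gap * (C_WIRE * gap / 2 + C_TSV + after_tsv)
             + R_TSV * (C_TSV / 2 + after_tsv)
             + R_WIRE * f * (C_WIRE * f / 2 + C_GATE))
    return left + right


def eq2_location(m: float, n_right_tsv: int) -> float:
    """Eq. 2: x = M/2 + (C_TSV / (2C)) * N_rTSV."""
    return m / 2 + n_right_tsv * C_TSV / (2 * C_WIRE)


def grid_argmin(m: float, f: float, step: float = 0.5) -> float:
    """Brute-force minimizer of D_buf over x in [0, m - f]."""
    n_steps = int((m - f) / step)
    return min((i * step for i in range(n_steps + 1)),
               key=lambda x: two_segment_delay_ps(x, m, f))


M_UM, F_UM = 1400.0, 200.0   # hypothetical example geometry
print(eq2_location(M_UM, 1), grid_argmin(M_UM, F_UM))
```

Under these assumptions the grid minimum lands within one grid step of the Eq. 2 location, and with no TSV on the right the rule reduces to the 2D midpoint M/2.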
When the adjusted buffer location and the temporary buffer location are in different intervals, we take the temporary buffer location (Algorithm 1, Line 22). Then the buffer location is finalized (Algorithm 1, Line 24) because we found that further adjustments do not reduce the delay by much. At most, we need to move a buffer by the number of TSVs between the buffer and the right buffer, which is bounded by $N_{TSV}$. Thus the runtime complexity of the algorithm is $O(N_{TSV} \cdot k)$, where k is the number of buffers inserted.

IV. EXPERIMENTAL RESULTS

To demonstrate the effectiveness of our proposed delay estimation technique, we perform buffer insertions on example nets. The same parameters as in Table I are used.

A. 3D Nets with Fixed TSVs

The following buffer insertion methods are compared:

1) 2D analytical (2Dana): A 2D analytical buffer insertion, ignoring TSV locations.
2) Proposed 3D heuristic (3Dheu): Our proposed 3D buffer insertion heuristic.
3) Ginneken (Gin): van Ginneken buffer insertion with linear gate delay, lumped capacitance, and Elmore net delay models. Our van Ginneken implementation is similar to the VG in [2], with extensions for 3D IC handling. A single buffer type (BUF X8) is used.

The example nets are a mixture of purely random nets and nets from 3D design layouts (modified into two-pin form). All the experiments are based on layouts as explained in Section II-C. After buffer insertion, we evaluate the buffered delay using PrimeTime (which is accurate), to avoid unfair comparisons due to the inaccuracy of analytical delay models. For delay estimation during early design stages, we may use fast and reasonably accurate analytical delay models instead.

Table III shows the comparison of the estimated delay with different buffer insertion methods. In column 2Dana, we show two delay values: 1) 2D delay = delay without TSV parasitics, 2) TSV delay = delay with TSV parasitics.
The 2D delay underestimates the (actual 3D) TSV delay by about 23%, which indicates the importance of including TSV parasitics in delay estimation. We observe that 2Dana uses fewer buffers than the other methods, because it ignores TSV parasitics. As a result, compared with Gin, 2Dana overestimates the achievable delay of n2 and n4 by about 12% and 11%, respectively. Our 3Dheu uses about the same number of buffers as Gin does, and the average estimated delay is only 2.5% higher, which demonstrates the effectiveness of our algorithm. The average runtimes of 3Dheu and Gin per net are about 8 µs and 6 ms, so 3Dheu is about 750 times faster.

Fig. 10 shows the buffer insertion results for n4. In 2Dana, the second buffer from the left drives three TSVs, resulting in a large delay. Our 3Dheu placed one more buffer than Gin did. Although the buffer locations chosen by 3Dheu and Gin are different, the buffered delay values differ by only 3.0%, as shown in Table III.

B. 3D Nets with Movable TSVs

Table IV shows the comparison of the estimated delay with the fixed TSV case and the movable TSV case. We use the same target nets as in Table III. For the movable TSV case, we place the buffers based on 2D analytical buffer insertion (Eq. 1), then move the TSVs as discussed in Section III-A. The fixed TSV column data is from Gin in Table III. By moving TSVs as well as buffers, we can reduce the number of buffers and the delay by 29% and 5.8%, respectively. This demonstrates the effectiveness of moving TSVs to reduce the number of buffers as well as the buffered delay.

V. CONCLUSIONS

We presented fast buffered delay estimation techniques for TSV-based 3D interconnects. Analytical delay models were applied to the 3D interconnects, and analytical buffer insertion formulations were developed to derive optimal buffer insertion solutions. For fixed TSV cases, we proposed a fast buffer insertion heuristic method, which produced only a few percent of error in delay estimation

compared with van Ginneken buffer insertion based delay estimation. We also showed that for movable TSV cases, TSV relocation may improve delay. As a follow-up work, we plan to extend our algorithm to handle multi-fanout nets and perform buffered delay estimations on 3D interconnects of block-level 3D IC designs with buffer blockages.

TABLE III. Comparison of the estimated delay with different buffer insertion methods. The TSV location is the distance from the source gate to the TSV, excluding landing pad diameter. The length and delay are in µm and ps, respectively. Columns: net | wirelen. | #TSV | TSV location | 2Dana (#buf, 2D delay, TSV delay) | 3Dheu (#buf, delay) | Gin (#buf, delay); one row per net plus an average row. [Numeric entries are only partially preserved in this transcription. Surviving TSV location fragments, by row: row 2: ..., 1970, 1980; row 3: ..., 1740, 1890, 2470; row 4: ..., 1430, 1880, 1910, 2630, 2730; row 5: ..., 1570, 1790, 1800, 1850, 2260, 3460, 4120, 4520.]

Fig. 10. Buffer insertion results for n4: (a) TSV locations in n4, (b) 2Dana, (c) 3Dheu, (d) Gin. Numbers represent locations of TSVs and buffers in µm.

REFERENCES

[1] C. Alpert and A. Devgan. Wire Segmenting for Improved Buffer Insertion. In Proc. ACM Design Automation Conf.
[2] C. J. Alpert, A. Devgan, and S. T. Quay. Buffer Insertion With Accurate Gate and Interconnect Delay Computation. In Proc. ACM Design Automation Conf.
[3] C. J. Alpert, J. Hu, S. S. Sapatnekar, and C. N. Sze. Accurate Estimation of Global Buffer Delay Within a Floorplan. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 25(6), June.
[4] H. B. Bakoglu and J. D. Meindl. Optimal Interconnection Circuits for VLSI. IEEE Trans. on Electron Devices, 32(5), May.
[5] S. Dong, H. Bai, X. Hong, and S. Goto. Buffer Planning for 3D ICs. In Proc. IEEE Int. Symp. on Circuits and Systems.
[6] W. C. Elmore. The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers. J. Applied Physics, 19:55-63, Jan.
[7] C. V. Kashyap, C. J. Alpert, F. Liu, and A. Devgan. Closed-Form Expressions for Extending Step Delay and Slew Metrics to Ramp Inputs for RC Trees. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23(4), Apr.
[8] G. Katti, M. Stucchi, K. D. Meyer, and W. Dehaene. Electrical Modeling and Characterization of Through Silicon Via for Three-Dimensional ICs. IEEE Trans. on Electron Devices, 57(1), Jan.
[9] J. Lillis, C.-K. Cheng, and T.-T. Y. Lin. Optimal Wire Sizing and Buffer Insertion for Low Power and a Generalized Delay Model. IEEE Journal of Solid-State Circuits, 31(3).
[10] Chung-Wei Lin, Shih-Lun Huang, Kai-Chi Hsu, Meng-Xiang Lee, and Yao-Wen Chang. Multilayer Obstacle-Avoiding Rectilinear Steiner Tree Construction Based on Spanning Graphs. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 27(11), Nov.
[11] F. Liu, C. Kashyap, and C. J. Alpert. A Delay Metric for RC Circuits Based on the Weibull Distribution. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23(3), Mar.
[12] Nangate. Nangate 45nm Open Cell Library.
[13] P. R. O'Brien and T. L. Savarino. Modeling the Driving-Point Characteristic of Resistive Interconnect for Accurate Delay Estimation. In Proc. IEEE Int. Conf. on Computer-Aided Design.
[14] J. Qian, S. Pullela, and L. Pillage. Modeling the Effective Capacitance for the RC Interconnect of CMOS Gates. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 13(12).
[15] W. Shi, Z. Li, and C. J. Alpert. Complexity Analysis and Speedup Techniques for Optimal Buffer Insertion with Minimum Cost. In Proc. Asia and South Pacific Design Automation Conf.
[16] L. P.P.P. van Ginneken. Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay. In Proc. IEEE Int. Symp. on Circuits and Systems, 1990.


2334 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 12, DECEMBER Broadband ESD Protection Circuits in CMOS Technology 2334 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 12, DECEMBER 2003 Brief Papers Broadband ESD Protection Circuits in CMOS Technology Sherif Galal, Student Member, IEEE, and Behzad Razavi, Fellow,

More information

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface.

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface. Placement Introduction A very important step in physical design cycle. A poor placement requires larger area. Also results in performance degradation. It is the process of arranging a set of modules on

More information

Digital VLSI Design. Lecture 7: Placement

Digital VLSI Design. Lecture 7: Placement Digital VLSI Design Lecture 7: Placement Semester A, 2016-17 Lecturer: Dr. Adam Teman 29 December 2016 Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from

More information

Thermal-Aware 3D IC Physical Design and Architecture Exploration

Thermal-Aware 3D IC Physical Design and Architecture Exploration Thermal-Aware 3D IC Physical Design and Architecture Exploration Jason Cong & Guojie Luo UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Supported by DARPA Outline Thermal-Aware

More information

Postgrid Clock Routing for High Performance Microprocessor Designs

Postgrid Clock Routing for High Performance Microprocessor Designs IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 2, FEBRUARY 2012 255 Postgrid Clock Routing for High Performance Microprocessor Designs Haitong Tian, Wai-Chung

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

Fast Interconnect Synthesis with Layer Assignment

Fast Interconnect Synthesis with Layer Assignment Fast Interconnect Synthesis with Layer Assignment Zhuo Li IBM Research lizhuo@us.ibm.com Tuhin Muhmud IBM Systems and Technology Group tuhinm@us.ibm.com Charles J. Alpert IBM Research alpert@us.ibm.com

More information

10. Interconnects in CMOS Technology

10. Interconnects in CMOS Technology 10. Interconnects in CMOS Technology 1 10. Interconnects in CMOS Technology Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October

More information

Pseudopin Assignment with Crosstalk Noise Control

Pseudopin Assignment with Crosstalk Noise Control 598 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 20, NO. 5, MAY 2001 Pseudopin Assignment with Crosstalk Noise Control Chin-Chih Chang and Jason Cong, Fellow, IEEE

More information

SEMICONDUCTOR industry is beginning to question the

SEMICONDUCTOR industry is beginning to question the 644 IEEE TRANSACTIONS ON COMPONENTS AND PACKAGING TECHNOLOGIES, VOL. 29, NO. 3, SEPTEMBER 2006 Placement and Routing for 3-D System-On-Package Designs Jacob Rajkumar Minz, Student Member, IEEE, Eric Wong,

More information

CS612 Algorithms for Electronic Design Automation. Global Routing

CS612 Algorithms for Electronic Design Automation. Global Routing CS612 Algorithms for Electronic Design Automation Global Routing Mustafa Ozdal CS 612 Lecture 7 Mustafa Ozdal Computer Engineering Department, Bilkent University 1 MOST SLIDES ARE FROM THE BOOK: MODIFICATIONS

More information

A New Methodology for Interconnect Parasitic Extraction Considering Photo-Lithography Effects

A New Methodology for Interconnect Parasitic Extraction Considering Photo-Lithography Effects A New Methodology for Interconnect Parasitic Extraction Considering Photo-Lithography Effects Ying Zhou, Yuxin Tian, Weiping Shi Texas A&M University Zhuo Li Pextra Corporation Frank Liu IBM Austin Research

More information

A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment

A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment A Faster Approximation Scheme For Timing Driven Minimum Cost Layer Assignment ABSTRACT Shiyan Hu Dept. of Electrical and Computer Engineering Michigan Technological University Houghton, Michigan 49931

More information

Chapter 2 On-Chip Protection Solution for Radio Frequency Integrated Circuits in Standard CMOS Process

Chapter 2 On-Chip Protection Solution for Radio Frequency Integrated Circuits in Standard CMOS Process Chapter 2 On-Chip Protection Solution for Radio Frequency Integrated Circuits in Standard CMOS Process 2.1 Introduction Standard CMOS technologies have been increasingly used in RF IC applications mainly

More information

Efficient Multilayer Routing Based on Obstacle-Avoiding Preferred Direction Steiner Tree

Efficient Multilayer Routing Based on Obstacle-Avoiding Preferred Direction Steiner Tree Efficient Multilayer Routing Based on Obstacle-Avoiding Preferred Direction Steiner Tree Ching-Hung Liu, Yao-Hsin Chou, Shin-Yi Yuan, and Sy-Yen Kuo National Taiwan University 1 Outline 2 Outline 3 The

More information

(Lec 14) Placement & Partitioning: Part III

(Lec 14) Placement & Partitioning: Part III Page (Lec ) Placement & Partitioning: Part III What you know That there are big placement styles: iterative, recursive, direct Placement via iterative improvement using simulated annealing Recursive-style

More information

Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis

Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis Electrical Interconnect and Packaging Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis Jason Morsey Barry Rubin, Lijun Jiang, Lon Eisenberg, Alina Deutsch Introduction Fast

More information

Iterative-Constructive Standard Cell Placer for High Speed and Low Power

Iterative-Constructive Standard Cell Placer for High Speed and Low Power Iterative-Constructive Standard Cell Placer for High Speed and Low Power Sungjae Kim and Eugene Shragowitz Department of Computer Science and Engineering University of Minnesota, Minneapolis, MN 55455

More information

Capacitive Coupling Noise in High-Speed VLSI Circuits

Capacitive Coupling Noise in High-Speed VLSI Circuits Capacitive Coupling Noise in High-Speed VLSI Circuits Payam Heydari Department of Electrical and Computer Engineering University of California Irvine, CA 92697 Massoud Pedram Department of Electrical Engineering-Systems

More information

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Zhi-Wei Chen 1 and Jin-Tai Yan 2 1 College of Engineering, 2 Department of Computer Science and Information Engineering Chung-Hua University,

More information

Buffer Block Planning for Interconnect Planning and Prediction

Buffer Block Planning for Interconnect Planning and Prediction Buffer Block Planning for Interconnect Planning and Prediction Jason Cong, Tianming Kong and David Zhigang Pan y Department of Computer Science, University of California, Los Angeles, CA 90095 y IBM T.

More information