Thermal-aware Steiner Routing for 3D Stacked ICs


Mohit Pathak and Sung Kyu Lim
School of Electrical and Computer Engineering
Georgia Institute of Technology
{mohitp, limsk}@ece.gatech.edu

Abstract

In this paper, we present the first work on Steiner routing for 3D stacked ICs. In the 3D Steiner routing problem, the pins are located in multiple device layers, which makes it more general than its 2D counterpart. Our algorithm consists of two steps: tree construction and tree refinement. Our tree construction algorithm builds a delay-oriented Steiner tree under a given thermal profile. We show that thermal-aware 3D tree construction involves the minimization of a two-variable Elmore delay function. In our tree refinement algorithm, we reposition the through-vias while preserving the original routing topology for further thermal optimization under a performance constraint. We employ a novel scheme to relax the initial NLP formulation to an ILP and consider all through-vias from all nets simultaneously. Our experiments show the effectiveness of the proposed solutions.

I. INTRODUCTION

The emerging 3D die stacking technology enables the integration of multiple planar integrated circuits in the vertical direction with high-density through-silicon-via interconnect. The pitch of these vertical die-to-die vias can be as small as a few microns, which substantially reduces communication latency, in particular on global interconnect.

One of the major concerns of 3D ICs is thermal dissipation. Stacking different device layers, combined with the low thermal conductivity of the bonding material, may result in excessively high on-chip temperature. The location of through vias in a Steiner tree has a high impact on the delay at the sink nodes of the tree. In addition, effective placement of through vias can play a significant role in lowering the temperature of the chip, since these vias establish thermal paths to the heat sink.
Thus, an approach that optimizes the location of through vias so that both performance and thermal issues are addressed is important. The contributions of this paper are as follows:

• We formulate and solve the new 3D IC Steiner Routing (3DSR) problem for multi-pin net routing in 3D stacked ICs. Our algorithm considers all dies simultaneously and constructs performance-oriented Steiner trees under a given thermal profile while determining the locations of the related through-vias. We show that thermal-aware 3D tree construction involves the minimization of a two-variable Elmore delay function.

• We formulate and solve the new Through-via Relocation problem for thermal optimization in 3D stacked ICs. We further optimize the thermal objective by relocating through vias in each tree under a given timing constraint while preserving the original performance-oriented routing topology. We employ a novel scheme to relax the initial NLP formulation to an ILP and consider all through-vias from all nets simultaneously.

This material is based upon work supported by the National Science Foundation under CAREER Grant No. CCF and MARCO CS.

A previous work [1] discusses Steiner trees in different routing layers, but it does not consider pins on different device layers. Thus, that work cannot be used for 3D ICs without significant modification. Another recent work on thermal via insertion [2] targets thermal optimization during routing for 3D ICs. However, the thermal vias are additional vias that do not carry any signal and may cause congestion problems. In our algorithm, we relocate the existing through-vias for thermal optimization. Lastly, some analytical results on the location of through vias were recently presented in [3]. However, that work focused on two-pin nets and did not perform Steiner tree construction.

II. PRELIMINARIES

A. Problem Formulation

We assume the following is given: (i) a set of m nets $\{n_0, n_1, \ldots, n_{m-1}\}$, where each net is represented by a list of pins $n_i = \{p_0, p_1, \ldots, p_{k-1}\}$ with $p_0$ as the driver, (ii) a 3D routing grid G that represents the routing resource in a given 3D stacked IC, where each grid node represents a routing region and each edge denotes the adjacency among the regions, (iii) a horizontal/vertical wire capacity for each x/y grid edge and a via capacity for each z grid edge, (iv) the location p(x, y, z) of each pin, and (v) a 3D thermal grid Z with the temperature at all grid nodes.

A 3D Steiner tree is defined to be a set of 2D (= planar) Steiner trees connected by through vias. The goal of the Thermal-aware 3D Steiner Routing problem is to generate a 3D Steiner tree for each net while satisfying the capacity constraints specified in the underlying G (see footnote 1). The objective is to minimize (i) the maximum temperature among all nodes in the thermal grid, and (ii) the maximum Elmore delay among all pins in each tree, where the delay is computed based on the given thermal distribution.

Footnote 1: Note that we do not explicitly minimize routing/via congestion but implicitly address it by satisfying the capacity constraints. Each edge capacity can be set independently to reflect the availability of the routing resource.

This paper uses the recent temperature-dependent interconnect delay model proposed in [4]. The line resistance per unit length can be calculated as

\[ r(x) = r_0 \, (1 + \beta \, T(x)), \]

where $r_0$ is the resistance at 0 °C, $\beta$ is the temperature coefficient of resistance, and $T(x)$ denotes the temperature at location x.
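To make the temperature-dependent model concrete, the following is a minimal sketch (not the authors' implementation) that evaluates $r(x) = r_0(1 + \beta T(x))$ and the resulting Elmore delay of a single straight wire cut into equal segments. The function names, the piecewise-constant temperature per segment, and the optional `load_cap` argument are assumptions made only for illustration.

```python
def unit_resistance(r0, beta, temp):
    """Per-unit-length resistance r(x) = r0 * (1 + beta * T(x))."""
    return r0 * (1.0 + beta * temp)


def wire_elmore_delay(temps, seg_len, r0, beta, c0, load_cap=0.0):
    """Elmore delay of one wire split into len(temps) equal segments.

    Assumption: temperature is constant within a segment, so each segment
    is a lumped resistor whose value follows the model above. temps[0] is
    the segment nearest the driver.
    """
    n = len(temps)
    delay = 0.0
    for i, temp in enumerate(temps):
        r_seg = unit_resistance(r0, beta, temp) * seg_len
        # Capacitance seen by this segment: half of its own plus
        # everything downstream (remaining segments and the load).
        downstream = c0 * seg_len * (n - i - 1) + load_cap
        delay += r_seg * (c0 * seg_len / 2.0 + downstream)
    return delay
```

With a uniform temperature of 0 the result reduces to the familiar $rL(cL/2 + C_{load})$; raising the temperature of any segment increases the delay, and segments near the driver weigh more because they see more downstream capacitance.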

B. Overview of the Approach

The goal of our 3D Steiner routing is to address two important design objectives: performance and thermal. We tackle this problem in two steps:

1) Construction step: We first perform thermal analysis on the given 3D placement. We then construct a performance-oriented routing tree for each net under the non-uniform thermal profile. We recompute the temperature values to reflect the routing.

2) Refinement step: We optimize the thermal objective by relocating through vias in each tree under given timing constraints. Note that the through via relocation preserves the original tree topology while optimizing the thermal objective. We recompute the timing slacks and temperature values to reflect the relocation. We repeat this step until no more thermal improvement is possible.

III. 3D STEINER TREE CONSTRUCTION

A. Overview of the Algorithm

The basic approach of our 3D Steiner tree construction algorithm is similar to SERT [5], where an existing tree is incrementally grown by connecting a new sink pin to it. SERT starts with the driver pin and selects the sink pin that minimizes Elmore delay when connected to the driver. This process continues until all sink pins are connected to the tree that is being grown. The goal is to minimize the maximum Elmore delay among all sink pins of the tree. Here the biggest challenge is to compute the point on the tree to which the new pin connects.

There are three major differences between SERT and our work: (i) all the pins in SERT are located in the same die, whereas our 3D algorithm handles pins located in different dies. The 3D case requires the use of through vias, and the location of these vias has a large impact on the topology of the tree as well as on the sink pin delay; (ii) the delay optimization in SERT is based on a single variable, whereas our algorithm deals with two-variable function optimization; (iii) our interconnect delay is computed based on the given thermal profile.
Pseudocode of our algorithm is shown in Figure 1. Our routing algorithm consists of two phases: construction (lines 1-13) and refinement (lines 14-15). We construct 3D Steiner trees during the construction phase while ignoring congestion and then alleviate congestion by rip-up-and-reroute during the refinement phase. Given a net n, our 3D Steiner tree $T_n$ initially contains the driver pin (line 2). We store the remaining pins of n in $Q_n$ (line 3). We then examine all pin-edge pairs (lines 5-6) and compute the impact of connecting the pin to the edge on Elmore delay under the given thermal profile Z, where the pin is from $Q_n$ and the edge is from $T_n$. Specifically, the delay impact is calculated based on the increase in temperature-dependent Elmore delay among all pins currently in $T_n$ (lines 9-10). This requires the computation of the connection point x and the through via location y (lines 7-8), to be discussed in Section III-B. Next, we select the pin-edge pair that results in the minimum max-delay increase (line 11) and add the pin to $T_n$ (lines 12-13). Our rip-up-and-reroute is done only on the less timing-critical nets, i.e., the nets with smaller max-delay values (lines 14-15).

    Thermal-aware 3D Steiner Routing Algorithm
    input: netlist NL, routing graph R, thermal profile Z
    output: 3D Steiner tree for each net
     1. for (each net n ∈ NL)
     2.   T_n = p_0(n);
     3.   Q_n = set of pins of n except p_0;
     4.   while (Q_n ≠ ∅)
     5.     for (each pin a ∈ Q_n)
     6.       for (each edge e ∈ T_n)
     7.         x = connection point for a on e;
     8.         y = through via location on e(x, a);
     9.         update d(p) for all p ∈ T_n;
    10.         X(a, e) = max{d(p) | p ∈ T_n ∪ a};
    11.     (a_min, e_min) = pin+edge pair with min X;
    12.     T_n = T_n ∪ e_min;
    13.     remove a_min from Q_n;
    14. for (each non-timing critical T_n violating capacity)
    15.   rip-up-and-reroute T_n under Z;

Fig. 1. Pseudocode of the thermal-aware 3D Steiner routing algorithm. In case e and a are located in different planes, e(x, a) will contain a through via.
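The construction loop (lines 2-13 of Figure 1) can be sketched as follows. This is a deliberately simplified illustration rather than the algorithm itself: it attaches each new pin to an existing tree node instead of an optimized point on an edge, it omits the through via placement and the rip-up-and-reroute phase, and it takes the delay evaluation as a caller-supplied function. The names `grow_tree`, `max_path_length`, and `manhattan` are hypothetical, and `max_path_length` is only a stand-in for the temperature-dependent Elmore delay.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y, z) points."""
    return sum(abs(u - v) for u, v in zip(a, b))


def max_path_length(parent):
    """Stand-in cost: longest driver-to-node path length in the tree."""
    def depth(node):
        total = 0
        while parent[node] is not None:
            total += manhattan(node, parent[node])
            node = parent[node]
        return total
    return max(depth(node) for node in parent)


def grow_tree(driver, sinks, cost):
    """Greedy SERT-style growth: repeatedly add the (pin, attach point)
    pair whose insertion gives the smallest worst-case cost."""
    parent = {driver: None}          # T_n starts with the driver only
    remaining = set(sinks)           # Q_n
    while remaining:
        best = None
        for pin in sorted(remaining):
            for node in sorted(parent):
                trial = dict(parent)
                trial[pin] = node    # tentatively connect pin to node
                value = cost(trial)
                if best is None or value < best[0]:
                    best = (value, pin, node)
        _, pin, node = best
        parent[pin] = node           # commit the best pair
        remaining.remove(pin)
    return parent
```

A call such as `grow_tree((0, 0, 0), [(3, 0, 0), (3, 4, 1)], max_path_length)` returns a parent map describing the tree. Replacing `max_path_length` with a real Elmore delay evaluator, and the node attachment with the connection point and via computation of Section III-B, would bring the sketch closer to Figure 1.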
The complexity of our algorithm is $O(nm^4)$, where n is the total number of nets and m is the maximum number of pins in any net. The $O(m^4)$ term comes from the while-loop, the two for-loops, and the Elmore delay computation for all sinks (line 9). Since m is typically bounded by a constant in VLSI circuits, our algorithm runs in time linear in the number of nets in practice.

B. Connection Point and Via Location

In this section we discuss how we can efficiently construct 3D Steiner trees. Our discussion is based on the two-die case for simplicity, but our algorithm is applicable to multiple dies without any modification. Let $r_1$ and $c_1$ denote the unit-length resistance and capacitance values for die 1; $r_2$ and $c_2$ are similarly defined for die 2. The capacitance and resistance of a through via connecting the two dies are denoted $C_{via}$ and $R_{via}$.

Given a pin p and an edge e ∈ T, the connection point is defined as the point on e to which p is connected. The connection point computation for the 2D case has been presented in [5], where the Elmore delay change on an entire tree caused by adding a new pin to the tree is a function of a single variable x, the location of the connection point. We extend this work by introducing a second variable y that represents the location of the through via. We then optimize the two-variable delay function and determine the location of the connection point (= x) and the through via (= y) for the 3D case.

Referring to Figure 2, e(p, c) and e(q, b) are edges on T. p is the parent node of e(p, c), and q is the parent node of e(q, b). a is the new pin that needs to connect to e(p, c). Edge e(p, c) lies on die 1 with interconnect parasitics $r_1$ and $c_1$, whereas a lies on die 2 with interconnect parasitics $r_2$ and $c_2$.

[Figure 2: two-die example (die 1 and die 2) showing the driver $p_0$, nodes p, q, x (connection point), d, and y (through via), and the sinks a (case 1), c (case 2), b (case 3), and g (case 4).]

Fig. 2. Illustration of how pin a connects to e(p, c) ∈ T. e(y, a) is routed in the bottom die, whereas all other edges are routed in the top die. x is the location of the connection point on e(p, c). y is the location of the through via inserted on e(x, a). e(q, b) is another branch in T. g is another sink that is not a part of the subtree rooted at p. d is the shortest-distance point on e(p, c) from a, and δz is the distance between the through via and a. The Elmore delay of T ∪ a is a function of both x and y.

d is the point on e(p, c) that is at the shortest distance to a. x is the connection point, and y is the location of the through via. Our first goal is to derive Elmore delay equations that are functions of δx and δy. In what follows, we let δx denote the distance between node p and node x; δq, δa, δb, δc, and δd are used similarly. δy is the distance between x and y, and δz is the distance between y and a. Let $T_b$ denote the subtree rooted at node b. In order to compute the Elmore delay change on all sink pins in T caused by adding a to T, we consider the following four cases: (i) delay at the node to be added (= node a), (ii) delay at the subtree located after the connection point (= node c), (iii) delay at the subtree that could be located either before or after the connection point (= node b), (iv) delay of the nodes not in $T_p$. Figure 2 illustrates these four cases.

Case 1: We handle the delay at node a. In this case, d(a) is a sum of four functions. $f_1$ is the delay from $p_0$ to p. $f_2$ is the delay from p to x without considering e(q, b) and $T_b$. $f_3$ is the delay from p to x when considering e(q, b) and $T_b$. $f_4$ is the delay from x to a.
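Thus we have $d(a) = f_1 + f_2 + f_3 + f_4$, where

\[ f_1 = K_0 + K_1 \{ c_1 \delta y + C_{via} + c_2 \delta z + c_1 \delta c + C_c + C_b + c_1 (\delta b - \delta q) \} \]

\[ f_2 = r_1 \delta x \left( \frac{c_1 \delta x}{2} + c_1 \delta y + C_{via} + c_2 \delta z + c_1 (\delta c - \delta x) + C_c \right) \]

\[ f_3 = \begin{cases} r_1 \delta x \, (c_1 (\delta b - \delta q) + C_b), & \text{if } \delta x \le \delta q \\ r_1 \delta q \, (c_1 (\delta b - \delta q) + C_b), & \text{if } \delta x > \delta q \end{cases} \]

\[ f_4 = r_1 \delta y \left( \frac{c_1 \delta y}{2} + C_{via} + c_2 \delta z \right) + r_2 c_2 \frac{(\delta z)^2}{2} + R_{via} \left( \frac{C_{via}}{2} + c_2 \delta z \right) \]

where $\delta z = \delta a - (\delta x + \delta y)$, $K_0$ is the sum of resistance and capacitance products along the $p_0 \to p$ path, $K_1$ is the sum of resistance along the $p_0 \to p$ path, and $C_i$ is the capacitance of the subtree rooted at the i-th node.

Case 2: The new delay at node c is given by $d(c) = f_1 + f_2 + f'_3 + f'_4$, where $f_1$ and $f_2$ are as in Case 1 and

\[ f'_3 = r_1 \delta q \, (c_1 (\delta b - \delta q) + C_b) \]

\[ f'_4 = r_1 (\delta c - \delta x) \left\{ \frac{c_1 (\delta c - \delta x)}{2} + C_c \right\} \]

Case 3: The new delay at node b is given by $d(b) = f_1 + f''_2 + f''_3$, where $f_1$ is as in Case 1 and

\[ f''_2 = \begin{cases} r_1 \delta x \, (c_1 \delta y + C_{via} + c_2 \delta z), & \text{if } \delta x \le \delta q \\ r_1 \delta q \, (c_1 \delta y + C_{via} + c_2 \delta z), & \text{if } \delta x > \delta q \end{cases} \]

\[ f''_3 = r_1 \delta q \left\{ \frac{c_1 \delta q}{2} + c_1 (\delta b - \delta q) + C_b + C_c \right\} \]

Case 4: For all other nodes not in $T_p$, the added delay is a function of the added capacitance, which is linear in terms of x and y and given by

\[ C = c_1 (\delta x + \delta y) + C_{via} + c_2 \delta z \]

When connecting two pins separated by one or more intermediate layers, we use stacked vias so that no routing is used in these intermediate layers.

C. Optimization of Delay Equations

We discuss how the delay equations derived in the previous section can be optimized to give a small set of possible optimum location points. We first consider the conditions needed to determine the minimum of a general quadratic function of two variables. We later show how the delay equations derived in the previous section can be optimized using these conditions. In general, for a quadratic function of two variables $f(\delta x, \delta y)$, the maximum or the minimum of the function depends upon the value of $\partial^2 F / \partial \delta x^2$ and the determinant of the Hessian matrix $H_1$:

\[ H_1 = \begin{pmatrix} \dfrac{\partial^2 F}{\partial \delta x^2} & \dfrac{\partial^2 F}{\partial \delta x \, \partial \delta y} \\[2ex] \dfrac{\partial^2 F}{\partial \delta x \, \partial \delta y} & \dfrac{\partial^2 F}{\partial \delta y^2} \end{pmatrix} \]

where F is the delay function under consideration. For a quadratic function of two variables, the above values are always constant.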
For our case we have $0 \le \delta x \le \delta d$ and $0 \le \delta y \le \delta a$, and thus consider the following cases (here $|H_1|$ denotes the determinant of $H_1$):

Case A: If $\partial^2 F / \partial \delta x^2 < 0$ and $|H_1| > 0$, the function is concave and the minimum can be found at the boundary points, i.e., δx = 0 or δx = δd and δy = 0 or δy = δa. Thus we have four points at which to look for the minimum.

Case B: If $\partial^2 F / \partial \delta x^2 \le 0$, $\partial^2 F / \partial \delta y^2 \le 0$, and $|H_1| = 0$, we have a concave function, and the minimum lies on the boundary points.

Case C: If $\partial^2 F / \partial \delta x^2 = 0$, $\partial^2 F / \partial \delta y^2 = 0$, and $|H_1| = 0$, then $f(\delta x, \delta y)$ is a linear function of δx and δy, and the minimum lies at the boundary points.

Case D: If $|H_1| < 0$, the critical point found is a saddle point, and the minimum lies at the boundary, although a different set of boundary points may need to be chosen. The set of boundary points may be found by setting δx = 0 or δx = δd and minimizing $f(\delta x, \delta y)$ as a function of δy, or by setting δy = 0 or δy = δa and minimizing $f(\delta x, \delta y)$ as a function of δx.

We show that the Elmore delay at each sink node in T can be optimized by considering any of the four cases shown above.
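As an illustration of this case analysis, the sketch below evaluates the Case 1 delay d(a) of Section III-B, estimates the constant second derivatives by central differences, classifies the result into Cases A-D, and then searches only the boundary of the feasible region. It is a sketch under stated assumptions, not the authors' code: the class and function names are hypothetical, the quadratic analysis assumes δx ≤ δq over the whole search range, and the corner (δd, δa) is replaced by (δd, δa − δd) so that δz never becomes negative.

```python
from dataclasses import dataclass


@dataclass
class Case1:
    """Parameters of d(a); field names mirror the symbols in the text."""
    r1: float
    c1: float
    r2: float
    c2: float
    r_via: float
    c_via: float
    K0: float
    K1: float
    d_a: float
    d_b: float
    d_c: float
    d_q: float
    d_d: float
    C_b: float
    C_c: float

    def delay(self, dx, dy):
        dz = self.d_a - (dx + dy)
        branch = self.c1 * (self.d_b - self.d_q) + self.C_b
        new = self.c1 * dy + self.c_via + self.c2 * dz   # added capacitance
        f1 = self.K0 + self.K1 * (new + self.c1 * self.d_c + self.C_c + branch)
        f2 = self.r1 * dx * (self.c1 * dx / 2 + new
                             + self.c1 * (self.d_c - dx) + self.C_c)
        f3 = self.r1 * min(dx, self.d_q) * branch        # both branches of f3
        f4 = (self.r1 * dy * (self.c1 * dy / 2 + self.c_via + self.c2 * dz)
              + self.r2 * self.c2 * dz * dz / 2
              + self.r_via * (self.c_via / 2 + self.c2 * dz))
        return f1 + f2 + f3 + f4


def hessian(f, x, y, h=1e-2):
    """Second partial derivatives by central differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy


def classify(fxx, fyy, fxy, rel=1e-6):
    """Map the Hessian entries to Cases A-D (or 'convex' otherwise)."""
    scale = max(abs(fxx), abs(fyy), abs(fxy), 1e-30)
    det = (fxx * fyy - fxy * fxy) / scale**2
    sxx, syy = fxx / scale, fyy / scale
    if det < -rel:
        return "D"
    if det > rel:
        return "A" if sxx < 0 else "convex"
    if abs(sxx) <= rel and abs(syy) <= rel:
        return "C"
    return "B" if sxx <= rel and syy <= rel else "convex"


def min_on_segment(f, p0, p1):
    """Exact minimum of a quadratic f along the segment p0 -> p1."""
    point = lambda t: (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
    g = lambda t: f(*point(t))
    g0, gh, g1 = g(0.0), g(0.5), g(1.0)
    a = 2 * (g0 - 2 * gh + g1)          # g(t) = a t^2 + b t + g0
    b = -3 * g0 + 4 * gh - g1
    ts = [0.0, 1.0]
    if a > 0:                           # convex along the edge: check vertex
        ts.append(min(1.0, max(0.0, -b / (2 * a))))
    t = min(ts, key=g)
    return g(t), point(t)


def best_connection(net):
    """Minimise d(a) over the boundary of the feasible (dx, dy) region."""
    verts = [(0.0, 0.0), (net.d_d, 0.0),
             (net.d_d, net.d_a - net.d_d), (0.0, net.d_a)]
    edges = zip(verts, verts[1:] + verts[:1])
    return min((min_on_segment(net.delay, p0, p1) for p0, p1 in edges),
               key=lambda cand: cand[0])
```

For Cases A-C the edge search simply returns one of the corner points, while for Case D it may return a point in the interior of a boundary edge, which matches the remark that a different set of boundary points may be needed.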

Thus, there is a fixed number of points (x, y) for which the Elmore delay can be minimized.

IV. THERMAL-AWARE THROUGH VIA RELOCATION

A. Overview of the Algorithm

The motivation behind our through via relocation is to move as many through vias into thermal hotspots as possible while preserving the original tree topology we obtain during our construction step. The objective is to minimize the maximum temperature on the chip while not violating the timing and routing resource capacity constraints. Note that the problem of optimizing the number of through vias and the temperature is non-linear in nature, since we need to solve the equation T = PR, where T is the temperature matrix, P is the power vector, and R is the thermal resistance matrix such that $R \propto 1/a$, where a is the number of through vias. General solutions available for solving non-linear problems cannot be applied directly to large problem sizes.

In this section we propose an innovative problem formulation that helps us effectively overcome the non-linear nature of this problem. We propose a relaxed ILP-based formulation in which the number of integer variables is kept to a minimum. Our ILP-based method optimizes through vias on all nets simultaneously, which is more rigorous than a sequential approach that optimizes the nets one by one. In addition, we target all hotspots simultaneously instead of iteratively targeting them one by one. Our experimental results in Section V demonstrate the advantage of this approach.

B. Movable Range

We start with the set of 3D Steiner trees we obtain from the construction step. All pins in each tree $T_i$ are associated with a timing constraint that denotes the required arrival time in terms of Elmore delay. Each through via $v \in T_i$ is associated with a movable range, denoted $R_{via}$, which is the range of new locations along its route to the connection point for which the timing constraints are not violated. An illustration is shown in Figure 3. If the movable range of a through via v is a single point, v is non-movable; otherwise it is movable. Our goal is then to find a new location for each movable via in each Steiner tree so that the maximum temperature among all nodes in the thermal grid is minimized while the timing and routing resource capacity constraints are not violated. Note that we preserve the original topology of the Steiner trees. All that changes is the location of the through vias, where the movable through vias are moved into thermal hotspots under the timing constraint to reduce the thermal resistivity.

[Figure 3: top view of a multi-layer grid showing the driver (L1), the sinks p1 (L1), p2 (L2), p3 (L3), and p4 (L4), and the vias v1, v2, and v3 with their movable ranges.]

Fig. 3. Illustration of movable range. The figure is a top view of a multi-layer grid. The driver is represented by a triangle, the sinks are represented by dots, and the vias are represented by square boxes. The driver is located in layer 1 (= L1). The movable range of each via is represented by the dotted lines.

C. Fast Thermal Analysis

To optimize the location of through vias, we use the fast thermal analysis model described in [2]. An illustration is shown in Figure 4. A tile structure is imposed on the surface, where each tile is approximated as a resistive chain as shown in Figure 4. In 3D ICs, the heat sinks are attached to the bottom or top side of the 3D IC stack, with the other boundaries being adiabatic. Thus, the dominant heat flow is in the vertical direction. For the purpose of optimization we view each tile stack in Figure 4 as an independent thermal resistive chain. Our model does not consider the effects of lateral thermal dissipation. This can be justified because the thermal conductivity of the epoxy material used to join the dies is much lower than that of silicon itself, which means that heat dissipates with more difficulty in the vertical direction than in the horizontal direction. Our fast thermal model, which optimizes vertical heat flow, therefore helps in reducing the overall chip temperature. To obtain the effective temperature reduction, the full resistive thermal model [2] (considering lateral resistances) is run twice, once before and once after calling our via relocation routine. The temperature reduction values reported in our experiments are based on the full resistive model.

D. Non-linear Programming

In the following sections we first show how the thermal relocation problem can be formulated as an NLP (non-linear programming) problem. We then show how the NLP may be converted to an ILP formulation. The ILP formulation adds a large number of integer variables to the problem, thus making it difficult to solve. We finally propose our reduced ILP formulation (which is a relaxed ILP formulation), which reduces the number of integer variables significantly (see footnote 2). The NLP-based formulation is defined as follows (Table I explains the notations we use in the formulation):

Minimize

\[ \sum \alpha_{i,j,k} \, T_{i,j,k} \qquad (1) \]

Footnote 2: The authors in [2] suggest adding thermal vias for further thermal optimization. Our method, which simply relocates existing through-vias, can be used in conjunction with these approaches for more rigorous thermal optimization.

TABLE I
VARIABLES AND CONSTANTS USED IN OUR ILP FORMULATION

$T_{i,j,k}$: temperature at tile (i, j, k).
$\alpha_{i,j,k}$: temperature-related weight for tile (i, j, k), which is computed by $T^{org}_{i,j,k} / T^{org}_{max}$, where $T^{org}_{i,j,k}$ denotes the original temperature for (i, j, k) before the optimization, and $T^{org}_{max}$ is the maximum value among all $T^{org}_{i,j,k}$ values. (constant)
$vmax$: maximum number of through vias each tile can accommodate. (constant)
$\beta^{n}_{i,j,k}$: becomes 1 when a through via is moved to tile (i, j, k) so that the total number of movable through vias at (i, j, k) changes from n − 1 to n.
$V^{org}_{i,j,k}$: original number of vias (= movable + non-movable) in tile (i, j, k) before the optimization. (constant)
$V^{m}_{i,j,k}$: number of through vias in tile (i, j, k), which is just m.
$V^{opt}_{i,j,k}$: number of vias (= movable + non-movable) in tile (i, j, k) after the optimization.
$M^{x,y,k}_{i,j,k}(n)$: becomes 1 if a through via in net n is moved from tile (i, j, k) to (x, y, k); 0 otherwise.
$G^{cur}_{i,j,k}$: current usage of wires and through vias for a grid (i, j, k) in 3D routing grid G.
$G^{max}_{i,j,k}$: capacity constraint of wires and through vias for a grid (i, j, k) in 3D routing grid G. (constant)
$R^{m}_{i,j,k}$: thermal resistance of tile (i, j, k) having m vias.
$R^{novia}_{i,j,k}$: thermal resistance of tile (i, j, k) with no through vias. (constant)
$\alpha$: thermal resistance of one through via. (constant)
$\gamma^{n}_{i,j,k}$: becomes 1 if the number of through vias in the tile (i, j, k) is n.
$\delta^{m}_{i,j,k}$: the temperature difference between tile (i, j, k) and (i, j, k − 1) if the number of through vias in tile (i, j, k) is m; $\delta^{m}_{i,j,k} = (P_{i,j,n} + \cdots + P_{i,j,k}) \, R^{m}_{i,j,k}$. (constant)

Subject to:

\[ T_{i,j,k} = T_{i,j,k-1} + (P_{i,j,n} + \cdots + P_{i,j,k}) \, R^{opt}_{i,j,k} \qquad (2) \]

\[ R^{opt}_{i,j,k} = \frac{\alpha \, R^{novia}_{i,j,k}}{\alpha + V^{opt}_{i,j,k} \, R^{novia}_{i,j,k}} \qquad (3) \]

\[ V^{opt}_{i,j,k} = V^{org}_{i,j,k} + \Delta V_{i,j,k} \qquad (4) \]

\[ \Delta V_{i,j,k} = \sum M^{i,j,k}_{x,y,k}(n) - \sum M^{v,w,k}_{i,j,k}(n), \quad \forall n \qquad (5) \]

\[ G^{cur}_{i,j,k} \le G^{max}_{i,j,k}, \quad \forall (i, j, k) \qquad (6) \]

\[ \sum M^{x,y,k}_{i,j,k}(n) = 1, \quad \forall n \qquad (7) \]

\[ M^{x,y,k}_{i,j,k}(n) \in \{0, 1\} \qquad (8) \]

Equation (1) is our objective function, where we minimize the weighted sum of temperature values at all thermal tiles (thermal tiles and thermal mesh nodes are used interchangeably in this paper). The weights $\alpha_{i,j,k}$ are computed based on the initial temperature measured before the through via relocation.
In this case, the higher $\alpha_{i,j,k}$ is, the lower the $T_{i,j,k}$ we desire. Equation (2) gives the temperature at each tile based on our fast thermal model illustrated in Figure 4. Equation (3) shows the variation of thermal resistance based on the number of through vias in a tile. This is obtained by solving the following parallel resistance relation: $1/R^{opt}_{i,j,k} = 1/R^{novia}_{i,j,k} + V^{opt}_{i,j,k}/\alpha$. Equation (4) is from the definition of $\Delta V_{i,j,k}$. Equation (5) states that the total change in the number of through vias for tile (i, j, k) is the total number of through vias moved into (i, j, k) minus the total number of through vias moved out of (i, j, k). Equation (6) ensures that the routing resource (= wires and through vias) capacity constraints are satisfied. Equation (8) states that the $M(n)$ are binary integer variables. Equation (7) ensures that only one via per net is moved. Note that this restriction is unavoidable since the movable range of a through via is computed independently of the other through vias. Once a through via is moved, it affects the timing constraint, the movability, and the range of all other through vias in the same net. The ultimate way to perform via relocation is to consider all vias from all nets simultaneously, which is computationally expensive. Our method, which considers one via from all nets simultaneously, is better than a sequential approach that considers all vias from a single net.

[Figure 4: tile stack array, single tile stack, and tile stack analysis, showing the lateral resistance $R_{lateral}$ and a resistive chain with power sources $P_1$ to $P_5$, resistances $R_1$ to $R_5$, and $R_b$.]

Fig. 4. Thermal model used, where $R_b$ denotes the thermal resistance to the heatsink. The convention used in our ILP formulation is that adding through vias at point (= tile) i reduces the value of $R_i$.

We see that the original via relocation problem is nonlinear in nature due to the inverse relation between the thermal resistance and the number of through vias in a tile (= $R^{opt}_{i,j,k}$ vs. $V^{opt}_{i,j,k}$). In the next section we propose a simplified integer linear programming formulation that overcomes the nonlinear problem formulation.
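The fast thermal model behind Equations (2) and (3) can be sketched for a single tile stack as follows. This is an illustrative sketch only: the function names are hypothetical, the list index 0 is assumed to be the tile nearest the heat sink, and the heat-sink resistance $R_b$ is folded into the base temperature `t_base` rather than modeled separately.

```python
def tile_resistance(r_novia, r_via, n_vias):
    """Eq. (3): tile resistance in parallel with n_vias via resistances,
    i.e. 1/R = 1/r_novia + n_vias/r_via."""
    return r_via * r_novia / (r_via + n_vias * r_novia)


def stack_temperatures(powers, r_novia, n_vias, r_via, t_base=0.0):
    """Eq. (2) applied tile by tile along one vertical resistive chain.

    powers[k], r_novia[k], n_vias[k] describe tile k of the stack; the
    heat crossing tile k is the power of tile k and of every tile above.
    """
    temps = []
    t_prev = t_base
    for k in range(len(powers)):
        heat = sum(powers[k:])
        t_prev = t_prev + heat * tile_resistance(r_novia[k], r_via, n_vias[k])
        temps.append(t_prev)
    return temps
```

For example, `stack_temperatures([1, 1], [2, 2], [0, 0], r_via=1)` gives `[4.0, 6.0]`, and adding a via to the lower tile lowers both values, since every tile above it shares the same thermal path. The division by $(\alpha + V R^{novia})$ inside `tile_resistance` is the non-linearity that the ILP formulations remove.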
E. Integer Linear Programming

From the NLP formulation we see that the number of through vias in each tile is an integer variable. Our ILP-based formulation differs from the NLP in the following way: we replace Equations (2) and (3) with the following:

\[ T_{i,j,k} = T_{i,j,k-1} + \gamma^{0}_{i,j,k} \, \delta^{0}_{i,j,k} + \cdots + \gamma^{vmax}_{i,j,k} \, \delta^{vmax}_{i,j,k} \qquad (9) \]

\[ 1 \cdot \gamma^{1}_{i,j,k} + 2 \cdot \gamma^{2}_{i,j,k} + \cdots + vmax \cdot \gamma^{vmax}_{i,j,k} = V^{opt}_{i,j,k} \qquad (10) \]

\[ \sum_{m} \gamma^{m}_{i,j,k} = 1, \quad \forall (i, j, k) \qquad (11) \]

\[ \gamma^{n}_{i,j,k} \in \{0, 1\} \qquad (12) \]

Equation (9) gives the new equation for calculating the temperature. In this equation the $\gamma^{n}_{i,j,k}$ are the new integer variables that are added, whereas the $\delta^{m}_{i,j,k}$ (refer to Table I) are constants that are calculated for each possible value of the number of vias in a tile. Equation (10) equates the $\gamma^{n}_{i,j,k}$ variables with the optimum number of vias in a tile. Equation (11) ensures that for each tile only one $\gamma^{n}_{i,j,k}$ takes the value 1. Finally, Equation (12) ensures that $\gamma^{n}_{i,j,k}$ is either 0 or 1. All other equations are the same as in our previous NLP formulation.
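The effect of Equations (9)-(12) is to replace the non-linear resistance expression by a lookup into precomputed constants, selected by a one-hot vector. The sketch below illustrates this for a single tile; the helper names are hypothetical, and the δ constants are computed from the parallel resistance relation of Equation (3) with made-up numbers.

```python
def delta_table(heat, r_novia, r_via, vmax):
    """Constants delta^m for m = 0..vmax: temperature rise across the tile
    when it holds m vias (heat flow times the Eq. (3) resistance)."""
    return [heat * (r_via * r_novia / (r_via + m * r_novia))
            for m in range(vmax + 1)]


def one_hot(m, vmax):
    """gamma vector with gamma^m = 1 and all other entries 0."""
    return [1 if i == m else 0 for i in range(vmax + 1)]


def temperature_rise(gamma, delta):
    """Right-hand increment of Eq. (9); linear in the gamma variables."""
    if sum(gamma) != 1 or any(g not in (0, 1) for g in gamma):
        raise ValueError("gamma must be one-hot (Eqs. (11) and (12))")
    return sum(g * d for g, d in zip(gamma, delta))


def via_count(gamma):
    """Left-hand side of Eq. (10): the via count encoded by gamma."""
    return sum(m * g for m, g in enumerate(gamma))
```

With `delta = delta_table(2.0, 10.0, 5.0, 3)`, the vector `one_hot(2, 3)` yields `temperature_rise(...) == delta[2]` and `via_count(...) == 2`, so the temperature is a linear function of the γ variables even though `delta` itself was computed from a non-linear formula. The price is vmax + 1 binary variables per tile.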

We see that for this new ILP formulation a large number of new integer variables $\gamma^{n}_{i,j,k}$ are added. The number of these new integer variables is proportional to the number of tiles in our thermal grid and the number of vias possible in each tile. Adding such a large number of integer variables makes the problem harder to solve. In the next section we propose our reduced ILP formulation, which removes the need for the integer $\gamma^{n}_{i,j,k}$ variables, thus reducing the large number of integer variables required.

F. Reduced Integer Linear Programming

Our ILP-based via relocation is formulated as follows (Table I explains the notations we use in the formulation):

Minimize

\[ \sum \alpha_{i,j,k} \, T_{i,j,k} \qquad (13) \]

Subject to:

\[ T_{i,j,k} = T_{i,j,k-1} + \delta^{0}_{i,j,k} - \beta^{1}_{i,j,k} (\delta^{0}_{i,j,k} - \delta^{1}_{i,j,k}) - \cdots - \beta^{vmax}_{i,j,k} (\delta^{vmax-1}_{i,j,k} - \delta^{vmax}_{i,j,k}) \qquad (14) \]

\[ \beta^{1}_{i,j,k} + \beta^{2}_{i,j,k} + \cdots + \beta^{vmax}_{i,j,k} = V^{opt}_{i,j,k} \qquad (15) \]

\[ V^{opt}_{i,j,k} = V^{org}_{i,j,k} + \Delta V_{i,j,k} \qquad (16) \]

\[ \Delta V_{i,j,k} = \sum M^{i,j,k}_{x,y,k}(n) - \sum M^{v,w,k}_{i,j,k}(n), \quad \forall n \qquad (17) \]

\[ G^{cur}_{i,j,k} \le G^{max}_{i,j,k}, \quad \forall (i, j, k) \qquad (18) \]

\[ 0 \le \beta^{m}_{i,j,k} \le 1 \qquad (19) \]

\[ M^{x,y,k}_{i,j,k}(n) \in \{0, 1\} \qquad (20) \]

\[ \sum M^{x,y,k}_{i,j,k}(n) = 1, \quad \forall n \qquad (21) \]

Equation (14) is a new way to compute the temperature at the routing tile (i, j, k) compared with Equation (9) (interested readers are referred to our technical report [6] for more details). Equation (15) states that the total number of through vias in a tile (i, j, k), which is specified by the $\beta^{i}_{i,j,k}$ values ($1 \le i \le vmax$), should equal $V^{opt}_{i,j,k}$. Equation (19) restricts the range of values $\beta^{n}_{i,j,k}$ can take. All other equations are the same as discussed in the NLP formulation. To overcome the restriction of moving just one via per net (Equation 21), we iterate the entire relaxed ILP multiple times so that multiple through vias from a single net are given a chance to relocate in an iterative fashion.

The number of integer variables (= $M(n)$) can be huge if the number of nets is larger or the thermal grid is finer. This makes our ILP formulation less desirable for a large problem instance. We overcome this limitation by relaxing these integer variables and solving the resulting LP problem. We round the continuous variables based on a threshold value λ.
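A small numeric check may help in reading Equation (14) and the rounding step. In the sketch below, `temperature_rise_reduced` evaluates the telescoping sum of Equation (14) for one tile, and `round_moves` applies a threshold λ to relaxed move variables. Both names are hypothetical, and rounding up when a value is greater than or equal to λ is an assumption, since the text does not state how ties are handled.

```python
def temperature_rise_reduced(beta, delta):
    """Increment of Eq. (14): delta^0 minus the reduction contributed by
    each beta^m, for m = 1..vmax (beta[0] holds beta^1)."""
    return delta[0] - sum(b * (delta[m - 1] - delta[m])
                          for m, b in enumerate(beta, start=1))


def round_moves(values, lam):
    """Round relaxed move variables to {0, 1} using threshold lam."""
    return [1 if v >= lam else 0 for v in values]
```

With `delta = [10, 6, 4, 3]`, setting `beta = [1, 1, 0]` gives `10 - 4 - 2 = 4`, which equals `delta[2]`, the value the one-hot form of Equation (9) would select for two vias. Because the reductions `delta[m-1] - delta[m]` shrink as m grows in this example, a temperature-minimizing LP has no incentive to raise a later β before an earlier one, which is a plausible reading of why the β variables can stay continuous.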
We additionally perform a gain-based gradient search to obtain the λ value that generates the best quality results.

V. EXPERIMENTAL RESULTS

We implemented our algorithms in C++/STL and ran our experiments on a Linux PC running at GHz. We tested our algorithms with three sets of benchmarks: ISCAS89, ITC99, and ISPD98. Our experiment is based on a 4-die stacked IC, where the top two and bottom two dies are bonded face-to-face and the middle two back-to-back. We assume all 4 dies have different unit-length resistance and capacitance values [3]: $r_1$ = 86Ω/mm and $c_1$ = 396fF/mm, $r_2$ = 175Ω/mm and $c_2$ = 100fF/mm, $r_3$ = 74Ω/mm and $c_3$ = 79fF/mm, and $r_4$ = 154Ω/mm and $c_4$ = 10fF/mm for unit-length interconnect on the four dies. $R_{via}$ = 106Ω/mm and $C_{via}$ = 140fF/mm are used for the face-to-face vias, and $R_{via}$ = 53Ω/mm and $C_{via}$ = 80fF/mm are used for the back-to-back vias. All runtimes reported are in seconds. We use [7] to analyze the temperature profile from a given 3D placement [8].

To compare our results we implemented a wirelength-driven 3D maze router. During tree construction in our 3D maze router, a new pin is added to an already existing tree such that the added interconnect is the shortest possible. We also implemented a 3D version of Steiner arborescence. We first converted the 3D problem to a 2D problem by mapping the corresponding pin locations to the 2D plane, and then implemented the 2D Steiner arborescence used in [9]. All the wire routing was done on the same plane as that of the driver pin; finally, all pins that were not located on the same plane as the driver pin were connected using stacked vias.

TABLE III
TEMPERATURE COMPARISON BETWEEN GREEDY AND OUR SIMULTANEOUS THROUGH VIA RELOCATION APPROACH. $T_{org}$ AND $T_{opt}$ RESPECTIVELY DENOTE THE TEMPERATURE BEFORE AND AFTER THERMAL OPTIMIZATION.
[Columns: ckt, $T_{org}$; greedy: $T_{opt}$, cpu; ILP: $T_{opt}$, cpu; multiple ILP: $T_{opt}$, cpu, iter. Rows: one per benchmark circuit, followed by a RATIO row.]
We report the total wirelength, the number of vias, the maximum path delay among all sinks, and the processor time in seconds for each circuit.

A. 3D Steiner Routing Results

Table II shows a comparison among 3D maze, 3D A-tree, and our 3D Steiner routing. We observe that our 3D Steiner router achieves a 49% average performance improvement over 3D maze routing and about a 9% improvement over 3D A-tree. However, we see a 1% increase in wirelength and a 7% increase in vias over the 3D maze router. Our 3D Steiner router also runs slower than 3D A-tree, which can be attributed to the large number of high-fanout nets present in the designs. Our 3D Steiner tree generation takes $O(p^4)$ time to run, where p is the number of pins, and does not scale well with increasing fanout size.

TABLE II
3D ROUTING FOR ALL TYPES OF NETS. WE COMPARE 3D MAZE ROUTER AND OUR 3D STEINER ROUTER. THE RUNTIME IS IN SECONDS. THE VIA BOUND COLUMN SHOWS A LOWER BOUND ON THE VIA COUNT.
[Columns: ckt, size, v-bound; 3D Maze Routing: wire, delay, via, cpu; 3D A-tree: wire, delay, via, cpu; 3D Steiner Routing: wire, delay, via, cpu. Rows: one per benchmark circuit, followed by a RATIO row.]

The v-bound column in Table II shows the lower bound of the through via usage for each circuit. For the two-die two-pin nets, the lower bound is 1. For the multi-die, multi-pin nets, we use the minimum possible number of vias needed to connect all pins in the dies. We see that the number of vias needed by the 3D maze or 3D Steiner router is about twice the minimum required. The number of vias used by the 3D A-tree algorithm is the highest. However, this should not be used as a comparison metric, since we constructed the 3D A-tree primarily to compare performance results. For circuit ibm13, we observe that 3D A-tree performs better than our 3D Steiner router in terms of performance. This was due to congestion that caused a larger number of nets to be ripped up and rerouted in our 3D Steiner algorithm. In some cases, we observed that 3D A-tree caused lower congestion and thus required less rerouting.

B. Through Via Relocation Results

To compare the results of our through via relocation, we implemented a fast greedy algorithm, which tries to move through vias into thermal hotspots in an iterative fashion. We choose a single hotspot and relocate the movable vias into it. We then repeat this process for the next hotspot until no more temperature improvement can be obtained. Table III shows the reduction in maximum temperature achieved by our approach and by the greedy algorithm (the accurate thermal model was used to calculate the final temperature). We observe that our simultaneous approach achieves consistent improvement over the greedy approach at the expense of additional runtime.
The runtime of our ILP-based method ranges from 70 to 800 seconds. The runtime for our biggest circuit, ibm17, which contains 184K nets, is around 800 seconds. This shows that our method scales well with the complexity of the circuit while maintaining high-quality solutions. In Table III we also show the impact of multiple iterations of our relaxed ILP. Our entire through via relocation algorithm can be repeated multiple times, since the temperature values change with each iteration. However, we observe that the majority of the improvement is achieved in the first iteration and that the algorithm converges quickly without any major improvement during the subsequent iterations.

VI. CONCLUSIONS

This paper studied two new problems that are important for 3D stacked IC technology: 3D Steiner tree construction and through via relocation. Our routing algorithm is based on a constructive method, where a 3D Steiner tree is grown by connecting a new pin to the existing tree. We derived two-variable delay equations and optimized them to compute the location of the through vias under a given thermal profile. For through via relocation, we devised an innovative technique that helps avoid the non-linear optimization required for temperature optimization. Our formulation can handle a large number of vias simultaneously for effective temperature optimization.

REFERENCES

[1] M. Yildiz and P. H. Madden, "Preferred Direction Steiner Trees," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2002.
[2] J. Cong and Y. Zhang, "Thermal Via Planning for 3-D ICs," in Proc. IEEE Int. Conf. on Computer-Aided Design, 2005.
[3] V. Pavlidis and E. Friedman, "Interconnect Delay Minimization through Interlayer Via Placement in 3-D ICs," in Proc. Great Lakes Symposium on VLSI, 2005.
[4] A. Ajami, K. Banerjee, and M. Pedram, "Effects of Non-uniform Substrate Temperature on the Clock Signal Integrity in High Performance Designs," in Proc. of IEEE Custom Integrated Circuits Conference, May 2001.
[5] K. Boese, A. Kahng, B. McCoy, and G. Robins, "Near-Optimal Critical Sink Routing Tree Constructions," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems.
[6] M. Pathak and S. K. Lim, "Thermal-aware Steiner Routing for 3D Stacked ICs," Georgia Institute of Technology, Tech. Rep. GIT-CERCS-07-18, 2007.
[7] T.-Y. Wang and C. C.-P. Chen, "3-D Thermal-ADI: A Linear-Time Chip Level Transient Thermal Simulator," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2002.
[8] B. Goplen and S. Sapatnekar, "Efficient Thermal Placement of Standard Cells in 3D ICs using a Force Directed Approach," in Proc. IEEE Int. Conf. on Computer-Aided Design, 2003.
[9] J. Cong, K.-S. Leung, and D. Zhou, "Performance Driven Interconnect Design Based on Distributed RC Delay Model," in Proc. ACM Design Automation Conf.
