
Post Routing Performance Optimization via Multi-Link Insertion and Non-Uniform Wiresizing

Tianxiong Xue and Ernest S. Kuh
Department of Electrical Engineering and Computer Sciences
University of California at Berkeley
Berkeley, CA

Abstract

Most existing performance-driven and clock routing algorithms construct an optimal routing topology for each net individually without considering its routability on the chip, so they cannot guarantee performance after all nets are routed. This paper proposes a new approach for post routing performance optimization via multi-link insertion and non-uniform wiresizing, which improves the performance of a net topology obtained from a global routing solution. Unlike previous approaches, it can reduce both maximum delay and skew to satisfy user-specified constraints while minimizing the routing resource consumed. During optimization, the topology of the net is kept routable. Experiments show that link insertion and wiresizing can improve net performance significantly, and among all approaches, multi-link insertion and wiresizing achieves the best performance and area efficiency.

1 Introduction

The advent of sub-micron and deep sub-micron technologies has made the interconnect routing problem critical in performance-driven physical design for high density ICs and MCMs. Routing tools must be able to generate feasible topologies of nets and guarantee their performance. In recent years, several performance-driven routing algorithms have been proposed [1, 2, 5, 8]. Unlike conventional routing approaches, their objective for routing topology construction or wiresizing is to minimize the maximum or critical delay of the net. Another category of problems closely related to performance-driven routing is clock routing, which aims at reducing the maximum delay skew among sinks of a net.
There have been numerous approaches for zero-skew solutions [9, 11]; recent work [3, 4] extends the existing methods to address the bounded-skew problem. Most of these existing performance-driven and clock routing algorithms are in fact pre-routing methods, which have several limitations:

1. The optimal routing topology for each net is constructed individually without considering its routability on the entire chip. Since the initial net topologies are subject to significant modifications during the global routing in order to generate a feasible solution under routing resource constraints (congestion and blockage), pre-routing methods cannot guarantee net performance after all nets are routed.
2. These approaches address the optimization of maximum delay and skew separately and fail to achieve reduction in both of them; in fact, many sacrifice delay for skew minimization.
3. Most approaches aim at minimizing the maximum delay or skew of a net, which often leads to overconstrained routing solutions that are not optimized in routing area. In real design, minimizing the net routing area is usually the objective of optimization, while the net delay and skew are only required to satisfy certain bounds, which can be known exactly only after a feasible global routing solution is obtained.
4. Existing approaches restrict routing topologies to either trees or other fixed topologies; such simplification sacrifices flexibility in routing.

(This research was sponsored by the Semiconductor Research Corp. (SRC) contract 94-DC-324.)

This paper proposes a new approach for post routing performance optimization via multi-link insertion and non-uniform wiresizing, which improves the performance of an existing net topology after global routing.
Unlike [6], which greedily adds new edges into an existing tree on a geometric graph, we analyze in detail the impact of link insertion on the maximum delay and skew of an arbitrary topology and develop a link insertion and wiresizing algorithm that can achieve the best improvement in performance. Our method extends the single link approach in [10] to multi-link insertion and wiresizing so that better performance and area efficiency can be achieved. During optimization, links are inserted sequentially between the source and the current node with maximum delay (skew) in the topology with shortest feasible length, and each link is wiresized non-uniformly under routing resource constraints. The objective is to satisfy the performance requirements with minimum routing area consumption. Our approach overcomes the limitations of previous ones:

1. It can reduce maximum delay and skew simultaneously to satisfy user-specified constraints and allow area-performance trade-offs.
2. It guarantees the routability of the net and treats tree and mesh structures in the same manner without any restrictions on routing topology.

Since link insertion changes both the net topology and its admittance matrix, it has an advantage over wiresizing-only approaches for delay and skew reduction. Actually, the latter can be regarded as special cases of link insertion that affect node admittances only.

The rest of this paper is organized as follows. Section 2 analyzes the impact of link insertion on maximum delay and skew reduction; Section 3 proposes the multi-link insertion and wiresizing approach with optimality analysis; Section 4 shows experimental results which demonstrate the effectiveness of our approach for post routing performance improvement.

2 Delay and Skew Analysis

2.1 Definitions

The routing topology $N$ of a net is formulated as a network of uniformly distributed RC lines, where each line is represented by the lumped RC model. Trees and meshes are treated without distinction in the following discussion. We adopt the convention that the ground node in $N$ is not numbered, and the source consists of a voltage source in series with a driver resistance $R_d$ inserted between the ground and the reference node $n_{ref}$. The remaining nodes in $N$ are numbered from 1 to $n = |N|$. The loading capacitance at sinks in $N$ is denoted by $C_s$.

The resistance matrix of $N$ is denoted by $R = [R_{ij}]$, where each entry $R_{ij}$, in ohms, is equal to the potential in volts at node $i$ if a 1A current were injected into node $j$ while all nodes in $N$ other than $j$ were open circuited [7]. Notice that the driver resistance $R_d$ should be a part of every $R_{ij}$. In the special case where $N$ is a tree, $R_{ij}$ is simply the total resistance along the common path shared by nodes $i$ and $j$. Denote $c = [C_j]$ as the capacitance vector of $N$, where $C_j$ is the ground capacitance at node $j \in N$. The Elmore delay $D_i$ at node $i$ is then:

$$D_i = \sum_{j} R_{ij} C_j \qquad (1)$$

Consider $N$ as an undirected connected graph. Suppose $N$ is not bi-connected, with the reference node $n_{ref}$ being one of its articulation points, i.e., there are several sub-components $N_1, N_2, \ldots, N_k$ branching out from the source that are connected to each other only at $n_{ref}$.
(If $n_{ref}$ is not an articulation point of $N$, then $N = N_k$) (Fig. 2).

Define the maximum node delay in $N$ as $D_{max} = \max\{D_i \mid i \in N\}$. Denote the delay skew between nodes $i, j$ by $\Delta D_{ij} = D_i - D_j$. The maximum delay skew in $N$, $\Delta D_{max}$, is then defined as $\Delta D_{max} = \max\{|\Delta D_{ij}| \mid i, j \in N\}$.

To study the impact of changes in topology on $D_{max}$ and $\Delta D_{max}$, we establish a new routing topology $N'$ by adding an additional link $e$ with parameters $R_e$ and $C_e$ between $n_{ref}$ and a chosen node $n \in N$ (Fig. 2). This establishes a direct link from node $n$ to the source and has the potential to achieve the largest reduction in delay and skew at node $n$ among all possible links between $n$ and nodes in $N$. Denote $R'$ and $c'$ as the resistance matrix and capacitance vector of $N'$, and $D'_{max}$ and $\Delta D'_{max}$ as the maximum delay and delay skew of $N'$, respectively.

2.2 Changes in Node Delay

Without loss of generality, we choose $n \in N_k$. According to the analysis in [10], the following conclusions hold for maximum delay reduction.

Lemma 1 If both $i, j \in N_k$, then $R'_{ij} < R_{ij}$, and when $R_e$ decreases, $R'_{ij}$ also decreases; otherwise, $R'_{ij} = R_{ij}$.

Theorem 1 With proper $R_e$, $C_e$ values, $D'_i < D_i$, $\forall i \in N_k$, and when $R_e$ decreases, $D'_i$ also decreases.

Corollary 1 If there is only one component rooted at $n_{ref}$ ($N = N_k$), $D'_{max} < D_{max}$ holds with proper $R_e$, $C_e$.

The situation in which multiple sub-components are rooted at $n_{ref}$ can be handled by inserting multiple links into the topology, connecting to different components. Therefore, according to Theorem 1, the maximum delay can always be reduced by adding new links properly into the existing topology.

2.3 Changes in Delay Skew

To study the impact of topology change on delay skew, we further decompose $N_k$ into two sub-components, $N_{k1}$ and $N_{k2}$. $N_{k1}$ and $N_{k2}$ connect to each other only at $n$ (an articulation point), i.e., $n$ is on every path from a node in $N_{k2}$ to $n_{ref}$ (Fig. 2).
Notice that a node with maximum delay skew must also have the maximum delay; this node is denoted as $n_{max}$ in what follows. Define $R(i, j, m) = R_{im} - R_{jm}$. According to the analysis in [10], the following conclusions hold for maximum delay skew reduction.

Lemma 2 $\forall m \in N$, if $j \in N_{k2}$, then $R'(n, j, m) = R(n, j, m)$; otherwise, $R'(n, j, m) < R(n, j, m)$, and when $R_e$ decreases, $R'(n, j, m)$ decreases.

Theorem 2 If $j \notin N_{k2}$, $\Delta D'_{nj} < \Delta D_{nj}$ holds with proper values of $R_e$, $C_e$. When $R_e$ decreases, $\Delta D'_{nj}$ also decreases.

Fig. 2 Decomposition of $N$ and $N_k$

Corollary 2 If $n$ is chosen as $n_{max}$, $\Delta D'_{max} < \Delta D_{max}$ holds with proper values of $R_e$, $C_e$.
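The effect of a single inserted link on the Elmore delays of Eq. (1) can be checked numerically on a toy network. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the three-node chain, the unit element values, the helper names `resistance_matrix` and `elmore_delays`, and the choice to split the link capacitance equally between its two end nodes are all hypothetical. Node 0 plays the role of the reference node, and the resistance matrix is obtained by inverting the nodal conductance matrix, which matches the "inject 1A at node j" definition for trees and meshes alike.

```python
def resistance_matrix(n, r_d, edges):
    """Return R = G^-1, where node 0 plays the role of n_ref.

    r_d ties node 0 to ground (voltage source shorted);
    edges is a list of (u, v, resistance) tuples.
    """
    g = [[0.0] * n for _ in range(n)]
    g[0][0] += 1.0 / r_d
    for u, v, res in edges:
        cond = 1.0 / res
        g[u][u] += cond
        g[v][v] += cond
        g[u][v] -= cond
        g[v][u] -= cond
    # Gauss-Jordan elimination on the augmented matrix [G | I].
    aug = [g[i] + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def elmore_delays(r, caps):
    """Eq. (1): D_i = sum_j R_ij * C_j."""
    return [sum(r_ij * c_j for r_ij, c_j in zip(row, caps)) for row in r]


R_D = 1.0                                   # hypothetical driver resistance
tree_edges = [(0, 1, 1.0), (1, 2, 1.0)]     # chain: n_ref - node 1 - node 2
caps = [0.0, 1.0, 1.0]                      # ground capacitance per node
before = elmore_delays(resistance_matrix(3, R_D, tree_edges), caps)

# Insert a link e between n_ref (node 0) and node 2, the maximum-delay node.
r_e, c_e = 1.0, 0.2
mesh_edges = tree_edges + [(0, 2, r_e)]
# Assumption: half of C_e is lumped at each end of the link.
caps_new = [caps[0] + c_e / 2, caps[1], caps[2] + c_e / 2]
after = elmore_delays(resistance_matrix(3, R_D, mesh_edges), caps_new)

skew_before = abs(before[2] - before[1])    # skew between the two sinks
skew_after = abs(after[2] - after[1])
print(before, after, skew_before, skew_after)
```

In this toy case the delays at nodes 1 and 2 drop from 4 and 5 to about 3.23 and 3.27, and the skew between them shrinks from 1 to about 0.03, so a single link to the maximum-delay node lowers both quantities at once. A larger `c_e` or `r_e` would weaken the effect, which is why the link parameters have to be chosen with care.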

Corollary 2 implies that we can always reduce the maximum delay skew of an existing routing topology by properly adding a new link between $n_{ref}$ and $n_{max}$. Recall that a similar approach is used to reduce the maximum delay of the topology. So unlike previous algorithms, which often sacrifice delay for skew minimization, our approach can achieve reduction in both the maximum delay and the delay skew of a net topology.

3 Multi-Link Insertion and Non-uniform Wiresizing

Since the maximum delay and skew reduction requires appropriate values of $R_e$, $C_e$, each inserted link must be built and wiresized properly in order to optimize net performance. For this purpose, a post routing performance optimization approach is developed. It sequentially inserts multiple links into an existing topology obtained from a global routing solution and non-uniformly wiresizes each link under routing resource constraints so that net performance requirements are satisfied with minimum area consumption.

3.1 Non-uniform Link Wiresizing

Problem Formulation

Without loss of generality, we assume in the following analysis that there is only one sub-component rooted at $n_{ref}$ ($N = N_k$) and that $n_{max}$ is unique at any time during the optimization (the situation in which multiple $n_{max}$'s exist can be easily handled by adding one link to each of them).

Suppose we form a new topology $N'$ by establishing a $k$-segment link $e$ between the current $n_{max}$ and $n_{ref}$. Define the length and width vectors of $e$ as $l = (l_1, \ldots, l_k)$ and $w = (w_1, \ldots, w_k)$, and $w_{ub}$ as the vector of wire width constraints on $w$. The R, C values of each segment $i \in e$, $R_i$ and $C_i$, satisfy:

$$R_i = r_0 \frac{l_i}{w_i}, \qquad C_i = c_0 l_i w_i$$

where $r_0$, $c_0$ are the unit resistance and capacitance of the link with minimum width $w_0$, respectively.
The total R, C values of link $e$, $R_e$ and $C_e$, can then be expressed as:

$$R_e = r_0 \sum_{i=1}^{k} \frac{l_i}{w_i}, \qquad C_e = c_0 \sum_{i=1}^{k} l_i w_i \qquad (2)$$

The delay and skew of $N'$ can be measured by $K = (P - P_{sp})/P_{sp}$, where $P$ is the maximum delay or delay skew, and $P_{sp}$ is its specified constraint. $K_{max} = \max\{K_{delay}, K_{skew}\}$ measures the performance of the net, and $K_{max} \le 0$ implies that the requirements for maximum delay and skew are both satisfied.

Non-uniform Link Wiresizing

Since reduction in maximum delay and skew favors a link with small values of $R_e$ and $C_e$, which are both monotonic in $l$ according to Eqn. (2), a shortest path algorithm can be applied to obtain a feasible route for the inserted link $e$ under routing resource constraints. Once the route is determined, $l$ is fixed and $w$ becomes the only adjustable vector in $N'$. The maximum delay and delay skew, both of which can be expressed as functions of $w$, are not monotonic in $w$, since $C_e$ is proportional to $w$ while $R_e$ is inversely proportional to $w$ (Eqn. (2)). This non-uniform wiresizing problem can be formulated as an optimization process which aims at satisfying the specified maximum delay constraint, $D_{bound}$, with minimum routing area $l \cdot w$ (notice that the maximum delay and skew reductions are consistent):

Minimize $l \cdot w$
Subject to: $D'_{max}(w) \le D_{bound}$, $0 < w \le w_{ub}$

Since $D'_{max}(w)$ is non-linear with respect to $w$, the wiresizing is a non-linear programming problem, which can be solved effectively by the Sequential Quadratic Programming (SQP) method as explained in [10].

3.2 Multi-Link Insertion and Non-uniform Wiresizing

According to the delay and skew analysis, the maximum delay and skew reduction favors a link connecting to $n_{max}$, which may change during the optimization process. Therefore, a multi-link insertion and non-uniform wiresizing algorithm is designed. It first establishes the shortest link $e$ connecting to the current $n_{max}$ and non-uniformly wiresizes $e$ to satisfy the delay bound $D_{bound}$, which is reduced by $p$ at each iteration.
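As a small illustration of the wiresizing trade-off formulated in Section 3.1, the sketch below evaluates Eq. (2), the performance measure K, and the Elmore delay at the far end of one isolated link. It is a simplified stand-in, not the SQP formulation of [10]: the link is treated as a stand-alone line driven through a driver resistance and terminated by a load, each segment's capacitance is lumped at its far end, and all numeric values and function names (`link_rc`, `link_area`, `performance_gap`, `line_delay`) are hypothetical.

```python
def link_rc(lengths, widths, r0, c0):
    """Eq. (2): total resistance R_e and capacitance C_e of a k-segment link."""
    r_e = r0 * sum(l / w for l, w in zip(lengths, widths))
    c_e = c0 * sum(l * w for l, w in zip(lengths, widths))
    return r_e, c_e


def link_area(lengths, widths):
    """Routing area of the link, i.e. the dot product l . w."""
    return sum(l * w for l, w in zip(lengths, widths))


def performance_gap(p, p_spec):
    """K = (P - P_sp) / P_sp; K <= 0 means the bound P_sp is met."""
    return (p - p_spec) / p_spec


def line_delay(lengths, widths, r0, c0, r_d, c_load):
    """Elmore delay at the far end of an isolated line.

    Assumption: each segment's capacitance is lumped at its far end,
    and segment 0 is the one next to the driver.
    """
    delay, r_upstream = 0.0, r_d
    for l, w in zip(lengths, widths):
        r_upstream += r0 * l / w
        delay += r_upstream * c0 * l * w
    return delay + r_upstream * c_load


lengths = [1.0, 1.0]
r0, c0, r_d, c_load = 1.0, 1.0, 1.0, 4.0

# Uniform widths: the delay is not monotonic in w.
uniform = {w: line_delay(lengths, [w, w], r0, c0, r_d, c_load)
           for w in (1.0, 2.0, 4.0)}

# Same routing area (4.0) as uniform width 2, but tapered either way.
wide_near_source = line_delay(lengths, [2.5, 1.5], r0, c0, r_d, c_load)
wide_near_sink = line_delay(lengths, [1.5, 2.5], r0, c0, r_d, c_load)

print(uniform, wide_near_source, wide_near_sink)
print(performance_gap(uniform[2.0], 12.0))
```

With these numbers the uniform-width delay is 17, 15 and 17 for widths 1, 2 and 4, so widening helps only up to a point. At the same area as uniform width 2, the link that is wider near the driver is slightly faster and the one that is wider near the load is slower. In the actual problem the delay depends on the whole topology $N'$ rather than on the link alone, so the widths have to be found by a constrained non-linear solver.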
When the maximum delay node switches from $n_{max}$ to $n'_{max}$, a new link $e'$ connecting to $n'_{max}$ will be established if continued wiresizing of $e$ cannot satisfy the performance requirements with less area consumption than $e'$. This link insertion and wiresizing process continues until $K_{max} \le 0$ is achieved.

The Algorithm()
1. Initialization:
   1.1 Input existing net topology $N$ and its performance requirements.
   1.2 Identify initial $n_{max}$ and $K_{max}$, set initial $D_{bound}$ and its decreasing rate $p$.
2. Link establishment: Build new topology $N'$ by establishing a shortest feasible link $e$ between $n_{ref}$ and $n_{max}$.
3. Non-uniform link wiresizing: While $K_{max} > 0$ and $n_{max}$ remains unchanged:
   3.1 Reduce $D_{bound}$ by $p$.
   3.2 Non-uniformly wiresize $e$ (Sec ).
   3.3 Evaluate $K_{max}$ and $area_e$.
4. Multi-link insertion ($n_{max}$ switches to $n'_{max}$):
   If no link exists between $n_{ref}$ and $n'_{max}$:
   4.1 Save current $w_{e0}$ and $area_{e0}$ of $e$, estimate $area_{e'}$ of a new link $e'$ between $n'_{max}$ and $n_{ref}$ with minimum width.
   4.2 While $K_{max} > 0$ and $area_e \le area_{e0} + area_{e'}$: Continue wiresizing $e$, estimate $K_{max}$ and $area_e$.
   4.3 If $area_e > area_{e0} + area_{e'}$ ($e'$ should be built): Restore $w_{e0}$ of $e$, update $D'_{max}$. Set $n_{max} = n'_{max}$, $e = e'$, goto Step 2.
   Else ($e'$ already exists between $n_{ref}$ and $n'_{max}$):
   4.4 Set $n_{max} = n'_{max}$, $e = e'$, goto Step 3.

If $K_{max} \le 0$ is satisfied at any step, the process will terminate. It should also be noticed that at Step 4:

1. The establishment of a new link $e'$ would consume a certain amount of routing resource ($area_{e'}$). Therefore, when $n_{max}$ switches to $n'_{max}$, $e'$ will not be built if continued wiresizing of $e$ can achieve $K_{max} \le 0$ with less additional area than $area_{e'}$ (Step ).
2. Since the same node may repeatedly become $n_{max}$ during the iterative process, a link to the current $n_{max}$ from $n_{ref}$ may already exist. In that case, this existing link will continue to be wiresized until a new $n_{max}$ emerges or the performance becomes satisfactory (Step 4.4).

3.3 Optimality Analysis

Uniform vs Non-uniform Wiresizing

The wiresizing of $e$ affects $D'_{max}$ since it determines the distribution of $C_e$ along $e$. Intuitively, large capacitive segments should be placed close to the source so that the $R_{ij}$'s are small for nodes $j \in e$ with large $C_j$. This implies that non-uniform wiresizing can achieve better results in performance optimization than uniform wiresizing.

Lemma 3 To achieve a certain amount of reduction in $D'_{max}$, non-uniform wiresizing consumes less routing area than uniform wiresizing.

Link to $n_{max}$ vs Link to Other Node

Maximum delay can be reduced most if link $e$ is connected to $n_{max}$, since this would reduce the $R'_{j n_{max}}$'s more than a link to any other node. For area consumption, the following lemma can be established.

Lemma 4 To achieve a certain amount of reduction in $D_{max}$, a link connecting to $n_{max}$ consumes less routing area than a link to any other node.

Max Delay vs Skew Reduction

According to Corollary 2 in Section 2, maximum delay and skew reductions are consistent if a link is connected to $n_{max}$.
This is formally stated for an inserted link with non-uniform wiresizing as follows.

Lemma 5 If an inserted link is connected to $n_{max}$ and wiresized, the maximum delay and skew reductions are consistent, i.e., a larger reduction in $D'_{max}$ leads to a larger reduction in $\Delta D'_{max}$.

Lemma 5 implies that all conclusions for maximum delay reduction can also be extended to maximum delay skew reduction, so the following corollary can be established according to the definition of $K_{max}$.

Corollary 3 Lemmas 3 and 4 are both valid for the reduction of $K_{max}$.

Optimality of the Algorithm

Based on the analysis above, the optimality of the algorithm in Sec 3.2 can be established.

Proposition 1 Among all possible link insertion and wiresizing approaches that satisfy specified performance constraints, the Algorithm in Section 3.2 is optimal in terms of routing area consumption.

Proof:
1. According to Lemmas 3 and 4, Steps 2, 3 and 4 in the Algorithm consume minimum routing area for a certain performance improvement.
2. Since the Algorithm terminates as soon as $K_{max} \le 0$ is obtained, it does not consume any more area than necessary.
Therefore, the Algorithm achieves satisfactory performance with minimum routing area. □

4 Experimental Results

The post routing performance optimization algorithms have been implemented and tested. The following MCM parameters are used in delay evaluation (courtesy of Prof. Wayne Dai of UC Santa Cruz from data provided by AT&T): driver resistance $R_d = 25\,\Omega$, unit resistance and capacitance $R = 0.008\,\Omega/\mu m$, $C = 0.06\,fF/\mu m$, loading capacitance $C_s = 1\,pF$.

Table 1. Physical Characteristics of Nets (columns: Net no.; # of pins; Total wire length (mm); $w_{ub}$ ($\times w_0$); # of inserted links)

Table 1 lists the physical characteristics of eight testing topologies obtained from a global routing solution, which are critical for chip performance. The number of segments on each inserted link is set to 6.
The width of each link segment $i$, $w_i$, belongs to the interval $(w_0, w_{i,ub})$, where $w_0$ is the minimum width of a link and $w_{i,ub}$ is the upper bound on $w_i$, known from the current routing solution. In our testing, we assume that segments of inserted links in each topology share the same wire width upper bound $w_{ub}$. For multi-link insertion and wiresizing, two inserted links can usually satisfy the specified performance requirements. The lengths of some inserted links are 20% to 40% longer than the Manhattan distance between $n_{ref}$ and $n_{max}$, since the shortest path may either be unroutable or not allow wiresizing up to $w_{ub}$.

Three different approaches are tested and compared:

1. Single link insertion and uniform wiresizing, which only establishes a single link to the original $n_{max}$ and wiresizes it uniformly.
2. Single link insertion and non-uniform wiresizing, which establishes a single link like Approach 1 but wiresizes it non-uniformly (same as in [10]).
3. Multi-link insertion and non-uniform wiresizing, which applies the multi-link insertion and wiresizing algorithm in Sec .

Fig. 5 Maximum Delay vs Link Routing Area ($D'_{max}$ as % of $D_{max}$ versus area; curves: multi non-uni, single non-uni, single uni; points p1 and p2 marked)

Fig. 6 Maximum Skew vs Link Routing Area ($\Delta D'_{max}$ as % of $\Delta D_{max}$ versus area; curves: multi non-uni, single non-uni, single uni; points p1 and p2 marked)

Fig. 5 and 6 show the trade-offs between maximum delay (skew) reduction and the routing area of the inserted links in testing topology 1 for the three approaches. Here, $D'_{max}$ and $\Delta D'_{max}$ are measured as percentages of the original $D_{max}$ and $\Delta D_{max}$ values before optimization. Point p1 on the curves marks the insertion of the first link to the original $n_{max}$, which reduces $D_{max}$ and $\Delta D_{max}$ by over 25% and 40% respectively. Initially, the curve for Approach 3 matches the curve for Approach 2, since both approaches non-uniformly wiresize the same link. The insertion of a new link by Approach 3 due to the switching of the maximum delay node (marked by point p2 on the curves) results in faster reduction in both $D_{max}$ and $\Delta D_{max}$, as indicated by the slopes of the curves to the right of p2. Among all three approaches, the multi-link insertion and non-uniform wiresizing algorithm achieves the best performance and routing area efficiency. It is the only approach that can simultaneously reduce $D_{max}$ and $\Delta D_{max}$ by over 38% and 60% respectively.

Table 2 and 3 show the reductions in $D_{max}$ and $\Delta D_{max}$ by the three approaches. For the two single link insertion and wiresizing approaches, the solution is the best achievable in terms of minimizing $K_{max} = \max\{K_{delay}, K_{skew}\}$. The multi-link approach stops as soon as $K_{max} \le 0$ is achieved; better results in performance are possible if the process continues. From these tables, it can be observed that after post routing optimization:

1. All three approaches can improve net performance significantly. On average, $D_{max}$ is reduced by 37% to 43% and $\Delta D_{max}$ is reduced by 59% to 63%.
$K_{delay}$ and $K_{skew}$, which measure the gaps between the current $D'_{max}$ and $\Delta D'_{max}$ and their specified bounds, are reduced from averages of 0.80 and 1.75 to be within the range of (0.11, 0.005). These results demonstrate the effectiveness of our post routing performance optimization approaches.
2. Among the three approaches, multi-link insertion and non-uniform wiresizing achieves the best performance in terms of maximum delay and skew reduction. It can generate feasible solutions that satisfy the performance requirements ($K_{max} \le 0$) in all cases where the best solutions by the two single link approaches fail to reach the specified bounds for both maximum delay and skew ($K_{max} > 0$).

Table 4 compares the link routing area of the three methods and demonstrates the area efficiency of the multi-link approach. To achieve the performances listed in Table 2 and 3, the multi-link approach consumes an average of 39.53% less area than the single link uniform wiresizing method, while the area saving of single link non-uniform wiresizing over uniform wiresizing is only 16.94%. In summary, the multi-link non-uniform wiresizing method is the best among the three approaches in terms of both performance improvement and area efficiency.

References

[1] K. Boese, A. Kahng, B. McCoy and G. Robins, "Rectilinear Steiner Trees with Minimum Elmore Delay", Proc. DAC 31, pp ,
[2] J. Cong and K. Leung, "Optimal Wiresizing Under the Distributed Elmore Delay Model", Proc. ICCAD 93, pp ,
[3] J. Cong and C. K. Koh, "Minimum-Cost Bounded-Skew Clock Routing", Proc. ISCAS 95,
[4] D. Huang, A. Kahng, A. Tsao, "On the Bounded-Skew Clock and Steiner Routing Problems", Proc. DAC 32, pp ,
[5] X. Hong, T. Xue, E. Kuh, C. Cheng, J. Huang, "Performance-Driven Steiner Tree Algorithms For Global Routing", Proc. DAC 30, pp ,
[6] B. McCoy and G. Robins, "Non-Tree Routing", Proc. ED&T 94, pp ,
[7] A. E. Ruehli, Circuit Analysis, Simulation and Design, Advances in CAD For VLSI, Vol. 3,
[8] S. Sapatnekar, "Interconnect Optimization under the Elmore Delay Model", Proc. of DAC 31, pp ,
[9] R. S. Tsay, "Exact Zero Skew", Proc. ICCAD 91, pp ,
[10] T. Xue, E. S. Kuh, "Post Routing Performance Optimization via Tapered Link Insertion and Wiresizing", Proc. EURO-DAC 95, to appear.
[11] Q. Zhu, W. M. Dai, J. Xi, "Optimal Sizing of High-Speed Clock Networks Based on Distributed RC and Lossy Transmission Line Models", Proc. ICCAD 93, pp , 1993.

Table 2. Maximum Delay Reduction by Link Insertion and Wiresizing (rows: Net No., avg; column groups, each with Red.(-%) and $K_{delay}$: Before Optimization (Specified); Single Link, Uniform Sizing; Single Link, Non-Uni. Sizing; Multi Link, Non-Uni. Sizing)

Table 3. Maximum Skew Reduction by Link Insertion and Wiresizing (rows: Net No., avg; column groups, each with Red.(-%) and $K_{skew}$: Before Optimization (Specified); Single Link, Uniform Sizing; Single Link, Non-Uni. Sizing; Multi Link, Non-Uni. Sizing)

Table 4. Area Consumption by Link Insertion and Wiresizing (rows: Net No., avg; columns: Single Link, Uniform Sizing: Area ($\times w_0$); Single Link, Non-Uniform Sizing: Area ($\times w_0$), Saving (-%); Multi Link, Non-Uniform Sizing: Area ($\times w_0$), Saving (-%))
