Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning

Size: px

Start display at page:

Download "Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning"

Carmel Richard
5 years ago
Views:

1 Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning Probir Sarkar and Cheng-Kok Koh, Member, IEEE Abstract In this paper, we present a repeater block planning algorithm for interconnect-centric floorplanning. We introduce the concept of independent feasible regions for repeaters and derive an analytical formula for their computation. We develop a routability-driven repeater clustering algorithm to perform repeater block planning based on iterative deletion. The goal is to obtain a high quality solution for the repeater block locations so that performance-driven interconnect synthesis at the routing stage can be carried out with ease, while minimizing the chip area. Experimental results show that our method increases the percentage of all global nets that meet their target delays from 67.5% in [1] to 85%. Moreover, our approach minimizes the expected routing congestion, making it easier for performance-driven routers to synthesize global nets that require the insertion of repeaters to meet timing constraints. Index Terms Buffer insertion, deep submicron, physical design, Floorplanning, Routing. 1 Introduction Due to the continued scaling of VLSI technologies, interconnects play a dominant role in determining system performance, power, reliability, and cost. To ensure timing closure of designs, impacts of interconnects must be considered as early as possible in the design flow. 13 This work was supported by NSF CAREER award CCR P. Sarkar was with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN , USA. He is now with Conexant Systems, Newport Beach, CA 92660, USA. 13 C.-K. Koh is with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN , USA.

2 Several interconnect synthesis techniques topology construction, repeater insertion, device sizing, and wire sizing and spacing have been studied in the literature. A comprehensive survey of these techniques can be found in [2]. Studies in [3, 4, 5, 6] showed repeater insertion being among the most effective methods to optimize signal delay. Without repeaters, the delay of a long RC wire is quadratic in terms of the wire length. With judicious insertion of repeaters, the delay becomes linear [7, 8, 9]. However, most interconnect synthesis techniques were designed for post-placement interconnect optimization. In [10], it was projected that over 700K repeaters will be inserted in a single chip for the 70nm technology. The insertion of that many repeaters will significantly change the floorplan and placement of a design, rendering the original floorplan and placement invalid and leading to a slow design convergence. Floorplanning [11, 12], being the first stage of the physical design process, has significant effects on overall system power, performance, and reliability. However, very few existing timing-driven floorplanning techniques consider the option of repeater insertion. An advantage of considering repeater optimization during floorplanning is that the problem size at this stage is smaller compared to the problem size faced by place-and-route and hence, permits a more effective search of the design space. In [13], the floorplanner assumed that repeaters could be inserted arbitrarily in an existing floorplan. However, repeaters consume silicon resources, and certain circuit blocks such as the cache may not allow the insertion of repeaters. To overcome this problem, the authors in [1] considered the insertion of blocks of repeaters in the channel regions between circuit blocks. However, the greedy clustering of repeaters into a repeater block may result in routing congestion. As pointed out by the authors in [14], it is important to perform interconnect planning for global routing during the floorplanning stage. Many designs are routing-limited; it may not be feasible to get signals to and out of repeaters due to the limitation in routing resources. Therefore, it is important to consider routing feasibility (in subsequent steps) during the global distribution of repeaters in the floorplanning step. In this paper, we propose a routability-driven repeater block planning algorithm for interconnectcentric floorplanning. We introduce the concept of independent feasible region (IFR) for repeater insertion under delay constraint, and derive an analytical formula for the computation of IFRs. All repeaters are freely movable within their respective IFRs without violating the timing constraints. This has the advantage that each repeater of a net has equal flexibility of position, which facilitates global optimization. As in [1], we cluster individual repeaters into repeater blocks to form a regular layout structure for a higher density implementation. We develop an effective routability-driven repeater 2

3 block planning algorithm. A tile based congestion model is used to drive an iterative deletion heuristic for repeater clustering. Experimental results show that our method can boost the completion rate, i.e. the percentage of all global nets that meet their target delays, to 85% from the 67.5% completion rate reported in [1]. Furthermore, the expected congestion due to our repeater block planning algorithm is lower than that produced by a planner without considering congestion. The remainder of the paper is organized as follows. In Section 2, we present the problem formulation. Section 3 defines the independent feasible region for repeater insertion. Section 4 presents the routability-driven repeater block planning algorithm. Experimental results are shown in Section 5, followed by the conclusion in Section 6. An extended abstract of this work was presented at ISPD 2000 [15]. 2 Problem Formulation In this paper we study the following repeater block planning problem: given an initial floorplan, and timing constraints on each net, find the number, locations, and sizes of the repeater blocks to be inserted in order to meet the timing constraints. As in [1], we consider the insertion of repeater blocks in the channel regions between circuit blocks, in which repeater insertion is not allowed. Although the primary objective of the repeater block planner is to meet the timing constraints for all nets, it is equally important to avoid routing hot spots and keep the number of repeater blocks and the increase in chip area within tolerable limits. The two objectives of reducing the total number of repeater blocks added and avoiding regions with high routing congestion are contradictory in nature; by clustering large number of repeaters into a repeater block, we create a highly congested routing region around that repeater block. Figure 1 compares a routability-driven repeater block planner with a repeater planner that tries to minimize repeater blocks. Three feasible regions from repeaters of different nets are shown and the dashed lines represent the routes passing through the region. If minimizing repeater blocks is the sole objective, the repeater block would be chosen in the high congestion region as shown on the left in Figure 1. A routability-aware repeater planner, however, would move the repeater blocks away from the high congestion zone. As shown on the right in Figure 1, a routability-driven repeater block planner would strike a balance between the two contradictory criteria: routing congestion and repeater block count. 3

4 Feasible regions Repeater Blocks Routes Repeater Block Based approach: assigns repeaters to congested regions. Congestion-driven approach Separates repeater blocks if region is highly congested. Figure 1: Impact of routing congestion on repeater block planning. We measure the overall cost of a repeater block plan by a weighted composition of two cost functions: one cost function guides the solution towards minimum number of repeater blocks; and the other one minimizes the routing congestion of the solution. We shall present in detail the cost functions in Sections 4.1 and 4.2. The weights of the two cost functions may be suitably adjusted to reflect the relative importance of each criterion. Our repeater block planner tries to obtain a repeater block plan that minimizes the overall cost of the solution. 3 Feasible Region Computation In this section, we introduce the concept of independent feasible region (IFR) for repeater placement and obtain an expression for computing its width. We define the independent feasible region for a repeater to be the region in which it can be placed such that the timing constraint of the net is satisfied, assuming that the other repeaters of that net are also located within their respective independent feasible regions. The effectiveness of its use in repeater block planning is demonstrated in Section 5 by the significantly higher completion rates that we obtain as compared to [1]. 3.1 Preliminaries First, we present the definitions and expressions that will be used in stating the main result for IFR computation. Each driver/repeater is modeled as a switch-level RC circuit [16], and the Elmore delay formula [17] is used for delay computations. The notation for the physical parameters of the interconnect and repeater we use in this paper is as follows: 4

5 r: wire resistance per unit length; c: wire capacitance per unit length; T b : C b : R b : intrinsic repeater delay; repeater input capacitance; repeater output resistance. Given a wire segment of length l with driver output resistance R and sink capacitance C, the Elmore delay of this segment is defined as D RClµ rc 2 µl2 Rc rcµl RC (1) Using the above expression, the Elmore delay of a single-source, single-sink net N (two pin net) of length L with n repeaters can be expressed as, D N x 1 x 2 x n Lµ D R d C b x 1 µ D R b C s L x n µ n 1 i1 D R b C b x i 1 x i µ nt b where R d is the driver resistance, C s is the sink capacitance, and x i is the location of the i-th repeater. The optimal locations of the n repeaters for delay minimization of the net as shown in [18] are x i i 1µy L x L i ¾ 12 n (2) where x L y L 1 n 1 L n R b R d µ r 1 n 1 L R b R d µ r C s C b µ µ (3) c C s C b µ µ (4) c We denote the optimal delay for the net N, of length L, with n repeaters by D N opt nlµ DN x 1 x 2 x n Lµ 3.2 Feasible Region for Repeater Insertion In [1], the feasible region (FR) for repeater insertion was defined as the region in which a repeater could be placed, assuming that all the remaining repeaters were optimally placed with respect to it, in order to satisfy the target delay constraint, denoted by Dtgt. N We denote the width of the feasible region for a given repeater by W FR. An analytical expression for W FR was given in [1]. The following theorem presents an alternative but equivalent analytical expression, of which the proof is presented in the Appendix. 5

6 Source W IFR 1 i-1 i i+1 Sink l 1 l i l i+1 Figure 2: Independent feasible regions. Theorem 1 For Dtgt N D N opt nlµ, the width of the feasible region for the i-th repeater (i n) of the net N is W FR 2 2 Dtgt N DN opt nlµµ n i 1µ iµ rc n 1µ However, such a definition of FRs has the disadvantage that if we assign a repeater to a location that is on or near the boundary of its feasible region, the feasible regions of the other repeaters are almost entirely eliminated. ¾ 3.3 Independent Feasible Region As opposed to the definition of feasible region, the independent feasible region of a repeater is the region where it can be placed while meeting the timing specifications of the net, assuming that the other repeaters are placed within their respective independent feasible regions. Formally, we define the independent feasible region (IFR) for the i-th repeater of a net N as IFR i x i W IFR2x i W IFR2µ 0Lµ such that x 1 x 2 x i x n µ ¾ IFR 1 IFR 2 IFR n, D N x 1 x 2...x n Lµ Dtgt. N Here, W IFR and Dtgt N respectively denote the width of independent feasible region IFR i and the target delay associated with the net. Note that the final placement of a repeater in its IFR does not depend on the placement of the other repeaters, so long as they are placed within their respective IFRs. To allocate an equal degree of freedom to each repeater in the net, we choose the IFR intervals to be of equal width (see Figure 2). We have the following theorem for the width of the independent feasible region of a repeater. Theorem 2 For Dtgt N DN opt nlµ, the width of the independent feasible region for the i-th 6

7 repeater (i n) of the net N is W IFR 2 Dtgt N D N opt nlµ rc 2n 1µ ¾ The proof is presented in the Appendix. It can be trivially shown that when n 1, W FR W IFR. Although W IFR W FR for n 1, using IFRs for repeater planning results in a solution of higher quality because it facilitates a higher degree of fairness during global optimization of repeater block clustering. Note that when the objective is to minimize the number of repeaters inserted in a net, the independent feasible regions of repeaters belonging to the same net do not overlap each other. Otherwise, at least one repeater can be eliminated without violating the given timing constraint. 3.4 Two-Dimensional IFR In the preceding discussions, we were limited to repeater insertion along an one-dimensional line. Implicit in the discussions was the assumption that the route from source to sink is specified by some global router. For repeater block planning during floorplanning, however, no routing information is available. We assume that each net would be routed with a shortest path within the bounding box containing the two terminals. Therefore, we need to compute two-dimensional regions in which the repeaters can be placed. The two-dimensional independent feasible region (2-D IFR) of a repeater is defined as the union of the one-dimensional IFRs of that repeater on all monotonic Manhattan routes between source and sink. Therefore, 2-D IFRs, as in [1], are convex octilinear polygons with horizontal, vertical, and 1-slope boundaries (see Figure 3). However, 2-D IFRs of repeaters belonging to the same net are not completely independent of each other. As the widths and locations of a 2-D IFR are valid only under the assumption that a monotonic Manhattan route exists between the source and the sink, the assignments of repeaters to locations within their respective 2-D IFRs must be made such that they constitute a monotone path from source to sink. In Figure 3, for example, the repeater assignments, which form a non-monotonic sequence from the source to the sink, violate the monotonicity constraint even though the repeaters are within their respective 2-D IFRs. Therefore, whenever the 2-D IFR of a repeater is modified, the 2-D IFRs for all other repeaters in the net may need to be updated. 1 In this paper, we develop an efficient scheme to perform the update. Details of 13 1 Even though 2-D IFRs are not truly independent, they do not exhibit the same problem as FRs when a 7

8 Sink Feasible regions reduced by circuit blocks Circuit Blocks Feasible regions for repeaters 1 and 2 Source The assignment of repeaters to shown locations will result in a non-monotonic path. Figure 3: Non-monotonic repeater assignments. the updating scheme are presented in Section 4.4. In the subsequent discussions, we shall deal with only 2-D IFRs; we refer to them simply as IFRs. 4 Repeater Block Planning In this section, we describe in detail our routability-driven repeater block planning algorithm. Given a floorplan, we assume that repeaters can only be inserted within the vertical and horizontal channels [19] defined by circuit blocks as in [1]. We also assume that no repeaters can be inserted into circuit blocks. For the purpose of routing congestion computation (to be described in Section 4.1), we divide the entire chip area into a set of rectangular routing tiles as shown in Figure 4. The routing tiles correspond to higher metal layers reserved for the routing of global nets. We also divide the channel between circuit blocks into a set of repeater-block tiles, which are of a finer resolution than routing tiles. As in [1], these repeater-block tiles represent locations where the repeaters may be placed. Figure 4 shows a subset of repeater-block tiles. In essence, we construct a two-level tile structure, one for the purpose of estimating routing-congestion and the other for defining candidate repeater block (CRB) locations. For each repeater b to be inserted, we find S b, the set of CRBs in which it can be placed. As shown in Figure 4, each repeater has several CRBs to which it may be assigned. The objective repeater is assigned to the boundary of its FR, i.e., the FRs of remaining repeaters are almost entirely eliminated. Note that for 2-D FRs, repeaters of a net must also constitute a monotone path from source to sink because a 2-D FR is similarly obtained as the union of the one-dimensional FRs of that repeater on all monotonic Manhattan routes between source and sink. 8

9 Routing Tiles Independent Feasible Region Circuit Block Sink Circuit Block Circuit Block CRBs of feasible region Source Bounding Box of the net Figure 4: Creation of routing tiles and candidate repeater blocks. of the repeater block planner is to assign each repeater to a single CRB. To solve this problem, we take the approach of first generating the candidate set for each repeater, and then using a routing-congestion driven iterative deletion [20] algorithm to obtain an assignment for each repeater. The iterative deletion procedure operates on a bipartite graph G that represents the set of all possible repeater assignments. Let B be the set of all repeaters that need to be inserted. The edge set ofg is defined as E Gµ bcµ : b ¾ Bc ¾ S b. The iterative deletion algorithm starts with a redundant solution space that contains all possible assignments of repeaters to CRBs. One at a time, the algorithm deletes an incompatible repeater assignment, i.e., an assignment that results in high routing congestion or too many repeater blocks. Equivalently, the iterative deletion algorithm removes an edge from the bipartite graph one at a time. When the iterative deletion algorithm terminates, we obtain assignments of repeaters to CRBs that are compatible to each other in terms of routability and repeater block count. Figure 5 gives the overall flow of our repeater block planning algorithm. Steps 1 through 4 are data preparation stages. Step 1 constructs a two-level tile structure as shown in Figure 4. In Steps 2 and 3, we compute the minimum number of repeaters required for each two-pin net to meet its target delay [1, 18] and assign W IFR of each repeater based on the available net slack (Dtgt N D N opt nlµ) as defined in Theorem 2. Then, we generate the candidate set S b for each repeater b by intersecting the IFR of the repeater with the set of repeater-block tiles within the bounding box of the net. If the IFR for a repeater does not have any intersections with the set of repeater-block tiles, then we consider possible repeater locations along the boundaries of circuit blocks. For such nets, we allow repeaters to be placed along the circuit block boundaries in order to meet their timing constraints. Step 4 constructs the bipartite graphg. 9

10 Repeater Block Planning Algorithm 1. Build a two-level tile structure on a given floorplan; 2. Compute IFR for each repeater b ¾ B; 3. Obtain CRB set S b for each repeater b ¾ B; 4. Generate the bipartite graph G; 5. while there exists a repeater to be assigned do 6. Delete the highest cost edge ofg; 7. Update monotonicity; 8. Update congestion matrix; 9. Update edge costs; 10. Assign repeater to a CRB if required; Figure 5: Repeater Block Planning Algorithm. Steps 5 10 perform the iterative deletion operations. The iterative deletion procedure assigns each repeater to one of its CRBs. To accomplish the task of deleting incompatible repeater assignments, the edges ing have dynamic weights. The weight of edge bcµ reflects the cost of assigning repeater b to CRB c, assuming that the other repeaters can go into any of their CRBs with equal probability. An edge of a higher cost implies that the repeater block assignment is likely to be incompatible with the rest, and the iterative deletion operation in Steps 6 10 (to be described in Section 4.3) seeks to remove the highest cost edge or the most incompatible repeater block assignment. As edges are iteratively deleted from graph G, the routability of the floorplan is modified and so is the expected repeater block count. These changes are reflected onto the edge costs of the graph at every iteration, thus making the edge weights dynamic. We shall present the routing congestion model and the associated cost function in Section 4.1, and the edge weight, which is a composite cost function of the routing congestion cost and repeater block cost, in Section Congestion Model The congestion model employed is essentially a two dimensional rectangular grid based probabilistic map assuming two-bend routing for each segment. This is similar to that developed in [14]. The congestion of a routing tile tile i jµ is defined as : C h i jµ Expected number of horizontal routes 10

11 passing through tile i jµ C v i jµ Expected number of vertical routes passing through tile i jµ To derive the congestion numbers for the routing grid, we first compute the expected number of routes passing through every routing tile for a wire segment with a fixed source and a fixed sink. Without loss of generality, we consider the source to be located in tile 00µ and the sink to be located in tile mnµ and assume that a bend consumes both horizontal and vertical routing resources. It is easy to see that there are a total of m nµ possible two-bend routes from source to sink. Assuming that all these routes are equally likely, we obtain the probability matrix shown in Figure 6. δc h i jµ is defined as the probability of a horizontal route passing through tile i jµ. Similarly δc v i jµ is defined for vertical routes. Tile δc h i jµ δc v i jµ 0 i m0 j n 1 m n 0 i m j 0 m i 1 m n 0 i m j n i 1 m n 1 i 00 j n m n 1 i m0 j n i 0 j 0 or i m j n m n m m n Figure 6: Probability matrix. 1 m n 1 m n 1 m n n j 1 m n j 1 m n n m n We define a subnet as the segment of a net between two consecutive repeaters, between the source and the first repeater, or between the last repeater and the sink. In the problem at hand, the source and the sink of a subnet may have several candidate locations if they are repeaters. We compute the contribution of each subnet to the congestion matrices, C h and C v by fixing the source and sink of a subnet segment to the centroids of their CRBs. Thus, given a set of two pin nets and candidate locations for the repeaters we can compute the congestion matrices C h and C v as follows: C h i jµ subnets C v i jµ subnets δc h i jµ δc v i jµ 11

12 Slope=100 Congestion Cost Slope=10 Slope = Normalized Congestion Figure 7: Piecewise-linear congestion cost function. As is done in a number of other routers we assign a congestion cost to each routing tile. A number of ways for modeling the congestion cost have been proposed in literature [21, 22, 23]. For the purpose of this work, we use a monotonic piecewise cost function of the type proposed in [24] (see Figure 7). 4.2 Dynamic Edge Weights The cost of assigning a repeater to a CRB (or the edge cost) is a weighted composite function comprising of the congestion cost and the repeater block cost. Congestion Cost: The congestion cost of an edge is defined as the maximum congestion cost among all routing tiles in the one-bend routing path for the subnet that has the repeater as its source and the subnet that has the repeater as its sink. The respective sink and source of the subnets are assumed to be located at the centroids of their CRBs. One-bend routing, instead of two-bend routing, is used for congestion cost calculation because of its efficiency. The congestion metric and the associated cost functions used have been described in Section 4.1. We denote the congestion cost of an edge e bcµ ¾ E Gµ as CC eµ. Repeater Block Cost: The repeater block cost function is used to guide the iterative deletion procedure to converge to a solution with minimum number of repeater blocks. Let c be a CRB. Let B c be the number of IFRs intersecting on this CRB. Also, define B max to be the maximum number of repeaters that can be inserted into a CRB. The repeater block cost of the CRB c, BB cµ is defined as follows: If the number of repeaters assigned to the 12

13 channel tile is less than B max, then we define BB cµ to be 1min B c B max µ. Otherwise we define BB cµ. The composite cost function is a weighted product of the two costs. The cost of an edge, e bcµ ¾ E Gµ is defined as : COST eµ CC eµµ p 1 BB cµµ p2 where p 1 and p 2 are positive parameters such that p 1 p 2 1. Changing the values of p 1 and p 2 allows us to perform a tradeoff between the two contradictory criteria involved. 4.3 Iterative Deletion The iterative improvement method modifies the solution space repeatedly by changing a small portion at each step. Our procedure begins with a redundant set of possible locations for each repeater, and removes a single highest-cost redundant assignment at each step, while attempting to minimize the cost of the solution. It proceeds by deleting the highest cost edge e bcµ of G. Also, let N denote the net to which repeater b belongs. We update the bipartite graph and its associated edge costs as follows: Monotonicity Update: As mentioned in Section 3.4, the formula for the IFR of each repeater has been derived assuming that a monotonic route between the source and the sink through the repeater locations is generated. To prevent non-monotonic repeater location assignments, we update the candidate sets for repeaters in the net N to ensure the existence of a monotonic path. A CRB in N lies on a monotonic path if there exists a sequence of CRBs, one in each CRB set of the remaining repeaters, that forms a monotonic path from source to sink. In Figure 3, for example, the repeater assignments shown form a non-monotonic sequence from the source to the sink. CRBs that do not lie on a monotonic path are removed. In Section 4.4, we present an efficient technique to remove non-monotone CRBs. Congestion Update: Deletion of the highest cost edge and the corresponding monotonicity update change the solution set represented by the bipartite graph G. After monotonicity update, the CRB sets of repeaters on a net change. Once all the changes to the CRB set of a repeater have been determined, the new centroid of the CRB set can be computed in linear time with respect to the number of deleted CRBs. The congestion update procedure reflects the impact of these changes onto the congestion matrices C h and C v. As 13

14 the centroids of the CRB sets of repeaters on the net before and after the monotonicity update are known, the congestion matrix can be updated in a very efficient manner (linear with respect to the number of routing tiles through which the affected subnets pass through). Note that the update is necessary only when the new and old centroids of a subnet lie in different routing tiles. It is important to note that the centroids usually change their routing tile locations only after several iterations of monotonicity updates. We also observe that the congestion matrix does not change significantly with each congestion update. Edge Cost Update: Due to monotonicity and congestion updates, the cost of assigning a repeater to a particular tile changes. This procedure updates the cost of the edges in the graphg. The updates of the repeater block costs can be done by simply recomputing the function BB cµ, for all CRBs that have been deleted or to which repeater assignments have been made. The update of BB cµ for all edges incident on c is linear with respect to the degree of c ing. However, it is very costly to reflect the changes of the congestion cost in the edge cost with each congestion update. As the congestion matrix does not change significantly with each occurrence of congestion update, we perform a lazy update of the routing cost for all the edges after every K occurrences of congestion updates. In our implementation, K 20 provides a satisfactory compromise between efficiency and solution quality. The assignment of a repeater to a repeater block is accomplished when the candidate set for the repeater consists of only one repeater block. 4.4 An Efficient Technique for Monotonicity Update As stated in Section 3.4, it is important that the eventual repeater assignments of a net constitute a monotonic path from source to sink. To ensure that, the candidate sets of other repeaters belonging to the same net need to be properly updated upon the deletion of a CRB from the candidate set of a repeater of that net. This procedure is referred to as Monotonicity Update. In the rest of this section, we shall consider the case where the net N, whose source is located at X s Y s µ and destination at X d Y d µ, has X s X d and Y s Y d. 2 The other three cases X s X d and Y s Y d, X s X d and Y s Y d, and X s X d and Y s Y d can be handled in a similar 13 2 We use the notation X Y µ for coordinates to avoid any confusion with the notation x and y used for denoting optimal repeater placement in Section 3. 14

15 IDS un-deleted CRBs c i c j IDC deleted CRBs c k Figure 8: An independent dominated set and its independent dominated contour. White squares corresponds to un-deleted CRBs. Lightly shaded squares correspond to deleted CRBs. Black squares correspond to CRBs in the independent dominated set. The independent dominated contour is depicted by thick lines. The darker shaded squares correspond to the region in which one should search for new members of the independent dominated set should c j be deleted. fashion. Let X i Y i µ be the location of the CRB to which the i-th repeater b i is assigned. The Monotonicity Update procedure must ensure that the following property is preserved by the eventual assignments of n repeaters: Monotonicity Property (MP): i j, 1 i n, 1 j n, i j; X i X j i j Prop. (1) Y i Y j i j Prop. (2) In order to preserve the monotonicity property, we make use of the dominance relation of CRBs. Consider two CRBs c and c ¼ at locations X c Y c µ and X c ¼Y c ¼ µ, respectively, we say that c ¼ dominates c if and only if X c X c ¼ and Y c Y c ¼. If neither c nor c ¼ dominates the other, they are mutually independent. Given the set of CRBs S b of repeater b, we define an independent dominated set IDS b to be a subset of S b such that (i) all CRBs in IDS b are mutually independent; (ii) other than self-dominance, CRBs from IDS b do not dominate CRBs from S b ; and (iii) every CRB from S b dominates (self-dominance included) at least one CRB from IDS b. Figure 8 shows the independent dominated set (IDS) of a repeater. We arrange the CRBs in an IDS by their X-coordinates in increasing order. From each CRB in the IDS, we extend a vertical line segment northward either infinitely if the CRB is the first ¾ 15

16 in the set or until it reaches the Y -coordinate of the previous CRB otherwise. We also extend from each CRB in the IDS a horizontal line segment eastward either infinitely if the CRB is the last in the set or until it reaches the X-coordinate of the next CRB otherwise. These line segments define a monotone contour of the IDS, which is dominated by CRBs in the northeast region of the contour. We refer to the monotone contour as the independent dominated contour (IDC) (see Figure 8). Consider two independent dominated sets IDS bi and IDS b j respectively for repeaters b i and b j, i j, we say that IDS b j dominates IDS bi or IDS bi IDS b j if and only if IDC b j is completely in the northeast region of IDC bi. When IDS b j dominates IDS bi, every CRB in IDS b j dominates at least one CRB in IDS bi. In other words, there exist some repeater assignments that constitute a monotone path. In order for the eventual repeater assignments to preserve the monotone property, we maintain the following relaxed monotone property: Relaxed Monotonicity Property (RMP): i j, 1 i n, 1 j n, i j; IDS bi IDS b j i j Prop. (3) ¾ When a CRB c of repeater b i is considered for deletion, we have to update IDS bi if c happens to be in IDS bi. As a result, we have to update the IDSs of subsequent repeaters b j, i j, if Prop. (3) is violated. If the sequence of updates results in an empty S bk, for some b k, then c should not be deleted in the first place. Instead, b i should be assigned to c. Based on the above observation, an monotonicity update procedure that preserves the relaxed monotonicity property has been developed. Figure 9 gives the details of the update procedure. We assume that e b i cµ is the edge to be deleted, where b i is the i-th repeater of net N, and c is a CRB in S bi. To be more general, the monotonicity update procedure takes in a subset of edges (adjacent to b i ) that are to be deleted. We denote the subset of edges by Ê bi. The Boolean variable ÎÓÐØÓÒ Ð monitors whether an IFR has become empty because of the monotonicity update. When that happens, we un-delete all CRBs that has been deleted, and making use of the changes stored in the variable ÁË»Á Ò, revert to the original IDS and IDC for each IFR. It also signals to the iterative deletion procedure in Figure 5 that b i should be assigned to c, where Ê bi e b i cµ is the input edge set. While the other steps are fairly straightforward, Steps 6 and 7 require further explanation. The procedure ÁË»Á ÍÔØ updates the IDS bi and accordingly IDC bi. Consider deleting a CRB from S bi, we do not have to update IDS bi if that CRB is not originally in IDS bi. Now, we consider the case when we delete a CRB from IDS bi. In the IDS shown in Figure 8, for example, 16

17 Monotonicity Update Procedure(Ê bi ) 1. for each c ¾ Ê bi do 2. mark c deleted; 3. if S bi /0 then 4. ÎÓÐØÓÒ Ð ÌÊÍ; 5. else» Ø ÒÜØ ÁÊ ÓÑ ÑÔØÝ» 6. ÁË»Á Ò ÁË»Á ÍÔØ (Ê bi S bi S bi 1 ); 7. Ê bi 1 c ¼ ¾ S bi 1 that violates RMP; 8. if Ê bi 1 /0 then 9. ÎÓÐØÓÒ Ð ÄË; 10. else 11. ÎÓÐØÓÒ Ð Monotonicity Update Procedure(Ê bi 1 ); 12. if ÎÓÐØÓÒ Ð ÌÊÍ then 13. for each c ¾ Ê bi do 14. mark c undeleted; 15. un-do ÁË»Á Ò ; 16. return ÎÓÐØÓÒ Ð; Figure 9: Monotonicity Update Procedure. 17

18 we have three ordered CRBs c i, c j, and c k. When we delete c j from the IDS, we search for new members of the IDS from within the darker shaded region defined by the rectangles bounded by X c j, X ck 1, Y c j, and Y ci 1, assuming that every CRB occupies a unit square. The new IDS members can be computed by sweeping either row-wise from Y c j to Y ci 1 or column-wise from X c j to X ck 1. Let c r be the first CRB that we encounter when we sweep row-wise from Y c j to Y ci 1 such that X cr X ck. Then, any remaining new IDS members lie in the rectangular region defined by X c j, X cr 1, Y cr 1, and Y ci 1. When we sweep columnwise, we search for the first CRB c c such that Y cc Y ci. Then, the next new IDS member, if any, lies in the rectangular region bounded by X cc 1, X ck 1, Y ci, and Y cc 1. Although the number of columns or rows that the algorithm has to sweep varies, we are able to amortize the operation cost such that the run-time complexity of a CRB deletion operation is O 1µ. The cost of marking a CRB as deleted, be it a IDS member or not, is charged to the CRB itself. In the following analysis, we assume that the algorithm performs a row-wise sweep after deleting a IDS member. If sweeping a row produces a new IDS member, we charge the operation cost to that CRB. An IDS member, once created, will not be involved in any sweeping operation again. The next time it is charged with an operation cost again is when it is marked deleted. If the sweep produces no new IDS member, then we charge the cost to a previously deleted CRB which is located at the east-most boundary of the rectangular region. For the example used in the preceding two paragraphs, we charge the cost of sweeping rows between Y c j to Y cr 1 to CRBs with co-ordinates X ck 1Y c j µ X ck 1Y cr 1µ. Such CRBs will not be charged with another sweep as they now lie outside the northeast region of the IDC. Therefore, every CRB is charged at most twice, i.e. O 1µ. Propagating the change in the contour from IDC bi to IDC bi 1 may turn out to be slightly more costly. Using the example in Figure 8, let c r be the CRB that is ordered before c k in the new IDS after c j is deleted. If X ck 1Y cr 1µ is in the northeast region of IDC bi 1, then we have to update IDC bi 1. As IDS bi 1 is ordered, we can perform a binary search in O logids bi 1 µ to find the two adjacent CRBs in IDS bi 1 whose Y -coordinates sandwich Y cr 1. Note that IDS bi 1 is no greater than the minimum of the numbers of the columns and rows in IDS bi 1. If X ck 1Y cr 1µ does not dominate the two adjacent CRBs, then no deletion of CRBs in S bi 1 is required. We can achieve a speed-up by taking advantage of the geometrical properties of IFRs. As mentioned in Section 3.4, all IFRs are octilinear in shape. Therefore, we can define for S bi 1 a 1-slope line Y X M such that for each c ¾ S bi 1, X c Y c M. Only when X ck Y cr 2 M do we have to perform a binary search. In general, we have to verify the new contour between c i and c k does not invade the northeast region of IDC bi 1 after 18

19 Description Value r wire resistance per unit length (Ωµm) c wire capacitance per unit length (ff/µm) T b intrinsic repeater delay (ps) 36.4 C s C b sink/repeater input capacitance (ff) 23.4 R b R d driver/repeater output resistance (Ω) 180 Table 1: Parameter values for interconnect, repeater, driver, and sink. deleting c j. The algorithm outlined in Figure 9 illustrates how the deletion of a CRB of b i affects the IFRs of repeaters b j, j i in net N. Such a deletion also affects the IFRs of repeaters b j, j i in net N. Similar to the algorithm in Figure 9, we develop an algorithm to perform monotonicity updates of IFRs for b j, j i, by defining the independent dominating set of S b as follows: (i) All CRBs in the independent dominating set are mutually independent; (ii) other than selfdominance, CRBs from S b do not dominate CRBs from the independent dominating set; and (iii) every CRB from S b is dominated (self-dominance included) by at least one CRB from the independent dominating set. The independent dominating set also defines an independent dominating contour, which dominates all CRBs in the southwest region of the contour. As the algorithm to update the independent dominating contours of preceding repeaters is very similar to the algorithm in Figure 9, we do not present its details here. 5 Experimental Results We have implemented our repeater block planning algorithm using C on a SUN UltraSPARC II machine. In this section, we present the details of our experimental set up and the results obtained. The interconnect and repeater parameters and the Elmore delay model have been described in Section 3.1. The values (see Table 1) used for these parameters are based on a 018µm technology in the NTRS 97 roadmap [25]. We report the results of our repeater block planner for six MCNC [26] benchmark circuits. The relevant details of these benchmarks are shown in Table 2. In this work we focus on solving the problem of repeater block planning for two-pin (single source, single sink) nets. As the benchmark files do not provide information on signal direction, we choose the first pin to be the source and all the others to be sinks, and decompose a multiple terminal net into a set 19

20 Circuit Modules Nets 2-Pin Nets apte xerox hp ami ami playout Table 2: Details of MCNC benchmarks. of two-pin nets. This may not be good especially for path based timing optimizations, but it provides a sufficiently good model of the input to a repeater block planner. We ignore all single pin, power, and ground interconnects. The initial floorplans of the MCNC benchmark circuits used for this work were obtained from [27], and are the same as those used in [1]. These floorplans were generated by running simulated tempering using an improved Monte-Carlo technique [28]. Since the MCNC benchmarks do not come with any timing information, we assign target delays to the two-pin nets as follows. We ignore all two pin nets whose lengths are smaller than the critical length L min [18], above which repeater insertion can be used for delay reduction. For each net we then compute the optimal delay T opt obtainable by repeater insertion [18] and then randomly assign a target delay between 1.05 and 1.20 times T opt as in [1]. As we generate these timing constraints on our own, a direct comparison between our method and that in [1] may not be fair even though we do use their results for reference in the following discussion. In Table 3 we report the following results from our repeater block planning algorithm : (i) ratio of number of nets for which the delay constraint is met to the number of nets for which MET the delay constraint is not satisfied, NOT MET ; (ii) the total number of repeaters inserted to meet timing constraints, N REP ; (iii) the maximum tile congestion, C MAX ; (iv) the chip area increase, δa expressed as a percentage of the original chip area; and (v) CPU time required, TCPU. We also include the results for these benchmark circuits reported in [1]. Compared to [1], MET is significantly higher. The success rates of meeting the timing constraints (or completion rates) range from 71% (for apte) to 95% (for playout). The average completion rate is 85%. On the other hand, the average completion rate for the same examples in [1] is 67.5%, and the completion rates range from 58% (for xerox and hp) to 84% (for ami33). Our completion rates are higher for all benchmark circuits in Table 2. 20

21 Circuit apte MET NOT MET N REP C MAX δa TCPU (s) p 1 1 p / p 1 0 p / p 1 p / [1] 102 / xerox p 1 1 p / p 1 0 p / p 1 p / [1] 260 / hp p 1 1 p / p 1 0 p / p 1 p / [1] 131 / ami33 p 1 1 p / p 1 0 p / p 1 p / [1] 305 / ami49 p 1 1 p / p 1 0 p / p 1 p / [1] 412 / playout p 1 1 p / p 1 0 p / p 1 p / [1] 1533 / Table 3: Comparison of repeater block planning solutions. 21

22 In Table 3, we report an identical N REP under three different combinations of p 1 and p 2 for each benchmark circuit even though the completion rates are different. The completion rates are different because some nets cannot meet the timing constraints after we expand the floorplan to accommodate the repeater blocks along the boundaries of circuit blocks (see Section 4). In fact, all entries in N REP, C MAX, and δa include nets that do not meet the timing constraints after floorplan expansions. Table 3 also shows that we generally require a smaller number of repeaters to achieve a higher completion rates, compared to [1]. In [1], all channels between blocks are expanded at the beginning of the planning step [27]. As a result, net lengths become longer, and more repeaters are required. In our algorithm, we expand the channels only when required. Therefore, our net lengths are shorter in general, and we require a smaller number of repeaters to meet the timing constraints. The results in Table 3 also show that our algorithm is able to reduce routing congestion. With p 1 1 and p 2 0 the objective function is oriented towards the reduction of congestion cost, whereas setting p 1 0 and p 2 1 causes the repeater block count to be minimized. p 1 p 2 05 represents a solution with a tradeoff between the two contradictory cost functions. The Table shows that routing congestion levels can be lowered when they are accounted for during repeater block planning. In xerox, for example, C MAX for p 1 0 and p 2 1 (i.e., the objective of minimizing the number of repeater blocks) is about 50% higher than the C MAX for p 1 1 and p 2 0 (i.e., the objective of minimizing the routing congestion). Clearly, if the design is routing-limited, the actual completion rates for the former case will be reduced accordingly. However, the average increase in chip area due to repeater insertion is 1.25%, which is higher than the 1.05% average increase reported in [1]. Moreover, the run-times of our algorithm are higher than those in [1]. 6 Conclusion In this paper we have presented a repeater block planning algorithm that uses the concept of independent feasible regions to do repeater clustering. The assignments of repeaters to repeater blocks are routability-driven and thus, we are able to avoid regions of high routing congestion. Experimental results show that our technique performs significantly better than existing repeater planning methods. 22

23 7 Acknowledgments The authors would like to thank Prof. Jason Cong, Mr. David Z. Pan, and Mr. Tianming Kong of UCLA for providing the floorplans of the MCNC benchmark circuits. They would also like to thank them and Dr. Kevin K.-Y. Chao of Intel for their helpful discussions, and the anonymous reviewers for their constructive comments. Appendix: Proofs of Theorems 1 and 2 We consider a net N of length L optimally inserted with n repeaters. We denote the optimal placement of the first repeater by x L, the distance between two optimally placed repeaters by y L, and the distance between the last repeater and the sink by z L. The expressions for x L and y L are given by Eqns. (3) and (4), respectively, and z L L n 1µy L x L. In the following lemma and theorems, we deal with repeaters displaced from their optimal locations. As a result, the distances between the driver and the first repeater, between repeaters, and between the last repeater and the sink change. We capture the resultant change in the delay by the following expression: RCwlµ D RCl wµ D RClµ rc 2 w2 rcl Rc rcµw (5) where w is the change in net length, l is the original net length, R is the driver resistance, C is the sink capacitance, and D RClµ is the Elmore delay in Eqn. (1). The coefficient of w in Eqn. (5) has an interesting property summarized in the following lemma: Lemma 1 The coefficient of w in Eqn. (5) satisfies the following equalities: rcx L R dc rc b rcy L R bc rc b rcz L R bc rc s (6) Proof: From Eqns. (2) (4), we can show that rcx L R dc rc b rc n 1 L c n n 1 R b rc R b R d µ L n 1 r rcy L R bc rc b c n 1 R d C s C b µ c r n 1 C r n s n 1 C b R b c rc b 23

24 and rcz L R bc rc s rc L n 1µy L x L µ R bc rc s rc n 1 L c n n 1 R c b n 1 R c d n 1 C c n s n 1 C b rcy L R bc rc b ¾ Theorem 1 For Dtgt N DN opt nlµ, the width of the feasible region for the i-th repeater (i n) of the net N is W FR 2 2 Dtgt N DN opt nlµµ n i 1µ iµ rc n 1µ Proof: Let DN opt nlµ denote the optimal delay for a net N of length L inserted with n repeaters. Also, let D N opt nl W µ denote the optimal delay for the same net (i.e., same R d and C s ) but with a difference of W in its wirelength. First, we consider the difference in the optimal delays of the two configurations: D N opt nl W µ DN opt nlµ D R dc b x L W µ D R dc b x L µ From Eqns. (2) (4), n 1µD R b C b y L W µ D R bc b y L µ D R b C s z L W µ D R bc s z L µ x L W x L y L W y L z L W z L W n 1 Therefore, we can rewrite the difference between the optimal delays as: D N opt nl W µ DNopt nlµ R W dc b n 1 x L µ n 1µ R W bc b n 1 y L µ W R b C s n 1 z L µ (7) We split N into two subnets N 1 and N 2 at the i-th repeater. Therefore, the length of N 1 is L 1 x L i 1µy L and the sink of N 1 is the i-th repeater of N. The length of N 2 is L 2 n iµy L z L and the driver of N 2 is the i-repeater. There are i 1 repeaters in N 1 and n i repeaters in N 2. As y L 1, z L 1, x L 2, and y L 2 correspond to the distances between optimally placed repeaters in N, the following equalities hold: y L 1 z L 1 x L 2 y L 2 y L (8) 24

25 Furthermore, x L 1 x L and z L 2 z L (9) Let the i-th repeater of N be displaced by a distance W from its optimal location (W being positive toward the sink), and the remaining repeaters be placed optimally with respect to the displaced repeater. Therefore, the lengths of N 1 and N 2 change by W and W, respectively. The delay of such a configuration is From Eqns. (7) and (8), ˆD nlµ D N 1 opt i 1L 1 W µ D N 2 opt n il 2 W µ T b D N 1 opt i 1L 1 W µ D N 1 opt i 1L 1µ R d C b W i x L 1 µ i 1µ R b C b W i y L 1 µ D N opt n 2 il 2 W µ D opt n N 2 il 2µ n iµ R b C b Therefore, W R b C s n i 1 z L 2 µ W n i 1 y L 2 µ ˆD nlµ D N opt nlµ R dc b W i x L 1 µ i 1µ R b C b W i y L 1 µ n iµ R b C b W W n i 1 y L 2 µ R b C s n i 1 z L 2 µ Expanding the s and substituting x L for x L 1, y L for y L 1 and y L 2, and z L for z L 2, we obtain ˆD nlµ D N opt nlµ i rc 2 Making use of the equalities in Lemma 1, W i 2 rcx L R dc rc b µ W i 1µ rcy L R bc rc b µ i rc W 2 n i 1µ 2 n i 1 W n iµ rcy L R bc rc b µ n i 1 rcz L R bc rc s µ ˆD nlµ D N opt nlµ rc 2 W n i 1 1 W 2 1 i n i 1 W i 25

26 Therefore, the maximum displacement for the i-th repeater without violating the timing constraint D N tgt is W 2 Dtgt N DN opt nlµµ n i 1µ iµ n 1µrc Hence, W FR for the i-th repeater is 2 Dtgt N DN opt nlµµ n i 1µ iµ W FR 2 n 1µrc ¾ Theorem 2 For Dtgt N D N opt nlµ, the width of the independent feasible region for the i-th repeater (i n) of the net N is W IFR 2 Dtgt N D N opt nlµ rc 2n 1µ Proof: Consider the net N of length L having n repeaters as shown in Figure 2. Let w i be the displacement of the i-th repeater from its optimal position (w i being positive toward the sink). Let w 0 0 and w n 1 0 denote the displacements of the source and the sink from their original positions, respectively. The delay for this configuration of repeaters is D D R d C b x L µ R dc b w 1 w 0 x L µ n 1 D R b C b y L µ R bc b w i 1 w i y L µ i1 D R b C s z L µ R bc s w n 1 w n z L µ nt b D N rc n 1 opt nlµ 2 w i w i 1 µ 2 rcx L R dc rc b µ w 1 w 0 µ i1 n 1 rcy L R bc rc b µ w i 1 w i µ rcz L R bc rc s µ w n 1 w n µ i1 Making use of the fact that displacements w 0 w n 1 0, and that the second last term of the preceding equation is a telescopic sum, we derive the following expression for the last three terms of the preceding equation: rcx L R dc rc b µw 1 rcy L R bc rc b µ w n w 1 µ rcz L R bc rc s µw n which sum to zero because of the equalities in Lemma 1. 26

A buffer planning algorithm for chip-level floorplanning

Science in China Ser. F Information Sciences 2004 Vol.47 No.6 763 776 763 A buffer planning algorithm for chip-level floorplanning CHEN Song 1, HONG Xianlong 1, DONG Sheqin 1, MA Yuchun 1, CAI Yici 1,