A buffer planning algorithm for chip-level floorplanning

Science in China Ser. F Information Sciences 2004 Vol.47 No.6 763-776

CHEN Song 1, HONG Xianlong 1, DONG Sheqin 1, MA Yuchun 1, CAI Yici 1, Chung-Kuan Cheng 2 & Jun Gu 3

1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2. Department of Computer Science and Engineering, University of California, San Diego, USA;
3. Department of Computer Science, Science & Technology University of Hong Kong, China

Correspondence should be addressed to Chen Song (email: chens00@mails.tsinghua.edu.cn)

Received June 7, 2004

Abstract This paper studies the buffer planning problem for interconnect-centric floorplanning in nanometer technologies. Dead-spaces are the spaces within a placement that are not occupied by any circuit block. We propose a buffer planning algorithm based on dead space redistribution that makes good use of dead-spaces for buffer insertion. Because dead space is associated with circuit blocks under topological representations, it can be redistributed by freely moving some circuit blocks within their rooms in the placement. The total area and the topology of the placement remain unchanged during the redistribution. Redistributing the dead space over the whole placement increases the number of nets that satisfy the delay constraint, as the experimental results demonstrate. On average, the number of nets that meet the delay constraint increases by 9%.

Keywords: buffer planning, dead space, redistribution, floorplanning, VLSI, corner block list.

DOI: 10.1360/03yf0028

As Very Large Scale Integrated (VLSI) circuits are scaled into nanometer dimensions and operate at gigahertz frequencies, interconnect design and optimization have become critical in determining system performance, cost, and reliability.
To ensure timing closure of a design, interconnects must be considered as early as possible in the design flow. In the last decade, several interconnect optimization techniques have been studied, such as topology construction, buffer insertion, device sizing, wire sizing and spacing, and combinations of them. A comprehensive survey of these techniques can be found in ref. [1].

Buffer insertion is an effective technique for reducing interconnect delay. While the Elmore delay of a long wire grows quadratically with the length of the wire, proper buffer insertion makes the delay grow only linearly with the length. The number of buffers needed to achieve timing closure continues to increase with decreasing feature size. For example, in a chip with an estimated total wire length of 10 kilometers, close to 800000 buffers must be inserted for performance optimization in the 70 nm technology [2]. Buffers must be planned early in the design flow because they take up silicon resources.

Recently, the buffer planning problem has attracted much attention, and many approaches have been proposed. Because buffer planning is very time-consuming, it is difficult to integrate it into the floorplanning procedure. Consequently, buffer planning is generally performed as a post-processing step of floorplanning. Cong et al. [3, 4] introduce the concept of the feasible region, which is used to generate buffer blocks. The feasible region for a buffer of a net is the maximum region where the buffer can be located such that the target delay of the net is satisfied. They use the dead space among circuit blocks to insert buffers. Sarkar et al. [5, 6] add the notion of independence to the feasible region and also try to improve routing congestion. Their buffer blocks are located in the dead space among circuit blocks. Tang and Wong [7, 8] propose an optimal algorithm that assumes only one buffer for each net, and dead space is also used for buffer insertion. Dragan et al. [9, 10] allocate buffers to pre-existing buffer blocks by a multi-commodity flow-based approach. Alpert et al. [11, 12] use a tile graph and dynamic programming to perform buffer block planning. They assume that buffers may be inserted inside macro blocks, so their approach distributes buffer sites all over the layout. Sham et al. [13, 14] propose a routability-driven floorplanner that can estimate buffer usage and buffer resources for the congestion constraint. Rafiq et al. [15] provide an integrated floorplanner with buffer/channel insertion for bus-based microprocessor designs.

Dead space is the space left unused in the placement. When a design contains many IP blocks, inserting buffers into dead space is the only way to achieve the required system performance. As a post-processing step of floorplanning, buffer planning has to be performed on a near-optimal floorplanning solution (i.e. a compact placement), in which the location, size and amount of the dead space are fixed. All of the previous work performed buffer planning under this restriction. To break this limit and to favor the later routing procedure with good initial floorplanning and buffer planning results, one of two alternative solutions has to be adopted. One is to reserve dead space for buffer planning during floorplanning, and the other is to change the distribution of the dead space in an existing placement. The former is not practical: because it is inefficient and difficult to integrate buffer planning into floorplanning, it is difficult to determine the amount and location of the dead space that needs to be reserved. The latter, which we propose here, redistributes the dead space. Given a floorplanning result, the amount of dead space is determined, but our algorithm places no limitation on that amount. No previous work changes the distribution of the dead space in the placement to improve the timing closure of the chip.

In this paper, we devise a buffer planning algorithm based on dead space redistribution to make good use of dead space for buffer insertion. Because dead space is associated with circuit blocks under topological representations, the dead space in the placement can be redistributed by freely moving some circuit blocks within their rooms, while the total area and the topology of the placement remain unchanged. We compute the independent feasible region (IFR) [5] for each buffer under the delay constraint. All buffers can be moved anywhere within their respective IFRs without violating the timing constraints. Each buffer can be inserted into the intersection between its IFR and the dead-spaces. Therefore, redistributing the dead space in the placement can increase the number of buffers that can be inserted. In other words, the number of nets satisfying the delay constraints increases, as the experimental results demonstrate. On average, the number of nets that meet the delay constraints increases by 9%.

1 Preliminary

1.1 Problem definition

In this paper, we concentrate on the buffer planning problem: given an initial floorplan/placement and a timing constraint for each net, determine the number and locations of buffers for each net so as to meet timing closure. The buffers are inserted into the dead space among circuit blocks. In our method, the dead spaces used in buffer planning are adjacent to blocks or are channels among blocks, and they do not overlap.

The dead space among blocks can be classified into two types (section 2.1). One type can be redistributed; the other type, which is an unavoidable result of floorplanning, cannot. Our algorithm can nevertheless use the latter type of dead space to insert buffers. Additionally, during floorplanning some dead space may be reserved for certain demands.
For this type of dead space and for the channels, the proposed algorithm can easily make use of them if buffer insertion is permitted there. Otherwise, these spaces can be regarded as dummy blocks so that no buffers are inserted into them.

1.2 Independent feasible region

Sarkar and Koh [5, 6] introduced the concept of the independent feasible region (IFR) for buffer insertion and developed an analytical formula to calculate it. The independent feasible region for a buffer b is the maximum region where b can be located such that, with b inserted at any location in that region, the net delay constraint is satisfied, assuming that the other buffers of that net are also located within their respective independent feasible regions. For example, fig. 1 shows a feasible region for a buffer of a net.

The minimum number of buffers needed to meet the delay constraint $T_{req}$ for an interconnect of length $l$ [3, 4] is

$$k_{\min} = \left\lceil \frac{-K_5 - \sqrt{K_5^2 - 4K_4K_6}}{2K_4} \right\rceil, \qquad (1)$$

where

$$K_4 = R_bC_b + T_b,$$

$$K_5 = (rC_b + cR_b)\,l + T_b + R_dC_b + R_bC_l - T_{req} - \frac{r}{2c}(C_b - C_l)^2 - \frac{c}{2r}(R_b - R_d)^2,$$

$$K_6 = \frac{1}{2}rcl^2 + (rC_l + cR_d)\,l + R_dC_l - T_{req}.$$

The notation for the physical parameters of the interconnect and buffer used in this paper is as follows:

$r$: wire resistance per unit length;
$c$: wire capacitance per unit length;
$T_b$: intrinsic buffer delay;
$C_b$: buffer input capacitance;
$R_b$: buffer output resistance;
$C_l$: sink capacitance;
$R_d$: driver resistance;
$l$: the length of source-sink net N (two-pin net).

The IFR of each buffer in a net can be calculated by means of the method developed in refs. [5, 6].

Fig. 1. Independent feasible regions are convex polygons.

1.3 Corner block list

Corner Block List (CBL) is a topological representation introduced in refs. [16, 17]. CBL represents a floorplan by a triple of lists (S, L, T). It dissects the chip into rectangular areas, called rooms, and assigns one and only one block to each room according to (S, L, T), where S stands for the block assignment and L and T describe the orthogonal line segments. Dong et al. [18] extended the CBL by adding empty rooms and assigning a dummy block to each empty room. As shown in fig. 2(a), a dummy block 0 is assigned to an empty room. In the following sections, CBL means the extended CBL unless stated otherwise.

The sequence S records the blocks in the placement from the bottom-left corner to the upper-right corner. If a block $b_i$ is to the left of $b_j$ in the sequence S, $b_i$ must be below or to the left of $b_j$ in the placement.

The list L records the orientation of each block. In the floorplan, the segments lying on the left and bottom boundaries of the corner block compose a T-junction, which defines the orientation of the block. The T-junction has two alternative orientations: T rotated counterclockwise by 90°, representing vertical orientation, and T rotated by 180°, representing horizontal orientation. Fig. 2(a) shows an example of a vertically oriented T-junction.

The sequence T records the number of T-junctions covered by each block. A binary list $T_i$ ended by a zero records the number of T-junctions covered by the corner block $b_i$. In fig. 2(a), corner block d covers one T-junction, so sequence T contains the binary sequence 10 to record the number of T-junctions covered by d.

Fig. 2. The CBL representation. (a) The orientation of block d and the T-junction covered by d; (b) the CBL of a floorplan.

Given (S, L, T), we can construct the corresponding floorplan. A floorplan and its CBL representation are shown in fig. 2(b). More details of CBL can be found in ref. [16].

2 Redistribution of the dead-spaces

In this section, we discuss the redistribution of the dead space and then describe how the dead space in a placement is computed.
2.1 Redistribution of the dead space in the placement

The dead space (made up of many rectangular dead space blocks, each denoted as a dead-space) is defined as the space within a placement that is not occupied by any circuit block. The chip can always be dissected into rooms (section 1.3), and there is at most one block in each room. Not every room is entirely filled by its circuit block, and there may be some empty rooms to which no circuit block is assigned. Therefore, some dead-spaces may be generated. According to how it is generated, a dead-space in a floorplan belongs to one of the following two types.

Definition 1. If a dead-space is generated because of an empty room, the dead-space is called a Detached Dead-Space (DDS). A Detached Dead-Space cannot be associated with any circuit block. For example, the empty room 0 shown in fig. 3(a) is a Detached Dead-Space, and it cannot be associated with any circuit block around it.

Definition 2. A dead-space is called an Attached Dead-Space (ADS) if it is generated because a room is not entirely filled by a circuit block. An Attached Dead-Space can be associated with the circuit block in the room where the ADS is generated. As shown in fig. 3(a), the dead-spaces $a_1$, $e_1$ and $e_2$ are Attached Dead-Spaces. The dead-space $a_1$ is associated with block a, and $e_1$ and $e_2$ are associated with block e.

The Detached Dead-Spaces are immovable, while the Attached Dead-Spaces can be redistributed. Because of the Attached Dead-Spaces, some circuit blocks can be moved freely in a one-dimensional or two-dimensional region (for instance, block e in fig. 3(b)) while the topology and the total area of the floorplan remain unchanged. Through the topological floorplan representation, we can associate each Attached Dead-Space with a circuit block. Consequently, the distribution of dead-spaces can be changed by moving circuit blocks within rooms that are not completely filled. In fig. 3(a), the Attached Dead-Space $e_1$ is above block e, and $e_2$ is to the right of block e. Both are associated with circuit block e. Fig. 3(b) gives a redistribution of the Attached Dead-Spaces in the placement shown in fig. 3(a).
The dead-space $a_1$ is divided into $a_{11}$ and $a_{12}$ by moving block a, block e is moved to divide $e_2$ into $e_{21}$ and $e_{22}$, and $e_1$ is moved below block e in fig. 3(b).

Fig. 3. Dead-space in the floorplan. (a) Two types of dead-space; (b) redistribution of dead-space.

Therefore, the following theorem is easily obtained.

Theorem 1. The dead space redistribution can be achieved by redistributing the Attached Dead-Spaces in the floorplan, while the topology and total area of the floorplan remain unchanged.

Proof. The chip is partitioned into rooms by horizontal and vertical segments, and these segments determine the topology among rooms. Each circuit block is assigned to a certain room, in which there is at most one block. Therefore, the topological relations among rooms determine the topological relations among circuit blocks. A circuit block is moved only within its room, which causes no other block to move. Consequently, the topology and the area of the placement remain unchanged while the dead space in the placement is redistributed.

2.2 Computation of the dead-spaces in a given floorplan/placement

In this subsection, we describe how to find all the dead-spaces in a floorplan/placement and how to associate each dead-space with a circuit block or dummy block under the CBL representation introduced in section 1. For a floorplan of n blocks, the CBL dissects the chip into $m$ ($m \ge n$) rectangular rooms by horizontal and vertical segments, which determine the topology among rooms. Thus, each dead-space must be generated in a certain room. Because each room is assigned a circuit block or a dummy block, each dead-space must be associated with a circuit block or a dummy block. Clearly, each Detached Dead-Space is associated with a dummy block, and each Attached Dead-Space is associated with a circuit block.

When a corner block is inserted during the packing process, it must cover some previously inserted blocks. For each block b, we check all the blocks covered by b and determine, by comparing their coordinates, whether there are dead-spaces between b and the blocks it covers. At the same time, each dead-space found is associated with a covered circuit block.
The Detached Dead-Spaces are associated with dummy blocks automatically. For example, blocks a, d and e are covered by block g in fig. 3(a). We compare the coordinates of block g with those of a, d and e in turn to find the dead-spaces between them. The dead-spaces $a_1$, $e_1$ and $e_2$ are found; $a_1$ is an Attached Dead-Space associated with block a, and $e_1$ and $e_2$ are Attached Dead-Spaces associated with block e. The dimensions of a dummy block determine the size of its Detached Dead-Space, and the size of each dummy block is determined automatically during packing. By checking the sizes of the dummy blocks, we can find all the Detached Dead-Spaces. For example, in fig. 3(a), we find the Detached Dead-Space associated with dummy block 0.

3 Buffer planning and optimization

In this section, we describe in detail the buffer planning algorithm based on dead-space redistribution. First, we discuss how to compute the candidate tiles of a buffer. Second, we construct a bipartite graph that represents all possible assignments of buffers to tiles, and find its maximum cardinality matching to obtain the maximum buffer insertion. Finally, the dead-spaces are redistributed to improve the solution.

Given a placement, we assume that buffers can be inserted only into the dead space among circuit blocks and that no buffers can be inserted inside circuit blocks. As shown in ref. [16], the CBL representation of the placement can be obtained by deleting the corner block recursively. We then compute the dead-spaces in the placement and associate each dead-space with a circuit block or a dummy block using the method described in section 2.2. The dead spaces we compute are represented as rectangular blocks.

3.1 Computation of the candidate tiles

The dead-spaces are divided into tiles where buffers can be inserted to satisfy the target delay of nets. Computing the candidate tiles of the buffers is difficult and time-consuming during buffer planning. A naive solution is to examine all the tiles in the placement, but the running time of this method depends on the total number of tiles. We propose here a fast algorithm to compute the candidate tiles for each buffer.

The IFR of a buffer of a net is a convex polygon (fig. 4(a)), bounded by two parallel lines with slope +1 or −1 and by the bounding box determined by the source and the sink of the net. Instead of directly computing the intersection between the convex polygon of the IFR and the dead-spaces, we decompose the problem into two simpler problems: the first step computes the intersection between the dead space and the bounding box of the source and the sink; the second step computes the overlap between the result of the first step and the region between the two parallel lines with slope +1 or −1.

Fig. 4. Candidate tiles of a buffer and its computation. (a) IFR and candidate tiles; (b) computation of candidate tiles.

Step 1. Compute the intersection between the dead space and the source-sink bounding box. Finding the intersection of two rectangles is a basic problem in computational geometry. We solve it by extending the intersection test for line segments.

Let the lower left corner of the bounding box between the source and the sink be $(x^{s}_{lb}, y^{s}_{lb})$ and the upper right corner be $(x^{t}_{rt}, y^{t}_{rt})$. The lower left corner of the dead space block is $(x^{ds}_{lb}, y^{ds}_{lb})$, and the upper right corner is $(x^{ds}_{rt}, y^{ds}_{rt})$. The two rectangles intersect if the following inequalities are satisfied:

$$x_{rt} - x_{lb} > 0, \qquad y_{rt} - y_{lb} > 0,$$

where $x_{lb} = \max(x^{ds}_{lb}, x^{s}_{lb})$, $y_{lb} = \max(y^{ds}_{lb}, y^{s}_{lb})$, $x_{rt} = \min(x^{ds}_{rt}, x^{t}_{rt})$, and $y_{rt} = \min(y^{ds}_{rt}, y^{t}_{rt})$. The lower left corner of the intersection rectangle is then $(x_{lb}, y_{lb})$ and the upper right corner is $(x_{rt}, y_{rt})$. The result of this step is denoted hereafter as the intersected dead space.

Step 2. Compute the intersection between the intersected dead space and the IFR. The buffer has no candidate locations if the intersection is empty. Otherwise, the intersection is calculated as follows.

To compute the intersection between the IFR and the intersected dead space, we number the tiles located in the intersected dead space along the direction of slope +1 or −1. Fig. 4(b) shows the case of −1. Here we consider only parallel lines with slope −1, since the other case can be handled similarly. For convenience, we call the left parallel line the first line and the right one the second line. A tile is considered a candidate location of a buffer if the lower left corner of the tile is inside the IFR of the buffer. Suppose that the first line meets the intersected dead space at the bottom boundary or the right boundary in tile $t_b$, and the second line meets the intersected dead space at the top boundary or the left boundary in tile $t_e$. Then the feasible buffer insertion locations lie between $t_b$ and $t_e$. In fig. 4(b), the tiles between 7 and 17 are the candidate tiles of the buffer.
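The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `intersect` and `candidate_tiles`, the square tile of side `tile`, and the representation of the two slope −1 lines by their intercepts `c_first` and `c_second` (the lines x + y = c) are all hypothetical. The sketch also tests every tile in the intersected dead space rather than locating $t_b$ and $t_e$ through the diagonal numbering.

```python
from typing import List, Optional, Tuple

# A rectangle is (x_lb, y_lb, x_rt, y_rt): lower left and upper right corners.
Rect = Tuple[float, float, float, float]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Step 1: intersection of two axis-aligned rectangles, or None if empty."""
    x_lb, y_lb = max(a[0], b[0]), max(a[1], b[1])
    x_rt, y_rt = min(a[2], b[2]), min(a[3], b[3])
    if x_rt - x_lb > 0 and y_rt - y_lb > 0:
        return (x_lb, y_lb, x_rt, y_rt)
    return None


def candidate_tiles(dead_space: Rect, bbox: Rect,
                    c_first: float, c_second: float,
                    tile: float) -> List[Tuple[float, float]]:
    """Step 2 (slope -1 case): lower left corners of the tiles whose corner
    lies between the lines x + y = c_first and x + y = c_second.

    Assumption: tiles are squares of side `tile`, laid out from the lower
    left corner of the intersected dead space.
    """
    region = intersect(dead_space, bbox)
    if region is None:
        return []  # the buffer has no candidate location in this dead-space
    x_lb, y_lb, x_rt, y_rt = region
    nx = int((x_rt - x_lb) // tile)
    ny = int((y_rt - y_lb) // tile)
    result = []
    for j in range(ny):
        for i in range(nx):
            x, y = x_lb + i * tile, y_lb + j * tile
            if c_first <= x + y <= c_second:
                result.append((x, y))
    return result
```

The slope +1 case would follow the same pattern with x − y in place of x + y.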
3.2 Buffer planning

The objective of buffer planning is to determine the number and locations of buffers and to insert as many buffers as possible, so as to maximize the number of nets that meet the timing constraints. First, the candidate tile set for each buffer is calculated. Second, a bipartite graph is constructed to represent all the possible assignments of buffers to tiles. Finally, the assignment of buffers to tiles is obtained by finding the maximum cardinality matching in the bipartite graph.

Algorithm 1 shows an outline of the buffer planning algorithm. In step 1, each dead-space is divided into small tiles in which buffers can be located. In step 3, for each buffer b, we compute all the possible tiles where b can be placed by means of the method discussed in subsection 3.1. This yields the set of all possible buffer assignments, from which a bipartite graph G can be constructed.
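The assignment step can be sketched in Python as a maximum cardinality bipartite matching. The sketch below uses the classic augmenting-path method, which gives the same matching size as a max-flow computation on a unit-capacity network. It is an illustration under assumptions: the function name `max_matching` and the dictionary layout (buffer mapped to its list of candidate tiles) are hypothetical, and each tile is assumed to hold at most one buffer.

```python
from typing import Dict, Hashable, List, Set


def max_matching(candidates: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, Hashable]:
    """Return a maximum cardinality assignment {buffer: tile}.

    `candidates[b]` lists the tiles where buffer b may be placed.
    Assumption: each tile accepts at most one buffer.
    """
    tile_owner: Dict[Hashable, Hashable] = {}  # tile -> buffer currently holding it

    def try_assign(buf: Hashable, seen: Set[Hashable]) -> bool:
        # Search for an augmenting path that starts at `buf`.
        for tile in candidates[buf]:
            if tile in seen:
                continue
            seen.add(tile)
            # Take a free tile, or move the current owner to another tile.
            if tile not in tile_owner or try_assign(tile_owner[tile], seen):
                tile_owner[tile] = buf
                return True
        return False

    for buf in candidates:
        try_assign(buf, set())
    return {buf: tile for tile, buf in tile_owner.items()}
```

For example, with `{"b1": ["t1", "t2"], "b2": ["t1"]}` both buffers are placed, because b1 is moved to t2 when b2 asks for t1. The recursion depth grows with the length of the augmenting path, so a very large instance would call for an iterative search.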

Algorithm 1 Buffer planning
1. Build the tile data structure for all the dead-spaces.
2. Compute the IFR for each buffer.
3. Compute the set of candidate tiles for each buffer.
4. Construct a bipartite graph $G = (V, E)$, $V = V_1 \cup V_2$, where vertices in $V_1$ represent buffers and vertices in $V_2$ represent tiles, and $E = \{(v_1, v_2) \mid v_1 \in V_1, v_2 \in V_2, v_2 \text{ is a candidate location of } v_1\}$.
5. Construct an s-t graph from G.
6. Find the max flow from s to t and determine the location of each buffer.

Each edge of G, constructed as in step 4, represents a possible assignment of a buffer to a tile.

In step 5, in order to insert as many buffers as possible, we construct an s-t graph based on the bipartite graph G to find the maximum cardinality matching. We direct all edges from $V_1$ to $V_2$, add a source s with a directed edge from s to each element of $V_1$, and add a sink t with a directed edge from each element of $V_2$ to t. Each edge has capacity 1. An example is shown in fig. 5. The maximum cardinality matching in graph G is computed by finding the max flow from s to t.

Fig. 5. Construction of s-t graph. (a) Bipartite graph G; (b) s-t graph.

3.3 Optimization

The solution of the above buffer planning algorithm is improved by redistributing the dead-spaces all over the placement. The objective is to maximize the number of nets that meet the delay constraints. Algorithm 2 outlines the optimization algorithm.

Algorithm 2 Optimization
1. Compute all the dead-spaces in the placement and associate each dead-space with a circuit block.
2. Run the buffer planning algorithm to compute the number of nets that meet the target delay, denoted as $N_{old}$.
3. Generate a new distribution of the dead-spaces and update the related information.
4. Run the buffer planning algorithm to compute the number of nets that satisfy the delay constraints, denoted as $N_{new}$.
5. If $N_{old} < N_{new}$, accept the new dead space distribution and set $N_{old} = N_{new}$. Otherwise, restore the previous dead space distribution.
6. Repeat steps 3 to 5.

A new distribution of the dead space in the floorplan can be generated by either of the following two methods. The first method randomly selects two dead-spaces that are associated with blocks and moves each selected dead-space to the other side of its circuit block. For example, in fig. 3(b), the dead space $e_1$ is moved below block e. The second method randomly selects one dead-space that is associated with a block and divides it into two separate parts, with the associated circuit block located between the two parts. For example, in fig. 3(b), the dead space $a_1$ is divided into $a_{11}$ and $a_{12}$.

When a new dead space distribution is generated, we update the information of the nets that have pins in the moved circuit block, including the wire length, the number of buffers and the independent feasible region of each buffer. In step 5, if the new dead space distribution is not accepted, we restore the dead space distribution to its previous state and undo the changes to the related net information. In step 6, steps 3 to 5 are repeated hundreds of times. The topology of the placement and the total area of the chip clearly remain unchanged after the dead space redistribution.
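The accept-if-better loop of Algorithm 2 can be sketched in Python as follows. This is a generic illustration, not the authors' C implementation: `optimize`, `count_nets` and `perturb` are hypothetical names, the placement type is left abstract, and `perturb` is assumed to return a modified copy so that "restoring" the previous distribution amounts to discarding the candidate.

```python
import random
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # abstract placement type (assumption: copied, not mutated)


def optimize(placement: P,
             count_nets: Callable[[P], int],
             perturb: Callable[[P, random.Random], P],
             iterations: int = 300,
             seed: int = 0) -> Tuple[P, int]:
    """Greedy dead-space redistribution loop (steps 2 to 6 of Algorithm 2).

    count_nets(p): number of nets meeting their delay constraint for
                   placement p, i.e. the result of the buffer planning run.
    perturb(p, rng): returns a copy of p with one random redistribution move.
    """
    rng = random.Random(seed)
    n_old = count_nets(placement)               # step 2
    for _ in range(iterations):                 # step 6
        candidate = perturb(placement, rng)     # step 3
        n_new = count_nets(candidate)           # step 4
        if n_old < n_new:                       # step 5: accept only improvements
            placement, n_old = candidate, n_new
        # otherwise the candidate is dropped, which restores the old state
    return placement, n_old
```

Because only strict improvements are accepted, the returned count is never lower than the count for the initial placement.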
For each new dead space distribution, we run the buffer planning algorithm described in the previous subsection to compute the number of nets that meet the target delay.

4 Experimental results

The Buffer Planning and Optimization algorithms have been implemented in C on a SUN Ultra-SPARC III (750 MHz) machine. In this section, we present our experimental results. The interconnect and buffer parameters were introduced in section 1. The values used for these parameters (see table 1) are based on a 0.18 µm technology in the NTRS 97 roadmap [19].

www.scichina.com

In this paper, we concentrate on buffer planning for two-pin (single source, single sink) nets, and all multiple-pin nets are decomposed into two-pin nets. Because the benchmark files contain no information on signal direction, we choose one pin of each multiple-pin net as the source and all the others as sinks, and then decompose the net into a set of two-pin nets. We ignore all power and ground interconnects. The initial floorplans of the MCNC benchmark circuits used in this work were obtained from ref. [16].

Table 1 Values of the parameters used
Parameter  Description                    Unit       Value
$R$        unit length wire resistance    Ω·µm⁻¹     0.075
$C$        unit length wire capacitance   fF·µm⁻¹    0.118
$T_b$      intrinsic buffer delay         ps         36.4
$C_d$      buffer input capacitance       fF         23.4
$R_b$      buffer output resistance       Ω          180
$R_d$      driver output resistance       Ω          180
$C_l$      sink input capacitance         fF         23.4

Since the MCNC benchmarks do not include any timing information, target delays of the two-pin nets are assigned as in ref. [3]. All two-pin nets shorter than the critical length $l_{min}$ [20] are ignored, because buffer insertion cannot reduce their delay. For each remaining net, we compute the optimal delay $T_{opt}$ under optimal buffer insertion [20] and then randomly assign a delay constraint between 1.05 and 1.20 times $T_{opt}$, as in ref. [3]. Since we generate the placements and timing constraints on our own, a direct comparison between our method and those in refs. [3-6] cannot be fair; the results of ref. [3] are nevertheless shown in table 3 for reference. We provide the results of our buffer planner for 5 MCNC benchmark circuits [21]. The relevant details of these benchmarks are shown in table 2. The data in table 3 include the number of nets that meet the delay constraint, the total number of buffers that are inserted successfully, the improvement achieved by the optimization algorithm and the CPU time.
The Buffer Planning column shows the results of running the Buffer Planning algorithm (Algorithm 1) on the initial dead space distribution, and the Optimization column shows the results of the Optimization algorithm (Algorithm 2). The sub-columns met, buffers and time show, respectively, the number of nets that satisfy the delay constraint, the number of buffers inserted and the CPU time consumed. The increase in the number of nets that meet the target delay and the corresponding ratio are given in the columns $N_{imp}$ and $R_{imp}$, respectively. The results in table 3 show that the Optimization algorithm increases the number of nets that satisfy the delay constraints while the total area of the floorplan is unchanged. In circuit Xerox, for example, 275 nets satisfy the delay constraints under the initial dead space distribution, and this number increases to 315 after optimization, an increase of 14.5%. Over the five circuits, the number of nets that satisfy the delay constraints increases by 9% on average. These results indicate that our Optimization algorithm is effective. Because the dead space redistribution is iterated, the run-time of our algorithm is higher than that in refs. [3, 5].

Table 2 MCNC benchmark statistics
Circuit  Blocks  Nets  Two-pin nets
Apte     9       97    172
Xerox    10      203   455
Hp       11      83    226
Ami33    33      123   363
Ami49    49      408   545

Table 3 Results of the Buffer Planning algorithm and the Optimization algorithm
Circuit  Buffer Planning          Optimization             $N_{imp}$  $R_{imp}$  Ref. [3]
         met  buffers  time/s     met  buffers  time/s                           met  buffers  time/s
Apte     89   83       0.16       100  104      28.6       11         12.4%      102  185      0.23
Xerox    275  152      0.1        315  182      8.7        40         14.5%      260  399      0.53
Hp       129  179      0.25       139  182      25.1       10         7.8%       131  280      0.48
Ami33    235  162      0.08       249  178      7.1        14         5.9%       305  667      1.63
Ami49    437  236      0.51       457  253      49.1       20         4.6%       412  946      3.25

5 Conclusion

In this paper, we proposed a buffer planning algorithm based on dead space redistribution to make good use of dead-spaces for buffer insertion. The redistribution is achieved by rearranging the Attached Dead-Spaces in the floorplan, while the topology and total area of the floorplan remain unchanged. Experimental results show that our approach is effective: on five MCNC benchmark circuits, the number of nets that meet their delay constraints increases by 9% on average. As a basic buffer planning algorithm embedded in the optimization procedure, our Buffer Planning algorithm can be easily extended to handle additional constraints, such as congestion and noise. To obtain a near-optimal solution, an advanced search strategy such as simulated annealing would be needed. Because of the run-time cost, a greedy algorithm is used for optimization in this paper. Although the experimental results show that the greedy strategy is effective, a faster buffer planning algorithm is required before a better search strategy can be applied. In addition, detour routes are not considered in this paper. We will address these issues in future work.
Acknowledgements The first author thanks Wang Yibo from Tsinghua University and Dr. Kong Tianming from UCLA for their help with this work. This work was supported by the National Natural Science Foundation of China (Grant No. 90307005), the NSFC and Hong Kong RGC Joint Project (Grant No. 60218004), the National Natural Science Foundation of China (NSFC) (Grant No. 60121120706), the National Science Foundation of the USA (NSF) (Grant No. CCR-0096383) and the Hi-Tech Research & Development (863) Program of China (Grant No. 2002AA1Z1460).

References

1. Cong, J., He, L., Koh, C. K. et al., Performance optimization of VLSI interconnect layout, Integration, the VLSI Journal, 1996, 21: 1-94.

2. Cong, J., Challenges and opportunities for design innovations in nanometer technologies, Frontiers in Semiconductor Research: A Collection of SRC Working Papers, Semiconductor Research Corporation, http://www.src.org/prg_mgmt/frontier.dgw, 1997.
3. Cong, J., Kong, T., Pan, D. Z., Buffer block planning for interconnect-driven floorplanning, Proc. IEEE/ACM Int. Conf. on Computer Aided Design, San Jose, USA, 1999, 358-363.
4. Cong, J., Kong, T., Pan, D. Z., Buffer block planning for interconnect planning and prediction, IEEE Trans. Very Large Scale Integration (VLSI) Systems, 2001, 9(6): 929-937.
5. Sarkar, P., Koh, C. K., Routability-driven repeater block planning for interconnect-centric floorplanning, Proc. ACM Intl. Symp. Physical Design, San Diego, USA, 2000, 186-191.
6. Sarkar, P., Koh, C. K., Routability-driven repeater block planning for interconnect-centric floorplanning, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 2001, 20(5): 660-671.
7. Tang, X., Wong, D. F., Planning buffer locations by network flows, Proc. ACM Intl. Symp. Physical Design, San Diego, USA, 2000, 180-185.
8. Tang, X., Wong, D. F., Network flow based buffer planning, Integration, the VLSI Journal, 2001, 30(2): 143-155.
9. Dragan, F. F., Kahng, A. B., Mandoiu, I. et al., Provably good global buffering using an available buffer block plan, Proc. IEEE/ACM ICCAD, San Jose, USA, 2000, 104-109.
10. Dragan, F. F., Kahng, A. B., Mandoiu, I. et al., Provably good global buffering by multiterminal multicommodity flow approximation, Proc. IEEE/ACM ASP-DAC, Yokohama, Japan, 2001, 120-125.
11. Alpert, C. J., Hu, J., Sapatnekar, S. S. et al., A practical methodology for early buffer and wire resource allocation, Proc. IEEE/ACM DAC, Las Vegas, USA, 2001, 189-194.
12. Alpert, C. J., Hu, J., Sapatnekar, S. S. et al., A practical methodology for early buffer and wire resource allocation, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 2003, 22(5): 573-583.
13. Sham, C. W., Young, F. Y., Routability driven floorplanner with buffer block planning, Proc. ACM ISPD, San Diego, USA, 2002, 50-55.
14. Sham, C. W., Young, F. Y., Routability-driven floorplanner with buffer block planning, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 2003, 22(4): 470-480.
15. Rafiq, F., Chrzanowska-Jeske, M., Yang, H. H. et al., Integrated floorplanning with buffer/channel insertion for bus-based microprocessor designs, Proc. ACM ISPD, San Diego, USA, 2002, 56-61.
16. Hong, X. L., Huang, G. et al., Corner Block List: an effective and efficient topological representation of non-slicing floorplan, Proc. IEEE/ACM ICCAD, San Jose, USA, 2000, 8-12.
17. Hong, X. L., Ma, Y., Dong, S. Q. et al., Corner block list representation and its application with boundary constraints, Science in China, Ser. F, 2004, 47(1): 1-19.
18. Dong, S., Zhou, S., Hong, X. L. et al., An optimum placement search algorithm based on extended corner block list, Journal of Computer Science and Technology, 2002, 17(6): 699-707.
19. Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
20. Alpert, C. J., Devgan, A., Wire segmenting for improved buffer insertion, Proc. IEEE/ACM Design Automation Conf., Anaheim, USA, 1997, 588-593.
21. Collaborative Benchmarking Laboratory, North Carolina State University, http://www.cbl.ncsu.edu/cbl Docs/lys92.html: LayoutSynth 92 Benchmark Information.