Timing-Constrained I/O Buffer Placement for Flip- Chip Designs

Similar documents
Routability-Driven Bump Assignment for Chip-Package Co-Design

Constraint Driven I/O Planning and Placement for Chip-package Co-design

Physical Design Implementation for 3D IC Methodology and Tools. Dave Noice Vassilios Gerousis

Floorplan Management: Incremental Placement for Gate Sizing and Buffer Insertion

On Increasing Signal Integrity with Minimal Decap Insertion in Area-Array SoC Floorplan Design

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

On Improving Recursive Bipartitioning-Based Placement

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

Physical Implementation

ECE 5745 Complex Digital ASIC Design Topic 13: Physical Design Automation Algorithms

Non-Rectangular Shaping and Sizing of Soft Modules for Floorplan Design Improvement

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Power-Mode-Aware Buffer Synthesis for Low-Power Clock Skew Minimization

An Effective Decap Insertion Method Considering Power Supply Noise during Floorplanning *

An Efficient and Fast Algorithm for Routing

On GPU Bus Power Reduction with 3D IC Technologies

ICS 252 Introduction to Computer Design

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Very Large Scale Integration (VLSI)

Signal Integrity Comparisons Between Stratix II and Virtex-4 FPGAs

Routing. Robust Channel Router. Figures taken from S. Gerez, Algorithms for VLSI Design Automation, Wiley, 1998

Binary Decision Tree Using Genetic Algorithm for Recognizing Defect Patterns of Cold Mill Strip

New Optimal Layer Assignment for Bus-Oriented Escape Routing

L14 - Placement and Routing

Topological Routing to Maximize Routability for Package Substrate

Iterative-Constructive Standard Cell Placer for High Speed and Low Power

ASIC Physical Design Top-Level Chip Layout

Parallel algorithms for fast air pollution assessment in three dimensions

Minimum Implant Area-Aware Placement and Threshold Voltage Refinement

An Interconnect-Centric Design Flow for Nanometer. Technologies

Crosslink Insertion for Variation-Driven Clock Network Construction

Cell Density-driven Detailed Placement with Displacement Constraint

Diagonal Routing in High Performance Microprocessor Design

An Efficient Routing Tree Construction Algorithm with Buffer Insertion, Wire Sizing and Obstacle Considerations

Topics. ! PLAs.! Memories: ! Datapaths.! Floor Planning ! ROM;! SRAM;! DRAM. Modern VLSI Design 2e: Chapter 6. Copyright 1994, 1998 Prentice Hall

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Chap5 The Theory of the Simplex Method

1 Model checking and equivalence checking

Place and Route for FPGAs

On Refining Standard Cell Placement for Self-aligned Double Patterning *

A Hierarchical Bin-Based Legalizer for Standard-Cell Designs with Minimal Disturbance

On the Rectangle Escape Problem

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Satisfiability Modulo Theory based Methodology for Floorplanning in VLSI Circuits

Basic Idea. The routing problem is typically solved using a twostep

Efficient Rectilinear Steiner Tree Construction with Rectangular Obstacles

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

Obstacle-Aware Longest-Path Routing with Parallel MILP Solvers

Detailed Placement Algorithm for VLSI Design with Double-Row Height Standard Cells

A Study of IR-drop Noise Issues in 3D ICs with Through-Silicon-Vias

Circuit Model for Interconnect Crosstalk Noise Estimation in High Speed Integrated Circuits

Chapter 5 Global Routing

VERY large scale integration (VLSI) design for power

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

ECE260B CSE241A Winter Placement

Floorplan considering interconnection between different clock domains

Clock Skew Optimization Considering Complicated Power Modes

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface.

Can Recursive Bisection Alone Produce Routable Placements?

On the Rectangle Escape Problem

Visibility: Finding the Staircase Kernel in Orthogonal Polygons

A Trade-off Oriented Placement Tool 1

Full Custom Layout Optimization Using Minimum distance rule, Jogs and Depletion sharing

Monolithic 3D IC Design for Deep Neural Networks

Processing Rate Optimization by Sequential System Floorplanning

Calibrating Achievable Design GSRC Annual Review June 9, 2002

Analysis of different legalisation methods for unequal sized recursive Min-cut placement

Configurable Rectilinear Steiner Tree Construction for SoC and Nano Technologies

Recent Results from Analyzing the Performance of Heuristic Search

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS)

A Routing Method based on Nearest Via Assignment for 2-Layer Ball Grid Array Packages

Additional Slides for Lecture 17. EE 271 Lecture 17

8D-3. Experiences of Low Power Design Implementation and Verification. Shi-Hao Chen. Jiing-Yuan Lin

Double Patterning-Aware Detailed Routing with Mask Usage Balancing

Real-Time Median Filtering for Embedded Smart Cameras

Metal-Density Driven Placement for CMP Variation and Routability

a) wire i with width (Wi) b) lij C coupled lij wire j with width (Wj) (x,y) (u,v) (u,v) (x,y) upper wiring (u,v) (x,y) (u,v) (x,y) lower wiring dij

Further Studies on Placement Optimality

Lecture 3: Art Gallery Problems and Polygon Triangulation

Non Uniform On Chip Power Delivery Network Synthesis Methodology

Graph Models for Global Routing: Grid Graph

On the Rectangle Escape Problem

S 1 S 2. C s1. C s2. S n. C sn. S 3 C s3. Input. l k S k C k. C 1 C 2 C k-1. R d

Performance Analysis on the Target Detection of Linear Camera of CCD Vertical Target Coordinate Measurement System

2 A Plane Sweep Algorithm for Line Segment Intersection

Efficient Multilayer Routing Based on Obstacle-Avoiding Preferred Direction Steiner Tree

A Modified Playfair Cipher for a Large Block of Plaintext

ECE260B CSE241A Winter Routing

High-Density Integration of Functional Modules Using Monolithic 3D-IC Technology

Simultaneous OPC- and CMP-Aware Routing Based on Accurate Closed-Form Modeling

An Enhanced Congestion-Driven Floorplanner

CAD Algorithms. Placement and Floorplanning

Cluster-based approach eases clock tree synthesis

Eliminating Routing Congestion Issues with Logic Synthesis

VLSI Physical Design: From Graph Partitioning to Timing Closure

Agus Harjoko Lab. Elektronika dan Instrumentasi, FMIPA, Universitas Gadjah Mada, Yogyakarta, Indonesia

Linking Layout to Logic Synthesis: A Unification-Based Approach

Burn-in & Test Socket Workshop

Multilayer Routing on Multichip Modules

Finding Shortest Path on Land Surface

Transcription:

Timing-Constrained I/O Buffer Placement for Flip- Chip Designs Zhi-Wei Chen 1 and Jin-Tai Yan 2 1 College of Engineering, 2 Department of Computer Science and Information Engineering Chung-Hua University, Hsinchu, Taiwan, R.O.C {m9423, yan}@chu.edu.tw Astract Due to inappropriate assignment of ump pads or improper placement of I/O uffers, the configured delays of I/O signals may not satisfy the timing requirement inside die core. In this paper, the prolem of timing-constrained I/O uffer placement in an area-io flip-chip design is firstly formulated. Furthermore, an efficient two-phase approach is proposed to place I/O uffers onto feasile uffer locations etween I/O pins and ump pads with the consideration of the timing constraints. Compared with Peng s SA-ased approach[7], with no timing constraint, our approach can reduce 71.82% of total wirelength and 55.74% of the maximum delay for 7 tested cases on the average. Under the given timing constraints, our result otains higher timing-constrained satisfaction ratio(tcsr) than the SAased approach[7]. I. INTRODUCTION With the increase of circuit complexity and the decrease of feature size, the dramatic demand for I/O counts ecomes a significant issue in VLSI designs. An advanced packaging technology, flip-chip package[1], is introduced to meet the higher integration density and the larger I/O counts in modern VLSI designs. In general, a flip-chip design as shown in Fig. 1 descries the methodology of electrically connecting the die core to the package carrier ased on the technology of the flipchip package. Because of the reduced signal inductance and package footprint, the flip-chip technology has een widely used in high-performance circuit designs. However, the placement of I/O uffers is not exactly mapped onto the locations of ump pads in an area-i/o flip-chip design. Hence, an extra metal layer, Re-Distriution Layer(RDL), is used to redistriute the connections from I/O uffers on the die to the ump pads on the RDL. Fig. 1. Flip Chip Die Package Build-up Layer Package Core I/O Buffer Package Build-up Layer PCB Die interconnect RDL Bump Pad Bump Ball Package Ball Connection of IO signal in an area-i/o flip-chip design In general, the performance in a chip design is evaluated y the timing effect inside the die core and most of works are concentrated on the timing delay inside the die core for the performance optimization. In an area-io flip-chip design, I/O uffers can e placed onto white space inside the die to ridge I/O pins in circuit locks to ump pads on RDL. After optimizing the performance inside the die core, the configured delay of I/O signals may not satisfy the timing requirement inside the die core due to inappropriate assignment of ump pads or improper placement of IO uffers. As a result, the final performance of the chip may e dominated y the timing effect at package stage. The I/O uffer placement prolem in conventional wireond designs[2, 3, 4, 5] has een extensively studied. However, these approaches are inapplicale in area-io flip-chip designs. Recently, many works[6, 7, 8, 9, 1] on I/O uffer placement have een pulished in area-io flip-chip designs. Hsieh et. al.[6] proposed a constructive approach with considering the total path delay and signal skew to place circuit locks and I/O uffers simultaneously. A SA-ased approach with the similar consideration was further introduced in [7]. However, these two approaches are concentrated on the skew minimization on the asis of wirelength. Xiong et al. [8] studied the I/O uffer placement with considering signal skew and other constraints. However, it is assumed that I/O uffers are placed at fixed locations for usage. Lai et. al.[9] considered the additional differential pairs into the IO uffer placement prolem. Wang et al. [1] proposed an I/O uffer placement approach to take the routaility issue for escape routing into consideration. However, all the works only considered how to place the circuit locks and I/O uffers at the same time and the timing constraint did not e considered for I/O uffer placement. For I/O uffer placement in an area-io flip-chip design, the prolem of timing-constrained I/O uffer placement in an area- IO flip-chip design is firstly formulated. Furthermore, an efficient two-phase approach is proposed to assign necessary I/O uffers for all the I/O signals such that the timing constraint of each I/O signal is satisfied. In the first phase, all I/O pins are assigned onto the feasile ump pads under the consideration of timing constraints. Furthermore, the I/O uffers are inserted to complete all the I/O signals from I/O pins to ump pads. Compared with Peng s SA-ased approach[7], with no timing constraint, our approach can reduce 71.82% of total wirelength and 55.74% of the maximum delay for 7 tested cases on the average. Under the given timing constraints, our result otains higher timing-constrained satisfaction ratio(tcsr) than the SA-ased approach[7]. 978-3-98181-7-9/DATE11/ 211 EDAA

II. PROBLEM FORMULATION According to the design specification in an area-io flipchip design, I/O uffers can e placed on white space in a floorplan plane and ump pads must e placed at pre-defined locations on RDL. It is assumed that a set of locks with some I/O pins is located in a floorplan plane and an array of ump pads is located on RDL. Generally speaking, any I/O pin must e assigned onto a unique ump pad and a necessary I/O uffer must e required to ridge the connection etween the I/O pin and its corresponding ump pad. To assign an I/O uffer onto an availale space in a floorplan plane to connect each I/O pin onto the corresponding ump pad, two essential constraints, timing constraint and non-overlapping constraint, must e satisfied on RDL. In an area-io flip-chip design, I/O signal is defined as an I/O connection to connect an I/O pin to a ump pad via an I/O uffer as shown in Fig. 2. In general, the delay of any I/O signal is constrained y a given timing delay of its I/O pin and divided into three parts: The first one is the delay etween the I/O pin and the I/O uffer, the second part is the intrinsic delay of the I/O uffer, and the final part is the delay etween the I/O uffer and the ump pad. Besides the intrinsic delay of the I/O uffer, the remaining delays are dependent on the lengths of the two connections. To satisfy the given timing delay, the timing constraint of each I/O connection must e considered in I/O uffer placement. On the other hand, the overlapping relation etween any I/O uffer and any lock is not allowed. Hence, I/O uffers are only placed on white space in a floorplan plane. In general, white spaces in a floorplan plane can e otained y performing the sweeping line algorithm[11]. Furthermore, all availale locations for I/O uffers can e achieved according to the size of an I/O uffer and the size of each white space in the plane. To satisfy the valid locations of I/O uffers, the nonoverlapping constraint must e must e considered in I/O uffer placement. Fig. 2. I/O pin I/O uffer Bump pad RDL Device layer I/O connection from an I/O pin to a ump pad via an I/O uffer Given a set of m locks, M = {B 1, B 2,...,B m }, and a set of s uffer locations, S = {s 1, s 2,...,s s }, in a floorplan plane, a set of n I/O pins, P = {,,...,p n }, on the locks, an array of r ump pads, B = {,,..., r }, on the given RDL and the timing constraint, t i, each I/O pin, p i, 1 i n, the timing-constrained I/O uffer placement is to map each I/O pin onto a unique ump pad to construct an I/O signal and assign an I/O uffer on a feasile location for each I/O signal in the given floorplan plane under the timing and non-overlapping constraints. As illustrated in Fig. 3(a), given a set of 7 locks, B 1, B 2,..., and B 7, and a set of 13 uffer locations, s 1, s 2,..., and s 13, in a floorplan plane, a set of 9 I/O pins,,,..., and p 9, on the locks, a set of 9 ump pads,,,, and, on the given RDL and the timing constraint of each I/O pin, the 9 resultant I/O signals, (, s 6, ), (, s 2, ), (, s 1, ), (, s 8, ), (, s 3, ), (, s 9, ), (, s 13, ), (, s 12, ) and (p 9, s 11, ), with the location assignment of 9 I/O uffers can e otained in timingconstrained I/O uffer placement and shown in Fig. 3(). s 1 s 2 B 2 Fig. 3. B 3 B 4 B s 7 1 s 1 8 B p 9 8 7 s 6 s 7 s 8 s 9 s 12 B 5 B 1 s 13 (a) s 3 s 4 s 5 B 2 B 3 B 1 B 4 () B p 9 8 7 B 5 B 6 Timing-constrained I/O uffer placement in an area-io flip-chip design III. TIMING-CONSTRAINED I/O BUFFER PLACEMENT For timing-constrained I/O uffer placement in an area-io flip-chip design, an efficient approach is proposed to assign I/O pins onto feasile ump pads to construct I/O signals and assign necessary I/O uffers onto feasile uffer locations for all I/O signals. Basically, the placement process can e divided into three main steps: Extraction of timing-constrained routing length, I/O signal assignment and Timing-constrained I/O uffer placement and the design flow for the proposed approach is illustrated in Fig. 4. (1) A given set of locks with some I/O pins and a given set of uffer locations in a floorplan plane (2) A given array of ump pads (3) The timing constraints of all the I/O pins Extraction of Timing -Constrained Routing Length I/O Signal Assignment Construction of Manhattan Circular Regions Graph Construction for I/O Signal Assignment Fig. 4. Matching-Based I/O Signal Assignment Timing-Constrained I/O Buffer placement Construction of Feasile Location Regions Location Construction for I/O Buffers Graph Construction for I/O Buffer Placement Matching-Based I/O Buffer Placement Timing-Constrained I/O Buffer Placement result Design flow of timing-constrained I/O uffer placement

A. Extraction of Timing-Constrained Routing Length For any I/O pin in a floorplan plane, its timing constraint determines all the feasile ump pads to construct an I/O signal. In other words, the location of any valid ump pad must connect from the I/O pin to the pad to satisfy the given timing constraint. To find all the valid ump pads under the timing constraint, the maximal routing length of the I/O pin must e extracted from the given timing constraint. Basically, an I/O connection can e modeled as a RC circuit and the delay of the RC circuit can e computed y using Elmore delay model. In Elmore delay model, the delay, t, of any I/O connection with length, l, can e otained as r c 2 t = l 2 ( Rc r C) l RC where R is the driver strength, C is the driving load, r is the unit resistance and c is the unit capacitance. It is assumed that t max is the maximum timing delay of a given I/O connection with length, l. It is known that the delay, t, of the I/O connection with length, l, must satisfy the timing constraint, t t. By using the delay expression in Elmore max delay model, the length, l, of the I/O connection must satisfy the following constraint. l 2 2 2 2 c R r C 2r c t c R r C max, r c Hence, the maximal length, l max, of the I/O connection can e otained as, feasile to e assigned onto the pin, p k, ecause the two ump pads are located inside MCR k. Hence, the two pads, i,j and i,j1, are assigned as the pad candidates of the pin, p k. Fig. 5. l k i,j1 i,j MCR k p k (x k,y k ) i1,j1 i1,j Bump pad candiates for I/O pin Refer to the set of 9 I/O pins,,,..., and p 9, and the set of 9 ump pads,,,..., and, in Fig. 3(a), ased on the extraction of the maximum routing lengths of the I/O pins, all the MCRs can e achieved and illustrated in Fig. 6(a). According to the locations of all the ump pads and all the MCRs, the pad candidates of each I/O pin can e otained and all the possile I/O signals can e constructed as illustrated in Fig. 6(). Clearly, I/O pin,, has a set of pad candidates,, and, I/O pin,, has a set of pad candidates, and, I/O pin,, has a set of pad candidates, and, I/O pin,, has a set of pad candidates,,, and, I/O pin,, has a set of pad candidates, and, I/O pin,, has a set of pad candidates, and, I/O pin,, has a set of pad candidates, and, I/O pin,, has a set of pad candidates, and, and I/O pin, p 9, has a pad candidate,. c R r C 2r c t c R r C 2 2 2 2 max max =, r c l p3 4 According to the length extraction of any I/O connection under a given timing constraint, t i, for I/O pin, p i, the maximum routing length, l i, of the I/O pin can e extracted. p2 p4 p5 p8 p9 4 p 9 B. I/O Signal Assignment According to the length extraction under a given timing constraint, the maximum routing lengths of all I/O pins in a floorplan plane can e extracted. In a Manhattan routing model, the collection of points within a fixed distance of a given point is called a Manhattan circular region(mcr) whose oundary is composed of two line segments with slope 1 and two line segments with slop -1. Furthermore, the radius of the MCR is the Manhattan distance etween the given point and its oundary. For any I/O pin, p i, with its timing constraint, t i, its feasile pad candidates must e inside the Manhattan circular region, MCR i, whose center is the coordinate, (x i, y i ), of the I/O pin, p i, and whose radius is the maximum routing length, l i, under the timing constraint, t i. As illustrated in Fig. 5, I/O pin, p k, constructs its MCR k ased on the coordinate, (x k, y k ), of the I/O pin as the MCR center and the maximum routing length, l k, as the MCR radius. It is assumed that there are four ump pads, i,j, i1,j, i,j1, and i1,j1, around the MCR k. Clearly, the ump pads, i1,j and i1,j1, are not feasile to e assigned onto the pin, p k, ecause the two ump pads are not located inside MCR k. In contrast, the two ump pads, i,j and i,j1, are Fig. 6. p1 p6 (a) p7 () Distriution of Manhattan circular regions for I/O pins and all the possile I/O signal for 9 I/O pins According to the possile I/O signals for all the I/O pins, a ipartite graph, G=(V p, V, E), can e constructed as follows: the vertex set in G can e divided into a vertex set, V p, for I/O pins and a vertex set, V, for pad candidates. For v i in V p and v j in V, there exists an edge, e i,j, in E if the connection etween the corresponding I/O pin in v i and the corresponding pad candidate in v i is a possile I/O signal under a given timing constraint. Clearly, an assignment solution can e otained if the numer of pad candidates is greater than or equal to the numer of I/O pins and each I/O pin has at least one pad candidate. To make each I/O pin to assign a feasile pad candidate, the I/O pins with only one pad candidate must have the highest priority to construct I/O signals. Therefore, a

matching-ased I/O signal assignment can e applied as follows: Firstly, any vertex in V p with degree 1 and its connecting vertex in V are selected to form an I/O signal and the original ipartite graph is modified y deleting the selected vertex pair in V. If there is no vertex in V p with degree 1, the vertex in V p with the smallest degree and its nearest connecting vertex in V are selected to form an I/O signal and the original ipartite graph is modified y deleting the selected vertex pair in V. Until the vertex set, V p, is empty or there is no edge in G, the assignment process will stop and the final I/O signal assignment result will e otained. Refer to all the possile I/O signals in Fig. 6(), a ipartite graph, G=(V p, V, E) can e uilt as illustrated in Fig. 7, where V p ={,,,,,,,, p 9 }, V ={,,,,,,,, } and E={e 1,2, e 1,3, e 1,6, e 2,1, e 2,2, e 3,1, e 3,4, e 4,2, e 4,3, e 4,5, e 4,6, e 5,4, e 5,7, e 6,5, e 6,6, e 7,8, e 7,9, e 8,7, e 8,8, e 9,8 }. By performing a matching-ased I/O signal assignment, the edge, e 9,8, is firstly selected to form an I/O signal and the edges, e 7,8 and e 8,8, are deleted ecause the pad candidate,, is assigned onto the pin, p 9. Furthermore, for the vertices in V p with degree 1, the edges, e 7,9, e 8,7, e 5,4, e 3,1 and e 2,2, are sequentially selected to form I/O signals. Finally, for the vertices in V p with the smallest degree, the edges, e 6,6, e 1,3 and e 4,5, are sequentially selected to form I/O signals and the assignment result of 9 I/O signals is illustrated in Fig. 8. Fig. 7. p 9 Construction of a ipartite graph for all the I/O signals of 9 I/O pins delay of an I/O signal from its I/O pin to its ump pad via an I/O uffer violates the given timing constraint, the placement of the I/O uffer for the I/O signal is not allowed. Hence, the location of the placed uffer for an I/O signal must e one availale uffer location under its timing constraint. Fig. 9. s 1 s 2 s 6 s 7 s 8 s 9 s 3 s 4 s 5 p 9 s 1 s 11 s 12 s 13 Availale locations for I/O uffer placement For any I/O signal from I/O pin, p i, to ump pad, j, ased on the construction of a Manhattan circular region, two Manhattan circular regions, MCR p i, and MCR j, must e uilt from I/O pin, p i, to the placed I/O uffer and from the placed I/O uffer to the corresponding ump pad, j, respectively. Furthermore, the feasile location region, FLR, of the placed I/O uffer for the corresponding I/O signal can e defined as the intersection region etween two MCRs, MCR p i, and MCR j, as illustrated in Fig. 1, where the center of the MCR p i is the coordinate, (x i, y i ), of the I/O pin, p i, and the radius, l p i, of the MCR p i is the maximum routing length under the given timing delay, t p i, and the center of the MCR j is the coordinate, (x j, y j ), of the ump pad, j, and the radius, l j, of the MCR j is the maximum routing length under the given timing delay, t j. l j FLR j (x j,y j ) IO l p i p i (x i,y i ) p 9 MCR j MCR p i Fig. 8. I/O signal assignment for 9 I/O pins C. Timing-Constrained I/O Buffer Placement After a set of I/O signals is constructed in I/O signal assignment, necessary I/O uffers must e further placed onto feasile uffer locations to complete I/O connections etween IO pins and ump pads. As illustrated in Fig. 9, according to the size of an I/O uffer, the whitespace in a floorplan plane can e assigned as 13 uffer locations. On the other hand, if the Fig. 1. Feasile location region for an I/O uffer To satisfy the given timing constraint of any I/O pin, p i, the FLR etween MCR p i, and MCR j, for the placed I/O uffer must e constrained into two necessary conditions. Firstly, the timing constraint of the I/O signal for the I/O pin, p i, must e satisfied as t p i t uf t j t max, where t p i is the timing delay from the I/O pin, p i, to the placed I/O uffer, t uf is the intrinsic delay of the placed I/O uffer, and t j is the timing delay from the placed I/O uffer to the ump pad, j. Based on the length, l p i, etween the I/O pin, p i, and the placed I/O uffer and the length, l j, etween the placed I/O uffer and the ump pad, j, t p i and t j can e otained as

p rc t i = li 2 p ( Rc r C ) li RC rc t j = l j j 2 ( R c r C) l R C and where R and C are the output resistance and input capacitance of an placed I/O uffer, respectively. On the other hand, to form the intersection region etween two MCRs, the total length of oth radiuses must e greater than or equal to the distance, d ij, etween the points, (x i, y i ) and (x j, y j ), that is, l p il j d ij. According to all the availale locations of I/O uffers for any I/O signal in a floorplan plane, a ipartite graph, G(V s, V, E), can e constructed as follows: the vertex set in G can e divided into a vertex set, V s, for I/O signals and a vertex set, V, for availale uffer locations. For v i in V s and v j in V, there exists an edge, e i,j, in E if the corresponding I/O signal in v i can use the corresponding uffer location in v i to place an I/O uffer under a given timing constraint. Clearly, a placement solution can e otained if the numer of uffer locations is greater than or equal to the numer of I/O signals and any I/O signal has at least one uffer location to place an I/O uffer. To make each I/O signal to place an I/O uffer, the I/O signals with only one uffer location must have the highest priority to place I/O uffers. Therefore, a matching-ased I/O uffer placement can e applied as follows: Firstly, any vertex in V s with degree 1 and its connecting vertex in V are selected to place an I/O uffer and the original ipartite graph is modified y deleting the selected vertex pair in V. If there is no vertex in V s with degree 1, the vertex in V s with the smallest degree and its nearest connecting vertex in V are selected to place an I/O uffer and the original ipartite graph is modified y deleting the selected vertex pair in V. Until the vertex set, V s, is empty or there is no edge in G, the placement process will stop and the final I/O uffer placement result will e otained. Refer to the set of 9 I/O signals, n 1, n 2,..., and n 9, and the set of 13 uffer locations, s 1, s 2,..., and s 13, in Fig. 9, a ipartite graph, G(V s, V, E) can e uilt as illustrated in Fig. 11, where V s ={n 1, n 2, n 3, n 4, n 5, n 6, n 7, n 8, n 9 }, V s ={s 1, s 2, s 3, s 4, s 5, s 6, s 7, s 8, s 9, s 1, s 11, s 12, s 13 } and E={e 1,3, e 1,4, e 2,2, e 3,1, e 3,2, e 4,5, e 4,6, e 5,7, e 5,8, e 6,5, e 6,6, e 7,13, e 8,1, e 8,11, e 9,11, e 9,12 }. By performing a matching-ased I/O uffer placement, the edges, e 2,2 and e 7,13, are selected to place two I/O uffers and the edges, e 3,2, is deleted ecause the location, s 2, is used for the I/O signal, n 2. Furthermore, for the vertices in V s with degree 1, the edge, e 3,1, is selected to place an I/O uffer. Finally, for the vertices in V p with the smallest degree, the edges, e 1,3, e 4,5, e 6,6, e 5,7, e 8,11, and e 9,12, are sequentially selected to place I/O uffers and the placement result of 9 I/O uffers is illustrated in Fig. 12. Fig. 11. n 1 n 2 n 3 n 4 n 5 n 6 n 7 n 8 n 9 s 1 s 2 s 3 s 4 s 5 s 6 s 7 s 8 s 9 s 1 s 11 s 12 s 13 Construction of a ipartite graph for 9 I/O signals and 13 availale uffer locations, Fig. 12. B 2 B 3 B 1 B 4 B 5 B 6 p B 9 8 7 Timing-constrained I/O uffer placement for 9 I/O pins D. Analysis of Time Complexity Based on the timing constraints of all I/O pins, it is clear that the time complexity of extracting the timing-constrained routing lengths of all the pins in Elmore delay model is O(n), where n is the numer of I/O pins in a floorplan plane. Furthermore, the time complexity of constructing the Manhattan circular regions(mcrs) for all the pins under their timing constraints is O(n). Based on all the possile I/O signal, it is clear that the time complexity of constructing a ipartite graph is O(nr) and the time complexity in a matching-ased I/O signal assignment is O(n 2 ), where r is the numer of ump pads in the area-i/o flip-chip design. In the placement of I/O uffers, firstly, the time complexity of finding the possile uffer locations for all the I/O signals under their timing constraints is O(n 2 ). Furthermore, the time complexity of constructing a ipartite graph is O(ns). Finally, the time complexity in a matching I/O uffer placement is O(n 2 ). Hence, the time complexity of timing-constrained I/O uffer placement is O(n 2 nrns). As a result, the time complexity in the proposed approach is O(nrns) ecause the numer of ump pads is generally greater than the numer of I/O pins. IV. EXPERIMENTAL RESULTS For timing-constrained I/O uffer placement, our proposed approach has een implemented y standard C language and run on an Intel Core2 Quad 2.6GHz machine with 3.5G memory. We take seven industrial circuits in [7] as tested cases. Due to no timing constraint in these tested cases, we set the timing delay of the distance etween two adjacent ump pads as a unit delay and the timing constraint for all I/O pins as the window size of the delay of multiple units. The parameters in.18μm process from NTRS 97 are used to compute the delay of a RC circuit. In the experiments, the driver strength and the driving load are set as 18(Ω) and 23.4(pF), respectively, r is assigned as.76(ω/μm) and c is assigned as.118(pf/μm). Besides that, one kind of I/O uffers with input capacitance of 23.4(pF), output resistance of 18(Ω) and the intrinsic delay of 36.4(ps) is applied for the tested cases. Here, the timingconstrained satisfaction ratio(tcsr) is defined as the ratio etween the numer of I/O signals which satisfy the given timing constraints and the total numer of I/O signals in an area-i/o flip-chip design. Two experiments are set up to evaluate the TCSR and the results in our proposed approach are compared with those in

Peng s SA-ased approach[7]. In the first experiment, no timing constraint is considered in 7 tested cases and the experimental results including the total wirelength and the maximum delay are shown in Tale I. The experimental results in Tale I show that our approach can reduce 71.82% of the total wirelength and 55.74% of the maximum delay for 7 tested cases on the average. Under the various timing constraints from the window sizes(w.s) of the delay of 6, 12 and 18 units, the tested cases are used to evaluate the TCSR in our second experiment and the experimental results are shown in Tale II. Compared with Peng s SA-ased approach[7], the experimental results show that the SA-ased approach only has low TCSR in 7 tested cases to satisfy the given timing constraints. Under the first timing constraint, our proposed approach otains 86.5% TCSR in 7 tested cases in on the average. As the timing constraint is modified as the window size of the delay 18 units, all the tested cases are completely assigned y our proposed approach. V. CONCLUSIONS The prolem of timing-constrained I/O uffer placement in an area-io flip-chip design is firstly formulated. Given a set of locks with some I/O pins and the timing constraints of the I/O pins in a floorplan plane, our proposed approach is effectively to assign all the I/O signals and place the necessary I/O uffers for the assigned I/O signals with satisfying the given timing constraints. Experimental results present that our proposed approach reduces the total wirelength and the maximum delay with no timing constraint and achieves higher TCSR with the given timing constraints than the previous work. REFERENCES [1] P. Dehkordi and D. Bouldin, Design for packageaility: the impact of onding technology on the size and layout of VLSI dies, Multi-chip Module Conference, pp. 153-159, 1993. [2] P. Buffet, J. Natonio, R. Proctor, Y. Sun and G. Yasar, Methodology for I/O cell placement and checking in ASIC designs using area-array power grid, IEEE Custom Integrated Circuits Conference, pp. 125-128, 2. [3] A. E. Caldwell, A. B. Kahng and I. L. Markov, Can recursive isection alone produce routale placement?, ACM/IEEE Design Automation Conference, pp. 477-482, 2. [4] A. R. Agnihotri, M. C. Yildiz, A. Khatkhate, A. Mathur, S. Ono and P. H. Madden, Fractical cut: improved recursive isection placement, IEEE/ACM International Conference on Computed-Aided Design, pp. 37-31, 23. [5] S. N. Adya, I. L. Markov and P. G. Villarruia, On whitespace in mixed-size placement and physical synthesis, IEEE/ACM International Conference on Computed-Aided Design, pp. 311-318, 23. [6] H. Y. Hsieh and T. C. Wang, Simple yet effective algorithms for lock and I/O uffer placement in flip-chip design, IEEE International Symposium on Circuits and Systems, pp.1879-1882, 25. [7] C. Y. Peng, W. C. Chao, Y. W. Chang and J. H. Wang, Simultaneous lock and I/O uffer floorplanning for flip-chip design, IEEE Asia and South Pacific Design Automation Conference, pp.213-218, 26. [8] J. Xiong, Y. C. Wong, E. Sarto and L. He, Constraint driven I/O planning and placement for chip-package co-design, IEEE/ACM Asia and South Pacific Design Automation Conference, pp.27-212, 27. [9] M. F. Lai and H. M. Chen, An implementation of performance-driven lock and I/O placement for chip-package codesign,, IEEE International Symposium on Quality Electronic Design, pp.64-67, 28. [1] C. Y. Wang and W. K. Mak, Signal skew aware floorplanning and umper signal assignment technique for flip-chip, IEEE/ACM Asia and South Pacific Design Automation Conference, pp.341-346, 29. [11] M. de Berg, O. Cheong, M. van Kreveld and M. Overmars, Computational Geometry: Algorithms and Applications, 3 rd edition, Springer-Verlag, 25. TABLE I. EXPERIMENTAL RESULTS FOR I/O BUFFER PLACEMENT WITH NO TIMING CONSTRAINT Ex #Pin #Block #Bump Chip area Peng s SA-ased Approach[7] #WL(μm) TCSR Max. delay(ps) #WL(μm) TCSR Max. delay(ps) Time(s) case1 25 6 25 14x14 1627(1%) 1% 23.18 (1%) 1213(75.2%) 1% 21.41(92.36%).16 case2 168 12 289 344x344 7145(1%) 1% 122.51(1%) 16236(22.85%) 1% 62.92(51.35%).562 case3 32 23 441 424x424 177984(1%) 1% 17.3(1%) 3394(19.4%) 1% 69.61(4.93%) 1.97 case4 384 28 484 444x444 19672(1%) 1% 147.79(1%) 25226(13.23%) 1% 54.43(36.82%) 4.94 case5 384 28 484 444x444 2212(1%) 1% 173.89(1%) 417(18.86%) 1% 73.46(42.24%) 2.547 case6 384 28 529 448x448 233922(1%) 1% 345.91(1%) 38534(16.47%) 1% 68.65(19.84%) 3.937 case7 384 28 529 448x448 22974(1%) 1% 33.95(1%) 69726(31.55%) 1% 87.12(26.32%) 3.188 TABLE II. EXPERIMENTAL RESULTS FOR TIMING-CONSTRAINED I/O BUFFER PLACEMENT UNDER THREE WINDOW SIZES = 6, 12 AND 18 Ex Peng s SA-ased Approach[7] W.S = 6 W.S = 12 W.S = 18 Peng s SA-ased Approach[7] Peng s SA-ased Approach[7] TCSR #WL(μm) TCSR #WL(μm) TCSR #WL(μm) TCSR #WL(μm) TCSR #WL(μm) TCSR #WL(μm) case1 1% 1627 1% 1213 1% 1627 1% 1213 1% 1627 1% 1213 case2 N/A N/A 89.2% 11162 N/A N/A 1% 1628.6% 355 1% 16236 case3 N/A N/A 82.5% 19486 N/A N/A 1% 33228 N/A N/A 1% 3394 case4 N/A N/A 98.4% 24584 N/A N/A 1% 25794 N/A N/A 1% 25226 case5 N/A N/A 83.% 2537 N/A N/A 1% 41228 N/A N/A 1% 417 case6 N/A N/A 86.1% 24366 3.9% 366 99.2% 37684 1.2% 154 1% 38534 case7 1.4% 43 66.4% 19672 6.2% 4214 94.2% 53386 18.2% 18224 1% 69726