EWA: Exact Wiring-Sizing Algorithm


Rony Kay, Gennady Bucheuv and Lawrence T. Pileggi
Carnegie Mellon University
Department of Electrical and Computer Engineering
Pittsburgh, PA 15213

ABSTRACT

The wire sizing problem under inequality Elmore delay constraints is known to be posynomial, hence convex under an exponential variable transformation. There are formal methods for solving convex programs. In practice heuristics are often applied because they provide good approximations while offering simpler implementation and better efficiency. There are methods for solving related problems, which are comparable to heuristics from an efficiency point of view, but they solve a less desirable formulation in terms of the objective function and constraints. In this paper the EWA algorithm is described. It solves the problem of minimizing the wiring area or capacitance of an interconnect tree subject to constraints on the Elmore delay. EWA is simple to implement and its efficiency is comparable to the available heuristics. No restrictions are placed on the circuit or wire widths, e.g., non-monotone wire width assignment solutions are feasible. We prove that the optimal wire width assignment for a minimum wiring area objective satisfies all the delay constraints as equalities, when minimum wire width constraints are relaxed. It follows that EWA can be applied also for problems with equality delay constraints such as clock trees. This and other described properties are general enough to permit extensions to higher order delay models in the future.

1.0 Introduction

The interconnect wiring between gates on a VLSI chip can dominate the performance for today's technologies. It has been reported that in some designs up to 70% of the overall path delays are attributed to the interconnect, while the manufacturing trends suggest that the interconnect delay will be even more dominant in the foreseeable future. Minimizing the wiring length has always been a primary objective of routing algorithms to control the chip area, but now it must be done to satisfy tight performance requirements too.
Numerous techniques have been developed to construct interconnect trees and optimize their length. When the interconnect resistance and capacitance are dominating the IC performance, it is advantageous to exploit also the wire widths as an additional degree of freedom for performance optimization.

This work was supported in part by the Semiconductor Research Corporation under contract 97-DC-068.

Clock trees were wire width optimized in [14] to demonstrate the improvement in controlling the delay and desensitizing the skew to inevitable process variations. The benefits of wire sizing for signal nets were demonstrated in [3]. Various strategies for wire width optimization have followed for both clock trees and signal nets. Most wire sizing techniques optimize either wiring delay or wiring area subject to constraints on the other, or a weighted combination of delay and area. Some heuristic approaches can be applied simultaneously with the tree construction, while others are used for post-processing a given tree topology. Refer to [2][3][14][18][10][21] for examples of each type of approach.

This paper describes a wire-sizing algorithm for post-processing an interconnect tree with efficiency and simplicity that are comparable to the heuristic methods. EWA finds the solution for minimizing the total wiring area subject to constraints on the Elmore path delays, as well as upper and lower bounds on wire widths. The delay constraints can be inequalities or equalities, hence EWA can be applied either for signal nets or for clock nets. We propose a two-phase methodology for first targeting delays, then reaching them in terms of minimum area.

EWA is based on several new observations and proofs for properties of RC interconnect trees. We prove that the solution to the problem of minimum area subject to inequality constraints on target delays must be on the delay boundary with respect to all the delay constraints, when minimum wire width constraints are relaxed. Thus the solution subject to inequality constraints on the path delays is equivalent to the solution with equality constraints on the path delays.
We review some related work in Section 2.0. An overview of our two-phase wire sizing methodology is described in Section 3.0. The wire sizing circuit, delay model, and some necessary terminology are outlined in Section 4.0. The formulation of the wire sizing problem is given in Section 5.0. Key theorems and new observations are made in Section 6.0, followed by a description of the EWA algorithm in Section 7.0. Some experimental results are given in Section 8.0. Conclusions are drawn in Section 9.0, along with some comments about extending EWA to handle more accurate delay models.

2.0 Background

This section contains a brief overview of related work. It is not intended as an exhaustive survey. For a detailed review of previous work on wire sizing refer to [2].

2.1 Signal Nets

Cong and Leung [3][4] proposed a wire sizing algorithm for signal nets under the Elmore delay model. Because it lends itself to an elegant solution, they considered a minimization of a weighted sum of critical delays to the leaf nodes (sinks); the critical paths must be given a priori. Special properties of interconnect trees (separability, monotonicity, and dominance) were used to develop an O( n r ) dynamic programming algorithm, where n is the number of segments and r is the cardinality of a discrete width range. Based on the theoretical analysis of properties of RC trees, greedy heuristics were also proposed. The heuristics can be useful in the early phase of the design cycle.

Fishburn and Dunlop showed that the Elmore delay is a posynomial function of transistor widths when applied to transistor size optimization in [7] (TILOS). Given that posynomial programs can be cast as convex optimizations under the exponential variable transformation, it was shown by TILOS that heuristics can obtain good approximations using less run-time and requiring simpler implementations. Exact solutions for the continuous sizing are possible if one is willing to resort to geometric programming [5] or to convex programming in the transformed domain. Sapatnekar et al. solved the same problem as TILOS using convex programming in [16]. More recently Sapatnekar applied a similar convex programming approach in [18] to solve the minimization of the critical path delay by wire sizing under the Elmore delay model. A sensitivity-based heuristic was considered in [19] and compared to the results from convex programming. It was empirically observed that the results of the exact method and the heuristic are very well correlated.

Menezes et al. showed in [10] that higher order moments can also be cast as posynomials, which allowed them to optimize transfer functions under higher order delay models. Later, a Sequential Quadratic Programming (SQP) approach was applied in [11] to optimize driver and wire sizes that satisfy the target delays of critical leaf nodes. The SQP approach converges to the global minimum in terms of the Elmore delay model, and to a local minimum in terms of a higher order delay model using RICE[16]. It was shown that applying a higher order delay model yields superior results.

2.2 Clock Trees

Unfortunately, geometric (posynomial) programming does not apply directly to clock tree problems because the associated formulation includes only inequality constraints.
Pullela, Menezes and Pileggi proposed a wire sizing approach for clock trees in [14] that used a simple heuristic based on one-wire-at-a-time downhill improvements. Later, Zhu, Dai and Xi proposed a least-squares approach which considered several wires concurrently [21], but in spite of the increased runtime and complexity they could not prove convergence to the global optimum either. Since least-squares approaches can be costly in terms of runtime due to building and factoring a matrix of sensitivities, heuristics were used to sparsify the sensitivity matrix in [15].

It is generally more difficult to achieve exact solutions for equality constraints, hence clock tree problems. It should be noted that such constraint formulations can have advantages for signal nets too. The advantages should become clearer in the subsequent sections.

3.0 Proposed Wire Sizing Methodology

The formulations for RC tree wire-sizing optimization generally fall into one of three categories: (i) minimize the maximum path delay subject to restrictions on the available area resources; (ii) minimize the wiring area subject to upper bound constraints on the path delays; (iii) minimize a combination (weighted product or weighted sum) of delays and area. Each of those formulations captures important aspects of the overall design goals. Clearly, there is a trade-off between delay and area of the interconnects [3][18][11], a qualitative illustration of which is shown in Fig. 1.

FIGURE 1. Typical curve of delay versus area tradeoff.

From the figure it is apparent that to achieve the minimum delay possible it is necessary to use extensive area resources. Analogously, too much concern about area resources results in poor performance. A weighted sum or a weighted product of delay and area is able to capture the trade-off relation somewhat; however, assigning good weights is a matter of trial and error. The designer's goal is to comply with the design specification while minimizing resource utilization, yet the design specification is driven by the best performance that can be achieved at a reasonable cost.
For this reason we believe that the best methodology for width optimization (of interconnects or transistors) is based on a two-phase approach. Phase I is comprised of a delay minimization step, not necessarily to the optimal solution but to provide a measure of how well we can do. Phase II is the more important phase, where delay bounds that are larger than the delays achieved at Phase I are imposed as constraints for an area minimization. The purpose of Phase I is to ensure that there is a feasible solution for Phase II and to provide orientation with respect to the quantitative behavior of the area-delay curve (Fig. 1). Further trade-off analysis can be made by varying the delay targets. An important advantage of the two-phase approach is that it supports the engineering decision of trading area resources for performance in a more practical and intuitive way than a weighted objective function.

4.0 Circuit and Delay Model

4.1 Overview

In this paper the interconnect is assumed to have an electrical model in the form of an RC tree, as shown in Fig. 2. The Elmore delay [6] is used to estimate path delays.

FIGURE 2. Lumped equivalent RC tree model.

The Elmore path delay from each leaf node to the root is calculated by summing up the individual contributions of wire segments along the path. The contribution to the delay of path $P_i$ by a segment $j$ (on the path) is the product of the segment's resistance $R_j$ and the downstream capacitance $Cds_j$. $Cds_j$ is the total capacitance of the wire segments and external capacitance loads which are descendants of $j$. Denoting $ds(j)$ as the set of segments and leaf nodes downstream of $j$, we can calculate the delay $D_i$ of path $P_i$ as

$$D_i = \sum_{j \in P_i} R_j \, Cds_j = \sum_{j \in P_i} R_j \sum_{k \in ds(j)} C_k \qquad (1)$$

We use a lumped RC tree model of the interconnect as illustrated in Fig. 2. A Π-model is used to represent any Uniform RC (URC) wire segment. While a more accurate delay model may benefit from chopping the URC into multiple lumped sub-sections, it will not change the value of the Elmore delay. That is, a Π-model yields the precise Elmore delay when used to replace any URC. This statement stems from examining the Elmore delay of a URC that drives a downstream load $Cds$. Dividing a URC with resistance $R$ and capacitance $C$ into $n$ Π-model segments, the incremental contribution ($\Delta D$) of the wire to the overall Elmore path delay is

$$\Delta D = \sum_{i=1}^{n} \frac{R}{n} \left( Cds + \frac{C}{n} \left( n - i + \frac{1}{2} \right) \right) = R \cdot Cds + \frac{RC}{2} \qquad (2)$$

From (2), it is apparent that $\Delta D$ is independent of the number of Π sections used to model the URC.

The resistance and capacitance of each wire segment are dependent on three technology based parameters: (i) resistance per square $r_s$ (Ω/□), (ii) capacitance per unit area $c_a$ (fF/µm²), and (iii) fringe capacitance per unit length $c_f$ (fF/µm). The resistance $R$ and capacitance $C$ of a wire segment with length $l$ and width $w$ are calculated using the formulas:

$$R = r_s \frac{l}{w}; \qquad C = c_a \, l \, w + c_f \, l \qquad (3)$$

Wires on the same layer have similar parameters. Each layer is associated with different parameters due to differences in metal thickness, spacing, and dielectric constants. Using a layer assignment and a technology file, equations (1) and (3) are combined to express the Elmore path delays as a function of wire widths.

5.0 Formulation

As stated above, our objective (Phase II from Section 3.0) is to minimize the total wiring area subject to upper bound constraints on each path delay, as well as upper and lower bounds on wire widths.
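The delay model of Section 4.0 can be exercised with a short sketch. The code below is an illustration only, not the authors' implementation: the `Segment` record, the `elmore_delays` function, and all parameter values are assumptions introduced here. Each segment is replaced by a single Π-model, so a segment's resistance sees half of its own capacitance plus everything downstream, which is the convention behind equation (2).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    length: float        # um
    width: float         # um
    parent: int          # index of the upstream segment, -1 at the root
    load: float = 0.0    # external capacitance at the far end, fF

def elmore_delays(segs, r_s, c_a, c_f):
    """Elmore delay (ohm*fF) from the root to the far end of each segment.

    Assumes every segment is listed after its parent.
    """
    res = [r_s * s.length / s.width for s in segs]                   # eq. (3)
    cap = [c_a * s.length * s.width + c_f * s.length for s in segs]  # eq. (3)
    # Capacitance of the subtree hanging at each segment (own cap + loads).
    sub = [c + s.load for c, s in zip(cap, segs)]
    for i in range(len(segs) - 1, -1, -1):
        if segs[i].parent >= 0:
            sub[segs[i].parent] += sub[i]
    delay = [0.0] * len(segs)
    for i, s in enumerate(segs):
        upstream = delay[s.parent] if s.parent >= 0 else 0.0
        # Pi-model: half of the segment's own capacitance sits upstream of R.
        delay[i] = upstream + res[i] * (sub[i] - cap[i] / 2.0)
    return delay

# One 1000 um wire versus the same wire chopped into four 250 um pieces.
one = elmore_delays([Segment(1000.0, 2.0, -1, 100.0)], 0.03, 0.05, 0.02)
chain = elmore_delays(
    [Segment(250.0, 2.0, i - 1, 100.0 if i == 3 else 0.0) for i in range(4)],
    0.03, 0.05, 0.02)
print(one[0], chain[-1])
```

With these assumed values both calls give the same delay, 2400 Ω·fF, in line with the claim that the Elmore delay does not depend on how many Π sections model a uniform wire.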
Representing the optimization variables in terms of a vector of wire segment widths $w$, the optimization is formally stated as:

$$\begin{aligned}
\text{minimize} \quad & A(w) = \sum_{i=1}^{S} l_i w_i \\
\text{subject to} \quad & ED_i(w) \le TD_i, \qquad i = 1, \dots, P \\
& w_i^L \le w_i \le w_i^U, \qquad i = 1, \dots, S
\end{aligned} \qquad (4)$$

where $S$ and $P$ denote the number of wire segments and the number of source-sink paths respectively; $ED_i(w)$ is an estimator of the $i$-th path delay as a function of the widths; $TD_i$ is the target delay for path $i$; $w_i^U$ and $w_i^L$ are the upper and lower bounds on the width of segment $i$. Using the Elmore delay model, $ED_i(w)$ is obtained by substituting (3) into (1):

$$ED_i(w) = \sum_{j \in P_i} \left( \sum_{k \in ds(j)} \alpha_{jk} \frac{w_k}{w_j} + \beta_j \frac{1}{w_j} \right) \qquad (5)$$

where $\alpha_{jk}$ and $\beta_j$ are positive constants and $ds(j)$ is the set of wire segments and leaves downstream (descendants) of segment $j$. It is worth noting that the constants depend only on the technology parameters, the fixed capacitance loads, and the lengths of the wire segments.

5.1 Posynomiality

Posynomial programming is a branch of optimization theory [5]. A posynomial is a function $p$ of a positive vector $x$, having the form

$$p(x) = \sum_i u_i(x) \qquad (6)$$

with

$$u_i(x) = b_i \prod_j (x_j)^{a_{ij}} \qquad (7)$$

where the exponents $a_{ij}$ are real numbers and the coefficients $b_i$ are positive real numbers. A posynomial program is a minimization problem where the objective function is posynomial and the constraints are posynomial inequalities that are less than or equal to 1. Using a variable substitution $x_i = \exp(z_i)$, a posynomial program can be easily transformed to a convex program. Consequently, any local minimum is a global minimum and formal methods of convex programming can be applied to find the minimum.

The Elmore delay (5) is a posynomial. In addition, the range constraints on the wire widths can be expressed as posynomial constraints: $w_i^L \le w_i \le w_i^U$ is equivalent to $\frac{w_i}{w_i^U} \le 1$ and $\frac{w_i^L}{w_i} \le 1$. It follows that under the Elmore delay model the formulation of our problem (Eqn. (4)) is a posynomial program in the original domain, which is equivalent to a convex program in the transformed domain. We exploit the posynomiality when we describe the algorithm in Section 7.0.
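To make the convexity claim concrete, the substitution $w_i = e^{z_i}$ can be written out for the Elmore delay and the area objective. This is a worked restatement under the definitions above rather than an additional result; the symbol $z$ for the transformed widths is introduced here for illustration.

```latex
\begin{align*}
ED_i(z) &= \sum_{j \in P_i} \sum_{k \in ds(j)} \alpha_{jk}\, e^{\,z_k - z_j}
          + \sum_{j \in P_i} \beta_j\, e^{-z_j}, \\
A(z)    &= \sum_{i=1}^{S} l_i\, e^{z_i}, \\
\ln w_i^L &\le z_i \le \ln w_i^U, \qquad i = 1, \dots, S .
\end{align*}
```

Every term is a positive constant times the exponential of an affine function of $z$, and such terms are convex, so $ED_i(z)$ and $A(z)$ are convex as sums of convex functions. The width bounds become simple linear bounds on $z$.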
In the next section we prove some interesting properties of the problem of interconnect area minimization.

6.0 Theoretical Analysis

In this section we analyze special properties of the wire sizing problem that have practical and theoretical relevance. It is worth noting that the analysis is applicable also in a broader context of resource allocation, e.g. transistor sizing and concurrent wire and gate sizing.

6.1 Preliminaries

We begin by defining some necessary terminology:

Definition 1: the delay boundary is an arc in the feasible space for which all inequality delay constraints are satisfied as equalities (equivalently stated, all delay constraints are active).

Definition 2: a wire segment has negative sensitivity with respect to the area minimization (4) if we can decrease its width by an arbitrarily small amount without violating any constraints.

We also formalize a simple observation which is useful for understanding the subsequent proofs.

Lemma 1: By decreasing the width of a segment $i$, the path delays from the root to all the leaf nodes which are not descendants of $i$ do not increase.

Proof: the Elmore expressions of the path delays from the root to the leaf nodes which are not descendants of segment $i$ either do not include $i$, or they include $i$ as a capacitive load only. Smaller capacitive loads cannot increase the delay (plain algebra). QED

6.2 The boundary property

In cases where the optimal solution has no wires at their minimum allowable width, we prove that at the optimal solution all the delay constraints are active. Observe that if we relax the minimum wire width constraints (set $w_i^L$ to zero) in (4), then this property is always true.

Theorem 1 (the boundary property): if at the optimal solution no minimum wire width constraint is active, then the solution is on the delay boundary.

Proof: assume, for the sake of contradiction, that $w$ is the global optimum and there exists a constraint $c_i$ which corresponds to path $i$ that has a positive slack (i.e. it is not satisfied as an equality). Let us decrease the width of the leaf wire segment of path $i$ until an equality is achieved for the constraint $c_i$. By Lemma 1, all the other root-to-leaf path delays do not increase, therefore the new configuration is feasible. Furthermore, the total area of the tree is smaller than the area of $w$, hence there is a feasible solution with a smaller value of the objective function and $w$ is not the optimum, a contradiction. QED

It can be shown that the boundary property holds for any reasonable delay model. For example, it can be shown for a higher order dominant pole model [13] using adjoint sensitivities [9].

6.3 Sensitivity Based Properties

The following properties are related to the sensitivity of a solution with respect to wire widths.
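Lemma 1 can be checked numerically on a small tree. The sketch below is illustrative only: the three-segment topology (a trunk feeding two leaf branches), the technology constants, and the helper `path_delays` are assumptions made for this example, and each segment is modeled as a single Π section.

```python
R_S, C_A, C_F = 0.03, 0.05, 0.02   # assumed: ohm/sq, fF/um^2, fF/um

def path_delays(lengths, widths, parent, load):
    """Elmore delay (ohm*fF) from the root to the far end of every segment.

    Assumes segment 0 is the only root and parents precede their children.
    """
    n = len(lengths)
    res = [R_S * l / w for l, w in zip(lengths, widths)]
    cap = [C_A * l * w + C_F * l for l, w in zip(lengths, widths)]
    sub = [c + cl for c, cl in zip(cap, load)]
    for i in range(n - 1, 0, -1):          # fold subtree capacitance upward
        sub[parent[i]] += sub[i]
    delay = [0.0] * n
    for i in range(n):
        up = delay[parent[i]] if parent[i] >= 0 else 0.0
        delay[i] = up + res[i] * (sub[i] - cap[i] / 2)
    return delay

lengths = [1000.0, 500.0, 800.0]           # trunk, leaf branch 1, leaf branch 2
parent = [-1, 0, 0]
load = [0.0, 50.0, 80.0]

before = path_delays(lengths, [4.0, 2.0, 2.0], parent, load)
after = path_delays(lengths, [4.0, 2.0, 1.5], parent, load)   # narrow branch 2
print(before[1], after[1])   # delay to the leaf that is not a descendant
print(before[2], after[2])   # delay to the leaf below the narrowed segment
```

Under these assumed numbers, narrowing branch 2 lowers the delay to the other leaf, as Lemma 1 states, while the delay to its own leaf rises. That rise is what consumes the slack in the proof of Theorem 1.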
By definition, as long as some width configuration $w$ contains segments with negative sensitivity, an improvement step is possible by decreasing the size of one (or more) wires with negative sensitivity. Moreover, if we are not on the delay boundary, there exists a wire segment (e.g. a leaf segment) whose width can be decreased to reduce the overall wiring area while maintaining the feasibility with respect to the delay constraints.

It is useful to define not only the sign of the sensitivities but also their magnitude. A change of a specific wire width might either increase or decrease the Elmore delay to the downstream leaf nodes. However, by inspection of the algebraic expression of the Elmore delay it is evident that increasing (decreasing) the width always results in increased (decreased) delays to the other (i.e. not descendant) leaf nodes. Therefore we define the magnitude of the sensitivity to be proportional to the change in the delay to descendant leaf nodes (cone of influence). Since the lengths of wires may differ, we divide it by the wire length in order to get a sensitivity that is normalized with respect to an area unit rather than a width unit.

Sometimes it is necessary to minimize capacitance rather than area. It is worth noting that the only change to the mathematical program being solved is in the constant coefficients of the objective function. For such an objective function we divide the magnitude also by the capacitance per area parameter to get the sensitivity of the delay to downstream leaf nodes with respect to a capacitance unit. More formally, for the minimum capacitance objective the magnitude of the sensitivity of segment $i$ is actually the partial derivative $\frac{1}{l_i c_a} \frac{\partial ED_j(w)}{\partial w_i}$, where $ED_j(w)$ is the delay to any downstream leaf segment.

Let us consider a width configuration on the delay boundary such that no single wire can be decreased while maintaining the feasibility. Assume that the objective is to minimize the total capacitance of the interconnect tree.
It can still be possible to reduce the objective function if there is a pair of segments on a path from the root to a leaf such that the sensitivity of the upstream segment is larger. It stems from the fact that the sub-tree rooted at the upper segment contains the sub-tree rooted at the lower segment; increasing the upper wire width is compensated by a larger decrease of the lower wire without violating any delay constraint. Thus a feasible point with a lower value of the objective function can be obtained.

The intuition is as follows. Since the transformed domain is convex, we can move between any two feasible points by making subsequent steps in axis-parallel directions, i.e. resizing a single segment at a time. Good directions involve either a single width decrease, or a pair of segments, or a more complex step such that the objective function is reduced without violating the target delays. An obvious choice would be to increase the width of a wire that has the smallest cost for a given decrease in delay, and to decrease the width of the wire that brings the largest gain for an equivalent increase in delay.

6.4 Geometrical Interpretation

We show why heuristics which consider only single wire decrements (or increments) at a time are likely to reach a stationary point which is different from the optimum. Observe that the area (capacitance) objective is a separable function whose components are monotone. Consequently the optimal solution cannot be strictly interior and there must be some active constraints at the optimal point.

FIGURE 3. Illustration of the feasible domain (oval) and a set (shaded rectangle) that dominates the optimum w*. (Axes: w1, w2.)

A simplified 2D illustration of a feasible convex domain and a set of points that dominate the optimal solution is given in Fig. 3. One point dominates another point if each of its elements is larger than or equal to the corresponding element of the other point. Consider an iterative process that starts from some feasible interior point and improves the value of the objective function only by decrements of a single wire at a time, as sketched in Fig. 3.
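The pairwise step of Section 6.3 and the stalling of single-wire decrements can both be seen on a two-segment line. The sketch below is a hypothetical example rather than the authors' code: the driver resistance, load, segment length, delay target, and the helpers `line_delay` and `shrink_downstream` are all assumed for illustration, and fringe capacitance is ignored.

```python
R_S, C_A = 0.03, 0.05                     # assumed: ohm/sq, fF/um^2
L, R_DRV, C_LOAD = 1000.0, 10.0, 500.0    # assumed: um, ohm, fF
TARGET = 20000.0                          # assumed delay bound, ohm*fF

def line_delay(w1, w2):
    """Elmore delay of driver -> segment 1 -> segment 2 -> load (Pi-models)."""
    r1, r2 = R_S * L / w1, R_S * L / w2
    c1, c2 = C_A * L * w1, C_A * L * w2
    return (R_DRV * (c1 + c2 + C_LOAD)
            + r1 * (c1 / 2 + c2 + C_LOAD)
            + r2 * (c2 / 2 + C_LOAD))

def shrink_downstream(w1, hi, lo=0.5, iters=60):
    """Smallest w2 in [lo, hi] that still meets TARGET, found by bisection.

    Assumes hi is feasible, lo is not, and the delay falls as w2 grows
    over [lo, hi], which holds for the numbers used below.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if line_delay(w1, mid) <= TARGET:
            hi = mid
        else:
            lo = mid
    return hi

w1, w2 = 3.5, 4.0                         # a point on the delay boundary
# Neither width can be decreased alone without violating the target.
print(line_delay(w1 - 0.01, w2) > TARGET, line_delay(w1, w2 - 0.01) > TARGET)

w1_new = w1 + 0.1                         # widen the upstream segment slightly
w2_new = shrink_downstream(w1_new, w2)    # then narrow the downstream one
print(L * (w1 + w2), L * (w1_new + w2_new))   # area before and after
```

With these assumed values the starting point sits exactly on the delay boundary and every single-wire decrement is infeasible, yet widening the upstream segment by 0.1 µm lets the downstream segment shrink by roughly 0.6 µm, so the total area drops while the delay target is still met.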
As soon as a point that does not dominate the optimum is reached, the sequence cannot converge to the optimal solution because it would require an increase of a width.

One way to improve on the short-sighted single wire downhill improvement is to support both increments and decrements, or alternatively to support uphill moves when required. While designing such iterative algorithms, care must be taken to avoid cyclic behavior and jamming. An engineering solution can be to consider at least a pair of wires such that (potentially) some are increased and some are decreased. It provides a downhill improvement of the objective function together with a mechanism to escape from suboptimal stationary points. Unfortunately, counterexamples can be contrived to show that for some pathological cases considering at most a pair is not sufficient. Those cases may occur only on the delay boundary when a concurrent resize of more than a pair is necessary to maintain the feasibility. In our experience good results are obtained by considering at most pairs.

7.0 Exact Wire-Sizing Algorithm (EWA)

EWA is implemented following the two-phase methodology described in Section 3.0. The results from Phase I ensure a feasible starting point for Phase II. However, any feasible point could be used to initialize Phase II. For example, the delay minimization heuristics proposed in [3] and [19] could provide the starting point for the area minimization phase. Following the proofs and observations from Section 6.0 and the posynomial properties from Section 5.0, EWA converges to the exact solution by solving (4) in the transformed domain.

7.1 The Algorithm

The following is a high level description of EWA.

Phase I:
[0] Initialize all the wires to their maximum width.
[1] Decrease the width of one (or several) wire(s), such as to maximize the reduction in the critical path delay.
[2] If there exists a wire whose size can be decreased while reducing the critical path delay, goto [1].
[3] End of Phase I.

Phase II:
[0] Initialize the wire widths to the solution of Phase I (or any feasible point); set the delay bounds using the results of Phase I to ensure a feasible starting point.
[1] Decrease the width of one (or several) wire(s) with negative sensitivity such that the product of the sum of slacks with respect to the delay constraints (negative slacks are prohibited) and the area reduction is maximal.
[2] If there exists a wire with negative sensitivity goto [1].
[3] If there exists a subset (e.g. a pair) of wires that can be resized to reduce the total area without violating the constraints, perform a resize of those wires and goto [1].
[4] End of Phase II.

It is straightforward to realize that the algorithm terminates since the area is reduced at each iteration. Recall that the mathematical program is unimodal and that it is equivalent to a convex program under the exponential (i.e. log) variable transformation, thus general purpose convex programming methods yield an optimal solution. Next we show how to customize a combination of the above with a formal method to suit the needs of a specific application.

7.2 Implementation Considerations

The way to pick a set of segments and to resize them at steps I(1), II(1), and II(3) of the algorithm is somewhat undetermined. Generally speaking, the larger the subset of wires which is considered, the less efficient the iterations become; however, jamming is less likely. In practice we obtained very good results while considering at most a pair of wires.

The configuration $w^{II}$ obtained from Phase II is feasible and the value of the objective function is an upper bound of the optimum. In order to bound the error we propose linearizing the objective function and constraints around the point $w^{II}$. The resulting Linear Program is a true relaxation of the nonlinear program. The value of the original objective function at the optimal solution of the linear program (which is not necessarily a feasible solution to the nonlinear program) is a lower bound of the solution.

As an alternative approach to considering large subsets of segments, convergence can be achieved by using Sequential Linear Programming (SLP). Each iteration of Linear Programming can be done very efficiently. At this stage it is far easier than applying a formal method from the outset. A few major difficulties of applying formal methods in the general case are not present.
To be more specific: the starting point is feasible and in the neighborhood of the optimal solution; by adjusting the units of the parameters a good scaling is easily obtained; the objective is monotone and separable and therefore lends itself to linearization; building a dense Hessian matrix is not required. Practically, using the algorithm in Section 7.1 in conjunction with SLP covers a broad range of applications that require different settings of implementation simplicity, computational efficiency, and accuracy of results.

7.3 Maximum Wire Width Constraints

Maximum wire width constraints can be represented as posynomial inequalities that do not disturb the convexity of the problem. A wire width may be increased during a pairwise (or multi-way) sizing step. If a maximum width constraint becomes active, the width is set to $w_i^U$. In most practical situations, the optimal solution is on the delay boundary even when some maximum width constraints are active. For example the leaf wire segments are typically less than their maximum width for trees with significant depth, such as clock trees. If at the optimal solution the width of all the leaf wire segments is less than the upper bound and more than the lower bound, it implies that all the delay constraints are active, regardless of the active width constraints of upstream wires.

7.4 Minimum Wire Width Constraints

Like the maximum width constraints, the minimum width constraints are posynomial inequalities that do not disturb the convexity of the problem. When the $i$-th wire is decreased such that its minimum width constraint becomes active, its width is set to $w_i^L$. It may be increased when there are no wires with negative sensitivity and a step that involves at least a pair of wires is required; e.g. one is increased and the other decreased. When one or more minimum width constraints are active at the optimal solution, some delay constraints may not be active, thus the solution may not be on the delay boundary. This observation has important implications for both signal nets (inequality constraints) and clock nets (equality constraints).
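Steps [1] and [2] of Phase II in Section 7.1, together with the clamping to the minimum width described in Section 7.4, can be sketched for a single driver-to-load line. This is a simplified illustration under stated assumptions, not the EWA implementation: the multiplicative `shrink` step, the helper names `chain_delay` and `phase2_single_wire`, and all numeric values are hypothetical, and the pairwise step [3] is deliberately left out.

```python
R_S, C_A = 0.03, 0.05                     # assumed: ohm/sq, fF/um^2
L, R_DRV, C_LOAD = 1000.0, 10.0, 500.0    # assumed: um, ohm, fF

def chain_delay(widths):
    """Elmore delay (ohm*fF) of equal-length segments in series (Pi-models)."""
    down, total = C_LOAD, 0.0
    for w in reversed(widths):            # walk from the load back to the driver
        c = C_A * L * w
        total += (R_S * L / w) * (c / 2 + down)
        down += c
    return total + R_DRV * down

def phase2_single_wire(widths, target, w_min=1.0, shrink=0.98):
    """Greedy single-wire decrements: pick the move maximizing slack * area gain."""
    w = list(widths)
    while True:
        best = None
        for i in range(len(w)):
            trial = w.copy()
            trial[i] = max(w_min, w[i] * shrink)   # clamp at the minimum width
            gain = L * (w[i] - trial[i])           # area reduction of this move
            slack = target - chain_delay(trial)    # negative slack is prohibited
            if gain > 0 and slack >= 0:
                score = slack * gain
                if best is None or score > best[0]:
                    best = (score, trial)
        if best is None:                  # no wire with negative sensitivity left
            return w
        w = best[1]

start = [6.6, 4.5]                        # assumed feasible starting point
final = phase2_single_wire(start, target=20000.0)
print(final, chain_delay(final), L * sum(final))
```

The loop ends because every accepted move strictly reduces the area and the widths are bounded below. Since only single-wire decrements are tried, the returned point may be a stationary point that is not optimal, which is the situation step [3] or an SLP refinement is meant to resolve.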
7.4.1 Signal Nets (Inequality Constraints)

At convergence of the algorithm to a configuration w, some minimum wire width constraints may become active, therefore w does not have to be on the delay boundary T_D. Note that if we set T_D = E_D(w) and re-run the algorithm, it will converge to the same solution. Less formally, it can be argued that within the set of active constraints there is a trade-off between delay constraints and width constraints.

7.4.2 Clock Nets (Equality Constraints)

Due to active minimum wire width constraints, not all the delay constraints have to be active, hence there may be a skew. Unlike for signal nets, we cannot simply tighten a delay constraint on some paths and accept the minimum area solution. In such cases all of the path-delay targets would have to be decreased, which could cause the clock tree to be over-designed (i.e. more resources than necessary used for meeting the design goals). It is reasonable to assume that active minimum-wire-width constraints indicate that there exists a better tree topology or a clustering of loads. Due to space limitations we cannot explore these issues in detail, but we use a simple clock tree example to illustrate them in the next section.

8.0 Experimental Results

Several examples are included in this section to demonstrate the utility of EWA.

8.1 Tapered Line

As a first example of applying EWA we consider a single tapered wire with a capacitive load and a driver resistance, as shown in Fig. 4. The parameters for the example were taken from a paper by Fishburn and Schevon [8], where a taper function that minimizes the Elmore delay of a single wire is derived. A similar derivation was made independently in [1]. Unlike the analytical derivation, EWA demonstrates the tapering for the minimum area solution while considering fringe capacitance, maximum width constraints, and minimum width constraints.

[Figure 4 schematic: driver resistance 100 Ω; wire length 14 cm; C_L = 4 pF; r = 0.03 Ω/sq; c_a = 5e-10 F/cm^2; c_f = 0.]
FIGURE 4. Fishburn's tapered line example.

[Fig. 5(a) plot: segment width versus length. Curves: Fishburn's result and EWA after Phase I; EWA after Phase II.]

Fishburn's model achieves a delay of 3.72 ns with exponential tapering of the width from 30.7 to 7.8 microns, as shown in Fig. 5(a). The EWA result for Phase I is also plotted for comparison, where the wire was broken into 20 segments and the total wiring area was 2.32 mm^2. Next we applied Phase II of the EWA approach targeting a delay of 4.25 ns, which is 15% above the minimum possible delay. The total metal area was reduced by 50% to 1.17 mm^2.
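The delay quoted for the exponential taper can be sanity-checked with a few lines of code. The sketch below is not the EWA implementation: it assumes each of the 20 segments is a pi-model with its width sampled at the segment midpoint, converts the Fig. 4 parameters to microns, and uses the hypothetical helper names `elmore_delay_line` and `exponential_taper`.

```python
import math


def elmore_delay_line(widths_um, seg_len_um, r_sq, c_area, c_fringe,
                      r_driver, c_load):
    """Elmore delay (seconds) from driver to load of a segmented wire.

    Assumed model: each segment is a pi-section with resistance
    r_sq * l / w and capacitance c_area * l * w + c_fringe * l, half of
    which sits at each end. widths_um[0] is the segment at the driver.
    """
    caps = [c_area * seg_len_um * w + c_fringe * seg_len_um
            for w in widths_um]
    downstream = c_load + sum(caps)   # total capacitance seen by the driver
    delay = r_driver * downstream
    for w, c in zip(widths_um, caps):
        r = r_sq * seg_len_um / w
        downstream -= c               # capacitance strictly beyond this segment
        delay += r * (c / 2.0 + downstream)
    return delay


def exponential_taper(w_start, w_end, n):
    """Widths of n equal-length segments, sampled at segment midpoints."""
    k = math.log(w_start / w_end)
    return [w_start * math.exp(-k * (i + 0.5) / n) for i in range(n)]


# Fig. 4 parameters; unit conversion to microns is ours.
LENGTH_UM = 14e4      # 14 cm
N_SEG = 20
C_AREA = 5e-18        # 5e-10 F/cm^2 expressed in F/um^2

widths = exponential_taper(30.7, 7.8, N_SEG)
delay = elmore_delay_line(widths, LENGTH_UM / N_SEG, r_sq=0.03,
                          c_area=C_AREA, c_fringe=0.0,
                          r_driver=100.0, c_load=4e-12)
print(f"Elmore delay of the tapered line: {delay * 1e9:.2f} ns")
```

Under these assumptions the computed delay should land close to the 3.72 ns reported for Fishburn's taper; passing a nonzero `c_fringe` or clipping `widths` to a maximum value gives a quick way to explore the effects discussed next.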
The widths are plotted in Fig. 5(a). Of further interest is what the optimal tapering looks like when fringe capacitance or wire width constraints are imposed. The tapering is no longer a perfect exponential, but it did not change dramatically for this example. The results are summarized in Fig. 5(b).

[Fig. 5(b) plot: segment width versus length. Curves: with max. wire width constraint; with fringe capacitance; EWA after Phase II.]
FIGURE 5. Tapered line example from Fig. 4. (a) Comparison with Fishburn's analytical solution after Phase I, and minimum area solution after Phase II. (b) Change in exponential tapering function due to fringe capacitance (c_f = 0.05 fF/micron) and maximum wire width constraints (12 microns).

8.2 Small Signal Net with Non-monotone Wire Width Assignment

As mentioned in Section 1.0, EWA does not rely on any restrictions on the wire width assignment; however, some other methods [3] [19] rely on a monotone assignment of wire widths that increase from the leaf nodes toward the root node. The monotonicity makes sense and can be proved for single-layer wiring; it does not hold for deep submicron technologies and multi-layer wiring, where the unit resistivity and unit capacitance vary significantly between layers. To demonstrate this point, consider the simple example circuit in Fig. 6 for a typical 0.5 micron CMOS process. All of the horizontal segments are routed on metal 1 (M1) and all of the vertical segments are routed on metal 4 (M4). Phase I of EWA produced a delay of 273 ps for the critical path to the 0.5 pF load. For Phase II we targeted the delay at this node to be 314 ps, which is 15% larger. Notice that the wire width assignment in Fig. 6 is non-monotone, since segment 5 is wider than the upstream segment 3.

[Figure 6 schematic; all dimensions are in microns. M1: r = 0.14 ohms/sq, c_a = 0.08 fF/um^2, c_f = 0.03 fF/um. M4: r = 0.02 ohms/sq, c_a = 0.03 fF/um^2, c_f = 0.08 fF/um. R_d = 5 Ω. Widths: w1 = 1.99, w2 = 1.00, w3 = 1.00, w4 = 1.00, w5 = 1.14. Loads: 0.15 pF, 0.2 pF, 0.5 pF. Segment lengths shown: 1320, 1500, 570, 630, 1828.]
FIGURE 6. Single-stage driver and wire sizing example. Non-monotone wire width assignment for a target delay of 314 ps and maximum and minimum wire width constraints of 2 and 1 micron, respectively. The total metal area is 7,588 um^2.

8.3 Clock Tree

We have tried EWA on some of the Tsay benchmark circuits [20] with the minimum wire width constraints set to zero. As could be expected, many of the downstream wires became narrower than the minimum feature size of today's technologies. This indicates that we must either reduce the target delays so that the wires are required to be wider, or modify the tree topology. Both are important considerations that can be most easily discussed in terms of a simpler clock tree example.

[Figure 7 schematic: clock tree with r = 0.02 ohms/sq, c_a = 0.08 fF/um^2, c_f = 0.06 fF/um; length units are microns. Cluster loads: 3.01 pF, 1.97 pF, 1.98 pF, 2.75 pF, 1.03 pF, 1.37 pF, 0.85 pF, 2.18 pF, 2.06 pF. Segment lengths shown: 287, 575, 610, 718, 1077, 1202 (several repeated).]
FIGURE 7. Clock tree distribution - driver resistance is 2 ohms.

Fig. 7 is a sketch of the top-level clock distribution for a microprocessor. The load capacitances represent clusters of local repeaters and the associated interconnect. The design objective is to size the wires so as to balance the delays (zero the skew) to all of the clusters. Using the two-phase approach, the delay for Phase II was first targeted at 114 ps. The maximum allowable width was 20 microns for all the wires. The sized clock tree is shown in Fig. 8(a). Note that the skew is zero, but one of the wire widths is 0.64 microns, which is less than the smallest manufacturable width of 0.9 microns. There are two options to proceed: either modify the tree or target the delay more aggressively. For this design we considered retargeting the delay to 105 ps, which tends to increase the smallest wire width, as shown in Fig. 8(b). The narrowest wire became manufacturable at a width of 0.9 microns, while the total metal area was increased by 20%.

If the clock is distributed on a metal layer with minimum feature sizes greater than 0.9 microns, we should retarget the delays even more aggressively, as shown in Fig. 8(c) for a minimum allowable width of 2.5 microns. The area is decreased relative to Fig. 8(b) as we trade area for less aggressive delay and more skew. The skew became 5.7 ps due to the delay at the 0.85 pF load, which is less than the target. Such a small skew is probably acceptable. It could be brought to zero by changing the tree topology or by adding dummy loads.

Observe what the minimum wire width violations indicate for clock trees. While the tree in Fig. 7 was distributed in somewhat of an H-tree fashion, the imbalance of the cluster loads made a balanced H-tree non-optimal. In other words, there exists a better tree topology and/or a clustering of loads as a starting point for wire sizing. While it is beyond the scope of this paper to discuss such design trade-offs, we shall explore this problem as part of the future work associated with EWA.

[Figure 8 panels: sized versions of the clock tree of Fig. 7. Wire widths are listed in the order extracted; their positions in the tree are not recoverable.
(a) Delay target = 114 ps; w_max = 20 um; w_min = 0; total metal area = 49,073 um^2; skew = 0. Widths: 2.63, 2.76, 1.36, 1.39, 2.59, 1.23, 3.08, 2.88, 4.02, 4.99, 9.15, 1.69, 4.31, 17.7, 1.03, 0.64, 1.96, 1.85.
(b) Delay target = 105 ps; w_max = 20 um; w_min = 0; total metal area = 59,048 um^2; skew = 0. Widths: 3.18, 3.36, 1.79, 1.80, 3.05, 1.44, 3.72, 3.55, 4.64, 5.97, 11.6, 1.98, 5.02, 20.0, 1.44, 0.90, 2.26, 2.14.
(c) Delay target = 114 ps; w_max = 20 um; w_min = 2.5 um; total metal area = 51,600 um^2; skew = 5.7 ps. Widths: 2.72, 2.86, 2.50, 2.50, 3.19, 2.50, 3.37, 2.99, 4.05, 4.34, 9.32, 2.50, 4.15, 16.7, 2.50, 2.50, 2.64, 2.50.]
FIGURE 8. Clock tree example that demonstrates the design trade-offs offered by changing the delay targets and wire width constraints. (Width units are microns.)
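The area trade-offs among the three panels of Fig. 8 follow directly from the totals quoted in the panel titles. The ratios below are plain arithmetic on those numbers; A_a, A_b, and A_c are shorthand introduced here for the total metal areas of panels (a), (b), and (c).

```latex
\begin{align*}
\frac{A_b - A_a}{A_a} &= \frac{59{,}048 - 49{,}073}{49{,}073} \approx 0.203 \\
\frac{A_b - A_c}{A_b} &= \frac{59{,}048 - 51{,}600}{59{,}048} \approx 0.126 \\
\frac{A_c - A_a}{A_a} &= \frac{51{,}600 - 49{,}073}{49{,}073} \approx 0.051
\end{align*}
```

Tightening the target from 114 ps to 105 ps therefore costs about 20% more metal, whereas keeping the 114 ps target and imposing a 2.5 micron minimum width costs about 5% more metal than panel (a), at the price of 5.7 ps of skew.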

9.0 Conclusion

The algorithm presented in this paper converges to the global optimum of the wire-sizing problem under the Elmore delay model. The algorithm is simple and efficient, making it useful for applications where one might be tempted to use heuristics. Several new observations were made to facilitate the algorithm, which are of both theoretical and practical value. Most of the observations hold for more exact delay models, but we cannot at this time prove that the delay bounds form a convex set. It can be shown empirically that using higher-order delay models yields better results. Future work will be focused on extending EWA to include higher-order delay models and developing efficient schemes for selecting the best set of wires to be sized in each step of the algorithm.

10.0 References

[1] C.-P. Chen, Y.-P. Chen, and D. F. Wong, Optimal Wire-Sizing Formula Under the Elmore Delay Model, Proc. Design Automation Conference, pp. 487-490, June 1996.
[2] J. Cong, L. He, C.-K. Koh, and P. H. Madden, Performance Optimization of VLSI Interconnect Layout, to appear in Integration, the VLSI Journal.
[3] J. Cong and K.-S. Leung, Optimal wiresizing under the distributed Elmore delay model, Proc. of the Intl. Conf. on Computer-Aided Design, pp. 634-639, Nov. 1993.
[4] J. Cong and K.-S. Leung, Optimal wiresizing under Elmore delay model, IEEE Trans. Computer-Aided Design, 14(3), pp. 321-336, March 1995.
[5] J. G. Ecker, Geometric programming: methods, computations and applications, SIAM Review, 22, pp. 338-362, July 1980.
[6] W. C. Elmore, The transient response of damped linear networks with particular regard to wide-band amplifiers, J. Applied Physics, 19(1), pp. 55-63, Jan. 1948.
[7] J. P. Fishburn and A. E. Dunlop, TILOS: A posynomial programming approach to transistor sizing, Proc. of the Intl. Conf. on Computer-Aided Design, pp. 326-328, Nov. 1985.
[8] J. P. Fishburn and C. A. Schevon, Shaping a Distributed-RC Line to Minimize Elmore Delay, IEEE Trans. on Circuits and Systems, 42(12), pp. 1020-1022, December 1995.
[9] J. Y. Lee, X. Huang, and R. Rohrer, Pole and zero sensitivity calculation in asymptotic waveform evaluation, IEEE Trans. Computer-Aided Design, 11(5), pp. 586-597, May 1992.
[10] N. Menezes, S. Pullela, F. Dartu, and L. T. Pillage, RC interconnect synthesis - a moment fitting approach, Proc. of the Intl. Conf. on Computer-Aided Design, pp. 418-425, Nov. 1994.
[11] N. Menezes, R. Baldick, and L. Pileggi, A Sequential Quadratic Programming Approach to Concurrent Gate and Wire Sizing, Proc. of the Intl. Conf. on Computer-Aided Design, pp. 144-151, 1995.
[12] P. Penfield, J. Rubinstein, and M. Horowitz, Signal delay in RC tree networks, IEEE Trans. Computer-Aided Design, CAD-2, pp. 202-211, July 1983.
[13] L. T. Pillage and R. Rohrer, Asymptotic waveform evaluation for timing analysis, IEEE Trans. Computer-Aided Design, 9(4), pp. 352-366, April 1990.
[14] S. Pullela, N. Menezes, and L. T. Pillage, Reliable non-Zero Skew Clock Trees Using Wire Width Optimization, Proc. 30th ACM/IEEE Design Automation Conference, pp. 165-170, June 1993.
[15] S. Pullela, N. Menezes, and L. T. Pileggi, Clock Skew Minimization via Wire Width Optimization, IEEE Trans. Computer-Aided Design, accepted for publication.
[16] C. Ratzlaff and L. T. Pillage, RICE: rapid interconnect circuit evaluation using AWE, IEEE Trans. Computer-Aided Design, 13(6), pp. 763-776, June 1994.
[17] S. S. Sapatnekar, V. B. Rao, P. M. Vaidya, and S.-M. Kang, An exact solution to the transistor sizing problem for CMOS circuits using convex optimization, IEEE Trans. Computer-Aided Design, 12(11), pp. 1621-1634, May 1992.
[18] S. S. Sapatnekar, RC interconnect optimization under the Elmore delay model, Proc. 31st ACM/IEEE Design Automation Conference, pp. 387-391, June 1994.
[19] S. S. Sapatnekar, Wire Sizing as a Convex Optimization Problem: Exploring the Area-Delay Tradeoff, IEEE Trans. Computer-Aided Design, 15(8), pp. 1001-1011, Aug. 1996.
[20] R.-S. Tsay, Exact Zero-skew, Proc. IEEE Intl. Conf. on Computer-Aided Design, pp. 336-339, Nov. 1991.
[21] Qing Zhu, Wayne W.-M. Dai, and Joe G. Xi, Optimal Sizing of High-Speed Clock Networks Based on Distributed RC and Lossy Transmission Line Models, Proc. Intl. Conf. on Computer-Aided Design, pp. 628-633, 1993.