Projection-Based Performance Modeling for Inter/Intra-Die Variations

Xin Li¹, Jiayong Le², Lawrence T. Pileggi¹ and Andrzej Strojwas¹
¹ Dept. of Electrical & Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA, {xinli, pileggi, ajs}@ece.cmu.edu
² Extreme DA, 165 University Avenue, Palo Alto, CA 94301, USA, kelvin@extreme-da.com

Abstract

Large-scale process fluctuations in nano-scale IC technologies suggest applying high-order (e.g., quadratic) response surface models to capture the circuit performance variations. Fitting such models requires significantly more simulation samples and solving much larger linear equations. In this paper, we propose a novel projection-based extraction approach, PROBE, to efficiently create quadratic response surface models and capture both inter-die and intra-die variations with affordable computation cost. PROBE applies a novel projection scheme to reduce the response surface modeling cost (i.e., both the required number of samples and the linear equation size) and make the modeling problem tractable even for large problem sizes. In addition, a new implicit power iteration algorithm is developed to find the optimal projection space and solve for the unknown model coefficients. Several circuit examples from both digital and analog circuit modeling applications demonstrate that PROBE can generate accurate response surface models while achieving up to 12x speedup compared with the traditional methods.

1. Introduction

As IC technologies scale to finer feature sizes, it becomes increasingly difficult to control the relative process variations, particularly due to sub-wavelength photolithography [1]-[2]. The increasing fluctuations in manufacturing process have introduced unavoidable and significant uncertainty in circuit performance. Hence, modeling and analyzing these random process variations to ensure manufacturability and improve yield has been identified as a top priority for today's IC design problems.

In order to address this process variation problem, response surface models [3] are utilized to capture the circuit performance variations caused by manufacturing fluctuations. The objective of response surface modeling is to approximate the circuit performance (e.g., delay, gain) as a polynomial (e.g., linear or quadratic) function of variational process parameters (e.g., $V_{TH}$, $T_{OX}$). These models are extensively applied in many applications such as statistical timing analysis [1], analog mismatch analysis [4], yield optimization [5]-[6], etc.

Most of the previous response surface models, e.g., [1], utilize linear approximations, which are efficient and accurate when process variations are sufficiently small. However, two recent changes in advanced IC technologies suggest a need to revisit this assumption. Firstly, process variations are becoming relatively larger. As reported in [1], the gate length variation can reach ±35% in nano-scale technologies. This, in turn, implies the importance of applying high-order (e.g., quadratic) response surface models to guarantee high approximation accuracy [3], [6], [7]. Applying nonlinear response surface models is especially important for analog circuits, since many analog performances (e.g., offset voltage) can be strongly nonlinear in the presence of large-scale variations. Secondly, but most importantly, intra-die variations (i.e., mismatches) are becoming increasingly important [2], especially for analog circuits [4]. These intra-die variations model the individual, but spatially correlated, local variations within the same die. The intra-die variations must be modeled by using many additional random variables, thereby significantly increasing the number of unknown model coefficients. Therefore, more simulation samples are required in order to determine all these unknown coefficients by solving a larger linear equation. This makes model fitting much more expensive, especially when using high-order response surface models. For example, the number of unknown coefficients (hence the required number of samples and the linear equation size) in a quadratic response surface model will quadratically increase in the number of random process parameters, thereby quickly making the quadratic model fitting infeasible.
For this reason, generating accurate high-order (e.g., quadratic) response surface models with affordable computation cost becomes a new challenging problem in nano-scale technologies.

In this paper we propose a novel Projection-Based Extraction (PROBE) for quadratic response surface modeling. The novelty of PROBE lies in our new formulation of the model fitting problem such that quadratic response surface modeling becomes tractable even for large-size problems. Instead of fitting a full-rank quadratic model, PROBE applies a projection operator and attempts to find an optimal low-rank model by minimizing the approximation error. In PROBE, the modeling accuracy can be easily traded for simplicity by increasing or decreasing the dimension of the projection space. Most importantly, taking advantage of this novel projection scheme, PROBE can dramatically reduce the number of unknown coefficients that need to be solved, thereby significantly reducing the fitting cost and facilitating scaling to much larger problem sizes.

Another important contribution of PROBE is a new implicit power iteration algorithm to find the optimal projection space and extract the unknown model coefficients. This iteration solves a sequence of over-determined linear equations and exhibits robust convergence. Using the proposed implicit power iteration algorithm, PROBE can achieve significant speedup for generating low-rank quadratic response surface models. As demonstrated by the numerical examples from both digital and analog circuit modeling applications, PROBE can extract accurate models and reduce the computation cost by up to 12x compared with the traditional full-rank quadratic modeling.

The remainder of the paper is organized as follows. In Section 2 we review the background on response surface modeling. Then, we propose our PROBE approach, including both the theoretical analysis and the implicit power iteration algorithm, in Section 3. The computational efficiency of PROBE is demonstrated by several circuit examples in Section 4, followed by the conclusions in Section 5.

0-7803-9254-X/05/$20.00 ©2005 IEEE.

2. Background

Given a circuit topology, the circuit performance (e.g., delay, gain) is a function of the design parameters (e.g., bias current, transistor sizes), as well as the process parameters (e.g., $V_{TH}$, $T_{OX}$). The design parameters are optimized and fixed during the design process; however, the process parameters must be modeled as random variables to account for any uncertain variations. Given a set of fixed design parameters, the circuit performance $f$ can be approximated by a linear response surface model [1], [3]:

$$f(X) = B^T X + C \qquad (1)$$

where $X = [x_1, x_2, \ldots, x_N]$ represents the process variations, $B \in R^N$ and $C \in R$ stand for the model coefficients and $N$ is the total number of the variational process parameters.

The linear model in (1) is not sufficiently accurate for modeling the large-scale process variations that are expected for nano-scale technologies. This, in turn, suggests that quadratic response surface models might be required to improve the modeling accuracy [3], [6], [7]:

$$f(X) = X^T A X + B^T X + C \qquad (2)$$

where $C \in R$ is the constant term, $B \in R^N$ represents the linear coefficients and $A \in R^{N \times N}$ denotes the quadratic coefficients. The unknown model coefficients $A$, $B$ and $C$ can be determined by solving the over-determined linear equation [3]:

$$X_i^T A X_i + B^T X_i + C = \tilde{f}_i \quad (i = 1, 2, \ldots) \qquad (3)$$

where $X_i$ and $\tilde{f}_i$ are the value of $X$ and the exact value of $f$ for the $i$-th sampling point, respectively. It is straightforward to verify that the number of unknown coefficients in (3) is $O(N^2)$. The overall computation cost for determining all these coefficients consists of two portions:

• Simulation cost: i.e., the cost for running a simulator to determine the performance values $\tilde{f}_i$ at the sampling points $X_i$. The number of simulation samples should be greater than the number of unknown coefficients, in order to uniquely solve the linear equation in (3). Therefore, at least $O(N^2)$ sampling points are required for fitting the quadratic model in (2). In practical applications, the number of samples is generally selected to be significantly larger than the unknown coefficient number to avoid over-fitting.
• Fitting cost: i.e., the cost for solving the over-determined linear equation in (3). For the quadratic model in (2), the fitting cost is of the order of $O(N^6)$.
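To make the size of the full-rank fitting problem concrete, the following is a minimal NumPy sketch of solving (3) by least squares. It is an illustration under stated assumptions, not the paper's implementation: the helper names `num_unknowns` and `quad_features`, the 4x oversampling factor, and the synthetic quadratic that stands in for the circuit simulator are all hypothetical.

```python
import numpy as np

def num_unknowns(n):
    """Unknowns in the quadratic model (2) with symmetric A:
    n(n+1)/2 quadratic + n linear + 1 constant, i.e. O(n^2)."""
    return n * (n + 1) // 2 + n + 1

def quad_features(X):
    """Design matrix of (3): columns x_i*x_j (i <= j), then x_i, then 1."""
    n_samples, n = X.shape
    i, j = np.triu_indices(n)
    return np.hstack([X[:, i] * X[:, j], X, np.ones((n_samples, 1))])

rng = np.random.default_rng(0)
n = 6
n_samples = 4 * num_unknowns(n)      # hypothetical 4x oversampling
X = rng.standard_normal((n_samples, n))

# Hypothetical stand-in for the circuit simulator: a known quadratic function.
M = rng.standard_normal((n, n))
A_true = 0.5 * (M + M.T)
B_true = rng.standard_normal(n)
C_true = 0.7
f = np.einsum("si,ij,sj->s", X, A_true, X) + X @ B_true + C_true

G = quad_features(X)
coef, *_ = np.linalg.lstsq(G, f, rcond=None)   # least-squares solve of (3)
max_fit_error = np.max(np.abs(G @ coef - f))

print(num_unknowns(n), G.shape, max_fit_error)
```

With N = 6 the design matrix already has 28 columns; the column count grows as (N+1)(N+2)/2, which is the quadratic growth described above.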
The aforementioned high computation cost limits the traditional quadratic response surface modeling approach [3] to small or medium size applications. This observation, therefore, motivates us to propose a novel projection-based response surface modeling algorithm, PROBE, which can significantly reduce the computation cost.

3. Projection-Based Extraction

3.1 Mathematic Formulation

The key disadvantage of the traditional quadratic response surface modeling is the need to compute all elements of the matrix $A$ in (2). This matrix is often sparse and rank-deficient in many practical problems. Therefore, instead of finding the full-rank matrix $A$, PROBE approximates $A$ by another low-rank matrix $A_L$. Such a low-rank approximation problem can be stated as follows: given a matrix $A$, find another matrix $A_L$ with rank $p < \mathrm{rank}(A)$ such that their difference $\|A_L - A\|_F$ is minimized. Here, $\|\cdot\|_F$ denotes the Frobenius norm, which is the square root of the sum of the squares of all matrix elements. Without loss of generality, we assume that $A$ is symmetric in this paper, since any asymmetric quadratic form $X^T A X$ can be easily converted to an equivalent symmetric form $0.5 \cdot X^T (A + A^T) X$ [8].

From matrix theory [8], for any symmetric matrix $A \in R^{N \times N}$, the optimal rank-$p$ approximation with the least Frobenius-norm error is:

$$A_L = \sum_{i=1}^{p} \lambda_i P_i P_i^T \qquad (4)$$

where $\lambda_i$ is the $i$-th dominant eigenvalue, and $P_i \in R^N$ is the $i$-th dominant eigenvector. The eigenvectors in (4) define an orthogonal projector $P_1 P_1^T + \ldots + P_p P_p^T$, and every column in $A_L$ is the projection of every column in $A$ onto the subspace $\mathrm{span}\{P_1, \ldots, P_p\}$. We use this orthogonal projector for response surface modeling in this paper. Fig. 1 intuitively illustrates the low-rank projection for quadratic response surface modeling.

Fig. 1. Illustration of the low-rank projection.

The main advantage of the rank-$p$ projection is that, for approximating the matrix $A \in R^{N \times N}$ in (2), only $\lambda_i \in R$ and $P_i \in R^N$ ($i = 1, \ldots, p$) need to be determined, thus reducing the number of problem unknowns to $O(pN)$.
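Equation (4) can be checked numerically when $A$ is known. The sketch below assumes NumPy and a randomly generated symmetric matrix (neither comes from the paper) and takes "dominant" to mean largest in magnitude. The function name `low_rank_symmetric` is hypothetical.

```python
import numpy as np

def low_rank_symmetric(A, p):
    """Rank-p approximation of a symmetric matrix A as in (4), built from the
    p eigenpairs with the largest |eigenvalue|."""
    lam, P = np.linalg.eigh(A)                 # real eigenvalues, orthonormal eigenvectors
    order = np.argsort(-np.abs(lam))           # dominant eigenvalues first
    keep = order[:p]
    A_L = (P[:, keep] * lam[keep]) @ P[:, keep].T   # sum of lambda_i * P_i P_i^T
    return A_L, lam[order]

rng = np.random.default_rng(1)
N, p = 8, 2
M = rng.standard_normal((N, N))
A = 0.5 * (M + M.T)                            # symmetrize, as assumed in the text

A_L, lam_sorted = low_rank_symmetric(A, p)
err = np.linalg.norm(A - A_L, "fro")
# The Frobenius error equals the root-sum-square of the discarded eigenvalues.
expected = np.sqrt(np.sum(lam_sorted[p:] ** 2))

print(err, expected)
```

Only the p retained eigenvalues and eigenvectors, p(N+1) numbers in total, are needed to represent A_L, which is the O(pN) count stated above.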
In many practical applications, $p$ is significantly less than $N$ and the number of unknown coefficients that PROBE needs to solve is almost a linear function of $N$. Therefore, compared with the problem size $O(N^2)$ in traditional quadratic modeling, PROBE is much more efficient and can be applied to large-size problems.

3.2 Coefficient Fitting via Implicit Power Iteration

Since the matrix $A$ in (2) is not known in advance, we cannot use the standard matrix computation algorithm to compute the dominant eigenvalues $\lambda_i$ and eigenvectors $P_i$ that are required for a low-rank approximation. One approach for finding the optimal rank-$p$ model is to solve the following optimization problem for the unknown coefficients $\lambda_i$ and $P_i$ ($i = 1, 2, \ldots, p$) and $B$, $C$:

$$\begin{aligned} \text{minimize} \quad & \psi = \sum_i \Big[ X_i^T \Big( \sum_{j=1}^{p} \lambda_j P_j P_j^T \Big) X_i + B^T X_i + C - \tilde{f}_i \Big]^2 \\ \text{subject to} \quad & \|P_j\|_2 = 1 \quad (j = 1, \ldots, p) \end{aligned} \qquad (5)$$

where $\|\cdot\|_2$ denotes the 2-norm of a vector. Compared with (2), equation (5) approximates the matrix $A$ by $\lambda_1 P_1 P_1^T + \ldots + \lambda_p P_p P_p^T$. Therefore, we can expect that minimizing the cost function $\psi$ in (5) will converge $\lambda_i$ and $P_i$ to the dominant eigenvalues and eigenvectors of the original matrix $A$, respectively. Unfortunately, $\psi$ in (5) is a sixth order polynomial and might not be convex. In addition, the constraint set in (5) is specified by a quadratic equation and is not convex either. Therefore, the optimization in (5) is not a convex programming problem and there is no efficient optimization algorithm that can guarantee finding the globally optimal solution for $\psi$.

Instead of solving the non-convex optimization problem in (5), we propose a novel implicit power iteration method to efficiently extract the unknown coefficients $\lambda_i$ and $P_i$. In what follows, we first develop the implicit power iteration algorithm for rank-one approximation, and then extend it to rank-$p$ approximation.

A. Rank-One Approximation

Fig. 2 outlines the implicit power iteration algorithm for a rank-one approximation. This algorithm repeatedly solves a sequence of over-determined linear equations until the convergence is identified. Next, we explain why the implicit power iteration yields the optimal rank-one approximation $A_L = \lambda_1 P_1 P_1^T$. Note that Step 4 in Fig. 2 approximates the matrix $A$ by $Q_k Q_{k-1}^T$, where $Q_{k-1}$ is determined in the previous iteration step. Finding such an optimal approximation is equivalent to solving the over-determined linear equation:

$$Q_k Q_{k-1}^T = A \qquad (6)$$

The least-square-error solution for (6) is given by [8]:

$$Q_k = A Q_{k-1} \left( Q_{k-1}^T Q_{k-1} \right)^{-1} = A Q_{k-1} \qquad (7)$$

In (7), $Q_{k-1}^T Q_{k-1} = \|Q_{k-1}\|_2^2 = 1$, since $Q_{k-1}$ is normalized in Step 3 of Fig. 2. Equation (7) reveals an interesting fact that solving the over-determined linear equation in Step 4 implicitly computes the matrix-vector product $A Q_{k-1}$, which is the basic operation required in the traditional power iteration for dominant eigenvector computation [8].

1. Start from a set of sampling points $\{X_i, \tilde{f}_i\}$.
2. Randomly select an initial vector $Q_0 \in R^N$ and set $k = 1$.
3. Compute $Q_{k-1} = Q_{k-1} / \|Q_{k-1}\|_2$.
4. Solve the over-determined linear equation for $Q_k$, $B_k$ and $C_k$: $X_i^T Q_k Q_{k-1}^T X_i + B_k^T X_i + C_k = \tilde{f}_i$ ($i = 1, 2, \ldots$).
5. If the residue $\psi_k(Q_k, B_k, C_k) = \sum_i \big( X_i^T Q_k Q_{k-1}^T X_i + B_k^T X_i + C_k - \tilde{f}_i \big)^2$ is unchanged, i.e., $|\psi_k(Q_k, B_k, C_k) - \psi_{k-1}(Q_{k-1}, B_{k-1}, C_{k-1})| < \varepsilon$, where $\varepsilon$ is the pre-defined error tolerance, then go to Step 6. Otherwise, $k = k + 1$ and return to Step 3.
6. The rank-one response surface model is $f(X) = X^T Q_k Q_{k-1}^T X + B_k^T X + C_k$.

Fig. 2. Implicit power iteration for a rank-one approximation.

Given an initial vector:

$$Q_0 = \alpha_1 P_1 + \alpha_2 P_2 + \cdots \qquad (8)$$

where $Q_0$ is represented as the linear combination of all eigenvectors of $A$, the $k$-th iteration step yields:

$$Q_k = A^k Q_0 = \alpha_1 \lambda_1^k P_1 + \alpha_2 \lambda_2^k P_2 + \cdots \qquad (9)$$

In (9), we ignore the normalization $Q_{k-1} = Q_{k-1} / \|Q_{k-1}\|_2$ which is nothing else but a scaling factor. This scaling factor will not change the direction of $Q_k$. As long as $\alpha_1 \neq 0$ in (8), i.e., $P_1$ is not orthogonal to the initial vector $Q_0$, $\alpha_1 \lambda_1^k P_1$ (with $\lambda_1 > \lambda_2 > \ldots$) will become more and more dominant over other terms. $Q_k$ will asymptotically approach the direction of $P_1$. After the iteration in Fig. 2 converges, we have $Q_{k-1} = Q_{k-1} / \|Q_{k-1}\|_2 = P_1$ and $Q_k = A Q_{k-1} = \lambda_1 P_1$. $Q_k Q_{k-1}^T$ is the optimal rank-one approximation $A_L = \lambda_1 P_1 P_1^T$.
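The steps of Fig. 2 can be written down almost literally, because for a fixed $Q_{k-1}$ the term $X_i^T Q_k Q_{k-1}^T X_i$ is linear in $Q_k$. The following NumPy sketch is an illustration under assumptions, not the authors' code: the function name `rank_one_fit`, the tolerance, the iteration cap, the sample count, and the synthetic test function (a quadratic whose matrix has positive eigenvalues with one clearly dominant) are all hypothetical, and other spectra are not examined here.

```python
import numpy as np

def rank_one_fit(X, f, eps=1e-9, max_iter=500, seed=0):
    """Literal sketch of Steps 1-6 in Fig. 2 for samples X (S x N), responses f (S,)."""
    n_samples, n = X.shape
    Q = np.random.default_rng(seed).standard_normal(n)   # Step 2: random Q_0
    residues = []
    for _ in range(max_iter):
        q_prev = Q / np.linalg.norm(Q)                    # Step 3: normalize Q_{k-1}
        z = X @ q_prev                                    # z_i = Q_{k-1}^T X_i
        # Step 4: X_i^T Q_k Q_{k-1}^T X_i = z_i * (X_i^T Q_k) is linear in Q_k,
        # so Q_k, B_k, C_k (2N+1 unknowns) come from one least-squares solve.
        G = np.hstack([z[:, None] * X, X, np.ones((n_samples, 1))])
        coef, *_ = np.linalg.lstsq(G, f, rcond=None)
        Q, B, C = coef[:n], coef[n:2 * n], coef[-1]
        residues.append(float(np.sum((G @ coef - f) ** 2)))   # Step 5: residue
        if len(residues) > 1 and abs(residues[-2] - residues[-1]) < eps:
            break
    return Q, q_prev, B, C, residues   # Step 6: f ~ X^T Q q_prev^T X + B^T X + C

# Synthetic data (assumption): quadratic with one dominant eigen-direction.
rng = np.random.default_rng(42)
n, n_samples = 6, 400
U, _ = np.linalg.qr(rng.standard_normal((n, n)))     # orthonormal eigenvectors
lam = np.array([4.0, 1.0, 0.8, 0.6, 0.5, 0.4])
A_true = (U * lam) @ U.T
B_true = rng.standard_normal(n)
X = rng.standard_normal((n_samples, n))
f = np.einsum("si,ij,sj->s", X, A_true, X) + X @ B_true + 0.3

Q, q_prev, B, C, residues = rank_one_fit(X, f)
alignment = abs(q_prev @ U[:, 0])     # |cosine| between Q_{k-1} and the true P_1

# Baseline: linear model (1) fitted to the same samples.
G_lin = np.hstack([X, np.ones((n_samples, 1))])
lin_coef, *_ = np.linalg.lstsq(G_lin, f, rcond=None)
linear_residue = float(np.sum((G_lin @ lin_coef - f) ** 2))

print(len(residues), residues[-1], linear_residue, alignment)
```

Two properties of this sketch hold by construction: the residue cannot increase from one iteration to the next, because the previous model remains representable after the normalization in Step 3, and the final residue cannot exceed that of the linear fit, because setting $Q_k = 0$ recovers the linear model.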
Thus the proposed implicit power iteration extracts the unknown coefficients $\lambda_1$ and $P_1$ with guaranteed convergence, but in an implicit way (i.e., without knowing the full-rank matrix $A$). This implicit property is the key difference between the proposed algorithm and the traditional power iteration in [8].

The above discussion demonstrates that the implicit power iteration is provably convergent if $A$ is symmetric. For an asymmetric $A$, $Q_{k-1}$ and $Q_k$ should iteratively converge to the directions of the dominant left and right singular vectors of $A$ to achieve the optimal rank-one approximation. However, the global convergence of the implicit power iteration is difficult to prove in that case.

B. Rank-p Approximation

Fig. 3 shows the implicit power iteration algorithm for a rank-$p$ approximation. Assuming that the unknown function can be approximated by the full-rank quadratic form in (2), the algorithm in Fig. 3 first extracts its rank-one approximation:

$$g_1(X) = X^T \left( \lambda_1 P_1 P_1^T \right) X + B^T X + C \qquad (10)$$

Then, the component of $g_1(X)$ is subtracted from the full-rank quadratic function in Step 3 of Fig. 3, yielding:

$$f(X) - g_1(X) = \sum_{i=2}^{N} \lambda_i X^T P_i P_i^T X \qquad (11)$$

Now, $\lambda_2$ and $P_2$ become the respective dominant eigenvalue and eigenvector of the quadratic function in (11), and they are extracted by the rank-one implicit power iteration to generate $g_2(X)$. The rank-one implicit power iteration and the subtraction are repeatedly applied $p$ times until the rank-$p$ approximation $f_p(X)$ is achieved.

1. Start from a set of sampling points $\{X_i, \tilde{f}_i\}$.
For $k = 1, 2, \ldots, p$
2. Apply the implicit power iteration algorithm in Fig. 2 to compute the rank-one approximation $g_k(X)$.
3. Update the sampling points: $\tilde{f}_i = \tilde{f}_i - g_k(X_i)$ ($i = 1, 2, \ldots$).
End For
4. The rank-$p$ response surface model is $f_p(X) = g_1(X) + \cdots + g_p(X)$.

Fig. 3. Implicit power iteration for a rank-p approximation.

The algorithm in Fig. 3 assumes a given approximation rank $p$. In practical applications, the value of $p$ can be iteratively determined based on the approximation error. For example, starting from a low-rank approximation, $p$ should be iteratively increased if the modeling error remains large.

The rank-$p$ implicit power iteration in Fig. 3 requires running the rank-one implicit power iteration $p$ times. Each rank-one approximation needs to solve $2N+1$ unknown coefficients, for which the required number of samples is of the order of $O(N)$, and solving the over-determined linear equation in Step 4 of Fig. 2 has a complexity of $O(N^3)$. Therefore, a rank-$p$ approximation requires $O(pN)$ simulation samples in total and the overall computation cost for the rank-$p$ implicit power iteration in Fig. 3 is $O(pN^3)$. In many practical applications, $p$ is much less than $N$ and, therefore, PROBE is much more efficient than the traditional full-rank quadratic modeling, which requires $O(N^2)$ simulation samples and has a fitting cost of $O(N^6)$ for solving the over-determined linear equation.

3.3 Comparison with Traditional Techniques

There are several traditional techniques, such as principal component analysis [9], variable screening [10] or projection pursuit [11], which aim to reduce the computation cost of response surface modeling. In this subsection, we compare PROBE with these traditional techniques and highlight their differences.

Principal component analysis (PCA) [9] is a statistical method for reducing the number of random variables that are required to represent the process variations. Given $N$ normally distributed process parameters $X = [x_1, x_2, \ldots, x_N]$ and their correlation matrix $R$, PCA computes the dominant eigenvalues and eigenvectors of $R$, and then constructs a set of new random variables $Y = [y_1, y_2, \ldots, y_M]$, where $M < N$, to approximate the original $N$-dimensional random space. The essence of PCA can be interpreted as the coordinate rotation of the original random space $X$ followed by a low-rank projection onto the low-dimensional space $Y$. The new random variables $y_i$ are called the principal components or factors. After PCA, the circuit performances can be approximated as functions of the new random variables $y_i$ using response surface modeling. Since the number of new variables $y_i$ is less than the number of original variables $x_i$, PCA reduces the number of unknown model coefficients.

Such a PCA approach, however, is substantially different from our proposed PROBE method. The PCA operation is completely determined by the statistical characteristics, i.e., the correlation matrix $R$, of random process variations, without depending on a specific circuit performance $f$. In contrast, PROBE reduces the modeling cost by carefully analyzing a specific performance $f$. In other words, PROBE will eliminate (or keep) one eigenvector $P_i$ if $f$ is weakly (or strongly) dependent on $P_i$. Therefore, PCA and PROBE rely on completely different mechanisms to minimize the computation cost. In practical applications, both PCA and PROBE should be simultaneously applied to achieve the minimal modeling cost, as shown in Fig. 4.

(Figure: original space $x_1, \ldots, x_N$ → PCA → low-dimensional space $y_1, \ldots, y_M$ → PROBE → response surface model $f = Y^T A Y + B^T Y + C$.)

Fig. 4. Combination of PCA and PROBE to reduce cost.

Variable screening is another traditional approach for reducing the response surface modeling cost [10]. Given a circuit performance $f$, variable screening applies fractional factorial experimental design and tries to identify a subset (hopefully small) of the random process parameters that have much greater influence on $f$ than the others. Compared with variable screening, PROBE also performs a similar variable screening, but with an additional coordinate rotation, as shown in Fig. 5.
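A small numerical illustration of why the coordinate rotation matters is sketched below, assuming NumPy. It uses a hypothetical rank-one matrix whose single eigenvector is spread evenly over all variables, and it models screening simply as zeroing the rows and columns of discarded variables; this is a simplification for illustration and not the fractional factorial procedure of [10].

```python
import numpy as np

N = 4
p = np.ones(N) / np.sqrt(N)        # unit vector not aligned with any coordinate axis
A = 3.0 * np.outer(p, p)           # rank-one quadratic coefficient matrix (hypothetical)

def screening_error(A, keep):
    """Frobenius error when only the variables in `keep` are retained,
    i.e. rows and columns of all other variables are set to zero."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(keep)] = True
    A_s = np.where(mask[:, None] & mask[None, :], A, 0.0)
    return np.linalg.norm(A - A_s, "fro")

# Best result when a single original variable is kept (axis-aligned screening).
best_screen = min(screening_error(A, [i]) for i in range(N))

# One rotated coordinate: projection onto the dominant eigenvector, as in (4).
lam, P = np.linalg.eigh(A)
k = np.argmax(np.abs(lam))
A_L = lam[k] * np.outer(P[:, k], P[:, k])
rotation_error = np.linalg.norm(A - A_L, "fro")

print(best_screen, rotation_error)
```

In this example every original variable matters equally, so keeping any single one leaves a large error, while a single rotated coordinate reproduces the matrix up to rounding.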
The additional coordinate rotation offers more flexibility in filtering out insignificant components, thereby achieving better modeling accuracy and/or cheaper modeling cost. From this point of view, the proposed PROBE can be considered as a generalized variable screening which is an extension of the traditional variable screening in [10].

(Figure: PROBE with additional coordinate rotation, where rotation by the eigenvectors $P$ exposes the insignificant component $\varepsilon$ of $A$, compared with traditional variable screening, where the insignificant component $\varepsilon$ is identified in $A$ without rotation.)

Fig. 5. Comparison of PROBE with variable screening.

Projection pursuit [11] tries to approximate the unknown high-dimensional nonlinear function by the sum of several smooth low-dimensional functions. The authors in [11] utilize the one-dimensional projection:

$$f(X) = g_1(P_1^T X) + g_2(P_2^T X) + \cdots \qquad (12)$$

where $g_i(\cdot)$ is the pre-defined one-dimensional nonlinear function and $P_i \in R^N$ defines the projection space. One of the main difficulties in traditional projection pursuit is to find the optimal projection vectors $P_i$. The authors in [11] apply local optimization with heuristics to search for the optimal $P_i$. Such an optimization can easily get stuck at a local minimum. Our proposed PROBE algorithm is actually a special case of the traditional projection pursuit, where all $g_i(\cdot)$ are quadratic functions. In such cases, the theoretical solution of the optimal projection vectors $P_i$ is known, i.e., they are determined by the dominant eigenvalues and eigenvectors of the original full-rank matrix $A$. These dominant eigenvalues and eigenvectors can be extracted by the proposed implicit power iteration algorithm quickly and robustly. Such a special advantage of using the quadratic $g_i(\cdot)$, however, has not been explored in traditional projection pursuit.

3.4 Application of PROBE Models

The low-rank quadratic models extracted by PROBE can be generally applied to any applications that require quadratic response surface modeling, such as [3]-[7]. In addition to these general applications, we emphasize a special link between our PROBE modeling and the APEX algorithm proposed in [7].
In APEX, the most expensive computation is the binomial moment evaluation, which requires diagonalizing the quadratic coefficient matrix $A$ by eigen-decomposition. Using our PROBE modeling, however, the matrix $A$ is approximated by a low-rank one $A_L$. The eigen-decomposition of the low-rank matrix $A_L$ is much cheaper than finding the eigenvalues/eigenvectors of the full-rank matrix $A$. Therefore, the complexity of the APEX algorithm can be significantly reduced if the PROBE model is used as its input. The detailed implementation for combining PROBE and APEX is beyond the scope of this paper and is therefore not discussed.

4. Numerical Examples

In this section we demonstrate the computational efficiency of PROBE using several circuit examples. For each example, two independent sampling sets, called training set and testing set respectively, are generated. The training set is created by Latin hypercube sampling [12], which picks the most important samples based on statistical analysis; this is used for coefficient fitting. For testing and comparison, we collect 500 random samples as the testing set and use them to measure the modeling error. All numerical experiments are performed on a SUN 1 GHz server.

4.1 ISCAS'89 S27

Fig. 6. Longest path in ISCAS'89 S27.

We create a physical implementation for the ISCAS'89 S27 benchmark circuit using the ST CMOS 90 nm process. Given a set of fixed gate sizes, the longest path delay in the benchmark circuit (shown in Fig. 6) is a function of the process variations (e.g., $V_{TH}$, $T_{OX}$, $L$, etc.). Since the circuit only consists of six gates which can be put close to each other in the layout, inter-die variations will dominate over intra-die variations, and gate delays will dominate over (local) interconnect delays in this example. Therefore, for simplicity, we only consider inter-die variations for CMOS transistors in this example. The probability distributions and the correlation information of the inter-die transistor variations are obtained from the ST Microelectronics design kit. After PCA analysis, 6 principal random factors are identified to represent these process variations. We should note, however, that nothing precludes us from including more detailed intra-die and/or interconnect variation models in PROBE as well.

A. Robust Convergence of Implicit Power Iteration

In order to test the convergence of the proposed implicit power iteration algorithm, we pick 100 random initial vectors $Q_0$ and use them for running power iteration in coefficient fitting. We find that all 100 experiments reliably converge without a single failure.

B. Modeling Accuracy

Fig. 7 shows the response surface modeling error when the path delays of both rising and falling transitions for the circuit are approximated by the linear, rank-$p$ quadratic (by PROBE) and traditional full-rank quadratic models. All response surface models are fitted using 578 training samples. It is shown in Fig. 7 that as $p$ increases, the rank-$p$ modeling error asymptotically approaches the full-rank quadratic modeling error. However, after $p > 2$, further increases in $p$ do not have a significant impact on reducing error. This, in turn, implies that a rank-2 model, instead of the full-rank quadratic model with rank 6, is sufficiently accurate in this example.

(Plot: modeling error, 0% to 7%, versus approximation rank $p$, from the linear model through ranks 1 to 6 to the full quadratic model; curves for rising and falling delay.)

Fig. 7. Response surface modeling error for path delay.

4.2 Low Noise Amplifier

Fig. 8. Circuit schematic for LNA.

As a second example, we consider a low noise amplifier designed in the IBM BiCMOS 0.25 µm process, as shown in Fig. 8. In this example, the variations on both MOS transistors and passive components (resistors, capacitors and inductors) are considered. The probability distributions and the correlation information of these variations are provided in the IBM design kit. After PCA analysis, 8 principal factors are identified to represent the process variations.

A. Effect of Training Set Size

Fig. 9 shows the relation between the modeling error and the training set size for three modeling approaches. From Fig. 9 we observe that the number of training samples should be around 3~4 times greater than the number of unknown coefficients to avoid over-fitting. Further increasing the size of training set does not have a significant impact on reducing fitting error. This observation implies that the required number of training samples depends on the number of unknown coefficients. As the unknown coefficient number is reduced in PROBE, we not only decrease the computation time for coefficient fitting, but also save a large portion of circuit simulation cost because of the smaller training set.

(Plots: fitting error (%) versus the ratio of training sample # to unknown coefficient #, for F0, S11, S12, S21, S22, NF, IIP3 and Power.)

Fig. 9. Effect of the training set size for LNA. (a) Linear fitting error. (b) Rank-one PROBE fitting error. (c) Full-rank quadratic fitting error.

B. Modeling Accuracy and Cost

Table 1 compares the fitting errors for the linear, rank-one PROBE and full-rank quadratic models. As we would expect, the rank-one PROBE modeling error is smaller than the linear modeling error, but larger than the full-rank quadratic modeling error. Table 2 compares the response surface modeling cost for these three modeling approaches. The training set size in Table 2 is selected to be sufficiently large to avoid over-fitting. Since the rank-one PROBE model contains substantially fewer unknown coefficients and, therefore, requires far fewer training samples than the full-rank quadratic model, PROBE achieves a 2.6x speedup in simulation cost due to the smaller training set. Compared with the simulation cost, the fitting cost is almost negligible in this example, since the problem size is small and solving the overdetermined linear equations takes only a few seconds for all performance metrics.

Table 1. Response surface modeling error for LNA

Performance    Linear    PROBE (Rank-1)    Quad (Rank-8)
F0             1.04%     0.25%             0.11%
S11            3.04%     0.81%             0.79%
S12            2.39%     0.84%             0.77%
S21            2.35%     1.28%             0.22%
S22            2.72%     2.68%             1.80%
NF             1.64%     0.97%             0.9%
IIP3           2.55%     1.07%             0.46%
Power          2.6%      0.47%             0.4%

Table 2. Response surface modeling cost for LNA

                          Linear    PROBE (Rank-1)    Quad (Rank-8)
Unknown Coeff #           9         17                45
Training Sample #         36        68                180
Simulation Cost (Sec.)    2620      4949              13100

Table 1 and Table 2 reveal an important fact: PROBE makes it easy to trade off accuracy against cost during response surface modeling. Traditionally, if the linear model cannot provide sufficient accuracy, the full-rank quadratic model is used immediately, which may be more accurate than necessary and incur an expensive modeling cost. PROBE, however, offers an intermediate step between linear modeling and full-rank quadratic modeling. Depending on the modeling accuracy requirement, PROBE can iteratively select an appropriate p value and create a rank-p model. In this example, the rank-one PROBE model already provides sufficient accuracy, as shown in Table 1.
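The coefficient counts and training set sizes behind the cost comparison follow from simple counting, sketched below. The linear and full-rank quadratic counts are the standard ones for N factors. The rank-one count of 2N + 1 is an assumption inferred from the reported counts (one projection vector of N entries, plus the N linear terms and a constant), not a formula quoted from the text, and the sampling ratio of 4 is likewise an assumed value within the 3 to 4 guideline. The function names are hypothetical.

```python
def unknown_coefficients(n_factors):
    """Number of unknown coefficients in each model for n_factors variables."""
    linear = n_factors + 1                                  # N slopes + constant
    rank_one = 2 * n_factors + 1                            # assumed: N + N + 1
    full_quadratic = n_factors * (n_factors + 1) // 2 + n_factors + 1
    return {"linear": linear, "rank_one": rank_one,
            "full_quadratic": full_quadratic}

def training_samples(n_factors, ratio=4):
    """Training set size when sampling `ratio` times the coefficient count."""
    counts = unknown_coefficients(n_factors)
    return {name: ratio * count for name, count in counts.items()}

print(unknown_coefficients(8))    # 8 principal factors, as for the LNA
print(training_samples(8))
print(unknown_coefficients(49))   # a larger problem with 49 factors
```

For 8 factors these formulas give 9, 17 and 45 unknown coefficients and 36, 68 and 180 training samples. The full-rank quadratic count grows quadratically with the number of factors (1275 coefficients for 49 factors), while the rank-one count grows only linearly (99 coefficients).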
4.3 Scaling with Problem Size

Next, we consider a two-stage folded-cascode operational amplifier designed in the IBM BiCMOS 0.25 µm process, as shown in Fig. 10. In this example, 49 principal random factors are extracted by PCA analysis to represent the process variations, including both inter-die variations and device mismatches. The probability distributions and the correlation information of these random variations are obtained from the IBM design kit. Due to the inclusion of mismatches, the problem size becomes significantly larger in this example. However, modeling mismatches is extremely important for the Op Amp in Fig. 10, since the device mismatches can significantly impact the performance of the input differential pair.

Table 3. Response surface modeling error for Op Amp

Performance      Linear    PROBE (Rank-1)    Quad (Rank-49)
Gain             4.20%     2.00%             1.74%
Offset           24.83%    10.28%            9.09%
UGF              1.23%     0.48%             0.48%
Gain Margin      1.03%     0.55%             0.55%
Phase Margin     1.20%     0.44%             0.44%
Slew Rate (+)    0.92%     0.93%             0.70%
Slew Rate (-)    1.38%     0.53%             0.48%
Power            1.05%     0.77%             0.68%

Table 4. Response surface modeling cost for Op Amp

                          Linear       PROBE (Rank-1)    Quad (Rank-49)
Unknown Coeff #           50           99                1275
Training Sample #         200          396               5100
Simulation Cost (Sec.)    7.88×10^3    1.56×10^4         2.01×10^5
Fitting Cost (Sec.)       2.68         54.3              5192.06

Table 3 compares the response surface modeling errors for three different approaches: linear approximation, rank-one approximation by PROBE and traditional full-rank quadratic approximation. As we would expect, the Op Amp offset is strongly nonlinear in device mismatches. Therefore, the simple linear approximation yields an extremely large error (i.e., 24.83%), as shown in Table 3. Compared with linear modeling, both the rank-one PROBE modeling and the full-rank quadratic modeling achieve more than 2x error reduction. Although higher-order (e.g., cubic) response surface models can be applied to further improve the accuracy, these higher-order models are rarely used in practical applications, as they inevitably lead to an unaffordable computation cost. Table 4 shows the response surface modeling cost for these three approaches.
The training set size in Table 4 is selected to be sufficiently large to avoid over-fitting. As shown in Table 4, while the full-rank quadratic modeling takes more than 2 days to generate all training samples, PROBE reduces the simulation cost to 4.3 hours (12x smaller). In addition, PROBE achieves a further 96x speedup for coefficient fitting (i.e., solving the unknown model coefficients) compared with the full-rank quadratic modeling, although the fitting cost is not the dominant one in this example.

Fig. 10. Circuit schematic of a two-stage Op Amp.

5. Conclusions

We propose a novel projection-based extraction approach, PROBE, for quadratic response surface modeling of circuit performances with consideration of both inter-die and intra-die process variations. PROBE utilizes a new projection scheme to facilitate the tradeoff between modeling accuracy and cost. In addition, a novel implicit power iteration algorithm is developed to find the optimal projection space and solve the unknown model coefficients. By using the proposed implicit power iteration algorithm, PROBE significantly reduces the modeling cost (i.e., both the required number of samples and the linear equation size), thereby facilitating scaling to much larger problem sizes. As demonstrated by the numerical examples in this paper, PROBE can generate accurate response surface models and achieve up to 12x speedup compared with the traditional quadratic modeling approach. The response surface models generated by PROBE can be incorporated into a statistical analysis/optimization environment for accurate and efficient yield analysis/optimization.

6. Acknowledgements

This work was funded in part by the MARCO Focus Center for Circuit & System Solutions (C2S2, www.c2s2.org) under contract 2003-CT-888.