An Interconnect-Centric Design Flow for Nanometer Technologies


Professor Jason Cong <cong@cs.ucla.
UCLA Computer Science Department, Los Angeles, CA
/~cong
VLSI-TSA'99

Gate Delays vs. Interconnect Delays
[Figure: gate delays vs. interconnect delays. Source: National Technology Roadmap for Semiconductors (1997)]

Interconnect-Centric Design Methodology
- Proposed transition: from device/function-centric design (device, then interconnect) to interconnect/communication-centric design (interconnect, then device)
- Analogy: from "programs, then data/objects" to "data/objects, then programs"

Interconnect-Centric Design Flow
- Key steps in an interconnect-centric design flow:
  - Interconnect planning
  - Interconnect synthesis
  - Interconnect layout
- Other supporting tools to enable an interconnect-centric design flow:
  - Interconnect performance estimation
  - Interconnect performance verification

Outline of the Talk
- Interconnect Synthesis
- Interconnect Performance Estimation
- Interconnect Planning

Interconnect Synthesis
- Constraints: delay, skew, signal integrity, ...
- Optimized interconnect designs: topology, sizing, spacing, buffer insertion
- Automatic solutions guided by accurate interconnect models

Example: Single-Net Optimal Wire Sizing (OWS) [Cong-Leung, ICCAD 93]
- Given: a set of possible wire widths {W_1, W_2, ..., W_r}
- Find: an optimal wire width assignment that minimizes the distributed RC delay
[Figure: wiresizing optimization]

Example: Global Interconnect Sizing and Spacing (GISS) [Cong et al., ICCAD 97]
- Given:
  - Initial layout of multiple nets
  - Critical sinks and their criticalities
  - Capacitance model and design rules
- Output: sizing and spacing of every net to minimize RC delays, with consideration of coupling capacitance
[Figure: sizing and spacing]
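The single-net OWS problem stated above can be illustrated with a small sketch. It assumes a single wire (no branches) cut into equal segments, an Elmore delay model with a pi-model per segment, and exhaustive search over the width choices. The function names and all parameter values are hypothetical, and exhaustive search only stands in for the algorithm of [Cong-Leung, ICCAD 93], which is not reproduced here.

```python
from itertools import product


def elmore_delay(widths, seg_len, Rd, CL, r, ca, cf):
    """Elmore delay of driver -> chain of wire segments -> load.

    widths[0] is the segment next to the driver. Each segment is a pi-model
    with resistance r*seg_len/w and capacitance (ca*w + cf)*seg_len.
    """
    caps = [(ca * w + cf) * seg_len for w in widths]
    delay = Rd * (sum(caps) + CL)      # the driver sees all capacitance
    downstream = CL                    # capacitance beyond the current segment
    for w, c in zip(reversed(widths), reversed(caps)):
        delay += (r * seg_len / w) * (c / 2 + downstream)
        downstream += c
    return delay


def optimal_wire_sizing(choices, n_seg, seg_len, Rd, CL, r, ca, cf):
    """Try every assignment of one width from `choices` to each segment."""
    def cost(ws):
        return elmore_delay(ws, seg_len, Rd, CL, r, ca, cf)

    best = min(product(choices, repeat=n_seg), key=cost)
    return list(best), cost(best)


if __name__ == "__main__":
    # Hypothetical values: ohms, fF, um; r in ohms per square,
    # ca in fF/um^2, cf in fF/um. The delay comes out in fs.
    widths, delay = optimal_wire_sizing(
        choices=[1.0, 2.0, 3.0, 4.0], n_seg=6, seg_len=500.0,
        Rd=100.0, CL=10.0, r=0.05, ca=0.05, cf=0.05)
    print(widths, delay)
```

The search space grows as r^n for r widths and n segments, so this is only usable for toy cases; it is meant to make the problem statement concrete, not to suggest a practical method.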

Capacitance Model
- 2.5D capacitance model [Cong et al., DAC 97]
- Considers C_a (area), C_f (fringing), and C_x (coupling)
- Build a capacitance table from a 3D field solver (FastCap)
- Table lookup by interpolation and extrapolation

Main Approaches to GISS
- Heuristic: optimize one net at a time with bottom-up dynamic programming (optimal for one net)
- Better approach: compute upper and lower bounds of the optimal wire widths/spacings of all nets
  - Extended local refinement (ELR) using a generalized CH-posynomial formulation
  - Or iterative bound refinement (BR)
- In practice, the lower and upper bounds meet most of the time => optimal solution
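The "table lookup by interpolation and extrapolation" step of the capacitance model can be sketched as follows. This is a minimal illustration that assumes a 2D table indexed by wire width and spacing, bilinear interpolation inside the grid, and linear extrapolation from the outermost cells. The function names and the table values are made up for the example; the actual table layout of the 2.5D model is not given in the slides.

```python
from bisect import bisect_right


def _bracket(grid, x):
    """Index i so that grid[i], grid[i+1] is the cell used for x.

    The index is clamped to the first/last cell, so values outside the
    grid are linearly extrapolated from the nearest cell.
    """
    i = bisect_right(grid, x) - 1
    return max(0, min(i, len(grid) - 2))


def lookup(widths, spacings, table, w, s):
    """Bilinear lookup of table[i][j] sampled at (widths[i], spacings[j])."""
    i = _bracket(widths, w)
    j = _bracket(spacings, s)
    tw = (w - widths[i]) / (widths[i + 1] - widths[i])
    ts = (s - spacings[j]) / (spacings[j + 1] - spacings[j])
    low = table[i][j] + ts * (table[i][j + 1] - table[i][j])
    high = table[i + 1][j] + ts * (table[i + 1][j + 1] - table[i + 1][j])
    return low + tw * (high - low)


if __name__ == "__main__":
    # Hypothetical coupling-capacitance table (fF/um), not solver output.
    widths = [0.1, 0.2, 0.4]          # um
    spacings = [0.1, 0.2, 0.4]        # um
    cx = [[0.090, 0.050, 0.028],
          [0.092, 0.052, 0.030],
          [0.095, 0.054, 0.031]]
    print(lookup(widths, spacings, cx, 0.15, 0.3))
```

Linear extrapolation is the simplest choice and can become inaccurate far outside the sampled range, so a real flow would likely bound how far outside the table a query may fall.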

GISS Optimization Results
16-bit, 10 mm bus structure, equally spaced, with 5 different center spacings from 2x to 5x min. pitch (pitch = min. width + min. spacing)

Average delays (ns)
Center spacing   MIN   OWS         GISS/S        GISS/M
2 x pitch        --    -- (-17%)   0.82 (-46%)   0.76 (-50%)
3 x pitch        --    -- (-45%)   0.56 (-58%)   0.50 (-62%)
4 x pitch        --    -- (-64%)   0.45 (-65%)   0.40 (-69%)
5 x pitch        --    -- (-70%)   0.37 (-70%)   0.35 (-72%)

For non-equal net weights, GISS/M should have a larger advantage over GISS/S.

UCLA TRIO Package (Tree, Repeater, Interconnect Optimization)
- Synthesis/optimization capabilities:
  - Interconnect topology optimization
  - Optimal buffer insertion
  - Wiresizing optimization
  - Global interconnect sizing and spacing
  - Simultaneous driver, buffer, and interconnect sizing
  - Simultaneous topology generation with buffer insertion and wiresizing
  - ...
- Efficient polynomial-time optimal/near-optimal algorithms
- Interconnect performance can be improved by up to 7x!
- Available on the web:
- Demo at DAC 99

Impact of Interconnect Optimization
For a 2 cm global interconnect, using the TRIO package
[Figure: delay (ns) vs. technology (um) for 2cm DS, 2cm BIS, and 2cm BISWS]
- DS: driver sizing only
- BIS: buffer insertion and sizing
- BISWS: simultaneous buffer insertion/sizing and wiresizing
- 5x ~ 7x performance improvement!

Interconnect Synthesis in Layout Design Flow
- Layout design flow:
  - Chip planning, floorplanning, global interconnect planning and optimization
  - Timing-driven placement
  - Delay budgeting
  - Performance-driven global routing with interconnect optimization
  - Detailed routing with variable width and spacing
- Interconnect optimization library (e.g. TRIO):
  - Topology optimization
  - Buffer insertion
  - Device sizing
  - Wiresizing

Outline of the Talk
- Interconnect Synthesis
- Interconnect Performance Estimation
- Interconnect Planning

Interconnect Performance Estimation
[Figure: a net from input driver G_0 to sinks S_1, S_2, ..., S_n with sink capacitances C_s1, C_s2, ..., C_sn]
- Problem: estimate the optimized interconnect delay, area, etc., without actually running the optimization algorithms (such as TRIO)!

Needs for Interconnect Performance Estimation Models
- Efficiency
  - Need to explore many micro-architectures/floorplans => must process > 1 million nets/second
  - Cannot afford actual synthesis/optimization ( nets/second)
- Abstraction to hide detailed design information
  - Granularity of wire segmentation
  - Number of wire widths, buffer sizes, ...
- Explicit relation to enable optimal design decisions at high levels
- Result: very efficient (constant-time) estimation models for various interconnect optimization operations

Example: Delay/Area Estimation under OWS
- Closed-form delay estimation formula:
  $$T_{ows}(R_d, l, C_L) = \left( \frac{\alpha_1 l}{W^2(\alpha_2 l)} + \frac{2 \alpha_1 l}{W(\alpha_2 l)} + R_d c_f + \sqrt{R_d\, r\, c_a\, c_f\, l} \right) l$$
  where $\alpha_1 = \frac{1}{4} r c_a$, $\alpha_2 = \frac{1}{2} \sqrt{\frac{r c_a}{R_d C_L}}$, and $W(x)$ is Lambert's W function, defined as the value $w$ that satisfies $w e^w = x$
- Closed-form area estimation formula:
  $$A_{ows}(R_d, l, C_L) = \sqrt{\frac{r\,(c_f l + 2 C_L)}{2 R_d c_a}}\; l$$
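As a sanity check on the OWS closed forms, the sketch below evaluates them numerically. It assumes the delay formula reads T_ows = (a1*l/W(a2*l)^2 + 2*a1*l/W(a2*l) + Rd*cf + sqrt(Rd*r*ca*cf*l)) * l with a1 = r*ca/4 and a2 = sqrt(r*ca/(Rd*CL))/2, and the area formula reads A_ows = sqrt(r*(cf*l + 2*CL)/(2*Rd*ca)) * l. Lambert's W is computed with a hand-written Newton iteration to stay within the standard library. All function names and parameter values are hypothetical.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal branch of Lambert's W for x >= 0: solves w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)                  # starting point at or above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))   # Newton step
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def t_ows(Rd, l, CL, r, ca, cf):
    """Closed-form delay estimate under optimal wire sizing (needs l > 0)."""
    a1 = 0.25 * r * ca
    a2 = 0.5 * math.sqrt(r * ca / (Rd * CL))
    w = lambert_w(a2 * l)
    return (a1 * l / w**2 + 2.0 * a1 * l / w + Rd * cf
            + math.sqrt(Rd * r * ca * cf * l)) * l


def a_ows(Rd, l, CL, r, ca, cf):
    """Closed-form wire area estimate under optimal wire sizing."""
    return math.sqrt(r * (cf * l + 2.0 * CL) / (2.0 * Rd * ca)) * l


def uniform_elmore(w, Rd, l, CL, r, ca, cf):
    """Elmore delay of one uniform-width wire, used only as a cross-check."""
    cap = (ca * w + cf) * l
    return Rd * (cap + CL) + (r * l / w) * (cap / 2.0 + CL)


if __name__ == "__main__":
    # Hypothetical values: ohms, um, fF; delays come out in fs.
    p = dict(Rd=100.0, CL=10.0, r=0.05, ca=0.05, cf=0.05)
    for length in (1000.0, 5000.0, 10000.0):
        print(length, t_ows(l=length, **p), a_ows(l=length, **p) / length)
```

Two properties follow from the algebra and are easy to confirm with this code: as l approaches 0 the delay estimate tends to Rd*CL, and the average width A_ows/l equals the width that minimizes the Elmore delay of a uniform-width wire under the same lumped model.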

Delay Comparison of OWS Model vs. TRIO
[Figure: delay (ns) vs. length (um), model vs. TRIO]
- The OWS delay model consistently matches TRIO.
- 0.10 um technology from NTRS 97. Driver is 100x min.
- To run TRIO, 40 discrete wire widths are used, with the max width set to 40x min. width.

Average Width (Area) Comparison
[Figure: average width (um) vs. length (um), model vs. TRIO]
- The area estimation model for OWS almost exactly matches TRIO.

Example: Delay Estimation Model for BIWS
- Problem: estimate interconnect delay with optimal buffer insertion and wire sizing (BIWS)
- Critical length for BIWS: the threshold length over which buffer insertion provides additional delay reduction over optimal wire sizing (OWS)
- The critical length for BIWS can be computed efficiently

Critical Lengths of Un-Buffered Wires
[Table: critical length (unit: mm) by technology (um) for buffer sizes b = 10x, 50x, 100x, 200x, 500x]
- With optimal wire sizing [Cong-Pan, IWLS 98/ASP-DAC 99]
- Min. WS: without wire sizing [Otten ISPD 98, Otten-Brayton DAC 98]
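The idea of a critical length can be made concrete with a deliberately simplified model. The sketch below is not the BIWS computation from the slides: it assumes a uniform-width wire (no wire sizing), an Elmore delay model, a driver and a load that are the same buffer type, and a single candidate buffer at the midpoint. Under those assumptions the unbuffered and buffered delays differ by r*c*l^2/4 - (Rb*Cb + tb), so the break-even length has a closed form. All names and values are hypothetical.

```python
import math


def unbuffered_delay(l, Rb, Cb, r, c):
    """Elmore delay of a uniform wire of length l, driven by a buffer with
    output resistance Rb and loaded by a buffer input capacitance Cb.
    r and c are wire resistance and capacitance per unit length."""
    return Rb * (c * l + Cb) + r * l * (c * l / 2.0 + Cb)


def one_buffer_delay(l, Rb, Cb, tb, r, c):
    """Same wire with one identical buffer (intrinsic delay tb) at the midpoint."""
    return 2.0 * unbuffered_delay(l / 2.0, Rb, Cb, r, c) + tb


def critical_length(Rb, Cb, tb, r, c):
    """Length where both delays are equal: r*c*l^2/4 = Rb*Cb + tb."""
    return 2.0 * math.sqrt((Rb * Cb + tb) / (r * c))


if __name__ == "__main__":
    # Hypothetical values: ohms, fF, fs, ohms/um, fF/um.
    lc = critical_length(Rb=100.0, Cb=10.0, tb=50.0, r=0.05, c=0.1)
    print("critical length (um):", lc)
```

Below the returned length the extra buffer only adds delay; above it the buffer helps. With wire sizing the wire itself gets faster, so the corresponding threshold is expected to move to longer wires, which is the distinction the two table variants draw.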

Example: Delay Estimation Model for BIWS (Cont'd)
- Linear delay estimation model for BIWS:
  $$T_{biws} = \tau_{biws}\, l + t_g$$
  $\tau_{biws}$ is the slope, and can be obtained from optimal wire sizing for the critical length

Comparison of BIWS Model vs. TRIO
[Figure: delay (ns) vs. length (um), model vs. TRIO]
- R_d0 = r_g/10, C_L = c_g x 10, buffer type is 100x min.
- For the experiment, the max. wire width is 20x min. width, and the wire is segmented every 100 um.

Outline of the Talk
- Interconnect Synthesis
- Interconnect Performance Estimation
- Interconnect Planning

Interconnect Planning
- Interconnect architecture planning (pre-design)
  - Decide within the freedom of the fabrication technology:
    - Number of routing layers
    - Metal and isolation material at each layer
    - Thickness of each metal and isolation layer
    - Nominal width and spacing on each layer
    - Vertical interconnection schemes (via structure?)
    - ...
- Interconnect planning with RTL-floorplan
- Interconnect planning with physical-level floorplan

Interconnect Planning (cont'd)
- Interconnect architecture planning (pre-design)
- Interconnect planning with RTL-floorplan
  - Define global and local interconnects
  - Estimate overall interconnect distribution
  - Guide RTL-level and logic-level synthesis/optimization
    - Re-partition of design hierarchy
    - Logic replication
    - Retiming and pipelining
    - ...
- Interconnect planning with physical-level floorplan

Interconnect Planning (cont'd)
- Interconnect architecture planning (pre-design)
- Interconnect planning with RTL-floorplan
- Interconnect planning with physical-level floorplan
  - Interconnect topologies
  - Wire ordering
  - Wire width and spacing
  - Number of buffers and their locations

Example: Optimal Wire-Width Planning
- Given:
  - Certain technology
  - Wire length distribution per layer
- Find: a small set of globally optimal widths per layer
  - Performance/area optimization
- Motivation:
  - Simplify interconnect optimization
  - Simplify detailed routing, layout extraction, ...

Overall Flow
- For each metal layer i:
  - Assign length range $l_{min}$ and $l_{max}$
  - Find a small set of optimal widths $\vec{W}$ to minimize
    $$\Phi(\vec{W}, l_{min}, l_{max}) = \int_{l_{min}}^{l_{max}} \lambda(l)\, f(\vec{W}, l)\, dl$$
  - $f(\vec{W}, l)$: the objective function to be minimized by the design for wire length $l$, using $\vec{W}$
  - $\lambda(l)$: the weight function for wire length $l$
- Method: analytical or numerical

Objective in Our Study
$$f(\vec{W}, l) = A^j(\vec{W}, l)\, T^k(\vec{W}, l)$$
where A is area and T is delay, or, as special cases:
- $f(\vec{W}, l) = T(\vec{W}, l)$ (performance only)
- $f(\vec{W}, l) = A(\vec{W}, l)\, T^4(\vec{W}, l)$ (performance-driven and area-saving)

Recommendation for Future Tech.
- 2-width design under the objective function AT^4
- Wiring hierarchy for both performance and density!
[Table: for each technology (um), the length range (mm) and widths (um) for Tier1 (W) and Tier2 through Tier4 (W1, W2); table label: Strawman [Otten-Brayton, DAC 98]]
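The width-planning objective can be explored numerically with a short sketch. It assumes one width per layer (the 1-width case), a uniform weight function over the length range, a uniform-wire Elmore delay as the delay model, and a plain grid search over candidate widths. The function names and all parameter values are hypothetical, and the slides do not state which delay model or search method the study used.

```python
def wire_delay(w, l, Rd, CL, r, ca, cf):
    """Elmore delay of a uniform wire of width w and length l."""
    cap = (ca * w + cf) * l
    return Rd * (cap + CL) + (r * l / w) * (cap / 2.0 + CL)


def phi(w, l_min, l_max, j, k, params, n=200):
    """Trapezoidal estimate of the integral of A^j * T^k over [l_min, l_max],
    with weight function lambda(l) = 1 and area A = w * l."""
    h = (l_max - l_min) / n
    total = 0.0
    for i in range(n + 1):
        l = l_min + i * h
        f = (w * l) ** j * wire_delay(w, l, **params) ** k
        total += f * (0.5 if i in (0, n) else 1.0)
    return total * h


def best_single_width(candidates, l_min, l_max, j, k, params):
    """Grid search for the candidate width that minimizes phi."""
    return min(candidates, key=lambda w: phi(w, l_min, l_max, j, k, params))


if __name__ == "__main__":
    # Hypothetical values: ohms, fF, ohms per square, fF/um^2, fF/um.
    params = dict(Rd=100.0, CL=10.0, r=0.05, ca=0.05, cf=0.05)
    candidates = [0.1 * i for i in range(2, 51)]     # 0.2 um .. 5.0 um
    l_min, l_max = 8000.0, 23000.0                   # um
    w_t = best_single_width(candidates, l_min, l_max, 0, 1, params)
    w_at4 = best_single_width(candidates, l_min, l_max, 1, 4, params)
    print("width minimizing T:", w_t)
    print("width minimizing A*T^4:", w_at4)
```

Setting j = 0, k = 1 gives the performance-only objective, and j = 1, k = 4 gives the area-saving one. In this toy model the AT^4 width comes out narrower than the delay-only width, because the delay curve is flat near its minimum while area grows linearly with width.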

Two Simple Wire Sizing Schemes
[Figure: delay (ns) vs. length (um) for Tier1-1WS, Tier1-2WS, Tier1-OWS, Tier4-1WS, Tier4-OWS]
- 1-WS and 2-WS have less than 10% difference from OWS for length < 4 mm in Tier1
- Both 1-WS and 2-WS work well in Tier4 up to chip size

A Performance-Driven, Area-Saving Metric
[Figure: metric value vs. width (um) for T, AT, AT^2, AT^3, AT^4, marking the optimal width for delay T and the optimal width for AT^4]
- Opt. width for AT^4: only increases delay by 10%, saves area by 60%!
- Setting:
  - um tech
  - Top layer pair
  - Length range 8-23 mm
  - Assume uniform distribution
  - Metric: integral of T, AT, AT^2, ..., AT^4
  - Driver/load 100x min gate

Experimental Setting
- For each metal pair (tier), assume a certain wire length range
- Assume the max length in tier 1 is 10,000x feature size, and the max length in the top tier is L_edge (chip dimension) [Fisher+ 98]
- Intermediate tier length ranges follow a geometric sequence
- Representative driver size for each metal layer (10x, 40x, 100x, and 250x for tiers 1-4)

A Rather Surprising Result: 2 Widths Per Layer are Sufficient! [DAC 99]
[Table: avg-d, max-err, and avg-w for the 1-width, 2-width, and m-width schemes at pitch-sp = 2 um, 2.9 um, and 3.8 um; surviving entries: 1.41 (2-width row) and 1.38 (m-width row)]
- Assumptions: 0.10 um process, layers 7&8 ( mm), under the AT^4 metric, limited driver size variation per layer
- 2-width design is superior to 1-width design:
  - Delay reduction up to 12.4%
  - Area saving up to 48%!
- 2-width design is comparable to many-width design:
  - Avg. delay difference less than 5% and max. delay difference less than 7%
  - Area difference less than 4.7%
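The tier length ranges described in the experimental setting can be generated with a few lines of arithmetic. The sketch assumes four tiers, that the maximum lengths of the tiers form the geometric sequence (first term 10,000x feature size, last term L_edge), that each tier starts where the previous one ends, and that tier 1 starts at zero. The 23 mm chip edge is a hypothetical value, and the function name is made up for the example.

```python
def tier_length_ranges(feature_um, chip_edge_mm, n_tiers=4):
    """Return [(l_min_mm, l_max_mm), ...], one pair per tier.

    The tier maxima form a geometric sequence from 10,000 x feature size
    (tier 1) up to the chip edge (top tier).
    """
    first_max = 10_000 * feature_um / 1000.0         # um -> mm
    ratio = (chip_edge_mm / first_max) ** (1.0 / (n_tiers - 1))
    maxima = [first_max * ratio ** i for i in range(n_tiers)]
    minima = [0.0] + maxima[:-1]                     # assumed: ranges abut
    return list(zip(minima, maxima))


if __name__ == "__main__":
    # 0.10 um feature size; the 23 mm chip edge is an assumed value.
    for tier, (lo, hi) in enumerate(tier_length_ranges(0.10, 23.0), start=1):
        print(f"tier {tier}: {lo:.2f} mm to {hi:.2f} mm")
```

With these assumed inputs the top tier spans roughly 8.1 mm to 23 mm.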

Summary
- Paradigm shift: device/function-centric => interconnect/communication-centric
- Key components in an interconnect-centric design flow:
  - Interconnect planning
  - Interconnect synthesis
  - Interconnect layout
- Also need estimation, simulation, and verification tools at each stage for interconnect performance and signal integrity

Acknowledgements
Thanks for the support from:
- Semiconductor Research Corporation (SRC)
- National Science Foundation (NSF)
- Defense Advanced Research Projects Agency (DARPA)
- Intel Corporation
More information: /~cong

Logic Volume within Critical Lengths
- Defined as the number of min. 2-input NAND gates that can be packed within an area of l_c/2 x l_c/2
[Table: logic volume (unit: million) by technology (um) and NAND area (um^2) for b = 10x, 50x, 100x, 200x, 500x]

Another Example: Buffer Block Planning
[Figure: buffer blocks placed among logic blocks]
- Problem: automatically generate buffer blocks during physical-level floorplanning
- Motivation:
  - Avoid buffers over hard IP blocks
  - Power/ground network sharing among buffers
  - More regular layout, etc.

Experimental Result: Number of BB
[Table: number of buffer blocks per circuit (Apte, Xerox, Hp, Ami, Ami, playout) under RDM/RES, RDM/FR, BBP/RES, and BBP/FR]
- RDM: a buffer is randomly assigned to a feasible location
- BBP: buffers are clustered appropriately
- RES: restricted (delay-minimal) buffer insertion point
- FR: feasible buffer region for delay constraints
- Our buffer block planning (BBP) algorithm can reduce the number of buffer blocks to 1/10~1/20 of those from RDM

Interconnect Layout
- Need a multi-layer general-area router that is:
  - Gridless
  - Flexible (variable widths within the same segment, variable spacings for each pair of nets)
  - Efficient
- Will leverage our current research on gridless routing:
  - Use of implicit graph representation
  - Use of computational geometry techniques
  - Highly scalable and flexible
