Section IV: Timing Closure Techniques. June 2002 DAC02 Physical Chip Implementation 1

Size: px
Start display at page:

Download "Section IV: Timing Closure Techniques. June 2002 DAC02 Physical Chip Implementation 1"

Transcription

1 Section IV: Timing Closure Techniques June 22 DAC2 Physical Chip Implementation

2 IBM Contributions to this presentation include: T.J. Watson Research Center Austin Research Lab ASIC Design Centers EDA Organization * For more detailed information see references at the end of this presentation, which include a wide variety of IBM and External publications covering these areas. June 22 DAC2 - Physical Chip Implementation 2

3 Overview Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 3

4 Timing Closure Many aspects of a design contribute to performance, power, and density Architecture / Logic Implementation PD Design Style (Flat, Hierarchical, etc) Clocking Paradigm / Test / Circuit Family Floor Plan / Synthesis / Placement / Routing Design Automation for timing closure is more significant than ever before Designs are larger Wires are longer, invalidating statistical synthesis models, and requiring lots of buffers Cycle times are more aggressive June 22 DAC2 - Physical Chip Implementation 4

5 Design Automation Tools are Individually Mature Timing analysis Synthesis / Technology mapping Placement / Routing Floor Planning Extraction / Analysis June 22 DAC2 - Physical Chip Implementation 5

6 Challenge is to integrate them into one cooperative application Netlist in Physical Synthesis Place&route synthesis timing Completed Design June 22 DAC2 - Physical Chip Implementation 6

7 Design Flow Evolution: Design Entry Synthesis w/timing Timing Integrated Place Driven driven Placement, placement and Synthesis plus & Automatic Routing Post placement tuning Route. Tech independent optimization 2. Tech mapping 3. Timing correction. Physically aware optimizations 2. Physically aware timing correction 3. Timing / Noise aware routing Timing June 22 DAC2 - Physical Chip Implementation 7

8 Purpose of this Section: Provide users with an intuitive feel of the inner workings of the major timing closure tools Demonstrate the advancements in timing closure tools technology via example designs Explore a variety of significant design choices June 22 DAC2 - Physical Chip Implementation 8

9 What you should expect: High level concepts presented are generally applicable across a wide range of tools / methodologies (ie: not IBM specific) Specific tool internals used in this tutorial are taken from IBM tools. They should provide a reasonable feel as to how things are done in the industry. June 22 DAC2 - Physical Chip Implementation 9

10 Worldwide ASIC/PLD Sales Top 5 Suppliers for 2 IBM $ 2758 growth.2% Agere $ 3 growth -43.5% LSI $ 243 growth -38.2% NEC $ 243 growth -35.2% XLIINX $ 49 growth -26.3% Revenue: Millions of U.S. Dollars Source: Gartner Dataquest (March 22) June 22 DAC2 - Physical Chip Implementation

11 IBM ASIC Supplier # since 999 Dataquest NEC NEC Lucent IBM IBM IBM 2 LSI Logic IBM IBM Lucent Lucent Lucent 3 Fujitsu Lucent NEC NEC LSI Logic LSI Logic 4 Lucent Fujitsu LSI Logic LSI Logic NEC NEC 5 IBM LSI Logic Fujitsu Fujitsu Xilinx Xilinx 6 TI TI Altera Xilinx Fujitsu Fujitsu 7 Toshiba VLSI Xilinx Altera Altera Altera 8 Xilinx Toshiba TI STM Toshiba Toshiba 9 Hitachi Altera Toshiba VLSI STM Mitsubishi VLSI Xilinx VLSI TI TI Agilent June 22 DAC2 - Physical Chip Implementation

12 Section Outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 2

13 Static Timing Analysis June 22 DAC2 - Physical Chip Implementation 3

14 Timing Analysis Basics: Why static timing since simulation is more accurate? Simulation has a number of key drawbacks requires input state vectors long runtimes A simple example: a b c z How would one calculate the worst case rising delay from a to z? c= c= b= a-z delay a-z delay2 b= a-z delay3 a-z delay4 Exponential explosion as possible design input states grow! June 22 DAC2 - Physical Chip Implementation 4

15 Timing Analysis Basics: Definition of basic terms -Arrival time(at) -- the time at switch a pin switches state -Slew - the rate at which a signal switches usually difference of % and 9% on voltage curve vdd time slew = time9-9 time 5 AT = time5 -Required arrival time(rat) -- the time a signal must arrive at in order to avoid a chip fail -Slack = Required arrival time - Arrival time Positive slack good, negative slack bad June 22 DAC2 - Physical Chip Implementation 5

16 Timing Analysis Basics: Block based timing: Worst value only stored at merge points Each segment is processed just once Example Problem: What is slack at PO? at= at= d= d=2 at= at=2 d=2 d=3 temp at=3 at=5 d= at=6 at=5 d= d=3 temp at=7 at=8 d=3 at= rat= at= d=5 Slack= - June 22 DAC2 - Physical Chip Implementation 6

17 Timing Analysis Basics: What is Incremental Timing? Enabling small incremental changes without full retiming Only direct fanin/fanout cone is processed at= at= at= d= d=2 at= at= 2 d=2 d=3 at= d=5 d= at= 5 at= d= at= at=3 at= 5 6 d= d=3 at = at= 7 8 d=3 at= at = rat= 2 d= slack= d= it passed! June 22 DAC2 - Physical Chip Implementation 7

18 Early Mode Analysis Definitions change as follows longest becomes shortest slack = arrival - required SL a AT a AT b SL b = = = = a b = = c SL y AT y y = = = AT c = SL c = = RAT x = 2 x AT x = SL x = 2 = June 22 DAC2 - Physical Chip Implementation 8

19 Timing Correction Fix electrical violations Resize cells Buffer nets Copy (clone) cells Fix timing problems Local transforms (bag of tricks) Path-based transforms June 22 DAC2 - Physical Chip Implementation 9

20 Local Synthesis Transforms Resize cells Buffer or clone to reduce load on critical nets Decompose large cells Swap connections on commutative pins or among equivalent nets Move critical signals forward Pad early paths Area recovery June 22 DAC2 - Physical Chip Implementation 2

21 Transform Example.. Double Inverter Removal.... Delay = 4 Delay = 2 June 22 DAC2 - Physical Chip Implementation 2

22 Resizing a b? d e f d a b A load A B C a b C.26 June 22 DAC2 - Physical Chip Implementation 22

23 Cloning d load A B C d.2 d a b? e f g h a b A B e f g h June 22 DAC2 - Physical Chip Implementation 23

24 Buffering d load A B C a b d.2 e.2 a f?.2 b g.2 h.2 B. B d e.2.2 f g h June 22 DAC2 - Physical Chip Implementation 24

25 Redesign Fan-in Tree Arr(a)=4 Arr(b)=3 Arr(c)= Arr(d)= a b c d e Arr(e)=6 c d b a e Arr(e)=5 June 22 DAC2 - Physical Chip Implementation 25

26 Redesign Fan-out Tree Longest Path = 5 Longest Path = 4 Slowdown of buffer due to load June 22 DAC2 - Physical Chip Implementation 26

27 Decomposition June 22 DAC2 - Physical Chip Implementation 27

28 Swap Commutative Pins a 5 b 2 c 2 Simple Sorting on arrival times and delay works 2 a b c 3 2 June 22 DAC2 - Physical Chip Implementation 28

29 Move Critical Signals Forward a b c d a b d c e e Based on ATPG linear in circuit size Detects redundancies efficiently Efficiently find wires to be added and remove. Based on mandatory assignments. June 22 DAC2 - Physical Chip Implementation 29

30 Section outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 3

31 Placement Objective: Find optimal relative ordering of cells minimize wire length and congestion maximize timing slack Find optimal spacing of cells eliminate wiring congestion problems provide space for post placement synthesis clock trees buffer insertion timing correction Find optimal Global Position June 22 DAC2 - Physical Chip Implementation 3

32 Optimal Relative Order: A B C June 22 DAC2 - Physical Chip Implementation 32

33 To spread... A B C June 22 DAC2 - Physical Chip Implementation 33

34 .. or not to spread A B C June 22 DAC2 - Physical Chip Implementation 34

35 Place to the left A B C June 22 DAC2 - Physical Chip Implementation 35

36 or to the right A B C June 22 DAC2 - Physical Chip Implementation 36

37 Optimal Relative Order: A B C Without free space the problem is dominated by order June 22 DAC2 - Physical Chip Implementation 37

38 Placement Footprints: Standard Cell: Data Path: IP - Floorplanning June 22 DAC2 - Physical Chip Implementation 38

39 Placement Footprints: Core Reserved areas IO Control Mixed Data Path & sea of gates: June 22 DAC2 - Physical Chip Implementation 39

40 Placement Footprints: Perimeter IO Area IO June 22 DAC2 - Physical Chip Implementation 4

41 Placement objectives are subject to User Constraints / Design Style: Hierarchical Design Constraints pin location power rail reserved layers Flat Design w/floor Plan constraints Fixed circuits IO connections June 22 DAC2 - Physical Chip Implementation 4

42 Unconstrained Placement June 22 DAC2 - Physical Chip Implementation 42

43 Floor planned Placement June 22 DAC2 - Physical Chip Implementation 43

44 Congestion MAP June 22 DAC2 - Physical Chip Implementation 44

45 Advantages of Hierarchy Design is carved into smaller pieces that can be worked on in parallel (improved throughput) A known floor plan provides the logic design team with a large degree of placement control. A known floor plan provided early knowledge of long wires Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation Late design changes can be done with minimal turmoil to the entire design June 22 DAC2 - Physical Chip Implementation 45

46 Disadvantages of Hierarchy Results depend on the quality of the hierarchy. The logic hierarchy must be designed with PD taken into account. Additional methodology requirements must be met to enable hierarchy. Ex. Pin assignment, Macro Abstract management, area budgeting, floor planning, timing budgets, etc Late design changes may affect multiple components. Hierarchy allows divergent methodologies Hierarchy hinders DA algorithms. They can no longer perform global optimizations. June 22 DAC2 - Physical Chip Implementation 46

47 Physical Synthesis Flow Wire-load Models Synthesized Netlist Unplaced Physically unaware timing Cleanup: Remove buffers, nominal power levels on gates Initial basic placement For minimal wire-length, min-cut, Steiner tree estimates, physically aware timing Logical + Placement optimizations Physically aware logic optimizations Yes Timing No more Improvement? Placed Netlist Timing-driven placement w/resynthesis For minimal netweights, based on the timing of the net June 22 DAC2 - Physical Chip Implementation 47

48 Logical + Placement Optimizations Start with a placed or unplaced netlist Do recursive partitioning During and following each partition action, apply logic optimizations such as timing corrections rebuffering repowering cloning pin swapping move boxes etc Bin Cut June 22 DAC2 - Physical Chip Implementation 48

49 Section Outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 49

50 Overview of Common Placement Algorithms: - Simulated Annealing - Quadratic Placement - Partitioning June 22 DAC2 - Physical Chip Implementation 5

51 Simulated Annealing: for(temp=high; temp > absolute_zero; temp -= increment) { make a random move score the move use temp dependent probability to decide to accept or reject } Note: Clustering can be use to improve performance June 22 DAC2 - Physical Chip Implementation 5

52 Annealing: Pros: - ease of implementation, dumb moves / smart scoring - can easily accommodate new constraints - just add them to the scoring function - great quality - can be made to run on parallel processors Cons: - very long run time June 22 DAC2 - Physical Chip Implementation 52

53 Quadratic Placement June 22 DAC2 - Physical Chip Implementation 53

54 Review: x= x=2 x Cost = (x ) 2 + (x x 2 ) 2 +(x 2 2) 2 x 2 x Cost = 2(x )+ 2(x x 2 ) x 2 Cost = 2(x x 2 ) + 2(x 2 2) setting the partial derivatives = we solve for the minimum Cost: Ax + B = x x = 2 x 2 x x=4/3 x2=5/3 = June 22 DAC2 - Physical Chip Implementation 54

55 Review: x= x=2 x x 2 setting the partial derivatives = we solve for the minimum Cost: Ax + B = x x = 2 x 2 x x=4/3 x2=5/3 = Interpretation of matrices A and B: The diagonal values A[i,i] correspond to the number of connections to xi The off diagonal values A[i,j] are if object i is connected to object j, otherwise The values B[i] correspond to the sum of the locations of fixed objects connected to object i June 22 DAC2 - Physical Chip Implementation 55

56 Why formulate the problem this way? Because we can Because it is trivial to solve Because there is only one solution Because the solution is a global optimum Because the solution conveys relative order information Because the solution conveys global position information June 22 DAC2 - Physical Chip Implementation 56

57 However: Solution is not legal Solution depends of fixed anchor points Solution does not minimize linear wire length, congestion, or timing Solution is generally highly overlapping w/ high density (ie needs to be spread out) June 22 DAC2 - Physical Chip Implementation 57

58 What does the solution look like? To get an intuitive feel for the solution, examine the relaxation method for solving Ax + B = Actual program implementation may use other solution methods (that are generally less intuitive). June 22 DAC2 - Physical Chip Implementation 58

59 Solution of Quadratic using Relaxation: June 22 DAC2 - Physical Chip Implementation 59

60 June 22 DAC2 - Physical Chip Implementation 6

61 Constrained Solutions: Sometimes we want to solve for the minimum wire length subject to a constraint Example: Using quadratic for partitioning, we may want the quadratic placement to be "centered" June 22 DAC2 - Physical Chip Implementation 6

62 June 22 DAC2 - Physical Chip Implementation 62

63 June 22 DAC2 - Physical Chip Implementation 63

64 Constrained Solutions To minimize Cost = f(x) subject to a constraint g(x) = we can use langrangian multipliers to modify the Cost function as follows: Co st = f(x) +2g(x) x C ost = x f(x) +2 x g(x) Using CG as a constraint: CG = n where: s is the size of object_i i= s i x i n i= s i n is the number of objects g(x ) = ( n i= s i x i n i= s i ) C G where we use N to represent the constant x g(x) = s i N n i= s i We have already shown that x f(x) = leads to the system of equations --- Ax + B = Therefore solving the constrained porblem = x C ost = x f(x) +2 x g(x) leads to: = Ax + B +2 s i N June 22 DAC2 - Physical Chip Implementation 64

65 Constrained Solutions (cont): To solve Ax + B + 2 s i = we could use a packaged solver and add the N additional unknown 2and equation CG = n to our matricies and i= s i x i N solve. Here is an alternative way to solve the system: by substitution we let x = x u+2 x l where is the unconstrained solution (ie the solution to Ax + B = ) xu Assuming we can solve the unconstrained problem, xu is known. By substitution we get: A(xu +2x l) + B +2 s i N = which becomes: A2x or l +2 s i N = Ax l + s i N = June 22 DAC2 - Physical Chip Implementation 65

66 Constrained Solutions (cont): We need to solve: A x l + s i N = Note: The A matrix is the same as the A matrix for the unconstrained solution. Since the A matrix is the netlist connectivity specification, we have A. The B matrix here is s i N instead of the sum of fixed location connects. Intrepretation: The solution to s A x l + i N = and placement such that: can be obtained by modifying the original netlist.) All fixed objects are moved x = 2.) A constant force vector is applied to each object. The constant force vector for the i th object has magnitude s i N Then use the same solver as was used to solve Ax + B = June 22 DAC2 - Physical Chip Implementation 66

67 Constrained Solutions (cont): We also need to solve for 2 From the CG relationship and x=x u +2x l we get: CG= n where i= si (xu +2x ) N N= n i li i= si (ie total size) since we have solved for and the only unkown is xu x l 2 we get: 2= NCG n i= si x ui n i= si x li June 22 DAC2 - Physical Chip Implementation 67

68 Constrained Solutions (summary): To minimize f(x) (the wl squared cost function) subject to a CG constraint we do the following:.) Solve for by solving using relaxation or some other method x u A x u + B = 2.) Solve for as follows: x l A x l + s i N = - Move all fixed objects to location= - Add a constant force vector to each object. The constant force vector for the i th object has magnitude s i N - Using rela xation or some other method, solve for x l 3.) Solve for 2 using 2= NC G n i= s i x u i n i= s i x l i 4.) Compute the final placement using x = x u + 2x l June 22 DAC2 - Physical Chip Implementation 68

69 Review: Force CG to 5 x= x=2 x s= x 2 s= From the previous example we know that the solution to: Axu + B = x = ( 33.33)+( 66.67) with this solution the CG is at (ie not 5) = Now we need to solve: Ax l + s i which is the same as solving -> N = Ax l + + s i N = Recall that the B matrix represents the position of fixed objects. So, this equation represents the solution to: x = x x2 / / June 22 DAC2 - Physical Chip Implementation 69

70 Constrained Solutions (summary): Advantages of this approach:.) The Solver data structure is the netlist only ie. no additional memory requirements 2.) Sometimes the unconstrained solution is by itself sufficient, therefore we can avoid the additional overhead of producing the constrained solution 3.) The numerical iterations in this method are NOT dependent on the CG. We can solve for xu and xl, then try many different CG points at very low cost. June 22 DAC2 - Physical Chip Implementation 7

71 Quadratic Techniques: Pros: - mathematically well behaved - efficient solution techniques find global optimum - great quality Cons: - solution of Ax + B = is not a legal placement, so generally some additional partitioning techniques are required. - solution of Ax + B = is that of the "mapped" problem, ie nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additional methods are deployed - fixed IOs are required for these techniques to work well June 22 DAC2 - Physical Chip Implementation 7

72 Partitioning June 22 DAC2 - Physical Chip Implementation 72

73 Partitioning: Objective: Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized. June 22 DAC2 - Physical Chip Implementation 73

74 FM Partitioning: list_of_sets = entire_chip; while(any_set_has_2_or_more_objects(list_of_sets)) { for_each_set_in(list_of_sets) { partition_it(); } /* each time through this loop the number of */ /* sets in the list doubles. */ } Initial Random Placement After Cut After Cut 2 June 22 DAC2 - Physical Chip Implementation 74

75 FM Partitioning: Moves are made based on object gain. Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition - each object is assigned a gain - objects are put into a sorted gain list - the object with the highest gain from the smaller of the two sides is selected and moved. - the moved object is "locked" - gains of "touched" objects are recomputed - gain lists are resorted June 22 DAC2 - Physical Chip Implementation 75

76 FM Partitioning: June 22 DAC2 - Physical Chip Implementation 76

77 June 22 DAC2 - Physical Chip Implementation 77

78 June 22 DAC2 - Physical Chip Implementation 78

79 June 22 DAC2 - Physical Chip Implementation 79

80 June 22 DAC2 - Physical Chip Implementation 8

81 June 22 DAC2 - Physical Chip Implementation 8

82 June 22 DAC2 - Physical Chip Implementation 82

83 June 22 DAC2 - Physical Chip Implementation 83

84 June 22 DAC2 - Physical Chip Implementation 84

85 June 22 DAC2 - Physical Chip Implementation 85

86 June 22 DAC2 - Physical Chip Implementation 86

87 June 22 DAC2 - Physical Chip Implementation 87

88 June 22 DAC2 - Physical Chip Implementation 88

89 June 22 DAC2 - Physical Chip Implementation 89

90 Partitioning: Pros: - very fast - great quality - scales nearly linearly with problem size Cons: - non-trivial to implement - very directed algorithm, but this limits the ability to deal with miscellaneous constraints June 22 DAC2 - Physical Chip Implementation 9

91 FM Partitioning - For large designs min-cut (FM) produces poor results To Compensate, there are two widely used enhancements:.) Quadratic seeding 2.) Multi-Level partitioning June 22 DAC2 - Physical Chip Implementation 9

92 Partitioning: cut line move move3 move2 move4 June 22 DAC2 - Physical Chip Implementation 92

93 Global Placement - Multi-Level Partitioning: generate clusters: while(there are clusters) { partition_it; remove cluster layer; } partition_it; 2 move move3 2 move2 June 22 DAC2 - Physical Chip Implementation move4 93

94 move 2 2 move3 move2 June 22 DAC2 - Physical Chip Implementation move4 94

95 2 move move3 2 move2 June 22 DAC2 - Physical Chip Implementation move4 95

96 2 move 2 move3 move2 June 22 DAC2 - Physical Chip Implementation move4 96

97 2 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 97

98 2 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 98

99 2 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 99

100 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4

101 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4

102 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 2

103 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 3

104 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 4

105 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 5

106 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 6

107 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 7

108 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 8

109 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 9

110 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4

111 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4

112 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 2

113 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 3

114 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 4

115 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 5

116 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 6

117 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 7

118 move move3 move2 June 22 DAC2 - Physical Chip Implementation move4 8

119 MLP/FM Partitioning Cons: Does not know how to handle free space Results tend to be erratic, ie results from run to run have significant variation June 22 DAC2 - Physical Chip Implementation 9

120 MLP/FM Partitioning Pros: Handles designs that have no fixed connection points Very fast - can handle large designs June 22 DAC2 - Physical Chip Implementation 2

121 Hybrid Techniques Use both MLP and Quadratic techniques Results are more predictable due to quadratic cost function Partitioning is used for overlap removal Quadratic is used for free space handling and some relative order indications June 22 DAC2 - Physical Chip Implementation 2

122 Quadratic Partitioning June 22 DAC2 - Physical Chip Implementation 22

123 June 22 DAC2 - Physical Chip Implementation 23

124 June 22 DAC2 - Physical Chip Implementation 24

125 June 22 DAC2 - Physical Chip Implementation 25

126 June 22 DAC2 - Physical Chip Implementation 26

127 Analytical Constraint Generation Combine Quadratic techniques with MLP Use Quadratic solution to determine global position (ie balance) Use MLP to determine relative ordering of cells June 22 DAC2 - Physical Chip Implementation 27

128 Analytical Constraint Generation Capacity = 2 Capacity = 2 Quadratic Poor Solution solution ACG solution Area= Analytical constraint June 22 DAC2 - Physical Chip Implementation 28

129 Analytical Constraint Generation June 22 DAC2 - Physical Chip Implementation 29

130 June 22 DAC2 - Physical Chip Implementation 3

131 June 22 DAC2 - Physical Chip Implementation 3

132 June 22 DAC2 - Physical Chip Implementation 32

133 June 22 DAC2 - Physical Chip Implementation 33

134 June 22 DAC2 - Physical Chip Implementation 34

135 June 22 DAC2 - Physical Chip Implementation 35

136 June 22 DAC2 - Physical Chip Implementation 36

137 June 22 DAC2 - Physical Chip Implementation 37

138 June 22 DAC2 - Physical Chip Implementation 38

139 June 22 DAC2 - Physical Chip Implementation 39

140 June 22 DAC2 - Physical Chip Implementation 4

141 June 22 DAC2 - Physical Chip Implementation 4

142 June 22 DAC2 - Physical Chip Implementation 42

143 June 22 DAC2 - Physical Chip Implementation 43

144 June 22 DAC2 - Physical Chip Implementation 44

145 June 22 DAC2 - Physical Chip Implementation 45

146 June 22 DAC2 - Physical Chip Implementation 46

147 June 22 DAC2 - Physical Chip Implementation 47

148 June 22 DAC2 - Physical Chip Implementation 48

149 June 22 DAC2 - Physical Chip Implementation 49

150 MLP w/acg June 22 DAC2 - Physical Chip Implementation 5

151 Global Route Results: June 22 DAC2 - Physical Chip Implementation 5

152 MLP w/o ACG June 22 DAC2 - Physical Chip Implementation 52

153 Side by Side Comparison: Original ACG June 22 DAC2 - Physical Chip Implementation 53

154 June 22 DAC2 - Physical Chip Implementation 54

155 June 22 DAC2 - Physical Chip Implementation 55

156 Observations on Quadratic Placement placements are predictable and repeatable timing is inherently better wire length is not the best, but good run time: slower than MLP by 4x run time: faster than annealing by 4x excellent free space handling placements feel similar to those produced by annealing June 22 DAC2 - Physical Chip Implementation 56

157 Repeatability Example: One circuit Minimum linear length occurs for all solutions where y=5 < x < Minimum quadratic length occurs for y=5, x=5 Quadratic solution IS both minimum linear and minimum quadratic length (,5) (,) June 22 DAC2 - Physical Chip Implementation 57

158 Section Outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 58

159 Synthesis - Placement Interface June 22 DAC2 - Physical Chip Implementation 59

160 Partitioning Algorithm: Partition & Reflow Read Data Preprocessing Netlist While ( any partition has > 2 cells ) Global Placement Divide each Partition Reflow across partitions Synthesis Detailed Placement Done June 22 DAC2 - Physical Chip Implementation 6

161 What Synthesis Can do when Invoked: - add boxes - delete boxes - add nets - delete nets - reconnect nets - change box sizes - query placement locations of boxes - query "bin" statistics - remove a box from a bin - add a box to a bin June 22 DAC2 - Physical Chip Implementation 6

162 Placement and Synthesis Integration Loosely coupled: (methodology coupling) do some synthesis, then write out data do some placement, then write out data.. Repeat Interleaved: (placement & synthesis in same process) do pre-pd synthesis for each placement step redo synthesis Tightly coupled: (simultaneous P&S aware transforms) June 22 DAC2 - Physical Chip Implementation 62

163 Loosely Coupled Placement & Synthesis: Characteristics: Do Placement - Placement is treated as a black box Analyze - Multiple placement runs are made - re-synthesize - Generate Constraints No Meet Objectives Yes Done w/placement June 22 DAC2 - Physical Chip Implementation 63

164 Interleaved Placement & Synthesis: Synthesis Characteristics: - the placement flow is the same as in a placement only methodology Synthesis - in between each step of the placement progression, synthesis is invoked Synthesis June 22 DAC2 - Physical Chip Implementation 64

165 Tightly Coupled Placement and synthesis algorithms become co-dependant Placement algorithms have awareness of synthesis activity Synthesis algorithms have awareness of placement activity June 22 DAC2 - Physical Chip Implementation 65

166 Section Outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 66

167 Placement Driven Cloning critical Cloning to off-load non-critical path from critical path non-critical June 22 DAC2 - Physical Chip Implementation 67

168 Placement Driven Expansion Logic AO Logic Expansion Transformation Logic Logic Expansion allows primitives to be placed in a more timing friendly way Logic Logic June 22 DAC2 - Physical Chip Implementation 68

169 Example: Tightly Coupled Placement Driven Expansion June 22 DAC2 - Physical Chip Implementation 69

170 Tightly Coupled Synthesis & Placement: a b c d e f g Transform a d b e f c g June 22 DAC2 - Physical Chip Implementation 7

171 Tightly Coupled Synthesis & Placement Example: Suppose the primary IO constraints look like this: c a b f g June 22 DAC2 - Physical Chip Implementation 7 d e

172 Tightly Coupled Synthesis & Placement Example: The placement of the synthesized netlist would look something like this: c a b f g June 22 DAC2 - Physical Chip Implementation 72 d e

173 Tightly Coupled Synthesis & Placement Example: If we could re-synthesize the netlist, we could get something that looks like this. c a b f g June 22 DAC2 - Physical Chip Implementation 73 d e

174 Tightly Coupled Synthesis & Placement: weight = a b c a b c weight = / d e f g PD-MAP d e f g June 22 DAC2 - Physical Chip Implementation 74

175 Tightly Coupled Synthesis & Placement example: Map_TREE For each cut partition_it For each partition If(partition number > M) { if(related_node_count < N) merge_nodes if(related_node_count == ) merge_node into neighbor partition } end end June 22 DAC2 - Physical Chip Implementation 75

176 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 76 d e

177 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 77 d e

178 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 78 d e

179 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 79 d e

180 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 8 d e

181 Tightly Coupled Synthesis & Placement Example: c a b f g June 22 DAC2 - Physical Chip Implementation 8 d e

182 Tightly Coupled Synthesis & Placement Example: a b c a b c f g Result f g d e d e June 22 DAC2 - Physical Chip Implementation 82

183 Placement Driven Timing Correction June 22 DAC2 - Physical Chip Implementation 83

184 Redesign Fan-in Tree Arr(a)=4 Arr(b)=3 Arr(c)= Arr(d)= a b c d e Arr(e)=6 Arr(e)= e c d e b a e Arr(e)=5 June 22 DAC2 - Physical Chip Implementation 84

185 Placement Driven Repowering Repowering is traditionally done using load based cell characterization Placement changes continuously during partitioning Need high efficiency algorithms to do repowering in this environment Solution: Use Gain Based Formulation June 22 DAC2 - Physical Chip Implementation 85

186 Delay Models Load based formulation: Gain based formulation: β.cin Cin Cout Cin Cout g = C C out in d d Cout = c. E. (+ β ). C k = k. C inv + out+ p in p d = Cout c. E. ( + β ). C l inv + d = l. g + p in p d: delay l: logical effort g: gain p: intrinsic delay June 22 DAC2 - Physical Chip Implementation 86

187 Area vs Delay Centric Load Based Paradigm (load-based delay eq.) Know: sized Size of each cell Total Area -> Don t know: Wire loads area centric Delay of each cell Delay of a path Gain Based Paradigm (gain based delay eq.) Know: sizeless The delay of each cell. The delay of a path -> Don t know: Wire loads delay centric The area of each cell The total area Estimation error is in the delay: Local path based property. Estimation error is in the area Global property. June 22 DAC2 - Physical Chip Implementation 87

188 Design Flow High Level Synthesis Load Based Delay (DCL) Library Analysis Restructuring Tech Mapping GainBased Opt Gain Based Delay Discretization Late Timing Corr June 22 DAC2 - Physical Chip Implementation 88

189 Power Levels (Gate Sizes) d d Cout Cout A B C June 22 DAC2 - Physical Chip Implementation 89

190 Library (Gain) Analysis Cin g = C C out in d d Cout g d = l. g + p A B C June 22 DAC2 - Physical Chip Implementation 9

191 Area and Load Calculation g g 2 g 3 C out =.7 C = C g 2 C 2 = C g 3 2 C 3 = C g out 3 Start at primary outputs/ register inputs. Much like static timing analysis. Incremental. g g 2 3 g C = C g 2 C 2 = C g 3 2 C out June 22 DAC2 - Physical Chip Implementation 9 =.7

192 Gain Calculation D d d 2 d 3 d 4 Cin N out G = gi = g. g 2. g 3. g 4. =. = Cin C 2 C 3 C 4 i= N D = d i= i = d + d 2 + d 3 + d = l. g + p = f + p d C 4 C C C C C out in Minimize D such that: G = C C out in Cout Minimize D = N i= N N N N Cout fi + pi Such that: F = fi = li. gi = L. i= i = i = i = Cin Geometric Solution: June 22 DAC2 - Physical Chip Implementation 92 fi = f

193 Example I d d 2 d 3 C in =.9 C NAND2 : d =.38+. Cin C NOR2: d = C INV : d = C C out 3 F = C 2 C 3 f C C 3 out. f 2. f3 = f = L = in.8 Cout=.7 f =.23 Nand2 Nor2 Inv Path p f d Cin June 22 DAC2 - Physical Chip Implementation 93

194 Constant Delay Calculation dc dc Cout Cout Inverter : d = l. g + p Set : gc = 3.6 gc Calculate: Cout = Cin Measure: dc Set : Cout = Measure: do = Calculate: l. gc p = d c p = f c Other l. g = g g d nand nor c. nand = 2.5 =.8 = l gates : f c. nand. g nand + p nand June 22 DAC2 - Physical Chip Implementation 94

195 Discretization From gain-based model back to appropriate power levels There is an error in timing/load when ideal power levels are not available. Goal: Minimize this error. Can be tuned to delay error or capacitance error. g g 2 g 3 d = l. g + p C out =.7 C = C g 2 C 2 = C g 3 2 C 3 = C g out 3 [Kudva98] [Beeftink98] June 22 DAC2 - Physical Chip Implementation 95

196 Gain Based: Observations: Gain Based algorithms: A major improvement. More homogeneous (global) algorithms and designs. Can be better targeted for area and/or delay. Reveal inherent cell characteristics to optimization tools, leading to improved QOR Good library design is required to facilitate discretization step Ideally suited for operation within Physical Synthesis June 22 DAC2 - Physical Chip Implementation 96

197 Placement Driven Buffering Rip Out all Buffers Insert Buffers based on placement info June 22 DAC2 - Physical Chip Implementation 97

198 What to do About Long Wires? Add buffers Tune wire sizes Modify the placement to reduce them June 22 DAC2 - Physical Chip Implementation 98

199 Placement Driven vs Logic Driven Buffer Insertion Logic driven buffer insertion focuses on logic topology and buffer sizing while assuming a statistical wire load model Placement driven buffering uses an existing placement as the fundamental constraint June 22 DAC2 - Physical Chip Implementation 99

200 Placement Driven Buffer Insertion: Buffopt (IBM) Multiple buffer types Inverters Capacitance, Slew and Noise constraints Wire Sizing Simultaneous driver sizing High order interconnect delay and Ceffective Blockage handling June 22 DAC2 - Physical Chip Implementation 2

201 How Do Buffers Help? Reduce delay Wire delay quadratic in length Buffers make delay essentially linear Delay gate dominated, not wire dominated Fix other problems Bad slews at sinks Capacitance range violations Noise induced by capacitance coupling June 22 DAC2 - Physical Chip Implementation 2

202 How Does Wire Sizing Help? Highly resistive lines increase delay Wider wires or thick metal layers reduces resistance, but can increase capacitance For long interconnect, resistance reduction outweighs capacitance increase June 22 DAC2 - Physical Chip Implementation 22

203 Simple Buffer Insertion Problem Given: Source and sink locations, sink capacitances and RATs, a buffer type, source delay rules, unit wire resistance and capacitance Buffer RAT 3 RAT 4 s RAT 2 RAT June 22 DAC2 - Physical Chip Implementation 23

204 Simple Buffer Insertion Problem Find: Buffer locations and a routing tree such that slack at the source is minimized q( s ) = min i 4{ RAT ( si) delay( s, s i )} RAT 3 RAT 4 s RAT 2 RAT June 22 DAC2 - Physical Chip Implementation 24

205 Fundamental Buffer Insertion Van Ginneken s dynamic programming algorithm Building block: candidate (Cap, slack) Candidates for each node stored as a list Each sink has one candidate Propagate candidates up the tree Guarantees optimal solution Quadratic complexity June 22 DAC2 - Physical Chip Implementation 25

206 Assumptions for the Basic Van Ginneken algorithm: Given a routing tree Given a set of potential insertion points Single buffer size No sink or driver sizing Linear gate delay model R d C down + K d Elmore wire delay model R w (C w /2 + C down ) June 22 DAC2 - Physical Chip Implementation 26

207 Van Ginneken Extensions Multiple buffer types Inverters Capacitance, Slew and Noise constraints Wire Sizing Simultaneous driver sizing High order interconnect delay and Ceffective Blockage recognition June 22 DAC2 - Physical Chip Implementation 27

208 Example - Connect the end points of the net using a steiner route - Add Candidate Nodes - Final buffer solution is optimal for this route, and this set of candidate nodes. - Other routes may produce better final solutions. - Net routing topology is an input to Van Ginneken s algorithm June 22 DAC2 - Physical Chip Implementation 28

209 Example June 22 DAC2 - Physical Chip Implementation 29

210 How Many Candidates? Number of candidates seems to double with each additional node Prune candidate with worst slack when capacitances is greater or equal Linear number of candidates June 22 DAC2 - Physical Chip Implementation 2

211 Pseudo Code: List = NULL; For each node (bottom up traversal of graph) { augment each item in list with wire segment up to node duplicate the list for each element of the duplicate list add a buffer at node analyze each element in list analyze each element in buffered (duplicate) list pick best element of buffered list and delete the rest new list is union of list and best element of buffered list } Pick best solution; June 22 DAC2 - Physical Chip Implementation 2

212 Example Node processing: 2 evaluations, at most 2 candidates kept Node 2 processing: 4 evaluations, at most 3 candidates kept Node 3 processing: 6 evaluations, at most 4 candidates kept Node 4 processing: 8 evaluations, at most 5 candidates kept Node 5 processing: evaluations, at most 6 candidates kept Node 6 processing: 2 evaluations, at most 7 candidates kept Now pick the best one: Optimal solution Num Num evaluations candidates = 2 i <= i= N N + ( N)( N + ) June 22 DAC2 - Physical Chip Implementation 22 =

213 Merging Branches Critical Merge is additive June 22 DAC2 - Physical Chip Implementation 23

214 Van Ginneken Algorithm Summary Good Bad Clever pruning controls # of candidates Finds an optimal solution in quadratic time Easily extended to cover a variety of important considerations (like multiple buffer types, wire sizing, polarity, slew, & capacitance constraints, etc. Results depend on quality of route provided June 22 DAC2 - Physical Chip Implementation 24

215 Example Route: Critical: can not offload due to route Different route leads to better solution June 22 DAC2 - Physical Chip Implementation 25

216 Physical Synthesis Flow Wire-load Models Synthesized Netlist Unplaced Physically unaware timing Cleanup: Remove buffers, nominal power levels on gates Initial basic placement For minimal wire-length, min-cut, Steiner tree estimates, physically aware timing Logical + Placement optimizations Physically aware logic optimizations Yes Timing No more Improvement? Placed Netlist Timing-driven placement w/resynthesis For minimal netweights, based on the timing of the net June 22 DAC2 - Physical Chip Implementation 26

217 Example Route: If still critical, add net weight June 22 DAC2 - Physical Chip Implementation 27

218 Example Route: June 22 DAC2 - Physical Chip Implementation 28

219 Multiple Buffer Types Instead of one buffer type, can choose from m power levels Generate m candidates instead of one Still optimal Complexity increase quadratic in m June 22 DAC2 - Physical Chip Implementation 29

220 Inverters Store candidates in + and - lists + implies polarity preserved - implies polarity reversed Adding inverter Switches candidate in + list to - list Switches candidate in - to + list Final result only chosen from + list June 22 DAC2 - Physical Chip Implementation 22

221 Capacitance Constraints Each gate g can drive at most C(g) capacitance When inserting buffer g, check downstream capacitance. If it is bigger than C(g), throw out candidate Increases efficiency June 22 DAC2 - Physical Chip Implementation 22

222 Slew Constraints Similar to capacitance constraints When inserting buffer, compute slews to gates driven by buffer If any slew exceeds its target, throw out candidate Potential difficulty: computing slew accurately in bottom-up fashion June 22 DAC2 - Physical Chip Implementation 222

223 Noise Constraints Each gate has acceptable noise threshold Compute cumulative noise for each wire via Devgan noise metric Throw out candidates that violate noise Can avoid noise while optimizing timing! June 22 DAC2 - Physical Chip Implementation 223

224 Wire Sizing: For each node (bottom up traversal of graph) { for each Wire Size { augment each item in list with Sized wire segment duplicate the list for each element of the duplicate list add a buffer at node analyze each element in list analyze each element in buffered (duplicate) list pick best element of buffered list and delete the rest new list is union of list and best element of buffered list } } Do Final pruning & Pick best solution; June 22 DAC2 - Physical Chip Implementation 224

225 Blockage Recognition Delete insertion points that run over blockages June 22 DAC2 - Physical Chip Implementation 225

226 Route Around Blockage June 22 DAC2 - Physical Chip Implementation 226

227 Buffer Bays June 22 DAC2 - Physical Chip Implementation 227

228 Routing Into Buffer Bays June 22 DAC2 - Physical Chip Implementation 228

229 Buffer Site Similar to buffer bays, only exact buffer locations are pre-specified, not just areas Useful as a mechanism for IP blocks and microprocessor design Dummy cell that holds a buffer Not connected to any net Becomes buffer when assigned to a net Extra sites decoupling caps Sprinkle sites throughout design Allocate percentage within macros June 22 DAC2 - Physical Chip Implementation 229

230 Routing Into Buffer Sites June 22 DAC2 - Physical Chip Implementation 23

231 Generate Steiner Tree June 22 DAC2 - Physical Chip Implementation 23

232 Reduce Congestion and Coupling June 22 DAC2 - Physical Chip Implementation 232

233 Reduce Congestion and Coupling June 22 DAC2 - Physical Chip Implementation 233

234 Assign Buffers June 22 DAC2 - Physical Chip Implementation 234

235 Comments about Buffering and Wire Sizing: Extremely critical: One of the highest leverage timing closure items There are extended provably correct algorithms for dealing with the problem. Steiner route & Blockage avoidance are mostly heuristic: Hot research area! June 22 DAC2 - Physical Chip Implementation 235

236 Section Outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing Optimization June 22 DAC2 - Physical Chip Implementation 236

237 Congestion Mitigation June 22 DAC2 - Physical Chip Implementation 237

238 Sources of Congestion Placement Quality: Do we have a good relative ordering of cells? Placement Density: Do we have appropriate cell spreading? Preplacement of large cells: Is there a better location for these cells? Floorplan quality: Is this a good floorplan / hierarchy? Netlist complexity: Are some logic groupings inherently difficult to route Library characteristics: Do some cells block too much metal internally? June 22 DAC2 - Physical Chip Implementation 238

239 Congestion Mitigation Constructive Avoidance control global placement pin density: fewer pins per unit area means fewer wires per unit area monitor congestion during placement and perform dynamic spreading Post placement fix up remove problems from an already placed netlist June 22 DAC2 - Physical Chip Implementation 239

240 Constructive Avoidance: Characteristics: - as placement is formed, take action to avoid problems - between each step of the placement progression there is the potential to evaluate congestion and take action Groute / Spread / Redo Groute / Spread / Redo Groute / Spread / Redo etc June 22 DAC2 - Physical Chip Implementation 24

241 Constructive Avoidance Deficiencies: Depends on early estimates of congestion that may not be accurate enough to avoid all problems Post placement actions such as clock tree insertion, repowering, buffering, etc may add congestion too the design Guard banding with conservative constructive avoidance causes lose of performance and density June 22 DAC2 - Physical Chip Implementation 24

242 Post Placement Congestion Mitigation Use production global router, not internal placement based global router Translate congestion values into density targets for placement regions Perform flow based circuit spreading Preserve relative logic ordering of cells June 22 DAC2 - Physical Chip Implementation 242

243 Network Flow based Spreading Min-cost max-flow formulation i j s t i if b(i) >, Cap(esi) = b(i) Cost(esi) = Supply Nodes Demand Nodes b(i) > b(j) < i s, j t, Cap(eij) = Infinity (Large Int) Cost(eij) = K j if b(j) <, Cap(ejt) = -b(j) Cost(ejt) = June 22 DAC2 - Physical Chip Implementation 243

244 Congestion Driven Circuit Spreading Initial Placement Calculate bin level congestion Translate bin score to bin target density Network flow based circuit spreading No Is Congestion below threshold? Yes Final Placement June 22 DAC2 - Physical Chip Implementation 244

245 June 22 DAC2 - Physical Chip Implementation 245

246 June 22 DAC2 - Physical Chip Implementation 246

247 June 22 DAC2 - Physical Chip Implementation 247

248 June 22 DAC2 - Physical Chip Implementation 248

249 We ve Talked About Placement algorithms Placement / Synthesis interaction Placement aware synthesis techniques The Constant Delay paradigm Physical Buffer insertion / Wire sizing Congestion Mitigation June 22 DAC2 - Physical Chip Implementation 249

250 Let s Look at some Examples: June 22 DAC2 - Physical Chip Implementation 25

251 Pure MLP Quadratic June 22 DAC2 - Physical Chip Implementation 25

252 Optimization Results buffer shatter clone fanin June 22 DAC2 - Physical Chip Implementation 252

253 Optimization Results June 22 DAC2 - Physical Chip Implementation 253

254 Optimization Results June 22 DAC2 - Physical Chip Implementation 254

255 Section outline Introduction Review material (timing and synthesis) Introduction to placement Placement algorithms Paradigms for placement-synthesis integration Placement aware synthesis techniques Congestion avoidance / mitigation techniques Routing optimization June 22 DAC2 - Physical Chip Implementation 255

256 Routing Based Optimization: RBO (IBM) June 22 DAC2 - Physical Chip Implementation 256

257 Routing based Timing Closure Issues Post Routing timing problems can be significant affect design schedule may be too numerous to fix manually Increasing design density can reduce cost, but it also increases wiring congestion timing and signal integrity become more significant available resource for manual fixup is limited without automation may not be doable Rerouting with constraints may resolve some of the problems, but this process is slow June 22 DAC2 - Physical Chip Implementation 257

258 Solution: Integrate global routing, detailed routing and timing correction Global routing is efficient enough to be run in an iterative timing closure loop Timing critical nets avoid scenic routes Non-critical nets that go scenic can be repowered and buffered prior to detailed routing June 22 DAC2 - Physical Chip Implementation 258

259 Example Problem: critical paths non-critical paths ideal "Steiner" routes Timing deficient wiring solution Force optimal use of wiring resource (e.g. critical paths get direct route) PDS Timing uses steiner wires - fast Post PD Timing Catches this problem: Slow! Timing driven wiring solution RBO Timing Driven Routing sees this during global route stage: Fast! June 22 DAC2 - Physical Chip Implementation 259

260 Global Routing Divides the entire chip into localized rectangular regions called tiles. Compress several pin location in each tile to a single pin location All the shapes, wires and open are represented in terms of global track capacity and usage. June 22 DAC2 - Physical Chip Implementation 26

261 Global Routing Two step approach Create the initial steiner routes Compute the edge congestion's on the grid Perform a rip-up reroute using shortest path algorithm to reduce the overall congestion of the design Advantages Can communicate with detail router Good correlation with final detail routing solution June 22 DAC2 - Physical Chip Implementation 26

262 Routing Based Optimization Current Methodology RBO Methodology Physical Synthesis XrGlobal Physical Synthesis Global Routing Extractor Optimizer RBO / Physical Synthesis Einstimer Detailed Routing Detailed Routing Timing Analysis No Timing Criticality for Global router Costly Manual Timing Correction Analysis June 22 DAC2 - Physical Chip Implementation 262

263 RBO Extraction Process Very fast Excellent correlation with final 3D extraction Uses global routes for extraction Neighbor information probabilistically determined based on the global routing congestion information Based on extraction tables Probability of having a neighbor = (#OccupiedTracks)/(#Capacity) = 2/4 =.5 Capacity of All Edges = 5 June 22 DAC2 - Physical Chip Implementation 263

264 RBO results on memcntl Design : Example Nets : ~.6M Size : 2393 x 2393 Congestion : Attached is a display of Global congestion June 22 DAC2 - Physical Chip Implementation 264

265 Timing Critical Nets: Without RBO June 22 DAC2 - Physical Chip Implementation 265

266 Nets Routed with RBO flow June 22 DAC2 - Physical Chip Implementation 266

267 RBO Results - Example Worst Slack #Slack Violations #Cap Violations #Slew Violations #Opens #Loops Steiner Estimates XrLocal without RBO RBO Timing Closure (Global Routes) Detailed routing with RBO June 22 DAC2 - Physical Chip Implementation 267

268 Example 2: - Critical Net Routed Without RBO June 22 DAC2 - Physical Chip Implementation 268

269 Example 2: Critical Net Routed With RBO June 22 DAC2 - Physical Chip Implementation 269

ECE260B CSE241A Winter Placement

ECE260B CSE241A Winter Placement ECE260B CSE241A Winter 2005 Placement Website: / courses/ ece260b- w05 ECE260B CSE241A Placement.1 Slides courtesy of Prof. Andrew B. Slides courtesy of Prof. Andrew B. Kahng VLSI Design Flow and Physical

More information

Comprehensive Place-and-Route Platform Olympus-SoC

Comprehensive Place-and-Route Platform Olympus-SoC Comprehensive Place-and-Route Platform Olympus-SoC Digital IC Design D A T A S H E E T BENEFITS: Olympus-SoC is a comprehensive netlist-to-gdsii physical design implementation platform. Solving Advanced

More information

Physical Design Closure

Physical Design Closure Physical Design Closure Olivier Coudert Monterey Design System DAC 2000 DAC2000 (C) Monterey Design Systems 1 DSM Dilemma SOC Time to market Million gates High density, larger die Higher clock speeds Long

More information

Linking Layout to Logic Synthesis: A Unification-Based Approach

Linking Layout to Logic Synthesis: A Unification-Based Approach Linking Layout to Logic Synthesis: A Unification-Based Approach Massoud Pedram Department of EE-Systems University of Southern California Los Angeles, CA February 1998 Outline Introduction Technology and

More information

(Lec 14) Placement & Partitioning: Part III

(Lec 14) Placement & Partitioning: Part III Page (Lec ) Placement & Partitioning: Part III What you know That there are big placement styles: iterative, recursive, direct Placement via iterative improvement using simulated annealing Recursive-style

More information

Very Large Scale Integration (VLSI)

Very Large Scale Integration (VLSI) Very Large Scale Integration (VLSI) Lecture 6 Dr. Ahmed H. Madian Ah_madian@hotmail.com Dr. Ahmed H. Madian-VLSI 1 Contents FPGA Technology Programmable logic Cell (PLC) Mux-based cells Look up table PLA

More information

Digital VLSI Design. Lecture 7: Placement

Digital VLSI Design. Lecture 7: Placement Digital VLSI Design Lecture 7: Placement Semester A, 2016-17 Lecturer: Dr. Adam Teman 29 December 2016 Disclaimer: This course was prepared, in its entirety, by Adam Teman. Many materials were copied from

More information

Cluster-based approach eases clock tree synthesis

Cluster-based approach eases clock tree synthesis Page 1 of 5 EE Times: Design News Cluster-based approach eases clock tree synthesis Udhaya Kumar (11/14/2005 9:00 AM EST) URL: http://www.eetimes.com/showarticle.jhtml?articleid=173601961 Clock network

More information

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid

An Introduction to FPGA Placement. Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR An Introduction to FPGA Placement Yonghong Xu Supervisor: Dr. Khalid RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS UNIVERSITY OF WINDSOR

More information

Eliminating Routing Congestion Issues with Logic Synthesis

Eliminating Routing Congestion Issues with Logic Synthesis Eliminating Routing Congestion Issues with Logic Synthesis By Mike Clarke, Diego Hammerschlag, Matt Rardon, and Ankush Sood Routing congestion, which results when too many routes need to go through an

More information

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling

Fast Dual-V dd Buffering Based on Interconnect Prediction and Sampling Based on Interconnect Prediction and Sampling Yu Hu King Ho Tam Tom Tong Jing Lei He Electrical Engineering Department University of California at Los Angeles System Level Interconnect Prediction (SLIP),

More information

Basic Idea. The routing problem is typically solved using a twostep

Basic Idea. The routing problem is typically solved using a twostep Global Routing Basic Idea The routing problem is typically solved using a twostep approach: Global Routing Define the routing regions. Generate a tentative route for each net. Each net is assigned to a

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 URL: http://cadlab.cs.ucla.edu/~cong Exponential Device

More information

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS

POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS POWER PERFORMANCE OPTIMIZATION METHODS FOR DIGITAL CIRCUITS Radu Zlatanovici zradu@eecs.berkeley.edu http://www.eecs.berkeley.edu/~zradu Department of Electrical Engineering and Computer Sciences University

More information

L14 - Placement and Routing

L14 - Placement and Routing L14 - Placement and Routing Ajay Joshi Massachusetts Institute of Technology RTL design flow HDL RTL Synthesis manual design Library/ module generators netlist Logic optimization a b 0 1 s d clk q netlist

More information

Place and Route for FPGAs

Place and Route for FPGAs Place and Route for FPGAs 1 FPGA CAD Flow Circuit description (VHDL, schematic,...) Synthesize to logic blocks Place logic blocks in FPGA Physical design Route connections between logic blocks FPGA programming

More information

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments

8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments 8. Best Practices for Incremental Compilation Partitions and Floorplan Assignments QII51017-9.0.0 Introduction The Quartus II incremental compilation feature allows you to partition a design, compile partitions

More information

Spiral 2-8. Cell Layout

Spiral 2-8. Cell Layout 2-8.1 Spiral 2-8 Cell Layout 2-8.2 Learning Outcomes I understand how a digital circuit is composed of layers of materials forming transistors and wires I understand how each layer is expressed as geometric

More information

FastPlace 2.0: An Efficient Analytical Placer for Mixed- Mode Designs

FastPlace 2.0: An Efficient Analytical Placer for Mixed- Mode Designs FastPlace.0: An Efficient Analytical Placer for Mixed- Mode Designs Natarajan Viswanathan Min Pan Chris Chu Iowa State University ASP-DAC 006 Work supported by SRC under Task ID: 106.001 Mixed-Mode Placement

More information

Chapter 5 Global Routing

Chapter 5 Global Routing Chapter 5 Global Routing 5. Introduction 5.2 Terminology and Definitions 5.3 Optimization Goals 5. Representations of Routing Regions 5.5 The Global Routing Flow 5.6 Single-Net Routing 5.6. Rectilinear

More information

Introduction VLSI PHYSICAL DESIGN AUTOMATION

Introduction VLSI PHYSICAL DESIGN AUTOMATION VLSI PHYSICAL DESIGN AUTOMATION PROF. INDRANIL SENGUPTA DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Introduction Main steps in VLSI physical design 1. Partitioning and Floorplanning l 2. Placement 3.

More information

CSE241 VLSI Digital Circuits UC San Diego

CSE241 VLSI Digital Circuits UC San Diego CSE241 VLSI Digital Circuits UC San Diego Winter 2003 Lecture 05: Logic Synthesis Cho Moon Cadence Design Systems January 21, 2003 CSE241 L5 Synthesis.1 Kahng & Cichy, UCSD 2003 Outline Introduction Two-level

More information

CAD Flow for FPGAs Introduction

CAD Flow for FPGAs Introduction CAD Flow for FPGAs Introduction What is EDA? o EDA Electronic Design Automation or (CAD) o Methodologies, algorithms and tools, which assist and automatethe design, verification, and testing of electronic

More information

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University Outline Introduction Problem Formulation Algorithm -

More information

ECE260B CSE241A Winter Logic Synthesis

ECE260B CSE241A Winter Logic Synthesis ECE260B CSE241A Winter 2007 Logic Synthesis Website: /courses/ece260b-w07 ECE 260B CSE 241A Static Timing Analysis 1 Slides courtesy of Dr. Cho Moon Introduction Why logic synthesis? Ubiquitous used almost

More information

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure

Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Clock Tree Resynthesis for Multi-corner Multi-mode Timing Closure Subhendu Roy 1, Pavlos M. Mattheakis 2, Laurent Masse-Navette 2 and David Z. Pan 1 1 ECE Department, The University of Texas at Austin

More information

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface.

Introduction. A very important step in physical design cycle. It is the process of arranging a set of modules on the layout surface. Placement Introduction A very important step in physical design cycle. A poor placement requires larger area. Also results in performance degradation. It is the process of arranging a set of modules on

More information

PERFORMANCE optimization has always been a critical

PERFORMANCE optimization has always been a critical IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 11, NOVEMBER 1999 1633 Buffer Insertion for Noise and Delay Optimization Charles J. Alpert, Member, IEEE, Anirudh

More information

Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design

Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design Silicon Virtual Prototyping: The New Cockpit for Nanometer Chip Design Wei-Jin Dai, Dennis Huang, Chin-Chih Chang, Michel Courtoy Cadence Design Systems, Inc. Abstract A design methodology for the implementation

More information

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets.

Problem Formulation. Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Clock Routing Problem Formulation Specialized algorithms are required for clock (and power nets) due to strict specifications for routing such nets. Better to develop specialized routers for these nets.

More information

Making Fast Buffer Insertion Even Faster Via Approximation Techniques

Making Fast Buffer Insertion Even Faster Via Approximation Techniques 1A-3 Making Fast Buffer Insertion Even Faster Via Approximation Techniques Zhuo Li 1,C.N.Sze 1, Charles J. Alpert 2, Jiang Hu 1, and Weiping Shi 1 1 Dept. of Electrical Engineering, Texas A&M University,

More information

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Lab #2: Layout and Simulation

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Lab #2: Layout and Simulation UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Lab #2: Layout and Simulation NTU IC541CA 1 Assumed Knowledge This lab assumes use of the Electric

More information

How Much Logic Should Go in an FPGA Logic Block?

How Much Logic Should Go in an FPGA Logic Block? How Much Logic Should Go in an FPGA Logic Block? Vaughn Betz and Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto Toronto, Ontario, Canada M5S 3G4 {vaughn, jayar}@eecgutorontoca

More information

Chapter 5: ASICs Vs. PLDs

Chapter 5: ASICs Vs. PLDs Chapter 5: ASICs Vs. PLDs 5.1 Introduction A general definition of the term Application Specific Integrated Circuit (ASIC) is virtually every type of chip that is designed to perform a dedicated task.

More information

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public

Reduce Your System Power Consumption with Altera FPGAs Altera Corporation Public Reduce Your System Power Consumption with Altera FPGAs Agenda Benefits of lower power in systems Stratix III power technology Cyclone III power Quartus II power optimization and estimation tools Summary

More information

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips

Overview. CSE372 Digital Systems Organization and Design Lab. Hardware CAD. Two Types of Chips Overview CSE372 Digital Systems Organization and Design Lab Prof. Milo Martin Unit 5: Hardware Synthesis CAD (Computer Aided Design) Use computers to design computers Virtuous cycle Architectural-level,

More information

Variation Tolerant Buffered Clock Network Synthesis with Cross Links

Variation Tolerant Buffered Clock Network Synthesis with Cross Links Variation Tolerant Buffered Clock Network Synthesis with Cross Links Anand Rajaram David Z. Pan Dept. of ECE, UT-Austin Texas Instruments, Dallas Sponsored by SRC and IBM Faculty Award 1 Presentation Outline

More information

Sungmin Bae, Hyung-Ock Kim, Jungyun Choi, and Jaehong Park. Design Technology Infrastructure Design Center System-LSI Business Division

Sungmin Bae, Hyung-Ock Kim, Jungyun Choi, and Jaehong Park. Design Technology Infrastructure Design Center System-LSI Business Division Sungmin Bae, Hyung-Ock Kim, Jungyun Choi, and Jaehong Park Design Technology Infrastructure Design Center System-LSI Business Division 1. Motivation 2. Design flow 3. Parallel multiplier 4. Coarse-grained

More information

EEL 4783: HDL in Digital System Design

EEL 4783: HDL in Digital System Design EEL 4783: HDL in Digital System Design Lecture 13: Floorplanning Prof. Mingjie Lin Topics Partitioning a design with a floorplan. Performance improvements by constraining the critical path. Floorplanning

More information

Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas

Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas Technology Dependent Logic Optimization Prof. Kurt Keutzer EECS University of California Berkeley, CA Thanks to S. Devadas 1 RTL Design Flow HDL RTL Synthesis Manual Design Module Generators Library netlist

More information

An Interconnect-Centric Design Flow for Nanometer. Technologies

An Interconnect-Centric Design Flow for Nanometer. Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong Department of Computer Science University of California, Los Angeles, CA 90095 Abstract As the integrated circuits (ICs) are scaled

More information

101-1 Under-Graduate Project Digital IC Design Flow

101-1 Under-Graduate Project Digital IC Design Flow 101-1 Under-Graduate Project Digital IC Design Flow Speaker: Ming-Chun Hsiao Adviser: Prof. An-Yeu Wu Date: 2012/9/25 ACCESS IC LAB Outline Introduction to Integrated Circuit IC Design Flow Verilog HDL

More information

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering

Reference. Wayne Wolf, FPGA-Based System Design Pearson Education, N Krishna Prakash,, Amrita School of Engineering FPGA Fabrics Reference Wayne Wolf, FPGA-Based System Design Pearson Education, 2004 Logic Design Process Combinational logic networks Functionality. Other requirements: Size. Power. Primary inputs Performance.

More information

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions

Overview. Design flow. Principles of logic synthesis. Logic Synthesis with the common tools. Conclusions Logic Synthesis Overview Design flow Principles of logic synthesis Logic Synthesis with the common tools Conclusions 2 System Design Flow Electronic System Level (ESL) flow System C TLM, Verification,

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 1 Lecture 10: Repeater (Buffer) Insertion Introduction to Buffering Buffer Insertion

More information

Routing. Robust Channel Router. Figures taken from S. Gerez, Algorithms for VLSI Design Automation, Wiley, 1998

Routing. Robust Channel Router. Figures taken from S. Gerez, Algorithms for VLSI Design Automation, Wiley, 1998 Routing Robust Channel Router Figures taken from S. Gerez, Algorithms for VLSI Design Automation, Wiley, 1998 Channel Routing Algorithms Previous algorithms we considered only work when one of the types

More information

Unification of Partitioning, Placement and Floorplanning. Saurabh N. Adya, Shubhyant Chaturvedi, Jarrod A. Roy, David A. Papa, and Igor L.

Unification of Partitioning, Placement and Floorplanning. Saurabh N. Adya, Shubhyant Chaturvedi, Jarrod A. Roy, David A. Papa, and Igor L. Unification of Partitioning, Placement and Floorplanning Saurabh N. Adya, Shubhyant Chaturvedi, Jarrod A. Roy, David A. Papa, and Igor L. Markov Outline Introduction Comparisons of classical techniques

More information

SYNTHESIS FOR ADVANCED NODES

SYNTHESIS FOR ADVANCED NODES SYNTHESIS FOR ADVANCED NODES Abhijeet Chakraborty Janet Olson SYNOPSYS, INC ISPD 2012 Synopsys 2012 1 ISPD 2012 Outline Logic Synthesis Evolution Technology and Market Trends The Interconnect Challenge

More information

Delay Estimation for Technology Independent Synthesis

Delay Estimation for Technology Independent Synthesis Delay Estimation for Technology Independent Synthesis Yutaka TAMIYA FUJITSU LABORATORIES LTD. 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, JAPAN, 211-88 Tel: +81-44-754-2663 Fax: +81-44-754-2664 E-mail:

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

AN 567: Quartus II Design Separation Flow

AN 567: Quartus II Design Separation Flow AN 567: Quartus II Design Separation Flow June 2009 AN-567-1.0 Introduction This application note assumes familiarity with the Quartus II incremental compilation flow and floorplanning with the LogicLock

More information

ABC basics (compilation from different articles)

ABC basics (compilation from different articles) 1. AIG construction 2. AIG optimization 3. Technology mapping ABC basics (compilation from different articles) 1. BACKGROUND An And-Inverter Graph (AIG) is a directed acyclic graph (DAG), in which a node

More information

VLSI Physical Design: From Graph Partitioning to Timing Closure

VLSI Physical Design: From Graph Partitioning to Timing Closure VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5 Global Routing Original uthors: ndrew. Kahng, Jens, Igor L. Markov, Jin Hu VLSI Physical Design: From Graph Partitioning to Timing

More information

Topics. ! PLAs.! Memories: ! Datapaths.! Floor Planning ! ROM;! SRAM;! DRAM. Modern VLSI Design 2e: Chapter 6. Copyright 1994, 1998 Prentice Hall

Topics. ! PLAs.! Memories: ! Datapaths.! Floor Planning ! ROM;! SRAM;! DRAM. Modern VLSI Design 2e: Chapter 6. Copyright 1994, 1998 Prentice Hall Topics! PLAs.! Memories:! ROM;! SRAM;! DRAM.! Datapaths.! Floor Planning Programmable logic array (PLA)! Used to implement specialized logic functions.! A PLA decodes only some addresses (input values);

More information

EE-382M VLSI II. Early Design Planning: Front End

EE-382M VLSI II. Early Design Planning: Front End EE-382M VLSI II Early Design Planning: Front End Mark McDermott EE 382M-8 VLSI-2 Page Foil # 1 1 EDP Objectives Get designers thinking about physical implementation while doing the architecture design.

More information

Buffered Routing Tree Construction Under Buffer Placement Blockages

Buffered Routing Tree Construction Under Buffer Placement Blockages Buffered Routing Tree Construction Under Buffer Placement Blockages Abstract Interconnect delay has become a critical factor in determining the performance of integrated circuits. Routing and buffering

More information

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS)

ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS) ESE 570 Cadence Lab Assignment 2: Introduction to Spectre, Manual Layout Drawing and Post Layout Simulation (PLS) Objective Part A: To become acquainted with Spectre (or HSpice) by simulating an inverter,

More information

Best Practices for Incremental Compilation Partitions and Floorplan Assignments

Best Practices for Incremental Compilation Partitions and Floorplan Assignments Best Practices for Incremental Compilation Partitions and Floorplan Assignments December 2007, ver. 1.0 Application Note 470 Introduction The Quartus II incremental compilation feature allows you to partition

More information

VLSI Physical Design: From Graph Partitioning to Timing Closure

VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 5 Global Routing Original uthors: ndrew. Kahng, Jens, Igor L. Markov, Jin Hu Chapter 5 Global Routing 5. Introduction 5.2 Terminology and Definitions 5.3 Optimization Goals 5. Representations of

More information

On GPU Bus Power Reduction with 3D IC Technologies

On GPU Bus Power Reduction with 3D IC Technologies On GPU Bus Power Reduction with 3D Technologies Young-Joon Lee and Sung Kyu Lim School of ECE, Georgia Institute of Technology, Atlanta, Georgia, USA yjlee@gatech.edu, limsk@ece.gatech.edu Abstract The

More information

CAD Algorithms. Placement and Floorplanning

CAD Algorithms. Placement and Floorplanning CAD Algorithms Placement Mohammad Tehranipoor ECE Department 4 November 2008 1 Placement and Floorplanning Layout maps the structural representation of circuit into a physical representation Physical representation:

More information

CS612 Algorithms for Electronic Design Automation. Global Routing

CS612 Algorithms for Electronic Design Automation. Global Routing CS612 Algorithms for Electronic Design Automation Global Routing Mustafa Ozdal CS 612 Lecture 7 Mustafa Ozdal Computer Engineering Department, Bilkent University 1 MOST SLIDES ARE FROM THE BOOK: MODIFICATIONS

More information

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices

Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Automated Extraction of Physical Hierarchies for Performance Improvement on Programmable Logic Devices Deshanand P. Singh Altera Corporation dsingh@altera.com Terry P. Borer Altera Corporation tborer@altera.com

More information

Physical Synthesis Methodology for High Performance Microprocessors

Physical Synthesis Methodology for High Performance Microprocessors 41.1 Physical Synthesis Methodology for High Performance Microprocessors Yiu-Hing Chan Prabhakar Kudva Lisa Lacey Greg Northrop Thomas Rosser IBM Server Group IBM TJ Watson Research Center IBM Server Group

More information

EE582 Physical Design Automation of VLSI Circuits and Systems

EE582 Physical Design Automation of VLSI Circuits and Systems EE582 Prof. Dae Hyun Kim School of Electrical Engineering and Computer Science Washington State University Preliminaries Table of Contents Semiconductor manufacturing Problems to solve Algorithm complexity

More information

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment

Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Symmetrical Buffered Clock-Tree Synthesis with Supply-Voltage Alignment Xin-Wei Shih, Tzu-Hsuan Hsu, Hsu-Chieh Lee, Yao-Wen Chang, Kai-Yuan Chao 2013.01.24 1 Outline 2 Clock Network Synthesis Clock network

More information

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations

Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations XXVII IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, OCTOBER 5, 2009 Symmetrical Buffer Placement in Clock Trees for Minimal Skew Immune to Global On-chip Variations Renshen Wang 1 Takumi Okamoto 2

More information

June 2003, ver. 1.2 Application Note 198

June 2003, ver. 1.2 Application Note 198 Timing Closure with the Quartus II Software June 2003, ver. 1.2 Application Note 198 Introduction With FPGA designs surpassing the multimillion-gate mark, designers need advanced tools to better address

More information

Unit 5A: Circuit Partitioning

Unit 5A: Circuit Partitioning Course contents: Unit 5A: Circuit Partitioning Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic Simulated annealing based partitioning algorithm Readings Chapter 7.5 Unit 5A 1 Course

More information

Advanced multi-patterning and hybrid lithography techniques. Fedor G Pikus, J. Andres Torres

Advanced multi-patterning and hybrid lithography techniques. Fedor G Pikus, J. Andres Torres Advanced multi-patterning and hybrid lithography techniques Fedor G Pikus, J. Andres Torres Outline Need for advanced patterning technologies Multipatterning (MP) technologies What is multipatterning?

More information

Time Algorithm for Optimal Buffer Insertion with b Buffer Types *

Time Algorithm for Optimal Buffer Insertion with b Buffer Types * An Time Algorithm for Optimal Buffer Insertion with b Buffer Types * Zhuo Dept. of Electrical Engineering Texas University College Station, Texas 77843, USA. zhuoli@ee.tamu.edu Weiping Shi Dept. of Electrical

More information

Planning for Local Net Congestion in Global Routing

Planning for Local Net Congestion in Global Routing Planning for Local Net Congestion in Global Routing Hamid Shojaei, Azadeh Davoodi, and Jeffrey Linderoth* Department of Electrical and Computer Engineering *Department of Industrial and Systems Engineering

More information

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline

An Interconnect-Centric Design Flow for Nanometer Technologies. Outline An Interconnect-Centric Design Flow for Nanometer Technologies Jason Cong UCLA Computer Science Department Email: cong@cs.ucla.edu Tel: 310-206-2775 http://cadlab.cs.ucla.edu/~cong Outline Global interconnects

More information

Reducing Power in an FPGA via Computer-Aided Design

Reducing Power in an FPGA via Computer-Aided Design Reducing Power in an FPGA via Computer-Aided Design Steve Wilton University of British Columbia Power Reduction via CAD How to reduce power dissipation in an FPGA: - Create power-aware CAD tools - Create

More information

Effects of Specialized Clock Routing on Clock Tree Timing, Signal Integrity, and Routing Congestion

Effects of Specialized Clock Routing on Clock Tree Timing, Signal Integrity, and Routing Congestion Effects of Specialized Clock Routing on Clock Tree Timing, Signal Jesse Craig IBM Systems & Technology Group jecraig@us.ibm.com Denise Powell Synopsys, Inc. dpowell@synopsys.com ABSTRACT Signal integrity

More information

High Quality, Low Cost Test

High Quality, Low Cost Test Datasheet High Quality, Low Cost Test Overview is a comprehensive synthesis-based test solution for compression and advanced design-for-test that addresses the cost challenges of testing complex designs.

More information

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA

A Path Based Algorithm for Timing Driven. Logic Replication in FPGA A Path Based Algorithm for Timing Driven Logic Replication in FPGA By Giancarlo Beraudo B.S., Politecnico di Torino, Torino, 2001 THESIS Submitted as partial fulfillment of the requirements for the degree

More information

AS VLSI technology scales to deep submicron and beyond, interconnect

AS VLSI technology scales to deep submicron and beyond, interconnect IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL., NO. 1 TILA-S: Timing-Driven Incremental Layer Assignment Avoiding Slew Violations Derong Liu, Bei Yu Member, IEEE, Salim

More information

Effects of FPGA Architecture on FPGA Routing

Effects of FPGA Architecture on FPGA Routing Effects of FPGA Architecture on FPGA Routing Stephen Trimberger Xilinx, Inc. 2100 Logic Drive San Jose, CA 95124 USA steve@xilinx.com Abstract Although many traditional Mask Programmed Gate Array (MPGA)

More information

ECE 5745 Complex Digital ASIC Design Topic 13: Physical Design Automation Algorithms

ECE 5745 Complex Digital ASIC Design Topic 13: Physical Design Automation Algorithms ECE 7 Complex Digital ASIC Design Topic : Physical Design Automation Algorithms Christopher atten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece7

More information

Compiler User Guide. Intel Quartus Prime Pro Edition. Updated for Intel Quartus Prime Design Suite: Subscribe Send Feedback

Compiler User Guide. Intel Quartus Prime Pro Edition. Updated for Intel Quartus Prime Design Suite: Subscribe Send Feedback Compiler User Guide Intel Quartus Prime Pro Edition Updated for Intel Quartus Prime Design Suite: 18.0 Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1. Design Compilation...

More information

Partitioning. Course contents: Readings. Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic. Chapter 7.5.

Partitioning. Course contents: Readings. Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic. Chapter 7.5. Course contents: Partitioning Kernighang-Lin partitioning heuristic Fiduccia-Mattheyses heuristic Readings Chapter 7.5 Partitioning 1 Basic Definitions Cell: a logic block used to build larger circuits.

More information

Intel Quartus Prime Pro Edition User Guide

Intel Quartus Prime Pro Edition User Guide Intel Quartus Prime Pro Edition User Guide Design Compilation Updated for Intel Quartus Prime Design Suite: 18.1 Subscribe Latest document on the web: PDF HTML Contents Contents 1. Design Compilation...

More information

ASIC, Customer-Owned Tooling, and Processor Design

ASIC, Customer-Owned Tooling, and Processor Design ASIC, Customer-Owned Tooling, and Processor Design Design Style Myths That Lead EDA Astray Nancy Nettleton Manager, VLSI ASIC Device Engineering April 2000 Design Style Myths COT is a design style that

More information

CAD Algorithms. Circuit Partitioning

CAD Algorithms. Circuit Partitioning CAD Algorithms Partitioning Mohammad Tehranipoor ECE Department 13 October 2008 1 Circuit Partitioning Partitioning: The process of decomposing a circuit/system into smaller subcircuits/subsystems, which

More information

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006

Physical Design of Digital Integrated Circuits (EN0291 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Physical Design of Digital Integrated Circuits (EN029 S40) Sherief Reda Division of Engineering, Brown University Fall 2006 Lecture 09: Routing Introduction to Routing Global Routing Detailed Routing 2

More information

Design Compiler Graphical Create a Better Starting Point for Faster Physical Implementation

Design Compiler Graphical Create a Better Starting Point for Faster Physical Implementation Datasheet Create a Better Starting Point for Faster Physical Implementation Overview Continuing the trend of delivering innovative synthesis technology, Design Compiler Graphical streamlines the flow for

More information

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers

Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Addressing Verification Bottlenecks of Fully Synthesized Processor Cores using Equivalence Checkers Subash Chandar G (g-chandar1@ti.com), Vaideeswaran S (vaidee@ti.com) DSP Design, Texas Instruments India

More information

An Exact Algorithm for the Statistical Shortest Path Problem

An Exact Algorithm for the Statistical Shortest Path Problem An Exact Algorithm for the Statistical Shortest Path Problem Liang Deng and Martin D. F. Wong Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign Outline Motivation

More information

Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool

Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool White Paper Static Timing Verification of Custom Blocks Using Synopsys NanoTime Tool September 2009 Author Dr. Larry G. Jones, Implementation Group, Synopsys, Inc. Introduction With the continued evolution

More information

An Interconnect-Centric Design Flow for Nanometer Technologies

An Interconnect-Centric Design Flow for Nanometer Technologies An Interconnect-Centric Design Flow for Nanometer Technologies Professor Jason Cong UCLA Computer Science Department Los Angeles, CA 90095 http://cadlab.cs.ucla.edu/~ /~cong

More information

Thermal-Aware 3D IC Physical Design and Architecture Exploration

Thermal-Aware 3D IC Physical Design and Architecture Exploration Thermal-Aware 3D IC Physical Design and Architecture Exploration Jason Cong & Guojie Luo UCLA Computer Science Department cong@cs.ucla.edu http://cadlab.cs.ucla.edu/~cong Supported by DARPA Outline Thermal-Aware

More information

VLSI Physical Design: From Graph Partitioning to Timing Closure

VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 8 Timing Closure VLSI Physical Design: From Graph Partitioning to Timing Closure Original Authors: Andrew B. Kahng, Jens, Igor L. Markov, Jin Hu VLSI Physical Design: From Graph Partitioning to

More information

Digital Design Methodology (Revisited) Design Methodology: Big Picture

Digital Design Methodology (Revisited) Design Methodology: Big Picture Digital Design Methodology (Revisited) Design Methodology Design Specification Verification Synthesis Technology Options Full Custom VLSI Standard Cell ASIC FPGA CS 150 Fall 2005 - Lec #25 Design Methodology

More information

Lecture 8: Synthesis, Implementation Constraints and High-Level Planning

Lecture 8: Synthesis, Implementation Constraints and High-Level Planning Lecture 8: Synthesis, Implementation Constraints and High-Level Planning MAH, AEN EE271 Lecture 8 1 Overview Reading Synopsys Verilog Guide WE 6.3.5-6.3.6 (gate array, standard cells) Introduction We have

More information

Preclass Warmup. ESE535: Electronic Design Automation. Motivation (1) Today. Bisection Width. Motivation (2)

Preclass Warmup. ESE535: Electronic Design Automation. Motivation (1) Today. Bisection Width. Motivation (2) ESE535: Electronic Design Automation Preclass Warmup What cut size were you able to achieve? Day 4: January 28, 25 Partitioning (Intro, KLFM) 2 Partitioning why important Today Can be used as tool at many

More information

HIPEX Full-Chip Parasitic Extraction. Summer 2004 Status

HIPEX Full-Chip Parasitic Extraction. Summer 2004 Status HIPEX Full-Chip Parasitic Extraction Summer 2004 Status What is HIPEX? HIPEX Full-Chip Parasitic Extraction products perform 3D-accurate and 2D-fast extraction of parasitic capacitors and resistors from

More information

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence

Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Floorplan and Power/Ground Network Co-Synthesis for Fast Design Convergence Chen-Wei Liu 12 and Yao-Wen Chang 2 1 Synopsys Taiwan Limited 2 Department of Electrical Engineering National Taiwan University,

More information

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance

More information