PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University

Size: px

Start display at page:

Download "PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO. IRIS Lab National Chiao Tung University"

Jade Jefferson
5 years ago
Views:

1 PushPull: Short Path Padding for Timing Error Resilient Circuits YU-MING YANG IRIS HUI-RU JIANG SUNG-TING HO IRIS Lab National Chiao Tung University

2 Outline Introduction Problem Formulation Algorithm - PushPull Experimental Results Conclusion

3 Outline 3 Introduction Problem Formulation Algorithm - PushPull Experimental Results Conclusion

4 Introduction 4 Timing characterization is difficult due to a wide range of dynamic variations Supply voltage droops, process variations, etc. A timing guardband is reserved to ensure correct functionality Degrade circuit performance, i.e., limit the clock frequency clk1 Timing guardband clk Timing with normal condition Timing with dynamic variation T 1 T

5 Resilient Circuit 5 Eliminate the guardband by error detection and correction Ex: Razor flip-flop (one error-detection circuit) Correction through instruction replay Cons: short path issue error detection window clk clk_del Data error 0 1 FF Latch D. Ernst et al., Razor: a low-power pipeline based on circuit-level timing speculation MICRO, 003

significant hold time margin for short paths Focus on short path

6 Short Path Issue 6 Short paths should exceed the error detection window Otherwise, may detect false timing errors Require a significant hold time margin for short paths Focus on short path padding in this paper error detection window clk clk_del Data error 1 0 False timing error

7 Previous Work 7 Determine the padding delay path by path Combine with clock skew scheduling to minimize the clock period at logic synthesis stage J.P. Fishburn, IEEE TC, 1990 R.B. Deokar and S.S. Sapatnekar, ISCAS, 1994 S.H. Huang et al., DAC, 007 Determine the gate/wire to insert padding delay Solve by linear programming N. V. Shenoy et al., ICCAD, 1993 Heuristics Pad the gate with the largest setup slack US Patent 6,990,646,006 and 7,78,16,007 Pad the gate passed by most hold violating paths Y. Liu et al., DAC, 011

8 Short Path Padding Example 8 A=0.6 a=0.3 FF 1 i 1 (1.0/0.7/0.3, 0.4/0.1/-0.3) (1./0.8/0.4, 0.5/0./-0.3) R=1.0 r=0.3 R=1. r=0.5 i 3 1 FF (1.1/0.1/1.0, 0.4/0.1/-0.3) o 1 The largest setup slack FF 1 (1.0/0.7/0.3, (0.8/0.7/0.1, (0.8/0.8/0.0, 0.4/0.1/-0.3) 0.3/0.1/-0.) 0.3/0./-0.1) o 1 i R= r=0.5 i 3 (1.1/0.1/1.0, (1.1/0.4/0.7, (0.8/0.4/0.4, 0.4/0.1/-0.3) 0.4/0.4/0.0) 0.1/0.4/0.3) 1 (1./0.8/0.4, (1./1.1/0.1, 0.5/0./-0.3) 0.5/0.5/0.0) R=1.0 r=0.3 FF Setup Legend: Hold Unfix ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

9 Short Path Padding Example (cont d) 9 The most hold violating path A=0.6 a=0.3 FF 1 i 1 (1.0/0.7/0.3, 0.4/0.1/-0.3) R=1.0 r=0.3 i 3 1 FF (1.1/0.1/1.0, 0.4/0.1/-0.3) (1./0.8/0.4, 0.5/0./-0.3) o 1 R=1. r=0.5 FF 1 i 1 i (0.8/0.8/0.0, (1.0/0.7/0.3, (0.8/0.7/0.1, 0.3/0./-0.1) 0.4/0.1/-0.3) 0.3/0.1/-0.) (1.1/0.1/1.0, (0.8/0.1/0.7, 0.4/0.1/-0.3) 0.1/0.1/0.0) (1./0.8/0.4, (1./1.1/0.1, 0.5/0./-0.3) 0.5/0.5/0.0) R=1.0 r=0.3 o 1 R=1. r=0.5 FF Legend: Unfix ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

10 Short Path Padding Example (cont d) 10 A=0.6 a=0.3 FF 1 i 1 (1.0/0.7/0.3, 0.4/0.1/-0.3) (1./0.8/0.4, 0.5/0./-0.3) R=1.0 r=0.3 R=1. r=0.5 i 3 1 FF (1.1/0.1/1.0, 0.4/0.1/-0.3) o 1 FF 1 i 1 i (0.9/0.9/0.0, 0.3/0.3/0.0) (0.9/0./0.7, 0./0./0.0) (1./1./0.0, 0.5/0.5/0.0) R=1.0 r=0.3 o 1 R=1. r=0.5 FF Legend: Optimal padding delay = 0.5 ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

11 Our Contributions 11 Find the padding values and locations with a global view Compute the fanout padding flexibility of each gate Realize delay padding at the post-layout stage Coarse-grained delay padding: by spare cells Fine-grained delay padding: by dummy metal

12 Outline 1 Introduction Problem Formulation Algorithm - PushPull Experimental Results Conclusion

13 Problem Formulation 13 Given Placed and routed resilient design Cell library Spare cells and dummy metal information Target clock period and error detection window Goal Pad short paths Satisfy Setup/hold timing constraints Minimize the padding overhead

14 Delay Padding 14 Lengthen short paths by buffer insertion and extra capacitance hook-up buffer any type of cells Buffer insertion Padding wire Extra cap hook-up Padding gate

15 Outline 15 Introduction Problem Formulation Algorithm - PushPull Experimental Results Conclusion

16 Overview of PushPull 16 Start netlist, DEF/SPEF file cell library, timing constraints Padding Value Determination Padding resource collection Fanout padding flexibility calculation / Feasibility checking Load/Buffer Allocation Spare cell selection Dummy metal allocation Y Padding value decision Improvement? N Padding value refinement Timing analysis End

17 17 Example: Padding Value Determination First, pad gates Legend: A=0.6 a=0.3 FF 1 i (1.0/0.7/0.3, (1.0/1.0/0.0, 0.5/0.1/-0.4) 0.5/0.4/-0.1) (1./0.8/0.4, (1./1./0.0, 0.5/0./-0.3) 0.5/0.5/0.0) R=1.0 r=0.5 R=1. r=0.5 i 3 1 FF (1.1/0.3/0.7, (1.1/0.1/1.0, 0.3/0.3/0.0) 0.4/0.1/-0.3) o 1 ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

18 18 Example: Padding Value Determination Second, pad wires Legend: A=0.6 a=0.3 FF 1 i (1.0/1.0/0.0, 0.5/0.4/-0.1) 0.5/0.5/0.0) (0.6,-0.1) (0.5,0.0) (0.0,-0.1) (0.0,0.0) (0.6,0.) +0.3 (1./1./0.0, 0.5/0.5/0.0) R=1.0 r=0.5 R=1. r=0.5 i 3 1 FF (1.1/0.3/0.7, 0.3/0.3/0.0) o 1 ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

19 Example: Load/Buffer Allocation 19 A=0.6 a=0.3 FF 1 i (1.0/1.0/0.0, 0.5/0.5/0.0) (1./1./0.0, 0.5/0.5/0.0) R=1.0 r=0.5 R=1. r=0.5 i 3 1 FF (1.1/0.3/0.7, 0.3/0.3/0.0) 0.5 Spare cell Dummy metal o 1 Legend: ID (R/A/S, r/a/h) : gate : edge : PI/PO : Flip-flop ID 1 3 Delay

20 Padding Value Determination 0 Start netlist, DEF/SPEF file cell library, timing constraints Padding Value Determination Padding resource collection Fanout padding flexibility calculation / Feasibility checking Load/Buffer Allocation Spare cell selection Dummy metal allocation Y Padding value decision Improvement? N Padding value refinement Timing analysis End

21 Fanout Padding Flexibility Calculation 1 Fanout padding flexibility P F (i) of gate i Reflect the maximum padding value allowed on its whole fanout cone with a global view P F i, j = 0, if g i PO or H i 0; min 0, min H i, j e i, j E H i, otherwise. A=0.6 a=0.3 FF 1 i 1 (0.8/0.7/0.1, 0.3/0.1/-0.) (1.0/0.7/0.3, 0.4/0.1/-0.3) P F ()=0.1 P F (3)=0.3 P F (1)=0.0 (1./0.8/0.4, 0.5/0./-0.3) R=1.0 r=0.3 R=1. r=0.5 i 3 1 FF (0.8/0.1/0.7, 0.1/0.1/0.0) (1.1/0.1/1.0, 0.4/0.1/-0.3) +0.3 o 1 (1./1.1/0.1, 0.5/0.5/0.0)

22 Padding Value Decision Decide the padding value of each gate in the topological order Only pad the remaining slack P i = max P saf i P F i, 0 A=0.6 a=0.3 FF 1 i 1 P()=0. (1.0/0.7/0.3, 0.4/0.1/-0.3) P F ()=0.1 P F (3)=0.3 P F (1)=0. (1./0.8/0.4, (1./1.0/0., 0.5/0./-0.3) R=1.0 r=0.3 R=1. r=0.5 i 3 1 FF (1.1/0.1/1.0, 0.4/0.1/-0.3) P(3)=0.0 P(1)=0. o 1

23 Padding Value Decision 3 Iterate until no more improvement A=0.6 a=0.3 FF 1 i 1 P()=0. (0.9/0.9/0.0, 0.3/0.3/0.0) P F ()=0.0 P F (3)=0.0 P F (1)=0.0 R=1.0 r=0.3 R=1. r=0.5 i 3 1 FF (0.9/0.1/0.8, (0.9/0./0.7, 0./0.1/-0.1) 0./0./0.0) P(3)=0.1 (1./1./0.0, 0.5/0.4/-0.1) 0.5/0.5/0.0) P(1)=0. o 1

24 Padding Value Refinement 4 Further reduce the total padding values with forked path Push the padding values toward the gates where two or more paths fork A=0. A=0. (0.9/0.6/0.3, 0.4/0.4/0.0) P()=0.3 P(3)=0.3 3 (0.9/0.6/0.3, 0.4/0.4/0.0) (1.0/0.7/0.3, 0.5/0.5/0.0) 1 P(1)=0.3 Total = 0.6 Total = 0.3 R=1.0 r=0.5 FF 1 A=0.4 (0.5/0.5/0.0, 0.1/0.1/0.0) 1 P(1)=0. Total = 0.5 Total = 0.3 (1.0/1.0/0.0, 0.5/0.5/0.0) 0.0 P()= P(3)=0.3 3 (1.0/1.0/0.0, 0.5/0.5/0.0) R=1.0 r=0.5 FF 1 R=1.0 r=0.5 FF Joined short paths Forked short paths

25 Buffer Insertion 5 When hold violations cannot be fully cleaned by padding on gates Padding on wires (buffer insertion) +0.1 A=0.6 a=0.3 FF 1 i 1 (1.0/1.0/0.0, 0.5/0.4/-0.1) 0.5/0.5/0.0) (0.6, (0.5, -0.1) 0.0) (0.0, -0.1) 0.0) (0.0, 0.) P()=0.3 P(3)=0. (1./1./0.0, 0.5/0.5/0.0) R=1.0 r=0.5 R=1. r=0.5 i 3 1 FF (1.0/0.3/0.7, 0.3/0.3/0.0) (0.0, 0.1) P(1)=0.1 (0.7, 0.0) (0.7, 0.0) (0.0, 0.0) o 1

26 Load/Buffer Allocation 6 Start netlist, DEF/SPEF file cell library, timing constraints Padding Value Determination Padding resource collection Fanout padding flexibility calculation / Feasibility checking Load/Buffer Allocation Spare cell selection Dummy metal allocation Y Padding value decision Improvement? N Padding value refinement Timing analysis End

27 Spare Cell Selection (1/) 7 Extract the available spare cells located within the bounding box of its fanout net Find suitable spare cell candidates for gate/wire Reduce to the subset sum problem +0.5 s 1 : 0. s s : : 0.1 spare cell 3 Spare cells {s 1 } {s 1, s } {s 1, s, s 3 } Padding value {s 3 } {s } 0.15 {s } 0.0 {s 1 } 0.0 {s 1 } 0.0 {s 1 } 0.0 {s 1 } 0.0 {s 1 } 0.5 {s, s 3 }

28 Spare Cell Selection (/) 8 Solve the competition problem between different padding gates/wires Record multiple subset sum solutions Sort the gates/wires in ascending order of the number of spare cell solutions Assign the gates/wires to their best solution in the sorted order Padding gates/wires Spare cells s 1 Spare cells Subset sum for g 1 g 1 s s s 3 s 4 g s 3 Subset sum for g Subset sum for w 3 s 5 w 3 s 4 s 5 optimal suboptimal s 6

29 Dummy Metal Allocation 9 Fix remaining padding by dummy metal insertion Convert the delay to an amount of capacitance Assign un-overlapped metal resource Resort the unfixed padding to maximum network flow padding g g w d 1 d bounding box 0.15/0.15 -/0.15 s 0./0. -/0. 0.1/0.1 -/0.1 Padding gates/wires g 1 g w / -/ 0.1/ -/ 0.1/ -/ 0.1/ -/ Dummy metal d 1 d 0.5/0.5 -/0.5 t 0./0. -/0. flow/capacity

30 Outline 30 Introduction Problem Formulation Algorithm - PushPull Experimental Results Conclusion

31 Experimental Results 31 Platform: Intel Xeon CPU with 16GB memory, CentOS 5.5 Compare with two greedy heuristics Circuit Greedy 1: Greedy : Ours: PushPull Largest setup first [] Most path sharing first [1][][3] TNS 1 THS 1 Padding #Ite. Runtime TNS 1 THS 1 Padding Runtime TNS #Ite. 1 THS 1 Padding Runtime #Ite. (ps) (ps) delay (ps) (s) (ps) (ps) delay (ps) (s) (ps) (ps) delay (ps) (s) s s , , , s , , , s , , , s s , , , s , , , des_perf , , ,0.0 3,414 6, , ,088.0,57 4,0.46 b ,675, , ,18, ,90 37, , ,79,30.0 6,0 10, TNS 1 : total negative setup slack after padding value determination. THS 1 : total negative hold slack after padding value determination. [1] S. Yoshikawa, Hold time error correction method and correction program for integrated circuits, US Patent 6,990,646, 006. [] Y. Sun et al., Method and apparatus for fixing hold time violations in a circuit design, US Patent 7,78,16, 007. [3] Y. Liu et al., Re-synthesis for cost-efficient circuit-level timing speculation, DAC, pp , 011.

32 Experimental Results (cont d) 3 Compare with LP solutions Conservative LP Ours: PushPull Circuit #Gate #FF #SFF clock period THS (ps) TNS 1 THS 1 Padding Runtime TNS 1 THS 1 Padding Normalized #Ite. Runtime (ns) (ps) (ps) delay (ps) (s) (ps) (ps) delay (ps) to LP (s) s s , , , s , , , s , , , s s , , , s ,890 1, , , , des_perf 51,349 8,8081, , , , b19 7,87 5,541, ,481,800.0 NA NA NA timeout ,675, TNS 1 : total negative setup slack after padding value determination. THS 1 : total negative hold slack after padding value determination. The impact of input slew on padding delay capacitance conversion The average error rate over all cases is only 0.56%. LP: N. V. Shenoy et al., "Minimum padding to satisfy short path constraints," ICCAD-93.

33 Experimental Results (cont d) 33 Compare our load/buffer allocation method with LP+Mapping Circuit LP+Mapping Ours: PushPull TNS (ps) THS (ps) Runtime (s) TNS (ps) THS (ps) Runtime (s) s s s s s s s , des_perf , b19 NA NA NA TNS : total negative setup slack after load/buffer allocation. THS : total negative hold slack after load/buffer allocation. LP+Mapping: N. V. Shenoy et al., "Minimum padding to satisfy short path constraints," ICCAD-93.

34 Conclusion 34 Focus on the severe short path padding problem in resilient circuits Enable the timing error detection and correction mechanism of resilient circuits Determine the padding values and locations with a global view vs. the local view adopted by resent prior work Realize the determined padding values at physical implementation Propose coarse-grained and fine-grained load/buffer allocation by using spare cells and dummy metal

35 Future Work 35 Allocate spare cell and dummy metal simultaneously Spare cells: discrete capacitance resource Dummy metal: continuous capacitance resource Adopt Composite Current Source (CCS) timing model Use more accuracy delay model

36 36 Thank You! Contact info: Yu-Ming Yang

37 37 Backup Slides

38 The Resilient Circuit Design Flow 38 Logic synthesis Timing analysis S th /H th selection Resynthesis Timing analysis Resilient circuit replacement Placement & routing Short path padding

Kyoung Hwan Lim and Taewhan Kim Seoul National University

Kyoung Hwan Lim and Taewhan Kim Seoul National University Table of Contents Introduction Motivational Example The Proposed Algorithm Experimental Results Conclusion In synchronous circuit design, all sequential