Performance Optimization and Analysis of Blade Designs under Delay Variability

Size: px

Start display at page:

Download "Performance Optimization and Analysis of Blade Designs under Delay Variability"

Gervais Lewis
5 years ago
Views:

Matheus Trevisan Moreira, Melvin Breuer, Ney Laert Vilar Calazans, and

Beerel May 5th, 2015 * University of Southern California, Los Angeles,

1 Performance Optimization and Analysis of Blade Designs under Delay Variability Dylan Hand, Hsin-Ho Huang, Benmao Chang, Yang Zhang *, Matheus Trevisan Moreira, Melvin Breuer, Ney Laert Vilar Calazans, and Peter A. Beerel May 5th, 2015 * University of Southern California, Los Angeles, CA Pontifi cia Universidade Cato lica do Rio Grande do Sul, Porto Alegre, Brazil Dept. of Automation, Tsinghua University, Beijing, China

2 The Context: Delay Overheads Cycle time of clocked logic Logic Time Logic gates PVT margin Clock margin Flip-flop alignment Traditional synchronous design suffers from increased margins Worse at low and near-threshold regions [Dreslinksi et al., IEEE Proc. 2010] MOTIVATION 2

3 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3

4 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Worst-Case Delay Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3

5 Number of Operations Potential of Average-Case Data Cycle time of clocked logic Average Delay Worst-Case Delay Data delay in Plasma CPU Logic Time Logic gates Worst-Case Data Delay Slower Delay variation due to data is rarely exploited in synchronous designs MOTIVATION 3

6 CLK Sample CLK Sample One Solution: Resiliency Controller A Asynchronous Controller Controller B Err Reconfigurable Delay Lines Err Combinational Logic Error Detection Logic Single Rail Datapath Error Detection Logic Error Detecting Latch Timing errors delay handshaking by the resiliency window RESILIENCY 4

7 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

8 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

9 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

10 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

11 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err = 1 Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

12 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err = 1 Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

13 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

14 CLK Sample CLK Sample One Solution: Resiliency Controller A Controller B Err Err Combinational Logic Error Detection Logic Error Detection Logic Timing errors delay handshaking by the resiliency window RESILIENCY 4

15 Number of Operations Resiliency Performance Benefit Nominal Delay Worst-Case Delay Data delay in Plasma CPU Delay Key Question: How do we set to optimize performance RESILIENCY 5

16 Outline Performance Optimization Delay models Impact of delay line quantization Impact of metastability Comparison to Bubble Razor Case Study Analyze and optimize a 3-stage Blade CPU Conclusions and Future Work OUTLINE 6

Our approach Log-Normal Distribution Logic Delay Analyze the

17 Probability Probability Frequency Delay Models Normal Distribution Real World Distribution Logic Delay Logic Delay Our approach Log-Normal Distribution Logic Delay Analyze the performance of Blade for a variety of delay models DELAY MODELS 7

18 Probability Optimal Average-Case Performance σ μ + DELAY MODELS 8

Probability Probability Optimal Average-Case Performance P R ( d ) Probability of error (p) + : Average delay of Blade stage Definitions C : Clock Period / Cycle Time EC : Effective Clock Period p

19 Probability Probability Optimal Average-Case Performance P R ( d ) Probability of error (p) + : Average delay of Blade stage Definitions C : Clock Period / Cycle Time EC : Effective Clock Period p : Probability of error d : Average delay of Blade stage C = + d = + p 1-P R ( d ) + Optimal performance achieved by minimizing d Assumes backward latency is hidden via latch retiming DELAY MODELS 8

20 Optimal Probability of Error - p opt p opt observations Varies between 20% and 35% for lognormal distributions Significantly higher than in sync resiliency Higher Variance Constant for normal distributions! DELAY MODELS 9

21 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Systematic Error Rate (ξ) sets the worst-case delay per stage, K ξ = 1 P R d C m = f(ξ) N DELAY MODELS 10

22 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Recall: d = + p = 1 p + p K For Normal distribution: 1 p = 1 [1 + erf μ ] 2 2σ μ 1 2p = erf 2σ Taking inverse error function of both sides: erf 1 μ 1 2p = 2σ = 2σ[erf 1 1 2p ] + μ DELAY MODELS 10

23 Proof of constant p opt Assume worst case delay per stage is constant Worst case delay is set by mean, variance, and SER K = + K = μ + m σ Recall: Rewrite: d = + p = 1 p + p K d = 1 p [ 2σ[erf 1 1 2p ] + μ] + p K Minimize d by taking derivative and setting it equal to zero : d p = 1 + y 2 erf 1 y y 2 erf 1 y + m = 0 y = 1 2p Note y and p are independent of σ and μ! m depends only on ξ Implication: Tuning of delay line may target fixed probability! DELAY MODELS 10

24 (%) Optimal Size of Resiliency Window - Blade supports maximum of 50% of clock cycle Optimal is larger for designs with high-variance! DELAY LINE QUANTIZATION 11

25 Delay Delay Line Quantification Effects Recall: d = + p p * Quantification effects reduced due to inherent tradeoff between nominal delay and error penalty p * DELAY LINE QUANTIZATION 12

26 Delay Delay Line Quantification in BD Bundled Data: d = d Linear relationship between delay line quantization and average stage delay in Bundled Data DELAY LINE QUANTIZATION 13

27 No MS MS in (only) MS in Detection Logic Metastability Effects p : probability of timing violation t MS : time for metastability to resolve t MSE : MS in error-detecting latch t MSL : MS in detection logic E[delay] + + t MSL + t MSL METASTABILITY 14

No MS MS in (only) MS in Detection Logic Metastability Effects p : probability of timing violation t MS : time for metastability to resolve t MSE : MS in

28 No MS MS in (only) MS in Detection Logic Metastability Effects p : probability of timing violation t MS : time for metastability to resolve t MSE : MS in error-detecting latch t MSL : MS in detection logic Metastability resolution times most often hidden! E[delay] + + t MSL + t MSL METASTABILITY 14

29 Metastability Effects METASTABILITY 15

30 Metastability Effects METASTABILITY 15

31 Comparison to Sync Resiliency Synchronous EC set by systematic error rate N-Stage Rings F F F F Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt M S M S COMPARISON 16

32 Comparison to Sync Resiliency Synchronous EC set by systematic error rate Synchronous 35% 23% BR Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt Blade Normal Distribution COMPARISON 16

33 Comparison to Sync Resiliency Synchronous EC set by systematic error rate Bubble Razor [Zhang,2014] EC = C 2 1 p 2N Blade EC = + p opt Normal Distribution COMPARISON 16

34 Application to 3-Stage CPU Plasma MIPS OpenCore 28nm FDSOI Compare mathematical model of optimal resiliency window () with simulation results [1] CASE STUDY 17

35 Application to 3-Stage CPU 20 Distribution A Cycles (%) Delay (%) Distribution C Cycles (%) Delay (%) CASE STUDY 18

36 Application to 3-Stage CPU Distribution A Distribution B Distribution C Model Sim Model Sim Model Sim max 27% 35% 43% opt 26% 27% 34% 35% 39% 37% EC opt 74.8% 75.1% 71.3% 71.9% 75.1% 74.8% Model estimated optimal within 5.4% Optimal EC within 99% CASE STUDY 19

Application to 3-Stage CPU Distribution A Distribution B Distribution C Model Sim Model Sim Model Sim max 27% 35% 43% opt 26% 27% 34% 35% 39% 37% EC opt 74.8% 75.1% 71.3% 71.9% 75.1% 74.

37 Application to 3-Stage CPU Distribution A Distribution B Distribution C Model Sim Model Sim Model Sim max 27% 35% 43% opt 26% 27% 34% 35% 39% 37% EC opt 74.8% 75.1% 71.3% 71.9% 75.1% 74.8% Ideal opt 48% 53% 46% Ideal EC opt 63.2% 67.8% 74.5% Model estimated optimal within 5.4% Optimal EC within 99% Model allows estimation of optimal w/o limitations of simulated design CASE STUDY 19

38 Summary and Conclusions Performance model Use either analytical and real world delay distributions Predicts performance within 99% accuracy Comparison to sync N-stage rings 23% better than Bubble Razor 35% better than traditional designs Several interesting conclusions Optimal error rate is relatively high and may be constant Programmable delay line need not be fine-grained Metastability impact is negligible Supporting larger resiliency windows may be useful CONCLUSIONS 20

39 Questions?

On the Reuse of Timing Resilient Architecture for Testing Path Delay Faults in Critical Paths

On the Reuse of Timing Resilient Architecture for Testing Path Delay Faults in Critical Paths Felipe A. Kuentzer, Leonardo R. Juracy and Alexandre M. Amory Faculty of Informatics, PUCRS University Av.