Flexible Cache Error Protection using an ECC FIFO

Size: px

Start display at page:

Download "Flexible Cache Error Protection using an ECC FIFO"

Cecil Sims
5 years ago
Views:

1 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1

2 ECC FIFO Goal: to reduce on-chip ECC overhead Two-tiered error protection T1EC: light-weight on-chip error code T2EC: strong error correcting code Off-load T2EC overhead to FIFO in DRAM Why FIFO? It s simple to manage 15-25% LLC area reduction 10-17% LLC power saving Just 1% performance penalty 2

3 BACKGROUND 3

4 Error Correcting Codes 1-bit parity for error detection SEC-DED (Hamming) codes Single-bit Error Correction and Double-bit Error Detection 8bit ECC for 64bit data DEC-TED Double-bit Error Correction and Triple-bit Error Detection 15bit ECC for 64bit data 4

5 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 5

6 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 Error code N-1 6

7 Interleaving To detect and correct burst errors N-way interleaving converts an N-bit burst error to N single-bit errors Baseline cache error protection 8 way interleaved SEC-DED Can correct up to 8-bit burst errors 8B ECC per 64B cache line 7

8 Uniform Error Protection Tag Data ECC 8 ways 2 11 sets 64B 8B ECC increases area AND leakage/dynamic power 8

9 RELATED WORK 9

10 Soft Errors: Observations Still, Soft Error Rate (SER) is low Every cache access tries to detect errors, but finds no error in most cases Error Detection Common case Need a low cost, low overhead error detection mechanism Error Correction Uncommon case Correction can be slow But, still need to maintain error correction info somewhere Memory hierarchy provides redundancy inherently for clean data Only dirty lines need error correcting codes 10

11 PERC [Sorin 06] / Energy Efficient [Li 04] Tag Data EDC ECC 8 ways 2 11 sets 64B 1B 7B Read only Data and EDC saves dynamic power Power gate ECC of clean lines saves static power 11

12 Area Efficient [Kim 06] 4 ways Tag Data EDC ECC 2 12 sets 2 12 sets 64B 1B 8B Allow only 1 dirty line per set 12

13 MAXn scheme Tag Data EDC 8 ways 2 11 sets n ways (n<8) 2 11 sets Tag ECC 64B 1B 8B Allow only n dirty lines per set May cause detrimental cleaning traffic 13

14 Two-tiered error protection Tier-1 Error Code (T1EC) On-chip light-weight error code Uniform error protection Tier-2 Error Code (T2EC) Strong error codes only for dirty lines Corrects Detected but Uncorrected Errors (DUE) of T1EC 14

15 Memory Mapped ECC [Yoon 09] Tag Data T1EC T2EC 8 ways 2 11 sets 64B 1B 8B On-Chip DRAM T2EC is memory mapped AND cached 15

16 ECC FIFO 16

17 ECC FIFO Use Two-tiered error protection T2EC is off-loaded to FIFO in DRAM LLC caching behavior is unaffected FIFO Simple to manage Coalesce buffer To better utilize DRAM channel 17

18 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 18

19 Rest of cache hierarchy Dirty line eviction to LLC Last Level Cache Data T1EC T2EC encoder Coalesce Buffer DRAM ECC FIFO 19

20 Rest of cache hierarchy Last Level Cache Data T1EC Encode T2EC and TAG T2EC TAG encoder T2EC Coalesce Buffer DRAM ECC FIFO 20

21 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG T2EC Push to Coalesce Buffer Coalesce Buffer DRAM ECC FIFO 21

22 Rest of cache hierarchy Next dirty line comes Data T1EC Last Level Cache DRAM T2EC encoder TAG TAG T2EC T2EC Tag/T2EC buffered in Coalesce Buffer Coalesce Buffer ECC FIFO 22

23 Rest of cache hierarchy Last Level Cache Data T1EC T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC T2EC Coalesce Buffer is now FULL Coalesce Buffer DRAM ECC FIFO 23

24 Rest of cache hierarchy Last Level Cache DRAM Data T1EC Write the coalesced T2EC into ECC FIFO T2EC encoder TAG TAG TAG TAG TAG TAG T2EC T2EC T2EC T2EC T2EC Coalesce T2EC Buffer T2EC write size matches to DRAM burst size ECC FIFO T2EC 24

25 Rest of cache hierarchy Last Level Cache DRAM Data T1EC T2EC encoder Coalesce Buffer Coalesce Buffer becomes empty ECC FIFO T2EC 25

26 More on ECC FIFO Write-back data, but write-through ECC Potential performance degradation Increased DRAM traffic due to T2EC writes Error correction Search the matching TAG in coalesce buffer AND ECC FIFO May take a long time Not a problem since SER is low Sometimes, may not find the matching TAG ECC FIFO is finite Potentially unprotected dirty lines discussed later 26

27 EVALUATION 27

28 Performance Evaluation GEMS + DRAMsim An out-of-order SPARC V9 core Exclusive two-level cache hierarchy DDR2 667MHz 533GB/s Eager write-back Clean dirty lines periodically Workloads 16 data intensive applications SPEC CPU 2006, PARSEC, and SPLASH2 28

29 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC

30 Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Performance Penalty DRAM 533 GB/s 60% 12% SPLASH2 PARSEC SPEC DRAM 267 GB/s 63% 68% 23% SPLASH2 PARSEC SPEC

31 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

32 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

33 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 09 SPLASH2 PARSEC SPEC

34 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME Max1 Max2 Max4 MME ECC FIFO % 1% 09 SPLASH2 PARSEC SPEC

35 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time Comparison to MAXn and MME % 36% Max1 Max2 Max4 MME ECC FIFO 11 11% 11% 8% 1 4% 1% 09 SPLASH2 PARSEC SPEC

36 Execution Time [cycle] 260E E+09 Comparison to MME OCEAN 258x258 Baseline MME ECC FIFO 220E E E % 160E E KB 512KB 1MB 2MB LLC size 36

37 LLC Area and Power CACTI 5 model of a 45nm process Parity T1EC + SEC-DED T2EC 15% area and 9% power saving Compared to SEC-DED baseline SEC-DED T1EC + DEC-TED T2EC 15% area and 10% power saving Compared to DEC-TED baseline More error code examples in the paper 37

38 FIFO OVERWRITE 38

39 ECC FIFO is Finite ECC FIFO is implemented as a circular buffer Each T2EC push overwrites the oldest entry in the FIFO If the associated data is still valid even after T2EC is overwritten The cache line is no longer protected Errors on this line cannot be corrected using T2EC 39

40 t k LLC ECC FIFO A small LLC and an ECC FIFO with k entries (no coalesce buffer for the simplicity) 40

41 t 0 dirty line eviction to LLC TAG/T2EC to ECC FIFO 0 0 k LLC ECC FIFO 41

42 t 0 1 dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 42

43 t dirty line eviction to LLC TAG/T2EC to ECC FIFO k LLC ECC FIFO 43

44 t k-3 k-2 k k-2 3 k-1 k-3 k-1 k-2 k k LLC ECC FIFO 44

45 t k-3 k-2 k-1 k dirty line eviction to LLC TAG/T2EC to ECC FIFO k k-2 3 k-1 k-3 k k-1 k k LLC ECC FIFO T2EC 0 is evicted from the FIFO dirty line 0 is no longer protected 45 0

46 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time Window of vulnerability T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected 46

47 T2EC valid-time Reuse or evict t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time > T2EC overwrite-time The cache line becomes T2EC unprotected T2EC valid-time Reuse or evict Window of vulnerability t k-3 k-2 k-1 k T2EC overwrite-time T2EC valid-time < T2EC overwrite-time No window of vulnerability 47

48 How to Avoid (or Mitigate) It? T2EC overwrite-time Make the FIFO bigger The bigger the FIFO, the longer T2EC lifetime Not sure how big it has to be T2EC valid-time Inherent memory access pattern A dirty line is re-used or evicted Make the T2EC obsolete before it gets overwritten Eager write-back Limit the lifetime of dirty lines 48

49 Eager Write-Back Scan LLC lines periodically Eagerly write-back dirty lines older than the period It s cleaning, so it doesn t affect hit-rate 1M cycle period Limit the T2EC valid-time (max: 2xEWB period) More T2ECs become obsolete before it gets overwritten Eager write-back improves performance 6-10% in general, 26% in libquantum 49

50 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

51 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

52 Probability of T2EC unprotected 100E E-01 FFT CHOLESKY OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp lbm milc sphinx3 100E E E E E E E E E FIFO size [thousand entries]

53 Required FIFO size 100k entry FIFO, the worst case 1MB of storage in DRAM More in the paper A simple analytic model on Probability of unprotected Required FIFO size with regards to eager write-back period how to guarantee zero probability of unprotected 53

54 Conclusion ECC FIFO is an efficient low cost error protection mechanism A simple FIFO off-loads the overhead of strong T2EC 15-25% LLC area reduction 10-17% LLC power saving Does not affect LLC caching behavior Choosing error code for T1EC/T2EC is flexible See more error code examples in the paper Penalties Average 1% performance degradation negligible Increased error correction latency but SER is low Potentially unprotected cache lines Can make this quite low or even guarantee not to occur 54

55 Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 55

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez

Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection