Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez

Size: px

Start display at page:

Download "Memory Mapped ECC Low-Cost Error Protection for Last Level Caches. Doe Hyun Yoon Mattan Erez"

Jerome George
6 years ago
Views:

1 Memory Mapped ECC Low-Cost Error Protection for Last Level Caches Doe Hyun Yoon Mattan Erez

2 1-Slide Summary Reliability issues in caches Increasing soft error rate (SER) Cost increases with error protection capability Memory Mapped ECC Two-tiered error protection Tier-1 Error Code: Light-weight on-chip error codes Tier-2 Error Code: Strong error codes in DRAM namespace T2EC storage is off-loaded to DRAM No compromise on error protection Achieved 15% area and 8% power reduction Low performance overhead 13% on average Supports arbitrary error codes (c) Doe Hyun Yoon Memory Mapped ECC 2

3 Background

4 ECC overhead [%] N Error Detecting and Correcting Codes Parity for error detection SEC-DED (Hamming code) Single bit Error Correction and Double bit Error Detection Parity SEC-DED DEC-TED bit 7 bit 1 bit 8 bit 9 bit Data bits (c) Doe Hyun Yoon Memory Mapped ECC 4

5 ECC overhead [%] N Error Detecting and Correcting Codes Parity for error detection SEC-DED (Hamming code) Single bit Error Correction and Double bit Error Detection DEC-TED Double bit Error Correction and Triple bit Error Detection 100 Parity SEC-DED 80 DEC-TED bit 13 bit 6 bit 15 bit 7 bit 17 bit 1 bit 8 bit 9 bit Data bits (c) Doe Hyun Yoon Memory Mapped ECC 5

6 Interleaved Error Codes Interleaving To protect burst (adjacent) errors N-way Interleaved code makes an N-bit burst error to N single bit errors for each error code N-1 N N+1 N+2 2N-1 2N 2N+1 2N+2 Error code 0 Error code 1 Error code 2 (c) Doe Hyun Yoon Memory Mapped ECC 6 Error code N-1

7 Interleaved Error Codes Interleaving To protect burst (adjacent) errors N-way Interleaved code makes an N-bit burst error to N single bit errors for each error code 8-way interleaved SEC-DED for a 64B cache line 8 SEC-DED codes (8B) 13% Overhead Can correct up to 8 bit burst errors (c) Doe Hyun Yoon Memory Mapped ECC 7

8 Conventional Uniform Error Protection Tag Data ECC 8 ways 2 11 sets 64B 8B ECC increases area, leakage/dynamic power (c) Doe Hyun Yoon Memory Mapped ECC 8

9 Related Work (c) Doe Hyun Yoon Memory Mapped ECC 9

10 Observations on Soft Errors in LLC Error detection common case operation Every cache line read requires error detection Only dirty cache lines need ECC Memory hierarchy provides multiple copies for clean lines (c) Doe Hyun Yoon Memory Mapped ECC 10

11 Prior work PERC [Sorin 06] and Energy Efficient [Li 04] Tag Data EDC ECC 8 ways 2 11 sets 64B Read only data and EDC save dynamic power Power gate ECC of clean lines save static power (c) Doe Hyun Yoon Memory Mapped ECC 11 1B 7B

12 Area Efficient Scheme [Kim 06] 4 ways Tag Data EDC ECC 2 12 sets 2 12 sets 64B 1B 8B Allow only 1 dirty line per set (c) Doe Hyun Yoon Memory Mapped ECC 12

13 MAXn Scheme Tag Data EDC 2 11 sets 8 ways n ways (n<8) 2 11 sets Tag ECC 64B 1B 8B Allow only n dirty lines per set May cause detrimental cleaning traffic (c) Doe Hyun Yoon Memory Mapped ECC 13

14 Memory Mapped ECC (c) Doe Hyun Yoon Memory Mapped ECC 14

15 Observations on Soft Errors in LLC Error detection -- common case operation Every cache line read requires error detection Only dirty cache lines need ECC Memory hierarchy provides multiple copies for clean lines Soft Error Rate is increasing, but is still very low Mean time to failure (MTTF) of a 32MB cache is estimated as 155 days even with pessimistic assumptions Error correction uncommon, extremely rare, case latency/complexity is not important (c) Doe Hyun Yoon Memory Mapped ECC 15

16 MME Architecture 15% Area Saving Tag Data ECC 8 ways 64B 8B On-Chip DRAM (c) Doe Hyun Yoon Memory Mapped ECC 16

17 MME Architecture 15% Area Saving Tag Data T1EC T2EC 8 ways 64B 1B 8B On-Chip DRAM (c) Doe Hyun Yoon Memory Mapped ECC 17

18 MME Architecture 15% Area Saving Tag Data T1EC T2EC 8 ways 64B 1B 8B On-Chip DRAM T2EC is memory mapped to DRAM namespace (c) Doe Hyun Yoon Memory Mapped ECC 18

19 Managing T2EC information ways set Data + T1EC (64B+1B) Last Level Cache T2EC (8B) T2EC in DRAM (c) Doe Hyun Yoon Memory Mapped ECC 19

20 Managing T2EC information ways A physical cache line is associated with a T2EC in DRAM set Data + T1EC (64B+1B) Last Level Cache T2EC (8B) T2EC in DRAM (c) Doe Hyun Yoon Memory Mapped ECC 20

21 Managing T2EC information ways A physical cache line is associated with a T2EC in DRAM A T2EC line set T2EC Read Data + T1EC (64B+1B) Last Level Cache T2EC (8B) T2EC in DRAM (c) Doe Hyun Yoon Memory Mapped ECC 21

22 Managing T2EC information ways A physical cache line is associated with a T2EC in DRAM A T2EC line set T2EC Read Data + T1EC (64B+1B) Last Level Cache T2EC Write T2EC (8B) T2EC in DRAM (c) Doe Hyun Yoon Memory Mapped ECC 22

23 Managing T2EC information ways A physical cache line is associated with a T2EC in DRAM A T2EC line set T2EC Read is avoided when all these lines are clean T2EC Read Data + T1EC (64B+1B) Last Level Cache T2EC Write T2EC (8B) T2EC in DRAM (c) Doe Hyun Yoon Memory Mapped ECC 23

24 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average T2EC Miss Rate [%] N T2EC Miss Rate T2EC allocate T2EC fetch SPLASH2 PARSEC SPEC2006 (c) Doe Hyun Yoon Memory Mapped ECC 24

25 Implementation Details T2EC is inherently private to a cache Can be easily integrated with cache coherent multi processor systems T2EC region in DRAM 128kB for a 1MB cache in our example (8way SEC-DED as T2EC) Relatively small compared to DRAM capacity Can be reserved by OS or BIOS A generic way of providing meta-data for physical cache lines Performance impact Increased cache miss rate due to T2EC in caches Increased traffic of T2EC reads and writes (c) Doe Hyun Yoon Memory Mapped ECC 25

26 Memory Mapped ECC RECAP Two-Tiered Error Protection Low on-chip ECC cost Only T1EC on-chip, T2EC storage is off-loaded to DRAM 15% area and 8% power in a 1MB cache (45nm Cacti model) No compromise on error protection capability T2EC provides strong error protection T2EC is memory mapped and cacheable Dynamic and transparent partition of LLC into data and T2EC Worst case error correction is roughly DRAM latency Flexibility in design T1EC/T2EC T1EC: error detection, T2EC: error correction T1EC: light-weight error correction, T2EC: strong error correction More examples in our paper (c) Doe Hyun Yoon Memory Mapped ECC 26

27 Evaluation (c) Doe Hyun Yoon Memory Mapped ECC 27

28 Evaluation GEMS + DRAMsim An out-of-order 3GHZ SPARC processor core Exclusive two level cache hierarchy L1: split I/D, each 2-way 64kB L2: unified 8-way 1MB DDR2 667MHz DRAM: 533GB/s Eager write-back is integrated Dirty lines are periodically scanned and cleaned Workloads 16 data-intensive applications from SPLASH2, PARSEC, and SPEC 2006 Other non data-intensive applications are insensitive to our technique Omitted for clarity and emphasize degradation (c) Doe Hyun Yoon Memory Mapped ECC 28

29 ormalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average N Performance 13% Penalty on Average DRAM 533 GB/s 4% 13% SPLASH2 PARSEC SPEC2006 DRAM 267 GB/s 5% 12% (c) Doe Hyun Yoon SPLASH2 PARSEC SPEC2006 Memory Mapped ECC 29

30 CHOLESKY FFT OCEAN RADIX canneal dedup fluidanimate freqmine bzip2 mcf hmmer libquantum omnetpp milc lbm sphinx3 Average Normalized Execution Time N Comparison to MAXn Scheme 14 36% Max % Max2 Max4 MME 11 11% 11% 8% 1 1% 09 SPLASH2 PARSEC SPEC2006 (c) Doe Hyun Yoon Memory Mapped ECC 30

31 Base MAX1 MME Base MAX1 MME Base MAX1 MME Base MAX1 MME Base MAX1 MME Base MAX1 MME Base MAX1 MME Base MAX1 MME Request per thousand instructions N Traffic Increase 2% on Average T2EC Rd T2EC Wr Cleaning eager write-back DRAM Wr DRAM Rd CHOL OCEAN canneal dedup bzip2 mcf milc lbm SPLASH2 PARSEC SPEC 2006 (c) Doe Hyun Yoon Memory Mapped ECC 31

32 Conclusions Flexible two-tiered error protection Low on-chip overhead of only T1EC No dedicated on-chip storage for T2EC No compromise on error protection T2EC is memory mapped and cacheable Reduced cost 15% area saving and 8% LLC power saving Performance impact 13% on average DRAM traffic 2% increase on average (c) Doe Hyun Yoon Memory Mapped ECC 32

Flexible Cache Error Protection using an ECC FIFO

Flexible Cache Error Protection using an ECC FIFO Doe Hyun Yoon and Mattan Erez Dept Electrical and Computer Engineering The University of Texas at Austin 1 ECC FIFO Goal: to reduce on-chip ECC overhead