Virtualized and Flexible ECC for Main Memory


1 Virtualized and Flexible ECC for Main Memory
Doe Hyun Yoon and Mattan Erez
Dept. of Electrical and Computer Engineering, The University of Texas at Austin
ASPLOS

2 Memory Error Protection
- Applying ECC uniformly: ECC DIMMs
  - Simple and transparent to programmers
- Error protection level
  - Fixed, design-time decision
- Chipkill-correct is used in high-end servers
  - Constrains the memory module design space
  - Allows only x4 DRAMs
  - Lower energy efficiency than x8 DRAMs
- Virtualized ECC objectives
  - To provide flexible memory error protection
  - To relax the design constraints of chipkill

3 Virtualized ECC
- Two-tiered error protection
  - Tier-1 Error Code (T1EC): simple error code for detection or light-weight correction
  - Tier-2 Error Code (T2EC): strong error-correcting code
- Store T2EC within the memory namespace itself
  - OS manages T2EC
- Flexible memory error protection
  - Different T2EC for different data pages
  - Stronger protection for more important data

4 Virtualized ECC Example
[Figure: a virtual address space mapped onto physical memory, with pages ordered by error protection level from low to high. Virtual pages i, j, and k map to page frames i, j, and k (virtual-page-to-physical-frame mapping). Page frames j and k also map to ECC pages j and k (physical-frame-to-ECC-page mapping): ECC page j holds the T2EC for chipkill and ECC page k holds the T2EC for double chipkill. Page frames hold data and T1EC.]

5 VIRTUALIZED ECC

6 Observations on Memory Errors
- Per-system error rate is still low
  - Most of the time, error detection finds no error
- Detecting errors is the common-case operation
  - Needs a low-latency, low-complexity error detection mechanism: T1EC
- Correcting errors is an uncommon-case operation
  - Correction can be complex and take a long time
  - But error correction information still has to be managed somewhere: virtualized T2EC

7 Uniform ECC
[Figure: a virtual address (VPN + offset) in virtual memory translates to a physical address (PFN + offset); the page frame in physical memory holds data with its ECC alongside.]

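8 Virtualized ECC
- OS manages PFN to EPN (ECC page number) translation
- ECC address (EA) = ECC page number + offset
  - The offset is scaled according to T2EC size
[Figure: a virtual address (VPN + offset) translates to a physical address (PFN + offset); the page frame holds data and T1EC, and the PA is further translated to an EA that locates the T2EC inside an ECC page.]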
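The PA-to-EA step on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the translation structure from the talk: the 4 KB page size, the dictionary `pfn_to_epn`, and the choice to give each page frame its own ECC page starting at offset 0 are all hypothetical. A real layout would likely pack the T2EC of several frames into one ECC page.

```python
PAGE_BYTES = 4096   # assumed page size (not stated on the slides)
LINE_BYTES = 64     # cache line size used in the DDR2 examples of this deck


def pa_to_ea(pa, pfn_to_epn, t2ec_bytes):
    """Return the ECC address (EA) of the T2EC protecting the line at `pa`."""
    pfn, offset = divmod(pa, PAGE_BYTES)   # PA = PFN : offset
    epn = pfn_to_epn[pfn]                  # OS-managed PFN -> EPN mapping
    line_index = offset // LINE_BYTES      # which cache line inside the frame
    ecc_offset = line_index * t2ec_bytes   # offset scaled by T2EC size
    return epn * PAGE_BYTES + ecc_offset


# Hypothetical mapping: page frame 0 keeps its T2EC in ECC page 5.
table = {0: 5}
ea = pa_to_ea(0x0200, table, t2ec_bytes=2)
print(hex(ea))  # 0x5010
```

With a 2-byte T2EC, the line at PA 0x0200 is line 8 of its frame, so its T2EC sits 16 bytes into the ECC page.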

9 Virtualized ECC operation
- Read: fetch data and T1EC
  - Don't need T2EC in most cases
- Write: update data, T1EC, and T2EC
  - T2ECs of consecutive data lines map to a T2EC line
  - T2EC lines can be partially valid
  - Update only valid T2EC to DRAM
- ECC Address Translation Unit: fast PA to EA translation
[Figure: the ECC Address Translation Unit sits beside the LLC and translates a PA to an EA (labels: PA: 0x0200, EA: 0x, Wr: 0x0200, Rd: 0x00c0, Wr: 0x0540). DRAM Rank 0 and Rank 1 each hold data with T1EC, plus regions labeled "T2EC for Rank 1 data" and "T2EC for Rank 0 data".]
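The "partially valid T2EC line" idea can be sketched as a cached line that tracks which T2EC slots were written and copies only those slots back. This is a minimal sketch, not the hardware design: the class `T2ECLine`, its dictionary of valid slots, and the masked write into a `bytearray` that stands in for a DRAM line are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # cache line size used in the DDR2 examples of this deck


@dataclass
class T2ECLine:
    """A cached T2EC line; only slots updated by data writes are valid."""
    t2ec_bytes: int
    slots: dict = field(default_factory=dict)  # slot index -> T2EC value

    def update(self, slot, value):
        # Called on a data write: store the new T2EC and mark the slot valid.
        if len(value) != self.t2ec_bytes:
            raise ValueError("T2EC value has the wrong size")
        if not 0 <= slot < LINE_BYTES // self.t2ec_bytes:
            raise IndexError("slot outside the T2EC line")
        self.slots[slot] = bytes(value)

    def write_back(self, dram_line):
        # On eviction, overwrite only the valid slots; others stay untouched.
        for slot, value in self.slots.items():
            start = slot * self.t2ec_bytes
            dram_line[start:start + self.t2ec_bytes] = value


dram_line = bytearray(b"\xff" * LINE_BYTES)   # stand-in for the line in DRAM
line = T2ECLine(t2ec_bytes=2)
line.update(3, b"\x01\x02")                   # one data write touched slot 3
line.write_back(dram_line)
```

Because the line never has to be read from DRAM before it is updated, the extra traffic in this sketch is write-back only.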

10 Penalty with V-ECC
- Increased data miss rate
  - T2EC lines in the LLC reduce the effective LLC size
- Increased traffic due to T2EC write-back
  - One-way write-back traffic
  - Not on the critical path

11 CHIPKILL-CORRECT

12 Chipkill-correct
- Single Device-error Correct, Double Device-error Detect
  - Can tolerate a DRAM failure
  - Can detect a second DRAM failure
- Chipkill requires x4 DRAMs
  - x8 chipkill is impractical
- But x8 DRAM is more energy efficient

13 Baseline x4 Chipkill
- Two x4 ECC DIMMs
  - 128-bit data + 16-bit ECC (redundancy overhead: 12.5%)
- 4-check-symbol error code using 4-bit symbols
- Access granularity
  - 64B in DDR2 (min. burst 4 x 128 bit)
  - 128B in DDR3 (min. burst 8 x 128 bit)
[Figure: x4 DRAMs on a 144-bit wide data bus]

14 x8 Chipkill
- x8 chipkill with the same access granularity
  - 152-bit wide data path: 128-bit data + 24-bit ECC
  - Redundancy overhead: 18.75%
- Needs a custom-designed DIMM
  - Increases system cost significantly
[Figure: x8 DRAMs on a 152-bit wide data bus]

15 x8 Chipkill with Standard DIMMs
- Increase access granularity
  - 128B in DDR2 (min. burst 4 x 256 bit)
  - 256B in DDR3 (min. burst 8 x 256 bit)
[Figure: x8 DRAMs on a 280-bit wide data bus]
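The overhead and granularity figures on the three chipkill slides above follow from simple arithmetic, which the sketch below reproduces. The helper names are made up for illustration. The 24 ECC bits used for the standard-DIMM case are derived here as 280 - 256 and are not stated on the slide.

```python
def redundancy_overhead(data_bits, ecc_bits):
    """ECC bits as a fraction of data bits on the bus."""
    return ecc_bits / data_bits


def access_granularity_bytes(data_bits, burst_length):
    """Data bytes moved by one minimum-length burst."""
    return data_bits * burst_length // 8


# Baseline x4 chipkill: 128-bit data + 16-bit ECC on a 144-bit bus.
print(redundancy_overhead(128, 16))          # 0.125
print(access_granularity_bytes(128, 4))      # 64  (DDR2, burst 4)
print(access_granularity_bytes(128, 8))      # 128 (DDR3, burst 8)

# x8 chipkill on a custom DIMM: 128-bit data + 24-bit ECC on a 152-bit bus.
print(redundancy_overhead(128, 24))          # 0.1875

# x8 chipkill on standard DIMMs: 256-bit data on a 280-bit bus.
print(redundancy_overhead(256, 280 - 256))   # 0.09375 (derived, see above)
print(access_granularity_bytes(256, 4))      # 128 (DDR2, burst 4)
print(access_granularity_bytes(256, 8))      # 256 (DDR3, burst 8)
```

The standard-DIMM option keeps the redundancy modest but doubles the access granularity, which is the cost that slide points at.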

16 V-ECC for Chipkill
- Use 3-check-symbol error codes
  - Single Symbol-error Correct and Double Symbol-error Detect
- T1EC: 2 check symbols
  - Detects up to 2 symbol errors
- T2EC: 3rd check symbol
- Combined T1EC/T2EC provides chipkill

17 V-ECC: ECC x4 configuration
- Use an 8-bit symbol error code
  - 2 bursts out of a x4 DRAM form an 8-bit symbol
  - Modern DRAMs have a minimum burst of 4 or 8
- 1 x4 ECC DIMM + 1 x4 Non-ECC DIMM
- Each DRAM access in DDR2 (burst 4)
  - 64B data, 4B T1EC
  - 2B T2EC is virtualized within the memory namespace
  - 32 T2ECs per 64B cache line
[Figure: x4 DRAMs on a 136-bit wide data bus; one DIMM holds data and T1EC, the other holds data only; T2EC is virtualized within memory]

18 V-ECC: ECC x8 configuration
- Use an 8-bit symbol error code
- 2 x8 ECC DIMMs
- Each DRAM access in DDR2 (burst 4)
  - 64B data, 8B T1EC
  - 4B T2EC is virtualized
  - 16 T2ECs per 64B cache line
[Figure: two DIMMs of x8 DRAMs on a 144-bit wide data bus, each holding data and T1EC; T2EC is virtualized within memory]
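The per-access sizes quoted for the ECC x4 and ECC x8 configurations can be checked from the bus widths. The sketch assumes that every bus bit beyond the 128 data bits carries T1EC, which matches the numbers on the slides; the function names are hypothetical.

```python
def per_access_bytes(bus_bits, data_bits, burst_length):
    """Return (data bytes, T1EC bytes) moved by one burst."""
    data_bytes = data_bits * burst_length // 8
    t1ec_bytes = (bus_bits - data_bits) * burst_length // 8
    return data_bytes, t1ec_bytes


def t2ecs_per_line(line_bytes, t2ec_bytes):
    """How many T2ECs fit in one cache line of the ECC page."""
    return line_bytes // t2ec_bytes


# ECC x4: 136-bit bus, DDR2 burst of 4, 2B T2EC per data line.
print(per_access_bytes(136, 128, 4))   # (64, 4)
print(t2ecs_per_line(64, 2))           # 32

# ECC x8: 144-bit bus, DDR2 burst of 4, 4B T2EC per data line.
print(per_access_bytes(144, 128, 4))   # (64, 8)
print(t2ecs_per_line(64, 4))           # 16
```

One cached T2EC line therefore covers 32 consecutive data lines in ECC x4 and 16 in ECC x8.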

19 Flexible Error Protection
- Single HW with V-ECC can provide Chipkill-detect, Chipkill-correct, and Double chipkill-correct
- Use different T2EC for different pages
- Reliability/performance tradeoff
  - Maximize performance/power efficiency with Chipkill-Detect
  - Stronger protection at the cost of additional T2EC accesses

T2EC size per protection level:

          Chipkill-Detect   Chipkill-Correct   Double Chipkill-Correct
ECC x4    0B                2B                 4B
ECC x8    0B                4B                 8B
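The cost of each protection level can be expressed as T2EC storage per page. In the sketch below the T2EC sizes come from the slide, read as bytes per 64B data line, while the 4 KB page size and the names `T2EC_BYTES` and `t2ec_bytes_per_page` are assumptions made for illustration.

```python
PAGE_BYTES = 4096   # assumed page size (not stated on the slides)
LINE_BYTES = 64     # cache line size used in the DDR2 examples of this deck

# T2EC bytes per 64B data line, taken from the slide.
T2EC_BYTES = {
    "ECC x4": {"chipkill-detect": 0, "chipkill-correct": 2,
               "double-chipkill-correct": 4},
    "ECC x8": {"chipkill-detect": 0, "chipkill-correct": 4,
               "double-chipkill-correct": 8},
}


def t2ec_bytes_per_page(config, level):
    """T2EC storage needed to protect one data page at the given level."""
    lines_per_page = PAGE_BYTES // LINE_BYTES
    return lines_per_page * T2EC_BYTES[config][level]


for config, levels in T2EC_BYTES.items():
    for level in levels:
        print(config, level, t2ec_bytes_per_page(config, level))
```

A chipkill-detect page needs no T2EC storage at all, so only pages that request correction pay for it.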

20 EVALUATION

21 Simulator/Workload
- GEMS + DRAMsim
  - An out-of-order SPARC V9 core
  - Exclusive two-level cache hierarchy
  - DDR2 800MHz, 12.8GB/s (128-bit wide data path), 1 channel, 4 ranks
- Power model
  - WATTCH for processor power, scaled to 45nm
  - CACTI for cache power (cacti 45nm)
  - Micron model for DRAM power (commodity DRAMs)
- Workloads
  - 12 data-intensive applications from SPEC CPU 2006 and PARSEC
  - Microbenchmarks: STREAM and GUPS

22 Normalized Execution Time
- Less than 1% penalty on average
- Performance penalty
  - Spatial locality
  - Write-back traffic
[Figure: normalized execution time of Baseline x4, ECC x4, and ECC x8 for bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3 (SPEC 2006), canneal, dedup, fluid, freq (PARSEC), the average, STREAM, and GUPS]

23 System Energy Efficiency
- Energy Delay Product (EDP) gain
  - ECC x4: 1.1% on average
  - ECC x8: 12.0% on average
[Figure: EDP of Baseline x4, ECC x4, and ECC x8 for bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3 (SPEC 2006), canneal, dedup, fluid, freq (PARSEC), the average, STREAM, and GUPS; data labels 17% and 20% appear on the chart]

24 Flexible Error Protection
[Figure: normalized execution time and normalized EDP under Chipkill-Detect, Chipkill-Correct, and Double Chipkill-Correct for bzip2, hmmer, mcf, libq, omnet, milc, lbm, sphinx3 (SPEC 2006), canneal, dedup, fluid, freq (PARSEC), the average, STREAM, and GUPS]

25 Conclusion
- Virtualized ECC
  - Two-tiered error protection, virtualized T2EC
- Improved system energy efficiency with chipkill
  - Reduce DRAM power consumption by 27%
  - Improve system EDP by 12%
  - Performance penalty: 1% on average
- Error protection even for Non-ECC DIMMs
  - Can be used for GPU memory error protection
- Flexibility in error protection
  - Adaptive error protection level by user/system demand
  - Cost of error protection is proportional to protection level

