Virtualized ECC: Flexible Reliability in Memory Systems

1 Virtualized ECC: Flexible Reliability in Memory Systems
Doe Hyun Yoon (Advisor: Mattan Erez)
Electrical and Computer Engineering, The University of Texas at Austin

2 Motivation
Reliability concerns are growing
- Memory failure rates are increasing: shrinking semiconductor design rules and lower Vdd, more and more memory cells, higher capacity and a larger number of nodes per system
- Industry consistently requires high reliability levels: chipkill and even stronger protection
Traditional solutions apply ECC uniformly across memory locations
- Not all applications/data need the same error tolerance level
- Cost increases with the tolerance level
- Hard to meet the required level of reliability at low cost

3 Virtualized ECC
[Figure: uniform ECC attaches fixed redundancy to every data block; Virtualized ECC keeps the redundant information in the memory hierarchy and can assign stronger ECC per data block.]
- Store redundant information within the memory hierarchy
- Dynamic mapping between data and ECC
- Flexible protection level and access granularity
- Two-tiered protection
- Finer access granularity

4 Outline
- Motivation
- Traditional Memory Protection
- Virtualizing Redundant Information
- Memory Mapped ECC (MME)
- Virtualized and Flexible ECC (V-ECC)
- Adaptive Granularity Memory System (AGMS)
- Conclusions and Future Work

5 Traditional Memory Protection
Uniform ECC [Figure: L2 cache, memory bus, and main memory each carry dedicated, aligned ECC resources.]
- Simple and transparent to the programmer
- Dedicated and aligned HW resources waste storage and bandwidth
- Better reliability comes at higher cost
- Design-time decision fixes the error tolerance level and the access granularity
Users always pay for the cost of memory protection
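For concreteness, the fixed cost of uniform ECC can be worked out directly. The short C snippet below (not from the talk) computes the storage/bandwidth overhead of a standard ECC DIMM, which dedicates 8 check bits to every 64 data bits; that 12.5% is paid on every access whether or not the data needs that protection level.

```c
/* Back-of-the-envelope cost of uniform ECC on a standard ECC DIMM:
 * 8 check bits accompany every 64 data bits on every transfer. */
#include <stdio.h>

int main(void) {
    const double data_bits = 64.0, check_bits = 8.0;
    printf("storage/bandwidth overhead: %.1f%%\n",
           100.0 * check_bits / data_bits);   /* prints 12.5% */
    return 0;
}
```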

6 Virtualizing Redundant Information
- Store all or part of the redundant information within the memory hierarchy, minimizing resources dedicated to redundancy
- Two-tiered protection: a Tier-1 error code (low-cost, low-complexity, covers the common case) and a Tier-2 error code (strong error correction, but rarely accessed)
- Decouple data and ECC storage for flexibility in memory protection: adaptive error tolerance levels and access granularities
Users pay for the error tolerance they need, rather than overpay for what they might need
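To make the two-tier idea concrete, here is a minimal C sketch of the read path, assuming T1EC is a cheap detect-only code checked on every access and T2EC is a strong code kept in (cacheable) memory and touched only on a detected error. The parity check and the `t2ec_store` array are stand-ins chosen so the example runs; they are not the actual codes or structures from the papers.

```c
/* Minimal sketch of a two-tiered read path. Parity and the flat
 * t2ec_store[] are illustrative stand-ins only. */
#include <stdint.h>
#include <stdio.h>

static uint8_t parity64(const uint8_t d[64]) {   /* stand-in T1EC: 1-bit parity */
    uint8_t p = 0;
    for (int i = 0; i < 64; i++) p ^= d[i];
    return p;
}

static uint64_t t2ec_store[1024];                /* hypothetical memory-mapped T2EC region */

static int read_line(uint64_t line_idx, uint8_t data[64], uint8_t stored_t1ec) {
    if (parity64(data) == stored_t1ec)
        return 0;                                /* common case: T2EC never touched */
    uint64_t t2ec = t2ec_store[line_idx];        /* rare case: fetch the Tier-2 code */
    (void)t2ec;                                  /* ...strong correction would run here */
    return 1;
}

int main(void) {
    uint8_t line[64] = {0};
    uint8_t t1ec = parity64(line);
    line[3] ^= 1;                                /* inject a single-bit error */
    printf("tier-2 path taken: %d\n", read_line(0, line, t1ec));
    return 0;
}
```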

7 Examples of Virtualizing ECC
- Last-level cache protection with memory-mapped ECC: Memory Mapped ECC (MME) [ISCA 09], ECC FIFO [SC 09]
- Chipkill-level flexible main memory protection: Virtualized and Flexible ECC (V-ECC) [ASPLOS 10]
- Adaptive memory access granularity with ECC: Adaptive Granularity Memory System (AGMS) [ongoing]

8 Memory Mapped ECC [ISCA 09]

9 Memory Mapped ECC [ISCA 09]
[Figure: an 8-way LLC stores a tag, 64B of data, and a small T1EC per line on chip; the T2EC lives in DRAM.]
T2EC is memory mapped and cacheable

10 Memory Mapped ECC (Cont'd)
Last-level cache protection mechanism with two-tiered protection
- Tier-1 error code (T1EC): low-overhead on-chip error code
- Tier-2 error code (T2EC): strong, memory-mapped error-correcting code
- The LLC is dynamically and transparently partitioned into data and T2EC
- Fixed, one-to-one mapping between physical cache lines and memory-mapped T2EC
Area saving: 15%; power saving: 9%
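A minimal sketch of the "fixed, one-to-one mapping" idea, assuming a reserved DRAM region and an 8B T2EC entry per cache line (both values are illustrative, not the ISCA'09 hardware): because the T2EC location is pure arithmetic on (set, way), no lookup table is needed.

```c
/* Sketch of the fixed cache-line -> memory-mapped T2EC mapping.
 * T2EC_BASE, entry size, and cache geometry are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define T2EC_BASE   0x80000000ULL   /* hypothetical reserved DRAM region */
#define T2EC_BYTES  8u              /* T2EC entry per 64B cache line     */
#define NUM_WAYS    8u

static uint64_t t2ec_addr(uint32_t set, uint32_t way) {
    uint64_t line_index = (uint64_t)set * NUM_WAYS + way;
    return T2EC_BASE + line_index * T2EC_BYTES;   /* one-to-one, no table lookup */
}

int main(void) {
    printf("T2EC for (set 5, way 2) lives at 0x%llx\n",
           (unsigned long long)t2ec_addr(5, 2));
    return 0;
}
```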

11 Virtualized and Flexible ECC [ASPLOS 10]

12 Virtualized and Flexible ECC [ASPLOS 10]
Main memory protection mechanism
- Virtualize redundant information within the memory hierarchy
- Augment Virtual Memory (VM) with a dynamic mapping between data and ECC
Flexible memory protection
- A single hardware design can provide different tolerance levels
- Allows adaptive tuning of reliability levels
- Enables protection even for non-ECC DIMMs

13 Virtualized and Flexible ECC (Cont'd)
[Figure: virtual pages i, j, and k map to physical frames as usual; each physical frame is then mapped to an ECC page sized by its protection level, e.g., no T2EC page for low protection, a T2EC page for chipkill, and a larger T2EC page for double chipkill. Data and T1EC stay together; T2EC lives in separate ECC pages.]

14 Virtualized ECC Operation
- Read: fetch data and T1EC; no need to fetch the T2EC in most cases
- Write: update data, T1EC, and T2EC
- T2ECs of consecutive data lines map to one T2EC line; a T2EC line is written back when evicted
- ECC Address Translation Unit: fast PA-to-EA (ECC address) translation
[Figure: example reads and writes across DRAM Rank 0 and Rank 1; each rank holds data and T1EC, while the T2EC for one rank's data is stored in the other rank.]
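The address arithmetic behind the ECC Address Translation Unit can be sketched as follows, assuming 4KB pages, 64B lines, and the 4B chipkill-correct T2EC from the table on the next slides. The `ecc_page_base` lookup stands in for the per-page data-to-ECC mapping that V-ECC keeps alongside virtual memory; all constants are illustrative. With a 4B T2EC per 64B line, 16 consecutive data lines share one 64B T2EC line, which is what amortizes T2EC write-backs.

```c
/* Sketch of PA -> ECC-address (EA) translation; constants are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES  64u
#define T2EC_BYTES   4u     /* chipkill-correct entry per 64B line */

static uint64_t ecc_page_base(uint64_t pa) {   /* stand-in for the per-page data->ECC map */
    (void)pa;
    return 0x40000000ULL;                      /* hypothetical ECC page address */
}

static uint64_t pa_to_ea(uint64_t pa) {
    uint64_t line_in_page = (pa & 0xFFFull) / LINE_BYTES;   /* 4KB pages assumed */
    return ecc_page_base(pa) + line_in_page * T2EC_BYTES;
}

int main(void) {
    printf("data lines sharing one 64B T2EC line: %u\n", LINE_BYTES / T2EC_BYTES);
    printf("EA for PA 0x200: 0x%llx\n", (unsigned long long)pa_to_ea(0x200));
    return 0;
}
```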

15 Performance Impact of V-ECC
- Increased data miss rate: T2EC lines in the LLC reduce the effective LLC size
- Increased traffic due to T2EC write-backs: one-way write-back traffic, not on the critical path

16 Chipkill Correct
Single Device-error Correct and Double Device-error Detect
- Can tolerate one DRAM device failure and detect a second DRAM failure
Traditional chipkill requires x4 DRAMs; V-ECC enables x8
- Two-tiered error code: a simpler T1EC relaxes module design constraints and enables more energy-efficient x8 configurations
- The T2EC used for error correction is virtualized

17 Flexible Error Protection
A single HW design with V-ECC can provide Chipkill-Detect, Chipkill-Correct, and Double Chipkill-Correct by using a different T2EC for different pages

  Protection level           T2EC per 64B
  Chipkill-Detect            0B
  Chipkill-Correct           4B
  Double Chipkill-Correct    8B

- Maximize performance/power efficiency with Chipkill-Detect
- Stronger protection at the cost of additional T2EC accesses
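The protection-level table above translates directly into a tiny lookup. The sketch below (illustrative names and types, not the actual V-ECC tables) shows how a per-page level would select how much T2EC storage and traffic each 64B line costs.

```c
/* Per-page protection level -> T2EC bytes per 64B line, from the slide's table. */
#include <stdio.h>

enum prot_level { CHIPKILL_DETECT, CHIPKILL_CORRECT, DOUBLE_CHIPKILL_CORRECT };

static unsigned t2ec_bytes_per_line(enum prot_level lvl) {
    static const unsigned table[] = { 0, 4, 8 };   /* 0B, 4B, 8B per 64B */
    return table[lvl];
}

int main(void) {
    for (int l = CHIPKILL_DETECT; l <= DOUBLE_CHIPKILL_CORRECT; l++)
        printf("level %d: %u B T2EC per 64B line (%.1f%% extra storage)\n",
               l, t2ec_bytes_per_line(l), 100.0 * t2ec_bytes_per_line(l) / 64);
    return 0;
}
```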

18 V-ECC x8 Results (normalized to baseline x4 chipkill)
[Figure: normalized execution time and normalized EDP for Chipkill-Detect, Chipkill-Correct, and Double Chipkill-Correct across SPEC 2006 (bzip2, hmmer, mcf, libquantum, omnetpp, milc, lbm, sphinx3), PARSEC (canneal, dedup, fluidanimate, freqmine), STREAM, and GUPS, plus the average.]

19 V-ECC with Non-ECC DIMMs (normalized to baseline x4 chipkill)
[Figure: normalized execution time and normalized EDP for No protection, Chipkill-Detect, Chipkill-Correct, and Double Chipkill-Correct across SPEC 2006, PARSEC, STREAM, and GUPS, plus the average.]

20 Adaptive Granularity Memory System

21 Adaptive Granularity Memory System
Bandwidth wall
- More and more cores/threads per chip, but off-chip bandwidth scaling is limited by pins and power
Fixed, coarse access granularity
- 64B or 128B data blocks with 12.5% redundancy overhead
- Works well for most applications with spatial locality
- Inefficient for applications with low spatial locality: off-chip bandwidth is wasted on unneeded data within a block
- Off-chip bandwidth could be used better with fine-grained blocks, but the cache hierarchy, DRAM, and uniform ECC prevent fine-grained accesses

22 Coarse- and Fine-Grained Memory Access
Coarse-grained access
- Lower ECC/control overhead; generally works well
- But poor throughput when there is little spatial locality
Fine-grained access
- Higher ECC/control overhead
- Better throughput for applications with poor spatial locality
[Figure: one wide data block with a single ECC word versus many small data blocks, each carrying its own ECC.]
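The trade-off can be made concrete with rough bandwidth accounting. The snippet below assumes, purely for illustration, a 64B block with 8B of ECC in the coarse case and an 8B sub-block paying its own 8B of ECC/control in the fine-grained case, and compares the useful fraction when the application only touches 8B; the numbers are not measured results from the talk.

```c
/* Rough bandwidth accounting for coarse vs. fine-grained access when
 * only 8B of a block is useful. All transfer sizes are illustrative. */
#include <stdio.h>

int main(void) {
    const double useful          = 8.0;          /* bytes the app actually touches */
    const double coarse_transfer = 64.0 + 8.0;   /* one 64B block plus its ECC      */
    const double fine_transfer   = 8.0 + 8.0;    /* one 8B sub-block plus its ECC   */

    printf("coarse-grained useful fraction: %.1f%%\n", 100.0 * useful / coarse_transfer);
    printf("fine-grained   useful fraction: %.1f%%\n", 100.0 * useful / fine_transfer);
    return 0;
}
```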

23 Design Examples
- Wide channel for coarse-grained access
- Narrow multi-channel for fine-grained access
- AGMS with virtualized ECC supports both fine-grained and coarse-grained accesses, with the ECC kept in ECC pages

24 AGMS Design
- Augment Virtual Memory (VM) to identify fine-grained pages: 1 bit per PTE
- Sectored caches to manage 8B data blocks
- Memory controller can handle multiple access granularities
- Sub-ranked DRAM to enable an 8B access granularity; requires higher address/command bandwidth
[Figure: cores with L1/L2 caches share a sectored last-level cache and a memory controller driving a sub-ranked DDR DRAM module (sub-ranks SR0-SR8 of x8 devices behind register/demux chips on the address bus).]
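A small sketch of how the one extra PTE bit might steer access granularity (field names and layout are hypothetical, not the actual AGMS page-table format): the bit marks a page as fine-grained, and the memory controller issues 8B sub-ranked requests for it instead of 64B bursts.

```c
/* Sketch of a granularity bit in the PTE steering request size.
 * The PTE layout and request struct are illustrative only. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t pfn        : 52;   /* physical frame number */
    uint64_t flags      : 11;   /* usual permission bits */
    uint64_t fine_grain : 1;    /* the 1 bit per PTE added for fine-grained pages */
} pte_t;

typedef struct { uint64_t addr; uint32_t bytes; } mem_req_t;

static mem_req_t make_request(pte_t pte, uint64_t vaddr) {
    mem_req_t r;
    r.addr  = ((uint64_t)pte.pfn << 12) | (vaddr & 0xFFF);
    r.bytes = pte.fine_grain ? 8u : 64u;   /* 8B sub-ranked access vs. 64B burst */
    return r;
}

int main(void) {
    pte_t p = { .pfn = 0x12345, .flags = 0, .fine_grain = 1 };
    mem_req_t r = make_request(p, 0x7f0);
    printf("request: addr=0x%llx size=%uB\n", (unsigned long long)r.addr, r.bytes);
    return 0;
}
```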

25 Preliminary Results
[Figure: 4-core and 8-core system throughput for coarse-grained (CG), adaptive granularity (AG), and adaptive granularity with ECC (AG+ECC) configurations across mcf, omnetpp, libquantum, bzip2, mst, em3d, SSCA2, linked list, and GUPS workloads and several mixes.]

26 Conclusions and Future Work
Virtualized ECC
- Store all or part of the redundant information within the memory hierarchy
- Two-tiered protection minimizes resources dedicated to redundancy
- Flexibility in error tolerance level and access granularity
Future work
- GPU memory protection
- Non-volatile memory protection
- Adaptive/tunable reliability
- Generic meta-data storage
