Energy-Efficient Cache Design Using Variable-Strength Error-Correcting Codes


Alaa R. Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu, Chris Wilkerson, Shih-Lien Lu
Intel Labs

Overview
Large caches and memories limit voltage scaling
  Many cells fail at low voltages
  Need to account for weakest cell
Error-Correcting Codes (ECC) allow lower voltages by recovering from (multiple) failures
  Uniform ECC increases latency, power & area
Our Proposal: Variable-Strength ECC (VS-ECC)
  Better performance, power and area vs. uniform ECC
  Allocates ECC budget to lines that need it
  Online testing identifies lines needing more protection

Outline
Overview
Motivation
Prior Work
Our Proposal: Variable-Strength ECC
Evaluation
Conclusions

Motivation
[Figure: probability (log scale, 1.E-18 to 1.E+00) vs. Vcc (0.4 V to 0.8 V) for 64B lines. Curves: pbitfail, P(e=1), P(e=2), P(e=3), P(e=4)]
Most cache lines have 0-1 failures at low voltage
But some lines (especially for large caches) have more failures

Motivation
[Figure: probability (log scale, 1.E-08 to 1.E+00) vs. Vcc (0.4 V to 0.6 V) for 64B lines. Curves: pbitfail, P(e=1), P(e=2), P(e=3), P(e=4)]
Need a strong ECC code to protect worst lines
Uniform ECC for all lines is expensive AND unnecessary
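The P(e=k) curves in the two Motivation figures can be approximated with a binomial model. The sketch below is not taken from the slides: it assumes cell failures are independent, that a 64B line has 512 data bits (ECC bits ignored), and it uses a made-up pbitfail value, since the slides do not give the pbitfail-to-voltage mapping.

```python
from math import comb

LINE_BITS = 512  # 64B line, data bits only (assumption: ECC check bits ignored)


def p_line_failures(p_bit_fail: float, e: int, n_bits: int = LINE_BITS) -> float:
    """P(exactly e failing cells in a line), assuming independent cell failures."""
    return comb(n_bits, e) * p_bit_fail**e * (1.0 - p_bit_fail) ** (n_bits - e)


if __name__ == "__main__":
    p = 1e-4  # hypothetical pbitfail, not a value read from the figure
    for e in range(5):
        print(f"P(e={e}) = {p_line_failures(p, e):.3e}")
```

Under this model the probability mass sits almost entirely at e=0 and e=1 for small pbitfail, which matches the slide's observation that only a few lines need strong protection.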

Prior Low Voltage Solutions
Uniform-Strength Error Correction Codes
  SECDED (Single Error Correction, Double Error Detection)
  DECTED (Double Error Correction, Triple Error Detection)
  Two-dimensional ECC: Kim et al., MICRO '07
  Multi-bit segmented ECC (MS-ECC): Chishti et al., MICRO '09
Architectural solutions for persistent failures
  Word Disable: Wilkerson et al., ISCA '08, Roberts et al., DSD '07
  Bit Fix: Wilkerson et al., ISCA '08
Circuit Solutions: Larger cells, alternative cell designs
All use same level of protection for all cache lines

Variable-Strength ECC (VS-ECC)
Key idea: Provide strong ECC protection only for lines that need it
  But still provide single-error correction for soft errors
VS-ECC achieves lower voltage at minimum cost
Three variations are explored
Need to identify which lines need stronger protection

Design 1: VS-ECC-Fixed
[Figure label: SECDED ECC bits]
Fixed number of regular and extended ECC lines
Regular lines protected by SECDED
Extended ECC lines use 4-bit correction

Design 2: VS-ECC-Disable
[Figure label: SECDED ECC bits]
Add a disable bit to each line
Lines with 3 or more errors are disabled
Lines with zero errors use SECDED, 1-2 errors use 4-bit correction
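The VS-ECC-Disable rule above maps a line's persistent error count to a protection level. A minimal sketch of that mapping follows; the names `Protection` and `classify_line` are illustrative and do not come from the slides.

```python
from enum import Enum


class Protection(Enum):
    SECDED = "SECDED"
    FOUR_BIT = "4-bit correction"
    DISABLED = "disabled"


def classify_line(persistent_errors: int) -> Protection:
    """Apply the VS-ECC-Disable rule stated on the slide to one cache line."""
    if persistent_errors < 0:
        raise ValueError("error count cannot be negative")
    if persistent_errors == 0:
        return Protection.SECDED      # zero errors: SECDED only
    if persistent_errors <= 2:
        return Protection.FOUR_BIT    # 1-2 errors: 4-bit correction
    return Protection.DISABLED        # 3 or more errors: line is disabled
```

For example, `classify_line(2)` returns `Protection.FOUR_BIT`, while `classify_line(3)` returns `Protection.DISABLED`.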

Cache Characterization
We need to classify cache lines based on their number of failures
Manufacturing-time testing is expensive and needs non-volatile on-die storage for the fault map
Proposal: Online testing on the first transition to low voltage

Online Testing at Low Voltage
Cache is still functional during testing, but with reduced capacity
Divide cache into a working part (protected by 4-bit ECC) and a part under test, then switch roles
Use standard testing patterns, store error locations in tag
Note: Not all VS-ECC designs require the same testing accuracy
Optimizing test time is an opportunity for future work
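The two-phase scheme above can be sketched as a small simulation. Everything here is an illustrative assumption rather than the authors' test procedure: failing cells are modeled as stuck-at bits, the "standard testing patterns" are reduced to all-zeros and all-ones, the cache is split into two equal halves, and the fault map is a plain dictionary instead of tag storage.

```python
LINE_BITS = 512                      # 64B line (data bits only)
ALL_ONES = (1 << LINE_BITS) - 1
PATTERNS = (0, ALL_ONES)             # assumed patterns: all-zeros, all-ones


class FaultyLine:
    """Hypothetical line model: some cells are stuck at a fixed value."""

    def __init__(self, stuck=None):
        self.stuck = dict(stuck or {})   # bit position -> stuck value (0 or 1)
        self.value = 0

    def write(self, data: int) -> None:
        for pos, bit in self.stuck.items():
            data = (data & ~(1 << pos)) | (bit << pos)   # stuck cell overrides data
        self.value = data

    def read(self) -> int:
        return self.value


def find_failing_bits(line: FaultyLine) -> set:
    """Write each pattern, read it back, and collect mismatching bit positions."""
    mismatches = 0
    for pattern in PATTERNS:
        line.write(pattern)
        mismatches |= line.read() ^ pattern
    return {pos for pos in range(LINE_BITS) if (mismatches >> pos) & 1}


def characterize(cache: list) -> dict:
    """Test one half while the other half stays in service, then switch roles."""
    half = len(cache) // 2
    fault_map = {}
    for under_test in (range(0, half), range(half, len(cache))):
        # The half not under test keeps serving accesses (not modeled here).
        for index in under_test:
            fault_map[index] = find_failing_bits(cache[index])
    return fault_map
```

The per-line size of each fault set is the failure count that the VS-ECC designs need in order to assign a protection level.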

Simulated Configurations
Baseline
  2MB 16-way L2 (12 cycles), SECDED ECC to recover from non-persistent errors (1 cycle)
Uniform-strength ECC
  DECTED: 1 cycle, corrects one persistent error per line
  4EC5ED: 15 cycles, corrects up to three persistent errors per line
  MS-ECC: 64-bit segments, 4 corrections/segment, corrects up to three persistent errors per segment, cache becomes 1MB 8-way
Variable-strength ECC
  VS-ECC-Fixed: 12 lines with SECDED (1 cycle), 4 with 4EC5ED (15 cycles)
  VS-ECC-Disable: VS-ECC-Fixed + disable lines with 3 or more errors

Results: Reliability
[Figure: cache failure probability (log scale, 1.E-15 to 1.E+00) vs. supply voltage (0.4 V to 0.8 V). Curves: 2MB SECDED, DECTED, 4EC5ED, VS-ECC-Fixed, MS-ECC, VS-ECC-Disable. Vmin set at 1/1000 cache failure probability]
VS-ECC has similar voltage scaling to 4EC5ED
VS-ECC-Disable achieves lowest voltage
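A rough model of the cache failure probability plotted above is sketched below. It is an assumption-laden illustration, not the authors' evaluation method: cell failures are taken as independent, lines hold 512 data bits, the cache is the 2MB 16-way L2 from the simulated configurations, and a scheme "fails" when any line has more persistent failures than its code can absorb (0 for SECDED, 1 for DECTED, 3 for 4EC5ED, following the configuration slide). VS-ECC-Fixed is modeled as 4 strong ways per set. The slides do not give pbitfail as a function of voltage, so the sketch works in terms of pbitfail only.

```python
from math import comb, expm1, log1p

LINE_BITS = 512                       # 64B line, data bits only (assumption)
NUM_LINES = 2 * 1024 * 1024 // 64     # 2MB cache, 64B lines
WAYS = 16
NUM_SETS = NUM_LINES // WAYS


def p_exact(p: float, e: int) -> float:
    """P(exactly e failing cells in a line), independent failures assumed."""
    return comb(LINE_BITS, e) * p**e * (1.0 - p) ** (LINE_BITS - e)


def p_more_than(p: float, t: int) -> float:
    """P(a line has more than t failing cells), summed directly over the tail."""
    return sum(p_exact(p, e) for e in range(t + 1, LINE_BITS + 1))


def any_of(unit_fail: float, count: int) -> float:
    """P(at least one of `count` independent units fails)."""
    if unit_fail >= 1.0:
        return 1.0
    return -expm1(count * log1p(-unit_fail))


def uniform_cache_failure(p: float, t: int) -> float:
    """Uniform ECC absorbing up to t persistent failures in every line."""
    return any_of(p_more_than(p, t), NUM_LINES)


def vs_ecc_fixed_cache_failure(p: float, strong_ways: int = 4, strong_t: int = 3) -> float:
    """A set fails if any line exceeds strong_t failures, or if more than
    strong_ways lines have at least one failure."""
    p_clean = p_exact(p, 0)
    p_weak = sum(p_exact(p, e) for e in range(1, strong_t + 1))
    p_bad = p_more_than(p, strong_t)
    set_fail = 0.0
    for bad in range(WAYS + 1):
        for weak in range(WAYS - bad + 1):
            if bad >= 1 or weak > strong_ways:
                clean = WAYS - bad - weak
                ways_to_pick = comb(WAYS, bad) * comb(WAYS - bad, weak)
                set_fail += ways_to_pick * p_bad**bad * p_weak**weak * p_clean**clean
    return any_of(min(set_fail, 1.0), NUM_SETS)
```

Sweeping `p` and finding where each function crosses 1e-3 gives the largest pbitfail each scheme tolerates under this model; translating that to a Vmin would require the pbitfail curve from the Motivation figures.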

Results: Performance at Low Voltage
[Figure: normalized IPC (0.82 to 1.02) per workload category (DH, FSPEC, ISPEC, GM, MM, OFF, PROD, SERV, WS, KERN, GMEAN). Series: 2MB Base, VS-ECC-Dis, 4EC5ED, MS-ECC]
Similar IPC to baseline, better than uniform ECC

Results: Power & Energy
Design              Vccmin (mV)  Frequency (MHz)  Norm. Power  Norm. EPI
Baseline (SECDED)   830          2000             1.00         1.000
DECTED              675          1350             0.49         0.72
4EC5ED              565          940              0.26         0.57
MS-ECC              540          830              0.22         0.56
VS-ECC-Fixed        590          1040             0.31         0.59
VS-ECC-Disable      500          650              0.16         0.50

Conclusions
We need strong ECC capability in large caches to lower voltage and power
Uniform ECC techniques are expensive (performance, power, area)
Variable-Strength ECC provides strong protection only to lines that need it
VS-ECC + Line Disable is the most cost-effective mechanism
Optimizing test algorithms is an important topic for future work

Backup Slides

Design 3: VS-ECC-Variable
Each line has minimum SECDED correction
Lines in a set share extended ECC blocks and get extra protection as needed
Needs knowledge of exact failure count per line

Cache Operation at Low Voltage
[Flowchart: an access begins with tag lookup and E-bit decode, then branches on hit or miss.
  Hit: the access type (read or write) and the line's ECC type (SECDED or eECC) select the path.
    Read: SECDED or multi-bit ECC check, then send line to CPU.
    Write: SECDED or multi-bit ECC compute, then write line and ECC.
  Miss: if a writeback is needed, the victim's ECC type (SECDED or eECC) selects a SECDED or multi-bit ECC compute before the victim line is written back. The flowchart also includes a cache line fill step.]

Simulated Configurations
Baseline
  32KB 8-way L1 caches, 2MB 16-way L2 (12 cycles), SECDED ECC to recover from non-persistent failures (1 cycle)
Uniform-strength ECC
  DECTED: 1 cycle, corrects one persistent failure per line
  4EC5ED: 15 cycles, corrects up to three persistent failures per line
  MS-ECC: 64-bit segments, 4 corrections/segment, corrects up to three persistent failures per segment, cache becomes 1MB 8-way
Variable-strength ECC
  VS-ECC-Fixed: 12 lines with SECDED (1 cycle), 4 with 4EC5ED (15 cycles)
  VS-ECC-Variable: SECDED + 12 extra 10-bit ECC blocks per set
  VS-ECC-Disable: VS-ECC-Fixed + disable lines with 3 or more failures

Results: Reliability
[Figure: cache failure probability (log scale, 1.E-15 to 1.E+00) vs. supply voltage (0.4 V to 0.8 V). Curves: 2MB Base (SECDED), DECTED, 4EC5ED, VS-ECC-Fixed, MS-ECC, VS-ECC-Variable, VS-ECC-Disable. Vmin set at 1E-3 failure probability]
VS-ECC has similar voltage scaling to 4EC5ED
VS-ECC-Disable achieves lowest voltage

Results: Performance at Low Voltage
[Figure: normalized IPC (0.82 to 1.02). Series: 2MB Base, VS-ECC, 4EC5ED, MS-ECC]
Similar IPC to baseline, better than uniform ECC

Results: Power & Energy
Design              Vccmin (mV)  Frequency (MHz)  Norm. Power  Norm. EPI
Baseline (SECDED)   830          2000             1.00         1.000
DECTED              675          1350             0.49         0.72
4EC5ED              565          940              0.26         0.57
MS-ECC              540          830              0.22         0.56
VS-ECC-Fixed        590          1040             0.31         0.59
VS-ECC-Variable     565          940              0.26         0.56
VS-ECC-Disable      500          650              0.16         0.50