SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS Samira Khan

MEMORY IN TODAY'S SYSTEM Processor, DRAM Memory, Storage DRAM is critical for performance 2

MAIN MEMORY CAPACITY Gigabytes of DRAM Increasing demand for high capacity 1. More cores 2. Data-intensive applications 3

DEMAND 1: INCREASING NUMBER OF CORES 2012-2015: SPARC M5 (6 cores), SPARC M6 (12 cores), SPARC M7 (32 cores) More cores need more memory 4

DEMAND 2: DATA-INTENSIVE APPLICATIONS MEMORY CACHING IN-MEMORY DATABASE More demand for memory 5

HOW DID WE GET MORE CAPACITY? Technology Scaling DRAM Cells DRAM Cells DRAM scaling enabled high capacity 6

DRAM SCALING TREND [chart: megabits per chip (log scale) vs. start of mass production, 1985-2015] DRAM scaling is getting difficult Source: Flash Memory Summit 2013, Memcon 2014 7

DRAM SCALING CHALLENGE Technology Scaling DRAM Cells DRAM Cells Manufacturing reliable cells at low cost is getting difficult 8

WHY IS IT DIFFICULT TO SCALE? DRAM Cells In order to answer this, we need to take a closer look at a DRAM cell 9

WHY IS IT DIFFICULT TO SCALE? [diagram: a DRAM cell; logical view with transistor and capacitor, vertical cross section with bitline, contact, capacitor, and transistor] 10

WHY IS IT DIFFICULT TO SCALE? Technology Scaling DRAM Cells DRAM Cells Challenges in Scaling 1. Capacitor reliability 2. Cell-to-cell interference 11

SCALING CHALLENGE 1: CAPACITOR RELIABILITY Technology Scaling DRAM Cells DRAM Cells Capacitor is getting taller 12

CAPACITOR RELIABILITY 58 nm 140 m Results in failures while manufacturing Source: Flash Memory Summit, Hynix 2012 13

IMPLICATION: DRAM COST TREND PROJECTION [chart: cost per bit vs. year, 2000-2020] Cost is expected to go higher Source: Flash Memory Summit, Hynix 2012 14

SCALING CHALLENGE 2: CELL-TO-CELL INTERFERENCE Technology Scaling [diagram: less interference vs. more interference; indirect paths between neighboring cells] More interference results in more failures 15

IMPLICATION: DRAM ERRORS IN THE FIELD 1.52% of DRAM modules failed in Google Servers 1.6% of DRAM modules failed in LANL SIGMETRICS 09, SC 12 16

GOAL ENABLE HIGH CAPACITY MEMORY WITHOUT SACRIFICING RELIABILITY 17

TWO DIRECTIONS DRAM Difficult to scale MAKE DRAM SCALABLE SIGMETRICS 14, DSN 15, ONGOING NEW TECHNOLOGIES Predicted to be highly scalable LEVERAGE NEW TECHNOLOGIES WEED 13, ONGOING 18

DRAM Cells Technology Scaling DRAM Cells DRAM SCALING CHALLENGE DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING Non-Volatile Memory UNIFY Storage NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE PAST AND FUTURE WORK 19

TRADITIONAL APPROACH TO ENABLE DRAM SCALING Unreliable DRAM cells are made reliable at manufacturing time, so the system in the field sees only reliable DRAM cells: DRAM has a strict reliability guarantee 20

MY APPROACH Instead of making DRAM cells reliable at manufacturing time, ship unreliable DRAM cells and build a reliable system in the field Shift the responsibility to systems 21

VISION: SYSTEM-LEVEL DETECTION AND MITIGATION Detect and Mitigate Unreliable DRAM Cells Reliable System Detect and mitigate errors after the system has become operational ONLINE PROFILING Reduces cost, increases yield, and enables scaling 22

CHALLENGE: INTERMITTENT FAILURES DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS HIGH-LEVEL DESIGN NEW SYSTEM-LEVEL TECHNIQUES 23

CHALLENGE: INTERMITTENT FAILURES Unreliable DRAM Cells Detect and Mitigate Reliable System If failures were permanent, a simple boot up test would have worked What are the characteristics of intermittent failures? 24

DRAM RETENTION FAILURE [diagram: cell switch and capacitor; charge leaks over time] Each cell is refreshed every 64 ms; a retention failure occurs when a cell's retention time falls below the 64 ms refresh interval 25

INTERMITTENT RETENTION FAILURE DRAM Cells Some retention failures are intermittent Two characteristics of intermittent retention failures 1 2 Data Pattern Sensitivity Variable Retention Time 26

DATA PATTERN SENSITIVITY [diagram: the same cell shows no failure or failure depending on the interference from the values stored around it] Some cells can fail depending on the data stored in neighboring cells JSSC 88, MDTD 02 27

VARIABLE RETENTION TIME [chart: retention time (ms, 0-640) vs. time] Retention time changes randomly in some cells IEDM 87, IEDM 92 28

CURRENT APPROACH TO CONTAIN INTERMITTENT FAILURES Manufacturing Time Testing PASS FAIL 1. Manufacturers perform exhaustive testing of DRAM chips 2. Chips failing the tests are discarded 29

SCALING AFFECTS TESTING Manufacturing Time Testing PASS FAIL Longer tests and more failures: more interference in smaller technology nodes leads to lower yield and higher cost 30

SYSTEM-LEVEL ONLINE PROFILING (1) Not fully tested during manufacturing time, (2) ship modules with possible failures, (3) detect and mitigate failures online Increases yield, reduces cost, enables scaling 31

CHALLENGE: INTERMITTENT FAILURES DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS HIGH-LEVEL DESIGN NEW SYSTEM-LEVEL TECHNIQUES 32

EFFICACY OF SYSTEM-LEVEL TECHNIQUES Can we leverage existing techniques? (1) Testing, (2) Guardbanding, (3) Error Correcting Code, (4) Higher Strength ECC We analyze the effectiveness of these techniques using experimental data from real DRAM chips Data set publicly available 33

METHODOLOGY FPGA-based testing infrastructure Evaluated 96 chips from three major vendors 34

1. TESTING (1) Write some pattern to the module, (2) wait one refresh interval, (3) read back and verify; repeat. Test each module with different patterns for many rounds: Zeros (0000), Ones (1111), Tens (1010), Fives (0101), Random 35
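As a concrete sketch of this write / wait / verify loop (the `module` object with `size`, `write`, and `read` is a hypothetical stand-in for the FPGA test infrastructure, and the 64 ms wait is simply the nominal refresh interval from the slides):

```python
import random
import time

# The five patterns from the slide: all zeros, all ones, alternating, random.
PATTERNS = {
    "zeros": lambda n: [0x00] * n,
    "ones":  lambda n: [0xFF] * n,
    "tens":  lambda n: [0xAA] * n,   # 1010...
    "fives": lambda n: [0x55] * n,   # 0101...
    "rand":  lambda n: [random.randrange(256) for _ in range(n)],
}

def retention_test(module, num_rounds, refresh_interval_s=0.064):
    """Run the write / wait / read-and-verify loop for every pattern, for
    num_rounds rounds, and return the byte addresses that ever failed."""
    failing = set()
    for _ in range(num_rounds):
        for make in PATTERNS.values():
            data = make(module.size)
            module.write(0, data)              # step 1: write the pattern
            time.sleep(refresh_interval_s)     # step 2: wait one refresh interval
            readback = module.read(0, module.size)
            for addr, (wrote, got) in enumerate(zip(data, readback)):
                if wrote != got:               # step 3: verify
                    failing.add(addr)
    return failing
```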

EFFICACY OF TESTING [chart: number of failing cells found (up to 200,000) vs. number of rounds (0-1000) for the ZERO, ONE, TEN, FIVE, RAND, and All patterns] Only a few rounds can discover most of the failing cells, but even after hundreds of rounds a small number of new cells keep failing Conclusion: Testing alone cannot detect all possible failures 36

2. GUARDBANDING Adding a safety margin on the refresh interval can avoid VRT failures [diagram: refresh interval with 2X and 4X guardbands] Effectiveness depends on the difference between the retention times of a cell 37
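A minimal sketch of what a guardband buys, assuming the simple model that a cell tested at some retention time is then refreshed `guardband` times faster than that measurement would require; the retention-time values below are made up for illustration:

```python
def covered_by_guardband(tested_retention_s, worst_retention_s, guardband=2.0):
    """A cell tested at tested_retention_s is refreshed every
    tested_retention_s / guardband seconds. It escapes the guardband only if
    VRT later drops its retention time (worst_retention_s) below even that."""
    refresh_interval_s = tested_retention_s / guardband
    return worst_retention_s > refresh_interval_s

# Made-up numbers for illustration:
print(covered_by_guardband(1.0, 0.6))                 # True:  2X guardband is enough
print(covered_by_guardband(1.0, 0.4))                 # False: escapes a 2X guardband
print(covered_by_guardband(1.0, 0.4, guardband=4.0))  # True:  caught by 4X
```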

EFFICACY OF GUARDBANDING [chart: number of failing cells (1 to 1,000,000, log scale) vs. retention time (0-20 seconds)] Most of the cells exhibit close-by retention times, but there are a few cells with large differences in retention times Conclusion: Even a large guardband (5X) cannot detect 5-15% of the failing cells 38-42

3. ERROR CORRECTING CODE Error Correcting Code (ECC): additional information to detect errors and correct data 43
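For reference, a self-contained sketch of the SECDED idea in its textbook form as an extended Hamming code; this is a generic illustration of single-error correction plus double-error detection, not the specific (72, 64) layout used on real ECC DIMMs:

```python
def hamming_secded_encode(data_bits):
    """Encode a list of 0/1 data bits with an extended Hamming (SECDED) code.
    Hamming parity bits sit at power-of-two positions (1, 2, 4, ...); an
    overall parity bit at index 0 adds double-error detection."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:     # number of Hamming parity bits needed
        r += 1
    n = m + r
    code = [0] * (n + 1)            # index 0 is the overall parity bit
    it = iter(data_bits)
    for pos in range(1, n + 1):     # data bits go to non-power-of-two positions
        if pos & (pos - 1):
            code[pos] = next(it)
    for i in range(r):              # each parity bit covers positions with that bit set
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if (pos & p) and pos != p:
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code[1:]) % 2     # overall (even) parity
    return code

def hamming_secded_decode(code):
    """Return (data_bits, status): 'ok', 'corrected' (single-bit error fixed),
    or 'double' (two errors detected, uncorrectable)."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):     # XOR of the positions of all set bits
        if code[pos]:
            syndrome ^= pos
    overall_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:            # odd overall parity: exactly one bit flipped
        if syndrome:
            code[syndrome] ^= 1     # correct the flipped bit
        else:
            code[0] ^= 1            # the overall parity bit itself flipped
        status = "corrected"
    else:                           # even parity but nonzero syndrome: two errors
        status = "double"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status

# Single-bit errors are corrected; double-bit errors are only detected:
word = hamming_secded_encode([1, 0, 1, 1, 0, 0, 1, 0])
word[5] ^= 1
print(hamming_secded_decode(word))  # ([1, 0, 1, 1, 0, 0, 1, 0], 'corrected')
```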

EFFECTIVENESS OF ECC [chart: probability of new failure (1 down to 1E-18) vs. number of rounds (1-1000) for No ECC, SECDED, and SECDED with 2X guardband] A SECDED code reduces the error rate by 100 times, adding a 2X guardband reduces it by 1000 times, and the combination of techniques reduces the error rate by 10^7 times Conclusion: A combination of mitigation techniques is much more effective 44-47

4. HIGHER STRENGTH ECC (HI-ECC) No testing; use strong ECC, but amortize the cost of ECC over a larger data chunk Can potentially tolerate errors at the cost of higher-strength ECC 48
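To make the amortization argument concrete, a back-of-the-envelope sketch of SECDED storage overhead as the protected chunk grows; the chunk sizes are arbitrary examples, and Hi-ECC itself uses stronger multi-bit codes, whose cost is amortized in the same way:

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming (SECDED) code over data_bits bits."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1            # +1 for the overall parity bit

for chunk in (64, 512, 4096):          # bits: a word, a cache line, a 512 B block
    c = secded_check_bits(chunk)
    print(f"{chunk:5d} data bits: {c:2d} check bits "
          f"({100 * c / chunk:.1f}% overhead)")
# Larger chunks cut the relative ECC overhead, which is what lets Hi-ECC afford
# a stronger code; the trade-off is coarser correction granularity.
```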

EFFICACY OF HI-ECC [chart: time to failure (years, log scale) vs. number of rounds (1-10,000) for 4EC5ED, 3EC4ED, DECTED, and SECDED codes, each with a 2X guardband; 10-year target marked] After starting with 4EC5ED, the system can reduce to a 3EC4ED code after 2 rounds of tests, to DECTED after 10 rounds, and to SECDED after 7000 rounds of tests (4 hours) Conclusion: Testing can help to reduce the ECC strength, but blocks memory for hours 49-53

CONCLUSIONS SO FAR Key observations: Testing alone cannot detect all possible failures. A combination of ECC and other mitigation techniques is much more effective. Testing can help to reduce the ECC strength, even when starting with a higher-strength ECC, but it degrades performance 54

CHALLENGE: INTERMITTENT FAILURES DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS HIGH-LEVEL DESIGN NEW SYSTEM-LEVEL TECHNIQUES 55

TOWARDS AN ONLINE PROFILING SYSTEM (1) Initially protect DRAM with strong ECC, (2) periodically test parts of DRAM, (3) mitigate errors and reduce the ECC strength Run tests periodically, after a short interval, on small regions of memory 56
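A minimal sketch of how such an online profiler might be scheduled; the `memory.regions`, `test_region`, `remap_or_repair`, and `set_ecc_strength` hooks are illustrative assumptions rather than the proposed hardware design, and the step-down thresholds simply echo the round counts from the previous slides:

```python
import time

ECC_LEVELS = ["4EC5ED", "3EC4ED", "DECTED", "SECDED"]   # strongest -> weakest

def online_profiler(memory, region_size, test_interval_s, rounds_per_test):
    """Periodically test one small region of DRAM at a time; mitigate any
    failures found, and once a region has accumulated enough failure-free
    rounds, step its ECC down to a weaker (cheaper) code."""
    clean_rounds = {r: 0 for r in memory.regions(region_size)}   # hypothetical API
    ecc_level = {r: 0 for r in clean_rounds}                     # index into ECC_LEVELS
    # Thresholds loosely mirror the talk's numbers (2 / 10 / 7000 rounds of
    # tests to step down from 4EC5ED); treat them as placeholders.
    step_down_after = [2, 10, 7000]
    while True:
        for region in clean_rounds:
            failures = memory.test_region(region, rounds=rounds_per_test)
            if failures:
                memory.remap_or_repair(region, failures)   # mitigate (hypothetical)
                clean_rounds[region] = 0                   # be conservative again
            else:
                clean_rounds[region] += rounds_per_test
                lvl = ecc_level[region]
                if lvl < len(step_down_after) and clean_rounds[region] >= step_down_after[lvl]:
                    ecc_level[region] = lvl + 1
                    memory.set_ecc_strength(region, ECC_LEVELS[lvl + 1])
            time.sleep(test_interval_s)   # spread testing so memory is not blocked
```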

CHALLENGE: INTERMITTENT FAILURES DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING EFFICACY OF SYSTEM-LEVEL TECHNIQUES WITH REAL DRAM CHIPS HIGH-LEVEL DESIGN NEW SYSTEM-LEVEL TECHNIQUES 57

WHY SO MANY ROUNDS OF TESTS? DATA-DEPENDENT FAILURE A cell fails only when a specific pattern is stored in its neighboring cells [diagram: the logical neighbors of cell X at linear addresses X-1 and X+1 map to different scrambled physical addresses, e.g., X-4 and X+2] Even many rounds of random patterns cannot detect all failures 58

DETERMINE THE LOCATION OF PHYSICALLY ADJACENT CELLS [diagram: with scrambled addresses, the left and right neighbors of a failing cell X sit at unknown offsets X-? and X+?] NAÏVE SOLUTION For a given failure X, test every combination of two bit addresses in the row: O(n^2), i.e., 8192*8192 tests, 49 days for a row with 8K cells Our goal is to reduce the test time 59

STRONGLY VS. WEAKLY DEPENDENT CELLS STRONGLY DEPENDENT: fails even if only one neighbor's data changes WEAKLY DEPENDENT: fails only if both neighbors' data change Can detect the neighbor location of strongly dependent cells by testing every bit address: 0, 1, ..., X, X+1, X+2, ..., n 60

PHYSICAL NEIGHBOR LOCATION TEST Testing every bit address will detect only one neighbor Run parallel tests in different rows and aggregate the locations from different rows 61

PHYSICAL NEIGHBOR LOCATION TEST [diagram: linear testing probes the scrambled bit addresses 0-7 one at a time; recursive testing splits them into halves (0-3 and 4-7), then quarters, recursing toward the neighbor locations (e.g., X-4 and X+2)] 62
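A sketch of the recursive idea for a strongly dependent victim cell: instead of probing one candidate address at a time, or all O(n^2) pairs, halve the candidate set and recurse into the half whose contents trigger the failure. The `toggles_failure` probe is a hypothetical stand-in for a hardware test:

```python
def find_neighbor(candidates, toggles_failure):
    """Binary-search the scrambled bit addresses of a row for the physical
    neighbor whose data makes a strongly dependent victim cell fail.
    toggles_failure(addrs) is a hypothetical probe: it writes the worst-case
    data into just those addresses and reports whether the victim failed."""
    if len(candidates) == 1:
        return candidates[0]
    mid = len(candidates) // 2
    left = candidates[:mid]
    if toggles_failure(left):
        return find_neighbor(left, toggles_failure)
    return find_neighbor(candidates[mid:], toggles_failure)

# A row of 8K cells needs about log2(8192) = 13 probes per victim this way,
# versus 8192 single-address probes (linear) or 8192 * 8192 pair tests (naive).
```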

PHYSICAL NEIGHBOR-AWARE TEST Vendor A: number of tests reduced by 745654X, 27% extra failures found; Vendor B: 1016800X, 11%; Vendor C: 745654X, 14% Detects more failures with a small number of tests by leveraging neighbor information 63

NEW SYSTEM-LEVEL TECHNIQUES REDUCE FAILURE MITIGATION OVERHEAD Mitigation for the worst case vs. the common case reduces mitigation cost by around 3X-10X (ONGOING) Variable refresh leverages adaptive refresh to mitigate failures (DSN 15) 64

SUMMARY Unreliable DRAM Cells Detect and Mitigate Reliable System Proposed online profiling to enable scaling Analyzed efficacy of system-level detection and mitigation techniques Found that combination of techniques is much more effective, but blocks memory Proposed new system-level techniques to reduce detection and mitigation overhead 65

DRAM Cells Technology Scaling DRAM Cells DRAM SCALING CHALLENGE DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING Non-Volatile Memory UNIFY Storage NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE PAST AND FUTURE WORK 66

TWO-LEVEL STORAGE MODEL [diagram: the CPU accesses memory (DRAM) via Ld/St (volatile, fast, byte-addressable) and storage via file I/O (nonvolatile, slow, block-addressable)] 67

TWO-LEVEL STORAGE MODEL [diagram: as above, with NVM (PCM, STT-RAM) spanning the gap between memory and storage] Non-volatile memories combine characteristics of memory and storage 68

VISION: UNIFY MEMORY AND STORAGE [diagram: the CPU accesses NVM persistent memory directly via Ld/St] Provides an opportunity to manipulate persistent data directly 69

DRAM IS STILL FASTER [diagram: the CPU issues Ld/St to both DRAM and NVM persistent memory] A hybrid unified memory-storage system 70

CHALLENGE: DATA CONSISTENCY [diagram: the CPU issues Ld/St to persistent memory] A system crash can result in permanent data corruption in NVM 71

CURRENT SOLUTIONS Explicit interfaces to manage consistency: NV-Heaps [ASPLOS 11], BPFS [SOSP 09], Mnemosyne [ASPLOS 11] AtomicBegin { Insert a new node; } AtomicEnd; Limits adoption of NVM: code has to be rewritten with a clear partition between volatile and non-volatile data, a burden on the programmers 72
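As an illustration of that programming burden, a toy sketch of what an explicit AtomicBegin/AtomicEnd-style interface forces on the programmer; this is not the actual NV-Heaps, BPFS, or Mnemosyne API, just a generic undo-log transaction:

```python
class PersistentDict:
    """Toy AtomicBegin/AtomicEnd-style durable transaction (an undo-log sketch,
    not any real persistent-memory library). `data` stands in for state living
    in NVM; a real undo log would also live in NVM and be flushed before the
    in-place updates."""
    def __init__(self):
        self.data = {}
        self.undo_log = None

    def __enter__(self):                    # AtomicBegin
        self.undo_log = dict(self.data)     # snapshot the old values
        return self

    def __exit__(self, exc_type, exc, tb):  # AtomicEnd
        if exc_type is None:
            self.undo_log = None            # commit: drop the undo log
        else:
            self.data = self.undo_log       # abort / recovery: roll back
            self.undo_log = None
        return False                        # do not swallow exceptions

# Every update to persistent data must be wrapped explicitly by the programmer:
pd = PersistentDict()
with pd:                                    # AtomicBegin
    pd.data["head"] = "new node"
    pd.data["count"] = 1                    # AtomicEnd when the block exits
```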

OUR GOAL AND APPROACH Goal: Provide efficient, transparent consistency in hybrid systems [timeline: running and checkpointing phases alternate across Epoch 0 and Epoch 1] Periodic checkpointing of dirty data, transparent to the application and system; hardware checkpoints and recovers data 73

CHECKPOINTING GRANULARITY Page granularity: extra writes, but small metadata. Dirty-cache-block granularity: no extra writes, but huge metadata. High-write-locality pages in DRAM, low-write-locality pages in NVM 74

DUAL-GRANULARITY CHECKPOINTING DRAM: good for streaming writes; NVM: good for random writes Can adapt to access patterns 75
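A sketch of one way the placement decision could be expressed, assuming a per-page count of dirty cache blocks as the write-locality signal; the threshold and block counts are illustrative, not the mechanism's actual policy:

```python
BLOCKS_PER_PAGE = 64   # e.g., a 4 KB page divided into 64 B cache blocks

def choose_checkpoint_home(dirty_blocks_in_page, threshold=0.5):
    """Mostly-dirty pages (streaming writes) are checkpointed at page
    granularity in DRAM: little metadata, and the extra written bytes are
    mostly useful. Sparsely written pages (random writes) stay in NVM and are
    checkpointed at cache-block granularity: no extra writes, more metadata."""
    write_locality = dirty_blocks_in_page / BLOCKS_PER_PAGE
    if write_locality >= threshold:
        return "DRAM, page granularity"
    return "NVM, block granularity"

print(choose_checkpoint_home(60))   # mostly dirty -> DRAM, page granularity
print(choose_checkpoint_home(3))    # a few blocks -> NVM, block granularity
```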

TRANSPARENT DATA CONSISTENCY Cost of consistency compared to systems with zero-cost consistency, running unmodified legacy code: DRAM -3.5%, NVM +2.7% Provides consistency without significant performance overhead 76

DRAM Cells Technology Scaling DRAM Cells DRAM SCALING CHALLENGE DRAM Cells Detect and Mitigate Reliable System SYSTEM-LEVEL TECHNIQUES TO ENABLE DRAM SCALING Non-Volatile Memory UNIFY Storage NON-VOLATILE MEMORIES: UNIFIED MEMORY & STORAGE PAST AND FUTURE WORK 77

STORAGE MEMORY CPU OTHER WORK IMPROVING CACHE PERFORMANCE MICRO 10, PACT 10, ISCA 12, HPCA 12, HPCA 14 EFFICIENT LOW VOLTAGE PROCESSOR OPERATION HPCA 13, INTEL TECH JOURNAL 14 NEW DRAM ARCHITECTURE HPCA 15, ONGOING ENABLING DRAM SCALING SIGMETRICS 14, DSN 15, ONGOING UNIFYING MEMORY & STORAGE WEED 13, ONGOING 78

STORAGE MEMORY CPU FUTURE WORK: TRENDS DRAM NVM LOGIC 79

APPROACH DESIGN AND BUILD BETTER SYSTEMS WITH NEW CAPABILITIES BY REDEFINING FUNCTIONALITIES ACROSS DIFFERENT LAYERS IN THE SYSTEM STACK 80

RETHINKING STORAGE [diagram: application, operating system, and SSD (CPU, flash controller, flash chips)] APPLICATION, OPERATING SYSTEM, DEVICES What is the best way to design a system to take advantage of SSDs? 81

ENHANCING SYSTEMS WITH NON-VOLATILE MEMORY Recover and migrate NVM PROCESSOR, FILE SYSTEM, DEVICE, NETWORK, MEMORY How to provide efficient instantaneous system recovery and migration? 82

DESIGNING SYSTEMS WITH MEMORY AS AN ACCELERATOR MANAGE DATA FLOW SPECIALIZED CORES MEMORY WITH LOGIC APPLICATION, PROCESSOR, MEMORY How to manage data movement when applications run on different accelerators? 83

THANK YOU QUESTIONS? 84

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS Samira Khan