Could We Make SSDs Self-Healing?


Tong Zhang
Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute
Google/Bing: tong rpi
Santa Clara, CA

Introduction and Motivation
- Hot topic: NAND flash
- [Figure: memory hierarchy with CPU, registers & cache, main memory, and hard disk]
- [Figure: bit cost reduction as feature size scales from 100nm to 10nm; time axis 2007 to 2014]

Introduction and Motivation
- Increasingly noisy media: NAND flash and hard disk drives
- [Figure: noisy NAND flash channel with input X and output Y; labels: error rate, endurance, retention]
- [Figure: hard disk drive areal density in Tb/in^2]

In-Depth Knowledge of Storage Media: Hard Disk Drive
- [Figure: input X, controller & W/R channel, hard disk drive, controller & W/R channel, output Y]
- Bit response
- Device characteristics
- Non-linear distortions
- Media defects
- Head noise
- Media noise
- BPM, HAMR, MAMR, TDMR

In-Depth Knowledge of Storage Media: Solid-State Storage
- [Figure: input X, controller & W/R channel, solid-state storage, controller & W/R channel, output Y]
- Device characteristics: write latency, read latency, erase latency, endurance, redundancy
- A painless life for system designers
- Lost system optimization opportunities
- Device-aware SSD system design

Device Wear-Out
- Program and erase operations modulate the transistor threshold voltage.
- Program/erase (P/E) cycling accumulates charge traps, which lead to:
  1. Larger random telegraph noise
  2. Larger threshold voltage degradation during retention
  3. Larger program/read disturb
- Result: smaller noise margin

Device Wear-Out
- [Figure: threshold voltage distributions and noise margins at 2 bits/cell, for fresh memory cells and after 10000 program/erase cycles]
- [Figure: noise margin vs. P/E cycling; the ECC tolerance limit determines the P/E cycling endurance]

Device Wear-Out Recovery
- Oxide traps and interface state traps recover over time.
- [Figure: noise margin vs. P/E cycling and time]
- [Figure: recovery speed vs. temperature]

Self-Healing SSD?
- Explicitly leverage the device wear-out recovery phenomenon in the FTL
- Rethink how to utilize existing over-provisioning?
- Keep track of the history of the environment temperature
- Intentionally operate SSDs at a higher environment temperature
- [Figure: inherent wear-out recovery, linked to write speed, retention, and endurance]
- Most aggressive scenario: a 3D-enabled self-heating flash chip (flash dies plus a heater die)

Self-Healing SSD?
- [Figure: flash dies and heater die, with data backup and self-heating steps]
- Improve P/E cycling endurance
- Improve write speed
- Increase retention time

A Preliminary Evaluation
- Goal: improve P/E cycling endurance
- Self-healing SSD simulation:
  - Thermal modeling: heating time, heating energy
  - Flash cell modeling: cell-to-cell interference, random telegraph noise, retention noise
  - SSD system modeling: impact of data backup on system performance

Thermal Simulation Setup
- HotSpot thermal modeling
- [Figures: 3D chip structure; 3D chip setup]

Thermal Simulation Results
- Power consumption ranges from 1.5 W to 5.1 W as the temperature changes from 110 °C to 250 °C.
- 200 °C is chosen as the target heating temperature.
- About 35 minutes are needed for 80% of the interface state traps to recover.
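The slide gives heater power only at the two ends of the temperature range, not at the 200 °C target, so the heating energy per self-healing pass can only be bracketed. The sketch below computes that bracket. It assumes (our assumptions, not stated on the slide) that the power at 200 °C lies between the two reported endpoints, that the heater runs at constant power for the full 35 minutes, and that ramp-up and cool-down energy are ignored. All names are illustrative.

```python
# Values from the slide.
POWER_AT_110C_W = 1.5   # reported power at 110 °C
POWER_AT_250C_W = 5.1   # reported power at 250 °C
HEATING_MINUTES = 35    # time for 80% of interface state traps to recover

def heating_energy_bounds_j(minutes=HEATING_MINUTES):
    """Return (lower, upper) energy bounds in joules for one heating pass.

    Assumption: power at the 200 °C target is somewhere between the two
    reported endpoints and stays constant for the whole pass.
    """
    seconds = minutes * 60
    return POWER_AT_110C_W * seconds, POWER_AT_250C_W * seconds

if __name__ == "__main__":
    low, high = heating_energy_bounds_j()
    # Roughly 3.2 kJ to 10.7 kJ per pass under the stated assumptions.
    print(f"heating energy per pass: {low / 1000:.2f} kJ to {high / 1000:.2f} kJ")
```

The bracket is wide because the power at the target temperature is not reported; a tighter figure would need the actual power versus temperature curve from the thermal model.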

P/E Cycling Endurance Improvement
- Allowable worst-case raw read BER: 2.04e-3
- 10-year retention limit
- Trap recovery under normal temperature (45 °C) is included
- Self-healing trigger BER: 1.50e-3
- Three times longer cooling time
- P/E cycling endurance improves from 3000 to 17400 cycles
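The two BER thresholds and the endurance figures above can be related with a few lines of arithmetic. The sketch below is one plausible reading of the trigger rule (heal once the worst-case raw BER reaches the trigger value, which sits below the ECC limit); the slide does not spell out the exact policy, and the function names are hypothetical.

```python
# Values from the slide.
ECC_LIMIT_BER = 2.04e-3        # allowable worst-case raw read BER
TRIGGER_BER = 1.50e-3          # self-healing trigger BER
BASELINE_PE_CYCLES = 3000      # endurance without self-healing
HEALED_PE_CYCLES = 17400       # endurance with self-healing

def needs_self_healing(worst_case_raw_ber):
    """Assumed policy: schedule self-heating once the BER reaches the trigger."""
    return worst_case_raw_ber >= TRIGGER_BER

def endurance_gain(baseline=BASELINE_PE_CYCLES, healed=HEALED_PE_CYCLES):
    """Ratio of P/E cycling endurance with and without self-healing."""
    return healed / baseline

def trigger_fraction_of_limit():
    """How far below the ECC limit the trigger sits, as a fraction."""
    return TRIGGER_BER / ECC_LIMIT_BER

if __name__ == "__main__":
    print(f"endurance gain: about {endurance_gain():.1f}x")
    print(f"trigger at {trigger_fraction_of_limit():.1%} of the ECC limit")
```

Triggering below the ECC limit leaves headroom for errors that accumulate while the chip waits for an idle window and while its data is being backed up.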

Data Backup During Chip Self-Heating
- [Figure: host, SSD controller, and ECC blocks]

Data Backup During Chip Self-Heating
- [Figure: timeline showing backup start, a sequence of backup units labeled 1B, and an interrupting I/O request]
- Start data backup when the SSD is idle.
- An I/O request can interrupt the data backup operation.
- The data backup granularity affects I/O request response time.
- Read and write conflicts need to be handled.
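The effect of backup granularity on I/O response time can be illustrated with a toy timeline model. This is a minimal sketch, not the DiskSim setup used in the talk: it assumes a backup unit cannot be preempted once started, that copying one page costs one read plus one program (50 μs and 600 μs, the timings given in the simulation setup), that bus transfer time is ignored, and that each host I/O takes a fixed, made-up service time. Read and write conflicts are not modeled. All names are hypothetical.

```python
READ_US = 50        # page read access time from the simulation setup
PROGRAM_US = 600    # page program time from the simulation setup
PAGE_COPY_US = READ_US + PROGRAM_US  # assumed cost to back up one page

def worst_case_io_wait_us(pages_per_unit):
    """Longest time an I/O can wait if it arrives just as a unit starts."""
    return pages_per_unit * PAGE_COPY_US

def simulate_backup(total_pages, pages_per_unit, io_arrivals_us, io_service_us=50):
    """Run the backup in non-preemptible units; I/O is served between units.

    Returns (finish_time_us, list of per-request waiting times in us).
    io_service_us is an illustrative constant, not a value from the talk.
    """
    now, remaining, i = 0, total_pages, 0
    pending = sorted(io_arrivals_us)
    waits = []
    while remaining > 0 or i < len(pending):
        io_ready = i < len(pending) and (pending[i] <= now or remaining == 0)
        if io_ready:
            start = max(now, pending[i])      # serve I/O before more backup
            waits.append(start - pending[i])
            now = start + io_service_us
            i += 1
        else:
            unit = min(pages_per_unit, remaining)
            now += unit * PAGE_COPY_US        # one uninterrupted backup unit
            remaining -= unit
    return now, waits

if __name__ == "__main__":
    for unit in (64, 1):  # one whole block vs. one page per unit
        finish, waits = simulate_backup(64, unit, [100])
        print(f"unit={unit:2d} pages: finish={finish} us, I/O waits={waits} us")
```

In this model the total work is the same for both granularities, but a request arriving during a block-sized unit waits tens of milliseconds, while with page-sized units it waits well under one millisecond.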

Simulation Setup
- DiskSim simulator with SSD model patch
- 2 dies/chip, 8-bit I/O bus, and a number of common control buses
- 2 planes/die, 2048 blocks/plane, 64 pages/block, 8 sectors/page, 512 bytes/sector
- 2 channels, 17 flash chips/channel including one backup chip
- The backup chip on each channel only backs up the data of the flash chips on the same channel
- ONFI 2.0, 133 MB/s, read access time 50 μs, program time 600 μs
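The geometry above implies the capacity of the simulated SSD, which the slide does not state directly. The short calculation below works it out from the listed parameters. The constant names are ours, and the page transfer time assumes 1 MB = 10^6 bytes and a fully utilized bus, which the slide does not specify.

```python
# Geometry from the slide; names are illustrative.
SECTOR_BYTES = 512
SECTORS_PER_PAGE = 8
PAGES_PER_BLOCK = 64
BLOCKS_PER_PLANE = 2048
PLANES_PER_DIE = 2
DIES_PER_CHIP = 2
CHANNELS = 2
CHIPS_PER_CHANNEL = 17            # includes one backup chip
BACKUP_CHIPS_PER_CHANNEL = 1

GIB = 1024 ** 3
PAGE_BYTES = SECTOR_BYTES * SECTORS_PER_PAGE
BLOCK_BYTES = PAGE_BYTES * PAGES_PER_BLOCK
CHIP_BYTES = BLOCK_BYTES * BLOCKS_PER_PLANE * PLANES_PER_DIE * DIES_PER_CHIP
DATA_CHIPS = CHANNELS * (CHIPS_PER_CHANNEL - BACKUP_CHIPS_PER_CHANNEL)
BACKUP_CHIPS = CHANNELS * BACKUP_CHIPS_PER_CHANNEL

def page_transfer_us(bus_mb_per_s=133):
    """Time to move one page over the ONFI bus (assumes 1 MB = 10**6 bytes)."""
    return PAGE_BYTES / (bus_mb_per_s * 1e6) * 1e6

if __name__ == "__main__":
    print(f"page: {PAGE_BYTES} B, block: {BLOCK_BYTES // 1024} KiB")
    print(f"chip: {CHIP_BYTES // GIB} GiB")
    print(f"data capacity: {DATA_CHIPS * CHIP_BYTES // GIB} GiB")
    print(f"backup capacity: {BACKUP_CHIPS * CHIP_BYTES // GIB} GiB")
    print(f"page bus transfer: about {page_transfer_us():.1f} us")
```

With one backup chip per 16 data chips, the backup capacity is 1/16 (6.25%) of the data capacity, so only one chip per channel can be self-heating at a time if all of its data must fit on the backup chip.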

Summary
- Continuous technology scaling demands true device-aware SSD system design
- How to exploit memory cell wear-out recovery?
  - Explicitly leverage this wear-out recovery phenomenon in the FTL
  - A more aggressive scenario: self-heating NAND flash memory chips
    o SSD controller scheduling and data backup strategy
    o Simulation based on detailed thermal, flash memory cell, and SSD system modeling
- Comprehensive cross-layer optimization: an open question