Sub-block Wear-leveling for NAND Flash

IBM Research Zurich
March 6, 2

Sub-block Wear-leveling for NAND Flash

Roman Pletka, Xiao-Yu Hu, Ilias Iliadis, Roy Cideciyan, Theodore Antonakopoulos
Work done in collaboration with University of Patras

Overview
- Motivation
- Experimental results from SLC and MLC NAND Flash memories
  - Block vs. page RBER
- Sub-block wear-leveling schemes
  - Content-aware wear-leveling
  - Wear-leveling using page tracking
  - Device-dependent wear-leveling
  - Combinations of sub-block wear-leveling schemes
- Conclusion & Outlook

Introduction
- Flash Translation Layer: speeds up writes (no write-in-place) and spreads wear more evenly over blocks.
- Garbage collection: reclaims free space from the invalid pages of a block by relocating the still-valid pages to a free block. Often combined with wear-leveling.
- Existing wear-leveling schemes:
  - Dynamic wear-leveling: equalizes wear among blocks holding dynamic data (i.e., data that is overwritten often), but never touches blocks holding static data.
  - Static wear-leveling: periodically forces relocation of static data to aged blocks from the free block pool.
  => But these schemes consider only the wear of blocks, not of individual pages.
- Other methods to increase the Flash lifetime:
  - ECC: typical recommendations by manufacturers: SLC: -4 bit/54b, MLC: 8-2 bit/54b. ECC schemes can also take device statistics into account [1].
  - Multi-write coding: reprogram without erasing by using coding [2].
  - Write reduction using compression [3] or deduplication [4].

[1] Eitan Yaakobi et al., Error Characterization and Coding Schemes for Flash Memories, Globecom 2010
[2] Ashish Jagmohan et al., Write Amplification Reduction in NAND Flash through Multi-write Coding, MSST 2010
[3] SandForce DuraWrite, http://www.sandforce.com
[4] Feng Chen et al., CAFTL: A Content-aware Flash Translation Layer Enhancing the Lifespan of Flash Memory based Solid State Drives, FAST 2011
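The two existing block-level schemes can be sketched as follows. This is a minimal illustration, not the policy of any particular controller: the `Block` record, the `max_spread` trigger, and both function names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    erase_count: int
    is_free: bool
    holds_static_data: bool = False


def pick_block_dynamic(blocks):
    """Dynamic WL: serve new writes from the least-worn free block."""
    free = [b for b in blocks if b.is_free]
    return min(free, key=lambda b: b.erase_count)


def pick_static_relocation(blocks, max_spread):
    """Static WL: when the wear gap exceeds max_spread (hypothetical trigger),
    return (source, target) so that static data moves from the least-worn
    static block to the most-worn (aged) free block. Otherwise return None."""
    static = [b for b in blocks if not b.is_free and b.holds_static_data]
    free = [b for b in blocks if b.is_free]
    if not static or not free:
        return None
    source = min(static, key=lambda b: b.erase_count)
    target = max(free, key=lambda b: b.erase_count)
    if target.erase_count - source.erase_count <= max_spread:
        return None
    return source, target
```

Both functions look only at per-block erase counts, which is exactly the limitation the slide points out: nothing here distinguishes the pages inside a block.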

Experimental Results: Endurance
Normalized number of errors for Flash pages within a single block for different NAND Flash devices, shown for the exercised P/E cycles.

Device                      X      Y      Y      Z
Flash Type                  SLC    SLC    MLC    MLC
Page Size [B]               22     434    434    432
Block Size [pages/block]    64     64     28     256
Capacity [Gbits]            4      8      6      256
Endurance [P/E cycles]      k      k      k      5k
Exercised P/E cycles        M      M      k      k

Experimental Results: Retention
Number of healthy pages (i.e., pages with an RBER below 10^-3) as a function of retention time and P/E cycles (device X).

Conclusion from Experimental Results
- Significant RBER differences between pages within the same block have been found.
- The error pattern is similar for blocks in the same device, but the absolute error rate may vary. This stems from tolerances in the manufacturing process (e.g., slight variations in the thickness of the tunnel oxide).
- NAND Flash memories of the same type (manufacturer, technology) show similar error patterns. This comes from the internal structure of NAND Flash memories.
- Measuring RBER: NAND Flash RBER should be measured at the page level, not over a full Flash block.
- Other error sources:
  - Program and read disturbs are best addressed using ECC.
  - Chip failures are best addressed with higher-level redundancy (e.g., RAID-like schemes).
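A small worked example of why block-level RBER can hide a worn-out page. The error counts, the 4096-byte page size, and the 1e-3 health threshold are all illustrative assumptions, not measurements from the devices above.

```python
def rber(bit_errors, bits_read):
    """Raw bit error rate: erroneous bits divided by bits read."""
    return bit_errors / bits_read


def page_and_block_rber(errors_per_page, page_size_bits):
    """Return (list of per-page RBER, RBER averaged over the whole block)."""
    page_rber = [rber(e, page_size_bits) for e in errors_per_page]
    total_bits = page_size_bits * len(errors_per_page)
    block_rber = rber(sum(errors_per_page), total_bits)
    return page_rber, block_rber


def healthy_pages(page_rber, threshold=1e-3):
    """Indices of pages whose RBER is below the (assumed) threshold."""
    return [i for i, r in enumerate(page_rber) if r < threshold]


# Hypothetical block of four 4096-byte pages; the last page is badly worn.
errors = [0, 2, 1, 40]
pages, block = page_and_block_rber(errors, page_size_bits=4096 * 8)
```

In this example the block-level RBER stays below the threshold, while the last page on its own exceeds it, so only a page-level measurement reveals the problem.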

Content-aware Sub-block Wear-leveling
- Equally distribute the number of 0s written among fixed-size data segments according to the current wear of each Flash segment.
- Rationale: the tunnel oxide of a memory cell is stressed when electrons are pushed into (writing a 0) or out of (erasing a 0) the floating gate, which leads to irreparable defects.
- Segment granularity: sub-page, page, block, ...
  - Generally, the finer the granularity, the better the wear is spread.
  - For practical reasons, page granularity is the most promising.
- How does it work?
  - The number of 0s in a new data segment to be written is counted.
  - This information is used for tracking wear and/or for the placement decision.
  - For MLC devices, the intermediate levels can be weighted accordingly.
- Implementation alternatives:
  - Track the wear of each data segment. -> Significant amount of additional meta-data.
  - Write segments with similar wear to the same block. -> Requires a set of blocks from which data segments are allocated (more complex write page allocator).

[Figure: new pages to write, labeled with their number of zeros, are matched to free pages with different accumulated wear.]
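The placement idea can be sketched at page granularity as follows. This is one possible greedy pairing under simplifying assumptions (SLC, wear tracked as the accumulated number of zeros written per page); the function names and data layout are hypothetical.

```python
def count_zero_bits(data: bytes) -> int:
    """Number of 0 bits in a data segment (the bits that stress the cell)."""
    ones = sum(bin(byte).count("1") for byte in data)
    return 8 * len(data) - ones


def place_pages(new_pages, free_page_wear):
    """Greedy content-aware placement.

    new_pages:      dict name -> bytes to be written
    free_page_wear: dict physical page id -> accumulated wear (zeros written)

    Pages with the most zeros go to the least-worn free pages. The wear
    table is updated in place; returns dict name -> physical page id.
    """
    by_zeros = sorted(new_pages, key=lambda n: count_zero_bits(new_pages[n]),
                      reverse=True)
    by_wear = sorted(free_page_wear, key=free_page_wear.get)
    placement = {}
    for name, page_id in zip(by_zeros, by_wear):
        placement[name] = page_id
        free_page_wear[page_id] += count_zero_bits(new_pages[name])
    return placement
```

Keeping a wear counter per page corresponds to the first implementation alternative above, with its meta-data cost; the second alternative would instead group free pages of similar wear into allocation blocks.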

Sub-block Wear-leveling using Page Tracking
- Use the number of errors corrected by ECC when reading a page, combined with a page error threshold (i.e., the maximum number of tolerated errors per page).
- Retire pages individually when the page error threshold is hit. The remaining pages in the Flash block are still used.
- Tracking valid pages requires one additional bit of meta-data per page and allows each page to be used up to its limit.
- Because of retention errors, however, each page must be read periodically to determine its current number of corrected errors. The required read frequency increases with the number of P/E cycles.

[Figure: a page is read from a Flash block through the ECC unit to the host interface; the diagram shows the ECC capability, the tolerated errors per page, the number of errors corrected, and a per-page "retired" flag (yes/no).]
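A minimal sketch of page tracking for a single block. The class name, the threshold value used in the usage note, and the interpretation of "hit" as "greater than or equal to" are assumptions made for illustration.

```python
class PageTracker:
    """Tracks one 'retired' bit per page of a single Flash block."""

    def __init__(self, pages_per_block, page_error_threshold):
        self.threshold = page_error_threshold
        self.retired = [False] * pages_per_block  # one bit of meta-data per page

    def record_read(self, page, corrected_errors):
        """Feed in the number of errors ECC corrected on a read of `page`.
        Retires the page once the threshold is hit; returns its retired flag."""
        if corrected_errors >= self.threshold:
            self.retired[page] = True
        return self.retired[page]

    def usable_pages(self):
        """Pages of the block that can still be allocated."""
        return [p for p, is_retired in enumerate(self.retired) if not is_retired]
```

For example, with a hypothetical threshold of 8 corrected errors, reads reporting 2, 8, 0, 5, 11 and 1 corrected errors on pages 0 to 5 retire pages 1 and 4 and leave the other four pages in use.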

Device-dependent Sub-block Wear-leveling
- The characteristics of a given device type are consistent and can be exploited.
- Concept: partition a block into two or more groups, with pages in the same group having similar characteristics:
  - The first group holds the pages that are most likely to fail early; the last group holds the most robust ones.
  - Groups from the same block are selectively retired.
  - The mapping of pages to groups is obtained from device characterization.
  - The defective block marker used in ONFI (8 bits) can be used to indicate which groups in a block are still healthy.
- Alternative: use pages from different groups with varying relative frequency (e.g., periodically skip allocation from some groups).
  - A per-block P/E counter together with the device characteristics can be used to determine when a group is skipped.
  -> Advantages:
  - Allows full blocks to be retired instead of individual groups (less meta-data).
  - Device capacity (including over-provisioning) doesn't vary over the whole lifetime.

[Figure: a block partitioned into Groups 1 to 4.]
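A minimal sketch of tracking group health in a single 8-bit marker per block. The bit encoding (bit g set means group g is healthy), the four-group layout, and the page-to-group table are assumptions for illustration; a real mapping would come from device characterization.

```python
def retire_group(marker, group):
    """Clear the health bit of `group` in the 8-bit per-block marker."""
    return marker & ~(1 << group) & 0xFF


def is_group_healthy(marker, group):
    """True if the health bit of `group` is still set."""
    return bool(marker & (1 << group))


def allocatable_pages(marker, page_to_group):
    """Pages whose group is still healthy.

    page_to_group[p] is the group index of page p (hypothetical table,
    assumed to be identical for all blocks of a device type).
    """
    return [p for p, g in enumerate(page_to_group) if is_group_healthy(marker, g)]


# Hypothetical 8-page block with 4 groups; group 0 is the weakest.
PAGE_TO_GROUP = [0, 0, 1, 2, 3, 3, 2, 1]
marker = 0b1111                   # all four groups healthy
marker = retire_group(marker, 0)  # the weakest group is retired first
```

With at most eight groups per block, the whole state fits in one byte, which is why an 8-bit marker is sufficient for this scheme.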

How to integrate Sub-block Wear-leveling into SSDs
[Figure: SSD architecture with the following components]
- Host Interface with a counter (counts the number of zeros, ones, ...)
- Data Buffer
- Channel Controllers with ECC, each attached to Flash chips; the ECC provides the number of corrected errors
- Microprocessor
- RAM: LBA-to-PBA map, bad block list, tracking information (page/group/block), meta-data cache
- Firmware: request handler, LBA-to-PBA mapper, write page allocator (set of allocation blocks), garbage collector, wear leveling (evaluation of ECC information, sub-block WL), free block queue

Combinations of Sub-block Wear-leveling schemes
- Content-aware WL & page tracking: use the number of corrected errors from the ECC of the most recent read operation to determine the wear-out of a page.
- Content-aware WL & device-dependent WL: place new pages with more zeros on Flash pages from a group with better characteristics.
- Page tracking & device-dependent WL: retire an entire group when the first page in the group hits the maximum tolerated error rate during a read.
- Combination of all 3 WL schemes:
  - Place new pages with more zeros on Flash pages with better characteristics.
  - Determine the characteristics from the number of corrected ECC errors. Page reads are done when the block is garbage collected (small amount of meta-data).
  - Retire an entire group when the first page in the group hits the maximum tolerated error rate during a read.
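The combination of all three schemes can be sketched for one block as follows. This is an illustrative simplification: the corrected-error counts are assumed to come from the reads done at garbage-collection time, and all names, the group table, and the threshold are hypothetical.

```python
def groups_to_retire(corrected_errors, page_to_group, max_errors):
    """A group is retired as soon as any of its pages hits max_errors."""
    return {page_to_group[p]
            for p, e in enumerate(corrected_errors) if e >= max_errors}


def place_by_health(zero_counts, corrected_errors, page_to_group, max_errors):
    """Map new pages (by index into zero_counts) to physical pages.

    Pages in retired groups are excluded. Among the rest, the new pages
    with the most zeros go to the physical pages with the fewest
    corrected ECC errors (used here as the measure of page health).
    """
    retired = groups_to_retire(corrected_errors, page_to_group, max_errors)
    usable = [p for p in range(len(corrected_errors))
              if page_to_group[p] not in retired]
    usable.sort(key=lambda p: corrected_errors[p])  # healthiest first
    order = sorted(range(len(zero_counts)),
                   key=lambda i: zero_counts[i], reverse=True)
    return dict(zip(order, usable))
```

Only the last observed error count per page and one health bit per group need to be kept, which matches the small meta-data footprint noted on the slide.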

Conclusion
- Sub-block wear-leveling can significantly extend the lifetime of SSDs.
- RBER at block level and RBER at page level are not the same thing!
- Three sub-block wear-leveling schemes have been presented.
  => The different sub-block wear-leveling schemes can be combined.
- Suitable combinations of sub-block wear-leveling schemes exist for different use cases (Flash cache, SSD):
  - Read/write performance over time: stable vs. decreasing
  - Device capacity: fixed vs. shrinking

Questions??