Introduction to Open-Channel Solid State Drives and What's Next!

Introduction to Open-Channel Solid State Drives and What's Next! Matias Bjørling, Director, Solid-State System Software. September 25th, 2018. Storage Developer Conference 2018, Santa Clara, CA.

Forward-Looking Statements Safe Harbor Disclaimers This presentation contains forward-looking statements that involve risks and uncertainties, including, but not limited to, statements regarding our solid-state technologies, product development efforts, software development and potential contributions, growth opportunities, and demand and market trends. Forward-looking statements should not be read as a guarantee of future performance or results, and will not necessarily be accurate indications of the times at, or by, which such performance or results will be achieved, if at all. Forward-looking statements are subject to risks and uncertainties that could cause actual performance or results to differ materially from those expressed in or suggested by the forward-looking statements. Key risks and uncertainties include volatility in global economic conditions, business conditions and growth in the storage ecosystem, impact of competitive products and pricing, market acceptance and cost of commodity materials and specialized product components, actions by competitors, unexpected advances in competing technologies, difficulties or delays in manufacturing, and other risks and uncertainties listed in the company's filings with the Securities and Exchange Commission (the "SEC") and available on the SEC's website at www.sec.gov, including our most recently filed periodic report, to which your attention is directed. We do not undertake any obligation to publicly update or revise any forward-looking statement, whether as a result of new information, future developments or otherwise, except as required by law.

Agenda
1. Motivation
2. Interface
3. Eco-system
4. What's Next? Standardization

[Chart] 4K Random Read Latency, 0% Writes: read latency across I/O percentiles for a 4K random read / 4K random write workload.

[Chart] 4K Random Read Latency, 20% Writes: read latency across I/O percentiles for the same 4K random read / 4K random write workload. Significant outliers appear, reaching 4ms; the worst case is 30X higher.

[Chart] NAND chip density continues to grow while cost/GB decreases: 3D NAND layer counts rise from 48 layers in 2015 to 64 and then 96 layers by 2017-2018, across SLC, MLC, TLC, and QLC cell types.

Ubiquitous Workloads: Efficiency in the cloud requires a single SSD to serve many different workloads: databases, sensors, analytics, virtualization, and video.

Solid State Drive Internals
- Host interface (NVMe): read/write. NAND media: read/program/erase.
- Highly parallel architecture with tens of dies.
- NAND access latencies: read 50-100us, write/program 1-10ms, erase 3-15ms.
- Translation layer: exposes a read/write interface on top of the read/program/erase media by handling the logical-to-physical translation map, wear-leveling, garbage collection, bad block management, media error handling, etc.
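To make the translation layer's role concrete, here is a minimal, illustrative C sketch (not from the talk) of a page-level logical-to-physical map. Every host write lands on a fresh physical page and invalidates the old copy, which is exactly the garbage that garbage collection later has to reclaim; the names and structure are assumptions for illustration only.

    #include <stdint.h>

    #define INVALID_PPA UINT32_MAX

    /* Illustrative page-level FTL state: L2P map, per-page valid bits,
     * and an append point for new writes. */
    struct ftl {
        uint32_t *l2p;        /* logical page -> physical page          */
        uint8_t  *valid;      /* per-physical-page valid flag           */
        uint32_t  next_free;  /* next free physical page (append point) */
    };

    /* Map a host write of one logical page; returns the physical page used. */
    static uint32_t ftl_write(struct ftl *f, uint32_t lpn)
    {
        uint32_t old = f->l2p[lpn];
        if (old != INVALID_PPA)
            f->valid[old] = 0;          /* old copy becomes garbage for GC */

        uint32_t ppa = f->next_free++;  /* NAND pages are written append-only */
        f->valid[ppa] = 1;
        f->l2p[lpn] = ppa;
        return ppa;
    }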

Single-User Workloads: Indirection and indirect writes cause outliers.
Host: log-on-log.
- (1) User space: a log-structured database (e.g., RocksDB) performs its own metadata management, address mapping, and garbage collection, issuing I/O via pread/pwrite.
- (2) Kernel space: VFS and a log-structured file system perform their own metadata management, address mapping, and garbage collection, issuing read/write/trim through the block layer.
- (3) Solid-state drive: the drive performs metadata management, address mapping, and garbage collection yet again, behind its write buffer, NAND controller, and dies.
Device: indirect writes.
- The drive maps logical data to physical locations on a best-effort basis.
- It is unable to align data logically, which increases write amplification and triggers extra garbage collection.
- The host is oblivious to data placement because of the indirection.

Open-Channel SSDs: I/O isolation, predictable latency, and host-controlled data placement & I/O scheduling.

Solid State Drive Internals: Host Responsibility
- The host takes over the logical-to-physical translation map, garbage collection, and logical wear-leveling; the device hints to the host where to place hot/cold data.
- NVMe device driver integration: the drive is exposed as a block device through a host-side FTL that performs L2P mapping, garbage collection, and logical wear-leveling, with overhead similar to a traditional SSD.
- Applications: databases and file systems.
- The device keeps bad block management, media error handling, and the media interface controller that issues read/program/erase to the dies.

Concepts in an Open-Channel SSD Interface
- Chunks: sequential-write-only LBA ranges; writes are aligned to internal block sizes.
- Hierarchical addressing: a sparse addressing scheme projected onto the NVMe LBA address space.
- Host-assisted media refresh: improves I/O predictability.
- Host-assisted wear-leveling: improves wear-leveling.

Chunks #1: Enable an orders-of-magnitude reduction of device-side DRAM
- A chunk is a range of LBAs where writes must be sequential.
- Reduces the DRAM needed for the L2P table by orders of magnitude.
- Enables hot/cold data separation.
- Rewriting a chunk requires a reset.
- A chunk can be in one of four states (free/open/closed/offline); if a chunk is open, it has an associated write pointer.
- Same device model as the ZAC/ZBC standards; a similar device model is to be standardized in NVMe (I'll come back to this).
[Diagram] A namespace's LBA range, 0 to the maximum LBA, is divided into Chunk 0, Chunk 1, ..., Chunk X.
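As a rough illustration of this slide, the per-chunk bookkeeping a host or device might keep could look like the following C sketch; the type and field names are hypothetical, not the OCSSD 2.0 on-the-wire structures.

    #include <stdint.h>

    /* Illustrative chunk states, mirroring the four states named above. */
    enum chunk_state {
        CHUNK_FREE,     /* reset/erased; write pointer at the chunk start */
        CHUNK_OPEN,     /* partially written; write pointer is valid      */
        CHUNK_CLOSED,   /* fully written; must be reset before rewriting  */
        CHUNK_OFFLINE,  /* worn out or bad; no longer usable              */
    };

    /* Illustrative per-chunk descriptor: state, bounds, and write pointer. */
    struct chunk_desc {
        enum chunk_state state;
        uint64_t slba;       /* first LBA of the chunk                     */
        uint64_t nlbas;      /* chunk capacity in logical blocks           */
        uint64_t write_ptr;  /* next writable LBA while the chunk is open  */
    };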

Chunks #2
- The drive's capacity is divided into chunks.
- Chunk types: Conventional (random or sequential writes) and Sequential Write Required (the chunk must be written sequentially only, and must be reset entirely before being rewritten).
- The disk LBA range is divided into chunks, each with a write pointer position.
- Write commands advance the write pointer; reset commands rewind it.
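A minimal sketch of the sequential-write-required rule, assuming a host-side check before issuing I/O; this illustrates the write-pointer semantics described above and is not a kernel or specification function.

    #include <stdbool.h>
    #include <stdint.h>

    /* Accept a write of 'nlb' blocks at 'slba' only if it starts exactly at
     * the chunk's write pointer and stays inside the chunk; on success the
     * write pointer advances by the number of blocks written. */
    static bool chunk_write_ok(uint64_t chunk_slba, uint64_t chunk_nlbas,
                               uint64_t *write_ptr, uint64_t slba, uint32_t nlb)
    {
        if (slba != *write_ptr)                       /* must append at the WP  */
            return false;
        if (slba + nlb > chunk_slba + chunk_nlbas)    /* must stay in the chunk */
            return false;
        *write_ptr += nlb;
        return true;
    }

    /* A chunk reset erases the chunk and rewinds its write pointer. */
    static void chunk_reset(uint64_t chunk_slba, uint64_t *write_ptr)
    {
        *write_ptr = chunk_slba;
    }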

Hierarchical Addressing: Channels and dies are mapped to logical groups and parallel units
- Device parallelism is exposed to the host through groups and parallel units (PUs).
- One die, or a group of dies, is exposed as a parallel unit; parallel units are a logical representation of the physical layout.
[Diagram] The physical channels and dies of the SSD are projected onto a logical NVMe namespace address space organized as groups (channels) containing parallel units (dies), which contain chunks, which contain LBAs.
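To make the addressing concrete, here is a small C sketch of packing a (group, parallel unit, chunk, sector) tuple into the flat namespace LBA space. The field widths would come from the device geometry; the exact ordering and bit layout here are assumptions for illustration, not copied from the specification.

    #include <stdint.h>

    /* Bit widths of each address field, as reported by the device geometry. */
    struct ocssd_addr_format {
        unsigned grp_bits;   /* bits for the group (channel) field    */
        unsigned pu_bits;    /* bits for the parallel unit (die)      */
        unsigned chk_bits;   /* bits for the chunk within the PU      */
        unsigned sec_bits;   /* bits for the sector within the chunk  */
    };

    /* Pack a hierarchical address into a single namespace LBA. */
    static uint64_t ocssd_pack_lba(const struct ocssd_addr_format *f,
                                   uint64_t grp, uint64_t pu,
                                   uint64_t chk, uint64_t sec)
    {
        return (grp << (f->pu_bits + f->chk_bits + f->sec_bits)) |
               (pu  << (f->chk_bits + f->sec_bits)) |
               (chk << f->sec_bits) |
                sec;
    }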

OCSSD Host-assisted Media Refresh: Enable the host to assist SSD data refresh
- SSDs refresh their data periodically to maintain reliability, through a data-scrubbing process.
- The internal reads and writes this generates make drive I/O latencies unpredictable; the writes dominate the I/O outliers.
- Two-step data refresh: the device performs only the read (scrubbing) part, while data movement is managed by the host.
- Step 1: the device performs read-only scrubbing and notifies the host via an NVMe AER (chunk notification entry).
- Step 2: the host refreshes the data, if necessary.
- This increases the predictability of the drive, and the host manages the refresh strategy: should it refresh at all? Is there a copy elsewhere?
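A sketch of what the host side of this two-step refresh could look like, assuming the host receives a per-chunk notification and has hooks into its own FTL or application; all function names below are hypothetical placeholders, not part of any existing API.

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified view of a chunk notification delivered to the host. */
    struct chunk_notification {
        uint64_t chunk_slba;    /* first LBA of the affected chunk */
        bool     needs_refresh; /* device found the chunk degrading */
    };

    /* Hooks provided by the host FTL/application; hypothetical placeholders. */
    extern bool host_has_replica(uint64_t chunk_slba);
    extern void host_invalidate_chunk(uint64_t chunk_slba);
    extern void host_copy_chunk_to_new_location(uint64_t chunk_slba);
    extern void host_reset_chunk(uint64_t chunk_slba);

    void handle_chunk_notification(const struct chunk_notification *n)
    {
        if (!n->needs_refresh)
            return;

        /* If a copy exists elsewhere (e.g., in a distributed store), the
         * host may simply drop this chunk instead of rewriting it. */
        if (host_has_replica(n->chunk_slba)) {
            host_invalidate_chunk(n->chunk_slba);
            return;
        }

        /* Otherwise read the still-valid data, append it to a fresh chunk,
         * and reset the degraded one. */
        host_copy_chunk_to_new_location(n->chunk_slba);
        host_reset_chunk(n->chunk_slba);
    }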

Host-assisted Wear-Leveling: Enable the host to separate hot/cold data across chunks depending on wear
- An SSD typically does not know the temperature of newly written data; placing hot and cold data together increases write amplification, which is often 4-5X for SSDs with no optimizations.
- Chunk characteristics: chunks have limited reset cycles (just as NAND blocks have limited erase cycles), so cold data should be placed on chunks nearer end-of-life while younger chunks are used for hot data.
- Approach: introduce a per-chunk relative wear-level indicator (WLI); the host knows its workload and places data with respect to the WLI.
- Benefits: reduces garbage collection and improves lifetime, I/O latency, and performance.
[Diagram] Without placement hints, mixed host writes force repeated garbage collection (1st, 2nd, 3rd GC); with WLI-aware placement, hot, warm, and cold data are written to separate superblocks (SB), shown on chunks X, Y, and Z with WLI values of 0%, 33%, and 90%.
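As a rough illustration of this placement policy, a host-side allocator could pick a free chunk by comparing WLI values, for example as in the following C sketch; the selection rule and names are assumptions for illustration, not the algorithm from the talk.

    #include <stddef.h>
    #include <stdint.h>

    /* A free chunk and its relative wear-level indicator. */
    struct free_chunk {
        uint64_t slba;   /* first LBA of the chunk          */
        uint8_t  wli;    /* relative wear, 0 (young) to 100 */
    };

    /* Pick a chunk from the free list: lowest WLI (youngest) for hot data,
     * highest WLI (most worn) for cold data. Returns the index, or -1 if
     * the free list is empty. */
    static int pick_chunk(const struct free_chunk *chunks, size_t n,
                          int data_is_hot)
    {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (best < 0 ||
                ( data_is_hot && chunks[i].wli < chunks[best].wli) ||
                (!data_is_hot && chunks[i].wli > chunks[best].wli))
                best = (int)i;
        }
        return best;
    }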

Interface Summary: Together, the concepts provide
- I/O isolation through the use of groups and parallel units.
- Fine-grained data refresh managed by the host.
- Reduced write amplification by enabling the host to place hot/cold data efficiently.
- DRAM and over-provisioning reduction through append-only chunks.
- Direct-to-media writes that avoid expensive internal data movement.
Specification available at http://lightnvm.io

Eco-system: A large eco-system through Zoned Block Devices and OCSSD
- Linux kernel NVMe device driver: detects OCSSDs, supports the 1.2 and 2.0 specifications, registers them with the LightNVM subsystem, and can register them as zoned block devices (patches available).
- LightNVM subsystem: core functionality, target management, and a target interface (enumerate, get geometry, I/O interface, etc.).
- pblk host-side FTL: maps an OCSSD to a regular block device.
- User space: libzbc, fio (ZBD support), liblightnvm, and SPDK.
[Diagram] The stack: applications on regular file systems (xfs) over the pblk logical block device, or on file systems with SMR support (f2fs, btrfs), sit above the LightNVM subsystem, block layer, and NVMe driver in the Linux kernel, with liblightnvm in user space, all on top of an OCSSD 2.0 device.
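When the drive is exposed as a zoned block device, its chunks can be inspected with the standard Linux zoned block interface. Below is a minimal sketch assuming a kernel with zoned block device support and an example device path; it simply reports the first few zones and their write pointers.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/blkzoned.h>

    int main(void)
    {
        /* Example device path; substitute the zoned block device to inspect. */
        int fd = open("/dev/nvme0n1", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Ask for a report of the first 16 zones, starting at sector 0. */
        unsigned int nr = 16;
        struct blk_zone_report *rep =
            calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
        if (!rep) { close(fd); return 1; }
        rep->sector = 0;
        rep->nr_zones = nr;

        if (ioctl(fd, BLKREPORTZONE, rep) == 0) {
            for (unsigned int i = 0; i < rep->nr_zones; i++)
                printf("zone %u: start=%llu len=%llu wp=%llu cond=%u\n", i,
                       (unsigned long long)rep->zones[i].start,
                       (unsigned long long)rep->zones[i].len,
                       (unsigned long long)rep->zones[i].wp,
                       (unsigned)rep->zones[i].cond);
        } else {
            perror("BLKREPORTZONE");
        }

        free(rep);
        close(fd);
        return 0;
    }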

Open-Source Software Contributions
- Initial release of the subsystem with Linux kernel 4.4 (January 2016).
- User-space library (liblightnvm) support upstream in Linux kernel 4.11 (April 2017).
- pblk available in Linux kernel 4.12 (July 2017).
- Open-Channel SSD 2.0 specification released (January 2018), with support available from Linux kernel 4.17 (May 2018).
- SPDK support for OCSSD (June 2018).
- fio with zone support (August 2018).
- Upcoming: OCSSD as a zoned block device (patches available), RAIL XOR support for lower latency, and the 2.0a revision.

Tools and Libraries
- LightNVM: The Linux Open-Channel SSD Subsystem (FAST '17): https://www.usenix.org/conference/fast17/technical-sessions/presentation/bjorling
- LightNVM: http://lightnvm.io
- LightNVM Linux kernel subsystem: https://github.com/openchannelssd/linux
- liblightnvm: https://github.com/openchannelssd/liblightnvm
- QEMU NVMe with Open-Channel SSD support: https://github.com/openchannelssd/qemu-nvme

Western Digital and the Western Digital logo are registered trademarks or trademarks of Western Digital Corporation or its affiliates in the US and/or other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. The NVMe word mark is a trademark of NVM Express, Inc. All other marks are the property of their respective owners.