SFS: Random Write Considered Harmful in Solid State Drives

SFS: Random Write Considered Harmful in Solid State Drives
Changwoo Min (1,2), Kangnyeon Kim (1), Hyunjin Cho (2), Sang-Won Lee (1), Young Ik Eom (1)
(1) Sungkyunkwan University, Korea  (2) Samsung Electronics, Korea

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

Flash-based Solid State Drives
A solid state drive (SSD) is a purely electronic device built on NAND flash memory, with no mechanical parts. Its technical merits are low access latency, low power consumption, shock resistance, and potentially uniform random access speed. Two remaining problems limit wider deployment of SSDs: limited lifespan and random write performance.

Limited Lifespan of SSDs
NAND flash memory has limited program/erase (P/E) cycles:
- Single-level cell (SLC): 100K ~ 1M
- Multi-level cell (MLC): 5K ~ 10K
- Triple-level cell (TLC): 1K
As bit density increases, cost decreases but lifespan decreases. Meanwhile, SSDs are starting to be used in laptops, desktops, and data centers, which run write-intensive workloads.

Random Write Considered Harmful in SSDs
Random write is slow: even in modern SSDs, the disparity with sequential write bandwidth is more than ten-fold. Random writes also shorten the lifespan of SSDs: they cause internal fragmentation inside the SSD, internal fragmentation increases garbage collection cost, and the increased garbage collection overhead incurs more block erases per write and degrades performance. Therefore, random writes can drastically reduce the lifespan of SSDs.

Optimization Factors
Optimizations can be applied at several layers of the storage stack (applications, file system, flash translation layer, SSD hardware):
- SSD H/W: larger over-provisioned space lowers garbage collection cost inside the SSD, but at higher cost.
- Flash Translation Layer (FTL): more efficient address mapping schemes; these work purely on the LBAs requested from the file system, so they lack information and are less effective for no-overwrite file systems.
- Applications: SSD-aware storage schemes (e.g., a DBMS) are quite effective for specific applications, but lack generality.
We took a file-system-level approach to directly exploit file-block-level statistics and provide our optimizations to general applications.

Outline: Introduction / Background / Design Decisions (Log-structured File System; Eager on-writing data grouping) / Segment Writing / Segment Cleaning / Evaluation / Conclusion

Performance Characteristics of SSDs
If the request size of a random write is the same as the erase block size, the write invalidates a whole erase block inside the SSD. Since all pages in an erase block are invalidated together, there is no internal fragmentation. Consequently, random write performance becomes the same as sequential write performance when the request size equals the erase block size.

Log-structured File System
How can we exploit these performance characteristics of SSDs when designing a file system? A log-structured file system transforms random writes at the file system level into sequential writes at the SSD level. If the segment size equals the erase block size of the SSD, the file system always sends erase-block-sized write requests to the SSD, so write performance is mainly determined by the SSD's sequential write performance.
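To make the log-structuring idea concrete, here is a minimal sketch of a writer that buffers dirty blocks and issues only segment-sized sequential writes. It is an illustration only, not SFS's actual code; the block size and the `device.write(offset, data)` interface are assumptions.

```python
# Minimal sketch (not SFS's implementation): buffer dirty blocks and flush
# them to the device only in segment-sized, sequential units.

BLOCK_SIZE = 4 * 1024               # 4 KB file block (assumed)
SEGMENT_SIZE = 32 * 1024 * 1024     # 32 MB segment, as in the SFS configuration
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE

class LogStructuredWriter:
    def __init__(self, device):
        self.device = device            # assumed to expose write(offset, data)
        self.next_segment_offset = 0    # head of the log
        self.pending = []               # dirty blocks waiting to be written

    def write_block(self, block_data):
        """Buffer a dirty block; flush once a full segment has accumulated."""
        self.pending.append(block_data)
        if len(self.pending) == BLOCKS_PER_SEGMENT:
            self.flush_segment()

    def flush_segment(self):
        """Write all buffered blocks as one sequential, segment-sized request."""
        segment = b"".join(self.pending)
        self.device.write(self.next_segment_offset, segment)
        self.next_segment_offset += SEGMENT_SIZE
        self.pending.clear()
```

With segment-sized requests like this, the SSD never sees partially invalidated erase blocks, which is exactly the property the previous slide describes.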

Eager On-Writing Data Grouping
To secure a large empty chunk for bulk sequential writes, segment cleaning is needed; it is the major source of overhead in any log-structured file system. When hot data is colocated with cold data in the same segment, cleaning overhead increases significantly. For example, with 4-block disk segments: if hot and cold blocks are mixed across two segments, four live blocks must be copied to secure an empty segment; if hot blocks and cold blocks are written to separate segments, the hot segment becomes entirely dead on its own and no blocks need to be moved. A traditional LFS writes data regardless of hot/cold and then tries to separate data lazily during segment cleaning. If we can categorize hot/cold data when it is first written, there is much room for improvement: eager on-writing data grouping.

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

SFS in a nutshell
- A log-structured file system: the segment size is a multiple of the erase block size, so random write bandwidth equals sequential write bandwidth.
- Eager on-writing data grouping: colocate blocks with similar update likelihood (hotness) into the same segment when they are first written, to form a bimodal distribution of segment utilization; this significantly reduces segment cleaning overhead.
- Cost-hotness segment cleaning: a natural extension of the cost-benefit policy, giving better victim segment selection.

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

On-Writing Data Grouping
Colocate blocks with similar update likelihood (hotness) into the same segment when they are first written. For the dirty pages at time t: (1) calculate the hotness of each block (how to measure hotness?); (2) classify the blocks into groups, e.g., a hot group and a cold group (how to determine the grouping criteria?); (3) write only the groups large enough to fill a disk segment, leaving the remaining dirty pages for time t+1.

Measuring Hotness
Hotness represents update likelihood: frequently updated data has high hotness, and recently updated data has high hotness. Hotness is defined at two levels: file block hotness H_b and segment hotness H_s.
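The hotness formulas on this slide are figures that did not survive transcription. The block below is a hedged reconstruction based only on the stated intuition (more frequent and more recent updates imply higher hotness); the exact definitions in the SFS paper may differ.

```latex
% Plausible reconstruction (assumption): hotness grows with write count and
% decays with age. W_b = number of writes to block b; age_b = time since
% block b was last updated; live(s) = live blocks in segment s; N_s = |live(s)|.
\[
  H_b \;=\; \frac{W_b}{\mathrm{age}_b},
  \qquad
  H_s \;=\; \frac{1}{N_s} \sum_{b \,\in\, \mathrm{live}(s)} H_b
\]
% Segment hotness is taken here as the mean hotness of the segment's live blocks.
```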

Determining Grouping Criteria: Segment Quantization
The effectiveness of block grouping is determined by the grouping criteria. Improper criteria may colocate blocks from different groups into the same segment, which deteriorates the effectiveness of grouping. Naïve solutions do not work: neither equi-width nor equi-height partitioning of the hotness range into hot, warm, cold, and read-only groups matches the natural clustering of hotness values.

Iterative Segment Quantization
Find the natural hotness groups across the segments on disk; the mean segment hotness of each group is used as the grouping criterion. It is an iterative refinement scheme inspired by the k-means clustering algorithm, and its runtime overhead is reasonable: with 32 MB segments, there are only 32 segments per 1 GB of disk space. For faster convergence, the calculated centers are stored in metadata and loaded when mounting the file system.
1. Randomly select the initial center of each group.
2. Assign each segment to the closest center.
3. Calculate a new center by averaging the hotness values in each group.
4. Repeat steps 2 and 3 until convergence is reached, or at most three times.
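A minimal sketch of the four steps above, assuming segment hotness values are available as a list of floats; the function name, the four-group count, and the in-memory representation are illustrative, and this is not SFS's kernel code.

```python
import random

# Sketch of iterative segment quantization (1-D, k-means-style), following
# the four steps on the slide. Illustrative only, not SFS's actual code.

def quantize_segments(hotness, num_groups=4, max_iters=3, initial_centers=None):
    """Return group centers and a group index for each segment."""
    # Step 1: random initial centers (or centers loaded from metadata at mount).
    centers = list(initial_centers) if initial_centers else random.sample(hotness, num_groups)

    assignment = [0] * len(hotness)
    for _ in range(max_iters):                      # Step 4: at most three iterations
        # Step 2: assign each segment to the closest center.
        new_assignment = [min(range(num_groups), key=lambda g: abs(h - centers[g]))
                          for h in hotness]
        # Step 3: recompute each center as the mean hotness of its group.
        for g in range(num_groups):
            members = [h for h, a in zip(hotness, new_assignment) if a == g]
            if members:
                centers[g] = sum(members) / len(members)
        if new_assignment == assignment:            # converged early
            break
        assignment = new_assignment
    return centers, assignment
```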

Process of Segment Writing
Segment writing is invoked in four cases: every five seconds, by the flush daemon to reduce dirty pages, during segment cleaning, and on sync or fsync. On a write request it proceeds in three steps (see the sketch below):
1. Iterative segment quantization.
2. Classify dirty blocks according to hotness into hot, warm, cold, and read-only blocks.
3. Write only the groups large enough to completely fill a segment. Writing of the small groups is deferred until they grow to fill a segment; eventually, the remaining small groups are written when creating a checkpoint.
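The following sketch shows steps 2 and 3: classify dirty blocks by the nearest hotness center and write out only full-segment groups. The names, the `write_segment` callback, and the block count per segment are assumptions for illustration, not SFS's implementation.

```python
# Sketch of the segment-writing step: group dirty blocks by nearest hotness
# center, then write only the groups that can fill whole segments.

BLOCKS_PER_SEGMENT = 8192   # e.g., 32 MB segment / 4 KB block (assumed)

def write_segments(dirty_blocks, centers, write_segment):
    """dirty_blocks: list of (block, hotness); write_segment: callback that
    writes one list of BLOCKS_PER_SEGMENT blocks sequentially."""
    # Step 2: classify each dirty block into the group with the closest center.
    groups = {g: [] for g in range(len(centers))}
    for block, hotness in dirty_blocks:
        g = min(range(len(centers)), key=lambda i: abs(hotness - centers[i]))
        groups[g].append(block)

    # Step 3: write only groups large enough to fill whole segments; the rest
    # stays dirty and is deferred until it grows or a checkpoint forces it out.
    deferred = []
    for blocks in groups.values():
        while len(blocks) >= BLOCKS_PER_SEGMENT:
            write_segment(blocks[:BLOCKS_PER_SEGMENT])
            blocks = blocks[BLOCKS_PER_SEGMENT:]
        deferred.extend(blocks)
    return deferred
```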

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

Cost-hotness Policy
A natural extension of the cost-benefit policy. In the cost-benefit policy, the age of the youngest block in a segment is used to estimate the update likelihood of the segment. In the cost-hotness policy, we use segment hotness instead of age, since segment hotness directly represents the update likelihood of the segment. The segment cleaner selects the victim segment with the maximum cost-hotness value.
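The two formulas on this slide are images that were lost in transcription. For reference, the classic LFS cost-benefit formula is shown below, together with a hedged reconstruction of cost-hotness obtained by substituting segment hotness for age, as the slide describes; the exact expression in the SFS paper may differ.

```latex
% Classic LFS cost-benefit victim selection (Rosenblum & Ousterhout):
% u = segment utilization, age = age of the youngest live block in the segment.
\[
  \frac{\text{benefit}}{\text{cost}} \;=\; \frac{(1 - u)\cdot \mathrm{age}}{1 + u}
\]
% Hedged reconstruction of cost-hotness: replace age with the inverse of
% segment hotness H_s (hotter segments are more likely to be updated soon).
\[
  \text{cost-hotness} \;=\; \frac{1 - u}{(1 + u)\cdot H_s}
\]
% The cleaner picks the segment with the maximum value.
```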

Writing Blocks under Segment Cleaning
Live blocks under segment cleaning are handled similarly to the normal writing scenario: their writing can also be deferred, which enables continuous re-grouping to form a bimodal segment distribution.

Scenario of Data Loss on System Crash
Live blocks under segment cleaning can be lost on a system crash or sudden power-off. For example: (1) during segment cleaning, live blocks 2 and 4 are read into the page cache and their segment is freed; (2) hot blocks are then written into that freed segment, overwriting the old on-disk copies; (3) the system crashes, and blocks 2 and 4 are lost since they no longer have an on-disk copy.

How to Prevent Data Loss: Segment Allocation Scheme
Segments are allocated in Least Recently Freed (LRF) order, and before writing normal blocks we check whether doing so could cause data loss for blocks under cleaning:
1. Check whether any live blocks under cleaning originate from S_t+1, the segment that will be allocated next (S_t is the currently allocated segment).
2. If so, write those live blocks under cleaning first, regardless of grouping.
This guarantees that live blocks under cleaning are never overwritten before they are written elsewhere.
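A minimal sketch of this check, under the assumption that each block under cleaning records which segment it was read from and that the allocator exposes the next segment in LRF order; the names here are illustrative, not SFS's actual code.

```python
# Sketch of the data-loss check performed before writing normal blocks.
# Assumptions (not from SFS's code): b.origin_segment identifies the on-disk
# segment a block under cleaning came from; next_segment_id is S_{t+1}.

def flush_with_loss_check(normal_groups, blocks_under_cleaning,
                          next_segment_id, write_segment):
    """Write pending groups, forcing out blocks under cleaning first if their
    only on-disk copy lives in the segment about to be reused."""
    # Step 1: do any live blocks under cleaning originate from S_{t+1}?
    at_risk = [b for b in blocks_under_cleaning
               if b.origin_segment == next_segment_id]

    # Step 2: if so, write them first, regardless of their hotness grouping,
    # so they gain a new on-disk copy before their old segment is overwritten.
    if at_risk:
        write_segment(at_risk)

    # Only now is it safe to write the normal (grouped) blocks.
    for group in normal_groups:
        write_segment(group)
```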

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

Evaluation Setup
Server: Intel i5 quad core, 4 GB RAM, Linux kernel 2.6.27.
SSDs:
                                  SSD-H    SSD-M    SSD-L
  Interface                       SATA     SATA     USB 3.0
  Flash memory                    SLC      MLC      MLC
  Max. sequential writes (MB/s)   170      87       38
  Random 4KB writes (MB/s)        5.3      0.6      0.002
  Price ($/GB)                    14       2.3      1.4
SFS configuration: 4 data groups, segment size 32 MB.

Workloads
Synthetic workloads: Zipfian random write, and uniform random write (no skewness, the worst-case scenario for SFS). Real-world workloads: the TPC-C benchmark, and the research workload (RES) [Roselli 2000], collected over 113 days on a system of 13 desktop machines of a research group. To measure maximum write performance, we replayed the write requests of each workload as fast as possible in a single thread and measured throughput at the application level, with Native Command Queuing (NCQ) enabled.

Throughput vs. Disk Utilization
[Figure: throughput as a function of disk utilization on SSD-M for the Zipfian random write, TPC-C, uniform random write, and RES workloads; the annotated speedups range from 1.2x to 2.5x.]

Segment Utilization Distribution
[Figure: segment utilization distributions for the Zipfian random write, TPC-C, uniform random write, and RES workloads at 70% disk utilization; segments cluster toward nearly empty or nearly full, i.e., a bimodal distribution.]

Comparison with Other File Systems
We compare SFS against ext4 (an in-place-update file system) and btrfs (a no-overwrite file system). Each workload is run on each file system and throughput is measured; the block-level trace captured with blktrace is then fed into an FTL simulator, using a coarse-grained hybrid mapping FTL (FAST FTL [Lee 07]) and a full page mapping FTL, to measure write amplification and block erase count.
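Write amplification is not defined on the slide; for reference, the block below gives the usual definition an FTL simulator would report. This is the standard metric, stated here as an assumption rather than the paper's exact wording.

```latex
% Standard definition of write amplification as reported by an FTL simulator:
\[
  \text{Write Amplification}
  \;=\;
  \frac{\text{pages written to flash (host writes + GC copies)}}
       {\text{pages written by the host}}
\]
% Lower write amplification means fewer extra flash writes and fewer block erases.
```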

Throughput under Different File Systems
[Figure: application-level throughput of SFS versus the other file systems on SSD-M at 85% disk utilization; the annotated speedups range from 1.3x up to 48x depending on workload and file system.]

Block Erase Count
[Figure: block erase counts under a full page mapping FTL and a coarse-grained hybrid mapping FTL (FAST FTL) at 85% disk utilization; the annotated reductions range roughly from 1.1x to 7.5x.]

Outline: Introduction / Background / Design Decisions / Segment Writing / Segment Cleaning / Evaluation / Conclusion

Conclusion
Random writes on SSDs degrade performance and shorten the lifespan of SSDs. We present SFS, a new file system for SSDs: a log-structured file system with on-writing data grouping and a cost-hotness cleaning policy. We show that SFS considerably outperforms existing file systems and prolongs SSD lifespan by drastically reducing the block erase count inside the SSD. Is SFS also beneficial to HDDs? Preliminary experimental results are available on our poster!

THANK YOU! QUESTIONS?