The Unwritten Contract of Solid State Drives

The Unwritten Contract of Solid State Drives Jun He, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau Department of Computer Sciences, University of Wisconsin - Madison

Enterprise SSD revenue is expected to exceed enterprise HDD revenue in 2017. [Chart: Revenue (Billion Dollars), 2012 through 2019E, Enterprise HDD vs. Enterprise SSD; the curves cross in 2017.] Source: Gartner, Stifel Estimates. https://www.theregister.co.uk/2016/01/07/gartner_enterprise_ssd_hdd_revenue_crossover_in_2017/

The storage stack is shifting from the HDD era to the SSD era. In the HDD era the stack was an application on top of a file system, both designed for disks. In the SSD era, should we keep the same applications and file systems, adapt them for SSDs, or design new ones? (App? App for SSD? FS? FS for SSD?)

The consequences of misusing SSDs: performance degradation, performance fluctuation, and early end of device life. Sources: http://crestingwave.com/sites/default/files/collateral/velobit_whitepaper_ssdperformancetips.pdf; S. Boboila and P. Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. In Proceedings of the 8th USENIX Symposium on File and Storage Technologies (FAST '10), San Jose, California, February 2010.

What is the right way to achieve high performance on SSDs? The block device interface is narrow: read(range), write(range), discard(range). Behind this interface, HDDs have a well-known unwritten contract: sequential accesses are best, and nearby accesses are more efficient than farther ones (MEMS-based storage devices and standard disk interfaces: A square peg in a round hole? Steven W. Schlosser, Gregory R. Ganger, FAST '04). What is the unwritten contract of SSDs?
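As a concrete illustration of the third call in that interface, here is a minimal sketch (not from the paper) that issues a discard over one range of a raw block device using Linux's BLKDISCARD ioctl; the device path and range are hypothetical, the call needs privileges, and it destroys the data in that range.

/* Minimal sketch: issue a discard (TRIM) over one range of a block device.
 * Assumptions: Linux, a device that supports discard, sufficient privileges.
 * WARNING: discarding destroys the data in the given range. */
#include <fcntl.h>
#include <linux/fs.h>     /* BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sdX", O_WRONLY);       /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    uint64_t range[2] = { 0, 1ULL << 20 };     /* offset, length (1 MiB) */
    if (ioctl(fd, BLKDISCARD, &range) < 0)     /* the "discard(range)" call */
        perror("BLKDISCARD");

    close(fd);
    return 0;
}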

What is the right way to achieve high performance on SSDs? We derive the unwritten contract of SSDs from existing studies, from our experience implementing a detailed SSD simulator, and from analysis of our experiments.

Do current applications and file systems comply with the unwritten contract of SSDs? We study 5 applications (LevelDB, RocksDB, SQLite (rollback), SQLite (WAL), Varmail) on 3 file systems for SSDs (ext4, F2FS, XFS), against the five rules of the contract. To study the contract, we built a sophisticated SSD simulator and workload analyzer.

In the paper, we made 24 detailed observations and learned several high-level lessons.

Outline: Overview, SSD Unwritten Contract, Violations of the Unwritten Contract, Conclusions.

SSD Background: An SSD contains a controller running the FTL (address mapping, garbage collection, wear-leveling), RAM holding the mapping table and a data cache, and multiple channels of flash, each containing many blocks.

Rules of the Unwritten Contract: #1 Request Scale, #2 Locality, #3 Aligned Sequentiality, #4 Grouping by Death Time, #5 Uniform Data Lifetime.

Rule #1: Request Scale. SSD clients should issue large data requests or multiple outstanding data requests, so that a request (or set of requests) spreads across all channels and exploits the SSD's high internal parallelism.

Rule #1: Request Scale, Violation. Small, serial requests occupy only one channel while the other channels sit idle (wasted), yielding low internal parallelism. Performance impact: up to 18x in read bandwidth and 10x in write bandwidth. F. Chen, R. Lee, and X. Zhang. Essential Roles of Exploiting Internal Parallelism of Flash Memory Based Solid State Drives in High-speed Data Processing. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA-11), pages 266-277, San Antonio, Texas, February 2011.
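To make the rule concrete, here is a minimal sketch of one way to keep multiple requests outstanding from user space: a few threads each issuing large pread() calls on disjoint regions. The file name, request size, and thread count are illustrative assumptions, not values from the paper.

/* Sketch: keep several large read requests outstanding so the SSD can
 * use more than one channel. Compile with -pthread. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 4
#define REQ_SIZE (1 << 20)   /* 1 MiB per request: "large data requests" */

static int fd;

static void *reader(void *arg)
{
    long id = (long)arg;
    char *buf = malloc(REQ_SIZE);
    /* Each thread reads its own region, so NTHREADS requests are
     * outstanding at once: "multiple outstanding data requests". */
    for (int i = 0; i < 16; i++) {
        off_t off = ((off_t)i * NTHREADS + id) * REQ_SIZE;
        if (pread(fd, buf, REQ_SIZE, off) < 0)
            perror("pread");
    }
    free(buf);
    return NULL;
}

int main(void)
{
    fd = open("testfile", O_RDONLY);   /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    close(fd);
    return 0;
}

Note that with buffered I/O the page cache may still reshape these requests before they reach the device, which is exactly the kind of cross-layer interaction the paper's vertical analysis looks for.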

Rule #2: Locality. SSD clients should access with locality. The full logical-to-physical mapping table resides in flash; the SSD's RAM holds only a translation cache. Accesses with locality hit in the translation cache, giving a high cache hit ratio.

Rule #2: Locality, Violation. Accesses without locality miss in the translation cache and force extra reads and writes of translation entries from flash. Performance impact: 2.2x in average response time. Y. Zhou, F. Wu, P. Huang, X. He, C. Xie, and J. Zhou. An Efficient Page-level FTL to Optimize Address Translation in Flash Memory. In Proceedings of the EuroSys Conference (EuroSys '15), Bordeaux, France, April 2015.
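The cost of poor locality can be sketched with a toy model of a page-level FTL translation cache: a small, direct-mapped cache of logical-to-physical entries in front of the full mapping table, with hit and miss counters. The cache size, workloads, and mapping policy are illustrative assumptions, not the design of any real FTL.

/* Toy model of an FTL translation cache (Rule #2).
 * A miss stands in for a mapping-table read from flash. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS 1024          /* entries that fit in SSD RAM (assumed) */

static uint32_t cached_lpn[CACHE_SLOTS];
static int      slot_valid[CACHE_SLOTS];
static long hits, misses;

static void translate(uint32_t lpn)
{
    uint32_t slot = lpn % CACHE_SLOTS;
    if (slot_valid[slot] && cached_lpn[slot] == lpn) {
        hits++;                   /* locality: entry already cached */
    } else {
        misses++;                 /* would read a translation page from flash */
        cached_lpn[slot] = lpn;
        slot_valid[slot] = 1;
    }
}

int main(void)
{
    /* Workload A: accesses confined to a small region (high locality). */
    for (uint32_t i = 0; i < 1000000; i++)
        translate(i % CACHE_SLOTS);
    printf("local:  hit ratio %.3f\n", (double)hits / (hits + misses));

    hits = misses = 0;
    /* Workload B: random accesses over a large logical space (little locality). */
    for (uint32_t i = 0; i < 1000000; i++)
        translate((uint32_t)rand() % (CACHE_SLOTS * 1024));
    printf("random: hit ratio %.3f\n", (double)hits / (hits + misses));
    return 0;
}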

Rule #3: Aligned Sequentiality. Details in the paper.

Rule #4: Grouping By Death Time. Data with similar death times should be placed in the same flash block. If pages that die at 1pm share one block and pages that die at 2pm share another, each block becomes entirely invalid at once and can be erased with no data movement.

Rule #4: Grouping By Death Time, Violation. When 1pm and 2pm data are mixed in the same blocks, garbage collection must move the still-live pages before erasing: data movement, a performance penalty, and write amplification. Performance impact: 4.8x in write bandwidth, 1.6x in throughput, 1.8x in block erasure count. C. Lee, D. Sim, J.-Y. Hwang, and S. Cho. F2FS: A New File System for Flash Storage. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST '15), Santa Clara, California, February 2015. J.-U. Kang, J. Hyun, H. Maeng, and S. Cho. The Multi-streamed Solid-State Drive. In 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage '14), Philadelphia, PA, June 2014. Y. Cheng, F. Douglis, P. Shilane, G. Wallace, P. Desnoyers, and K. Li. Erasing Belady's Limitations: In Search of Flash Cache Offline Optimality. In 2016 USENIX Annual Technical Conference (USENIX ATC '16), pages 379-392, Denver, CO, 2016. USENIX Association.
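On recent Linux kernels, one host-side mechanism related to this rule is the per-file write-lifetime hint (F_SET_RW_HINT, available since Linux 4.13 on kernels and C libraries that expose it). The sketch below is an assumption-laden illustration of that API, not a mechanism from this paper; the file names and hint choices are hypothetical.

/* Sketch: tag files with expected write lifetimes so the kernel/device
 * can group data with similar death times (related to Rule #4).
 * Requires a kernel/libc exposing F_SET_RW_HINT and RWH_* constants. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int set_hint(const char *path, uint64_t hint)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror(path); return -1; }
    if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)   /* hint is passed by pointer */
        perror("F_SET_RW_HINT");
    return fd;
}

int main(void)
{
    /* Short-lived data (e.g., a write-ahead log that is soon deleted). */
    int log_fd = set_hint("wal.log", RWH_WRITE_LIFE_SHORT);
    /* Long-lived data (e.g., a compacted table that stays around). */
    int tbl_fd = set_hint("table.sst", RWH_WRITE_LIFE_LONG);

    if (log_fd >= 0) close(log_fd);
    if (tbl_fd >= 0) close(tbl_fd);
    return 0;
}

File systems and devices that understand such hints (e.g., multi-streamed SSDs, as cited above) can then place short-lived and long-lived data in different blocks.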

Rule #5: Uniform Data Lifetime. Clients of SSDs should create data with similar lifetimes. If all data has a lifetime of one day, every block's usage (erase) count grows at the same rate (e.g., 3, 3, 3, 3), and no wear-leveling is needed.

Rule #5: Uniform Data Lifetime, Violation. If some data lives one day and other data lives 1000 years, the blocks holding short-lived data accumulate far more erases (usage counts of 365*1000 versus 1): some blocks wear out sooner, and frequent wear-leveling is needed, causing a performance penalty and write amplification. Performance impact: 1.6x in write latency. S. Boboila and P. Desnoyers. Write Endurance in Flash Drives: Measurements and Analysis. In Proceedings of the 8th USENIX Symposium on File and Storage Technologies (FAST '10), San Jose, California, February 2010.
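The skew in that example is just arithmetic; the sketch below replays the slide's own numbers (a 1-day lifetime versus a 1000-year lifetime, observed over 1000 years) to show how far the usage counts diverge.

/* Sketch: usage-count skew from mixed data lifetimes (Rule #5).
 * Numbers mirror the slide's example and are purely illustrative. */
#include <stdio.h>

int main(void)
{
    /* Over 1000 years: 1-day data is rewritten every day,
     * 1000-year data is written once. */
    long daily_rewrites = 365L * 1000;  /* usage count of blocks holding 1-day data */
    long one_time_write = 1;            /* usage count of blocks holding 1000-year data */

    printf("usage counts: %ld %ld %ld %ld\n",
           daily_rewrites, one_time_write, daily_rewrites, one_time_write);
    printf("skew: %ldx more wear on the short-lifetime blocks\n",
           daily_rewrites / one_time_write);
    /* To even this out the FTL must keep migrating the cold 1000-year data,
     * adding writes (write amplification) and hurting performance. */
    return 0;
}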

Outline: Overview, SSD Unwritten Contract, Violations of the Unwritten Contract, Conclusions.

Do applications and file systems comply with the unwritten contract? We conduct vertical analysis to find violations of the SSD contract: run the application plus file system, collect the block trace, feed it to our SSD simulator (WiscSim) and analyzer (WiscSee) to detect rule violations, and then trace each violation back to its root cause in the stack.

2 of our 24 observations: 1. Linux page cache limits request scale. 2. F2FS incurs more GC overhead than traditional file systems.

We evaluate request scale by request size and number of concurrent requests. [Figure: request size (KB) and number of concurrent requests, for reads and writes, for leveldb, rocksdb, sqlite rb, sqlite wal, and varmail on ext4, f2fs, and xfs, under sequential, random, and mixed patterns.] For LevelDB and RocksDB (insertions with background compactions), the median request size is only about 100KB and the median number of concurrent requests is about 2.

LevelDB and RocksDB can access files in large sizes. Why was the request scale low?

Buffered read(): the page cache implementation splits and serializes user requests. An application read() of 2MB is broken by the page cache into 128KB pieces that reach the block layer one request at a time. Surprise! Even reading 2MB in your app will not utilize the SSD well.

Cause of violation: large reads are throttled by small prefetching (readahead).
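One hedged way to see, and sidestep, this splitting is to bypass the page cache with O_DIRECT so that a large, aligned read is submitted at its full size (the block layer may still split it at device limits, but not into readahead-sized pieces). The file name and sizes below are illustrative, and O_DIRECT imposes alignment requirements that real code must respect.

/* Sketch: read 2 MiB in one request by bypassing the page cache.
 * O_DIRECT needs buffer, offset, and length aligned (here to 4 KiB). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define REQ_SIZE (2 << 20)      /* 2 MiB */
#define ALIGN    4096

int main(void)
{
    int fd = open("testfile", O_RDONLY | O_DIRECT);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, REQ_SIZE) != 0) return 1;

    /* With O_DIRECT the read is not serialized into readahead-sized
     * pieces by the page cache; it is issued at its full size. */
    ssize_t n = pread(fd, buf, REQ_SIZE, 0);
    if (n < 0) perror("pread");
    else printf("read %zd bytes in one request\n", n);

    free(buf);
    close(fd);
    return 0;
}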

2 of our 24 observations: 1. Linux page cache limits request scale. 2. F2FS incurs more GC overhead than traditional file systems.

We study GC (rule #4: grouping by death time) using zombie curves. What is a zombie curve? Run the workload with infinite over-provisioned space, compute the valid ratio of every flash block (e.g., 1.0, 0.75, 0.75, 0.25, 0), sort the blocks by valid ratio, and plot valid ratio against cumulative flash space. The curve shows how much still-valid data lingers in blocks and thus how much data garbage collection would have to move.

What is a good zombie curve? One where, beyond the over-provisioned space, blocks hold no valid data and are ready to be used without any copying. What is a bad zombie curve? One where partially valid blocks extend beyond the over-provisioned space, so the SSD must move data before those blocks can be reused.

By the way, the zombie curve also helps you choose the over-provisioning ratio.
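As a rough illustration of how such a curve could be constructed from simulator output, the sketch below sorts hypothetical per-block valid ratios in descending order and prints valid ratio against cumulative flash space; it is a guess at the general construction, not the WiscSee implementation.

/* Sketch: build a zombie curve from per-block valid ratios.
 * Input data is made up; a real run would take the ratios from an
 * SSD simulator's block state at the end of the workload. */
#include <stdio.h>
#include <stdlib.h>

static int desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);             /* sort in descending order */
}

int main(void)
{
    /* Hypothetical valid ratios, one per flash block. */
    double valid[] = { 1.0, 0.75, 0.75, 0.25, 0.0, 0.0, 0.0, 0.0 };
    size_t n = sizeof(valid) / sizeof(valid[0]);

    qsort(valid, n, sizeof(double), desc);

    /* x axis: cumulative flash space (in blocks); y axis: valid ratio. */
    for (size_t i = 0; i < n; i++)
        printf("space=%zu blocks  valid_ratio=%.2f\n", i + 1, valid[i]);
    return 0;
}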

F2FS incurs a worse zombie curve (higher GC overhead) than ext4 for SQLite. [Figure: valid ratio (0 to 1.0) vs. flash space (0 to 2x user capacity), with the over-provisioned region marked; the F2FS curve sits above the ext4 curve. Animations cannot be displayed in PDF; please see http://pages.cs.wisc.edu/~jhe/zombie.html] Stable-state zombie curves characterize workloads.

Why did F2FS incur a worse zombie curve (GC overhead)? SQLite fragmented F2FS: F2FS did not discard data that SQLite had deleted, and F2FS was not able to stay log-structured under SQLite's I/O pattern.

More observations: legacy file system allocation policies break locality; application log structuring does not reduce GC; 24 observations in total in the paper.

Lessons Learned: The SSD contract is multi-dimensional; optimizing for one dimension is not enough, and we need more sophisticated tools to analyze workloads. Although not perfect, traditional file systems perform surprisingly well on SSDs. Myths spread if the unwritten contract is not clarified (e.g., "random writes increase GC overhead").

Conclusions: Understanding the unwritten contract is crucial for designing high-performance applications and file systems. System design demands more vertical analysis. WiscSee (analyzer) and WiscSim (SSD simulator) are available at: http://research.cs.wisc.edu/adsl/software/wiscsee