Don't stack your Log on my Log

Transcription:

Don't stack your Log on my Log
Jingpei Yang, Ned Plasson, Greg Gillis, Nisha Talagala, Swaminathan Sundararaman
Oct 5, 2014

Outline
- Introduction
- Log-stacking models
- Problems with stacking logs
- Solutions to mitigate log-stacking issues

Log-structuring
- Append data at the tail of the log
- Maximizes write throughput by aggregating small random writes into large sequential ones (sketched below)
- Easier storage management: free space management, fast recovery
- Good for transactional consistency: low-overhead snapshots, flexibility in reclaiming space or data chunks
- Good for database systems, since unmodified data can be grouped together during garbage collection
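
A minimal sketch of the append-only idea, in Python, with an invented AppendLog class and segment size (none of this comes from the talk): small random writes are buffered at the tail and emitted as one large sequential segment write, while an index remembers where the newest copy of each logical block lives.

SEGMENT_PAGES = 4   # pages per segment; value chosen only for illustration

class AppendLog:
    def __init__(self):
        self.segments = []       # segments already flushed sequentially
        self.open_segment = []   # segment currently being filled at the tail
        self.index = {}          # lba -> (segment_no, offset); newest copy wins

    def write(self, lba, data):
        # Every write, however random its lba, is appended at the tail.
        self.index[lba] = (len(self.segments), len(self.open_segment))
        self.open_segment.append((lba, data))
        if len(self.open_segment) == SEGMENT_PAGES:
            self.segments.append(self.open_segment)   # one big sequential write
            self.open_segment = []

    def read(self, lba):
        seg_no, off = self.index[lba]
        seg = self.segments[seg_no] if seg_no < len(self.segments) else self.open_segment
        return seg[off][1]

log = AppendLog()
for lba in (42, 7, 1030, 7, 99):          # small writes at random logical addresses
    log.write(lba, "data@%d" % lba)
print(log.read(7))                         # served from the newest appended copy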

Log-structuring: why on flash?
- Flash does not support in-place update, and erase is costly
- Asymmetric read/write performance: writes are much slower than reads
- Operations are done in units of a page (4 KB, 8 KB)
- High overhead for small random I/O, e.g. a 512-byte write may consume a full 4 KB flash page
- Log-structure is therefore a better choice for flash memory! (see the mapping sketch below)
(Diagram: flash blocks 0-2, each made up of four pages, written sequentially by logging.)
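
To make out-of-place update concrete, here is a hypothetical page-mapped sketch (the class name, page count, and policy are assumptions, not the talk's FTL): an update never overwrites the old physical page; it appends at the tail and marks the old copy invalid, deferring the erase to later garbage collection.

class PageMappedFTL:
    def __init__(self, num_pages):
        self.mapping = {}                  # logical page -> physical page
        self.valid = [False] * num_pages   # validity of each physical page
        self.write_ptr = 0                 # next free physical page (the log tail)

    def write(self, lpn, data):
        # Payload is ignored in this sketch; only placement is modeled.
        old = self.mapping.get(lpn)
        if old is not None:
            self.valid[old] = False        # old copy becomes garbage; erase deferred
        ppn = self.write_ptr               # always write at the tail, never in place
        self.write_ptr += 1
        self.mapping[lpn] = ppn
        self.valid[ppn] = True
        return ppn

ftl = PageMappedFTL(num_pages=16)
print(ftl.write(3, "v1"))   # -> 0
print(ftl.write(3, "v2"))   # -> 1; physical page 0 is now invalid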

Log-structuring: who uses it?
- Databases: e.g. change and access logging, record logging
- Flash-aware file systems and FTLs: e.g. data log, metadata log
- Logging systems, pub/sub: everything is logged

Are more logs better?
- Individual logging is optimized for performance
- Individual logging provides better isolation
- Multiple logs within one application/layer are even better: hot/cold data separation, easier metadata management
- But what about multiple layers of logs?

Outline
- Introduction
- Log-stacking models
- Problems with stacking logs
- Solutions to mitigate log-stacking issues

Log-on-log models
(Diagram: a log-structured application or file system stacked on the device's FTL log. Each layer carries its own crash recovery, journaling, garbage collection, and other services, writes both user data and metadata, and maintains its own log segments and data append points; the upper log uses segments of size n with three append points, the lower FTL log uses segments of size 1.5n with two append points.)

Log-on-log models
(Diagram: example stackings in which an upper log, a non-log layer, or several upper logs sit on top of the device's FTL log.)

Outline
- Introduction
- Log-stacking models
- Problems with stacking logs
- Solutions to mitigate log-stacking issues

Problems with stacking logs
- Increased write pressure
- Destroyed sequentiality
- Duplicated log capabilities
- Lack of coordination: application logs are not FTL-aware, and the FTL log is not application-aware
(Diagram: app 1, app 2, and app 3 all logging onto the same flash.)

Experimental methodology
- A flash-aware file system: F2FS with up to 6 logs, running on top of a single-log (one append point) SSD
- A log-on-log simulator: mimics a file-system-on-device storage system, modeling data placement, storage management, and garbage collection
(Diagram: an upper log and a lower log, each with its own storage management, garbage collector, and metadata management, connected through an I/O interface across virtual, logical, and physical address spaces.)

1 - Metadata footprint increase
- Each layer/log has its own metadata
- On-media metadata increases, resulting in more writes to the device
- In-memory metadata increases, resulting in higher memory consumption
Increased metadata reduces the storage available for user data and adds writes, which reduces endurance (a back-of-the-envelope example follows below).
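
A hypothetical back-of-the-envelope calculation (the device size, page size, and entry size are assumptions, not numbers from the talk) of how per-layer mapping metadata adds up when two log layers each keep their own page map:

DEVICE_BYTES = 1 * 2**40          # assumed 1 TiB device
PAGE_BYTES = 4 * 2**10            # assumed 4 KiB pages
ENTRY_BYTES = 4                   # assumed one 4-byte mapping entry per page, per layer

pages = DEVICE_BYTES // PAGE_BYTES
per_layer = pages * ENTRY_BYTES
print(per_layer / 2**30, "GiB of mapping metadata per log layer")    # 1.0 GiB
print(2 * per_layer / 2**30, "GiB when two log layers are stacked")  # 2.0 GiB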

2 - Fragmented logs
- A mixed workload from multiple logs and other traffic destroys sequentiality
- Each log writes sequentially, but the device receives a mixed, most likely random, workload (see the interleaving sketch below)
- The underlying flash-based SSD also writes its own metadata
(Diagram: Log 1, Log 2, Log 3, and other non-log traffic at the application layer interleave into segments SEG 1-3 at the flash layer.)
Multiple logs on a shared device result in random traffic as seen by the underlying device.
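
A tiny illustration of that claim, with made-up address ranges and a round-robin arrival order: each upper log is perfectly sequential within its own region, yet the shared lower log sees consecutive writes jumping between regions.

def log_stream(base, count):
    # One upper log writing sequential offsets within its own region.
    return [base + i for i in range(count)]

logs = [log_stream(base, 4) for base in (0, 1000, 2000)]

# Round-robin arrival at the device: the order the shared lower log sees.
device_order = [lba for batch in zip(*logs) for lba in batch]
print(device_order)
# [0, 1000, 2000, 1, 1001, 2001, 2, 1002, 2002, 3, 1003, 2003]
# Each log is sequential, but consecutive device writes hop across regions.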

2 - Fragmented logs (cont.)
- Unaligned segment sizes: garbage collecting one upper segment invalidates data across multiple segments in the device log (see the sketch below)
- Matching segment sizes does not prevent data fragmentation either
(Diagram: application log segments app_seg 1-3 mapped onto flash log segments dev_seg 1-3.)
Unaligned segment sizes result in fragmented device space caused by garbage collection.
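
A small worked example under assumed sizes (4 pages per upper segment, 6 per lower segment, i.e. the 1.5x ratio from the log-on-log diagram): cleaning the middle upper segment invalidates data in two lower segments at once, leaving each of them partially invalid.

UPPER_SEG = 4   # pages per upper (application) segment; illustrative only
LOWER_SEG = 6   # pages per lower (device) segment, 1.5x the upper size

def lower_segments_touched(upper_index):
    # Page range occupied by this upper segment in the lower log's address space.
    first_page = upper_index * UPPER_SEG
    last_page = first_page + UPPER_SEG - 1
    return sorted(set(range(first_page // LOWER_SEG, last_page // LOWER_SEG + 1)))

for i in range(3):
    print("upper segment", i, "spans lower segments", lower_segments_touched(i))
# upper segment 0 spans lower segments [0]
# upper segment 1 spans lower segments [0, 1]   <- cleaning it fragments both
# upper segment 2 spans lower segments [1]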

3 - Decoupled segment cleaning
- Without TRIM, data has a different validity view at each log layer (see the sketch below)
- Data can be moved multiple times across log layers
(Diagram: the file system logs hold blocks 1-7, while the device log holds blocks 2, 4, 6, 7 plus stale copies of 2, 4, and 6 spread across Segments 1 and 2; blocks are marked valid, invalid, or free.)
Uncoordinated garbage collection at each log destroys the sequentiality of data that the application log intends to maintain, and leads to more flash writes. Even with TRIM support, sequentiality can hardly be guaranteed in the lower flash log.
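
A sketch of the differing validity views under assumed block numbers (loosely inspired by the diagram, not a reproduction of it): without TRIM the lower log still treats blocks the upper log has dropped as live, so its garbage collector relocates dead data.

upper_valid = {1, 3, 5}                 # what the file-system log still needs
lower_valid = {1, 2, 3, 4, 5, 6, 7}     # what the device log believes is live

def lower_gc_copies(trim_supported):
    # Blocks the lower log's GC would relocate from a victim segment.
    live_view = lower_valid & upper_valid if trim_supported else lower_valid
    return sorted(live_view)

print(lower_gc_copies(trim_supported=False))  # [1, 2, 3, 4, 5, 6, 7] -> dead data moved
print(lower_gc_copies(trim_supported=True))   # [1, 3, 5] -> only truly live data moved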

Fragmented logs + decoupled cleaning
- A combined effect on the overall write amplification (WA): the Total Combined WA (TCWA)
- A higher log ratio results in higher device WA
(Chart: 60 GB of random writes, 4 KB direct I/O; annotated values of < 1%, 4-32%, and 3-24%.)
More upper logs result in higher device WA, and hence higher TCWA.

Outline
- Introduction
- Log-stacking models
- Problems with stacking logs
- Solutions to mitigate log-stacking issues

Methods for log coordination
(Chart: Total Combined WA vs. lower log segment size, 2-512 MB, for upper log segment sizes of 2-32 MB (up_2 through up_32); TPC-E-like workload, upper/lower size ratio 90%.)
- A smaller flash segment size is better in terms of WA
- Total WA increases as the lower log segment size increases
- There is a turning slope when the lower segment size exceeds the upper segment size (low_seg_size > up_seg_size)

Methods for log coordination (cont.)
(Charts: Total Combined WA vs. lower log segment size, 2-512 MB, for upper log segment sizes of 2-32 MB, at upper/lower size ratios of 90% and 70%.)
Total WA is a combined effect of the capacity ratio, the segment size ratio, GC frequency, etc.

Methods for log coordination (cont.)
- Size coordination to reduce the Total Combined WA (TCWA), where TCWA = (upper log WA) * (lower log WA)
  - Capacity ratio and segment size ratio
  - Trade off space management against GC efficiency by adjusting the upper log segment size
- Coordinated GC activity (see the sketch below)
  - Postpone the device log's GC while the upper log's GC is active, to avoid or minimize moving the same data multiple times
  - Start GC on multiple upper logs simultaneously
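
Two small illustrations, both under assumed numbers and an invented policy function rather than anything specified in the talk: first, the multiplicative effect behind the TCWA formula; second, one way a lower log might postpone its GC while an upper log is cleaning.

# If the upper log rewrites each user byte 1.5x and the lower log rewrites
# each byte it receives 2.0x, the flash ends up writing 3.0x the user data.
upper_wa, lower_wa = 1.5, 2.0   # assumed values
print(upper_wa * lower_wa)      # TCWA = 3.0

def lower_gc_should_run(upper_gc_active, free_segments,
                        background_threshold=20, critical_threshold=5):
    if free_segments <= critical_threshold:
        return True                       # out of space: must clean regardless
    if upper_gc_active:
        return False                      # postpone while the upper log cleans
    return free_segments <= background_threshold   # normal background cleaning

print(lower_gc_should_run(upper_gc_active=True, free_segments=10))   # False: postponed
print(lower_gc_should_run(upper_gc_active=False, free_segments=10))  # True: background GC
print(lower_gc_should_run(upper_gc_active=True, free_segments=3))    # True: critical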

Collapsing logs
- Stacking logs is not always optimal for flash-based systems
- Log coordination becomes less feasible as more layers are involved
- Two approaches to collapsing logs:
  - NVMFS (formerly called DirectFS): sparse address space, atomic writes, PTRIM
  - Object-based flash: breaking the standard block interface

Collapsing logs (cont.)
(Diagram: in today's stack, both the log-structured file system and the FTL-based flash memory below the kernel block layer duplicate metadata management, garbage collection, space allocation, crash recovery, address mapping, and logging/journaling; in a collapsed stack, a flash-aware file system keeps these functions over a lightweight flash memory device offering address mapping and atomic transactions.)
- Eliminate redundant log layers: remove duplicated log functionalities
- Break the block interface
- Keep the log capabilities in only one layer:
  - in the file system layer, with a lightweight flash device design, or
  - in the FTL layer, with a log-less file system design

Conclusion
- Log-structured systems are designed to provide high throughput by combining small random requests into large sequential ones
- Multiple logs/append points in different system layers tend to provide better data and space management (hot/cold separation)
- Stacking logs on top of another log achieves sub-optimal performance: the lack of coordination between layers mixes the traffic
- Log-on-log can perform better with improved coordination: size coordination, GC coordination, file system support
- Collapsing logs by breaking the block interface: NVMFS (sparse space, atomic writes, PTRIM), object-based flash storage

Thank You