An SMR-aware Append-only File System
Chi-Young Ku, Stephen P. Morgan
Futurewei Technologies, Inc. (Huawei R&D USA)


SMR Technology (1)
- Future disk drives will be based on shingled magnetic recording.
- [Figure: conventional recording, with a gap between tracks, vs. shingled recording, with overlapping tracks.]
- Higher recording density, but no random writes.

SMR Technology (2)
- Drive is divided into large zones, typically 256 MBytes each.
- A per-zone write pointer marks the next write location.
- The write pointer advances as data is written.
- The write pointer can be reset on a per-zone basis.
- A zone may be empty, full, or partially full.
- The unwritten area reads back as an initialization pattern.
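
The write-pointer rules above can be captured in a few lines of C. The sketch below assumes a simplified host-managed zone model; zone_t, zone_append(), and zone_reset() are illustrative names, not an actual drive command set or kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

#define ZONE_SIZE_BYTES (256ULL * 1024 * 1024)  /* typical 256 MByte zone */

typedef struct {
    uint64_t start;  /* first byte offset of the zone on the drive  */
    uint64_t wp;     /* write pointer: next writable offset in zone */
} zone_t;

/* A host-managed zone accepts a write only at its write pointer. */
bool zone_append(zone_t *z, uint64_t off, uint64_t len)
{
    if (off != z->wp)                               /* not at the write pointer */
        return false;
    if (z->wp + len > z->start + ZONE_SIZE_BYTES)   /* would overflow the zone  */
        return false;
    /* ... issue the sequential write to the device here ... */
    z->wp += len;                                   /* write pointer advances   */
    return true;
}

/* Resetting the write pointer logically empties the zone; the unwritten
 * area then reads back as the initialization pattern. */
void zone_reset(zone_t *z)
{
    z->wp = z->start;
}
```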

SMR Technology (3)
Three kinds of SMR drives:
- Drive managed: has an STL layer that accepts random I/Os. Existing software runs correctly, but with poor performance.
- Host managed: writes must be performed at the write pointer. Requires new software to be written.
- Host aware: has an STL layer that accepts random I/Os, but prefers writes performed at the write pointer. Existing software may be tweaked to run better.

SMR Translation Layer (STL)
- Part of the drive is reserved to buffer random I/Os.
- Data in this area is eventually moved to its home location after a read-modify-write cycle.
- The operation is performed in the background when possible.
- The reserved disk space could be replaced by flash memory, at significant cost but with higher performance.
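
As a concrete illustration of the read-modify-write cycle, here is a hedged C sketch of a background destage step, assuming the STL stages out-of-place blocks and folds them back one zone at a time. read_zone(), patch_block(), rewrite_zone(), and stl_destage_zone() are hypothetical helpers, not a real drive's firmware interface.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t    lba;   /* home logical block address of the block     */
    const void *data;  /* buffered block contents in the staging area */
} staged_block_t;

/* Hypothetical helpers: read a whole zone into memory, patch one block in
 * the buffer, and rewrite the zone sequentially from its start. */
extern void read_zone(uint32_t zone_id, void *zone_buf);
extern void patch_block(void *zone_buf, uint64_t lba, const void *data);
extern void rewrite_zone(uint32_t zone_id, const void *zone_buf);

/* Background destage: fold staged random writes back into their home zone
 * with one read-modify-write cycle per zone. */
void stl_destage_zone(uint32_t zone_id, void *zone_buf,
                      const staged_block_t *staged, size_t n)
{
    read_zone(zone_id, zone_buf);                              /* read   */
    for (size_t i = 0; i < n; i++)
        patch_block(zone_buf, staged[i].lba, staged[i].data);  /* modify */
    rewrite_zone(zone_id, zone_buf);                           /* write  */
}
```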

Common File Systems on SMR Drives (due to Dr. Hannes Reinecke, SUSE Labs)
- btrfs is nearly there: it writes sequentially due to its CoW nature and has very few fixed data locations.
- xfs might be an option: roughly the same zone usage as btrfs, but hardly any sequential writes. See Dave Chinner's report on adapting xfs for SMR drives.

Changes to ext4 for SMR (SMRFFS)
See https://github.com/seagate/smr_fs-ext4
- Optimizes sequential file layout.
- In-order writes and idle-time garbage collection.
- Block groups laid out to match zone alignments.
- Allocator changed to follow forward-write requirements.
- New extent layout.
- Many more changes throughout the stack.

Append-only Applications
- Scientific sensor data.
- Financial time series data.
- Temporal business data.
- Surveillance data.
- Web logs.
- RocksDB / LevelDB (LSM-tree).

Circular Append-only Applications
- In most append-only applications, the probability of access to data decreases with the age of the data.
- Depending on the requirements, old data could be purged or migrated to cool storage.
- In both cases, it would be advantageous to design such applications to append data circularly, as sketched below.
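
One way to structure circular appending is a fixed ring of zones: append at the head, and when the ring wraps, reset the oldest zone and reuse it (the "purge" case). The C sketch below reuses the illustrative zone_t helpers from the earlier sketch and adds an assumed zone_wp() accessor; none of this is an actual application API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct zone zone_t;                 /* opaque here; conceptually the zone_t from the earlier sketch */
extern bool     zone_append(zone_t *z, uint64_t off, uint64_t len);
extern void     zone_reset(zone_t *z);
extern uint64_t zone_wp(const zone_t *z);   /* current write pointer (assumed accessor) */

typedef struct {
    zone_t *zones;   /* ring of zones holding the circular log */
    size_t  nzones;
    size_t  head;    /* zone currently being appended to       */
    size_t  tail;    /* oldest zone still holding data         */
} ring_t;

/* Append one record of len bytes; when the head zone fills up, advance to
 * the next zone, reclaiming (resetting) the oldest zone on wrap-around. */
void ring_append(ring_t *r, uint64_t len)
{
    while (!zone_append(&r->zones[r->head], zone_wp(&r->zones[r->head]), len)) {
        r->head = (r->head + 1) % r->nzones;   /* head zone full: move on */
        if (r->head == r->tail) {              /* ring full: drop oldest  */
            zone_reset(&r->zones[r->tail]);
            r->tail = (r->tail + 1) % r->nzones;
        }
    }
}
```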

Log-structured File Systems
- File system data and metadata are written to a large circular buffer called a log.
- Reads are assumed to be satisfied from a large memory cache, which is unrealistic in practice.
- Disk seeks are minimized for writes, not reads.
- Garbage collection becomes frequent as the file system fills up.
- Seemingly a good match for SMR drives: no update in place.

SMR-aware Append-only FS Overview
- A combination of a log-structured file system and a conventional file system.
- The log is a (large) list of zones.
- A file comprises a zone or a list of zones. The design also supports multiple files per zone.
- Data is initially written to the log, then migrated to the file. This happens during log compaction, instead of LFS's generational garbage collection.
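
To make the object relationships concrete, here is a hedged sketch of the core SAFS structures as described above; the type and field names are illustrative assumptions, not the actual SAFS on-disk or in-memory layout.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t zone_id_t;

/* A file comprises a zone or an ordered list of zones.
 * (The design also supports multiple files per zone.) */
typedef struct {
    zone_id_t *zones;   /* zones backing this file, in order */
    size_t     nzones;
    uint64_t   size;    /* logical file size in bytes        */
} safs_file_t;

/* The log is a (large) list of zones, appended at the tail and
 * compacted from the head. */
typedef struct {
    zone_id_t *zones;   /* zones making up the log, head to tail  */
    size_t     nzones;
    size_t     head;    /* next zone to be consumed by compaction */
    size_t     tail;    /* zone currently receiving new appends   */
} safs_log_t;

/* Write path, conceptually: new data is appended to the log tail first,
 * then migrated to its file's zone(s) during log compaction. */
```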

SMR-aware Append-only FS Overview (2)
- Some FS data structures are rewritten in place, e.g., inodes and allocation maps.
- Host-aware drives support a small number of random I/O zones (e.g., 16).
- The log and files (frequently updated) are written in order within zones, from start to finish.
- Log compaction eats a zone at a time.

SAFS Layout
[Diagram: an inode array (inode m, inode m+1, inode m+2, ...) references file zones (zone n, zone o, ...); the log occupies a separate run of zones between the log head and log tail (zone p, zone q, zone r, ..., zone s).]

Segment Structure
[Diagram: the log runs from a head zone to a tail zone; the head zone starts with an SHB, followed by alternating data and log segments, and further segments continue across the following zones.]

Compaction
[Diagram: during compaction, data segments belonging to a file (here file 0) are copied out of the log's head zone into that file's own zone (zone 0); the inode array tracks file 0, file 1, and file 2.]
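
The copy step in the compaction diagram can be summarized in C. The sketch below assumes the log's head zone is walked segment by segment, live data segments are appended to their owning file's zone, and the emptied zone is then reset; every name (segment_t, log_t, file_zone_append(), ...) is an illustrative assumption rather than SAFS's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    file_id;  /* file that owns this data segment              */
    bool        live;     /* false if the data was later deleted/rewritten */
    const void *data;
    size_t      len;
} segment_t;

typedef struct {
    uint32_t   head_zone;                 /* zone currently being compacted     */
    segment_t *(*next_segment)(void);     /* iterate segments in the head zone  */
    void       (*reset_zone)(uint32_t);   /* return the zone to the free pool   */
    void       (*advance_head)(void);     /* move the log head to the next zone */
} log_t;

/* Hypothetical helper: append a data segment at the owning file's write
 * pointer (a file is itself a zone or a list of zones). */
extern void file_zone_append(uint32_t file_id, const void *data, size_t len);

/* Compaction eats one zone at a time: copy out the live data segments,
 * then reclaim the whole head zone by resetting its write pointer. */
void compact_head_zone(log_t *log)
{
    segment_t *seg;

    while ((seg = log->next_segment()) != NULL) {
        if (seg->live)
            file_zone_append(seg->file_id, seg->data, seg->len);
        /* dead segments are simply dropped */
    }
    log->reset_zone(log->head_zone);
    log->advance_head();
}
```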

SAFS Implementation
- Implemented with the CSIM 20 simulator on Linux.
- Coupled with a Seagate 5 TByte HA SMR drive.
- Measured performance of append-only applications.
- 256 zones in the file system, 16 random-read/write zones, 16 GBytes of DRAM, x86-64 system.

Disclaimer
Not a production file system. Purposes:
- Explore the potential of HA SMR drives.
- Explore the combination of LFS and conventional file systems.
- Explore append-only file systems.

CSIM 20 Discrete Event Simulator
- CSIM events are used to simulate semaphores.
- CSIM ports are used to simulate IPC.
- CSIM virtual time is used to account for SMR disk processing time.
- CSIM processes are used to simulate POSIX threads.
- SMR disk I/O is performed via the HA SMR drive.

SAFS Simulator Components
- Workload simulation module
- File system commands simulation module
- Buffer cache simulation module
- Segment system simulation module
- Journaling system simulation module
- Lock manager simulation module
- SMR disk simulation module

Measured SAFS Applications (1)
- Creates four files.
- Appends to all files (one block at a time to each file) until the system is ½ full.
- Reads each file (one block at a time from each file) to the end.
- Deletes all four files.
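
For reference, an ordinary POSIX version of workload (1) might look like the sketch below. The 4 KByte block size, the file names, and the fs_half_full() capacity check are assumptions for illustration; the measurements that follow were taken with the simulator's own workload module, not this program.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NFILES 4
#define BLOCK  4096

extern bool fs_half_full(void);   /* assumed capacity check */

int main(void)
{
    char name[32], block[BLOCK] = {0};
    int  fd[NFILES];

    for (int i = 0; i < NFILES; i++) {               /* create four files   */
        snprintf(name, sizeof name, "safs_file_%d", i);
        fd[i] = open(name, O_CREAT | O_RDWR | O_APPEND, 0644);
    }
    while (!fs_half_full())                          /* append round-robin, */
        for (int i = 0; i < NFILES; i++)             /* one block per file  */
            write(fd[i], block, BLOCK);
    for (int i = 0; i < NFILES; i++) {               /* read each file back */
        lseek(fd[i], 0, SEEK_SET);
        while (read(fd[i], block, BLOCK) == BLOCK)
            ;
    }
    for (int i = 0; i < NFILES; i++) {               /* delete all four     */
        close(fd[i]);
        snprintf(name, sizeof name, "safs_file_%d", i);
        unlink(name);
    }
    return 0;
}
```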

Performance (1)
- File system size was 64 GBytes.
- Total amount of data read/written was 64 GBytes.
- Total time was 458 seconds.
- Average processing rate was 143.1 MBytes/sec.

Performance Comparison (1)
Ran the same steps on other file systems, using a 4 TByte conventional drive:

    File System   Time      Rate
    SAFS*         458 sec   143.1 MB/sec
    F2FS          504 sec   130.0 MB/sec
    NILFS2        510 sec   128.5 MB/sec
    EXT4          571 sec   114.8 MB/sec

    * Simulated, on a 5 TByte HA SMR drive.

Measured SAFS Applications (2)
- Creates four files.
- Appends to all files (one block at a time to each file) until the system is ¾ full.
- Deletes a file, then re-creates and appends to it until the system is ¾ full again (repeated for all four files).
- Deletes all four files.

Performance (2)
- File system size was 64 GBytes.
- Total amount of data written was 96 GBytes.
- Total time was 698 seconds.
- Average ingestion rate was 140.8 MBytes/sec.

Performance Comparison (2)
Ran the same steps on other file systems, using a 4 TByte conventional drive:

    File System   Time      Rate
    SAFS*         698 sec   140.8 MB/sec
    NILFS2        742 sec   132.5 MB/sec
    EXT4          988 sec   99.4 MB/sec
    F2FS          DNF       N/A

    * Simulated, on a 5 TByte HA SMR drive.

Conclusion
Under an append-only workload, simulated SAFS on an HA SMR drive performs better than modern production log-structured file systems (F2FS, NILFS2) and a production conventional file system (ext4) running on a conventional disk.

Questions?