MEMBRANE: OPERATING SYSTEM SUPPORT FOR RESTARTABLE FILE SYSTEMS

Department of Computer Science, Institute of System Architecture, Operating Systems Group

MEMBRANE: OPERATING SYSTEM SUPPORT FOR RESTARTABLE FILE SYSTEMS

SWAMINATHAN SUNDARARAMAN, SRIRAM SUBRAMANIAN, ABHISHEK RAJIMWALE, ANDREA C. ARPACI-DUSSEAU, REMZI H. ARPACI-DUSSEAU, MICHAEL M. SWIFT

CARSTEN WEINHOLD

MOTIVATION
- Operating systems crash.
- File systems fail.
- Many bugs are in file systems
- Reasons:
  - Complex code bases
  - Under active development
  - Large number of file systems
- How to fix?

HOW TO RECOVER?
- Research on OS subsystem recovery:
  - Isolation, micro-rebooting
  - Checkpoint / restart
- Problem: file systems are stateful
  - On-disk data
  - In-memory data
  - Spread across kernel / user memory

SOLUTIONS

            Heavyweight                 Lightweight
Stateless   Nooks/Shadow[31, 32]        SafeDrive[40]
            Xen[10], Minix[13, 14]      Singularity[19]
            L4[20], Nexus[37]
Stateful    CuriOS[7]                   Membrane
            EROS[29]

MEMBRANE
- Membrane is an OS framework:
  - Light-weight stateful recovery
  - Checkpointing on-disk state
  - Logging of operations
- In case of failure:
  - Park all file system operations
  - Clean up state, reset file system
  - Replay logged operations from checkpoint
  - Continue

OVERVIEW

GOALS
- Fault tolerant
- Lightweight
- Transparent
- Generic
- Maintain file-system consistency

FAULT DETECTION
- Aims at transient fail-stop errors
- Light-weight detection:
  - Exceptions (divide-by-zero, page fault, ...)
  - assert(), panic(), BUG(), ...
  - Argument checks at kernel / file system boundaries
- No address-space isolation, etc.
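
The light-weight detection idea can be pictured with a small sketch. This is illustrative Python, not kernel code: `FailStop`, `checked_call`, `read_block`, and `valid_read_args` are hypothetical names that do not come from Membrane. A call into the file system is wrapped so that bad arguments, exceptions, and failed assertions all turn into a single fail-stop signal that a recovery layer could react to.

```python
class FailStop(Exception):
    """Hypothetical signal: the file system must be stopped and restarted."""


def checked_call(fs_op, validate, *args):
    """Run fs_op(*args) behind a boundary check (illustrative only)."""
    # Argument check at the kernel / file system boundary.
    if not validate(*args):
        raise FailStop(f"bad arguments to {fs_op.__name__}: {args!r}")
    try:
        return fs_op(*args)
    except (ZeroDivisionError, AssertionError, IndexError) as exc:
        # Exceptions and failed assertions are treated as fail-stop faults.
        raise FailStop(f"{fs_op.__name__} faulted: {exc!r}") from exc


def read_block(blocks, index):
    """Toy file system operation with a built-in sanity check."""
    assert blocks[index] is not None, "uninitialized block"
    return blocks[index]


def valid_read_args(blocks, index):
    """Toy boundary check: the index must be an in-range integer."""
    return isinstance(index, int) and 0 <= index < len(blocks)
```

Calling `checked_call(read_block, valid_read_args, blocks, 5)` on a two-block list raises `FailStop` at the boundary, before the toy file system code runs at all.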

CLEAN STATE
- Recovery requires clean on-disk state
- Based on checkpointing
- Checkpoints mark begin / end of epochs

Example (Epoch 0 ends and Epoch 1 begins at the checkpoint):

Time  Event               In memory                            On disk
0     Write A to Block 0  A (dirty) [block 0, epoch 0]
1     Checkpoint          A (dirty, COW) [block 0, epoch 0]
2     Write B to Block 0  A (dirty, COW) [block 0, epoch 0]
                          B (dirty) [block 0, epoch 1]
3     I/O Flush           B (dirty) [block 0, epoch 1]         A
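
The epoch timeline on this slide can be replayed with a toy buffer cache. This is a minimal sketch under stated assumptions, not Membrane's implementation: `BufferCache` and its `write`, `checkpoint`, and `flush` methods are hypothetical names. The point it illustrates is that a buffer marked copy-on-write at a checkpoint is never overwritten, so a flush writes only the checkpointed version to disk.

```python
class BufferCache:
    """Toy model of epoch-based copy-on-write checkpointing."""

    def __init__(self):
        self.epoch = 0
        self.dirty = {}   # block number -> (data, epoch), current epoch
        self.frozen = {}  # block number -> (data, epoch), marked copy-on-write
        self.disk = {}    # block number -> data on stable storage

    def write(self, block, data):
        # A frozen (COW) buffer is left untouched; the new data goes into
        # a separate buffer tagged with the current epoch.
        self.dirty[block] = (data, self.epoch)

    def checkpoint(self):
        # End the current epoch: its dirty buffers become copy-on-write.
        self.frozen.update(self.dirty)
        self.dirty = {}
        self.epoch += 1

    def flush(self):
        # Write back only checkpointed buffers; data from the current
        # epoch stays in memory.
        for block, (data, _epoch) in self.frozen.items():
            self.disk[block] = data
        self.frozen = {}
```

Running write A, checkpoint, write B, flush through this model leaves A on disk and B dirty in memory, tagged with epoch 1.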

CHECKPOINTING
- Reuse existing mechanisms:
  - Journaling support, snapshots, ...
  - Notify Membrane of begin / end of transaction
- Generic checkpointing at VFS level:
  - Park new file system operations
  - Wait for pending operations to complete
  - Copy dirty metadata back to buffers
  - Mark dirty buffers copy-on-write
  - Write back asynchronously

LOGS
- operation log: operations and data
- session log: open files from previous epoch, file pointers, ...
- malloc table: all memory allocated by file system
- lock stack: all held global locks for LIFO releasing
- unwind stack: register state to support unwinding

RECOVERY
- Halt execution and park threads
- Unwind in-flight threads
- Commit dirty pages from last epoch to stable storage
- Kill file system (unmount)
- Restart file system (mount)
- Roll forward logged operations / state
- Resume execution
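
The restart and roll-forward steps can be sketched with a toy in-memory file system. Everything here is an assumption made for illustration: `ToyFS`, `recover`, the `(kind, path, data)` record layout, and the `fd -> (path, offset)` session format are hypothetical and not taken from Membrane. The sketch only shows the ordering: start a fresh instance from checkpointed state, replay the operation log, then restore open sessions.

```python
class ToyFS:
    """Toy file system: a mapping from path to file contents."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def apply(self, op):
        """Apply one logged operation of the form (kind, path, data)."""
        kind, path, data = op
        if kind == "write":
            self.files[path] = self.files.get(path, b"") + data
        elif kind == "unlink":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {kind!r}")


def recover(checkpoint_files, op_log, session_log):
    """Restart from the last checkpoint and roll forward (illustrative)."""
    # "Mount" a fresh instance from the checkpointed on-disk state.
    fs = ToyFS(checkpoint_files)
    # Roll forward the operations logged since the checkpoint.
    for op in op_log:
        fs.apply(op)
    # Restore open-file sessions: fd -> (path, offset).
    sessions = dict(session_log)
    return fs, sessions
```

In this model the faulty instance is simply discarded; only the checkpointed state and the logs carry over into the new one.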

SKIP/TRUST

CLEANUP STATE
- Free all memory allocated by file system
- Release all global locks in LIFO order
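
A minimal sketch of this cleanup step, assuming a tracker that records every allocation and every lock acquisition. `CleanupState` and its methods are hypothetical names, and Python's `threading.Lock` stands in for kernel locks. It shows the malloc table being emptied and the lock stack being released in LIFO order.

```python
import threading


class CleanupState:
    """Toy malloc table and lock stack (illustrative only)."""

    def __init__(self):
        self.malloc_table = set()  # handles of memory the file system allocated
        self.lock_stack = []       # locks held, in acquisition order

    def alloc(self, handle):
        self.malloc_table.add(handle)

    def acquire(self, lock):
        lock.acquire()
        self.lock_stack.append(lock)

    def cleanup(self):
        """Release locks newest-first, then free all tracked allocations."""
        released = []
        while self.lock_stack:
            lock = self.lock_stack.pop()  # LIFO: last acquired, first released
            lock.release()
            released.append(lock)
        freed = sorted(self.malloc_table)
        self.malloc_table.clear()
        return freed, released
```

Tracking both in one place is what lets a restart clean up after threads that never reached their own unlock or free calls.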

OPTIMIZATIONS
- Compressed operation log
- Uses page stealing
- Latest written data is in dirty pages: steal before recovery, write to disk

op-log (naive)           Data kept in log
write(A) to blk 0        A
write(B) to blk 1        B
write(C) to blk 0        C

op-log (with page stealing)
write(A) to blk 0        (not needed)
write(B) to blk 1
write(C) to blk 0

Page Cache: C, B
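
The difference between the two logs can be sketched as follows. This is an illustration under assumptions, not Membrane's data layout: `naive_log`, `log_with_page_stealing`, and `steal_pages` are hypothetical names, and a plain dict stands in for the page cache. With page stealing, log records carry no data, because the latest contents of each block are already held in dirty pages that can be written to disk before recovery.

```python
def naive_log(writes):
    """Log every write together with its data."""
    return [("write", blk, data) for blk, data in writes]


def log_with_page_stealing(writes):
    """Log writes without data; the page cache holds the latest contents."""
    op_log = []
    page_cache = {}
    for blk, data in writes:
        op_log.append(("write", blk))  # record the operation only
        page_cache[blk] = data         # a later write replaces earlier data
    return op_log, page_cache


def steal_pages(page_cache, disk):
    """Before recovery, write the stolen dirty pages to disk."""
    for blk, data in page_cache.items():
        disk[blk] = data
    return disk
```

For the slide's example, the writes are `[(0, "A"), (1, "B"), (0, "C")]`: the page cache keeps only C for block 0 and B for block 1, so the data A never has to be stored in the log.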

FAULT STUDY

Configurations compared: ext2, ext2+boundary, ext2+Membrane.
Columns per configuration: How Detected? / Application? / FS:Consistent? / FS:Usable?
Outcome codes are listed as transcribed, left to right; their per-column alignment is not recoverable.

ext2 Function   Fault               Outcome codes
create          null-pointer        o o d
create          mark_inode_dirty    o o d
writepage       write_full_page     o a d s a d
writepages      write_full_page     o a d s a d
free_inode      mark_buffer_dirty   o o b a d
mkdir           d_instantiate       o d s d
get_block       map_bh              o a o b d
readdir         page_address        G G d
get_page        kmap                o o b d
get_page        wait_page_locked    o o b d
get_page        read_cache_page     o o d
lookup          iget                o o b d
add_nondir      d_instantiate       o d e d
find_entry      page_address        G G b d
symlink         null-pointer        o o d
rmdir           null-pointer        o o d
empty_dir       page_address        G G d
make_empty      grab_cache_page     o o b d
commit_chunk    unlock_page         o d e d
readpage        mpage_readpage      o i d

PERFORMANCE

Benchmark     ext2    ext2+      ext3    ext3+      VFAT    VFAT+
                      Membrane           Membrane           Membrane
Seq. read     17.8    17.8       17.8    17.8       17.7    17.7
Seq. write    25.5    25.7       56.3    56.3       18.5    20.2
Rand. read    163.2   163.5      163.2   163.2      163.5   163.6
Rand. write   20.3    20.5       65.5    65.5       18.9    18.9
create        34.1    34.1       33.9    34.3       32.4    34.0
delete        20.0    20.1       18.6    18.7       20.8    21.0

Benchmark     ext2    ext2+      ext3    ext3+      VFAT    VFAT+
                      Membrane           Membrane           Membrane
Sort          142.2   142.6      152.1   152.5      146.5   146.8
OpenSSH       28.5    28.9       28.7    29.1       30.1    30.8
PostMark      46.9    47.2       478.2   484.1      43.1    43.8

"[...] in all cases, the overheads were between 0% and 2%"
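
The relative overhead implied by a pair of columns is (with Membrane - baseline) / baseline. The sketch below applies that formula to the second table's values (Sort, OpenSSH, PostMark); the function name `overhead_percent` and the dictionary layout are choices made here for illustration, and the numbers are copied from the slide.

```python
# (baseline, with Membrane) pairs copied from the second benchmark table.
MACRO = {
    "ext2": {"Sort": (142.2, 142.6), "OpenSSH": (28.5, 28.9), "PostMark": (46.9, 47.2)},
    "ext3": {"Sort": (152.1, 152.5), "OpenSSH": (28.7, 29.1), "PostMark": (478.2, 484.1)},
    "VFAT": {"Sort": (146.5, 146.8), "OpenSSH": (30.1, 30.8), "PostMark": (43.1, 43.8)},
}


def overhead_percent(baseline, with_membrane):
    """Relative overhead of the Membrane configuration, in percent."""
    return (with_membrane - baseline) / baseline * 100.0


def all_overheads(table):
    """Return {(file system, benchmark): overhead in percent}."""
    return {
        (fs, bench): overhead_percent(base, memb)
        for fs, rows in table.items()
        for bench, (base, memb) in rows.items()
    }


if __name__ == "__main__":
    for (fs, bench), pct in sorted(all_overheads(MACRO).items()):
        print(f"{fs:5s} {bench:9s} {pct:5.2f}%")
```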

RESTART

(a) Data (MB)     Recovery time (ms)
    10            12.9
    20            13.2
    40            16.1

(b) Open Sessions Recovery time (ms)
    200           11.4
    400           14.6
    800           22.0

(c) Log Records   Recovery time (ms)
    1K            15.3
    10K           16.8
    100K          25.2

Table 6: Recovery Time. Tables a, b, and c show recovery time as a function of dirty pages (at checkpoint), s-log, and op-log respectively. Dirty pages are created by copying new files. Open sessions are created by getting handles to files. Log records are generated by reading and seeking to arbitrary data inside multiple files. The recovery time was 8.6ms when all three states were empty.

RESTART IMPACT

[Figure: read latency (ms) and indirect blocks versus elapsed time (s). Series: average response time, response time, indirect blocks. The crash is marked on the time axis.]

COMPLEXITY

Individual File-system Changes

File System   Added   Modified
ext2          4       0
VFAT          5       0
ext3          1       0
JBD           4       0

Kernel Changes

              No Checkpoint        With Checkpoint
Components    Added   Modified     Added   Modified
FS            1929    30           2979    64
MM            779     5            867     15
Arch          0       0            733     4
Headers       522     6            552     6
Module        238     0            238     0
Total         3468    41           5369    89

SUMMARY
- File systems fail
- Usually they cause kernel / app crashes
- Membrane allows them to be restarted:
  - Light-weight
  - Stateful
  - Generic
  - Transparent

DISCUSSION
- Does the fault model actually cover most of the bugs?
- NFS?
- Why is it called Membrane?