Tricky issues in file systems

Similar documents
File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File Systems: Consistency Issues

Journaling. CS 161: Lecture 14 4/4/17

File System Consistency

CSE506: Operating Systems CSE 506: Operating Systems

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

Advanced File Systems. CS 140 Feb. 25, 2015 Ali Jose Mashtizadeh

Lecture 18: Reliable Storage

Operating Systems. File Systems. Thomas Ropars.

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

Lecture 11: Linux ext3 crash recovery

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University

COS 318: Operating Systems. Journaling, NFS and WAFL

Ext3/4 file systems. Don Porter CSE 506

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

Caching and reliability

EECS 482 Introduction to Operating Systems

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Review: FFS background

W4118 Operating Systems. Instructor: Junfeng Yang

File Systems. CS170 Fall 2018

To Everyone... iii To Educators... v To Students... vi Acknowledgments... vii Final Words... ix References... x. 1 ADialogueontheBook 1

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU

Devsummit Concurrency Hacks

Evolution of the Unix File System Brad Schonhorst CS-623 Spring Semester 2006 Polytechnic University

Review: FFS [McKusic] basics. Review: FFS background. Basic FFS data structures. FFS disk layout. FFS superblock. Cylinder groups

CS3210: Crash consistency

The Tux3 File System

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

Operating System Concepts Ch. 11: File System Implementation

OPERATING SYSTEM. Chapter 12: File System Implementation

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

Chapter 11: Implementing File Systems

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

Journaling versus Soft-Updates: Asynchronous Meta-data Protection in File Systems

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

CS3600 SYSTEMS AND NETWORKS

File System Implementation

CS 537 Fall 2017 Review Session

V. File System. SGG9: chapter 11. Files, directories, sharing FS layers, partitions, allocations, free space. TDIU11: Operating Systems

2. PICTURE: Cut and paste from paper

Chapter 11: Implementing File

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin

Chapter 12: File System Implementation

File Systems. Chapter 11, 13 OSPP

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

Midterm evaluations. Thank you for doing midterm evaluations! First time not everyone thought I was going too fast

Lecture 10: Crash Recovery, Logging

SCSI overview. SCSI domain consists of devices and an SDS

CS 318 Principles of Operating Systems

Main Points. File systems. Storage hardware characteristics. File system usage patterns. Useful abstractions on top of physical devices

November 9 th, 2015 Prof. John Kubiatowicz

ext3 Journaling File System

File Systems: Naming

1 / 23. CS 137: File Systems. General Filesystem Design

Operating Systems. Operating Systems Professor Sina Meraji U of T

Chapter 10: File System Implementation

OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.

Chapter 12: File System Implementation. Operating System Concepts 9 th Edition

File Systems. Todays Plan. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

FS Consistency & Journaling

Chapter 11: File System Implementation. Objectives

Chapter 12: File System Implementation

C13: Files and Directories: System s Perspective

Chapter 10: Case Studies. So what happens in a real operating system?

Virtual File System. Don Porter CSE 306

CS122 Lecture 15 Winter Term,

File Systems: Recovery

File Systems. CSE 2431: Introduction to Operating Systems Reading: Chap. 11, , 18.7, [OSC]

To understand this, let's build a layered model from the bottom up. Layers include: device driver filesystem file

Week 12: File System Implementation

From Crash Consistency to Transactions. Yige Hu Youngjin Kwon Vijay Chidambaram Emmett Witchel

22 File Structure, Disk Scheduling

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

File systems CS 241. May 2, University of Illinois

Directory. File. Chunk. Disk

Implementation should be efficient. Provide an abstraction to the user. Abstraction should be useful. Ownership and permissions.

File Systems Ch 4. 1 CS 422 T W Bennet Mississippi College

Linux Filesystems Ext2, Ext3. Nafisa Kazi

Announcements. Persistence: Crash Consistency

The Google File System

CS 111. Operating Systems Peter Reiher

Shared snapshots. 1 Abstract. 2 Introduction. Mikulas Patocka Red Hat Czech, s.r.o. Purkynova , Brno Czech Republic

ABrief History of the BSD Fast Filesystem. Brought to you by. Dr. Marshall Kirk McKusick

FILE SYSTEMS, PART 2. CS124 Operating Systems Winter , Lecture 24

File Systems. What do we need to know?

CS307: Operating Systems

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

NFS. Don Porter CSE 506

ECE 598 Advanced Operating Systems Lecture 18

Chapter 7: File-System

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

OPERATING SYSTEMS CS136

Chapter 11: File-System Interface

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

Reminder from last time

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Project 5: GeekOS File System 3

Chapter 12: File System Implementation

Transcription:

Tricky issues in file systems Taylor Riastradh Campbell campbell@mumble.net riastradh@netbsd.org EuroBSDcon 2015 Stockholm, Sweden October 4, 2015

What is a file system? Standard Unix concept: hierarchy of directories, regular files, device nodes, fifos, sockets. Unified API: file descriptors. Traditionally unified storage: inodes. Directories are sometimes different : contain metadata pointers.

File system operations creat (open) unlink link rename mkdir, rmdir read, write, fsync (mkfifo, symlink, readdir, &c.)

Reliability properties ACID Atomicity Consistency Isolation Durability No transactions: only individual operations.

Atomicity Operation either happens all at once, or not at all. Crash in middle will not leave half-finished operation.

Atomicity: rename rename(old, new) acts as if unlink(new) link(old, new) unlink(old)... but atomic.

Consistency All operations preserve consistent disk state.

Isolation If process A does rename: unlink("bar") link("foo", "bar") unlink("foo")... then process B won t see two links at foo and bar in the middle.

Durability When the sync program returns, whatever file system operations you performed will stay on disk.

File system states File system has one of three states: clean dirty corrupt (bugs, disk failure, cosmic rays) Clean flag in superblock: 0 means known clean 1 means not known whether clean or dirty (or corrupt)

File system states: fsck Traditional: file system operations write metadata synchronously Inode updates, directory entry updates Every step preserves consistent state but not necessarily clean state. On boot: If marked clean, just mount. Otherwise, fsck -p globally analyzes file system to undo partially completed operations.

File system states: fsck example Inode block allocation need space to append to file: Find block in free list. Mark block allocated. Assign block to inode. If crash after block allocated, before block assigned: fsck -p scans all inodes finds all assigned blocks frees unassigned but allocated blocks

File system states: fsck to fix corruption fsck (without -p) also tries to fix corrupted file systems (Doesn t always work)

Logging Physical block logging: Write blocks serially to write-ahead log (not synchronously) After committed to log, write to disk After committed to disk, free space in log After crash, replay all committed writes in log Faster to recover from crash (but not corruption): replay log is quick scan, not global analysis... but isn t usually quite enough

Logging Logical block logging: Write logical operations serially to write-ahead log After committed to log, perform operations on disk After committed to disk, free space in log After crash, replay all committed operations in log More complex to implement But usually necessary at least a little

Physical vs logical NetBSD FFS WAPBL, write-ahead physical block logging Actually, composite of physical and logical

Physical vs logical: block deallocation Deallocate blocks from file, e.g. on rm Reuse blocks immediately? No! Reallocated block write might happen before log write! Logical log entry: deallocate block Physical log entry: change inode to not point at block When physical log committed, then commit logical log

Reliability assumptions Atomic disk sector writes Disk write ordering Disk write cache

Disk encryption Threat model: attacker reads (possibly several) snapshots of disk (e.g., airport security) (Why several? Reallocated disk sectors, especially in SSDs.) 1 1 plaintext/ciphertext disk sector mapping 512 bytes of plaintext 512 bytes of ciphertext No defence against malicious modification of disk! Easy to preserve atomicity of disk sector writes, write ordering, &c.

Disk authentication Threat model: attacker can write malicious data to disk Expand each disk block with secret-key MAC? 512 bytes of user data 528 bytes of disk sector? Splits file system s idea of logical disk sector across two physical disk sectors Atomicity? Nope! 496 bytes of user data 512 bytes of disk sector? Not nice for file system!

File system authentication Rewrite tree of pointers-with-mac all the way to the root? ZFS can do this; FFS, not so much.

Data/metadata write ordering Traditional FFS: Synchronous metadata writes Asynchronous data writes (roughly) Typical logging FFS: Serial metadata writes Asynchronous data writes No write ordering between data/metadata!

Garbage data appended after write? Allocate free block Write data to block (A) Write inode to point at new block, increase length (B) What if B happens before A? What if crash between B and A? Now file has whatever data was in free block!

Performance and concurrency Coarse-grained locking: easy, slow Want per-object locking

Rename Four different objects to lock! Any pair of them may be the same! Need consistent lock order.

Rename: orphaned directory trees / /home /usr /var /home/foo % mv /home /home/foo/bar

Lock order? Traditional in FFS: flail randomly? FreeBSD: wait/wound locks, without priorities livelock! ZFS: complicated! (Ask me after. Also broken.) Linux and NetBSD: ancestor-first, one rename in flight, deadlock-free

Suspend Need for taking snapshot (unless, e.g., log-structured) Need for unmount Prevent all operations: Block new operations. Wait for existing ones to drain. In NetBSD: called fstrans.

Suspend: reader/writer lock? Can use recursive reader-writer lock: file system operations take read lock, suspend takes write lock Slow! Point of contention for every file system operation.

Suspend: pserialized reader/writer lock Better: use passive serialization. Reader creates per-thread structure, touches no global state, unless suspend in progress Writer: marks suspend in progress, waits for all per-thread structures to drain

Questions? campbell@mumble.net