File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Similar documents
File System Consistency

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU

FS Consistency & Journaling

Announcements. Persistence: Crash Consistency

[537] Journaling. Tyler Harter

PERSISTENCE: FSCK, JOURNALING. Shivaram Venkataraman CS 537, Spring 2019

CS 318 Principles of Operating Systems

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Computer Systems Laboratory Sungkyunkwan University

W4118 Operating Systems. Instructor: Junfeng Yang

Operating Systems. File Systems. Thomas Ropars.

Linux Filesystems Ext2, Ext3. Nafisa Kazi

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Journaling. CS 161: Lecture 14 4/4/17

File Systems: Consistency Issues

File System Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Ext3/4 file systems. Don Porter CSE 506

Caching and reliability

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

Journaling and Log-structured file systems

Lecture 18: Reliable Storage

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

File Systems II. COMS W4118 Prof. Kaustubh R. Joshi hdp://

November 9 th, 2015 Prof. John Kubiatowicz

ext3 Journaling File System

CSE506: Operating Systems CSE 506: Operating Systems

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

EECS 482 Introduction to Operating Systems

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

Announcements. Persistence: Log-Structured FS (LFS)

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Lecture 11: Linux ext3 crash recovery

Review: FFS background

Chapter 11: Implementing File Systems

Advanced File Systems. CS 140 Feb. 25, 2015 Ali Jose Mashtizadeh

Review: FFS [McKusic] basics. Review: FFS background. Basic FFS data structures. FFS disk layout. FFS superblock. Cylinder groups

Files and File Systems

Midterm evaluations. Thank you for doing midterm evaluations! First time not everyone thought I was going too fast

15: Filesystem Examples: Ext3, NTFS, The Future. Mark Handley. Linux Ext3 Filesystem

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

42. Crash Consistency: FSCK and Journaling

CS3210: Crash consistency

CSE 451: Operating Systems Winter Module 17 Journaling File Systems

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin

CS5460: Operating Systems Lecture 20: File System Reliability

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown

Tricky issues in file systems

Crash Consistency: FSCK and Journaling

Operating System Concepts Ch. 11: File System Implementation

COS 318: Operating Systems. Journaling, NFS and WAFL

File Systems 1. File Systems

File Systems 1. File Systems

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

Lecture 10: Crash Recovery, Logging

File Systems: Recovery

CS3600 SYSTEMS AND NETWORKS

C13: Files and Directories: System s Perspective

I/O and file systems. Dealing with device heterogeneity

What is a file system

File System Implementation

Caching and consistency. Example: a tiny ext2. Example: a tiny ext2. Example: a tiny ext2. 6 blocks, 6 inodes

Journaling versus Soft-Updates: Asynchronous Meta-data Protection in File Systems

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

Da-Wei Chang CSIE.NCKU. Professor Hao-Ren Ke, National Chiao Tung University Professor Hsung-Pin Chang, National Chung Hsing University

CS 537 Fall 2017 Review Session

NTFS Recoverability. CS 537 Lecture 17 NTFS internals. NTFS On-Disk Structure

Advanced UNIX File Systems. Berkley Fast File System, Logging File System, Virtual File Systems

Chapter 11: File System Implementation. Objectives

Design Choices 2 / 29

Implementation should be efficient. Provide an abstraction to the user. Abstraction should be useful. Ownership and permissions.

File Systems Ch 4. 1 CS 422 T W Bennet Mississippi College

Operating Systems. Operating Systems Professor Sina Meraji U of T

File Management 1/34

OPERATING SYSTEMS CS136

SMD149 - Operating Systems - File systems

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

Evolution of the Unix File System Brad Schonhorst CS-623 Spring Semester 2006 Polytechnic University

(Not so) recent development in filesystems

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

NFS. Don Porter CSE 506

Topics. " Start using a write-ahead log on disk " Log all updates Commit

<Insert Picture Here> Btrfs Filesystem

CS 4284 Systems Capstone

File System Internals. Jo, Heeseung

SCSI overview. SCSI domain consists of devices and an SDS

File Systems. What do we need to know?

[537] Fast File System. Tyler Harter

Case study: ext2 FS 1

Push-button verification of Files Systems via Crash Refinement

CS162 Operating Systems and Systems Programming Lecture 20. Reliability, Transactions Distributed Systems. Recall: File System Caching

File Systems. File system interface (logical view) File system implementation (physical view)

File system internals Tanenbaum, Chapter 4. COMP3231 Operating Systems

File Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse]

CS 541 Database Systems. Recovery Managers

FILE SYSTEMS, PART 2. CS124 Operating Systems Winter , Lecture 24

Transcription:

File System Consistency Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu

Crash Consistency File system may perform several disk writes to complete a single system call e.g. creat(), write(), unlink(), rename(), But, disk only guarantees atomicity of a single sector write If file system is interrupted between writes, the ondisk structure may be left in an inconsistent state Power loss System crash (kernel panic) Transient hardware malfunctioning We want to move file system from one consistent state to another atomically 2

Initial state Example: Appending s Blocks I Appending a data block Db s Blocks I Db 3

Example: Crash Scenarios (1) Everything touched media: No problem s Blocks I Db Nothing touched media: No problem Due to page cache or internal disk write buffer s Blocks I 4

Example: Crash Scenarios (2) Only data block (Db) is written: OK s Blocks I Db No inode points to data block 5 (Db) bitmap says data block 5 is free 5

Example: Crash Scenarios (3) Only updated inode (I ) is written: Inconsistency s Blocks I I points to data block 5, but data bitmap says it s free Read will get garbage data (old contents of data block 5) If data block 5 is allocated to another file later, the same block will be used by two inodes 6

Example: Crash Scenarios (4) Only updated data bitmap is written: Inconsistency s Blocks I bitmap indicates data block 5 is allocated, but no inode points to it block 5 will never be used by the file system Lost data block (space leak) 7

Example: Crash Scenarios (5) Only inode and bitmap are written: OK s Blocks I File system metadata is completely consistent I has a pointer to data block 5 and data bitmap indicates it is in use Read will get garbage data (old contents of data block 5) 8

Example: Crash Scenarios (6) Only inode and data block are written: Inconsistency s Blocks I Db I has a pointer to data block 5, but data bitmap indicates it is free block 5 can be reallocated to another inode 9

Example: Crash Scenarios (7) Only bitmap and data block are written: Inconsistency s Blocks I Db bitmap indicates data block 5 is in use, but no inode points to it block 5 will never be used by the file system Lost data block (space leak) 10

FSCK File System Checker A Unix tool for finding inconsistencies in a file system and repairing them (cf. Scandisk in Windows) Run before the file system is mounted and made available After crash, scan whole file system for contradictions and fix it if needed bitmap consistency bitmap consistency link count Duplicated/invalid data block pointers Other integrity checks for superblock, inode, and directories 11

FSCK: Fixing Blocks Inconsistent data bitmap s Blocks 4 5 blk0: 4 blk1: 5 4 5 Duplicated data block pointers s Blocks 4 5 7 blk0: 4 blk1: 5 blk0: 75 4 5 7 12

FSCK: Fixing Link Count Inconsistent link count 1 2 5 s Blocks 1 4 6 /A #link: 12 blk0: 4 /B /A foo: 2 4 /B bar: 2 Lost file: no corresponding directory entry Create a temporary file in /lost+found 0 2 1 s Blocks 4 /lost+ found #link: 1 blk0: 4 /lost+ found 4 tmp: 2 13

Too slow! 5 phases Need to scan the entire directory tree and block pointers Fsck ing a 600GB disk takes ~ 70 min. FSCK Problems Requires intricate knowledge of the file system Not always obvious how to fix file system image Don t know correct state, just consistent one 14

Journaling Write-ahead logging A well-known technique for database transactions Record a log, or journal, of changes made to on-disk data structures to a separate location ( journaling area ) Write updates to their final locations ( checkpointing ) only after the journal is safely written to disk If a crash occurs: Discard the journal if the journal write is not committed Otherwise, redo the updates based on the journal data Fast as it requires to scan only the journaling area Used in modern file systems: Linux Ext3/4, ReiserFS, IBM JFS, SGI XFS, Windows NTFS, 15

Example: Appending (1) Appending a data block Db s Blocks I Db DB IN Step 1: Journal write Write journal header block (TxB), inode block (IN ), data bitmap block (DB ) and data block (Db) Journal TxB id=1 IN DB Db 16

Example: Appending (2) Step 2: Journal commit Write journal commit block (TxE) Journal TxB id=1 IN DB Db TxE id=1 Step 3: Checkpoint Write updates to their final on-disk locations (IN, DB, Db) s Blocks I Db DB IN 17

Example: Recovery (1) Crash between step 1 & 2 Journal write has not been committed Simply discard the journal File system is rolled back to the state before data block Db is appended Journal TxB id=1 IN?? Db s Blocks I 18

Example: Recovery (2) Crash between step 2 & 3 Doesn t matter which metadata/data blocks were actually updated Roll-forward recovery (redo logging): overwrite their final on-disk locations using the journal data Journal TxB id=1 IN DB Db TxE id=1 s Blocks????? 19

Optimizing Journaling Circular log Mark the transaction free and reuse the journal space Batching log updates Buffer all updates into a global transaction e.g. 5 seconds in Ext3/4 Journal checksums Eliminate write barrier between journal write & commit Metadata journaling Only guarantees metadata consistency Ordered journaling in Ext3/4: force the data write before the journal is committed so as not to point to garbage 20