CSE506: Operating Systems CSE 506: Operating Systems

Similar documents
Virtual File System. Don Porter CSE 506

Virtual File System. Don Porter CSE 306

RCU. ò Dozens of supported file systems. ò Independent layer from backing storage. ò And, of course, networked file system support

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

VFS, Continued. Don Porter CSE 506

Fall 2017 :: CSE 306. File Systems Basics. Nima Honarmand

Logical disks. Bach 2.2.1

File Systems. CS170 Fall 2018

Files and the Filesystems. Linux Files

CSE506: Operating Systems CSE 506: Operating Systems

Motivation. Operating Systems. File Systems. Outline. Files: The User s Point of View. File System Concepts. Solution? Files!

Lecture 19: File System Implementation. Mythili Vutukuru IIT Bombay

CS 111. Operating Systems Peter Reiher

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

3/26/2014. Contents. Concepts (1) Disk: Device that stores information (files) Many files x many users: OS management

File Systems. What do we need to know?

TDDB68 Concurrent Programming and Operating Systems. Lecture: File systems

Ext3/4 file systems. Don Porter CSE 506

File Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse]

File System Internals. Jo, Heeseung

UNIX File System. UNIX File System. The UNIX file system has a hierarchical tree structure with the top in root.

The UNIX File System

Implementation should be efficient. Provide an abstraction to the user. Abstraction should be useful. Ownership and permissions.

File Systems Ch 4. 1 CS 422 T W Bennet Mississippi College

Basic OS Progamming Abstrac7ons

Chapter 6. File Systems

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

we are here Page 1 Recall: How do we Hide I/O Latency? I/O & Storage Layers Recall: C Low level I/O

Basic OS Progamming Abstrac2ons

File System Interface. ICS332 Operating Systems

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Network File System (NFS)

Chapter 8: Filesystem Implementation

Computer Systems Laboratory Sungkyunkwan University

CS370 Operating Systems

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

1 / 23. CS 137: File Systems. General Filesystem Design

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

File-System Interface. File Structure. File Concept. File Concept Access Methods Directory Structure File-System Mounting File Sharing Protection

File Systems. CSE 2431: Introduction to Operating Systems Reading: Chap. 11, , 18.7, [OSC]

File Management 1/34

CS 4284 Systems Capstone

The UNIX File System

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

CSE 333 Lecture 9 - storage

CSE 506: Opera.ng Systems. The Page Cache. Don Porter

CSE 333 SECTION 3. POSIX I/O Functions

NFS. Don Porter CSE 506

File Systems. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

File Systems. Todays Plan. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo

CS370 Operating Systems

Chapter 11: File System Implementation. Objectives

VIRTUAL FILE SYSTEM AND FILE SYSTEM CONCEPTS Operating Systems Design Euiseong Seo

The Page Cache 3/16/16. Logical Diagram. Background. Recap of previous lectures. The address space abstracvon. Today s Problem.

Lecture 24: Filesystems: FAT, FFS, NTFS

Da-Wei Chang CSIE.NCKU. Professor Hao-Ren Ke, National Chiao Tung University Professor Hsung-Pin Chang, National Chung Hsing University

[537] Fast File System. Tyler Harter

we are here I/O & Storage Layers Recall: C Low level I/O Recall: C Low Level Operations CS162 Operating Systems and Systems Programming Lecture 18

File Systems. Chapter 11, 13 OSPP

Last Class: Memory management. Per-process Replacement

Arvind Krishnamurthy Spring Implementing file system abstraction on top of raw disks

Outline. Operating Systems. File Systems. File System Concepts. Example: Unix open() Files: The User s Point of View

To understand this, let's build a layered model from the bottom up. Layers include: device driver filesystem file

Chapter 1 - Introduction. September 8, 2016

Introduction to File Systems

UNIX Files. How UNIX Sees and Uses Files

Files and File Systems

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

Thanks for the feedback! Chapter 8: Filesystem Implementation. File system operations. Acyclic-Graph Directories. General Graph Directory

Operating System Concepts Ch. 11: File System Implementation

ECE 598 Advanced Operating Systems Lecture 19

Outline. OS Interface to Devices. System Input/Output. CSCI 4061 Introduction to Operating Systems. System I/O and Files. Instructor: Abhishek Chandra

File Systems. CS 4410 Operating Systems

The UNIX Time- Sharing System

File I/O and File Systems

File System Implementation

11/3/71 SYS MOUNT (II) sys mount; special; name / mount = 21.; not in assembler

What is a file system

File System: Interface and Implmentation

Secure Architecture Principles

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

1 / 22. CS 135: File Systems. General Filesystem Design

CS140 Operating Systems Final December 12, 2007 OPEN BOOK, OPEN NOTES

Inter-Process Communication

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

V. File System. SGG9: chapter 11. Files, directories, sharing FS layers, partitions, allocations, free space. TDIU11: Operating Systems

A2 Design Considerations. CS161, Spring

CS370 Operating Systems

Today: File System Functionality. File System Abstraction

Network File System (NFS)

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

Network File System (NFS)

Operating Systems. File Systems. Thomas Ropars.

Applications of. Virtual Memory in. OS Design

Page Frame Reclaiming

CptS 360 (System Programming) Unit 6: Files and Directories

8. Files and File Systems

File Systems Management and Examples

Transcription:

CSE 506: Operating Systems Virtual File System

History Early OSes provided a single file system In general, system was tailored to target hardware In the early 80s, desire for more than one file system Any guesses why? Networked file systems Sharing parts of a file system across a network of workstations

Modern VFS Dozens of supported file systems Allows new features and designs transparent to apps Interoperability with removable media and other OSes Independent layer from backing storage In-memory file systems (ramdisks) Pseudo file systems used for configuration (/proc, /dev, ) only backed by kernel data structures And, of course, networked file system support

User More detailed diagram Kernel ext4 VFS btrfs fat32 nfs Page Cache Block Device Network IO Scheduler Driver Disk

User s perspective Single programming interface (POSIX file system calls open, read, write, etc.) Single file system tree CSE506: Operating Systems Remote FS can be transparently mounted (e.g., at /home) Alternative: Custom library for each file system Much more trouble for the programmer

What the VFS does The VFS is a substantial piece of code not just an API wrapper CSE506: Operating Systems Caches file system metadata (e.g., names, attributes) Coordinates data caching with the page cache Enforces a common access control model Implements complex, common routines path lookup opening files file handle management

FS Developer s Perspective FS developer responsible for CSE506: Operating Systems Implementing standard objects/functions called by the VFS Primarily populating in-memory objects Typically from stable storage Sometimes writing them back Can use block device interfaces to schedule disk I/O And page cache functions And some VFS helpers Analogous to implementing Java abstract classes

High-level FS dev. tasks CSE506: Operating Systems Translate between VFS objects and backing storage (whether device, remote system, or other/none) Potentially includes requesting I/O Read and write file pages VFS doesn t prescribe all aspects of FS design More of a lowest common denominator

Core VFS abstractions super block FS-global data CSE506: Operating Systems Early/many file systems put this as first block of partition inode (index node) metadata for one file dentry (directory entry) name to inode mapping file file descriptor dentry and cursor (file offset)

Super blocks SB + inodes are extended by file system developer Stores all FS-global data Opaque pointer (s_fs_info) for FS-specific data Includes many hooks Tasks such as creating or destroying inodes Dirty flag for when it needs to be synced with disk Kernel keeps a circular list of all of these When there are multiple FSes (in today s systems: always)

Inode The second object extended by the FS Huge more fields than we can talk about Tracks: File attributes: permissions, size, modification time, etc. File contents: Address space for contents cached in memory Low-level file system stores block locations on disk Flags, including dirty inode and dirty data

Inode history Original file systems stored files at fixed intervals If you knew the file s index number you could find its metadata on disk Think of a portion of the disk as a big array of metadata Hence, the name index node Original VFS design called them vnode virtual node (perhaps more appropriately) Linux uses the name inode

Embedded inodes CSE506: Operating Systems Many FSes embed VFS inode in FS-specific inode struct myfs_inode { int ondisk_blocks[]; /* other stuff*/ struct inode vfs_inode; } Why? Finding the low-level from inode is simple Compiler translates references to simple math

Linking An inode uniquely identifies a file for its lifespan Does not change when renamed Model: inode tracks links or references on disk Count 1 for every reference on disk Created by file names in a directory that point to the inode What happens when a file is renamed? renaming file temporarily increases link count and then lowers it (at least in some implementations) When link count is zero, inode (and contents) deleted There is no delete system call, only unlink

Linking, cont. Hard link (link() system call/ln utility) Creates a new name for the same inode Opening either name opens the same file This is not a copy Open files create an in-memory reference to a file If an open file is unlinked, the directory entry is deleted inode and data retained until all in-memory references are deleted Famous feature: rm on large open file when out of quota Still out of quota

Common trick for temporary files How to clean up temp file when program crashes? create (1 link) open (1 link, 1 ref) unlink (0 link) File gets cleaned up when program dies Kernel removes last reference on exit Happens regardless if exit is clean or not Except if the kernel crashes / power is lost Need something like fsck to clean up inodes without dentries» Dropped into lost+found directory

inode stats The stat word encodes both permissions and type High bits encode the type: regular file, directory, pipe, device, socket, etc Unix: Everything s a file! VFS involved even with sockets! Lower bits encode permissions: 3 bits for each of User, Group, Other + 3 special bits Bits: 2 = read, 1 = write, 0 = execute Ex: 750 User RWX, Group RX, Other nothing How about the sticky bit? suid bit? chmod has more pleasant syntax [ugo][+-][rwx]

Special bits For directories, Execute means search X-only allows to find readable subdirectories or files Can t enumerate the contents Useful for sharing files in your home directory Without sharing your home directory contents Setuid bit Program executes with owner s UID Crude form of permission delegation

Group inheritance bit More special bits CSE506: Operating Systems When I create a file, it is owned by my default group When I create in a g+s directory, directory group owns file Useful for things like shared git repositories Sticky bit Prevents non-owners from deleting or renaming files Ex: CSE506 submission directory

File objects Represent an open file; point to a dentry and cursor Each process has a table of pointers to them The int fd returned by open is an offset into this table VFS-only abstractions FS doesn t track which process has a reference to a file Files have a reference count. Why? Fork also copies the file handles Particularly important for stdin, stdout, stderr If child reads from the handle, it advances (shared) cursor These days, hard to tell if this is a feature or a bug

File handle games dup(), dup2() Copy a file handle Creates 2 table entries for same file struct Increments the reference count seek() adjust the cursor position CSE506: Operating Systems Back when files were on tape... fcntl() Set flags on file (ioctl() for inodes) CLOSE_ON_EXEC bit prevents inheritance on exec() Set by open() or fcntl()

These store: Dentries A file name A link to an inode A parent ptr (null for root of file system) Ex: /home/myuser/vfs.pptx may have 4 dentries: /, home, myuser, and vfs.pptx Parent ptr distinguishes /home/myuser from /tmp/myuser Also VFS-only abstraction Although inode hooks on directories can populate them

Why dentries? Simple directory model can treat it as a file Contents are a list of <name, inode> tuples Why not just use the page cache? FS directory tree traversal very common Optimize with special data structures No need to re-parse and traverse on-disk layout format The dentry cache is a complex data structure We will discuss in more detail later CSE506: Operating Systems

Symbolic Links Special file type that stores a string String usually assumed to be a filename Created with symlink() system call How different from a hard link Completely Doesn t raise the link count of the file CSE506: Operating Systems Can be broken, or point to a missing file (just a string) Sometimes abused to store short strings [myself@newcastle ~/tmp]% ln -s "silly example" mydata [myself@newcastle ~/tmp]% ls -l lrwxrwxrwx 1 myself mygroup 23 Oct 24 02:42 mydata -> silly example

Quick review: dentry What purpose does a dentry serve? Maps a path name to an inode More in 2 slides on how to find a dentry dentries are cached in memory CSE506: Operating Systems Only recently accessed parts of dir are in memory Others may need to be read from disk dentries can be freed to reclaim memory (like pages)

3 cases for a dentry: dentry Caching CSE506: Operating Systems In memory (exists) Not in memory (doesn t exist) Not in memory (on disk/evicted for space or never used) How to distinguish last 2 cases? Case 2 can generate a lot of useless disk traffic Negative dentry Dentry with a NULL inode pointer

dentry Tracking dentries are stored in four data structures: CSE506: Operating Systems A hash table (for quick lookup) A LRU list (for freeing cache space wisely) A child list of subdirectories (mainly for freeing) An alias list (to do reverse mapping of inode -> dentries) Recall that many names can point to one inode

Summary of open() Implementation Key kernel tasks: Map a human-readable path name to an inode Check access permissions, from / to the file Possibly create or truncate the file (O_CREAT, O_TRUNC) Create a file struct Allocate a descriptor Point descriptor at file struct Return descriptor

open() arguments CSE506: Operating Systems int open(char *path, int flags, int mode); Path: file name Flags: many (see manual page) Mode: If creating file, what perms? (e.g., 0755) Return value: File handle index (>= 0 on success) Or (0 errno) on failure

Absolute vs. Relative Paths Each process has root and working directories Stored in pcb->root (or pcb->fs) and pcb->cwd These are dentry pointers (not strings) Why store a current root directory? Some programs are chroot jailed Should not be able to access anything outside of the jail CSE506: Operating Systems

More on paths An absolute path starts with the / character E.g., /home/myself/foo.txt, /lib/libc.so A relative path starts with anything else: E.g., vfs.pptx,../../etc/apache2.conf CSE506: Operating Systems First character dictates where to start searching Only two options: root (absolute) or cwd (relative)

Search Execute in a loop looking for next piece Treat / character as component delimiter Each iteration looks up part of the path Ex: /home/myself/foo would look up home, myself, then foo, starting at /

Iteration 1 For searched dentry (/), dereference the inode Check access permission (mode is stored in inode) Use permission() function pointer on inode Can be overridden by a file system If ok, look at next path component (/home)

Some special cases: Detail (2) If next component is a., just skip it If next component is a.., move up to parent Catch special case where current dentry is root Treat this as a no-op If not a. or.. : Compute a hash value to find bucket in d_hash table Hash of path from root (e.g., /home/foo, not foo ) Search the d_hash bucket at this hash value

Detail (3) If no dentry in the hash bucket Call lookup() method on parent inode (provided by FS) Probably will read the dentry from disk Or the network, or kernel data structures, If dentry found, check if it is a symlink If so, call inode->readlink() (also provided by FS) Get the path stored in the symlink Then continue next iteration First char decides to start at root or at cwd again If not a symlink, check if it is a directory If not a directory and not last element, we have a bad path

Iteration 2 We have dentry/inode for /home, now finding myself Check permission in /home Hash /home/myself, find dentry Confirm not.,.., or a symlink Confirm is a directory Repeat with dentry/inode for /home/myself Search for foo

Symlink Loops CSE506: Operating Systems What if /home/myself/foo is a symlink to foo? Kernel gets in an infinite loop Can be more subtle: foo -> bar bar -> baz baz -> foo

Preventing infinite symlink recursion More heuristics If more than 40 symlinks resolved quit with ELOOP If more than 6 symlinks in a row without non-symlink quit with ELOOP Can prevent execution of legitimate 41 symlink path Better than an infinite loop

Key tasks: Back to open() Map a human-readable path name to an inode Check access permissions, from / to the file CSE506: Operating Systems Possibly create or truncate the file (O_CREAT, O_TRUNC) Create a file descriptor We ve seen how first few steps are done

Creation Handled as part of search; last item is special Usually, if an item isn t found, search returns an error If last item (foo) exists and O_EXCL flag set, fail If O_EXCL is not set, return existing dentry If it does not exist, call FS create method Make a new inode and dentry Then open it Why is Create a part of Open? Avoid races in if (!exist()) create(); open();

File descriptors CSE506: Operating Systems Index into per-process array of struct file pointers struct file stores dentry pointer cursor into the file permissions (cache of inode svalue) reference count open() marks a free table entry as in use If full, create a new table 2x the size and copies old one Allocate a new struct file and put a pointer in table

Once open(), can read() ssize_t read(int fd,void *buf,size_t bytes); fd: File descriptor index buf: Buffer kernel writes the read data into bytes: Number of bytes requested Returns: bytes read (if >= 0), or errno

Summary of read() Implementation Translate int fd to a struct file (if valid) Check cached permissions in the file Increase reference count FS read() routine might context switch away temporarily Do read() routine associated with file (FS-specific) Probe the page cache for data Access storage if needed Can even do both: read() is allowed to return less than requested Drop refcount, return bytes read

Copying data to user CSE506: Operating Systems Kernel needs to be sure that buffer is a valid address Validate that buf is a valid address And that buf size>=bytes requested How to do it? Can walk appropriate page table entries What could go wrong? Concurrent munmap from another thread Page might be lazy-allocated by kernel

Trick for Validating User Buffers What if we don t do all of this validation? Looks like kernel had a page fault Usually really bad Idea: set kernel flag in_copy_to_user If a page fault happens for a user address, don t panic Just handle demand faults If the page is really bad Set indicator that write loop should be aborted Indicator can be left in a register» Tricky context switch code while exiting the page fault

Zero-Copy How many memory copies needed for a large read? One in page cache One in user space What if we then write this to network? One more for NIC driver Avoided extra copies if read/write is page-granularity Steal physical page containing buffer from process Replace it with physical page from page cache Mark it as COW in process, just in case Avoids unnecessary copies in common case