File System Implementation

Introduction to Operating Systems
File System Implementation
John Franco
Electrical Engineering and Computing Systems, University of Cincinnati

Layered File System
Layers, from top to bottom:
  Application Programs
  Logical File System
  File Organization Module
  Basic File System
  I/O Control
  Device
The file system is composed of many different levels.
Each level in the design uses the features of lower levels to create new features for use by higher levels.

Logical File System Layer
Manages metadata information.
  Metadata includes all of the file-system structure except the actual contents of the files.
Manages the directory structure to provide the file organization module with the information it needs, given a symbolic file name.
Maintains file structure via file-control blocks (FCBs).
  An FCB contains file information such as ownership, permissions, and the location of the file contents.
Responsible for protection and security.

Typical File Control Block
  permissions
  last access, last modified, and created dates/times
  owner, group, access control list (users and what they can do)
  size
  data blocks or pointers to file data blocks

File Organization Layer
Knows about files and their logical blocks, as well as physical blocks.
Can translate logical block addresses to physical block addresses for the basic file system to transfer data.
Each file's logical blocks are numbered from 0 to N.
Includes the free-space manager, which tracks unallocated blocks and provides these blocks to the file organization module when requested.

Basic File System Layer
Issues generic commands to the appropriate device driver to read and write physical blocks on the disk.
Each physical block is identified by its numeric disk address (in the case of hard disks).
  For example: drive 1, cylinder 73, track 2, sector 10.

I/O Control Layer
Consists of device drivers and interrupt handlers that transfer information between main memory and the disk system.
Recall that a device driver translates high-level commands such as "retrieve block 123" into low-level, hardware-specific instructions. These are used by the hardware controller, which interfaces the device to the rest of the system.

File Structures
With a layered implementation there is less duplication of code.
  For example, the I/O Control and Basic File System layers can be used by many file systems.
  A logical layer and a file organization layer can be written for each file system separately.

File Structures on Disk
The file system may contain information about how to boot an operating system stored there, the total number of blocks, the number and location of free blocks, the directory structure, and individual files.
Examples:
  Boot control block: information needed by the system to boot from that volume.
  Volume control block: volume or partition details, such as the number of blocks in the partition, the size of the blocks, the free-block count and free-block pointers, and the free-FCB count and FCB pointers. In UFS, this is called a superblock (use dumpe2fs /dev/sda5).
  Directory structure: in UFS, this includes file names and associated inode numbers (use ls -i).
  File control block: contains file details including permissions, ownership, size, and location of the data blocks. In UFS, this is called the inode.

File Structures in Memory
Used for file-system management and for performance improvement via caching.
The data are loaded at mount time and discarded at dismount.
The structures may include the ones described below:
  Mount table: information about each mounted volume.
  Directory-structure cache: holds the directory information of recently accessed directories.
  Global (system-wide) open-file table: contains a copy of the FCB of each open file, as well as other information.
  Per-process open-file table: contains a pointer to the appropriate entry in the global open-file table, as well as other information.

File Creation
1. The application calls the logical file system (LFS), which knows the format of the directory structures.
2. A new FCB is allocated by the LFS or, if the file system implementation creates all FCBs at file system creation time, an FCB is allocated from the set of free FCBs.
3. The LFS reads the appropriate directory into memory from disk.
4. The directory is updated with the new file name and FCB and is written back to the disk.
5. The LFS can call the file organization module to map the directory I/O into disk-block numbers, which are passed on to the basic file system and the I/O control system.

File Structures in Memory
[Figure: two panels, open(file) and read(file), each spanning user space, kernel space, and disk. Labels in the open(file) panel: directory structure, FCB. Labels in the read(file) panel: per-process OFT, global OFT, FCB, data blocks.]

File Open
1. The open() call passes a file name to the file system.
2. The open() system call searches the global open-file table (OFT) to see if the file is already in use by another process.
3. If so, a per-process OFT entry is created pointing to the existing global OFT entry.
4. Otherwise, the directory structure is searched for the given file name (parts of the directory structure are cached for speed).
5. Once the file is found, the FCB is copied into a global OFT entry in memory.
6. This OFT entry also maintains the number of processes that have the file open.
7. An entry is made in the process OFT with a pointer to the entry in the global OFT, plus some other fields such as the cursor and the access mode.
8. open() returns a pointer to the entry in the process OFT.
9. All operations are performed via this file descriptor (or handle).

File Close
1. The process OFT entry is removed and the global OFT entry's open count is decremented.
2. When all users that have opened the file close it, any updated metadata is copied back to the disk-based directory structure and the global OFT entry is removed.
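
The open and close steps above can be sketched with two small in-memory tables. This is only an illustration under stated assumptions: the names sketch_open, sketch_close, global_ofe, and proc_ofe and the table sizes are hypothetical, a file name stands in for the cached FCB, and no directory search or disk I/O is modeled.

```c
#include <assert.h>
#include <string.h>

#define MAX_GLOBAL 8   /* hypothetical table sizes */
#define MAX_LOCAL  8

/* Global open-file table entry: one per open file, shared by all processes. */
struct global_ofe {
    char name[32];     /* stands in for the cached copy of the FCB */
    int  open_count;   /* number of processes that have the file open */
};

/* Per-process entry: points into the global table and adds private state. */
struct proc_ofe {
    struct global_ofe *g;   /* NULL means this slot is free */
    long cursor;            /* current read/write position */
};

static struct global_ofe global_oft[MAX_GLOBAL];

/* Returns a file descriptor (index into the per-process table) or -1. */
int sketch_open(struct proc_ofe *proc_oft, const char *name)
{
    struct global_ofe *g = NULL, *free_g = NULL;

    assert(proc_oft != NULL && name != NULL);
    assert(strlen(name) < sizeof global_oft[0].name);

    /* Step 2: is the file already open in some process? */
    for (int i = 0; i < MAX_GLOBAL; i++) {
        if (global_oft[i].open_count > 0 &&
            strcmp(global_oft[i].name, name) == 0) {
            g = &global_oft[i];
            break;
        }
        if (global_oft[i].open_count == 0 && free_g == NULL)
            free_g = &global_oft[i];
    }

    /* Steps 4-5: otherwise "find the file" and fill a fresh global entry. */
    if (g == NULL) {
        if (free_g == NULL)
            return -1;
        g = free_g;
        strcpy(g->name, name);
    }

    /* Steps 6-8: add a per-process entry and bump the open count. */
    for (int fd = 0; fd < MAX_LOCAL; fd++) {
        if (proc_oft[fd].g == NULL) {
            proc_oft[fd].g = g;
            proc_oft[fd].cursor = 0;
            g->open_count++;
            return fd;
        }
    }
    return -1;
}

/* Removes the per-process entry; the global entry is free once its count hits 0. */
int sketch_close(struct proc_ofe *proc_oft, int fd)
{
    if (fd < 0 || fd >= MAX_LOCAL || proc_oft[fd].g == NULL)
        return -1;
    proc_oft[fd].g->open_count--;   /* a real system writes metadata back at 0 */
    proc_oft[fd].g = NULL;
    return 0;
}
```

Two processes that open the same name end up with separate per-process entries that share one global entry, which is why the open count is kept in the global table rather than in the per-process one.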

Partitions and Mounting
A partition can be a volume containing a file system ("cooked") or just a sequence of blocks with no file system ("raw").
The boot block can point to a boot volume or to a boot loader: a set of blocks that contain enough code to know how to load the kernel from the file system.
The root partition contains the OS; other partitions can hold other OSes, other file systems, or be raw.
  The root partition is mounted at boot time.
  Other partitions can be mounted automatically at boot or manually.
System consistency is checked at mount time.
  If the metadata is correct, then mount and add to the mount table.
  Otherwise, correct the problem and try again.
Look at /etc/mtab.

Unix Mounting
Implemented by setting a flag in the in-memory copy of the inode for the directory on which a partition is mounted (the flag indicates that the directory is a mount point).
Then an inode field points to an entry in the mount table, indicating which device is mounted there.
The mount table entry contains a pointer to the superblock of the file system on that device.
This scheme enables the operating system to traverse its directory structure, switching among file systems of varying types, seamlessly.

Virtual File System
Virtual File Systems (VFS) on Unix provide an object-oriented way of implementing file systems.
VFS allows the same system call interface (the API) to be used for different types of file systems.
  Separates file-system generic operations from implementation details.
  The implementation can be one of many file system types, or a network file system.
  Activates file-system-specific operations to handle local requests according to their file-system types, and even calls the NFS protocol procedures for remote requests.
The API belongs to the VFS interface, rather than to any specific type of file system.

Virtual File System Architecture
Four main object types:
  inode object: represents an individual file
  file object: represents an open file
  superblock object: represents an entire file system
  dentry object: represents an individual directory entry
For each object type, the VFS defines operations that must be implemented.
These operations are implemented for each file system.
Thus, the VFS invokes a function, say read, without knowing or caring about the file system type it is dealing with.

Directory Implementation
The selection of directory-allocation and directory-management algorithms significantly affects the efficiency, performance, and reliability of the file system.
Linear List: simple to program, expensive to execute
To create a new file
1. search the directory to be sure no existing file has the same name
2. add the new entry at the end of the directory
To delete a file
1. search the directory for the named file
2. release the space allocated to it
3. use a linked list to improve performance
To reuse the directory entry
1. mark the entry as unused (give it a special name) or
2. attach it to a list of free directory entries or
3. copy the last entry in the directory into the freed location

Directory Implementation
The linear list should be kept sorted in some special structure that gives O(log n) access, deletion, and insertion.
  B Tree: http://gauss.ececs.uc.edu/courses/c4029/lectures/btrees.pdf
  Red Black Tree: http://gauss.ececs.uc.edu/redblack/redblack.html

Directory Implementation
Hash Table: complex to program, cheap to execute
  requires a good hash function
  usually the table is of fixed size (if the sizing is wrong, performance suffers)
To create a new file
1. apply the hash function to the user-path pair to locate a cell in the table
2. if the cell is occupied, add a node to a linked list referenced from the cell
To delete a file
1. apply the hash function to the user-path pair to locate a cell in the table
2. walk the linked list to find the node corresponding to the file
3. release the space allocated to the file
To reuse the directory entry
1. mark the entry as unused (give it a special name) or
2. attach it to a list of free directory entries or
3. copy the last entry in the directory into the freed location
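
A minimal sketch of the hash-table directory described above, using chaining for occupied cells. Everything here is an assumption made for illustration: the names hash_dir, dir_node, dir_create, dir_lookup, and dir_delete are hypothetical, the key is just the file name rather than a user-path pair, the hash is a simple multiply-by-33 string hash, and an inode number stands in for the FCB.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DIR_BUCKETS 16   /* fixed table size, as the slide assumes */

struct dir_node {
    char *name;
    unsigned inode;          /* stands in for the FCB reference */
    struct dir_node *next;   /* chain of entries hashing to the same cell */
};

struct hash_dir {
    struct dir_node *bucket[DIR_BUCKETS];
};

/* Simple string hash; any well-distributed hash would do. */
static unsigned dir_hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % DIR_BUCKETS;
}

struct dir_node *dir_lookup(struct hash_dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (struct dir_node *n = d->bucket[dir_hash(name)]; n; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}

/* Returns 0 on success, -1 if the name exists or memory runs out. */
int dir_create(struct hash_dir *d, const char *name, unsigned inode)
{
    if (dir_lookup(d, name))
        return -1;
    struct dir_node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->name = malloc(strlen(name) + 1);
    if (!n->name) {
        free(n);
        return -1;
    }
    strcpy(n->name, name);
    n->inode = inode;
    unsigned h = dir_hash(name);
    n->next = d->bucket[h];      /* push onto the cell's linked list */
    d->bucket[h] = n;
    return 0;
}

/* Walks the chain, unlinks the node, and releases its space. */
int dir_delete(struct hash_dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (struct dir_node **pp = &d->bucket[dir_hash(name)]; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            struct dir_node *dead = *pp;
            *pp = dead->next;
            free(dead->name);
            free(dead);
            return 0;
        }
    }
    return -1;
}
```

With a fixed DIR_BUCKETS, the chains grow as the directory fills, which is the performance risk the slide mentions when the table is sized wrongly.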

Block Allocation
Allocating disk space to files so that space is used efficiently and yet files can be accessed efficiently requires careful thought.
Contiguous Allocation: all files occupy a contiguous set of blocks
Positives
1. good for sequential access: the head moves from track to adjacent track, and the sectors on each track are read in order as they spin under the head
2. directory entries need only point to the first block, as the location of all others can be calculated from that and the block size
Negatives
1. finding space for a new file can be really tough
2. there will probably be an awful lot of external fragmentation
3. some sort of de-fragmentation is needed, and such algorithms are really slow
4. if a file is modified to be larger, all hell breaks loose (usually)
5. fix for the above: begin with a contiguous chunk, then add extents (other contiguous chunks) as needed

Block Allocation
Linked Allocation: each file is a linked list of blocks
Positives
1. external fragmentation is gone, so de-fragmentation is unnecessary
2. free space is easy to find: when a block is given up, a special number is inserted into the link
3. files can grow as much as they like until blocks are exhausted
Negatives
1. random access possibly needs to traverse many links, which is inefficient
2. a lot of space is needed for the linked structures (mitigated by using clusters of blocks)
3. if a link is lost, it is difficult to recover a file (mitigated by having a backup copy of the FAT)
4. the FAT must be cached, since otherwise the head moves to the beginning of the FAT, moves to the directory entry, moves to the desired link, and moves to the desired block

FAT16 Allocation
Files are linked through a File Allocation Table (FAT) containing a logically contiguous array of two-byte elements, one for each cluster, such that the i-th element corresponds to the i-th cluster.
Given the j-th cluster of a file as system cluster k, the (j+1)-st file cluster is found from the k-th array element of the FAT.
A special end-of-file number in an array element says there are no further clusters belonging to the file.
The beginning of the linked list is found from an entry in the directory table. A directory entry contains 32 bytes and looks like this:
  H O W D Y J P G | atr | ctime | cdate | adate | ltime | ldate | 0AB1 | file size
where the first 11 bytes are used for the file name and extension, the two bytes shown as 0A B1 hold the number 0xB10A (stored low byte first) and indicate the starting cluster, and the remaining bytes are time and date stamps of various kinds, plus some reserved bytes and an attribute byte.
The following is a section of the FAT starting at index B10A:
  0DB1 14B1 FFFF 10B1 12B1 13B1
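
Following a file through the FAT, as described above, can be sketched as a short loop. The function names fat16_chain and le16 are hypothetical, and the sketch assumes the whole FAT is already in memory as an array of decoded 16-bit values and that 0xFFFF is the only end-of-file marker, as on the slide (real FAT16 volumes reserve a small range of values for this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT16_EOF 0xFFFFu   /* end-of-file marker used on the slide */

/* Decodes a two-byte FAT or directory field stored low byte first. */
uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Collects the cluster numbers of a file, starting at the cluster named in
 * its directory entry. Returns the number of clusters written to out, or -1
 * if the chain leaves the table or is longer than max (a likely sign of a
 * corrupted or looping chain).
 */
int fat16_chain(const uint16_t *fat, size_t fat_len, uint16_t start,
                uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t k = start;

    assert(fat != NULL && out != NULL);
    while (k != FAT16_EOF) {
        if (k >= fat_len || n == max)
            return -1;
        out[n++] = k;    /* cluster k is the next cluster of the file */
        k = fat[k];      /* the k-th element names the cluster after k */
    }
    return (int)n;
}
```

For example, le16 applied to the directory bytes 0A B1 yields 0xB10A, the starting cluster passed to fat16_chain. Reaching the j-th cluster costs j table lookups, which is why the slides insist that the FAT be cached.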

Block Allocation
Indexed Allocation: bring all block pointers together in a single array of disk block addresses
[Figure: an inode (info plus index structure) whose index entries point either to file blocks or to further index structures, which in turn point to file blocks.]
Positives:
1. fast random access
2. no external fragmentation
Negatives:
1. wasted indexing space
2. mitigated by the multi-level index structures shown in the figure

Dentry Tree Structure
[Figure: a tree of dentry objects, including entries named HOWDY, DOODY, and BUFFALO BOB, marked as (directory) or (file), each linked to an inode and its info.]
dentry objects are arranged in a tree with one root.
A dentry object has a pointer to a linked list of children, which may include directories, files, and symbolic links.
There is also a pointer to the parent.
All siblings are in a doubly linked list due to a third set of pointers.

Superblock Structure
    struct ext2_super_block {
        __u32 s_inodes_count;       /* Inodes count */
        __u32 s_blocks_count;       /* Blocks count */
        __u32 s_free_blocks_count;  /* Free blocks count */
        __u32 s_free_inodes_count;  /* Free inodes count */
        __u32 s_first_data_block;   /* First data block */
        __u32 s_log_block_size;     /* Block size */
        __u32 s_log_cluster_size;   /* Cluster size */
        __u32 s_first_ino;          /* First non-reserved inode */
        __u16 s_inode_size;         /* Size of inode structure */
        __u32 s_blocks_per_group;   /* A file's blocks in same group */
        __u32 s_inodes_per_group;   /* A dir's inodes in same group */
        ...
    }
Open a filesystem like this:
    struct struct_ext2_filsys filsys;
    ext2_filsys fs = &filsys;
    ext2fs_open("/dev/sdb2", EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs);
Find the first inode like this:
    ext2_ino_t ino = fs->super->s_first_ino;

Inode Structure
    struct ext2_inode {
        __u16 i_mode;                   /* File mode */
        __u16 i_uid;                    /* Low 16 bits of Owner Uid */
        __u32 i_size;                   /* Size in bytes */
        __u32 i_atime;                  /* Access time */
        __u32 i_mtime;                  /* Modification time */
        __u16 i_gid;                    /* Low 16 bits of Group Id */
        __u16 i_links_count;            /* Links count */
        __u32 i_blocks;                 /* Blocks count */
        __u32 i_flags;                  /* File flags */
        __u32 i_block[EXT2_N_BLOCKS];   /* Pointers to blocks */
        __u32 i_generation;             /* File version (for NFS) */
        __u32 i_file_acl;               /* File Access Control List */
        ...
    }
By way of an example, i_flags might be 0x80000 (extents!) and i_mode might be 0x180, which means this is not a directory, and the i_block array has the structure shown on the next slide (EXT2_N_BLOCKS=15, for a total of 60 bytes).

Inode Structure
  0A F3 01 00 04 00 00 00 00 00 00 00 00 00 00 00
  01 00 00 00 64 81 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00
The number 0xF30A (stored as 0A F3) is a magic number, verifying the type.
The number 0x1 (blue on the original slide) means there is 1 extent.
The number 0x8164 (green on the original slide, stored as 64 81) is the block number of that extent.
To visit the block at that number do this:
    /* the buffer must hold one full block, so size it from fs->blocksize */
    unsigned char *buffer = malloc(fs->blocksize);
    blk_t blocknr = 0x8164;
    io_channel_read_blk(fs->io, blocknr, 1, buffer);
A partial, typical result is on the next slide.

Data Block
  00  31 2e 20 44 6f 65 73 20 74 68 65 72 65 20 65 78  |1. Does there ex|
  10  69 73 74 20 61 20 6e 6f 6e 2d 71 48 6f 72 6e 20  |ist a non-qHorn |
  20  73 61 74 69 73 66 69 61 62 6c 65 20 66 6f 72 6d  |satisfiable form|
  30  75 6c 61 20 74 68 61 74 20 69 73 0a 20 20 20 66  |ula that is.   f|
  40  75 6e 63 74 69 6f 6e 61 6c 6c 79 20 65 71 75 69  |unctionally equi|
  50  76 61 6c 65 6e 74 20 74 6f 20 61 20 71 2d 48 6f  |valent to a q-Ho|
  60  72 6e 20 66 6f 72 6d 75 6c 61 0a 32 2e 20 43 61  |rn formula.2. Ca|
  70  6e 20 79 6f 75 20 62 75 69 6c 64 20 61 20 42 44  |n you build a BD|
  80  44 20 75 6e 64 65 72 20 6f 6e 65 20 6f 72 64 65  |D under one orde|
  90  72 69 6e 67 20 74 68 61 74 20 69 73 20 71 48 6f  |ring that is qHo|
  a0  72 6e 20 61 6e 64 20 69 73 20 6e 6f 74 20 0a 20  |rn and is not . |
  b0  20 20 71 48 6f 72 6e 20 75 6e 64 65 72 20 61 6e  |  qHorn under an|
  c0  6f 74 68 65 72 2e 0a 33 2e 20 43 61 6e 20 61 6e  |other..3. Can an|
  d0  79 20 42 6f 6f 6c 65 61 6e 20 65 78 70 72 65 73  |y Boolean expres|

Directory Structure
  00  02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00  |................|
  10  0c 00 02 02 2e 2e 00 00 0b 00 00 00 18 00 0e 01  |................|
  20  63 68 61 6c 6c 65 6e 67 65 73 2e 74 78 74 00 00  |challenges.txt..|
  30  0c 00 00 00 14 00 09 01 63 69 74 65 73 2e 74 78  |........cites.tx|
  40  74 00 00 00 01 f9 01 00 24 00 05 02 44 69 72 65  |t.......$...Dire|
  50  63 70 64 66 33 32 4c 6f 67 5c 64 65 62 75 67 6c  |cpdf32Log\debugl|
  60  6f 67 2e 74 78 74 00 00 0e 00 00 00 14 00 09 01  |og.txt..........|
  70  66 69 78 65 73 2e 74 78 74 00 00 00 0f 00 00 00  |fixes.txt.......|
  80  14 00 0a 01 68 69 74 73 65 74 2e 74 78 74 00 00  |....hitset.txt..|
  90  10 00 00 00 14 00 09 01 6e 6f 74 65 73 2e 74 78  |........notes.tx|
  a0  74 00 00 00 11 00 00 00 14 00 0b 01 73 61 74 5f  |t...........sat_|
  b0  69 64 78 2e 74 78 74 00 12 00 00 00 48 0f 08 01  |idx.txt.....H...|
  c0  74 6f 64 6f 2e 74 78 74 00 00 00 00 00 00 00 00  |todo.txt........|
  d0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
If i_mode had indicated a directory and i_flags an extent, then the block contains directory entries consisting of variable-length records with the following format:
  4 bytes: inode number
  2 bytes: record length
  2 bytes: name length (in the dump above the first of these bytes is the name length and the second is a file-type code, 1 for a regular file and 2 for a directory)
  remaining bytes: name of the file, directory, or link
The first three records (blue, red, and green on the original slide) are ".", "..", and challenges.txt.
Entry challenges.txt points to inode 11 (0xb), which is on the next slide.
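
Walking the variable-length records in such a block can be sketched as follows. The names dirent_info and next_dirent are hypothetical; the field layout assumed is the one visible in the dump above (4-byte inode number, 2-byte record length, 1-byte name length, 1-byte file type, then the name), with multi-byte fields stored low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct dirent_info {
    uint32_t inode;       /* inode number the entry points to */
    uint16_t rec_len;     /* distance to the next record, in bytes */
    uint8_t  name_len;    /* number of valid bytes in the name */
    uint8_t  file_type;   /* 1 = regular file, 2 = directory */
    char     name[256];   /* NUL-terminated copy of the name */
};

/*
 * Decodes the record at byte offset off within a directory block.
 * Returns the offset of the next record, or 0 when the block is exhausted
 * or the record is malformed.
 */
size_t next_dirent(const unsigned char *blk, size_t blk_len, size_t off,
                   struct dirent_info *out)
{
    assert(blk != NULL && out != NULL);
    if (off + 8 > blk_len)
        return 0;

    const unsigned char *p = blk + off;
    out->inode = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    out->rec_len = (uint16_t)(p[4] | (p[5] << 8));
    out->name_len = p[6];
    out->file_type = p[7];

    /* The record must hold its own header and name and stay in the block. */
    if (out->rec_len < 8u + out->name_len || off + out->rec_len > blk_len)
        return 0;

    memcpy(out->name, p + 8, out->name_len);
    out->name[out->name_len] = '\0';
    return off + out->rec_len;
}
```

Starting at offset 0 and feeding each returned offset back in visits ".", "..", and challenges.txt in turn. The record length, not the name length, decides where the next record starts, which is why stale bytes such as the text after "Direc" at offset 0x4c can remain in the block without being seen as entries.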

File Inode
  0A F3 01 00 04 00 00 00 00 00 00 00 00 00 00 00
  01 00 00 00 64 81 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00
Inode 11 blocks array.
This inode is pointed to by the challenges.txt dentry, as on the previous slide.
The i_flags field was 0x80000, so the format is for extents.
The 0x1 (blue on the original slide) says there is one extent, and its block number (red on the original slide) is 0x8164 (33124).
Block 33124 is shown on the next slide.

Data Block
  00  31 2e 20 44 6f 65 73 20 74 68 65 72 65 20 65 78  |1. Does there ex|
  10  69 73 74 20 61 20 6e 6f 6e 2d 71 48 6f 72 6e 20  |ist a non-qHorn |
  20  73 61 74 69 73 66 69 61 62 6c 65 20 66 6f 72 6d  |satisfiable form|
  30  75 6c 61 20 74 68 61 74 20 69 73 0a 20 20 20 66  |ula that is.   f|
  40  75 6e 63 74 69 6f 6e 61 6c 6c 79 20 65 71 75 69  |unctionally equi|
  50  76 61 6c 65 6e 74 20 74 6f 20 61 20 71 2d 48 6f  |valent to a q-Ho|
  60  72 6e 20 66 6f 72 6d 75 6c 61 0a 32 2e 20 43 61  |rn formula.2. Ca|
  70  6e 20 79 6f 75 20 62 75 69 6c 64 20 61 20 42 44  |n you build a BD|
  80  44 20 75 6e 64 65 72 20 6f 6e 65 20 6f 72 64 65  |D under one orde|
  90  72 69 6e 67 20 74 68 61 74 20 69 73 20 71 48 6f  |ring that is qHo|
  a0  72 6e 20 61 6e 64 20 69 73 20 6e 6f 74 20 0a 20  |rn and is not . |
  b0  20 20 71 48 6f 72 6e 20 75 6e 64 65 72 20 61 6e  |  qHorn under an|
  c0  6f 74 68 65 72 2e 0a 33 2e 20 43 61 6e 20 61 6e  |other..3. Can an|
  d0  79 20 42 6f 6f 6c 65 61 6e 20 65 78 70 72 65 73  |y Boolean expres|
The single data block of challenges.txt