Chapter 11: File System Implementation

Objectives
- To describe the details of implementing local file systems and directory structures
- To describe the implementation of remote file systems
- To discuss block allocation and free-block algorithms and trade-offs

Silberschatz, Galvin and Gagne 2005

File Concept
- The file system consists of two distinct parts: a collection of files and a directory structure.
- The OS abstracts the physical properties of disks by defining a logical storage unit, the file.
- A file is a collection of related information recorded on secondary storage.
- Why files? Why not keep the information inside the process itself?
  - A process has limited space.
  - When the process terminates, the information is lost.
  - Multiple processes must be able to access the information concurrently.

File Attributes
- Name: the only information kept in human-readable form, for the convenience of human users
- Identifier: a unique tag (number) that identifies the file within the file system
- Type: needed for systems that support different file types
- Location: a pointer to a device and to the file's location on that device
- Size: the current file size
- Protection: controls who can read, write, and execute the file
- Time, date, and user identification: data for protection, security, and usage monitoring
- Information about files is kept in the directory structure, which is maintained on disk.
- Usually, a directory entry consists of a file name and its ID; the ID locates the other attributes.

File Operations
- A file is an abstract data type. System calls in the OS implement the basic file operations:
  - Create: the file is created with no data (only its attributes are initialized).
  - Open: before using a file, a process must open it so that the system can fetch the attributes and the list of disk addresses into main memory for rapid access on later calls.
  - Close: frees the internal table space.
  - Read: reads from the current pointer position.
  - Write: data are written to the file, usually at the current position.
  - Reposition within the file (file seek)
  - Delete
  - Truncate
- Other possible operations: append, rename, get and set file attributes.
- The operations above can be combined to perform other file operations, such as copying.

Directory Structure
- A collection of nodes containing information about all files in a file system (name, location, size)
- A directory is a system file that maintains the structure of the file system.
- [Figure: a directory whose entries point to files F1, F2, F3, F4, ..., Fn]
- Both the directory structure and the files reside on disk (or on a partition).
- A disk can contain multiple file systems.
- Directories can be organized in several different ways.
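The basic file operations listed above map closely onto the low-level calls most operating systems expose. The following is a minimal sketch using Python's os module on a scratch file; the function name demo_file_operations and the file name demo.txt are illustrative choices, not part of the slides.

```python
import os
import tempfile


def demo_file_operations():
    """Exercise create, open, write, seek, read, truncate, close, and delete."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demo.txt")  # hypothetical scratch file

    # Create + open: the OS sets up its tables and hands back a descriptor.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"hello world")    # write at the current position
        os.lseek(fd, 6, os.SEEK_SET)    # reposition within the file (seek)
        tail = os.read(fd, 5)           # read from the current position
        os.ftruncate(fd, 5)             # truncate: attributes stay, data shrinks
        size = os.fstat(fd).st_size     # get a file attribute (current size)
    finally:
        os.close(fd)                    # close: release the table entry

    os.remove(path)                     # delete the file
    os.rmdir(directory)
    return tail, size
```

Calling demo_file_operations() returns the five bytes read after the seek together with the file size after truncation.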

Operations Performed on Directory
- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file

Issues for Directory Organization
- Efficiency: locating a file quickly
- Naming: convenient to users
  - Two users can use the same name for different files.
  - The same file can have several different names.
- Grouping: logical grouping of files by properties (e.g., all Java programs, all games, ...)

Directory Structure: Single-Level Directory
- A single directory for all users, in which all files are contained
- Common in the early days, when there was only one user
- Simple, and files are easy to find (only one place to look)
- Naming problem: all file names must be unique
- Grouping problem: files cannot be organized into groups

Two-Level Directory
- A separate directory for each user
- Path name: defined by the user name and the file name. Each OS has its own format for path names.
- System files are located through a search path rather than copied into each user directory.
- Different users can have files with the same name.
- Efficient searching
- No grouping capability

Tree-Structured Directories
[Figure: tree-structured directory]

Tree-Structured Directories (Cont.)
- Efficient searching
- Grouping capability (subdirectories)
- Current directory (working directory): should contain most of the files of interest to the process; otherwise the user specifies a new directory.
- Special system calls are provided to change directories (the shell's cd command relies on them).
- Path names are absolute or relative.
  - Absolute: specifies the name starting from the top (root) directory. Example: /users/smith/project1/main.c
  - Relative: specifies the name starting from the current directory. Example: main.c
- Search path for command names
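The relationship between absolute and relative path names can be sketched as a small resolution function. This is only an illustration built on Python's posixpath module; the helper name resolve is an assumption, and the function manipulates strings only (it does not consult a real file system or follow links).

```python
import posixpath


def resolve(current_directory, path_name):
    """Turn an absolute or relative path name into an absolute one."""
    if posixpath.isabs(path_name):
        # Absolute names already start at the root directory.
        return posixpath.normpath(path_name)
    # Relative names are interpreted starting from the current directory.
    return posixpath.normpath(posixpath.join(current_directory, path_name))
```

With the current directory set to /users/smith/project1, the relative name main.c resolves to /users/smith/project1/main.c, the absolute example used above.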

Tree-Structured Directories (Cont.)
- Creating a new file is done in the current directory.
- Deleting a file: rm <file-name>
- Deleting a directory, two options:
  - Require the directory to be empty before it can be deleted.
  - Provide an option to delete the directory along with all of its files and subdirectories.
- Creating a new subdirectory is done in the current directory: mkdir <dir-name>
- Example: if the current directory is /mail, then mkdir count creates the subdirectory count under mail, alongside prog, copy, prt, and exp.
- Deleting mail deletes the entire subtree rooted at mail.

File-System Structure
- The file system resides permanently on secondary storage (disks).
- For efficiency, I/O transfers between memory and disk are performed in units of blocks.
- Each block consists of one or more sectors; a sector is usually 512 bytes.
- The OS imposes a file system to store and access data. This poses two problems:
  - USER VIEW: defining how the file system should look to the user, which involves defining a file, its attributes, the operations on a file, and the directory structure.
  - SYSTEM VIEW: creating algorithms and data structures to map the logical file system onto the physical secondary-storage device.

File-System Structure (Cont.)
A file system is generally organized into layers:
- I/O control: device drivers and interrupt handlers that transfer information between main memory and the disk system. The device driver translates high-level commands (such as "get block 123") into hardware-specific instructions.
- Basic file system: implements the basic operations (commands) for reading and writing blocks.
- File-organization module: knows about files and their logical and physical blocks (location). It also tracks free space.
- Logical file system: manages metadata, i.e., information about the files. It manages the directory structure and maintains file structure via file control blocks (FCBs), which contain information about a file, including ownership, permissions, and location.

[Figure: Layered File System]

On-Disk File System Structures
File system structures that reside on disk generally include:
- A boot control block (boot block, boot sector), which contains the information needed to boot the OS.
  - If the partition holds no OS, this block is empty.
  - It is usually at the beginning of each partition (disk).
- A volume control block, which contains volume (partition) information such as the number of blocks, the block size, the free-block count, and free-block pointers. (UFS: superblock. NTFS: master file table.)
- A directory structure per file system, used to organize the files.
- A file control block for each file, which contains details about the file, including permissions, ownership, size, and the location of its data blocks. (UFS: inode. NTFS: stored within the master file table.)

[Figure: A Typical File Control Block]

In-Memory File System Structures
Information about files that the operating system keeps in memory may include:
- A mount table with information about each mounted volume
- A directory-structure cache
- The system-wide open-file table, with a copy of the FCB for each open file, along with other information
- The per-process open-file tables, which contain pointers to the appropriate entries in the system-wide open-file table, and other information

The following slide illustrates some of the in-memory structures kept by the operating system. Figure (a) refers to opening a file. Figure (b) refers to reading a file.

[Figure: In-Memory File System Structures: (a) opening a file, (b) reading a file]

Directory Implementation
- Linear list of file names with pointers to the data blocks
  - Simple to program
  - Time-consuming to execute, because it requires a linear search
  - To create a new file, the directory must first be searched to make sure that no existing file has the same name. The same search is needed to delete a file.
- Hash table: a linear list stores the directory entries, and a hash data structure is layered on top of it. (The hash function takes a file name as input and returns a pointer to that file name in the list.)
  - Decreases directory search time
  - Collisions: situations where two file names hash to the same location
  - Another problem is the fixed size of the hash table. If the directory must grow, a new hash function is needed (e.g., changing from mod 64 to mod 128), and the existing entries must be reorganized to match.

File Implementation (Allocation Methods)
- Disk blocks must be allocated to files.
- The main problem is how to allocate space to files so that disk space is utilized effectively and files can be accessed quickly.
- The system must keep track of which disk blocks go with which file.
- There are three major methods of allocating disk space to files:
  - Contiguous allocation
  - Linked allocation
  - Indexed allocation
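The two directory implementations above can be contrasted in a short sketch. Everything here is illustrative: the class names, the toy hash function (sum of the name's bytes modulo the table size), and the use of chaining to resolve collisions are assumptions, since the slides do not prescribe a particular hash function or collision strategy.

```python
class LinearDirectory:
    """Directory kept as an unsorted list of (name, first_block) entries."""

    def __init__(self):
        self.entries = []

    def lookup(self, name):
        for entry_name, first_block in self.entries:  # linear search
            if entry_name == name:
                return first_block
        return None

    def create(self, name, first_block):
        if self.lookup(name) is not None:  # search first to reject duplicates
            raise FileExistsError(name)
        self.entries.append((name, first_block))


class HashedDirectory:
    """Directory with a fixed-size hash table; collisions are chained."""

    def __init__(self, table_size=64):
        self.table = [[] for _ in range(table_size)]

    def _slot(self, name):
        # Toy hash function: sum of the bytes of the name, mod table size.
        return sum(name.encode("utf-8")) % len(self.table)

    def lookup(self, name):
        # Only the entries that hashed to the same slot are searched.
        for entry_name, first_block in self.table[self._slot(name)]:
            if entry_name == name:
                return first_block
        return None

    def create(self, name, first_block):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.table[self._slot(name)].append((name, first_block))
```

With this toy hash function the names "ab" and "ba" collide, yet both remain retrievable because each slot holds a short chain. Growing the table from 64 to 128 slots would change _slot for existing names, which is why a resize forces the entries to be rehashed.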

Contiguous Allocation
- Each file occupies a set of contiguous blocks on the disk.
- Provides efficient direct access. Accessing block b+1 after block b normally requires no head movement, so the number of head seeks is minimal.
- Simple addressing scheme: only the starting location (block number) and the length (number of blocks) are required.
- The directory entry for each file holds the address of the starting block and the number of blocks.
- Both sequential and direct access can be supported.
- Allocation is a dynamic storage-allocation problem. First fit and best fit are the most commonly used strategies.
- External fragmentation can be a problem.
- Another problem is knowing how much space the file needs (its size) when it is created.
- Files cannot grow in place, so if the allocated space is no longer enough, either:
  - Terminate the user program with an appropriate error message, or
  - Find a larger hole and copy the file into it.
- Even when the final size is known in advance and the space is pre-allocated, the file might take a long time (perhaps years) to reach that size, which can cause significant internal fragmentation.

[Figure: Contiguous Allocation of Disk Space]
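The hole-selection strategies and the addressing scheme for contiguous allocation can be sketched as follows. The function names and the representation of free space as a list of (start, size) holes are assumptions made for illustration.

```python
def first_fit(holes, length):
    """Return the start of the first hole with at least `length` blocks."""
    for start, size in holes:
        if size >= length:
            return start
    return None  # no hole is large enough


def best_fit(holes, length):
    """Return the start of the smallest hole with at least `length` blocks."""
    candidates = [(size, start) for start, size in holes if size >= length]
    return min(candidates)[1] if candidates else None


def physical_block(start, length, logical_block):
    """Map a logical block of a contiguous file to its disk block number."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block lies outside the file")
    return start + logical_block  # direct access is a single addition
```

For the hypothetical free list [(0, 5), (10, 3), (20, 8)] and a request for 3 blocks, first fit picks the hole at block 0 while best fit picks the exactly sized hole at block 10. Direct access needs only the start and length stored in the directory entry.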

Linked Allocation
- Each file is a linked list of fixed-size disk blocks. The blocks may be scattered anywhere on the disk.
- Blocks are allocated as needed as the file grows, wherever they are available on the disk.
- The directory contains a pointer to the first and last blocks of the file.
- Each block contains a pointer to the next block (not accessible by users).

Linked Allocation (Cont.)
- Simple: only the starting address of each file needs to be kept.
- No external fragmentation: any free block can satisfy a request for more space.
- Creating a file is easy: it is declared with size zero (a null pointer to the first block) and then grows as long as free blocks are available.
- No efficient direct access; effective only for sequential access.
- The pointers stored in the blocks consume some space.
- Use of clusters (a set of blocks that the system treats as a unit):
  - Decreases the pointer overhead (with 4 blocks per cluster, 4 blocks need only 1 pointer).
  - Increases throughput (fewer head seeks).
  - Increases internal fragmentation.
- Reliability: if a pointer is lost or damaged, the remaining blocks (clusters) of the file can no longer be reached.
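The cost of direct access under linked allocation becomes visible when the chain is walked explicitly. In this sketch the disk is modelled as a dictionary from block number to a (data, next_block) pair; the block numbers, the data values, and the function name are all made up for illustration.

```python
# Hypothetical disk: block number -> (data, pointer to the next block).
EXAMPLE_DISK = {
    9: ("A", 16),
    16: ("B", 1),
    1: ("C", 10),
    10: ("D", 25),
    25: ("E", None),  # None marks the last block of the file
}


def read_logical_block(disk, first_block, index):
    """Return (data, blocks_read) for logical block `index` of a file."""
    block = first_block
    blocks_read = 0
    for _ in range(index):
        if block is None:
            raise IndexError("file is shorter than the requested block")
        block = disk[block][1]  # each earlier block is read for its pointer
        blocks_read += 1
    if block is None:
        raise IndexError("file is shorter than the requested block")
    blocks_read += 1            # finally read the target block itself
    return disk[block][0], blocks_read
```

Reading logical block 3 of the file that starts at block 9 touches four disk blocks (9, 16, 1, 10), which is why linked allocation suits sequential access far better than direct access.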

Linked Allocation with FAT
- Variation: File-Allocation Table (FAT)
- A section of disk at the beginning is set aside to store the table.
- The FAT has one entry per block or cluster and is indexed by block number.
- Only the link pointers are kept in the FAT: the pointer word is taken out of each disk block and placed in the table.
- The directory entry contains the block number of the first block.
- A special end-of-file value is stored in the last pointer.
- The FAT should be cached in memory to reduce head seeks.
- Random access is much easier, because the chain can be followed in the table without reading the data blocks.
- Used by MS-DOS

Indexed Allocation (i-node)
- Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation, but without a FAT it cannot support efficient direct access.
- Indexed allocation brings all the pointers together into the index block.
- Each file has its own index block, which is an array of disk-block addresses.
- The directory contains the address of the index block, from which the i-th block can be accessed directly.
- Logical view: an index block holding the addresses 5, 7, 13, 9, 6 points to blocks 5, 7, 13, 9, and 6, in that order.
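Both the FAT and the index block replace pointer-chasing on disk with a table lookup. The sketch below is illustrative only: the FAT contents and the end-of-file marker END_OF_FILE are invented values, while the index block reuses the addresses 5, 7, 13, 9, 6 from the logical view above.

```python
END_OF_FILE = -1  # assumed marker; real FAT variants reserve specific values


def fat_chain(fat, first_block):
    """Follow the FAT from a file's first block and list all of its blocks."""
    chain = []
    block = first_block
    while block != END_OF_FILE:
        chain.append(block)
        block = fat[block]  # the next pointer comes from the table, not the disk block
    return chain


def indexed_lookup(index_block, i):
    """Return the disk address of the i-th block of a file (0-based)."""
    return index_block[i]   # direct access: a single array lookup


# Hypothetical FAT fragment: the file starts at block 217.
EXAMPLE_FAT = {217: 618, 618: 339, 339: END_OF_FILE}
EXAMPLE_INDEX_BLOCK = [5, 7, 13, 9, 6]
```

With a cached FAT the whole chain is found without touching the data blocks, and with an index block the address of any block is available in a single step.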

[Figure: Example of Indexed Allocation]

Indexed Allocation (Cont.)
- More efficient direct access than linked allocation
- No external fragmentation: any free block can satisfy a request for more space.
- Wastes some space on the index block (the pointer overhead is generally larger than with linked allocation).
- How large should the index block be? What if a file is too large for one index block?
  - Linked scheme: an index block is normally one disk block. To allow larger files, link several index blocks together.
  - Multilevel index: a first-level index block points to a set of second-level index blocks, which in turn point to the file blocks.
  - Combined scheme: used in UNIX.

Indexed Allocation (Cont.)
[Figure: Multilevel Index: an outer index points to index tables, which point to the blocks of the file]

[Figure: Combined Scheme: UNIX (4K bytes per block)]
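The reach of each level in the combined scheme follows from the block size and the pointer size. The sketch below assumes 4K-byte blocks as in the figure title, plus 4-byte block pointers and 12 direct pointers per inode; the last two values and the function names are assumptions, and real systems differ.

```python
def combined_scheme_capacity(block_size=4096, pointer_size=4, direct_pointers=12):
    """Return the bytes addressable through each level of the inode."""
    pointers_per_block = block_size // pointer_size
    blocks_per_level = {
        "direct": direct_pointers,
        "single indirect": pointers_per_block,
        "double indirect": pointers_per_block ** 2,
        "triple indirect": pointers_per_block ** 3,
    }
    return {level: n * block_size for level, n in blocks_per_level.items()}


def level_for_block(logical_block, block_size=4096, pointer_size=4,
                    direct_pointers=12):
    """Say which level of the inode maps a given logical block number."""
    pointers_per_block = block_size // pointer_size
    levels = [
        ("direct", direct_pointers),
        ("single indirect", pointers_per_block),
        ("double indirect", pointers_per_block ** 2),
        ("triple indirect", pointers_per_block ** 3),
    ]
    remaining = logical_block
    for level, count in levels:
        if remaining < count:
            return level
        remaining -= count
    raise ValueError("block number is beyond the maximum file size")
```

Under these assumptions each index block holds 1024 pointers, so the double indirect block alone covers 1024 x 1024 x 4096 = 2^32 bytes, and small files are served entirely by the direct pointers without reading any index block.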

Efficiency and Performance
- Disks are the main bottleneck in system performance, so their use must be optimized for both efficiency and performance.
- Efficiency depends on:
  - The disk-allocation and directory algorithms. Clustering reduces head seeks and improves file transfer.
  - The types of data kept in a file's directory entry. For example, do we need to keep the last-access and last-write dates? If so, these fields must be updated whenever the file is read or written.
  - Pointer size
- Performance can be improved even after the algorithms are selected:
  - A buffer (disk) cache keeps frequently used file blocks in memory.
  - Page cache: caches file data as pages rather than as file-system blocks. Memory-mapped files can improve performance through page caching.
  - Synchronous and asynchronous writes
  - Free-behind and read-ahead techniques optimize sequential access.

Recovery
- Care must be taken to ensure that a system failure does not result in data loss or data inconsistency.
- Consistency checking compares the data in the directory structure with the data blocks on disk and tries to fix inconsistencies. Example: fsck on Unix or chkdsk in DOS.
- Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, another magnetic disk, optical disk).
- Performing a backup on an active file system might lead to inconsistency: if files and directories are being added, deleted, and modified during the dump, the resulting backup may be inconsistent.
- Full backup: should the entire file system be backed up, or only part of it? It is usually desirable to back up only specific directories and everything in them rather than the entire file system.
- Incremental backup: back up only the files that have changed since the last backup.
- Recover a lost file or disk by restoring data from backup.
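An incremental backup copies only the files modified since the previous backup. The selection step can be sketched as a comparison of modification times; the function name and the dictionary of timestamps are illustrative assumptions, and a real backup tool would read these times from the file system.

```python
def incremental_backup_set(modified_times, last_backup_time):
    """Return the names of files modified after the last backup, sorted."""
    return sorted(
        name
        for name, modified in modified_times.items()
        if modified > last_backup_time  # unchanged files are skipped
    )
```

For example, with modification times {"a.c": 100, "b.c": 250, "c.c": 300} and a last backup taken at time 200, only b.c and c.c are selected.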

Log Structured File Systems
- CPUs have become faster and memories faster and larger; disks have become larger but remain SLOW.
- Writes are done in very small chunks, which is very inefficient because of the high seek time.
  - Example: creating a file requires the i-node for the directory, the directory block, the i-node for the file, and the file itself all to be written.
- The goal of LFS is to achieve the full bandwidth of the disk, even in the face of a workload consisting in large part of small random writes.
- Log structured (or journaling) file systems record each small update to the file system as a transaction.
- Used in NTFS, optional in Solaris UNIX
- All operations of a transaction are written to a log (the disk is viewed as a log).
  - A transaction is considered committed once all of its operations have been successfully written to the log. The system call can then return to the user.
  - However, the file system may not yet be updated.
- The transactions in the log are asynchronously written to the file system.
  - When the file system has been successfully updated, the transactions are removed from the log.
- After a system crash, all remaining committed transactions in the log are performed. Uncommitted transactions are rolled back.
- Summary: all writes are initially buffered in memory, and periodically all the buffered writes are written to the disk in a single segment at the end of the log.

RAID Structure
- RAID: multiple disk drives provide reliability via redundancy.
- Data is distributed across several physical disks, which look like a single logical disk to the operating system and the user.
- Idea: use many disks in parallel to increase storage bandwidth and improve reliability.
  - Files are striped across disks.
  - Each stripe portion is read or written in parallel.
  - Bandwidth increases with more disks, but so does the chance that some disk fails.
- RAID is arranged into several standard levels.
  - RAID 0 distributes data across several disks in a way that gives improved speed and full capacity, but all data on all disks is lost if any one disk fails.
  - RAID 1 (mirrored disks) uses two (possibly more) disks that each store the same data, so data is not lost as long as one disk survives.
  - RAID 5 combines three or more disks in a way that protects data against the loss of any one disk.
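Returning to the transaction log described under Log Structured File Systems, crash recovery can be sketched as a replay of the log. This is a simplified model, not the on-disk format of any real file system: the log is a list of tuples, the "disk" is a dictionary, and the entry layout and function name are assumptions.

```python
def recover(log, disk):
    """Replay committed transactions from the log onto the disk state.

    Log entries are assumed to be either
      ("write", transaction_id, block_name, data) or
      ("commit", transaction_id).
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, block_name, data = entry
            disk[block_name] = data  # redo the logged update
        # Writes of uncommitted transactions are ignored, so those
        # transactions leave no trace on the disk.
    return disk
```

If the log holds a committed transaction 1 that wrote the directory block and the file's i-node, and an uncommitted transaction 2 that had only started writing, recovery applies both updates of transaction 1 and none of transaction 2.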

RAID (Cont.)
- Several improvements in disk-use techniques involve multiple disks working cooperatively.
- Key concepts in RAID:
  - Mirroring: copying data to more than one disk. Used for reliability.
  - Striping: splitting data across more than one disk. Used for performance. More disks in the array means higher bandwidth but a greater risk of data loss.
  - Error correction: redundant data is stored so that problems can be detected and possibly fixed.

RAID Levels
- RAID 0: striped set without parity. Provides improved performance and additional storage but no fault tolerance. Any disk failure destroys the array, and a failure becomes more likely with more disks in the array.
- RAID 1: mirrored set without parity. Provides fault tolerance against disk errors and a single disk failure.
- RAID 2: redundancy through Hamming code.
- RAID 3: striped with interleaved parity. The dedicated parity disk is a write bottleneck.
- RAID 4: striped with block-level parity. The dedicated parity disk is a write bottleneck.
- RAID 5: striped with distributed parity. Each disk holds some of the parity blocks. After a drive failure, subsequent reads can be computed from the distributed parity, so the failure is masked from the end user.
- RAID 6: striped set with dual parity. Each group of blocks has two parity blocks, distributed across the disks. Provides fault tolerance against two drive failures.
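Parity-based levels such as RAID 4 and RAID 5 can mask a single drive failure because the parity block is the bitwise XOR of the data blocks in its stripe. The sketch below shows the arithmetic on byte strings; the function names are illustrative, and the layout of parity across disks is not modelled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized blocks byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR one byte position across all blocks
        for column in zip(*blocks)
    )


def reconstruct_lost_block(surviving_blocks):
    """Rebuild the missing block of a stripe from the survivors.

    `surviving_blocks` must contain every remaining data block of the
    stripe plus its parity block.
    """
    return xor_blocks(surviving_blocks)
```

For a stripe with data blocks b"abcd", b"efgh", and b"ijkl", the parity is the XOR of the three; if the disk holding b"efgh" fails, XOR-ing the two surviving data blocks with the parity yields b"efgh" again.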

[Figure: RAID (0 + 1) and (1 + 0)]

Fast File System
- The original Unix file system had a simple, straightforward implementation:
  - Easy to implement and understand
  - But very poor utilization of disk bandwidth (lots of seeking)
- The BSD Unix developers did a redesign (mid-1980s) that they called the Fast File System (FFS):
  - Improved disk utilization and decreased response time
  - Now the file system against which all other Unix file systems are compared
  - A good example of being device-aware for performance

Data and Inode Placement
The original Unix FS had two placement problems:
1. Data blocks are allocated randomly in aging file systems.
   - Blocks for the same file are allocated sequentially when the FS is new.
   - As the FS ages and fills, new blocks must be allocated from those freed when other files are deleted.
   - Problem: deleted files are essentially randomly placed, so the blocks of new files become scattered across the disk.
2. Inodes are allocated far from data blocks.
   - All inodes are at the beginning of the disk, far from the data.
   - Traversing file name paths and manipulating files and directories requires going back and forth between inodes and data blocks.
Both of these problems generate many long seeks.

Cylinder Groups
- BSD FFS addressed these problems using the notion of a cylinder group.
  - The disk is partitioned into groups of cylinders.
  - Data blocks of the same file are allocated in the same cylinder group.
  - Files in the same directory are allocated in the same cylinder group.
  - Inodes for files are allocated in the same cylinder group as the files' data blocks.
- Free-space requirement
  - To be able to allocate according to cylinder groups, the disk must have free space scattered across cylinders.
  - 10% of the disk is reserved just for this purpose.

Other Problems
- Small blocks (1K) caused two problems:
  - Low bandwidth utilization
  - Small maximum file size (a function of the block size)
- Fix: use a larger block (4K).
  - Very large files: only two levels of indirection are needed to reach 2^32 bytes.
  - Problem: internal fragmentation
  - Fix: introduce fragments (1K pieces of a block).
- Problem: media failures
  - Fix: replicate the master block (superblock).

End of Chapter 11