File Systems Chapter 4 1 What do we need to know? How are files viewed on different OS s? What is a file system from the programmer s viewpoint? You mostly know this, but we ll review the main points. How are file systems put together? How is the disk laid out for directories? For files? What kind of memory structures are needed? What do some real file systems look like? cp/m, ms-dos (fat-12/16/32), ntfs, nfs, ext2, What directions are file systems going? 2 1
Long-term Information Storage 1. Must store large amounts of data 2. Information stored must survive the termination of the process using it 3. Multiple processes must be able to access the information concurrently 3 File Naming Issues Character Set Length Extensions 4 2
File Naming Typical file extensions. 5 File Structure There are lots of files types. Here are three: byte sequence, record sequence, tree 6 3
Sample Files (a) An executable file (b) An archive 7 File Access Sequential access read all bytes/records from the beginning cannot jump around, could rewind or back up convenient when medium was mag tape Random access bytes/records read in any order essential for data base systems read can be move file marker (seek), then read or read and then move file marker 8 4
File Attributes Possible file attributes 9 File Operations 1. Create 2. Delete 3. Open 4. Close 5. Read 6. Write 7. Append 8. Seek 9. Get attributes 10. Set Attributes 11. Rename 10 5
An Example Program Using Unix File System Calls (1/2) 11 An Example Program Using File System Calls (2/2) 12 6
Memory-Mapped Files (a) Segmented process before mapping files into its address space (b) Process after mapping existing file abc into one segment creating new segment for xyz 13 Directories Single-Level Directory Systems A single level directory system contains 4 files owned by 3 different people, A, B, and C 14 7
Two-level Directory Systems Letters indicate owners of the directories and files 15 Hierarchical Directory Systems A hierarchical directory system 16 8
Directory Operations 1. Create 2. Delete 3. Opendir 4. Closedir 5. Readdir 6. Rename 7. Link 8. Unlink 17 File System Implementation A possible file system layout 18 9
Implementing Files (1) (a) Contiguous allocation of disk space for 7 files (b) State of the disk after files D and E have been removed 19 Implementing Files (2) Storing a file as a linked list of disk blocks 20 10
Implementing Files (3) File Allocation Table (FAT) uses a linked list in memory 21 Implementing Files (4) Combination of Direct and Indirect Block Pointers Note: This is a simplified version of Unix i-node 22 11
The UNIX V7 File System A UNIX i-node 23 Implementing Directories (1) (a) A simple directory fixed size entries disk addresses and attributes in directory entry (b) Directory in which each entry just refers to an i-node 24 12
Implementing Directories (2) Two ways of handling long file names in directory (a) In-line (b) In a heap 25 Linking (1) File system containing a file that is shared between two directories 26 13
Links (2) (a) Situation prior to linking (b) After the link is created (c) After the original owner removes the file 27 Disk Space Management (1) Block size Dark line (left hand scale) gives data rate of a disk Dotted line (right hand scale) gives disk space efficiency All files here are 2KB 28 14
Disk Space Management (2) (a) Storing the free list on a linked list (b) A bit map 29 File System Checking Possible results while running fsck (a) consistent (b) missing block (c) duplicate block in free list (d) duplicate data block 30 15
File System Performance (1) The block cache data structures 31 File System Writes Unix Critical Blocks are written immediately Data blocks are written periodically or when the block is removed from the block cache MSDOS Uses Write-through cache. All writes are immediate. 32 16
Read Ahead When block N is requested, the file system can issue a read for block N+1 also. What if the file is not being read sequentially? Initially assume it is, but monitor disk access and set a flag to non-sequential if needed. This can be used to disable read ahead. 33 File System Performance (2) I-nodes placed at the start of the disk Disk divided into cylinder groups each with its own blocks and i-nodes 34 17
Log-Structured File Systems With CPUs faster, memory larger disk caches can also be larger increasing number of read requests can come from cache thus, most disk accesses will be writes LSF Strategy structures entire disk as a log have all writes initially buffered in memory periodically write these to the end of the disk log when file opened, locate i-node, then find blocks 35 Journaling What happens when you remove a file? Remove the directory entry Release the i-node Free the disk blocks What happens if there is a crash after the first or second steps? How can you minimize the damage? 36 18
The CP/M File System (1) Memory layout of CP/M 37 The CP/M File System (2) The CP/M directory entry format 38 19
File Allocation Table (FAT) Partition Layout Partion layout: Boot block FAT FAT copy Root directory In FAT-12 and FAT-16, preassigned enough space for 256 directory entries Other directories and files 39 FAT Table Sizes FAT-12 2 12 clusters Cluster Size: 512 Byte to 8KB Partitions size up to 32MB (4K clusters * 8KB / cluster) Windows default for volumes < 16MB, such as floppies FAT-16 2 16 clusters Cluster Size: 512 Byte to 64KB Partitions size up to 4GB (64K clusters * 64KB / cluster) FAT-32 2 28 clusters Cluster Size: 512 Byte to 32KB Partitions size in principle up to 8TB, but Windows will only create FAT-32 partitions up to 32GB. Note that all FAT systems reserve the first two and last sixteen clusters in a partition, so actual partition sizes are slightly smaller than listed above. 40 20
Original MS-DOS directory entry: Directory Entries Directory entry used in Windows: Bytes 41 The Windows 98 File System An example of how a long name is stored in Windows 98 42 21
UNIX File System Disk layout in classical UNIX systems 43 The UNIX File System A UNIX V7 directory entry (old) 44 22
The UNIX V7 File System A UNIX i-node 45 UNIX File System Directory entry fields. Attributes in the i-node 46 23
The UNIX File System Path Names 47 The UNIX File System Some important directories found in most UNIX systems 48 24
Pathnames Absolute Pathname Begins at the root directory : / Contains all sub-directory names separated by slashes Filename ~jsterling ~ is recognized as a short-cut for the absolute path to the home directory. Relative Pathname Does not begin with the root directory. Starts in the current working directory. Sometimes the current working directory is shown explicitly using:. May move up the file tree using.. to refer to a parent directory. 49 The UNIX File System The steps in looking up /usr/ast/mbox 50 25
The UNIX File System Before linking. After linking. (a) Before linking. (b) After linking Note that hard links must refer to other files in the same file system. are not permitted to refer to directories. are indistinguishable from the original file name. Note that soft links (aka symbolic links) contain only a pathname. result in a dangling pointer if the original filename is deleted. 51 The UNIX File System Separate file systems After mounting (a) (b) (a) Before mounting. (b) After mounting 52 26
UNIX File System (3) The relation between the file descriptor table, the open file description and the i-node 53 The Linux File System Super block: # of inodes, # of blocks, etc. Group Descriptor: # of free i-nodes, # of free blocks, # of directories Bitmaps: Each is one block long 54 27
Record Locking in Unix Can lock any range of bytes in a file Can be multiple locks overlapping on a file Locks can be Exclusive (write): No other process can have a lock on the range. Shared (read): Other locks may exist on the same bytes. A failed lock can be made to block or not (choice of system call). A process s locks are released when a) The process terminates b) The process closes the file. Even if the process had it open more than once simultaneously! Locks are not inherited across fork. Locks can carry across exec. Deadlock possibility: Competing locks could in principle result in deadlock, but this is prevented in Unix. 55 System Calls for File Management s is an error code fd is a file descriptor position is a file offset 56 28
The stat System Call Mode: includes type and protection Inode # Device Link count Owner s ID Group ID File Size in bytes Access Time Modification Time Status change Time Blocksize Block count (512 byte blocks) Note that not all fields are stored in the i-nodes, themselves. From Solaris Man Page 57 System Calls for Directory Management s is an error code dir identifies a directory stream dirent is a directory entry 58 29
UNIX File System (4) A BSD directory with three files The same directory after the file voluminous has been removed 59 Unix File Protection Divide the world into categories (owner / group / world) and specifying read / write / execute access for each. Add read/write permissions for the owner/user and the group with: chmod ug+rw u for user/owner; g for group; o for other/world. + means add and means remove. r for read; w for write; x for execute. Adding / removing files requires write permission on their directory! Setuid Executable files may have their bit setuid set to indicate that they execute with the permission of their owner. An example is the program to change the password. Requires write access to the password file. 60 30
NTFS Goals File / disk security Disk quotas File compression Encryption 61 NTFS Features 64-bit cluster indices Logging of metadata Multiple data streams Unicode-based names Hard links Sparse Files Dynamic bad-cluster remapping 62 31
NTFS Master File Table (MFT) Can be anywhere. Boot sector says where Contains up to 2 48 records First 16 records reserved for Metadata, describing MFT itself copy of the MFT logging file Root directory Bitmap of free/used blocks Bootstrap code Bad blocks Quotas, etc. 63 NTFS MFT Record Describes one file or directory Has a length of 1KB Header Magic number Sequence number Reference count Size Etc. Attribute / value pairs Eg. Name, additional MFT records, if directory then how the entries are stored. May need more than one to describe a file. 64 32
NTFS MFT Data Attributes Data Attribute keeps track of runs of consecutive blocks There may be holes. Each entry consist of a header and a sequence of runs. If file is small, data may be stored in MFT Record 65 NTFS Small Directory A small directory is an MFT record containing directory entries. The entries contain the file s name and the MFT index of the file (plus some additional flags). Note: Large directories are stored as B+ trees. 66 33
NTFS File Name Lookup The file name above is C:\maria\web.htm The filename lookup first prepends \?? to the name. The name \??\C: is a symbolic link to an object that points to the root directory of the C: drive From there the filename lookup is similar to the process in Unix, accept that MFT index numbers replace the i-node numbers. 67 NTFS File Protection Access Control Lists Discretionary (DACL) says who has what permissions System (SACL) says whose actions get audited. 68 34
Network File System (NFS) Goal: to allow file system across a network to appear as one logical whole Client Server Model Server exports one or more directories Client 1 is replacing its /bin directory Client 1 is also mounting the projects directory as /usr/ast/ work. Client 2 is mounting the projects directory as /mnt. 69 NFS Protocols Mounting Client sends pathname to server, requesting mount permission. If the directory is listed as exported then server returns a file handle. Remote directories might be mounted at boot time or automounted. Directory and file access Most Unix system calls supported. Stateless connection. Server does not support open Instead Client sends a lookup which returns a handle. Each read has to say where to start reading from. File locks not supported. 70 35
NFS Implementation System call layer Open, read, close, etc. Virtual File System Layer Maintain table of open files, using v-nodes v-node may point to a standard i- node or to an NFS Client r-node. NFS Client Code On mount, create r-node for NFS directory. On open, issue lookup and create r- node. Transfers in 8KB chunks. Client does read-ahead. Client caching used for efficiency Block discarded after 3 sec for data and 30 sec. for directories. Writes synced after 30 seconds. 71 Example File Systems CD-ROM File Systems ISO9660 Designed for lowest common denominator in 1988 (i.e., msdos). Header contains 16 bytes that are up for grabs (e.g., bootstrap block) Multi-byte numeric field in directories are presented twice, once in big endian and once in little endian. File names were 8.3 + version Rock Ridge extension allowed various Unix features, such as file protections, longer names, more time stamps, arbitrary depth hierarchies Joliet extension (from Microsoft) added unicode. 72 36