Files and File System

Operating System IIIT Kalyani 1 Files and File System

File
Primarily, a file is a named collection of data stored on a non-volatile storage medium such as a hard disk. In a broader sense, the OS also views many other objects that can be a source and/or destination of data as files. Examples of such objects are devices, pipes and sockets.

File System
A file system is the organization, together with the corresponding data structures, used to store files on a storage medium. It helps to store and retrieve data efficiently and transparently across different types of storage media. Well-known file systems include the different versions of FAT and ext.

File and File System
Files in a file system are organized in an almost hierarchical manner, like a tree or a DAG (in general it can be a graph as well). Special files called directories are introduced to organize the hierarchy. The data of a directory are the files and subdirectories under it in the hierarchy or graph.

The top-level directory is called the root of the file system. A secondary storage device is often partitioned, and more than one file system may be present on it. A file system may also spread over more than one physical disk.

Device as a File
The Linux kernel treats each partition as a device, and a device is in turn treated as a special file. The device files are available in the directory /dev. One can open these files directly!

File and File System
The kernel views the total disk space of a partition as a single logical volume: a sequence of logical blocks addressed by their numbers. At the hardware level, however, a logical volume is spread over different cylinders, surfaces (heads) and sectors of the disk.

The device driver translates a logical block number into the corresponding cylinder-head-sector address. On any logical partition of a block device, the 0th logical block is reserved as the boot block, which may store boot code if the partition has an OS. Strictly speaking, the Master Boot Record (MBR) occupies the first sector of the whole disk rather than of a partition. The system BIOS loads the initial boot loader from the MBR into memory and starts the boot process.
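The block-number translation mentioned above can be sketched with the classical formula for a disk of fixed geometry. This is only an illustration: the names `Geometry`, `CHS` and `lba_to_chs` are made up here, one logical block is assumed to be one sector, and the starting offset of the partition on the disk is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical disk geometry (not taken from the slides).
struct Geometry {
    uint32_t heads_per_cylinder;
    uint32_t sectors_per_track;
};

struct CHS {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;   // sectors are conventionally numbered from 1
};

// Translate a logical block (sector) number into cylinder-head-sector form.
CHS lba_to_chs(uint64_t lba, const Geometry &g) {
    assert(g.heads_per_cylinder > 0 && g.sectors_per_track > 0);
    uint64_t track = lba / g.sectors_per_track;   // track index over the whole disk
    CHS out;
    out.cylinder = static_cast<uint32_t>(track / g.heads_per_cylinder);
    out.head     = static_cast<uint32_t>(track % g.heads_per_cylinder);
    out.sector   = static_cast<uint32_t>(lba % g.sectors_per_track) + 1;
    return out;
}
```

With 16 heads and 63 sectors per track, for instance, logical block 63 is the first sector of head 1 on cylinder 0.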

... image. (Footnote: boot partition.)

File and File System
A modern boot loader such as GRUB boots the system using the OS image present in the file system of the boot partition (GRUB is aware of the file system structure). This file system becomes the root file system. Other file systems are mounted at some subdirectory of the root file system; mounting is discussed later.

File Systems on My Computer
    $ df -T | grep "^/dev"
    Filesystem  Type  1K-blocks  Used       Available  Use%  Mounted on
    /dev/sda2   ext4  76897312   13985364   59005696   20%   /
    /dev/sda5   ext4  144182236  111274936  25583228   82%   /home

The partition /dev/sda2 of the disk is the boot partition. It holds GRUB and the Linux image (the MBR proper sits in the first sector of the disk), and it is mounted as the root file system. The partition /dev/sda5 is another file system, though of the same type, ext4. It is mounted at the subdirectory /home of the root file system.

Block Devices on My Computer
    $ lsblk
    NAME  MAJ:MIN  RM  SIZE    RO  TYPE  MOUNTPOINT
    sr0   11:0     1   1024M   0   rom
    sda   8:0      0   465.8G  0   disk
    sda1  8:1      0   58.6G   0   part
    sda2  8:2      0   74.5G   0   part  /
    sda3  8:3      0   4.7G    0   part  [SWAP]
    sda4  8:4      0   1K      0   part
    sda5  8:5      0   139.7G  0   part  /home
    sda6  8:6      0   188.3G  0   part

What is there in /dev/sda1?

File and File System
As mentioned earlier, block 0 of a partition is reserved as the boot block. Block 1 onward contains the information, or road map, of the file system. In Linux the road map is stored in a data structure called the super block.

Data and Metadata of a File
A file may contain different types of data, e.g. executable code, text or image data. In addition, every file in a file system has a set of metadata, or attributes. A directory contains a few of the metadata of the files under it, together with a link (index) to a data structure containing the other metadata. This data structure is known as an inode in some file systems.

Index-Node or inode of ext File System
Data as well as most of the metadata of a file are stored on a non-volatile medium such as a hard disk. The data structure that stores most of the metadata of a file is called its index node (inode). Each inode in a file system has a unique identification number.

The region of a disk partition after the super block contains the table of inodes corresponding to the files and directories present in the file system. The inode of a file contains the complete information about the device blocks where the file data is stored. When a file is opened, a main-memory copy of its inode is created in a system-wide kernel data structure called the inode table.

Partitions of a Partition
[Figure: partition i of a block device (hard disk), laid out as boot block, super block, inode table and data blocks; annotation: may be empty.]

Reading a Directory
    /* dirdata1.c++: print the content of the current directory. */
    #include <iostream>
    #include <cstdio>
    #include <sys/types.h>
    #include <dirent.h>
    using namespace std;

    int main() {
        DIR *dp;
        struct dirent *ep;

        dp = opendir("./");
        if (dp == NULL) {              // the directory could not be opened
            perror("opendir");
            return 1;
        }
        // Each directory entry carries a name and the inode number it refers to.
        while ((ep = readdir(dp)) != NULL) {
            cout << "Name: " << ep->d_name << '\t'
                 << "Inode: " << ep->d_ino << endl;
        }
        closedir(dp);
        return 0;
    }

Inodes: Maximum Number and In Use
    $ df -i | grep "^/dev"
    Filesystem  Inodes   IUsed   IFree    IUse%  Mounted on
    /dev/sda2   4890624  361578  4529046  8%     /
    /dev/sda5   9158656  213133  8945523  3%     /home

File Metadata
Name of the file, or its path starting from the root.
Size in bytes or blocks.
Type of the file, e.g. text, executable, directory and the many other types supported by the system.
Date and time of creation and modification.
Owner of the file, user group etc.
Access permissions for different classes of users: read, write, execute etc.
Block addresses on the storage medium where the data is stored.
Most of this information is available in the inode; the name is kept in the directory entry.
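Most of the attributes listed above can be read from a program with the POSIX `stat()` call. The following is a minimal sketch; the function name `print_metadata` is made up for this example, and the 512-byte unit of `st_blocks` is the Linux convention.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/stat.h>

// Print a few inode attributes of `path`; returns 0 on success, -1 on error.
int print_metadata(const char *path) {
    assert(path != nullptr);
    struct stat sb;
    if (stat(path, &sb) == -1) {
        perror("stat");
        return -1;
    }
    printf("inode:  %llu\n", (unsigned long long)sb.st_ino);
    printf("size:   %lld bytes in %lld blocks of 512 bytes\n",
           (long long)sb.st_size, (long long)sb.st_blocks);
    printf("mode:   %o (directory: %s)\n",
           (unsigned)sb.st_mode, S_ISDIR(sb.st_mode) ? "yes" : "no");
    printf("links:  %lu\n", (unsigned long)sb.st_nlink);
    printf("owner:  uid %u, gid %u\n", (unsigned)sb.st_uid, (unsigned)sb.st_gid);
    printf("mtime:  %lld seconds since the epoch\n", (long long)sb.st_mtime);
    return 0;
}
```

Calling `print_metadata(".")` reports the attributes of the current directory. The file name does not appear in `struct stat`, which is consistent with the name living in the directory rather than in the inode.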

Mounting and Unmounting a File System
A file system can be mounted under some directory of the root file system. A mounted file system can also be removed (unmounted). My flash drive has a vfat file system. It is automounted under the directory /media.

    $ mount
    /dev/sda2 on / type ext4 (rw,errors=remount-ro
    proc on /proc type proc (rw,noexec,nosuid,node...
    /dev/sdf on /media/c351-5753 type vfat (rw,...
    $ ls /media/c351-5753/
    cv1.pdf midsem.pdf

    $ umount /dev/sdf
    $ ls /media/c351-5753/
    ls: cannot access /media/c351-5753/: No such file or directory
    $ mkdir mpt
    $ sudo mount /dev/sdf ./mpt
    $ cd mpt
    $ ls
    cv1.pdf midsem.pdf

    $ mount
    /dev/sda2 on / type ext4 (rw,errors=remount-ro
    proc on /proc type proc (rw,noexec,nosuid,node...
    /dev/sdf on /home/.../mpt type vfat (rw)

The subdirectory /home/.../mpt is called a mount point. Both the mount and umount commands normally require superuser permission.

More than One Name
A file is uniquely identified by its inode, but it may have more than one name, or link. The inode keeps track of the number of links to the file. Removing a link or name (unlink) does not remove the file unless the link count in the inode becomes zero.

More than One Link
    $ touch a b c
    $ ls -l a b c
    -rw-rw-r-- 1 goutam goutam 0 Oct 1 10:56 a
    -rw-rw-r-- 1 goutam goutam 0 Oct 1 10:56 b
    -rw-rw-r-- 1 goutam goutam 0 Oct 1 10:56 c

    $ ln a d
    $ ls -il a b c d
    1577484 -rw-rw-r-- 2 goutam goutam 0 Oct 1 10...
    1577489 -rw-rw-r-- 1 goutam goutam 0 Oct 1 10...
    1577490 -rw-rw-r-- 1 goutam goutam 0 Oct 1 10...
    1577484 -rw-rw-r-- 2 goutam goutam 0 Oct 1 10...

The inode numbers of a and d are the same, so they are two different names of the same file. In both cases the link count is 2.

    $ rm a
    $ ls -il a d
    ls: cannot access a: No such file or directory
    1577484 -rw-rw-r-- 1... 0 Oct 1 10:56 d
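The same experiment can be repeated from a program with the POSIX calls `link()`, `unlink()` and `stat()`. This is a small sketch; the helper names `link_count` and `create_with_two_names` are invented for illustration.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Link count stored in the inode of `path`, or -1 if the name does not exist.
long link_count(const char *path) {
    assert(path != nullptr);
    struct stat sb;
    if (stat(path, &sb) == -1) return -1;
    return static_cast<long>(sb.st_nlink);
}

// Create an empty file `first`, then give the same inode a second name.
// Returns true if both steps succeed.
bool create_with_two_names(const char *first, const char *second) {
    int fd = open(first, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) return false;
    close(fd);
    return link(first, second) == 0;   // same effect as `ln first second`
}
```

After `create_with_two_names("a", "d")` both names report a link count of 2. Calling `unlink("a")` removes only the name: `link_count("d")` then drops to 1 and the file itself survives.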

Symbolic/Soft Link
We can create a symbolic (soft) link.

    $ ln -s d a
    $ ls -il a d
    1577374 lrwxrwxrwx 1... 1 Oct 1 11:50 a -> d
    1577484 -rw-rw-r-- 1... 0 Oct 1 10:56 d
    $ file a
    a: symbolic link to d

The link a is a different file with its own inode. Its data is "d", the path of the target, and its link count is 1. The link a is left dangling after d is removed.

    $ rm d
    $ ls -l a
    lrwxrwxrwx 1... 1 Oct 1 11:50 a -> d
    $ cat a
    cat: a: No such file or directory
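A program can inspect a symbolic link with `readlink()` and `lstat()`, which operate on the link itself, whereas `stat()` follows it. The sketch below uses two made-up helper names, `link_target` and `is_dangling`.

```cpp
#include <cassert>
#include <limits.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Return the path stored in the symbolic link `path`, or "" on error.
std::string link_target(const char *path) {
    assert(path != nullptr);
    char buf[PATH_MAX];
    ssize_t n = readlink(path, buf, sizeof buf);   // does not add a '\0'
    if (n == -1) return "";
    return std::string(buf, static_cast<size_t>(n));
}

// True if `path` is a symbolic link whose target cannot be reached.
bool is_dangling(const char *path) {
    struct stat sb;
    if (lstat(path, &sb) == -1 || !S_ISLNK(sb.st_mode)) return false;
    return stat(path, &sb) == -1;                  // stat() follows the link
}
```

For the link a created above, `link_target("a")` yields "d", and `is_dangling("a")` becomes true once d has been removed.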

Virtual File System
File operations are performed through system calls. Programming languages also supply wrapper library functions for file operations. There are many different file systems, and they organize file data and metadata in different ways.

So the implementation of similar system calls can differ between these file systems. A modern OS like Linux supports different file systems through a layer of abstraction and provides a uniform interface to the user. This layer is called the virtual file system (VFS).

The VFS creates a uniform file system model with a set of objects such as file, inode and superblock, and defines operations on them. There is a mapping between the VFS objects and the objects of the underlying file system. The VFS operations are translated to the corresponding file system operations.

The low-level representation and actions are transparent to the user. The VFS abstraction also unifies operations on other file-like objects, e.g. device files, pipes, sockets and special files.
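The idea of one interface in front of several implementations can be sketched in user-space C++. This is not how the Linux kernel is written (it uses C structures of function pointers); all names below (`FileSystemOps`, `Ext4Like`, `VfatLike`, `Vfs`) are hypothetical and only illustrate dispatch by mount point.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Interface that every concrete file system implements.
struct FileSystemOps {
    virtual ~FileSystemOps() = default;
    virtual std::string type() const = 0;
};
struct Ext4Like : FileSystemOps { std::string type() const override { return "ext4"; } };
struct VfatLike : FileSystemOps { std::string type() const override { return "vfat"; } };

class Vfs {
public:
    void mount(const std::string &mount_point, std::unique_ptr<FileSystemOps> fs) {
        mounts_[mount_point] = std::move(fs);
    }
    // Choose the file system whose mount point is the longest prefix of `path`.
    const FileSystemOps *resolve(const std::string &path) const {
        const FileSystemOps *best = nullptr;
        std::size_t best_len = 0;
        for (const auto &m : mounts_) {
            const std::string &mp = m.first;
            bool under = mp == "/" || path == mp ||
                (path.size() > mp.size() &&
                 path.compare(0, mp.size(), mp) == 0 && path[mp.size()] == '/');
            if (under && mp.size() >= best_len) {
                best = m.second.get();
                best_len = mp.size();
            }
        }
        return best;
    }
private:
    std::map<std::string, std::unique_ptr<FileSystemOps>> mounts_;
};
```

A caller only ever talks to `Vfs`; which concrete object answers depends on where the path lies in the mount hierarchy.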

File Operations: Open a File
Consider the example of opening a file. The corresponding library function and system call work for any file system supported by the VFS. Through the VFS interface the user can specify the path (name), the mode of opening and other parameters: open the file if it exists, create it otherwise, set proper permissions, set the access mode etc.

At the low level it is necessary to locate the directory where the file may exist. This search depends on the underlying organization of the file system. If the file exists, its metadata is extracted and loaded into the in-core inode table of the VFS, if it is not already present there. Extraction of the metadata is also file system dependent.

If the file does not exist, its name is entered in the directory and some allocation is made on the block device. An in-memory inode is also created in the VFS. The system-wide open-file table and the file descriptor table of the process are updated.

Open a New File in C++
    /* fileopen1.c++: opens a new file using fopen() */
    #include <stdio.h>

    int main() {
        FILE *fp;

        fp = fopen("./newf", "w");   // "w" creates the file if it does not exist
        if (fp == NULL) {
            perror("fopen");
            return 1;
        }
        fclose(fp);
        return 0;
    }

A Few File Attributes
    $ ls -il newf
    1577486 -rw-rw-r-- 1 goutam goutam 0 Sep 30 17:39 newf

1577486 is the inode number.

Open a New File by a System Call
    /* fileopen2.c++: opens a file using open() */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main() {
        int fd;

        // Open write-only, creating the file if needed; 0666 is filtered by the umask.
        fd = open("./newf", O_WRONLY | O_CREAT, 0666);
        if (fd == -1) {
            perror("open");
            return 1;
        }
        close(fd);
        return 0;
    }

File Operations: Open a New File
    /* fileopen3.c++: opens a new file using a raw system call (x86-64 Linux) */
    #include <unistd.h>

    int main() {
        char arg[10] = "./newf";
        int fd;

        // Equivalent to: fd = open("./newf", O_WRONLY | O_CREAT, 0666);
        asm volatile (
            "movq $2, %%rax\n\t"       // system call number of open
            "movq $65, %%rsi\n\t"      // flags: O_WRONLY | O_CREAT
            "movq $0666, %%rdx\n\t"    // permission bits (octal)
            "syscall\n\t"
            : "=a" (fd)
            : "D" (arg)
            : "rsi", "rdx", "rcx", "r11", "memory"   // changed by the code or by syscall
        );
        close(fd);
        return 0;
    }

The system call number of open on x86-64 Linux is 2, and it is passed through the register rax. The first, second and third parameters are passed through rdi (constraint D), rsi (S) and rdx (d) respectively. The values of O_WRONLY and O_CREAT are 1 and 64 respectively, so O_WRONLY | O_CREAT is 65. The return value, the file descriptor, is available in rax.

File Operations
Other file operations are read, write, append and truncate. One can also position the read or write pointer, close a file, unlink a file, change its access permissions, create a new name (link), rename a file etc.

Three Linux Kernel Data Structures
When a process opens a file successfully, a file descriptor is returned. The returned file descriptor is an index into the file descriptor table of the process. A child process inherits a copy of this table (some descriptors may be closed on an exec() call).

On the other hand, when a file is opened its inode is copied from the hard disk, if it is not already there, to a system-wide kernel data structure called the in-core inode table. The in-core inode has more fields than its disk copy. There is a third system-wide kernel data structure called the open file table.

Each instance of a file open creates an entry in the system-wide open file table. Each entry of the open file table points to the corresponding in-core inode. The file descriptor table entry of the process in turn points to the corresponding entry of the open file table.

Open File Description
Each entry of the open file table stores the following information.
Current offset within the file. It is changed implicitly by the read and write system calls and explicitly by lseek.
Status flags and access permissions set during the open system call.
A pointer to the in-core inode of the file.

Three Data Structures
[Figure: the file descriptor tables T1 and T2 of processes P1 and P2, the system-wide open file table (fields: offset, flags, pointer) and the in-core inode table.]

Open File Description
Consider two processes P1 and P2 with file descriptor tables T1 and T2 respectively. Two different entries (0 and 5) of T1 point to the same entry 1 of the open file table. This is possible due to a dup2() call.

The entries with the same index 1 in T1 and T2 point to the same entry 4 of the open file table. This is possible when one process is a child of the other. Entry 4 of T1 and entry 3 of T2 point to entries 6 and 7 respectively of the open file table, but both 6 and 7 point to the same entry 1 of the in-core inode table: both processes have opened the same file.
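The difference between sharing an open file table entry and sharing only the inode can be observed through the file offset. In the sketch below (the helper name `offset_seen_by` is made up), a descriptor obtained with `dup()` shares the offset of the original, while a second `open()` of the same file gets its own entry and its own offset.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Write `n` bytes through fd1, then report the current offset seen through fd2.
off_t offset_seen_by(int fd1, int fd2, size_t n) {
    static const char buf[16] = "0123456789abcde";
    assert(n <= sizeof buf);
    if (write(fd1, buf, n) != static_cast<ssize_t>(n)) return -1;
    return lseek(fd2, 0, SEEK_CUR);   // offset 0 from the current position: a query
}
```

If `b = dup(a)` and `c` comes from a separate `open()` of the same path, then `offset_seen_by(a, b, 4)` returns 4 on a fresh file, whereas a following `offset_seen_by(a, c, 4)` returns 0.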

File Read-Write and Buffer Cache
Consider the following situation. A process uses a system call to read 4 bytes of data, e.g. an integer, from an open file. But the block device (hard disk) holding the file system reads in units of disk blocks or sectors. A typical sector size is 512 bytes. The question is where the remaining 508 bytes of data go.

Buffer Cache
The OS stores the block of data read from the disk in a part of the main memory known as the buffer cache. A subsequent read from the same block may not require any disk access: the data is transferred from the buffer cache to the user space.

A process requesting data again will not be suspended as long as the data is available in the buffer cache. A buffer cache is a memory image of some portion of a block device. It reduces the number of disk accesses.

Consider a situation where a process wishes to update 4 bytes of data at some location in a file. The disk block containing these 4 bytes is read into the buffer cache, and the data is written at the appropriate position in the buffer.

This update makes the buffer cache block different from the corresponding disk block. Such a block is called a dirty block. It is necessary to write back the dirty block for data integrity, but usually the write is deferred in the expectation that more writes will take place in the same block.

Subsequent writes to the same block will not suspend the process. This is called an asynchronous write. An asynchronous write speeds up a process, but there is a problem: if the system fails in between, there may be data inconsistency.

A file may be opened in a mode such that the process is blocked until the data is written back (see the man page of open()). This is called a synchronous write. System failure and data inconsistency form a much broader and more important issue, which a modern OS handles with journaling file systems.
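One such mode is the POSIX flag `O_SYNC`; an alternative is to keep deferred writes and force them out with `fsync()` at chosen points. The sketch below shows both; the function names `write_synchronously` and `write_then_flush` are invented for this example, and whether this is the mode the slide has in mind is an assumption.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Every write() on a descriptor opened with O_SYNC returns only after the
// data has been handed to the storage device.
int write_synchronously(const char *path, const char *text) {
    assert(path != nullptr && text != nullptr);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd == -1) return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return n == static_cast<ssize_t>(len) ? 0 : -1;
}

// Ordinary deferred write, followed by one explicit flush of the dirty data.
int write_then_flush(const char *path, const char *text) {
    assert(path != nullptr && text != nullptr);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    int rc = (n == static_cast<ssize_t>(len) && fsync(fd) == 0) ? 0 : -1;
    close(fd);
    return rc;
}
```

The second form lets a program batch several writes and pay the cost of waiting for the device only once.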

Buffer Cache
An important issue in buffer cache management is keeping track of which blocks are present in memory. Once the file system requests a block of a device, it is important to find out quickly whether the block is already in memory.
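A common way to answer the "is this block already in memory?" question quickly is a hash table keyed by device and block number. The class below is a user-space sketch of that idea only: the name `BufferCache` and its interface are hypothetical, a miss is simulated with a zero-filled buffer instead of a device read, and eviction and locking are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

class BufferCache {
public:
    explicit BufferCache(std::size_t block_size) : block_size_(block_size) {}

    // Is block `block` of device `dev` already in memory?
    bool contains(uint32_t dev, uint32_t block) const {
        return blocks_.count(key(dev, block)) != 0;
    }

    // Return the cached buffer; `hit` reports whether a device read was avoided.
    std::vector<char> &get(uint32_t dev, uint32_t block, bool &hit) {
        auto it = blocks_.find(key(dev, block));
        hit = (it != blocks_.end());
        if (!hit) {
            // A real kernel would read the block from the device here.
            it = blocks_.emplace(key(dev, block),
                                 std::vector<char>(block_size_, 0)).first;
        }
        return it->second;
    }

private:
    // Combine device id and block number into one 64-bit lookup key.
    static uint64_t key(uint32_t dev, uint32_t block) {
        return (static_cast<uint64_t>(dev) << 32) | block;
    }
    std::size_t block_size_;
    std::unordered_map<uint64_t, std::vector<char>> blocks_;
};
```

The first `get()` for a given (device, block) pair is a miss; every later one is a hit served from memory in expected constant time.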

Memory Mapping of File
So far our access to a file has been through the system calls read() and write(). But a file can also be mapped into the logical address space of a process. A memory-mapped file can be accessed like ordinary memory locations.

Memory Mapping of File
    /* memorymap1.c++: memory mapping of a file for reading */
    #include <iostream>
    #include <cstdio>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    using namespace std;

    int main() {
        int fd, size;
        char *cp;

        fd = open("./data1", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }
        size = sysconf(_SC_PAGE_SIZE);
        // Map the first page of the file, read-only, into the address space.
        cp = (char *)mmap(0, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (cp == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        write(1, cp, size);    // the file content is now ordinary memory
        cout << endl;
        close(fd);
        return 0;
    }

Executing a Memory Mapping of File
    /* memorymap2.c++: memory mapping of a file of executable code */
    #include <iostream>
    #include <cstdio>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    using namespace std;

    int main() {
        int fd, size, n, fact;
        char *cp;

        fd = open("./machinecode", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }
        size = sysconf(_SC_PAGE_SIZE);
        cp = (char *)mmap(0, size, PROT_EXEC, MAP_PRIVATE, fd, 0);
        if (cp == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        cout << "Enter a +ve integer: ";
        cin >> n;
        char *fn = cp;     // copies, because the callee may overwrite rsi and rdi
        long arg = n;
        asm volatile (
            "callq *%%rsi\n\t"
            : "=a" (fact), "+S" (fn), "+D" (arg)
            :
            : "rcx", "rdx", "r8", "r9", "r10", "r11", "memory", "cc"
        );
        cout << n << "! = " << fact << endl;
        close(fd);
        return 0;
    }

Executing a Memory Mapping of File
The file ./machinecode contains the machine code of a function computing the factorial. The file is opened and mapped into the logical address space of the process with the memory protection "execute".

The inline assembly code passes n as a parameter to the function through the register rdi. The address of the function, available in cp, is passed in the register rsi and used for an indirect call. The value returned by the function in the register rax is transferred to fact.

Memory Mapped File
After memory mapping, a file is part of the virtual address space of the process. To access the content of a memory-mapped file it is necessary to copy it from the block device to page frames. This gives rise to the concept of the page cache.

Buffer Cache to Page Cache
A block of the buffer cache is related to a block of a block device. It has no direct connection with the page size of the logical address space or with the page frames of physical memory. A page cache, on the other hand, stores pages of memory-mapped files.

A conventional read from a file reads the blocks from the device into the buffer cache and then copies the data to the user space. When data is read from a memory-mapped file, on the other hand, it is read directly from the page cache in the kernel space. This makes memory-mapped reading slightly faster.

Page Cache
Older Unix and similar systems used to have both types of cache, but this had the extra overhead of double caching (copying data between the page and buffer caches). In a modern OS both caches are unified into what is called the page cache. The buffer cache is accommodated within the page cache.

Types of Data in Page Cache
A page of the page cache in Linux may contain different types of data.
A page of a memory-mapped file or directory.
A set of data buffers read from a block device, i.e. a set of buffer cache blocks.
Part of the memory image of a process swapped out to disk (in the swap area or in the file system).

Each page in the page cache belongs to some file and has an owner. Read-write operations on different types of pages (a page from a regular file, a device file, the swap area etc.) are handled in different ways. We briefly describe block device buffers within the page cache in Linux.

Page Cache used as Disk Block Buffer
A cache page used as a buffer of a block device is called a buffer page. A buffer page contains several blocks of the block device, as the page size is a multiple of the block size. The blocks present in a page need not be adjacent on the logical device.

Each buffer block within a page of the page cache has an associated data structure called the buffer head. This data structure contains a pointer to the data structure of its block device, the block number within the device etc.

Dirty Page and Write-Back
When a process modifies the content of a page in the page cache, the page is marked dirty. A dirty page must eventually be written back, but there is an advantage in deferring the write-back operation: fewer disk accesses improve performance.

Also, a read operation may suspend a process, but a delayed write operation often does not. The main disadvantage of a delayed write shows up on a system failure: updated versions of the file blocks cannot be retrieved from memory once the system fails.

In the case of a buffer page, each buffer head keeps track of whether the corresponding buffer is dirty. A buffer page is marked dirty if any of its buffers is dirty. But while writing back a buffer page, only the dirty blocks are written.
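The per-buffer bookkeeping described above can be sketched as follows. The structures are hypothetical simplifications (the real Linux `buffer_head` has many more fields), and four buffers per page is just an assumed ratio, e.g. a 4 KiB page with 1 KiB blocks.

```cpp
#include <array>
#include <cassert>

// Simplified stand-in for a buffer head: one per block held in the page.
struct BufferHead {
    unsigned long block_number = 0;   // block number within the device
    bool dirty = false;               // buffer differs from the disk block
};

struct BufferPage {
    static constexpr int kBuffers = 4;           // assumed page/block ratio
    std::array<BufferHead, kBuffers> heads;

    // The page is dirty if any of its buffers is dirty.
    bool dirty() const {
        for (const BufferHead &h : heads)
            if (h.dirty) return true;
        return false;
    }

    // Write back only the dirty buffers; returns how many were written.
    int write_back() {
        int written = 0;
        for (BufferHead &h : heads) {
            if (!h.dirty) continue;
            // A real kernel would submit a write of block h.block_number here.
            h.dirty = false;
            ++written;
        }
        return written;
    }
};
```

Marking two of the four buffers dirty makes the whole page dirty, yet `write_back()` reports only two block writes.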

Operating System IIIT Kalyani 85 Allocation of File Space Data blocks of a file system partition are indexed sequentially. The device driver translates this index to the parameters of the block device e.g. cylinder number, head number, sector or cluster number of a hard-disk. A disk cluster a is the quantum of space allocated to a file. a A cluster is a collection of one or more contiguous sectors.

Operating System IIIT Kalyani 86 Allocation of File Space There are two essential goals of a space allocation policy for a file on a device - fast access of data from a file and good utilization of device space. We shall talk about three strategies, contiguous allocation, linked allocation and indexed allocation.

Operating System IIIT Kalyani 87 Contiguous Allocation A sequence of contiguous clusters are allocated to a file. It requires a small amount of meta-data about allocation, the first cluster address and the file size. Both sequential and direct data access are fast in this allocation.

Operating System IIIT Kalyani 88 Contiguous Allocation Unfortunately there are several problems. Disk partition may suffer from external fragmentation similar to memory. Increase in file size beyond the empty space available in the cluster may require complete reallocation of the file a. a There is the problem of initial allocation, the file size may be unknown at the beginning. Even otherwise insertion of data may lead to data copy on the device.

Operating System IIIT Kalyani 89 Linked Allocation The data blocks of a file may be allocated in non-contiguous disk clusters. Each device cluster along with the data stored in it, also stores a meta-data, the address of the next cluster. It is a linked-list of clusters. The meta-data stored in the inode is the address of the first cluster and the size of the file.

Linked Allocation
The problem with pure linked allocation is direct access to the file, which may require reading several clusters before the desired one is reached. One solution is to maintain the linked list of clusters in a table that can be loaded into main memory. This was done in the file allocation table (FAT) of MS-DOS.
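The cost of direct access under pure linked allocation can be sketched by counting cluster reads. The disk is modelled here as a dictionary from cluster number to a (data, next cluster) pair; this layout, the 4-byte cluster size and the function name are assumptions for the example.

```python
def read_linked(disk, first_cluster, cluster_size, offset):
    """Return (byte at offset, number of clusters read) for a linked file."""
    cluster = first_cluster
    reads = 0
    # Every cluster before the target must be read just to find the next link.
    for _ in range(offset // cluster_size):
        _, cluster = disk[cluster]
        reads += 1
        if cluster is None:
            raise ValueError("offset beyond end of file")
    data, _ = disk[cluster]
    reads += 1
    i = offset % cluster_size
    return data[i:i + 1], reads


# Hypothetical file stored in clusters 7 -> 2 -> 9, four bytes per cluster.
disk = {7: (b"AAAA", 2), 2: (b"BBBB", 9), 9: (b"CCCD", None)}
print(read_linked(disk, 7, 4, 11))   # (b'D', 3)
```

Reaching the last byte costs three cluster reads, one per link in the chain.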

FAT
The FAT has one entry for each cluster. The entry indicates whether the cluster is free or in use. If the cluster belongs to a file, the entry also contains the address of the next cluster of the file. The meta-data is the starting cluster number of the file. The FAT can be loaded into memory, so traversal of the linked list can be fast.
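A simplified in-memory model of such a table is sketched below. The sentinel values FREE and EOF and the table size are assumptions for the example and do not reproduce the actual on-disk FAT encoding; cluster numbers start at 2 so that 0 can mark a free entry.

```python
FREE = 0     # assumed marker: cluster not in use
EOF = -1     # assumed marker: last cluster of a file


def fat_chain(fat, start):
    """Follow the table from the starting cluster and list the file's clusters."""
    chain = []
    cluster = start
    while cluster != EOF:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def fat_locate(fat, start, cluster_size, offset):
    """Find the cluster holding a byte offset using only the in-memory table."""
    cluster = start
    for _ in range(offset // cluster_size):
        cluster = fat[cluster]
        if cluster == EOF:
            raise ValueError("offset beyond end of file")
    return cluster


fat = [FREE] * 10
fat[7], fat[2], fat[9] = 2, 9, EOF       # one file: 7 -> 2 -> 9
print(fat_chain(fat, 7))                 # [7, 2, 9]
print(fat_locate(fat, 7, 4, 11))         # 9
print([c for c in range(2, len(fat)) if fat[c] == FREE])   # [3, 4, 5, 6, 8]
```

The chain is followed entirely in memory, so only the target data cluster has to be read from the device.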

Indexed Allocation
In principle, this scheme stores the addresses of all clusters of a file in a data structure called an index node (i-node or inode). The meta-data for a file is stored in the inode. The inode block holds the addresses of the data blocks; an entry is nil if the corresponding block is empty.

Indexed Allocation
Every file has an index node (inode). The question is, what should be the size of an index node? The space used by the meta-data should not be large for a small file. But for a large file the number of clusters is large, and that requires the storage of a larger number of cluster addresses.

Indexed Allocation
One possible solution is a list of inode blocks. The first inode block of a file contains the other meta-data of the file, e.g. name, size, dates etc. It also contains the addresses of the first set of data blocks and, if necessary, a link (address) to the next index block.
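A chained index of this kind might be modelled as below. The class IndexBlock, the lookup function and the choice of four addresses per block are hypothetical; None plays the role of nil for a block that is not allocated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexBlock:
    addresses: list                       # data block addresses, None if unallocated
    next: Optional["IndexBlock"] = None   # link to the next index block, if any


def lookup(first, per_block, logical_block):
    """Return the data block address of a logical block, or None if absent."""
    block = first
    # Skip whole index blocks until the one covering logical_block is reached.
    for _ in range(logical_block // per_block):
        if block is None:
            return None
        block = block.next
    if block is None:
        return None
    return block.addresses[logical_block % per_block]


second = IndexBlock([50, 51, None, None])
first = IndexBlock([10, 11, 12, 13], second)
print(lookup(first, 4, 5))   # 51
print(lookup(first, 4, 6))   # None
```

A small file needs only the first index block, while a large file pays for one extra link per additional index block.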

Indexed Allocation
Another possibility is a multilevel index scheme. The first index block, along with other meta-data, contains the addresses of the second-level index blocks. The second-level index blocks contain the addresses of the actual data blocks. The scheme may be extended further.

Indexed Allocation
The inode of the Ext3 file system stores, along with other meta-data, the block addresses of the first few data blocks. Then there are three indirection entries. The first stores the address of a block that in turn stores the addresses of data blocks. The second stores the address of a double indirection block and the third stores the address of a triple indirection block.
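The mapping from a logical block number to a level of indirection can be sketched as follows. The figures used here are assumptions taken from the usual Ext2/Ext3 layout rather than from the text above: 12 direct addresses, 4-byte block addresses and, for the example, 1 KiB blocks. The function names are hypothetical.

```python
NDIRECT = 12   # assumed number of direct addresses in the inode


def classify(lbn, block_size=1024, addr_size=4):
    """Return (level, indices followed at each indirection step) for a block."""
    n = block_size // addr_size          # addresses held by one indirection block
    if lbn < NDIRECT:
        return ("direct", [lbn])
    lbn -= NDIRECT
    if lbn < n:
        return ("single", [lbn])
    lbn -= n
    if lbn < n * n:
        return ("double", [lbn // n, lbn % n])
    lbn -= n * n
    if lbn < n ** 3:
        return ("triple", [lbn // (n * n), (lbn // n) % n, lbn % n])
    raise ValueError("block number beyond triple indirection")


def max_blocks(block_size=1024, addr_size=4):
    """Largest number of data blocks addressable through one inode."""
    n = block_size // addr_size
    return NDIRECT + n + n * n + n ** 3


print(classify(5))      # ('direct', [5])
print(classify(300))    # ('double', [0, 32])
print(max_blocks())     # 16843020
```

Small files are reached through the direct addresses with no extra reads, and each further level multiplies the reachable size by the number of addresses per block.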

Ext3 Index Node

[Figure: Ext3 inode. The inode holds other meta-data, direct addresses that point to data blocks, and single, double and triple indirect addresses that point to indirection blocks, which in turn lead to double indirection blocks and data blocks.]

Data Inconsistency Due to System Failure
Dirty data or meta-data blocks of a file system are kept in the main memory for a long time for access efficiency. But there may be catastrophic events like a power failure or a system crash. Such an event may leave the file system in an inconsistent state.

An Example of Inconsistency
When a new file is created, (i) an inode is allocated and (ii) a directory entry is made that points to the allocated inode. If the system crashes after the inode is marked allocated but before the directory entry is written to the disk, the inode is logically lost (an inode leak).

File System Check
While mounting, a file system is checked for consistency (fsck), including whether it was properly unmounted. Such a check can detect the lost inode of the previous example, but it requires an exhaustive search. A consistency check of a file system takes time; for a large file system it may take hours.
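The exhaustive search for lost inodes can be sketched as a comparison between the inode allocation bitmap and the set of inodes reachable from directory entries. The data layout and the function name are hypothetical and far simpler than what a real fsck examines.

```python
def find_lost_inodes(inode_bitmap, directories, root_ino):
    """Return inodes marked allocated that no directory entry points to.

    inode_bitmap: list of booleans indexed by inode number
    directories:  {directory inode: {file name: inode number}}
    """
    referenced = {root_ino}
    for entries in directories.values():     # every directory must be scanned
        referenced.update(entries.values())
    return sorted(
        ino for ino, used in enumerate(inode_bitmap)
        if used and ino not in referenced
    )


# Inode 3 was marked allocated, but the crash came before its directory entry.
inode_bitmap = [False, True, True, True]
directories = {1: {"a.txt": 2}}
print(find_lost_inodes(inode_bitmap, directories, root_ino=1))   # [3]
```

The scan touches every directory and every bitmap entry, which is why the check grows slow on a large file system.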

Journaling File System
A journaling or log-based file system keeps track of modifications that have not yet been written to the file system. The data structure that stores such modifications is called a journal. A journal may log incomplete modifications of metadata only, or every block that is to be written to the file system.

Journaling File System
In general it is a two-phase process. Data or metadata to be written to the file system is first stored in the journal. Once the journal IO is complete, the data is written to the actual file system. When the file system IO is complete, the data written in the journal is discarded.

Consistency Check with Journal
At mount time the journal is checked instead of performing an exhaustive search. The system may crash before the journal IO is complete, or it may crash during the file system IO. In the first case the intended change is discarded and the file system remains consistent in its old state.

Consistency Check with Journal
In the second case the intended modification is available in the journal and the file system can be updated from it. Consistency is not to be confused with correctness. Even with journaling a process may fail to do its intended job.
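The two crash cases can be simulated with a toy model. Everything here is an assumption made for illustration: the class JournaledFS, the crash_at parameter that injects a crash, and the committed flag standing in for "journal IO complete". Real journals use on-disk transaction and commit records.

```python
class JournaledFS:
    def __init__(self):
        self.blocks = {}     # the file system proper: block name -> content
        self.journal = []    # pending records

    def write(self, updates, crash_at=None):
        # Phase 1: store the intended modification in the journal.
        record = {"updates": dict(updates), "committed": False}
        self.journal.append(record)
        if crash_at == "journal":
            return                       # crash before the journal IO completes
        record["committed"] = True
        # Phase 2: write to the actual file system.
        if crash_at == "fs":
            first = next(iter(updates))  # only one block reaches the disk
            self.blocks[first] = updates[first]
            return
        self.blocks.update(updates)
        self.journal.remove(record)      # file system IO done, discard the record

    def recover(self):
        # At mount time: replay complete records, drop incomplete ones.
        for record in self.journal:
            if record["committed"]:
                self.blocks.update(record["updates"])
        self.journal.clear()


fs = JournaledFS()
fs.write({"inode": "v1", "dir": "v1"})
fs.write({"inode": "v2", "dir": "v2"}, crash_at="fs")
print(fs.blocks)     # {'inode': 'v2', 'dir': 'v1'}
fs.recover()
print(fs.blocks)     # {'inode': 'v2', 'dir': 'v2'}
```

A crash injected with crash_at="journal" leaves an uncommitted record, which recover() drops, so the blocks keep their old values.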

Consistency Check with Journal
Normally the consistency is at the granularity of a system call. The state of the file system will be either as it was before the call or as it will be after it. A read may succeed but the following write may fail, resulting in data loss for the process.

Performance Penalty
It is clear that this two-phase process degrades performance. So a journaling file system often does not copy all blocks to the journal; the journal is maintained only for the metadata blocks of the file system. In Ext3/Ext4 file systems the metadata blocks are the superblock, group descriptor blocks, inode and data block bitmap blocks, inode blocks, and the indirect addressing blocks of files.
