1 SOFTWARE ARCHITECTURE 2. FILE SYSTEM Tatsuya Hagino hagino@sfc.keio.ac.jp lecture URL https://vu5.sfc.keio.ac.jp/sa/login.php
2 Operating System Structure
Layers, from top to bottom:
- Application
- Operating System
  - System call processing
  - File System, Process Management, Network Management, Memory Management
  - Bootstrap, Scheduler, Device Management
- Hardware
3 What is a File?
- A unit for storing information on external storage (sometimes called a data set)
- Characteristics of a file
  - Information in a file is non-volatile: its content does not disappear when the power is off.
  - Information in a file is persistent: it remains until it is deleted, so you can resume from where you left off.
- File System: manages files on external storage (SSD, HDD, USB)
  - Windows: NTFS
  - MacOS: HFS
  - Unix: UFS
- File related conventions:
  - file name
  - file structure
  - file type
  - file access method
4 File Naming Convention
- Each file has a name. In document.docx, "document" is the file name and "docx" is the file extension.
- Letters used for file names
  - case insensitive: lower and upper case letters are not distinguished.
  - case sensitive: lower and upper case letters are distinguished.
  - Encoding of non-alphabetic characters (e.g. kanji file names)
- Length of file name
  - MS-DOS limits names to 8+3 characters.
  - UNIX limits names to 255 characters.
- File extension: may express the file type
  - Windows: uses it to find the associated program.
  - Mac: holds the file type internally.
  - UNIX: depends on applications. Compilers may use it to select the programming language.
5 File Structure
- Sequence of bytes
  - No structure in a file; the user can create arbitrary fields.
  - A text file uses LF or CR to separate lines.
  - May not be efficient when reading or writing.
  - Figure: lines 1 to 4 have different lengths, each ending with LF.
- Sequence of records
  - Consists of fixed-length records.
  - For a punch card, each record consists of 80 characters (i.e. one record = one line).
  - Records can be read and written efficiently.
  - Figure: records 1 to 4 all hold the same number of characters.
6 File Type
- Regular File: text file or binary file
- Directory: folder; manages a set of files
- Character Special File: input/output device
  - Serial devices like terminal, printer, network, etc.
- Block Special File: device with block access (i.e. reads/writes blocks)
  - HDD, SSD, etc.
  - A file system can be created on it.
7 File Access Method
- Sequential access: one by one from beginning to end
  - Reads/writes a file one byte at a time, sequentially.
  - Cannot skip or change the order of reading/writing.
  - May rewind and start from the beginning again.
- Random access: in any order
  - Reads/writes a file in any order.
  - In UNIX, the next position can be specified using the seek system call (lseek).
8 Hierarchical File System
- Implemented by allowing a directory to contain other directories: a parent directory holds regular files and child directories.
- Implementation of a directory:
  - Each directory consists of a list of entries (e.g. a.docx, bb.c, sa.tex, xyz).
  - Each entry consists of a file name and a pointer to a file.
  - The file holds its information (size, owner, access, blocks) and points to its file blocks.
  - A file can be pointed to from more than one directory: file sharing (hard link).
- Specifying a file
  - Use a path name: concatenate file names with a separator character.
  - UNIX: /   Windows: \   Old Mac OS: :
9 Path Name
- Absolute path name
  - Starting from the root directory, subdirectory names are listed with / separators.
  - /home/hagino/sa16/02.pptx
- Relative path name
  - Starting from the current working directory
  - sa16/02.pptx
- Example directory tree
  - / (root directory) contains bin, sbin, usr, var, home, etc, tmp.
  - /home contains root and hagino; /home/hagino is the current working directory.
  - /home/hagino contains bin and sa16; sa16 contains 01.pptx and 02.pptx.
10 File Read/Write Mechanism
- UNIX as an example
  - Directly use system calls to read/write a file
  - Use standard input/output libraries
- Layers, from top to bottom:
  - Application
  - Standard Input/Output Library
  - OS: File System related System Calls, Buffer Management, Device Driver
  - Hardware: HDD, SSD
11 File System related System Calls
- open: start access; takes a file name (path name) and returns a File Descriptor
- read: read from the file
- write: write to the file
- seek: change the position of the next read/write
- ioctl: attributes and other operations
- close: end access
12 File Descriptor
- A small number returned by the open system call
  - specifies an opened file
  - used by read/write
  - remembers the position of read/write
- Managed by each process
  - The meaning of file descriptors is local to each process.
  - Even in a single process, if the same file is opened more than once, different file descriptors are assigned.
- Special file descriptors:
  - standard input: 0
  - standard output: 1
  - standard error output: 2
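13 Example program of reading a file using system calls

    #include <fcntl.h>
    #include <unistd.h>   /* read, write, close */

    int main(int argc, char *argv[])
    {
        int fd;
        ssize_t n;
        char buf[512];

        if (argc < 2)
            return 1;
        fd = open(argv[1], O_RDONLY);
        if (fd < 0)                     /* open failed */
            return 1;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(1, buf, n);           /* 1 = standard output */
        close(fd);
        return 0;
    }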
14 Standard Input/Output Library
- System call
  - Not easy to use: e.g. there is no system call to read one line.
  - Inefficient for small reads and writes
    - A system call needs to cross the OS protection boundary (user mode to supervisor mode).
    - Costs much more than calling a library function.
- Standard input/output library
  - Useful interface: read one line, read one character
  - Optimizes system call access by using a buffer
- Layers: the application calls fopen, fclose, fgetc, fputc, fgets, fputs in the standard input/output library; the library keeps a buffer and calls the open, close, read, write system calls.
15 How to Use Standard Input/Output Libraries
- fopen: start access; takes a file name (path name) and returns a FILE structure
- fgetc: read one character
- fgets: read one line
- fscanf: convert string to value
- fputc: write one character
- fputs: write one line
- fprintf: convert value to string
- fclose: end access
16 Example program of reading a file using standard input/output libraries

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        FILE *fp;
        int ch;

        if (argc < 2)
            return 1;
        fp = fopen(argv[1], "r");
        if (fp == NULL)                 /* fopen failed */
            return 1;
        while ((ch = fgetc(fp)) != EOF)
            fputc(ch, stdout);
        fclose(fp);
        return 0;
    }
17 Implementation of Standard Input/Output Library
- FILE structure (kept in the application, above the system call boundary)

    struct FILE {
        int fd;              /* file descriptor from open */
        char buf[BUFSIZE];   /* holds data temporarily */
        int size;            /* number of bytes in buf */
        int counter;         /* next read/write position in buf */
    };
    typedef struct FILE FILE;

- fopen

    FILE *fopen(char *path, char *mode)
    {
        FILE *fp;

        fp = (FILE *)malloc(sizeof(struct FILE));
        if (fp == NULL)
            return NULL;
        fp->fd = open(path,...);   /* flags are elided on the slide */
        fp->size = 0;
        fp->counter = 0;
        return fp;
    }
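18 Implementation of fgetc
- Assume BUFSIZE = 512.
  (1) fgetc checks buf. If there is data in buf, no read is issued.
  (2) Otherwise it reads 512 bytes from the file through the OS.
  (3) buf is filled with up to 512 bytes.
  (4) fgetc returns one character from buf.

    int fgetc(FILE *fp)
    {
        if (fp->counter >= fp->size) {            /* buf is used up */
            fp->size = read(fp->fd, fp->buf, BUFSIZE);
            if (fp->size <= 0)
                return -1;                        /* end of file or error */
            fp->counter = 0;
        }
        /* cast so that byte 0xFF is not mistaken for -1 */
        return (unsigned char)fp->buf[fp->counter++];
    }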
19 Implementation of System Call
- Process Management Structure
  - File Descriptor Table: slots 0, 1, 2, 3, 4, each pointing to a File Descriptor
- File Descriptor
  - For each opened file
  - Holds the read/write position
  - Points to an inode
- inode (index node)
  - For each file
  - Manages file information: owner, size, file blocks
  - Holds the block numbers of the file on the Block Device
20 Implementation of open
- Returns a file descriptor for a given file
  - Calls namei to find its inode.
  - Finds an empty file descriptor slot and sets a new file structure.
- file structure

    struct file {
        struct inode *inode;
        long offset;
        int refcount;
    };

- open (pseudocode)

    int open(char *path, int flags,...)
    {
        struct inode *ip;
        int fd;
        struct file *fp;

        ip = namei(path);
        if (ip) {
            fp = create a new file structure;
            fp->inode = ip;
            fp->offset = 0;
            fp->refcount = 1;
            fd = unused file descriptor of proc;
            proc->file[fd] = fp;
            return fd;
        } else
            return -1;
    }
21 namei
- Following the path, checks each directory to find the named file (pseudocode).

    struct inode *namei(char *path)
    {
        struct inode *dp;

        if (*path == '/') {
            dp = proc->root_directory;
            path++;
        } else
            dp = proc->current_working_directory;
        while (*path) {
            name = select next name from path;   /* advances path */
            dp = lookup(dp, name);
            if (dp == NULL)                      /* no such entry */
                return NULL;
        }
        return dp;
    }
22 Implementation of File
- Each file consists of a list of blocks on a disk (or a block device).
  - UNIX uses an inode to manage block numbers.
  - The FAT file system uses a file allocation table (FAT) to manage them.
- A file system on a block device consists of inode blocks and data blocks.
- inode structure

    struct inode {
        short mode;
        int uid;            /* owner */
        int gid;
        long size;          /* size */
        ...
        int atime;
        int mtime;          /* modification time */
        int ctime;
        ...
        int direct[12];     /* direct reference block numbers (12 blocks) */
        int indirect[3];    /* indirect block numbers (3 blocks) */
    };
23 Direct and Indirect Blocks
- Each inode points to 12 direct blocks.
  - When one block is 512 bytes, the direct blocks can hold a file of up to 512 × 12 = 6 KB.
- An indirect block contains a list of block numbers.
- inode layout: inode information, direct 1 to direct 12, indirect 1, indirect 2, indirect 3. Each indirect entry points to an indirect block, which may in turn point to further indirect blocks.
24 Calculation of Block Number
- Calculates the corresponding block number from a file position.
- A file descriptor holds the read/write position, and the next read/write accesses the corresponding block.

    int balloc(struct inode *ip, long offset)
    {
        struct buf *bp;
        long blk, blocks;
        int i;

        blk = offset / BLKSIZE;              /* index of the block holding offset */
        if (blk < 12)
            return ip->direct[blk];
        blk -= 12;
        blocks = BLKSIZE / sizeof(int);      /* block numbers per indirect block */
        for (i = 0; i < 3; i++) {
            if (blk < blocks)
                break;
            blk -= blocks;
            blocks *= BLKSIZE / sizeof(int);
        }
        if (i == 3)
            return -1;                       /* beyond the largest file */
        bp = getblock(ip->indirect[i]);
        while (i-- > 0) {
            blocks /= BLKSIZE / sizeof(int);
            /* the buffer contents are viewed as an array of block numbers */
            bp = getblock(((int *)bp->buf)[blk / blocks]);
            blk %= blocks;
        }
        return ((int *)bp->buf)[blk];
    }
25 Buffering
- Disk blocks are cached in OS memory.
- Read:
  - First time: reads the block from the disk and puts it in the cache.
  - Second time: does not access the disk but uses the copy in the cache.
- Write:
  - Does not write to the disk immediately.
  - Writes the blocks out together later.
- Buffers are found through a hash table and are also linked in a queue of recently used buffers.
- buf structure

    struct buf {
        struct buf *next, *prev;               /* hash chain */
        struct buf *queue_next, *queue_prev;   /* recently used queue */
        struct inode *inode;
        int dirty;                             /* modified, not yet on disk */
        int blkno;
        char buf[BLKSIZE];
    };
    struct buf buffers[hash];                  /* hash table for buffers */
26 Disk Format
- A new disk needs to be formatted.
- File system on a disk:
  - Superblock: disk information
  - inode: inode information
  - list of unused blocks
  - blocks: actual blocks
- Unused blocks are managed by a list or a bitmap.
27 Summary
- File System
  - UNIX file system as an example.
- System call vs Standard Input/Output Library
  - Implementation of the library
  - File Descriptor
- Implementation of File System
  - inode
  - Direct and indirect blocks