Local File Stores
CIS657

Job of a File Store
Recall that the File System is responsible for namespace management, locking, quotas, etc. The File Store's responsibility is to manage the placement of data and index blocks on the disk. Prior to 4.4BSD, these were rolled into one module, also called the filesystem (remember, there were no vnodes).

Physical Disk Layout
[Figure: overhead view of a disk showing a sector, a track, and a cylinder]
Disk blocks are composed of one or more contiguous sectors. The same track on each platter of a disk forms a cylinder; partitions are groups of contiguous cylinders.
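To make the cylinder/track/sector vocabulary concrete, here is a minimal sketch of classic CHS (cylinder, head, sector) to linear-sector translation. It assumes the geometry reported in the sample partition table (255 heads, 63 sectors per track, 512-byte sectors) and the traditional convention that sectors are numbered from 1 within a track; the macro and function names are hypothetical, and modern drives hide their real geometry behind logical block addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, taken from the sample partition table:
 * 255 heads (one track per head per cylinder), 63 sectors per track. */
#define HEADS_PER_CYL   255u
#define SECTORS_PER_TRK 63u
#define SECTOR_SIZE     512u

/* Convert a (cylinder, head, sector) address to a linear sector number.
 * Cylinders and heads count from 0, sectors from 1. All tracks of one
 * cylinder are numbered before moving to the next cylinder, so
 * consecutive sector numbers need no head movement within a cylinder. */
static uint64_t chs_to_lba(uint32_t cyl, uint32_t head, uint32_t sector)
{
    assert(head < HEADS_PER_CYL);
    assert(sector >= 1 && sector <= SECTORS_PER_TRK);
    return ((uint64_t)cyl * HEADS_PER_CYL + head) * SECTORS_PER_TRK
           + (sector - 1);
}

/* Bytes in one cylinder: every sector of every track under every head. */
static uint64_t cylinder_bytes(void)
{
    return (uint64_t)HEADS_PER_CYL * SECTORS_PER_TRK * SECTOR_SIZE;
}
```

With this geometry one cylinder holds 255 * 63 = 16,065 sectors, matching the "Units = cylinders of 16065 * 512 bytes" line of the partition table.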
Sample Partition Table
Disk /dev/sda: 255 heads, 63 sectors, 1106 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1       261   2096451   83  Linux
/dev/sda2           262      1106   6787462+   5  Extended
/dev/sda5           262       264     24066   83  Linux
/dev/sda6           265       814   4417843+  83  Linux
/dev/sda7           815      1089   2208906   83  Linux
/dev/sda8          1090      1106    136521   82  Linux swap

File Store Operations
[Figure: the vnode layer above the file store]

Operation                            Purpose
valloc, vfree                        Object creation and deletion
update                               Attribute update
vget, blkatoff, read, write, fsync   Object read/write
truncate                             Change in space allocation (size)

valloc and vfree
valloc creates a new object and returns a number (identifier) for it. Mapping names to identifiers is the job of the namespace code; the filestore deals only with numbers. vfree takes the number of an object and releases the storage holding that object.
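The division of labor between the namespace code and the filestore can be illustrated with a toy allocator. This is a minimal sketch, assuming objects are tracked in a simple in-memory allocation map; toy_valloc, toy_vfree, and NOBJECTS are hypothetical names, not the real 4.4BSD interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy filestore: objects are known only by number. */
#define NOBJECTS 64

static bool in_use[NOBJECTS];   /* allocation map, one flag per object */

/* Create a new object and return its number, or -1 if none are free.
 * Number 0 is never handed out so it can mean "no object"
 * (UFS likewise never allocates inode 0). */
static int toy_valloc(void)
{
    for (int ino = 1; ino < NOBJECTS; ino++) {
        if (!in_use[ino]) {
            in_use[ino] = true;
            return ino;
        }
    }
    return -1;
}

/* Release the storage held by object ino. */
static void toy_vfree(int ino)
{
    assert(ino > 0 && ino < NOBJECTS && in_use[ino]);
    in_use[ino] = false;
}
```

Nothing here knows about names: a directory entry maintained by the namespace code would pair a name with the returned number.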
update
Changes the attributes of an object: owner, group, permissions, and timestamps. The filestore does not interpret these fields in any way.

vget, read, write, blkatoff, fsync
- vget retrieves an entire object from the filestore.
- read copies data from the object into a buffer (using a uio structure).
- write copies data from a buffer to the object (using a uio structure).
- blkatoff is like read, but returns a pointer to the data instead of copying it (the data stays in the kernel).
- fsync writes out all dirty buffers for the object.

truncate
Changes the amount of space an object has. Historically, truncate could only shorten objects (decrease their size). In 4.4BSD, truncate can also extend an object, despite its confusing name.
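The difference between read (copy into the caller's buffer) and blkatoff (hand back a pointer into the cached block), and the two directions of truncate, can be sketched on a toy in-memory object. This assumes a tiny fixed block size and a fixed maximum object size; toy_obj, toy_read, toy_blkatoff, and toy_truncate are hypothetical names, and the real operations work through uio structures and the buffer cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BSIZE  16   /* hypothetical block size */
#define TOY_MAXBLK 8    /* hypothetical maximum object size in blocks */

struct toy_obj {
    size_t size;                           /* bytes in the object */
    char   blocks[TOY_MAXBLK][TOY_BSIZE];  /* cached data blocks */
};

/* read: copy up to len bytes starting at off into the caller's buffer. */
static size_t toy_read(const struct toy_obj *o, size_t off, char *buf, size_t len)
{
    size_t done = 0;
    while (done < len && off + done < o->size) {
        size_t pos = off + done;
        size_t blk = pos / TOY_BSIZE, boff = pos % TOY_BSIZE;
        size_t n = TOY_BSIZE - boff;                /* rest of this block */
        if (n > len - done) n = len - done;
        if (n > o->size - pos) n = o->size - pos;   /* stop at end of object */
        memcpy(buf + done, &o->blocks[blk][boff], n);
        done += n;
    }
    return done;
}

/* blkatoff: return a pointer into the cached block holding off (no copy). */
static char *toy_blkatoff(struct toy_obj *o, size_t off)
{
    assert(off < o->size);
    return &o->blocks[off / TOY_BSIZE][off % TOY_BSIZE];
}

/* truncate: set the size; growing zero-fills the newly exposed bytes. */
static int toy_truncate(struct toy_obj *o, size_t newsize)
{
    if (newsize > (size_t)TOY_MAXBLK * TOY_BSIZE)
        return -1;
    for (size_t pos = o->size; pos < newsize; pos++)
        o->blocks[pos / TOY_BSIZE][pos % TOY_BSIZE] = 0;
    o->size = newsize;
    return 0;
}
```

A caller of toy_blkatoff writes straight into the cached block, which is why the pointer is only valid inside that one block.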
Filestores and Partitions
There is a one-to-one relationship between filestores and partitions: no more than one filestore per partition, and a filestore may not span partitions. The filestore is responsible for managing space within its partition:
- creation, storage, retrieval, and deletion of files
- a flat name space (inode numbers and data block numbers)

Allocation Strategies
In the old days, allocation was contiguous: all blocks of a file were stored together, so as a file grew it had to move around the disk, and the disk required compaction. Indexed allocation (inodes) allows non-contiguous (scattered) allocation, so no compaction is necessary.

Block I/O
Even if a user asks for just one byte, the disk transfers a whole block. The file system divides files into fixed-size logical blocks, whose size depends on the underlying filestore. Logical blocks are stored in physical disk blocks, each made up of one or more contiguous sectors; e.g., with 8,192-byte blocks and 512-byte sectors, each block occupies 16 contiguous sectors.
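The byte-to-block arithmetic behind block I/O is short enough to show directly. A minimal sketch using the 8,192-byte block and 512-byte sector example from the text; it assumes physical block numbers are counted in whole blocks, which is a simplification (FFS actually counts disk addresses in fragment-sized units). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters from the example: 8,192-byte blocks on 512-byte sectors. */
#define BSIZE             8192u
#define SECTOR_SIZE       512u
#define SECTORS_PER_BLOCK (BSIZE / SECTOR_SIZE)   /* 16 contiguous sectors */

static_assert(BSIZE % SECTOR_SIZE == 0, "a block must be whole sectors");

/* Which logical block of the file holds byte offset off? */
static uint64_t logical_block(uint64_t off) { return off / BSIZE; }

/* Where inside that logical block does the byte fall? */
static uint32_t offset_in_block(uint64_t off) { return (uint32_t)(off % BSIZE); }

/* First sector of physical block pblk (assumed counted in BSIZE units). */
static uint64_t first_sector(uint64_t pblk) { return pblk * SECTORS_PER_BLOCK; }
```

A one-byte read at offset 20,000, for example, falls in logical block 2 at offset 3,616 and still transfers the whole 16-sector block.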
Disk Request Handling
The user sees the file as an array of bytes and makes a request with a pointer to a buffer and a length. There are no guarantees that the request is aligned to block boundaries, nor that its size is a multiple of the block size. Disk blocks are buffered in the filesystem buffer cache (remember the unified buffer cache).

Steps in Request Handling: Example of a Simple Write
For each block touched by the request:
1. Allocate a buffer.
2. Determine the location of the physical block on disk.
3. Ask the disk controller to read the block's current contents into the buffer, and wait (the rest of the block must be preserved when only part of it is overwritten).
4. Copy from the user's I/O buffer to the system buffer.
5. Write the block to disk and continue (don't wait).

Anatomy of a Request: write(fd, buffer, cnt)
[Figure: the user's buffer maps onto logical blocks 0 to 3 of the file; each logical block is staged in a system buffer and stored in a scattered physical disk block (block numbers such as 82653, 12767, 32447, and 90255)]
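The write loop above can be sketched against a simulated disk. This is an illustration under stated assumptions: an 8-byte block size, a fixed hypothetical logical-to-physical map (file_map), and synchronous copies in place of the asynchronous write in step 5. It also skips the read when a whole block is being replaced, since none of the old contents survive in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BSIZE 8    /* hypothetical tiny block size */
#define NBLK  16   /* hypothetical disk size in blocks */

static char disk[NBLK][BSIZE];              /* simulated disk */
static int  file_map[4] = {13, 5, 9, 2};    /* logical -> physical block */

/* Write cnt bytes from buf into the file at byte offset off,
 * handling one logical block per iteration. */
static void toy_write(size_t off, const char *buf, size_t cnt)
{
    while (cnt > 0) {
        size_t lbn = off / BSIZE, boff = off % BSIZE;
        size_t n = BSIZE - boff;
        if (n > cnt) n = cnt;
        assert(lbn < sizeof file_map / sizeof file_map[0]);

        char sysbuf[BSIZE];                  /* 1. allocate a buffer */
        int pbn = file_map[lbn];             /* 2. locate physical block */
        if (n < BSIZE)                       /* 3. partial block: read old data */
            memcpy(sysbuf, disk[pbn], BSIZE);
        memcpy(sysbuf + boff, buf, n);       /* 4. copy from user buffer */
        memcpy(disk[pbn], sysbuf, BSIZE);    /* 5. write back (synchronous here) */

        off += n;
        buf += n;
        cnt -= n;
    }
}
```

A 10-byte write at offset 6 touches the tail of logical block 0 (a read-modify-write) and all of logical block 1 (a plain overwrite).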
Traditional Unix File Systems
Filesystem descriptive information is kept in the superblock:
- number of data blocks
- maximum number of files
- pointer to the free list
About 3% of the blocks were inodes. All inodes were grouped together, followed by the data blocks. Blocks were 512 bytes, and consecutive blocks of a file were often on different cylinders, which drives up seek time per byte transferred.

Berkeley Old (3BSD) File System
Goals: improve reliability and throughput.
- Stage modifications to critical file structures so that they are atomic, facilitating recovery.
- Double the block size (to 1,024 bytes): twice as much data is transferred on each read, which more than doubled performance, and more files fit entirely within their direct blocks.

Problems with the Old File System
The free list started out well grouped, but as files were created and deleted it became fragmented, leading to essentially random placement of data blocks. Throughput dropped by a factor of 5 to 6 within a few weeks.
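The fragmentation problem can be reproduced with a toy model. A minimal sketch, assuming the free list behaves as a LIFO stack (freed blocks are pushed on the front and allocations pop from the front), which simplifies the real on-disk free list; the names and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 16   /* hypothetical tiny disk */

static int freelist[NBLOCKS];   /* stack of free block numbers */
static int nfree;

/* Fresh file system: the top of the stack is block 0, then 1, 2, ... */
static void fl_init(void)
{
    nfree = 0;
    for (int b = NBLOCKS - 1; b >= 0; b--)
        freelist[nfree++] = b;
}

static int fl_alloc(void) { assert(nfree > 0); return freelist[--nfree]; }
static void fl_free(int b) { assert(nfree < NBLOCKS); freelist[nfree++] = b; }

/* Count places where consecutive file blocks are not physically adjacent
 * (each one would cost a seek or at least a missed rotation). */
static int count_jumps(const int *blk, size_t n)
{
    int jumps = 0;
    for (size_t i = 1; i < n; i++)
        if (blk[i] != blk[i - 1] + 1)
            jumps++;
    return jumps;
}
```

Two files that grow at the same time interleave their blocks; once one is deleted, the next file receives that file's blocks in reverse, scattered order.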
Key Observation
What is the dominant factor in disk operations? Seek time (moving the heads between cylinders). Keeping all the data blocks for a file on the same cylinder, or on a few nearby cylinders, would reduce it. Inodes need to be kept near the data, too.

Berkeley Fast File System (UFS1)
- 4,096-byte blocks (or a larger power of 2); the block size is recorded in the superblock.
- Allows 2^32-byte (4 GB) files with only two levels of indirection.
- Uses cylinder groups to reduce scattering.

Cylinder Groups
A cylinder group is a set of consecutive cylinders on the disk; a file's inode and data stay in the same cylinder group. Each group holds bookkeeping information:
- a redundant copy of the superblock
- a bitmap of free blocks
- summary information about allocation
- the group's inodes, by default 1 inode per 2,048 bytes of space in the group (more than should be needed)
The bookkeeping information is staggered across platters for availability, so that a single platter failure cannot destroy every copy.
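The 2^32 figure follows from the indirect-block arithmetic. A minimal sketch, assuming a UFS1-style inode with 12 direct block pointers and 4-byte block pointers inside indirect blocks; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 12 direct pointers in the inode, 4-byte block pointers. */
#define NDADDR   12u
#define PTR_SIZE 4u

/* Largest file addressable with the direct blocks plus `levels` levels of
 * indirection (1 = single indirect, 2 = single + double indirect, ...). */
static uint64_t max_file_bytes(uint64_t bsize, unsigned levels)
{
    assert(bsize % PTR_SIZE == 0);
    uint64_t nindir = bsize / PTR_SIZE;   /* pointers per indirect block */
    uint64_t blocks = NDADDR;             /* reachable via direct pointers */
    uint64_t span = 1;
    for (unsigned l = 1; l <= levels; l++) {
        span *= nindir;                   /* blocks reachable at level l */
        blocks += span;
    }
    return blocks * bsize;
}
```

With 4,096-byte blocks an indirect block holds 1,024 pointers, so the double-indirect level alone reaches 1,024 * 1,024 blocks of 4,096 bytes, exactly 2^32 bytes; the direct and single-indirect blocks add a little more.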
Wasted Space with Block Sizes

Block size (bytes)   % total waste   % data waste   % inode waste
512                   7.4             1.1            6.3
1024                  8.8             2.5            6.3
2048                 11.7             5.4            6.3
4096                 15.4            12.3            3.1
8192                 29.4            27.8            1.6
16384                62.0            61.2            0.8

A 1993 survey found a median file size under 2,048 bytes and a mean of about 22 KB.

Tradeoffs
The table above shows the tradeoff between block size and waste. Throughput goes up with block size. Why? Is throughput necessarily a good measure? Maximum file size also goes up with block size. How can we get the best of both worlds? Fragments (uniform pieces of blocks) can be allocated; e.g., a 4096/1024 file system has 4,096-byte blocks that can be split into 1,024-byte fragments.

Parameterized Filesystems
File system performance can depend on many factors:
- processor speed
- hardware (controller) support for large transfers and caching
- maximum disk bandwidth
- rotational and seek latencies
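The payoff from fragments can be estimated with a little arithmetic. A simplified sketch, assuming every file gets whole blocks except for its tail, which is rounded up to whole fragments; the real FFS allocator is more restrictive about when fragments are used, and alloc_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes allocated for a file of `size` bytes on a bsize/fsize file system.
 * With fsize == bsize this reduces to whole-block allocation. */
static uint64_t alloc_bytes(uint64_t size, uint64_t bsize, uint64_t fsize)
{
    assert(fsize > 0 && bsize % fsize == 0);
    uint64_t full  = size / bsize;                 /* whole blocks */
    uint64_t tail  = size % bsize;                 /* leftover bytes */
    uint64_t frags = (tail + fsize - 1) / fsize;   /* round tail up */
    return full * bsize + frags * fsize;
}
```

For a 5,000-byte file, a 4096/1024 file system allocates 4,096 + 1,024 = 5,120 bytes (120 wasted), while whole 4,096-byte blocks would allocate 8,192 bytes (3,192 wasted).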
Layout Policies
- Global: group related data within the same cylinder group (looked at another way, spread unrelated data across cylinder groups). This governs coarse-grained inode and data block allocation.
- Local: decide which data blocks to allocate, taking into account the disk's ability to read contiguous blocks.

More on Global Layout
Inodes: inodes in the same directory are often accessed together (e.g., by ls). When allocating space for a new directory, find a cylinder group with few directories and a greater-than-average number of free inodes. Why?
Data blocks: allocate space for large files across cylinder groups. This keeps the blocks within each group contiguous and prevents any one cylinder group from becoming too full (which would force other files to spill over).
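The directory-placement rule can be read as a filter followed by a minimum. A minimal sketch, assuming per-group counts of free inodes and directories are available in memory; cg_summary and pick_dir_cg are hypothetical names. It uses "at least the average" rather than strictly greater, so that some group always qualifies even when every group has the same number of free inodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cylinder-group counters. */
struct cg_summary {
    int nifree;   /* free inodes in the group */
    int ndir;     /* directories already in the group */
};

/* Among groups with at least the average number of free inodes, pick the
 * one with the fewest directories. Returns the group index, or -1 if ncg <= 0. */
static int pick_dir_cg(const struct cg_summary *cg, int ncg)
{
    if (ncg <= 0)
        return -1;
    assert(cg != NULL);

    long total = 0;                          /* free inodes in all groups */
    for (int i = 0; i < ncg; i++)
        total += cg[i].nifree;

    int best = -1;
    for (int i = 0; i < ncg; i++) {
        /* nifree >= total / ncg, rearranged to avoid division */
        if ((long)cg[i].nifree * ncg < total)
            continue;                        /* below average: skip */
        if (best < 0 || cg[i].ndir < cg[best].ndir)
            best = i;
    }
    return best;   /* the group with the most free inodes always qualifies */
}
```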