Disk Management and File Systems. Dr. Yingwu Zhu


Review: Want Standard Interfaces to Devices
Block Devices: e.g. disk drives, tape drives, CD-ROM
- Access blocks of data
- Commands include open(), read(), write(), seek()
- Raw I/O or file-system access; memory-mapped file access possible
Character Devices: e.g. keyboards, mice, serial ports, some USB devices
- Single characters at a time
- Commands include get(), put()
- Libraries layered on top allow line editing
Network Devices: e.g. Ethernet, Wireless, Bluetooth
- Different enough from block/character devices to have their own interface
- Unix and Windows include the socket interface: separates network protocol from network operation; includes select() functionality
- Usage: pipes, FIFOs, streams, queues, mailboxes

Review: How Does User Deal with Timing?
Blocking Interface: "Wait"
- When reading data (e.g. read() system call), put the process to sleep until data is ready
- When writing data (e.g. write() system call), put the process to sleep until the device is ready for data
Non-blocking Interface: "Don't Wait"
- Returns quickly from a read or write request with a count of bytes successfully transferred
- Read may return nothing; write may write nothing (see the sketch below)
Asynchronous Interface: "Tell Me Later"
- When requesting data, take a pointer to the user's buffer and return immediately; later the kernel fills the buffer and notifies the user
- When sending data, take a pointer to the user's buffer and return immediately; later the kernel takes the data and notifies the user
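
A minimal user-space sketch of the blocking vs. non-blocking distinction, using the standard POSIX O_NONBLOCK flag; the device path is hypothetical.

    /* Non-blocking read with POSIX O_NONBLOCK; device path is hypothetical. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>

    int main(void) {
        char buf[512];
        int fd = open("/dev/ttyS0", O_RDONLY | O_NONBLOCK); /* assumed device */
        if (fd < 0) { perror("open"); return 1; }

        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            printf("got %zd bytes immediately\n", n);
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            printf("no data ready; a blocking read would have slept here\n");

        close(fd);
        return 0;
    }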

Review: How does the processor talk to the device?
[Figure: CPU connected through an interrupt controller and memory bus to a bus adaptor, other devices/buses, and a device controller; the controller exposes read/write/control/status registers (e.g. port 0x20) and addressable memory and/or queues (e.g. memory-mapped region 0x8f008020)]
CPU interacts with a Controller
- Contains a set of registers that can be read and written
- May contain memory for request queues or bit-mapped images
Regardless of the complexity of the connections and buses, the processor accesses device registers in two ways:
- I/O instructions: in/out instructions; example from the Intel architecture: out 0x21,AL
- Memory-mapped I/O: load/store instructions; registers/memory appear in the physical address space, and I/O is accomplished with loads and stores

Review: Memory-Mapped Display Controller Example
Memory-Mapped: hardware maps control registers and display memory into the physical address space
- Addresses set by hardware jumpers or by programming at boot time
- Simply writing to display memory (also called the "frame buffer") changes the image on screen (addresses 0x8000F000-0x8000FFFF)
- Writing a graphics description to the command-queue area (0x80010000-0x8001FFFF): say, a set of triangles that describe some scene
- Writing to the command register (0x0007F004) may cause the on-board graphics hardware to do something: say, render the above scene
- Can protect these regions with page tables
[Figure: physical address space showing the Graphics Command Queue, Display Memory, and Command/Status registers at the addresses above]
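
A minimal sketch of what memory-mapped I/O looks like in code, using the slide's example addresses; it assumes the regions are already mapped for us (a real driver would first obtain the mappings, e.g. via the kernel), and the command value is hypothetical.

    /* Memory-mapped I/O sketch using the slide's example addresses. */
    #include <stdint.h>

    #define FRAME_BUFFER   ((volatile uint32_t *)0x8000F000u)
    #define COMMAND_QUEUE  ((volatile uint32_t *)0x80010000u)
    #define COMMAND_REG    ((volatile uint32_t *)0x0007F004u)

    void draw_something(void) {
        FRAME_BUFFER[0]  = 0x00FF0000;  /* a store = paint a red pixel          */
        COMMAND_QUEUE[0] = 42;          /* enqueue a (hypothetical) triangle cmd */
        *COMMAND_REG     = 1;           /* kick the hardware: "render the scene" */
    }

The volatile qualifier is the essential detail: it forces each load/store to actually reach the device instead of being optimized away or reordered by the compiler.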

Our focus
- Disk performance: hardware performance parameters; queuing theory (introduced)
- File systems: structure, naming, directories, and caching

Transferring Data To/From Controller
Programmed I/O: each byte transferred via processor in/out or load/store
- Pro: simple hardware, easy to program
- Con: consumes processor cycles proportional to data size
Direct Memory Access (DMA): give the controller access to the memory bus
- Ask it to transfer data to/from memory directly
Sample interaction with DMA controller:
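
The slide's figure is not in the transcription; the commented sketch below reconstructs the usual sequence of steps in a driver/DMA-controller interaction. All helper functions and register names here are hypothetical.

    /* Sketch (not a real driver): typical driver <-> DMA controller handshake.
       write_reg(), issue_disk_read(), sleep_until_interrupt() are hypothetical. */
    #include <stdint.h>

    #define BLOCK_SIZE 4096
    enum { DMA_ADDR, DMA_COUNT, DMA_CMD, DMA_START = 1 };
    extern void write_reg(int reg, uintptr_t value);  /* hypothetical MMIO write */
    extern void issue_disk_read(uint32_t block);      /* hypothetical            */
    extern void sleep_until_interrupt(void);          /* hypothetical            */

    void dma_read_block(void *dest, uint32_t disk_block) {
        issue_disk_read(disk_block);          /* 1. tell the disk controller what to fetch     */
        write_reg(DMA_ADDR, (uintptr_t)dest); /* 2. program DMA controller with memory address */
        write_reg(DMA_COUNT, BLOCK_SIZE);     /* 3. ...and transfer length                     */
        write_reg(DMA_CMD, DMA_START);        /* 4. start transfer; CPU does other work        */
        sleep_until_interrupt();              /* 5-6. controller moves bytes, then interrupts  */
    }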

A Kernel I/O Structure

Device Drivers
Device Driver: device-specific code in the kernel that interacts directly with the device hardware
- Supports a standard, internal interface, so the same kernel I/O system can interact easily with different device drivers
- Special device-specific configuration is supported with the ioctl() system call
Device drivers are typically divided into two pieces:
- Top half: accessed in the call path from system calls; implements a set of standard, cross-device calls like open(), close(), read(), write(), ioctl(), strategy(). This is the kernel's interface to the device driver. The top half starts I/O to the device and may put the thread to sleep until it finishes.
- Bottom half: runs as an interrupt routine; gets input or transfers the next block of output, and may wake sleeping threads if the I/O is now complete

Life Cycle of An I/O Request
[Figure: a request flows down from User Program through Kernel I/O Subsystem, Device Driver Top Half, and Device Driver Bottom Half to the Device Hardware, and the response returns back up the same layers]

I/O Device Notifying the OS
The OS needs to know when:
- The I/O device has completed an operation
- The I/O operation has encountered an error
I/O Interrupt: the device generates an interrupt whenever it needs service
- Handled in the bottom half of the device driver; often run on a special kernel-level stack
- Pro: handles unpredictable events well
- Con: interrupts have relatively high overhead
Polling: the OS periodically checks a device-specific status register
- The I/O device puts completion information in the status register
- Could use a timer to invoke the lower half of drivers occasionally
- Pro: low overhead
- Con: may waste many cycles on polling if I/O operations are infrequent or unpredictable
Actual devices combine both polling and interrupts. For instance, a high-bandwidth network device: interrupt for the first incoming packet, then poll for following packets until the hardware is empty.

Hard Disk Drives
[Figure: read/write head (side view); a Western Digital drive; an IBM/Hitachi Microdrive. Source: http://www.storagereview.com/guide/]

Properties of a Hard Magnetic Disk
[Figure: platters divided into tracks and sectors]
Properties:
- Independently addressable element: the sector; the OS always transfers groups of sectors together ("blocks")
- A disk can directly access any given block of information it contains (random access), so any file can be accessed either sequentially or randomly
- A disk can be rewritten in place: it is possible to read/modify/write a block on the disk
Typical numbers (depending on the disk size):
- 500 to more than 20,000 tracks per surface
- 32 to 800 sectors per track; a sector is the smallest unit that can be read or written
Zoned bit recording:
- Constant bit density: more sectors on outer tracks
- Speed varies with track location

Disk I/O Performance
[Figure: user thread -> queue (OS paths) -> controller -> disk; response time = queue time + disk service time. Plot of response time (ms, 0-300) vs. throughput (utilization, % of total bandwidth, 0-100%): latency grows sharply as utilization approaches 100%]
Performance of a disk drive/file system:
- Metrics: response time, throughput
- Contributing factors to latency: software paths (can be loosely modeled by a queue), the hardware controller, and the physical disk media
- Queuing behavior: can lead to large increases in latency as utilization approaches 100%

Magnetic Disk Characteristic
[Figure: platter with tracks and sectors; the read/write head on an arm above the platter]
- Cylinder: all the tracks under the heads at a given arm position, across all surfaces
Read/write data is a three-stage process:
- Seek time: position the head/arm over the proper track (into the proper cylinder)
- Rotational latency: wait for the desired sector to rotate under the read/write head
- Transfer time: transfer a block of bits (sector) under the read/write head
Disk Latency = Queueing Time + Controller Time + Seek Time + Rotation Time + Transfer Time
[Figure: request -> software queue (device driver) -> hardware controller -> media time (seek + rotation + transfer) -> result]
Highest bandwidth: transfer a large group of blocks sequentially from one track

Typical Numbers of a Magnetic Disk
Average seek time, as reported by the industry:
- Typically in the range of 8 ms to 12 ms
- Due to locality of disk references, the effective figure may be only 25% to 33% of the advertised number
Rotational latency:
- Most disks rotate at 3,600 to 7,200 RPM (up to 15,000 RPM or more): approximately 16 ms to 8 ms per revolution, respectively
- The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM, 4 ms at 7,200 RPM
Transfer time is a function of:
- Transfer size (usually a sector): 1 KB per sector
- Rotation speed: 3,600 RPM to 15,000 RPM
- Recording density: bits per inch on a track
- Diameter: ranges from 1 in to 5.25 in
- Typical values: 2 to 50 MB per second
Controller time? Depends on the controller hardware; need to examine each case individually (a worked example with representative numbers follows)
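
A small worked-number sketch of the latency formula above, using representative figures from this slide; the values are illustrative, not from any particular drive.

    /* Worked example: average time to read one 1 KB sector. */
    #include <stdio.h>

    int main(void) {
        double seek_ms     = 10.0;                 /* advertised average seek       */
        double rpm         = 7200.0;
        double rotation_ms = 0.5 * 60000.0 / rpm;  /* half a revolution on average  */
        double xfer_rate   = 50.0;                 /* MB/s                          */
        double xfer_ms     = (1.0 / 1024.0) / xfer_rate * 1000.0; /* one 1 KB sector */

        printf("seek %.2f ms + rotation %.2f ms + transfer %.3f ms = %.2f ms\n",
               seek_ms, rotation_ms, xfer_ms, seek_ms + rotation_ms + xfer_ms);
        /* ~10 + 4.17 + 0.02 ms: seek and rotation dominate small transfers,
           which is why sequential layout matters so much. */
        return 0;
    }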

Introduction to Queuing Theory
[Figure: arrivals enter a queue, are serviced by the controller and disk, and depart]
What about queuing time? Let's apply some queuing theory.
- Queuing theory applies to long-term, steady-state behavior: arrival rate = departure rate
Little's Law: mean # tasks in system = arrival rate x mean response time
- Observed by many; Little was the first to prove it
- Simple interpretation: you should see the same number of tasks in the queue when entering as when leaving
- Applies to any system in equilibrium, as long as nothing inside the black box is creating or destroying tasks
- Typical queuing theory doesn't deal with transient behavior, only steady-state behavior
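
For reference, the two results this lecture uses, written out in standard notation. The second expression matches the one quoted in the Summary slide; the reading of C as the squared coefficient of variation of the service-time distribution is an assumption based on the usual statement of that formula.

    \[ N \;=\; \lambda \, T
       \qquad \text{(Little's Law: mean \# tasks} = \text{arrival rate} \times \text{mean response time)} \]
    \[ T_q \;=\; T_{ser} \times \tfrac{1}{2}(1 + C) \times \frac{u}{1-u}
       \qquad \text{(queueing time; } u = \text{utilization, } C = \text{squared coeff. of variation)} \]

With memoryless (exponential) service, C = 1 and the formula reduces to the M/M/1 result T_q = T_ser x u/(1-u); either way, T_q blows up as u approaches 1, which is the sharp knee in the response-time plot above.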

Disk Scheduling
The disk can do only one request at a time; in what order do you serve queued requests?
[Figure: user requests as (cylinder, sector) pairs, e.g. 2,2 5,2 7,2 3,10 2,1 2,3, queued for a single disk head]
- FIFO order: fair among requesters, but the order of arrival may be to random spots on the disk, causing very long seeks
- SSTF (shortest seek time first): pick the request that's closest on the disk. Although called SSTF, today it must include rotational delay in the calculation, since rotation can take as long as a seek. Con: good at reducing seeks, but may lead to starvation
- SCAN: implements an elevator algorithm: take the closest request in the direction of travel. No starvation, but retains the flavor of SSTF (a sketch follows below)
- C-SCAN (circular SCAN): only goes in one direction, skipping any requests on the way back. Fairer than SCAN, since it is not biased towards requests in the middle of the disk
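
A compact sketch of the SCAN/elevator order over cylinder numbers only, ignoring rotational position; the request list reuses the cylinders from the slide's example, and the starting head position is assumed.

    /* SCAN ("elevator") scheduling sketch over cylinder numbers. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    /* Serve everything at or above `head` in increasing order,
       then sweep back down through the rest. */
    void scan(int *req, int n, int head) {
        qsort(req, n, sizeof *req, cmp);
        for (int i = 0; i < n; i++)          /* upward sweep   */
            if (req[i] >= head) printf("%d ", req[i]);
        for (int i = n - 1; i >= 0; i--)     /* downward sweep */
            if (req[i] < head) printf("%d ", req[i]);
        printf("\n");
    }

    int main(void) {
        int req[] = {2, 5, 7, 3, 2, 2};      /* cylinders from the slide's example */
        scan(req, sizeof req / sizeof *req, 4);  /* head at cylinder 4 (assumed)   */
        return 0;                                /* prints: 5 7 3 2 2 2            */
    }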

Building a File System
File System: the layer of the OS that transforms the block interface of disks (or other block devices) into files, directories, etc.
File system components:
- Disk management: collecting disk blocks into files
- Naming: an interface to find files by name, not by blocks
- Protection: layers to keep data secure
- Reliability/durability: keeping files durable despite crashes, media failures, attacks, etc.
User vs. system view of a file:
- User's view: durable data structures
- System's view (system call interface): a collection of bytes (UNIX); it doesn't matter to the system what kind of data structures you want to store on disk!
- System's view (inside the OS): a collection of blocks. A block is the logical transfer unit, while a sector is the physical transfer unit; block size >= sector size (in UNIX, block size is 4 KB)

Translating from User to System View
What happens if the user says: "give me bytes 2-12"?
- Fetch the block corresponding to those bytes
- Return just the correct portion of the block
What about: "write bytes 2-12"?
- Fetch the block, modify the relevant portion, write out the block
Everything inside the file system is done in whole-size blocks. For example, getc() and putc() buffer something like 4096 bytes, even though the interface is one byte at a time. From now on, a file is a collection of blocks (the sketch below shows the byte-range-to-block translation).
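
A minimal sketch of how a byte-range read maps onto whole-block operations; read_block() is a hypothetical "fetch block from disk" helper.

    /* Byte-range read on top of a whole-block interface. */
    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    #define BLOCK_SIZE 4096
    extern void read_block(uint64_t block_no, uint8_t buf[BLOCK_SIZE]); /* hypothetical */

    /* Copy bytes [offset, offset+len) of a file into out[]. */
    void file_read(uint64_t offset, size_t len, uint8_t *out) {
        uint8_t block[BLOCK_SIZE];
        while (len > 0) {
            uint64_t block_no = offset / BLOCK_SIZE;   /* which block            */
            size_t   start    = offset % BLOCK_SIZE;   /* where inside the block */
            size_t   chunk    = BLOCK_SIZE - start;    /* bytes available there  */
            if (chunk > len) chunk = len;
            read_block(block_no, block);               /* whole-block fetch      */
            memcpy(out, block + start, chunk);         /* return just the portion */
            out += chunk; offset += chunk; len -= chunk;
        }
    }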

Disk Management Policies
Basic entities on a disk:
- File: a user-visible group of blocks arranged sequentially in logical space
- Directory: a user-visible index mapping names to files
Access the disk as a linear array of blocks. Two options:
- Identify blocks as vectors [cylinder, surface, sector], sorted in cylinder-major order. Not used much anymore; in this case the OS/BIOS must deal with bad blocks itself
- Logical Block Addressing (LBA): every block has an integer address from zero up to the maximum number of blocks; the controller translates address -> physical position, shielding the OS from the structure of the disk
Need a way to track free disk blocks:
- Linking free blocks together: too slow today
- Use a bitmap to represent free space on the disk
Need a way to structure files: the file header
- Track which blocks belong at which offsets within the logical file structure
- Optimize placement of a file's disk blocks to match access and usage patterns

Designing the File System: Access Patterns
How do users access files? Need to know what type of access patterns users are likely to throw at the system.
- Sequential access: bytes read in order ("give me the next X bytes, then the next, ..."). Almost all file accesses are of this flavor.
- Random access: read/write an element out of the middle of the array ("give me bytes i-j"). Less frequent, but still important; for example, a virtual-memory backing file: a page of memory stored in a file. Want this to be fast; don't want to have to read all bytes just to get to the middle of the file.
- Content-based access: "find me 100 bytes starting with ABCD". Example: employee records; once you find the bytes, increase my salary by a factor of 2. Many systems don't provide this; instead, databases are built on top of disk access to index content (which itself requires efficient random access).

Designing the File System: Usage Patterns
- Most files are small (for example, .login files, .c files). A few files are big: images, videos. Still, most files are small: .class's, .o's, .c's, etc.
- Large files use up most of the disk space and most of the bandwidth to/from disk. This may seem contradictory, but a few enormous files are equivalent to an immense number of small files.
Although we will use these observations, beware of usage patterns:
- It's a good idea to look at usage patterns: beat competitors by optimizing for frequent patterns
- Except: changes in performance or cost can alter usage patterns. Maybe UNIX has lots of small files because big files are really inefficient?

How to organize files on disk
Goals: maximize sequential performance; easy random access to a file; easy management of the file (growth, truncation, etc.)
First Technique: Continuous Allocation
- Use a continuous range of blocks in logical block space; analogous to base+bounds in virtual memory
- The user says in advance how big the file will be (disadvantage)
- Search the bitmap for space using best fit/first fit (sketched below). What if there is not enough contiguous space for a new file?
- File header contains: first sector/LBA in the file, file size (# of sectors)
- Pros: fast sequential access, easy random access
- Cons: external fragmentation/hard to grow files. Free holes get smaller and smaller; could compact space, but that would be really expensive
Continuous allocation was used by the IBM 360. A result of the allocation and management cost: people would create a big file and put their real file in the middle of it.
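
A small sketch of the first-fit bitmap search mentioned above (bit set = block free); sizes are illustrative.

    /* First-fit search of a free-space bitmap for `want` contiguous blocks.
       Returns the starting block number, or -1 if no hole is big enough. */
    #include <stdint.h>

    #define NBLOCKS 1024
    static uint8_t bitmap[NBLOCKS / 8];   /* 1 bit per disk block */

    static int is_free(int b) { return (bitmap[b / 8] >> (b % 8)) & 1; }

    long first_fit(int want) {
        int run = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            run = is_free(b) ? run + 1 : 0;   /* extend or reset the free run */
            if (run == want)
                return b - want + 1;          /* start of the hole            */
        }
        return -1;                            /* external fragmentation bites */
    }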

How to organize files on disk
Second Technique: Linked List Approach
- Each block holds a pointer to the next block on disk; the file header points to the first block, and the last block points to null
- Pros: can grow files dynamically; the free list is managed the same way as a file
- Cons: bad sequential access (a seek between each block); unreliable (lose one block, lose the rest of the file)
- Serious con: bad random access!
- Technique originally from the Alto (the first personal computer, built at Xerox); it made no attempt to allocate contiguous blocks

Linked Allocation: File-Allocation Table (FAT)
- MS-DOS links blocks together to create a file, but the links are not stored in the blocks themselves: they live in the File Allocation Table (FAT)
- The FAT contains an entry for each block on the disk; the FAT entries corresponding to the blocks of a file are linked together
Access properties (see the sketch below):
- Sequential access is expensive unless the FAT is cached in memory
- Random access is expensive always, but really expensive if the FAT is not cached in memory
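
A sketch of why FAT random access is expensive: finding the Nth block of a file means walking the chain. The FAT layout here is simplified and hypothetical (one uint32_t per block, FAT_EOF marking end of file).

    /* Find the disk block holding logical block `n` of a FAT file. */
    #include <stdint.h>

    #define FAT_EOF 0xFFFFFFFFu
    extern uint32_t fat[];   /* one entry per disk block; fat[b] = next block */

    uint32_t fat_seek(uint32_t first_block, uint32_t n) {
        uint32_t b = first_block;
        while (n-- > 0 && b != FAT_EOF)
            b = fat[b];          /* one table lookup per hop: random access is
                                    O(n) even when the whole FAT is cached    */
        return b;
    }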

How to Organize Files on Disk
Third Technique: Indexed Files
- The system allocates a file header block to hold an array of pointers big enough to point to all of the file's blocks; the user pre-declares the maximum file size
- Pros: can easily grow up to the space allocated for the index; random access is fast
- Cons: clumsy to grow a file bigger than the table size; still lots of seeks, since the blocks may be spread all over the disk

Multilevel Indexed Files (UNIX 4.1)
Multilevel indexed files: like multilevel address translation (from UNIX 4.1 BSD)
- Key idea: efficient for small files, but still allow big files
- The file header contains 13 pointers: a fixed-size table whose pointers are not all equivalent. This header is called an inode in UNIX.
File header format:
- The first 10 pointers point directly to data blocks
- Pointer 11 points to an indirect block containing 256 block pointers
- Pointer 12 points to a doubly indirect block containing 256 indirect blocks, for a total of 64K blocks
- Pointer 13 points to a triply indirect block (16M blocks)
Discussion:
- This basic technique places an upper limit on file size of approximately 16 GB. The designers thought this was bigger than anything anyone would ever need; much bigger than a disk at the time. Fallacy: today, EOS produces 2 TB of data per day.
- Pointers get filled in dynamically: an indirect block only needs to be allocated when the file grows beyond 10 blocks. For small files, no indirection is needed.

Example of Multilevel Indexed Files
Sample file in multilevel indexed format: how many disk accesses for block #23 (assume the file header was read on open)?
- Two: one for the indirect block, one for the data block
How about block #5?
- One: just the data block
Block #340?
- Three: doubly indirect block, indirect block, and data block
(A small function computing this mapping follows below.)
UNIX 4.1 pros and cons:
- Pros: simple (more or less); files can easily expand (up to a point); small files are particularly cheap and easy
- Cons: lots of seeks; very large files must read many indirect blocks (four I/Os per block!)
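
A minimal sketch of the access-count calculation above, for the UNIX 4.1 layout (10 direct pointers, 256-entry indirect blocks); it assumes the inode itself is already in memory.

    /* Disk accesses needed to reach logical block n of a UNIX 4.1 file. */
    #include <stdio.h>

    int accesses(long n) {
        if (n < 10)                     return 1;  /* direct pointer         */
        if (n < 10 + 256)               return 2;  /* indirect + data        */
        if (n < 10 + 256 + 256L * 256)  return 3;  /* double indirect        */
        return 4;                                  /* triple indirect        */
    }

    int main(void) {
        printf("block 5: %d, block 23: %d, block 340: %d\n",
               accesses(5), accesses(23), accesses(340));  /* 1, 2, 3 */
        return 0;
    }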

Attack of the Rotational Delay
Problem: missing blocks due to rotational delay
- Issue: read one block, do some processing, then read the next block. In the meantime, the disk has continued turning and the next block has passed under the head: now one full revolution is needed per block!
- Solution 1: skip-sector positioning ("interleaving"): place the blocks of one file on every other block of a track, giving processing time to overlap rotation
- Solution 2: read ahead: read the next block right after the first, even if the application hasn't asked for it yet. This can be done either by the OS (read ahead) or by the disk itself (track buffers): many disk controllers have internal RAM that lets them read a complete track
Important aside: modern disks and controllers do many complex things "under the covers": track buffers, elevator algorithms, bad-block filtering, ...

How do we actually access files?
All information about a file is contained in its file header
- UNIX calls this an inode; inodes are global resources identified by an index (the "inumber")
- Once you load the header structure, all the other blocks of the file are locatable
Question: how does the user ask for a particular file?
- One option: the user specifies an inode by number (index). Imagine: open(14553344)
- Better option: specify by textual name. Have to map name -> inumber
- Another option: an icon. This is how Apple made its money: graphical user interfaces; point to a file and click.
Naming: the process by which a system translates from user-visible names to system resources
- In the case of files, need to translate from strings (textual names) or icons to inumbers/inodes
- For global file systems, data may be spread over the globe: need to translate from strings or icons to some combination of physical server location and inumber

Directories
Hierarchical name space: files named by an ordered set of directories (e.g.: /programs/p/list)
Directories: a special type of relation
- Just a table of (file name, inumber) pairs
Question: how is the relation stored?
- Directories are often stored just like files
- Can store the inumber for directories or files in other directories
Question: how is the directory structured?
- Needs to be quickly searchable!

Where are inodes stored?
In early UNIX and the DOS/Windows FAT file system, headers were stored in a special array in the outermost cylinders
- The header was not stored anywhere near the data blocks: to read a small file, seek to get the header, then seek back to the data
- Fixed size, set when the disk is formatted: at formatting time, a fixed number of inodes were created, each given a unique number called an "inumber"
Later versions of UNIX moved the header information closer to the data blocks
- Often, the inode for a file is stored in the same cylinder group as the parent directory of the file (makes an "ls" of that directory run fast)
- UNIX BSD 4.2 puts a portion of the file header array on each cylinder. For small directories, all the data, file headers, etc. can fit in the same cylinder: no seeks!
- File headers are much smaller than a whole block (a few hundred bytes), so multiple headers are fetched from disk at the same time
- Pro, reliability: whatever happens to the disk, you can find all of the files (even if directories might be disconnected)

Summary
I/O Controllers: hardware that controls the actual device
- The processor accesses them through I/O instructions, or through loads/stores to special physical memory
- They report their results either through interrupts or through a status register that the processor looks at occasionally (polling)
Disk performance:
- Disk latency = queuing time + controller time + seek time + rotational latency + transfer time
- Rotational latency: on average 1/2 rotation
- Transfer time: per the disk's spec; depends on rotation speed and bit storage density
Queuing latency:
- M/M/1 and M/G/1 queues: the simplest to analyze
- As utilization u approaches 100%, latency grows: Tq = Tser x 1/2 (1 + C) x u/(1 - u)
File system: transforms blocks into files and directories
- Optimize for access and usage patterns
- Maximize sequential access, allow efficient random access

Directories
Directory: a relation used for naming; just a table of (file name, inumber) pairs
How are directories constructed?
- Directories are often stored in files: reuse of an existing mechanism
- A directory is named by an inode/inumber like any other file
- Needs to be quickly searchable. Options: a simple list or a hashtable; can be cached into memory in a form that is easier to search
How are directories modified?
- System calls for manipulation: mkdir, rmdir
- Ties to file creation/destruction: on creating a file by name, a new inode is grabbed and associated with the new file in a particular directory
Directory hierarchy:
- Directories are organized into a hierarchical structure. Seems standard, but in the early 70s it wasn't!
- Permits much easier organization of data
- Entries in a directory can be either files or directories
- Files are named by an ordered set of directories (e.g.: /programs/p/list)

Directory Structure
Not really a hierarchy! Many systems allow the directory structure to be organized as an acyclic graph or even a (potentially) cyclic graph
- Hard links: different names for the same file; multiple directory entries point at the same file
- Soft links: shortcut pointers to other files; implemented by storing the logical name of the actual file
(Both kinds of link are illustrated below.)
Name resolution: the process of converting a logical name into a physical resource (like a file)
- Traverse a succession of directories until the target file is reached
- Global file system: the name space may be spread across the network
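
A minimal sketch of hard vs. soft links via the POSIX calls link() and symlink(); the file names are hypothetical.

    /* Hard vs. soft links with standard POSIX calls. */
    #include <unistd.h>
    #include <stdio.h>

    int main(void) {
        /* Hard link: a second directory entry for the same inode. */
        if (link("data.txt", "alias.txt") < 0) perror("link");

        /* Soft (symbolic) link: a new file whose contents name the target. */
        if (symlink("data.txt", "shortcut.txt") < 0) perror("symlink");

        /* After this: data.txt and alias.txt share one inode (equal inumbers
           in `ls -i`), while shortcut.txt is its own inode storing the string
           "data.txt". Deleting data.txt leaves alias.txt intact but leaves
           shortcut.txt dangling. */
        return 0;
    }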

Directory Structure (Con't)
How many disk accesses to resolve "/avi/book/count"?
- Read in the file header for root (fixed spot on disk)
- Read in the first data block for root: a table of (file name, index) pairs; search it linearly (OK since directories are typically very small)
- Read in the file header for "avi"
- Read in the first data block for "avi"; search for "book"
- Read in the file header for "book"
- Read in the first data block for "book"; search for "count"
- Read in the file header for "count"
(This loop is sketched in code below.)
Current working directory: a per-address-space pointer to a directory (inode) used for resolving file names
- Allows the user to specify a relative filename instead of an absolute path (say CWD = "/avi/book"; then "count" can be resolved directly)
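
A sketch of the name-resolution loop above; dir_lookup() (read a directory's header and data block, then search for one component) is hypothetical, and the root inumber 2 follows UNIX convention.

    /* Resolve a path like "/avi/book/count" to an inumber; -1 on failure. */
    #include <string.h>

    #define ROOT_INUMBER 2   /* fixed, well-known spot (UNIX convention)      */
    extern long dir_lookup(long dir_inumber, const char *name); /* hypothetical */

    long resolve(const char *path) {
        char buf[256], *comp, *save;
        long ino = ROOT_INUMBER;          /* start at the root directory      */
        strncpy(buf, path, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        for (comp = strtok_r(buf, "/", &save); comp && ino >= 0;
             comp = strtok_r(NULL, "/", &save))
            ino = dir_lookup(ino, comp);  /* header read + data-block search:
                                             roughly 2 disk I/Os per component */
        return ino;
    }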

In-Memory File System Structures
Open system call:
- Resolves the file name, finds the file control block (inode)
- Makes entries in the per-process and system-wide open-file tables
- Returns an index (called a "file handle") into the open-file table
Read/write system calls:
- Use the file handle to locate the inode
- Perform the appropriate reads or writes

File System Caching
Key idea: exploit locality by caching data in memory
- Name translations: mapping from paths -> inodes
- Disk blocks: mapping from block address -> disk content
Buffer Cache: memory used to cache kernel resources, including disk blocks and name translations
- Can contain "dirty" blocks (blocks modified but not yet written to disk)
Replacement policy? LRU (a sketch follows below)
- Can afford the overhead of timestamps for each disk block
- Advantages: works very well for name translation; works well in general as long as memory is big enough to accommodate a host's working set of files
- Disadvantages: fails when an application scans through the file system, thereby flushing the cache with data used only once. Example: find . -exec grep foo {} \;
Other replacement policies? Some systems allow applications to request other policies. Example, "Use Once": the file system can discard blocks as soon as they are used.
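
A tiny sketch of the timestamp-based LRU policy described above (the slide notes LRU "can afford the overhead of timestamps per block"). The structures are illustrative, not any real kernel's.

    /* Fixed-size buffer cache with timestamp LRU eviction. */
    #include <stdint.h>

    #define NENT 64
    struct entry { uint64_t block_no, last_used; int valid; };
    static struct entry cache[NENT];
    static uint64_t clock_now;

    /* Return the cache slot for block_no, evicting the LRU entry on a miss. */
    struct entry *buf_get(uint64_t block_no) {
        struct entry *victim = &cache[0];
        for (int i = 0; i < NENT; i++) {
            if (cache[i].valid && cache[i].block_no == block_no) {
                cache[i].last_used = ++clock_now;   /* hit: touch and return  */
                return &cache[i];
            }
            if (!cache[i].valid || cache[i].last_used < victim->last_used)
                victim = &cache[i];                 /* track the LRU candidate */
        }
        victim->block_no  = block_no;               /* miss: evict and reload  */
        victim->last_used = ++clock_now;            /* (disk read would go here) */
        victim->valid     = 1;
        return victim;
    }

Note how a one-pass scan through many blocks (the find/grep example) marches straight through all NENT slots, evicting everything: exactly the LRU failure mode named above.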

File System Caching (con't)
Cache size: how much memory should the OS allocate to the buffer cache vs. virtual memory?
- Too much memory for the file system cache: won't be able to run many applications at once
- Too little memory for the file system cache: many applications may run slowly (disk caching not effective)
- Solution: adjust the boundary dynamically so that the disk access rates for paging and file access are balanced
Read-ahead prefetching: fetch sequential blocks early
- Key idea: exploit the fact that the most common file access is sequential, by prefetching subsequent disk blocks ahead of the current read request (if they are not already in memory); applications can also hint this, as sketched below
- The elevator algorithm can efficiently interleave groups of prefetches from concurrent applications
- How much to prefetch? Too much imposes delays on requests by other applications; too little causes many seeks (and rotational delays) among concurrent file requests
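
A small sketch of a user process hinting sequential access so the OS can prefetch aggressively, via the standard POSIX posix_fadvise() interface; the file name is hypothetical.

    /* Hinting sequential access to enlarge the kernel's read-ahead window. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void) {
        int fd = open("bigfile.dat", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Offset 0, length 0 = the whole file will be read sequentially;
           the kernel may prefetch more aggressively in response. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        char buf[4096];
        while (read(fd, buf, sizeof buf) > 0)
            ;   /* ...process data; later blocks are likely already cached */
        close(fd);
        return 0;
    }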

File System Caching (con't)
Delayed writes: writes to files are not immediately sent out to disk
- Instead, write() copies data from the user-space buffer to a kernel buffer (in the cache). This is enabled by the presence of the buffer cache: written file blocks can stay in the cache for a while.
- If some other application tries to read the data before it has been written to disk, the file system will read it from the cache
- Flushed to disk periodically (e.g. in UNIX, every 30 seconds)
Advantages:
- The disk scheduler can efficiently order lots of requests
- The disk allocation algorithm can be run with the correct size value for a file
- Some files need never get written to disk! (e.g. temporary scratch files written to /tmp often don't exist for 30 seconds)
Disadvantages (see the durability sketch below):
- What if the system crashes before the file has been written out?
- Worse yet, what if the system crashes before a directory file has been written out? (lose the pointer to the inode!)
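
A minimal sketch of forcing a delayed write to disk when durability matters, using the POSIX fsync() call; the file name is hypothetical.

    /* Making one record durable despite delayed writes. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>

    int main(void) {
        int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char *rec = "committed\n";
        write(fd, rec, strlen(rec));  /* lands in the buffer cache, not on disk */
        fsync(fd);                    /* block until the device reports the data
                                         durable; without this, a crash inside
                                         the ~30 s flush window silently loses
                                         the record                             */
        close(fd);
        return 0;
    }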

Important "ilities"
Availability: the probability that the system can accept and process requests
- Often measured in "nines" of probability: a 99.9% probability is considered 3-nines of availability
- The key idea here is independence of failures
Durability: the ability of a system to recover data despite faults
- This is fault tolerance applied to data
- Doesn't necessarily imply availability: the information on the pyramids was very durable, but could not be accessed until the discovery of the Rosetta Stone
Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition)
- Usually stronger than simple availability: means the system is not only up, but also working correctly
- Includes availability, security, fault tolerance/durability
- Must make sure data survives system crashes, disk crashes, and other problems

How to make file system durable?
Disk blocks contain Reed-Solomon error-correcting codes (ECC) to deal with small defects in the disk drive
- Can allow recovery of data from small media defects
Make sure writes survive in the short term
- Either abandon delayed writes, or use special battery-backed RAM (called non-volatile RAM, or NVRAM) for dirty blocks in the buffer cache
Make sure data survives in the long term
- Need to replicate! More than one copy of the data! The important element is independence of failures:
- Could put copies on one disk, but if the disk head fails...
- Could put copies on different disks, but if the server fails...
- Could put copies on different servers, but if the building is struck by lightning...
- Could put copies on servers on different continents!
RAID: Redundant Arrays of Inexpensive Disks
- Data stored on multiple disks (redundancy), either in software or in hardware
- In the hardware case, this is done by the disk controller; the file system may not even know there is more than one disk in use

Hardware RAID: Subsystem Organization
[Figure: CPU -> host adapter -> array controller -> several single-board disk controllers, often piggy-backed in small-format devices. The array controller manages the interface to the host, DMA control, buffering, and parity logic; the single-board disk controllers handle physical device control]
Some systems duplicate all of the hardware, namely controllers, buses, etc.

RAID 1: Disk Mirroring/Shadowing
Each disk is fully duplicated onto its "shadow"; disks are organized into recovery groups
- For high I/O rate, high availability environments
- Most expensive solution: 100% capacity overhead
- Bandwidth is sacrificed on writes: one logical write = two physical writes. Bandwidth is highest when disk heads and rotation are fully synchronized (hard to do exactly)
- Reads may be optimized: can have two independent reads to the same data
Recovery: on disk failure, replace the disk and copy the data to the new disk
- Hot spare: an idle disk already attached to the system, to be used for immediate replacement

RAID 5+: High I/O Rate Parity
Data is striped across multiple disks
- Successive blocks are stored on successive (non-parity) disks: increased bandwidth over a single disk
- The parity block for each stripe is constructed by XORing the data blocks in the stripe: P0 = D0 ^ D1 ^ D2 ^ D3
- Can destroy any one disk and still reconstruct the data. Suppose D3 fails; then it can be reconstructed as D3 = D0 ^ D1 ^ D2 ^ P0 (sketched below)
[Figure: five disks with the parity block rotating across stripes, logical disk addresses increasing along the stripe units:
  Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
  D0      D1      D2      D3      P0
  D4      D5      D6      P1      D7
  D8      D9      P2      D10     D11
  D12     P3      D13     D14     D15
  P4      D16     D17     D18     D19
  D20     D21     D22     D23     P5 ]
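
A minimal sketch of the stripe parity math. XOR is associative and self-inverse, so the same loop both builds the parity block and rebuilds a lost block from the survivors; block and stripe sizes are illustrative.

    /* RAID-5-style parity over one stripe. */
    #include <stdint.h>
    #include <string.h>

    #define NBUF 4           /* data blocks per stripe (D0..D3) */
    #define BSZ  4096        /* bytes per block                 */

    /* out = blocks[0] ^ blocks[1] ^ ... ^ blocks[n-1] */
    void xor_blocks(uint8_t *out, uint8_t blocks[][BSZ], int n) {
        memset(out, 0, BSZ);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < BSZ; j++)
                out[j] ^= blocks[i][j];
    }

    /* Build parity: P0 = D0^D1^D2^D3 via xor_blocks(P0, {D0..D3}, 4).
       Rebuild after losing D3: D3 = D0^D1^D2^P0, i.e. the same call
       over the four surviving blocks. */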

Remote File Systems: Virtual File System (VFS)
VFS: a virtual abstraction similar to the local file system
- Instead of inodes, it has vnodes
- Compatible with a variety of local and remote file systems; provides an object-oriented way of implementing file systems
- VFS allows the same system call interface (the API) to be used for different types of file systems: the API is written against the VFS interface, rather than against any specific type of file system

Network File System (NFS)
Three layers for the NFS system:
- UNIX file-system interface: open, read, write, close calls + file descriptors
- VFS layer: distinguishes local from remote files; calls the NFS protocol procedures for remote requests
- NFS service layer: the bottom layer of the architecture; implements the NFS protocol
NFS protocol: remote procedure calls (RPC) for file operations on the server
- Reading/searching a directory, manipulating links and directories, accessing file attributes, reading and writing files
NFS servers are stateless: each request provides all the arguments required for its execution
- Modified data must be committed to the server's disk before results are returned to the client, so some of the advantages of caching are lost
- Can lead to weird results: write a file on one client, read it on another, and get old data

Schematic View of NFS Architecture

Summary
Multilevel indexed files:
- The inode contains pointers to actual blocks, indirect blocks, doubly indirect blocks, etc.
- Optimizations for sequential access: start new files in open ranges of free blocks; rotational optimization
Naming: the act of translating from user-visible names to actual system resources
- Directories are used for naming in local file systems
Important system properties:
- Availability: how often is the resource available?
- Durability: how well is data preserved against faults?
- Reliability: how often is the resource performing correctly?
RAID: Redundant Arrays of Inexpensive Disks
- RAID 1: mirroring; RAID 5: parity block
VFS: Virtual File System layer
- NFS: an example use of the VFS layer