Operating Systems. Operating Systems Professor Sina Meraji U of T


How are file systems implemented?

File system implementation: Files and directories live on secondary storage: anything outside of primary memory, anything that does not permit direct instruction execution or data fetch via machine load/store instructions, and that is persistent (data survives loss of power). We are focusing on the use of fixed magnetic hard disks for implementing secondary storage.

Disk Components (diagram): platters with upper and lower surfaces; a read/write head on a seek arm driven by an actuator; cylinders, tracks, and sectors; the platters rotate.

Disk service time components (figure): after a BLUE read, servicing a RED request requires a seek to RED's track plus rotational latency before the RED read can happen. Sequential access is a lot faster than random access.

Mixing workloads can be tricky. Example scenario: suppose there are two processes. Each, run in isolation (by itself), gets 20 MB/s disk throughput. If you run the two processes simultaneously, each gets 2 MB/s. What happened?

Components of disk access time: disk request performance depends on three steps. Seek: moving the disk arm to the correct cylinder; depends on how fast the disk arm can move (increasing very slowly). Rotation: waiting for the sector to rotate under the head; depends on the rotation rate of the disk (increasing, but slowly). Transfer: moving data from the surface into the disk controller electronics and sending it back to the host; depends on density (increasing quickly). How long does this typically take?

Disks are slow. Seek times: 1-15 ms depending on distance, average 5-6 ms, improving at 7-10% per year. Rotation speeds: ~7200 RPM for cheap SATA disks, 10,000-15,000 RPM for high-end SCSI disks; average rotational latency of ~3 ms, improving at 7-10% per year. This is slow! Sequential access is much faster than random!
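The numbers above combine into a simple back-of-the-envelope estimate: access time is roughly seek time, plus half a rotation (on average), plus transfer time. A minimal sketch; the transfer rate and request size below are illustrative assumptions, not figures from the slide:

```python
def avg_access_time_ms(seek_ms, rpm, transfer_mb_s, request_kb):
    """Estimate average disk access time: seek + half a rotation + transfer."""
    rotation_ms = (60_000 / rpm) / 2                      # avg rotational latency
    transfer_ms = request_kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotation_ms + transfer_ms

# Illustrative: 5 ms average seek, 7200 RPM, 100 MB/s media rate, 4 KB request
t = avg_access_time_ms(5.0, 7200, 100, 4)   # roughly 9.2 ms
```

Note how the 4 KB transfer itself costs well under 0.1 ms; nearly all of the ~9 ms is seek and rotation, which is exactly why sequential access wins.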

OS design principles Since disk I/O is slow: Minimize number of disk accesses E.g. by caching When using the disk try to minimize access cost Particularly seeks and rotation Use smart disk request scheduling Arrange data for sequential access over random access

Disks are messy Disks are messy physical devices: Errors, bad blocks, missed seeks, etc. The job of the OS is to hide this mess from higher level software Low-level device control (initiate a disk read, etc.) Higher-level abstractions (files, databases, etc.)

OS disk interaction Specifying disk requests requires a lot of info: Cylinder #, surface #, track #, sector #, transfer size Modern disks are even more complicated Not all tracks have the same number of sectors, sectors are remapped, etc. Older disks required the OS to specify all of this The OS needed to know all disk parameters Fortunately modern drives provide a more high-level interface: logical block addressing

Logical block addressing (figure: the OS's view of the storage device). Storage is exposed as a linear array of blocks; a common block size is 512 bytes. The disk exports its data as a logical array of blocks [0..N] and maps logical blocks to cylinder/surface/track/sector internally. The OS only needs to specify the logical block # to read/write, but now the disk parameters are hidden from the OS.
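The mapping the drive performs internally can be sketched for an idealized disk with a fixed geometry. This is a simplification: as noted above, real drives have variable sectors per track and remapped sectors, so only the drive itself can do this accurately:

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Map a logical block address to (cylinder, head, sector) on an
    idealized disk with fixed geometry. Sectors are 1-based in CHS."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    return cylinder, head, sector

# Classic (old) geometry: 16 heads, 63 sectors per track
print(lba_to_chs(0, 16, 63))    # (0, 0, 1): first block, first track
print(lba_to_chs(63, 16, 63))   # (0, 1, 1): next track, same cylinder
```

Consecutive LBAs fill a track, then the next surface in the same cylinder, and only then move the arm to the next cylinder, which is why sequential LBA access minimizes seeks.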

Disk Scheduling: Because seeks are so expensive, the OS tries to schedule disk requests that are queued waiting for the disk.
FCFS (do nothing): reasonable when load is low; long waiting times for long request queues.
SSTF (shortest seek time first): minimize arm movement (seek time), maximize request rate; favors middle blocks.
SCAN (elevator): service requests in one direction until done, then reverse.
C-SCAN: like SCAN, but only go in one direction (typewriter).
LOOK / C-LOOK: like SCAN/C-SCAN, but only go as far as the last request in each direction (not the full width of the disk).
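The difference between FCFS and SSTF is easy to see by totaling head movement over a request queue. A small simulation sketch, using an illustrative queue of cylinder numbers (not from the slides):

```python
def fcfs_distance(start, requests):
    """Total head movement servicing requests in arrival order (FCFS)."""
    total, pos = 0, start
    for r in requests:
        total += abs(r - pos)
        pos = r
    return total

def sstf_distance(start, requests):
    """Total head movement always servicing the closest pending request (SSTF)."""
    total, pos, pending = 0, start, list(requests)
    while pending:
        nxt = min(pending, key=lambda r: abs(r - pos))
        pending.remove(nxt)
        total += abs(nxt - pos)
        pos = nxt
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]   # head starts at cylinder 53
print(fcfs_distance(53, queue))   # 640 cylinders of movement
print(sstf_distance(53, queue))   # 236 cylinders of movement
```

SSTF cuts total arm movement by more than half here, but note that requests near the middle cylinders get served first; requests at the edges can starve under heavy load.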

Disk Scheduling (2): In general, unless there are request queues, disk scheduling does not have much impact; it is important for servers, less so for PCs. Modern disks often do the disk scheduling themselves: disks know their layout better than the OS and can optimize better. This ignores, and can undo, any scheduling done by the OS.

Back to files and directories How does the OS implement the abstraction of files and directories on top of this logical array of disk blocks?

Disk Layout Strategies: Files span multiple disk blocks. How do you find all of the blocks for a file?
1. Contiguous allocation: like memory. Fast, simplifies directory access; inflexible, causes fragmentation, needs compaction.
2. Linked, or chained, structure: each block points to the next; the directory points to the first. Good for sequential access, bad for all others.
3. Indexed structure (indirection, hierarchy): an index block contains pointers to many other blocks. Handles random access better, still good for sequential; may need multiple index blocks (linked together).

Indexed Allocation: Unix Inodes. Unix inodes implement an indexed structure for files. Each inode contains 15 block pointers: the first 12 are direct block pointers (e.g., to 4 KB data blocks), followed by one single, one double, and one triple indirect pointer.
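The 12 direct plus single/double/triple indirect pointers determine the maximum file size. A quick calculation sketch, assuming 4 KB blocks and 4-byte block pointers (so each indirect block holds 1024 pointers):

```python
def max_file_size(block_size=4096, ptr_size=4, n_direct=12):
    """Max file size for an inode with n_direct direct pointers plus one
    single, one double, and one triple indirect pointer."""
    ptrs_per_block = block_size // ptr_size   # 1024 with these defaults
    blocks = (n_direct
              + ptrs_per_block               # single indirect
              + ptrs_per_block ** 2          # double indirect
              + ptrs_per_block ** 3)         # triple indirect
    return blocks * block_size

print(max_file_size())          # ~4 TiB
print(12 * 4096)                # 49152: only 48 KiB reachable via direct pointers
```

The skew is deliberate: most files are small and fit entirely in the 12 direct pointers, so the common case needs no extra index-block reads.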

Unix Inodes and Path Search. Example: we need to find the first data block for the file /one.txt. Remember: Unix inodes are not directories; they describe where on the disk the blocks for a file are placed. Directories are files, so inodes also describe where the blocks for directories are placed on the disk.

Unix Inodes and Path Search: Directory entries map file names to inodes. To open /one.txt: use the Master Block to find the inode for / on disk and read that inode into memory; the inode lets us find the data block for the directory /. Read /, looking for the entry for one.txt; this entry locates the inode for one.txt. Read the inode for one.txt into memory; it says where the first data block is on disk. Read that block into memory to access the data in the file.
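The lookup loop above can be sketched with a toy in-memory model: a dict plays the role of the inode table, directories are maps from names to inode numbers, and a fixed root inode number stands in for the Master Block's pointer to /. All names and numbers here are hypothetical:

```python
# Toy model: inodes[i] is either a directory (name -> inode number)
# or file data (bytes). Inode 2 acts as "/", as pointed to by the
# master block in a real file system.
inodes = {
    2: {"one.txt": 5},       # the "/" directory
    5: b"hello, disk",       # data for /one.txt
}
ROOT_INUM = 2

def namei(path):
    """Resolve an absolute path to an inode number, one component at a time."""
    inum = ROOT_INUM
    for name in path.strip("/").split("/"):
        directory = inodes[inum]   # "read" the directory via its inode
        inum = directory[name]     # directory entry maps name -> inode #
    return inum

print(namei("/one.txt"))   # 5
```

Each path component costs at least one directory read plus one inode read, which is why deep paths (and inodes placed far from data) generate so many seeks.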

File System Implementation: A Master Block (aka partition control block, superblock) determines the location of the root directory. A free map determines which blocks are free or allocated. The remaining disk blocks are used to store files (and directories). There are many ways to do this. Default usage of LBN space (figure): superblock, bitmap, then space to store files and directories.

Original Unix File System: Recall the FS sees storage as a linear array of blocks; each block has a logical block number (LBN). Default usage of LBN space (figure): superblock, bitmap, inodes, data blocks. A simple, straightforward implementation: easy to implement and understand. Problem: poor utilization of disk bandwidth (lots of seeking). Why?

Data and Inode Placement: The original Unix FS had two placement problems. 1. Data blocks allocated randomly in aging file systems: blocks for the same file are allocated sequentially when the FS is new, but as the FS ages and fills, it must allocate into blocks freed up when other files are deleted. Problem: deleted files are essentially randomly placed, so blocks for new files become scattered across the disk. 2. Inodes allocated far from data blocks: all inodes sit at the beginning of the disk, far from the data, so traversing file name paths and manipulating files and directories requires going back and forth between inodes and data blocks. Both of these problems generate many long seeks.

FFS: The BSD Unix folks did a redesign (early-to-mid 80s) that they called the Fast File System (FFS). It improved disk utilization and decreased response time, and it is now the FS against which all other Unix FSs are compared. A good example of being device-aware for performance.

Cylinder Groups: BSD FFS addressed the placement problems using the notion of a cylinder group (aka allocation groups in lots of modern FSs). The disk is partitioned into groups of cylinders. Data blocks in the same file are allocated in the same cylinder group; files in the same directory are allocated in the same cylinder group; inodes for files are allocated in the same cylinder group as the files' data blocks. (Figure: cylinder group organization, each group with its own superblock copy.)

Cylinder Groups (continued) Allocation in cylinder groups provides closeness Reduces number of long seeks Free space requirement To be able to allocate according to cylinder groups, the disk must have free space scattered across cylinders 10% of the disk is reserved just for this purpose If preferred cylinder group is full, allocate from a nearby group

Space Allocation in Cylinder Groups: Allocation bitmap. If possible, do allocation in groups of 8 blocks (bytes in the bitmap): find the first byte that's all zero, then backtrack over the preceding bits to check for zeros.
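The byte-at-a-time scan described above can be sketched as follows. A minimal version that only does the "find the first all-zero byte" step (bit = 1 means allocated, so a zero byte is 8 free, byte-aligned blocks); the backtracking refinement for runs that straddle byte boundaries is omitted:

```python
def find_free_run(bitmap):
    """Return the starting block number of the first byte-aligned run of
    8 free blocks (the first all-zero byte in the allocation bitmap),
    or -1 if none exists."""
    for i, byte in enumerate(bitmap):
        if byte == 0:
            return i * 8
    return -1

# Blocks 0-15 allocated or fragmented, blocks 16-23 free, 24-31 allocated
bm = bytearray([0xFF, 0b10000001, 0x00, 0xFF])
print(find_free_run(bm))   # 16
```

Scanning a byte at a time is 8x faster than testing individual bits, which matters when the bitmap covers an entire cylinder group.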


More FFS solutions: Small blocks (1 KB) in the original Unix FS caused two problems: low bandwidth utilization, and a small maximum file size (a function of block size). Fix: use a larger block (4 KB). New problem: internal fragmentation.

The Linux Second Extended File System (ext2): Disk layout of the Linux ext2 file system (figure). Tanenbaum, Modern Operating Systems 3/e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639

The Linux Second Extended File System (ext2): (a) A Linux directory with three files. (b) The same directory after the file voluminous has been removed (figure).

The Linux Second Extended File System (ext2): The relation between the file descriptor table, the open file description table, and the i-node table (figure).
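That three-table relationship is directly observable from user space: duplicated file descriptors share one open file description (and hence one file offset), while a second open() of the same file gets its own. A small POSIX-only demonstration sketch using Python's os module:

```python
import os
import tempfile

# Create a scratch file with known contents.
fd_tmp, path = tempfile.mkstemp()
os.write(fd_tmp, b"abcdefgh")
os.close(fd_tmp)

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.dup(fd1)                    # fd2 shares fd1's open file description
fd3 = os.open(path, os.O_RDONLY)     # fd3 gets a separate open file description

os.read(fd1, 4)                      # advances the shared offset to 4
shared = os.lseek(fd2, 0, os.SEEK_CUR)    # 4: fd2 sees fd1's offset
private = os.lseek(fd3, 0, os.SEEK_CUR)   # 0: fd3's offset is independent

for fd in (fd1, fd2, fd3):
    os.close(fd)
os.unlink(path)
```

Both open file descriptions still point at the same i-node entry, so metadata (size, permissions) is shared even when offsets are not.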

FFS: Consistency Issues. Inodes are fixed-size structures stored in cylinder groups (figure: superblock and inodes within a cylinder group). Metadata updates are synchronous operations: write a newly allocated inode to disk before its name is entered in a directory; remove a directory name before the inode is deallocated; write a deallocated inode to disk before its blocks are placed on the cylinder group free list. Some updates create cyclic dependencies.

FFS Observation 1: If the server crashes in between any of these synchronous operations, the file system is left in an inconsistent state. Solutions: fsck, a post-crash recovery process that scans the file system structure and restores consistency; or log updates to enable roll-back or roll-forward.

FFS Observation 2: The performance of FFS is optimized for disk block clustering, using properties of the disk to inform file system layout. Observation: memory is now large enough that most of the reads that go to the disk are the first read of a file; subsequent reads are satisfied in memory by the file buffer cache. I.e., there is no performance problem with reads, but write calls could be made faster. Writes are not well-clustered; they include inodes and data blocks.

Log-Structured File System (LFS), Ousterhout 1989: Write all file system data in a continuous log. Uses inodes and directories from FFS, but needs an inode map to find the inodes: an inode number is no longer a simple index. A cleaner reclaims space from overwritten or deleted blocks. (Figure: log layout of superblock, inodes, summary, data blocks.)

LFS Reads: If the writes are easy, what happens to the reads? To read a file from disk: 1. Read the superblock to find the index file. 2. Read the index file (linear search on a block of inodes). 3. Use the disk address in the inode to read the block of the index file containing the inode map. 4. Get the file's inode. 5. Use the inode as usual to find the file's data blocks. But remember, we expect reads to hit in memory most of the time.

NTFS (Windows) The New Technology File System (NTFS) from Microsoft replaced the old FAT file system. The designers had the following goals: 1. Eliminate fixed-size short names 2. Implement a more thorough permissions scheme 3. Provide good performance 4. Support large files 5. Provide extra functionality: Compression Encryption Types In other words, they wanted a file system flexible enough to support future needs.

NTFS: Each volume (partition) is a linear sequence of blocks (usually 4 KB block size). Each volume has a Master File Table (MFT): a sequence of 1 KB records, with one or more records per file or directory. Similar to inodes, but more flexible: each MFT record is a sequence of variable-length (attribute, value) pairs. Long attributes can be stored externally, with a pointer kept in the MFT record. NTFS tries to allocate files in runs of consecutive blocks.

MFT Record (figure): a record header, a standard info header and attribute, a file name header and attribute, and a data header followed by three runs. The figure shows an MFT record for a 3-run, 9-block file whose runs cover disk blocks 20-23, 64-65, and 80-82. Each data attribute indicates the starting block and the number of blocks in a run (or extent). If all the records don't fit into one MFT record, extension records can be used to hold more.
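Reading byte N of such a file means walking the run list to translate a file-relative block offset into an on-disk block number. A sketch using the same three runs as the figure:

```python
# The file's data attribute as a list of (starting_block, run_length)
# pairs, matching the figure's 3-run, 9-block file.
runs = [(20, 4), (64, 2), (80, 3)]

def file_block_to_lbn(runs, file_block):
    """Translate a block offset within the file to an on-disk block
    number by walking the run list."""
    for start, length in runs:
        if file_block < length:
            return start + file_block
        file_block -= length
    raise ValueError("offset past end of file")

print(file_block_to_lbn(runs, 0))   # 20: first block of run 1
print(file_block_to_lbn(runs, 4))   # 64: first block of run 2
print(file_block_to_lbn(runs, 8))   # 82: last block of run 3
```

A 9-block file needs only three (start, length) pairs instead of nine pointers; the more contiguously the file is allocated, the shorter the run list.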

MFT Record for a Small Directory (figure): a record header, a standard info header and attribute, and a file name header. A directory entry contains the MFT index for the file, the length of the file name, the file name itself, and various fields and flags. Directory entries are stored as a simple list; large directories use B+ trees instead.

Better I/O performance through parallelism: Idea: spread the work across several disks. RAID: Redundant Arrays of Inexpensive Disks. While we're at it, why not also increase reliability?
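The simplest way to spread work across disks is striping (RAID 0): consecutive chunks of the logical address space go to consecutive disks in round-robin order. A mapping sketch (chunk size and disk count are illustrative parameters, not values from the slides):

```python
def stripe(lbn, n_disks, chunk_blocks=1):
    """Map a logical block number to (disk, block-on-that-disk) under
    RAID-0 striping with the given chunk size."""
    chunk = lbn // chunk_blocks            # which chunk the block falls in
    disk = chunk % n_disks                 # chunks rotate across disks
    block = (chunk // n_disks) * chunk_blocks + lbn % chunk_blocks
    return disk, block

# With 4 disks and 1-block chunks, blocks 0..7 land on disks 0,1,2,3,0,1,2,3
print([stripe(b, 4) for b in range(8)])
```

A large sequential read now keeps all four disks busy at once; but with no redundancy, losing any one disk loses the whole array, which is what the parity-based RAID levels address.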
