

Disks_and_Layers Page 1
So what is a file?
Tuesday, November 17, 2015 1:23 PM

This is a difficult question. To understand this, let's build a layered model from the bottom up. Layers include:
- device
- driver
- filesystem
- file

Some basic facts about I/O
Monday, November 20, 2017 9:39 AM

Everything is by number, not name.
- Names are a convenience for us.
- Numbers are the actual identifiers.
- Example: /dev

Example: /dev
Monday, November 20, 2017 9:41 AM

- /dev is a directory of pseudo-files that represent devices.
- It defines the name of each device.
- The device is actually known by a pair of numbers:
  - The major number is the offset into the driver table for its driver.
  - The minor number determines options to the driver.

Example:

    couchvm01{couch}1005: ls -l /dev/sda*
    brw-rw----. 1 root disk 8, 0 Oct 5 17:32 /dev/sda
    brw-rw----. 1 root disk 8, 1 Oct 5 17:32 /dev/sda1
    brw-rw----. 1 root disk 8, 2 Oct 5 17:32 /dev/sda2
    brw-rw----. 1 root disk 8, 3 Oct 5 17:32 /dev/sda3

How this is used: the output of df reads, in part:

    Filesystem  1K-blocks     Used  Available  Use%  Mounted on
    /dev/sda1    18447056  9955888    7531068   57%  /
    devtmpfs      1931504        0    1931504    0%  /dev
    ...
    /dev/sda2     1998672    27252    1850180    2%  /tmp
    /dev/sda3     4061888  1621340    2214500   43%  /var
    ...

In short: major number 8 (the driver) with minor number 1 corresponds to /dev/sda1, which is mounted as the filesystem / (the root filesystem).
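The major/minor pair shown by ls -l can also be read programmatically. The following is a minimal sketch, assuming a UNIX-like system and Python's standard os module; the function names split_device_number and device_numbers are hypothetical, chosen only for this illustration.

```python
import os

def split_device_number(rdev):
    """Split a combined device number into its (major, minor) pair."""
    return os.major(rdev), os.minor(rdev)

def device_numbers(path):
    """Return (major, minor) for the device node at `path`.

    st_rdev is the device number stored in the node itself; it is
    meaningful only for block and character device files.
    """
    return split_device_number(os.stat(path).st_rdev)
```

On the machine in the listing above, device_numbers("/dev/sda1") would be expected to give (8, 1), matching the "8, 1" column of ls -l.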

I/O and disks
Tuesday, December 01, 2009 1:49 PM

- So far, we've studied devices and major/minor numbers. This is a single-layer model.
- Disks have a multi-layer I/O model.
- Drivers talk to each other: open calls the filesystem driver, which calls the raw disk driver.
  - The raw disk driver defines open, close, read, and write for disk blocks.
  - The filesystem driver defines open, close, read, and write for files.

Up the long ladder
Thursday, November 17, 2011 11:19 AM

A modern disk is a set of layers:
- The physical media.
  - Geometry
  - Sparing and defragmentation
- The raw device.
  - Operations: write/read
  - Paging
  - Scheduling
- The filesystem device.
  - The concept of a file.
  - The concept of a directory.

Devices and Files
Thursday, December 02, 2004 2:29 PM

Ambiguity: open() is used to open either devices or files.

File I/O: talks with a layer above the device.
- Filesystem: a structure for the internal format of disk devices.
- Mounting: associates a disk with a place in the directory hierarchy. (Opens the disk device!)
- mkfs/newfs: creates the filesystem on the raw disk device.
- Unmounting: disassociates the disk from its place in the hierarchy. (Closes the disk device!)
- There are lots of layers between the device and file access.

Device I/O: talks with the device directly.
- Sometimes called "raw device I/O".
- Strips layers away: the disk can be accessed as a raw device.
- Sometimes used to recover data from bad disks.
- A corrupt disk, when mounted, can crash the operating system!

Tracks and sectors
Thursday, December 02, 2004 5:39 PM

Disk structure (and illusions):
- Round platter.
- Divided into tracks and sectors.
  - Tracks: offsets from center.
  - Sectors: rotational offset.
- NOT!

Disks
Thursday, December 02, 2004 5:39 PM

Disk geometry
Thursday, December 02, 2004 5:39 PM

A modern disk's "geometry" is a bald-faced lie.
- Inside a disk: 1,000,000 lines of C code.
- Lots of internal tricks that are not visible to the OS:
  - Sparing: replacing one sector with another.
  - Variable-sector maps: more sectors on the outside than on the inside.

Sparing: recovering (automatically) from disk errors.
- Bad block: a scratched disk platter or equivalent.
- Spare table: a special track or tracks used to replace bad blocks.
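The idea of a spare table can be sketched as a small remapping layer. This is only an illustration under stated assumptions: real firmware is far more involved, and the class SparedDisk, its methods, and the choice to place spares just past the visible sectors are all hypothetical.

```python
class SparedDisk:
    """Toy model of sparing: bad sectors are silently redirected to spares."""

    def __init__(self, n_sectors, n_spares):
        self.n_sectors = n_sectors
        # Assumption: spare sectors live just past the visible sector range.
        self.free_spares = list(range(n_sectors, n_sectors + n_spares))
        self.spare_table = {}  # bad logical sector -> spare physical sector

    def mark_bad(self, sector):
        """Record that `sector` is bad and assign it a spare."""
        if sector in self.spare_table:
            return
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        self.spare_table[sector] = self.free_spares.pop(0)

    def physical(self, sector):
        """Translate the sector the OS asked for into the one actually used."""
        return self.spare_table.get(sector, sector)
```

The OS keeps asking for sector 7; after mark_bad(7) the disk quietly serves it from a spare, which is why the reported geometry cannot be trusted.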

Virtual geometry
Thursday, December 02, 2004 5:39 PM

- It is physically impossible to have the same number of sectors for every track.
- Outer tracks have more media than inner tracks.
- Thus there are more sectors at the edge than in the middle!

How sparing works
Thursday, December 02, 2004 5:39 PM

How do disks fail?
Thursday, November 15, 2012 3:45 PM

- Globally: controller failure.
- Scratches: from dropping a spinning disk.
- Scratches affect a group of sectors.

Disk paging
Thursday, December 02, 2004 5:39 PM

Very subtle point:
- Disks can only read and write whole blocks at a time.
- Processes often write parts of blocks to files.
- How to mediate?

Ambiguity: several blocking factors.
- How big is a "page" of the filesystem? "page" =~ 8192 bytes
- How much of the disk can be read at a time? "sector" =~ 512 bytes

Answer: make memory images of disk "blocks".
- This is called "the disk paging subsystem".
- Standardize on one page as the read/write unit.
- Ignore sectors.
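To see why partial writes need mediation, it helps to work out which pages a byte-range write touches. The sketch below assumes the 8192-byte page size quoted above; the function name pages_touched is hypothetical.

```python
PAGE_SIZE = 8192  # bytes per page, the approximate figure quoted in the notes

def pages_touched(offset, length):
    """List (page_number, start_in_page, end_in_page) for a write of
    `length` bytes starting at byte `offset` of a file."""
    spans = []
    end = offset + length
    while offset < end:
        page = offset // PAGE_SIZE
        start = offset % PAGE_SIZE
        stop = min(PAGE_SIZE, start + (end - offset))
        spans.append((page, start, stop))
        offset += stop - start
    return spans
```

For example, a 400-byte write at offset 8000 gives [(0, 8000, 8192), (1, 0, 208)]: two pages are affected, and each is only partly overwritten, so each must exist in memory in full before the change is applied.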

The paging subsystem
Thursday, November 17, 2011 10:57 AM

- When one opens a file, allocate the file pages from a "page pool" of available kernel pages.
- When a process writes to the file:
  - Read the current page of the file into the page pool.
  - Perform the change to the page in the page pool.
- When convenient, "flush" changes from the page pool to the file. Do this as late as possible.
- The page pool is:
  - A kernel array.
  - Fixed-length.
  - Allocated at boot time.

A big mystery
Thursday, November 17, 2011 10:59 AM

Big mystery: why can't one just turn the power off on a UNIX machine?

Answer: the disk image of the page cache for files is never up to date.
- The page pool contains "dirty copies" of:
  - directory blocks
  - file blocks
  - inodes: where files are located
- A "dirty page" is out of sync between memory and disk.
- To turn a machine off, one must synchronize the page pool to the disk image (command: sync).
- Traditional UNIX delegates page pool flushes to a separate process in user space: "update".
- Flushes do not happen when a file is closed!
- Writing and closing a file gives a normal process no assurance that its data has been flushed to disk; only an explicit request such as fsync() forces the flush.

To shut down a machine: shutdown now "for a quick reboot".
If the machine is on fire: sync;sync followed by flipping off the power.
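On current POSIX systems a process can at least ask the kernel to flush one file's dirty pages with fsync(). The sketch below uses Python's standard os.fsync; the function name durable_write is hypothetical, and whether the data truly reaches the platter still depends on the device honouring the request.

```python
import os

def durable_write(path, data):
    """Write `data` to `path` and request that it be pushed to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # user-space buffer -> kernel page pool
        os.fsync(f.fileno())  # ask the kernel to flush this file's dirty pages
```

Without the fsync call, closing the file only guarantees that the data has reached the kernel's page pool, which is exactly the situation described above.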

The page pool
Thursday, December 02, 2004 5:39 PM

Page pool in operation:

Paging concepts
Thursday, December 02, 2004 5:39 PM

Dirty flag: marks each page as "clean" (no write, no need to flush) or "dirty" (need to flush).
- If not "dirty", the page can be reused without a flush => update is not called.
- If "dirty", the page must be flushed before reuse => update has to do the flush.
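The interplay of page pool, dirty flag, and late flushing can be sketched in a few lines. This is a toy model, not kernel code: the "disk" is a plain dict, the class PagePool and its methods are hypothetical, and the eviction rule (oldest loaded page first) is an assumption made to keep the example short.

```python
PAGE_SIZE = 8192

class PagePool:
    """Fixed-size pool of in-memory pages with write-back on reuse."""

    def __init__(self, disk, n_slots):
        self.disk = disk        # stand-in for the disk: page number -> bytes
        self.n_slots = n_slots  # fixed length, as in the notes
        self.pages = {}         # page number -> bytearray held in memory
        self.dirty = set()      # page numbers whose memory copy differs from disk

    def _load(self, page_no):
        if page_no not in self.pages:
            if len(self.pages) >= self.n_slots:
                self._evict()
            on_disk = self.disk.get(page_no, bytes(PAGE_SIZE))
            self.pages[page_no] = bytearray(on_disk)
        return self.pages[page_no]

    def _evict(self):
        victim = next(iter(self.pages))  # oldest loaded page
        if victim in self.dirty:         # dirty: must flush before reuse
            self.flush(victim)
        del self.pages[victim]           # clean: reuse without a flush

    def write(self, page_no, start, data):
        if start < 0 or start + len(data) > PAGE_SIZE:
            raise ValueError("write crosses a page boundary")
        page = self._load(page_no)       # read current page into the pool
        page[start:start + len(data)] = data
        self.dirty.add(page_no)          # disk copy is now stale

    def read(self, page_no):
        return bytes(self._load(page_no))

    def flush(self, page_no):
        self.disk[page_no] = bytes(self.pages[page_no])
        self.dirty.discard(page_no)

    def sync(self):
        for page_no in sorted(self.dirty):
            self.flush(page_no)
```

After pool.write(0, 10, b"hi") the dict standing in for the disk is still empty; the change only appears there after pool.sync() or when the page is evicted to make room.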

Disk scheduling
Thursday, December 02, 2004 5:39 PM

If processors are so fast, why is my computer so sloooow?
Answer: the disk is the largest bottleneck in a modern system.

Basic problem: disk geometry.
- Two phases to a disk read or write:
  - Seek: put the head in the right track.
  - Read or write: read or write a sector of the track.
- Two components to "seek":
  - "Rotational delay": how long we have to wait for the sector to be under the head.
  - Arm movement, a.k.a. "offset": how long we have to wait for the arm to be at the appropriate track.
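A rough feel for these delays comes from simple arithmetic: on average the wanted sector is half a revolution away. The sketch below assumes a constant spindle speed; the function names are hypothetical, and any arm-movement figure passed in is an illustrative number, not a measurement.

```python
def avg_rotational_delay_ms(rpm):
    """Average wait for the sector to arrive: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution

def access_time_ms(arm_movement_ms, rpm, transfer_ms=0.0):
    """Arm movement plus average rotational delay plus transfer time."""
    return arm_movement_ms + avg_rotational_delay_ms(rpm) + transfer_ms
```

At 7200 rpm the average rotational delay alone is about 4.17 ms, which is millions of processor cycles on a modern CPU.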

Disk scheduling issues
Thursday, November 17, 2011 11:03 AM

- In a multiprocessing system, processes contend for the same disk to read/write different data.
- The order in which seeks are made largely determines throughput.
- Question: how should one schedule/prioritize/order requests from autonomous processes so that throughput is optimal?
- Better question: what is "optimal"?

Disk scheduling strategies
Thursday, November 17, 2011 11:01 AM

- Last-in, first-out:
  - Optimal for sequential access.
  - Not so optimal for random access.
- Shortest service-time first (SSTF):
  - Keep queue of requests small.
  - High utilization.
- SCAN: back and forth over the disk.
  - Pattern accesses so that the arm zig-zags.
- C-SCAN: sawtooth: forth with fast return.
  - Lower variability in service with direction of scan.
- N-step scan: segment the request queue, do round-robin of requests for each segment of tracks.
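The strategies above can be compared by the total arm movement each one causes for a fixed batch of requests. This is a simplified sketch: the queue is static (no requests arrive while it is served), SCAN and C-SCAN reverse at the last pending request rather than at the disk edge (a variant often called LOOK), and the track numbers in the usage note are arbitrary illustration values.

```python
def lifo(head, requests):
    """Serve the most recently queued request first."""
    return list(reversed(requests))

def sstf(head, requests):
    """Always serve the pending request closest to the current head position."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

def scan(head, requests):
    """Sweep toward higher tracks, then reverse and sweep back."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down

def c_scan(head, requests):
    """Sweep toward higher tracks, jump back, and sweep upward again."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped

def total_movement(head, order):
    """Number of tracks the arm crosses while serving `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total
```

With the head at track 53 and requests [98, 183, 37, 122, 14, 124, 65, 67], the totals are 609 tracks for LIFO, 236 for SSTF, 299 for SCAN, and 322 for C-SCAN. C-SCAN moves the arm further than SCAN here, but every request is served in the same sweep direction, which is the source of its lower variability.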

Disk schedules
Thursday, December 02, 2004 5:39 PM

Scheduling seek
Thursday, November 15, 2012 5:10 PM

Scan, c-scan, and n-segment scheduling
Wednesday, November 19, 2014 6:29 PM

Imparting meaning
Tuesday, November 22, 2011 2:03 PM

- In the preceding, we studied how blocks are written to disk.
- Now, we will impart meaning to the blocks and build something new: a filesystem.

Layers and meanings
Thursday, November 17, 2011 11:04 AM

- So far, we have one meaning for disk data: blocks.
- We will build up the concept of files in layers.
- Each layer imparts a new form of meaning that is not known at lower layers.
  - First layer: the raw disk.
    Meaning: a physical location L on the disk contains data.
  - Second layer: the raw device.
    Meaning: maps logical block N -> physical location L.
  - Third layer: the filesystem.
    Meaning: make a file out of a sequence of blocks. Give the file a unique number in the filesystem.
  - Fourth layer: directories.
    Meaning: gives numbered files human-readable names.

Components of a filesystem
Thursday, November 15, 2012 2:30 PM

- Inode: describes where files are.
  - Contains owner, group, protection.
  - Does not contain names.
  - Stored in a logical array, indexed by number.
  - Stored in sections of the disk called "inode groups".
  - Multiple inodes per block.
- Block: describes the content of a file.
  - A file is logically a sequence of blocks: a logical linked list.
  - Allocated only in whole-block units.
  - Stored in sections of the disk called "block groups".
- Superblock: a special kind of block that describes the architecture of the disk and where to find inode groups, block groups, and the root directory (/).
- Directory: a special kind of file.
  - Contains the name-to-inode mapping.
  - Has its own inode.

The structure of a filesystem
Thursday, November 15, 2012 3:49 PM

Two concepts of the structure of a filesystem:
- Logical structure: how blocks are logically arranged to create files.
- Physical structure: how this maps to a device with block numbers.

As usual:
- Logical structure is simple.
- Physical structure is complex.
- Reason: performance.

The logical structure of a Linux file
Thursday, November 15, 2012 2:13 PM

- Logically, a null-terminated linked list.
- The inode determines file properties.
- Blocks determine file contents.
- Minimum file size is one block, incremented in one-block increments.

The logical structure of the inode table
Thursday, November 15, 2012 2:38 PM

- The inode table is logically an array.
- Each inode points to a linked list of blocks that is the file.

The logical structure of a directory
Thursday, November 15, 2012 2:16 PM

Logical layout of a filesystem
Thursday, December 03, 2009 11:01 AM

Logical layout of an entire Linux filesystem:
- Directories form a "tree".
- In each directory, ".." points to the parent directory and "." points to the current directory.
- Directories map names to inodes.
- Inodes contain access privileges, and point to blocks representing the file.
- A file or directory is a sequence of blocks.
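The logical layout just described (directories map names to inode numbers, inodes lead to blocks) can be modeled in memory. This is a toy sketch: the names Inode and TinyFS are hypothetical, and for brevity a directory's name-to-inode map is kept as a Python dict rather than being stored in blocks as a real filesystem would.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    is_dir: bool
    blocks: list = field(default_factory=list)   # file contents, one item per block
    entries: dict = field(default_factory=dict)  # directories only: name -> inode number

class TinyFS:
    ROOT = 0  # inode number of "/"

    def __init__(self):
        # The inode table is logically an array indexed by inode number.
        self.inodes = [Inode(is_dir=True, entries={".": 0, "..": 0})]

    def _alloc(self, inode):
        self.inodes.append(inode)
        return len(self.inodes) - 1

    def mkdir(self, parent, name):
        ino = self._alloc(Inode(is_dir=True))
        self.inodes[ino].entries = {".": ino, "..": parent}
        self.inodes[parent].entries[name] = ino
        return ino

    def create(self, parent, name, blocks):
        ino = self._alloc(Inode(is_dir=False, blocks=list(blocks)))
        self.inodes[parent].entries[name] = ino
        return ino

    def lookup(self, path):
        """Walk an absolute path one name at a time; return the inode number."""
        ino = self.ROOT
        for part in path.split("/"):
            if not part:
                continue
            node = self.inodes[ino]
            if not node.is_dir:
                raise NotADirectoryError(path)
            ino = node.entries[part]  # KeyError if the name does not exist
        return ino
```

Note that the name "a.txt" lives only in its parent directory's map; the inode itself holds no name, which matches the description of inodes above.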

Goal: O(1) random access
Thursday, November 15, 2012 5:26 PM

Solution: ditch the lists.
- Put an array in the inode.
- The first few entries point to the first few blocks.
- The next entry points to a block that is, in actuality, an array of pointers to blocks.

The strategy for storing big files
Monday, November 20, 2017 5:19 PM

Ext2: the inode holds 15 block pointers.
- 12 point directly to data blocks.
- 1 points to a block that points to blocks: single indirection.
- 1 points to a block that points to blocks that point to blocks: double indirection.
- 1 points to a block that points to blocks that point to blocks that point to blocks: triple indirection.

Getting to O(1)
Monday, November 20, 2017 5:21 PM

Let x be the logical block number within the file, with 1024 pointers per indirect block (as the arithmetic below assumes).

    if x < 12:
        return inode[x]    (the first 12 are contained in the inode)
    else if x < 12 + 1024:
        return inode[12][x-12]
    else if x < 12 + 1024 + 1024*1024:
        return inode[13][(x-12-1024)/1024][(x-12-1024)%1024]
    ...
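The lookup rule can be written out as a runnable sketch that returns the chain of indices to follow for a given logical block. The function name locate_block is hypothetical. The defaults assume 12 direct pointers (the value ext2 itself uses) and 1024 pointers per indirect block; both are parameters so that other layouts can be tried.

```python
def locate_block(x, n_direct=12, ptrs_per_block=1024):
    """Return the index path from the inode to logical block `x`.

    The first element indexes the pointer array in the inode; each later
    element indexes into the indirect block reached by the previous step.
    """
    if x < 0:
        raise ValueError("block number must be non-negative")
    p = ptrs_per_block
    if x < n_direct:                      # direct pointer
        return (x,)
    x -= n_direct
    if x < p:                             # single indirection
        return (n_direct, x)
    x -= p
    if x < p * p:                         # double indirection
        return (n_direct + 1, x // p, x % p)
    x -= p * p
    if x < p * p * p:                     # triple indirection
        return (n_direct + 2, x // (p * p), (x // p) % p, x % p)
    raise ValueError("block number beyond triple indirection")
```

Each case does a fixed amount of arithmetic and follows at most four pointers, regardless of file size, which is the sense in which access is O(1) rather than a walk down a linked list.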