CSC369 Lecture 9. Larry Zhang, November 16, 2015


Announcements: A3 is out, due December 4th. Promise: there will be no extension, since it is too close to the final exam (Dec 7). Be prepared for a challenge to your time-management skills. Implement ls, cp, mkdir, ln, rm for ext2 -- in user space, the real deal. The 3 most important tips for A3: READ THE SPEC! READ THE SPEC!! READ THE SPEC!!! http://www.nongnu.org/ext2-doc/index.html

Today's topic: Optimizing File System Performance

Recap: File System Design. What on-disk structure stores the data and metadata? What happens when a process opens a file? What on-disk structure is accessed during a read / write? VSFS: the very simple file system. [Diagram: VSFS on-disk layout over 64 blocks (0-63): the superblock (S), the inode bitmap (IB), the data bitmap (DB), a few inode table blocks (I), and the remaining blocks as the data region (D).]

[Same VSFS layout diagram as above.]
- How do we know where the inode table starts on the disk? Check the superblock.
- How do we know where the data region starts on the disk? Check the superblock.
- How do we know if an inode is free? Check the inode bitmap.
- How do we know if a data block is free? Check the data bitmap.
- Given an inode, how do we know which data blocks store its data? Data pointers stored in the inode (direct pointers, indirect pointers, double indirect pointers, etc.).
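To make the layout concrete, here is a rough C sketch of what such a superblock might record; the field names are illustrative and are not taken from any real file system.

```c
/* Rough sketch of a VSFS-style superblock: it records where each on-disk
 * region starts. All field names and widths are made up for illustration. */
#include <stdint.h>

struct superblock {
    uint32_t magic;               /* identifies the file system type        */
    uint32_t inode_bitmap_block;  /* block number of the inode bitmap (IB)  */
    uint32_t data_bitmap_block;   /* block number of the data bitmap (DB)   */
    uint32_t inode_table_start;   /* first block of the inode table (I)     */
    uint32_t data_region_start;   /* first block of the data region (D)     */
    uint32_t num_inodes;          /* how many inodes the inode table holds  */
    uint32_t num_data_blocks;     /* how many blocks the data region holds  */
};
```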

The content of a data block:
- if it belongs to a regular file: data of the file
- if it belongs to a directory: a list of (name, inode number) pairs, which are the entries under the directory
- if it belongs to a symbolic link: the path of the file that it links to
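For the directory case, a directory's data block can be viewed as an array of (name, inode number) entries. The struct below is a simplified sketch: real ext2 entries are variable-length and also carry a record length and a file type byte.

```c
/* Simplified directory entry: a (name, inode number) pair.
 * Sizes are made up; ext2's real entries are variable-length. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NAME 28

struct dir_entry {
    uint32_t inode;           /* inode number of this entry            */
    char     name[MAX_NAME];  /* file or subdirectory name, NUL-padded */
};

int main(void)
{
    /* A directory's data block is then just an array of such entries. */
    printf("entries per 512-byte block: %zu\n", 512 / sizeof(struct dir_entry));
    return 0;
}
```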

What happens when a process reads /home/alice/lru.c? We need to recursively traverse the path to locate the data of lru.c:
1. Find the inode for the root directory /. How? It is well known by the FS; e.g., in UNIX the root inode number is always 2.
2. Read the inode for / (disk read), get the pointers to the data blocks of /.
3. Follow the pointers, read the data blocks (disk read), get the list of (name, inode) pairs under /.
4. Search the list and find ("home", inode number).
5. Locate the inode for home in the inode table and read it (disk read), get the pointers to the data blocks of home.
6. Follow the pointers, read the data blocks (disk read), get the list of (name, inode) pairs under home.
7. Search the list and find ("alice", inode number).
8. Locate the inode for alice in the inode table and read it (disk read), get the pointers to the data blocks of alice.
9. Follow the pointers, read the data blocks (disk read), get the list of (name, inode) pairs under alice.
10. Search the list and find ("lru.c", inode number).
11. Locate the inode for lru.c in the inode table and read it (disk read), get the pointers to the data blocks of lru.c.
12. Follow the pointers, read the data blocks (disk read), get the content of lru.c.
That is 8 disk reads (at least); the number of disk reads is proportional to the length of the path.
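As a quick sanity check on that count, the toy program below (not from the lecture) computes the minimum number of disk reads under this simple model: one inode read plus one data-block read for the root and for each path component.

```c
/* Count the minimum disk reads to read a file in the VSFS-style lookup
 * above: inode + data-block read for the root, plus the same pair for
 * every path component. A toy model: it ignores indirect blocks and
 * multi-block directories, which would only add further reads. */
#include <stdio.h>

static int min_disk_reads(const char *path)
{
    int components = 0;
    for (const char *p = path; *p != '\0'; p++)
        if (*p == '/' && *(p + 1) != '\0' && *(p + 1) != '/')
            components++;              /* each component: inode + data read */
    return 2 * (components + 1);       /* +1 pair for the root directory    */
}

int main(void)
{
    printf("%d disk reads\n", min_disk_reads("/home/alice/lru.c"));  /* 8 */
    return 0;
}
```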

What about writing / creating a file? The path traversal is similar to a read, but it is worse because we may need to allocate new inodes (when creating a new file) or new data blocks (when the size of the new writes exceeds the existing blocks), which means we also need to read / write the inode bitmap block or the data bitmap block.
Extra reads / writes when allocating a new data block:
- read the data bitmap from disk, find a free block, change its bit
- write the updated data bitmap to disk
- read the inode from disk, update its data pointer fields to add the newly allocated data block
- write the updated inode back to disk
- write the new data to the newly allocated data block
Extra reads / writes when creating a new file: even worse (try it yourself).
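Here is a minimal sketch of the data-bitmap step, using an in-memory bitmap purely for illustration; a real file system would first read the bitmap block from disk and write the updated block back afterwards, exactly as listed above.

```c
/* Find a free block in the data bitmap, mark it allocated, return its
 * number. The bitmap is an in-memory array here; on a real FS it would
 * be read from and written back to disk around this step. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 64

static int alloc_block(uint8_t bitmap[NUM_BLOCKS / 8])
{
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (!(bitmap[b / 8] & (1u << (b % 8)))) {  /* bit is 0: block is free */
            bitmap[b / 8] |= (1u << (b % 8));      /* mark it allocated       */
            return b;   /* caller must still update the inode's data pointers */
        }
    }
    return -1;  /* no free block left */
}

int main(void)
{
    uint8_t data_bitmap[NUM_BLOCKS / 8] = {0x0f};              /* blocks 0-3 in use */
    printf("allocated block %d\n", alloc_block(data_bitmap));  /* prints 4          */
    return 0;
}
```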

Each simple open() / read() / write() operation easily incurs 10+ disk I/Os. How can a file system accomplish any of this with reasonable efficiency? OS: Help! Address translation takes at least two memory accesses! (Answer: the TLB.) FS: Help! File access takes 10+ disk accesses! (Answer: cache and buffer.)

Caching and Buffering. Use memory to cache important blocks.
- Static partitioning: at boot time, allocate a fixed-size cache in memory (typically 10% of total memory); used by early file systems. Can be wasteful if the file system doesn't really need 10% of the memory.
- Dynamic partitioning: integrate virtual memory pages and file system pages into a unified page cache, so pages of memory can be flexibly allocated for either virtual memory or the file system; used by modern systems.
- Typically use LRU as the replacement algorithm.
The trade-off between static and dynamic partitioning is something to think about whenever we have a resource-allocation kind of problem.

Caching and Buffering. Caching serves read operations well; a large enough cache could avoid disk I/O for reads altogether. For write operations it does not help as much, because writes have to go to disk anyway in order to become persistent. So we buffer the disk writes. Buffering a batch of disk writes is helpful because it lets us:
- combine multiple writes into one write, e.g., when updating multiple bits of the inode bitmap
- improve performance by scheduling the buffered writes (the idea of lazy updates, useful in many scenarios), e.g., schedule the buffered writes so that they happen sequentially on disk
- avoid some writes entirely, e.g., if one write changes a bit from 0 to 1 and a later write changes it back from 1 to 0, then with buffering nothing needs to be written to disk
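A toy illustration of the "combine multiple writes" point: if updates only mark blocks dirty in memory, several updates to the same block cost a single disk write at flush time. The block numbers and the flush loop below are made up for the sketch; the actual disk I/O is omitted.

```c
/* Dirty-block buffering: mark blocks dirty instead of writing them
 * immediately; repeated updates to the same block are combined into
 * one write when the buffer is flushed. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 128

static bool dirty[NUM_BLOCKS];

static void buffered_update(int block_no) { dirty[block_no] = true; }

static int flush_all(void)
{
    int writes = 0;
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (dirty[b]) {         /* one disk write per dirty block     */
            dirty[b] = false;   /* (the actual I/O is omitted here)   */
            writes++;
        }
    return writes;
}

int main(void)
{
    buffered_update(3);    /* three updates to the inode bitmap block... */
    buffered_update(3);
    buffered_update(3);
    buffered_update(17);   /* ...and one to a data block                 */
    printf("disk writes at flush time: %d\n", flush_all());   /* prints 2 */
    return 0;
}
```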

Trade-off: speed vs durability. The use of caching and buffering improves the speed of file system reads and writes, but at the same time it sacrifices the durability of data: if a crash occurs, the buffered writes that have not yet been written to disk are lost. Better durability means syncing to disk more frequently, which in turn means worse speed. Should I favour speed or durability? It depends (the always-correct answer to all system design questions). It depends on the application: a web browser cache vs a bank database.
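In POSIX terms, this choice often comes down to how often an application forces its buffered writes to disk with fsync(): a browser cache can leave writes sitting in the buffer cache, while a bank database syncs after every update. A minimal sketch (the file name and record are just examples):

```c
/* write() normally returns once the data is in the kernel's buffers;
 * fsync() forces the buffered data to disk before returning, trading
 * speed for durability. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("account.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, "balance=42\n", 11) != 11)   /* fast: goes to the cache */
        perror("write");

    if (fsync(fd) != 0)          /* slow: waits for the disk, but durable */
        perror("fsync");

    close(fd);
    return 0;
}
```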

So, we have improved the performance of the file system by caching and buffering. But we have not considered much about the direct interaction with the disk. There is probably something we can be smart about there, so that we can further improve the performance of the file system. To do this, we need to understand the hard disk first.

Hard Disk Drives

Basic Interface. A disk has a sector-addressable address space, so a disk is like an array of sectors. Sectors are typically 512 bytes or 4 KB (Seagate: "Transition to Advanced Format 4K Sector Hard Drives"). Main operations: read / write to sectors.
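As an illustration of the "array of sectors" view, the sketch below reads one sector from a raw disk device with pread(); the device path is only an example, and raw device access normally requires root privileges.

```c
/* Read one 512-byte sector from a raw disk device: the byte offset is
 * simply sector_number * sector_size. Example device path; needs root. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SECTOR_SIZE 512

int main(void)
{
    char sector[SECTOR_SIZE];
    int fd = open("/dev/sda", O_RDONLY);           /* example device path */
    if (fd < 0) { perror("open"); return 1; }

    /* read sector 100 */
    if (pread(fd, sector, SECTOR_SIZE, 100L * SECTOR_SIZE) != SECTOR_SIZE)
        perror("pread");

    close(fd);
    return 0;
}
```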

The Inside of a Hard Disk Drive (picture credits: Tyler Harter)

Hard Disk Drive in Action

Seek, Rotation, Transfer.
- Seek: move the disk arm to the correct cylinder. Depends on how fast the disk arm can move; typical time: 1-15 ms, depending on distance (average 5-6 ms); improving very slowly: 7-10% per year.
- Rotation: wait for the sector to rotate under the head. Depends on the rotation rate of the disk (7200 RPM SATA, 15K RPM SCSI); average latency of half a rotation (~4 ms for a 7200 RPM disk); has not changed in recent years.
- Transfer: transfer data from the surface into the disk controller electronics, or the other way around. Depends on density: higher density means a higher transfer rate; ~100 MB/s, average sector transfer time of ~5 us; improving rapidly (~40% per year).
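Putting those typical numbers together gives a back-of-the-envelope cost for one random I/O; the small program below uses the typical values above (not a specific drive) and shows why seek and rotation dominate the transfer time.

```c
/* Rough average access time for one random 512-byte read:
 * average seek + half a rotation at 7200 RPM + sector transfer at ~100 MB/s. */
#include <stdio.h>

int main(void)
{
    double seek_ms     = 5.0;                               /* average seek        */
    double rotation_ms = 0.5 * (60.0 / 7200.0) * 1000.0;    /* half turn, ~4.17 ms */
    double transfer_ms = (512.0 / (100.0 * 1e6)) * 1000.0;  /* ~0.005 ms           */

    printf("average access time: %.2f ms per random I/O\n",
           seek_ms + rotation_ms + transfer_ms);            /* ~9.17 ms            */
    return 0;
}
```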

Some Hardware Optimizations: Track Skew, Zones, Cache

Track Skew. If the arm moves to the outer track too slowly, it may miss sector 16 and have to wait for a whole rotation. So we skew the track locations so there is enough time for the arm to settle.

Zones. Each sector is 512 bytes... what doesn't feel right? Outer tracks are larger by geometry, so they should hold more sectors.

Cache, a.k.a. Track Buffer. A small memory chip that is part of the hard drive, typically 8 MB or 16 MB. Different from the cache that the OS has: unlike the OS cache, it is aware of the disk geometry, so when reading a sector it may cache the whole track to speed up future reads on the same track.

FFS: a disk-aware file system

The original UNIX file system (by Ken Thompson). Recall: the FS sees the disk as an array of blocks; each block has a logical block number (LBN). Simple, straightforward implementation: easy to implement and understand, but very poor utilization of disk bandwidth (2% of full bandwidth).

Problem #1 with the original UNIX FS. On a new FS, blocks of a file are allocated sequentially and close to each other, which is good:
A1 A2 B1 B2 C1 C2 D1 D2 E1 E2
As the FS gets older, files are deleted in a random manner:
A1 A2 C1 C2 D1 D2
Then the blocks of a new file have to be allocated far from each other:
A1 A2 F1 F2 C1 C2 D1 D2 F3
Fragmentation of an aging file system causes more seeking.

Problem #2 with the original file system. Recall that when we traverse a file path, at each level we need to access the inode first and then access the data blocks. What problem do we have? In the original layout (superblock, then all inodes, then all data blocks), inodes are far away from the data blocks, so we need to jump back and forth, which causes a lot of long seeks.

FFS: the Fast File System. The BSD Unix folks did a redesign (early-mid 80s) that they called the Fast File System (FFS): improved disk utilization, decreased response time. McKusick, Joy, Leffler, and Fabry, ACM Transactions on Computer Systems, Aug. 1984. A major breakthrough in file system history. All modern file systems account for the lesson learned from FFS: treat the disk like it's a disk.

Cylinder Groups. BSD FFS addressed placement problems using the notion of a cylinder group (aka allocation groups in lots of modern FSs):
- Disk partitioned into groups of cylinders
- Data blocks of the same file allocated in the same cylinder group
- Files in the same directory allocated in the same cylinder group
- Inodes for files allocated in the same cylinder group as the files' data blocks

Cylinder Groups. Allocation within cylinder groups provides closeness, which reduces the number of long seeks. Free space management: we need free space scattered across all cylinder groups.
- Reserve 10% of the disk space to keep the disk partially free all the time.
- When allocating a large file, break it into large chunks and allocate them from different cylinder groups, so that it does not fill up one cylinder group.
- If the preferred cylinder group is full, allocate from a nearby cylinder group (a sketch of this policy follows below).
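Here is a minimal sketch of that placement idea, with made-up group counts and a simple "preferred group first, then spill to the next group with free space" rule; the real FFS search for an alternative group is more involved.

```c
/* Pick a cylinder group for new blocks: prefer the group of the parent
 * directory, otherwise take the next group that still has free blocks.
 * Group count and free-block numbers are made up for the example. */
#include <stdio.h>

#define NUM_GROUPS 8

static int pick_group(int preferred, const int free_blocks[NUM_GROUPS])
{
    for (int i = 0; i < NUM_GROUPS; i++) {
        int g = (preferred + i) % NUM_GROUPS;   /* preferred group, then nearby ones */
        if (free_blocks[g] > 0)
            return g;
    }
    return -1;  /* disk is full */
}

int main(void)
{
    int free_blocks[NUM_GROUPS] = {0, 0, 12, 7, 0, 3, 9, 1};
    printf("place the file in group %d\n", pick_group(1, free_blocks));  /* prints 2 */
    return 0;
}
```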

Disk Scheduling

Disk Scheduling. The OS typically has a queue of disk requests, so there is a chance to schedule these requests. Objective: minimize seeking (because seeks are expensive).

Disk Scheduling Algorithms.
- FCFS (do nothing).
- SSTF (shortest seek time first): minimizes arm movement; favours blocks in middle tracks, because they have more blocks nearby. (A small SSTF example follows after this list.)
- SCAN (elevator): serve requests in one direction until done, then reverse; like an elevator, it avoids going back and forth in the middle.
- C-SCAN (typewriter): reasonable when load is low, long waiting time when load is high; like SCAN, but only goes in one direction (no reverse direction).
- LOOK / C-LOOK: like SCAN / C-SCAN, but only go as far as the last request in each direction, instead of going the full width of the disk.
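As a concrete example of one of these policies, here is a small SSTF simulation: starting from an arbitrary head position, it repeatedly services whichever pending request is closest. The request queue and head position are made up for the example.

```c
/* SSTF: always service the pending request with the shortest seek
 * distance from the current head position. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int requests[] = {98, 183, 37, 122, 14, 124, 65, 67};  /* pending tracks */
    int n = sizeof(requests) / sizeof(requests[0]);
    int head = 53, total_movement = 0;

    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)        /* find the closest pending request */
            if (requests[i] >= 0 &&
                (best < 0 || abs(requests[i] - head) < abs(requests[best] - head)))
                best = i;
        total_movement += abs(requests[best] - head);
        head = requests[best];
        requests[best] = -1;               /* mark this request as served */
        printf("service track %d\n", head);
    }
    printf("total head movement: %d tracks\n", total_movement);
    return 0;
}
```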

Example: FCFS

Example: SSTF

Example: SCAN

Example: LOOK

Disk Scheduling Summary. Disk scheduling is important only when disk requests queue up: important for servers, not so much for PCs. Modern disks often do disk scheduling themselves: disks know their layout better than the OS and can optimize better; on-disk scheduling ignores or undoes any scheduling done by the OS.

Summary: Optimizing FS Performance. Caching and buffering; disk-aware file system structure; disk scheduling; other techniques.

Next Week: more file systems. Tutorial this week: more exercises for Assignment 3.