Block Device Scheduling
Don Porter, CSE 506


Quick Recap
- CPU scheduling: balance competing concerns with heuristics
  - What were some goals? No perfect solution
- Today: block device scheduling
  - How is it different from the CPU?
  - Focus primarily on a traditional hard drive; extend to new storage media

Block device goals
- Throughput
- Latency
- Safety: the file system can be recovered after a crash
- Fairness: surprisingly, very little attention is given to storage access fairness
  - Hard problem; solutions usually just prevent starvation
  - Disk quotas handle fairness for space

Caching
- Obviously, the number 1 trick in the OS designer's toolbox is caching disk contents in RAM
  - More on the page cache next time
- Latency can be hidden by pre-reading data into RAM
  - And keeping any free RAM full of disk contents
- Doesn't help synchronous reads (that miss in the RAM cache) or synchronous writes

Caching + throughput
- Assume that most reads and writes to disk are asynchronous
  - Dirty data can be buffered and written at the OS's leisure
  - Most reads hit in the RAM cache; most disk reads are read-ahead optimizations
- Key problem: how to optimally order pending disk I/O requests?
  - Hint: it isn't first-come, first-served

Another view of the problem
- Between the page cache and the disk, you have a queue of pending requests
- Requests are a tuple of (block #, read/write, buffer addr), as in the sketch below
- You can reorder these as you like to improve throughput
- What reordering heuristic to use? If any?
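Such a queue might be modeled as follows. This is a minimal illustrative sketch; the struct and names are hypothetical, not the Linux block layer's actual struct bio/struct request:

```c
#include <stdbool.h>

/* One pending I/O request: the (block #, read/write, buffer addr)
 * tuple from the slide, plus a link for the pending queue. */
struct io_request {
    unsigned long      block;     /* block (sector) number on disk    */
    bool               is_write;  /* read or write?                   */
    void              *buffer;    /* memory to read into / write from */
    struct io_request *next;      /* singly-linked pending queue      */
};

/* The scheduler's whole job: pick (and unlink) the next request to
 * dispatch, in whatever order maximizes throughput. */
struct io_request *pick_next(struct io_request **queue);
```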

A note on safety
- In Linux and other OSes, the I/O scheduler can reorder requests arbitrarily
- It is the file system's job to keep unsafe I/O requests out of the scheduling queues

Dangerous I/Os
- What can make an I/O request unsafe?
- File system bookkeeping has invariants on disk
  - Example: inodes point to file data blocks; a bitmap also marks those blocks as allocated (vs. free)
- Updates must uphold these invariants
  - Ex: write an update to the inode, then to the bitmap
  - What if the system crashes between the writes? The block still looks free, can be reallocated, and can end up in two files!!!

3 Simple Rules (courtesy of Ganger and McKusick's Soft Updates paper)
- Never write a pointer to a structure until it has been initialized
  - Ex: don't write a directory entry to disk until the inode it names has been written to disk (see the sketch below)
- Never reuse a resource before nullifying all pointers to it
  - Ex: before re-allocating a block to a file, write an update to the inode that currently references it
- Never reset the last pointer to a live resource before a new pointer has been set
  - Ex: renaming a file: write the new directory entry before removing the old one (better 2 links than none)
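To make rule 1 concrete, here is a minimal sketch of the required write ordering, assuming a hypothetical sync_write_block() helper that returns only once its block is durably on disk:

```c
struct block;                                   /* opaque disk block */
extern void sync_write_block(struct block *b);  /* hypothetical: blocks
                                                   until b is on disk */

/* Rule 1 in code: initialize the inode on disk before writing the
 * directory entry that points to it. */
void create_file(struct block *inode_blk, struct block *dirent_blk)
{
    sync_write_block(inode_blk);   /* the structure first...         */
    sync_write_block(dirent_blk);  /* ...then the pointer to it.
                                      A crash in between leaves an
                                      unreferenced inode (harmless
                                      garbage), never a dangling
                                      pointer to uninitialized data. */
}
```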

A note on safety
- It is the file system's job to keep unsafe I/O requests out of the scheduling queues
- While these constraints are simple, enforcing them in the average file system is surprisingly difficult
- Journaling helps by creating a log of what you are in the middle of doing, which can be replayed after a crash
  - (Simpler) constraint: journal updates must go to disk before the FS updates they describe

A simple disk model
- Disks are slow. Why? Moving parts << circuits
- Programming interface: a simple array of sectors (blocks)
- Physical layout: concentric circular tracks of blocks on a platter
  - E.g., sectors 0-9 on the innermost track, 10-19 on the next track, etc. (mapping sketched below)
- The disk arm moves between tracks
- The platter rotates under the disk head to align with the requested sector
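Under this toy layout (10 sectors per track), translating a sector number into an arm/platter position is simple arithmetic. A sketch, with the geometry constant taken from the example above:

```c
#define SECTORS_PER_TRACK 10  /* toy layout: sectors 0-9 on track 0, etc. */

struct disk_pos {
    unsigned track;   /* which concentric ring the arm must reach  */
    unsigned offset;  /* rotational position within that track     */
};

/* Map a logical sector number to its physical position. */
struct disk_pos sector_to_pos(unsigned sector)
{
    struct disk_pos p = {
        .track  = sector / SECTORS_PER_TRACK,
        .offset = sector % SECTORS_PER_TRACK,
    };
    return p;
}
```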

3 key latencies
- Seek delay: time the disk arm takes to move to a different track
- Rotational delay: time the disk head waits for the platter to rotate the desired sector under it
  - Note: the disk rotates continuously at a constant speed
- I/O delay: time it takes to read/write a sector

Observations
- The latency of a given operation is a function of the current disk arm and platter position
  - Each request changes these values
- Idea: build a model of the disk
  - Maybe use delay values from measurements or manuals
  - Use simple math to evaluate the latency of each pending request
  - Greedy algorithm: always select the lowest-latency request (a sketch follows the formula below)

Example formula
- s = seek latency, in time/track
- r = rotational latency, in time/sector
- i = I/O latency, in seconds
- Time = (Δtracks × s) + (Δsectors × r) + i
- Note: Δsectors can only be calculated after the seek is finished. Why? (The platter keeps rotating during the seek, so the rotational gap depends on where the platter is when the arm arrives.)
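Putting the model and the greedy policy together, here is a hedged sketch. The delay constants are made up for illustration, the toy geometry is reused from earlier, and the formula is applied naively (a faithful model would recompute the rotational gap after the seek completes, per the note above):

```c
#include <stddef.h>

#define SECTORS_PER_TRACK 10       /* toy layout from earlier        */

/* Illustrative per-unit delays; real values would come from
 * measurement or the drive's manual. */
static const double SEEK_PER_TRACK = 0.5;  /* ms per track crossed   */
static const double ROT_PER_SECTOR = 0.1;  /* ms per sector rotated  */
static const double IO_DELAY       = 0.05; /* ms per sector transfer */

/* Time = (delta_tracks * s) + (delta_sectors * r) + i */
double model_latency(unsigned cur_track, unsigned cur_offset,
                     unsigned track, unsigned offset)
{
    unsigned dtracks  = (track > cur_track) ? track - cur_track
                                            : cur_track - track;
    unsigned dsectors = (offset + SECTORS_PER_TRACK - cur_offset)
                        % SECTORS_PER_TRACK;
    return dtracks * SEEK_PER_TRACK
         + dsectors * ROT_PER_SECTOR
         + IO_DELAY;
}

/* Greedy selection over an array of pending sector numbers:
 * always dispatch the request the model says is cheapest.
 * Assumes n > 0. */
unsigned pick_greedy(const unsigned *pending, size_t n,
                     unsigned cur_track, unsigned cur_offset)
{
    size_t best = 0;
    double best_t = 1e30;
    for (size_t j = 0; j < n; j++) {
        unsigned t = pending[j] / SECTORS_PER_TRACK;
        unsigned o = pending[j] % SECTORS_PER_TRACK;
        double lat = model_latency(cur_track, cur_offset, t, o);
        if (lat < best_t) { best_t = lat; best = j; }
    }
    return pending[best];
}
```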

Problem with greedy?
- Far requests will starve
- The disk head may just hover around the middle tracks

Elevator Algorithm
- Require the disk arm to move in continuous sweeps in and out
- Reorder requests within a sweep (see the sketch below)
  - Ex: if the disk arm is moving out, order requests between the current track and the outside of the disk in ascending order (by block number)
  - A request for a sector the arm has already passed must be ordered after the outermost request, in descending order (it will be serviced on the sweep back in)
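A minimal sketch of that ordering rule for an outward sweep. Names are illustrative, and the arm position is stashed in a global only to keep the qsort comparator simple:

```c
#include <stdlib.h>

/* Arm position when the queue is (re)sorted. */
static unsigned long current_block;

/* Elevator order for an outward sweep: blocks at or beyond the arm
 * come first, ascending; blocks already passed come after the
 * outermost request, descending (serviced on the way back in). */
static int elevator_cmp(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    int x_ahead = (x >= current_block);
    int y_ahead = (y >= current_block);

    if (x_ahead != y_ahead)
        return y_ahead - x_ahead;        /* ahead-of-arm first    */
    if (x_ahead)
        return (x > y) - (x < y);        /* outward: ascending    */
    return (y > x) - (y < x);            /* return: descending    */
}

void elevator_sort(unsigned long *blocks, size_t n, unsigned long arm)
{
    current_block = arm;
    qsort(blocks, n, sizeof(blocks[0]), elevator_cmp);
}
```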

Elevator Algorithm, pt. 2
- This approach prevents starvation
  - Sectors at the inside or outside get service after a bounded time
- Reasonably good throughput
  - Sorting requests minimizes seek latency
  - Can get hit with rotational-latency pathologies (How?)
- Simple to code up!
- The programming model hides low-level details; fine-grained optimizations are difficult in practice

Pluggable Schedulers
- Linux allows the disk scheduler to be replaced, just like the CPU scheduler
- Can choose a different heuristic that favors:
  - Fairness
  - Real-time constraints
  - Performance
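On Linux, the per-device scheduler can be inspected and swapped at runtime through the real sysfs knob /sys/block/<dev>/queue/scheduler (reading it lists the choices, with the active one in brackets). A minimal sketch; the device name and the availability of a "deadline" scheduler (called mq-deadline on newer multiqueue kernels) are assumptions, and root privileges are required:

```c
#include <stdio.h>

/* Select the deadline scheduler for /dev/sda by writing its name
 * into the sysfs knob. */
int main(void)
{
    FILE *f = fopen("/sys/block/sda/queue/scheduler", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fputs("deadline", f);
    fclose(f);
    return 0;
}
```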

Completely Fair Queueing (CFQ)
- Idea: add a second layer of queues (one per process)
  - Round-robin promote requests from them to the real dispatch queue (sketched below)
- Goal: fairly distribute disk bandwidth among tasks
- Problems?
  - Overall throughput is likely reduced
  - Ping-pongs the disk head around
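An illustrative sketch of the core idea, not Linux's actual CFQ implementation: one FIFO per process, promoted round-robin into the dispatch queue. The fifo_pop() and dispatch() helpers are hypothetical:

```c
#include <stddef.h>

#define MAX_PROCS 64

struct req;                                    /* opaque request    */
struct fifo { struct req *head; };             /* per-process queue */

extern struct req *fifo_pop(struct fifo *q);   /* hypothetical      */
extern void dispatch(struct req *r);           /* hypothetical      */

/* Take at most one request per pass, scanning processes round-robin
 * starting just after the one served last time. */
void promote_round_robin(struct fifo per_proc[MAX_PROCS])
{
    static size_t next = 0;

    for (size_t i = 0; i < MAX_PROCS; i++) {
        struct fifo *q = &per_proc[(next + i) % MAX_PROCS];
        struct req *r = fifo_pop(q);
        if (r) {
            dispatch(r);
            next = (next + i + 1) % MAX_PROCS;
            return;
        }
    }
}
```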

Deadline Scheduler
- Associate expiration times with requests
- As requests get close to expiration, make sure they are dispatched
- Constrains reordering to ensure some forward progress
- Good for real-time applications
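A sketch of the expiration check, with illustrative names and a made-up one-second slack: serve in the usual (e.g., elevator) order, but stop reordering around the oldest request once it is about to expire:

```c
#include <stdbool.h>
#include <time.h>

struct dreq {
    unsigned long block;
    time_t        expires;   /* absolute expiration time */
};

/* Is the oldest pending request within one second of expiring?
 * If so, dispatch it next regardless of head position. */
bool must_dispatch_now(const struct dreq *oldest)
{
    return time(NULL) + 1 >= oldest->expires;
}
```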

Anticipatory Scheduler
- Idea: try to anticipate locality of requests
- If process P tends to issue bursts of requests for nearby disk blocks, then when you see a request from P, wait a bit and see if more come in before scheduling them
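The "wait a bit" decision might look like the following sketch; the per-process statistics and the idle budget are hypothetical, standing in for whatever history the scheduler actually tracks:

```c
#include <stdbool.h>

/* Hypothetical per-process history. */
struct proc_stats {
    double mean_gap_ms;   /* observed time between its requests      */
    bool   bursty;        /* issues runs of nearby-block requests?   */
};

/* After finishing a request from p: is it worth keeping the disk
 * idle briefly in the hope of a nearby follow-on request, rather
 * than seeking away immediately? */
bool should_anticipate(const struct proc_stats *p, double idle_budget_ms)
{
    return p->bursty && p->mean_gap_ms <= idle_budget_ms;
}
```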

Optimizations at Cross-purposes
- The disk itself does some optimizations:
  - Caching: write requests can sit in a volatile cache for longer than expected
  - Reordering requests internally: can't assume that requests are serviced in order, so dependent operations must wait until the first finishes
  - Remapping bad sectors to spares: the disk arm can end up flailing on an old disk

Disks aren't everything
- Flash is increasing in popularity
  - Different types with slight variations (NAND, NOR, etc.)
- No moving parts: who cares about block ordering anymore?
- Can only write to a block of flash ~100k times
  - Can read as much as you want

More in a Flash
- Flash reads are generally fast; writes are more expensive
  - Prefetching has little benefit
  - Queuing optimizations can take longer than the read itself
- New issue: wear leveling, the need to distribute writes evenly (sketched below)
  - Flash devices usually have a custom, log-structured FS
  - Groups random writes
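One common wear-leveling policy is to steer each rewrite toward the free physical block that has been erased the fewest times. An illustrative sketch, not a real flash translation layer:

```c
#include <stddef.h>

#define NBLOCKS 1024

struct phys_block {
    unsigned erase_count;   /* how many times this block was erased */
    int      in_use;        /* 0 = free */
};

/* Return the index of the least-worn free block, or NBLOCKS if
 * none is free. */
size_t pick_least_worn(const struct phys_block blocks[NBLOCKS])
{
    size_t best = NBLOCKS;
    for (size_t i = 0; i < NBLOCKS; i++) {
        if (blocks[i].in_use)
            continue;
        if (best == NBLOCKS ||
            blocks[i].erase_count < blocks[best].erase_count)
            best = i;
    }
    return best;
}
```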

Even newer hotness
- Byte-addressable, persistent RAMs (BPRAM)
  - Phase-Change Memory (PCM), memristors, etc.
- Splits the difference between RAM and flash:
  - Byte-granularity writes (vs. blocks)
  - Fast reads; slower, high-energy writes
  - Doesn't need energy to hold state (unlike DRAM refresh)
  - Wear is an issue (bytes get stuck at their last value)
- Still in the lab, but getting close

Important research topic
- Most work on optimizing storage accesses is tailored to hard drives
  - These heuristics are not easily adapted to new media
- Future systems will have a mix of disks, flash, BPRAM, and DRAM
  - Does it even make sense to treat them all the same?

Summary
- Performance characteristics of disks, flash, and BPRAM
- Disk scheduling heuristics
- Safety constraints for file systems