I/O, Disks, and RAID Yi Shi Fall Xi an Jiaotong University

Similar documents
Mass-Storage Structure

Module 13: Secondary-Storage Structure

CSE325 Principles of Operating Systems. Mass-Storage Systems. David P. Duggan. April 19, 2011

Chapter 14: Mass-Storage Systems. Disk Structure

Disk Scheduling. Based on the slides supporting the text

Disk Scheduling. Chapter 14 Based on the slides supporting the text and B.Ramamurthy s slides from Spring 2001

Chapter 13: Mass-Storage Systems. Disk Scheduling. Disk Scheduling (Cont.) Disk Structure FCFS. Moving-Head Disk Mechanism

Chapter 13: Mass-Storage Systems. Disk Structure

CS3600 SYSTEMS AND NETWORKS

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

V. Mass Storage Systems

Chapter 10: Mass-Storage Systems

Chapter 10: Mass-Storage Systems

Chapter 12: Mass-Storage

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Chapter 12: Mass-Storage Systems. Operating System Concepts 8 th Edition,

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse]

File. File System Implementation. Operations. Permissions and Data Layout. Storing and Accessing File Data. Opening a File

Chapter 12: Secondary-Storage Structure. Operating System Concepts 8 th Edition,

Module 13: Secondary-Storage

Chapter 14: Mass-Storage Systems

CHAPTER 12: MASS-STORAGE SYSTEMS (A) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Disk scheduling Disk reliability Tertiary storage Swap space management Linux swap space management

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

EIDE, ATA, SATA, USB,

CS420: Operating Systems. Mass Storage Structure

Operating Systems 2010/2011

Tape pictures. CSE 30341: Operating Systems Principles

Chapter 12: Mass-Storage Systems. Operating System Concepts 8 th Edition,

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices

Free Space Management

Chapter 10: Mass-Storage Systems

Chapter 11. I/O Management and Disk Scheduling

OPERATING SYSTEMS CS3502 Spring Input/Output System Chapter 9

Chapter 12: Mass-Storage

Chapter 12: Mass-Storage

Principles of Operating Systems CS 446/646

Mass-Storage Structure

Overview of Mass Storage Structure

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University

COS 318: Operating Systems. Storage Devices. Vivek Pai Computer Science Department Princeton University

CISC 7310X. C11: Mass Storage. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 4/19/2018 CUNY Brooklyn College

Lecture 9. I/O Management and Disk Scheduling Algorithms

I/O Device Controllers. I/O Systems. I/O Ports & Memory-Mapped I/O. Direct Memory Access (DMA) Operating Systems 10/20/2010. CSC 256/456 Fall

File. File System Implementation. File Metadata. File System Implementation. Direct Memory Access Cont. Hardware background: Direct Memory Access

OPERATING SYSTEMS CS3502 Spring Input/Output System Chapter 9

CSE 153 Design of Operating Systems Fall 2018

I/O Systems and Storage Devices

CSE 153 Design of Operating Systems

STORAGE SYSTEMS. Operating Systems 2015 Spring by Euiseong Seo

I/O & Storage. Jin-Soo Kim ( Computer Systems Laboratory Sungkyunkwan University

Part IV I/O System. Chapter 12: Mass Storage Structure

Hard Disk Drives (HDDs)

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security

CSE 380 Computer Operating Systems

Hard Disk Drives (HDDs) Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

CSCI-GA Operating Systems. I/O : Disk Scheduling and RAID. Hubertus Franke

CSE 451: Operating Systems Winter Secondary Storage. Steve Gribble. Secondary storage

COS 318: Operating Systems. Storage Devices. Jaswinder Pal Singh Computer Science Department Princeton University

Chapter 14: Mass-Storage Systems

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova

Chapter 14 Mass-Storage Structure

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

CS370 Operating Systems

Input/Output. Chapter 5: I/O Systems. How fast is I/O hardware? Device controllers. Memory-mapped I/O. How is memory-mapped I/O done?

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura

Silberschatz, et al. Topics based on Chapter 13

MASS-STORAGE STRUCTURE

Disc Allocation and Disc Arm Scheduling?

Chapter 12: Mass-Storage Systems. Operating System Concepts 8 th Edition

Chapter 11: Mass-Storage Systems

Part IV I/O System Chapter 1 2: 12: Mass S torage Storage Structur Structur Fall 2010

UNIT 4 Device Management

CSE380 - Operating Systems. Communicating with Devices

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto

CSE 451: Operating Systems Winter Lecture 12 Secondary Storage. Steve Gribble 323B Sieg Hall.

Chapter 6 External Memory

Ref: Chap 12. Secondary Storage and I/O Systems. Applied Operating System Concepts 12.1

CS510 Operating System Foundations. Jonathan Walpole

Chapter 12: Mass-Storage Systems. Operating System Concepts 9 th Edition

Mass-Storage. ICS332 Operating Systems

CSE 451: Operating Systems Spring Module 12 Secondary Storage

CSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble

I/O CANNOT BE IGNORED

CS370 Operating Systems

Lecture 29. Friday, March 23 CS 470 Operating Systems - Lecture 29 1

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Secondary storage. File systems

Mass-Storage Systems. Mass-Storage Systems. Disk Attachment. Disk Attachment

Secondary storage. CS 537 Lecture 11 Secondary Storage. Disk trends. Another trip down memory lane

Storage. CS 3410 Computer System Organization & Programming

Magnetic Disk. Optical. Magnetic Tape. RAID Removable. CD-ROM CD-Recordable (CD-R) CD-R/W DVD

Computer System Architecture

Today: Secondary Storage! Typical Disk Parameters!

Disks. Storage Technology. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo

Disk Scheduling COMPSCI 386

CHAPTER 12 AND 13 - MASS-STORAGE STRUCTURE & I/O- SYSTEMS

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel

Transcription:

I/O, Disks, and RAID Yi Shi Fall 2017 Xi an Jiaotong University

Goals for Today Disks How does a computer system permanently store data? RAID How to make storage both efficient and reliable? 2

What does the disk look like? 3

Some parameters 2-30 heads (platters * 2) diameter 14 to 2.5 700-20480 tracks per surface 16-1600 sectors per track sector size: 64-8k bytes 512 for most PCs note: inter-sector gaps capacity: 20M-300G main adjectives: BIG, slow 4

Disk overheads To read from disk, we must specify: cylinder #, surface #, sector #, transfer size, memory address Transfer time includes: Seek time: to get to the track Latency time: to get to the sector and Transfer time: get bits off the disk Track Sector Seek Time Rotation Delay 5

Modern disks Barracuda 180 Cheetah X15 36LP Capacity 181GB 36.7GB Disk/Heads 12/24 4/8 Cylinders 24,247 18,479 Sectors/track ~609 ~485 Speed 7200RPM 15000RPM Latency (ms) 4.17 2.0 Avg seek (ms) 7.4/8.2 3.6/4.2 Track-2-track(ms) 0.8/1.1 0.3/0.4 6

Disks vs. Memory Smallest write: sector Atomic write = sector Random access: 5ms not on a good curve Sequential access: 200MB/s Cost $.002MB Crash: doesn t matter ( nonvolatile ) (usually) bytes byte, word 50 ns faster all the time 200-1000MB/s $.10MB contents gone ( volatile ) 7

Disk Structure Disk drives addressed as 1-dim arrays of logical blocks the logical block is the smallest unit of transfer This array mapped sequentially onto disk sectors Address 0 is 1 st sector of 1 st track of the outermost cylinder Addresses incremented within track, then within tracks of the cylinder, then across cylinders, from outermost to innermost Translation is theoretically possible, but usually difficult Some sectors might be defective Number of sectors per track is not a constant 8

Non-uniform #sectors / track Maintain same data rate with Constant Linear Velocity Approaches: Reduce bit density per track for outer layers Have more sectors per track on the outer layers (virtual geometry) 9

Disk Scheduling The operating system tries to use hardware efficiently for disk drives having fast access time, disk bandwidth Access time has two major components Seek time is time to move the heads to the cylinder containing the desired sector Rotational latency is additional time waiting to rotate the desired sector to the disk head. Minimize seek time Seek time seek distance Disk bandwidth is total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. 10

Disk Scheduling (Cont.) Several scheduling algos exist service disk I/O requests. We illustrate them with a request queue (0-199). 98, 183, 37, 122, 14, 124, 65, 67 Head pointer 53 11

FCFS Illustration shows total head movement of 640 cylinders. 12

SSTF Selects request with minimum seek time from current head position SSTF scheduling is a form of SJF scheduling may cause starvation of some requests. Illustration shows total head movement of 236 cylinders. 13

SSTF (Cont.) total head movement: 236 cylinders 14

SCAN The disk arm starts at one end of the disk, moves toward the other end, servicing requests head movement is reversed when it gets to the other end of disk servicing continues. Sometimes called the elevator algorithm. Illustration shows total head movement of 208 cylinders. 15

SCAN (Cont.) total head movement: 236 cylinders 16

C-SCAN Provides a more uniform wait time than SCAN. The head moves from one end of the disk to the other. servicing requests as it goes. When it reaches the other end it immediately returns to beginning of the disk No requests serviced on the return trip. Treats the cylinders as a circular list that wraps around from the last cylinder to the first one. 17

C-SCAN (Cont.) 18

C-LOOK Version of C-SCAN Arm only goes as far as last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk. 19

C-LOOK (Cont.) 20

Selecting a Good Algorithm SSTF is common and has a natural appeal SCAN and C-SCAN perform better under heavy load Performance depends on number and types of requests Requests for disk service can be influenced by the fileallocation method. Disk-scheduling algo should be a separate OS module allowing it to be replaced with a different algorithm if necessary. Either SSTF or LOOK is a reasonable default algo 21

How is the disk formatted? After manufacturing disk has no information Is stack of platters coated with magnetizable metal oxide Before use, each platter receives low-level format Format has series of concentric tracks Each track contains some sectors There is a short gap between sectors Preamble allows h/w to recognize start of sector Also contains cylinder and sector numbers Data is usually 512 bytes ECC field used to detect and recover from read errors 22

Cylinder Skew Why cylinder skew? How much skew? Example, if 10000 rpm Drive rotates in 6 ms Track has 300 sectors New sector every 20 µs If track seek time 800 µs 40 sectors pass on seek Cylinder skew: 40 sectors 23

Formatting and Performance If 10K rpm, 300 sectors of 512 bytes per track 153,600 bytes every 6 ms 24.4 MB/sec transfer rate If disk controller buffer can store only one sector For 2 consecutive reads, 2 nd sector flies past during memory transfer of 1 st track Idea: Use single/double interleaving 24

Disk Partitioning Each partition is like a separate disk Sector 0 is MBR Contains boot code + partition table Partition table has starting sector and size of each partition High-level formatting Done for each partition Specifies boot block, free list, root directory, empty file system What happens on boot? BIOS loads MBR, boot program checks to see active partition Reads boot sector from that partition that then loads OS kernel, etc. 25

Handling Errors A disk track with a bad sector Solutions: Substitute a spare for the bad sector (sector sparing) Shift all sectors to bypass bad one (sector forwarding) 26

RAID Motivation Disks are improving, but not as fast as CPUs 1970s seek time: 50-100 ms. 2000s seek time: <5 ms. Factor of 20 improvement in 3 decades We can use multiple disks for improving performance By Striping files across multiple disks (placing parts of each file on a different disk), parallel I/O can improve access time Striping reduces reliability 100 disks have 1/100th mean time between failures of one disk So, we need Striping for performance, but we need something to help with reliability / availability To improve reliability, we can add redundant data to the disks, in addition to Striping 27

RAID A RAID is a Redundant Array of Inexpensive Disks In industry, I is for Independent The alternative is SLED, single large expensive disk Disks are small and cheap, so it s easy to put lots of disks (10s to 100s) in one box for increased storage, performance, and availability The RAID box with a RAID controller looks just like a SLED to the computer Data plus some redundant information is Striped across the disks in some way How that Striping is done is key to performance and reliability. 28

Granularity Some Raid Issues fine-grained: Stripe each file over all disks. This gives high throughput for the file, but limits to transfer of 1 file at a time coarse-grained: Stripe each file over only a few disks. This limits throughput for 1 file but allows more parallel file access Redundancy uniformly distribute redundancy info on disks: avoids loadbalancing problems concentrate redundancy info on a small number of disks: partition the set into data disks and redundant disks 29

Raid Level 0 Level 0 is nonredundant disk array Files are Striped across disks, no redundant info High read throughput Best write throughput (no redundant info to write) Any disk failure results in data loss Reliability worse than SLED Stripe 0 Stripe 1 Stripe 2 Stripe 3 Stripe 4 Stripe 5 Stripe 6 Stripe 7 Stripe 8 Stripe 9 Stripe 10 Stripe 11 data disks 30

Raid Level 1 Mirrored Disks Data is written to two places On failure, just use surviving disk On read, choose fastest to read Write performance is same as single drive, read performance is 2x better Expensive Stripe 0 Stripe 1 Stripe 2 Stripe 3 Stripe 0 Stripe 1 Stripe 2 Stripe 3 Stripe 4 Stripe 5 Stripe 6 Stripe 7 Stripe 4 Stripe 5 Stripe 6 Stripe 7 Stripe 8 Stripe 9 Stripe 10 Stripe 11 Stripe 8 Stripe 9 Stripe 10 Stripe 11 31 data disks mirror copies

Parity and Hamming Code What do you need to do in order to detect and correct a one-bit error? Suppose you have a binary number, represented as a collection of bits: <b3, b2, b1, b0>, e.g. 0110 Detection is easy Parity: Count the number of bits that are on, see if it s odd or even EVEN parity is 0 if the number of 1 bits is even Parity(<b3, b2, b1, b0 >) = P0 = b0 b1 b2 b3 Parity(<b3, b2, b1, b0, p0>) = 0 if all bits are intact Parity(0110) = 0, Parity(01100) = 0 Parity(11100) = 1 => ERROR! Parity can detect a single error, but can t tell you which of the bits got flipped 32

Parity and Hamming Code Detection and correction require more work Hamming codes can detect double bit errors and detect & correct single bit errors 7/4 Hamming Code h0 = b0 b1 b3 h1 = b0 b2 b3 h2 = b1 b2 b3 H0(<1101>) = 0 H1(<1101>) = 1 H2(<1101>) = 0 Hamming(<1101>) = <b3, b2, b1, h2, b0, h1, h0> = <1100110> If a bit is flipped, e.g. <1110110> Hamming(<1111>) = <h2, h1, h0> = <111> compared to <010>, <101> are in error. Error occurred in bit 5. 33

Raid Level 2 Bit-level Striping with Hamming (ECC) codes for error correction All 7 disk arms are synchronized and move in unison Complicated controller Single access at a time Tolerates only one error, but with no performance degradation Bit 0 Bit 1 Bit 2 Bit 3 Bit 4 Bit 5 Bit 6 data disks ECC disks 34

Raid Level 3 Use a parity disk Each bit on the parity disk is a parity function of the corresponding bits on all the other disks A read accesses all the data disks A write accesses all data disks plus the parity disk On disk failure, read remaining disks plus parity disk to compute the missing data Bit 0 Bit 1 Bit 2 Bit 3 Parity Single parity disk can be used to detect and correct errors data disks Parity disk 35

Raid Level 4 Combines Level 0 and 3 block-level parity with Stripes A read accesses all the data disks A write accesses all data disks plus the parity disk Heavy load on the parity disk Stripe 0 Stripe 1 Stripe 2 Stripe 3 P0-3 Stripe 4 Stripe 5 Stripe 6 Stripe 7 P4-7 Stripe 8 Stripe 9 Stripe 10 Stripe 11 P8-11 data disks Parity disk 36

Raid Level 5 Block Interleaved Distributed Parity Like parity scheme, but distribute the parity info over all disks (as well as data over all disks) Better read performance, large write performance Reads can outperform SLEDs and RAID-0 Stripe 0 Stripe 1 Stripe 2 Stripe 3 P0-3 Stripe 4 Stripe 5 Stripe 6 P4-7 Stripe 7 Stripe 8 Stripe 9 P8-11 Stripe 10 Stripe 11 data and parity disks 37

Raid Level 6 Level 5 with an extra parity bit Can tolerate two failures What are the odds of having two concurrent failures? May outperform Level-5 on reads, slower on writes 38

RAID 0+1 and 1+0 39

Summary Disks: Latency Seek + Rotational + Transfer Also, queuing time Rotational latency: on average ½ rotation Improve performance (decrease queuing time) via scheduling RAID 40