COS 318: Operating Systems. Storage Devices. Vivek Pai Computer Science Department Princeton University

Similar documents
COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University

COS 318: Operating Systems. Storage Devices. Jaswinder Pal Singh Computer Science Department Princeton University

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices

Storage Systems. INF-2201 Operating Systems Fundamentals Spring Lars Ailo Bongo, Bård Fjukstad, Daniel Stødle

CSCI-GA Operating Systems. I/O : Disk Scheduling and RAID. Hubertus Franke

Lecture 16: Storage Devices

CSE 153 Design of Operating Systems

I/O, Disks, and RAID Yi Shi Fall Xi an Jiaotong University

Disks. Storage Technology. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse]

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova

Hard Disk Drives. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

CSE 153 Design of Operating Systems Fall 2018

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security

Magnetic Disks. Tore Brox-Larsen Including material developed by Pål Halvorsen, University of Oslo (primarily) Kai Li, Princeton University

Introduction to I/O and Disk Management

Introduction to I/O and Disk Management

Mass-Storage. ICS332 Operating Systems

STORAGE SYSTEMS. Operating Systems 2015 Spring by Euiseong Seo

Chapter 6. Storage and Other I/O Topics

Today: Secondary Storage! Typical Disk Parameters!

Mass-Storage Structure

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

CS3600 SYSTEMS AND NETWORKS

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Storage. CS 3410 Computer System Organization & Programming

Secondary storage. CS 537 Lecture 11 Secondary Storage. Disk trends. Another trip down memory lane

CSE 120. Operating Systems. March 27, 2014 Lecture 17. Mass Storage. Instructor: Neil Rhodes. Wednesday, March 26, 14

CISC 7310X. C11: Mass Storage. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 4/19/2018 CUNY Brooklyn College

CMSC 424 Database design Lecture 12 Storage. Mihai Pop

High-Performance Storage Systems

Chapter 12: Mass-Storage

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Computer Architecture 计算机体系结构. Lecture 6. Data Storage and I/O 第六讲 数据存储和输入输出. Chao Li, PhD. 李超博士

Chapter 12: Mass-Storage

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto

CSE 451: Operating Systems Spring Module 12 Secondary Storage

Hard Disk Drives (HDDs) Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 11. I/O Management and Disk Scheduling

CS420: Operating Systems. Mass Storage Structure

Input/Output. Chapter 5: I/O Systems. How fast is I/O hardware? Device controllers. Memory-mapped I/O. How is memory-mapped I/O done?

Lecture 9. I/O Management and Disk Scheduling Algorithms

CS2410: Computer Architecture. Storage systems. Sangyeun Cho. Computer Science Department University of Pittsburgh

Hard Disk Drives (HDDs)

V. Mass Storage Systems

CS143: Disks and Files

Chapter 10: Mass-Storage Systems

CS510 Operating System Foundations. Jonathan Walpole

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto

CSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble

Chapter 10: Mass-Storage Systems

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

CSE 380 Computer Operating Systems

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Storage Systems. Storage Systems

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Chapter 12: Mass-Storage

I/O & Storage. Jin-Soo Kim ( Computer Systems Laboratory Sungkyunkwan University

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

I/O CANNOT BE IGNORED

NAND Flash-based Storage. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Mass-Storage Structure

I/O CANNOT BE IGNORED

Monday, May 4, Discs RAID: Introduction Error detection and correction Error detection: Simple parity Error correction: Hamming Codes

Anatomy of a disk. Stack of magnetic platters. Disk arm assembly

Part IV I/O System. Chapter 12: Mass Storage Structure

OPERATING SYSTEMS CS3502 Spring Input/Output System Chapter 9

NAND Flash-based Storage. Computer Systems Laboratory Sungkyunkwan University

Operating Systems. Mass-Storage Structure Based on Ch of OS Concepts by SGG

Database Systems II. Secondary Storage

CSE325 Principles of Operating Systems. Mass-Storage Systems. David P. Duggan. April 19, 2011

OPERATING SYSTEMS CS3502 Spring Input/Output System Chapter 9

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Secondary storage. File systems

Storage System COSC UCB

Part IV I/O System Chapter 1 2: 12: Mass S torage Storage Structur Structur Fall 2010

[537] I/O Devices/Disks. Tyler Harter

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

Administrivia. Midterm. Realistic PC architecture. Memory and I/O buses. What is memory? What is I/O bus? E.g., PCI

Mass-Storage Systems. Mass-Storage Systems. Disk Attachment. Disk Attachment

Disk Scheduling COMPSCI 386

STORING DATA: DISK AND FILES

Storage Technologies and the Memory Hierarchy

BBM371- Data Management. Lecture 2: Storage Devices

16:30 18:00, June 20 (Monday), 2011 # (even student IDs) # (odd student IDs) Scope

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Wednesday, April 25, Discs RAID: Introduction Error detection and correction Error detection: Simple parity Error correction: Hamming Codes

Chapter 13: Mass-Storage Systems. Disk Scheduling. Disk Scheduling (Cont.) Disk Structure FCFS. Moving-Head Disk Mechanism

Chapter 13: Mass-Storage Systems. Disk Structure

Topics. Lecture 8: Magnetic Disks

Lecture 21 Disk Devices and Timers

COMP283-Lecture 3 Applied Database Management

CSE 451: Operating Systems Winter Secondary Storage. Steve Gribble. Secondary storage

Computer Systems Laboratory Sungkyunkwan University

Computer System Architecture

Questions on Midterm. Midterm. Administrivia. Memory and I/O buses. Realistic PC architecture. What is memory? Median: 59.5, Mean: 57.3, Stdev: 26.

Storage Systems : Disks and SSDs. Manu Awasthi CASS 2018

Concepts Introduced. I/O Cannot Be Ignored. Typical Collection of I/O Devices. I/O Issues

Transcription:

COS 318: Operating Systems Storage Devices Vivek Pai Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall11/cos318/

Today s Topics Magnetic disks Magnetic disk performance Disk arrays Flash memory 2

A Typical Magnetic Disk Controller External connection IDE/ATA, SATA(1.0, 2.0, 3.0) SCSI (1, 2, 3), Ultra-(160, 320, 640) SCSI Fibre channel Cache Buffer data between disk and interface Controller Read/write operation Cache replacement Failure detection and recovery External connection Interface DRAM cache Controller Disk 3

Disk Caching Method Use DRAM to cache recently accessed blocks Most disk has 16-64 MB Some of the RAM space stores firmware (an embedded OS) Blocks are replaced usually in an LRU order + tracks Pros Good for reads if accesses have locality Cons Need to deal with reliable writes 4

Disk Arm and Head Disk arm A disk arm carries disk heads Disk head Mounted on an actuator Read and write on disk surface Read/write operation Disk controller receives a command with <track#, sector#> Seek the right cylinder (tracks) Wait until the right sector comes Perform read/write

Mechanical Component of A Disk Drive Tracks Cylinder Sectors Concentric rings around disk surface, bits laid out serially along each track A track of the platter, 1000-5000 cylinders per zone, 1 spare per zone Arc of track holding some min # of bytes, variable # sectors/track 6

Disk Sectors Where do they come from? Formatting process Logical maps to physical What is a sector? Header (ID, defect flag, ) Real space (e.g. 512 bytes) Trailer (ECC code) What about errors? Detect errors in a sector Correct them with ECC If not recoverable, replace it with a spare Skip bad sectors in the future Hdr 512 bytes ECC Sector i defect i+1 defect i+2 7

Disks Were Large First Disk: IBM 305 RAMAC (1956) 5MB capacity 50 disks, each 24 8

They Are Now Much Smaller Form factor:.5-1 4 5.7 Storage: 0.5-2TB Form factor:.4-.7 2.7 3.9 Storage: 160-320GB Form factor:.2-.4 2.1 3.4 Storage: 1GB-8GB Replaced by Flash 9

Areal Density vs. Moore s Law (Mark Kryder at SNW 2006) 10

50 Years (Mark Kryder at SNW 2006) IBM RAMAC (1956) Seagate Momentus (2006) Difference Capacity 5MB 160GB 32,000 Areal Density 2K bits/in 2 130 Gbits/in 2 65,000,000 Disks 50 @ 24 diameter 2 @ 2.5 diameter 1 / 2,300 Price/MB $1,000 $0.01 1 / 100,000 Spindle Speed 1,200 RPM 5,400 RPM 5 Seek Time 600 ms 10 ms 1 / 60 Data Rate 10 KB/s 44 MB/s 4,400 Power 5000 W 2 W 1 / 2,500 Weight ~ 1 ton 4 oz 1 / 9,000 11

Sample Disk Specs (from Seagate) Cheetah 15k.7 Barracuda XT Capacity Formatted capacity (GB) 600 2000 Discs 4 4 Heads 8 8 Sector size (bytes) 512 512 Performance External interface Ultra320 SCSI, FC, S. SCSI SATA Spindle speed (RPM) 15,000 7,200 Average latency (msec) 2.0 4.16 Seek time, read/write (ms) 3.5/3.9 8.5/9.5 Track-to-track read/write (ms) 0.2-0.4 0.8/1.0 Internal transfer (MB/sec) 1,450-2,370 600 Transfer rate (MB/sec) 122-204 138 Cache size (MB) 16 64 Reliability Recoverable read errors 1 per 10 12 bits read 1 per 10 10 bits read Non-recoverable read errors 1 per 10 16 bits read 1 per 10 14 bits read 12

Disk Performance Seek Position heads over cylinder, typically 3.5-9.5 ms Rotational delay Wait for a sector to rotate underneath the heads Typically 8-4 ms (7,200 15,000RPM) or ½ rotation takes 4-2ms Transfer bytes Transfer bandwidth is typically 40-138 Mbytes/sec Performance of transfer 1 Kbytes Seek (4 ms) + half rotational delay (2ms) + transfer (0.013 ms) Total time is 6.01 ms or 167 Kbytes/sec! (1/360 of 60MB/s!)

More on Performance What transfer size can get 90% of the disk bandwidth? Assume Disk BW = 60MB/sec, ½ rotation = 2ms, ½ seek = 4ms BW * 90% = size / (size/bw + rotation + seek) size = BW * (rotation + seek) * 0.9 / 0.1 = 60MB * 0.006 * 0.9 / 0.1 = 3.24MB Block Size (Kbytes) % of Disk Transfer Bandwidth 1Kbytes 0.28% 1Mbytes 73.99% 3.24Mbytes 90% Seek and rotational times dominate the cost of small accesses Disk transfer bandwidth are wasted Need algorithms to reduce seek time 14

FIFO (FCFS) order Method First come first serve Pros Fairness among requests In the order applications expect Cons 0 53 199 Arrival may be on random spots on the disk (long seeks) Wild swing can happen 98, 183, 37, 122, 14, 124, 65, 67 15

SSTF (Shortest Seek Time First) Method Pick the one closest on disk Rotational delay is in calculation Pros Try to minimize seek time Cons 0 53 199 Starvation Question Is SSTF optimal? Can we avoid the starvation? 98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 37, 14, 98, 122, 124, 183) 16

Elevator (SCAN) Method Take the closest request in the direction of travel Real implementations do not go to the end (called LOOK) Pros 0 53 199 Bounded time for each request Cons Request at the other end will take a while 98, 183, 37, 122, 14, 124, 65, 67 (37, 14, 65, 67, 98, 122, 124, 183) 17

C-SCAN (Circular SCAN) Method Like SCAN But, wrap around Real implementation doesn t go to the end (C-LOOK) Pros Uniform service time Cons Do nothing on the return 0 53 199 98, 183, 37, 122, 14, 124, 65, 67 (65, 67, 98, 122, 124, 183, 14, 37) 18

Discussions Which is your favorite? FIFO SSTF SCAN C-SCAN Disk I/O request buffering Where would you buffer requests? How long would you buffer requests? 19

RAID (Redundant Array of Independent Disks) Main idea Store the error correcting codes on other disks General error correcting codes are too powerful Use XORs or single parity Upon any failure, one can recover the entire block from the spare disk (or any disk) using XORs Pros Reliability High bandwidth Cons Cost The controller is complex RAID controller D1 D2 D3 D4 P P = D1 D2 D3 D4 D3 = D1 D2 P D4 20

Synopsis of RAID Levels RAID Level 0: Non redundant RAID Level 3: Byte-interleaved, parity RAID Level 1: Mirroring RAID Level 2: Byte-interleaved, ECC RAID Level 4: Block-interleaved, parity RAID Level 5: Block-interleaved, distributed parity 21

RAID Level 6 and Beyond Goals Less computation and fewer updates per random writes Small amount of extra disk space Extended Hamming code Remember Hamming code? Specialized Eraser Codes IBM Even-Odd, NetApp RAID-DP, Beyond RAID-6 Reed-Solomon codes, using MOD 4 equations Can be generalized to deal with k (>2) disk failures 0 1 2 3 A 4 5 6 7 B 8 9 10 11 C 12 13 14 15 D E F G H 22

Next Generation: FLASH Flash chip density increases on the Moore s law curve 1995 16 Mb NAND flash chips 2005 16 Gb NAND flash chips 2009 64 Gb NAND flash chips Doubled each year since 1995 Market driven by Phones, Cameras, ipod, Low entry-cost, ~$30/chip ~$3/chip 2012 1 Tb NAND flash Samsung prediction == 128 Gb chip == 1TB or 2TB disk for ~$400 or 128GB disk for $40 or 32GB disk for $5 23

What s Wrong With FLASH? Expensive: $/GB 2x less than cheap DRAM 50x more than disk today, may drop to 10x in 2012 Limited lifetime ~50k to 100k writes / page (SLC) ~15k to 60k writes / page (MLC) But, suppose you do wear leveling and 200,000 writes/sec, If you have 1,000M pages on SLC flash (100k/page), it will take 15 years to wear out. Current performance limitations Slow to write: can only write 0 s, so erase (set all 1) then write Large (e.g. 128K) blocks to erase 24

Current Development Flash Translation Layer (FTL) Remapping Wear-leveling Write faster Form factors SSD USB, SD, Stick, PCI cards Performance Fusion-IO with 2.5TB, 6GB/s r/w, 26µs latency 25

Summary Disk is complex Disk real density is on Moore s law curve Need large disk blocks to achieve good throughput System needs to perform disk scheduling RAID improves reliability and high throughput at a cost Flash memory has emerged at low and high ends 26