CSCI 350 Ch. 12 Storage Device. Mark Redekopp Michael Shindler & Ramesh Govindan


2 Introduction
Storage hardware limitations: poor random-access performance, asymmetric read/write performance, and reliability issues. File system designers and application writers need to understand the hardware.

3 MAGNETIC DISK STORAGE

4 Magnetic Disk Organization
Double-sided surfaces/platters: magnetic coating on a metallic film mounted on a ceramic/aluminum substrate. Each platter is divided into concentric tracks of small sectors that each store several thousand bits. Platters are spun (4,500-15,000 RPM, roughly 70-250 rotations per second), allowing the read/write head to skim over the surface, inducing a magnetic field and reading/writing bits. Reading/writing occurs at the granularity of an ENTIRE sector (usually 512 bytes), not individual bytes.
Typical access-time components:
Seek time: time needed to position the read head above the proper track (3-12 ms)
Rotational delay: time needed to bring the right sector under the read head; depends on rotation speed (e.g. 5400 RPM) (5-6 ms)
Transfer time: 0.1 ms
Disk controller overhead: 2.0 ms
Total: ~20 ms
[Figure: platter surfaces with read/write heads 0-7, concentric tracks, and numbered sectors]

5 Improving Performance
Track skewing: Offset sector 0 on neighboring tracks so a sequential read can continue without waiting a full rotation after the head moves to the next track; the offset is chosen based on the rotation speed and the time it takes to move the head to the next track (OS:PP 2nd Ed. Fig. 12.2).
Onboard RAM acting as a cache:
Track buffer: When the head arrives at the desired track it may not be over the right sector. Start reading immediately anyway and store the entire track in onboard memory in case those sectors are wanted later.
Write acceleration: Store write data in the cache and return to the OS, performing the actual writes at a more convenient time (can lead to data loss on a power loss or crash).
Tagged Command Queueing: The OS batches requests and communicates the entire batch to the disk, which can reorder them as desired to be optimally scheduled.

6 Disk Access Times
Access time = Seek time + Rotation time + Transfer time
Seek time: Time to move the head to the correct track.
Mechanical concerns: includes time to wait for the arm to stop vibrating and then make finer-grained adjustments to position itself correctly over the track.
Seek time depends on how far the arm has to move:
Min. seek time: approx. 0.3-1.5 ms
Max. seek time: approx. 10-20 ms
Average seek time: time to move 1/3 of the way across the disk
Head transition time: If reading track t on one head (surface) and we want to read track t on another surface, do we have to move the arm?

7 Disk Access Times (Cont.)
Access time = Seek time + Rotation time + Transfer time
Rotation time: Time to rotate the desired starting sector under the head.
For 4,200 to 15,000 RPM a half rotation of the surface takes about 7.5 ms down to 2 ms (half a rotation is a reasonable estimate of the average rotational delay). Can use track buffering.
Transfer time: Time for the head to read one sector into the disk's RAM (fast: a few microseconds).
Since outer tracks have more sectors (yet constant rotation speed), outer-track bandwidth is higher than inner-track bandwidth.
Then we must transfer the data from the disk's RAM to the processor over the computer system's I/O bus; the rate depends on the bus (USB 2.0 = 60 MB/s, SATA 3 = 600 MB/s)*
*Src: https://en.wikipedia.org/wiki/List_of_device_bit_rates#Storage

8 Example: Random Reads
Time for 500 random sector reads in FIFO order (no rescheduling):
Seek: Since the locations are random, use the average seek time of 10.5 ms
Rotation: At 7200 RPM, 1 rotation = 8.3 ms; use half of that (4.15 ms) as the average rotational delay
Transfer: 512 bytes @ 54 MB/s = 9.5 us
Time per request: 10.5 + 4.15 + 0.0095 ms = 14.66 ms
Total time = 14.66 ms * 500 = 7.33 s
(Laptop HD specs: Toshiba MK3254GSY; OS:PP 2nd Ed. Fig. 12.3)
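
A quick back-of-envelope check of these numbers, written as a short Python sketch; the drive parameters are the slide's figures, used here purely for illustration:

    # Model 500 random single-sector reads serviced in FIFO order.
    avg_seek_ms = 10.5                       # average seek time from the slide
    rotation_ms = 60_000 / 7200              # one full rotation at 7200 RPM (~8.3 ms)
    avg_rotation_ms = rotation_ms / 2        # expect half a rotation on average
    transfer_ms = 512 / 54e6 * 1e3           # 512 bytes at 54 MB/s (~0.0095 ms)

    per_request_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
    total_s = per_request_ms * 500 / 1000
    print(f"{per_request_ms:.2f} ms per request, {total_s:.2f} s total")   # ~14.7 ms, ~7.3 s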

9 Example: Sequential Reads
Time for 500 sequential sector reads (assume they are on the same track):
Seek: Since we don't know the starting track, use the average seek time of 10.5 ms
Rotation: At 7200 RPM, 1 rotation = 8.3 ms; use half of that (4.15 ms) since we don't know where the head will be relative to the desired start sector
Transfer: 500 sectors * 512 bytes/sector / 54 MB/s = 4.8 ms; or 500 sectors * 512 bytes/sector / 128 MB/s = 2 ms on an outer track
Total time (54 MB/s) = 10.5 + 4.15 + 4.8 = 19.5 ms
Total time (128 MB/s) = 10.5 + 4.15 + 2 = 16.7 ms
Actually slightly better in practice due to track buffering.
Using the 16.7 ms total time we are achieving only about 15.33 MB/s, but the max transfer rate is 54-128 MB/s: we achieve a small fraction of the maximum bandwidth.
(Laptop HD specs: Toshiba MK3254GSY; OS:PP 2nd Ed. Fig. 12.3)
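
The same style of estimate for the sequential case, again a sketch using the slide's parameters (54 MB/s inner-track and 128 MB/s outer-track media rates):

    # Sequential read of 500 contiguous sectors: one seek, one rotational delay,
    # then a single streaming transfer at the media rate.
    def sequential_read_ms(n_sectors, media_MB_per_s,
                           avg_seek_ms=10.5, avg_rotation_ms=4.15, sector_bytes=512):
        transfer_ms = n_sectors * sector_bytes / (media_MB_per_s * 1e6) * 1e3
        return avg_seek_ms + avg_rotation_ms + transfer_ms

    for rate in (54, 128):                                  # inner vs. outer track
        t_ms = sequential_read_ms(500, rate)
        achieved = 500 * 512 / (t_ms / 1e3) / 1e6           # effective MB/s seen by the OS
        print(f"{rate} MB/s track: {t_ms:.1f} ms total, ~{achieved:.1f} MB/s achieved")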

10 Disk Scheduling
FIFO: Can yield poor performance for consecutive requests on disparate tracks.
SSTF/SPTF (Shortest Seek/Positioning Time First): Go to the request that we can get to the fastest (like Shortest Job First).
Problem 1: Can lead to starvation.
Problem 2: Unlike SJF, it is not optimal.
Example: Read n sectors that are distance D away in one direction and 2n sectors at distance D+1 in the opposite direction. For average response time per request it would be better to first handle the 2n sectors at distance D+1 and then the n sectors, but SSTF/SPTF would choose the n sectors first.

11 Disk Scheduling: Elevator Algorithms 1
SCAN/CSCAN: Elevator-based algorithms.
SCAN: Service all requests in the order encountered as the arm moves from inner to outer tracks and then back again (i.e. scan in both the forward and reverse directions).
CSCAN: Same as SCAN, but when we reach the end we return to the starting position (without servicing requests) and start the SCAN again (i.e. only scan one way). There are likely few requests near the end we just serviced (more pending requests are back at the start), so this is more fair.
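
To make the orderings concrete, here is a small illustrative Python sketch that orders pending requests by track number under SSTF and SCAN. It is a track-only model that ignores rotational position, so it does not capture SPTF or the rotationally-aware variants on the next slide; the request list and starting head position are made up:

    # Order pending track requests under two policies, starting from `head`.
    def sstf_order(head, requests):
        pending, order = list(requests), []
        while pending:
            nxt = min(pending, key=lambda t: abs(t - head))   # closest track next
            pending.remove(nxt)
            order.append(nxt)
            head = nxt
        return order

    def scan_order(head, requests):
        # Sweep toward higher track numbers first, then reverse (elevator behavior).
        up = sorted(t for t in requests if t >= head)
        down = sorted((t for t in requests if t < head), reverse=True)
        return up + down

    reqs = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical pending requests
    print(sstf_order(53, reqs))   # [65, 67, 37, 14, 98, 122, 124, 183]
    print(scan_order(53, reqs))   # [65, 67, 98, 122, 124, 183, 37, 14]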

12 Disk Scheduling: Elevator Algorithms 2
RSCAN/RCSCAN: Rotationally-aware SCAN or CSCAN. Allows slight diversions from strict SCAN order based on the rotational distance to a sector.
Example: Assume the head is located at track 0, sector 0.
Request 1: Track 0, Sector 1000
Request 2: Track 1, Sector 500
Request 3: Track 10, Sector 0
RSCAN/RCSCAN would allow a servicing order of 2, 1, 3 rather than the 1, 2, 3 required by strict SCAN.

13 Effect of Disk Scheduling
Recall that the time for 500 random sector reads was around 7.3 seconds. Recalculate using SCAN:
Seek: With 500 requests sorted into one sweep, each seek now covers about 0.2% of the distance across the disk. Interpolating between the minimum seek time (moving over 1 track) and the average seek time (a 33.3% move) yields about 1.06 ms.
Rotation time: Still half a rotation = 4.15 ms
Transfer time: Still 0.0095 ms
Time per request = 1.06 + 4.15 + 0.0095 = 5.22 ms
Total time = 500 * 5.22 ms = 2.6 seconds
Speedup of around 3x for SCAN.
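
A sketch of the seek-time interpolation used above, assuming a minimum (1-track) seek time of about 1 ms, which is roughly the value that reproduces the slide's 1.06 ms figure:

    # Linear interpolation of seek time versus seek distance (fraction of the disk crossed).
    # Assumes a ~1 ms minimum 1-track seek; the 10.5 ms average seek is a 1/3-of-disk move.
    def seek_time_ms(frac_of_disk, min_seek_ms=1.0, avg_seek_ms=10.5, avg_frac=1/3):
        return min_seek_ms + (frac_of_disk / avg_frac) * (avg_seek_ms - min_seek_ms)

    per_request_ms = seek_time_ms(0.002) + 4.15 + 0.0095   # SCAN: ~0.2% of the disk per seek
    print(f"{per_request_ms:.2f} ms per request, {per_request_ms * 500 / 1000:.1f} s total")
    # ~5.2 ms per request, ~2.6 s total for 500 reads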

14 FLASH STORAGE

15 Transistor Physics
A transistor is created by implanting two n-type silicon areas (extra negative charges), the source and the drain, separated by p-type silicon (extra positive charges).
[Figure: source and drain n-type regions (width W, length L) in a p-type substrate]

16 Transistor Physics
A thin insulator layer (silicon dioxide, or just "oxide") is placed over the silicon between the source and drain.
[Figure: oxide insulator layer over the channel region between source input and drain output]

17 Transistor Physics
A thin insulator layer (silicon dioxide, or just "oxide") is placed over the silicon between the source and drain. Conductive polysilicon material is layered over the oxide to form the gate input.
[Figure: conductive polysilicon gate stacked on the oxide layer between source input and drain output]

18 Transistor Physics
A positive voltage (charge) at the gate input repels the extra positive charges in the p-type silicon. The result is a negatively charged channel between the source input and the drain.
[Figure: negatively charged channel forming under the gate as the positive charge is repelled]

19 Transistor Physics
Electrons can flow through the negative channel from the source input to the drain output: the transistor is ON.
[Figure: negative channel between source and drain = current flow]

20 Transistor Physics
If a low voltage (negative charge) is placed on the gate, no channel will develop and no current will flow: the transistor is OFF.
[Figure: no negative channel between source and drain = no current flow]

21 Flash Memory Transistor Physics
What if we add a second "gate" between the silicon and the actual control gate? We'll call this the floating gate.
[Figure: floating gate (with its own input connection) inserted in the oxide below the control gate, between source and drain]

22 Flash Memory Transistor Physics
Since the floating gate is surrounded by insulators (and not connected to anything), any charge we deposit on it is trapped and stored, even when power is not applied.

23 Flash Memory Transistor Physics
If there is no charge on the floating gate (neutral), then a positive charge on the control gate will still apply an electric field strong enough to create the conductive channel in the underlying silicon and thus turn the transistor ON.

24 Flash Memory Transistor Physics
If we trap negative charge on the floating gate, then no matter what we apply to the control gate, the transistor will be OFF.

25 Flash Memory Transistor Physics
How do we trap electrons on the floating gate? By applying a higher-than-normal voltage to the control gate and drain, we can cause "tunneling" of electrons from the source/channel/drain onto the floating gate.
[Figure: high voltage on the control gate and drain causes electrons to tunnel onto the floating gate]

26 Flash Memory Transistor Physics
Erase by applying a high voltage of the opposite polarity (to "suck out" the charge trapped on the floating gate).
[Figure: high voltage of opposite polarity removes the trapped charge from the floating gate]

27 NAND vs. NOR Flash
Two organization approaches: NAND and NOR flash.
NOR allows individual bytes/words to be read and written (but offers no great speed advantage).
NAND has higher density but limitations on reads/writes. Most storage devices use NAND technology.
Erasure [both NAND and NOR]: Removal of the charge on the floating gate happens at the level of a block (multi-KB chunks), a.k.a. an "erasure block", due to physical constraints and density reasons (i.e. if we erased in smaller blocks we couldn't fit as much memory on the chip).
Read / Write (Program): Page level. Like a hard drive, we must read/write an entire page, not individual bits (usually a few microseconds). Notice that a write from 0101 to 1010 will require an erasure.
NAND block (unit of erasure): 128-512 KB
Page (unit of reading/writing/programming for NAND): 4 KB
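
A toy model of the program/erase asymmetry, using the slide's geometry (4 KB pages, a 128 KB erase block, so 32 pages per block). Programming can only clear bits, so any write that needs a bit to go back to 1 forces a block erase; a real device would first copy the block's still-valid pages elsewhere, which is exactly the flash translation layer's job (next slide):

    # Toy NAND block: programming can only flip bits 1 -> 0; erasing resets the whole
    # block to all 1s. One byte stands in for an entire 4 KB page, for brevity.
    PAGES_PER_BLOCK = (128 * 1024) // (4 * 1024)   # 32 pages per erase block
    ERASED = 0xFF

    block = [ERASED] * PAGES_PER_BLOCK
    erase_count = 0

    def program(page_idx, value):
        global erase_count
        if (block[page_idx] & value) != value:      # some bit would need to go 0 -> 1
            for i in range(PAGES_PER_BLOCK):        # must erase the entire block first
                block[i] = ERASED
            erase_count += 1
        block[page_idx] &= value                    # programming clears bits

    program(0, 0b0101)      # fine: only clears bits in an erased page
    program(0, 0b1010)      # 0101 -> 1010 needs 0 -> 1 changes, so the block is erased
    print(erase_count)      # 1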

28 Wear-out & Flash Translation Layer
All flash suffers from wear-out: after some number of program/erase cycles (a few thousand to a few million), the transistor can no longer store its charge reliably. This affects not only reliability but also performance, since we need to take more countermeasures to deal with non-working pages.
Flash translation layer (FTL): Maps logical (external) flash addresses to internal physical locations.
Rather than erase and rewrite a page, simply copy the page to a fresh (erased) block and remap the address [faster].
Helps spread (even out) the wear on cells [greater durability].
If a page goes bad, we can just unmap it [greater reliability/robustness].
Trim command: When a file is deleted, the OS alerts the FTL so it can reuse the page.
[Figure: successive versions of logical page 2 (v0, v1, v2) written to different physical pages; older copies marked unused]
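
A minimal sketch of the remap-on-write idea, assuming a simple page-level mapping table and a pool of pre-erased physical pages; real FTLs add garbage collection of stale pages, wear-leveling heuristics, and bad-block management on top of this:

    # Page-level FTL sketch: every write of a logical page goes to a fresh physical
    # page; the old copy is only marked stale, to be erased in bulk later.
    class SimpleFTL:
        def __init__(self, num_physical_pages):
            self.mapping = {}                             # logical page -> physical page
            self.free = list(range(num_physical_pages))   # pre-erased physical pages
            self.stale = []                               # old copies awaiting block erase

        def write(self, logical_page, data):
            phys = self.free.pop()                        # always program a fresh page
            if logical_page in self.mapping:
                self.stale.append(self.mapping[logical_page])
            self.mapping[logical_page] = phys
            # (data would be programmed into physical page `phys` here)

        def trim(self, logical_page):
            # File deleted: release the page without ever copying its data again.
            if logical_page in self.mapping:
                self.stale.append(self.mapping.pop(logical_page))

    ftl = SimpleFTL(num_physical_pages=8)
    ftl.write(2, b"v0"); ftl.write(2, b"v1"); ftl.write(2, b"v2")
    print(ftl.mapping, ftl.stale)   # logical page 2 now maps to its third physical page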

29 Flash Performance
Better sequential read throughput: HD: 122-204 MB/s; SSD: 210-270 MB/s.
MUCH better random reads: Max latency for a single read/write is 75 us. When many requests are present we can overlap them and achieve a latency of around 26 us (1/38,500 s, i.e. roughly 38,500 IOPS).
Durability: 1.5 PB of writes (PB = 10^15 bytes). For normal workloads this could last years or decades; however, if we are constantly writing at 200 MB/s the SSD would wear out in about 64 days.
(Intel 710 SSD specs; OS:PP 2nd Ed. Fig. 12.6)
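
A back-of-envelope endurance check (wear-out horizon = write endurance / sustained write rate). Note that 1.5 PB at a constant 200 MB/s actually works out to roughly 87 days; the slide's ~64-day figure corresponds to an endurance closer to 1.1 PB at that same rate:

    # Wear-out horizon = write endurance / sustained write rate (rough estimate only).
    def wearout_days(endurance_bytes, write_rate_bytes_per_s):
        return endurance_bytes / write_rate_bytes_per_s / 86_400

    print(wearout_days(1.5e15, 200e6))   # ~87 days with the figures quoted above
    print(wearout_days(1.1e15, 200e6))   # ~64 days, matching the slide's estimate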

30 RAID

31 RAID
RAID = Redundant Array of Inexpensive Disks: store information redundantly so that if a disk fails the data can be recovered.
Levels:
RAID 1: Mirror the data from one disk on another. Can tolerate a disk failure, but the array must then be taken offline to replace the failed disk. 50% effective storage.
RAID 5: At least 3 disks, storing parity. Better effective storage. Can recreate the missing data on the fly if a disk fails and perform a hot swap with a new disk (no offline penalty).
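
A small sketch of the parity idea behind RAID 5: the parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XORing the survivors (the block contents and stripe width below are arbitrary choices for illustration):

    # RAID-5 style parity over one stripe: parity = XOR of all data blocks, so any
    # one missing block equals the XOR of the surviving blocks and the parity.
    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, byte in enumerate(blk):
                out[i] ^= byte
        return bytes(out)

    stripe = [b"AAAA", b"BBBB", b"CCCC"]     # data blocks spread across three disks
    parity = xor_blocks(stripe)              # stored on a fourth disk for this stripe

    lost = 1                                 # pretend the disk holding "BBBB" failed
    survivors = [blk for i, blk in enumerate(stripe) if i != lost] + [parity]
    print(xor_blocks(survivors) == stripe[lost])   # True: the lost block is rebuilt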