I/O Devices & SSD. Dongkun Shin, SKKU


System Architecture
Hierarchical approach:
- Memory bus: connects the CPU and memory; the fastest bus.
- General I/O bus (e.g., PCI): graphics and other higher-performance I/O devices.
- Peripheral bus (SCSI, SATA, or USB): connects many of the slowest devices.

Device Structure
Hardware interface: allows the system software to control the device's operation.
- Status register: read to see the current status of the device.
- Command register: written to tell the device to perform a certain task.
- Data register: transfers data to/from the device.
Internal structure: implementation specific.
- Simple devices have one or a few hardware chips implementing their functionality.
- Complex devices include a simple CPU running firmware, some general-purpose memory, and other device-specific chips.

Protocol
By reading and writing the internal registers of a device, the operating system can control device behavior. A typical protocol:

    while (STATUS == BUSY)
        ;                              // wait until device is not busy -> polling
    write data to DATA register
    write command to COMMAND register  // doing so starts the device and executes the command
    while (STATUS == BUSY)
        ;                              // wait until device is done with your request

Inefficiencies and inconveniences:
- Polling: repeatedly reading the status register burns CPU cycles.
- Programmed I/O (PIO): the main CPU is involved with the data movement.
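A minimal C sketch of this protocol, assuming hypothetical memory-mapped register addresses and a BUSY status bit (none of these names come from a real device; each device defines its own layout):

    #include <stdint.h>

    /* Hypothetical register layout (assumption, not from the slides). */
    #define STATUS_REG  ((volatile uint32_t *)0x1000)
    #define DATA_REG    ((volatile uint32_t *)0x1004)
    #define CMD_REG     ((volatile uint32_t *)0x1008)
    #define STATUS_BUSY 0x1u

    static void pio_request(uint32_t data, uint32_t cmd)
    {
        while (*STATUS_REG & STATUS_BUSY)   /* poll until the device is idle */
            ;
        *DATA_REG = data;                   /* programmed I/O: the CPU moves the data */
        *CMD_REG  = cmd;                    /* writing the command starts the device */
        while (*STATUS_REG & STATUS_BUSY)   /* poll until the request completes */
            ;
    }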

Lowering CPU Overhead with Interrupts
- The OS can issue a request, put the calling process to sleep, and context switch to another task.
- When the device is finally finished with the operation, it raises a hardware interrupt, causing the CPU to jump into the OS at a pre-determined interrupt service routine (ISR).
- The handler is operating-system code that finishes the request (e.g., by reading data and perhaps an error code from the device) and wakes the process waiting for the I/O.
- Interrupts allow computation and I/O to overlap.
- An interrupt is not always the best solution: if a device is fast, it may be best to poll; otherwise, use interrupts.
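The same request under interrupts, as a hedged kernel-style sketch; issue_request, sleep_on, wakeup, and the other helpers are illustrative stand-ins for a real kernel's primitives:

    /* Request path: issue, sleep, let the scheduler run something else. */
    void blocking_io(struct request *req)
    {
        issue_request(req);   /* program the device's registers */
        sleep_on(req);        /* block the caller; another task runs meanwhile */
        /* execution resumes here after the ISR completes the request */
    }

    /* ISR: runs when the device raises its hardware interrupt. */
    void device_isr(void)
    {
        struct request *req = current_request();
        req->error = read_device_error();   /* read data / error code from device */
        copy_data_from_device(req);
        wakeup(req);                        /* wake the process waiting for the I/O */
    }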

Interrupt Optimization
- Unknown device: a hybrid, two-phased approach polls for a little while and then, if the device is not yet finished, falls back to interrupts.
- Network systems: when a huge stream of incoming packets each generates an interrupt, the OS can livelock, only processing interrupts and never allowing a user-level process to run. Occasionally using polling gives better control over what is happening in the system (e.g., under a slashdot effect).
- Interrupt coalescing: multiple interrupts are coalesced into a single interrupt delivery, lowering the overhead of interrupt processing. Waiting longer coalesces more interrupts but increases the latency of a request.

More Efficient Data Movement with DMA
- The OS programs the DMA engine with the source/destination address (in memory or on the device) and the amount of data to transfer.
- While the transfer proceeds, the CPU is free to service other processes.
- When the DMA is complete, the DMA controller raises an interrupt, and the OS thus knows the transfer is complete.
(Figure: timelines contrasting programmed I/O with DMA.)
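A sketch of programming a DMA engine, with an assumed register layout (addresses and fields are illustrative, not a real controller's):

    #include <stdint.h>

    #define DMA_SRC   ((volatile uint64_t *)0x2000)  /* source address */
    #define DMA_DST   ((volatile uint64_t *)0x2008)  /* destination address */
    #define DMA_LEN   ((volatile uint32_t *)0x2010)  /* transfer length in bytes */
    #define DMA_CTRL  ((volatile uint32_t *)0x2014)
    #define DMA_START 0x1u

    void dma_start_copy(uint64_t src, uint64_t dst, uint32_t len)
    {
        *DMA_SRC  = src;        /* buffer in memory (or on the device) */
        *DMA_DST  = dst;
        *DMA_LEN  = len;        /* data amount */
        *DMA_CTRL = DMA_START;  /* kick off the transfer; the CPU is now free */
        /* completion is announced later by the DMA controller's interrupt */
    }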

Methods of Device Interaction
- Explicit I/O instructions, e.g., in and out on x86: to send data to a device, the caller names a register holding the data and a port identifying the device. These are usually privileged instructions.
- Memory-mapped I/O: device registers are mapped into the memory address space; the OS issues a load (to read) or a store (to write) to the mapped address.
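Both styles side by side, as a minimal sketch (the outb wrapper is the standard GCC-style x86 inline-assembly idiom and must run in privileged mode):

    #include <stdint.h>

    /* Explicit I/O instruction: x86 "out" writes one byte to an I/O port. */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    /* Memory-mapped I/O: the same logical operation is an ordinary store. */
    static inline void mmio_write8(volatile uint8_t *reg, uint8_t val)
    {
        *reg = val;
    }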

Device Driver
How can we keep most of the OS device-neutral, hiding the details of device interactions from major OS subsystems?
Solution: abstraction.
- The device driver in the OS knows in detail how a device works; any specifics of device interaction are encapsulated within the driver.
- File system stack: the file system issues block requests to the generic block layer, which routes each request to the appropriate device driver; the driver handles the details of issuing the specific request.
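One common way to express this encapsulation is a per-device table of function pointers, a sketch in the spirit of (but not identical to) Linux's block-layer interfaces:

    #include <stdint.h>
    #include <stddef.h>

    /* Generic interface every block driver implements. */
    struct blk_dev_ops {
        int (*read)(void *priv, uint64_t sector, void *buf, size_t nsectors);
        int (*write)(void *priv, uint64_t sector, const void *buf, size_t nsectors);
    };

    struct blk_dev {
        const struct blk_dev_ops *ops;
        void *priv;              /* driver-specific state, opaque to the kernel */
    };

    /* The generic block layer calls through the table and never touches
     * device-specific registers or protocols. */
    int generic_block_read(struct blk_dev *d, uint64_t sector, void *buf, size_t n)
    {
        return d->ops->read(d->priv, sector, buf, n);
    }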

Device Driver
Drawbacks of encapsulation:
- Even if a device has many special capabilities, it must present a generic interface to the rest of the kernel, so those special capabilities go unused (e.g., SCSI devices, which have very rich error reporting).
- Over 70% of OS code is found in device drivers. For any given installation, most of that code may not be active, and drivers have many more bugs than the rest of the kernel, making them a primary contributor to kernel crashes.

Disk System

Disk System
Data access in a disk system has three components:
1. Seek time: head movement from the current position to the desired cylinder (0-10s of ms).
2. Rotational latency: disk rotation until the desired sector arrives under the head (0-10s of ms).
3. Data transfer time: disk rotation while the sector passes under the head (< 1 ms).

Disk System
Average time to access some target sector:

    T_access = T_avg_seek + T_avg_rotation + T_avg_transfer

- Seek time (T_avg_seek): time to position the heads over the cylinder containing the target sector. Typical T_avg_seek is 3-9 ms.
- Rotational latency (T_avg_rotation): time waiting for the first bit of the target sector to pass under the head. Typical T_avg_rotation is 1-4 ms.

    T_avg_rotation = (1/2) x (1/RPM) x 60 s/min

- Transfer time (T_avg_transfer): time to read the bits in the target sector.

    T_avg_transfer = (1/RPM) x (1/(avg. # sectors per track)) x 60 s/min
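Worked example (numbers assumed for illustration): a 7,200 RPM disk with a 9 ms average seek and 400 sectors per track gives

    T_avg_rotation = (1/2) x (1/7200) x 60 s/min = 4.17 ms
    T_avg_transfer = (1/7200) x (1/400) x 60 s/min = 0.02 ms
    T_access       = 9 + 4.17 + 0.02 = ~13.2 ms

so seek and rotation dominate; transferring the sector itself is nearly free.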

Flash-Based SSD
NAND flash memory:
- Page: the read/write unit, 4 KB-16 KB.
- Block: the erase unit (erase-before-write), 128 KB or 256 KB.
- Limited lifetime: cells wear out.
- Cell types: SLC stores 1 bit per cell (0, 1); MLC stores 2 bits (00, 01, 10, 11); TLC stores 3 bits (000 through 111).
- Organization: chip -> die -> plane -> block -> page.

Flash Operations
- Read (a page): random access to any page.
- Erase (a block): sets each bit in the block to the value 1.
- Program (a page): changes some of the 1s within a page to 0s.
Example: erasing a block and then programming page 0 destroys the previous contents of pages 1, 2, and 3; they must be moved elsewhere before the block erase operation.
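These semantics can be captured in a toy model (an illustration, not a device interface): erase fills a block with 1-bits, and program can only clear bits, which is why rewriting arbitrary data needs an erase first.

    #include <stdint.h>
    #include <string.h>

    #define PAGES_PER_BLOCK 64
    #define PAGE_SIZE       4096

    typedef uint8_t page_t[PAGE_SIZE];

    void flash_erase(page_t *block)   /* erase: every bit in the block becomes 1 */
    {
        memset(block, 0xFF, PAGES_PER_BLOCK * PAGE_SIZE);
    }

    void flash_program(page_t page, const uint8_t *data)
    {
        for (size_t i = 0; i < PAGE_SIZE; i++)
            page[i] &= data[i];       /* program can only flip 1 -> 0 */
    }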

Flash Performance and Reliability
- Wear out: each erase/program cycle slowly accrues a little bit of extra charge. As that extra charge builds up, the block becomes unusable. An MLC block has a lifetime of roughly 10,000 P/E (program/erase) cycles; an SLC block, roughly 100,000 P/E cycles.
- Disturbance: when accessing a particular page within a flash, it is possible that some bits get flipped in neighboring pages (read disturbs or program disturbs), due to cell-to-cell interference.

From Raw Flash to Flash-Based SSDs
A flash-based SSD provides a standard block interface, sector (512 B) reads and writes, atop the raw flash chips inside it. The component that makes this possible is the flash translation layer (FTL).

Samsung Galaxy Note8
- Samsung K3UH6H60AM-NGCJ 6 GB LPDDR4X SDRAM, layered over a Qualcomm Snapdragon 835
- Samsung KLUCG4J1ED-B0C1 64 GB UFS flash storage
- Qualcomm WCD9341 Aqstic audio codec
- Skyworks 78160-11 power amplification module
- Avago AFEM-9066 power amplification module
- Wacom W9018 touch control IC

Solid-State Disk
Form factors: SATA SSD, NVMe SSD, M.2 NVMe (e.g., in the MacBook Air).

FTL
The FTL takes read and write requests on logical blocks and turns them into low-level read, erase, and program commands on the underlying physical blocks and physical pages.
Performance issues:
- Utilize multiple flash chips in parallel.
- Reduce write amplification: total writes issued to the flash chips / total writes issued by the host (worked example below).
Reliability issues:
- Wear leveling: spread writes across the blocks of the flash as evenly as possible, to ensure that all of the blocks of the device wear out at roughly the same time.
- Program disturbance: to avoid it, program pages within an erased block in order, from the low page to the high page (sequential programming).
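For example (illustrative numbers): if the host issues 100 GB of writes but garbage collection forces the FTL to also relocate live pages, so that the flash chips absorb 250 GB of program operations, then

    write amplification = 250 GB / 100 GB = 2.5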

FTL: Direct Mapped
A write to logical page N (a read-modify-write approach; sketched in code below):
1. Read in the entire block that contains page N.
2. Erase the block.
3. Program the old pages as well as the new one.
Result: poor performance, high write amplification, and the physical blocks containing popular data quickly wear out.
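A sketch of that read-modify-write path, reusing the toy flash model above; the identity mapping from LPN to block/offset is the "direct mapped" part, and the static buffer is for illustration only:

    void direct_mapped_write(page_t *flash, uint32_t lpn, const uint8_t *data)
    {
        uint32_t blk  = lpn / PAGES_PER_BLOCK;       /* mapping is fixed by the LPN */
        uint32_t off  = lpn % PAGES_PER_BLOCK;
        page_t *block = &flash[blk * PAGES_PER_BLOCK];

        static page_t tmp[PAGES_PER_BLOCK];
        memcpy(tmp, block, sizeof(tmp));             /* 1. read the entire block */
        memcpy(tmp[off], data, PAGE_SIZE);           /* 2. modify the target page */
        flash_erase(block);                          /* 3. erase the block */
        for (uint32_t i = 0; i < PAGES_PER_BLOCK; i++)
            flash_program(block[i], tmp[i]);         /* 4. program old pages + new one */
    }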

A Log-Structured FTL
Upon a write to logical block N:
- The device appends the write to the next free spot in the currently-being-written-to block: no overwrite, an out-of-place update.
- To allow subsequent reads of block N, the device keeps a mapping table, which stores the physical address of each logical block.
Remaining problems: garbage collection and the mapping table. Where is the mapping table kept? What happens if the device loses power?
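A minimal sketch of the log-structured write path with an in-memory mapping table (the device size and the low-level page helpers are assumptions):

    #define NUM_PAGES (1u << 20)       /* toy device: 1M physical pages */

    static uint32_t map[NUM_PAGES];    /* logical page -> physical page */
    static uint32_t log_head;          /* next free spot in the log */

    void ftl_write(uint32_t lpn, const uint8_t *data)
    {
        uint32_t ppn = log_head++;             /* append: out-of-place update */
        program_physical_page(ppn, data);      /* hypothetical low-level helper */
        map[lpn] = ppn;                        /* old PPN for lpn is now garbage */
    }

    void ftl_read(uint32_t lpn, uint8_t *buf)
    {
        read_physical_page(map[lpn], buf);     /* hypothetical low-level helper */
    }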

Garbage Collection
Dead-block reclamation provides free space for new writes:
1. Find a block that contains one or more garbage (dead) pages.
2. Read in the live (non-garbage) pages from that block.
3. Write those live pages out to the log.
4. (Finally) reclaim the entire block for use in writing.
The FTL must know whether each page is live or dead (via the mapping table or a bitmap) and the number of live pages in each block (for victim selection). Overprovisioning can delay GC and push it into the background.
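One GC pass over the toy FTL above, as a hedged sketch; pick_victim, is_live, and erase_block are assumed helpers backed by the mapping table/bitmap and per-block live counts:

    void gc_one_block(void)
    {
        uint32_t victim = pick_victim();                  /* e.g., fewest live pages */
        for (uint32_t off = 0; off < PAGES_PER_BLOCK; off++) {
            uint32_t ppn = victim * PAGES_PER_BLOCK + off;
            uint32_t lpn;
            if (is_live(ppn, &lpn)) {                     /* live or dead? */
                uint8_t buf[PAGE_SIZE];
                read_physical_page(ppn, buf);             /* read live page in */
                ftl_write(lpn, buf);                      /* write it out to the log */
            }
        }
        erase_block(victim);                              /* reclaim the whole block */
    }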

Mapping Table Size
Page-level mapping (LPN: logical page number, PPN: physical page number):
- Each logical page maps independently to any physical page. In the slide's figure, write(9, A) records LPN 9 -> PPN 3 in the mapping table and programs A into physical page 3; each flash page holds a data area plus an out-of-band (OOB) spare area.
- Cost: with a large 1-TB SSD, a single 4-byte entry per 4-KB page results in 1 GB of memory needed on the device (arithmetic below).
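The arithmetic behind the 1 GB figure:

    1 TB / 4 KB = 2^40 B / 2^12 B = 2^28 = 268,435,456 pages
    268,435,456 entries x 4 B = 2^30 B = 1 GB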

Mapping Table Size
Block-level mapping (LBN: logical block number, PBN: physical block number):
- One mapping entry per block rather than per page, so the table shrinks by a factor of the number of pages per block.
- A logical page number is split into a block number and an offset, and the offset is preserved within the mapped physical block. In the slide's figure (4 pages per block), write(9, A): LBN = 9/4 = 2, offset = 1; the mapping table records LBN 2 -> PBN 1, so A is programmed at PBN 1, offset 1.
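The translation in code form (a sketch; the slide's example uses 4 pages per block, while the toy model above uses 64):

    uint32_t block_map_translate(const uint32_t *data_table, uint32_t lpn)
    {
        uint32_t lbn    = lpn / PAGES_PER_BLOCK;   /* e.g., 9 / 4 = 2 */
        uint32_t offset = lpn % PAGES_PER_BLOCK;   /* e.g., 9 % 4 = 1 */
        uint32_t pbn    = data_table[lbn];         /* e.g., LBN 2 -> PBN 1 */
        return pbn * PAGES_PER_BLOCK + offset;     /* resulting physical page */
    }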

Update at Block Mapping

Hybrid Mapping
Goal: enable flexible writing while also reducing mapping costs.
- The FTL keeps a few log blocks and directs all writes to them, managed with page-level mapping (the log table). A small number of log blocks keeps the log table small.
- Normal data blocks use block-level mapping (the data table), which is also small.
- For a read request, first consult the log table; if the page is not found there, consult the data table (see the lookup sketch below).
- To keep the number of log blocks small, log blocks must be merged with data blocks when no free space is available in the log blocks.
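A sketch of the hybrid read path; log_table_lookup and data_table are assumed structures (a small page-level map for the log blocks and a block-level map for the data blocks):

    extern uint32_t data_table[];              /* LBN -> PBN, block-level map */

    uint32_t hybrid_lookup(uint32_t lpn)
    {
        uint32_t ppn;
        if (log_table_lookup(lpn, &ppn))       /* 1. consult the page-level log table */
            return ppn;
        uint32_t lbn = lpn / PAGES_PER_BLOCK;  /* 2. fall back to the */
        uint32_t off = lpn % PAGES_PER_BLOCK;  /*    block-level data table */
        return data_table[lbn] * PAGES_PER_BLOCK + off;
    }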

Hybrid Mapping (BAST)
J. Kim et al., "A space-efficient flash translation layer for CompactFlash systems," IEEE Transactions on Consumer Electronics, 48(2):366-375, 2002.

Hybrid Mapping: Merge Operation

DFTL

SSD Performance
- Random I/O: a large difference between SSD and HDD.
- Sequential I/O: much less of a difference between SSD and HDD, so an HDD is still a good choice for sequential workloads.
- SSD random read performance is not as good as SSD random write performance, because random writes are absorbed by a write buffer.
- Because a performance difference between sequential and random I/O remains even on an SSD, many of the file-system techniques developed for HDDs are still applicable to SSDs.