Linux storage system basics

Storage device
- A computer runs programs that are loaded in memory
- The program is loaded from a storage device
- The result of an execution is stored to a storage device
- A storage device is
  - Nonvolatile
  - Larger than memory
  - Slower than memory

History of storage devices: punch card / punch tape
- Stores the program and data to load
- Indexed by physical location (cabinet, ...)

History of storage devices: magnetic tape
- UNIVAC / 1951
- tar: sequential I/O
- (Figure: a tape archive laid out as a sequence of header and data blocks)

History of storage devices: magnetic disk
- IBM's RAMAC 350
- 3-dimensional addressing (head, cylinder, sector)
- Still used as a main storage device

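Magnetic disk characteristics
- Mechanical movement defines I/O latency:
  $T = T_{seek} + T_{rot} + \frac{Volume}{Bandwidth}$
- The average rotational delay is half a revolution:
  $T_{avg} = T_{seek} + \frac{1}{2 \cdot RPM} + \frac{Volume}{Bandwidth}$
- The smaller the request, the lower the throughput: the rotational delay $\frac{1}{2 \cdot RPM}$ dominates small requests, the transfer time $\frac{Volume}{Bandwidth}$ dominates large ones
- 8~32MB I/O does the magic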
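The latency formula can be tried out with a few lines of C. This is a minimal sketch, not a model of any real drive: the function names are made up for illustration, and the seek time, RPM, and bandwidth are whatever the caller assumes.

```c
#include <assert.h>

/* Average service time in milliseconds for one request of `bytes` bytes.
 * seek_ms, rpm and bandwidth (bytes per second) are caller-supplied
 * assumptions, not measurements of a real device. */
double avg_latency_ms(double seek_ms, double rpm, double bandwidth, double bytes)
{
    assert(rpm > 0.0 && bandwidth > 0.0);
    double rot_ms = 60000.0 / (2.0 * rpm);           /* half a revolution */
    double transfer_ms = bytes / bandwidth * 1000.0; /* Volume / Bandwidth */
    return seek_ms + rot_ms + transfer_ms;
}

/* Effective throughput in MB/s (10^6 bytes per second) at that request size. */
double throughput_mb_per_s(double seek_ms, double rpm, double bandwidth, double bytes)
{
    double ms = avg_latency_ms(seek_ms, rpm, bandwidth, bytes);
    return bytes / (ms / 1000.0) / 1e6;
}
```

With an assumed 8 ms seek, 7200 RPM, and 150 MB/s of media bandwidth, a 4 KB request takes about 12.2 ms and reaches only about 0.34 MB/s, while a 32 MiB request reaches about 142 MB/s. That gap is why large I/O "does the magic".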

History of storage devices: non-volatile memories
- Flash memory
- Future: PCM, STTM, ZRAM, ...

Flash memory characteristics
- Pros
  - Small form factor
  - No mechanical parts: good random I/O throughput
  - Low power consumption
- Cons
  - Read, program (write), erase interface
  - Slow page programming, even slower block erase
  - Limited block erase cycles
  - Expensive (price / capacity)
- Types and timing
  - SLC / MLC / TLC

Storage hierarchy
- Storage device characteristics: non-volatile, slow and large
- Hierarchy, from faster (top) to larger (bottom)
  - CPU registers
  - L1/L2 cache, L3 cache: cache for RAM
  - RAM: cache for HDD/SSD
  - HDD / SSD: primary storage
  - Tape: backup of HDD/SSD

Storage abstraction
- Applications: see a directory tree (/, usr, home)
- File system: ext4, VFAT, NFS
- Block I/O layer: flat address
- Device driver

The Block I/O Layer

Block device concepts
- Old HDD access method: (head, cylinder, sector)
  - Differs for each disk
  - SSDs have a different structure
- Generic method: contiguous address space (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
- Let storage devices do the complex management
  - (Head, cylinder) mapping, flash translation layer, ...
  - Bad block management
- Access units: sectors (512 bytes) / blocks (4KB)
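To make the step from 3-dimensional addressing to a flat address space concrete, here is a small sketch of the conventional CHS-to-LBA conversion and of the sector-to-block arithmetic. The function names and the geometry parameters (`heads_per_cyl`, `sectors_per_track`) are illustrative assumptions. Modern drives do this mapping internally and do not expose a real geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Flat sector number for a (cylinder, head, sector) triple.
 * Sectors are numbered from 1 in CHS addressing, hence the (s - 1). */
uint64_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                    uint32_t heads_per_cyl, uint32_t sectors_per_track)
{
    assert(h < heads_per_cyl);
    assert(s >= 1 && s <= sectors_per_track);
    return ((uint64_t)c * heads_per_cyl + h) * sectors_per_track + (s - 1);
}

/* Index of the 4KB block that contains a given 512-byte sector. */
uint64_t sector_to_block(uint64_t sector)
{
    const uint64_t sectors_per_block = 4096 / 512; /* 8 */
    return sector / sectors_per_block;
}
```

For a hypothetical geometry of 16 heads and 63 sectors per track, (0, 0, 1) maps to sector 0 and (1, 0, 1) maps to sector 1008. Sectors 0 to 7 fall into block 0, sectors 8 to 15 into block 1.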

I/O subsystem architecture
- read/write (device, addr, offset, nr sectors): hard to manipulate
- (Figure labels: interrupt, bus, DMA, PCI, SATA, SSD, HDD)

Partitions
- Split storage area: safe-keeping, multi OS
- Example: /, /usr/local, /home, swap
- Partition table types
  - Master Boot Record
  - GUID Partition Table
    - Large device support
    - Partition name support
    - Consistency mechanism
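As an illustration of the older of the two table types, the sketch below decodes the four primary entries of a classic MBR sector. The struct and function names are made up for this example. The on-disk layout it assumes is the conventional one: table at byte 446, 16 bytes per entry, little-endian 32-bit start and length, signature 0x55 0xAA at bytes 510 and 511. The 32-bit sector fields are one reason MBR lacks large device support: with 512-byte sectors they address at most 2 TiB.

```c
#include <assert.h>
#include <stdint.h>

struct mbr_part {
    uint8_t  bootable;   /* 1 if the status byte is 0x80 */
    uint8_t  type;       /* partition type, 0 means unused */
    uint32_t start_lba;  /* first sector of the partition */
    uint32_t nr_sects;   /* length in sectors */
};

/* Decode a little-endian 32-bit value byte by byte (host-endian safe). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fills parts[0..3]; returns the number of used entries,
 * or -1 if the boot signature is missing. */
int mbr_parse(const uint8_t sector[512], struct mbr_part parts[4])
{
    assert(sector && parts);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    int used = 0;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i;
        parts[i].bootable  = (e[0] == 0x80);
        parts[i].type      = e[4];
        parts[i].start_lba = le32(e + 8);
        parts[i].nr_sects  = le32(e + 12);
        if (parts[i].type != 0)
            used++;
    }
    return used;
}
```

The input would normally be the first 512 bytes read from a whole-disk device node. Extended partitions and GPT are out of scope for this sketch.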

Block I/O layer
- Position in the stack (figure)
  - System calls: open(), read(), write(), mkdir(), ...
  - Virtual File System
  - Page cache; Ext4, FAT, YAFFS, NFS, FUSE
  - Block I/O layer
    - Partition support
    - Device mapper: LVM, RAID, cache
    - I/O scheduler
  - read/write(sector, length, buffer)
- Operates on struct block_device
- Allows random access to a block device
- Supports partitioning, I/O scheduling, ...
- <linux/fs.h>

Buffer head and BIO
- I/O mechanism: submit_bh() vs. submit_bio()
- Buffer head
  - Buffer: less than a page size
  - Uses virtual address to map blocks
  - Deprecated API, but still used in some places
  - Uses BIO to execute I/O operations
  - API: alloc_buffer_head(), get_bh(), put_bh(), submit_bh()

Block I/O: BIO structure (since 2.5)
- An array of segments of pages (bio_vec)
- Can use unmapped pages

Type                Name
block_device        bi_bdev
sector_t            bi_sector
struct bio *        next
long                bi_rw
int                 bi_size
short               bi_vcnt
struct bio_vec *    bi_io_vec
bio_end_io_t        bi_end_io
void *              bi_private

BIO API: life cycle of a BIO
- bio_alloc(gfp_mask, nr_vecs)
- bio_get(bio)
- bio_add_page(bio, page, length, offset)
- submit_bio(rw, bio)
- bio_endio(bio, errno) calls bio->bi_end_io(bio, errno); the submitter needs to implement this callback function
- bio_for_each_segment(bvl, bio, i) { ... }
- bio_put(bio)
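The life cycle is easier to follow in a toy model. The sketch below is plain userspace C and deliberately not kernel code: every `mock_` name is hypothetical and only mirrors the role of the corresponding kernel call (allocation, reference counting, adding segments, completion callback). The real signatures differ between kernel versions.

```c
#include <assert.h>
#include <stdlib.h>

struct mock_bio_vec {            /* one segment: part of one page */
    void *page;
    unsigned int len;
    unsigned int offset;
};

struct mock_bio;
typedef void (*mock_end_io_t)(struct mock_bio *bio, int error);

struct mock_bio {
    unsigned int size;           /* total bytes, like bi_size */
    unsigned short vcnt;         /* segments in use, like bi_vcnt */
    unsigned short max_vecs;
    int refcount;
    struct mock_bio_vec *io_vec; /* like bi_io_vec */
    mock_end_io_t end_io;        /* like bi_end_io */
    void *private_data;          /* like bi_private */
};

struct mock_bio *mock_bio_alloc(unsigned short nr_vecs)
{
    assert(nr_vecs > 0);
    struct mock_bio *bio = calloc(1, sizeof(*bio));
    if (!bio)
        return NULL;
    bio->io_vec = calloc(nr_vecs, sizeof(*bio->io_vec));
    if (!bio->io_vec) {
        free(bio);
        return NULL;
    }
    bio->max_vecs = nr_vecs;
    bio->refcount = 1;           /* the allocator holds the first reference */
    return bio;
}

void mock_bio_get(struct mock_bio *bio) { bio->refcount++; }

/* Drops one reference; returns 1 if the bio was freed, else 0. */
int mock_bio_put(struct mock_bio *bio)
{
    assert(bio->refcount > 0);
    if (--bio->refcount > 0)
        return 0;
    free(bio->io_vec);
    free(bio);
    return 1;
}

/* Returns the number of bytes added, or 0 if the segment array is full. */
unsigned int mock_bio_add_page(struct mock_bio *bio, void *page,
                               unsigned int len, unsigned int offset)
{
    if (bio->vcnt == bio->max_vecs)
        return 0;
    bio->io_vec[bio->vcnt].page = page;
    bio->io_vec[bio->vcnt].len = len;
    bio->io_vec[bio->vcnt].offset = offset;
    bio->vcnt++;
    bio->size += len;
    return len;
}

/* Completion: hand the result to the submitter's callback. */
void mock_bio_endio(struct mock_bio *bio, int error)
{
    if (bio->end_io)
        bio->end_io(bio, error);
}

/* Example callback: counts completions through private_data. */
void count_completion(struct mock_bio *bio, int error)
{
    (void)error;
    (*(int *)bio->private_data)++;
}
```

A caller would allocate, add pages, set `end_io` and `private_data`, take an extra reference if it wants to inspect the bio after completion, and finally drop every reference with `mock_bio_put()`.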

Request queues
- Request and request_queue
  - bios are merged by the I/O scheduler
  - (Figure: bdev -> request_queue -> elevator -> requests, each request holding one or more bios)
- Request queue operations
  - request_fn()
  - make_request_fn(): used for device mapper implementation
- Call path
  - submit_bio() -> generic_make_request() -> make_request_fn()
  - make_request_fn() is either the device mapper or blk_queue_bio() (I/O scheduler)

I/O scheduler
- Gathers block I/Os to make large requests
- Minimizes seek time
- Scheduling algorithms in OS courses
  - FIFO
  - Scan
  - Elevator
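The idea of gathering small I/Os into large requests can be sketched in userspace as "sort by start sector, then merge neighbours that touch". This is a simplified illustration with made-up names, not the kernel's elevator code. It merges only exactly contiguous requests and ignores direction, deadlines, and size limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct io_req {
    unsigned long long start; /* first sector */
    unsigned long long nr;    /* number of sectors */
};

static int cmp_start(const void *a, const void *b)
{
    const struct io_req *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts reqs by start sector and merges back-to-back neighbours in place.
 * Returns the number of requests left after merging. */
size_t sort_and_merge(struct io_req *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(reqs, n, sizeof(*reqs), cmp_start);

    size_t out = 0; /* index of the request currently being grown */
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].start + reqs[out].nr == reqs[i].start)
            reqs[out].nr += reqs[i].nr;  /* contiguous: extend */
        else
            reqs[++out] = reqs[i];       /* gap: start a new request */
    }
    return out + 1;
}
```

For example, requests starting at sectors 100, 0, 8, and 108 with lengths 8, 8, 8, and 4 collapse into two requests: sectors 0 to 15 and sectors 100 to 111.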

Linux I/O schedulers
- Based on the (Linus) elevator algorithm
- Merges BIOs and executes them
- Available schedulers
  - noop: simple elevator
  - Deadline: prevents starvation
  - Anticipatory: waits for a few msec to merge contiguous I/O
  - Completely Fair Queuing (CFQ): per-process queue for fairness
- Choosing and displaying I/O schedulers
  - /sys/block/{device name}/queue/scheduler
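Reading the sysfs file returns the available schedulers on one line, with the active one in square brackets (for example `noop deadline [cfq]`). Writing a scheduler name to the same file selects it. The helper below is a small sketch with a made-up name that extracts the bracketed entry from such a line. It assumes that line format and leaves reading the file to the caller.

```c
#include <assert.h>
#include <string.h>

/* Copies the bracketed (active) scheduler name from a sysfs line into out.
 * Returns 0 on success, -1 if no bracketed name is found or out is too small. */
int active_scheduler(const char *line, char *out, size_t out_len)
{
    assert(line && out && out_len > 0);

    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (!rb)
        return -1;

    size_t len = (size_t)(rb - lb - 1);
    if (len + 1 > out_len)
        return -1;

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```

The input line would typically come from reading the scheduler file of one device with `fgets()`. The set of names listed depends on the kernel version and configuration.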

Block device data structure
- Block device (block_device)
  - Physical disk + partition + virtual mapped device
  - Ex. /dev/sda, /dev/sda1, /dev/mapper/vol1, ...
- General disk (gendisk)
  - Physical/virtual device driver
  - Ex. /dev/sda
- Partitions
  - gendisk->part_tbl[]
  - bdev->bd_part

Block device and gendisk
- (Figure: the gendisk /dev/sda holds part_tbl; each part entry, such as /dev/sda1 or /dev/sda3, records start_sect and nr_sects)
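The two fields in the figure, start_sect and nr_sects, are enough to translate a partition-relative sector into a whole-disk sector. The sketch below models that translation in userspace. The struct and function names are hypothetical and only borrow the field names from the slide.

```c
#include <assert.h>
#include <stdint.h>

struct part_model {
    uint64_t start_sect; /* first sector of the partition on the disk */
    uint64_t nr_sects;   /* partition length in sectors */
};

/* Maps a request of nr sectors at rel_sector (relative to the partition)
 * to an absolute disk sector. Returns 0 on success, -1 if the request
 * would run past the end of the partition. */
int part_remap(const struct part_model *p, uint64_t rel_sector, uint64_t nr,
               uint64_t *abs_sector)
{
    assert(p && abs_sector);
    if (rel_sector > p->nr_sects || nr > p->nr_sects - rel_sector)
        return -1;
    *abs_sector = p->start_sect + rel_sector;
    return 0;
}
```

With a partition that starts at sector 2048 and is 204800 sectors long, sector 0 of the partition is sector 2048 of the disk, and an 8-sector request at relative sector 204793 is rejected because it would cross the partition boundary.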

Creation / destruction
- gendisk
    struct gendisk *alloc_disk(int minors);
    int add_disk(gendisk);
    del_gendisk(gendisk);
- Block device
    struct block_device *bdget_disk(struct gendisk *disk, int partno);
    int invalidate_device(struct block_device *bdev, bool kill_dirty);
- Partition
    int rescan_partitions(struct gendisk *disk, struct block_device *bdev)

block_device_operations
- gendisk operations
    open(struct block_device *, fmode_t);
    release(struct gendisk *, fmode_t);
    ioctl(struct block_device *, fmode_t, unsigned, unsigned long);
    direct_access(struct block_device *, sector_t, void **, unsigned long *);
    check_events(struct gendisk *disk, unsigned int clearing);
- request_queue->request_fn()
  - Set by blk_init_queue()
  - Sends I/O requests to the device driver (gendisk)

Review: Block I/O layer
- Block device: abstracting HDDs, SSDs, ...
  - Naming: /dev/sdxn, where x is a~z, aa~zz and n is >= 1, or none
  - Layout: partition table, /dev/sda1, /dev/sda2, /dev/sda3
- Accessing a block device
  - Through a file system
  - Through the device node, which allows file operations; for example, reading the super block:
      fd = open("/dev/sdb2", O_RDWR);
      lseek(fd, 0, SEEK_SET);
      read(fd, sb, 4096);
- Linux implementation layers
  - Virtual file system: ext4, btrfs, FAT
  - Block I/O layer: device mapper (LVM, RAID), I/O scheduler (CFQ, NOOP)
    - submit_bio() takes a BIO; request_fn() takes a request
  - Device driver: SATA, SCSI; devices: HDD, SSD, RAID
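The three-line device-node example can be turned into a small, checked helper. This is a sketch under a few assumptions: the function names are made up, the node is opened read-only because nothing is written, and the path is a parameter so the same code works on a disk image file. The second helper assumes an ext2/3/4 file system, whose super block starts at byte 1024 and carries the magic number 0xEF53 (little-endian) at offset 56.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to len bytes from the start of a device node or image file.
 * Returns the number of bytes read, or -1 on error. A single read() may
 * return fewer bytes than requested; this sketch does not retry. */
ssize_t read_first_block(const char *path, void *buf, size_t len)
{
    assert(path && buf);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}

/* Returns 1 if buf holds the ext2/3/4 magic at byte 1024 + 56, else 0. */
int looks_like_ext(const unsigned char *buf, size_t len)
{
    const size_t magic_off = 1024 + 56;
    if (len < magic_off + 2)
        return 0;
    return buf[magic_off] == 0x53 && buf[magic_off + 1] == 0xEF;
}
```

Reading a real device node such as the one on the slide usually requires root privileges. On an image file the helper behaves the same way.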