The Journey of an I/O request through the Block Layer


Suresh Jayaraman
Linux Kernel Engineer, SUSE Labs
sjayaraman@suse.com

Introduction
- Motivation
- Scope
  - Common cases
  - More emphasis on the Block layer
- Why should you listen to this talk?
  - Better understanding of I/O request handling
  - Help analyze I/O related problems
  - Just curious about finer details?

Linux I/O Path
[Figure: Linux I/O Path]

Application I/O interface
- Read or write request
  - Stream I/O: fread()/fwrite()/fgets()/fputs() etc.
  - System calls: read() / write()
- Request to a file system or a device file
- e.g. read(fd, buf, count);
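As a small illustration of the system call interface above, the sketch below uses Python's os.open(), os.read() and os.close(), which are thin wrappers around the corresponding system calls. The scratch file and its contents are assumptions made only for this example.

```python
import os
import tempfile

def read_bytes(path, count):
    """Read up to `count` bytes through the read() system call wrapper."""
    fd = os.open(path, os.O_RDONLY)      # open() system call
    try:
        return os.read(fd, count)        # corresponds to read(fd, buf, count)
    finally:
        os.close(fd)

# Hypothetical scratch file, created only for this example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello block layer")
    scratch_path = tmp.name

first_five = read_bytes(scratch_path, 5)
os.unlink(scratch_path)
```

A read like this may be served entirely from the page cache, in which case no request ever reaches the block layer.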

System call handling (simplified)
- User mode process passes the system call number
- CPU switches to kernel mode
- Finds the corresponding service routine via the system call dispatch table and invokes it
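The table lookup in the last step can be pictured with a toy model. This is only a sketch: the real dispatch table lives in the kernel and is indexed by architecture-specific system call numbers, and the numbers and handler names below are made up for illustration.

```python
# Toy model of a system call dispatch table. The numbering is hypothetical.
def sys_read(fd, count):
    return f"read({fd}, {count})"

def sys_write(fd, data):
    return f"write({fd}, {len(data)} bytes)"

SYSCALL_TABLE = {0: sys_read, 1: sys_write}

def dispatch(number, *args):
    """Look up the service routine by number and invoke it."""
    try:
        handler = SYSCALL_TABLE[number]
    except KeyError:
        raise OSError("unknown system call number") from None
    return handler(*args)

result = dispatch(0, 3, 4096)
```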

Relevant concepts

Page cache
- In-memory store of recently accessed data
- Quicker subsequent access
- Page-sized chunks (typically 4k)
- Dynamic in size
- Auto-pruned (LRU) when memory is scarce
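The LRU pruning idea can be sketched with a small cache keyed by page index. This is a simplified model, not the kernel's reclaim logic, which is considerably more elaborate; the class name, the fixed page limit and the fake disk reader are assumptions for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # typical page size, as on the slide

class ToyPageCache:
    def __init__(self, max_pages, read_page_from_disk):
        self.max_pages = max_pages
        self.read_page_from_disk = read_page_from_disk
        self.pages = OrderedDict()   # page index -> data, least recent first
        self.misses = 0

    def get(self, page_index):
        if page_index in self.pages:
            self.pages.move_to_end(page_index)   # mark as most recently used
            return self.pages[page_index]
        self.misses += 1
        data = self.read_page_from_disk(page_index)
        self.pages[page_index] = data
        if len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)       # evict least recently used
        return data

def fake_disk_read(page_index):
    # Stand-in for a real disk read: one page filled with a marker byte.
    return bytes([page_index % 256]) * PAGE_SIZE

cache = ToyPageCache(max_pages=2, read_page_from_disk=fake_disk_read)
for index in (0, 1, 0, 2, 1):
    cache.get(index)
```

With room for only two pages, the access sequence above hits the cache once (the second access to page 0) and goes to the fake disk for the rest.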

Writeback and Readahead
- Writeback
  - Deferred writes
  - Data is copied to a buffer and marked dirty, and the write returns
  - The dirty buffer is committed to disk later
  - Writeback triggers
    - When the page cache gets too full
    - When a dirty buffer ages
  - Per-backing-device flusher threads
- Readahead
  - Adjacent pages are read before they are actually requested
  - Enhances disk performance and system responsiveness (only in case of sequential access)
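The writeback half of this slide can be sketched as a toy model with the two triggers named above: too many dirty pages, and dirty data growing old. The class name and the threshold values are assumptions for illustration, and time is passed in explicitly to keep the example deterministic.

```python
class ToyWriteback:
    def __init__(self, max_dirty_pages, max_age_seconds):
        self.max_dirty_pages = max_dirty_pages
        self.max_age_seconds = max_age_seconds
        self.dirty = {}      # page index -> (data, time it was first dirtied)
        self.disk = {}       # page index -> data committed to "disk"

    def write(self, page_index, data, now):
        """Copy data into the cache, mark it dirty, and return immediately."""
        first_dirtied = self.dirty.get(page_index, (None, now))[1]
        self.dirty[page_index] = (data, first_dirtied)
        if len(self.dirty) > self.max_dirty_pages:
            self.flush(now, everything=True)     # cache too full of dirty data

    def flush(self, now, everything=False):
        """Commit dirty pages that are old enough (or all of them)."""
        for page_index in list(self.dirty):
            data, dirtied_at = self.dirty[page_index]
            if everything or now - dirtied_at >= self.max_age_seconds:
                self.disk[page_index] = data
                del self.dirty[page_index]

wb = ToyWriteback(max_dirty_pages=2, max_age_seconds=30)
wb.write(0, b"a", now=0)
wb.write(1, b"b", now=10)
wb.flush(now=30)                 # periodic flusher run: only page 0 is old enough
after_aging = dict(wb.disk)
wb.write(2, b"c", now=31)
wb.write(3, b"d", now=32)        # third dirty page exceeds the limit
```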

I/O request handling at the File system Layer (1)
[Figure: I/O request handling at the File system Layer]

I/O request handling at the File system Layer (2)
- Storage unit of data is the block
- Determine the physical location of the data
  - File block number vs. logical block number
  - Get file block number from file offset (offset / block_size)
  - File block number -> logical block number
- Allocate and prepare info needed by lower components (bio)
- Use the generic block layer to submit requests
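The two-step translation can be written out as follows. The block size and the mapping table are assumptions for the example; a real file system derives the file block to logical block mapping from its on-disk metadata.

```python
BLOCK_SIZE = 4096  # assumed file system block size

def file_block_number(offset, block_size=BLOCK_SIZE):
    """Index of the block within the file that contains `offset`."""
    return offset // block_size

# Hypothetical mapping for one file: file block number -> logical block
# number on the device.
FILE_BLOCK_MAP = {0: 8200, 1: 8201, 2: 9500}

def logical_block_number(offset):
    return FILE_BLOCK_MAP[file_block_number(offset)]

lbn = logical_block_number(10000)   # byte 10000 lies in file block 2
```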

Block Layer

Block layer key concepts
- bio structure
- Request queues
- Partition remapping
- Device plugging and unplugging
- Merging and coalescing
- Elevator interface
- I/O schedulers

bio structure
- Represents in-flight block I/O operations
- Scatter-gather I/O
  - I/O vectors (bio_vec)
- Has a pointer to the block device
- ->end_io() callback to be used by the device driver
- Up to 1MB of data in a single bio (assuming 4k page size)
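A rough model of the structure, and of where the 1MB figure comes from, is sketched below. The field names are simplified stand-ins rather than the kernel's struct bio layout, and the limit of 256 vectors per bio is an assumption chosen so that 256 pages of 4k add up to 1MB.

```python
from dataclasses import dataclass, field
from typing import Callable, List

PAGE_SIZE = 4096        # 4k pages, as assumed on the slide
MAX_VECS_PER_BIO = 256  # assumption for this sketch: 256 * 4 KiB = 1 MiB

@dataclass
class BioVec:
    page: int      # stand-in for a page reference
    offset: int    # offset of the data within the page
    length: int    # number of bytes in this segment

@dataclass
class Bio:
    device: str                       # stand-in for the block device pointer
    sector: int                       # starting sector on the device
    end_io: Callable[["Bio"], None]   # completion callback
    vecs: List[BioVec] = field(default_factory=list)

    def add_page(self, page, offset=0, length=PAGE_SIZE):
        if len(self.vecs) >= MAX_VECS_PER_BIO:
            return False              # bio is full; the caller needs a new bio
        self.vecs.append(BioVec(page, offset, length))
        return True

    def size(self):
        return sum(vec.length for vec in self.vecs)

completed = []
bio = Bio(device="sda", sector=2048, end_io=completed.append)
added = sum(bio.add_page(page) for page in range(300))   # only 256 fit
bio.end_io(bio)                                          # driver signals completion
```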

Request queue
- Per-device list of pending I/O requests
- Each request has one or more bios
- Knows which I/O scheduler handles its requests
- Device driver specific ->request_fn
- Methods for creating new requests, unplugging the device, etc.

Relationship between bio, request and request_queue
[Figure: relationship between bio, request and request_queue]

Partition remapping
- Adjusts the sector so that the sector number is relative to the whole disk
- Sector 'n' of a partition starting at sector 'm' is mapped to sector m + n of the block device
  - e.g. read sector 256 of /dev/sda3
- Ensures the correct area is read or written
- Sets the block device to the block device descriptor of the whole disk
- Better scheduling decisions
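The m + n rule can be worked through for the /dev/sda3 example. The partition table below is hypothetical: the start sector and size of sda3 are made-up values, since the slide does not give them.

```python
from collections import namedtuple

Partition = namedtuple("Partition", "disk start_sector nr_sectors")

# Hypothetical layout, for illustration only.
PARTITIONS = {
    "sda3": Partition(disk="sda", start_sector=1000000, nr_sectors=500000),
}

def remap(device, sector):
    """Translate (partition, sector) into (whole disk, absolute sector)."""
    part = PARTITIONS.get(device)
    if part is None:
        return device, sector            # already a whole-disk device
    if not 0 <= sector < part.nr_sectors:
        raise ValueError("sector lies outside the partition")
    return part.disk, part.start_sector + sector   # sector m + n

target = remap("sda3", 256)
```

After remapping, requests for every partition of a disk are expressed in the same sector space, which is what lets the scheduler compare and sort them.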

Plugging and unplugging
- Holds requests, allows build-up of requests
- Why plugging?
- Implicit vs. explicit block device plugging
- How does plugging happen?
- Unplug triggers
  - Implicit: unplug_thresh is reached, or the unplug timer elapses (usually 3 msec)
  - Explicit: process finished submitting I/O
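The implicit unplug triggers can be modelled with a queue that holds requests until either a threshold count is reached or a timer runs out. This is a toy sketch: the class and method names are invented, the threshold of 4 is an assumed value, and time is given in whole milliseconds to keep the example deterministic.

```python
class ToyPluggedQueue:
    def __init__(self, unplug_thresh, unplug_delay_ms):
        self.unplug_thresh = unplug_thresh       # held requests that force an unplug
        self.unplug_delay_ms = unplug_delay_ms   # timer length
        self.held = []
        self.plugged_at = None
        self.dispatched = []                     # batches handed to the driver

    def submit(self, request, now_ms):
        if self.plugged_at is None:
            self.plugged_at = now_ms             # first request plugs the queue
        self.held.append(request)
        if len(self.held) >= self.unplug_thresh:
            self.unplug()

    def tick(self, now_ms):
        """Called periodically; unplugs once the timer has elapsed."""
        if self.plugged_at is not None and now_ms - self.plugged_at >= self.unplug_delay_ms:
            self.unplug()

    def unplug(self):
        self.dispatched.append(list(self.held))
        self.held.clear()
        self.plugged_at = None

q = ToyPluggedQueue(unplug_thresh=4, unplug_delay_ms=3)
q.submit("r1", now_ms=0)
q.submit("r2", now_ms=1)
q.tick(now_ms=2)            # timer not yet elapsed, requests stay held
q.tick(now_ms=3)            # timer elapsed: batch of two is released
for i in range(4):
    q.submit(f"w{i}", now_ms=10)   # fourth request reaches the threshold
```

Holding requests briefly gives adjacent requests a chance to arrive and be merged before the driver sees them.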

Merging and coalescing
[Figure: merging and coalescing; label: New request]

Elevator interface
- Abstracts IO schedulers
- Provides merge searching and sorting functions
- Maintains a hash table of requests indexed by end sector number
  - Back merge opportunities with constant time lookup
  - No front merging (up to the IO scheduler)
- One-hit merge cache stores the last request involved in a merge
  - Checks for both front and back-merge possibilities
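The reason an end-sector hash gives constant-time back merges can be seen in a small model: a new bio can be appended to a queued request exactly when the bio starts where that request ends, so the bio's start sector is the lookup key. The class below is a sketch with invented names; it records the last merged request but does not model the one-hit cache lookup itself.

```python
class ToyElevator:
    def __init__(self):
        self.by_end_sector = {}   # end sector -> request, for back merges
        self.last_merge = None    # last request involved in a merge

    def add_request(self, start, nr_sectors):
        request = {"start": start, "nr_sectors": nr_sectors}
        self.by_end_sector[start + nr_sectors] = request
        return request

    def try_back_merge(self, bio_start, bio_sectors):
        """Merge a bio that starts exactly where a queued request ends."""
        request = self.by_end_sector.pop(bio_start, None)   # constant-time lookup
        if request is None:
            return None
        request["nr_sectors"] += bio_sectors
        # Re-index the grown request under its new end sector.
        self.by_end_sector[request["start"] + request["nr_sectors"]] = request
        self.last_merge = request
        return request

elv = ToyElevator()
elv.add_request(start=100, nr_sectors=8)                 # covers sectors 100..107
merged = elv.try_back_merge(bio_start=108, bio_sectors=8)
no_merge = elv.try_back_merge(bio_start=200, bio_sectors=8)
```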

I/O schedulers
- Algorithms for scheduling and re-ordering I/O operations
- Overall throughput vs. starvation
- Registered with the elevator interface
- Implement scheduling policy via a set of methods (elevator_ops)
- Use additional queues to classify and sort requests
- Different I/O schedulers: noop, cfq, deadline
- Scheduler can be switched at runtime
  - /sys/block/<device>/queue/scheduler
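Reading that sysfs file returns the available schedulers on one line, with the active one in square brackets (for example "noop deadline [cfq]"). The helper names below are invented for this sketch, and read_scheduler() only works on a Linux system that exposes this file for the given device.

```python
def parse_scheduler_line(line):
    """Return (active, available) from the contents of .../queue/scheduler."""
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]        # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available

def read_scheduler(device):
    # Assumes a Linux system; `device` is a name such as "sda".
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())

active, available = parse_scheduler_line("noop deadline [cfq]\n")
```

Switching is done by writing one of the listed names to the same file, which requires root privileges.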

CFQ I/O Scheduler (default)
- The default scheduler
- Per-process sorted queues for sync requests
- Queues for async requests (1 per I/O priority)
- Round robin: picks a queue, lets it send requests for a time slice, picks another queue...
- Performs some anticipation
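The round-robin idea can be sketched as below. This is a deliberately reduced model, not CFQ itself: the time slice is approximated by a fixed number of requests per turn, priorities, async queues and anticipation are left out, and the function name and sample queues are assumptions.

```python
from collections import deque

def round_robin_dispatch(per_process_queues, slice_requests):
    """Serve each per-process queue in turn, up to `slice_requests` per turn."""
    queues = deque((pid, deque(reqs)) for pid, reqs in per_process_queues.items())
    order = []
    while queues:
        pid, pending = queues.popleft()
        for _ in range(min(slice_requests, len(pending))):
            order.append((pid, pending.popleft()))
        if pending:
            queues.append((pid, pending))   # unfinished queue waits for its next turn
    return order

# Sector numbers per process, already sorted within each queue.
order = round_robin_dispatch({"A": [10, 11, 12], "B": [500]}, slice_requests=2)
```

Process B gets its turn after A's first slice, instead of waiting behind all of A's requests.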

I/O request handling at the generic block layer
- Gets the request from the file system <bio, rw>
- Ensures the bio doesn't extend beyond the device
- Gets the request queue of the block device
- Remaps the request if required
- If plugged, attempts a merge to the plug list
- Sees if the request can be safely merged
  - Different data direction?
  - Same device?
- Calls the elevator's allow_merge_fn

I/O request handling at the Elevator layer
- Find merge opportunities in the one-hit cache
- Find a potential merge and attempt the merge
- If the merge is unsuccessful
  - Call the I/O scheduler specific merge_fn

I/O request handling at the I/O scheduler
- The elevator layer returns merge possibilities, or the lack of them
- Check if the request is mergeable
  - The I/O scheduler specific ->merged_fn method gets invoked
- If not mergeable, allocate a new request instance and fill it with data from the bio
- Adds the request to the plug list if the device is plugged
- If not, adds the request to the request list
- Kicks the device queue by invoking the ->request_fn for that device queue

Handling at the device driver (simplified)
- Device driver specific ->request_fn
- Hardware-specific task
- Typical sequence
  - Reads requests sequentially from the request queue
  - Starts data transfer
  - Sends READ/WRITE commands to the disk controller
  - The disk controller raises an interrupt to notify the device driver
  - Waits for I/O completion, invokes the end_io callback
  - Block layer does the cleanup or wakes up the waiting process

Summary
- We explored the different phases an I/O request goes through
- How the request gets handled in different layers
- Learned some concepts that are needed to understand the block subsystem
- But... the Devil is actually in the code :)

Q & A

Thank you.

Different IO schedulers
- Noop
  - Performs merging, but no sorting
  - Truly random access (e.g. flash, ramdisk etc.), intelligent devices
- Deadline
  - Assigns a deadline to each request (500 ms for reads, 5 s for writes)
  - Three queues: sorted queue, read FIFO and write FIFO
  - Good for server workloads
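The three-queue idea behind Deadline can be sketched as follows: requests are normally served in sector order, but a request whose deadline has passed is served first. This is a simplified model with invented names; the real scheduler also batches requests and balances reads against writes, which is not shown here.

```python
import bisect

READ_EXPIRE_MS = 500     # figures from the slide
WRITE_EXPIRE_MS = 5000

class ToyDeadline:
    def __init__(self):
        self.sorted_queue = []                  # requests ordered by sector
        self.fifo = {"read": [], "write": []}   # requests in arrival order

    def add(self, sector, direction, now_ms):
        expire = READ_EXPIRE_MS if direction == "read" else WRITE_EXPIRE_MS
        request = (sector, direction, now_ms + expire)
        bisect.insort(self.sorted_queue, request)
        self.fifo[direction].append(request)

    def next_request(self, now_ms):
        """Serve an expired request first, otherwise the lowest sector."""
        for direction in ("read", "write"):
            queue = self.fifo[direction]
            if queue and queue[0][2] <= now_ms:       # oldest request has expired
                request = queue.pop(0)
                self.sorted_queue.remove(request)
                return request
        if not self.sorted_queue:
            return None
        request = self.sorted_queue.pop(0)
        self.fifo[request[1]].remove(request)
        return request

d = ToyDeadline()
d.add(900, "read", now_ms=0)
d.add(100, "write", now_ms=0)
d.add(50, "read", now_ms=400)
first = d.next_request(now_ms=450)    # nothing expired: lowest sector wins
second = d.next_request(now_ms=600)   # read of sector 900 expired at 500
third = d.next_request(now_ms=600)
fourth = d.next_request(now_ms=600)
```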

How different layers manage disk data (Units)
- I/O scheduler and block device drivers => sectors
- VFS, mapping layer and filesystems => blocks
- Block device drivers => segments
- Page cache/disk cache => pages
- Controllers of hardware block devices transfer data in chunks of sectors
- Sector: smallest addressable unit, defined by the device (power of 2, usually 512 bytes)
- Page: fixed-length block of main memory that is contiguous in both physical and virtual memory addressing; smallest unit of data for memory allocation performed by the OS
- Block: smallest addressable unit, defined by the OS (power of 2, at least sector size, at most page size)
- Buffer: represents a disk block in memory
- Buffer head: struct that describes a buffer
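The size constraints on a block, and the conversion between block and sector numbers, can be checked with a few lines. The 4096-byte block and page sizes are assumed values for the example; the 512-byte sector is the usual size named on the slide.

```python
SECTOR_SIZE = 512   # usual sector size, per the slide
BLOCK_SIZE = 4096   # assumed file system block size
PAGE_SIZE = 4096    # assumed page size

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def is_valid_block_size(block_size, sector_size=SECTOR_SIZE, page_size=PAGE_SIZE):
    """Power of 2, at least the sector size, at most the page size."""
    return is_power_of_two(block_size) and sector_size <= block_size <= page_size

def block_to_first_sector(block_number, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """First sector covered by a block, as the lower layers would address it."""
    return block_number * (block_size // sector_size)

first_sector = block_to_first_sector(10)   # 8 sectors per block
```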


This document could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein. These changes may be incorporated in new editions of this document. SUSE may make improvements in or changes to the software described in this document at any time. Copyright 2011 Novell, Inc. All rights reserved. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States. All third-party trademarks are the property of their respective owners.