IO-Lite: A Unified I/O Buffering and Caching System


IO-Lite: A Unified I/O Buffering and Caching System
Vivek S. Pai, Peter Druschel and Willy Zwaenepoel, Rice University
(Presented by Chuanpeng Li)
2005-4-25 CS458 Presentation 1

IO-Lite motivation
- Network hardware is very fast, but user-perceived speed is not always as fast.
- Perceived speed also depends on the performance of server systems.
- General-purpose operating systems are inadequate here because of inefficient buffering and caching schemes.
- Example: when a web server sends a file, three copies of the data exist:
  - in the file cache
  - in the server application's buffer
  - in the network subsystem's buffer

Outline
- Motivation
- Problems in detail
- Traditional approaches
- IO-Lite design
- IO-Lite implementation
- Performance evaluation

Problems in more detail
- Each major I/O subsystem has its own buffering and caching mechanism:
  - file system cache
  - network subsystem buffers
  - IPC buffers (e.g., pipes)
  - application buffers
  - other I/O subsystems
- Redundant data copying: CPU overhead.
- Multiple buffering: wastes memory and raises the miss rate of the file system cache, since duplicate copies leave less memory for the cache.
- Lack of cross-subsystem optimization, for example:
  - TCP checksum caching [4]
  - support for application-specific cache replacement policies [5]

Traditional approaches
- Memory-mapped files: avoid double buffering between the file system and the application, but data is still buffered in the network subsystem.
- sendfile system call (Linux, FreeBSD; TransmitFile in Windows NT):
  - sends a file through a socket directly
  - does not support dynamic content
- Fbufs, a copy-free cross-domain transfer and buffering facility [3]:
  - mainly for handling network streams
  - does not support a file system cache
  - built for a non-Unix environment

IO-Lite overview
- Unifies I/O buffering and caching.
- Almost all subsystems share a single physical copy of the data.
- Principles:
  - immutable buffers
  - mutable buffer aggregates

Immutable buffers (IO-Lite design)
- Buffers are allocated with their initial content.
- If shared, the content cannot be modified (read-only sharing).
- This eliminates problems of synchronization, protection, consistency, and so on.
- Efficient data transfer across protection-domain boundaries: all subsystems can safely refer to a single physical copy of the data.

Figure: Buffer, buffer aggregate, and slice <pointer, length> (from [1])

Buffer aggregates (IO-Lite design)
- Data in immutable buffers cannot be modified in place.
- Buffer aggregates are built on top of buffers, and all data accesses go through buffer aggregates.
- An aggregate is an ordered list of <pointer, length> pairs (slices).
- Buffer aggregates are mutable. To modify data, a newly allocated buffer holds the modified content (copy on write) and the aggregate is updated to refer to it.
- Reference counting of buffers allows safe reclamation.
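The buffer, slice, and aggregate relationship can be sketched as a small user-level C model. Everything here is hypothetical: the names (`iol_buf`, `iol_slice`, `iol_agg`, `agg_append`, and so on) are illustrative rather than the real IO-Lite library, each slice is stored as (buffer, offset, length) instead of a raw <pointer, length> pair so that the owning buffer's reference count is reachable, and allocation failures are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of IO-Lite data structures (not the real library). */
typedef struct {
    char  *data;      /* immutable once initialized */
    size_t size;
    int    refcount;  /* number of slices referring to this buffer */
} iol_buf;

typedef struct {
    iol_buf *buf;     /* buffer this slice points into */
    size_t   off;     /* <pointer, length> kept as offset + length */
    size_t   len;
} iol_slice;

typedef struct {
    iol_slice *slices;  /* ordered list of slices */
    size_t     n;
} iol_agg;

/* Allocate a buffer together with its initial content (error handling omitted). */
static iol_buf *buf_new(const char *content, size_t size) {
    iol_buf *b = malloc(sizeof *b);
    b->data = malloc(size);
    memcpy(b->data, content, size);
    b->size = size;
    b->refcount = 0;
    return b;
}

static void buf_release(iol_buf *b) {
    assert(b->refcount > 0);
    if (--b->refcount == 0) {   /* last reference gone: safe to reclaim */
        free(b->data);
        free(b);
    }
}

/* Append a slice to an aggregate, taking a reference on its buffer. */
static void agg_append(iol_agg *a, iol_buf *b, size_t off, size_t len) {
    assert(off + len <= b->size);
    a->slices = realloc(a->slices, (a->n + 1) * sizeof *a->slices);
    a->slices[a->n++] = (iol_slice){ b, off, len };
    b->refcount++;
}

static size_t agg_length(const iol_agg *a) {
    size_t total = 0;
    for (size_t i = 0; i < a->n; i++) total += a->slices[i].len;
    return total;
}

/* Copy the aggregate's logical contents into out (for inspection only). */
static size_t agg_flatten(const iol_agg *a, char *out) {
    size_t pos = 0;
    for (size_t i = 0; i < a->n; i++) {
        const iol_slice *s = &a->slices[i];
        memcpy(out + pos, s->buf->data + s->off, s->len);
        pos += s->len;
    }
    return pos;
}

/* Drop every slice; buffers are freed when their count reaches zero. */
static void agg_free(iol_agg *a) {
    for (size_t i = 0; i < a->n; i++) buf_release(a->slices[i].buf);
    free(a->slices);
    a->slices = NULL;
    a->n = 0;
}
```

Because slices only reference buffers, three slices of one buffer can present "hello world" as "world hello" without the buffer itself changing; the buffer is reclaimed only when the last slice referring to it is released.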

Figure: Aggregate modification

IO-Lite and applications (IO-Lite design)
- IO-Lite API:
  - IOL_read(int fd, IOL_Agg **aggr, size_t size): reads data into a buffer aggregate
  - IOL_write(int fd, IOL_Agg *aggr): writes the data in a buffer aggregate to fd
- Applications can be linked with modified I/O libraries (e.g., stdio).
- Legacy applications are not affected.

Buffer allocation (IO-Lite design)
- Immutable buffers are allocated in a reserved region of the virtual address space, the IO-Lite window.
- The window appears in the address space of all protection domains, including the kernel.
- During allocation, the owner has temporary write permission to the buffer so that it can be initialized.
- After initialization, the buffer is immutable.
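The allocate, initialize, then freeze sequence can be approximated in user space with POSIX `mmap` and `mprotect`. This is only an analogy under stated assumptions: `alloc_immutable` is a hypothetical helper, the region is private to one process rather than part of a window shared by all protection domains and the kernel, and `MAP_ANONYMOUS` is a widely available extension rather than strict POSIX.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate page-aligned memory, initialize it while it
   is still writable, then drop write permission so it becomes immutable. */
static void *alloc_immutable(const void *init, size_t len, size_t *mapped_len) {
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = (len + page - 1) / page * page;     /* round up to whole pages */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, init, len);                     /* owner's temporary write access */
    if (mprotect(p, size, PROT_READ) != 0) {  /* freeze: read-only from now on */
        munmap(p, size);
        return NULL;
    }
    *mapped_len = size;
    return p;
}
```

Any later store through the returned pointer faults (typically with SIGSEGV), which is the user-space analogue of the buffer becoming immutable after initialization.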

Access control (IO-Lite design)
- Access control and protection operate at the granularity of processes.
- IO-Lite maintains cached pools of buffers with the same ACL (access control list); each pool has its own ACL.
- Programs determine the ACL of a data object before storing it in memory.
- The ACL of the data determines which pool a new buffer is taken from.
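Pool selection by ACL can be sketched as an exact-match lookup. The ACL representation (a 64-bit mask of protection domains), the fixed-size pool table, and all names are assumptions made for illustration; a real pool would also hold a free list of buffers already mapped in the domains its ACL names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ACL: bit i set means protection domain i may read the data. */
typedef uint64_t acl_t;

#define MAX_POOLS 16

typedef struct {
    acl_t acl;
    int   nbufs;   /* buffers handed out (stand-in for a real free list) */
} buf_pool;

typedef struct {
    buf_pool pools[MAX_POOLS];
    int      npools;
} pool_table;

/* Return the pool whose ACL matches exactly, creating it if needed. */
static buf_pool *pool_for_acl(pool_table *t, acl_t acl) {
    for (int i = 0; i < t->npools; i++)
        if (t->pools[i].acl == acl) return &t->pools[i];
    if (t->npools == MAX_POOLS) return NULL;          /* table full */
    t->pools[t->npools] = (buf_pool){ acl, 0 };
    return &t->pools[t->npools++];
}

/* The ACL is known before the data is stored, so it selects the pool. */
static buf_pool *alloc_buffer(pool_table *t, acl_t acl) {
    buf_pool *p = pool_for_acl(t, acl);
    if (p) p->nbufs++;
    return p;
}

static int may_read(const buf_pool *p, int domain) {
    assert(domain >= 0 && domain < 64);
    return (int)((p->acl >> domain) & 1u);
}
```

Keeping one pool per ACL means a buffer recycled from a pool is already readable by exactly the right set of domains, so reuse needs no new access checks.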

Interprocess communication (IO-Lite design)
- When I/O data is transferred, buffer aggregates are passed by value and buffers are passed by reference.
- IPC is based on page remapping and shared memory (as in fbufs [3]).
- When an immutable buffer is transferred, VM remapping is done in the receiver.
- When a buffer is deallocated, it is added to a buffer pool; its mappings persist, so the buffer can be reused.
- When the buffer is reused, no mapping changes are required.
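The split between passing aggregates by value and buffers by reference, and the persistence of mappings across reuse, can be modeled by counting simulated remaps instead of changing page tables. The names (`ipc_send`, `ensure_mapped`, `remaps`), the single global pool, and the fixed eight-slice aggregate are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: "mapping" a buffer into a domain only sets a bit
   and bumps a counter; no page tables are touched. */
typedef struct buf {
    uint64_t    mapped_in;   /* bit d set: buffer already mapped in domain d */
    int         refcount;
    struct buf *next_free;   /* link in the pool's free list */
} buf;

typedef struct { buf *b; size_t off, len; } slice;
typedef struct { slice s[8]; size_t n; } agg;   /* small fixed-size aggregate */

static int  remaps = 0;        /* simulated VM remapping operations */
static buf *free_list = NULL;  /* pool of deallocated buffers, still mapped */

static void ensure_mapped(buf *b, int domain) {
    assert(domain >= 0 && domain < 64);
    uint64_t bit = UINT64_C(1) << domain;
    if (!(b->mapped_in & bit)) {   /* first time this domain sees the buffer */
        b->mapped_in |= bit;
        remaps++;
    }
}

static buf *buf_alloc(int owner) {
    buf *b = free_list;
    if (b)
        free_list = b->next_free;      /* reuse: existing mappings persist */
    else
        b = calloc(1, sizeof *b);      /* error handling omitted */
    b->refcount = 1;
    b->next_free = NULL;
    ensure_mapped(b, owner);
    return b;
}

static void buf_release(buf *b) {
    assert(b->refcount > 0);
    if (--b->refcount == 0) {          /* return to the pool, keep mappings */
        b->next_free = free_list;
        free_list = b;
    }
}

/* IPC: the aggregate is copied (by value); the buffers are shared (by reference). */
static agg ipc_send(const agg *a, int receiver) {
    agg copy = *a;
    for (size_t i = 0; i < copy.n; i++) {
        copy.s[i].b->refcount++;
        ensure_mapped(copy.s[i].b, receiver);
    }
    return copy;
}
```

In this model, sending a recycled buffer to a receiver that has already seen it causes no additional remap, which is the saving the slide describes.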

IO-Lite and the file system (IO-Lite design)
- Data access is through buffer aggregates.
- The file cache maps <file-id, offset, length> to a buffer aggregate.
- Write operations cause buffer replacement rather than in-place modification.
- Example: IOL_write() after IOL_read()

Figure: Example of write after read in the file cache
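The write-after-read case can be sketched with reference-counted immutable buffers: a read hands out a reference to the cached buffer, and a later write installs a new buffer in the cache instead of modifying the old one. The single-entry cache and the names `cache_read` and `cache_write` are hypothetical; they are not IOL_read and IOL_write, and a real cache entry maps to an aggregate rather than a single buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable, reference-counted buffer. */
typedef struct { char *data; size_t len; int refs; } buf;

/* One file-cache entry keyed by <file-id, offset, length>. */
typedef struct {
    int    file_id;
    size_t offset, length;
    buf   *cached;          /* the cache's current buffer for this range */
} cache_entry;

static buf *buf_new(const char *s, size_t len) {
    buf *b = malloc(sizeof *b);        /* error handling omitted */
    b->data = malloc(len);
    memcpy(b->data, s, len);
    b->len = len;
    b->refs = 1;
    return b;
}

static void buf_put(buf *b) {
    assert(b->refs > 0);
    if (--b->refs == 0) { free(b->data); free(b); }
}

/* Read: hand out a reference to the cached buffer; no data is copied. */
static buf *cache_read(cache_entry *e) {
    e->cached->refs++;
    return e->cached;
}

/* Write: replace the cached buffer with a new one holding the new data. */
static void cache_write(cache_entry *e, const char *s, size_t len) {
    buf *fresh = buf_new(s, len);
    buf_put(e->cached);     /* drop the cache's reference to the old buffer */
    e->cached = fresh;
    e->length = len;
}
```

In this sketch the earlier reader keeps a valid view of the old contents, and the old buffer is reclaimed only when that reader releases it.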

IO-Lite and the network (IO-Lite design)
- Buffer aggregates are used to store and manipulate network packets.
- The network device driver determines the ACL of incoming data before storing it in memory.
- Drivers obtain the ACL from the packet headers using a packet filter (early demultiplexing).
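Early demultiplexing can be sketched as a small packet filter that reads the IPv4 protocol field and the TCP/UDP destination port before any payload is stored, then returns the ACL used to choose a buffer pool. The rule table, the bitmask ACL, and the kernel-only fallback are assumptions for illustration; only the header offsets follow the IPv4, TCP, and UDP formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ACL: bit i set means protection domain i may read. */
typedef uint64_t acl_t;

typedef struct {
    uint8_t  proto;      /* IP protocol number: 6 = TCP, 17 = UDP */
    uint16_t dst_port;
    acl_t    acl;        /* domains allowed to read matching packets */
} filter_rule;

#define ACL_KERNEL_ONLY ((acl_t)1)   /* hypothetical fallback for unmatched packets */

/* Classify a raw IPv4 packet before its payload is placed in a buffer. */
static acl_t classify(const uint8_t *pkt, size_t len,
                      const filter_rule *rules, size_t nrules) {
    assert(rules != NULL || nrules == 0);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return ACL_KERNEL_ONLY;                     /* too short or not IPv4 */
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;      /* IPv4 header length in bytes */
    if (ihl < 20 || len < ihl + 4)
        return ACL_KERNEL_ONLY;                     /* no room for the ports */
    uint8_t  proto = pkt[9];
    uint16_t dport = (uint16_t)((pkt[ihl + 2] << 8) | pkt[ihl + 3]);
    for (size_t i = 0; i < nrules; i++)
        if (rules[i].proto == proto && rules[i].dst_port == dport)
            return rules[i].acl;
    return ACL_KERNEL_ONLY;
}
```

Because the decision uses headers only, the driver can place the payload directly into a buffer from the matching pool instead of copying it later.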

Issues to consider
- Cache replacement and paging
- Impact of immutable buffers
- Cross-subsystem optimization
- Operation in web servers

Cache replacement and paging (1)
- Cached data may be accessed concurrently by multiple applications.
- Data can be shared in complex ways.

Cache replacement and paging (2)
- Simple strategy: cache entries are maintained in a list ordered by last access time, giving approximate LRU replacement.
- Cache eviction is triggered when more than half of the VM pages selected for replacement are in the file cache.
- The file cache is enlarged when there is a miss in the cache.
- IO-Lite buffers reside in pageable virtual memory; an evicted page may need to be written back to more than one backing store.
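The policy on this slide can be sketched as follows. The fixed-size array of entry ids and all names are hypothetical, the list keeps exact LRU order as a stand-in for the approximate LRU ordering described above, and the interaction with the VM pager is reduced to the two page counts passed to `should_evict`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* ids[0] is the least recently used entry, ids[n-1] the most recent. */
typedef struct {
    int    ids[MAX_ENTRIES];
    size_t n;
} lru_list;

/* Move id to the most-recently-used end; a miss inserts it (cache grows). */
static void lru_touch(lru_list *l, int id) {
    size_t i = 0;
    while (i < l->n && l->ids[i] != id) i++;
    if (i == l->n) {                                   /* miss: enlarge the cache */
        assert(l->n < MAX_ENTRIES);
        l->ids[l->n++] = id;
        return;
    }
    for (; i + 1 < l->n; i++) l->ids[i] = l->ids[i + 1];  /* close the gap */
    l->ids[l->n - 1] = id;
}

/* Trigger: more than half of the pages the VM system selected for
   replacement belong to the file cache. */
static int should_evict(size_t file_cache_pages, size_t selected_pages) {
    return 2 * file_cache_pages > selected_pages;
}

/* Evict and return the least recently used entry (-1 if empty). */
static int lru_evict(lru_list *l) {
    if (l->n == 0) return -1;
    int victim = l->ids[0];
    for (size_t i = 1; i < l->n; i++) l->ids[i - 1] = l->ids[i];
    l->n--;
    return victim;
}
```

The trigger keeps the file cache from shrinking while the pager is mostly reclaiming other memory, and lets it grow on misses otherwise.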

Impact of immutable buffers
- Modifying data requires allocating a new buffer; the cost depends on the modification pattern:
  - Every word is modified (full rewrite): the cost is a buffer allocation.
  - A subset of words is modified: the unmodified and modified portions are combined logically; the cost is a buffer allocation plus chaining.
  - Modifications are widely scattered: the mmap interface is used to support modification in place.
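For the case where a subset of words is modified, a write can be sketched as splitting the aggregate's <pointer, length> slices around the written range and splicing in one newly allocated buffer. The names and the fixed slice capacity are hypothetical, and the sketch assumes the write starts at or before the current end of the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLICES 16

typedef struct { const char *ptr; size_t len; } slice;   /* <pointer, length> */
typedef struct { slice s[MAX_SLICES]; size_t n; } agg;

static void push(agg *a, const char *p, size_t len) {
    if (len == 0) return;
    assert(a->n < MAX_SLICES);
    a->s[a->n++] = (slice){ p, len };
}

/* Overwrite bytes [off, off+len) without touching the old buffers: the new
   bytes go into a fresh buffer, chained between the kept prefix and suffix.
   Returns the fresh buffer; the caller frees it when no aggregate uses it. */
static char *agg_overwrite(agg *a, size_t off, const char *data, size_t len) {
    assert(len > 0);
    char *fresh = malloc(len);                 /* allocation cost */
    memcpy(fresh, data, len);
    agg out = { .n = 0 };
    size_t pos = 0;                            /* logical offset of slice i */
    int inserted = 0;
    for (size_t i = 0; i < a->n; i++) {
        const slice *s = &a->s[i];
        size_t start = pos, end = pos + s->len;
        if (end <= off || start >= off + len) {          /* untouched slice */
            if (start >= off + len && !inserted) { push(&out, fresh, len); inserted = 1; }
            push(&out, s->ptr, s->len);
        } else {                                         /* overlaps the write */
            if (start < off) push(&out, s->ptr, off - start);          /* prefix */
            if (!inserted) { push(&out, fresh, len); inserted = 1; }
            if (end > off + len)                                       /* suffix */
                push(&out, s->ptr + (off + len - start), end - (off + len));
        }
        pos = end;
    }
    if (!inserted) push(&out, fresh, len);     /* write at the end */
    *a = out;
    return fresh;
}

static size_t agg_flatten(const agg *a, char *out) {
    size_t pos = 0;
    for (size_t i = 0; i < a->n; i++) {
        memcpy(out + pos, a->s[i].ptr, a->s[i].len);
        pos += a->s[i].len;
    }
    return pos;
}
```

The original buffer is never written: after overwriting three bytes of "hello world", the aggregate holds a prefix slice of the old buffer, the new buffer, and a suffix slice of the old buffer.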

Cross-subsystem optimization
- Optimizations rely on the ability to uniquely identify a particular I/O data object throughout the system.
- TCP checksum caching:
  - The Internet checksum is cached for each slice, avoiding repeated checksum computation.
  - The generation number of a buffer is incremented every time the buffer is reallocated.
  - The generation number together with the buffer address uniquely identifies the content of a buffer.
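Checksum caching can be sketched by keying cached Internet checksum partial sums (the 16-bit one's complement sum of RFC 1071) on buffer address, generation number, offset, and length. The small round-robin cache and all names are hypothetical. Partial sums of slices are combined with the standard rule that a slice beginning at an odd byte offset contributes its byte-swapped sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A slice of an immutable buffer; gen is that buffer's generation number. */
typedef struct { const uint8_t *base; uint32_t gen; size_t off, len; } slice;

typedef struct {
    const uint8_t *base; uint32_t gen; size_t off, len;  /* cache key */
    uint16_t sum;                                        /* cached partial sum */
    int valid;
} csum_entry;

#define CACHE_SIZE 64
static csum_entry cache[CACHE_SIZE];
static size_t next_slot = 0;
static int computed = 0;   /* slice sums actually computed (cache misses) */

/* Folded 16-bit one's complement sum, as if the bytes start at an even offset. */
static uint16_t ones_sum(const uint8_t *p, size_t len) {
    uint64_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint64_t)((p[i] << 8) | p[i + 1]);
    if (len & 1)
        sum += (uint64_t)p[len - 1] << 8;      /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

static uint16_t add16(uint16_t a, uint16_t b) {   /* one's complement addition */
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

static uint16_t slice_sum(const slice *s) {
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        const csum_entry *e = &cache[i];
        if (e->valid && e->base == s->base && e->gen == s->gen &&
            e->off == s->off && e->len == s->len)
            return e->sum;                                /* cache hit */
    }
    computed++;
    uint16_t sum = ones_sum(s->base + s->off, s->len);
    cache[next_slot] = (csum_entry){ s->base, s->gen, s->off, s->len, sum, 1 };
    next_slot = (next_slot + 1) % CACHE_SIZE;            /* round-robin replacement */
    return sum;
}

/* Internet checksum of the concatenation of n slices. */
static uint16_t agg_checksum(const slice *sl, size_t n) {
    assert(sl != NULL || n == 0);
    uint16_t total = 0;
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t s = slice_sum(&sl[i]);
        if (pos & 1)                         /* slice starts at an odd offset */
            s = (uint16_t)((s << 8) | (s >> 8));
        total = add16(total, s);
        pos += sl[i].len;
    }
    return (uint16_t)~total;
}
```

Retransmitting the same aggregate hits the cache for every slice, while bumping a buffer's generation number (as on reallocation) forces only that buffer's slices to be recomputed.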

Operation in web servers (1): static content
- Traditionally, a document may be stored both in the file cache and in the TCP transmission buffers.
- With IO-Lite, all data copying and multiple buffering is eliminated: buffer aggregates are passed between the file cache, the server application, and the network subsystem.
- TCP checksums can be reused.

Operation in web servers (2): dynamic content
- A CGI process transfers data to the web server process by IPC.
- Traditionally, this causes multiple buffering in the CGI process, the server process, and the TCP transmission buffer.
- With IO-Lite, only buffer aggregates are passed, so only one copy of the data is used.
- TCP checksums are reused for portions of dynamic content that are transmitted repeatedly.


Implementation (1)
- A loadable kernel module for FreeBSD 2.2.6, plus a library that provides buffer aggregate manipulation routines and stubs for the IO-Lite system calls.
- Network subsystem: IO-Lite buffers are encapsulated inside the BSD network buffer abstraction, mbufs.
  - The external interface of mbufs is unchanged.
  - The TCP/IP stack source code is unchanged.

Implementation (2)
- File system: the IO-Lite file cache module replaces the original BSD buffer cache.
- VM system: IO-Lite buffers are allocated in the IO-Lite window, a region of virtual address space present in all processes and the kernel.
  - There are no significant changes to how pages are paged in and out.
  - The replacement policy for IO-Lite buffers is implemented by the page-out handler.
- IPC: the BSD pipe implementation is modified to
  - transfer buffer aggregates instead of data
  - ensure that the IO-Lite buffers are readable in the receiving domain.

Experimental environment
- Pentium II 333 MHz, 128 MB of memory, 5 network adapters, 100 Mbps Ethernet
- Flash: a high-performance web server [2]
  - one of the fastest servers available
  - event-driven model
- Flash-Lite is an IO-Lite version of Flash.
- Apache (process-based) is used for comparison.

Web servers with static files and CGI
Figures (from [1]): static vs. CGI content; nonpersistent vs. persistent connections

Trace-based evaluation
Figure: Rice trace (from [1])

Figure: WAN effects (from [1])

Figure: Other applications (from [1])

Conclusion
- IO-Lite provides an efficient, unified framework for I/O buffering and caching.
- Experiments show that IO-Lite can improve the performance of web servers and other I/O-intensive applications by 40%-80%.

References
1. V. S. Pai, P. Druschel, and W. Zwaenepoel. IO-Lite: A Unified I/O Buffering and Caching System. ACM TOCS, 2000.
2. V. S. Pai, P. Druschel, and W. Zwaenepoel. Flash: An Efficient and Portable Web Server. USENIX Annual Technical Conference, 1999.
3. P. Druschel and L. Peterson. Fbufs: A High-Bandwidth Cross-Domain Transfer Facility. SOSP, 1993.
4. M. Kaashoek et al. Application Performance and Flexibility on Exokernel Systems. SOSP, 1997.
5. P. Cao, E. Felten, and K. Li. Implementation and Performance of Application-Controlled File Caching. OSDI, 1994.