MegaPipe: A New Programming Interface for Scalable Network I/O
1 MegaPipe: A New Programming Interface for Scalable Network I/O. Sangjin Han, in collaboration with Scott Marshall, Byung-Gon Chun, and Sylvia Ratnasamy. University of California, Berkeley / Yahoo! Research
2 tl;dr? MegaPipe is a new network programming API for message-oriented workloads that avoids the performance issues of the BSD Socket API. (OSDI 2012)
3-4 Two Types of Network Workloads. 1. Bulk-transfer workload: one-way, large data transfer (e.g., video streaming, HDFS). Very cheap; half a CPU core is enough to saturate a 10G link. 2. Message-oriented workload: short connections or small messages (e.g., HTTP, RPC, DB, key-value stores). CPU-intensive!
5-7 BSD Socket API Performance Issues
n_events = epoll_wait(...);        // wait for I/O readiness
for (...) {
    new_fd = accept(listen_fd);    // new connection
    bytes = recv(fd2, buf, 4096);  // new data for fd2
}
Issues with message-oriented workloads: system call overhead, shared listening socket, file abstraction overhead.
8 Microbenchmark: How Bad? An RPC-like test on an 8-core Linux server (with epoll): 768 clients each open a new TCP connection, exchange a 64 B request and a 64 B response for 10 transactions, then tear the connection down. Variables: 1. message size, 2. connection length, 3. number of cores.
9 1. Small Messages Are Bad [chart: throughput (Gbps) and CPU usage (%) vs. message size (B, up to 16K): low throughput and high overhead for small messages]
10 2. Short Connections Are Bad [chart: throughput (1M transactions/s) vs. number of transactions per connection: short connections yield several-fold lower throughput]
11 3. Multi-Core Will Not Help (Much) [chart: throughput (1M transactions/s) vs. # of CPU cores: actual scaling falls far short of ideal scaling]
12 MEGAPIPE DESIGN
13 Design Goals. Concurrency as a first-class citizen. Unified interface for various I/O types: network connections, disk files, pipes, signals, etc. Low overhead & multi-core scalability (the main focus of this presentation).
14 Overview
Problem: low per-core performance. Cause: system call overhead. Solution: system call batching.
Problem: poor multi-core scalability. Causes: shared listening socket, VFS overhead. Solutions: listening socket partitioning, lightweight socket.
15 Key Primitives. Handle: similar to a file descriptor (a TCP connection, pipe, disk file, ...), but only valid within a channel. Channel: a per-core, bidirectional pipe between user and kernel that multiplexes the I/O operations of its handles.
16 Sketch: How Channels Help (1/3) [diagram: user-level handles multiplexed over a per-core channel into the kernel, enabling I/O batching]
17-18 Sketch: How Channels Help (2/3) [diagrams: cores 1-3 calling accept() on a single shared accept queue for new connections, vs. listening socket partitioning with a per-core queue]
19-20 Sketch: How Channels Help (3/3) [diagrams: cores 1-3 reaching sockets through the VFS, vs. bypassing it with lightweight sockets]
21 MegaPipe API Functions. mp_create() / mp_destroy(): create/close a channel. mp_register() / mp_unregister(): register a handle (regular FD or lwsocket) into a channel. mp_accept() / mp_read() / mp_write() / ...: issue an asynchronous I/O command on a given handle. mp_dispatch(): dispatch an I/O completion event from a channel.
22 Completion Notification Model. BSD Socket API: wait-and-go (readiness model) — epoll_ctl(fd1, EPOLLIN); epoll_ctl(fd2, EPOLLIN); epoll_wait(...); ret1 = recv(fd1, ...); ret2 = recv(fd2, ...). MegaPipe: go-and-wait (completion notification) — mp_read(handle1, ...); mp_read(handle2, ...); ev = mp_dispatch(channel); ev = mp_dispatch(channel). Advantages of the latter: batching, easy and intuitive, compatible with disk files.
23 1. I/O Batching: transparent batching that exploits the parallelism of independent handles. [diagram: the application issues non-batched MegaPipe API calls (read from handle 6, accept a new connection, read from handle 3, write to handle 5); the MegaPipe user-level library turns them into batched system calls to the MegaPipe kernel module, and batched completion events flow back (new connection arrived, write done to handle 5, read done from handle 6)]
24-26 2. Listening Socket Partitioning: a per-core accept queue for each channel, instead of the globally shared accept queue. [diagrams: mp_register() gives the listening socket a per-core accept queue in the kernel; mp_accept() then takes new connections from the caller's own queue]
27-30 3. lwsocket: Lightweight Socket. A common-case optimization for sockets: sockets are ephemeral and rarely shared, so lwsockets bypass the VFS layer and are converted into a regular file descriptor only when necessary. [diagrams: a regular socket's file descriptor reaches the TCP socket through a file instance (states), a dentry (does a socket even need a file name?), and an inode; dup() or fork() makes two descriptors share one file instance, while open() creates a separate file instance; an lwsocket points directly at the TCP socket]
31 EVALUATION
32-34 Microbenchmark 1/2: throughput improvement with various message sizes. [chart: throughput improvement (%) vs. # of CPU cores, for 64 B, 256 B, 512 B, 1 KiB, and 4 KiB messages]
35 Microbenchmark 2/2: multi-core scalability with various connection lengths (# of transactions per connection). [charts: parallel speedup vs. # of CPU cores, baseline vs. MegaPipe]
36 Macrobenchmark. memcached: an in-memory key-value store with limited scalability, since the object store is shared by all cores under a global lock. nginx: a highly scalable web server, since nothing is shared by the cores except the listening socket.
37 memcached: memaslap with 90% GET, 10% SET, 64 B keys, 1 KB values. [chart: throughput (1k requests/s) vs. number of requests per connection, MegaPipe vs. baseline; gains of 3.9x, 2.4x, and 1.3x shrink as connections get longer, and the global lock becomes the bottleneck]
38 memcached: memaslap with 90% GET, 10% SET, 64 B keys, 1 KB values. [chart: throughput (1k requests/s) vs. number of requests per connection, adding fine-grained-lock variants (MegaPipe-FL, Baseline-FL) alongside MegaPipe and baseline]
39 nginx: based on Yahoo! HTTP traces; 6.3 KiB objects and 2.3 transactions/connection on average. [chart: throughput (Gbps) for MegaPipe and baseline, and improvement (%), vs. # of CPU cores]
40 CONCLUSION
41 Related Work. Batching: exception-less system calls [FlexSC, OSDI '10] [libflexsc, ATC '11]; MegaPipe also solves the scalability issues. Partitioning: per-core accept queues [Affinity-Accept, EuroSys '12]; MegaPipe provides explicit control over partitioning. VFS scalability [Mosbench, OSDI '10]: MegaPipe bypasses the issues rather than mitigating them.
42 Conclusion. Short connections or small messages cause high CPU overhead and scale poorly with multi-core CPUs. MegaPipe's key abstraction is the per-core channel, which enables three optimization opportunities: batching, partitioning, and lwsocket. Results: 15+% improvement for memcached, 75% for nginx.
43 BACKUP SLIDES
44 Small Messages with MegaPipe [chart: throughput (1M trans/s) vs. # of transactions per connection, baseline vs. MegaPipe]
45 1. Small Messages Are Bad → Why? The number of messages matters, not the volume of traffic: per-message cost far exceeds per-byte cost (a 1 KB message is only 2% more expensive than a 64 B message). A 10G link with 1 KB messages means 1M IOPS, and thus 1M+ system calls. System calls are expensive [FlexSC, 2010]: mode switching between kernel and user, and CPU cache pollution.
46 Short Connections with MegaPipe [chart: throughput (Gbps) and CPU usage (%) vs. message size (B, up to 16K), baseline vs. MegaPipe]
47 2. Short Connections Are Bad → Why? Connection establishment is expensive: the three-way handshake and four-way teardown mean more packets, and more system calls (accept(), epoll_ctl(), close(), ...). A socket is also represented as a file in UNIX, adding file and VFS overhead.
48 Multi-Core Scalability with MegaPipe [chart: throughput (1M trans/s) and per-core efficiency (%) vs. # of CPU cores, baseline vs. MegaPipe]
49 3. Multi-Core Will Not Help → Why? Shared accept queue issues [Affinity-Accept, 2012]: contention on the listening socket, and poor connection affinity. [diagram: user threads call accept() on one listening socket while kernel RX/TX processing lands on different cores]
50 3. Multi-Core Will Not Help → Why? File/VFS multi-core scalability issues: the file descriptor table is shared by an application process's threads, and VFS objects (file instance, dentry, inode) are globally visible in the system. [diagram: application process → file descriptor table → VFS → TCP/IP socket]
51 Overview [architecture diagram: a user application thread issues MegaPipe API calls on handles through the MegaPipe user-level library; per-core channels (cores 1..N) carry batched async I/O commands down and batched completion events up; in the Linux kernel, each channel instance holds lwsocket handles, file handles, and pending completion events, sitting above the VFS and TCP/IP layers]
52 Ping-Pong Server Example
ch = mp_create()
handle = mp_register(ch, listen_sd, mask=my_cpu_id)
mp_accept(handle)
while true:
    ev = mp_dispatch(ch)
    conn = ev.cookie
    if ev.cmd == ACCEPT:
        mp_accept(conn.handle)
        conn = new Connection()
        conn.handle = mp_register(ch, ev.fd, cookie=conn)
        mp_read(conn.handle, conn.buf, READSIZE)
    elif ev.cmd == READ:
        mp_write(conn.handle, conn.buf, ev.size)
    elif ev.cmd == WRITE:
        mp_read(conn.handle, conn.buf, READSIZE)
    elif ev.cmd == DISCONNECT:
        mp_unregister(ch, conn.handle)
        delete conn
53 Contribution Breakdown [chart]
54 memcached latency [chart: latency (µs) vs. # of concurrent client connections, Baseline-FL and MegaPipe-FL at the 50th and 99th percentiles]
55 Clean-Slate vs. Dirty-Slate. MegaPipe is a clean-slate approach with new APIs: it allowed quick prototyping of the various optimizations, and the performance improvement was worthwhile. Can we apply the same techniques back to the BSD Socket API? Each technique has its own challenges, and embracing all of them could be even harder. Future Work™
More informationAdvanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)
: LFS and Soft Updates Ken Birman (based on slides by Ben Atkin) Overview of talk Unix Fast File System Log-Structured System Soft Updates Conclusions 2 The Unix Fast File System Berkeley Unix (4.2BSD)
More informationMixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp
MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage
More informationMemory-Mapped Files. generic interface: vaddr mmap(file descriptor,fileoffset,length) munmap(vaddr,length)
File Systems 38 Memory-Mapped Files generic interface: vaddr mmap(file descriptor,fileoffset,length) munmap(vaddr,length) mmap call returns the virtual address to which the file is mapped munmap call unmaps
More informationFaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs
FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU) RDMA RDMA is a network feature that
More informationSDC 2015 Santa Clara
SDC 2015 Santa Clara Volker Lendecke Samba Team / SerNet 2015-09-21 (2 / 17) SerNet Founded 1996 Offices in Göttingen and Berlin Topics: information security and data protection Specialized on Open Source
More informationBasic OS Progamming Abstrac7ons
Basic OS Progamming Abstrac7ons Don Porter Recap We ve introduced the idea of a process as a container for a running program And we ve discussed the hardware- level mechanisms to transi7on between the
More informationInternet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016
Internet Technology 06. Exam 1 Review Paul Krzyzanowski Rutgers University Spring 2016 March 2, 2016 2016 Paul Krzyzanowski 1 Question 1 Defend or contradict this statement: for maximum efficiency, at
More informationTailwind: Fast and Atomic RDMA-based Replication. Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes
Tailwind: Fast and Atomic RDMA-based Replication Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes In-Memory Key-Value Stores General purpose in-memory key-value stores are widely used nowadays
More informationAn Implementation of the Homa Transport Protocol in RAMCloud. Yilong Li, Behnam Montazeri, John Ousterhout
An Implementation of the Homa Transport Protocol in RAMCloud Yilong Li, Behnam Montazeri, John Ousterhout Introduction Homa: receiver-driven low-latency transport protocol using network priorities HomaTransport
More informationAdvanced Computer Networks. End Host Optimization
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationBackground: I/O Concurrency
Background: I/O Concurrency Brad Karp UCL Computer Science CS GZ03 / M030 5 th October 2011 Outline Worse Is Better and Distributed Systems Problem: Naïve single-process server leaves system resources
More informationBasic OS Progamming Abstrac2ons
Basic OS Progamming Abstrac2ons Don Porter Recap We ve introduced the idea of a process as a container for a running program And we ve discussed the hardware- level mechanisms to transi2on between the
More informationCS307: Operating Systems
CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn
More informationInternet Technology 3/2/2016
Question 1 Defend or contradict this statement: for maximum efficiency, at the expense of reliability, an application should bypass TCP or UDP and use IP directly for communication. Internet Technology
More informationCaching and reliability
Caching and reliability Block cache Vs. Latency ~10 ns 1~ ms Access unit Byte (word) Sector Capacity Gigabytes Terabytes Price Expensive Cheap Caching disk contents in RAM Hit ratio h : probability of
More informationMultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores
MultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores Junbin Kang, Benlong Zhang, Tianyu Wo, Chunming Hu, and Jinpeng Huai Beihang University 夏飞 20140904 1 Outline Background
More informationFor use by students enrolled in #71251 CSE430 Fall 2012 at Arizona State University. Do not use if not enrolled.
Operating Systems: Internals and Design Principles Chapter 4 Threads Seventh Edition By William Stallings Operating Systems: Internals and Design Principles The basic idea is that the several components
More informationKeeping up with the hardware
Keeping up with the hardware Challenges in scaling I/O performance Jonathan Davies XenServer System Performance Lead XenServer Engineering, Citrix Cambridge, UK 18 Aug 2015 Jonathan Davies (Citrix) Keeping
More informationEbbRT: A Framework for Building Per-Application Library Operating Systems
EbbRT: A Framework for Building Per-Application Library Operating Systems Overview Motivation Objectives System design Implementation Evaluation Conclusion Motivation Emphasis on CPU performance and software
More information