Zero-Copy Socket Splicing

Alexander Bluhm, bluhm@openbsd.org
Sunday, 29 September 2013

Agenda
1 Motivation
2 Kernel MBuf
3 Packet Processing
4 Socket Splicing
5 Interface
6 Implementation
7 Applications

1 Motivation

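Application Level Gateway
[Figure: protocol stack spanning user land and kernel. Labels: Application, TCP/UDP, Network, IP, Data Link, Physical. Forwarding mechanisms shown: Relay, Socket Splicing, Packet Filter.]

Persistent HTTP Filtering
[Figure: a persistent HTTP connection carrying Header and Body pairs, each Body spanning its content length. Labels on the data path: copy, filter.]

HTTP Socket Splicing
[Figure: the same Header and Body pairs split between user land and kernel. Each Header is marked filter; each Body is marked splice and spans the splice length.]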

2 Kernel MBuf

MBuf Data
[Figure: one mbuf, size 256. m_hdr contains m_data and m_len = 42. The data area m_dat, size 236, holds ether header, ip header, and udp header, size 42 in total.]

MBuf Data Chaining
[Figure: two chained mbufs, size 256 each.
 First mbuf: m_hdr with m_next, m_data, m_len = 42; m_pkthdr with len = 142; data area m_pktdat, size 196, holding ether header, ip header, and udp header, size 42.
 Second mbuf: m_hdr with m_next = NULL, m_data, m_len = 100; data area m_dat, size 236, holding payload, size 100.]

MBuf Packet Chaining
[Figure: five mbufs, each with m_hdr containing m_next and m_nextpkt; two of them also carry m_pkthdr. m_next links the mbufs of one packet, m_nextpkt links packets.]

MBuf Cluster
[Figure: one mbuf, size 256, with m_hdr (m_data, m_len = 1400), m_pkthdr, and m_ext (ext_buf, ext_size = 2048). The data lives in a cluster, size 2048, holding ether header, ip header, udp header, and payload, size 1400.]

MBuf Cluster Copy
[Figure: two mbufs, each with m_data and ext_buf, referring to a single cluster that holds ether header, ip header, udp header, and payload.]
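The numbers on the chaining slide can be checked with a small user-land model. This is only a sketch: `toy_mbuf` and its helpers are hypothetical names and keep just the three header fields named on the slide, not the real kernel `struct mbuf`. It shows that the packet header length (142) is the sum of `m_len` over the chain (42 + 100).

```c
#include <stddef.h>

/* Toy stand-in for a kernel mbuf (hypothetical, not the real struct mbuf). */
struct toy_mbuf {
	struct toy_mbuf *m_next;  /* next mbuf of the same packet, or NULL */
	const char      *m_data;  /* start of the valid data */
	size_t           m_len;   /* bytes of valid data in this mbuf */
};

/* Walk the m_next chain and add up m_len of every mbuf. */
size_t
toy_chain_length(const struct toy_mbuf *m)
{
	size_t total = 0;

	for (; m != NULL; m = m->m_next)
		total += m->m_len;
	return total;
}

/* Build the two-mbuf chain from the slide and return its total length. */
size_t
toy_slide_example(void)
{
	static const char headers[42];   /* ether + ip + udp header */
	static const char payload[100];  /* payload in the second mbuf */
	struct toy_mbuf second = { NULL, payload, sizeof(payload) };
	struct toy_mbuf first = { &second, headers, sizeof(headers) };

	return toy_chain_length(&first);
}
```

Calling `toy_slide_example()` yields the value the slide shows in m_pkthdr for the whole packet.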

3 Packet Processing

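Packet Input
[Figure: input path from driver to process, bottom to top.
 Kernel: network driver interrupt handler -> ether_input() -> ip interface receive queue (m_nextpkt) -> ip_input() -> inetsw[] internet protocol switch -> tcp_input() -> socket receive buffer (m_next) -> soreceive()
 User land: read()]

Packet Output
[Figure: output path from process to driver, top to bottom.
 User land: write()
 Kernel: sosend() -> socket send buffer (m_next) -> tcp_output() -> ip_output() -> ether_output() -> interface send queue (m_nextpkt) -> if_start() -> network driver start routine]

Data Copy
[Figure: a user-land Relay between two sockets.
 Receive side: tcp_input() -> so_rcv -> soreceive() -> uiomove() -> copyout() -> read()
 Send side: write() -> copyin() -> uiomove() -> sosend() -> so_snd -> tcp_output()]

Process Wakeup
[Figure: read(), select(), and write() act on a file descriptor backed by struct socket. soreceive() takes data from so_rcv and sosend() puts data into so_snd. tcp_input() triggers sorwakeup() for received data and sowwakeup() for an incoming ACK; tcp_output() sends from so_snd.]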
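For comparison, the copying relay that the Data Copy slide describes can be sketched as a plain read/write loop. The function name `relay_copy` and the 4096-byte buffer are assumptions for illustration. Every pass through the loop copies the data out of the kernel with read(2) and back in with write(2), which are the two copies that splicing is meant to avoid.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Relay bytes from src_fd to dst_fd in user land until EOF.
 * Returns the number of bytes relayed, or -1 on error.
 */
ssize_t
relay_copy(int src_fd, int dst_fd)
{
	char buf[4096];
	ssize_t total = 0;

	for (;;) {
		/* copy from the kernel receive buffer into user land */
		ssize_t n = read(src_fd, buf, sizeof(buf));

		if (n == 0)
			return total;           /* EOF on the source */
		if (n == -1) {
			if (errno == EINTR)
				continue;       /* interrupted, retry read */
			return -1;
		}
		/* copy from user land back into the kernel send buffer */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));

			if (w == -1) {
				if (errno == EINTR)
					continue;   /* interrupted, retry write */
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

The loop works on any pair of descriptors, so it can be tried with pipes as well as with connected sockets.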

4 Socket Splicing

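Socket Splicing
[Figure: setsockopt(SO_SPLICE) calls sosplice(). On the source, tcp_input() fills so_rcv and calls sorwakeup(); on the drain, an ACK arriving in tcp_input() calls sowwakeup(). somove() moves the data from so_rcv to so_snd, and tcp_output() sends it.]

UDP Sockets
[Figure: UDP splicing path. Labels: udp_input(), so_rcv, soreceive(), somove(), sosend(), udp_output().]

Layer
[Figure: input path ipintrq -> ip_input() -> tcp_input() -> so_rcv -> soreceive() -> read() beside output path write() -> sosend() -> so_snd -> tcp_output() -> ip_output() -> if_snd. Data can cross from input to output at three layers:
 Relaying: read() to write()
 Socket Splicing: so_rcv to so_snd
 Forwarding: ip_input() to ip_output()]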

5 Interface

Simple API
- Begin splicing from source to drain: setsockopt(source_fd, SO_SPLICE, drain_fd)
- Stop splicing: setsockopt(source_fd, SO_SPLICE, -1)
- Get spliced data length: getsockopt(source_fd, SO_SPLICE, &length)

Extended API

    struct splice {
            int             sp_fd;    /* drain */
            off_t           sp_max;   /* maximum */
            struct timeval  sp_idle;  /* timeout */
    };

    setsockopt(source_fd, SO_SPLICE, &splice)

Properties
- Splicing is unidirectional
- Invoke it twice for bidirectional splicing
- Process can turn it on and off
- Works for TCP and UDP
- Can mix IPv4 and IPv6 sockets

Unsplice
- Dissolve socket splicing manually
- read(2) or select(2) from the source
- EOF: source socket shutdown
- EPIPE: drain socket error
- EFBIG: maximum data length
- ETIMEDOUT: idle timeout
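The calls on the Simple API slide are abbreviated. Written out with the usual setsockopt(2) calling convention they might look like the sketch below. The wrapper names `splice_set` and `splice_length` are hypothetical, the SOL_SOCKET level and the pointer and length arguments are filled in from that general convention, and the `#ifdef` guard is an assumption so that the code also compiles on systems without SO_SPLICE.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/* Start splicing source_fd into drain_fd; pass drain_fd = -1 to stop. */
int
splice_set(int source_fd, int drain_fd)
{
#ifdef SO_SPLICE
	return setsockopt(source_fd, SOL_SOCKET, SO_SPLICE,
	    &drain_fd, sizeof(drain_fd));
#else
	(void)source_fd;
	(void)drain_fd;
	errno = ENOPROTOOPT;    /* assumption: SO_SPLICE not available here */
	return -1;
#endif
}

/* Fetch the number of bytes spliced so far on source_fd. */
int
splice_length(int source_fd, off_t *length)
{
#ifdef SO_SPLICE
	socklen_t size = sizeof(*length);

	return getsockopt(source_fd, SOL_SOCKET, SO_SPLICE, length, &size);
#else
	(void)source_fd;
	(void)length;
	errno = ENOPROTOOPT;    /* assumption: SO_SPLICE not available here */
	return -1;
#endif
}
```

A relay would call `splice_set(source_fd, drain_fd)` once both sockets are connected, and later `splice_length()` to account for the bytes the kernel moved.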
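A possible use of the extended form is to splice exactly one HTTP body and give up when the peer goes quiet. The sketch below assumes a hypothetical wrapper `splice_limited`; the SOL_SOCKET level and the `sizeof` argument follow the general setsockopt(2) convention rather than the slide, and the `#ifdef` guard is an assumption for systems without SO_SPLICE. The exact meaning of zero values for sp_max and sp_idle should be checked against setsockopt(2).

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Splice at most max_bytes from source_fd to drain_fd and unsplice
 * after idle_sec seconds without data.
 */
int
splice_limited(int source_fd, int drain_fd, off_t max_bytes, time_t idle_sec)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain_fd;            /* drain */
	sp.sp_max = max_bytes;          /* maximum */
	sp.sp_idle.tv_sec = idle_sec;   /* timeout */
	return setsockopt(source_fd, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	(void)source_fd;
	(void)drain_fd;
	(void)max_bytes;
	(void)idle_sec;
	errno = ENOPROTOOPT;    /* assumption: SO_SPLICE not available here */
	return -1;
#endif
}
```

For the persistent HTTP case from the motivation, `max_bytes` would be the content length parsed from the header.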
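Because splicing is unidirectional, a relay that forwards both ways sets the option once per direction. A minimal sketch, assuming a hypothetical helper `splice_both` and the same SOL_SOCKET convention and `#ifdef` guard as a portability assumption:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/* Splice a_fd -> b_fd and b_fd -> a_fd; returns 0 or -1 with errno set. */
int
splice_both(int a_fd, int b_fd)
{
#ifdef SO_SPLICE
	if (setsockopt(a_fd, SOL_SOCKET, SO_SPLICE, &b_fd, sizeof(b_fd)) == -1)
		return -1;
	if (setsockopt(b_fd, SOL_SOCKET, SO_SPLICE, &a_fd, sizeof(a_fd)) == -1) {
		int saved = errno;
		int off = -1;

		/* second direction failed: turn the first one off again */
		(void)setsockopt(a_fd, SOL_SOCKET, SO_SPLICE, &off, sizeof(off));
		errno = saved;
		return -1;
	}
	return 0;
#else
	(void)a_fd;
	(void)b_fd;
	errno = ENOPROTOOPT;    /* assumption: SO_SPLICE not available here */
	return -1;
#endif
}
```

Undoing the first direction on failure is a design choice of this sketch, not something the slides prescribe.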
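The reasons on the Unsplice slide map naturally onto a small lookup. The function below is a hypothetical helper; it assumes the process has already obtained an errno-style value for the source socket (for example from a failed read(2)), and it uses 0 to stand for the EOF case, which is a convention of this sketch only.

```c
#include <errno.h>

/* Translate an unsplice error value into the reason listed on the slide. */
const char *
unsplice_reason(int error)
{
	switch (error) {
	case 0:
		return "source socket shutdown (EOF)";
	case EPIPE:
		return "drain socket error";
	case EFBIG:
		return "maximum data length reached";
	case ETIMEDOUT:
		return "idle timeout";
	default:
		return "other error";
	}
}
```

A relay could log `unsplice_reason(error)` before deciding whether to parse the next HTTP header, close the connection, or splice again.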

6 Implementation

Struct Socket

    struct socket {
            ...
            struct socket   *so_splice;
            struct socket   *so_spliceback;
            off_t            so_splicelen;
            off_t            so_splicemax;
            struct timeval   so_idletv;
            struct timeout   so_idleto;
            ...
    };

sosplice(9)
- Protocol must match
- Sockets must be connected
- Double link sockets
- Move existing data

somove(9)
- Check for errors
- Check for space
- Handle maximum
- Handle out of band data
- Move socket buffer data

sounsplice()
- Manual unsplice
- Cannot receive
- Cannot send
- Maximum
- Timeout
- Socket closed

sorwakeup() sowwakeup()
- Called from tcp_input()
- Source calls sorwakeup()
- Drain calls sowwakeup()
- Both invoke somove(9)
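The order of the somove(9) steps can be illustrated with a toy model that tracks only byte counts. Everything here is hypothetical: `toy_sockbuf`, `toy_splice`, and `toy_somove` are invented for this sketch, no mbufs are moved, and out-of-band data is skipped. It is not the kernel code, only the bookkeeping that the list implies.

```c
#include <errno.h>
#include <stddef.h>

/* Toy socket buffer: byte counts only (hypothetical, no real mbufs). */
struct toy_sockbuf {
	size_t cc;      /* bytes currently queued */
	size_t hiwat;   /* capacity of the buffer */
};

/* Toy splice state linking a source buffer to a drain buffer. */
struct toy_splice {
	struct toy_sockbuf *src;  /* receive buffer of the source */
	struct toy_sockbuf *dst;  /* send buffer of the drain */
	int    drain_error;       /* pending error on the drain, 0 if none */
	size_t len;               /* bytes spliced so far */
	size_t max;               /* maximum to splice, 0 = no limit */
};

/* Returns 0 to keep splicing, or an errno value that ends the splice. */
int
toy_somove(struct toy_splice *sp)
{
	size_t n, space;

	if (sp->drain_error != 0)               /* check for errors */
		return EPIPE;
	space = sp->dst->hiwat - sp->dst->cc;   /* check for space */
	n = sp->src->cc < space ? sp->src->cc : space;
	if (sp->max != 0 && n > sp->max - sp->len)
		n = sp->max - sp->len;          /* handle maximum */
	/* out-of-band data is not modelled in this sketch */
	sp->src->cc -= n;                       /* move socket buffer data */
	sp->dst->cc += n;
	sp->len += n;
	if (sp->max != 0 && sp->len == sp->max)
		return EFBIG;                   /* maximum reached */
	return 0;
}
```

With 100 bytes queued, a drain capacity of 64, and a maximum of 80, the first call moves 64 bytes and the second, once the drain has emptied, moves the remaining 16 and reports EFBIG.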

7 Applications

Relayd
- Plain TCP connections
- HTTP connections
- Filter persistent HTTP
- HTTP Chunking

Tests
- /usr/src/regress/sys/kern/sosplice/
- 15 API tests
- 18 UDP tests
- 76 TCP tests
- perf/relay.c simple example
- BSD::Socket::Splice Perl API
- 28 relayd tests

Performance
- Factor 1 or 2 for TCP
- Factor 6 or 8 for UDP

Documentation
- Manpage setsockopt(2) SO_SPLICE
- Manpage sosplice(9) somove(9)

Questions?