Light & NOS. Dan Li Tsinghua University

Similar documents
Light: A Scalable, High-performance and Fully-compatible User-level TCP Stack. Dan Li ( 李丹 ) Tsinghua University

Operating Systems. 17. Sockets. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

CSE398: Network Systems Design

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

URDMA: RDMA VERBS OVER DPDK

SoftRDMA: Rekindling High Performance Software RDMA over Commodity Ethernet

Speeding up Linux TCP/IP with a Fast Packet I/O Framework

Contents. Part 1. Introduction and TCP/IP 1. Foreword Preface. xix. I ntroduction 31

CHETTINAD COLLEGE OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF MCA QUESTION BANK UNIT 1

A Client-Server Exchange

Overview. Last Lecture. This Lecture. Daemon processes and advanced I/O functions

Linux Kernel Application Interface

Much Faster Networking

Computer Networks SYLLABUS CHAPTER - 2 : NETWORK LAYER CHAPTER - 3 : INTERNETWORKING

Motivation of VPN! Overview! VPN addressing and routing! Two basic techniques for VPN! ! How to guarantee privacy of network traffic?!

Operating System Modifications for User-Oriented Addressing Model

Exploring mtcp based Single-Threaded and Multi-Threaded Web Server Design

VPP Host Stack. Transport and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

VPP Host Stack. TCP and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

socketservertcl a Tcl extension for using SCM_RIGHTS By Shannon Noe - FlightAware

OpenOnload. Dave Parry VP of Engineering Steve Pope CTO Dave Riddoch Chief Software Architect

IPv4 and ipv6 INTEROPERABILITY

Chapter 8: I/O functions & socket options

Z/TPF TCP/IP SOCK Driver 12/14/10. z/tpf TCP/IP SOCKET Driver Users Guide. Copyright IBM Corp. 2010

Introduction to OpenOnload Building Application Transparency and Protocol Conformance into Application Acceleration Middleware

IsoStack Highly Efficient Network Processing on Dedicated Cores

VPP Host Stack. Transport and Session Layers. Florin Coras, Dave Barach

DCCP (Datagram Congestion Control Protocol)

Master s Thesis (Academic Year 2015) Improving TCP/IP stack performance by fast packet I/O framework

May 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to

Client Server Computing

NETWORK SIMULATION USING NCTUns. Ankit Verma* Shashi Singh* Meenakshi Vyas*

SLIPSTREAM: AUTOMATIC INTERPROCESS COMMUNICATION OPTIMIZATION. Will Dietz, Joshua Cranmer, Nathan Dautenhahn, Vikram Adve

Linux Kernel Application Interface

PROCESS CONCEPTS. Process Concept Relationship to a Program What is a Process? Process Lifecycle Process Management Inter-Process Communication 2.

Sistemas Operativos /2016 Support Document N o 1. Files, Pipes, FIFOs, I/O Redirection, and Unix Sockets

Ed Warnicke, Cisco. Tomasz Zawadzki, Intel

The design and implementation of the NCTUns network simulation engine

Memory-Mapped Files. generic interface: vaddr mmap(file descriptor,fileoffset,length) munmap(vaddr,length)

Network Implementation

IKR SimLib-QEMU: TCP Simulations Integrating Virtual Machines

Interprocess Communication Mechanisms

shared storage These mechanisms have already been covered. examples: shared virtual memory message based signals

Lecture 5 Overview! Last Lecture! This Lecture! Next Lecture! I/O multiplexing! Source: Chapter 6 of Stevens book!

Storage Performance Development Kit (SPDK) Daniel Verkamp, Software Engineer

NetCheck: Network Diagnoses from Blackbox Traces

Advanced Computer Networks. End Host Optimization

Operating Systems. Review ENCE 360

Research on DPDK Based High-Speed Network Traffic Analysis. Zihao Wang Network & Information Center Shanghai Jiao Tong University

Custom UDP-Based Transport Protocol Implementation over DPDK

What is an L3 Master Device?

An Implementation of the Homa Transport Protocol in RAMCloud. Yilong Li, Behnam Montazeri, John Ousterhout

EEC-484/584 Computer Networks

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

Programming with TCP/IP. Ram Dantu

Introduction! Overview! Signal-driven I/O for Sockets! Two different UDP servers!

UNIX Network Programming

Network programming(ii) Lenuta Alboaie

NTRDMA v0.1. An Open Source Driver for PCIe NTB and DMA. Allen Hubbe at Linux Piter 2015 NTRDMA. Messaging App. IB Verbs. dmaengine.h ntb.

Solarflare and OpenOnload Solarflare Communications, Inc.

Network Programming. Introduction to Sockets. Dr. Thaier Hayajneh. Process Layer. Network Layer. Berkeley API

How do we troubleshoot this? How does Esmeralda know how to fix this?

Implementing the Wireless Token Ring Protocol As a Linux Kernel Module

VALLIAMMAI ENGINEERING COLLEGE. SRM Nagar, Kattankulathur QUESTION BANK

NETWORK PROGRAMMING. Ipv4 and Ipv6 interoperability

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles

SMD149 - Operating Systems

L41 - Lecture 5: The Network Stack (1)

Transport Layer (TCP/UDP)

Containers Do Not Need Network Stacks

Containing RDMA and High Performance Computing

Memory Management Strategies for Data Serving with RDMA

A socket-based interface to CAN

Assignment 2 Group 5 Simon Gerber Systems Group Dept. Computer Science ETH Zurich - Switzerland

SPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation

Lecture 8: Other IPC Mechanisms. CSC 469H1F Fall 2006 Angela Demke Brown

Topics. Lecture 8: Other IPC Mechanisms. Socket IPC. Unix Communication

CS 3723 Operating Systems: Final Review

Learning with Purpose

CSCI-GA Operating Systems. Networking. Hubertus Franke

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China

The latency of user-to-user, kernel-to-kernel and interrupt-to-interrupt level communication

CS631 - Advanced Programming in the UNIX Environment Interprocess Communication II

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

(Refer Slide Time: 1:09)

A Look at Intel s Dataplane Development Kit

Improving network connection locality on multicore systems

libvnf: building VNFs made easy

Message Passing Architecture in Intra-Cluster Communication

Operating Systems. 16. Networking. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

A Scalable Event Dispatching Library for Linux Network Servers

The Network Stack (1)

Virtio/vhost status update

ECE 650 Systems Programming & Engineering. Spring 2018

Concurrent Architectures - Unix: Sockets, Select & Signals

Lighting the Blue Touchpaper for UK e-science - Closing Conference of ESLEA Project The George Hotel, Edinburgh, UK March, 2007

Software Routers: NetMap

StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs

Evolution of the netmap architecture

Transcription:

Light & NOS Dan Li Tsinghua University

Performance gain The Power of DPDK As claimed: 80 CPU cycles per packet Significant gain compared with Kernel! What we care more How to leverage the performance gain to serve more applications A great opportunity Decoupling network operation from the kernel / operation system Network can thus develop independently 2

State-of-the-Art Improving the performance of Linux network stack Too many works mtcp Based on DPDK Realizing network operations as a thread in application 6wind Do not know the technical details What we want A DPDK-based network stack that can provide the functionality of network operating system 3

Light: a Polling-based, General-purpose, Userspace, High-performance Network Stack Light Network-related API Applications API call Incoming data Dll, Signal Memory, init., timer, net. I/O Payload Non-network API User space outgoing data Linux kernel Raw packets Kernel s stack Kernel space NIC 4

Light Architecture Shared Hugepage Memory Enqueue Light Parser Transport layer Network layer Data link layer Light Daemon Accept Ready Queue Close Ready Queue TX Ready Queue Recv Ready Queue Command Queue Socket RX Queue TX Queue Light API L I G H T DPDK 5

Design Goals Goal #1: minimize the modification of applications Ease the development of new applications Benefit the porting of legacy applications Goal #2: minimize the performance affect to applications The purpose of DPDK is to increase the I/O performance We do not want that the performance of application is sacrificed due to DPDK 6

Goal #1: Minimizing the Modification of Apps. Light provides network-related APIs as a lib to apps. socket(), listen(), bind(), accept(), connect(), shutdown(), close(), socketpair() send(), receive(), sendto(), recvfrom(), sendmsg(), recvmsg(), read(), write(), readv(),writev() setsockopt(), getsockopt(), ioctl(), fcntl() epoll (), select(), poll() Challenges How to mask the same network APIs of the kernel? How to differentiate the two FD spaces (Light and kernel) in the application? Now we need to add several lines of codes in app. DPDK initialization, DLL management 7

API Mask Applications uses dlsym() to redirect the function address to Light Thus the same network APIs in the kernel are masked Do not need to modify the API calls of the application Light s APIs follow the same format of POSIX APIs Do not need to modify the kernel Help the system stability 8

FD confusion Two FD Spaces: Problems Both sockets and files are referred to by FDs E.g., read(), write(), epoll() Problems of Epoll Epoll in the application can wait for the events of network sockets, file events, as well as inter-process sockets Epoll for network sockets is supported in Light Epoll for file events is supported by kernel Epoll for inter-process sockets can be either realized in Light or supported by kernel The two Epolls cannot work in blocking mode simultaneously Logic problem 9

Two FD Spaces: Our Solution For the FD confusion problem Light assigns FDs from the upper bound of FD space, because the Kernel assigns FDs from the lower bound of the space For the two Epoll problem If we want to detect the events of both FD spaces Intercept Epoll and let it always work in non-blocking mode Cons.: app. cannot be suspended, CPU resource waste If we want to save CPU resource for blocking calls in app. Realize network sockets and inter-process sockets in Light Cons.: Cannot detect file events 10

Goal #2: Minimizing the Performance Affect to App. Challenge 1: DPDK I/O polling in Light occupies 100% CPU resource App. might compete with Light for the CPU resource Challenge 2: How to minimize the overhead of interprocess communication Both Light and application are user-space processes The inter-process communication between Light and app. may incur high overhead to apps. Compared with if app. uses kernel network stack 11

Solution CPU Competition between Light & APPs Run Light Parser and Light Daemon in Light cores Run App. and Light API in App. cores Light cores and App. cores are physically separated I/O polling only occurs in Light cores, not in App. cores Light cores (Light parser and Light daemon) App. Cores (Light API and app.) Receive data Send data 12

Inter-process Communication between App. & Light Basic mechanism Lockless queue based on shared memory RTE-ring in DPDK Blocking API calls in app. Epoll(), recv(), send(), accept(), connect() Kernel can suspend the app. process and wake it up after data arrives, which saves the CPU resource Light is a user-space process and cannot do what kernel does 13

Possible Methods Method 1: If there is no event in the queue, the (Light API in) app. process goes to sleep When an event comes, Light daemon uses signal to wakeup the process (batch process to further reduce the overhead) Problem: if the last event fails to wake up the app. process Signal can be lost Time sequence error due to process management: an app. process receives the wakeup signal before it goes to sleep Method 2: Similar way as in hybrid spinlock While() and sleep for some time inside Problem: still add some CPU overhead to app. 14

Method 1 for Epoll() Our Solution Method 2 for send(), receive(), connect(), accept() Reasons The Epoll queue maintains all the events for all sockets of the process Any kind of event from any socket can wakeup the app. Process The queue of the other 4 APIs only maintains a certain kind of events for a certain socket The probability exists that the last event fails to wakeup the app. process 15

Features of Light Minimal modification of applications Run as a general-purpose service for applications Currently app. only has to add several lines of codes Significant performance improvement Inherit the advantage of DPDK Minimize the performance affect to apps. due to DPDK Complete protocol stack TCP (including congestion control), UDP, ICMP, IP, UDP, ARP, Ethernet 16

Demonstration We run Nginx on Light and Linux kernel separately Single process for Nginx Apache benchmark Concurrent requests processed on Light more than doubles that in Linux 17

Thanks! tolidan@tsinghua.edu.cn 18