A Userspace Packet Switch for Virtual Machines



SHRINKING THE HYPERVISOR ONE SUBSYSTEM AT A TIME
A Userspace Packet Switch for Virtual Machines
Julian Stecklina, OS Group, TU Dresden, jsteckli@os.inf.tu-dresden.de
VEE 2014, Salt Lake City

1 Motivation 2 Userspace Switch 3 Evaluation 4 Summary TU Dresden Userspace Packet Switch slide 2 of 30

01 Microkernels and TCB
Systems built on microkernels are usually structured as multiserver systems with strong isolation between subsystems. Applications depend only on the subsystems they use.
[Diagram: applications in user mode talk to separate Networking and Filesystem servers; only the µkernel runs in kernel mode]

01 Towards Monolithic Hypervisors
[Diagram sequence: Qemu with virtio-net starts entirely in user mode; then KVM, vapic, and finally v-net move into the kernel, step by step pulling device functionality into kernel mode]

01 User vs. Kernel
Attacks by malicious guest code are a serious concern. Successful attacks on Qemu achieve unprivileged code execution: dangerous, but manageable. Successful attacks on KVM achieve code execution in kernel mode: game over. (You are here.)

01 KVM/vhost Trusted Code
In a modern KVM installation the complete networking path is in the TCB of all applications on the host: (simple) instruction decoding, virtio-net device implementation, NIC driver. Does it have to be?

1 Motivation 2 Userspace Switch 3 Evaluation 4 Summary

02 KVM Networking
[Diagram sequence: with vhost, a VM exit from VCPU A notifies the in-kernel vhost instance, which injects an IRQ into VM B's VCPU; with sv3, the same notification path runs through sv3 instances in user mode; VCPUs and sv3 can be pinned to separate CPUs (CPU 1-3)]

02 A Userspace Switch: sv3
Userspace packet switch running as an ordinary process on top of the host Linux: implements virtio-net, (no) packet memory management, NIC driver. Every sv3 instance is a completely isolated networking subsystem.

02 Vhost and KVM
KVM and vhost are loosely tied together by Qemu using eventfds; Qemu connects them via ioctl. KVM can trigger eventfds on VM exits, and eventfds can be used to trigger IRQ injection in KVM. These eventfds can be used from userspace as well, without vhost.

02 Tying sv3 into Qemu
Enhanced Qemu to support out-of-process PCI devices. Qemu connects to sv3 via an AF_LOCAL socket, exchanges fds to establish shared memory, and exchanges eventfds for VM exit notification and IRQ injection. sv3 implements the complete virtio-net logic, and it feels like L4!

02 Zero-Copy Packet Transmission
sv3 creates linear mappings of guest memory with mmap, so packet data can be copied with a plain memcpy; no additional copies are necessary. If no buffer space is available in the receiving VM, the packet is dropped, so no dynamic memory management for packets is needed.
[Diagram: the packet path VM A → sv3 → VM B collapses to a direct copy VM A → VM B]

02 External Communication
Userspace driver for the Intel X520 10 GBit NIC using VFIO (requires an IOMMU), supporting static offloads: TCP Segmentation Offload, Large Receive Offload, Checksum Offload. Virtio descriptors are translated to HW descriptors, which allows zero-copy send with all offloads. Better to reuse existing drivers next time...

02 Switching Loop
sv3 is mostly single-threaded and lockless. Userspace RCU is used to synchronize adding and removing switch ports.
1 disable events on all virtio queues
2 disable HW IRQs
3 poll for work until queues are empty
4 enable events/IRQs
5 poll a last time; if a packet is seen, goto 1
6 block on eventfd
In overload scenarios, sv3 naturally operates in polling mode.

1 Motivation 2 Userspace Switch 3 Evaluation 4 Summary

03 Resource Consumption
Processes are lightweight alternatives to driver VMs.
sv3:         < 2 MiB
NIC driver:  + 14 MiB
sv3 total:   < 16 MiB
smallest VM: 32 MiB [1]
netback VM:  128 MiB [1]
[1] Breaking Up is Hard to Do: Security and Functionality in a Commodity Hypervisor, Colp et al., SOSP '11

03 Evaluation System
Intel Core i7 3770S (Ivy Bridge); C-states, HT, and frequency scaling disabled
16 GiB RAM @ 159 Gbit/s
Host: Fedora 19 with Linux 3.10 (vanilla)
Guest: Linux 3.10 (vanilla), 256 MiB RAM
Qemu 1.5 (plus patches)

03 Cost of Userspace Execution
[Bar chart: roundtrip latency in µs, 0-14, for vhost vs. sv3]
Notification-to-IRQ-injection times for vhost vs. sv3 without any packet processing: the cost of the additional trip to the syscall layer and mode switch.

03 Latency
[Bar chart: roundtrip latency in µs, 0-100, for vhost-external, vhost-vm, sv3-external, sv3-vm]
Latency measured using netperf UDP_RR.

03 VM-to-VM Bandwidth
[Line chart: CPU utilization (0-3) vs. throughput (0-30 GBit/s) for vhost with TSO, sv3 with TSO, and sv3 only]

03 VM-to-VM Bandwidth (cont'd)
[Line chart: CPU utilization (0-3) vs. throughput (1-9 GBit/s) for vhost without TSO, sv3 without TSO, and sv3 only]

03 External-to-VM Bandwidth
[Line chart: CPU utilization (0-3) vs. throughput (1-8 GBit/s) for vhost with TSO, sv3 with TSO, and sv3 only]

1 Motivation 2 Userspace Switch 3 Evaluation 4 Summary

04 Summary
sv3 is an efficient, lockless userspace packet switch for VMs running on (unmodified) Linux/KVM. Code: https://github.com/blitz/sv3
Linux/KVM has all the mechanisms to make a microkernel-style design possible and efficient:
- rights transfer via AF_LOCAL sockets (capabilities)
- efficient notifications via eventfds
- drivers in userspace via VFIO, using eventfds for IRQs
- tying eventfds to VM exits / IRQ injection
- address space switch cost is not a factor in performance
There are few reasons to write new systems functionality in kernel mode. Questions?

04 External-to-VM Bandwidth
[Line chart: CPU utilization (0-3) vs. throughput (1-6 GBit/s) for sv3 without TSO and sv3 only]