Virtio/vhost status update


Yuanhan Liu <yuanhan.liu@linux.intel.com>, Aug 2016

Outline

Performance
- Multiple Queue
- Vhost TSO

Functionality/Stability
- Live migration
- Reconnect
- Vhost PMD

Todo
- Vhost-pci
- Vhost Tx zero copy

Multiple Queue: Why

virtio/vhost is a CPU-intensive workload, meaning the CPU is the bottleneck. Using multiple queues with multiple CPUs can result in a near-linear performance boost.

Multiple Queue: Howto

QEMU:

    -enable-kvm -m 4G -smp 4 -cpu host \
    -object memory-backend-file,id=mem,size=4G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc \
    \
    -chardev socket,id=char0,path=$socket_path \
    -netdev type=vhost-user,id=net0,chardev=char0,queues=2 \
    -device virtio-net-pci,netdev=net0,mac=$mac,mq=on,vectors=6

Inside the guest, with the Linux kernel virtio-net driver (one queue is enabled by default):

    $ ethtool -L eth0 combined 2

Or with the DPDK virtio PMD:

    $ testpmd -c 0x7 -- -i --rxq=2 --txq=2 --nb-cores=2

Multiple Queue: Under the hood

The queue-count negotiation between QEMU and the vhost-user backend proceeds as follows:

1. QEMU sends VHOST_USER_GET_PROTOCOL_FEATURES.
2. If VHOST_USER_PROTOCOL_F_MQ is not among the returned protocol features, only 1 queue is enabled.
3. Otherwise, QEMU sends VHOST_USER_GET_QUEUE_NUM, which returns max_queue.
4. If nr_queue > max_queue, error out and quit; otherwise, nr_queue queues are enabled.
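
A minimal C sketch of this negotiation from the master (QEMU) side; the message IDs and the F_MQ bit come from the vhost-user protocol, while vhost_user_get_u64() is a made-up stand-in for the request/reply round trip:

    /* Sketch of the multi-queue negotiation (vhost-user master side). */
    #include <stdint.h>
    #include <stdio.h>

    #define VHOST_USER_GET_PROTOCOL_FEATURES 15
    #define VHOST_USER_GET_QUEUE_NUM         17
    #define VHOST_USER_PROTOCOL_F_MQ          0

    /* Stand-in for sending a request and reading the u64 reply; real code
     * does a sendmsg()/recvmsg() pair with the vhost-user message header. */
    uint64_t vhost_user_get_u64(int fd, uint32_t request);

    static int negotiate_queues(int fd, unsigned int nr_queue)
    {
        uint64_t pf = vhost_user_get_u64(fd, VHOST_USER_GET_PROTOCOL_FEATURES);

        if (!(pf & (1ULL << VHOST_USER_PROTOCOL_F_MQ)))
            return 1;                       /* backend lacks MQ: 1 queue only */

        uint64_t max_queue = vhost_user_get_u64(fd, VHOST_USER_GET_QUEUE_NUM);
        if (nr_queue > max_queue) {
            fprintf(stderr, "want %u queues, backend supports only %llu\n",
                    nr_queue, (unsigned long long)max_queue);
            return -1;                      /* error and quit */
        }
        return (int)nr_queue;               /* nr_queue queues enabled */
    }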

Vhost TSO: Why

VM2VM performance is poor; it can be improved dramatically with TSO enabled.

Vhost TSO: Howto

TSO is enabled by default; nothing special is needed.

Vhost TSO: Under the hood

VM2VM on the same host:
- No actual segmentation: virtio-net supports transmitting/receiving big packets.
- No checksum generation/validation: packets are transferred on the same host, so we can assume the data is valid.

VM2NIC:
- Set the correct mbuf flags; the NIC will do the TSO and CSUM offload.
- NOTE: OVS hasn't enabled NIC TSO yet, thus the VM2NIC case won't work. A patch has been submitted, though:
  http://openvswitch.org/pipermail/dev/2016-June/072871.html
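
For the VM2NIC case, a minimal sketch of the mbuf setup needed so the NIC performs the TSO and checksum offload (flag names are from the DPDK mbuf API of that era; the fixed header lengths here assume a plain untagged IPv4/TCP packet):

    /* Mark an mbuf for TSO + checksum offload (DPDK pre-21.11 flag names).
     * Real code parses the header lengths from the packet instead. */
    #include <rte_mbuf.h>

    static void mark_for_tso(struct rte_mbuf *m, uint16_t mss)
    {
        m->l2_len    = 14;              /* Ethernet header */
        m->l3_len    = 20;              /* IPv4 header, no options */
        m->l4_len    = 20;              /* TCP header, no options */
        m->tso_segsz = mss;             /* payload bytes per segment */
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
                       PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG;
    }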

Live migration: Why

A very important feature for production usage, such as host maintenance.

Live migration: Howto

src.sh:

    -chardev socket,id=char0,path=$socket_path \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0,mac=$mac \
    -monitor telnet::3333,server,nowait

dst.sh:

    -chardev socket,id=char0,path=$socket_path \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0,mac=$mac \
    -incoming tcp:0:4444

Trigger the live migration (run on the src host):

    $ telnet 0 3333
    > migrate -d tcp:$dst_host:4444

Live migration: Under the hood

Log vring memory changes:
- used vring updates
- desc buffers

Notify the switch about the location change:
- A GARP packet will be sent by the guest driver if it supports the GUEST_ANNOUNCE feature (kernel v3.5 or above).
- Otherwise, vhost constructs and sends an RARP packet on the guest's behalf.

[Diagram: the VM moves from the source host to the target host; the migrated VM sends the notification packet through the switch so the external network learns the new location.]
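
A minimal sketch of the dirty-page logging behind this, assuming the shared log bitmap that VHOST_USER_SET_LOG_BASE establishes (the helper name is illustrative, not the actual QEMU/DPDK function):

    /* Mark guest pages [addr, addr + len) dirty in the shared log bitmap,
     * one bit per 4 KiB page, so QEMU knows to re-send them. len > 0. */
    #include <stdint.h>

    #define VHOST_LOG_PAGE 4096

    static void log_dirty(uint8_t *log_base, uint64_t addr, uint64_t len)
    {
        for (uint64_t page = addr / VHOST_LOG_PAGE;
             page <= (addr + len - 1) / VHOST_LOG_PAGE;
             page++)
            __atomic_fetch_or(&log_base[page / 8],
                              (uint8_t)(1 << (page % 8)), __ATOMIC_RELAXED);
    }

    /* e.g. after writing a used ring entry:
     *   log_dirty(log_base, used_gpa + off, sizeof(struct vring_used_elem));
     */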

Reconnect: The issue

A crash of the DPDK app (such as OVS) used to require restarting all VMs connected to it, which is:
- a disaster for production usage
- also inconvenient for development, where rebuilding and restarting DPDK happens often

Reconnect: Howto

    -chardev socket,id=char0,path=$socket_path,server \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0,mac=$mac

NOTE: DPDK v16.07 (or above) and QEMU v2.7 (or above) are needed!

Reconnect: Under the hood

- When DPDK starts (or restarts), it invokes connect(), which may succeed, when the server (QEMU) is already up, or fail, when the server is not up yet.
- The DPDK vhost lib then creates a reconnect thread that keeps calling connect() until the server is up.
- When QEMU restarts, DPDK detects the broken connection, which in turn puts the socket back into the reconnect thread.
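
A minimal sketch of such a reconnect thread using plain POSIX sockets (the names are illustrative; the real logic lives in DPDK's vhost-user socket code, which keeps a list of pending sockets and rescans it):

    /* Keep retrying connect() against QEMU's unix socket until it succeeds. */
    #include <pthread.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    static void *reconnect_thread(void *arg)
    {
        const char *path = arg;

        for (;;) {
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            if (fd < 0) {
                sleep(1);
                continue;
            }

            struct sockaddr_un sa = { .sun_family = AF_UNIX };
            strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);

            if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
                /* connected: hand fd over to the vhost-user message loop */
                return NULL;
            }
            close(fd);
            sleep(1);   /* server not up yet; retry later */
        }
    }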

Vhost PMD: What

A virtual PMD that wraps the vhost APIs, so the common ethdev APIs apply (see the polling sketch after this list):
- rte_eth_dev_configure
- rte_eth_rx_queue_setup / rte_eth_tx_queue_setup
- rte_eth_dev_start
- rte_eth_rx_burst / rte_eth_tx_burst
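
A minimal sketch of driving a vhost port through those generic ethdev calls; port setup is elided, and port_id and the burst size are assumptions:

    /* Echo packets back out of the same vhost port they arrived on.
     * Assumes the port was set up via rte_eth_dev_configure(),
     * rte_eth_rx/tx_queue_setup() and rte_eth_dev_start(). */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    static void echo_loop(uint16_t port_id)
    {
        struct rte_mbuf *pkts[BURST];

        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, pkts, BURST);
            uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);

            /* free whatever the Tx queue did not accept */
            for (uint16_t i = nb_tx; i < nb_rx; i++)
                rte_pktmbuf_free(pkts[i]);
        }
    }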

Vhost PMD: Howto

    $ testpmd -c 0x1f -n 4 \
          --socket-mem 4096,0 \
          --vdev "eth_vhost0,iface=$socket_path1,queues=2" \
          --vdev "eth_vhost1,iface=$socket_path2,queues=2" \

Vhost-pci: What

A method to Tx packets from one VM to another VM directly.

[Diagram: in the generic vhost datapath, packets from the Tx VM's virtio ring are dequeued by vhost and enqueued into the Rx VM's virtio ring; in the vhost-pci datapath, the Tx VM's vhost-pci device writes directly into the Rx VM's virtio ring, bypassing vhost.]

Vhost-pci: Under the hood

- It is a PCI device: the 2nd VM's memory is mapped and exposed through the 1st VM's PCI BAR 2.
- The 2nd VM sends vhost-pci messages to convey the necessary info, such as the vring addresses.
- The 1st VM is then able to Tx packets directly into the 2nd VM's Rx vring.
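
A hypothetical sketch of the datapath from the 1st VM's point of view once the peer's memory is visible through BAR 2; all names here are made up, since vhost-pci was still a design proposal at this point:

    /* 1st VM side: translate the peer's guest-physical vring/buffer address
     * into a pointer inside our BAR 2 mapping, then place a packet there.
     * Illustrative only; assumes a flat, offset-0 mapping of peer memory. */
    #include <stdint.h>
    #include <string.h>

    struct peer_vm {
        uint8_t *bar2_base;     /* our mapping of the 2nd VM's memory */
        uint64_t rx_vring_gpa;  /* Rx vring address, from vhost-pci messages */
    };

    static void *peer_ptr(struct peer_vm *p, uint64_t gpa)
    {
        return p->bar2_base + gpa;
    }

    static void tx_to_peer(struct peer_vm *p, uint64_t desc_buf_gpa,
                           const void *pkt, uint32_t len)
    {
        memcpy(peer_ptr(p, desc_buf_gpa), pkt, len);
        /* ...then update the peer's Rx avail/used ring accordingly */
    }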

Vhost Tx zero copy

[Diagram: the generic vhost datapath involves two copies: Copy I when vhost dequeues a packet from the Tx VM's virtio ring into an mbuf, and Copy II when vhost enqueues that mbuf into the Rx VM's virtio ring.]

Instead of doing Copy I, let the mbuf reference the virtio desc buffer directly; therefore, one copy is saved.
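
A conceptual sketch of the idea, with a made-up attach_desc_buf() helper standing in for the real mbuf-attach machinery; the actual implementation also has to keep the desc entry away from the guest until the mbuf is freed:

    /* Zero-copy dequeue idea: instead of copying the guest buffer into the
     * mbuf's own data room (Copy I), point the mbuf at the guest buffer.
     * guest_va is the desc buffer already translated to a host address. */
    #include <rte_mbuf.h>

    static void attach_desc_buf(struct rte_mbuf *m, void *guest_va,
                                uint16_t len)
    {
        m->buf_addr = guest_va;     /* mbuf now references the desc buffer */
        m->data_off = 0;
        m->data_len = len;
        m->pkt_len  = len;
        /* the desc entry must not be returned to the guest until this mbuf
         * is freed, hence the extra bookkeeping in the real code */
    }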

Thanks!