Virtual switching technologies and Linux bridge

Similar documents
Stacked Vlan - Performance Improvement and Challenges

Rocker: switchdev prototyping vehicle

What is an L3 Master Device?

Corporate Update. OpenVswitch hardware offload over DPDK. DPDK summit 2017

Using SR-IOV offloads with Open-vSwitch and similar applications

Bringing the Power of ebpf to Open vswitch. Linux Plumber 2018 William Tu, Joe Stringer, Yifeng Sun, Yi-Hung Wei VMware Inc. and Cilium.

Ethernet switch support in the Linux kernel

Agilio OVS Software Architecture

An Introduction to Open vswitch

A Brief Guide to Virtual Switching Franck Baudin (Red Hat) Billy O Mahony (Intel)

Ethernet Services over IPoIB

Agilio CX 2x40GbE with OVS-TC

Chapter 4 Configuring Switching

Hardware accelerating Linux network functions Roopa Prabhu, Wilson Kok

vswitch Acceleration with Hardware Offloading CHEN ZHIHUI JUNE 2018

New Approach to OVS Datapath Performance. Founder of CloudNetEngine Jun Xiao

DPDK Summit China 2017

Achieve Low Latency NFV with Openstack*

Virtual Switches. Yao-Min Chen. Virtual Switches

Link Virtualization based on Xen

Stacked Vlan: Performance Improvement and Challenges

VDPA: VHOST-MDEV AS NEW VHOST PROTOCOL TRANSPORT

SmartNIC Programming Models

Accelerating VM networking through XDP. Jason Wang Red Hat

VALE: a switched ethernet for virtual machines

SmartNIC Programming Models

Creating a Virtual Network with Virt-manager

ERSPAN in Linux. A short history and review. Presenters: William Tu and Greg Rose

Networking in Virtual Infrastructure and Future Internet. NCHC Jen-Wei Hu

A Userspace Packet Switch for Virtual Machines

vnetwork Future Direction Howie Xu, VMware R&D November 4, 2008

Virtio 1 - why do it? And - are we there yet? Michael S. Tsirkin Red Hat

Next Gen Virtual Switch. CloudNetEngine Founder & CTO Jun Xiao

Scaling bridge forwarding database. Roopa Prabhu, Nikolay Aleksandrov

Cloud Networking (VITMMA02) Network Virtualization: Overlay Networks OpenStack Neutron Networking

CS-580K/480K Advanced Topics in Cloud Computing. Network Virtualization

Data Center Virtualization: VirtualWire

KVM PERFORMANCE OPTIMIZATIONS INTERNALS. Rik van Riel Sr Software Engineer, Red Hat Inc. Thu May

Deploy the ASAv Using KVM

Kata Containers The way to run virtualized containers. Sebastien Boeuf, Linux Software Engineer Intel Corporation

z/vm Virtual Switch: The Basics

Bringing the Power of ebpf to Open vswitch

Advanced Computer Networks. End Host Optimization

OVS Acceleration using Network Flow Processors

Introduction to the Cisco ASAv

Zero-copy Receive for Virtualized Network Devices

Open vswitch - architecture

Rtnetlink dump filtering in the kernel Roopa Prabhu

Using SR-IOV on OpenStack

Accelerating Contrail vrouter

Course Review. Hui Lu

Open vswitch in Neutron

Enabling DPDK Accelerated OVS in ODL and Accelerating SFC

Arrakis: The Operating System is the Control Plane

DPDK Summit 2016 OpenContrail vrouter / DPDK Architecture. Raja Sivaramakrishnan, Distinguished Engineer Aniket Daptari, Sr.

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, t he Energy Efficient Solutions logo, mobilegt, PowerQUICC,

Lecture 5. Switching

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

double split driver model

Contemporary Linux Networking

Cloud Networking (VITMMA02) Server Virtualization Data Center Gear

Deploy the ExtraHop Discover Appliance on a Linux KVM

Open vswitch DPDK Acceleration Using HW Classification

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, t he Energy Efficient Solutions logo, mobilegt, PowerQUICC,

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

What is KVM? KVM patch. Modern hypervisors must do many things that are already done by OSs Scheduler, Memory management, I/O stacks

Design and Implementation of Virtual TAP for Software-Defined Networks

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture

Configuring TAP Aggregation and MPLS Stripping

Performance Considerations of Network Functions Virtualization using Containers

vsphere Networking Update 1 Modified on 04 OCT 2017 VMware vsphere 6.5 VMware ESXi 6.5 vcenter Server 6.5

KVM as The NFV Hypervisor

FSOS. Ethernet Configuration Guide

vsphere Networking 17 APR 2018 VMware vsphere 6.7 VMware ESXi 6.7 vcenter Server 6.7

Netronome 25GbE SmartNICs with Open vswitch Hardware Offload Drive Unmatched Cloud and Data Center Infrastructure Performance

Configuring and Benchmarking Open vswitch, DPDK and vhost-user. Pei Zhang ( 张培 ) October 26, 2017

Network stack virtualization for FreeBSD 7.0. Marko Zec

Accelerate Service Function Chaining Vertical Solution with DPDK

PCI Express x8 Quad Port 10Gigabit Server Adapter (Intel XL710 Based)

THE INTERNET PROTOCOL INTERFACES

SWITCHD. An OpenFlow implementation for OpenBSD BSDCan 2016 Reyk Flöter ESDENERA NETWORKS GmbH

NCT240 IP DSLAM with IAC4500 VLAN Tagging Implementation

Configuring Port-Based and Client-Based Access Control (802.1X)

The Internet Protocol

A. ARPANET was an early packet switched network initially connecting 4 sites (Stanford, UC Santa Barbara, UCLA, and U of Utah).

Open vswitch: A Whirlwind Tour. Jus8n Pe:t March 3, 2011

Host Dataplane Acceleration: SmartNIC Deployment Models

Accelerating vrouter Contrail

The Addressing of Data Link Layer

VIRTIO: VHOST DATA PATH ACCELERATION TORWARDS NFV CLOUD. CUNMING LIANG, Intel

Evolution of the netmap architecture

Configuring Tap Aggregation and MPLS Stripping

Emulex Universal Multichannel

PCI SR-IOV on FreeBSD. Ryan Stone

Measuring and Modeling of Open vswitch Performance Implementation in KVM environment

Docker Networking Deep Dive online meetup

Storage Protocol Offload for Virtualized Environments Session 301-F

VIRTIO-NET: VHOST DATA PATH ACCELERATION TORWARDS NFV CLOUD. CUNMING LIANG, Intel

Implementing a TCP Broadband Speed Test in the Cloud for Use in an NFV Infrastructure

VXLAN Overview: Cisco Nexus 9000 Series Switches

Transcription:

Virtual switching technologies and Linux bridge Toshiaki Makita NTT Open Source Software Center

Today's topics Virtual switching technologies in Linux Software switches (bridges) in Linux Switching technologies for KVM environment Performance of switches Userland APIs and commands for bridge Introduction to Recent features of bridge (and others) FDB manipulation VLAN filtering Learning/flooding control Features under development 802.1ad (Q-in-Q) support for bridge Non-promiscuous bridge 2

Who is Toshiaki Makita? Linux engineer at NTT Open Source Software Center Technical support for NTT group companies Active patch submitter on networking subsystem bridge, etc. 3

Software switches in Linux Linux has 3 types of software switches bridge macvlan Open vswitch 4

bridge HW switch like device (IEEE 802.1D) Has FDB (Forwarding DB), STP (Spanning tree), etc. Using promiscuous mode that allows to receive all packets Common NIC filters unicast whose dst is not its mac address without promiscuous mode Many NICs also filter multicast / vlan-tagged packets by default without bridge TCP/IP pass to upper layer with bridge TCP/IP br0 bridge handler hook promiscuous mode if dst mac is bridge device eth1 promiscuous mode 5

macvlan VLAN using not 802.1Q tag but mac address 4 types of mode private vepa bridge passthru Using unicast filtering if supported, instead of promiscuous mode (except for passthru) Unicast filtering allows NIC to receive multiple mac addresses MAC address A macvlan0 macvlan MAC address B macvlan1 handler hook unicast filtering 6

macvlan (private mode) vlan device like behavior Not a bridge Prohibit intermacvlan traffic (except for those via external GW) MAC address A MAC address B macvlan0 macvlan1 macvlan External SW External GW 7

macvlan (vepa mode) Similar to private mode Allow traffic between macvlans (via external SW) MAC address A MAC address B macvlan0 macvlan1 macvlan External SW 8

macvlan (bridge mode) Light weight bridge No source learning No STP Only one uplink Allow traffic between macvlans (via macvlan stack) MAC address A MAC address B macvlan0 macvlan1 macvlan External SW 9

macvlan (passthru mode) Allow only one macvlan device Used for VM (as macvtap) Promiscuous allow VM to use any mac address / vlan device MAC address A macvlan0 macvlan promiscuous External SW 10

Open vswitch Supports OpenFlow Can be used as a normal switch as well Has many features (VLAN tagging, VXLAN, GRE, bonding, etc.) Flow based forwarding Control plane in user space flow miss-hit causes upcall to userspace daemon user space daemon (ovs-vswitchd) control plane openvswitch (datapath) data plane handler hook promiscuous mode Flow table upcall FDB Flow table (cache) eth1 OpenFlow controller 11

Switching technologies for KVM Software switches bridge macvlan Open vswitch Hardware switch NIC embedded switch (in SR-IOV device) 12

bridge with KVM Used with tap device Tap device packet transmission -> file read file write -> packet reception qemu/vhost Guest fd read/write bridge vfs tap0 13

macvtap (private, vepa, bridge) with KVM macvtap tap-like macvlan variant packet reception -> file read file write -> packet transmission qemu/vhost Guest fd qemu/vhost Guest fd macvtap0 read/write macvtap1 read/write macvlan 14

macvtap (passthru) with KVM macvtap passthru mode PCI-passthrough like mode Guest can exclusively use physical device Guest can use any mac address / vlan interface Guest can use promiscuous mode Other modes uses unicast filtering Don't allow to receive mac address except for macvtap device's Don't allow vlan tagged packets if NIC has vlan filtering feature qemu/vhost Guest fd read/write macvtap0 macvlan promiscuous 15

Open vswitch with KVM Configuration is the same as bridge used with tap device qemu/vhost Guest fd read/write openvswitch vfs tap0 16

NIC embedded switch (SR-IOV) SR-IOV Addition to PCI normal physical function (PF), allow to add light weight virtual functions (VF) VF appears as a network interface (_0, _1...) Some SR-IOV devices have switches in them allow PF-VF / VF-VF communication PF VF VF _0 _1 embedded switch SR-IOV supported NIC 17

NIC embedded switch (SR-IOV) SR-IOV with KVM Use PCI-passthrough to attach VF to guest qemu Guest _0 qemu Guest _1 embedded switch SR-IOV supported NIC 18

NIC embedded switch (SR-IOV) SR-IOV with KVM Or use macvtap (passthru) migration-friendly qemu/vhost Guest qemu/vhost Guest fd fd macvtap0 macvtap1 _0 _1 embedded switch SR-IOV supported NIC 19

Performance of switches Environment Test results Throughput Overhead on host 20

Performance: environment 3.14.4 (2014/5/13 Release) Host: Xeon E5-2407 4 core * 2 socket NIC: 10GbE, Intel 82599 chip (ixgbe) Guest: 2 core *1 HW Switch: BLADE G8124 Benchmark tool: netperf-2.6 UDP_STREAM test (1518 byte frame length) host guest netserver bridge etc. UDP packets 82599 BLADE G8124 82599 netperf host *1: Pinning on host: vcpus -> CPU0~3, vhost -> CPU1. NIC irq affinity on host: 0x1 (CPU0). Pinning on guest: netserver process -> CPU1. NIC irq affinity on guest: 0x1 (CPU0). 21

Throughput (Gbps) Performance: throughput Receive throughput on guest SR-IOV (PCI-passthrough) has the highestperformance Software switches are 6%~14% worse than SR-IOV (PCI-passthrough) 7 6 5 4 3 2 1 0 22

CPU usage (%) CPU usage (%) Performance: Overhead on host Overhead (CPU usage) on host 350 SR-IOV (PCI-passthrough) has the lowest overhead 300 250 200 user system CPU usage by system and irqs are close to 0 150 100 50 hardirq softirq 0 CPU usage by macvtap is 24~29% lower than bridge / Open vswitch 250 200 150 100 50 0 vcpu1 vcpu0 vhost 23

Userland APIs and commands (bridge) Various APIs ioctl sysfs netlink Netlink is preferred for new features Because it is extensible sysfs is sometimes used Commands brctl ip / bridge (in bridge-utils, using ioctl / sysfs) (in iproute2, using netlink) 24

Userland APIs and commands (bridge) brctl # brctl addbr <bridge>... create new bridge # brctl addif <bridge> <port>... attach port to bridge # brctl showmacs <bridge>... show fdb entries These operations are now realized by netlink based commands as well (Since 3.0) # ip link add <bridge> type bridge... create new bridge # ip link set <port> master <bridge>... attach port # bridge fdb show... show fdb entries And recent features can only be used by netlink based ones or direct sysfs write # bridge fdb add # bridge vlan add etc... 25

Recent features of bridge (and others) FDB manipulation VLAN filtering Learning / flooding control 26

FDB manipulation FDB Forwarding database Learning: packet arrival triggers entry creation Source MAC address is used with incoming port Flood if failed to find entry Flood: deliver packet to all ports but incoming one FDB MAC address aa:bb:cc:dd:ee:ff Dst learning bridge... packet arrival from aa:bb:cc:dd:ee:ff eth1 27

FDB manipulation FDB manipulation commands Since 3.0 # bridge fdb add <mac address> dev <port> master temp # bridge fdb del <mac address> dev <port> master MAC address specified mac... Dst port bridge specified port eth1 28

FDB manipulation What's "temp"? There are 3 types of FDB entries permanent (local) static others (dynamically learned by packet arrival) "temp" means static here "bridge fdb"'s default is permanent permanent here means "deliver to bridge device" (e.g. br0) permanent doesn't deliver to specified port # bridge fdb add <mac address> dev <port> master temp br0 bridge (br0) if match permanent eth1 specified port 29

FDB manipulation What's "master"? Remember this command # bridge fdb add <mac address> dev <port> master temp # ip link set <port> master <bridge>... attach port "bridge fdb"'s default is "self" It adds entry to specified port () itself! master bridge specified port (self) eth1 30

FDB manipulation When to use "self"? Some NIC embedded switches support this command ixgbe, qlcnic macvlan (passthru) and vxlan also support it master bridge self PF VF VF _0 _1 embedded switch SR-IOV supported NIC 31

FDB manipulation Example: Intel 82599 (ixgbe) Someone thinks of using both bridge and SR-IOV due to limitation of number of VFs bridge puts (PF) into promiscuous, but... Unknown MAC address from VF goes to wire, not to PF qemu qemu Guest 1 MAC A eth1 tap bridge Guest 2 MAC C _0 VF PF MAC B Dst. A embedded switch Intel 82599 (ixgbe) 32

FDB manipulation Example: Intel 82599 (ixgbe) Type "bridge fdb add A dev " on host Traffic to A will be forwarded to bridge qemu qemu Guest 1 MAC A eth1 tap bridge Guest 2 MAC C _0 VF add fdb entry PF MAC B Dst. A embedded switch Intel 82599 (ixgbe) 33

VLAN filtering 802.1Q Bridge Filter packets according to vlan tag Forward packets according to vlan tag as well as mac address Insert / strip vlan tag FDB MAC address Vlan Dst aa:bb:cc:dd:ee:ff 10... insert / strip vlan tag bridge filter disallowed vlan eth1 34

VLAN filtering Ingress / egress filtering policy Incoming / outgoing packet is filtered if matching filtering policy Per-port per-vlan policy Default is "disallow all vlans" All packets are dropped Filtering table Port 10 Allowed Vlans 20 eth1 20 30 bridge filter by vlan filter by vlan at ingress at egress allow 10 disallow 10 eth1 VID 10 35

VLAN filtering PVID (Port VID) Untagged (and VID 0) packet is assigned this VID Per-port configuration Default PVID is none (untagged packet is discarded) Egress policy untagged Outgoing packet that matches this policy get untagged Per-port per-vlan policy Filtering table Port Allowed Vlans PVID 10 20 eth1 20 Egress Untag bridge apply pvid (insert vid 20) eth1 apply untagged (strip tag 20) 30 untagged packet 36

VLAN filtering Commands Enable VLAN filtering (disabled by default) # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering Add / delete allowed vlan # bridge vlan add vid <vlan_id> dev <port> # bridge vlan del vid <vlan_id> dev <port> Set pvid / untagged # bridge vlan add vid <vlan_id> dev <port> [pvid] [untagged] Dump setting # bridge vlan show Note: bridge device needs "self" # bridge vlan add vid <vlan_id> dev br0 self # bridge vlan del vid <vlan_id> dev br0 self 37

VLAN with KVM Traditional configuration Use vlan devices Needs bridges per vlan Low flexibility How many devices? # ifconfig -s Iface....10 br10.20 br20.30 br30.40 br40... qemu qemu Guest Guest tap0 tap1 br10 br20.10.20 38

VLAN with KVM With VLAN filtering Simple Flexible Only one bridge # ifconfig -s Iface... br0 qemu Guest tap0 pvid/untag vlan 10 br0 qemu Guest tap1 pvid/untag vlan 20 vlan10 / 20 39

VLAN with KVM Other switches Open vswitch Can also handle VLANs # ovs-vsctl set Port <port> tag=<vid> NIC embedded switch Some of them support VLAN (e.g. Intel 82599) # ip link set <PF> vf <VF_num> vlan <vid> 40

Learning / flooding control Limit mac addresses guest can use Reduce FDB size Used with static FDB entries ("bridge fdb" command) qemu Guest qemu Guest Disable FDB learning on particular port Since 3.11 No dynamic FDB entry Don't flood unknown mac to specified port Since 3.11 Control packet delivery to guests tap0 no learning no flooding bridge learning flooding tap1 no learning no flooding Commands # echo 0 > /sys/class/net/<port>/brport/learning # echo 0 > /sys/class/net/<port>/brport/unicast_flooding 41

Features under development 802.1ad (Q-in-Q) support for bridge Non-promiscuous bridge 42

802.1ad (Q-in-Q) support for bridge 802.1ad allows stacked vlan tags MAC.1ad tag.1q tag payload Outer 802.1ad tag can be used to separate customers Example: Guest A, B -> Customer X Guest C, D -> Customer Y Inner 802.1Q tag can be used inside customers Customer X and Y can use any 802.1Q tags 43

802.1ad (Q-in-Q) support for bridge Bridge preserves guest.1q tag (vid 30) when inserting.1ad tag (vid 10) qemu Guest A.30.1Q VID 30 qemu Guest C.1ad tag will be stripped at another end point of.1ad network.1ad VID 10.1Q VID 30 tap0 pvid/untag vlan 10 tap1 bridge (.1ad mode) vlan10 / 20 pvid/untag vlan 20 Customer's another site.1q VID 30.1ad network.1ad VID 10.1Q VID 30 44

Non-promiscuous bridge If there is only one learning/flooding port, it can be non-promisc qemu Guest qemu Guest Instead of promisc mode, unicast filtering is set for static FDB entries Automatically enabled if meeting some conditions There is one or zero learning & flooding port bridge itself is not promiscuous mode VLAN filtering is enabled tap0 no learning no flooding non-promisc bridge learning flooding tap1 no learning no flooding Overhead will get closer to macvlans 45

Summary Linux has 3 types of software switches bridge, macvlan (macvtap), Open vswitch SR-IOV NIC enbedded switch can also be used for KVM Bridge's recent features FDB manipulation VLAN filtering Learning / Flooding control Features under development 802.1ad (Q-in-Q) support Non-promiscuous bridge 46

Thank you for listening. Any questions? 47