DPDK Tunneling Offload RONY EFRAIM & YONGSEOK KOH MELLANOX

Rony Efraim

Introduction to DC with overlay networks
Modern data centers (DC) use overlay networks like Virtual Extensible LAN (VXLAN) and GENEVE to create a virtual network on top of the physical network. The problem: all the HW acceleration used to accelerate TCP/IP stops working.

The goal
VMs in the DC eventually see a plain Ethernet packet. Provide all necessary offloads for a DPDK vswitch so it can offload as much as possible, to spare CPU and to scale. Support all stateless HW acceleration for overlays, such as:
- RSS on the inner 5-tuple
- Inner Rx/Tx checksum (~15% gain)
- Inner TSO (~30% gain in BW and ~30% reduction in CPU cycles)
- Switch rules offload

Yongseok Koh

Inner RSS
To achieve uniform distribution for tunneled packets:
- Vary the UDP source port if UDP is the transport (e.g., VXLAN)
- Use the inner header for the RSS hash (inner RSS)
Set level in struct rte_flow_action_rss:
- 0 for default behavior
- 1 for the outermost header
- 2 and higher for inner headers

testpmd> flow create 0 ingress pattern eth / ipv4 / udp / vxlan vni is 100 / eth / ipv4 / tcp / end actions rss level 2 queues 0 1 2 3 end / end
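A rough C equivalent of the testpmd command above could look like the following minimal sketch (assuming port 0 is configured and started, queues 0-3 exist, and the PMD supports inner RSS at level 2; the VNI encoding and RSS hash types are assumptions, and error handling is omitted):

    #include <rte_ethdev.h>
    #include <rte_flow.h>

    /* Minimal sketch: inner RSS (level 2) on VXLAN VNI 100, spreading over queues 0-3. */
    static int
    create_inner_rss_rule(uint16_t port_id)
    {
        static const uint16_t queues[] = { 0, 1, 2, 3 };
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_vxlan vxlan_spec = {
            .vni = { 0, 0, 100 },               /* VNI 100, network byte order */
        };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_spec },
            { .type = RTE_FLOW_ITEM_TYPE_ETH },     /* inner headers */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_TCP },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_rss rss = {
            .level = 2,                         /* hash on the inner headers */
            .types = ETH_RSS_IP | ETH_RSS_TCP,
            .queue_num = 4,
            .queue = queues,
        };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error err;

        return rte_flow_create(port_id, &attr, pattern, actions, &err) ? 0 : -1;
    }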

Inner Checksum
Rx: DEV_RX_OFFLOAD_XXX_CKSUM, XXX = [IPV4 | UDP | TCP | SCTP | OUTER_IPV4 | OUTER_UDP]
- The PMD marks mbuf->ol_flags with PKT_RX_XXX_CKSUM_[UNKNOWN | BAD | GOOD | NONE]
Tx: DEV_TX_OFFLOAD_XXX_CKSUM, XXX = [IPV4 | UDP | TCP | OUTER_IPV4 | OUTER_UDP]
- The app requests by marking mbuf->ol_flags with PKT_TX_XXX_CKSUM, XXX = [IP | L4_NO | TCP | SCTP | UDP | OUTER_IP | OUTER_UDP]
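For the Tx side, a minimal sketch of how an application might mark an mbuf for inner and outer checksum offload on a VXLAN-encapsulated IPv4/TCP packet (the header lengths assume no VLAN tags or IP options, and the port must have been configured with the corresponding DEV_TX_OFFLOAD_*_CKSUM capabilities):

    #include <rte_mbuf.h>

    /* Sketch: request outer IPv4, inner IPv4 and inner TCP checksum offload
     * before rte_eth_tx_burst(). Offsets follow the usual convention for
     * tunneled packets: l2_len spans the outer L4 + tunnel + inner L2 headers. */
    static void
    request_tunnel_cksum(struct rte_mbuf *m)
    {
        m->outer_l2_len = 14;           /* outer Ethernet */
        m->outer_l3_len = 20;           /* outer IPv4, no options */
        m->l2_len = 8 + 8 + 14;         /* outer UDP + VXLAN + inner Ethernet */
        m->l3_len = 20;                 /* inner IPv4, no options */

        m->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM |  /* outer IP csum */
                       PKT_TX_IPV4 | PKT_TX_IP_CKSUM |              /* inner IP csum */
                       PKT_TX_TCP_CKSUM;                            /* inner TCP csum */
    }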

Inner TSO
The app requests by setting mbuf->ol_flags = PKT_TX_TCP_SEG | PKT_TX_[IPV4 | IPV6]
- For an IPv4 packet, also set PKT_TX_IP_CKSUM
DEV_TX_OFFLOAD_XXX_TNL_TSO: some devices support only specific tunnels, XXX = [VXLAN | GRE | IPIP | GENEVE]
DEV_TX_OFFLOAD_[IP | UDP]_TNL_TSO: other devices (e.g., MLX5) also support generic tunnels (not listed above)
- The mbuf must carry the proper flags and offset values: PKT_TX_TUNNEL_[IP | UDP], outer_l2_len, outer_l3_len, l2_len, l3_len and l4_len
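A companion sketch for tunneled TSO, extending the checksum example above. The VXLAN tunnel, header lengths and MSS of 1400 are assumptions; the port must advertise DEV_TX_OFFLOAD_VXLAN_TNL_TSO, or PKT_TX_TUNNEL_UDP can be used instead with generic-tunnel devices such as MLX5:

    #include <rte_mbuf.h>

    /* Sketch: mbuf setup for TSO over a VXLAN tunnel (IPv4 outer, IPv4/TCP inner). */
    static void
    request_tunnel_tso(struct rte_mbuf *m)
    {
        m->outer_l2_len = 14;           /* outer Ethernet */
        m->outer_l3_len = 20;           /* outer IPv4 */
        m->l2_len = 8 + 8 + 14;         /* outer UDP + VXLAN + inner Ethernet */
        m->l3_len = 20;                 /* inner IPv4 */
        m->l4_len = 20;                 /* inner TCP header */
        m->tso_segsz = 1400;            /* MSS for the inner TCP payload */

        m->ol_flags |= PKT_TX_TUNNEL_VXLAN |            /* or PKT_TX_TUNNEL_UDP (generic) */
                       PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM |
                       PKT_TX_IPV4 | PKT_TX_IP_CKSUM |  /* IPv4 inner: also set IP csum */
                       PKT_TX_TCP_SEG;                  /* segment the inner TCP stream */
    }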

Encap/Decap
Only two tunnel types were defined until v18.08: RTE_FLOW_ACTION_TYPE_[VXLAN | NVGRE]_[ENCAP | DECAP]
- These take a list of rte_flow_items describing the encap header
More tunnel types needed to be defined, and L3 encap/decap is required where there is no inner L2:
- VXLAN-GPE [ETH/IP/UDP/VXLAN/IP/...]
- MPLS over UDP [ETH/IP/UDP/MPLS/IP/...]
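A minimal sketch of the item-list style used by these pre-18.11 tunnel actions, building a VXLAN_ENCAP action; the outer header field values are placeholders to be filled in by the application:

    #include <rte_flow.h>

    /* Sketch: build a VXLAN_ENCAP action whose outer header is described as a
     * list of rte_flow items (ETH/IPV4/UDP/VXLAN). Field values are placeholders. */
    static void
    fill_vxlan_encap_action(struct rte_flow_action *action)
    {
        static struct rte_flow_item_eth outer_eth;      /* fill in src/dst MACs */
        static struct rte_flow_item_ipv4 outer_ip;      /* fill in src/dst IPs */
        static struct rte_flow_item_udp outer_udp;      /* fill in dst port 4789 */
        static struct rte_flow_item_vxlan outer_vxlan;  /* fill in the VNI */
        static struct rte_flow_item encap_pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH,   .spec = &outer_eth },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4,  .spec = &outer_ip },
            { .type = RTE_FLOW_ITEM_TYPE_UDP,   .spec = &outer_udp },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &outer_vxlan },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        static struct rte_flow_action_vxlan_encap encap = {
            .definition = encap_pattern,
        };

        action->type = RTE_FLOW_ACTION_TYPE_VXLAN_ENCAP;
        action->conf = &encap;
    }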

Raw Encap/Decap
Generic encap/decap actions: RTE_FLOW_ACTION_TYPE_RAW_[ENCAP | DECAP], added in v18.11
- The encap header is specified as raw data, via the data pointer in struct rte_flow_action_raw_encap
- Two actions need to be specified for L3 encap/decap:
  [ETH / IPV6 / TCP] -> decap L2 -> [IPV6 / TCP] -> encap MPLSoUDP -> [ETH / IPV4 / UDP / MPLS / IPV6 / TCP]
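A hedged sketch of that two-action sequence using the 18.11 raw actions; the MPLSoUDP header bytes are left as a placeholder, and treating the decap as stripping just the 14-byte Ethernet header is an assumption of this example:

    #include <stdint.h>
    #include <rte_flow.h>

    /* Pre-built outer ETH(14) + IPv4(20) + UDP(8) + MPLS(4) header: 46 bytes,
     * contents to be filled in by the application. */
    static uint8_t mplsoudp_hdr[46];

    /* Sketch: strip the original L2 header, then push the MPLSoUDP stack. */
    static void
    fill_l3_reencap_actions(struct rte_flow_action actions[3])
    {
        static struct rte_flow_action_raw_decap decap = {
            .size = 14,                     /* remove the inner Ethernet header */
        };
        static struct rte_flow_action_raw_encap encap = {
            .data = mplsoudp_hdr,
            .size = sizeof(mplsoudp_hdr),   /* push ETH/IPV4/UDP/MPLS */
        };

        actions[0].type = RTE_FLOW_ACTION_TYPE_RAW_DECAP;
        actions[0].conf = &decap;
        actions[1].type = RTE_FLOW_ACTION_TYPE_RAW_ENCAP;
        actions[1].conf = &encap;
        actions[2].type = RTE_FLOW_ACTION_TYPE_END;
        actions[2].conf = NULL;
    }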

Modify Header
RTE_FLOW_ACTION_TYPE_SET_XXX, XXX = [MAC_[SRC | DST] | IPV4_[SRC | DST] | IPV6_[SRC | DST] | TP_[SRC | DST]]
RTE_FLOW_ACTION_TYPE_[SET | DEC]_TTL
- Suitable for NAT and hairpin use cases
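As an illustration of how these actions compose for NAT, a minimal sketch rewriting the destination IPv4 address and L4 port and decrementing the TTL; the address and port values are placeholders:

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Sketch: destination NAT via header-modify actions, plus a TTL decrement. */
    static void
    fill_dnat_actions(struct rte_flow_action actions[4])
    {
        static struct rte_flow_action_set_ipv4 new_dst_ip = {
            .ipv4_addr = RTE_BE32(0x0a000002),      /* 10.0.0.2 */
        };
        static struct rte_flow_action_set_tp new_dst_port = {
            .port = RTE_BE16(8080),
        };

        actions[0].type = RTE_FLOW_ACTION_TYPE_SET_IPV4_DST;
        actions[0].conf = &new_dst_ip;
        actions[1].type = RTE_FLOW_ACTION_TYPE_SET_TP_DST;
        actions[1].conf = &new_dst_port;
        actions[2].type = RTE_FLOW_ACTION_TYPE_DEC_TTL;  /* no conf required */
        actions[2].conf = NULL;
        actions[3].type = RTE_FLOW_ACTION_TYPE_END;
        actions[3].conf = NULL;
    }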

Metadata Match
RTE_FLOW_ITEM_TYPE_META, added in v18.11
- Matches on egress mbuf->tx_metadata; the app sets mbuf->ol_flags = PKT_TX_METADATA
Useful to apply different encap/decap actions between VMs:
- VMs may have the same network topology, i.e., the same network headers
- The switch needs to distinguish different VMs, e.g., by using the VM ID as metadata
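A minimal sketch of the Tx-side tagging plus the corresponding META pattern item. The field names and byte order follow the 18.11 definitions as understood here and should be verified against the release in use; the VM-ID value 3 is a placeholder:

    #include <rte_byteorder.h>
    #include <rte_flow.h>
    #include <rte_mbuf.h>

    /* Sketch: tag an egress packet with a per-VM metadata value (Tx side). */
    static void
    tag_mbuf_with_vm_id(struct rte_mbuf *m, uint32_t vm_id)
    {
        m->tx_metadata = vm_id;             /* carried alongside the packet on Tx */
        m->ol_flags |= PKT_TX_METADATA;     /* tell the PMD the field is valid */
    }

    /* Sketch: egress pattern matching packets tagged with VM ID 3. */
    static void
    fill_meta_pattern(struct rte_flow_item pattern[2])
    {
        static struct rte_flow_item_meta meta_spec = {
            .data = RTE_BE32(3),            /* default mask: full 32-bit match */
        };

        pattern[0].type = RTE_FLOW_ITEM_TYPE_META;
        pattern[0].spec = &meta_spec;
        pattern[0].last = NULL;
        pattern[0].mask = NULL;
        pattern[1].type = RTE_FLOW_ITEM_TYPE_END;
        pattern[1].spec = NULL;
        pattern[1].last = NULL;
        pattern[1].mask = NULL;
    }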

Rony Efraim

Example Flow rules
Add an rte_flow rule on Tx to match on the vport (send-q/metadata) and the inner L4/L3/L2, with an action to encap and send to the wire, or to loop the packet back and set a vport ID / Rx metadata.
- Exception path: the Rx metadata will be the vport/VM ID that sent the packet
- VM -> local VM: the Rx metadata will be the destination vport/VM
Add an rte_flow rule on Rx to match on VXLAN + the inner L4/L3/L2, with an action to decap and set a vport ID / Rx metadata.
From VM (Tx): the switch app pulls from vhost and sends the buffer on the vport send-q, or on the hypervisor queue with metadata; there is no need to classify the packet or fetch the data buffer.
To VM (Rx): a single receive-q serves packets from all sources (loopback + wire); the app polls the NIC (PMD) and gets metadata with every packet. The metadata is used either to forward the packet to the VM or for further processing by the switch application.
[Slide diagram: three namespaces with VirtIO/vhost ports attached to a DPDK-switch app (switch rules DB, exception path, a queue per vport plus a hypervisor queue over the PMD and rte_flow), backed by the NIC HW eswitch with flow rules and the uplink.]
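A minimal sketch of the Rx-side rule described above, with the MARK action standing in for "set a vport ID / Rx metadata" (the PMD delivers the mark in mbuf->hash.fdir.hi with PKT_RX_FDIR_ID set); the VNI, inner protocols and lack of error handling are simplifications:

    #include <rte_flow.h>

    /* Sketch: match VXLAN VNI 100 + inner L2/L3/L4, decap the tunnel and tag
     * the packet with the vport ID so the switch app can dispatch it. */
    static struct rte_flow *
    create_rx_decap_rule(uint16_t port_id, uint32_t vport_id)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_vxlan vni_spec = { .vni = { 0, 0, 100 } };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vni_spec },
            { .type = RTE_FLOW_ITEM_TYPE_ETH },      /* inner L2 */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },     /* inner L3 */
            { .type = RTE_FLOW_ITEM_TYPE_TCP },      /* inner L4 */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_mark mark = { .id = vport_id };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP },
            { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error err;

        return rte_flow_create(port_id, &attr, pattern, actions, &err);
    }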

Zero copy on RX
The HW can steer traffic to a receive-q according to the vport ID / Rx metadata.
By using a receive-q per VM, the software switch application can offload the work of reading the Rx metadata, pulling a batch of packets and forwarding them to a specific vhost.
Using buffers from the vring, or buffers belonging to the VM, there is no need to copy the buffers.
Mellanox NICs support up to millions of receive queues.
[Slide diagram: same topology as the previous slide, with a dedicated receive-q per VM/vport steered by the HW eswitch.]
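A minimal sketch of steering one VM's traffic to a dedicated receive queue with the QUEUE action; matching on the outer VXLAN VNI here is a stand-in for whatever per-vport classification the HW eswitch actually uses, and the VNI and queue index are placeholders:

    #include <rte_flow.h>

    /* Sketch: steer one VM's tunneled traffic to its own Rx queue so the
     * switch app (or a zero-copy vhost datapath) can poll it directly. */
    static struct rte_flow *
    steer_vm_to_queue(uint16_t port_id, uint16_t queue_index)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_vxlan vni_spec = { .vni = { 0, 0, 100 } };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vni_spec },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = queue_index };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error err;

        return rte_flow_create(port_id, &attr, pattern, actions, &err);
    }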

QnA

Thank You!