Improve Performance of Kube-proxy and GTP-U using VPP

Similar documents
Singapore. Service Proxy, Container Networking & K8s. Acknowledgement: Pierre Pfister, Jerome John DiGiglio, Ray

A Hierarchical SW Load Balancing Solution for Cloud Deployment

Accelerate Cloud Native with FD.io

Agilio CX 2x40GbE with OVS-TC

A Universal Dataplane. FastData.io Project

vswitch Acceleration with Hardware Offloading CHEN ZHIHUI JUNE 2018

DPDK Vhost/Virtio Performance Report Release 18.05

Accelerating Contrail vrouter

DPDK Performance Report Release Test Date: Nov 16 th 2016

DPDK Vhost/Virtio Performance Report Release 18.11

Ed Warnicke, Cisco. Tomasz Zawadzki, Intel

Accelerating vrouter Contrail

Accelerate Service Function Chaining Vertical Solution with DPDK

FD.io VPP & Ligato Use Cases. Contiv-VPP CNI plugin for Kubernetes IPSEC VPN gateway

DPDK Intel NIC Performance Report Release 17.08

DPDK Intel NIC Performance Report Release 18.02

A Brief Guide to Virtual Switching Franck Baudin (Red Hat) Billy O Mahony (Intel)

DPDK Intel NIC Performance Report Release 18.05

DPDK Load Balancers RSS H/W LOAD BALANCER DPDK S/W LOAD BALANCER L4 LOAD BALANCERS L7 LOAD BALANCERS NOV 2018

Maximizing Network Throughput for Container Based Storage David Borman Quantum

Best Practices for Setting BIOS Parameters for Performance

DPDK Vhost/Virtio Performance Report Release 17.08

Next Gen Virtual Switch. CloudNetEngine Founder & CTO Jun Xiao

DPDK Summit China 2017

DPDK Intel Cryptodev Performance Report Release 18.08

Enabling DPDK Accelerated OVS in ODL and Accelerating SFC

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Life of a Packet. KubeCon Europe Michael Rubin TL/TLM in GKE/Kubernetes github.com/matchstick. logo. Google Cloud Platform

Measuring a 25 Gb/s and 40 Gb/s data plane

FD.io : The Universal Dataplane

A Universal Terabit Network Dataplane

PVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim

Dataplane Networking journey in Containers

Implemen'ng IPv6 Segment Rou'ng in the Linux Kernel

Virtual Switch Acceleration with OVS-TC

Understanding The Performance of DPDK as a Computer Architect

DPDK Tunneling Offload RONY EFRAIM & YONGSEOK KOH MELLANOX

SmartNIC Programming Models

On the cost of tunnel endpoint processing in overlay virtual networks

SmartNIC Programming Models

DPDK Intel Cryptodev Performance Report Release 17.11

Enabling Fast, Dynamic Network Processing with ClickOS

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

DPDK Roadmap. Tim O Driscoll & Chris Wright Open Networking Summit 2017

Netronome 25GbE SmartNICs with Open vswitch Hardware Offload Drive Unmatched Cloud and Data Center Infrastructure Performance

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

Implementing A High Performance Virtualized CPE Solution

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Building high performance network functions in VPP. Ole Trøan, VPP contributor FOSDEM 2018

Datacenter Network Solutions Group

Backend for Software Data Planes

The dark powers on Intel processor boards

Building a Platform Optimized for the Network Edge

Design and Implementation of Virtual TAP for Software-Defined Networks

Experiences in Building a 100 Gbps (D)DoS Traffic Generator

Intel s Architecture for NFV

New Approach to OVS Datapath Performance. Founder of CloudNetEngine Jun Xiao

Host Dataplane Acceleration: SmartNIC Deployment Models

Jim Harris Principal Software Engineer Intel Data Center Group

VPP Host Stack. TCP and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

The Power of Batching in the Click Modular Router

High Performance Cloud-native Networking K8s Unleashing FD.io

BIOS Parameters by Server Model

Accelerating VM networking through XDP. Jason Wang Red Hat

TLDK Overview. Transport Layer Development Kit Ray Kinsella February ray.kinsella [at] intel.com IRC: mortderire

Improving DPDK Performance

Modeling the impact of CPU properties to optimize and predict packet-processing performance

TLDK Overview. Transport Layer Development Kit Keith Wiles April Contributions from Ray Kinsella & Konstantin Ananyev

Accelerating 4G Network Performance

Share IETF understanding on User Plane of 3GPP 5G System Intend to be a part of the LS reply to User Plane Protocol Study in 3GPP

6WINDGate. White Paper. Packet Processing Software for Wireless Infrastructure

DPDK Summit 2016 OpenContrail vrouter / DPDK Architecture. Raja Sivaramakrishnan, Distinguished Engineer Aniket Daptari, Sr.

OpenFlow Software Switch & Intel DPDK. performance analysis

VALE: a switched ethernet for virtual machines

Reliably Scalable Name Prefix Lookup! Haowei Yuan and Patrick Crowley! Washington University in St. Louis!! ANCS 2015! 5/8/2015!

Link Virtualization based on Xen

Cloud Networking (VITMMA02) Network Virtualization: Overlay Networks OpenStack Neutron Networking

100 Gbps Open-Source Software Router? It's Here. Jim Thompson, CTO, Netgate

PacketShader: A GPU-Accelerated Software Router

Testing the Performance Impact of the Exact Match Cache

PE2G6BPi35 Six Port Copper Gigabit Ethernet PCI Express Bypass Server Adapter Intel based

G-NET: Effective GPU Sharing In NFV Systems

Kubernetes - Networking. Konstantinos Tsakalozos

Suricata Extreme Performance Tuning With Incredible Courage

Diffusion TM 5.0 Performance Benchmarks

Evolution of the netmap architecture

Project Calico v3.2. Overview. Architecture and Key Components. Project Calico provides network security for containers and virtual machine workloads.

Configuring and Benchmarking Open vswitch, DPDK and vhost-user. Pei Zhang ( 张培 ) October 26, 2017

DPDK Summit China 2017

Loadbalancer.org Virtual Appliance quick start guide v6.3

Session based high bandwidth throughput testing

Intel Speed Select Technology Base Frequency - Enhancing Performance

NFV go-live. Where are my containers? Franck Baudin Sr Principal Product Manager - OpenStack NFV May 9, 2018

TOWARDS FAST IP FORWARDING

Load Balancing Bloxx Web Filter. Deployment Guide v Copyright Loadbalancer.org

The Work of Containerized NFV Infrastructure on Arm Platform

TRex Realistic Traffic Generator

Improve VNF safety with Vhost-User/DPDK IOMMU support

28x 29x 30x [ 24x] 3.20GHz ( 133x24) CPU Clock Ratio CPU Frequency. CPU Host Clock Control [ Enable] CPU Host Frequency ( MHz ) 133

Mellanox NIC s Performance Report with DPDK Rev 1.0

Transcription:

Improve Performance of Kube-proxy and GTP-U using VPP Hongjun Ni (hongjun.ni@intel.com) Danny Zhou (danny.zhou@intel.com) Johnson Li (johnson.li@intel.com) Network Platform Group, DCG, Intel Acknowledgement: Jianfeng Tan

Agenda Enabling high performance kube-proxy Six ways to improve the performance of GTP-U Key takeaway 2

Introducing Vector Packet Processor Packet processing development platform Runs on commodity CPUs and leverages DPDK Runs as a Linux user-space application

VPP Graph Node and Plugin Framework dpdk-input Packet Vector ethernetinput mpls-ethernetinput ip4-input ip6-input arp-input ip4-lookup ip4-local ip4-udplookup ip4-loadbalance Kube-proxy ethernetoutput ip4- rewrite GTP-U

Enabling high performance kube-proxy Introducing kube-proxy Current pain point Proposed kube-proxy solution Kube-proxy dataplane GoVPP Performance 5

Introducing kube-proxy Watches addition and removal of Service and Endpoints. Installs iptables rules Captures traffic and select Pod Redirects traffic Reference: https://kubernetes.io/docs/concepts/services-networking/service/

Current Pain Point Supports userspace and iptables Uses kernel iptables NAT Performance degrades when service/endpoint pairs increase 7

Proposed kube-proxy solution API Server watch watch watch Kube-proxy CP Kube-proxy CP Kube-proxy CP GoVPP GoVPP GoVPP CP: Control Plane DP: Data Plane Kube-proxy DP Kube-proxy DP Kube-proxy DP Node 1 Node 1 Node 1

Kube-proxy Data Plane K8s-node Backend Pod 1 Pod IP: x.x.x.x Backend Pod 2 Pod IP: y.y.y.y Distribute traffic evenly virtio-user virtio-user Per flow stick to Pod vhost vhost Three service types: VPP DNAT Load balancing SNAT (1). ClusterIP: Port (2). NodeIP: NodePort (3). External LoadBalancer eth

Kube-proxy Data Plane Internals Supported Graph Nodes Other input Graph nodes ip-lookup kp4-nat4 kp6-nat6 kp4-nat6 kp4- nodeport kp6-nat4 udplookup kp6- nodeport ip4 -lookup input Other Graph nodes VIP->Pod Table ip6 -lookup Stores VIP->Pod session

Kube-proxy: External LoadBalancer VPP Kube-proxy K8s-node-1 Pod 1 VPP LB eth1 Kube-proxy DP Pod N VPP Router K8s-node-2 Pod 1 eth1 Kube-proxy DP Pod N K8s-node-3 Pod 1 eth1 Kube-proxy DP Pod N

GoVPP Golang toolset for VPP management VPP binary API (JSON) Go Structure Handle 250,000 binary API requests per second Reference: https://wiki.fd.io/images/f/fa/govpp-intro.pdf

Throughput (Mpps) Performance Kube-proxy Throughput For 64-Byte Packet 12 10 8 6 9.83 6.62 Linux iptables perf: < 400 kpps Scaling 4 2 0 Phy NIC vhost-user Test Case Input packet size (bytes) Output packet size (bytes) Load balance + DNAT + Routing 64 64 Reference: https://people.netfilter.org/kadlec/nftest.pdf

Six Ways to Improve Performance of GTP-U Cache table lookup result Bypass second ip-input Bypass first route lookup Dual-loop and Quad-loop Packet prefetching Bypass second route lookup 14

GTP-U Plugin Internals Typical GTP-U packet processed by VPP and GTP-U plugin Outer MAC header Outer IP header UDP header GTP-U header Inner IP header L4 header Payload GTP-U Plugin Supported Graph Nodes Other input Graph nodes ip-lookup udp-lookup gtpu-input gtpu-encap gtpu-bypass IP4 -input input Other Graph nodes GTP-U tunnel IP6 -input GTP-U Plugin Stores PDP context

Device Under Test for Performance Network Topology Hardware Configuration BIOS Configuration Traffic Gen (Trex) CPU DIMM Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz 2133 MHz, 64GB Total Enhanced Intel Speedstep Turbo Boost Processor C3 Enabled Enabled Disabled Port A Port B NIC 2x 82599ES 10-Gigabit SFI/SFP+ Network Connection Processor C6 Hyper-Threading Disabled Disabled PacketGen Ixia* 10 Gigabit Ethernet Traffic Generator Intel VT-d CPU Power and Performance Policy Enabled Performance Memory Freq. 2133 MHz Port A Port B Software Configuration Total Memory Size Memory RAS and Performance Configuration -> NUMA Optimized 64 GB ENABLED Core 0 OS Ubuntu 16.04.2 LTS QPI B/W 9.6 GT/s DUT (VPP + GTP-U) Kernel Linux version 4.4.0-62-generic DPDK 17.08 MLC Streamer MLC Spatial Prefetcher ENABLED ENABLED Legend Flow A Flow B VPP GTP-U 17.10-rc0 17.10-rc0 DCU Data Prefetcher DCU Instruction Prefetcher Direct Cache Access (DCA) 16ENABLED ENABLED ENABLED

Cache table lookup result Gtpu-input Read IP and TEID N key == stored key? Y Tunnel Lookup Retrieve stored tunnel index Store key and tunnel index Read tunnel entry

Bypass second ip-input dpdk-input Packet Vector ethernetinput ip4-input ip6-input arp-input Pros: Boost performance ip4-lookup ip4-local ip4-udplookup Cons : Security issue ip4-loadbalance gtpu-decap ethernetoutput ip4- rewrite ip4-lookup ip4-input

Bypass first route lookup dpdk-input ethernetinput Packet Vector gtpu packet: accelerate decap processing gtpu-bypss ip4-input none-gtpu packet ip4-lookup ip6-input ip4-local arp-input ip4-udplookup none-gtpu packet: an overhead with about 13 clocks per packet gtpu packet ip4-loadbalance gtpu-decap ethernetoutput ip4- rewrite ip4-lookup ip4-input

Throughput (Mpps) GTPU-Decap Performance and Analysis GTPU-Decap Throughput For 64-Byte Packet 12 10.4 10 8 6.84 7.29 8.32 6 4 2 0 raw cache table lookup bypss ip-input bypass route lookup Test Case Input packet size (bytes) Output packet size (bytes) Transport IP Routing + GTPU-Decap + Inner IP Routing 98 64

Dual-loop and Quad-loop, Packet Prefetching Reduce read latency Process packets in parallel

Bypass second route lookup dpdk-input Packet Vector ip4-input ip6-input arp-input ip4-lookup ip4-rewrite ip4-loadbalance gtpu-encap ethernetoutput ip4- rewrite ip4-lookup ethernetinput

Throughput (Mpps) GTPU-Encap Performance and Analysis GTPU-Encap Throughput For 64-Byte Packet 8.6 8.48 8.4 8.26 8.2 8.13 8 7.8 7.76 7.85 7.6 7.4 One-loop Dual-loop Quad-loop Quad-loop w Prefetch Bypass route Test Case Input packet size (bytes) Output packet size (bytes) Inner IP Routing + GTPU-Encap + Transport Routing 64 114

Key Takeaway Easy-to-use and flexible VPP plugin framework Kube-proxy plugin to boost DP s performance in Cloud environment Combination of a set of ways to optimize data plane performance Better platform for developing open source dataplane ingredients 24

Q&A