VPP Host Stack. TCP and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace


VPP Host Stack: TCP and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

VPP: A Universal Terabit Network Platform For Native Cloud Network Services
- Most Efficient on the Planet (EFFICIENCY)
- Superior Performance (PERFORMANCE)
- Flexible and Extensible (SOFTWARE DEFINED NETWORKING)
- Cloud Native (CLOUD NETWORK SERVICES)
- Open Source (LINUX FOUNDATION)
Breaking the Barrier of Software Defined Network Services: 1 Terabit of Services on a Single Intel Xeon Server!

How does it work? A Compute-Optimized SW Network Platform
1. Packet processing is decomposed into a directed graph of nodes.
2. Packets move through the graph nodes in vectors.
3. Graph nodes are optimized to fit inside the instruction cache.
4. Packets are pre-fetched into the data cache.
Each graph node implements a micro-NF, a micro-network-function processing packets. This makes use of modern Intel Xeon processor micro-architectures: the instruction cache and data cache stay hot, minimizing memory latency and usage.
[Figure: a vector of packets (Packet 0..10) flowing through graph nodes such as vhost-user-input, af-packet-input, dpdk-input, ethernet-input, arp-input, cdp-input, lldp-input, mpls-input, l2-input, ip4-input, ip6-input, ip4-lookup, ip4-load-balance, ip4-rewrite-transit, ip4-midchain, mpls-policy-encap and interface-output.]
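The vector-dispatch idea in points 1-4 can be sketched roughly as follows. This is a simplified illustration, not VPP's actual node API; the packet layout, node name and batch size are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy packet: just the fields this illustrative node touches. */
typedef struct {
    uint8_t  ttl;
    uint32_t dst;
} packet_t;

/* One illustrative graph node: decrement TTL across a whole vector of
 * packets. Processing a batch amortizes per-node setup cost and keeps
 * the node's instructions hot in the i-cache (point 3 above). */
static size_t ip4_rewrite_node(packet_t *pkts[], size_t n_pkts)
{
    size_t n_ok = 0;
    for (size_t i = 0; i < n_pkts; i++) {
        /* Pre-fetch a packet a few slots ahead so its data is already
         * in the d-cache by the time we touch it (point 4 above). */
        if (i + 4 < n_pkts)
            __builtin_prefetch(pkts[i + 4]);
        if (pkts[i]->ttl > 1) {
            pkts[i]->ttl--;
            n_ok++;       /* would be handed to the next graph node */
        }
    }
    return n_ok;
}
```

A real node also has to split the vector by next-node and handle errors; the point here is only the batch-plus-prefetch loop shape.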

Motivation: Container networking (kernel stack)
[Figure: two containers (PID 1234, PID 4321) exchanging data through glibc send()/recv(); each side traverses a socket FIFO, the kernel IP (routing) layer and a device.]

Motivation: Container networking (kernel stack plus overlays)
[Figure: the same two containers, but the path now also traverses ACL, SR, VXLAN, LISP, IP4/6, MPLS and Ethernet processing, with af_packet/dpdk devices in between, so each packet crosses multiple FIFOs, devices and two kernel IP (routing) layers.]

Why not this?
[Figure: the two containers (PID 1234, PID 4321) connected through shared-memory FIFOs directly to a user-space IP stack running over DPDK, bypassing the kernel.]

Host Stack
[Figure: an application attached to the host stack through rx/tx FIFOs in a shared-memory (shm) segment.]

Host Stack: Session Layer
- Maintains per-app state and conveys session events to/from apps
- Allocates and manages sessions/segments/fifos
- Isolates network resources via namespacing
- Lookup tables (5-tuple) and local/global session rule tables (filters)
- Support for pluggable transport protocols
- Binary/native C API for external/builtin applications
[Figure: app attached via rx/tx FIFOs in a shm segment.]
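The 5-tuple lookup mentioned above can be sketched as a small hash table keyed by connection tuple. The key layout, hash and function names here are invented for illustration; VPP's actual session lookup uses bihash tables:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 5-tuple session key (not VPP's actual layout). */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;              /* e.g. 6 = TCP */
} session_key_t;

#define TABLE_SLOTS 64

typedef struct {
    session_key_t key;
    int           session_index; /* -1 = empty slot */
} session_slot_t;

static session_slot_t table[TABLE_SLOTS];

static int key_equal(const session_key_t *a, const session_key_t *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

static uint32_t key_hash(const session_key_t *k)
{
    /* Toy mixing function; a real table uses a stronger hash. */
    return (k->src_ip ^ k->dst_ip ^ ((uint32_t)k->src_port << 16) ^
            k->dst_port ^ k->proto) % TABLE_SLOTS;
}

static void session_table_init(void)
{
    for (int i = 0; i < TABLE_SLOTS; i++)
        table[i].session_index = -1;
}

static void session_table_add(const session_key_t *k, int session_index)
{
    uint32_t h = key_hash(k);
    while (table[h].session_index != -1)   /* linear probing; assumes */
        h = (h + 1) % TABLE_SLOTS;         /* the table never fills up */
    table[h].key = *k;
    table[h].session_index = session_index;
}

static int session_table_lookup(const session_key_t *k)
{
    uint32_t h = key_hash(k);
    while (table[h].session_index != -1) {
        if (key_equal(&table[h].key, k))
            return table[h].session_index;
        h = (h + 1) % TABLE_SLOTS;
    }
    return -1;  /* no session for this 5-tuple */
}
```

The rule tables work the same way in spirit, but map (possibly wildcarded) tuples to filter actions rather than to session indices.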

Host Stack: SVM FIFOs
- Allocated within shared memory segments
- Fixed position and size
- Lock-free enqueue/dequeue, but atomic size increment
- Option to dequeue/peek data
- Support for out-of-order data enqueues
[Figure: app attached via rx/tx FIFOs in a shm segment.]
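The "lock-free enqueue/dequeue but atomic size increment" point can be made concrete with a minimal single-producer/single-consumer byte FIFO: each cursor is owned by exactly one side, so only the occupied-byte count is shared and needs atomic updates. This is an illustrative sketch, not VPP's svm_fifo_t:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FIFO_SIZE 8

typedef struct {
    uint8_t     data[FIFO_SIZE];
    uint32_t    head;     /* consumer cursor: consumer-owned, no lock */
    uint32_t    tail;     /* producer cursor: producer-owned, no lock */
    atomic_uint n_bytes;  /* shared occupancy: updated atomically */
} fifo_t;

static uint32_t fifo_enqueue(fifo_t *f, const uint8_t *buf, uint32_t len)
{
    uint32_t free_bytes = FIFO_SIZE - atomic_load(&f->n_bytes);
    uint32_t n = len < free_bytes ? len : free_bytes;
    for (uint32_t i = 0; i < n; i++)
        f->data[(f->tail + i) % FIFO_SIZE] = buf[i];
    f->tail = (f->tail + n) % FIFO_SIZE;
    atomic_fetch_add(&f->n_bytes, n);  /* publish bytes to consumer */
    return n;                          /* bytes actually enqueued */
}

static uint32_t fifo_dequeue(fifo_t *f, uint8_t *buf, uint32_t len)
{
    uint32_t avail = atomic_load(&f->n_bytes);
    uint32_t n = len < avail ? len : avail;
    for (uint32_t i = 0; i < n; i++)
        buf[i] = f->data[(f->head + i) % FIFO_SIZE];
    f->head = (f->head + n) % FIFO_SIZE;
    atomic_fetch_sub(&f->n_bytes, n);  /* return space to producer */
    return n;
}
```

Peek would read from head without moving it; out-of-order enqueue additionally needs per-segment bookkeeping, which is omitted here.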

Host Stack: TCP
- Clean-slate implementation
- Complete state machine implementation
- Connection management and flow control (window management)
- Timers and retransmission, fast retransmit, SACK
- NewReno congestion control, SACK-based fast recovery
- Checksum offloading
- Linux compatibility tested with the IWL protocol tester
[Figure: app attached via rx/tx FIFOs in a shm segment.]
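The NewReno behavior mentioned above boils down to a few congestion-window rules. The sketch below shows the classic byte-counting form; the struct and function names are invented here and are not VPP's tcp_connection_t API:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative NewReno-style congestion state (units: bytes). */
typedef struct {
    uint32_t cwnd;      /* congestion window */
    uint32_t ssthresh;  /* slow-start threshold */
    uint32_t mss;       /* max segment size */
} tcp_cc_t;

/* Called for each ACK of new data. */
static void cc_ack_received(tcp_cc_t *cc, uint32_t bytes_acked)
{
    if (cc->cwnd < cc->ssthresh)
        cc->cwnd += bytes_acked;                  /* slow start: grow
                                                     exponentially */
    else
        cc->cwnd += cc->mss * cc->mss / cc->cwnd; /* congestion avoidance:
                                                     ~1 MSS per RTT */
}

/* Called on entering fast recovery (e.g. 3 duplicate ACKs). */
static void cc_fast_recovery(tcp_cc_t *cc)
{
    cc->ssthresh = cc->cwnd / 2;
    if (cc->ssthresh < 2 * cc->mss)
        cc->ssthresh = 2 * cc->mss;
    cc->cwnd = cc->ssthresh + 3 * cc->mss;        /* window inflation for
                                                     the 3 dup ACKs */
}
```

SACK-based recovery then tracks exactly which byte ranges are missing instead of inferring them from duplicate ACK counts.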

Host Stack: Comms Library (VCL)
- Comms library (VCL) that apps can link against
- LD_PRELOAD library for legacy apps
- epoll support
[Figure: app linked against VCL, attached via rx/tx FIFOs in a shm segment.]

Application Attachment
[Figure: the app issues attach, then bind (server) or connect (client); the session layer allocates a shm segment for the app.]

Establishment
[Figure sequence, client and server:
1. The server attaches, binds and listens.
2. The client attaches and issues connect; the session layer opens a connection.
3. TCP performs the three-way handshake.
4. The connect succeeds on the client side; the server is notified of the new client.
5. rx/tx FIFOs are allocated in shm segments on both sides; the client receives a connect reply and the server an accept notification.]

Data Transfer
[Figure: the client app writes data, which is copied into its tx fifo and a write event is queued; TCP provides congestion control and reliable transport; on the server side a write event is delivered and data is copied from the rx fifo into the app's buffer on read.]
Not yet part of CSIT, but some rough numbers on an E2690: 200k CPS and 8 Gbps/core!

Redirected Connections (Cut-through)
[Figure sequence: the server binds; when a client on the same host connects, the session layer redirects the connect, and the two apps exchange data directly over shared-memory FIFOs, cutting through the stack.]
Throughput is memory-bandwidth constrained: ~120 Gbps!

Ongoing work
- Overall integration with k8s, Istio/Envoy
- Rx policer / tx pacer
- TSO
- New congestion control algorithms
- PMTU discovery
- Optimization/hardening/testing
- VCL/LD_PRELOAD: iperf, nginx, wget, curl

Next steps: Get involved
- Get the Code, Build the Code, Run the Code
- Session layer: src/vnet/session; TCP: src/vnet/tcp; SVM: src/svm; VCL: src/vcl
- Read/Watch the Tutorials
- Join the Mailing Lists

Thank you! Questions? Florin Coras, email: fcoras@cisco.com, irc: florinc

Multi-threading
[Figure: App1 attached via per-worker rx/tx fifo pairs; Core 0 and Core 1 each run their own IP/TCP instance over DPDK.]

Features: Namespaces
[Figure: an app requests access to a vpp namespace with a namespace id + secret; namespaces (ns1, ns2, ns3) map to IP instances and fib tables (fib1, fib2).]

Features: Tables
[Figure: App1 requests access to global and/or local scope; each namespace (ns1, ns2) has a local table, and a global table is associated with fib1.]

Features: Tables
- Both table types hold session rules that can be used for filtering
- Local tables are namespace specific and can be used for egress filtering
- Global tables are fib-table specific and can be used for ingress filtering
[Figure: local tables attached to ns1 and ns2, global table attached to fib1.]
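The filtering described above amounts to matching a connection's tuple against an ordered rule list and applying the first hit. A minimal sketch, with invented names and a simplified match on destination only (VPP's session rule tables match full, maskable 5-tuples and can also redirect):

```c
#include <assert.h>
#include <stdint.h>

typedef enum { RULE_ALLOW = 0, RULE_DENY = 1 } rule_action_t;

/* One illustrative session rule: match dst_ip under dst_mask and,
 * optionally, an exact destination port (0 = any port). */
typedef struct {
    uint32_t      dst_ip, dst_mask;
    uint16_t      dst_port;
    rule_action_t action;
} session_rule_t;

/* First matching rule wins; default action is allow. */
static rule_action_t rules_apply(const session_rule_t *rules, int n_rules,
                                 uint32_t dst_ip, uint16_t dst_port)
{
    for (int i = 0; i < n_rules; i++) {
        if ((dst_ip & rules[i].dst_mask) == rules[i].dst_ip &&
            (rules[i].dst_port == 0 || rules[i].dst_port == dst_port))
            return rules[i].action;
    }
    return RULE_ALLOW;
}
```

A local table would run such rules at connect time inside one namespace (egress); a global table would run them per fib table on incoming connects (ingress).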