VPP Host Stack. Transport and Session Layers. Florin Coras, Dave Barach

Similar documents
VPP Host Stack. Transport and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

VPP Host Stack. TCP and Session Layers. Florin Coras, Dave Barach, Keith Burns, Dave Wallace

Ed Warnicke, Cisco. Tomasz Zawadzki, Intel

Accelerate Cloud Native with FD.io

Empower Diverse Open Transport Layer Protocols in Cloud Networking GEORGE ZHAO DIRECTOR OSS & ECOSYSTEM, HUAWEI

Light & NOS. Dan Li Tsinghua University

A Universal Dataplane. FastData.io Project

A Universal Terabit Network Dataplane

Programmable Overlays with VPP

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

Different Layers Lecture 20

Advanced Computer Networking. CYBR 230 Jeff Shafer University of the Pacific QUIC

Linux SCTP is catching up and going above

Building a Platform Optimized for the Network Edge

High Performance Packet Processing with FlexNIC

Stream Control Transmission Protocol (SCTP)

Advanced Computer Networks. End Host Optimization

OSI Transport Layer. Network Fundamentals Chapter 4. Version Cisco Systems, Inc. All rights reserved. Cisco Public 1

MOVING FORWARD WITH FABRIC INTERFACES

MatrixDTLS Developer s Guide

Light: A Scalable, High-performance and Fully-compatible User-level TCP Stack. Dan Li ( 李丹 ) Tsinghua University

Solarflare and OpenOnload Solarflare Communications, Inc.

DPDK Tunneling Offload RONY EFRAIM & YONGSEOK KOH MELLANOX

Transport Layer (TCP/UDP)

OpenFlow Software Switch & Intel DPDK. performance analysis

Data Path acceleration techniques in a NFV world

TLDK Overview. Transport Layer Development Kit Keith Wiles April Contributions from Ray Kinsella & Konstantin Ananyev

Software Datapath Acceleration for Stateless Packet Processing

Container Adoption for NFV Challenges & Opportunities. Sriram Natarajan, T-Labs Silicon Valley Innovation Center

ANIC Host CPU Offload Features Overview An Overview of Features and Functions Available with ANIC Adapters

TLDK Overview. Transport Layer Development Kit Ray Kinsella February ray.kinsella [at] intel.com IRC: mortderire

Much Faster Networking

Accelerate Network Protocol Stack Performance and Adoption in the Cloud Networking via DMM

UNIT IV -- TRANSPORT LAYER

UDP Encapsulation in Linux netdev0.1 Conference February 16, Tom Herbert

TCP congestion control:

Transport Layer Overview

Networking at the Speed of Light

TCP/IP. Chapter 5: Transport Layer TCP/IP Protocols

QuickSpecs. HP Z 10GbE Dual Port Module. Models

RDMA programming concepts

Networking Subsystem in Linux. Manoj Naik IBM Almaden Research Center

TCP/IP Protocol Suite 1

Eduardo

Securing Network Traffic Tunneled Over Kernel managed TCP/UDP sockets

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

URDMA: RDMA VERBS OVER DPDK

TSIN02 - Internetworking

PLEASE READ CAREFULLY BEFORE YOU START

PCI Express x8 Quad Port 10Gigabit Server Adapter (Intel XL710 Based)

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance

PLUSOPTIC NIC-PCIE-2SFP+-V2-PLU

CS 4390 Computer Networks. Transport Services and Protocols

IsoStack Highly Efficient Network Processing on Dedicated Cores

Intro to LAN/WAN. Transport Layer

Cisco Ultra Packet Core High Performance AND Features. Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018

Page 1. Review: Internet Protocol Stack. Transport Layer Services. Design Issue EEC173B/ECS152C. Review: TCP

Mark Falco Oracle Coherence Development

ECE 650 Systems Programming & Engineering. Spring 2018

bitcoin allnet exam review: transport layer TCP basics congestion control project 2 Computer Networks ICS 651

Introduction to Routers and LAN Switches

User Datagram Protocol

Implementation and Analysis of Large Receive Offload in a Virtualized System

Singapore. Service Proxy, Container Networking & K8s. Acknowledgement: Pierre Pfister, Jerome John DiGiglio, Ray

Improve Performance of Kube-proxy and GTP-U using VPP

DetNet. Flow Definition and Identification, Features and Mapping to/from TSN. DetNet TSN joint workshop IETF / IEEE 802, Bangkok

Internet Layers. Physical Layer. Application. Application. Transport. Transport. Network. Network. Network. Network. Link. Link. Link.

Single Root I/O Virtualization (SR-IOV) and iscsi Uncompromised Performance for Virtual Server Environments Leonid Grossman Exar Corporation

A Look at Intel s Dataplane Development Kit

ntop Users Group Meeting

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

fd.io Intro Mark Gray fd.io Foundation 1

IOV hardware and software plumbing for VMware ESX - journey to uncompromised I/O Virtualization and Convergence.

Dataplane Networking journey in Containers

Effect of SCTP Multistreaming over Satellite Links

Transport Layer. Chapter 3: Transport Layer

Introduction to Networking. Operating Systems In Depth XXVII 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

MIDTERM EXAMINATION #2 OPERATING SYSTEM CONCEPTS U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E

NTRDMA v0.1. An Open Source Driver for PCIe NTB and DMA. Allen Hubbe at Linux Piter 2015 NTRDMA. Messaging App. IB Verbs. dmaengine.h ntb.

Designing a Resource Pooling Transport Protocol

Data Transport over IP Networks

Virtualization, Xen and Denali

SoftRDMA: Rekindling High Performance Software RDMA over Commodity Ethernet

Evaluation of virtualization and traffic filtering methods for container networks

TS Open Day Data Center Fibre Channel over IP

Transport layer. UDP: User Datagram Protocol [RFC 768] Review principles: Instantiation in the Internet UDP TCP

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ

ETHERNET OVER INFINIBAND

EE 122: Transport Protocols. Kevin Lai October 16, 2002

Page 1. Review: Internet Protocol Stack. Transport Layer Services EEC173B/ECS152C. Review: TCP. Transport Layer: Connectionless Service

Transport layer. Review principles: Instantiation in the Internet UDP TCP. Reliable data transfer Flow control Congestion control

An Intelligent NIC Design Xin Song

CSC 401 Data and Computer Communications Networks

QUIZ: Longest Matching Prefix

ECE 650 Systems Programming & Engineering. Spring 2018

A Quest for Predictable Latency Adventures in Java Concurrency. Martin Thompson

6.9. Communicating to the Outside World: Cluster Networking

On the cost of tunnel endpoint processing in overlay virtual networks

fd.io vpp and containers

Guide To TCP/IP, Second Edition UDP Header Source Port Number (16 bits) IP HEADER Protocol Field = 17 Destination Port Number (16 bit) 15 16

Transcription:

Host Stack Transport and Layers Florin Coras, Dave Barach

- A Universal Terabit Network Platform For Native Cloud Network Services Most Efficient on the Planet EFFICIENCY Superior Performance PERFORMANCE Flexible and Extensible SOFTWARE DEFINED NETWORKING Cloud Native CLOUD NETWORK SERVICES Open Source LINUX FOUNDATION Breaking the Barrier of Software Defined Network Services 1 Terabit Services on a Single Intel Xeon Server!

Motivation: Container networking PID 1234 PID 4321 glibc send() recv() FIFO kernel FIFO IP (routing) IP (routing) device device

Motivation: Container networking PID 1234 send() etc etc etc ACL, SR, VXLAN, LISP IP4/6 MPLS Ethernet PID 4321 recv() FIFO af_packet/tap dpdk af_packet/tap FIFO IP (routing) FIFO dpdk FIFO IP (routing) device device device device device

Why not this? PID 1234 PID 4321 send() recv() FIFO FIFO IP DPDK

Host Stack App rx tx shm segment

Host Stack: Layer App Maintains per app state and conveys to/from session events Allocates and manages sessions/segments/fifos Isolates network resources via namespacing lookup tables (5-tuple) and local/global session rule tables (filters) Support for pluggable transport protocols Binary/native C API for external/builtin applications rx tx shm segment

Host Stack: SVM FIFOs App rx tx shm segment Allocated within shared memory segments with or without file backing (ssvm/memfd) Fixed position and size Lock free enqueue/dequeue but atomic size increment Option to dequeue/peek data Support for out-of-order data enqueues

Host Stack: App Clean-slate implementation Complete state machine implementation Connection management and flow control (window management) Timers and retransmission, fast retransmit, SACK NewReno congestion control, SACK based fast recovery Checksum offloading Linux compatibility tested with IWL protocol tester rx tx shm segment

Host Stack: more transports App rx tx shm segment SCTP UDP TLS

Host Stack: Comms Library (VCL) Comms library (VCL) apps can link against LD_PRELOAD library for legacy apps epoll App rx tx shm segment

Application Attachment attach bind (server) connect (client) App shm segment

Establishment Client Server attach bind listen

Establishment attach connect Client Server attach bind open listen

Establishment Client Server handshake

Establishment Client Server connect succeeded handshake new client

Establishment Client Server shm segment rx tx connect reply accept notify rx tx shm segment

Data Transfer write Client Server read tx write evt rx write evt rx tx rx tx copy to buffer copy to fifo Congestion control Reliable transport

Data Transfer write Client Server read tx write evt rx write evt rx tx rx tx copy to buffer copy to fifo Congestion control Reliable transport Some rough numbers on a E2699: ~12Gbps/core (1.5k MTU), ~20Gbps/core (9k MTU), ~185k CPS!

Data Transfer: Dgram Transports Client Server shm segment rx tx connect listen rx tx shm segment Data and dgram header dequeued from fifo UDP UDP Data enqueued in fifo with dgram hdr

Redirected Connections (Cut-through) Client Server bind

Redirected Connections (Cut-through) Client Server connect redirect

Redirected Connections (Cut-through) Client Server tracks these sessions, allocates ssvm segments and asks both peers to map them.

Redirected Connections (Cut-through) Throughput is memory bandwidth constrained: ~120Gbps! Client Server

Multi-threading for stream connections App1 rx tx rx tx Connections/sessions pinned to a thread Per-thread data structures/state IP IP Core 0 DPDK Core 1

Features: Namespaces App Request access to vpp ns + secret Namespaces are configured independently and associate applications to network layer resources like interfaces and fib tables IP IP IP fib1 fib2 ns1 ns2 ns3

Features: Tables App1 Request access to global and/or local scope NS Local Table NS Local Table Global Table fib1 ns1 ns2

Features: Tables App1 Both table have rules table that can be used for filtering Local tables are namespace specific and can be used for egress filtering Global tables are fib table specific and can be used for ingress filtering NS Local Table NS Local Table Global Table fib1 ns1 ns2

TLS App App TLS App TLS context App rx TLS Engine (openssl, mbedtls) rx tx tx TLS App registers as transport at init time TLS protocol implementation handled by plugin engines. We support openssl and mbedtls Client app registers key and certificate via api and requests tls as session transport CA certs read at TLS app init time. Defaults to reading /etc/ssl/certs/cacertificates.crt Ping and Ray from Intel working on accelerating the openssl engine with QAT cards

TLS App App TLS App TLS context App rx TLS Engine (openssl, mbedtls) rx tx tx TLS App registers as transport at init time TLS protocol implementation handled by plugin engines. We support openssl and mbedtls Client app registers key and certificate via api and requests tls as session transport CA certs read at TLS app init time. Defaults to reading /etc/ssl/certs/cacertificates.crt Ping and Ray from Intel working on accelerating the openssl engine with QAT cards Some rough OpenSSL numbers on a E2699: ~1Gbps/core (no hw accel)

Ongoing work Overall integration with k8s Istio/Envoy Rx policer/tx pacer TSO New congestion control algorithms PMTU discovery Optimization/hardening/testing

Next steps Get involved Get the Code, Build the Code, Run the Code layer: src/vnet/session : src/vnet/tcp SVM: src/svm VCL: src/vcl Read/Watch the Tutorials Read/Watch Tutorials Join the Mailing Lists

Thank you!? Florin Coras email:(fcoras@cisco.com) irc: florinc