TOWARDS FAST IP FORWARDING

IP FORWARDING PERFORMANCE IMPROVEMENT AND MEASUREMENT IN FREEBSD
Nanako Momiyama, Keio University
25th September 2016, EuroBSDcon 2016

OUTLINE
  Motivation
  Design and implementation
    Applying fast packet I/O and fast IP lookup to the FreeBSD network stack
  Measurement results
  Problem analysis
  Approach (ongoing work)
  Conclusion

MOTIVATION
  Software packet forwarding has played an important role in general-purpose OSes
    L2 bridging, IP routing, firewalls, etc.
  Increasing network capacities (10GbE, 40GbE, ...) pushed people out of the kernel
    User-space packet forwarding on top of netmap [1] or DPDK [2]
  Difficulties with using them in production are beginning to arise
    API/CLI compatibility, port scalability (NICs, VMs), features and isolation
  It's time to bridge the performance gap between kernel-based packet forwarding (1-2 Mpps) and user-space forwarding (> 10 Mpps)

STARTING POINT
  L3 IP forwarding
    Supports the Internet
    Useful for datacenters and VM back-ends
    An L2 network doesn't scale
  [Figure: several VMs attached to a vrouter running on a server]

WHERE IS THE PERFORMANCE BOTTLENECK?
  Default FreeBSD can forward packets at only 1.4 Mpps (the 10GbE line rate is 14.88 Mpps)
  Packet I/O?
    Was a main bottleneck for packet forwarding
    Now there are several solutions that achieve the 10GbE line rate: netmap, DPDK
  IP routing table lookup?
    Hardware appliances have TCAM for fast lookup
    Now there are several fast routing lookup algorithms for software: SAIL [3], DXR [4], Poptrie [5]
  What if we bring these techniques into FreeBSD?
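The 14.88 Mpps figure follows from standard Ethernet framing overhead. The short calculation below is a sketch of that arithmetic, assuming minimum-size 64-byte frames plus the usual 8-byte preamble and 12-byte inter-frame gap; the constant and function names are illustrative only.

```python
# Worked arithmetic for the 10GbE line rate with minimum-size frames.
LINK_BPS = 10_000_000_000   # 10 Gbit/s link speed
FRAME_BYTES = 64            # minimum Ethernet frame, including CRC
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
IFG_BYTES = 12              # inter-frame gap


def line_rate_pps(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Packets per second that fit on the wire for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    return link_bps / wire_bits


rate = line_rate_pps(FRAME_BYTES)
print(f"{rate / 1e6:.2f} Mpps")            # line rate in millions of packets/s
print(f"{1e9 / rate:.1f} ns per packet")   # time budget for each packet
```

At line rate a forwarder therefore has roughly 67 ns per packet, which is why every layer on the forwarding path matters.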

DESIGN AND IMPLEMENTATION
  Design overview
    FreeBSD for the control plane
      The OS network stack, to preserve existing APIs
    VALE [6] + DXR for the forwarding plane
      VALE for fast, scalable packet I/O
      DXR for fast IP route lookup
  [Figure: FreeBSD default network stack. User space: application, routing socket. Kernel OS stack: IP with radix tree, Ethernet, device I/O]

VALE OVERVIEW
  VALE is a software switch
    Runs in the kernel
    Part of the netmap framework
      netmap is a fast packet I/O framework that enables applications to send and receive packets at 10GbE line rate
  VALE works as an L2 learning switch by default
    Packets do NOT go through the OS network stack; VALE just forwards them from one port to another
    The L2 switch logic can be replaced with a different module
  [Figure: left, default I/O (kernel OS stack: IP, Ethernet, device I/O); right, VALE with L2 learning bridge (kernel: switch fabric, switch logic)]
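The split between a switch fabric and a replaceable switch logic can be pictured with a toy model. This is only a sketch of the idea in Python, not VALE's kernel API: the class names, the `BROADCAST` marker, and the `(src_mac, dst_mac, in_port)` signature are all assumptions made for illustration.

```python
from typing import Callable, Dict

BROADCAST = 255  # hypothetical marker meaning "flood to every other port"


class LearningBridge:
    """Toy L2 learning logic: remember the port each source MAC was seen on."""

    def __init__(self) -> None:
        self.mac_to_port: Dict[str, int] = {}

    def lookup(self, src_mac: str, dst_mac: str, in_port: int) -> int:
        self.mac_to_port[src_mac] = in_port              # learn the sender
        return self.mac_to_port.get(dst_mac, BROADCAST)  # known port or flood


class SwitchFabric:
    """Moves packets between ports; the forwarding decision is pluggable."""

    def __init__(self, logic: Callable[[str, str, int], int]) -> None:
        self.logic = logic

    def forward(self, src_mac: str, dst_mac: str, in_port: int) -> int:
        return self.logic(src_mac, dst_mac, in_port)


bridge = LearningBridge()
fabric = SwitchFabric(bridge.lookup)
print(fabric.forward("aa:aa", "bb:bb", 0))  # destination not learned yet
print(fabric.forward("bb:bb", "aa:aa", 1))  # aa:aa was learned on port 0
```

Swapping `bridge.lookup` for another callable is the analogue of replacing the L2 logic with a different module, such as an L3 one.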

NEW SWITCH LOGIC IMPLEMENTATION
  Create a new function as a new switch logic (L3 module) in VALE
    Use VALE for packet I/O and the OS network stack for L2/L3
    Make a fake mbuf in VALE and pass it to the OS network stack
    The OS stack embeds the route lookup result in an unused mbuf field
    Before if_transmit(), force a return so that VALE transmits the packets
  [Figure: VALE with L3 module. Kernel: OS stack (IP, Ethernet) above the switch fabric; the switch logic (L3 module) passes a fake mbuf to the OS stack]

DXR OVERVIEW
  DXR is a fast IPv4 route lookup algorithm
    Creates compact data structures from a large routing table (radix tree)
    The structures fit into CPU caches
    See the DXR paper [4] for more details
  [Figure: the default routing structure generates the DXR compact FIB, which consists of a lookup table (direct indexing), a range table (binary search), and a next hop table (interface and gateway address). Example next hop entries: 0: 1.2.3.1, 1: 1.2.3.4, 2: 4.5.6.7, 3: 4.5.6.8. Example range chunk: 0x0000 nh #0, 0x0200 nh #3, 0x0800 nh #1. Modified from Figure 1 of Zec, Marko, Luigi Rizzo, and Miljenko Mikuc. "DXR: towards a billion routing lookups per second in software." ACM SIGCOMM Computer Communication Review 42.5 (2012): 29-36.]
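The three-table structure in the figure can be sketched as follows. This is a simplified illustration of the lookup idea (direct indexing on the upper 16 bits, then binary search over ranges of the lower 16 bits), not the authors' implementation: the table contents are filled in by hand rather than generated from a radix tree, and the interface names, the 10.0.0.0/16 and 192.168.0.0/16 placements, and the tuple encoding are assumptions.

```python
import bisect
import ipaddress

# Next hop table: (interface, gateway). Gateways follow the figure's example;
# interface names are hypothetical.
next_hops = [("if0", "1.2.3.1"), ("if1", "1.2.3.4"),
             ("if2", "4.5.6.7"), ("if3", "4.5.6.8")]

DIRECT, RANGE = 0, 1

# Lookup table: one entry per value of the upper 16 address bits.
# (DIRECT, nh, 0) resolves immediately; (RANGE, base, count) points to a
# chunk of the range table.
lookup_table = [(DIRECT, 0, 0)] * 65536
lookup_table[0xC0A8] = (DIRECT, 2, 0)   # 192.168.0.0/16 -> next hop 2
lookup_table[0x0A00] = (RANGE, 0, 3)    # 10.0.0.0/16 needs a range search

# Range table: sorted start offsets (lower 16 bits) and their next hops.
range_starts = [0x0000, 0x0200, 0x0800]
range_nh = [0, 3, 1]


def dxr_lookup(dst: str) -> int:
    """Return the next hop index for an IPv4 destination address."""
    addr = int(ipaddress.IPv4Address(dst))
    hi, lo = addr >> 16, addr & 0xFFFF
    kind, a, b = lookup_table[hi]          # step 1: direct indexing
    if kind == DIRECT:
        return a
    # Step 2: binary search for the last range starting at or below `lo`.
    i = bisect.bisect_right(range_starts, lo, a, a + b) - 1
    return range_nh[i]


for ip in ("192.168.0.1", "10.0.1.5", "10.0.2.0", "10.0.9.1"):
    print(ip, "->", next_hops[dxr_lookup(ip)])
```

Every chunk starts at offset 0x0000, so the binary search always finds a range; most lookups end after the single direct-index step, which is what keeps the structure cache friendly.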

DXR IMPLEMENTATION
  Ported the DXR patch for FreeBSD 8.0 to FreeBSD 12.0-CURRENT
  DXR builds and uses new compact data structures based on the OS radix tree
  DXR integration
    A DXR-specific lookup function is called instead of ip_findroute()
  [Figure: user space: socket. Kernel OS stack: IP with radix tree and DXR FIB, Ethernet, device I/O]

EXPERIMENTAL SETUP
  Machine spec
    OS: FreeBSD (12.0-CURRENT, 04/08/16 snapshot)
    CPU: Intel(R) Core(TM) i7-3930k CPU @ 3.20GHz, 6 cores
    NIC: Intel X520 10GbE dual-port
  Method
    Two machines connected back-to-back
    Generate 10GbE line-rate traffic using the pkt-gen application
    Measure packet rates forwarded by the router machine
  Setting
    Packet size is 64 bytes (incl. Ethernet CRC)
    Routing table size is minimal (fewer than 10 entries)
  [Figure: send-and-receive machine (pkt-gen tx, pkt-gen rx) connected to the router machine]

RESULTS
  Default FreeBSD
    1.43 Mpps out of the 14.88 Mpps 10GbE line rate
  Throughput: 1.43 Mpps

  Stage      Function        Implementation
  I/O        packet input    device I/O
  Protocol   L2 input        if_input (if_ethersubr.c)
  Protocol   L3              ip_input
  Protocol   Route lookup    ip_fastfwd
  Protocol   L2 output       if_output (if_ethersubr.c)
  I/O        packet output   device I/O

RESULTS
  Default I/O + DXR lookup
    Using DXR lookup instead of the FreeBSD default routing lookup (ip_findroute())
    1.66 Mpps out of the 14.88 Mpps 10GbE line rate
    Replacing the lookup part saves 97 ns per packet
  Throughput: 1.66 Mpps

  Stage      Function        Implementation
  I/O        packet input    device I/O
  Protocol   L2 input        if_input (if_ethersubr.c)
  Protocol   L3              ip_input
  Protocol   Route lookup    DXR
  Protocol   L2 output       if_output (if_ethersubr.c)
  I/O        packet output   device I/O

RESULTS
  VALE + default routing lookup
    Replace the FreeBSD default I/O with VALE
    1.95 Mpps out of the 14.88 Mpps 10GbE line rate
    Replacing packet I/O saves 187 ns per packet
  Throughput: 1.95 Mpps

  Stage      Function        Implementation
  I/O        packet input    netmap
  Protocol   L2 input        if_input (if_ethersubr.c)
  Protocol   L3              ip_input
  Protocol   Route lookup    ip_fastfwd
  Protocol   L2 output       if_output (if_ethersubr.c)
  I/O        packet output   netmap

RESULTS
  VALE + DXR lookup
    Replace the FreeBSD default I/O with VALE and use DXR lookup
    2.43 Mpps out of the 14.88 Mpps 10GbE line rate
    Slightly (1 Mpps) faster than default FreeBSD, but still SLOW
  Throughput: 2.43 Mpps

  Stage      Function        Implementation
  I/O        packet input    netmap
  Protocol   L2 input        if_input (if_ethersubr.c)
  Protocol   L3              ip_input
  Protocol   Route lookup    DXR
  Protocol   L2 output       if_output (if_ethersubr.c)
  I/O        packet output   netmap

RESULTS AND TAKEAWAY

  Module                      Throughput
  Default (baseline)          1.43 Mpps
  Default I/O + DXR lookup    1.66 Mpps
  VALE + default lookup       1.95 Mpps
  VALE + DXR lookup           2.43 Mpps
  VALE L2 switch              12.39 Mpps

  The VALE L2 switch itself can achieve 12.39 Mpps
  Why does the 10 Mpps gap between the L2 and L3 modules exist?
    We should investigate which parts take time
    Packet I/O and route lookup are not very expensive anymore
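The per-packet savings quoted in the results (97 ns for the lookup, 187 ns for packet I/O) can be recovered from the throughput numbers, since the time per packet is the inverse of the packet rate. The snippet below is a sketch of that conversion using the measured rates; the helper name is illustrative, and small differences from the quoted values come from rounding of the Mpps figures.

```python
# Convert the measured throughputs into time spent per packet.
throughput_mpps = {
    "Default (baseline)": 1.43,
    "Default I/O + DXR lookup": 1.66,
    "VALE + default lookup": 1.95,
    "VALE + DXR lookup": 2.43,
    "VALE L2 switch": 12.39,
}


def ns_per_packet(mpps: float) -> float:
    """Time per packet in nanoseconds for a rate in millions of packets/s."""
    return 1000.0 / mpps


baseline_ns = ns_per_packet(throughput_mpps["Default (baseline)"])
for name, mpps in throughput_mpps.items():
    t = ns_per_packet(mpps)
    print(f"{name:26s} {t:7.1f} ns/pkt   saves {baseline_ns - t:6.1f} ns")
```

Seen this way, the L3 module with VALE and DXR still spends several hundred nanoseconds per packet more than the plain L2 switch, which is the gap the following measurements break down.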

MEASUREMENT METHODOLOGY
  Hardcode the output interface in VALE in advance
  Force a return at several vantage points
  Receive the packets on the send-and-receive machine and measure rates
  [Figure: VALE and DXR. Forced-return points are marked along the kernel path through Ethernet, IP, and the DXR FIB in the OS stack, above the VALE switch fabric and switch logic (L3 module)]

VALE + DXR LOOKUP
  Which part consumes time?
  [Figure: VALE and DXR, with the throughput measured when returning at each vantage point]

  Vantage point                     Throughput
  return before if_input()          14.36 Mpps
  return before ip_input()          5.32 Mpps
  return before ip_tryforward()     4.64 Mpps
  return before if_output()         4.64 Mpps
  return before if_transmit()       3.66 Mpps
  no forced return                  2.44 Mpps

  Per-stage costs annotated on the figure: 36 ns and 118 ns on the input side, 49 ns and 137 ns on the output side

MEASUREMENT CONCLUSION
  Packet I/O is fast enough and the cost of route lookup is negligible
  L2 protocol processing has become the new performance bottleneck
  How can we solve this problem?

BASIC DESIGN (ONGOING WORK)
  if_input() bypass
    Filter packets in VALE
    If a packet has protocol type IPv4 (0x0800) and its destination MAC address is that of the input interface, it goes directly to ip_input()
  if_output() bypass
    Add a new field to DXR's FIB to cache the destination MAC address of the next hop
    Avoid if_output() (incl. ARP resolution) for subsequent packets

  DXR next hop table
  Index   If & gw addr   MAC addr
  0       1.2.3.4        08:00:27:60:10:20
  1       2.3.4.5        08:00:27:f4:d0:7a

  [Figure: kernel path with a filter in the switch logic (L3 module) that bypasses if_input() and if_output() in the Ethernet layer between the switch fabric and ip_input()]
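The two bypasses can be illustrated with a small user-space model. This is a sketch of the idea only, not the kernel code: the function names, the dictionary layout of the next hop table, and the outgoing interface MAC are assumptions, while the gateway and MAC values follow the example table.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def mac(text: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def takes_fast_path(frame: bytes, iface_mac: bytes) -> bool:
    """if_input() bypass filter: IPv4 frame addressed to the input interface."""
    if len(frame) < ETH_HEADER_LEN:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype == ETHERTYPE_IPV4 and frame[0:6] == iface_mac


# Next hop table extended with a cached destination MAC (hypothetical layout).
next_hop_table = [
    {"gw": "1.2.3.4", "mac": mac("08:00:27:60:10:20")},
    {"gw": "2.3.4.5", "mac": mac("08:00:27:f4:d0:7a")},
]


def rewrite_l2(frame: bytes, nh_index: int, out_iface_mac: bytes) -> bytes:
    """if_output() bypass: build the new Ethernet header from the cached MAC."""
    dst = next_hop_table[nh_index]["mac"]
    return dst + out_iface_mac + frame[12:]  # keep EtherType and payload


in_mac = mac("02:00:00:00:00:01")   # assumed MAC of the input interface
out_mac = mac("02:00:00:00:00:02")  # assumed MAC of the output interface
frame = in_mac + mac("02:00:00:00:00:99") + struct.pack("!H", 0x0800) + b"payload"

if takes_fast_path(frame, in_mac):
    out = rewrite_l2(frame, 1, out_mac)
    print(out[0:6].hex(":"), out[6:12].hex(":"))
```

Frames that fail the filter (ARP, broadcasts, other EtherTypes) would still take the regular if_input() path, so existing L2 behaviour is preserved for everything except plain IPv4 forwarding.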

CONCLUSION
  FreeBSD can forward packets at only 1.43 Mpps
  By replacing packet I/O with VALE and route lookup with DXR, we can forward packets at 2.43 Mpps
  The Ethernet layer remains expensive
    We have to bypass it for further speedup

THANK YOU
  Questions? Comments?
  Mail: nanako@sfc.wide.ad.jp
  Code: https://github.com/nanakom/freebsd/tree/dxr

REFERENCES
  [1] L. Rizzo. netmap: A novel framework for fast packet I/O. In 2012 USENIX Annual Technical Conference (USENIX ATC 12), June 2012.
  [2] DPDK: http://dpdk.org
  [3] T. Yang, G. Xie, Y. Li, Q. Fu, A. X. Liu, Q. Li, and L. Mathy. Guarantee IP Lookup Performance with FIB Explosion. In ACM SIGCOMM, pages 39-50, 2014.
  [4] M. Zec, L. Rizzo, and M. Mikuc. DXR: Towards a billion routing lookups per second in software. SIGCOMM Comput. Commun. Rev., 42(5):29-36, Sept. 2012.
  [5] H. Asai and Y. Ohara. Poptrie: A compressed trie with population count for fast and scalable software IP routing table lookup. In ACM SIGCOMM, pages 57-70, 2015.
  [6] M. Honda, F. Huici, G. Lettieri, and L. Rizzo. mSwitch: A highly-scalable, modular software switch. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research, SOSR 15, pages 1:1-1:13, New York, NY, USA, 2015. ACM.