Custom UDP-Based Transport Protocol Implementation over DPDK

Dmytro Syzov, Dmitry Kachan, Kirill Karpov, Nikolai Mareev and Eduard Siemens
Future Internet Lab Anhalt, Anhalt University of Applied Sciences, Bernburger Str. 57, 06366 Köthen, Germany
{dmytro.syzov, dmitry.kachan, kirill.karpov, nikolai.mareev, eduard.siemens}@hs-anhalt.de

Keywords: High-Speed Data Transport, Packet Processing, User Space.

Abstract: As the information technology industry evolves, the demand for high-speed data transmission steadily increases. The need for it arises in a variety of industries, from entertainment (for example, the trend towards ever higher videocast resolutions) to scientific research. However, several problems hinder the capabilities of network applications. One of them is slow packet processing, caused by the significant overhead of system calls for simple network operations. Hardware solutions exist, but from an economic point of view, using legacy equipment is preferable because of the high cost of upgrading network infrastructure; software solutions to these problems are therefore attractive. One of them is the DPDK toolset, which gives developers the ability to tailor network operations to the application. RMDT is a custom transport protocol aimed at high-speed data transmission over lossy networks with high latency. The protocol is built over standard Linux UDP sockets and is thus heavily reliant on the performance of the networking stack. The goal of this work is to improve RMDT performance by means of DPDK in a 10G network and to assess the benefits of such an implementation.

1 INTRODUCTION

The nature of network operations on Linux, with the overhead of system calls and several memory copies during recv() and send(), results in low performance for high-speed connections. There are multiple techniques for increasing the packet processing rate, for example Receive Packet Steering [1], but they are usually aimed at TCP optimization or at maintaining many low-rate connections. If high-speed transmission is needed and TCP does not fit because of high latency and packet loss, out-of-the-box options are limited. Their functionality also often depends on a manufacturer-specific implementation, which makes widely applicable network utilities more expensive to develop and harder to maintain. The DPDK [2] toolset aims at boosting packet processing performance by giving developers low-level control over the network stack. One of its main benefits is avoiding switches between user space and kernel space; however, it does not provide ready-to-use transport protocols. The common bottleneck is receive performance: with standard Linux network operations, packets have to go through multiple memory copies and through additional management operations necessary for correct delivery to applications. With DPDK, these operations can be tailored specifically to the application. More control over the timing of send- and receive-related operations can reduce latency, improving performance in use cases such as streaming. Such control can also yield more precise round-trip time measurements, which in turn can improve the behavior of congestion control, since the standard kernel path can introduce fluctuations in the overall duration of an operation. This work adapts the internal structure of the RMDT [3] protocol to the DPDK library and assesses the benefits of DPDK over the standard Linux approach.
At this stage, the goal is to create a simple RMDT-over-DPDK implementation to test whether RMDT performance can be improved with DPDK. The main measure of efficiency in our tests is the achievable data rate. Comparing synthetic packet generation tests with RMDT tests shows how the time spent on network operations relates to the time spent on custom protocol functionality, allowing us to assess whether the implementation needs further improvement; a simple test of clean send and receive is therefore performed as well.

2 RELATED WORK

As DPDK is a generally applicable network development kit, a large number of projects employ it for a variety of goals. These include using DPDK to build a lightweight TCP/IP stack for better efficiency on resource-limited systems, as presented in a paper by R. Rajesh et al. [4], and building a high-performance software router, as presented in a paper by Z. Li [5]. M. Miao et al. developed and tested a self-tuning packet I/O that dynamically optimizes data rate and latency by controlling the batch size in a high-throughput network system [6]. Evidently, improvements to networking operations are in demand across different types of applications.

3 TESTBED DESCRIPTION

All tests have been performed in the 10 GE laboratory of Future Internet Lab Anhalt [3]. Its core element is the Netropy 10G WAN emulator [7], which emulates WAN links with various impairments such as packet loss, delay and reordering. It collects statistics about the data passing through it and is used in this work to assess the resulting performance. The servers used in the tests have the following characteristics:

Kernel: 4.15.0-45-lowlatency.
NIC: Intel 82599ES 10-Gigabit SFI/SFP+.
Memory: 64 GB DDR3.
CPU: 2x Intel Xeon E5-2643 v4, 3.40 GHz.

The software consists of two RMDT builds and two synthetic tests with pure packet generation and reception, targeting the standard Linux networking stack and DPDK, respectively. As an interface to UDP over DPDK, already existing software was used: F-Stack [8].

4 SOFTWARE DESIGN

Figure 1 presents the flow chart of a basic DPDK receiver functionality test. The overall run loop uses a basic polling interface provided by F-Stack, which is derived from DPDK's own polling mechanisms: the loop polls for events through an epoll interface, iterates over the unprocessed events, receives from the F-Stack socket whenever an event is EPOLLIN, and processes any unexpected event separately. On top of the functionality necessary for receiving packets via DPDK, additional checks verify that data is received correctly; this functionality appears in the flow chart as the corruption check. The sender test is the same, but without polling for EPOLLIN.

Figure 1: DPDK receive loop.
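To make the Figure 1 loop concrete, below is a minimal sketch of such a receiver, assuming F-Stack's POSIX-like wrappers (ff_init, ff_run, ff_socket, ff_epoll_wait, ff_recv); the UDP port and the 8-byte sequence-number corruption check are illustrative placeholders, not RMDT's actual configuration or wire format.

    #include "ff_api.h"
    #include "ff_epoll.h"
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static int epfd = -1;
    static int sockfd = -1;
    static uint64_t expected_seq = 0;

    /* ff_run() invokes this once per poll iteration; it must be a plain
     * (static) function rather than a class member, which is why RMDT's
     * network code had to move out of its classes (see Section 4). */
    static int rx_loop(void *arg) {
        (void)arg;
        struct epoll_event events[512];
        int n = ff_epoll_wait(epfd, events, 512, 0);  /* poll, never block */
        for (int i = 0; i < n; ++i) {
            if (!(events[i].events & EPOLLIN))
                continue;                 /* "process an unexpected event" */
            char buf[2048];
            ssize_t len = ff_recv(events[i].data.fd, buf, sizeof(buf), 0);
            if (len < (ssize_t)sizeof(uint64_t))
                continue;
            /* Corruption check (hypothetical): the sender is assumed to
             * put a monotonically increasing sequence number in the first
             * 8 bytes of every datagram. */
            uint64_t seq;
            memcpy(&seq, buf, sizeof(seq));
            if (seq != expected_seq)
                fprintf(stderr, "gap or corruption at %llu\n",
                        (unsigned long long)seq);
            expected_seq = seq + 1;
        }
        return 0;
    }

    int main(int argc, char *argv[]) {
        ff_init(argc, argv);              /* reads the F-Stack/DPDK config */
        sockfd = ff_socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5001);      /* illustrative port */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        ff_bind(sockfd, (struct linux_sockaddr *)&addr, sizeof(addr));

        epfd = ff_epoll_create(1);
        struct epoll_event ev = {};
        ev.events = EPOLLIN;
        ev.data.fd = sockfd;
        ff_epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

        ff_run(rx_loop, NULL);            /* pins the loop to its core */
        return 0;
    }

A sender variant would skip the EPOLLIN polling and instead call ff_sendto() in each loop iteration.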

Figure 2 presents the basic RMDT structure for the receiver side. Here, the receive handler is tasked with reception and some basic processing of both user data and service packets; the rest of the protocol functionality, including tasks related to sending service packets, is placed in the transport control block. Sender functionality is not the focus of this work, since it does not bottleneck RMDT's overall performance in a point-to-point configuration with an MTU of 1500 bytes (the Ethernet standard [9]), and F-Stack is not optimized for the send path. Both parts work concurrently with the memory buffer, and the whole stack is controlled by a master thread that provides the protocol interface to an application. Note that performing network operations in this structure requires a context switch from user space to kernel space, which is one of the contention points.

Figure 2: Simplified RMDT structure.

In order to integrate F-Stack into the RMDT protocol, some changes to the protocol's networking subsystem have to be made. F-Stack requires a separate loop function that runs on a dedicated CPU core, and that function has to be static. Thus, because of the OOP structure of RMDT, all functionality for receiving and sending packets has to be moved to a separate thread that is not a direct part of any class in the RMDT stack (a global static function). In the modified structure (Figure 3), additional blocks for the send and receive loops represent separate threads that run on dedicated cores and perform receive polling and sending via DPDK. These threads are decoupled from the rest of the RMDT structure and hand over received data through single-producer/single-consumer queues, while the threads that previously handled network operations now poll those queues; a sketch of such a queue follows below. The receive/send loop flow is similar to the one presented in Figure 1, but with inter-thread communication added after receiving, or before sending, each batch of packets. A switch to kernel space is no longer needed, as DPDK works entirely in user space.

Figure 3: RMDT over DPDK structure.
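To illustrate the hand-off in Figure 3, here is a minimal single-producer/single-consumer ring sketch, assuming the F-Stack loop thread is the only producer and an RMDT receive-handler thread the only consumer; the SpscQueue name, the PacketBatch type and the power-of-two capacity are illustrative assumptions, not RMDT's actual implementation.

    #include <array>
    #include <atomic>
    #include <cstddef>

    /* One batch of received packets; in RMDT this would carry buffer
     * pointers and lengths. Illustrative placeholder type. */
    struct PacketBatch {
        void  *data[32];
        size_t len[32];
        size_t count;
    };

    template <std::size_t N>
    class SpscQueue {
        static_assert((N & (N - 1)) == 0, "N must be a power of two");
        std::array<PacketBatch, N> ring_;
        alignas(64) std::atomic<std::size_t> head_{0};  /* consumer index */
        alignas(64) std::atomic<std::size_t> tail_{0};  /* producer index */
    public:
        /* Called only from the F-Stack/DPDK loop thread (producer). */
        bool push(const PacketBatch &b) {
            std::size_t t = tail_.load(std::memory_order_relaxed);
            if (t - head_.load(std::memory_order_acquire) == N)
                return false;             /* full: back-pressure the loop */
            ring_[t & (N - 1)] = b;
            tail_.store(t + 1, std::memory_order_release);
            return true;
        }
        /* Called only from the RMDT receive-handler thread (consumer). */
        bool pop(PacketBatch &b) {
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;             /* empty: keep polling */
            b = ring_[h & (N - 1)];
            head_.store(h + 1, std::memory_order_release);
            return true;
        }
    };

Since each side writes only its own index, the acquire/release pair alone makes a completed push visible to pop, so neither the pinned F-Stack core nor the protocol threads ever take a lock; the copy of each batch into the ring is, however, exactly the kind of additional memory operation that the test results below identify as a remaining overhead.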

5 TEST RESULTS

Firstly, basic DPDK tests were performed without RMDT to assess the capabilities of the hardware when working with DPDK. At this stage, both pure send and pure receive were tested for point-to-point data transmission. In the simplest configuration, presented in the previous section, the sender was able to achieve up to 6 Gbps, while the receiver was capable of reaching the maximum link capacity of 10 Gbps (unlike the standard TCP/IP stack, which bottlenecked at the receiver). However, certain fluctuations in send performance were noticed during testing: the rate would drop to 5.2 Gbps or, less frequently, vary between 5.2 and 6 Gbps. A possible reason is additional memory copy operations in the F-Stack sending interface; the exact cause of this behavior was not studied in this work.

Further tests with RMDT were performed only for a DPDK-based receiver. The sender used the standard Linux TCP/IP stack, since in a multi-threaded configuration it was able to achieve 10 Gbps rates, unlike the F-Stack/DPDK variant. The RMDT tests were first run with the standard TCP/IP stack to provide a baseline for the DPDK-based RMDT. Standard RMDT showed a data rate of 6 Gbps; the bottleneck in this configuration is the receiver, since a point-to-multipoint test with two receivers achieved a data rate of 10 Gbps. With DPDK-based RMDT, the peak achieved data rate was 8 Gbps, although it behaved inconsistently, sometimes dropping to 6 Gbps. As a comparison with the clean DPDK test suggests, this behavior can be explained by unoptimized inter-thread communication between the F-Stack loop and the main RMDT threads, one of the main reasons being additional memory copy operations. However, even in this unoptimized state, the increase in performance is visible. A summary of the test results is shown in Figure 4.

Figure 4: Test results (achieved data rate in Gbps for standard TCP/IP send and receive, F-Stack send and receive, RMDT over TCP/IP send and receive, and RMDT over F-Stack receive).

6 CONCLUSIONS

The demand for tools providing high-speed data transmission keeps growing, leading to the development of new SDKs that revise outdated approaches to network applications, as DPDK/F-Stack does. In this work, a custom UDP-based transport protocol was modified to use DPDK capabilities, with the goal of increasing performance in a 10G network in a point-to-point configuration with a 1500-byte MTU. Tests showed an increase in performance in comparison to the standard Linux TCP/IP stack, but full link utilization was not achieved, because the current RMDT structure does not yet fully exploit DPDK capabilities.

7 FUTURE WORK

In order to continue the tests with RMDT over DPDK, significant changes have to be made to the protocol's structure; in particular, better memory management should be implemented. With an improved version, additional tests in 10G and 40G networks could be made. Another possible continuation of this work is developing and testing transport-related applications that could use DPDK functionality for better performance. Network probing algorithms, for example, might benefit from lower latency and more stable measurements.

ACKNOWLEDGMENTS

This work has been funded by the Volkswagen Foundation within a trilateral partnership between scholars and scientists from Ukraine, Russia and Germany, in the project CloudBDT: Algorithms and Methods for Big Data Transport in Cloud Environments.

REFERENCES

[1] "Scaling in the Linux Networking Stack," kernel.org, 2018. [Online]. Available: https://www.kernel.org/doc/Documentation/networking/scaling.txt. Accessed: Dec. 1, 2018.

[2] "Data Plane Development Kit," dpdk.org, 2018. [Online]. Available: https://www.dpdk.org/about/. Accessed: Dec. 1, 2018.

[3] "Big Data Transmission," Future Internet Lab Anhalt (FILA), fila-lab.de, 2018. [Online]. Available: https://fila-lab.de/index.php/our-work/big-data-transmission/. Accessed: Dec. 1, 2018.

[4] R. Rajesh, K. B. Ramia, and M. Kulkarni, "Integration of LwIP stack over Intel(R) DPDK for high throughput packet delivery to applications," in 2014 Fifth International Symposium on Electronic System Design, 2014, pp. 130-134.

[5] Z. Li, "HPSRouter: A high performance software router based on DPDK," in 2018 20th International Conference on Advanced Communication Technology (ICACT), 2018, pp. 503-506.

[6] M. Miao, W. Cheng, F. Ren, and J. Xie, "Smart batching: A load-sensitive self-tuning packet I/O using dynamic batch sizing," in 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016, pp. 726-733.

[7] "Netropy WAN Emulators," Apposite Technologies.

[8] "F-Stack: High Performance Network Framework Based On DPDK," f-stack.org, 2018. [Online]. Available: http://www.f-stack.org/. Accessed: Dec. 1, 2018.

[9] C. Hornig, "A standard for the transmission of IP datagrams over Ethernet networks," RFC 894, 1984.