Open Packet Processing Acceleration

Craig Nuzzo, cnuzz2@uis.edu

Summary

The amount of data in our world is growing rapidly; that much is obvious. The behind-the-scenes impact of this growth, however, is less apparent. All of that data has to be moved by something, and the answer has always been computer networks. Over the years these networks have been refined from the inside out, moving from bare-metal systems to virtualized and then to software-defined ones. The problem the industry faces is the bottleneck of keeping up with the speed of the Internet without any new bleeding-edge hardware architecture to lean on. With Moore's Law slowing, hardware now gets only a little faster each generation, so we have to look at software instead. The volume of data is so large that we need to discuss what else can be done to accelerate the packets that move it around the Internet. This is where Xen and OpenDataPlane (ODP) come into play for the VirtuOR group. After a brief overview of the solution and a closer look at ODP, this report examines the design in detail and the results VirtuOR saw in the end.

VirtuOR addresses part of this problem by implementing a packet processing accelerator, thereby shrinking the bottleneck for data. Xen was chosen for its high compatibility with ODP. This choice was not made lightly, since ODP can drastically affect the performance of every process on a device, by as much as 89% (Braham, 2016, p. 408). The team chose to modify the existing Xen architecture (Braham, 2016, p. 408) with an implementation of the OpenDataPlane project. The new Xen architecture (shown in Figure 1) virtualizes the CPU cores by integrating them into a privileged virtual domain called the driver domain, achieving accelerated packet processing without the overhead of the physical CPU cores.

ODP is an open-source project that gives application programmers an easy-to-use programming environment for data plane applications (OpenDataPlane, 2014). It achieves this by providing common APIs, utilities, and configuration files for the underlying hardware; the goal of ODP is a data plane application framework that spans many different platforms. The accelerated packet processing described in this report uses ODP's Pull Model packet processing scheme. The Pull Model (shown in Figure 2) organizes packets through a scheduler function, and its advantage is that desired packets can be prioritized for faster processing in the long run (Braham, 2016, p. 409). All of this depends on the number of threads, since ODP scales with how many CPU cores are allocated to the application. Each thread uses all the resources available to it to accelerate packets, so processing speed is controlled by the number of allocated cores, or in this case the number of threads launched by the application.

All of these ideas are placed into a Linux virtual machine inside the driver domain of the new Xen architecture (Figure 1). The driver domain is responsible for adding or removing the virtual CPU cores used by ODP, which is what achieves the accelerated packet processing. ODP then launches threads corresponding to the number of virtual cores in the driver domain, and those threads accelerate the processing of packets without loading up the physical CPU. A minimal sketch of one such worker appears below.
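To make the Pull Model concrete, here is a rough C sketch of an ODP worker, offered as an illustration rather than as VirtuOR's actual code: it creates one scheduled queue at high priority, then pulls events from the ODP scheduler one at a time, with one such loop running per worker thread, i.e. per virtual core handed to ODP. Exact header and constant names vary across ODP releases, and process_packet() is a hypothetical placeholder.

    #include <odp_api.h>

    /* Hypothetical per-packet work; not part of the ODP API. */
    static void process_packet(odp_packet_t pkt) { (void)pkt; }

    /* Create a scheduled queue served at the highest priority -- this is
       how the Pull Model lets desired flows jump ahead of others. */
    static odp_queue_t create_fast_queue(void)
    {
        odp_queue_param_t qp;

        odp_queue_param_init(&qp);
        qp.type        = ODP_QUEUE_TYPE_SCHED;
        qp.sched.prio  = odp_schedule_max_prio();
        qp.sched.sync  = ODP_SCHED_SYNC_ATOMIC;
        qp.sched.group = ODP_SCHED_GROUP_ALL;
        return odp_queue_create("fast-path", &qp);
    }

    /* One loop per worker thread / virtual CPU core: the thread *pulls*
       the next event from the scheduler instead of having packets
       pushed to it. */
    static void worker_loop(void)
    {
        for (;;) {
            odp_event_t ev = odp_schedule(NULL, ODP_SCHED_WAIT);

            if (odp_event_type(ev) != ODP_EVENT_PACKET) {
                odp_event_free(ev);
                continue;
            }
            odp_packet_t pkt = odp_packet_from_event(ev);
            process_packet(pkt);
            odp_packet_free(pkt);
        }
    }

Because the queue is of type ODP_QUEUE_TYPE_SCHED, the scheduler, not the application, decides which queue feeds each worker next, which is what allows the prioritization described above.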
The beauty of virtualized CPU cores is that adding more of them has no influence on the underlying physical CPU of the system (Braham, 2016, p. 410). Replacing the physical CPU cores with virtual CPU cores in the driver domain is the crux of VirtuOR's solution. In the end, the use of ODP brought several advantages, including but not limited to: 1) compatibility with the majority of NICs and drivers on the market, and 2) classification of different packet flows with ODP functions, enabling better prioritization of packets for monitoring.
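For a sense of how this elasticity looks in practice, note that stock Xen already lets an administrator change a running domain's virtual CPU count through the xl toolstack, for example:

    xl vcpu-set driver-domain 4

Here "driver-domain" is a placeholder domain name, and the new count must stay within the maxvcpus limit in the domain's configuration file. The paper's driver domain automates this kind of adjustment on behalf of ODP; the exact mechanism VirtuOR uses is not spelled out in the summary above.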

The real-life implementation by VirtuOR is within their Metamorphic Networks (M-Net) platform, which can dynamically create, remove, or move VMs within the Xen environment (shown in Figure 3). M-Net uses the TRILL protocol, connected through a wired network of two physical nodes. TRILL provides simple forwarding and speed, since it calculates the shortest path using a combination of the IS-IS protocol and Dijkstra's algorithm (Braham, 2016, p. 410). All traffic going to other domains is then managed by the driver domain and ODP.

The solution was tested on two M-Net devices, each equipped with a 2.5 GHz Intel Core 2 Duo processor and four Intel 82571EB Gigabit Ethernet cards (Braham, 2016, p. 411), running a proprietary Linux distribution developed by VirtuOR that contained the new Xen architecture. The evaluation parameters were: maximum reached throughput, number of processed packets, bandwidth use percentage, and the use percentage of the virtual and physical CPU resources on both architectures (Braham, 2016, p. 411). As shown in Figure 4, packet processing in the new architecture gains 15% once more than one virtual CPU core is in use. The throughput evaluation concluded that 958 Mbit/s is achievable; on a Gigabit Ethernet link that works out to roughly 95% bandwidth use, a level reached with two virtual CPU cores in the new architecture. The team observed that the only CPU resources used for packet processing were virtual ones: 89% utilization in the new architecture versus 9.4% in the old (Braham, 2016, p. 412). For future work, the VirtuOR team hopes to compare its solution against other packet processing accelerators in the industry.

Future Work

One of the main reasons this paper was chosen is that it exemplifies the open-source community: by bringing together multiple open-source solutions, a new one is born. The team at VirtuOR combines three major open-source projects, the Xen Project, the Linux kernel, and OpenDataPlane, and together these allow a faster packet processing solution inside their own products. Open code is on the rise, and we see more of it than ever before: Microsoft has recently joined the Linux Foundation and has opened up its .NET platform, and many more companies are uploading their code to places like GitHub for the public to see. This growth will only help packet processing and software-defined networking speed up the Internet further; the collaboration is becoming a healthy solution for the networking industry.

Two related open-source topics are cloud computing and graphics processing, and integrating those platforms may help the software-defined networking side of packet processing. The cloud has become a popular option for the modern IT department, giving it the ability to concentrate on improving its code without the physical overhead of running in-house servers. An implementation like the one in this paper would most definitely improve those services, not only for the cloud provider's business but also for clients, if they can use software-defined accelerated packet processing as an option or by default. This would be a lucrative arrangement for both parties.

Another consideration is to take advantage of graphics processing units (GPUs). Modern GPU architectures offer quite high computational throughput and very efficient memory access. GPUs in this application would bring benefits on both the software and the hardware side; a GPU is inexpensive and more readily available than many CPUs. Even more impressive is the fact that the 60-200 ns of latency involved in retrieving data from main memory can be hidden (Kalia, Zhou, & Andersen, n.d.). Paired with code written in CUDA or OpenCL, this would do wonders for a project like the one Metamorphic Networks is conducting. A short sketch of the latency-hiding idea appears below.
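The Kalia, Zhou, and Andersen paper cited above attributes much of the GPU's edge to keeping many memory accesses in flight at once, and shows that a CPU can hide the same DRAM latency with software prefetching over a batch of packets. The following C sketch illustrates that batching idea only; the fwd_entry table, direct key indexing, and BATCH size are illustrative assumptions, not details from either paper.

    #include <stddef.h>
    #include <stdint.h>

    #define BATCH 16                      /* illustrative batch size */

    struct fwd_entry { uint32_t next_hop; };

    /* Look up a batch of keys against a forwarding table. Pass 1 starts
       all the DRAM fetches back to back; pass 2 then reads each entry,
       by which time its cache line has had the rest of the batch's
       worth of time to arrive from memory. (__builtin_prefetch is a
       GCC/Clang builtin.) */
    static void lookup_batch(const struct fwd_entry *table,
                             const uint32_t keys[BATCH],
                             uint32_t next_hops[BATCH])
    {
        size_t i;

        for (i = 0; i < BATCH; i++)       /* pass 1: prefetch only */
            __builtin_prefetch(&table[keys[i]], 0, 1);

        for (i = 0; i < BATCH; i++)       /* pass 2: dependent reads */
            next_hops[i] = table[keys[i]].next_hop;
    }

Overlapping the fetches this way is what "removes" the 60-200 ns per access that the text refers to: the latency is still paid, but once for the whole batch rather than once per packet.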

Accelerated packet processing may not be the only part of the OSI model that can be virtualized, and research into virtualizing other aspects of computer networking should be considered. Academia and enterprise already use network virtualization not only to teach networking concepts but to apply them to real-world solutions; we see this in software-defined networking implementations already. The same idea could be used to sandbox certain aspects of networking in order to contain the otherwise inevitable damage of cyber attacks: modular code would help IT departments avoid unnecessary exposure by letting them remove and replace networking components from a software control panel, or at the command prompt itself.

Continued research on, and implementation of, accelerated packet processing is more important now than ever. Companies should treat it as a serious consideration as their network stacks are overrun by massive amounts of data. As 4K video is pushed out into the wild, video streaming services should look to implement some of the aforementioned ideas. That would do us all some good.

Citations

Rabia, T., Braham, O., & Pujolle, G. (2016). Accelerating packet processing in a Xen environment with OpenDataPlane. 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), 408-413.

OpenDataPlane introduction and overview [An in-depth introduction to OpenDataPlane]. (2014, January).

Kalia, A., Zhou, D., & Andersen, D. G. (n.d.). Raising the bar for using GPUs in software packet processing. Carnegie Mellon University and Intel Labs.

Figures and Tables

Figure 1: The new Xen architecture, with virtual CPU cores integrated into the driver domain.
Figure 2: The ODP Pull Model and its scheduler.
Figure 3: Dynamic creation, removal, and movement of VMs in the M-Net platform.
Figure 4: Evaluation results comparing the new and old architectures.