G-NET: Effective GPU Sharing In NFV Systems


Kai Zhang*, Bingsheng He^, Jiayu Hu#, Zeke Wang^, Bei Hua#, Jiayi Meng#, Lishan Yang#
*Fudan University, ^National University of Singapore, #University of Science and Technology of China

Network Function Virtualization (NFV)
- Network functions: nodes on the data path between a source host and a destination host (firewall, NIDS, IPS, gateway, VPNs, load balancers, etc.)
- NFV is a network architecture concept: hardware => software
- Based on virtualization techniques
- Easier to manage/deploy, higher flexibility, higher scalability, easier to validate, etc.
- Construct service chains to provide specific services to meet different demands
[Figure: service chains built from Firewall, NIDS, IPsec, Router, IPS, and LB, each running on a virtualization layer]

GPUs in Accelerating Network Functions
GPUs are proven to be a good candidate for accelerating network functions:
- Router: PacketShader [SIGCOMM 10]
- SSL reverse proxy: SSLShader [NSDI 11]
- NIDS: Kargus [CCS 12], MIDeA [CCS 11]
- NDN router: [NSDI 13]
Hardware comparison:
- CPU, Intel Xeon E5-2697 v4: 18 cores, 76.8 GB/s memory bandwidth, price $2702
- GPU, Nvidia Titan X: 3840 cores, 550 GB/s memory bandwidth, price $1999
1. Massive processing units. Network functions handle a large number of packets; GPUs offer thousands of cores for parallel processing.
2. Massively hiding memory access latency. Network functions issue a large number of memory accesses when processing packets; GPUs can effectively hide memory access latency with massive hardware threads and zero-overhead thread scheduling (supported in hardware).

GPU-Accelerated Network Functions
Pipeline: RX -> Pre-Processing -> GPU Processing -> Post-Processing -> TX
- Pre-processing: packet parsing, batching, etc.
- GPU processing: compute/memory-intensive tasks, processed in parallel on GPUs
- Post-processing: construct/filter packets, etc.

Why Have GPUs Not Been Utilized in NFV Systems?
Current GPU-based NFs: exclusive access to the GPU
- The GPU is only accessed by one network function
[Figure: a single NIDS (Aho-Corasick algorithm) on top of GPU virtualization]

Why Have GPUs Not Been Utilized in NFV Systems?
Temporal sharing: only kernels from one VM can run on the GPU at a time
- Inefficient
[Figure: four NFs on top of GPU virtualization, each with its own kernel: Firewall (bit vector linear search), NIDS (Aho-Corasick algorithm), IPsec (AES and SHA1), Router (DIR-24-8-BASIC)]

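Current Way of GPU Virtualization is Inefficient
Temporal sharing: only kernels from one VM can run on the GPU at a time
- GPU underutilization
[Figure: Firewall, NIDS, IPsec, and Router on top of GPU virtualization. While the NIDS kernel (Aho-Corasick algorithm) runs, the input is 20 Mpps but the GPU capability is 70 Mpps, so the rest of the GPU is idle.]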
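As a quick check of the numbers on this slide, the sketch below computes how much of the GPU stays idle. It assumes, as a simplification not stated in the slides, that utilization is simply the offered load divided by the kernel's capability; the function name is made up for illustration.

```python
def gpu_utilization(input_mpps: float, capability_mpps: float) -> float:
    """Fraction of the GPU's packet rate that is used (assumed linear model)."""
    return min(input_mpps, capability_mpps) / capability_mpps

# Values from the slide: 20 Mpps of input, 70 Mpps of GPU capability.
util = gpu_utilization(20.0, 70.0)
print(f"utilized: {util:.1%}, idle: {1 - util:.1%}")
```

Under this model the run prints `utilized: 28.6%, idle: 71.4%`, which is the idle share that temporal sharing cannot hand to another NF.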

Current Way of GPU Virtualization is Inefficient
Temporal sharing: only kernels from one VM can run on the GPU at a time
- GPU underutilization
- Higher latency
[Figure: GPU kernel timelines. (1) Exclusive access: A A A A A. (2) Temporal sharing: A B C A.]

Spatial Sharing
[Figure: GPU kernel timelines. (1) Exclusive access: A A A A A. (2) Temporal sharing: A B C A. (3) Spatial sharing: kernels A, B, and C occupy separate rows of the timeline and run side by side.]
Spatial sharing: multiple kernels run on the GPU simultaneously
- Minimizes interference from the kernel executions of other NFs (latency)
- Enhances GPU utilization, because kernels from different VMs can run on the GPU simultaneously (throughput)

Hyper-Q for Spatial Sharing
Hyper-Q: a technique that enables GPU kernels from the same GPU context to execute on the GPU simultaneously
[Figure: Firewall, NIDS, IPsec, and Router on top of GPU virtualization with Hyper-Q]
Challenges
1. VMs have different GPU contexts, so they cannot utilize Hyper-Q directly.
2. GPU kernels utilizing Hyper-Q can access the entire GPU memory space, which is a security issue.
3. NFs are not aware of the existence of other NFs; an NF that tries to maximize its own GPU resources would influence the other NFs. This calls for scheduling and resource allocation schemes.

The Goal of G-NET
[Figure: the properties of Network Function Virtualization (easy to manage/deploy, flexibility, scalability, easy validation, agility) shown next to the GPU-side concerns (spatial sharing, security, development, scheduling, resource allocation)]
G-NET: NF-Hypervisor co-design

G-NET: GPU Virtualization
- NFs (NF1 ... NFn) run in VMs on top of a Framework and use API remoting to launch GPU operations
- A GPU proxy (the Manager) creates a common GPU context in the hypervisor for spatial sharing
- The Manager receives requests, performs the GPU operations, and sends responses
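The request/response pattern on this slide can be illustrated with a small sketch. Everything here is hypothetical: the class names, the op names ("malloc", "launch"), and the in-process queues are stand-ins chosen for illustration, and no real GPU call is made. It only shows the shape of API remoting, where NFs never touch the GPU themselves and one proxy performs every operation in a single shared context.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Dict, List, Tuple

@dataclass
class Request:
    nf_id: int
    op: str    # hypothetical op names, e.g. "malloc" or "launch"
    arg: int

@dataclass
class Manager:
    """Stand-in for the hypervisor-side proxy that owns the common context."""
    requests: Queue = field(default_factory=Queue)
    responses: Dict[int, Queue] = field(default_factory=dict)
    performed: List[Tuple[int, str, int]] = field(default_factory=list)

    def register(self, nf_id: int) -> None:
        self.responses[nf_id] = Queue()

    def serve_one(self) -> None:
        req = self.requests.get()                              # receive request
        self.performed.append((req.nf_id, req.op, req.arg))    # perform the op
        self.responses[req.nf_id].put(("ok", req.op))          # send response

class NFStub:
    """NF-side framework stub: forwards operations instead of running them."""
    def __init__(self, nf_id: int, manager: Manager):
        self.nf_id, self.manager = nf_id, manager
        manager.register(nf_id)

    def call(self, op: str, arg: int):
        self.manager.requests.put(Request(self.nf_id, op, arg))
        self.manager.serve_one()   # a real manager would run in its own process
        return self.manager.responses[self.nf_id].get()

manager = Manager()
nf1, nf2 = NFStub(1, manager), NFStub(2, manager)
print(nf1.call("malloc", 4096), nf2.call("launch", 128))
print(manager.performed)   # operations from both NFs, executed by one proxy
```

Because both NFs funnel their operations through the same proxy, kernels from different VMs end up in one context, which is the precondition the slides give for spatial sharing.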

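G-NET: System Workflow
- Zero-copy principle applied in system implementation
[Figure: NF1 ... NFn, each on a Framework inside a VM, send requests to and receive responses from the Manager, which holds the common GPU context in the hypervisor; packets move between the NIC, the Switch, and shared memory]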

Achieve Predictable Performance
- How to control the performance of an NF?
- How to guarantee the performance of a service chain?
[Figure: NF performance (throughput, latency) depends on GPU kernel performance, which depends on the quantity of work (batch size) and on the compute resource, marked with a question mark]
How to allocate GPU resources?
- GPUs utilize fast thread switching to enhance hardware utilization
- GPUs have massive hardware threads (#threads >> #cores)
- Unlike on CPUs, a thread cannot be bound to a specific core
Our approach: the Streaming Multiprocessor (SM) as the basic unit for resource allocation
- One thread block can only run on one SM
- A thread block is scheduled to run on an idle SM when one is available
- Each thread block is allocated its own SM when total #thread blocks <= total #SMs
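The allocation condition above amounts to simple bookkeeping, sketched below. This is an accounting model only: on a real GPU the hardware scheduler places thread blocks, and the function name and the total of 24 SMs are assumptions made for illustration. The per-NF block counts reuse the NF1: 5, NF2: 4, NF3: 7 example from the scheduling slide.

```python
from typing import Dict, List

def assign_sms(blocks_per_nf: Dict[str, int], total_sms: int) -> Dict[str, List[int]]:
    """Give every thread block its own SM, or fail if the GPU is oversubscribed."""
    total_blocks = sum(blocks_per_nf.values())
    if total_blocks > total_sms:
        # With more blocks than SMs, blocks would share SMs and the
        # one-block-per-SM guarantee no longer holds.
        raise ValueError(f"{total_blocks} thread blocks exceed {total_sms} SMs")
    assignment: Dict[str, List[int]] = {}
    next_sm = 0
    for nf, n_blocks in blocks_per_nf.items():
        assignment[nf] = list(range(next_sm, next_sm + n_blocks))
        next_sm += n_blocks
    return assignment

TOTAL_SMS = 24  # assumed value, not taken from the slides
plan = assign_sms({"NF1": 5, "NF2": 4, "NF3": 7}, TOTAL_SMS)
for nf, sms in plan.items():
    print(nf, "->", sms)
```

Controlling the number of thread blocks an NF launches is therefore a way to control how many SMs it can occupy, as long as the total stays within the SM count.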

Service Chain Based Scheduling
How to optimize the performance of a service chain with limited compute resources?
- NFs have different processing tasks

NF                 Firewall  NIDS  IPsec  Router
#SM                5         5     5      5
Throughput (Mpps)  6         3     4      10

Fairness (even split of SMs): service chain throughput is 3 Mpps

Service chain based GPU scheduling and resource allocation
1. Locate the bottleneck NF (the NF with the lowest throughput T)
2. Allocate GPU resources for all NFs in the service chain to achieve throughput T * (1+P), where 0 < P < 1
Throughput improves by P in each round.
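The two steps above can be sketched as a loop. This is a minimal illustration under assumptions that are not in the slides: throughput is taken to scale linearly with the number of SMs (per-SM rates are derived from the table, for example 3 Mpps / 5 SMs for NIDS), every NF starts with one SM, and the loop stops when the next round would need more SMs than exist. The function names and P = 0.1 are chosen for illustration.

```python
import math
from typing import Dict

def chain_throughput(rate_per_sm: Dict[str, float], alloc: Dict[str, int]) -> float:
    """A chain is only as fast as its slowest NF."""
    return min(rate_per_sm[nf] * alloc[nf] for nf in alloc)

def schedule_chain(rate_per_sm: Dict[str, float], total_sms: int, p: float = 0.1) -> Dict[str, int]:
    assert 0 < p < 1
    alloc = {nf: 1 for nf in rate_per_sm}          # assumed starting point
    while True:
        t = chain_throughput(rate_per_sm, alloc)   # step 1: bottleneck throughput T
        target = t * (1 + p)                       # step 2: aim for T * (1 + P)
        wanted = {nf: math.ceil(target / rate_per_sm[nf]) for nf in alloc}
        if sum(wanted.values()) > total_sms:
            return alloc                           # next round does not fit
        alloc = wanted

# Per-SM rates (Mpps) implied by the table under the linear assumption.
rates = {"Firewall": 6 / 5, "NIDS": 3 / 5, "IPsec": 4 / 5, "Router": 10 / 5}
alloc = schedule_chain(rates, total_sms=20)
print(alloc, chain_throughput(rates, alloc))
```

With the same 20 SMs, this loop ends with a chain throughput above the 3 Mpps of the even 5/5/5/5 split, because SMs move from the Router, which has headroom, to the NIDS, which is the bottleneck.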

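Service Chain Based Scheduling
[Figure: NF1, NF2, ..., NFn with batch sizes B1, B2, ..., Bn run on Frameworks in VMs above the Switch, Scheduler, Manager, and common GPU context in the hypervisor, with the NIC and GPU below. The Scheduler is annotated with the traffic speed and with its two outputs: 1. batch size, 2. #thread blocks. Example streaming multiprocessor allocation: NF1: 5, NF2: 4, NF3: 7.]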

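IsoPointer for Memory Isolation
- A pointer *P refers to a GPU memory region described by its base address (B) and memory size (S)
- Pointer access checking: B <= P < B+S
- IsoPointer: guarantees GPU memory isolation
[Figure: NF1 and NF2 hold pointers *Pa and *Pb into GPU memory regions Ma and Mb; Frameworks, Switch, Scheduler, Manager, common GPU context, and NIC as on the previous slides]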
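The bounds check B <= P < B+S can be written down directly. The sketch below is an illustration of the check only: the class layout, the `load` method, and the flat Python list standing in for GPU memory are assumptions, not the G-NET implementation.

```python
class IsoPointer:
    """Pointer wrapper that only permits accesses inside one memory region."""

    def __init__(self, base: int, size: int):
        self.base = base   # B: base address of the region
        self.size = size   # S: size of the region

    def in_bounds(self, p: int) -> bool:
        # The check from the slide: B <= P < B + S
        return self.base <= p < self.base + self.size

    def load(self, memory, p: int):
        if not self.in_bounds(p):
            raise PermissionError(f"address {p} is outside [{self.base}, {self.base + self.size})")
        return memory[p]

# One shared address space (stand-in for the common context's memory),
# split into region Ma for NF1 and region Mb for NF2.
gpu_memory = list(range(100, 108))
ma = IsoPointer(base=0, size=4)
mb = IsoPointer(base=4, size=4)

print(ma.load(gpu_memory, 3))   # inside Ma, allowed
print(mb.in_bounds(3))          # address 3 belongs to Ma, so Mb rejects it
```

Note that the upper bound is exclusive: address B+S is the first address of the next region, so allowing it would let one NF read a neighbor's first element.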

NF Development
Repetitive development efforts in every NF:
- CPU-GPU pipeline
- Manage CPU threads
- Communicate with Manager
- Packet I/O with Switch
Framework handles all common operations.
Abstraction to simplify NF development.
[Figure: NF1, NF2, ..., NFn each consist of an NF-specific part written against the abstraction, on top of the Framework; Switch, Scheduler, Manager, common GPU context, and NIC below]

NF Development
Pipeline of NF1: Pre-Processing -> GPU Processing -> Post-Processing
Operations exposed through the abstraction, in pipeline order: memcpy_htod, pre_pkt_handler, set_kernel_args, post_pkt_handler, memcpy_dtoh
- pre_pkt_handler: called for each packet
- set_kernel_args: called for each kernel
- post_pkt_handler: called for each packet
Implementation = CPU code (Router) + GPU kernel (core functions)
Significantly reduces development efforts.
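To make the division of labor concrete, here is a sketch of a framework loop driving the three callbacks named on the slide. Only the callback names come from the slide; the `RouterNF` class, the `run_batch` function, the dictionary packets, and the list copies standing in for memcpy_htod and memcpy_dtoh are hypothetical, and the "kernel" is ordinary Python rather than GPU code.

```python
class RouterNF:
    """NF-specific part: three callbacks plus a stand-in for the GPU kernel."""

    def __init__(self, table):
        self.table = table                      # hypothetical table: dst -> port
        self.calls = {"pre": 0, "args": 0, "post": 0}

    def pre_pkt_handler(self, batch, pkt):      # called for each packet
        self.calls["pre"] += 1
        batch.append(pkt["dst"])

    def set_kernel_args(self, dev_batch):       # called for each kernel
        self.calls["args"] += 1
        return (dev_batch, self.table)

    def kernel(self, dsts, table):              # stand-in for the GPU kernel
        return [table.get(dst, -1) for dst in dsts]

    def post_pkt_handler(self, pkt, result):    # called for each packet
        self.calls["post"] += 1
        pkt["port"] = result

def run_batch(nf, packets):
    """Framework part: the common pipeline shared by all NFs."""
    host_in = []
    for pkt in packets:
        nf.pre_pkt_handler(host_in, pkt)
    dev_in = list(host_in)                      # stands in for memcpy_htod
    dev_out = nf.kernel(*nf.set_kernel_args(dev_in))
    host_out = list(dev_out)                    # stands in for memcpy_dtoh
    for pkt, result in zip(packets, host_out):
        nf.post_pkt_handler(pkt, result)
    return packets

router = RouterNF({"10.0.0.1": 1, "10.0.0.2": 2})
pkts = [{"dst": "10.0.0.1"}, {"dst": "10.0.0.2"}, {"dst": "10.9.9.9"}]
print(run_batch(router, pkts))
print(router.calls)
```

The NF author writes only `RouterNF`; batching, copies, and kernel launch order live in `run_batch`, which is the kind of reuse the slide credits for the reduced development effort.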

Evaluation
Hardware
- CPU: Intel Xeon E5-2650 v4
- GPU: NVIDIA GTX TITAN X
- NIC: Intel XL710 40Gbps
Software
- Virtualization: Docker 17.03.0-ce
- NIC driver & library: DPDK 17.02.1
- OS: CentOS 7.3.1611, Linux kernel 3.8.0-30
Service chains
- 2 NFs: IPsec + NIDS
- 3 NFs: Firewall + IPsec + NIDS, or IPsec + NIDS + Router
- 4 NFs: Firewall + IPsec + NIDS + Router

Throughput Comparison with Temporal Sharing
[Figure: normalized throughput of Temporal Share vs. G-NET over packet sizes of 64, 128, 256, 512, 1024, and 1518 bytes, in four panels]
- (a) IPSec+NIDS: up to 23.8%
- (b) Firewall+IPSec+NIDS: up to 25.9%
- (c) IPSec+NIDS+Router: up to 21.5%
- (d) Firewall+IPSec+NIDS+Router: up to 70.8%
More resource competition with four NFs.

Scheduling
Service chain scheduling scheme comparison
- G-NET: optimizes the performance of the service chain
- FairShare: evenly partitions compute resources among NFs
- UncoShare: each NF tries to maximize its own resource allocation
[Figure: throughput improvement of G-NET over FairShare and UncoShare for Firewall + IPSec + NIDS + Router, over packet sizes of 64 to 1518 bytes]
- 10.7% average improvement over FairShare
- 80.5% average improvement over UncoShare

Latency
[Figure: CDF of round-trip time (microseconds, 0 to 7,000) for NIDS, IPSec+NIDS, Firewall+IPSec+NIDS, and Firewall+IPSec+NIDS+Router, with the 50th and 95th percentiles marked]
[Figure: normalized latency of G-NET vs. temporal sharing for IPSec+NIDS, Firewall+IPSec+NIDS, and Firewall+IPSec+NIDS+Router; (a) 50th percentile latency, (b) 95th percentile latency]
- 50th percentile: ~20%
- 95th percentile: 25% - 44%

Conclusion
G-NET: an NFV system that efficiently utilizes GPUs with spatial sharing
- Service chain based scheduling and resource allocation scheme
- Memory isolation with IsoPointer
- Abstraction that simplifies building GPU-accelerated NFs
Experimental results (compared with temporal sharing)
- Enhances throughput by up to 70.8%
- Reduces latency by up to 44.3%
G-NET: a highly efficient platform for building GPU-based NFV systems