NFVnice: Dynamic Backpressure and Scheduling for NFV Service Chains


NFVnice: Dynamic Backpressure and Scheduling for NFV Service Chains

Sameer G. Kulkarni (1), Wei Zhang (2), Jinho Hwang (3), Shriram Rajagopalan (3), K. K. Ramakrishnan (4), Timothy Wood (2), Mayutan Arumaithurai (1), and Xiaoming Fu (1)
(1) University of Göttingen, (2) George Washington University, (3) IBM T. J. Watson Research Center, (4) University of California, Riverside

Growing NFV Popularity

Middleboxes are diverse and numerous, on par with switches and routers in ISP/DC networks [APLOMB, SIGCOMM '12], and NFs are fast replacing traditional middleboxes in ISP/Telco/DC networks. Typical usage of NFs involves function chaining. Performance (resource utilization) and scalability are the key to NFV deployment! (Source: ETSI NFV, June '13.)

How do we address performance and scalability for an NFV platform?

Consolidation approaches (E2 [SOSP '15], NetBricks [OSDI '16]) consolidate the NFs of a chain on a single core. But there are many different NFs and diverse NF chains, far more than compute cores, so performance and scalability challenges remain.

Multiplexing approaches (Flurries [CoNEXT '16], ClickOS [NSDI '14]) multiplex NFs on the same core. This scales, but performance is in question.

Then, how do we schedule the NFs to optimize performance?

Use Existing Linux Schedulers?

Vanilla Linux schedulers:
- Completely Fair Scheduler (CFS): the NORMAL (default) and BATCH policies; schedules by virtual runtime at nanosecond granularity.
- Real-time scheduler: the Round Robin (RR) and FIFO policies; schedules by time slice at millisecond granularity.

Do the existing schedulers perform well? (A minimal sketch of selecting these policies appears below.)
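For concreteness, here is a minimal C sketch (not from the talk) of how a process such as an NF would be placed under these policies through the standard Linux API; the priority value is illustrative:

    #define _GNU_SOURCE                    /* for SCHED_BATCH on glibc */
    #include <sched.h>
    #include <sys/types.h>

    /* Real-time Round Robin: fixed millisecond-scale time slices. */
    static int use_rr(pid_t pid)
    {
        struct sched_param sp = { .sched_priority = 1 };  /* illustrative */
        return sched_setscheduler(pid, SCHED_RR, &sp);
    }

    /* CFS BATCH: virtual-runtime based, fewer preemptions than NORMAL. */
    static int use_batch(pid_t pid)
    {
        struct sched_param sp = { .sched_priority = 0 };  /* must be 0 */
        return sched_setscheduler(pid, SCHED_BATCH, &sp);
    }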

OS Scheduler Characterization (Load)

Three homogeneous NFs (NF1, NF2, NF3, each with per-packet cost X) run on the same core, with offered loads of 6 Mpps (Flow 1 to NF1), 6 Mpps (Flow 2 to NF2), and 3 Mpps (Flow 3 to NF3), i.e., a 2:2:1 load ratio. Ideally, NF1 and NF2 would each receive twice the CPU allocation of NF3, and Flows 1 and 2 would each achieve twice the throughput of Flow 3. In practice, the schedulers fail to account for NF load!

OS Scheduler Characterization (Cost)

Three heterogeneous NFs with per-packet processing costs in the ratio 10:5:1 (NF1 = 10X, NF2 = 5X, NF3 = X) each receive an equal offered load of ~5 Mpps. Ideally, each NF's CPU allocation would be proportional to its processing cost, so that all three flows achieve the same throughput. In practice, the schedulers fail to account for NF processing cost!

OS Scheduler Characterization (Chain)

A chain of three NFs, all running on the same core, serves Flow 1. [Figure: per-scheduler performance; the plotted values are 7.1, 3.58, 3.52, and 2.02.]

Context switches per second (total):
  NORMAL: 20K/s    BATCH: 2K/s    RR: 1K/s

CPU share per NF:
        NORMAL   BATCH   RR
  NF1   34%      15%     9%
  NF2   34%      42%     37%
  NF3   33%      43%     54%

Too many or too few context switches cause overhead and inappropriate allocation of CPU. Vanilla Linux schedulers result in sub-optimal resource utilization. We need schedulers that are aware of load, NF characteristics, and the chain!

NFVnice

A user-space control framework for scheduling NFV service chains.

NFVnice in a nutshell:
- Complements the existing kernel task schedulers.
- Integrates rate-proportional scheduling from hardware schedulers and cost-proportional scheduling from software schedulers.
- Built on OpenNetVM [HMBox 16, NSDI 14], a DPDK-based NFV platform; NFs can be deployed as Docker containers or as processes.
- Improves NF throughput, fairness, and CPU utilization through: a proportional and fair share of CPU to NFs (tuning the scheduler); avoiding wasted work and isolating bottlenecks (backpressure); and an efficient I/O management framework for NFs.

NFVnice: Building Blocks

- cgroups: work-conserving and proportional scheduling (within each core). Control groups (cgroups) are a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
- Backpressure: chain-aware scheduling; avoids wasted work (within and across cores).
- ECN: end-to-end bottleneck/congestion control (across nodes).
- I/O management: an efficient disk I/O management library for NFs.

cgroups: Rate-Cost Proportional Fairness

What is rate-cost proportional fairness? It determines each NF's CPU share by accounting for:
- the NF's load (average packet arrival rate and instantaneous queue length), and
- the NF's priority and per-packet computation cost (median).

Why? It allocates CPU efficiently and fairly among the contending NFs, provides an upper bound on the wait/idle time for each NF, and is a flexible, extensible approach that can accommodate any QoS policy. A plausible formalization follows below.
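As a sketch, this can be written as follows; the symbols and the normalization are assumptions consistent with the description above, not the paper's verbatim formula:

\[
  \mathrm{load}_i = \lambda_i \cdot c_i,
  \qquad
  \mathrm{NFShare}_i = W \cdot
    \frac{P_i \,\mathrm{load}_i}{\sum_j P_j \,\mathrm{load}_j}
\]

where \(\lambda_i\) is NF \(i\)'s average packet arrival rate, \(c_i\) its median per-packet computation cost, \(P_i\) its priority, and \(W\) the total CPU shares available on the core.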

The cgroups control flow:
1. Initialization: mkdir /cgroupfs/nf(i) for each NF i.
2. Weight computation, repeated every 10 ms.
3. Update cgroup weight: write NFShare(i) to /cgroupfs/nf(i)/cpu.shares. (A minimal code sketch follows.)
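A minimal sketch of this update path in C, assuming the cgroup-v1 cpu.shares interface mounted at /cgroupfs as on the slide; the struct fields and helper names are illustrative, not OpenNetVM's actual code:

    #include <stdio.h>
    #include <unistd.h>

    #define NUM_NFS      3
    #define TOTAL_SHARES 1024UL            /* cpu.shares default scale */

    struct nf_stats {                      /* illustrative per-NF accounting */
        double arrival_rate;               /* lambda_i: avg packet arrival rate */
        double cost;                       /* c_i: median per-packet CPU cost   */
        double priority;                   /* P_i: operator-assigned priority   */
    };

    static void write_share(int i, unsigned long shares)
    {
        char path[64];
        snprintf(path, sizeof path, "/cgroupfs/nf%d/cpu.shares", i);
        FILE *f = fopen(path, "w");
        if (f) { fprintf(f, "%lu\n", shares); fclose(f); }
    }

    static void weight_loop(struct nf_stats nf[NUM_NFS])
    {
        for (;;) {
            double total = 0.0;
            for (int i = 0; i < NUM_NFS; i++)
                total += nf[i].priority * nf[i].arrival_rate * nf[i].cost;
            for (int i = 0; i < NUM_NFS && total > 0.0; i++) {
                double load = nf[i].priority * nf[i].arrival_rate * nf[i].cost;
                write_share(i, (unsigned long)(TOTAL_SHARES * load / total));
            }
            usleep(10 * 1000);             /* every 10 ms, as on the slide */
        }
    }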

Backpressure in NF Chains

NFVnice performs selective, per-chain backpressure marking. [Figure: dataflow from an upstream source to a downstream bottleneck NF.] Only Flow A, which passes through the bottleneck NF (NF3), is backpressured and throttled at the upstream source; Flow B is not affected.

Scenario: No Backpressure

Flows 1 and 2 share the upstream NFs NF1 and NF2; Flow 1 then passes through the bottleneck NF3 (marked NF3*), Flow 2 through NF4. NF1 and NF2 contend for CPU and steal CPU cycles from NF3, so packets are processed upstream only to be dropped at the bottleneck: lots of wasted work!

Traditional Backpressure

Hop-by-hop backpressure propagates "Stop!" signals NF by NF back toward the source. It still wastes work, incurs feedback delay, and offers no flow isolation: Flow 2 is throttled even though it does not traverse the bottleneck NF3*.

NFVnice Backpressure

The NF Manager's TX thread keeps a watch list of NF receive queues and monitors them, so it reacts near-instantaneously, throttling packets per chain at the point of entry. When a watched NF's queue fills past a high watermark, packets of the chains passing through that NF are throttled; the throttle is cleared once Qlen < LOW_WATER_MARK. A sketch of this hysteresis follows below.
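A compact sketch of that watch-list check, with hypothetical names and thresholds (the slide names only LOW_WATER_MARK; the high-watermark trigger side is an assumption):

    #define HIGH_WATER_MARK 80u            /* assumed, in queue entries */
    #define LOW_WATER_MARK  20u            /* named on the slide */

    struct nf_queue {
        unsigned int qlen;                 /* current RX queue occupancy */
        int throttled;                     /* upstream admission throttled? */
    };

    /* Run by the TX/monitor thread for each NF on the watch list. */
    static void check_backpressure(struct nf_queue *q)
    {
        if (!q->throttled && q->qlen > HIGH_WATER_MARK)
            q->throttled = 1;  /* throttle, at entry, only chains through this NF */
        else if (q->throttled && q->qlen < LOW_WATER_MARK)
            q->throttled = 0;  /* hysteresis: clear only below the low mark */
    }

The two watermarks give hysteresis, so a queue hovering near a single threshold does not flap between throttled and unthrottled states.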

ECN Marking

The NF Manager is ECN-aware: it performs per-NF ECN marking based on active queue management (AQM) policies. This addresses NF bottlenecks across a chain of NFs spanning distinct nodes, letting end-to-end congestion control react. A sketch of CE marking is shown below.
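A sketch of CE marking on an IPv4 header; the EWMA-threshold policy here is an assumption standing in for the AQM policy, not the paper's exact rule:

    #include <stdint.h>

    #define ECN_MASK 0x03u                 /* low two bits of the ToS byte */
    #define ECN_CE   0x03u                 /* congestion experienced */

    struct ipv4_hdr {                      /* minimal view of the header */
        uint8_t version_ihl;
        uint8_t tos;                       /* DSCP (6 bits) + ECN (2 bits) */
        /* remaining fields elided */
    };

    static void maybe_mark_ce(struct ipv4_hdr *ip, double ewma_qlen, double thresh)
    {
        uint8_t ecn = ip->tos & ECN_MASK;
        /* Mark only ECN-capable (ECT) packets whose NF queue stays long. */
        if (ecn != 0 && ecn != ECN_CE && ewma_qlen > thresh) {
            ip->tos = (uint8_t)((ip->tos & ~ECN_MASK) | ECN_CE);
            /* NB: the IPv4 header checksum must be recomputed afterwards. */
        }
    }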

NFVnice: Implementation

OpenNetVM [HMBox 16, NSDI 14] uses DPDK for fast packet processing. NFVnice extends its data-plane and control-plane functionality to multiplex and schedule NFs efficiently on the same core.

Control Plane

[Architecture figure: an NFVnice node. Packets flow through a shared-memory mempool; the NF Manager's RX, TX, Wakeup, and Monitor threads run on dedicated cores in user space, above the kernel's CPU schedulers and cgroupfs, with the NICs in hardware.]

The NF Manager hosts the resource monitoring and control functions:
- Wakeup thread: issues wakeup notifications to the NFs and handles timer management.
- NF threads (libnf): make voluntary yield decisions and estimate per-packet processing cost.
- Monitor thread: periodically (every 1 ms) monitors NF load, computes the CPU share for each core, and tracks an EWMA of each NF's RX queue length to drive ECN marking. (A sketch of the EWMA follows below.)
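The EWMA itself is a one-liner; ALPHA here is an assumed smoothing factor (the 1 ms period is from the slide):

    #define ALPHA 0.25                     /* assumed smoothing factor */

    struct nf_monitor {
        double ewma_qlen;                  /* smoothed RX queue length */
    };

    /* Called once per 1 ms monitoring tick with the sampled queue length. */
    static void monitor_tick(struct nf_monitor *m, unsigned int qlen_sample)
    {
        m->ewma_qlen = ALPHA * (double)qlen_sample
                     + (1.0 - ALPHA) * m->ewma_qlen;
    }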

Evaluation

Testbed hardware: three servers with Intel Xeon E5-2697 CPUs (28 cores at 2.6 GHz) and dual-port 10 Gbps DPDK-compatible NICs. Software: Linux kernel 3.19.0-39 (lowlatency profile); NFVnice built on top of OpenNetVM. Traffic: Pktgen and MoonGen for line-rate traffic (64-byte packets); iperf for TCP flows. Schemes compared: native Linux schedulers with and without NFVnice, over NFs of varying computation cost and different chain configurations.

Performance: Impact of cgroup Weights and Backpressure

A simple three-NF chain, with all NFs on Core 1: NF1 at 120 cycles per packet (cpp), NF2 at 270 cpp, NF3 at 550 cpp.

With the CFS NORMAL scheduler, NFVnice significantly reduces wasted work and apportions CPU runtime in line with each NF's computation cost:

Wasted work (packet drops/sec):
         Default        NFVnice
  NF1    3.58M (58%)    11.2K (<0.3%)
  NF2    2.02M (50%)    12.3K (<0.4%)
  NF3    -              -

NF runtime in ms (measured over a 2 s interval):
         Default        NFVnice
  NF1    657.825        128.723
  NF2    602.285        848.922
  NF3    623.797        1014.218

NFVnice improves throughput for all kernel schedulers.

Efficient Resource (CPU) Utilization

A three-NF chain with one NF per core: NF1 (550 cpp) on Core 1, NF2 (2200 cpp) on Core 2, NF3 (4500 cpp) on Core 3. With default scheduling, the upstream NFs burn CPU doing work that is wasted when NF3 drops the packets. NFVnice avoids that wasted work and optimizes CPU utilization while achieving the same aggregate throughput:

                   Default                           NFVnice
                   Svc. rate  Drop rate      CPU     Svc. rate  Drop rate  CPU
  NF1 (550 cyc)    5.95Mpps   4.76Mpps (80%) 100%    0.82Mpps   0.15Mpps   11%
  NF2 (2200 cyc)   1.18Mpps   0.58Mpps (49%) 100%    0.72Mpps   0.07Mpps   64%
  NF3 (4500 cyc)   0.6Mpps    -              100%    0.6Mpps    -          100%
  Aggregate        0.6Mpps    -                      0.6Mpps    -

NFVnice avoids wasted work and provides better resource utilization.

Performance + Resource Utilization

[Bar chart: packets processed in Mpps (0 to 9) for Default vs. NFVnice, per NF (NF1 to NF4), for Chain 1 and Chain 2.]

The default schedulers utilize the CPU inefficiently; NFVnice uses the NFs' CPU judiciously, giving roughly a 2x throughput gain with efficient CPU utilization. Flows get the right amount of bandwidth and NF resources.

Robustness: Chain Diversity

A three-NF chain on Core 1: NF1 (low cost), NF2 (medium cost), NF3 (high cost). NFVnice is robust across schedulers and across chain diversity, consistently improving throughput regardless of chain characteristics.

NF Processing Cost Variation

A simple three-NF chain (NF1*, NF2*, NF3*, all on Core 1), where each NF's per-packet processing cost varies between 120 and 550 cpp. NFVnice is resilient to cost variations!

TCP and UDP Isolation

[Plot: throughput in Mbps (log scale, 1 to 10000) over 60 seconds for TCP and UDP flows, with and without NFVnice.]

Without NFVnice, TCP is affected by the UDP flows, wasting CPU and bandwidth. With NFVnice, UDP and TCP flows are effectively isolated.

Conclusion

NFVnice enables performance and scale:
- A user-space framework that complements the OS CPU schedulers.
- Brings the best of hardware packet schedulers and CPU schedulers to the NFV platform.
- Weight adjustments and backpressure yield better NFV performance.
- Improves fairness and throughput by being chain-aware.

Our work will be open-sourced soon. Get OpenNetVM: http://sdnfv.github.io/ Watch for the link: https://github.com/sdnfv/nfvnice

Thank you!

CleanSky ITN: an EU FP7 Marie Curie Initial Training Network, "A Network for the Cloud Computing Eco-System".