NFVnice: Dynamic Backpressure and Scheduling for NFV Service Chains


1 NFVnice: Dynamic Backpressure and Scheduling for NFV Service Chains. Sameer G. Kulkarni 1, Wei Zhang 2, Jinho Hwang 3, Shriram Rajagopalan 3, K. K. Ramakrishnan 4, Timothy Wood 2, Mayutan Arumaithurai 1 & Xiaoming Fu 1 (1 University of Göttingen, 2 George Washington University, 3 IBM T. J. Watson Research Center, 4 University of California, Riverside)

2-4 Growing NFV Popularity
Diverse and large # of middleboxes, on par with switches and routers in ISP/DC networks [APLOMB, SIGCOMM 12]. NFs are fast replacing traditional middleboxes in ISP/Telco/DC networks. Typical usage of NFs includes function chaining.
Performance (resource utilization) and scalability are key for NFV deployment! *source: ETSI NFV June 13

5-11 How to address performance and scalability for the NFV platform?
Consolidation approaches (E2 [SOSP 15], NetBricks [OSDI 16]): consolidate the NFs of a chain on a single core. But there are many different NFs and diverse NF chains, far more than compute cores. Performance and scalability challenges remain!
Multiplexing approaches (Flurries [CoNext 16], ClickOS [NSDI 14]): multiplex NFs on the same core. Scalability, yes; but performance?
Then, how to schedule the NFs to optimize performance?

12-14 Use Existing Linux Schedulers?
Vanilla Linux schedulers:
Completely Fair Scheduler (CFS): Normal (default) and Batch policies - virtual run time - nanosecond granularity.
Real-Time Scheduler: Round Robin and FIFO policies - time slices - millisecond granularity.
Do existing schedulers perform well?
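
For context, a minimal sketch (not from the slides) of how an NF process could be placed under one of these kernel scheduling policies; the policy constants are standard Linux APIs, while the choice of policy per NF is purely illustrative. Note that real-time policies normally require root or CAP_SYS_NICE.

    #include <sched.h>
    #include <stdio.h>

    /* Illustrative helper: run the calling NF process under a given
     * kernel scheduling policy (SCHED_OTHER/"Normal", SCHED_BATCH,
     * or SCHED_RR as used in the characterization experiments). */
    static int set_nf_policy(int policy)
    {
        struct sched_param sp = { 0 };

        /* Real-time policies need a static priority in [1, 99];
         * the normal/batch policies require priority 0. */
        if (policy == SCHED_RR || policy == SCHED_FIFO)
            sp.sched_priority = 1;

        if (sched_setscheduler(0 /* self */, policy, &sp) != 0) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        /* e.g., run this NF under the round-robin real-time policy */
        return set_nf_policy(SCHED_RR);
    }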

15-19 OS Scheduler Characterization (Load)
3 homogeneous NFs (NF1, NF2, NF3, each with per-packet cost X) running on the same core, with offered load in the ratio 2:2:1 (Flow 1: 6 Mpps, Flow 2: 6 Mpps, Flow 3: 3 Mpps).
(Chart: measured CPU allocation and throughput per NF, against the ideal allocation for NF1/NF2 vs. NF3 and the ideal throughput for F1/F2 vs. F3.)
Schedulers fail to account for NF load!

20-24 OS Scheduler Characterization (Cost)
3 heterogeneous NFs (per-packet processing cost 10:5:1 for NF1, NF2, NF3) with equal offered load (~5 Mpps per flow), all on the same core.
(Chart: measured CPU allocation and throughput per NF, against the per-NF ideal allocations and the ideal throughput, which is equal for F1, F2, F3.)
Schedulers fail to account for NF processing cost!
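
As a sketch of what "ideal" means in these two experiments (my arithmetic, assuming the CPU share of an NF should be proportional to its arrival rate times its per-packet cost, which is what NFVnice's rate-cost proportional fairness later formalizes):

    % Load experiment: equal per-packet cost, offered load 2:2:1
    \[ S_{NF1} : S_{NF2} : S_{NF3} = 2 : 2 : 1 \;\Rightarrow\; 40\%,\; 40\%,\; 20\% \]
    % Cost experiment: equal offered load, per-packet cost 10:5:1
    \[ S_{NF1} : S_{NF2} : S_{NF3} = 10 : 5 : 1 \;\Rightarrow\; 62.5\%,\; 31.25\%,\; 6.25\% \]
    % Dividing each share by its cost (10:5:1) then gives equal
    % per-flow throughput, matching the "ideal throughput" line.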

25-31 OS Scheduler Characterization (Chain)
3-NF chain (NF1, NF2, NF3, all running on the same core), single flow.
Context switches per second (total): NORMAL 20K/s, BATCH 2K/s, RR 1K/s.
CPU %:        NORMAL   BATCH   RR
  NF1         34%      15%     9%
  NF2         34%      42%     37%
  NF3         33%      43%     54%
Too many/too few context switches result in overhead and inappropriate allocation of CPU.
Vanilla Linux schedulers result in sub-optimal resource utilization.
Need the schedulers to be load-, NF-characteristic-, and chain-aware!

32-33 NFVnice
A user space control framework for scheduling NFV chains.
NFVnice in a nutshell:
Complements the existing kernel task schedulers.
Integrates rate-proportional scheduling from hardware schedulers and cost-proportional scheduling from software schedulers.
Built on OpenNetVM [HMBox 16, NSDI 14], a DPDK-based NFV platform; enables deployment of containerized (Docker) or process-based NFs.
Improves NF throughput, fairness and CPU utilization through:
- Proportional and fair share of CPU to NFs: tuning the scheduler.
- Avoiding wasted work and isolating bottlenecks: backpressure.
- An efficient I/O management framework for NFs.

34-35 NFVnice: Building Blocks
cgroups: work-conserving and proportional scheduling (within each core).
cgroups (control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.

36 cgroups: Rate-Cost Proportional Fairness
What is rate-cost proportional fairness? It determines each NF's CPU share by accounting for:
- NF load (average packet arrival rate, instantaneous queue length).
- NF priority and per-packet computation cost (median).
Why? Efficient and fair allocation of CPU to the contending NFs; provides an upper bound on the wait/idle time for each NF; a flexible and extensible approach that can adapt to any QoS policy.
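
One plausible way to write the share computation this slide describes (a sketch; exactly how queue length and priority enter the weighting is not spelled out on the slide):

    % CPU share of NF i on its core, from priority P_i, measured
    % arrival rate \lambda_i, and median per-packet cost c_i:
    \[ \mathrm{NFShare}_i = \frac{P_i \, \lambda_i \, c_i}{\sum_{j \in \mathrm{core}} P_j \, \lambda_j \, c_j} \]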

37-40 cgroups: Rate-Cost Proportional Fairness (control flow)
Initialization: mkdir /cgroupfs/nf(i)
Weight computation: every 10 ms.
Update cgroup weight: write NFShare(i) to /cgroupfs/nf(i)/cpu.shares
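
A minimal sketch of that loop in C, assuming a hypothetical per-NF stats structure filled in by a monitor thread; the cgroupfs mount point and cpu.shares file follow the slide, while the names and the scaling to the cpu.shares range are illustrative.

    #include <stdio.h>
    #include <unistd.h>

    #define MAX_NFS 16

    /* Hypothetical per-NF statistics, filled in by the monitor thread
     * in a real system (zeroed here so the sketch still runs). */
    struct nf_stats {
        double arrival_rate;   /* packets/sec (EWMA)       */
        double pkt_cost;       /* median cycles per packet */
        double priority;       /* operator-assigned weight */
    };
    static struct nf_stats stats[MAX_NFS];

    static void update_cpu_shares(int num_nfs)
    {
        double total = 0.0, load[MAX_NFS];
        char path[128];

        for (int i = 0; i < num_nfs; i++) {
            load[i] = stats[i].priority * stats[i].arrival_rate * stats[i].pkt_cost;
            total += load[i];
        }
        for (int i = 0; i < num_nfs; i++) {
            /* Scale the relative load to cgroup cpu.shares (1024 = default). */
            long shares = total > 0 ? (long)(1024.0 * num_nfs * load[i] / total) : 1024;
            snprintf(path, sizeof(path), "/cgroupfs/nf%d/cpu.shares", i);
            FILE *f = fopen(path, "w");
            if (f) { fprintf(f, "%ld\n", shares); fclose(f); }
        }
    }

    int main(void)
    {
        for (;;) {                     /* weight computation every 10 ms */
            update_cpu_shares(MAX_NFS);
            usleep(10 * 1000);
        }
    }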

41 NFVnice: Building Blocks
cgroups: work-conserving and proportional scheduling (within each core).
Back pressure: chain-aware scheduling; avoid wasted work (within and across cores).

42 Back pressure: Backpressure in NF chains
Selective per-chain backpressure marking. (Figure: bottleneck NF, with the upstream and downstream of the dataflow.) Only Flow A, going through the bottleneck NF (NF3), is backpressured and throttled at the upstream source, while Flow B is not affected.

43-45 Back pressure: Scenario with No Backpressure
(Chain diagram: Flow 1 and Flow 2 share the upstream NFs NF1 and NF2; Flow 1 continues to the bottleneck NF3*, Flow 2 to NF4.)
NF1 and NF2 contend for CPU and steal CPU cycles from NF3!
Lots of wasted work!!!

46-47 Back pressure: Traditional Backpressure
(Same chain; hop-by-hop "Stop!" signals propagate upstream from the bottleneck NF3*.)
No backpressure: lots of wasted work. Traditional backpressure: incurs feedback delay and offers no flow isolation.

48-50 Back pressure: NFVnice Backpressure
(Same chain; the NF Manager's TX thread keeps a watch list and monitors NF queues.)
Compared to the alternatives (no backpressure: wasted work; traditional backpressure: delay, no isolation), NFVnice reacts instantaneously and throttles packets per chain.
Throttle packets when the queue length crosses the high watermark; clear the throttle when Qlen < LOW_WATER_MARK.
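
A rough sketch of the watermark logic in C (the threshold constants, the chain/queue structures, and the throttle flag are illustrative; in NFVnice this logic lives in the NF Manager's TX/monitor threads):

    #include <stdbool.h>
    #include <stdint.h>

    #define HIGH_WATER_MARK 80   /* illustrative queue-length thresholds */
    #define LOW_WATER_MARK  20

    struct nf_queue {
        uint32_t qlen;           /* current receive-queue occupancy */
    };

    struct chain {
        bool throttled;          /* upstream NFs stop pulling packets for
                                  * this chain while the flag is set     */
    };

    /* Called whenever the manager samples an NF's queue length.
     * Only chains that traverse the bottleneck NF are throttled,
     * so other flows sharing the upstream NFs are unaffected. */
    static void backpressure_check(const struct nf_queue *q, struct chain *c)
    {
        if (!c->throttled && q->qlen > HIGH_WATER_MARK)
            c->throttled = true;      /* mark chain: throttle at source */
        else if (c->throttled && q->qlen < LOW_WATER_MARK)
            c->throttled = false;     /* clear throttle */
    }

    int main(void)
    {
        struct nf_queue q = { .qlen = 0 };
        struct chain c = { .throttled = false };
        q.qlen = 100; backpressure_check(&q, &c);   /* -> throttled */
        q.qlen = 10;  backpressure_check(&q, &c);   /* -> cleared   */
        return c.throttled;
    }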

51 NFVnice: Building Blocks
cgroups: work-conserving and proportional scheduling (within each core).
Back pressure: chain-aware scheduling; avoid wasted work (within and across cores).
ECN: end-to-end bottleneck/congestion control (across nodes).

52-53 ECN Marking
ECN-aware NF Manager: per-NF ECN marking based on Active Queue Management (AQM) policies.
Addresses NF bottlenecks across a chain of NFs spanning distinct nodes.
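
A simplified sketch of AQM-style CE marking in C (the EWMA threshold and the policy are illustrative; a DPDK-based manager would operate on the IPv4 header inside an rte_mbuf and must recompute the IPv4 checksum after changing the ToS byte):

    #include <stdint.h>
    #include <stdbool.h>

    #define ECN_MASK 0x03u   /* two low bits of the IPv4 ToS byte */
    #define ECN_CE   0x03u   /* Congestion Experienced codepoint  */

    /* Mark Congestion Experienced on an ECN-capable packet.
     * Returns true if the ToS byte was modified (the caller must
     * then update the IPv4 header checksum). */
    static bool ecn_mark_ce(uint8_t *tos)
    {
        if ((*tos & ECN_MASK) == 0)        /* not ECN-capable transport */
            return false;
        if ((*tos & ECN_MASK) == ECN_CE)   /* already marked */
            return false;
        *tos |= ECN_CE;
        return true;
    }

    /* AQM-style decision: mark when the EWMA of the NF's Rx queue
     * length exceeds a configured threshold (illustrative policy). */
    static bool should_mark(double ewma_qlen, double mark_threshold)
    {
        return ewma_qlen > mark_threshold;
    }

    int main(void)
    {
        uint8_t tos = 0x02;                /* ECT(0) packet */
        double ewma_qlen = 96.0;
        if (should_mark(ewma_qlen, 64.0))
            ecn_mark_ce(&tos);
        return (tos & ECN_MASK) == ECN_CE ? 0 : 1;
    }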

54 NFVnice: Building Blocks
cgroups: work-conserving and proportional scheduling (within each core).
Back pressure: chain-aware scheduling; avoid wasted work (within and across cores).
ECN: end-to-end bottleneck/congestion control (across nodes).
I/O Mgmt: efficient disk I/O management library.

55 NFVnice: Implementation
OpenNetVM [HMBox 16, NSDI 14] makes use of DPDK to enable fast packet processing. NFVnice extends its data plane and control plane functionality to facilitate efficient multiplexing and scheduling of NFs on the same core.

56-59 Control Plane
(Architecture figure: NFVnice node with a shared-memory packet mempool, RX/TX manager, Wakeup and Monitor threads and a watch list inside the user-space NF Manager, kernel CPU schedulers and cgroupfs below, and the NICs in hardware across two sockets.)
NF Manager: resource monitoring and control functions.
Wakeup thread: wakeup notification to the NFs; timer management.
NF threads (libnf): voluntary yield decisions; estimate per-packet processing cost.
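
A plausible sketch of how an NF library could estimate per-packet cost from the TSC with an EWMA (illustrative only; NFVnice's libnf reports a median cost and also makes the yield decisions, which are omitted here):

    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtsc() */

    /* Exponentially weighted moving average of per-packet processing
     * cost, in CPU cycles (illustrative alpha = 1/16). */
    static double ewma_cost;

    static void record_pkt_cost(uint64_t cycles)
    {
        const double alpha = 1.0 / 16.0;
        ewma_cost = (1.0 - alpha) * ewma_cost + alpha * (double)cycles;
    }

    /* Wrap the NF's packet handler to time each packet. */
    static void process_packet(void *pkt, void (*handler)(void *))
    {
        uint64_t start = __rdtsc();
        handler(pkt);
        record_pkt_cost(__rdtsc() - start);
        /* libnf would also decide here whether to voluntarily yield,
         * e.g. when the NF's Rx queue is empty. */
    }

    static void dummy_handler(void *pkt) { (void)pkt; }

    int main(void)
    {
        for (int i = 0; i < 1000; i++)
            process_packet(0, dummy_handler);
        return ewma_cost >= 0 ? 0 : 1;
    }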

60 Control Plane (continued)
Monitor thread: periodically (every 1 ms) monitors NF load, computes the CPU share for each core, tracks an EWMA of each NF's Rx queue length, and marks ECN.

61 Evaluation
Testbed hardware: three Intel Xeon(R) CPU E5-2697 servers (28 cores each) with dual-port 10Gbps DPDK-compatible NICs, connected by 10Gbps links. Software: Linux kernel with the lowlatency profile; NFVnice built on top of OpenNetVM.
Traffic: Pktgen and MoonGen for line-rate traffic (64-byte packets); Iperf for TCP flows.
Schemes compared: native Linux schedulers with and without NFVnice, across different NFs (varying computation costs) and chain configurations.

62-68 Performance: Impact of cgroup weights and Backpressure
Simple three-NF chain on one core (Core-1): NF1 120 cycles/packet, NF2 270 cycles/packet, NF3 550 cycles/packet.
Significant reduction in wasted work!
Wasted work (packet drops/sec, CFS Normal scheduler):
              Default          NFVnice
  NF1         3.58M (58%)      11.2K (<0.3%)
  NF2         2.02M (50%)      12.3K (<0.4%)
  NF3         -                -
CPU allocation < computation cost (NF runtime in ms, measured over a 2s interval; per-NF values in the slide table).
NFVnice improves NF3 throughput for all kernel schedulers.

69-74 Efficient Resource (CPU) Utilization
Three-NF chain, one NF per core: NF1 550 cpp on Core-1, NF2 2200 cpp on Core-2, NF3 4500 cpp on Core-3.
                     Default                                NFVnice
             Svc. rate   Drop rate         CPU      Svc. rate   Drop rate   CPU
  NF1        5.95 Mpps   4.76 Mpps (80%)   100%     0.82 Mpps   0.15 Mpps   11%
  NF2        1.18 Mpps   0.58 Mpps (49%)   100%     0.72 Mpps   0.07 Mpps   64%
  NF3        0.60 Mpps   -                 100%     0.60 Mpps   -           100%
  Aggregate  0.60 Mpps   -                          0.60 Mpps   -
Default: lots of wasted work, burning CPU! NFVnice: avoids wasted work and optimizes CPU utilization while achieving the same aggregate throughput; better resource utilization overall.

75-81 Performance + Resource Utilization
Two service chains (Chain 1, Chain 2) across four NFs. (Chart: packets processed in Mpps per NF, Default vs. NFVnice, for NF1-NF4.)
Default: inefficient CPU utilization by the NFs. NFVnice: judicious utilization of the NFs' CPU.
~2x throughput gain with efficient CPU utilization. Flows get the right amount of bandwidth and NF resources.

82-86 Robustness: Chain Diversity
Three-NF chain on one core (Core-1): NF1 (low cost), NF2 (medium cost), NF3 (high cost).
Robust across schedulers. Robust across chain diversity. Consistently improves throughput regardless of chain characteristics.

87-88 NF Processing Cost Variation
Simple three-NF chain (NF1*, NF2*, NF3*) on one core; *variable per-packet processing cost [120 to 550 cpp].
NFVnice is resilient to cost variations!

89-91 TCP and UDP Isolation
(Chart: throughput in Mbps over time in seconds for TCP and UDP flows, with and without NFVnice.)
Without NFVnice, TCP is affected by UDP flows: wastage of CPU and bandwidth. With NFVnice, UDP and TCP flows are effectively isolated.

92 Conclusion
NFVnice enables performance and scale: a user-space framework complementing the OS CPU schedulers. It brings the best of hardware packet schedulers and CPU schedulers to the NFV platform. Weight adjustments and backpressure help achieve better NFV performance; being chain-aware improves fairness and throughput.
Our work will be open-sourced soon. Get OpenNetVM: watch out for the link.

93 Thank you!
CleanSky ITN: an EU FP7 Marie Curie Initial Training Network, A Network for the Cloud Computing Eco-System.
