ResQ: Enabling SLOs in Network Function Virtualization



1 ResQ: Enabling SLOs in Network Function Virtualization. Authors: Amin Tootoonchian*, Aurojit Panda, Chang Lan, Melvin Walls, Katerina Argyraki, Sylvia Ratnasamy, Scott Shenker. Affiliations: *Intel Labs, UC Berkeley, ICSI, NYU, Nefeli, EPFL.

2 NFV Builds on Resource Sharing. Classic approach: dedicated hardware, individual functions. NFV approach: shared hardware, functions in software.

3 Offering Performance Guarantees Is Challenging. Performance depends on neighbors' activity, due to sharing of network, server, and processor resources. [Figure: cluster and server diagram showing the QPI interconnect, I/O controller, memory controller, DDR RAM, shared cache (LLC), PCI-E, and NICs.]

4 Assumptions on Resource Sharing and Isolation. Traffic isolation through fabric and NIC QoS mechanisms. Independent NFs do not share the same core, but they share on-die uncore resources. [Figure: cluster and server diagram showing the QPI interconnect, I/O controller, memory controller, DDR RAM, shared cache (LLC), PCI-E, and NICs.]

5 Does Resource Contention Matter? Setup: a traffic generator drives ports 1 through n, each served by its own core (core 1 through core n). Solo run: the target NF runs alone, giving throughput $T_{solo}$ and latency $L_{solo}$. Consolidated runs 1 through m: the target NF runs alongside competing NFs, giving $T_1, L_1$ through $T_m, L_m$. Question: how far off are $\min_i T_i$ and $\max_i L_i$ from $T_{solo}$ and $L_{solo}$?
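The comparison on this slide can be sketched in a few lines. This is a minimal illustration, assuming degradation is reported as the relative gap between the solo run and the worst consolidated run; the function name `degradation` and the sample numbers are hypothetical and not taken from the talk.

```python
def degradation(t_solo, l_solo, runs):
    """Return (throughput, latency) degradation in percent.

    runs is a list of (throughput, latency) pairs, one per consolidated run.
    Assumption: the worst case is min throughput and max latency across runs.
    """
    worst_t = min(t for t, _ in runs)
    worst_l = max(l for _, l in runs)
    tput_pct = 100.0 * (t_solo - worst_t) / t_solo
    lat_pct = 100.0 * (worst_l - l_solo) / l_solo
    return tput_pct, lat_pct


# Hypothetical measurements: throughput in Mpps, latency in microseconds.
result = degradation(10.0, 50.0, [(9.0, 55.0), (8.0, 60.0), (9.5, 52.0)])
print(result)
```

With these made-up inputs the worst consolidated run has throughput 8.0 and latency 60.0, so both metrics degrade by 20%.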

6 Does Resource Contention Matter? [Figure: two panels, Throughput Degradation and Latency Degradation; y-axis: degradation (%); series: small packets and large packets.] Significant degradation for most NFs.

7 Approaches to Offer Performance SLOs. Prediction (indirect): contention-aware placement. Accurate prediction is hard. Optimistic → SLO violation. Conservative → inefficient. Algorithmically complex. No isolation with SLO violations. May lead to neighbor violations. Isolation (direct): neighbor-independent placement. No need for prediction. Algorithmically simpler. Isolation despite SLO violations. Never affects neighbors' SLOs. Enabler: emergence of hardware resource isolation mechanisms.

8 ResQ: SLO Enforcement by Direct Isolation. 1. Direct performance isolation. 2. Performance SLO enforcement.

9 Direct Performance Isolation

10 Enabler: Hardware Resource Isolation. Intel Cache Allocation Technology (CAT) for LLC isolation: classify cores/threads/VMs, then assign parts of the LLC to classes. Is LLC isolation sufficient to ensure NF performance isolation? [Figure: processor diagram showing the interconnect, I/O controller, shared cache (LLC), and memory controller.]
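CAT expresses each class's share of the LLC as a capacity bitmask over cache ways, and the set bits of a mask must be contiguous. The sketch below shows one way to derive non-overlapping masks for a set of classes. The helper name `contiguous_masks`, the class names, and the 20-way cache are assumptions made for illustration; this is not ResQ's code.

```python
def contiguous_masks(total_ways, ways_per_class):
    """Assign each class a contiguous, non-overlapping run of LLC ways.

    ways_per_class maps a class name to the number of ways it should own.
    Returns a dict mapping class name to an integer bitmask.
    """
    if sum(ways_per_class.values()) > total_ways:
        raise ValueError("requested ways exceed the cache's way count")
    masks = {}
    offset = 0
    for name, ways in ways_per_class.items():
        if ways < 1:
            raise ValueError("each class needs at least one way")
        # 'ways' consecutive one-bits, shifted past the ways already handed out.
        masks[name] = ((1 << ways) - 1) << offset
        offset += ways
    return masks


# Hypothetical example: a 20-way LLC split between two NF classes.
masks = contiguous_masks(20, {"nf_a": 4, "nf_b": 2})
print({name: hex(mask) for name, mask in masks.items()})
```

Handing out ways from the low bits upward keeps every mask contiguous by construction, so no separate validity check is needed.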

11 LLC Isolation Is Not Sufficient! LLC isolation achieves a high level of isolation with small packets, but still leaves up to 15% degradation with large packets, even though small-packet traffic is more resource intensive. We observed high memory utilization with large-packet traffic, yet in general we expect NFs to generate low memory traffic, and NF LLC miss rates with large and small packets are comparable. Root cause: high I/O-related memory traffic due to LLC misses.

12 The Leaky DMA Problem. NICs do DMA transfers into part of the LLC, enabled by Intel Data Direct I/O Technology (DDIO). By default, DDIO uses 10% of the LLC to allocate buffers. NFs contend for this DDIO LLC space: large packets require 12x more space than small packets, and CAT does not apply to I/O. Solution: limit the number of in-flight packets, e.g., through buffer sizing. [Figure: RX/TX traffic passing through the I/O controller into the shared cache (LLC), with contention marked.]
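The buffer-sizing idea can be illustrated with simple arithmetic: cap the number of in-flight packets so that their buffers fit in the DDIO portion of the LLC. The sketch below is an assumption-laden illustration, not ResQ's actual sizing rule. The 10% default comes from the slide, while the function name, the 20 MiB LLC, the 2048-byte buffers, and the equal split among NFs are all hypothetical.

```python
def max_in_flight_packets(llc_bytes, bytes_per_buffer, num_nfs, ddio_percent=10):
    """Upper bound on in-flight packets per NF so buffers fit in DDIO space.

    Assumptions: DDIO owns ddio_percent of the LLC, the space is split
    equally among num_nfs NFs, and each packet occupies one buffer.
    """
    ddio_bytes = llc_bytes * ddio_percent // 100
    per_nf_bytes = ddio_bytes // num_nfs
    return per_nf_bytes // bytes_per_buffer


# Hypothetical machine: 20 MiB LLC shared by 4 NFs using 2048-byte buffers.
limit = max_in_flight_packets(20 * 2**20, 2048, 4)
print(limit)
```

Here the DDIO share is 2 MiB, each NF gets 512 KiB, and the resulting cap is 256 packets per NF. A ring sized above such a cap would let DMA writes spill past the DDIO space and evict one another.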

13 Accuracy of ResQ's Isolation Mechanism. [Figure: throughput degradation and latency degradation (%), before and after applying ResQ's isolation, for small and large packets.] LLC isolation and buffer sizing ensure performance isolation with a high degree of accuracy (<3% error).

14 Performance SLO Enforcement

15 ResQ SLOs. Reserved SLOs: static allocation. Input: NF, expected configuration, and traffic profile. Target: throughput and latency. On-demand SLOs: dynamic allocation. Input: NF. Target: latency.

16 ResQ Admission Process. Profile NFs: construct a performance model. Fast greedy allocation: fast and scalable. Compute the number of instances, compute the core and LLC allocation per instance, and deny admission if infeasible.
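The admission steps can be sketched as a small greedy routine. This is a hedged illustration, not ResQ's algorithm: it assumes a profile that maps LLC ways to per-instance throughput, one core per instance, and first-fit placement. The names `Server`, `plan_instances`, and `admit`, and all numbers, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    free_cores: int
    free_llc_ways: int


def plan_instances(profile, target_tput):
    """Choose (instances, ways per instance) from a profiled performance model.

    profile maps LLC ways to the throughput one instance achieves with them.
    Assumption: prefer fewer instances (cores), then fewer total LLC ways.
    """
    best = None
    for ways, tput in sorted(profile.items()):
        n = math.ceil(target_tput / tput)
        cost = (n, n * ways)
        if best is None or cost < best[0]:
            best = (cost, n, ways)
    return best[1], best[2]


def admit(servers, profile, target_tput):
    """First-fit placement; returns server indices, or None to deny admission."""
    n, ways = plan_instances(profile, target_tput)
    # Work on a copy so a denied request leaves the cluster state untouched.
    trial = [Server(s.free_cores, s.free_llc_ways) for s in servers]
    placement = []
    for _ in range(n):
        for idx, s in enumerate(trial):
            if s.free_cores >= 1 and s.free_llc_ways >= ways:
                s.free_cores -= 1
                s.free_llc_ways -= ways
                placement.append(idx)
                break
        else:
            return None  # no server can host this instance: infeasible
    for real, new in zip(servers, trial):
        real.free_cores, real.free_llc_ways = new.free_cores, new.free_llc_ways
    return placement


# Hypothetical profile: throughput (Mpps) per instance for 2, 4, or 8 LLC ways.
profile = {2: 4.0, 4: 6.0, 8: 6.5}
cluster = [Server(free_cores=1, free_llc_ways=10), Server(free_cores=4, free_llc_ways=4)]
print(admit(cluster, profile, 12.0))
print(admit(cluster, profile, 6.0))
```

In this example the first request is met with two instances of four ways each, one per server. The second request is denied because no server has both a free core and four free ways left.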

17 ResQ Optimal Scheduler. MILP formulation for the optimal solution. Slow compared to greedy allocation, so it runs in the background (i.e., not in the admission path) and rearranges NFs if necessary. Practical for small clusters: takes seconds to minutes. Larger clusters: divide into smaller ones with independent solvers.

18 Resource Efficiency. [Figure: number of servers required for insensitive, combination, and sensitive NF workloads; series: ResQ Optimal, ResQ Greedy, Dynamic (no isolation), Prediction [1] (no isolation).] Annotations: highly inefficient (conservative predictor); only up to 18.5% worse than optimal; cost of hard partitioning is <3% compared to greedy. [1] Mihai Dobrescu, Katerina Argyraki, and Sylvia Ratnasamy. Toward Predictable Performance in Software Packet-Processing Platforms. NSDI

19 Conclusion. ResQ achieves better accuracy and efficiency than prior work, despite using simple heuristics and algorithms. Enabled by direct performance isolation. Plenty of room for improvement with software mechanisms. Code available at [URL not preserved]. Useful for general NFV experimentation.
