Pricing Intra-Datacenter Networks with Over-Committed Bandwidth Guarantee Jian Guo 1, Fangming Liu 1, Tao Wang 1, and John C.S. Lui 2 1 Cloud Datacenter & Green Computing/Communications Research Group 1 Huazhong University of Science and Technology, China 2 The Chinese University of Hong Kong USENIX ATC 2017 @ Santa Clara, CA, USA 1

Talk outline How to enable (quantified intra-dcn bandwidth performance + pricing) for VMs in IaaS clouds? SoftBW: Motivation Idea Solution Performance 2

Talk outline 1. Motivation What kind of bandwidth performance & pricing do current clouds provide? 3

Performance metrics in IaaS clouds (three top providers) Resources offered: vCPU, Memory, GPU, FPGA / vCPU, Memory, GPU / vCPU, Memory CPU: number of cores, CPU model (frequency, architecture) Memory: size in GB, DDR3/4 Network: "Low to moderate / Moderate / High", or N/A No quantitative network performance; no clear definition of VM bandwidth performance and pricing in today's top IaaS clouds 4

Measurements in clouds Different clouds Same price: 16x difference in performance Different VMs Performance: Cheap VM > Expensive VM Different time Varying and highly unpredictable Price-performance anomaly 5

Talk outline 2. Idea Can we guarantee & price bandwidth performance? 6

When considering performance & pricing Benefit both providers & users Over-commitment (OC) An economical and practical solution for bandwidth in cloud-scale DCNs Why it is rational Opportunity: 99% of links are < 10% loaded in cloud-scale DCNs (SIGCOMM 15) Worst-case performance is still guaranteed: N x OC means at least a 1/N guarantee Though, the bandwidth guarantee may fail: how often might tenants not obtain their paid bandwidth? 7

Analysis on failure rate of bandwidth guarantee at cloud-scale Modeling with traffic trace (IMC 10): n VMs, exponentially distributed traffic P_n = e^{-αC} · Σ_{i=1}^{n} (αC)^{i-1} / (i-1)! Low failure rate (OC tolerance) If utilization = 10%, then 4x over-commitment (OC) gives < 5% failure (OC ratio = sum of guaranteed BW / physical BW at a server) Providers: increase network utilization & revenue with OC Tenants: how to guarantee their benefits (performance, fairness)? 8
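The failure-rate expression on this slide has the shape of a Poisson tail: P_n = e^{-αC} · Σ_{i=1}^{n} (αC)^{i-1}/(i-1)!, i.e. the probability of at most n-1 events with mean αC. A minimal sketch evaluating it (function name is mine, not from the paper):

```python
import math

def guarantee_ok_prob(alpha_c: float, n: int) -> float:
    """P_n = e^{-alpha*C} * sum_{i=1}^{n} (alpha*C)^{i-1} / (i-1)!
    i.e. the Poisson CDF for at most n-1 arrivals with mean alpha*C."""
    return math.exp(-alpha_c) * sum(
        alpha_c ** (i - 1) / math.factorial(i - 1) for i in range(1, n + 1)
    )

# Larger offered load alpha_c relative to n (more aggressive
# over-commitment) lowers the probability that every VM's
# guarantee can be met simultaneously.
```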

Share bandwidth with over-commitment Fulfillment ratio: F = rate / (bandwidth guarantee) Metric for performance guarantee: meet traffic demand while maintaining fairness of fulfillment Quota for pricing: measure fulfillment per billing cycle [Figure: packet arrivals of VM1 (F = 0.9), VM2 (F = 1.1), VM3 (F = 1) over a billing cycle; fulfillment drives both scheduling and charging] Co-design: price-performance consistency 9

Talk outline 3. Solution: SoftBW How to achieve bandwidth guarantee and usage-based pricing under over-commitment? 10

SoftBW overview [Architecture figure: a centralized SoftBW Master (pricing on fulfillment, traffic monitor, enforcing requirements) in the control plane; a SoftBW Agent (fulfillment estimation, packet scheduling) in the data-plane vSwitch on each host server] Design goals: price-performance consistency, over-commitment tolerance, short-flow friendliness Data plane scheduling: bandwidth guarantee with OC, work-conserving Control plane pricing: usage-based charging, enabling the provider to take advantage of low network utilization to oversell bandwidth 11

Fulfillment-based scheduling Round-robin over the per-VM queues Estimation of fulfillment: F > 1 or F < 1 Scheduling of packets: compare the current time vs. the time to send (tts) [Figure: scheduler comparing tts1, tts2, tts3 of the VM1/VM2/VM3 queues against the current time] 12

Estimation of fulfillment Update: after transmitting each packet Fulfillment: F = rate/B, where rate = packet size / time Expected transmission time: P_size/B Inter-departure gap Δτ = actual inter-departure time − P_size/B: F < 1 (packet delayed) gives Δτ > 0; F > 1 gives Δτ < 0 [Figure: packet timeline p1..p3 contrasting a delayed packet with transmission at rate B, F = 1] 13

Estimation of fulfillment Maintain (update): τ ← τ + Δτ, accumulated over many packets τ ≥ 0: bandwidth guarantee not satisfied, tts = 0 (allowed to send immediately) τ < 0: rate exceeds the bandwidth guarantee, tts = time + P/B τ crossing from positive to negative: from unsatisfied to satisfied, tts = time + P/B − τ_prev [Figure: τ > 0 while F < 1, turning negative once F > 1] 14
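The per-packet τ bookkeeping on this slide can be sketched as follows. This is a simplified interpretation, not SoftBW's actual code; class and field names are mine, and I read "τ from positive to negative" as using the previous (positive) τ to shorten the next deadline:

```python
class FulfillmentTracker:
    """Per-VM accumulator of lag behind the guaranteed rate (a sketch)."""

    def __init__(self, guarantee_bps: float):
        self.guarantee = guarantee_bps
        self.tau = 0.0       # accumulated (actual - expected) departure gap
        self.last_tx = 0.0   # departure time of the previous packet

    def on_packet(self, now: float, pkt_bits: float) -> float:
        """Update tau after a departure; return tts for the next packet."""
        expected = pkt_bits / self.guarantee          # P_size / B
        prev_tau = self.tau
        self.tau += (now - self.last_tx) - expected   # actual - expected gap
        self.last_tx = now
        if self.tau >= 0:        # F < 1: guarantee unmet, send immediately
            return 0.0
        if prev_tau > 0:         # tau just crossed from positive to negative
            return now + expected - prev_tau
        return now + expected    # F > 1: pace the VM at its guarantee
```

For example, a VM with a 1 Mbps guarantee that starts out behind (τ > 0) gets tts = 0 until its accumulated credit is used up, after which packets are paced at roughly P/B intervals.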

Scheduling of packets tts = 0: guarantee not satisfied, allowed to send 0 < tts < current time: missed the expected transmission time, send tts > current time: send only if there is residual bandwidth [Figure: scheduler comparing the current time with tts values 0, 0~t, and >t across the VM queues] 15
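The three tts cases above reduce to one eligibility check per head-of-queue packet (a sketch; the real scheduler round-robins this check over the per-VM queues):

```python
def eligible_to_send(tts: float, now: float, has_residual_bw: bool) -> bool:
    """Decide whether the head packet of a VM queue may depart now."""
    if tts == 0:            # guarantee not yet satisfied: always eligible
        return True
    if tts <= now:          # missed its expected transmission time: send
        return True
    return has_residual_bw  # ahead of schedule: only spare, work-conserving
                            # bandwidth may be used
```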

Pricing model & performance metrics (open to customized designs, based on general rules) Differentiated performance metrics for different applications, e.g., real-time jobs, deadline jobs, delay-tolerant background backup Usage-based charging: performance-price consistency Non-decreasing pricing function: true requirement declaration

Guarantee    | Performance                            | Price (e.g.)
Strict       | bandwidth B (< physical bandwidth C)   | r·(1 + B/C)·P0
Dynamic      | data (traffic) size S, deadline time T | r·(1 + S/(T·C))·P1
Fairness     | VM-level fairness                      | r·(r/C)·P1
Best effort  | no bandwidth guarantee                 | free

E.g., for strict, if the actual rate r > B, then (r − B) is charged as fair share. A guarantee may fail under OC, but the tenant then pays less (price-performance consistency); see the example on slide 24. P0 is the unit price; P1 = β·P0 with β < 1, to encourage tenants to use the dynamic guarantee and reduce guarantee failures. 16
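The slide's example price formulas can be written down directly. These are the slide's illustrative formulas, not a normative tariff; function names and the meaning of r (charged usage) are my reading:

```python
def price_strict(r: float, B: float, C: float, p0: float) -> float:
    # Strict guarantee of bandwidth B on a link of capacity C: r*(1 + B/C)*p0
    return r * (1 + B / C) * p0

def price_dynamic(r: float, S: float, T: float, C: float, p1: float) -> float:
    # Deadline guarantee: data size S within deadline T: r*(1 + S/(T*C))*p1
    return r * (1 + S / (T * C)) * p1

def price_fair(r: float, C: float, p1: float) -> float:
    # Fair-share guarantee, price grows with the share taken: r*(r/C)*p1
    return r * (r / C) * p1
```

With P1 = β·P0 and β < 1, a dynamic guarantee is cheaper per unit than a strict one, which is the slide's nudge toward the guarantee type that fails less often under OC.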

SoftBW implementation [Architecture figure: centralized SoftBW Master (pricing on fulfillment, traffic monitor, enforcing requirements) in the control plane; SoftBW Agent (fulfillment estimation, packet scheduling) in the data-plane vSwitch on each host server] An SDN-based solution Pricing: a control-plane application with centralized control, on the OpenDaylight platform Scheduling (bandwidth allocation): a data-plane function with distributed scheduling on each server, in Open vSwitch 17

Talk outline 4. Evaluation: questions How efficiently does SoftBW allocate bandwidth under over-commitment? What is the impact of over-commitment on the network utilization and guarantee failure? 18

Evaluation setup Comparison: rate-limit based bandwidth allocation ElasticSwitch, Best effort (no performance guarantee) Testbed: performance of SoftBW 14 servers, KVM and Open vswitch, 1Gbps NIC Simulator: impact of over-commitment 2,000 servers, each server with 4 VMs, 1s interval Traffic trace: Based on the distribution in existing measurement work (IMC 11, SIGCOMM 15) 19

Performance: bandwidth allocation SoftBW guarantees fairness under over-commitment 2 co-located VMs: each with a 450 Mbps BW guarantee; physical bandwidth: 1 Gbps 3 co-located VMs: each with a 450 Mbps BW guarantee; physical bandwidth: 1 Gbps With sufficient bandwidth, all schemes achieve the bandwidth guarantee; when bandwidth is insufficient, ES and best effort fail to achieve fairness 20

Performance: short flows Quickly obtaining the bandwidth improves short-flow performance (completion time) by 2.8x - 4.5x ES: when completion time is shorter than the update interval (e.g., 50 ms) of the rate-limit based solution, short flows are hurt SoftBW: packets in a queue can be scheduled in every round, which is short-flow friendly 21

Over-commitment: utilization SoftBW improves bandwidth utilization by about 3.9x with 4x OC Overall bandwidth utilization: 9.5% without OC vs. 37.4% with 4x OC 22

Over-commitment: failure rate Dynamic guarantee: only a 1.55% failure rate with 4x OC 98.4% of transfers finish in time; 99.9% finish within 1.4x the deadline; worst case: 2x the deadline 23

Summary: a feasibility study of how to enable (quantified bandwidth performance + pricing) for cloud VMs, beneficial for both providers and tenants SoftBW: 1. Uses over-commitment 2. Scheduling + pricing co-design = price-performance consistency 24

Thank you! Cloud Datacenter & Green Computing/Communications research group Huazhong University of Science & Technology Fangming Liu: http://grid.hust.edu.cn/fmliu/ 25

Performance: overhead SoftBW involves very little overhead Latency: no obvious increase as compared with 350 µs RTT CPU: 10Gbps transmission at MTU packet size costs 5.1% 26

Performance: fast convergence SoftBW converges in about 20 milliseconds TCP vs. UDP: the rate of VMs Throughput is stable 27

Over-commitment: failure rate Strict guarantee: only an 8.3% failure rate with 4x OC 91.7% experience no failure; 99.9% fail for less than 5 s (total simulation time: 600 s); worst case: 10 s of failure 28

Summary: position of SoftBW in the literature Bandwidth allocation = performance model + VM placement + rate enforcement Performance models: hose model, VOC, pipe model, TAG model (e.g., Oktopus, Proteus, CloudMirror) Reservation (e.g., Oktopus): static, non-work-conserving, not efficient Dynamic rate-limit (e.g., ElasticSwitch): inefficient for short flows, unavailable under over-commitment Packet scheduling (SoftBW): pricing and guarantees for bandwidth over-commitment 29

Summary: position of SoftBW 30

Questions Q: What is the novelty of this paper compared to existing work? What is its contribution? A: We focus on addressing the price-performance anomaly and over-commitment of bandwidth guarantees. Our contributions: First, existing work does not consider these two goals; mostly it focuses on the efficiency of bandwidth allocation (and assumes access bandwidth is not over-subscribed). Second, as shown in our experiments, existing rate-limit based solutions for datacenter bandwidth allocation do not work well for fairness and short flows under over-commitment. Third, they do not provide a pricing strategy (via a coherent fulfillment metric that co-designs scheduling & pricing) to guarantee price-performance consistency. 31

Assumptions & focus We assume the datacenter fabric to be a non-blocking switch [10, 11, 7], and our main focus is to schedule the traffic at the virtual ports connected to VMs. Our bandwidth allocation focuses on end-based rate control. This choice comes from the fact that today's datacenters see rapid advances toward full bisection bandwidth [8, 9], and providers are increasingly concerned about over-committed access bandwidth on each server rather than at the aggregation and core levels. By leveraging the software virtual switch at each server, the cost of implementation is reduced and the scale of rate control is limited to the number of VMs on each server. 32

Questions Q1: Can you explain "true requirement"? Q2: Why do you use a non-decreasing pricing function? A: For example, when transmitting 1 Gb of data, using 100 Mbps of bandwidth takes 10 seconds, while using 200 Mbps takes only 5 seconds. Both cost 1000P, where P denotes the price of using 1 Mbps for 1 second. Hence, to keep performance-price consistency, the unit price of a higher bandwidth guarantee must also be higher. This way, tenants cannot declare a higher bandwidth than they require to obtain higher performance at the same price. We call this the true requirement, and it is why we use a non-decreasing pricing function. 33
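The arithmetic in this answer can be checked directly: under a flat unit price P, the total cost of a transfer is independent of the chosen bandwidth, which is exactly why the unit price itself must grow with the guarantee:

```python
DATA_MBIT = 1000  # 1 Gb of data to transfer, as in the example

def flat_cost_units(bw_mbps: float) -> float:
    """Cost in units of P (price of 1 Mbps for 1 second) at a flat tariff."""
    seconds = DATA_MBIT / bw_mbps
    return bw_mbps * seconds   # bandwidth cancels: always DATA_MBIT units

# 100 Mbps for 10 s and 200 Mbps for 5 s both cost 1000*P, so with a flat
# tariff a tenant would always declare the largest possible guarantee.
```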

Questions Q: How is bandwidth charged when a guarantee fails or is over-fulfilled, for example, when a flow exceeds the strict bandwidth guarantee or misses its deadline? A: For the strict guarantee, traffic that exceeds the guaranteed bandwidth is charged the same as under the fairness guarantee, since it only gets a fair share. For the dynamic guarantee, traffic that misses its deadline is not charged, since the provider did not meet the SLA. All these strategies serve to guarantee price-performance consistency. 34

Questions Q: How do you realize the dynamic guarantee? Why is it called a dynamic guarantee? A: The underlying implementation of the dynamic guarantee is similar to the strict guarantee, but we update the guaranteed bandwidth in each billing cycle by dividing the residual data S by the residual transmission time t. The guarantee is thus dynamically adjusted according to the available bandwidth. If there is residual bandwidth on the server, the VM can utilize it and reduce its guarantee at the next update. As a result, the total bandwidth guarantee on the server is reduced, and the probability of guarantee failure also decreases. If the bandwidth is not guaranteed for some periods, the VM can increase its guarantee and still finish the transmission within the expected time. 35
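The update rule described in this answer (guarantee = residual data / residual time, recomputed each billing cycle) can be sketched as a one-liner; the worked numbers in the comment are my own illustration:

```python
def updated_guarantee(residual_data_mbit: float, residual_time_s: float) -> float:
    """Per-cycle re-computation of the dynamic guarantee: B = S_res / t_res."""
    return residual_data_mbit / residual_time_s

# Example: a 1000 Mbit transfer with a 10 s deadline starts at 100 Mbps.
# If residual bandwidth let it send 400 Mbit in the first 2 s, the next
# cycle only needs 600 Mbit / 8 s = 75 Mbps, shrinking the total
# guarantee on the server and the chance of guarantee failure.
```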

Questions Q: How does SoftBW interact with the underlying TCP protocol? Will it make TCP unstable? A: As shown in the experiments, the overhead on RTT is about 1.9 µs. This is negligible compared with the round-trip time between VMs; it is even less than the RTT jitter in real systems. So it will not interfere with the underlying TCP flows. Also, in our convergence experiment, the throughput of the TCP flow is very stable. 36

Questions Q: Do you have an indication that increasing network utilization by 3x would not first hit the limits of other resources, e.g., CPU or memory? A: In real-world situations, this depends on the application. It may happen when the application is CPU- or memory-intensive, but such applications are beyond our discussion. For applications using network resources, filling up 10 Gbps of bandwidth needs only one CPU core (with a 1500 B MTU). In such scenarios, SoftBW benefits network utilization for the providers. Hence, in our simulation, we only consider the bandwidth resource. 37