Traffic Engineering with Forward Fault Correction

Similar documents
SWAN: Software-driven wide area network. Ratul Mahajan

Lecture 10.1 A real SDN implementation: the Google B4 case. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

Topic 6: SDN in practice: Microsoft's SWAN. Student: Miladinovic Djordje Date:

SCALING SOFTWARE DEFINED NETWORKS. Chengyu Fan (edited by Lorenzo De Carli)

DCRoute: Speeding up Inter-Datacenter Traffic Allocation while Guaranteeing Deadlines

Application of SDN: Load Balancing & Traffic Engineering

Efficient Point to Multipoint Transfers Across Datacenters

Lecture 14 SDN and NFV. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

MED: The Monitor-Emulator-Debugger for Software-Defined Networks

Efficient Point to Multipoint Transfers Across Datacenters

Cutting the Cord: A Robust Wireless Facilities Network for Data Centers

Pricing Intra-Datacenter Networks with

Cutting the Cord: A Robust Wireless Facilities Network for Data Centers

RPT: Re-architecting Loss Protection for Content-Aware Networks

Exercises TCP/IP Networking With Solutions

Dynamic Scheduling of Network Updates (Extended version)

Distributed Denial of Service

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

A Strategy of CDN Traffic Optimization Based on the Technology of SDN

Joint Allocation and Scheduling of Network Resource for Multiple Control Applications in SDN

Software-Defined Networking (SDN) Overview

CS268: Beyond TCP Congestion Control

From ATM to IP and back again: the label switched path to the converged Internet, or another blind alley?

IP Bandwidth on Demand and Traffic Engineering via Multi-Layer Transport Networks. Dr. Greg M. Bernstein Grotto Networking 2004

3. Quality of Service

Forwarding Architecture

Computer Networks CS 552

Computer Networks CS 552

QoS Services with Dynamic Packet State

Robust validation of network designs under uncertain demands and failures

Data Center TCP (DCTCP)

A Network-State Management Service. Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft

Minimizing Collateral Damage by Proactive Surge Protection

What is SDN, Current SDN projects and future of SDN VAHID NAZAKTABAR

Progress Report No. 15. Shared Segments Protection

DevoFlow: Scaling Flow Management for High Performance Networks

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Coflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan

On low-latency-capable topologies, and their impact on the design of intra-domain routing

Design and Implementation of Virtual TAP for Software-Defined Networks

Understanding BGP Miscounfiguration

From Routing to Traffic Engineering

! Network bandwidth shared by all users! Given routing, how to allocate bandwidth. " efficiency " fairness " stability. !

Comparison of Shaping and Buffering for Video Transmission

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

SecureNoC. (Team Colonel Panic): Xiaoming Guo, Sijia He, Amlan Nayak, Jay Zhang. December 3, 2015

EP2210 Scheduling. Lecture material:

Resilient IP Backbones. Debanjan Saha Tellium, Inc.

Casa Systems Axyom Multiservice Router

A Scalable, Commodity Data Center Network Architecture

Competitive and Deterministic Embeddings of Virtual Networks

Managing Network Bandwidth to Maximize Performance

Detailed diagnosis in enterprise networks. Network diagnosis

Unit 2 Packet Switching Networks - II

Congestion Control In the Network

Advanced Computer Networks

Routing Metric. ARPANET Routing Algorithms. Problem with D-SPF. Advanced Computer Networks

Converged Communication Networks

68% 63% 50% 25% 24% 20% 17% Credit Theft. DDoS. Web Fraud. Cross-site Scripting. SQL Injection. Clickjack. Cross-site Request Forgery.

Radon: Network QoS. Andrew Shewmaker Carlos Maltzahn Katia Obraczka Scott Brandt Ivo Jimenez. UC Santa Cruz Graduate Studies in Computer Science

Outline. EL736 Communications Networks II: Design and Algorithms. Class3: Network Design Modelling Yong Liu 09/19/2006

15-744: Computer Networking. Middleboxes and NFV

SCREAM: Sketch Resource Allocation for Software-defined Measurement

Top-Down Network Design

Data Center Networks and Switching and Queueing and Covert Timing Channels

T9: SDN and Flow Management: DevoFlow

Application Provisioning in Fog Computingenabled Internet-of-Things: A Network Perspective

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service

SD-WAN orchestrated by Amdocs

One More Bit Is Enough

ECE 333: Introduction to Communication Networks Fall 2001

Consistent SDN Flow Migration aided by Optical Circuit Switching. Rafael Lourenço December 2nd, 2016

AS the increasing number of smart devices and various

Inter-Data-Center Network Traffic Prediction with Elephant Flows

Knowledge-Defined Network Orchestration in a Hybrid Optical/Electrical Datacenter Network

Software Defined Networking

Data Center TCP (DCTCP)

Fault-Aware Flow Control and Multi-path Routing in Wireless Sensor Networks

Lecture 14: Performance Architecture

RCD: Rapid Close to Deadline Scheduling for Datacenter Networks

Some Musings on OpenFlow and SDN for Enterprise Networks. David Meyer Open Networking Summit October 18-19, 2011

Designing and Provisioning IP Contribution Networks

Lecture 21. Reminders: Homework 6 due today, Programming Project 4 due on Thursday Questions? Current event: BGP router glitch on Nov.

Problems with IntServ. EECS 122: Introduction to Computer Networks Differentiated Services (DiffServ) DiffServ (cont d)

Joint Backup Capacity Allocation and Embedding for Survivable Virtual Networks

Topologies. Maurizio Palesi. Maurizio Palesi 1

Casa Systems Axyom Multiservice Router

Congestion Control for High Bandwidth-delay Product Networks. Dina Katabi, Mark Handley, Charlie Rohrs

Revisiting Network Support for RDMA

Internet Traffic Characteristics. How to take care of the Bursty IP traffic in Optical Networks

Universal Packet Scheduling. Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley

SPARTA: Scalable Per-Address RouTing Architecture

TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks

Building Efficient and Reliable Software-Defined Networks. Naga Katta

Hardware Evolution in Data Centers

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF

UnivMon: Software-defined Monitoring with Universal Sketch

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552)

DiffServ Architecture: Impact of scheduling on QoS

Transcription:

Traffic Engineering with Forward Fault Correction Harry Liu Microsoft Research 06/02/2016 Joint work with Ratul Mahajan, Srikanth Kandula, Ming Zhang and David Gelernter 1

Cloud services require large network capacity Cloud Applications Growing traffic Cloud Networks Expensive (e.g. cost of WAN: $100M/year) 2

TE is critical to effectively utilizing networks Traffic Engineering (centralized & SDN-Based) WAN Network Datacenter Network Microsoft SWAN (SIGCOMM 13) Google B4 (SIGCOMM 13) Devoflow (SIGCOMM 11) MicroTE (CoNEXT 11) 3

Centralized TE is the key to network efficiency Sub-optimal resource allocation based on local view & control. Demand=10 Demand=10 1) how much traffic to admit 2) how to route Link Cap: 10 10 5 1 4 2 5 3 10 Requirement: path length 2 hops Total throughput: 15 Optimal resource allocation based on global view & control. 10 Link Cap: 10 2 1 4 10 10 Total throughput: 20 TE controller 3 4

But, centralized TE is also vulnerable to faults TE controller Network view (e.g. topo, cap, traffic) Frequent updates for high utilization (e.g. per 5min) Network configuration (e.g. routes, rate limits) Data-plane faults Network Control-plane faults 5

Data plane faults Link and switch failures Rescaling: Sending traffic proportionally to residual paths s2 s4 (10) s2 link 7 failure s2 link failure 3 10 s1 s3 s4 (10) 3 s3 7 Link Capacity: 10 s4 s1 congestion 3 s3 7 s4 6

Control plane faults Failures or long delays to configure a network device TE Controller Switch TE configurations RPC failure Firmware bugs Overloaded CPU Memory shortage Control plane faults can also result in congestion. 7

The TE controllability is undermined by faults Network view Inaccuracy TE controller Network configuration Incompleteness Data-plane faults Network Control-plane faults 8

Control and data plane faults in practice In a production WAN network (200+ routers, 6000+ links): Faults are common. Faults cause severe congestion. Data plane: fault rate = 25% per 5 minutes. Control plane: fault rate = 0.1% -- 1% per TE update. 9

State of the art for handling faults Heavy over-provisioning: Big loss in throughput Reactive handling of faults: Control plane faults: retry Data plane faults: re-compute TE and update networks Cannot prevent congestion Slow (seconds -- minutes) Blocked by control plane faults 10

How about handling faults proactively? TE Algorithm making it robust Network not robust enough 11

Forward fault correction (FFC) in TE [Bad News] Individual faults are unpredictable. [Good News] Simultaneous #faults is small. FEC guarantees no information loss under up to k arbitrary packet drops. Packet loss with careful data encoding Network faults FFC guarantees no congestion under up to k arbitrary faults. with careful traffic distribution 12

Example: FFC for link failures s2 s4 (9) s1 s3 s4 (9) Link Capacity: 10 s2 1 1 s3 K=1 (FFC) 8 8 s4 Failure Cases s2 link failure s1 1 s3 9 8 s4 s1 s2 link failure s3 9 9 s4 s1 s2 9 1 s3 link failure 8 s4 13

Trade-off: network efficiency v.s. robustness Non-FFC (Throughput: 30) FFC (k=1) (Throughput: 18) Non-FFC (Throughput: 18) s1 1 8 s1 s2 15 5 s3 link failure 10 There exists a trade-off between throughput and robustness Achieving the s1 optimal throughput s4 with FFC guarantee s1 s2 5 5 s3 s2 1 s3 s2 4 4 s3 10 10 8 5 5 s4 s4 FFC does not always sacrifice efficiency for robustness s1 s2 9 4 s3 s4 link failure 5 s4 14

Systematically realizing FFC in TE Formulation: How to merge FFC into existing TE framework? Computation: How to find FFC-TE efficiently? 15

Basic TE linear programming formulations TE decisions: TE objective: Sizes of flows Traffic on paths Maximizing throughput LP formulations max. b f l f,t f b f Basic TE constraints: Deliver all granted flows No overloaded link s.t. f: t l f,t b f e: f t e l f,t c e FFC constraints: No overloaded link up to k c control plane faults k e link failures k v switch failures 16

Formulating data-plane FFC flow size: s f S bw allocation: a 1 a 2 path-1 Paths are link-disjoint. path-2 D Fault on path-1: s f a 2 +a 3 FFC k=1 Fault on path-2: s f a 1 +a 3 Fault on path-3: s f a 1 +a 2 a 3 3 2 path-3 Lemma: FFC is achieved when path-i s weight is a i /a 1 +a 2 +a 3 17

An efficient and precise solution to FFC k-sum linear constraint group (k-sum group): Given n paths and A = {a 1, a 2,, a n }, FFC requires that the sum of arbitrary n-k elements in A is flow size O( n k ) FFC-TE LP-formulation: TE Objective Basic TE Constraints k-sum group-1 k-sum group-n FFC Constraints (too many) Lossless compression of a k-sum group: O( n k ) O(kn) O(n) bubble sorting network (SIGCOMM 2014) strong duality (MSR TR 2016) http://www.hongqiangliu.com/publications.html 18

FFC extensions Differential protection for different traffic priorities Minimizing congestion risks without rate limiters Control plane faults on rate limiters Uncertainty in current TE Different TE objectives (e.g. max-min fairness) 19

Implementation & evaluation highlights Testbed experiment (8 switches & 30 servers) FFC can be implemented in commodity switches FFC has no data loss due to congestion under faults Large-scale simulation A WAN network with O(100) switches and O(1000) links One-week traffic trace Fault injection according to real failure trace Results: with negligible throughput loss, FFC can reduce data loss by a factor of 7-130 in well-provisioned networks data loss of high priority traffic to almost zero in well-utilized networks 20

Conclusion and future work Network view SDN Controller Network configuration FFC Network Properties: High throughput No congestion Security Availability Connectivity Network Network Faults: Data-plane Control-plane Misconfigurations Attacks Traffic spikes 21

Q&A 22