1 Achieving Efficient Bandwidth Utilization in Wide-Area Networks While Minimizing State Changes
2 WAN Traffic Engineering Maintaining private WAN infrastructure is expensive Must balance latency-sensitive customer traffic with high-volume, operational traffic Many competing goals: Distribute load Improve utilization Operate more efficiently
3 Example Network West East
3 Example Network Sporadic shortcuts West East
3 Example Network Sporadic shortcuts Sparse bisection West East
4 Multi-Commodity Flow 1. Continuously monitor flows, build a traffic matrix 2. Encode as constraint problem 3. Map solution to physical paths 4. Update forwarding tables 5. Repeat West
4 Multi-Commodity Flow 1. Continuously monitor flows, build a traffic matrix 2. Encode as constraint problem 3. Map solution to physical paths 4. Update forwarding tables 5. Repeat West
5 Outline of This Talk Motivation Case for Randomized Routing Randomized Routing Evaluation Conclusions
6 The Case for Randomized Routing
7 Constantly Changing Routing State With optimal MCF, forwarding state must constantly be changed.
8 Increasing Number of Unique Paths With optimal MCF, the number of paths used increases over time.
9 Expensive Optimization Problem Solving MCF is computationally expensive.
Randomized Routing 10
11 Equal-Cost Multi-Path Routing 1. Pre-compute a set of least-cost paths 2. Identify flows by hashing packet header fields 3. Randomly forward along least cost paths West
11 Equal-Cost Multi-Path Routing 1. Pre-compute a set of least-cost paths 2. Identify flows by hashing packet header fields 3. Randomly forward along least cost paths West
12 Valiant Load Balancing 1. Choose a random intermediate node 2. Route from source to intermediate node 3. Route from intermediate node to destination West
12 Valiant Load Balancing 1. Choose a random intermediate node 2. Route from source to intermediate node 3. Route from intermediate node to destination West
13 Valiant Load Balancing West East
13 Valiant Load Balancing West East
14 Randomized Routing Trees A routing tree is a logical tree topology whose leaves correspond to physical topology A randomized routing tree is probability distribution over routing trees West Over-provisioning a network by Räcke's factor of O(log n) is usually not acceptable
15 Computing Low-Stretch Trees Stretch is the ratio of the weighted length of the path in logical tree over the weighted length of the path in the physical graph Key idea: reduce the oblivious routing problem to the low-stretch routing tree problem Fakcharoenphol, Rao, and Talwar (FRT) algorithm finds trees with average stretch O(log n)
16 Constructing RRTs Räcke s algorithm iteratively constructs a sequence of instances of the low-stretch tree problem At each iteration, it penalizes edges that have been heavily utilized in previous trees Balances load among edges to ensure O(log n) congestion ratio regardless of the traffic matrix (i.e. it is oblivious)
17 Rate Adaptation Congestion and churn minimization are conflicting goals Semi-oblivious routing is a potentially attractive solution Fixed set of paths, dynamically adjusted sending rates Con: Hajiaghayi et al. proved Ω(log(n)/log (log(n))) congestion Pro: Realistic workloads are different from worst-case Requires additional infrastructure (e.g., monitoring, updating weights)
18 Kulfi A framework for evaluating different traffic engineering policies Abstraction: users write modules that can modify path selection algorithm, or probability distribution over set of paths Example modules include: MCF, Raeke, AK, VLB, ECMP, and more. https://github.com/merlin-lang/kulfi
Evaluation 19
20 Implementation Linux End Host Linux Kernel User- Space Agent Netfilter Module Traffic matrix + Path-to-tag mapping OpenFlow rules VLAN- tagged packets SDN Controller SDN Switch Traffic statistics
20 Implementation Linux End Host Linux Kernel User- Space Agent Netfilter Module Traffic matrix + Path-to-tag mapping OpenFlow rules VLAN- tagged packets SDN Controller SDN Switch Historical Data Traffic statistics Traffic statistics
21 Oblivious Routing on Facebook Traffic Oblivious routing performs much better in practice than theory worst-case predictions
21 Oblivious Routing on Facebook Traffic Constant factor, not O (log n) Oblivious routing performs much better in practice than theory worst-case predictions
22 Oblivious Congestion on Facebook Network On a majority of links, the flow rate demanded by oblivious routing is at most 2x the link capacity
23 Abilene Network Experiment S10 S8 S11 S4 S5 S7 S3 S6 S2 S1 S12 S9 Emulated Abilene topology in hardware test bed Used real-world and worst case traffic scenarios Compared shortest-path, ECMP, MCF, oblivious, and semi-oblivious
23 Abilene Network Experiment S10 S8 S11 Artificial traffic S4 S5 S7 S3 S6 S2 S1 S12 S9 Emulated Abilene topology in hardware test bed Used real-world and worst case traffic scenarios Compared shortest-path, ECMP, MCF, oblivious, and semi-oblivious
24 Abilene Network, Various Workloads 1 Artificial Traffic 1 Abilene Observed Traffic Link congestion 0.8 0.6 0.4 0.2 0 SPF max ECMP max Obliv max Semi Obliv max MCF max 0 20 40 60 80 100 120 140 160 180 Time (minutes) SPF median ECMP median Obliv median Semi Obliv median MCF median Link congestion 0.8 0.6 0.4 0.2 0 0 20 40 60 80 100 120 140 160 180 Time (minutes) 1 Abilene Gravity Traffic 1 Abilene Gravity + Artificial Traffic Link congestion 0.8 0.6 0.4 0.2 Link congestion 0.8 0.6 0.4 0.2 0 0 20 40 60 80 100 120 140 160 180 Time (minutes) 0 0 20 40 60 80 100 120 140 160 180 Time (minutes)
25 Abilene Summary SPF and ECMP achieve slightly better median congestion MCF and semi-oblivious achieve better maximum congestion Max congestion with oblivious was never worse than 50% greater than optimal MCF, despite theory prediction No significant difference between semi-oblivious and optimal MCF, despite prior theory results
26 Conclusions Oblivious routing performs much better in practice than expected, avoids problems associated with churn, and load-balances better Semi-oblivious routing can provide close to optimal bandwidth utilization on real workloads Randomization can dramatically simplify traffic engineering infrastructure, balance competing requirements for traffic engineering goals
27