Application of SDN: Load Balancing & Traffic Engineering
Outline: 1. OpenFlow-Based Server Load Balancing Gone Wild (Introduction; OpenFlow Solution; Partitioning the Client Traffic; Transitioning With Connection Affinity; Evaluation; Future Work)
Introduction Clients access an online service through a single public IP address. Data centers host online services on multiple replica servers offering the same service; each replica has a unique IP address and an integer weight. Front-end load balancers direct each client request to a particular replica server. Problem: dedicated load balancers are expensive and quickly become a single point of failure and congestion.
OpenFlow Basic Solution The Plug-n-Serve system uses OpenFlow to reactively assign client requests to replicas based on the current network and server load. Plug-n-Serve intercepts the first packet of each client request and installs an individual forwarding rule that handles the remaining packets of the connection. Scalability limitations: overhead and delay from involving the relatively slow controller in every client connection; many rules installed at each switch (a separate rule for each client); heavy load on the controller.
OpenFlow Features Microflow rule: matches on all header fields. Wildcard rule: can have don't-care bits in some fields. Rules can be deleted after a fixed time interval (a hard timeout) or after a specified period of inactivity (a soft timeout). The switch counts the number of bytes and packets matching each rule, and the controller can poll these counter values.
OpenFlow Alternative Approach Use wildcard rules to direct incoming client requests based on their client IP addresses. The switch performs an action of (1) rewriting the server IP address and (2) forwarding the packet to the output port associated with the chosen replica. Rely on microflow rules only during transitions from one set of wildcard rules to another; soft timeouts allow these microflow rules to self-destruct after a client connection completes.
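A minimal sketch of how the two rule types coexist during a transition (this is illustrative Python, not a real OpenFlow API; 4-bit client addresses stand in for full IPs, and a real microflow rule would match all header fields, not just the source address): an exact-match microflow rule carries higher priority than the wildcard rule, so pinned connections win while the wildcard handles everyone else.

```python
def match(rules, src_ip_bits):
    """rules: list of (priority, prefix, replica).
    A rule matches when the client address (as a bit string) starts with the
    rule's prefix; among all matching rules, the highest priority wins."""
    best = None
    for priority, prefix, replica in rules:
        if src_ip_bits.startswith(prefix):
            if best is None or priority > best[0]:
                best = (priority, replica)
    return best[1] if best else None

rules = [
    (1, "0", "R2"),       # low-priority wildcard: 0* -> new replica R2
    (100, "0011", "R1"),  # microflow rule pinning one old connection to R1
]
```

Here `match(rules, "0011")` returns "R1" (the pinned connection) while any other 0* client, e.g. "0100", returns "R2".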
Load-Balancing Architecture Challenges: 1 Generating an efficient set of rules for a target distribution of load. 2 Ensuring that packets in the same TCP connection reach the same server across changes in the rules. Components: 1 Partitioning algorithm: generates wildcard rules that balance load over the replicas. 2 Transitioning algorithm: moves from one set of wildcard rules to another without disrupting ongoing connections.
[1] Partitioning the Client Traffic The partitioning must divide client traffic in proportion to the load-balancing weights, with successive packets from the same TCP connection forwarded to the same replica. The installed rules match on client IP addresses. Figure: basic model from the load-balancer switch's view
[1] Partitioning the Client Traffic A binary tree is used to represent IP prefixes. If the weights α_j sum to a power of 2, the binary tree has that many leaf nodes, and each replica R_j is associated with α_j of them; e.g., R_2 is associated with four leaves. If the sum is not a power of 2, find the closest power of 2 and renormalize the weights. Figure: wildcard rule assigned to each leaf node
Minimizing the Number of Wildcard Rules Creating a wildcard rule for each leaf node yields a large number of rules. Instead, aggregate sibling leaves associated with the same server replica: 10* can represent 100* and 101* (both associated with R_2), and 00* can represent 000* and 001* (both associated with R_1), giving 6 wildcard rules instead of 8. An alternate assignment of leaves to replicas can lead to only 4 rules (0*, 10*, 110*, and 111*).
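The aggregation step can be sketched as repeatedly merging sibling prefixes that map to the same replica (a hypothetical helper, not the paper's actual code; leaves are bit-string prefixes of a 3-level tree with weights α_1 = 3, α_2 = 4, α_3 = 1 as in the slides):

```python
def aggregate(assignment):
    """assignment: dict mapping bit-string prefixes (tree leaves) to a replica.
    Repeatedly merges sibling prefixes assigned to the same replica into
    their parent prefix, until no further merge is possible."""
    rules = dict(assignment)
    merged = True
    while merged:
        merged = False
        for prefix in list(rules):
            if prefix and prefix in rules:
                sibling = prefix[:-1] + ("1" if prefix[-1] == "0" else "0")
                if rules.get(sibling) == rules[prefix]:
                    replica = rules.pop(prefix)
                    rules.pop(sibling)
                    rules[prefix[:-1]] = replica  # parent covers both
                    merged = True
    return rules

# The slide's first assignment: merges to 6 rules (00*, 010, 011, 10*, 110, 111).
leaves = {"000": "R1", "001": "R1", "010": "R1",
          "011": "R2", "100": "R2", "101": "R2", "110": "R2",
          "111": "R3"}

# The alternate assignment: merges to 4 rules (0*, 10*, 110, 111).
alt = {"000": "R2", "001": "R2", "010": "R2", "011": "R2",
       "100": "R1", "101": "R1", "110": "R1", "111": "R3"}
```

Running `aggregate(leaves)` yields 6 rules and `aggregate(alt)` yields 4, matching the counts in the slide.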
Minimizing Change During Re-Partitioning The weights α_j may change over time, e.g., for maintenance, to save energy, or in response to congestion. A possible solution is to regenerate the wildcard rules from scratch, but this changes the replica selection for a large number of client IP addresses and increases the overhead of transitioning to the new rules.
Minimizing Change During Re-Partitioning Better solution: if the number of leaf nodes assigned to a replica is unchanged, that replica's rules may not need to change. E.g., if α_3 changes to 0 and α_1 changes to 4, the rule for R_2 remains unchanged, and R_1 needs only the single rule 1*. Create a new binary tree for the updated weights and pre-allocate leaf nodes to reusable wildcard rules: a rule covering 2^i leaves can be reused if the i-th bit is set in both the old and new α_j, even if the weights themselves differ. Allocate leaf nodes to the larger groups first, rather than reusing the existing rules of smaller pre-allocated nodes.
[2] Transitioning With Connection Affinity Existing connections should complete at their original replica. A new connection is identified by the TCP SYN flag, which is set in the first packet of the connection. Two approaches: a faster transition that directs some packets to the controller, and a slower transition in which the switch handles all packets.
Transitioning Quickly With Microflow Rules Install a rule directing all 0* traffic to the controller for inspection. For each connection, the controller installs a dedicated high-priority microflow rule with a 60-second soft timeout: the rule directs traffic to the new replica R_2 (for a SYN) or to the old replica R_1 (for a non-SYN packet). The controller then modifies the 0* rule to direct all future traffic to the new replica R_2.
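The controller's packet-in logic for this fast transition can be sketched as follows (function and field names such as `handle_packet_in` and `install_microflow` are illustrative, not a real OpenFlow controller API):

```python
OLD_REPLICA, NEW_REPLICA = "R1", "R2"
SOFT_TIMEOUT = 60  # seconds of inactivity before the microflow rule expires

def handle_packet_in(pkt, install_microflow):
    """Inspect a packet sent up by the temporary 0* -> controller rule."""
    # A SYN marks a new connection: send it to the new replica.
    # Anything else belongs to an ongoing connection: keep it on the old one.
    replica = NEW_REPLICA if pkt["syn"] else OLD_REPLICA
    # High-priority per-connection rule so later packets skip the controller;
    # the soft timeout lets the rule self-destruct after the connection ends.
    install_microflow(src_ip=pkt["src_ip"], replica=replica,
                      idle_timeout=SOFT_TIMEOUT)
    return replica
```

A SYN packet thus gets pinned to R_2 and a mid-connection packet to R_1, after which only the first packet of each connection ever reaches the controller.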
Transitioning With No Packets to the Controller The controller divides the address space for 0* into several smaller pieces, each represented by a high-priority wildcard rule (e.g., 000*, 001*, 010*, and 011*) directing traffic to the old replica R_1. A 60-second soft timeout on these higher-priority rules deletes each one once it sees no activity, at which point its traffic can safely shift to R_2. The controller also installs a single lower-priority rule directing 0* to the new replica R_2.
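The address-space split can be expressed with a small helper (an illustrative sketch, not from the paper): expand a wildcard prefix into the 2^k finer prefixes that together cover the same clients, so each piece can time out and shift to the new replica independently.

```python
def split_prefix(prefix, extra_bits):
    """Enumerate the 2**extra_bits children of an IP-prefix wildcard,
    e.g. splitting 0* by 2 extra bits gives 000*, 001*, 010*, 011*."""
    return [prefix + format(i, "0{}b".format(extra_bits))
            for i in range(2 ** extra_bits)]
```

`split_prefix("0", 2)` produces the four finer prefixes used in the slide's example.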
Evaluation Weights α_1 = 3, α_2 = 4, α_3 = 1; at time 75 sec, α_2 is set to 0.
Future Work: Non-Uniform Client Traffic The target distribution of load is 50%, 25%, and 25% for R_1, R_2, and R_3, but the actual division of load is an overwhelming 75% for R_1 and only 12.5% each for R_2 and R_3. Solution: use the OpenFlow counters for the installed rules to identify severely overloaded and underloaded replicas, then identify the set of rules to shift between them.
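A sketch of the counter-based check (illustrative helper, not the paper's algorithm): poll the per-rule byte counters, compute each replica's measured share of traffic, and compare it against the target weights.

```python
def load_imbalance(rule_counters, rule_to_replica, target_share):
    """rule_counters: bytes matched per wildcard rule (polled from switch).
    Returns, per replica, measured share minus target share:
    positive = overloaded, negative = underloaded."""
    total = sum(rule_counters.values())
    measured = {}
    for rule, nbytes in rule_counters.items():
        replica = rule_to_replica[rule]
        measured[replica] = measured.get(replica, 0.0) + nbytes / total
    return {r: measured.get(r, 0.0) - target_share[r] for r in target_share}

# The slide's scenario: target 50/25/25 but actual traffic 75/12.5/12.5.
counters = {"0*": 75.0, "10*": 12.5, "11*": 12.5}
mapping = {"0*": "R1", "10*": "R2", "11*": "R3"}
target = {"R1": 0.50, "R2": 0.25, "R3": 0.25}
```

For these numbers R_1 comes out over its target by 0.25 and R_2/R_3 under by 0.125 each, flagging 0* as the prefix to split and shift.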
Future Work: Network of Multiple Switches SW1: forward packets with source IP in 1* to SW3, modifying the destination IP to R_3. SW1: forward packets with source IP in 00* to SW2, modifying the destination IP to R_2. SW1: forward packets with source IP in 01* to SW2, modifying the destination IP to R_3. SW2, SW3: forward packets to the appropriate server.
Advantages Computes concise wildcard rules that achieve a target distribution of the traffic. Proactively installs wildcard rules in the switches to direct requests for large groups of clients without involving the controller. Automatically adjusts to changes in load-balancing policies without disrupting existing connections. Avoids the cost and complexity of separate load-balancer devices, allows flexibility in network topology, and scales naturally as the number of switches and replicas grows, while directing client requests at line rate.
SDN and Traffic Engineering: SWAN
Outline: 1. Achieving high utilization with software-driven WAN
Introduction Services rely on low-latency inter-DC communication, hence resources are over-provisioned. Operators are unable to fully leverage this investment: there is a lack of coordination among services, the network is under-subscribed on average, and MPLS TE is inefficient. Solution?
Introduction Software-Driven WAN (SWAN), proposed by Microsoft, enables an inter-DC WAN to carry significantly more traffic, achieving high efficiency and utilization. It can update the network's data plane even at high load, and fully uses network capacity while needing only a small number of rules.
Background & Motivation Types of services: Interactive services are on the critical path of the end-user experience (e.g., one DC contacts another DC to serve a user's request) and are highly sensitive to loss and delay. Elastic services require regular, timely delivery (e.g., data replication); their sensitivity to delay varies. Background services cover maintenance and provisioning activities (e.g., copying all data of a service to another DC for long-term storage); they are bandwidth-hungry, require more resources, and are not sensitive to delay or latency.
Background & Motivation - Issues with MPLS TE: Poor utilization Figures: daily traffic pattern on a busy link; breakdown by traffic type; reduction in peak usage if background traffic is dynamically adapted.
Background & Motivation - Issues with MPLS TE: Poor efficiency Flows arrive in the order F_a, then F_b, and finally F_c. MPLS TE greedily assigns paths as shown in Fig. (a), while there exists a more efficient solution as shown in Fig. (b).
Background & Motivation - Issues with MPLS TE: Poor sharing Link capacity = 1, and each service (S_i -> D_i) has unit demand. With link-level fairness, (S_2 -> D_2) gets twice the throughput of the other services.
SWAN Overview SWAN's sharing policies use a small number of priority classes: Interactive, Elastic, and Background (lowest priority). Bandwidth is allocated in strict precedence, and shorter paths are preferred for higher-priority classes. Except for interactive services, all services inform the SWAN controller about the details of their demand; interactive traffic is sent using the traditional approach. The controller maintains an up-to-date, global view of topology and demands and computes the resource allocation for services; per the SDN paradigm, the controller directly updates the forwarding entries in switches.
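The strict-precedence part of the policy can be sketched for a single pool of capacity (an illustrative simplification: real SWAN allocates over network paths with approximate max-min fairness inside each class; this only shows the precedence order between classes):

```python
def allocate(capacity, demands):
    """demands: list of (class_name, demand) ordered from highest to lowest
    priority. Higher classes are satisfied first; lower classes get the rest."""
    granted = {}
    remaining = capacity
    for name, demand in demands:
        granted[name] = min(demand, remaining)
        remaining -= granted[name]
    return granted
```

With capacity 10 and demands Interactive = 3, Elastic = 5, Background = 8, the Background class is squeezed down to the leftover 2 units; with capacity 4, Background gets nothing at all.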
SWAN Overview Need for a scalable algorithm for global allocation Computationally intensive (LP) SWAN uses a practical approach approximately fair with provable bounds and close to optimal
SWAN Overview Atomic reconfiguration of a distributed set of switches is not possible, so updates can transiently congest links. Example: each flow is 1 unit and link capacity is 1.5 units; SWAN computes a multi-step congestion-free transition plan.
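A worked version of that example (constructed to match the slide's numbers, not taken from the paper's figure): swapping two unit flows between links A and B in one shot transiently loads one link to 2.0, while a plan that moves half a flow at a time never exceeds the 1.5-unit capacity.

```python
CAPACITY = 1.5

def congestion_free(plan, capacity=CAPACITY):
    """plan: list of configurations, each a dict mapping link -> load.
    The plan is congestion-free iff every link stays within capacity
    at every intermediate configuration."""
    return all(load <= capacity for config in plan for load in config.values())

# Naive swap: move flow F1 from A to B first, then F2 from B to A.
direct_swap = [
    {"A": 1.0, "B": 1.0},  # start: F1 on A, F2 on B
    {"A": 0.0, "B": 2.0},  # F1 moved to B -> B overloaded (2.0 > 1.5)
    {"A": 1.0, "B": 1.0},  # F2 moved to A
]

# Multi-step plan: move half of F1, then F2, then the rest of F1.
multi_step = [
    {"A": 1.0, "B": 1.0},  # start: F1 on A, F2 on B
    {"A": 0.5, "B": 1.5},  # half of F1 moved to B
    {"A": 1.5, "B": 0.5},  # F2 moved to A
    {"A": 1.0, "B": 1.0},  # rest of F1 moved to B
]
```

Checking the two plans: `congestion_free(direct_swap)` is False while `congestion_free(multi_step)` is True.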
SWAN Overview Key concept: on each link, SWAN leaves a scratch capacity s in [0, 50%]. This scratch capacity guarantees that a congestion-free transition plan exists with at most (1/s) - 1 steps.
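The bound can be checked with a one-line helper (using the ceiling form, which equals 1/s - 1 whenever 1/s is an integer):

```python
import math

def max_transition_steps(s):
    """Maximum number of intermediate steps needed for a congestion-free
    transition when fraction s of every link is reserved as scratch."""
    return math.ceil(1 / s) - 1
```

For s = 50% a single intermediate step suffices; shrinking the reserve to s = 10% stretches the worst case to 9 steps, which is the trade-off between wasted capacity and update speed.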
SWAN Overview Switch hardware supports a limited number of rules, so SWAN dynamically identifies and installs tunnels using an LP. What about network reconfiguration: will it disrupt traffic? SWAN sets aside scratch space (e.g., 10%) in the switch's rule table to accommodate the new set of rules.
SWAN Design Figure: Architecture of SWAN Service brokers & hosts: each host estimates its service's demand (every T_h); the broker apportions the demand based on current limits, aggregates the demands, and updates the controller every T_s. Network agents: report topology changes to the controller, get traffic information from the controller (every T_a), and reliably update the switches. Controller: uses the information on service demands and network topology (every T_c) to compute service allocations, decides the forwarding-plane configuration updates, and instructs the service brokers and network agents accordingly.
SWAN Design Forwarding-plane configuration uses label-based forwarding (similar to VLAN tagging): the label is assigned by the source, and transit switches use the label and a table to route. Computing service allocations: approximate max-min fairness within each priority class. Updating forwarding state: update the traffic distribution across tunnels using the scratch capacity and an LP-based algorithm, then update the tunnels themselves.
SWAN Design - Handling Failures Network agents report link/switch failures to the controller. Controller re-computes the allocation and updates network agents and service brokers, etc. Network agents, service brokers, and the controller have backup instances.
Conclusion SWAN enables a highly efficient and flexible inter-DC WAN. Scratch capacity on the links and scratch space in the switches enable updates without congestion. Test-bed experiments and data-driven simulations show SWAN can carry 60% more traffic.