Episode 5. Scheduling and Traffic Management, Part 3
Baochun Li, Department of Electrical and Computer Engineering, University of Toronto
Outline
- What is scheduling? Why do we need it?
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best-effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
- A look at today: datacenter networks
Scheduling guaranteed-service connections
- With best-effort connections, the goal is fairness
- With guaranteed-service connections:
  - What performance guarantees are achievable?
  - How easy is admission control?
- We now study some scheduling disciplines that provide performance guarantees
Weighted Fair Queuing revisited
- It turns out that WFQ also provides performance guarantees
- Bandwidth bound: each connection is guaranteed its weight's fraction of the link capacity (sketched below)
  - Example: connections with weights 1, 2, 7 on a link of capacity 10 get at least 1, 2, and 7 units of bandwidth each
- End-to-end delay bound
  - assumes that the connection doesn't send too much (otherwise its packets will be stuck in queues)
  - more precisely, the connection should be leaky-bucket regulated
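As a quick illustration, here is the bandwidth bound as a minimal Python sketch (the function name is ours): each connection's guaranteed rate is simply its weight's share of the link capacity.

```python
def wfq_guaranteed_rates(weights, capacity):
    """Guaranteed rate of each connection under WFQ: its weight's
    fraction of the link capacity (a sketch of the bandwidth bound)."""
    total = sum(weights)
    return [w / total * capacity for w in weights]

# weights 1, 2, 7 on a 10-unit link -> at least 1, 2, 7 units each
print(wfq_guaranteed_rates([1, 2, 7], 10))   # [1.0, 2.0, 7.0]
```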
Leaky-bucket regulators (sketched below)
- Implement linear bounded arrival processes: # bits transmitted in time [t1, t2] <= r (t2 - t1) + s
[Figure: tokens arrive periodically into a token bucket; data waits in a buffer and leaves only when tokens are available]
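A minimal token-bucket sketch in Python (class and method names are ours), enforcing the bound above with token rate r and bucket depth s:

```python
class TokenBucket:
    """Sketch of a leaky-bucket (token-bucket) regulator enforcing the
    LBAP bound: bits sent in [t1, t2] <= r * (t2 - t1) + s."""

    def __init__(self, rate_bps, burst_bits):
        self.rate = rate_bps       # token arrival rate r
        self.burst = burst_bits    # bucket depth s (largest burst)
        self.tokens = burst_bits   # bucket starts full
        self.last = 0.0            # time of the last update

    def try_send(self, now, packet_bits):
        # Replenish tokens at rate r, capped at the bucket depth.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True            # conforming: transmit
        return False               # non-conforming: buffer (or drop)
```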
Special cases of a leaky-bucket regulator
- Peak-rate regulator: set the token-bucket limit to one token, and the token replenishment interval to the reciprocal of the peak rate
- Moving-window average-rate regulator: set the token-bucket limit to one token, and the token replenishment interval to the reciprocal of the average rate
- Augmenting a leaky-bucket regulator with a peak-rate regulator controls all three parameters: the average rate, the peak rate, and the largest burst
(Keshav, Ch. 13.3.4)
Parekh-Gallager Theorem
- Let a connection be allocated weights at each WFQ scheduler along its path, so that the least bandwidth it is allocated is g
- Let it be leaky-bucket regulated such that # bits sent in time [t1, t2] <= r (t2 - t1) + s
- Let the connection pass through K schedulers, where the k-th scheduler has a link rate r(k)
- Let the largest packet allowed in the network be P
- Then the end-to-end queuing delay is bounded by
  D <= s/g + sum_{k=1}^{K-1} P/g + sum_{k=1}^{K} P/r(k)
Example
- Consider a connection with leaky-bucket parameters (16384 bytes, 150 Kbps) that traverses 10 hops on a network where all links have a bandwidth of 45 Mbps. If the largest allowed packet in the network is 8192 bytes long, what g value will guarantee an end-to-end delay of 100 ms? Assume a propagation delay of 30 ms.
- Solution: The queuing delay must be bounded by 100 - 30 = 70 ms. Plugging this into the previous equation, we get 70 x 10^-3 = {(16384 x 8) + (9 x 8192 x 8)} / g + (10 x 8192 x 8) / (45 x 10^6), so that g ≈ 13 Mbps. This is more than 86 times the source's average rate of 150 Kbps (see the sketch below).
- With large packets, packet delays can be quite substantial!
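The same arithmetic as a runnable sketch (variable names are ours; everything is in bits and seconds):

```python
# Plugging the example's numbers into the Parekh-Gallager bound.
s, P = 16384 * 8, 8192 * 8        # burst and largest packet, in bits
K, r = 10, 45e6                   # hops and per-link rate
D_queue = 0.100 - 0.030           # delay budget minus propagation delay

# D_queue = (s + (K-1)P)/g + K*P/r, solved for g:
g = (s + (K - 1) * P) / (D_queue - K * P / r)
print(f"g = {g / 1e6:.2f} Mbps")  # ~13 Mbps, vs. a 150 Kbps average rate
```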
Significance
- The theorem shows that WFQ can provide end-to-end delay bounds
- So WFQ provides both fairness and performance guarantees
- The bound holds regardless of cross-traffic behaviour
Problems
- To get a delay bound, we need to pick g
  - the lower the delay bound, the larger g needs to be
  - a large g excludes more competitors from the link
  - g can be very large: in our example, more than 80 times the source's average rate!
- WFQ couples delay and bandwidth allocations
  - low delay requires allocating more bandwidth
  - this wastes bandwidth for low-bandwidth, low-delay sources
Delay-Earliest Due Date
- Earliest due date: the packet with the earliest deadline is selected
- Delay-EDD prescribes how to assign deadlines to packets (sketched below)
- A source is required to send no faster than its peak rate
- Bandwidth at the scheduler is reserved at the peak rate
- Deadline = expected arrival time + delay bound
  - if a source sends faster than its contract, the delay bound does not apply
- Each packet gets a hard delay bound
- The delay bound is independent of the bandwidth requirement
  - but the reservation is at the connection's peak rate
- Implementation requires per-connection state and a priority queue
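A minimal Delay-EDD sketch in Python, under the assumptions above (the class and its interface are illustrative, not a reference implementation): the expected arrival time treats the source as if it sent exactly at its peak rate, so early packets gain no advantage.

```python
import heapq

class DelayEDD:
    """Sketch of a Delay-EDD scheduler: per-connection state plus a
    priority queue ordered by per-hop deadline."""

    def __init__(self):
        self.heap = []        # (deadline, seq, packet) min-heap
        self.expected = {}    # per-connection expected arrival time
        self.seq = 0          # tie-breaker for equal deadlines

    def enqueue(self, conn, packet, arrival, peak_interval, delay_bound):
        # Expected arrival: as if the source sent exactly at its peak rate.
        prev = self.expected.get(conn)
        exp = arrival if prev is None else max(arrival, prev + peak_interval)
        self.expected[conn] = exp
        # Per-hop deadline = expected arrival time + the hop's delay bound.
        heapq.heappush(self.heap, (exp + delay_bound, self.seq, packet))
        self.seq += 1

    def dequeue(self):
        # Serve the packet with the earliest deadline.
        return heapq.heappop(self.heap)[2] if self.heap else None
```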
Rate-controlled scheduling
- A class of disciplines with two components: a regulator and a scheduler
  - incoming packets are placed in the regulator, where they wait to become eligible
  - then they are put in the scheduler
- The regulator shapes the traffic; the scheduler provides performance guarantees
[Figure: input -> regulator -> scheduler -> output]
Examples of Regulators (both sketched below)
- Rate-jitter regulator: packets arrive at the scheduler no faster than the peak rate
  - bounds the maximum outgoing rate
- Delay-jitter regulator: packets arrive at the scheduler a constant delay after leaving the scheduler at the previous switch
  - compensates for variable delay at the previous hop
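The two eligibility-time rules, as hedged Python sketches (function names and parameters are ours):

```python
def rate_jitter_eligible(arrival, last_eligible, length_bits, peak_bps):
    """Rate-jitter regulator (sketch): space eligibility times so packets
    enter the scheduler no faster than the connection's peak rate."""
    return max(arrival, last_eligible + length_bits / peak_bps)

def delay_jitter_eligible(eligible_prev_hop, delay_bound_prev_hop, prop_delay):
    """Delay-jitter regulator (sketch): hold each packet until a constant
    offset after it became eligible at the previous switch, cancelling the
    previous hop's variable queuing delay (assumes synchronized clocks)."""
    return eligible_prev_hop + delay_bound_prev_hop + prop_delay
```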
Analysis
- The first regulator on the path monitors and regulates traffic => bandwidth bound
- End-to-end delay bound
  - a delay-jitter regulator fully reconstructs traffic => the end-to-end delay is fixed (the sum of the worst-case delays at each hop)
  - a rate-jitter regulator only partially reconstructs traffic
  - one can show that its end-to-end delay bound is smaller than (the sum of the delay bounds at each hop + the delay at the first hop)
Delay-jitter regulator + Delay-EDD scheduler: provides a bandwidth bound, a delay bound, and a delay-jitter bound
Decoupling delay from bandwidth
- Can give a low-bandwidth connection a low delay without overbooking
- e.g., consider connection A, with a rate of 64 Kbps, sent to a router with rate-jitter regulation and multi-priority FCFS scheduling (called rate-controlled static priority, sketched below)
- After sending a packet of length l, the next packet is eligible at time (now + l / 64 Kbps)
- If placed in the highest-priority queue, all packets from A get low delay
- So we can decouple delay and bandwidth bounds, unlike WFQ
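A rate-controlled static priority sketch in Python (the class and interface are illustrative): the regulator paces each connection at its reserved rate, while priority, not rate, determines delay.

```python
from collections import deque

class RCSP:
    """Sketch of rate-controlled static priority: a rate-jitter
    regulator feeding per-priority FCFS queues."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]  # 0 = highest priority
        self.last = {}  # per-connection eligibility time of the last packet

    def arrive(self, conn, packet, now, length_bits, rate_bps, priority):
        # Regulator: eligible one transmission time (at the connection's
        # reserved rate) after the previous packet became eligible.
        eligible = max(now, self.last.get(conn, 0.0) + length_bits / rate_bps)
        self.last[conn] = eligible
        self.queues[priority].append((eligible, packet))

    def dequeue(self, now):
        # Scheduler: FCFS within a level; the highest level whose
        # head-of-line packet is eligible wins.
        for q in self.queues:
            if q and q[0][0] <= now:
                return q.popleft()[1]
        return None
```

A 64 Kbps connection assigned priority 0 here gets low per-hop delay even though its reserved rate is tiny, which is exactly the decoupling WFQ cannot offer.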
Evaluation
- Pros
  - flexibility: can emulate other disciplines
  - can decouple bandwidth and delay assignments
  - end-to-end delay bounds are easily computed
  - do not require complicated schedulers to guarantee protection
  - can provide delay-jitter bounds
- Cons
  - possibly non-work-conserving
  - delay-jitter bounds come at the expense of a higher mean delay
  - delay-jitter regulation is expensive (clock synchronization, per-packet timestamps)
Summary
- Two sorts of applications: best effort and guaranteed service
- Best-effort connections require fair service
  - provided by GPS, which is unimplementable
  - emulated by WFQ and its variants
- Guaranteed-service connections require performance guarantees
  - provided by WFQ, but this is expensive
  - may be better to use rate-controlled schedulers
Outline
- What is scheduling? Why do we need it?
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best-effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
- A look at today: datacenter networks
Packet dropping
- Packets that cannot be served immediately are buffered
- Full buffers => a packet drop strategy is needed
- Packet losses almost always come from best-effort connections (admission control protects guaranteed-service connections)
- Shouldn't drop packets unless imperative: a dropped packet wastes the resources it has already consumed
Classification of drop strategies
- Degree of aggregation
- Drop priorities
- Early or late
- Drop position
Degree of aggregation
- The degree of discrimination in selecting a packet to drop
- e.g., in vanilla FIFO, all packets are in the same class
- Instead, we can classify packets and drop them selectively
- The finer the classification, the better the protection
- Max-min fair allocation of buffers to classes: drop a packet from the class with the longest queue (sketched below)
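The longest-queue rule in a few lines of Python (a sketch; the function name and the dict-of-deques layout are ours):

```python
from collections import deque

def drop_for_arrival(queues):
    """Approximate max-min fair buffer sharing (sketch): when the shared
    buffer is full, push out a packet from the class currently occupying
    the most buffer space."""
    longest = max(queues.values(), key=len)
    return longest.pop()  # drop from the tail of the longest per-class queue

# usage: queues = {"A": deque([...]), "B": deque([...])}; on overflow,
# victim = drop_for_arrival(queues), then enqueue the new arrival.
```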
Drop priorities
- Drop lower-priority packets first
- How to choose? The source marks packets; the policer can also mark packets
- Packets are marked with a congestion loss priority (CLP) bit in the packet header
[Figure: the source marks some packets; the switch preferentially discards marked packets]
CLP bit: pros and cons
- Pros
  - if the network has spare capacity, all traffic is carried
  - during congestion, load is automatically shed
- Cons
  - separating priorities within a single connection is hard
  - what prevents all packets from being marked high priority?
Early vs. late drop
- Early drop: drop even if space is available
  - signals endpoints to reduce their rate
  - cooperative sources get lower overall delays; uncooperative sources get severe packet loss
- Early random drop
  - drop an arriving packet with a fixed probability if the queue length exceeds a threshold
  - intuition: misbehaving sources send more packets, so they are more likely to see losses
  - in practice, doesn't work well in controlling misbehaving users
Early vs. late drop: RED
- Random early detection (RED) makes three improvements (sketched below)
- The metric is a moving average of the queue length
  - small bursts pass through unharmed
  - only sustained overloads are affected
- The packet drop probability is a linear function of the average queue length
  - prevents a severe reaction to mild overload
- Packets can be marked instead of dropped
  - allows sources to detect network state without losses
- RED improves the performance of a network of cooperating TCP sources
- No bias against bursty sources
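A minimal RED sketch in Python. The constants are illustrative, and this omits details of the full algorithm (e.g., the count-since-last-drop correction that spreads drops out); it shows only the EWMA metric and the linear drop curve described above.

```python
import random

class RED:
    """Sketch of RED's drop decision: an EWMA of the queue length and a
    drop probability rising linearly between two thresholds."""

    def __init__(self, min_th=5, max_th=15, max_p=0.02, weight=0.002):
        self.min_th, self.max_th = min_th, max_th  # thresholds (packets)
        self.max_p = max_p                         # drop prob. at max_th
        self.weight = weight                       # EWMA gain
        self.avg = 0.0                             # average queue length

    def should_drop(self, queue_len):
        # Moving average: short bursts barely move it, so only
        # sustained overload triggers drops.
        self.avg += self.weight * (queue_len - self.avg)
        if self.avg < self.min_th:
            return False                           # accept
        if self.avg >= self.max_th:
            return True                            # drop (or mark)
        # Linear ramp between the two thresholds.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p
```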
Drop position
- Can drop a packet from the head, tail, or a random position in the queue
- Tail: easy; the default approach
- Head: harder; lets the source detect the loss earlier
[Figure: with a full buffer, dropping at the head creates a "hole" just behind the previously served packets, so the destination's acks reveal the loss sooner]
Drop position (contd.)
- Random: hardest
  - hurts bandwidth hogs the most
  - unlikely to make it into real routers
- Drop entire longest queue: easy
  - almost as effective as dropping the tail of the longest queue
Datacenter Networks
Unique characteristics in datacenter networks
- Large number of flows from distributed workloads; most are fairly short
- We care more about delays than about fairness
[Figure: a fabric with 1000s of server ports serving web, app, cache, db, MapReduce, Spark, and monitoring workloads]
Unique requirements
- Goal: complete flows quickly
- Requires scheduling flows such that:
  - large flows get high throughput
  - small flows see only fabric latency (no queuing delays)
- Lots of recent work on the use of rate control to schedule flows
  - DCTCP [SIGCOMM 10], HULL [NSDI 11], D2TCP [SIGCOMM 12], D3 [SIGCOMM 11], PDQ [SIGCOMM 12]
  - most are quite complex
pFabric: critique paper on November 10
- Packets carry a single priority number
  - the priority can be a flow's remaining size
- pFabric switches (sketched below)
  - very small buffers (20-30 KB for a 10 Gbps fabric)
  - send the highest-priority packets; drop the lowest-priority packets
- pFabric hosts
  - send/retransmit aggressively
  - minimal rate control: just prevent congestion collapse
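A sketch of a pFabric-style egress port in Python (class name, interface, and the ~24-packet capacity are our assumptions, sized to roughly 20-30 KB of 1.5 KB packets): smaller remaining flow size means higher urgency, so the port transmits the minimum and, when full, evicts the maximum.

```python
import heapq
from itertools import count

class PFabricPort:
    """Sketch of a pFabric egress port: tiny buffer, priority number =
    the flow's remaining size (smaller = more urgent)."""

    def __init__(self, capacity_pkts=24):
        self.buf = []                 # min-heap on (remaining, seq, packet)
        self.capacity = capacity_pkts
        self._seq = count()           # tie-breaker keeps heap entries comparable

    def enqueue(self, remaining, packet):
        heapq.heappush(self.buf, (remaining, next(self._seq), packet))
        if len(self.buf) > self.capacity:
            # Buffer full: evict the lowest-priority packet, i.e. the one
            # from the flow with the largest remaining size.
            self.buf.remove(max(self.buf))
            heapq.heapify(self.buf)

    def dequeue(self):
        # Transmit the most urgent packet (smallest remaining flow size).
        return heapq.heappop(self.buf)[2] if self.buf else None
```

With such small buffers, the linear scan hidden in `remove()` is cheap; the point is that scheduling and dropping both reduce to comparisons on one priority field, leaving hosts with almost no rate-control logic.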