Episode 5. Scheduling and Traffic Management, Part 3
Baochun Li, Department of Electrical and Computer Engineering, University of Toronto
Outline
- What is scheduling? Why do we need it?
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best-effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
- A look at today: datacenter networks
Scheduling guaranteed-service connections
- With best-effort connections, the goal is fairness
- With guaranteed-service connections:
  - What performance guarantees are achievable?
  - How easy is admission control?
- We now study some scheduling disciplines that provide performance guarantees
Weighted Fair Queuing revisited
- It turns out that WFQ also provides performance guarantees
- Bandwidth bound: each connection is guaranteed its weight's fraction of the link capacity (sketched below)
  - Example: connections with weights 1, 2, 7 on a link of capacity 10 get at least 1, 2, and 7 units of bandwidth each
- End-to-end delay bound
  - assumes that the connection doesn't send too much (otherwise its packets will be stuck in queues)
  - more precisely, the connection should be leaky-bucket regulated
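As a quick illustration, here is the bandwidth bound as a minimal Python sketch (the function name is ours): each connection's guaranteed rate is simply its weight's share of the link capacity.

```python
def wfq_guaranteed_rates(weights, capacity):
    """Guaranteed rate of each connection under WFQ: its weight's
    fraction of the link capacity (a sketch of the bandwidth bound)."""
    total = sum(weights)
    return [w / total * capacity for w in weights]

# weights 1, 2, 7 on a 10-unit link -> at least 1, 2, 7 units each
print(wfq_guaranteed_rates([1, 2, 7], 10))   # [1.0, 2.0, 7.0]
```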
Leaky-bucket regulators (sketched below)
- Implement linear bounded arrival processes: # bits transmitted in time [t1, t2] <= r (t2 - t1) + s
[Figure: tokens arrive periodically into a token bucket; data waits in a buffer and leaves only when tokens are available]
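A minimal token-bucket sketch in Python (class and method names are ours), enforcing the bound above with token rate r and bucket depth s:

```python
class TokenBucket:
    """Sketch of a leaky-bucket (token-bucket) regulator enforcing the
    LBAP bound: bits sent in [t1, t2] <= r * (t2 - t1) + s."""

    def __init__(self, rate_bps, burst_bits):
        self.rate = rate_bps       # token arrival rate r
        self.burst = burst_bits    # bucket depth s (largest burst)
        self.tokens = burst_bits   # bucket starts full
        self.last = 0.0            # time of the last update

    def try_send(self, now, packet_bits):
        # Replenish tokens at rate r, capped at the bucket depth.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True            # conforming: transmit
        return False               # non-conforming: buffer (or drop)
```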
Special cases of a leaky-bucket regulator
- Peak-rate regulator: set the token-bucket limit to one token, and the token replenishment interval to the reciprocal of the peak rate
- Moving-window average-rate regulator: set the token-bucket limit to one token, and the token replenishment interval to the reciprocal of the average rate
- Augmenting a leaky-bucket regulator with a peak-rate regulator controls all three parameters: the average rate, the peak rate, and the largest burst
(Keshav, Ch. 13.3.4)
Parekh-Gallager Theorem
- Let a connection be allocated weights at each WFQ scheduler along its path, so that the least bandwidth it is allocated is g
- Let it be leaky-bucket regulated such that # bits sent in time [t1, t2] <= r (t2 - t1) + s
- Let the connection pass through K schedulers, where the k-th scheduler has a link rate r(k)
- Let the largest packet allowed in the network be P
- Then the end-to-end queuing delay is bounded by
  D <= s/g + sum_{k=1}^{K-1} P/g + sum_{k=1}^{K} P/r(k)
Example
- Consider a connection with leaky-bucket parameters (16384 bytes, 150 Kbps) that traverses 10 hops on a network where all links have a bandwidth of 45 Mbps. If the largest allowed packet in the network is 8192 bytes long, what g value will guarantee an end-to-end delay of 100 ms? Assume a propagation delay of 30 ms.
- Solution: The queuing delay must be bounded by 100 - 30 = 70 ms. Plugging this into the previous equation, we get 70 x 10^-3 = {(16384 x 8) + (9 x 8192 x 8)} / g + (10 x 8192 x 8) / (45 x 10^6), so that g ≈ 13 Mbps. This is more than 86 times the source's average rate of 150 Kbps (see the sketch below).
- With large packets, packet delays can be quite substantial!
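The same arithmetic as a runnable sketch (variable names are ours; everything is in bits and seconds):

```python
# Plugging the example's numbers into the Parekh-Gallager bound.
s, P = 16384 * 8, 8192 * 8        # burst and largest packet, in bits
K, r = 10, 45e6                   # hops and per-link rate
D_queue = 0.100 - 0.030           # delay budget minus propagation delay

# D_queue = (s + (K-1)P)/g + K*P/r, solved for g:
g = (s + (K - 1) * P) / (D_queue - K * P / r)
print(f"g = {g / 1e6:.2f} Mbps")  # ~13 Mbps, vs. a 150 Kbps average rate
```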
Significance
- The theorem shows that WFQ can provide end-to-end delay bounds
- So WFQ provides both fairness and performance guarantees
- The bound holds regardless of cross-traffic behaviour
Problems
- To get a delay bound, we need to pick g
  - the lower the delay bound, the larger g needs to be
  - a large g excludes more competitors from the link
  - g can be very large: in our example, more than 80 times the source's average rate!
- WFQ couples delay and bandwidth allocations
  - low delay requires allocating more bandwidth
  - this wastes bandwidth for low-bandwidth, low-delay sources
Delay-Earliest Due Date
- Earliest due date: the packet with the earliest deadline is selected
- Delay-EDD prescribes how to assign deadlines to packets (sketched below)
- A source is required to send no faster than its peak rate
- Bandwidth at the scheduler is reserved at the peak rate
- Deadline = expected arrival time + delay bound
  - if a source sends faster than its contract, the delay bound does not apply
- Each packet gets a hard delay bound
- The delay bound is independent of the bandwidth requirement
  - but the reservation is at the connection's peak rate
- Implementation requires per-connection state and a priority queue
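A minimal Delay-EDD sketch in Python, under the assumptions above (the class and its interface are illustrative, not a reference implementation): the expected arrival time treats the source as if it sent exactly at its peak rate, so early packets gain no advantage.

```python
import heapq

class DelayEDD:
    """Sketch of a Delay-EDD scheduler: per-connection state plus a
    priority queue ordered by per-hop deadline."""

    def __init__(self):
        self.heap = []        # (deadline, seq, packet) min-heap
        self.expected = {}    # per-connection expected arrival time
        self.seq = 0          # tie-breaker for equal deadlines

    def enqueue(self, conn, packet, arrival, peak_interval, delay_bound):
        # Expected arrival: as if the source sent exactly at its peak rate.
        prev = self.expected.get(conn)
        exp = arrival if prev is None else max(arrival, prev + peak_interval)
        self.expected[conn] = exp
        # Per-hop deadline = expected arrival time + the hop's delay bound.
        heapq.heappush(self.heap, (exp + delay_bound, self.seq, packet))
        self.seq += 1

    def dequeue(self):
        # Serve the packet with the earliest deadline.
        return heapq.heappop(self.heap)[2] if self.heap else None
```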
Rate-controlled scheduling
- A class of disciplines with two components: a regulator and a scheduler
  - incoming packets are placed in the regulator, where they wait to become eligible
  - then they are put in the scheduler
- The regulator shapes the traffic; the scheduler provides performance guarantees
[Figure: input -> regulator -> scheduler -> output]
Examples of Regulators (both sketched below)
- Rate-jitter regulator: packets arrive at the scheduler no faster than the peak rate
  - bounds the maximum outgoing rate
- Delay-jitter regulator: packets arrive at the scheduler a constant delay after leaving the scheduler at the previous switch
  - compensates for variable delay at the previous hop
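The two eligibility-time rules, as hedged Python sketches (function names and parameters are ours):

```python
def rate_jitter_eligible(arrival, last_eligible, length_bits, peak_bps):
    """Rate-jitter regulator (sketch): space eligibility times so packets
    enter the scheduler no faster than the connection's peak rate."""
    return max(arrival, last_eligible + length_bits / peak_bps)

def delay_jitter_eligible(eligible_prev_hop, delay_bound_prev_hop, prop_delay):
    """Delay-jitter regulator (sketch): hold each packet until a constant
    offset after it became eligible at the previous switch, cancelling the
    previous hop's variable queuing delay (assumes synchronized clocks)."""
    return eligible_prev_hop + delay_bound_prev_hop + prop_delay
```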
Analysis
- The first regulator on the path monitors and regulates traffic => bandwidth bound
- End-to-end delay bound
  - a delay-jitter regulator fully reconstructs traffic => the end-to-end delay is fixed (the sum of the worst-case delays at each hop)
  - a rate-jitter regulator only partially reconstructs traffic
  - one can show that its end-to-end delay bound is smaller than (the sum of the delay bounds at each hop + the delay at the first hop)
Delay-jitter regulator + Delay-EDD scheduler: provides a bandwidth bound, a delay bound, and a delay-jitter bound
Decoupling delay from bandwidth
- Can give a low-bandwidth connection a low delay without overbooking
- e.g., consider connection A, with a rate of 64 Kbps, sent to a router with rate-jitter regulation and multi-priority FCFS scheduling (called rate-controlled static priority, sketched below)
- After sending a packet of length l, the next packet is eligible at time (now + l / 64 Kbps)
- If placed in the highest-priority queue, all packets from A get low delay
- So we can decouple delay and bandwidth bounds, unlike WFQ
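A rate-controlled static priority sketch in Python (the class and interface are illustrative): the regulator paces each connection at its reserved rate, while priority, not rate, determines delay.

```python
from collections import deque

class RCSP:
    """Sketch of rate-controlled static priority: a rate-jitter
    regulator feeding per-priority FCFS queues."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]  # 0 = highest priority
        self.last = {}  # per-connection eligibility time of the last packet

    def arrive(self, conn, packet, now, length_bits, rate_bps, priority):
        # Regulator: eligible one transmission time (at the connection's
        # reserved rate) after the previous packet became eligible.
        eligible = max(now, self.last.get(conn, 0.0) + length_bits / rate_bps)
        self.last[conn] = eligible
        self.queues[priority].append((eligible, packet))

    def dequeue(self, now):
        # Scheduler: FCFS within a level; the highest level whose
        # head-of-line packet is eligible wins.
        for q in self.queues:
            if q and q[0][0] <= now:
                return q.popleft()[1]
        return None
```

A 64 Kbps connection assigned priority 0 here gets low per-hop delay even though its reserved rate is tiny, which is exactly the decoupling WFQ cannot offer.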
Evaluation
- Pros
  - flexibility: can emulate other disciplines
  - can decouple bandwidth and delay assignments
  - end-to-end delay bounds are easily computed
  - do not require complicated schedulers to guarantee protection
  - can provide delay-jitter bounds
- Cons
  - possibly non-work-conserving
  - delay-jitter bounds come at the expense of a higher mean delay
  - delay-jitter regulation is expensive (clock synchronization, per-packet timestamps)
Summary
- Two sorts of applications: best effort and guaranteed service
- Best-effort connections require fair service
  - provided by GPS, which is unimplementable
  - emulated by WFQ and its variants
- Guaranteed-service connections require performance guarantees
  - provided by WFQ, but this is expensive
  - may be better to use rate-controlled schedulers
Outline
- What is scheduling? Why do we need it?
- Requirements of a scheduling discipline
- Fundamental choices
- Scheduling best-effort connections
- Scheduling guaranteed-service connections
- Packet drop strategies
- A look at today: datacenter networks
Packet dropping
- Packets that cannot be served immediately are buffered
- Full buffers => a packet drop strategy is needed
- Packet losses almost always come from best-effort connections (admission control protects guaranteed-service connections)
- Shouldn't drop packets unless imperative: a dropped packet wastes the resources it has already consumed
Classification of drop strategies
- Degree of aggregation
- Drop priorities
- Early or late
- Drop position
Degree of aggregation
- The degree of discrimination in selecting a packet to drop
- e.g., in vanilla FIFO, all packets are in the same class
- Instead, we can classify packets and drop them selectively
- The finer the classification, the better the protection
- Max-min fair allocation of buffers to classes: drop a packet from the class with the longest queue (sketched below)
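The longest-queue rule in a few lines of Python (a sketch; the function name and the dict-of-deques layout are ours):

```python
from collections import deque

def drop_for_arrival(queues):
    """Approximate max-min fair buffer sharing (sketch): when the shared
    buffer is full, push out a packet from the class currently occupying
    the most buffer space."""
    longest = max(queues.values(), key=len)
    return longest.pop()  # drop from the tail of the longest per-class queue

# usage: queues = {"A": deque([...]), "B": deque([...])}; on overflow,
# victim = drop_for_arrival(queues), then enqueue the new arrival.
```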
Drop priorities
- Drop lower-priority packets first
- How to choose? The source marks packets; the policer can also mark packets
- Packets are marked with a congestion loss priority (CLP) bit in the packet header
[Figure: the source marks some packets; the switch preferentially discards marked packets]
CLP bit: pros and cons
- Pros
  - if the network has spare capacity, all traffic is carried
  - during congestion, load is automatically shed
- Cons
  - separating priorities within a single connection is hard
  - what prevents all packets from being marked high priority?
Early vs. late drop
- Early drop: drop even if space is available
  - signals endpoints to reduce their rate
  - cooperative sources get lower overall delays; uncooperative sources get severe packet loss
- Early random drop
  - drop an arriving packet with a fixed probability if the queue length exceeds a threshold
  - intuition: misbehaving sources send more packets, so they are more likely to see losses
  - in practice, doesn't work well in controlling misbehaving users
Early vs. late drop: RED
- Random early detection (RED) makes three improvements (sketched below)
- The metric is a moving average of the queue length
  - small bursts pass through unharmed
  - only sustained overloads are affected
- The packet drop probability is a linear function of the average queue length
  - prevents a severe reaction to mild overload
- Packets can be marked instead of dropped
  - allows sources to detect network state without losses
- RED improves the performance of a network of cooperating TCP sources
- No bias against bursty sources
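A minimal RED sketch in Python. The constants are illustrative, and this omits details of the full algorithm (e.g., the count-since-last-drop correction that spreads drops out); it shows only the EWMA metric and the linear drop curve described above.

```python
import random

class RED:
    """Sketch of RED's drop decision: an EWMA of the queue length and a
    drop probability rising linearly between two thresholds."""

    def __init__(self, min_th=5, max_th=15, max_p=0.02, weight=0.002):
        self.min_th, self.max_th = min_th, max_th  # thresholds (packets)
        self.max_p = max_p                         # drop prob. at max_th
        self.weight = weight                       # EWMA gain
        self.avg = 0.0                             # average queue length

    def should_drop(self, queue_len):
        # Moving average: short bursts barely move it, so only
        # sustained overload triggers drops.
        self.avg += self.weight * (queue_len - self.avg)
        if self.avg < self.min_th:
            return False                           # accept
        if self.avg >= self.max_th:
            return True                            # drop (or mark)
        # Linear ramp between the two thresholds.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p
```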
Drop position
- Can drop a packet from the head, tail, or a random position in the queue
- Tail: easy; the default approach
- Head: harder; lets the source detect the loss earlier
[Figure: with a full buffer, dropping at the head creates a "hole" just behind the previously served packets, so the destination's acks reveal the loss sooner]
Drop position (contd.)
- Random: hardest
  - hurts bandwidth hogs the most
  - unlikely to make it into real routers
- Drop entire longest queue: easy
  - almost as effective as dropping the tail of the longest queue
Datacenter Networks
Unique characteristics in datacenter networks
- Large number of flows from distributed workloads; most are fairly short
- We care more about delays than about fairness
[Figure: a fabric with 1000s of server ports serving web, app, cache, db, MapReduce, Spark, and monitoring workloads]
Unique requirements
- Goal: complete flows quickly
- Requires scheduling flows such that:
  - large flows get high throughput
  - small flows see only fabric latency (no queuing delays)
- Lots of recent work on the use of rate control to schedule flows
  - DCTCP [SIGCOMM 10], HULL [NSDI 11], D2TCP [SIGCOMM 12], D3 [SIGCOMM 11], PDQ [SIGCOMM 12]
  - most are quite complex
pFabric: critique paper on November 10
- Packets carry a single priority number
  - the priority can be a flow's remaining size
- pFabric switches (sketched below)
  - very small buffers (20-30 KB for a 10 Gbps fabric)
  - send the highest-priority packets; drop the lowest-priority packets
- pFabric hosts
  - send/retransmit aggressively
  - minimal rate control: just prevent congestion collapse
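A sketch of a pFabric-style egress port in Python (class name, interface, and the ~24-packet capacity are our assumptions, sized to roughly 20-30 KB of 1.5 KB packets): smaller remaining flow size means higher urgency, so the port transmits the minimum and, when full, evicts the maximum.

```python
import heapq
from itertools import count

class PFabricPort:
    """Sketch of a pFabric egress port: tiny buffer, priority number =
    the flow's remaining size (smaller = more urgent)."""

    def __init__(self, capacity_pkts=24):
        self.buf = []                 # min-heap on (remaining, seq, packet)
        self.capacity = capacity_pkts
        self._seq = count()           # tie-breaker keeps heap entries comparable

    def enqueue(self, remaining, packet):
        heapq.heappush(self.buf, (remaining, next(self._seq), packet))
        if len(self.buf) > self.capacity:
            # Buffer full: evict the lowest-priority packet, i.e. the one
            # from the flow with the largest remaining size.
            self.buf.remove(max(self.buf))
            heapq.heapify(self.buf)

    def dequeue(self):
        # Transmit the most urgent packet (smallest remaining flow size).
        return heapq.heappop(self.buf)[2] if self.buf else None
```

With such small buffers, the linear scan hidden in `remove()` is cheap; the point is that scheduling and dropping both reduce to comparisons on one priority field, leaving hosts with almost no rate-control logic.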