Episode 4. Flow and Congestion Control Baochun Li Department of Electrical and Computer Engineering University of Toronto
Recall the previous episode: detailed design principles in the link layer and the network layer. Topic of this episode: design principles in the end-to-end layer, and congestion control, a network system design issue.
Saltzer 7.5.6, 7.6; Keshav Chapters 9.7, 13.4.5; CUBIC paper (critique 2)
Design Principles in the End-to-End Layer
The network layer. The network layer provides a useful but not completely dependable best-effort communication environment that will deliver data segments to any destination, but with no guarantees on the order of arrival, the certainty of arrival, or the accuracy of content. This is too hostile for most applications!
The end-to-end layer. The job of the end-to-end layer is to create a more comfortable communication environment that has the features of performance, reliability, and certainty that an application needs. Problem: different applications have different needs, but they tend to fall into classes of similar requirements. For each class it is possible to design a broadly useful protocol, called the transport protocol. A transport protocol operates between two attachment points of a network (a client and a service), with the goal of moving either messages or a stream of data between them, while providing a particular set of assurances.
Transport Protocol Design
Sending multi-segment messages. The simplest method of sending a multi-segment message end-to-end: send one segment, wait for the receiver to acknowledge it, then send the second segment, and so on. This is known as the lock-step protocol, and it takes N round-trip times to send N segments! (Figure: timeline of the lock-step exchange, with the sender waiting for each acknowledgment before sending the next segment.)
Overlapping transmissions. Adopt the pipelining principle: as soon as the first segment has been sent, immediately send the next ones, without waiting for acknowledgments. When the pipeline is completely filled, there may be several segments in the network at once; N segments require N transmission times plus 1 round-trip time. (Figure: timeline of the pipelined exchange, with acknowledgments returning while later segments are still being sent.)
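The timing difference can be sketched numerically. The segment count, per-segment transmission time, and round-trip time below are made-up illustrative parameters, and losses are ignored:

```python
# Compare total transfer time for the lock-step protocol vs. a fully
# pipelined sender (illustrative parameters, no losses).

def lockstep_time(n_segments, transmit, rtt):
    # Each segment waits a full round trip before the next is sent.
    return n_segments * (transmit + rtt)

def pipelined_time(n_segments, transmit, rtt):
    # Segments go out back to back; only one round trip is paid overall.
    return n_segments * transmit + rtt

n, tx, rtt = 100, 0.001, 0.1   # 100 segments, 1 ms each, 100 ms RTT
print(lockstep_time(n, tx, rtt))    # about 10.1 seconds
print(pipelined_time(n, tx, rtt))   # about 0.2 seconds
```

With a long round-trip time relative to the transmission time, pipelining is faster by roughly a factor of N.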
But things can go wrong: lost packets. One or more packets or acknowledgments may be lost along the way. The sender needs to maintain a list of segments sent, and as each acknowledgment gets back, the sender checks that segment off its list. After sending the last segment, the sender sets a timer to expire a little more than one round-trip time in the future. If, upon receiving an acknowledgment, the list of missing acknowledgments becomes empty, all is well. Otherwise, the sender resends each segment in the list, starts another timer, and repeats the sequence until every segment is acknowledged (or the retry limit is reached).
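A minimal sketch of this bookkeeping, assuming hypothetical hooks `send_segment` and `collect_acks` that stand in for the real network and timer machinery:

```python
# Sketch of sender-side retransmission bookkeeping: keep a list of
# unacknowledged segments, wait a bit more than one RTT, and resend
# whatever is still missing, up to a retry limit.

def send_all(segments, rtt, max_retries, send_segment, collect_acks):
    """Send every segment, resending unacknowledged ones after a timeout."""
    outstanding = set(range(len(segments)))   # segments not yet acknowledged
    for _attempt in range(max_retries):
        for seq in sorted(outstanding):
            send_segment(seq, segments[seq])
        # Wait a little more than one round-trip time, then check off
        # each acknowledgment that arrived.
        for seq in collect_acks(timeout=rtt * 1.1):
            outstanding.discard(seq)
        if not outstanding:
            return True        # all is well
    return False               # retry limit reached
```

A quick dry run: if acknowledgment 1 is lost on the first round, only segment 1 is resent on the second round, after which the list is empty.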
But things can go wrong: bottlenecks. When the sender generates data, the network can transmit it faster than the (slower) receiver can accept it. The transport protocol needs to include some method of controlling the rate at which the sender generates data, called flow control. A basic intuitive idea: the sender starts by asking the receiver how much data the receiver can handle; the response from the receiver is known as a window. The sender asks for permission to send, the receiver responds by quoting a window size, and the sender then sends that much data and waits until it receives permission to send more.
Flow control with a fixed window. (Figure: the sender asks "may I send?"; the receiver opens a 4-segment window; the sender sends segments 1 to 4, which the receiver buffers and acknowledges; the sender then waits until the receiver has finished processing segments 1 to 4 and reopens the window, after which the sender continues with segments 5 and 6.)
Sliding Windows. As soon as it has freed up a segment buffer, the receiver can immediately send permission for a window that is one segment larger, either by sending a separate message or, if there happens to be an ACK ready to go, by piggy-backing on that ACK. The sender keeps track of how much window space is left, and increases that number whenever additional permission arrives.
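The sender-side bookkeeping is simple enough to sketch. The class and method names below are illustrative, not part of any real protocol stack:

```python
# Sketch of a sliding-window sender's accounting: spend window space
# as segments go out, and grow it when the receiver grants more.

class SlidingWindowSender:
    def __init__(self, initial_window):
        self.window = initial_window   # segments we may still send
        self.next_seq = 0              # sequence number of next segment

    def can_send(self):
        return self.window > 0

    def send_one(self):
        # Consume one slot of permission and hand back the sequence
        # number of the segment to transmit.
        assert self.can_send()
        self.window -= 1
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return seq

    def on_permission(self, extra):
        # The receiver freed buffers and granted more window,
        # possibly piggy-backed on an ACK.
        self.window += extra
```

For example, a sender given a 2-segment window sends segments 0 and 1, must then wait, and can send segment 2 only after one more slot of permission arrives.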
Self-Pacing. Once the sender fills a sliding window, it cannot send the next data element until the acknowledgment of the oldest data element in the window returns. At the same time, the receiver cannot generate acknowledgments any faster than the network can deliver data elements. Because of these two considerations, the rate at which the window slides automatically adjusts itself to equal the bottleneck data rate!
Appropriate window size. Do we still need to know the network round-trip time at the sender? Yes: window size >= round-trip time x bottleneck data rate, the bandwidth-delay product. If a too-large round-trip time estimate is used in setting the window, the resulting excessive window size will simply increase the length of packet forwarding queues in the network. Those longer queues will increase the transit time, and the increase will lead the sender to think that it needs an even larger window: positive feedback! The estimate needs to err on the side of being too small.
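A worked example of the bandwidth-delay product, with illustrative numbers:

```python
# The window must cover everything "in flight" during one round trip:
#   window size >= round-trip time x bottleneck data rate

def min_window_bytes(rtt_seconds, bottleneck_bps):
    return rtt_seconds * bottleneck_bps / 8   # bits -> bytes

# A 100 ms RTT over a 100 Mb/s bottleneck needs at least
#   0.1 s * 100e6 b/s = 10 Mb = 1.25 MB of window.
print(min_window_bytes(0.1, 100e6))   # 1250000.0 bytes
```

Any smaller window leaves the pipe partly idle; any larger window only lengthens queues.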
Congestion control: a network-wide problem of managing shared resources
Shared resources: everywhere in a system. Resource sharing examples in systems: many virtual processors (threads) share a few physical processors using a thread manager; a multilevel memory manager creates the illusion of large, fast virtual memories by combining a small, fast shared memory with large, slow storage devices. In networks, the resource that is shared is a set of communication links and the supporting packet forwarding switches. These are geographically and administratively distributed, so managing them is more complex!
Analogy: Supermarket vs. Packet Switch. Queues are used to manage the problem that packets may arrive at a switch at a time when the outgoing link is already busy transmitting another packet, just like checkout lines in the supermarket. Any time there is a shared resource, and the demand for that resource comes from several statistically independent sources, there will be fluctuations in the arrival of load. Thus there will be fluctuations in the length of the queue, and in the time spent waiting for service in the queue. When offered load exceeds the capacity of a resource, the resource is overloaded.
How long will overload persist? If the duration of overload is comparable to the service time (the time in a supermarket to serve one customer, or the time for a packet forwarding switch to handle one packet), it is normal. In this case, a queue handles short bursts of too much demand by time-averaging with adjacent periods when there is excess capacity. If overload persists for a time significantly longer than the service time, there begins to develop a risk that the system will fail to meet some specification, such as a maximum delay. When this occurs, the resource is said to be congested. If congestion is chronic, the length of the queue will grow without bound.
The stability of offered load. The stability of offered load is another factor in the frequency and duration of congestion. When the load on a resource is aggregated from a large number of statistically independent small sources, averaging can reduce the frequency and duration of load peaks. When the load comes from a small number of large sources, even if the sources are independent, the probability that they all demand service at about the same time can be high enough that congestion is frequent or long-lasting.
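The averaging effect can be illustrated with a toy simulation. The uniform per-source load below is an arbitrary modeling assumption, made only to show how peaks shrink relative to the mean as independent sources are aggregated:

```python
# Toy illustration: aggregating many independent small sources smooths
# the total load, so the peak-to-mean ratio shrinks as sources are added.
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

def peak_to_mean(n_sources, steps=1000):
    # Each step, every source independently offers a random load in [0, 1).
    totals = [sum(random.random() for _ in range(n_sources))
              for _ in range(steps)]
    return max(totals) / statistics.mean(totals)

print(peak_to_mean(1))    # a single source: peaks far above the mean
print(peak_to_mean(100))  # 100 sources: peaks close to the mean
```

With one source, occasional peaks are roughly twice the mean; with 100 independent sources, the worst observed total stays within a few percent of the mean, which is why interior links with many clients see relatively stable load.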
Congestion Collapse. Competition for a resource may lead to waste of that resource. This is counter-intuitive, but the supermarket analogy can help in understanding it. Customers who are tired of waiting may just walk out, leaving filled shopping carts behind. Someone has to put the goods from abandoned carts back on the shelves, so one or two of the checkout clerks leave their registers to do so. The rate of sales being rung up drops while they are away, and the queues at the remaining registers grow longer, causing more people to abandon their carts. Eventually, the clerks will be doing nothing but restocking.
Self-sustaining nature of congestion collapse. Once temporary congestion induces a collapse, even if the offered load drops back to a level that the resource can handle, the already induced waste rate can continue to exceed the capacity of the resource. This causes it to continue wasting the resource and to remain congested indefinitely. (Figure: useful work done vs. offered load. An unlimited resource tracks the load; a limited resource with no waste levels off at its capacity; under congestion collapse, useful work done falls once offered load passes the capacity of the limited resource.)
Primary goal of resource management: avoid congestion collapse! Either by increasing the capacity of the resource, or by reducing the offered load. There is a need to move quickly to a state in which the load is less than the capacity of the resource. But when offered load is reduced, the amount reduced does not really go away; it is just deferred to a later time at the source. The source is still averaging periods of overload with periods of excess capacity, but over a longer period of time.
How to increase capacity or reduce load? It is necessary to provide feedback to one or more control points: entities that determine the amount of resource that is available, or the load being offered. A congestion control system is fundamentally a feedback system, and a delay in the feedback path can lead to oscillations in load.
The Supermarket and Call Centre Analogies. In a supermarket, a store manager can watch the queues at the checkout lines. Whenever there are more than two or three customers in any line, the manager calls for staff elsewhere in the store to drop what they are doing and temporarily take stations as checkout clerks. This practically increases capacity. When you call customer service, you may hear an automatic response message: "Your call is important to us. It will be 30 minutes until we can answer." This may lead some callers to hang up and try again at a different time, which practically decreases load. Both may lead to oscillations.
Resource Management in Networks
Shared resources in a computer network: the communication links, and the processing and buffering capacity of the packet forwarding switches.
Main Challenges Part 1. There is more than one resource. Even a small number of resources can be used up in a large number of different ways, which is complex to keep track of, and there can be dynamic interactions among different resources: as one nears capacity it may push back on another, which may push back on yet another, which may push back on the first one! It is also easy to induce congestion collapse. As queues for a particular communication link grow, delays grow; when queuing delays become too long, the timers of higher-layer protocols begin to expire and trigger retransmissions of the delayed packets. The retransmitted packets join the long queues, and waste capacity.
No, we cannot install more buffers! As memory gets cheaper, the idea is tempting, but it doesn't work. Suppose memory is so cheap that a packet forwarder can be equipped with an infinite buffer, which can absorb an unlimited amount of overload. As more buffers are used, the queuing delay grows; at some point the queuing delay exceeds the timeouts of end-to-end protocols, and packets are retransmitted. The offered load is now larger, so the queue grows even longer, and the process becomes self-sustaining. The infinite buffer does not solve the problem; it makes it worse!
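This feedback loop can be made concrete with a toy queue simulation. All parameters below (arrival rate, service rate, timeout) are made-up illustrative numbers, and the retransmission model is deliberately crude:

```python
# Toy simulation of why unbounded buffers make congestion worse: once the
# queuing delay exceeds the end-to-end timeout, senders retransmit, the
# offered load doubles, and the queue grows without bound.

def simulate(steps, arrivals_per_step, service_per_step, timeout_steps):
    queue = 0
    history = []
    for _ in range(steps):
        load = arrivals_per_step
        # Queuing delay (in steps) seen by a newly arriving packet:
        delay = queue / service_per_step
        if delay > timeout_steps:
            load += arrivals_per_step   # timers expire; everything is resent
        queue = max(0, queue + load - service_per_step)
        history.append(queue)
    return history

h = simulate(steps=50, arrivals_per_step=11,
             service_per_step=10, timeout_steps=3)
# Modest overload (11 arrivals vs. 10 served per step) grows the queue
# slowly at first; once retransmissions kick in, growth accelerates and
# the queue never drains.
```

The larger the buffer, the longer the delay it can hide, and the worse the eventual retransmission load, which is the slide's point.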
Main Challenges Part 2. There are limited options to expand capacity. Capacity is determined by physical facilities (e.g., wireless spectrum). One can try sending some queued packets via an alternate path, but such strategies are too complex to work well. Reducing the offered load (the demand) is the only realistic way.
Main Challenges Part 3. The options to reduce load are awkward. First, the control point for the offered load is far away, and the feedback path to that point may be long: by the time the feedback signal gets there, the sender may have stopped sending, and the feedback may get lost. Second, the control point must be capable of reducing its offered load; video streaming protocols are not able to do this! Third, the control point must be willing to cooperate. The packet forwarder in the network layer may be under a different administration than the control point in the end-to-end layer, and the control point may be more interested in keeping its offered load equal to its intended load, in the hope of capturing more of the capacity in the face of competition (think BitTorrent)!
Possible ideas to address these challenges
Overprovisioning. Basic idea: configure each link of the network to have 125% or 200% as much capacity as the offered load at the busiest minute of the day. This works best on interior links of a large network, where no individual client represents more than a tiny fraction of the load, because the average load offered by a large number of statistically independent sources is relatively stable. Problems: odd events can disrupt statistical independence; overprovisioning one link will move the congestion to another; at the edge of the network, statistical averaging stops working (the flash crowd); and user usage patterns may adapt to the additional capacity.
Pricing in a market: the invisible hand. Since network resources are just another commodity with limited availability, it should be possible to use pricing as a congestion control mechanism. If demand for a resource temporarily exceeds its capacity, clients will bid up the price. The increased price will cause some clients to defer their use of the resource until a time when it is cheaper, thereby reducing offered load; it will also induce additional suppliers to provide more capacity. Challenges: how do we make it work on the short time scales of congestion? Clients need a way to predict the costs in the short term, too, and there has to be a minimal barrier to entry for alternate suppliers.
How do we address these challenges? Decentralized schemes are extremely scalable
Case in point: the Internet
Cross-layer Cooperation: Feedback
Cross-layer feedback: basic idea. The packet forwarder that notices congestion provides feedback to one or more end-to-end layer sources, and each end-to-end source responds by reducing its offered load. The best solution: the packet forwarder simply discards the packet. Simple and reliable!
Which packet to discard? The choice is not obvious. The simplest strategy, tail drop, limits the size of the queue: any packet that arrives when the queue is full gets discarded. A better technique, called random drop, is to choose a victim from the queue at random; the sources contributing the most to congestion are then the most likely to receive the feedback. Another refinement, called early drop, begins dropping packets before the queue is completely full, in the hope of alerting the source sooner. The goal of early drop is to start reducing the offered load as soon as the possibility of congestion is detected, rather than waiting until congestion is confirmed: avoidance rather than recovery. Random drop plus early drop: random early detection (RED).
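The RED drop decision can be sketched as follows. The thresholds and maximum probability below are illustrative, and a real RED gateway tracks an exponentially weighted moving average of the queue length rather than the instantaneous value:

```python
# Sketch of the RED drop decision: below min_th never drop, above max_th
# always drop (tail-drop behavior), and in between drop with a probability
# that rises linearly with the average queue length.
import random

def red_should_drop(avg_queue, min_th, max_th, max_p=0.1):
    """Return True if an arriving packet should be dropped."""
    if avg_queue < min_th:
        return False                 # no sign of congestion
    if avg_queue >= max_th:
        return True                  # queue too long: drop for sure
    # Early drop: probability grows linearly from 0 at min_th
    # to max_p just below max_th.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```

Because drops are random, heavy senders, whose packets occupy more of the queue, are hit more often, which delivers the congestion feedback to exactly the sources most responsible for it.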