TCP Congestion Control
What is Congestion?
The number of packets transmitted on the network is greater than the capacity of the network.
This causes router buffers (which are finite in size) to fill up, and packets start getting dropped.
Why is it bad?
  Retransmissions waste bandwidth.
  Delay increases.
  Congestion collapse: retransmissions caused by congestion drops can further increase congestion.
Effect of Congestion
Congestion Control
Congestion control aims to keep the number of packets in the network below the level at which performance falls off dramatically.
End-to-end flow control is not enough: independent senders can each have flow control with their own receivers, yet together still inject a large number of packets into the network.
Mechanisms for Congestion Control
Backpressure
A congested node slows down or halts the flow of packets from some or all incoming nodes.
These nodes in turn slow down or halt their own incoming packet flows.
The effect propagates back to the source.
Choke Packet
A control packet generated at the congested node and sent to the source node.
  e.g. ICMP source quench, sent from a router or from the destination.
The source cuts back until it receives no more source quench messages.
A message is sent for every discarded packet, or when congestion is anticipated.
A rather crude mechanism.
Implicit Congestion Signaling
Transmission delay may increase with congestion, and packets may be discarded.
The source can detect these as implicit indications of congestion and dynamically adjust its packet sending rate.
Useful on connectionless (datagram) networks.
This is the basis for TCP congestion control.
TCP Congestion Control
The sender estimates the level of congestion from packet delay and packet drops.
Send more when no congestion is detected; slow down when congestion is detected.
Somewhat messy, but simple to implement.
Two issues:
  detecting congestion
  adjusting the sending rate
Detecting Congestion
Congestion is detected by detecting packet drops.
How to detect packet drops?
  timeout while waiting for an acknowledgement
  too many duplicate acknowledgements
We will consider only timeouts here.
Adjusting Sending Rate
Based on the TCP congestion window (cwnd), which limits how much data can be in transit (similar to the receiver's window advertisement, but with a different purpose).
Maximum window size at the sender at any time = min(cwnd, recvadvertisedwindow)
cwnd is varied to control the sending rate in response to congestion.
  Varied by the sender; congestion control is a sender-side task.
recvadvertisedwindow is varied for end-to-end flow control.
  Varied by the receiver, as discussed earlier.
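The min() rule above can be illustrated with a small sketch. The function names (effective_window, can_send) and the byte values are hypothetical and chosen only for illustration; they do not come from any real TCP stack.

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    """Bytes the sender may have in transit: the smaller of the two limits."""
    return min(cwnd, rwnd)


def can_send(bytes_in_flight: int, segment_size: int, cwnd: int, rwnd: int) -> bool:
    """True if one more segment still fits inside the effective window."""
    return bytes_in_flight + segment_size <= effective_window(cwnd, rwnd)


# Network-limited sender: cwnd is the binding constraint.
print(effective_window(cwnd=4380, rwnd=65535))   # 4380
# Receiver-limited sender: the advertised window is the binding constraint.
print(effective_window(cwnd=20000, rwnd=8192))   # 8192
```

Whichever window is smaller decides how much the sender may transmit, so a large advertised window does not help when the network is congested, and vice versa.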
Basic Principle
On detecting a packet drop, decrease cwnd.
On receiving an ACK, increase cwnd.
What should the rates of increase and decrease be? Different variations exist.
Two phases of TCP congestion control:
  slow start
  congestion avoidance
Determining Which Phase to Use
Based on a variable ss_thresh (slow start threshold).
cwnd is initially set to one maximum segment size (MSS).
  slow start: cwnd < ss_thresh
  congestion avoidance: cwnd > ss_thresh
As per the RFC, implementations can choose either phase when cwnd = ss_thresh.
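As a minimal sketch of the phase test, the hypothetical helper below compares cwnd with ss_thresh. Choosing congestion avoidance when the two are equal is an assumption of this sketch; as noted above, either choice is allowed.

```python
def phase(cwnd: int, ss_thresh: int) -> str:
    """Return the congestion control phase for the given cwnd and ss_thresh."""
    if cwnd < ss_thresh:
        return "slow start"
    # cwnd == ss_thresh: either phase is permitted; this sketch
    # arbitrarily picks congestion avoidance.
    return "congestion avoidance"


print(phase(cwnd=4, ss_thresh=8))    # slow start
print(phase(cwnd=8, ss_thresh=8))    # congestion avoidance
print(phase(cwnd=12, ss_thresh=8))   # congestion avoidance
```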
Before We Go Any Further
Let's clear up some confusion that may arise in interpreting cwnd.
cwnd is implemented as a number of bytes (as it should be, since TCP is byte-oriented).
However, most descriptions talk about cwnd in terms of a number of segments.
A segment in the context of cwnd means a full-size segment (size = maximum segment size, MSS), so converting between the two is easy.
RFC 2581 talks in terms of both.
We will also talk about cwnd in terms of the number of segments (easier for students to follow from the text).
Slow Start
Used to find a good sending rate, initially at startup or while recovering after congestion.
Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was detected:
  Initialize cwnd = 1.
  Each time a segment is acknowledged, increment cwnd by 1.
  Continue until ss_thresh is reached or packet loss is detected.
Slow start is not so slow!! (Figure: sender/receiver timeline.)
cwnd = 1: send segment 1; ACK for segment 1 arrives.
cwnd = 2: send segments 2 and 3; ACK for segments 2 + 3 arrives.
cwnd = 4: send segments 4, 5, 6, and 7; ACK for segments 4 + 5 + 6 + 7 arrives.
cwnd = 8
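The doubling can be reproduced with a short simulation. This is a sketch under simplifying assumptions: no loss, cwnd counted in segments, and one ACK counted per segment sent in a round trip. The function name slow_start_rounds is hypothetical.

```python
def slow_start_rounds(ss_thresh: int, cwnd: int = 1) -> list[int]:
    """Return cwnd (in segments) at the start of each round trip, assuming no loss."""
    trace = [cwnd]
    while cwnd < ss_thresh:
        acks = cwnd                  # one ACK per segment sent in this round trip
        for _ in range(acks):
            if cwnd >= ss_thresh:    # stop growing once ss_thresh is reached
                break
            cwnd += 1                # slow start: +1 segment per ACK
        trace.append(cwnd)
    return trace


print(slow_start_rounds(ss_thresh=16))  # [1, 2, 4, 8, 16]
```

Adding one segment per ACK means a full window of ACKs doubles the window, which is why slow start grows exponentially per round trip.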
Congestion Avoidance
Slow start finds a good congestion window fast; congestion avoidance slows down the increase in cwnd.
If cwnd > ss_thresh, then each time a segment is acknowledged, increment cwnd by 1/cwnd (cwnd += 1/cwnd).
cwnd therefore increases by one only after all segments of the current window have been acknowledged.
Increases by 1 per RTT, versus doubling per RTT in slow start.
This is additive increase.
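A small sketch of the per-ACK rule shows why the growth is about one segment per round trip. It assumes cwnd is a fractional number of segments and that every segment in the window is acknowledged separately; the function name is hypothetical. In bytes, RFC 2581 expresses the same rule as cwnd += SMSS*SMSS/cwnd.

```python
def congestion_avoidance_round(cwnd: float) -> float:
    """Apply cwnd += 1/cwnd once per ACK for one full window of ACKs."""
    acks = int(cwnd)            # assumption: one ACK per segment in the window
    for _ in range(acks):
        cwnd += 1.0 / cwnd      # additive increase, spread over the window
    return cwnd


before = 8.0
after = congestion_avoidance_round(before)
# The gain is slightly under one segment, because cwnd grows a little
# with every ACK and each later increment is therefore a bit smaller.
print(f"cwnd grew by {after - before:.2f} segments in one round trip")
```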
Slow Start + Congestion Avoidance
(Figure: cwnd in segments plotted against round-trip times, with ss_thresh marked.)
Assume that ss_thresh = 8. Over successive round trips, cwnd = 1, 2, 4, 8 (slow start), then 9, 10 (congestion avoidance).
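The sequence in this example can be regenerated at round-trip granularity. The sketch below is a simplification: it assumes no loss, doubles cwnd per round trip in slow start, and caps it at ss_thresh on the crossing round trip (the cap is an assumption of this sketch). The name cwnd_per_rtt is hypothetical.

```python
def cwnd_per_rtt(ss_thresh: int, rounds: int, cwnd: int = 1) -> list[int]:
    """Return cwnd (in segments) after each round trip, assuming no loss."""
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ss_thresh:
            cwnd = min(2 * cwnd, ss_thresh)  # slow start: doubles per round trip
        else:
            cwnd += 1                        # congestion avoidance: +1 per round trip
        trace.append(cwnd)
    return trace


print(cwnd_per_rtt(ss_thresh=8, rounds=5))  # [1, 2, 4, 8, 9, 10]
```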
Congestion Avoidance (contd.)
On each timeout:
  ss_thresh is set to half the current size of the congestion window: ss_thresh = cwnd / 2
  cwnd is reset to one: cwnd = 1
  slow start is entered
This is called multiplicative decrease.
Example
(Figure: cwnd plotted against time, with ss_thresh, a timeout, and the slow start and congestion avoidance phases marked.)
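The timeout reaction can be sketched as a per-round-trip simulation that produces the familiar sawtooth. Everything here is illustrative: the function names, the round in which the timeout fires, and the floor of 2 segments on ss_thresh (an assumption in the spirit of RFC 2581, not something stated above).

```python
def on_timeout(cwnd: int) -> tuple[int, int]:
    """Return (new_cwnd, new_ss_thresh) after a retransmission timeout."""
    ss_thresh = max(cwnd // 2, 2)   # halve; the floor of 2 is an assumption
    return 1, ss_thresh             # cwnd restarts at 1, so slow start is re-entered


def simulate(rounds: int, ss_thresh: int, timeouts: set[int]) -> list[tuple[int, int]]:
    """Per-round (cwnd, ss_thresh) trace; a timeout fires in the rounds in `timeouts`."""
    cwnd, trace = 1, []
    for rtt in range(rounds):
        trace.append((cwnd, ss_thresh))
        if rtt in timeouts:
            cwnd, ss_thresh = on_timeout(cwnd)
        elif cwnd < ss_thresh:
            cwnd = min(2 * cwnd, ss_thresh)   # slow start
        else:
            cwnd += 1                         # congestion avoidance
    return trace


for rtt, (cwnd, ss_thresh) in enumerate(simulate(rounds=12, ss_thresh=8, timeouts={6})):
    print(f"t={rtt:2d}  cwnd={cwnd:2d}  ss_thresh={ss_thresh}")
```

In this run the timeout arrives when cwnd = 11, so ss_thresh drops to 5 and cwnd restarts from 1; slow start then ends earlier than it did the first time.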
Other TCP Congestion Control Schemes
What we described is TCP Tahoe (see RFC 2581).
Other implementations exist that achieve higher throughput:
  TCP Reno: adds fast retransmit and fast recovery to TCP Tahoe (see RFC 2581); the one used most often
  TCP Vegas
  TCP SACK (selective ACK; see RFC 2018 for basics)
Fast Retransmit and Fast Recovery
Fast retransmit
  Retransmit on 3 duplicate ACKs (4 identical ACKs in total) for the same segment, without waiting for a timeout.
  The receiver should send ACKs for segments received out of order (creating the duplicate ACKs for the missing segment).
Fast recovery
  Do not enter slow start if fast retransmit is used to send a segment.
  ss_thresh and cwnd are set using a slightly complex method.
  Enter slow start on a timeout as usual.
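The duplicate-ACK trigger for fast retransmit can be sketched as a small counter. The class name, the ACK numbers, and the choice to fire only once per missing segment are assumptions made for illustration; the window adjustments of fast recovery are deliberately left out.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when fast retransmit should fire."""

    DUP_ACK_THRESHOLD = 3   # 3 duplicates, i.e. 4 identical ACKs in total

    def __init__(self) -> None:
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when this ACK is the third duplicate."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.DUP_ACK_THRESHOLD
        # A new ACK number: remember it and restart the duplicate count.
        self.last_ack = ack_no
        self.dup_count = 0
        return False


detector = DupAckDetector()
acks = [1000, 2000, 2000, 2000, 2000, 2000, 6000]
print([detector.on_ack(a) for a in acks])
# [False, False, False, False, True, False, False]
```

The first ACK for 2000 is the original; the retransmission fires on the fourth identical ACK, and further duplicates do not trigger it again.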