CSE 123A Computer Networks, Winter 2005
Lecture 14: Congestion Control
Some images courtesy of David Wetherall; animations by Nick McKeown and Guido Appenzeller
The bad news and the good news
- The bad news: the new homework is up (due a week from today at the beginning of class)
- The good news: Walter pled on your behalf for an extension for project 2, and your professor is merciful
- New due date: Friday at 10pm

March 2, 2005 CSE 123A -- Lecture 13: Reliable Communications
Your TAs
- David Yu
- Brian Lum
- Walter Philips
Today
How fast should a sending host transmit data? Not too fast, not too slow, just right:
- Not faster than the receiver can process: flow control (last class)
- Not faster than the sender's share: bandwidth allocation
- Not faster than the network can process: congestion control
Congestion control and bandwidth allocation are separate ideas, but frequently combined
What is bandwidth allocation?
[Figure: three sources share a chain of routers; links of 100 Mbps, 10 Mbps, 4 Mbps, and 6 Mbps lead to two destinations]
How much bandwidth should each flow from a source to a destination receive when they compete for resources? What is a fair allocation?
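One common answer to the fairness question is max-min fairness: no flow can get more without taking capacity from a flow that already has less. As a sketch only (the flow names and demands below are made up, and this handles a single shared link rather than the multi-router topology in the figure), the allocation can be computed by repeatedly splitting leftover capacity among unsatisfied flows:

```python
# Max-min fair allocation on one shared link (illustrative sketch).
# Flows whose demand fits under an equal share get exactly their demand;
# the remaining capacity is re-split among the still-unsatisfied flows.

def max_min_fair(capacity, demands):
    alloc = {f: 0.0 for f in demands}
    remaining = dict(demands)
    cap = capacity
    while remaining and cap > 1e-9:
        share = cap / len(remaining)
        # Flows that want no more than the equal share are fully satisfied.
        satisfied = {f: d for f, d in remaining.items() if d <= share}
        if satisfied:
            for f, d in satisfied.items():
                alloc[f] = d
                cap -= d
                del remaining[f]
        else:
            # Everyone wants more than the share: split equally and stop.
            for f in remaining:
                alloc[f] = share
            cap, remaining = 0.0, {}
    return alloc

# Three flows compete for a 10 Mbps link; one only wants 2 Mbps.
print(max_min_fair(10.0, {"s1": 8.0, "s2": 8.0, "s3": 2.0}))
```

Here s3 gets its full 2 Mbps (below its equal share), and the other 8 Mbps is split evenly between s1 and s2.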
Quick review: How queuing works
- Queues absorb short-term traffic bursts; long-term overload will cause packets to be dropped
- Two components to a queuing mechanism:
  - Scheduling: which packets are sent from the queue?
  - Buffer management: what happens when the queue is full?
- Most of the Internet is FIFO/drop-tail:
  - First-in-first-out: packets leave in the same order they arrive
  - Drop tail: when the queue is full, newly arriving packets are dropped
  - Simple to implement at high speeds
- Why might you want other policies?
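The FIFO/drop-tail behavior described above is small enough to sketch directly (class and packet names are illustrative):

```python
# Minimal FIFO/drop-tail queue: FIFO scheduling, and buffer management
# that drops the newly arriving packet when the buffer is full.
from collections import deque

class DropTailQueue:
    def __init__(self, capacity):
        self.capacity = capacity          # max packets buffered
        self.q = deque()
        self.dropped = 0

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:  # queue full: drop the newcomer
            self.dropped += 1
            return False
        self.q.append(pkt)                # otherwise buffer it in FIFO order
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

q = DropTailQueue(capacity=2)
for pkt in ["a", "b", "c"]:   # a burst of 3 into a 2-packet buffer
    q.enqueue(pkt)
print(q.dequeue(), q.dequeue(), q.dropped)
```

The burst overflows the buffer: "a" and "b" leave in arrival order, and "c" is counted as dropped.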
Why buffering is good: Statistical Multiplexing
Observations:
1. The bigger the buffer, the lower the packet loss.
2. If the buffer never goes empty, the outgoing line is busy 100% of the time.
What is congestion?
[Figure: sources on 10-Mbps Ethernet and 100-Mbps FDDI feed a router with a 1.5-Mbps T1 link to the destination; packets are dropped at the router]
- The buffer is intended to absorb bursts when the input rate exceeds the output rate
- But if the sending rate is persistently greater than the drain rate, the queue builds
- Dropped packets represent wasted work; goodput < throughput
Impact of load on FIFO/drop-tail queues
[Figure: throughput and latency vs. network load; past the point of congestive packet loss, latency spikes and throughput falls off into congestion collapse]
Congestion collapse
- Rough definition: when an increase in network load produces a decrease in useful work
- Why does it happen?
  - The sender sends faster than the bottleneck link speed (what's a bottleneck link?)
  - Packets queue until dropped
  - In response to packets being dropped, the sender retransmits
  - All hosts repeat in steady state
What can be done?
- Increase network resources
  - More buffers for queuing
  - Increase link speed
  - Pros/cons of these approaches?
- Reduce network load
  - Send data more slowly
  - How much more slowly? When to slow down?
Review: TCP Congestion Control
- Only W packets may be outstanding
- Rule for adjusting W:
  - If an ACK is received: W <- W + 1/W
  - If a packet is lost: W <- W/2
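The two-line rule above can be written as a tiny update function and stepped by hand. A window's worth of ACKs (W of them, each adding 1/W) grows W by roughly one packet per round trip, while a loss halves it:

```python
# The window-update rule from the slide (W in units of packets).

def update_window(w, event):
    if event == "ack":
        return w + 1.0 / w      # additive increase: ~ +1 packet per RTT
    elif event == "loss":
        return max(1.0, w / 2)  # multiplicative decrease, floored at 1
    return w

w = 4.0
for _ in range(4):              # one window's worth of ACKs
    w = update_window(w, "ack")
print(round(w, 2))              # grew by roughly one full packet
w = update_window(w, "loss")    # a loss halves the window
print(round(w, 2))
```

This per-ACK formulation is exactly why the aggregate behavior is "additive increase, multiplicative decrease", discussed later in the lecture.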
High-level design choices
- Open loop: explicitly reserve bandwidth in the network in advance of sending
- Closed loop: respond to feedback and adjust the bandwidth allocation
- Network-based: the network implements and enforces bandwidth allocation
- Host-based: hosts are responsible for keeping their sending rate to no more than their share
- What is typically used on the Internet? Why?
Proactive vs. reactive approaches
- Congestion avoidance: try to stay to the left of the knee
- Congestion control: try to stay to the left of the cliff
[Figure: the load/throughput curve again, with the knee at the onset of queuing delay and the cliff at congestive packet loss and collapse]
Key questions
- How to detect congestion?
- How to limit the sending data rate?
- How fast to send?
How to detect congestion?
- Explicit congestion signaling
  - Source Quench: ICMP message from router to sender
  - DECbit / Explicit Congestion Notification (ECN):
    - Router marks packets based on queue occupancy (i.e., an indication that the packet encountered congestion along the way)
    - Receiver tells the sender if queues are getting too full
- Implicit congestion signaling
  - Packet loss
    - Assume congestion is the primary source of packet loss
    - Lost packets (timeout, NAK) indicate congestion
  - Packet delay
    - Round-trip time increases as packets queue
    - Packet inter-arrival time is a function of the bottleneck link
How to limit the sending rate?
- Window-based (TCP)
  - Artificially constrain the number of outstanding packets allowed in the network
  - Increase the window to send faster; decrease it to send slower
  - Pro: cheap to implement, good failure properties
  - Con: creates traffic bursts (requires bigger buffers)
- Rate-based (many streaming media protocols)
  - Two parameters (period, packets): allow sending of x packets in period y
  - Pro: smooth traffic
  - Con: per-connection timers; what if the receiver fails?
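The rate-based (period, packets) scheme above can be sketched with a sliding window of send timestamps. This is an illustrative helper, not an API from any real protocol stack:

```python
# Rate limiter: allow at most `packets` sends in any `period` seconds.
from collections import deque

class RateLimiter:
    def __init__(self, period, packets):
        self.period = period
        self.packets = packets
        self.sent = deque()           # timestamps of recent sends

    def allow(self, now):
        # Forget sends that have aged out of the current period.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) < self.packets:
            self.sent.append(now)
            return True
        return False

# At most 2 packets per second: the third send in the same second is
# refused, but a send after the window has slid past is allowed again.
rl = RateLimiter(period=1.0, packets=2)
print([rl.allow(t) for t in (0.0, 0.1, 0.2, 1.1)])
```

The refusal at t=0.2 (rather than a queue overflow somewhere downstream) is what makes rate-based traffic smooth; the cost is that the sender must keep a timer or clock per connection.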
How fast to send?
- Ideally: keep equilibrium at the knee of the power curve
  - Find the knee somehow
  - Keep the number of packets in flight the same
  - Don't send a new packet into the network until you know one has left (e.g., by receiving an ACK)
  - What if you guess wrong, or if bandwidth availability changes?
- Compromise: adaptive approximation
  - If congestion is signaled, reduce the sending rate by x
  - If data is delivered successfully, increase the sending rate by y
  - How to relate x and y? Most choices don't converge
How does TCP do it?
- Jacobson & Karels '88: a seminal paper in computer networking (5th most cited paper in all of computer science)
- Context: 1986 brings a huge congestion collapse; LBL<->Berkeley link throughput decreases by 1000x
- Motivation for the paper: why did this happen, and how do we fix it?
- Key algorithms:
  - Congestion avoidance (misnamed)
  - Slow start
  - Fast retransmit & fast recovery
Basic approach: TCP probes the network
[Figure: a source on 100-Mbps Ethernet sends through a router over a 45-Mbps T3 link to a sink]
- Each source independently probes the network to determine how much bandwidth is available
  - The answer changes over time, since everyone does this
- Assume that packet loss implies congestion
  - Errors are rare; also, this requires no support from routers
Basic implementation
- Window-based congestion control
  - Allows the congestion control and flow control mechanisms to be unified
  - rwin: advertised flow control window from the receiver
  - cwnd: congestion control window
    - Estimate of how much outstanding data the network can deliver in a round-trip time
- The sender can only have MIN(rwin, cwnd) outstanding at any time
- Idea: decrease cwnd when congestion is encountered; increase cwnd otherwise
- Question: how much to adjust?
Congestion avoidance algorithm
- Goal: adapt to changes in available bandwidth
- Additive Increase, Multiplicative Decrease (AIMD)
  - Increase the sending rate by a constant (e.g., by 1500 bytes)
  - Decrease the sending rate by a multiplicative factor (e.g., divide by 2)
- Rough intuition for why this works:
  - Let L_i be the queue length at time i
  - In steady state, L_i = N, where N is a constant
  - During congestion, L_i = N + y * L_{i-1}, where y > 0
  - Consequence: the queue size increases exponentially, so the sending rate must be reduced exponentially as well (multiplicative decrease)
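A quick numeric check of the recursion above (the constants N = 10 and y = 1.5 are illustrative; with a feedback factor above one the sequence visibly diverges):

```python
# Iterate L_i = N + y * L_{i-1}: a queue whose backlog feeds back into
# itself. Once the feedback term dominates, growth is geometric.

def queue_lengths(n, y, steps, l0):
    ls = [l0]
    for _ in range(steps):
        ls.append(n + y * ls[-1])
    return ls

print(queue_lengths(n=10, y=1.5, steps=3, l0=10))
```

Each step multiplies the previous backlog before adding the steady-state arrivals, which is why only a multiplicative cut in the sending rate can drain the queue fast enough.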
AIMD (Additive Increase / Multiplicative Decrease)
- Increase slowly while we believe there is bandwidth
  - Additive increase per RTT: cwnd += 1 full packet / RTT
- Decrease quickly when there is loss (we went too far!)
  - Multiplicative decrease: cwnd /= 2
[Figure: the resulting sawtooth of cwnd between a source and destination]
Basic TCP congestion control (no slow start)
- Only W packets may be outstanding
- Rule for adjusting W:
  - If an ACK is received: W <- W + 1/W
  - If a packet is lost: W <- W/2
Slow start
- Goal: quickly find the equilibrium sending rate
  - Quickly increase the sending rate until congestion is detected
  - Remember the last rate that worked, and don't overshoot it
- Algorithm:
  - On a new connection, or after a timeout, set cwnd = 1 full packet
  - For each segment acknowledged, increment cwnd by 1 packet
  - If a timeout occurs, divide cwnd by 2 and set ssthresh = cwnd
  - If cwnd >= ssthresh, exit slow start
- Why is it called slow? It's exponential, after all
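Per-RTT, the rules above mean cwnd doubles during slow start (each of the cwnd ACKs in a round trip adds one packet) and then grows by one packet per round trip after crossing ssthresh. A minimal sketch, tracking cwnd at RTT granularity:

```python
# Slow start followed by congestion avoidance, in units of full packets.
# cwnd doubles each RTT below ssthresh, then grows by 1 packet per RTT.

def slow_start_rtts(ssthresh, rtts):
    """Return cwnd at the end of each RTT, starting from cwnd = 1."""
    cwnd, history = 1, []
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd *= 2          # every ACKed segment adds 1 => doubling
        else:
            cwnd += 1          # congestion avoidance: +1 packet per RTT
        history.append(cwnd)
    return history

print(slow_start_rtts(ssthresh=8, rtts=6))
```

With ssthresh = 8, the window runs 2, 4, 8 exponentially, then creeps up linearly: the "slow" part is only relative to the old practice of blasting a full receiver window at once.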
Slow start growth example
[Figure: sender/receiver timeline in which cwnd doubles each round trip (1, 2, 4, 8) as each ACK releases two new segments, alongside a plot of cwnd vs. round-trip times]
Putting it together
[Figure: cwnd vs. round-trip times; slow start climbs exponentially toward ssthresh, congestion avoidance grows linearly, and a timeout drops cwnd back to 1 for another slow start]
Fast retransmit & recovery
- Fast retransmit
  - Timeouts are slow (1 second is the fastest timeout on many TCPs)
  - When a packet is lost, the receiver still ACKs the last in-order packet
  - Use 3 duplicate ACKs to indicate a loss; detect losses quickly
    - Why 3? When wouldn't this work?
- Fast recovery
  - Goal: avoid stalling after a loss
  - If there are still ACKs coming in, then there is no need for slow start
  - If a packet has made it through, we can send another one
  - Divide cwnd by 2 after fast retransmit
  - Increment cwnd by 1 full packet for each additional duplicate ACK
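The fast-retransmit trigger is just duplicate-ACK counting; a sketch over a stream of cumulative ACK numbers (the helper name and example stream are illustrative):

```python
# Fast-retransmit detection: fire a retransmission on the 3rd duplicate
# ACK instead of waiting for a retransmission timeout.

def find_fast_retransmits(acks, threshold=3):
    """Return positions in the ACK stream where fast retransmit fires."""
    retransmit_at = []
    last_ack, dup_count = None, 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmit_at.append(i)
        else:
            last_ack, dup_count = ack, 0
    return retransmit_at

# Packet 4 is lost: the receiver keeps ACKing 4 for every later arrival.
acks = [2, 3, 4, 4, 4, 4, 4]
print(find_fast_retransmits(acks))
```

Fewer than 3 trailing packets after a loss (e.g., at the end of a short flow) never generate enough duplicates, which is one answer to "when wouldn't this work?" above.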
Fast retransmit & recovery in action
[Figure: after three duplicate ACKs for packet 4, the sender fast-retransmits packet 4; each further duplicate ACK triggers fast recovery, growing cwnd by 1]
Fast retransmit in action
[Figure: cwnd vs. round-trip times with slow start, congestion avoidance, and fast retransmit; losses now cause a halving rather than a timeout-driven restart from cwnd = 1]
Fast recovery in action
[Figure: cwnd vs. round-trip times with slow start, congestion avoidance, fast retransmit, and fast recovery; cwnd halves on loss and resumes linear growth immediately]
Discussion: Delayed ACKs
- In request/response programs, we want to combine the ACK to a request with the response in the same packet
  - Wait up to 200 ms before ACKing
  - Must ACK every other packet (or packet burst)
  - Impact on slow start?
- Must not delay duplicate ACKs
  - Why? What is the interaction with the congestion control algorithms?
Discussion: Short Connections
- Short connection: only contains a few packets
- How do short connections and slow start interact?
  - What happens when a packet is lost during slow start?
  - What happens when the SYN is dropped?
- Bottom line: which packet gets dropped matters a lot (SYN vs. slow start vs. congestion avoidance)
- Do you think most flows are short or long? Is most traffic in short flows or long flows?
Discussion: What happens if hosts lie?
- Congestion control is voluntary; it relies on cooperation among endpoints
- Why not cheat?
  - It's easy for senders to cheat (just ignore losses)
    - Doesn't happen: servers have little incentive to cheat
  - Receivers have a great incentive to cheat (faster surfing)
    - Doesn't happen: it's not clear what a receiver can do
- Savage et al., 1999: oops, it turns out receivers can cheat
  - A handful of design vulnerabilities in TCP allow receivers to arbitrarily control the sending rate of remote servers
  - Source of the vulnerabilities: ACKs mean things that they don't prove
One such vulnerability: ACK splitting
- Rule: grow the window by one full-sized packet for each valid ACK received
- Attack: the receiver sends M ACKs for one packet (e.g., ACK 486, ACK 973, and ACK 1461 for Data 1:1461)
- Result: window growth is proportional to M
[Figure: within a single round-trip time, three split ACKs for one segment cause the sender to emit four new segments]
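The asymmetry exploited here is that the sender counts ACKs rather than bytes acknowledged. A toy comparison (the helper is illustrative; real slow start also caps growth at ssthresh):

```python
# Per-ACK window growth: a receiver that splits one segment's
# acknowledgment into M sub-segment ACKs gets M increments per segment.

def cwnd_after_acks(num_acks, cwnd=1):
    for _ in range(num_acks):
        cwnd += 1              # slow start: +1 full packet per valid ACK
    return cwnd

segments_acked = 1
honest = cwnd_after_acks(segments_acked)          # one ACK per segment
attacker = cwnd_after_acks(segments_acked * 3)    # 3 split ACKs instead
print(honest, attacker)
```

Counting bytes newly acknowledged instead of ACK arrivals (so M partial ACKs sum to the same single increment) is the natural defense, since byte counts are something the ACKs actually prove.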
A 10-line change to Linux TCP
[Figure: sequence number vs. time for a page fetch from CNN.com; the modified (cheating) client downloads the page several times faster than the normal client]
Final stuff to think about
- TCP is designed around the premise of cooperation
  - What happens to TCP if it competes with a UDP flow?
  - What if we divide cwnd by 3 instead of 2 after a loss?
- There are a bunch of magic numbers
  - Decrease by 2x, increase by 1/cwnd, 3 duplicate ACKs, g = 1/8, initial timeout = 3 seconds, etc.
- But overall it works really well!
Summary
- TCP probes the network for bandwidth, assuming that loss signals congestion
- The congestion window is managed with an additive increase / multiplicative decrease policy
  - It took fast retransmit and fast recovery to get there
- Slow start is used to avoid lengthy initial delays
  - Ramp up to near the target rate, then switch to AIMD
- Fast recovery is used to keep the network full while recovering from a loss
For Next Time
- QoS: read 6.5