EE 122: TCP, Connection Setup, Reliability
Ion Stoica
TAs: Junda Liu, DK Moon, David Zats
http://inst.eecs.berkeley.edu/~ee122/
(Materials with thanks to Vern Paxson, Jennifer Rexford, and colleagues at UC Berkeley)

Today's Lecture
How does TCP achieve correct operation?
- Reliability in the face of IP's best-effort service
- 3-way handshake to establish connections
- 3-way or 4-way handshake to terminate connections
- Retransmission to recover from loss
  - We'll only look at timeout-based retransmission today

TCP Service Model
Reliable, in-order, byte-stream delivery, and with good performance
Challenges: the network can
- drop packets
- delay packets
- deliver packets out of order
- replicate packets (weird, but it does sometimes happen)
- corrupt packets

TCP Support for Reliable Delivery
- Checksum
  - Used to detect corrupted data at the receiver, leading the receiver to drop the packet
- Sequence numbers and ACKs
  - Used to detect missing data, and for putting the data back in order
- Retransmission
  - Sender retransmits lost or corrupted data
  - Timeout based on estimates of round-trip time
  - Fast retransmit algorithm for rapid retransmission
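To make the checksum idea concrete, here is a minimal sketch of the 16-bit ones' complement Internet checksum. The function name `internet_checksum` is made up for illustration, and the sketch covers only the bytes it is given; real TCP also includes a pseudo-header with the IP addresses in the sum, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = b"hello world!"  # even length, so the checksum lines up as one word
checksum = internet_checksum(payload)

# Receiver side: summing the data together with the checksum yields 0
# when nothing was corrupted.
intact = payload + checksum.to_bytes(2, "big")
print(internet_checksum(intact) == 0)

# Flip one bit in transit: the check no longer comes out to 0,
# so the receiver would drop the packet.
corrupted = bytes([intact[0] ^ 0x01]) + intact[1:]
print(internet_checksum(corrupted) == 0)
```

The receiver never repairs the data; a failed check simply causes a drop, and recovery is left to retransmission.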
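TCP Header
[Figure: TCP header layout, annotated one field at a time on successive slides]
- Source port, destination port: these should be familiar.
- Sequence number: starting sequence number (byte offset) of the data carried in this segment.
- Acknowledgment: gives the sequence number just beyond the highest sequence number received in order. E.g., if the sender sends N in-order bytes starting at sequence number S, then the ACK for it will be S+N.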
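The S+N rule can be sketched as a small function that computes the acknowledgment number from the segments a receiver has so far. The name `cumulative_ack` and the (seq, length) representation are assumptions made for this example, and sequence-number wraparound is ignored.

```python
from typing import List, Tuple


def cumulative_ack(start_seq: int, received: List[Tuple[int, int]]) -> int:
    """Return the sequence number just beyond the highest byte received in order.

    `received` holds (seq, length) pairs for segments that have arrived,
    in any order. Wraparound of the 32-bit sequence space is ignored.
    """
    ack = start_seq
    for seq, length in sorted(received):
        if seq > ack:
            break  # gap: bytes between ack and seq are still missing
        ack = max(ack, seq + length)
    return ack


S, N = 1000, 500
print(cumulative_ack(S, [(S, N)]))                      # all N bytes arrived: S + N
print(cumulative_ack(S, [(S, N), (2000, 500)]))         # bytes 1500..1999 missing
print(cumulative_ack(S, [(2000, 500), (1500, 500), (S, N)]))  # gap filled
```

Note that an out-of-order segment does not advance the ACK until the gap before it has been filled.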
- HdrLen: number of 4-byte words in the TCP header; 5 = no options.
- Reserved ("Must Be Zero"): 6 bits reserved.
- Flags: we will get to these shortly.
- Advertised window: buffer space available for receiving data. Used for TCP's sliding window. Interpreted as an offset beyond the Acknowledgment field's value.
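As a rough illustration of these fields, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's `struct` module, using the RFC 793 layout with 6 reserved bits. The function name `parse_tcp_header`, the dictionary keys, and the sample values are all made up for this example; options are not parsed.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit HdrLen/reserved/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte portion of a TCP header (options ignored)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    hdr_len_words = off_flags >> 12  # top 4 bits: header length in 4-byte words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "hdr_len_bytes": hdr_len_words * 4,
        "flags": [name for name, bit in FLAG_BITS.items() if off_flags & bit],
        "window": window,
        # The window is an offset beyond the acknowledgment field's value.
        "window_right_edge": (ack + window) % 2**32,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# A hand-built SYN+ACK header: HdrLen = 5 words (no options), window = 8192.
sample = struct.pack("!HHIIHHHH", 50000, 80, 1000, 2000, (5 << 12) | 0x12, 8192, 0, 0)
print(parse_tcp_header(sample))
```

With HdrLen = 5 the header is 5 x 4 = 20 bytes, and the sender of the data may transmit bytes up to, but not including, `window_right_edge`.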
- Urgent pointer: used with the URG flag to indicate urgent data (not discussed further).

TCP "Stream of Bytes" Service
[Figure: Host A writes Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80 into the stream; Host B reads the same bytes in the same order]

...Provided Using TCP "Segments"
[Figure: the byte stream at Host A is carried to Host B in TCP segments]
Segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but times out, or
3. "Pushed" by application.

TCP Segment
[Figure: IP packet consisting of an IP Hdr followed by the TCP Hdr and TCP data (segment)]
- IP packet
  - No bigger than Maximum Transmission Unit (MTU)
  - E.g., up to 1,500 bytes on an Ethernet
- TCP packet
  - IP packet with a TCP header and data inside
  - TCP header is at least 20 bytes long (IP header = 20 bytes)
- TCP segment
  - No more than Maximum Segment Size (MSS) bytes
  - E.g., up to 1460 consecutive bytes from the stream
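The 1460-byte figure follows from the Ethernet MTU minus the two 20-byte headers. The sketch below shows that arithmetic and one simple way to cut a byte stream into segments; the function name `segment_stream` and the parameter `first_seq` (the sequence number assigned to the first data byte) are assumptions for this example, and the "times out" and "pushed" triggers are not modeled.

```python
from typing import List, Tuple

MTU = 1500         # bytes, Ethernet example from the slide
IP_HEADER = 20     # bytes, no IP options
TCP_HEADER = 20    # bytes, no TCP options
MSS = MTU - IP_HEADER - TCP_HEADER


def segment_stream(stream: bytes, first_seq: int, mss: int = MSS) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence number, data) pairs of at most mss bytes.

    Each segment's sequence number is the stream offset of its first byte,
    shifted by first_seq. Wraparound of the sequence space is ignored.
    """
    return [(first_seq + off, stream[off:off + mss])
            for off in range(0, len(stream), mss)]


print(MSS)
for seq, data in segment_stream(bytes(3000), first_seq=5000):
    print(seq, len(data))
```

A 3000-byte write yields two full segments and one partial one; only the full segments are sent because of rule 1 above.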
Sequence Numbers
[Figure: Host A sends TCP data with a TCP header to Host B; Host B replies with a TCP header carrying the ACK]
- ISN (initial sequence number)
- Sequence number = 1st byte
- ACK sequence number = next expected byte

Initial Sequence Number (ISN)
- Sequence number for the very first byte
  - E.g., why not just use ISN = 0?
- Practical issue
  - IP addresses and port #s uniquely identify a connection
  - Eventually, though, these port #s do get used again
  - ... there is a chance an old packet is still in flight
  - ... and might be associated with the new connection
- TCP requires (RFC 793) changing the ISN over time
  - Set from a 32-bit clock that ticks every 4 microseconds
  - ... which only wraps around once every 4.55 hours (the figure given in RFC 793; 2^32 ticks of 4 microseconds come to about 4.77 hours)
- To establish a connection, hosts exchange ISNs

Connection Establishment: TCP's Three-Way Handshake

Timing Diagram: 3-Way Handshaking
[Figure: timing diagram between client and server]
Client (initiator, active open): connect()
Server (passive open): listen(), then accept()
- Client -> Server: SYN, SeqNum = x
- Server -> Client: SYN + ACK, SeqNum = y, Ack = x + 1
- Client -> Server: ACK, Ack = y + 1
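The clock-driven ISN and the numbers exchanged in the handshake can be sketched in a few lines. Everything named here (`clock_isn`, `three_way_handshake`, the dictionary layout) is invented for illustration; it models only the sequence-number arithmetic, not sockets or timers.

```python
from typing import Dict, List

TICK_US = 4          # the ISN clock ticks every 4 microseconds
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide


def clock_isn(now_us: int) -> int:
    """ISN taken from a 32-bit counter that advances once per 4 microseconds."""
    return (now_us // TICK_US) % SEQ_SPACE


def three_way_handshake(x: int, y: int) -> List[Dict]:
    """Return the three handshake segments for client ISN x and server ISN y."""
    syn = {"from": "client", "flags": ["SYN"], "seq": x}
    syn_ack = {"from": "server", "flags": ["SYN", "ACK"],
               "seq": y, "ack": (x + 1) % SEQ_SPACE}
    ack = {"from": "client", "flags": ["ACK"], "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


# Time for the 32-bit clock to wrap, in hours.
wrap_hours = SEQ_SPACE * TICK_US / 1e6 / 3600
print(round(wrap_hours, 2))

for segment in three_way_handshake(x=clock_isn(123_456_789), y=clock_isn(987_654_321)):
    print(segment)
```

Each side acknowledges the other's ISN plus one because the SYN itself occupies one position in the sequence space.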
TCP Header: Flags
[Figure: TCP header layout showing HdrLen, 0, Flags, Advertised window, Checksum, Urgent pointer]
Flags: SYN, ACK, FIN, RST, PSH, URG
See /usr/include/netinet/tcp.h on Unix systems

What if the SYN Gets Lost?
- Suppose the SYN packet gets lost
  - Packet is lost inside the network, or:
  - Server discards the packet (e.g., listen queue is full)
- Eventually, no SYN-ACK arrives
  - Sender sets a timer and waits for the SYN-ACK
  - ... and retransmits the SYN if needed
- How should the TCP sender set the timer?
  - Sender has no idea how far away the receiver is
  - Hard to guess a reasonable length of time to wait
  - SHOULD (RFCs 1122 & 2988) use default of 3 seconds
  - Other implementations instead use 6 seconds

Tearing Down (Close) the Connection

Close Connection (Two-Army Problem)
- Goal: both sides agree to close the connection
- Two-army problem:
  - Two blue armies need to simultaneously attack the white army to win; otherwise they will be defeated.
  - The blue armies can communicate only across the area controlled by the white army, which can intercept the messengers.
- What is the solution?
Normal Termination, One Side (A)
[Figure: timing diagram between A and B showing the SYN and FIN exchanges and their ACKs over time]
- Finish (FIN) to close and receive remaining bytes
  - FIN occupies one byte in the sequence space
- Other host ACKs the byte to confirm
- Closes A's side of the connection, but not B's
  - Until B likewise sends a FIN
  - Which A then ACKs
- Connection now half-closed
- TIME_WAIT: avoid reincarnation
  - Can retransmit FIN ACK if lost
- Connection now closed

Normal Termination, Both Together
[Figure: timing diagram between A and B showing the SYN and FIN exchanges and their ACKs over time]
- Same as before, but B sets FIN with their ACK of A's FIN
- TIME_WAIT: avoid reincarnation
  - Can retransmit FIN ACK if lost
- Connection now closed

Abrupt Termination
[Figure: timing diagram between A and B showing the SYN exchange followed by RSTs]
- A sends a RESET (RST) to B
  - E.g., because app. process on A crashed
- That's it
  - B does not ACK the RST
  - Thus, RST is not delivered reliably
  - And: any data in flight is lost
  - But: if B sends anything more, will elicit another RST

Reliability: TCP Retransmission
Reasons for Retransmission
[Figure: three timing diagrams: packet lost; ACK lost (DUPLICATE PACKET); early timeout (DUPLICATE PACKETS)]

How Long Should Sender Wait?
- Sender sets a timeout to wait for an ACK
  - Too short: wasted retransmissions
  - Too long: excessive delays when packet lost
- TCP sets retransmission timeout (RTO) as function of RTT
  - Expect ACK to arrive an RTT after data sent
  - ... plus slop to allow for variations (e.g., queuing, MAC)
- But: how does the sender know the RTT?
- And: what's a good estimate for slop?

RTT Estimation
[Figure: plot of SampleRTT and EstimatedRTT against time]
Use exponential averaging:
$\text{SampleRTT} = \text{AckRcvdTime} - \text{SendTime}$
$\text{EstimatedRTT} = \alpha \cdot \text{EstimatedRTT} + (1 - \alpha) \cdot \text{SampleRTT}$
$\alpha = 7/8$ (for one measurement per flight)

Jacobson/Karels Algorithm
- Compute slop in terms of observed variability
  - One solution: use standard deviation (requires expensive square root computation)
  - Use mean deviation instead
$\text{Difference} = \text{SampleRTT} - \text{EstimatedRTT}$
$\text{Deviation} = \text{Deviation} + \delta \cdot (|\text{Difference}| - \text{Deviation})$
$\text{RTO} = \mu \cdot \text{EstimatedRTT} + \phi \cdot \text{Deviation}$
$\delta = 1/4$ (again, for one measurement per flight), $\mu = 1$, $\phi = 4$
- Implementations often use a coarse-grained (500 msec) timer, so resulting value is large
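The two update rules can be combined into a small estimator. This is a sketch under assumptions the slides do not spell out: the class name `RttEstimator` is invented, the first sample seeds EstimatedRTT with Deviation starting at 0, Difference is computed against the estimate from before the update, and the coarse 500 msec timer granularity is not modeled.

```python
class RttEstimator:
    """Exponential averaging of RTT plus Jacobson/Karels mean deviation."""

    ALPHA = 7 / 8  # weight on the old EstimatedRTT
    DELTA = 1 / 4  # gain for the deviation estimate
    MU = 1
    PHI = 4

    def __init__(self, first_sample: float):
        # Assumption: seed the estimate with the first sample, no deviation yet.
        self.estimated_rtt = first_sample
        self.deviation = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (seconds) and return the new RTO."""
        # Assumption: Difference uses the estimate from before this update.
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt = (self.ALPHA * self.estimated_rtt
                              + (1 - self.ALPHA) * sample_rtt)
        self.deviation += self.DELTA * (abs(difference) - self.deviation)
        return self.rto()

    def rto(self) -> float:
        return self.MU * self.estimated_rtt + self.PHI * self.deviation


est = RttEstimator(first_sample=0.100)
for sample in [0.200, 0.100, 0.100, 0.100]:
    rto = est.update(sample)
    print(f"sample={sample:.3f}  est={est.estimated_rtt:.4f}  "
          f"dev={est.deviation:.4f}  rto={rto:.4f}")
```

A single outlier sample raises the RTO sharply through the deviation term, and the RTO then decays as steady samples come in, which is the intended behavior of the "slop" term.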
Problem: Ambiguous Measurement
How to differentiate between the real ACK, and the ACK of the retransmitted packet?
[Figure: two sender/receiver timing diagrams, each with an original transmission, a retransmission, and an ACK, and the question "SampleRTT?"]

Karn/Partridge Algorithm
- Measure SampleRTT only for original transmissions
  - Once a segment has been retransmitted, do not use it for any further measurements
- Also, employ exponential backoff
  - Every time the RTO timer expires, set $\text{RTO} \leftarrow 2 \cdot \text{RTO}$
  - (Up to maximum 60 sec)
  - Every time a new measurement comes in (= successful original transmission), collapse RTO back to the computed value

Summary
- Reliable, in-order, byte-stream delivery
  - Sequence numbers and ACKs
- 3-way handshake to establish
- 3-way or 4-way handshake to terminate
- Timer-based retransmission
- What's missing? Performance
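Returning to the Karn/Partridge rules, the sketch below shows the two behaviors side by side: samples from retransmitted segments are discarded, and the RTO doubles on each timeout up to 60 seconds until a clean measurement arrives. The class name `KarnPartridgeTimer` and its method names are invented for this example, and as a simplification the "computed value" is held fixed rather than being recomputed from each new sample.

```python
from typing import List, Optional


class KarnPartridgeTimer:
    """Exponential backoff plus sample filtering for retransmitted segments."""

    def __init__(self, computed_rto: float, max_rto: float = 60.0):
        self.computed_rto = computed_rto  # value an RTT estimator would supply
        self.rto = computed_rto           # value currently in force
        self.max_rto = max_rto
        self.samples: List[float] = []    # SampleRTT values accepted so far

    def on_timeout(self) -> float:
        """RTO timer expired: double the RTO, capped at max_rto."""
        self.rto = min(2 * self.rto, self.max_rto)
        return self.rto

    def on_ack(self, send_time: float, ack_time: float,
               was_retransmitted: bool) -> Optional[float]:
        """Handle an ACK; return the SampleRTT if it is usable, else None."""
        if was_retransmitted:
            # Ambiguous: the ACK could be for either transmission.
            return None
        sample = ack_time - send_time
        self.samples.append(sample)
        self.rto = self.computed_rto  # collapse back to the computed value
        return sample


timer = KarnPartridgeTimer(computed_rto=1.0)
print(timer.on_timeout(), timer.on_timeout())             # backoff after two expiries
print(timer.on_ack(0.0, 0.5, was_retransmitted=True))     # ignored, RTO stays backed off
print(timer.on_ack(10.0, 10.3, was_retransmitted=False))  # clean sample, RTO collapses
print(timer.rto)
```

Keeping the backed-off RTO until a clean sample arrives avoids trusting a timer value that may have been too short in the first place.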