TCP: Reliable, In-Order Delivery
EE 122: Intro to Communication Networks
Fall 2007 (WF 4-5:30 in Cory 277)
Vern Paxson
TAs: Lisa Fowler, Daniel Killebrew & Jorge Ortiz
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, and colleagues at Princeton and UC Berkeley

Announcements
- Project #1, phase 2 due next Monday @ 11PM
- No lecture next Wednesday, Oct 31
- Regular office hours next Friday but not Wednesday
  o Available via appointment as usual, send email
- In case you're interested, I'll be giving a talk next week, "Experiences With Countering Internet Attacks", 1-2PM in the Wozniak lounge (Soda)

Today's Lecture
- TCP Service Model
- How does TCP achieve correct operation?
  o Reliability in the face of IP's meager "best effort" service
  o 3-way handshake to establish connections
  o 3-way or 4-way handshake to terminate connections
  o Retransmission to recover from loss
- We'll only look at timeout-based retransmission today

Reliable, in-order, byte-stream delivery ... and with good performance
Challenges - the network can:
- drop packets
  o Even perhaps a large number
- delay packets
  o Even perhaps for many seconds
- deliver packets out-of-order
  o Follows from possibility of arbitrary delay
- replicate packets
  o Weird, but it does sometimes happen
- corrupt packets
(What's missing?)

TCP Support for Reliable Delivery
- Checksum
  o Used to detect corrupted data at the receiver
  o ... leading the receiver to drop the packet
- Sequence numbers
  o Used to detect missing data
  o ... and for putting the data back in order
- Retransmission
  o Sender retransmits lost or corrupted data, based on estimates of round-trip time
  o Fast retransmit algorithm for rapid retransmission

TCP Header (field-by-field annotations from the header figure)
- Source port, Destination port: these should be familiar
- Sequence number: starting sequence number (byte offset) of data carried in this segment
- Acknowledgment: gives seq # just beyond highest seq. received in order. If sender sends N in-order bytes starting at seq S, then the ack for it will be S+N.
- HdrLen: number of 4-byte words in TCP header; 5 = no options
- 0: "Must Be Zero", 6 bits reserved
- Flags: we will get to these shortly
- Advertised window: buffer space available for receiving data. Used for TCP's sliding window. Interpreted as offset beyond Acknowledgment field's value.
- Urgent pointer: used with URG flag to indicate urgent data (not discussed further)

TCP "Stream of Bytes" Service
[Figure: Host A sends Byte 0, Byte 1, Byte 2, ..., Byte 80; Host B receives Byte 0, Byte 1, Byte 2, ..., Byte 80]

... Provided Using TCP "Segments"
[Figure: the same byte stream between Host A and Host B, carried in TCP segments]
TCP segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but times out, or
3. "Pushed" by application.

TCP Segment
[Figure: IP packet = IP Hdr + IP Data; the IP Data is TCP Hdr + TCP Data (segment)]
- IP packet
  o No bigger than Maximum Transmission Unit (MTU)
  o E.g., up to 1,500 bytes on an Ethernet
- TCP packet
  o IP packet with a TCP header and data inside
  o TCP header at least 20 bytes long
- TCP segment
  o No more than Maximum Segment Size (MSS) bytes
  o E.g., up to 1460 consecutive bytes from the stream

Sequence Numbers
[Figure: Host A's byte stream starts at the ISN (initial sequence number); Host A sends TCP HDR + TCP Data with sequence number = 1st byte; Host B replies with TCP HDR + TCP Data carrying ACK sequence number = next expected byte]
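
The segment-size and sequence-number rules above can be illustrated with a small sketch. It is not taken from the lecture: the function name `segment_stream` and the parameter `first_seq` (the sequence number assigned to the first data byte) are hypothetical, and the 1460-byte MSS is simply the Ethernet example from the slides. Each segment's expected ACK follows the S+N rule from the header annotations.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def segment_stream(data: bytes, first_seq: int, mss: int = 1460):
    """Split a byte stream into MSS-sized segments.

    Returns a list of (seq, length, expected_ack) tuples, where seq is the
    sequence number of the segment's first byte and expected_ack is the
    "next expected byte" the receiver would acknowledge (S + N).
    """
    segments = []
    for offset in range(0, len(data), mss):
        payload = data[offset:offset + mss]
        seq = (first_seq + offset) % SEQ_SPACE
        expected_ack = (seq + len(payload)) % SEQ_SPACE
        segments.append((seq, len(payload), expected_ack))
    return segments


if __name__ == "__main__":
    # 3000 bytes with a hypothetical starting sequence number of 1000.
    for seq, length, ack in segment_stream(bytes(3000), first_seq=1000):
        print(f"seq={seq} len={length} expected ack={ack}")
```

With 3000 bytes of data, the last segment is only 80 bytes long; in the terms of the slides, it would go out either on a timeout or when "pushed" by the application.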

Initial Sequence Number (ISN)
- Sequence number for the very first byte
  o E.g., why not just use ISN = 0?
- Practical issue
  o IP addresses and port #s uniquely identify a connection
  o Eventually, though, these port #s do get used again
  o ... so there is a chance an old packet is still in flight
  o ... and might be associated with the new connection
- TCP therefore requires (RFC 793) changing the ISN over time
  o Set from a 32-bit clock that ticks every 4 microseconds
  o ... which only wraps around once every 4.55 hours (the figure given in RFC 793; 2^32 ticks of 4 microseconds come to about 4.77 hours)
- To establish a connection, hosts exchange ISNs

Connection Establishment: TCP's Three-Way Handshake

Establishing a TCP Connection
- Each host tells its ISN to the other host.
- Three-way handshake to establish connection
  o Host A sends a SYN (open; "synchronize sequence numbers") to host B
  o Host B returns a SYN acknowledgment (SYN ACK)
  o Host A sends an ACK to acknowledge the SYN ACK

TCP Header
- Flags: SYN, FIN, RST, PSH, URG, ACK
- See /usr/include/netinet/tcp.h on Unix systems

Step 1: A's Initial SYN Packet
- Ports: A's port, B's port
- Sequence number: A's Initial Sequence Number
- Acknowledgment: (irrelevant since ACK not set)
- HdrLen: 5 = 20 bytes; Flags: SYN
A tells B it wants to open a connection ...

Step 2: B's SYN-ACK Packet
- Ports: B's port, A's port
- Sequence number: B's Initial Sequence Number
- Acknowledgment: ACK = A's ISN plus 1
- HdrLen: 20 bytes; Flags: SYN, ACK
B tells A it accepts, and is ready to hear the next byte ...
... upon receiving this packet, A can start sending data

Step 3: A's ACK of the SYN-ACK
- Ports: A's port, B's port
- Sequence number: A's Initial Sequence Number
- Acknowledgment: B's ISN plus 1
- HdrLen: 20 bytes; Flags: ACK
A tells B it's likewise okay to start sending ...
... upon receiving this packet, B can start sending data

Timing Diagram: 3-Way Handshaking
- A: Active Open, Client (initiator), calls connect()
- B: Passive Open, Server, calls listen()
- A to B: SYN, SeqNum = x
- B to A: SYN + ACK, SeqNum = y, Ack = x + 1
- A to B: ACK, Ack = y + 1
- B: accept() returns

What if the SYN Packet Gets Lost?
- Suppose the SYN packet gets lost
  o Packet is lost inside the network, or:
  o Server discards the packet (e.g., listen queue is full)
- Eventually, no SYN-ACK arrives
  o Sender sets a timer and waits for the SYN-ACK
  o ... and retransmits the SYN if needed
- How should the TCP sender set the timer?
  o Sender has no idea how far away the receiver is
  o Hard to guess a reasonable length of time to wait
  o SHOULD (RFCs 1122 & 2988) use default of 3 seconds
  o Other implementations instead use 6 seconds

SYN Loss and Web Downloads
- User clicks on a hypertext link
  o Browser creates a socket and does a "connect"
  o The "connect" triggers the OS to transmit a SYN
- If the SYN is lost ...
  o 3-6 seconds of delay: can be very long
  o User may become impatient
  o ... and click the hyperlink again, or click "reload"
- User triggers an "abort" of the "connect"
  o Browser creates a new socket and another "connect"
  o Essentially, forces a faster send of a new SYN packet!
  o Sometimes very effective, and the page comes quickly

5 Minute Break
Questions Before We Proceed?

Tearing Down the Connection
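
As a recap of connection establishment before the teardown material, the sequence and acknowledgment numbers in the three-way handshake can be sketched as below. This is an illustrative sketch only: `three_way_handshake` and the dictionary layout are hypothetical names, not an API from the lecture or from any TCP implementation. The third packet's sequence number is left unset because the timing diagram lists only its Ack value.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers wrap modulo 2^32


def three_way_handshake(isn_a: int, isn_b: int):
    """Return the three handshake packets for initiator A and server B.

    isn_a and isn_b play the roles of x and y in the timing diagram.
    """
    syn = {"sender": "A", "flags": {"SYN"}, "seq": isn_a, "ack": None}
    syn_ack = {
        "sender": "B",
        "flags": {"SYN", "ACK"},
        "seq": isn_b,
        "ack": (isn_a + 1) % SEQ_SPACE,  # Ack = x + 1
    }
    ack = {
        "sender": "A",
        "flags": {"ACK"},
        "seq": None,  # not shown in the timing diagram
        "ack": (isn_b + 1) % SEQ_SPACE,  # Ack = y + 1
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for packet in three_way_handshake(isn_a=100, isn_b=5000):
        print(packet["sender"], sorted(packet["flags"]), packet["seq"], packet["ack"])
```

The modulo keeps the acknowledgment correct even when an ISN sits at the very top of the 32-bit sequence space.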

Normal Termination, One Side At A Time
[Figure: timeline between A and B: FIN, ACK ("Connection now half-closed"), FIN, ACK ("Connection now closed")]
- Finish (FIN) to close and receive remaining bytes
  o FIN occupies one octet in the sequence space
- Other host ack's the octet to confirm
- Closes A's side of the connection, but not B's
  o Until B likewise sends a FIN
  o Which A then acks
- TIME_WAIT: avoid reincarnation
  o Can retransmit the ACK of the FIN if lost

Normal Termination, Both Together
[Figure: timeline between A and B: FIN, FIN + ACK, ACK ("Connection now closed")]
- Same as before, but B sets FIN with their ack of A's FIN
- TIME_WAIT: avoid reincarnation
  o Can retransmit the ACK of the FIN if lost

Sending/Receiving the FIN Packet
- Sending a FIN: close()
  o Process has finished sending data via the socket
  o Process calls "close()" to close the socket
  o Once TCP has sent all of the outstanding bytes ...
  o ... then TCP sends a FIN
    - Even if bytes not yet ack'd
    - Because FIN has seqno beyond all the bytes
    - ... and thus won't be ack'd until all bytes are delivered
- Receiving a FIN: EOF
  o Process is reading data from the socket
  o Eventually, the attempt to read returns an EOF
  o All bytes prior to sender calling close() have been delivered

Abrupt Termination
[Figure: timeline between A and B showing a connection ended by RST]
- A sends a RESET (RST) to B
  o E.g., because app. process on A crashed
- That's it
  o B does not ack the RST
  o Thus, RST is not delivered reliably
  o And: any data in flight is lost
  o But: if B sends anything more, it will elicit another RST

Reliability: TCP Retransmission

Reasons for Retransmission
[Figure: three timelines: packet lost; ACK lost (DUPLICATE PACKET); early timeout (DUPLICATE PACKETS)]

How Long Should Sender Wait?
- Sender sets a timeout to wait for an ACK
  o Too short: wasted retransmissions
  o Too long: excessive delays when packet lost
- TCP sets retransmission timeout (RTO) as function of RTT
  o Expect ACK to arrive an RTT after data sent
  o ... plus slop to allow for variations (e.g., queuing, MAC)
- But: how does the sender know the RTT?
- And: what's a good estimate for slop?

RTT Estimation
- Use exponential averaging:
  $\mathrm{SampleRTT} = \mathrm{AckRcvdTime} - \mathrm{SendTime}$
  $\mathrm{EstimatedRTT} = \alpha \times \mathrm{EstimatedRTT} + (1 - \alpha) \times \mathrm{SampleRTT}$
  $\alpha = 7/8$ (for one measurement per flight)
[Figure: EstimatedRTT and SampleRTT plotted against Time]

Jacobson/Karels Algorithm
- Compute slop in terms of observed variability
  o One solution: use standard deviation (requires expensive square root computation)
  o Use mean deviation instead
  $\mathrm{Difference} = \mathrm{SampleRTT} - \mathrm{EstimatedRTT}$
  $\mathrm{Deviation} = \mathrm{Deviation} + \delta \times (|\mathrm{Difference}| - \mathrm{Deviation})$
  $\mathrm{RTO} = \mu \times \mathrm{EstimatedRTT} + \phi \times \mathrm{Deviation}$
  $\delta = 1/4$ (again, for one measurement per flight), $\mu = 1$, $\phi = 4$
- Implementations often use a coarse-grained (500 msec) timer, so the resulting value is large

Problem: Ambiguous Measurement
- How to differentiate between the real ACK, and the ACK of the retransmitted packet?
[Figure: two Sender/Receiver timelines, each with an Original Transmission, a Retransmission, and an ACK; "SampleRTT?" marks the two possible intervals]

Karn/Partridge Algorithm
- Measure SampleRTT only for original transmissions
  o Once a segment has been retransmitted, do not use it for any further measurements
- Also, employ exponential backoff
  o Every time RTO timer expires, set $\mathrm{RTO} \leftarrow 2 \times \mathrm{RTO}$
  o (Up to maximum 60 sec)
  o Every time new measurement comes in (= successful original transmission), collapse RTO back to computed value

Summary
- Reliable, in-order, byte-stream delivery
  o 3-way handshake to establish
  o 3-way or 4-way handshake to terminate
  o Timer-based retransmission
- What's missing? Performance
- Next lecture
  o Congestion control (K&R 3.6, 3.7)
- Reminder: next lecture is Fri Nov 2, not Wed Oct 31
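
The retransmission-timer rules from this lecture (exponential averaging, Jacobson/Karels mean deviation, Karn/Partridge sample filtering and exponential backoff) can be combined in one small sketch. This is a hedged illustration, not the lecture's or any operating system's implementation: the class name `RtoEstimator` and its method names are hypothetical. Seeding the estimate from the first sample with zero deviation, and computing Difference against the estimate before it is updated, are assumptions the slides do not settle. All times are in seconds.

```python
class RtoEstimator:
    """Sketch of TCP RTO estimation using the constants from the slides."""

    ALPHA = 7 / 8    # weight on the old EstimatedRTT
    DELTA = 1 / 4    # gain for the mean deviation
    MU = 1           # multiplier on EstimatedRTT
    PHI = 4          # multiplier on Deviation
    MAX_RTO = 60.0   # backoff ceiling in seconds

    def __init__(self, first_sample: float):
        # Assumption: initialise from the first measurement, zero deviation.
        self.estimated_rtt = first_sample
        self.deviation = 0.0
        self.rto = self._computed_rto()

    def _computed_rto(self) -> float:
        return self.MU * self.estimated_rtt + self.PHI * self.deviation

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        """Process an ACK; sample_rtt = AckRcvdTime - SendTime."""
        if was_retransmitted:
            # Karn/Partridge: the measurement is ambiguous, so ignore it.
            return
        # Assumption: Difference uses the estimate before this update.
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt = (self.ALPHA * self.estimated_rtt
                              + (1 - self.ALPHA) * sample_rtt)
        self.deviation += self.DELTA * (abs(difference) - self.deviation)
        # A fresh measurement collapses any backoff to the computed value.
        self.rto = self._computed_rto()

    def on_timeout(self) -> None:
        """Exponential backoff: double the RTO, up to the ceiling."""
        self.rto = min(2 * self.rto, self.MAX_RTO)


if __name__ == "__main__":
    est = RtoEstimator(first_sample=0.100)
    est.on_ack(0.200, was_retransmitted=False)
    print(f"RTO after one new sample: {est.rto:.4f} s")
    est.on_timeout()
    print(f"RTO after one timeout:    {est.rto:.4f} s")
```

Keeping the backed-off RTO until an unambiguous sample arrives is what stops a sender from repeatedly timing out too early after the path's RTT has genuinely increased.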