CPSC 360 Network Programming The Transport Layer Reliable data delivery & flow control in TCP Michele Weigle Department of Computer Science Clemson University mweigle@cs.clemson.edu http://www.cs.clemson.edu/~mweigle/courses/cpsc360 1 Transport Layer Protocols & Services Outline! Fundamental transport layer services» Multiplexing/Demultiplexing» Error detection» Reliable data delivery» Pipelining» Flow control» Congestion control! Internet transport protocols» UDP» TCP... Logical end-toend transport... application transport network link physical...... network link physical network link physical application transport network link physical 2
TCP Overview TCP is! Point-to-point, full-duplex» Bi-directional data flow within a connection! Reliable, in-order byte steam» No message boundaries! Connection-oriented» Handshaking initializes sender and receiver state before data exchange! Pipelined» Congestion and flow control determine window size» Each endpoint has two buffers: a send and receive buffer Application writes data TCP send buffer door segments Application reads data TCP receive buffer RFCs: 793, 1122, 1323, 2018, 2581! Congestion controlled» Internet would cease to function without this!! Flow controlled» Sender and receiver have synchronized windows to ensure receiver is not overwhelmed 3 TCP Segment Structure Header and payload format Urgent data ( URG ) ACK number is valid Push data now ( PSH ) Connection establishment and teardown commands ( RST, SYN, FIN ) Same checksum as in UDP 32 bits source port # dest port # sequence number head len acknowledgement number not used UAP R S F checksum rcvr window size ptr urgent data Options (variable length) Application data (variable length) s of data sent/received are numbered (not segments) Number of bytes the receiver is willing to accept payload! MSS (Maximum Segment Size) 4
TCP Sequence Numbers and ACKs Telnet example! Sequence numbers:» stream index of the first byte in the segment s payload! ACKs:» Sequence number of next byte expected from the other side» ACKs are cumulative! How does receiver handle out-of-order segments?» TCP spec doesn t say, it s up to the implementor User types C time host ACKs receipt of echoed C Host A Host B Seq=42, ACK=79, data =!C" Seq=79, ACK=43, data = C Seq=43, ACK=80, data = host ACKs receipt of C, echoes back C 5 Reliable Data Transfer in TCP Sender s state machine TCP_send(data) create segment udt_send(segment nextseqnum) start_timer receive ACK for segment y update timers advance the window to y retransmit y if third duplicate ACK timeout for segment y udt_send(segment y) start timer wait for event! TCP retransmits segments if:» An expected ACK times out» 3 duplicate ACKs for a segment are received 6
Reliable Data Transfer in TCP Simplified sender s state machine Sliding window Sent and ACK ed Sent and unack ed Unsent and eligible Unsent and ineligible 1st sequence send base next sequence number sendbase = initial_sequence number nextseqnum = initial_sequence number loop (forever) { switch(event) event: data received from application above event: timer timeout for segment with sequence number y event: ACK received with value y } create TCP segment with sequence number nextseqnum start timer for segment with nextseqnum pass segment to IP nextseqnum = nextseqnum + length(data) retransmit segment with sequence number y ( segment y ) compute new timeout interval for segment y restart timer for segment y 7 Reliable Data Transfer in TCP Simplified sender s state machine Sliding window Sent and ACK ed 1st sequence send base Sent and unack ed Unsent and eligible Unsent and ineligible next sequence number if (y > sendbase) { /* Cumulative ACK of all data up to y */ cancel timers for segments with sequence numbers # y sendbase = initial_sequence number := y sendbase nextseqnum = initial_sequence number } else if ( y == sendbase) { /* A duplicate ACK */ loop (forever) { increment number of duplicate ACKs received switch(event) if (number of duplicate ACKS received == 3) event: data received from application { /* Fast retransmit */ above resend segment with sequence number sendbase event: timer timeout for segment with restart timer for segment sendbase sequence number y } event: ACK received with}value y } 8
TCP ACK generation rules [RFC 1122, RFC 2581] Event In-order segment arrival, no gaps, all previous data already ACKed In-order segment arrival, no gaps, one delayed ACK pending Out-of-order segment arrival (higher than expected sequence number) Gap detected Arrival of segment that partially or completely fills gap TCP Receiver action Delayed ACK. Wait 200 ms (up to 500 ms allowed) for next segment. If no segment received, send ACK Immediately send single cumulative ACK Send duplicate ACK, indicating sequence number of next expected byte Immediate ACK if segment starts at lower end of gap 9 Reliable Data Transfer in TCP Retransmission examples Host A Host B Host A Host B timeout X ACK=100 Seq=100 timeout Seq=92 timeout Seq=100, 20 data bytes ACK=100 ACK=120 ACK=100 time ACK=120! Lost ACK scenario! Premature timeout 10
Reliable Data Transfer in TCP Retransmission examples Host A Host B Host A Host B timeout Seq=100, 20 data bytes X ACK=100 ACK=120 Seq=100 timeout Seq=92 timeout Seq=100, 20 data bytes ACK=100 ACK=120 time ACK=120! Cumulative ACKs potentially avoid retransmissions! Premature timeout 11 Reliable Data Transfer in TCP Setting the ACK timer! How large should the ACK timeout value be?» Too short: Premature timeouts result in unnecessary retransmissions» Too long: Slow reaction to loss results in poor performance because the sender s windows stops advancing! Timer should be longer than the RTT, but how do we estimate RTT?» Measure the time from segment transmission until receipt of ACK ( SampleRTT ) " Ignore retransmissions " Measure only one segment s RTT at a time» SampleRTT will vary, so we compute an average RTT based on several recent RTT samples 12
Reliable Data Transfer in TCP Estimating round-trip-time EstimatedRTT = (1-!)*EstimatedRTT +! *SampleRTT Timeout = EstimatedRTT + 4*Deviation Deviation = (1-")*Deviation + " * SampleRTT-EstimatedRTT! The estimated RTT is an exponential weighted moving average (EWMA)» Computes a smooth average» Influence of a given sample decreases exponentially fast» Typical value of! is 0.125! Timeout is EstimtedRTT plus safety margin! Large variation in EstimatedRTT results in a larger safety margin 13 Reliable Data Transfer in TCP Estimating round-trip-time 14
TCP Flow Control Window control sequence 1 st Receiver!s buffer (RcvBuffer) Delivered to application Received, ACKed, not delivered Acceptable but not received Not expected, not received! Flow control is the problem of ensuring the receiver is not overwhelmed» The receiver can become overwhelmed if the application reads too slow or the sender transmits too fast Receiver!s window (RcvWindow)! The receiver s window represents its remaining buffer capacity! The window advances as the application reads received data 15 TCP Flow Control Window control sequence 1 st Receiver!s buffer (RcvBuffer) Delivered to application Received, ACKed, not delivered Acceptable but not received Not expected, not received! The receiver explicitly informs the sender of the amount of free buffer space in RcvBuffer» RcvWindow field in TCP segment! The sender keeps the amount of transmitted, unacked data less than most recently received RcvWindow source port dest port sequence number ack number not receiver used flags window size checksum urgent data head len Receiver!s window TCP options Application data 16
TCP Flow Control Window control sequence 1 st RcvBuffer Delivered to application Received, ACKed, not delivered Acceptable but not received Not expected, not received Read Rcvd RcvWindow! The goal is to ensure: Rcvd - Read! RcvBuffer! Sender is sent: RcvWindow = RcvBuffer - (Rcvd-Read) 17 TCP Flow Control Window control sequence 1 st Sent and ACKed Sent and not ACKed Sender!s buffer Eligible to be sent Ineligible Application blocked from sending ACKed Sent RcvWindow! The sender ensures: Sent - ACKed! RcvWindow 18
TCP Connection Management The three-way handshake Client client TCP 3-way handshake bytes Server welcoming connection! TCP endpoints establish a connection before exchanging data segments» client: connection initiator Socket clientsocket = new Socket("hostname", "port number")» server: contacted by client Socket connectionsocket = welcomesocket.accept() 19 TCP Connection Management The three-way handshake! Client sends SYN segment to server» The SYN specifies the client s initial sequence number! Server receives SYN, replies with SYN+ACK segment» ACKs received SYN» Allocates buffers» Specifies server s initial sequence number! Third segment may be an ACK only or an ACK+data client Connection request (SYN=1, ACK=0 seq=client_isn) server Connection granted (SYN=1, ACK=1 seq=server_isn, ack=client_ins+1) ACK (SYN=0, ACK=1 seq=client_isn+1, ack=server_isn+1) ACK (SYN=0, ACK=1 seq=client_isn+1, ack=server_isn+1, data= ) 20
TCP Connection Management Closing a connection! Client sends FIN segment to server! Server receives FIN, replies with ACK» Server closes connection, sends FIN! Client receives FIN, replies with ACK! Client enters timed wait state» Client will ACK any received FIN close() Timed Wait client server FIN ACK FIN ACK Connection closed close() 21 TCP Connection Management Client/Server lifecycles receive FIN send ACK Wait 30 seconds Closed connect() send SYN receive ACK send nothing Closed Server creates listen Timed Wait SYN Sent ACK Listen receive FIN send ACK FIN Wait 2 receive SYN+ACK send ACK Established close() send FIN Close Wait receive SYN send SYN & ACK SYN Received receive ACK send nothing FIN Wait 1 close() send FIN receive FIN send ACK Established receive ACK send nothing! TCP client lifecycle! TCP server lifecycle 22
hufflepuff:[~]% sudo tcpdump host hufflepuff and host www.cs.clemson.edu and tcp tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes TCP Traces 11:08:22.816917 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: S 1990840413:1990840413(0) win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 1273416398 0> 11:08:22.817253 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: S 150440752:150440752(0) ack 1990840414 win 49232 <nop,nop,timestamp 1628316 1273416398,mss 1460,nop,wscale 0> 11:08:22.817331 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http:. ack 1 win 65535 11:08:22.821890 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 1:506(505) ack 1 win 65535 11:08:22.822224 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071:. ack 506 win 49232 11:08:22.828103 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071:. 1:1449(1448) ack 506 win 49232 11:08:22.828112 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: P 1449:2208(759) ack 506 win 49232 11:08:22.880198 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 506:1123(617) ack 2208 win 65535 11:08:22.880580 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071:. ack 1123 win 49232 11:08:22.883328 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: P 2208:2388(180) ack 1123 win 49232 11:08:22.884907 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: F 1123:1123(0) ack 2388 win 65535 11:08:22.885087 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071:. ack 1124 win 49232 11:08:22.836032 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: F 2388:2388(0) ack 1124 win 49232 11:08:22.836067 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http:. ack 2389 win 65535 ^C 14 packets captured 127 packets received by filter 0 packets dropped by kernel For more info: % man tcpdump 23