CPSC 852 Internetworking
The Transport Layer
Reliable data delivery & flow control in TCP

Michele Weigle
Department of Computer Science
Clemson University
mweigle@cs.clemson.edu
http://www.cs.clemson.edu/~mweigle/courses/cpsc852

Transport Layer Protocols & Services
Outline

Fundamental transport layer services
  » Multiplexing/Demultiplexing
  » Error detection
  » Reliable data delivery
  » Pipelining
  » Flow control
  » Congestion control
Internet transport protocols
  » UDP
  » TCP

[Figure: logical end-to-end transport. The end hosts run the full stack (application, transport, network, link, physical); the nodes between them run only the network, link, and physical layers.]
TCP Overview

TCP is:
Point-to-point, full-duplex
  » Bi-directional data flow within a connection
Reliable, in-order byte stream
  » No message boundaries
Connection-oriented
  » Handshaking initializes sender and receiver state before data exchange
Pipelined
  » Congestion and flow control determine window size
  » Each endpoint has two buffers: a send buffer and a receive buffer
Congestion controlled
  » Internet would cease to function without this!
Flow controlled
  » Sender and receiver have synchronized windows to ensure receiver is not overwhelmed

[Figure: the application writes data into the TCP send buffer; segments carry it to the TCP receive buffer, from which the peer application reads data.]

RFCs: 793, 1122, 1323, 2018, 2581

TCP Segment Structure
Header and payload format

Header layout (each row is 32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  Application data (variable length)

Field notes:
  » sequence number, acknowledgement number: bytes of data sent/received are numbered (not segments)
  » U: urgent data (URG)
  » A: ACK number is valid
  » P: push data now (PSH)
  » R, S, F: connection establishment and teardown commands (RST, SYN, FIN)
  » rcvr window size: number of bytes the receiver is willing to accept
  » checksum: same checksum as in UDP
  » payload ≤ MSS (Maximum Segment Size)
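The header layout above can be decoded with a few lines of Python. This is a minimal sketch, not a full TCP parser: the function name `parse_tcp_header`, the dictionary keys, and the example segment (ports, window, zero checksum) are illustrative assumptions, and options are skipped rather than interpreted.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent_ptr) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flag_bits = offset_flags & 0x3F         # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len,
        "flags": [n for i, n in enumerate(names) if flag_bits & (1 << i)],
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
        "payload": segment[header_len:],
    }

# Hypothetical segment: Seq=42, ACK=79, PSH and ACK set, one data byte.
# The checksum is left at 0 here; a real segment carries a computed value.
example = struct.pack("!HHIIHHHH", 54071, 80, 42, 79,
                      (5 << 12) | 0x18, 65535, 0, 0) + b"C"
fields = parse_tcp_header(example)
print(fields["seq"], fields["ack"], fields["flags"], fields["payload"])
```

The `!` in the format string selects network byte order, which is how the header fields travel on the wire.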
TCP Sequence Numbers and ACKs
Telnet example

Sequence numbers:
  » Byte stream index of the first byte in the segment's payload
ACKs:
  » Sequence number of next byte expected from the other side
  » ACKs are cumulative
How does receiver handle out-of-order segments?
  » TCP spec doesn't say, it's up to the implementor

Telnet exchange between Host A and Host B (time runs downward):
  1. User types 'C'. Host A sends: Seq=42, ACK=79, data = 'C'
  2. Host B ACKs receipt of 'C' and echoes back 'C': Seq=79, ACK=43, data = 'C'
  3. Host A ACKs receipt of echoed 'C': Seq=43, ACK=80

Reliable Data Transfer in TCP
Sender's state machine

The sender waits for an event:
  event: TCP_send(data)
    create segment
    udt_send(segment nextseqnum)
    start_timer
  event: receive ACK for segment y
    update timers
    advance the window to y
    retransmit y if third duplicate ACK
  event: timeout for segment y
    udt_send(segment y)
    start timer

TCP retransmits segments if:
  » An expected ACK times out
  » 3 duplicate ACKs for a segment are received
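The sequence and ACK numbers in the Telnet example follow directly from the two rules above (sequence number = index of the first payload byte, ACK = next byte expected). A small sketch that recomputes them; the helper name `make_reply` and its parameters are illustrative assumptions, not part of any TCP API.

```python
def make_reply(received_seq, received_len, my_next_seq, data=b""):
    """Return the (seq, ack, data) triple a host sends after a segment arrives."""
    ack = received_seq + received_len   # next byte expected from the other side
    return my_next_seq, ack, data

# 1. Host A sends 'C' with Seq=42, ACK=79.
a_seq, a_ack, a_data = 42, 79, b"C"

# 2. Host B ACKs the byte and echoes it back. B's next sequence number is
#    the byte A said it expects next (A's ACK field).
b_seq, b_ack, b_data = make_reply(a_seq, len(a_data), my_next_seq=a_ack, data=b"C")

# 3. Host A ACKs the echo without sending data.
a2_seq, a2_ack, a2_data = make_reply(b_seq, len(b_data),
                                     my_next_seq=a_seq + len(a_data))

print((b_seq, b_ack), (a2_seq, a2_ack))
```

The two printed pairs correspond to the second and third segments of the exchange.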
Reliable Data Transfer in TCP
Simplified sender's state machine

[Figure: sliding window over the sequence number space, divided into "Sent and ACKed", "Sent and unACKed", "Unsent and eligible", and "Unsent and ineligible", with markers for the 1st byte, the send base, and the next sequence number.]

  sendbase = initial_sequence number
  nextseqnum = initial_sequence number
  loop (forever) {
    switch(event)
      event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment with nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
      event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y ("segment y")
        compute new timeout interval for segment y
        restart timer for segment y
      event: ACK received with value y
        (see the ACK handling below)
  }

Reliable Data Transfer in TCP
Simplified sender's state machine

  event: ACK received with value y
    if (y > sendbase) { /* Cumulative ACK of all data up to y */
      cancel timers for segments with sequence numbers < y
      sendbase = y
    }
    else if (y == sendbase) { /* A duplicate ACK */
      increment number of duplicate ACKs received
      if (number of duplicate ACKs received == 3) {
        /* Fast retransmit */
        resend segment with sequence number sendbase
        restart timer for segment sendbase
      }
    }
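The simplified sender's three events can be exercised in a short Python sketch. Several things here are assumptions made for illustration: timers are not modeled (a timeout is just a method call), a list named `sent` stands in for passing segments to IP, and the duplicate-ACK counter is reset when a new cumulative ACK arrives, which the pseudocode does not spell out.

```python
class SimplifiedSender:
    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # sequence number -> data, sent but not yet ACKed
        self.dup_acks = 0
        self.sent = []       # log of (seq, data); stands in for "pass segment to IP"

    def on_app_data(self, data):
        seq = self.nextseqnum
        self.unacked[seq] = data
        self.sent.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self, y):
        if y in self.unacked:                  # retransmit segment y
            self.sent.append((y, self.unacked[y]))

    def on_ack(self, y):
        if y > self.sendbase:                  # cumulative ACK of all data up to y
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
            self.dup_acks = 0                  # assumption: restart the count
        elif y == self.sendbase:               # a duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3 and y in self.unacked:
                self.sent.append((y, self.unacked[y]))   # fast retransmit

sender = SimplifiedSender(initial_seq=92)
sender.on_app_data(b"a" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"b" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)               # first segment ACKed
for _ in range(3):
    sender.on_ack(100)           # three duplicate ACKs trigger fast retransmit
print([seq for seq, _ in sender.sent])
```

The final log shows segment 100 sent twice: once originally and once as the fast retransmit.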
Reliable Data Transfer in TCP
ACK generation rules [RFC 1122, RFC 2581]

Event: In-order segment arrival, no gaps, all previous data already ACKed
TCP receiver action: Delayed ACK. Wait 200 ms (up to 500 ms allowed) for next segment. If no segment received, send ACK

Event: In-order segment arrival, no gaps, one delayed ACK pending
TCP receiver action: Immediately send single cumulative ACK

Event: Out-of-order segment arrival (higher than expected sequence number), gap detected
TCP receiver action: Send duplicate ACK, indicating sequence number of next expected byte

Event: Arrival of segment that partially or completely fills gap
TCP receiver action: Immediate ACK if segment starts at lower end of gap

Reliable Data Transfer in TCP
Retransmission examples

[Figure: two Host A / Host B timelines. "Lost ACK scenario": B's ACK=100 is lost (X), A's timer expires, A retransmits, and B sends ACK=100 again. "Premature timeout": A sends Seq=92 and Seq=100 (20 data bytes); the Seq=92 timer expires before ACK=100 and ACK=120 arrive, so A retransmits Seq=92 and B answers with ACK=120.]
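The cumulative and duplicate ACK rules from the ACK generation table can be sketched as a tiny receiver. This sketch assumes the receiver buffers out-of-order segments (one of the choices the spec leaves to the implementor) and ignores the delayed-ACK timer entirely; the class name `CumulativeAckReceiver` is illustrative.

```python
class CumulativeAckReceiver:
    def __init__(self, initial_seq):
        self.expected = initial_seq   # next in-order byte expected
        self.buffered = {}            # out-of-order segments: seq -> data

    def on_segment(self, seq, data):
        """Return the ACK number sent in response to this segment."""
        if seq == self.expected:
            self.expected += len(data)
            # A segment at the lower end of a gap may release buffered data.
            while self.expected in self.buffered:
                self.expected += len(self.buffered.pop(self.expected))
        elif seq > self.expected:
            self.buffered[seq] = data  # gap detected: the ACK number repeats
        return self.expected

rx = CumulativeAckReceiver(initial_seq=92)
ack_after_gap = rx.on_segment(100, b"b" * 20)   # arrives early, leaves a gap
ack_after_fill = rx.on_segment(92, b"a" * 8)    # fills the gap
print(ack_after_gap, ack_after_fill)
```

The first response repeats the old ACK number (a duplicate ACK); the second jumps past both segments at once.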
Reliable Data Transfer in TCP
Retransmission examples

[Figure: two Host A / Host B timelines. "Cumulative ACKs potentially avoid retransmissions": ACK=100 is lost (X), but ACK=120 reaches A before the timeout, so nothing is retransmitted. "Premature timeout": the same scenario as on the previous slide.]

Reliable Data Transfer in TCP
Setting the ACK timer

How large should the ACK timeout value be?
  » Too short: Premature timeouts result in unnecessary retransmissions
  » Too long: Slow reaction to loss results in poor performance because the sender's window stops advancing
Timer should be longer than the RTT, but how do we estimate RTT?
  » Measure the time from segment transmission until receipt of ACK ("SampleRTT")
      Ignore retransmissions
      Measure only one segment's RTT at a time
  » SampleRTT will vary, so we compute an average RTT based on several recent RTT samples
Reliable Data Transfer in TCP
Estimating round-trip-time

  EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
  Deviation = (1-β)*Deviation + β*|SampleRTT - EstimatedRTT|
  Timeout = EstimatedRTT + 4*Deviation

The estimated RTT is an exponential weighted moving average (EWMA)
  » Computes a smooth average
  » Influence of a given sample decreases exponentially fast
  » Typical value of α is 0.125
Timeout is EstimatedRTT plus a safety margin
  » Large variation in EstimatedRTT results in a larger safety margin
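The three update formulas can be tried out with a few lines of Python. This is a sketch under stated assumptions: β = 0.25 and an initial Deviation of 0 are not given on the slide, and Deviation is updated after EstimatedRTT, in the order the slide lists the formulas (RFC 6298 updates the variance term first, using the old estimate).

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha                  # typical value from the slide
        self.beta = beta                    # assumed value, not on the slide
        self.estimated_rtt = first_sample   # seed the average with one sample
        self.deviation = 0.0                # assumed starting point

    def update(self, sample_rtt):
        """Fold one SampleRTT into the estimate and return the new Timeout."""
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.deviation = ((1 - self.beta) * self.deviation
                          + self.beta * abs(sample_rtt - self.estimated_rtt))
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation

est = RttEstimator(first_sample=100.0)   # milliseconds
timeout_ms = est.update(120.0)
print(est.estimated_rtt, est.deviation, timeout_ms)
```

With one 120 ms sample after a 100 ms seed, the estimate moves only to 102.5 ms, while the deviation term of 4.375 ms adds a 17.5 ms safety margin to the timeout.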
TCP Flow Control
Window control

[Figure: receiver's buffer (RcvBuffer) laid over the sequence number space, starting at the 1st byte: "Delivered to application", "Received, ACKed, not delivered", "Acceptable but not received", and "Not expected, not received". The receiver's window (RcvWindow) covers the "Acceptable but not received" region.]

Flow control is the problem of ensuring the receiver is not overwhelmed
  » The receiver can become overwhelmed if the application reads too slowly or the sender transmits too fast
The receiver's window represents its remaining buffer capacity
The window advances as the application reads received data

TCP Flow Control
Window control

The receiver explicitly informs the sender of the amount of free buffer space in RcvBuffer
  » RcvWindow field in TCP segment
The sender keeps the amount of transmitted, unACKed data no greater than the most recently received RcvWindow

[Figure: TCP segment header with the receiver window size field marked as the receiver's window.]
TCP Flow Control
Window control

[Figure: RcvBuffer over the sequence number space with markers Read and Rcvd; RcvWindow is the free space beyond Rcvd.]

The goal is to ensure:
  Rcvd - Read ≤ RcvBuffer
The sender is sent:
  RcvWindow = RcvBuffer - (Rcvd - Read)

TCP Flow Control
Window control

[Figure: sender's buffer over the sequence number space with markers ACKed and Sent: "Sent and ACKed", "Sent and not ACKed", "Eligible to be sent", and "Ineligible" (application blocked from sending). RcvWindow spans the data after ACKed.]

The sender ensures:
  Sent - ACKed ≤ RcvWindow
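Both sides of the window computation fit in two small functions. This is a sketch with made-up byte counts; the function names and the `nbytes` parameter are illustrative, and Rcvd, Read, Sent, and ACKed are treated as running byte counters.

```python
def rcv_window(rcv_buffer, rcvd, read):
    """Free receive-buffer space advertised to the sender."""
    return rcv_buffer - (rcvd - read)

def sender_may_send(sent, acked, advertised_window, nbytes):
    """True if sending nbytes more keeps Sent - ACKed <= RcvWindow."""
    return (sent + nbytes) - acked <= advertised_window

# Receiver: 4096-byte buffer, 3000 bytes received, 1000 read by the application.
window = rcv_window(rcv_buffer=4096, rcvd=3000, read=1000)

# Sender: 1000 bytes in flight (sent=5000, acked=4000).
fits = sender_may_send(sent=5000, acked=4000, advertised_window=window, nbytes=1096)
too_much = sender_may_send(sent=5000, acked=4000, advertised_window=window, nbytes=1097)
print(window, fits, too_much)
```

With 2000 bytes still sitting unread in the buffer, the receiver advertises 2096 bytes, so the sender can add at most 1096 bytes to the 1000 already in flight.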
TCP Connection Management
The three-way handshake

TCP endpoints establish a connection before exchanging data segments
  » client: connection initiator
      Socket clientsocket = new Socket("hostname", "port number")
  » server: contacted by client
      Socket connectionsocket = welcomesocket.accept()

[Figure: client and server exchange the 3-way handshake through the server's welcoming socket; once the connection exists, bytes flow between the two endpoints.]

TCP Connection Management
The three-way handshake

Client sends SYN segment to server
  » The SYN specifies the client's initial sequence number
Server receives SYN, replies with SYN+ACK segment
  » ACKs received SYN
  » Allocates buffers
  » Specifies server's initial sequence number
Third segment may be an ACK only or an ACK+data

Segments exchanged:
  1. client to server: Connection request (SYN=1, ACK=0, seq=client_isn)
  2. server to client: Connection granted (SYN=1, ACK=1, seq=server_isn, ack=client_isn+1)
  3. client to server: ACK (SYN=0, ACK=1, seq=client_isn+1, ack=server_isn+1), optionally carrying data
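The field values of the three handshake segments can be generated from the two initial sequence numbers. A minimal sketch; the function name and dictionary layout are illustrative, and the ISNs plugged in are the ones visible in the tcpdump trace at the end of these slides.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries of header fields."""
    syn = {"SYN": 1, "ACK": 0, "seq": client_isn}
    syn_ack = {"SYN": 1, "ACK": 1, "seq": server_isn, "ack": syn["seq"] + 1}
    ack = {"SYN": 0, "ACK": 1, "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=1990840413, server_isn=150440752)
for segment in segments:
    print(segment)
```

Each SYN consumes one sequence number, which is why both ACK fields are the peer's ISN plus one even though no data has been sent yet.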
TCP Connection Management
Closing a connection

Client sends FIN segment to server
Server receives FIN, replies with ACK
  » Server closes connection, sends FIN
Client receives FIN, replies with ACK
Client enters "timed wait" state
  » Client will ACK any received FIN

Segments exchanged:
  1. client calls close(), sends FIN
  2. server replies with ACK
  3. server calls close(), sends FIN
  4. client replies with ACK and enters Timed Wait
  5. Connection closed

TCP Connection Management
Client/Server lifecycles

TCP client lifecycle:
  Closed -> SYN Sent: connect(), send SYN
  SYN Sent -> Established: receive SYN+ACK, send ACK
  Established -> FIN Wait 1: close(), send FIN
  FIN Wait 1 -> FIN Wait 2: receive ACK, send nothing
  FIN Wait 2 -> Timed Wait: receive FIN, send ACK
  Timed Wait -> Closed: wait 30 seconds

TCP server lifecycle:
  Closed -> Listen: server creates listen socket
  Listen -> SYN Received: receive SYN, send SYN & ACK
  SYN Received -> Established: receive ACK, send nothing
  Established -> Close Wait: receive FIN, send ACK
  Close Wait -> Last ACK: close(), send FIN
  Last ACK -> Closed: receive ACK, send nothing
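The client lifecycle can be written as a transition table and walked event by event. This is a sketch only: the event strings, the `run` helper, and the "(none)" placeholder for the Timed Wait transition are illustrative choices, and real TCP has more states and transitions than this lifecycle shows.

```python
# (state, event) -> (next state, action), following the client lifecycle.
CLIENT_TRANSITIONS = {
    ("Closed", "connect()"): ("SYN Sent", "send SYN"),
    ("SYN Sent", "receive SYN+ACK"): ("Established", "send ACK"),
    ("Established", "close()"): ("FIN Wait 1", "send FIN"),
    ("FIN Wait 1", "receive ACK"): ("FIN Wait 2", "send nothing"),
    ("FIN Wait 2", "receive FIN"): ("Timed Wait", "send ACK"),
    ("Timed Wait", "wait 30 seconds"): ("Closed", "(none)"),
}

def run(transitions, state, events):
    """Apply events in order; return the final state and the actions taken."""
    actions = []
    for event in events:
        state, action = transitions[(state, event)]
        actions.append(action)
    return state, actions

final_state, actions = run(
    CLIENT_TRANSITIONS, "Closed",
    ["connect()", "receive SYN+ACK", "close()",
     "receive ACK", "receive FIN", "wait 30 seconds"])
print(final_state, actions)
```

An event that is not valid in the current state raises `KeyError`, which makes out-of-order events easy to spot when experimenting.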
TCP Traces

hufflepuff:[~]% sudo tcpdump host hufflepuff and host www.cs.clemson.edu and tcp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes
11:08:22.816917 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: S 1990840413:1990840413(0) win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 1273416398 0>
11:08:22.817253 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: S 150440752:150440752(0) ack 1990840414 win 49232 <nop,nop,timestamp 1628316 1273416398,mss 1460,nop,wscale 0>
11:08:22.817331 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: . ack 1 win 65535
11:08:22.821890 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 1:506(505) ack 1 win 65535
11:08:22.822224 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: . ack 506 win 49232
11:08:22.828103 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: . 1:1449(1448) ack 506 win 49232
11:08:22.828112 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: P 1449:2208(759) ack 506 win 49232
11:08:22.880198 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 506:1123(617) ack 2208 win 65535
11:08:22.880580 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: . ack 1123 win 49232
11:08:22.883328 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: P 2208:2388(180) ack 1123 win 49232
11:08:22.884907 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: F 1123:1123(0) ack 2388 win 65535
11:08:22.885087 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: . ack 1124 win 49232
11:08:22.885232 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: F 2388:2388(0) ack 1124 win 49232
11:08:22.885267 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: . ack 2389 win 65535
^C
14 packets captured
127 packets received by filter
0 packets dropped by kernel

For more info: % man tcpdump
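The data-carrying lines of the trace can be pulled apart with a regular expression to check the sequence arithmetic. A sketch only: the pattern is written for the specific line shape shown in this trace (flag, `start:end(len)`, `ack`, `win`), not for tcpdump output in general, and the three sample lines are copied from the trace above.

```python
import re

LINE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\) "
    r"ack (?P<ack>\d+) win (?P<win>\d+)"
)

trace = [
    "11:08:22.821890 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 1:506(505) ack 1 win 65535",
    "11:08:22.828112 IP www.cs.clemson.edu.http > hufflepuff.cs.clemson.edu.54071: P 1449:2208(759) ack 506 win 49232",
    "11:08:22.880198 IP hufflepuff.cs.clemson.edu.54071 > www.cs.clemson.edu.http: P 506:1123(617) ack 2208 win 65535",
]

parsed = []
for line in trace:
    match = LINE.match(line)
    fields = match.groupdict()
    for key in ("start", "end", "length", "ack", "win"):
        fields[key] = int(fields[key])
    # The range start:end covers exactly `length` bytes of the stream.
    assert fields["end"] - fields["start"] == fields["length"]
    parsed.append(fields)

# The client's second request acknowledges everything the server has sent so far.
print(parsed[1]["end"], parsed[2]["ack"])
```

The last line compares the end of the server's data segment (2208) with the ACK number in the client's next segment, illustrating that an ACK names the next byte expected.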