Outline
- Development of a reliable protocol
- Sliding window protocols: Go-Back-N, Selective Repeat
- Protocol performance
- Sockets, UDP, TCP, and IP
- UDP operation
- TCP operation: connection management, flow control, congestion control

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no message boundaries
- sliding window: TCP congestion and flow control set the window size; send & receive buffers (application writes data into the TCP send buffer at the socket door; application reads data from the TCP receive buffer)
- full duplex data: bi-directional data flow in the same connection; MSS: maximum segment size (typically 1460 bytes -- why?)
- connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
- flow controlled: sender will not overwhelm receiver
TCP segment structure (32-bit words):
- source port #, dest port #
- sequence number (counting by bytes of data, not segments!)
- acknowledgement number
- head len, unused bits, flags U A P R S F, receive window (# bytes rcvr willing to accept -- only 16 bits??)
- checksum (Internet checksum, as in UDP), urgent data pointer
- options (variable length)
- application data (variable length)
Flags:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)

TCP Connection Management includes establishment, closing, and reliable data delivery using sequence numbers and timers.
TCP Connection Management
Recall: TCP sender and receiver establish a connection before exchanging data segments
- initialize TCP variables: seq. #s, buffers, flow control info (e.g. RcvWindow)
- socket calls: client: connect(); server: accept()
Three-way handshake:
- Step 1: client host sends TCP SYN segment to server; specifies initial seq #; no data
- Step 2: server host receives SYN, replies with SYNACK segment; server allocates buffers; specifies server initial seq. #
- Step 3: client receives SYNACK, replies with ACK segment, which may contain data
[figure: TCP three-way handshake]
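The socket calls above map directly onto the handshake: the kernel performs the SYN/SYNACK/ACK exchange when `connect()` and `accept()` are called. A minimal sketch using Python's standard socket API (loopback address and port 5000 are arbitrary choices for illustration):

```python
import socket
import threading

def server():
    # accept() returns only after the kernel has completed the
    # three-way handshake (SYN, SYNACK, ACK) for a queued connection
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 5000))
    srv.listen(1)
    conn, addr = srv.accept()
    conn.sendall(b"hello")
    conn.close()
    srv.close()

t = threading.Thread(target=server)
t.start()

# connect() sends the SYN and returns once the SYNACK arrives
# and the final ACK has been sent
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 5000))
data = cli.recv(1024)
cli.close()
t.join()
print(data)  # b'hello'
```

Note that the application never sees the handshake segments themselves; it only observes that `connect()`/`accept()` block until the connection is established.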
TCP Connection Management (cont.)
Closing a connection (note: two-army problem):
- client closes socket: clientsocket.close();
- Step 1: client end system sends TCP FIN control segment to server
- Step 2: server receives FIN, replies with ACK; closes connection, sends FIN
- Step 3: client receives FIN, replies with ACK; enters timed wait -- will respond with ACK to received FINs
- Step 4: server receives ACK; connection closed
Note: with a small modification, this can handle simultaneous FINs.
[figure: client/server FIN and ACK exchange, with the client's timed wait before fully closing]
TCP Connection Management (cont.)
[figures: TCP server lifecycle and TCP client lifecycle state diagrams]

TCP seq. #s and ACKs
- Seq. #: byte-stream number of the first byte in the segment's data
- ACK: seq # of next byte expected from the other side; cumulative ACK
[figure: simple telnet scenario over time -- user types 'C' at Host A; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C']
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than RTT, but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- SampleRTT will vary; want the estimated RTT smoother: average several recent measurements, not just the current SampleRTT
- How to measure RTT when there are retransmissions?

EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
- exponential weighted moving average
- influence of a past sample decreases exponentially fast
- typical value: α = 0.125
[figure: example RTT estimation -- SampleRTT and EstimatedRTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr]

TCP Round Trip Time and Timeout
Setting the timeout:
- EstimatedRTT plus a safety margin
- large variation in EstimatedRTT -> larger safety margin
- first, estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|   (typically, β = 0.25)
- then set the timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT
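The two EWMA updates and the timeout rule can be sketched directly. This is a minimal illustration with the textbook constants α = 0.125 and β = 0.25; the initial DevRTT choice is an assumption, not part of the slide:

```python
# Sketch of TCP's RTT estimation and timeout computation.
class RttEstimator:
    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # common initial choice (assumption)

    def update(self, sample_rtt):
        # DevRTT = (1-beta)*DevRTT + beta*|SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt +
                        self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1-alpha)*EstimatedRTT + alpha*SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt +
                              self.alpha * sample_rtt)

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt

est = RttEstimator(0.100)          # first SampleRTT = 100 ms
for s in (0.120, 0.090, 0.200):    # subsequent samples (seconds)
    est.update(s)
print(round(est.timeout_interval(), 4))  # 0.3174
```

Note how the single 200 ms outlier pushes DevRTT (and hence the safety margin) up much more than it moves EstimatedRTT.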
TCP reliable data transfer
- TCP creates a reliable service on top of IP's unreliable service
- sliding windows improve utilization
- cumulative ACKs provide redundancy
- TCP uses a single retransmission timer
- retransmissions are triggered by: timeout events, duplicate ACKs
- initially we consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control, congestion control

TCP sender events:
- data rcvd from app: create segment with seq # (seq # is the byte-stream number of the first data byte in the segment); start timer if not already running (think of the timer as for the oldest unACKed segment); expiration interval: TimeoutInterval
- timeout: retransmit the segment that caused the timeout; restart timer
- ACK rcvd: if it acknowledges previously unACKed segments, update what is known to be ACKed; start timer if there are still outstanding segments
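The three sender events can be sketched as a small state machine. This is the simplified sender from the slide (single timer, cumulative ACKs, no duplicate-ACK or flow/congestion handling); the timer is modeled as a flag rather than a real clock:

```python
# Sketch of the simplified TCP sender's event handling.
class SimpleTcpSender:
    def __init__(self, timeout_interval):
        self.next_seq = 0          # seq # of next byte to send
        self.send_base = 0         # oldest unACKed byte
        self.timeout_interval = timeout_interval
        self.timer_running = False
        self.unacked = {}          # seq # -> segment data

    def app_send(self, data):
        # data received from app: create segment, start timer if idle
        self.unacked[self.next_seq] = data
        if not self.timer_running:
            self.timer_running = True
        seg = (self.next_seq, data)
        self.next_seq += len(data)
        return seg

    def on_timeout(self):
        # retransmit the segment that caused the timeout, restart timer
        self.timer_running = True
        return (self.send_base, self.unacked[self.send_base])

    def on_ack(self, ack_num):
        # cumulative ACK: everything below ack_num is now ACKed
        if ack_num > self.send_base:
            for seq in list(self.unacked):
                if seq + len(self.unacked[seq]) <= ack_num:
                    del self.unacked[seq]
            self.send_base = ack_num
            # keep the timer running only if segments remain outstanding
            self.timer_running = bool(self.unacked)

s = SimpleTcpSender(timeout_interval=1.0)
s.app_send(b"hello")     # segment (0, b"hello")
s.app_send(b"world")     # segment (5, b"world")
s.on_ack(5)              # first segment ACKed; second still outstanding
print(s.send_base, s.timer_running)  # 5 True
```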
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver -> TCP receiver action:
- arrival of in-order segment with expected seq #, all data up to expected seq # already ACKed -> delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
- arrival of in-order segment with expected seq #, one other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
- arrival of out-of-order segment with higher-than-expected seq #, gap detected -> immediately send duplicate ACK, indicating seq # of next expected byte
- arrival of segment that partially or completely fills gap -> immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit
- time-out period is often relatively long: long delay before resending a lost packet
- detect lost segments via duplicate ACKs: sender often sends many segments back-to-back; if a segment is lost, there will likely be many duplicate ACKs for that segment
- if the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost
- fast retransmit: resend the segment before the timer expires
[figure: Host A sends segments x1..x5; x2 is lost; Host B returns duplicate ACKs for x1; the triple duplicate ACK triggers fast retransmit well before the timeout expires]

SYN Flood DoS Attack
A classic denial of service (DoS) attack:
- flood a node with TCP connection requests
- node allocates resources for each connection, sends SYNACK segment
- last part of handshake (from attacker) never happens
- node eventually releases resources for half-open connections, but requests come too fast
- all kernel resources for TCP connections are consumed; other, legitimate clients are shut out
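The sender-side trigger for fast retransmit is just duplicate-ACK counting. A sketch (abstract ACK numbers, no real segments):

```python
# Sketch of fast-retransmit triggering: retransmit after the third
# duplicate ACK for send_base, before the retransmission timer fires.
def process_acks(acks, send_base):
    """Return (new_send_base, seq_to_retransmit_or_None)."""
    dup_count = 0
    for ack in acks:
        if ack > send_base:
            send_base = ack      # new data ACKed; reset the count
            dup_count = 0
        else:
            dup_count += 1
            if dup_count == 3:   # triple duplicate ACK
                return send_base, send_base  # resend segment at send_base
    return send_base, None

# Host B keeps ACKing x1 (= byte 100) because x2 was lost:
base, resend = process_acks([100, 100, 100, 100], send_base=0)
print(base, resend)  # 100 100
```

The first ACK advances `send_base`; the next three are duplicates, and the third duplicate triggers retransmission of the segment starting at byte 100.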
Solution: SYN cookies
- instead of allocating a half-open connection, the server picks its seq # as a hash of the IP addresses, ports, and a secret number (known only to the server) -- this is called a SYN cookie
- sends back SYNACK with the SYN cookie; allocates no resources, and forgets the seq #
- if an ACK segment arrives, the server regenerates the seq #; the ACK field of the arriving packet should equal that number plus 1; if so, go ahead and allocate resources for this legitimate client
- if an ACK never arrives (as in an attack), we have lost only the time to handle the SYN packet

Completed handshake attacks
- SYN floods (were) most effective if launched from multiple clients: a Distributed Denial of Service (DDoS) attack
- some attacks get around SYN cookies by completing the TCP connection: resources are allocated but not used
- much harder to defend against: hard to tell legitimate clients from attackers
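The SYN-cookie idea can be sketched with a keyed hash over the connection 4-tuple. This is a toy illustration only: real SYN cookies also encode a timestamp and MSS bits in the sequence number, which is omitted here, and the secret key is a made-up placeholder:

```python
# Toy SYN-cookie sketch: derive the server's initial seq # from a
# keyed hash of the 4-tuple, so no state is kept per half-open
# connection until the handshake completes.
import hashlib
import hmac

SECRET = b"server-only-secret"   # hypothetical secret, known only to server

def syn_cookie(src_ip, src_port, dst_ip, dst_port):
    msg = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # 32-bit seq #

# On SYN: reply with SYNACK carrying seq = cookie, store nothing.
cookie = syn_cookie("10.0.0.7", 40000, "10.0.0.1", 80)

# On the final ACK: regenerate the cookie and check ack == cookie + 1
# (mod 2**32); only then allocate connection resources.
def ack_is_valid(ack, src_ip, src_port, dst_ip, dst_port):
    return ack == (syn_cookie(src_ip, src_port, dst_ip, dst_port) + 1) % 2**32

print(ack_is_valid((cookie + 1) % 2**32, "10.0.0.7", 40000, "10.0.0.1", 80))  # True
```

An attacker who never sends the final ACK costs the server only the hash computation; a spoofed ACK fails the check because the attacker cannot compute the cookie without the secret.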
TCP Flow Control and Congestion Control

TCP Flow Control
- the receive side of a TCP connection has a receive buffer (tied to the socket): IP datagrams are deposited as TCP data in the buffer, with the rest of the buffer space currently unused
- the receiving application process may be slow at reading from the buffer (socket)
- flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
- speed-matching service: matching the send rate to the receiving application's drain rate
TCP Flow control: how it works
(Note: for now, assume the TCP receiver discards out-of-order segments)
- unused buffer space: rwnd = RcvBuffer - [LastByteRcvd - LastByteRead]
- receiver: advertises unused buffer space by including the rwnd value in the segment header (window advertisement)
- sender: limits # of unACKed bytes to rwnd; guarantees the receiver's buffer doesn't overflow
- basically, an adaptive sliding window

Principles of Congestion Control
Congestion:
- informally: too many sources sending too much data too fast for the network to handle
- not the same as flow control!
- manifestations: lost packets (buffer overflow at routers), long delays (queueing in router buffers)
- a major issue in Internet performance
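The receiver's window computation and the sender's corresponding limit can be checked with a few lines. A sketch (the 64 KB buffer size is an arbitrary example value):

```python
# Sketch of TCP flow control bookkeeping:
# rwnd = RcvBuffer - (LastByteRcvd - LastByteRead), and the sender
# keeps (LastByteSent - LastByteAcked) <= rwnd.
RCV_BUFFER = 65_535          # bytes (example value)

def advertised_window(last_byte_rcvd, last_byte_read):
    # bytes of buffer still free for new in-order data
    return RCV_BUFFER - (last_byte_rcvd - last_byte_read)

def sender_can_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    # flow-control check: unACKed bytes after sending must fit in rwnd
    return (last_byte_sent + nbytes) - last_byte_acked <= rwnd

rwnd = advertised_window(last_byte_rcvd=50_000, last_byte_read=30_000)
print(rwnd)                                           # 45535
print(sender_can_send(50_000, 30_000, rwnd, 20_000))  # True
print(sender_can_send(50_000, 30_000, rwnd, 30_000))  # False
```

When the application drains the buffer (LastByteRead catches up to LastByteRcvd), rwnd grows back toward RcvBuffer and the sender may transmit again.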
Controlling Congestion
Two broad approaches:
- end-end congestion control: no explicit feedback from the network; congestion is inferred from end-system observed loss and delay; the approach taken by TCP
- network-assisted congestion control: routers provide feedback to end systems -- either a single bit indicating congestion (used in early networks: SNA, DECbit, ATM) or an explicit rate at which the sender should transmit

TCP congestion control
- goal: the TCP sender should transmit as fast as possible, but without congesting the network
- Q: how to find a rate just below the congestion level?
- decentralized: each TCP sender sets its own rate, based on implicit feedback:
  - ACK received: segment arrived (a good thing!); network not congested, so increase sending rate
  - ACK not received: assume loss due to a congested network, so decrease sending rate
Congestion Window
- in addition to the receiver window (from advertisements), the sender maintains a congestion window, cwnd
- rwnd indicates the state of the receiver; cwnd indicates the state of the network
- *** the send window is min(rwnd, cwnd) ***

TCP Slow Start
- when the connection begins, cwnd = 1 MSS (max segment size)
  - example: MSS = 500 bytes & RTT = 200 msec -> initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT: desirable to quickly ramp up to a respectable rate
- increase the rate exponentially until the first loss event or until a threshold is reached
  - double cwnd every RTT, done by incrementing cwnd by 1 MSS for every ACK received
[figure: Host A/Host B exchange over time -- segments sent per RTT doubling: 1, 2, 4, ...]
TCP Congestion Control: more details
Segment loss event: reducing cwnd
- timeout: no response from receiver; cut cwnd to 1 MSS
- 3 duplicate ACKs: at least some segments are getting through (recall fast retransmit); cut cwnd in half -- less aggressive than on timeout
ACK received: increase cwnd
- slow-start phase: increase exponentially fast (despite the name), at connection start or following a timeout
- congestion avoidance: increase linearly

TCP congestion control: bandwidth probing
- "probing for bandwidth": increase the transmission rate on receipt of ACKs, until eventually loss occurs, then decrease the transmission rate
- continue to increase on ACK, decrease on loss (since the available bandwidth is changing, depending on other connections in the network)
[figure: TCP's sawtooth behavior -- sending rate rises while ACKs are received, drops at each loss (X), repeating over time]
Transitioning into/out of slow start
- ssthresh: cwnd threshold maintained by TCP
- on a loss event: set ssthresh to cwnd/2 -- remember (half of) the TCP rate when congestion last occurred
- when cwnd >= ssthresh: transition from slow start to the congestion-avoidance phase
[figure: slow-start FSM -- initially cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; on new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed; on duplicate ACK: dupACKcount++; on timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment; when cwnd > ssthresh, enter congestion avoidance]

TCP: congestion avoidance
- when cwnd > ssthresh, grow cwnd linearly: increase cwnd by 1 MSS per RTT
- approach possible congestion more slowly than in slow start
- implementation: cwnd = cwnd + MSS*(MSS/cwnd) for each ACK received

AIMD
- ACKs: increase cwnd by 1 MSS per RTT: additive increase
- loss: cut cwnd in half (for non-timeout-detected loss): multiplicative decrease
- AIMD: Additive Increase, Multiplicative Decrease
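The slow-start/congestion-avoidance/loss rules above can be sketched as one update function. This models Reno-style behavior with cwnd measured in MSS units and abstract events ("ack", "3dup", "timeout"); it is an illustration, not a real TCP implementation:

```python
# Sketch of Reno-style cwnd evolution.
MSS = 1  # count cwnd in units of MSS

def step(cwnd, ssthresh, event):
    if event == "ack":
        if cwnd < ssthresh:                 # slow start: +1 MSS per ACK
            cwnd += MSS                     # (doubles cwnd every RTT)
        else:                               # congestion avoidance:
            cwnd += MSS * MSS / cwnd        # ~ +1 MSS per RTT overall
    elif event == "3dup":                   # triple duplicate ACK
        ssthresh = cwnd / 2
        cwnd = ssthresh                     # multiplicative decrease
    elif event == "timeout":
        ssthresh = cwnd / 2
        cwnd = 1 * MSS                      # back to slow start
    return cwnd, ssthresh

cwnd, ssthresh = 1, 8
for ev in ["ack"] * 7:                      # slow start up to ssthresh
    cwnd, ssthresh = step(cwnd, ssthresh, ev)
print(cwnd)                                 # 8
cwnd, ssthresh = step(cwnd, ssthresh, "3dup")
print(cwnd, ssthresh)                       # 4.0 4.0
cwnd, ssthresh = step(cwnd, ssthresh, "timeout")
print(cwnd, ssthresh)                       # 1 2.0
```

Note the asymmetry the slides describe: a triple duplicate ACK halves cwnd, while a timeout resets it all the way to 1 MSS.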
Popular flavors of TCP
[figure: cwnd (window size, in segments) vs. transmission round -- after a loss, TCP Reno resumes from ssthresh, while TCP Tahoe restarts from 1 MSS]

Summary: TCP Congestion Control
- when cwnd < ssthresh, the sender is in the slow-start phase; the window grows exponentially
- when cwnd >= ssthresh, the sender is in the congestion-avoidance phase; the window grows linearly
- when a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to ~ssthresh
- when a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS
Effects of Wireless LANs
- loss rates on wireless LANs are much higher than on wired networks
- some loss can be mitigated by retransmissions at the link layer (discussed later)
- IP handoffs for mobile devices also cause loss
- What is TCP's response to packet loss? Many solutions have been proposed (and are still being proposed!) to deal with this issue.

TCP throughput
Q: what's the average throughput of TCP as a function of window size and RTT? (ignoring slow start)
- let W be the window size when loss occurs
- when the window is W, throughput is W/RTT
- just after a loss, the window drops to W/2, throughput to W/(2*RTT)
- average throughput: 0.75 W/RTT
TCP Throughput Issues for Fast Networks
- example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- requires window size W = 83,333 in-flight segments
- throughput in terms of loss rate: throughput = 1.22 * MSS / (RTT * sqrt(L))
- to get 10 Gbps, need loss rate L = 2 * 10^-10 -- wow!
- new versions of TCP are being proposed for high throughput
[figure: throughput (bps) vs. error rate L, from 1E-04 down to 1E-10]

TCP Fairness
- fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
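The slide's numbers follow directly from the two relations. A quick arithmetic check:

```python
# Check the fast-network example: window needed for 10 Gbps at
# 100 ms RTT with 1500-byte segments, and the loss rate implied by
# throughput = 1.22 * MSS / (RTT * sqrt(L)).
from math import sqrt

MSS = 1500 * 8          # segment size in bits
RTT = 0.100             # seconds
target = 10e9           # 10 Gbps

# throughput = W * MSS / RTT  =>  W = target * RTT / MSS
W = target * RTT / MSS
print(round(W))         # 83333

# throughput = 1.22 * MSS / (RTT * sqrt(L))
# =>  L = (1.22 * MSS / (RTT * throughput))**2
L = (1.22 * MSS / (RTT * target)) ** 2
print(L)                # ~2e-10
```

A loss rate of roughly one segment in five billion is far better than real networks deliver, which is why high-speed TCP variants change the increase/decrease rules.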
Why is TCP fair?
Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease reduces throughput proportionally
[figure: connection 1 vs. connection 2 throughput converging toward the equal-bandwidth-share line of a link of capacity R -- congestion avoidance (additive increase), then loss (decrease window by factor of 2), repeated]

Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
Fairness and parallel TCP connections:
- nothing prevents an app from opening parallel connections between 2 hosts; web browsers do this
- example: a link of rate R supporting 9 connections; a new app asking for 1 TCP connection gets rate R/10; a new app asking for 11 TCP connections gets R/2!
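The parallel-connection example is simple arithmetic, since TCP fairness is per connection rather than per application:

```python
# Check the parallel-connection example: with per-connection fair
# sharing, an app's fraction of link rate R is
# (its connections) / (total connections).
def app_share(app_conns, other_conns):
    return app_conns / (app_conns + other_conns)

print(app_share(1, 9))    # 0.1  -> R/10
print(app_share(11, 9))   # 0.55 -> roughly R/2
```

So an app that opens 11 connections against 9 existing ones captures 11/20 of the link, even though every individual connection is behaving "fairly".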
Summary
- developed the basis of a reliable protocol
- learned about sliding window protocols: Go-Back-N, Selective Repeat
- analyzed protocol performance
- discussed demultiplexing
- UDP operation (ports, checksum)
- TCP operation: connection management (sequence numbers), flow control (receive window), congestion control (congestion window)