Reliable Byte-Stream (TCP)
Spring 2002 CS 332

Outline
- Connection Establishment/Termination
- Sliding Window Revisited
- Flow Control
- Adaptive Timeout

End-to-End Protocols
- Underlying best-effort network
  - drops messages
  - re-orders messages
  - delivers duplicate copies of a given message
  - limits messages to some finite size
  - delivers messages after an arbitrarily long delay
- Common end-to-end services
  - guarantee message delivery
  - deliver messages in the same order they are sent
  - deliver at most one copy of each message
  - support arbitrarily large messages
  - support synchronization (between sender and receiver)
  - allow the receiver to flow control the sender
  - support multiple application processes on each host

Simple Demultiplexer (UDP)
- Extends host-to-host service into process-to-process service
- Unreliable and unordered datagram service
- Adds multiplexing
- No flow control
- Endpoints identified by ports (why not PID?)
  - servers have well-known ports (clients don't need this)
  - often just a starting point
  - see /etc/services on Unix
- Implemented as message queue

Simple Demultiplexer (UDP): Header Format
  Header layout (32 bits per row):
    bits 0-15      bits 16-31
    SrcPort        DstPort
    Length         Checksum
    Data
- Note the 16-bit port number (so only 64K ports)
- Process is really identified by the <port, host> pair
- Checksum (optional in IPv4, mandatory in IPv6)
  - computed over pseudo header + UDP header + data
  - pseudo header: protocol number, source IP, destination IP, UDP length field
  - Why?

TCP Overview
- Connection-oriented
- Byte-stream
  - app writes bytes
  - TCP sends segments
  - app reads bytes
- Full duplex
- Flow control: keep sender from overrunning receiver
- Congestion control: keep sender from overrunning network
[Figure: sending application process writes bytes into the send buffer; TCP transmits segments; receiving application process reads bytes from the receive buffer]

Flow Control vs Congestion Control
- Flow Control
  - Prevent sender from overloading receiver
  - End-to-end issue
- Congestion Control
  - Prevent too much data from being injected into network
  - Concerned with how hosts and network interact
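
The checksum coverage listed on the UDP header slide (pseudo header + UDP header + data) can be made concrete with a small sketch. It assumes IPv4, the field order of the standard UDP header (SrcPort, DstPort, Length, Checksum), and the usual 16-bit ones' complement Internet checksum. The function names `ones_complement_sum`, `pseudo_header`, `build_datagram`, and `verify` are hypothetical helpers introduced only for this illustration.

```python
import socket
import struct

UDP_PROTOCOL_NUMBER = 17  # protocol number carried in the pseudo header


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data (zero-padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Source IP, dest IP, zero byte, protocol number, UDP length (IPv4 only)."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, UDP_PROTOCOL_NUMBER, udp_length)
    )


def build_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Build UDP header + data with the checksum field filled in."""
    length = 8 + len(payload)  # 8-byte UDP header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    covered = pseudo_header(src_ip, dst_ip, length) + header + payload
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    checksum = checksum or 0xFFFF  # 0 means "no checksum" in IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def verify(src_ip, dst_ip, datagram: bytes) -> bool:
    """A valid datagram sums to 0xFFFF once its own checksum is included."""
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    return ones_complement_sum(covered) == 0xFFFF
```

Because the source and destination IP addresses are part of the sum, a datagram delivered to the wrong host fails verification even if its own bytes are intact, which is one plausible answer to the slide's "Why?".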
Data Link Reliability (text 2.5)
- Wherein we look at reliability issues on a point-to-point link!
- Error-correcting codes can't handle all possible errors without introducing a lot of overhead (and paying for that overhead would mean not designing for the normal case), so badly garbled frames are dropped.
- We need a way to recover from these lost frames.

Acks and Timeouts
- Acknowledgement (ACK)
  - Small frame sent to peer indicating receipt of frame
  - No data
  - Piggybacking
- Timeout
  - If ACK not received within reasonable time, original frame is retransmitted
- Automatic Repeat Request (ARQ)
  - General strategy of using ACKs and timeouts to implement reliable delivery

Acknowledgements & Timeouts
[Figure: timeline scenarios, labeled (a) through (d), for acknowledgements and timeouts]

A Subtlety
- Consider scenarios (c) and (d) in the previous slide.
- Receiver receives two good frames (duplicates)
- It may deliver both to the higher-layer protocol (not good!)
- Solution: 1-bit sequence number in frame header

Stop-and-Wait
- Problem: keeping the pipe full
- Example: 1.5 Mbps link x 45 ms RTT = 67.5 Kb (8 KB)
- 1 KB frames imply 1/8th link utilization (see Bandwidth x Delay Product, next)
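
The stop-and-wait example above is plain arithmetic, and a short sketch makes the 1/8 figure easy to check. It assumes one frame is sent per round trip and ignores the frame's own transmission time. The function names are hypothetical.

```python
def bandwidth_delay_product_bits(bandwidth_bps: float, rtt_s: float) -> float:
    """Bits that fit in the pipe during one round trip."""
    return bandwidth_bps * rtt_s


def stop_and_wait_rate_bps(frame_bytes: int, rtt_s: float) -> float:
    """Effective rate when exactly one frame is sent per round trip."""
    return frame_bytes * 8 / rtt_s


BANDWIDTH_BPS = 1.5e6   # 1.5 Mbps link from the slide
RTT_S = 0.045           # 45 ms round-trip time from the slide
FRAME_BYTES = 1024      # 1 KB frame from the slide

pipe_bits = bandwidth_delay_product_bits(BANDWIDTH_BPS, RTT_S)  # about 67,500 bits
rate_bps = stop_and_wait_rate_bps(FRAME_BYTES, RTT_S)           # about 182,000 bps
utilization = rate_bps / BANDWIDTH_BPS                          # about 0.12, roughly 1/8
```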
Bandwidth x Delay Product
- Sending one 1 KB packet per 45 ms round trip implies a rate of (1024 x 8)/0.045 = 182 Kbps, or about 1/8 of the bandwidth.
- Bandwidth-delay product: the number of bits that fit in the pipe in a single round trip (i.e., the amount of data that could be in transit).
- Goal: be able to send this much data before getting the first ACK (called keeping the pipe full).

Sliding Window
- Allow multiple outstanding (un-ACKed) frames
- Upper bound on un-ACKed frames, called the window
[Figure: timeline of sender and receiver with multiple frames in flight]

Sliding Window: Sender
- Assign sequence number to each frame (SeqNum)
- Maintain three state variables:
  - send window size (SWS)
  - last acknowledgment received (LAR)
  - last frame sent (LFS)
- Maintain invariant: LFS - LAR <= SWS
- Advance LAR when ACK arrives
- Buffer up to SWS frames (must be prepared to retransmit frames until they are ACKed)

Sliding Window: Receiver
- Maintain three state variables:
  - receive window size (RWS) (upper bound on number of out-of-order frames)
  - largest frame acceptable (LFA)
  - (sequence number of) last frame received (LFR)
- Maintain invariant: LFA - LFR <= RWS
- When frame SeqNum arrives:
  - if LFR < SeqNum <= LFA, accept
  - if SeqNum <= LFR or SeqNum > LFA, discard
- Send cumulative ACKs

Note
- When packet loss occurs, the pipe is no longer kept full
- The longer it takes to notice a lost packet, the worse the condition becomes
- Possible solutions:
  - Send NAKs
  - Selective acknowledgements (acknowledge exactly those frames received, not the highest frame received)
- Not used: too much added complexity

Sequence Number Space
- SeqNum field is finite; sequence numbers wrap around
- Sequence number space must be larger than the number of outstanding frames (e.g., stop-and-wait had a sequence number space of size 2)
- E.g., if the sequence number space is of size 8 (say 0..7) and the number of outstanding frames is allowed to be 10, then the sender can send sequence numbers 0,1,2,3,4,5,6,7,0,1 all at once. Now if the receiver sends back an ACK with sequence number 1, which packet 1 is it ACKing?
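
The receiver rules above (accept when LFR < SeqNum <= LFA, otherwise discard, then send a cumulative ACK) can be sketched in a few lines. This is a minimal illustration, not a full protocol implementation. It assumes unbounded integer sequence numbers, so wrap-around is deliberately ignored, and it starts with LFR = -1 to mean "nothing received yet". The class name `SlidingWindowReceiver` is hypothetical.

```python
class SlidingWindowReceiver:
    """Receiver side of the sliding window, with unbounded sequence numbers."""

    def __init__(self, rws: int):
        self.rws = rws        # receive window size (RWS)
        self.lfr = -1         # last frame received in order (LFR); -1 = none yet
        self.buffer = {}      # out-of-order frames, keyed by SeqNum
        self.delivered = []   # payloads handed to the higher layer, in order

    @property
    def lfa(self) -> int:
        """Largest frame acceptable, chosen so that LFA - LFR == RWS."""
        return self.lfr + self.rws

    def on_frame(self, seq_num: int, payload) -> int:
        """Process one arriving frame and return the cumulative ACK (LFR)."""
        if self.lfr < seq_num <= self.lfa:
            self.buffer[seq_num] = payload
            # Slide the window over every frame that is now in order.
            while self.lfr + 1 in self.buffer:
                self.lfr += 1
                self.delivered.append(self.buffer.pop(self.lfr))
        # Frames with SeqNum <= LFR or SeqNum > LFA fall through and are discarded.
        return self.lfr
```

With `rws=4`, frames arriving in the order 0, 2, 1 produce cumulative ACKs 0, 0, 2: the ACK does not advance past the gap until frame 1 fills it.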
Sequence Number Space
- Even SWS < SequenceSpaceSize is not sufficient
  - suppose a 3-bit SeqNum field (0..7), so SequenceSpaceSize = 8
  - let SWS = RWS = 7
  - sender transmits frames 0..6
  - frames arrive successfully, but ACKs are lost
  - sender retransmits 0..6
  - receiver is expecting 7, 0..5 (because it has at this point updated its various pointers), but receives the second incarnation of 0..5
- Rule: SWS < (SequenceSpaceSize + 1)/2 (if SWS = RWS)
- Intuitively, SeqNum slides between two halves of the sequence number space

Easy to overlook
- The relationship between window size and sequence number space depends on the assumption that frames are not reordered in transit (easy to assume on a point-to-point link).

The End-to-End Argument
- Consider TCP vs X.25
  - TCP: consider the underlying IP network unreliable and use sliding window to provide end-to-end in-order reliable delivery
  - X.25: use sliding window within the network on a hop-by-hop basis (which should guarantee end-to-end)
- Several problems with the hop-by-hop approach:
  - No guarantee that an added hop preserves the service
  - In a link from A to B to C, no guarantee that B behaves perfectly (nodes are known to introduce errors and mix packet order)

End-to-End
- A function should not be provided in the lower levels of the system unless it can be completely and correctly implemented at that level
- Does allow for functions to be incompletely provided at lower levels for optimization
  - E.g., detecting and retransmitting a single corrupt packet across one hop is preferable to retransmitting an entire file end-to-end.
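
The lost-ACK scenario from the Sequence Number Space slide can be replayed mechanically. The sketch below assumes SWS = RWS, that the whole first window arrives in order, and that every ACK is lost so the sender retransmits the same window. It counts how many retransmitted frames the receiver would wrongly accept as new data. The function names are hypothetical.

```python
def in_window(seq: int, first: int, size: int, space: int) -> bool:
    """True if seq is one of the `size` sequence numbers starting at `first`, modulo `space`."""
    return (seq - first) % space < size


def duplicates_accepted(sws: int, space: int) -> int:
    """Retransmitted frames mistaken for new ones when SWS = RWS and all ACKs are lost."""
    rws = sws
    first_window = [i % space for i in range(sws)]
    # The receiver accepted the whole first window, so it now expects the next number.
    next_expected = sws % space
    # The sender never saw an ACK and retransmits the first window unchanged.
    return sum(in_window(seq, next_expected, rws, space) for seq in first_window)
```

For a space of 8, `duplicates_accepted(7, 8)` reports the six wrongly accepted frames 0..5 from the slide, while `duplicates_accepted(4, 8)` reports none. A space of 7 with a window of 4 still lets one duplicate through, which is why the bound is written as a strict inequality.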
Data Link Versus Transport
- Potentially connects many different hosts
  - need explicit connection establishment and termination
- Potentially different RTT (over different routes and at different times, even on a scale of minutes)
  - need adaptive timeout mechanism
- Potentially long delay in network
  - need to be prepared for arrival of very old packets
- Potentially different capacity at destination
  - need to accommodate different node capacity
- Potentially different network capacity
  - need to be prepared for network congestion

Segment Format
  Header layout (32 bits per row; field boundaries at bits 0, 4, 10, 16, 31):
    SrcPort (bits 0-15)                  DstPort (bits 16-31)
    SequenceNum (bits 0-31)
    Acknowledgment (bits 0-31)
    HdrLen (0-3)  0 (4-9)  Flags (10-15) AdvertisedWindow (bits 16-31)
    Checksum (bits 0-15)                 UrgPtr (bits 16-31)
    Options (variable)
    Data
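
To make the segment layout concrete, here is a small sketch that unpacks the fixed 20-byte part of the header with Python's `struct` module. It assumes the standard TCP flag bit values (URG = 0x20 down to FIN = 0x01) and that HdrLen counts 32-bit words. Options are not parsed. The names `TcpHeader` and `parse_tcp_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Flag bits in the low 6 bits of the 16-bit HdrLen/Flags word.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PUSH": 0x08, "RESET": 0x04, "SYN": 0x02, "FIN": 0x01}


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    sequence_num: int
    acknowledgment: int
    hdr_len_bytes: int
    flags: frozenset
    advertised_window: int
    checksum: int
    urg_ptr: int


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Unpack the fixed 20-byte header; any options that follow are ignored."""
    (src, dst, seq, ack, len_and_flags, window, checksum, urg) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    hdr_len_bytes = (len_and_flags >> 12) * 4  # HdrLen is in 32-bit words
    flags = frozenset(name for name, bit in FLAG_BITS.items() if len_and_flags & bit)
    return TcpHeader(src, dst, seq, ack, hdr_len_bytes, flags, window, checksum, urg)
```

For example, a header whose HdrLen/Flags word is `(5 << 12) | 0x02` parses as a 20-byte header with only SYN set.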
Segment Format (cont)
- Each connection identified with 4-tuple: (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
- Sliding window + flow control: Acknowledgment, SequenceNum, AdvertisedWindow
  - sender to receiver: Data (SequenceNum)
  - receiver to sender: Acknowledgment + AdvertisedWindow
- Flags: SYN, FIN, RESET, PUSH, URG, ACK
- Checksum: pseudo header + TCP header + data

Connection Establishment and Termination
- Three-way handshake between active participant (client) and passive participant (server):
  1. client to server: SYN, SequenceNum = x
  2. server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
  3. client to server: ACK, Acknowledgment = y + 1

State Transition Diagram
[Figure: TCP state transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Edges are labeled event/action, e.g. Active open/SYN, Passive open, SYN/SYN + ACK, Close/FIN, FIN/ACK. TIME_WAIT returns to CLOSED on a timeout after two segment lifetimes.]

Sliding Window Revisited
- Sending side (variables: LastByteAcked, LastByteSent, LastByteWritten)
  - LastByteAcked <= LastByteSent
  - LastByteSent <= LastByteWritten
  - buffer bytes between LastByteAcked and LastByteWritten
- Receiving side (variables: LastByteRead, NextByteExpected, LastByteRcvd)
  - LastByteRead < NextByteExpected
  - NextByteExpected <= LastByteRcvd + 1
  - buffer bytes between LastByteRead and LastByteRcvd

Flow Control
- Send buffer size: MaxSendBuffer
- Receive buffer size: MaxRcvBuffer
- Receiving side
  - LastByteRcvd - LastByteRead <= MaxRcvBuffer
  - AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)
- Sending side
  - LastByteSent - LastByteAcked <= AdvertisedWindow
  - EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
  - LastByteWritten - LastByteAcked <= MaxSendBuffer
  - block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application wants to write
- Always send ACK in response to arriving data segment
- Persist when AdvertisedWindow = 0

Protection Against Wrap Around
- 32-bit SequenceNum

  Bandwidth              Time Until Wrap Around
  T1 (1.5 Mbps)          6.4 hours
  Ethernet (10 Mbps)     57 minutes
  T3 (45 Mbps)           13 minutes
  FDDI (100 Mbps)        6 minutes
  STS-3 (155 Mbps)       4 minutes
  STS-12 (622 Mbps)      55 seconds
  STS-24 (1.2 Gbps)      28 seconds
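
The bookkeeping from the Flow Control slide can be exercised in a small sketch. It assumes byte counters that never wrap and uses LastByteRead for the last byte the receiving application has consumed. The class names `ReceiverState` and `SenderState` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReceiverState:
    max_rcv_buffer: int
    last_byte_read: int = 0   # last byte consumed by the application
    last_byte_rcvd: int = 0   # last byte that has arrived

    def advertised_window(self) -> int:
        """AdvertisedWindow = MaxRcvBuffer - (LastByteRcvd - LastByteRead)."""
        return self.max_rcv_buffer - (self.last_byte_rcvd - self.last_byte_read)


@dataclass
class SenderState:
    max_send_buffer: int
    last_byte_acked: int = 0
    last_byte_sent: int = 0
    last_byte_written: int = 0

    def effective_window(self, advertised_window: int) -> int:
        """EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)."""
        return advertised_window - (self.last_byte_sent - self.last_byte_acked)

    def must_block(self, y: int) -> bool:
        """Block the writer if y more bytes would overflow the send buffer."""
        return (self.last_byte_written - self.last_byte_acked) + y > self.max_send_buffer
```

For instance, a receiver with a 4096-byte buffer that has received 3000 bytes and handed 1000 to the application advertises 2096 bytes. A sender with 500 bytes still unacknowledged may then send 1596 more.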
Keeping the Pipe Full
- 16-bit AdvertisedWindow

  Bandwidth              Delay x Bandwidth Product
  T1 (1.5 Mbps)          18 KB
  Ethernet (10 Mbps)     122 KB
  T3 (45 Mbps)           549 KB
  FDDI (100 Mbps)        1.2 MB
  STS-3 (155 Mbps)       1.8 MB
  STS-12 (622 Mbps)      7.4 MB
  STS-24 (1.2 Gbps)      14.8 MB

  (The products are consistent with a round-trip time of about 100 ms.)

Extensions
- Implemented as header options
- Store timestamp in outgoing segments
- Extend sequence space with 32-bit timestamp (PAWS)
- Shift (scale) advertised window

Adaptive Retransmission (Original Algorithm)
- Measure SampleRTT for each segment/ACK pair
- Compute weighted average of RTT
  - EstRTT = α x EstRTT + β x SampleRTT
  - where α + β = 1
  - α between 0.8 and 0.9
  - β between 0.1 and 0.2
- Set timeout based on EstRTT
  - TimeOut = 2 x EstRTT

Karn/Partridge Algorithm
[Figure: two timelines, each showing an original transmission, a retransmission, and the SampleRTT measured to the ACK]
- Do not sample RTT when retransmitting
- Double timeout after each retransmission

Jacobson/Karels Algorithm
- New calculations for average RTT
  - Diff = SampleRTT - EstRTT
  - EstRTT = EstRTT + (δ x Diff)
  - Dev = Dev + δ x (|Diff| - Dev)
  - where δ is a factor between 0 and 1
- Consider variance when setting timeout value
  - TimeOut = µ x EstRTT + φ x Dev
  - where µ = 1 and φ = 4
- Notes
  - algorithm only as good as granularity of clock (500 ms on Unix)
  - accurate timeout mechanism important to congestion control (later)
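
The Jacobson/Karels update and the two Karn/Partridge rules combine naturally into one small estimator. The sketch below is illustrative only. The choice δ = 0.125, seeding EstRTT with the first sample, and resetting the backoff on the next clean sample are all assumptions that the slides do not state. The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """Jacobson/Karels RTT estimation with Karn/Partridge sampling and backoff."""

    def __init__(self, first_sample: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0):
        self.est_rtt = first_sample  # assumption: seed EstRTT with the first sample
        self.dev = 0.0
        self.delta = delta           # assumption: any factor between 0 and 1 works
        self.mu = mu
        self.phi = phi
        self.backoff = 1             # doubled after each retransmission

    def on_sample(self, sample_rtt: float, retransmitted: bool = False) -> None:
        """Fold one SampleRTT into EstRTT and Dev."""
        if retransmitted:
            return  # Karn/Partridge: do not sample RTT when retransmitting
        diff = sample_rtt - self.est_rtt
        self.est_rtt += self.delta * diff
        self.dev += self.delta * (abs(diff) - self.dev)
        self.backoff = 1  # assumption: a clean sample ends the backoff

    def on_retransmission(self) -> None:
        """Karn/Partridge: double the timeout after each retransmission."""
        self.backoff *= 2

    def timeout(self) -> float:
        """TimeOut = mu x EstRTT + phi x Dev, scaled by the current backoff."""
        return self.backoff * (self.mu * self.est_rtt + self.phi * self.dev)
```

Starting from a 100 ms estimate, a single 200 ms sample moves EstRTT to 112.5 ms and Dev to 12.5 ms, giving a timeout of 162.5 ms. The deviation term is what lets the timeout widen quickly when samples become erratic.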