Transport Layer

We already know the functions provided by the transport layer, because we needed to talk about them in order to describe how applications make use of the services provided by the transport layer. Let's quickly review.

The transport layer must provide a multiplexing and demultiplexing service. This is its most important function. For outgoing data, the transport layer attaches a transport header that includes port numbers identifying the sending and receiving processes. It then hands the transport protocol data unit (PDU) to the network layer for transmission. This is a multiplexing function: messages from many sources (applications) are multiplexed into a single stream of segments that are handed to the network layer. For incoming data, the transport layer examines the destination port number in the transport header and uses it to choose the application (process) that should receive the message carried in the transport PDU. This is a demultiplexing function: segments arriving from the network layer are distributed to the correct application.

The transport layer could also provide reliable data transfer, integrity (authentication and encryption), and quality of service (QoS) guarantees.

By analogy, then, this would be the place to consider the services that the transport layer might want from the network layer. The essential service provided by the network layer is delivery of a packet from one end system to another. But what about other services that the network layer could provide? This discussion is conspicuous by its absence from the start of Chapter 3. Skipping ahead to Chapter 4 offers a possible explanation: the list of services that could be provided by the network layer looks a lot like the list of services that could be provided by the transport layer. The Internet transport protocols make the weakest possible assumption about the network layer: best-effort delivery between end systems. This is by design! TCP and UDP can run on top of any network layer.

1 June, 2012
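A minimal sketch of the multiplexing and demultiplexing idea, in Python (the language these notes use later for protocol code). The segment layout (a tuple of source port, destination port, payload) and the class and method names are hypothetical; a real transport layer also tracks network addresses, and the per-socket queues live inside the operating system.

```python
from collections import deque

# Hypothetical segment layout for this sketch: (src_port, dst_port, payload).
class TransportDemux:
    """Toy model of transport-layer multiplexing/demultiplexing by port number."""

    def __init__(self):
        self.queues = {}                 # port -> receive queue (stands in for a socket)

    def bind(self, port):
        if port in self.queues:
            raise ValueError(f"port {port} is already in use")
        self.queues[port] = deque()
        return self.queues[port]

    @staticmethod
    def multiplex(src_port, dst_port, payload):
        # Outgoing: attach a header carrying both port numbers.
        return (src_port, dst_port, payload)

    def demultiplex(self, segment):
        # Incoming: the destination port selects the receiving process.
        src_port, dst_port, payload = segment
        queue = self.queues.get(dst_port)
        if queue is None:
            return False                 # nobody is bound to this port: drop it
        queue.append((src_port, payload))
        return True


demux = TransportDemux()
web = demux.bind(80)
dns = demux.bind(53)
demux.demultiplex(TransportDemux.multiplex(50000, 80, b"GET /"))
demux.demultiplex(TransportDemux.multiplex(50001, 53, b"query"))
print(list(web), list(dns))
```

Many applications share one outgoing stream of segments, and the destination port alone decides which queue an incoming segment lands in.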

If there were any serious competitors at the transport level, it'd be worth having a discussion of possible network layer services and how the presence or absence of those services would affect the design of transport protocols. But there aren't, so we won't.

The Internet protocol suite provides two transport protocols, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). UDP provides only multiplexing and demultiplexing. TCP provides reliable data transfer in addition to multiplexing and demultiplexing. A bolt-on module, TLS (Transport Layer Security), can be used to add integrity to TCP; its datagram variant, DTLS, does the same for UDP. The Internet transport protocols provide no QoS guarantees, because the underlying network may not be capable of supporting such guarantees.

Port numbers allow TCP and UDP to identify specific processes. A port number is associated with exactly one process, but a process can acquire multiple port numbers. In the Internet protocols the allowable range of port numbers is 0-65535 ($2^{16} - 1$). This range is divided into system ports (0-1023), user ports (1024-49151), and dynamic ports (49152-65535). The IANA assigns port numbers[1] from the system range for services associated with standard Internet protocols. It also administers the use of ports in the user range as a convenience for network application developers.

In the socket API, each port number is associated with a socket, the object created by an application to access the services of the transport layer. As we already know, it's actually a bit more complicated. A socket using the UDP protocol is associated with a local port number and network address. Each time an application wants to use the socket to send a message, it must specify the destination port and network address. The destination can be different for each use of the socket.

[1] See http://www.iana.org/protocols and scroll down the page to the section with the heading Port Numbers. The Service Name and Transport Protocol Port Number Registry lists all system and user port assignments.
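The three port ranges quoted above can be written as a small classifier. This only illustrates the ranges as stated; the function name is made up.

```python
def port_range(port: int) -> str:
    """Classify a port number using the IANA ranges quoted above."""
    if not 0 <= port <= 65535:          # a 16-bit field: 0 .. 2**16 - 1
        raise ValueError(f"{port} is not a valid port number")
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "user"
    return "dynamic"

for p in (80, 8080, 50000):
    print(p, port_range(p))
```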

A socket using the TCP protocol is associated with a local port number and network address and, as part of TCP connection setup, with a remote port number and network address. Once the connection is established, the application does not need to specify the destination with each message, but the destination cannot be changed.

In order to send a message to an application, we must somehow know a port number for a socket associated with that application. In the Internet, this is solved by using well-known port numbers (the system and user ports mentioned above). Using the socket API, we can request that a specific well-known port number be associated with a socket created by the application.

Reliable Data Transfer

Let's dive right into one of the core subjects of this chapter: reliable data transfer. To be reliable, we require that data be delivered without loss or error, and in the order in which it was sent.

Without going into details, detection of errors requires that extra information be transmitted with the data. The sender performs some calculation over the data and sends the result of this calculation along with the data. The receiver repeats the calculation and checks its result against the result sent with the data. If the two results agree, the receiver concludes that the data has been received without error (a small fraction of error patterns can slip past undetected). The amount of extra information required is surprisingly small. It is often called a checksum.

Assuming that we detect an error in data delivered to the local system, what can we do about it?
We could try to correct the error. Just as with error detection, error correction requires that extra information be transmitted with the data. The amount of extra information required for error correction is comparatively large, and the Internet transport protocols do not use this technique.
We can ask the sender to retransmit the data. This requires some care to do right, but it is a practical technique. We'll start simple and work up to the algorithms used in modern protocols.
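The notes don't commit to a particular checksum. As one concrete example, here is a sketch of the 16-bit ones' complement sum behind the Internet checksum (RFC 1071, the style used by UDP and TCP), leaving out the pseudo-header that UDP and TCP also cover. The function names are hypothetical. The sum catches any single-bit error, but some multi-bit errors can cancel out, which is the "small fraction" mentioned above.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data (padded with a zero byte if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # next 16-bit word, big-endian
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def checksum(data: bytes) -> int:
    """Sender side: the ones' complement of the sum."""
    return ~ones_complement_sum(data) & 0xFFFF

def is_intact(data: bytes, chk: int) -> bool:
    """Receiver side: repeat the sum and add the received checksum."""
    total = ones_complement_sum(data) + chk
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

msg = b"hello world!"
chk = checksum(msg)
damaged = bytes([msg[0] ^ 0x01]) + msg[1:]         # flip one bit in transit
print(is_intact(msg, chk), is_intact(damaged, chk))
```

Two bytes of checksum cover the whole message, which is the sense in which the extra information is "surprisingly small".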
To establish a trivial base case, consider a perfect channel[2] between the sender and receiver. No data is ever lost or corrupted, and data is delivered in the order that it's sent.

As shown in the figure, the implementation really is trivial. A single state suffices for each of the sender and receiver. The sender's transport layer waits for the application to send a message with a call to rdt_send. When it arrives, the transport layer wraps it in a transport layer segment with a call to wrap and hands it off to the network layer with a call to udt_send. The receiver's transport layer waits for the network layer to provide data with a call to rdt_rcv. When it arrives, the transport layer removes the message from the segment with a call to extract and delivers it to the application with a call to deliver_msg.

[Figure: rdt1.0 state machines. Sender, state wait for call from above: rdt_send(msg) / seg = wrap(msg); udt_send(seg). Receiver, state wait for call from below: rdt_rcv(seg) / msg = extract(seg); deliver_msg(msg).]

(The notation used in this and subsequent figures is the standard bubble diagram notation for a state machine. See the text for details if you're not familiar with it.) Nothing else is required. With a perfect channel, nothing can go wrong.

In reality, errors happen. Let's start by assuming that data can be corrupted but never entirely lost. By assumption, the receiver can detect data errors but doesn't have enough information to repair them. The receiver will need to ask the sender to retransmit the data. We've identified the three capabilities required to do this:
Error detection is needed so that the receiver is aware that there's a problem.
Receiver feedback is needed so that the sender is aware that there's a problem.
Retransmission of the data by the sender is necessary to fix the problem.

Figure 1 shows the enhanced state machines required for the sender and receiver. Each time the application hands a message to the sender's transport layer for transmission, the transport layer calculates a checksum over the message. The message and checksum are wrapped together in a segment and handed to the network layer for transmission.
[2] Because we don't want to be specific about the connection between the sender and receiver (it could be a single link or the Internet), we'll use the word channel for the connection between the sender and receiver.

[Figure 1: Sender and receiver state machines for rdt2.0. Sender states: wait for call from above, wait for ACK or NAK. Receiver state: wait for call from below, with transitions !corrupt(rcvseg) / msg = extract(seg), deliver_msg(msg), xmtseg = wrap(ack) and corrupt(rcvseg) / xmtseg = wrap(nak).]

The sender's work is not done. It must wait for a reply from the receiver. If the received segment is a NAK, the sender must retransmit the segment. If the received segment is an ACK, the sender can return to the initial state and wait for another message from the application.

Each time the network layer delivers a segment to the receiver's transport layer, the receiver must check that the segment is correct. To do this, it calculates a checksum for the received message and compares it to the checksum sent with the message. If the checksums match, the segment is correct and it can be delivered to the application. In addition, an ACK segment must be sent to the sender's transport layer so that it knows the segment was received without error. If the checksums don't match, the segment is corrupted and it is simply discarded. In addition, a NAK segment must be sent to the sender's transport layer so that it knows to retransmit the segment.

There are several problems with rdt2.0. The first is merely annoying. While the sender is in the wait for ACK or NAK state, it cannot accept new messages from the application. If the application calls rdt_send, it will block waiting for the transport layer to return to the wait for call from above state. This type of protocol, where the sender must stop and wait for an acknowledgement for each segment, is commonly called a stop-and-wait protocol.

The second is fundamental. If any segment can be corrupted, ACK and NAK segments can be corrupted. Suppose that the receiver sends a NAK segment that is corrupted en route. The sender will not recognise it as either an ACK or a NAK and will stay in the wait for ACK or NAK state. The receiver will take no further action; it will simply wait for the next segment to arrive. Our protocol is deadlocked: neither side will do anything more.

How can we fix this problem? It should be quickly apparent that adding a checksum to ACK and NAK segments won't help. The sender will know that the segment just received is corrupt, but that doesn't help: it's still sitting in state wait for ACK or NAK. Adding a capability to the transport protocol that allows the sender to request that the receiver retransmit an ACK or NAK won't help either. Both the sender and receiver will require additional states to deal with this new segment type, and a bit of thought should convince you that we've just moved the problem to the new states.

We could decide that if the sender receives a corrupt segment, it will assume the worst (NAK) and retransmit xmtseg. But what if the corrupted segment was really an ACK? Now the receiver will get a second copy of the message, with no way of knowing it's a copy. Let's pursue this last idea for a moment. Knowing that it's received a copy, the receiver could take action, discarding the copy and resending the ACK. This would tell the sender that the segment was received without error and allow it to return to wait for call from above to await the next message from the application. Successful error recovery!
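A sketch of the rdt2.0 receiver, and of the sender's handling of replies in the wait for ACK or NAK state, following Figure 1. The segment representation and the use of CRC-32 (`zlib.crc32`) as the checksum are assumptions for illustration, and all function names are hypothetical. The last case reproduces the fundamental problem described above: a garbled reply leaves the sender with no defined action.

```python
import zlib

def wrap(kind, payload=b""):
    """Hypothetical segment: (kind, payload, CRC-32 over kind and payload)."""
    return (kind, payload, zlib.crc32(kind.encode() + payload))

def corrupt(seg):
    kind, payload, chk = seg
    return zlib.crc32(kind.encode() + payload) != chk

def receiver_on_segment(rcvseg, delivered):
    """rdt2.0 receiver: deliver and ACK a good segment, NAK a corrupt one."""
    if corrupt(rcvseg):
        return wrap("NAK")                 # discard; ask for a retransmission
    delivered.append(rcvseg[1])            # extract + deliver_msg
    return wrap("ACK")

def sender_on_reply(xmtseg, reply):
    """rdt2.0 sender in 'wait for ACK or NAK'."""
    if corrupt(reply):
        return "stuck", None               # neither ACK nor NAK: rdt2.0 has no rule
    if reply[0] == "NAK":
        return "resend", xmtseg            # udt_send(xmtseg) again
    return "done", None                    # ACK: back to 'wait for call from above'

delivered = []
seg = wrap("DATA", b"hello")
damaged = ("DATA", b"hellp", seg[2])       # payload damaged in transit
print(sender_on_reply(seg, receiver_on_segment(damaged, delivered)))
print(sender_on_reply(seg, receiver_on_segment(seg, delivered)), delivered)
garbled_ack = ("ACK", b"", wrap("ACK")[2] ^ 1)
print(sender_on_reply(seg, garbled_ack))
```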
The technique that we'll adopt to allow the receiver to detect a duplicate is sequence numbers. Now we have a new set of questions to answer. How many sequence numbers do we need? Here, two (0 and 1) would suffice, because there's only one segment being transmitted (commonly referred to as "in flight") at any given moment. If the receiver has sent an ACK for a segment with sequence number 0 and it receives another segment with sequence number 0, it knows that its ACK didn't get through and it can send it again. A bit of thought should convince you that, in general, if we want to have N-1 segments in flight we need N sequence numbers.

While we're at it, consider that when the receiver receives a segment with the wrong sequence number, it knows that the previous ACK was not received and it should send it again. We could apply the same logic at the sender if the ACK also carried sequence numbers. Suppose that each ACK contains the sequence number of the last segment that was correctly received. If this isn't the same sequence number as the segment that was just transmitted, the sender knows that it must retransmit the segment.

That brings us to rdt2.2, shown in Figure 2. The state machines for the sender and receiver have been doubled so that we can use state to keep track of the segment sequence number. Assume that the sender and receiver each start in the state identified by the dashed arrow.

Let's see what happens when a segment is transmitted without error. The sender starts in state wait for call 0 from above. When the application hands a message to the transport layer, it's wrapped in a segment along with a checksum, assigned a sequence number of 0, and handed to the network layer for transmission. The sender now moves to state wait for ACK 0 to receive the reply from the receiver. The receiver starts in state wait for call 0 from below. If the segment is received without error (i.e., not corrupted and with sequence number 0), the receiver will deliver the message to the application, send back an ACK message with sequence number 0 (ACK(0)), and move to wait for call 1 from below to await delivery of the next segment. When the sender receives ACK(0), it knows that the segment with sequence number 0 was received without error.
It moves to state wait for call 1 from above to await the next message from the application.

But suppose the segment is corrupted on the way to the receiver. The receiver will discard the segment, send back ACK(1), and remain in state wait for call 0 from below in anticipation that the sender will retransmit the segment.
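A small simulation of rdt2.2 (an alternating-bit protocol) following Figure 2, under stated assumptions: the channel never loses or reorders segments but damages each one with probability p_corrupt, damage is modelled by flipping a checksum bit, and CRC-32 stands in for the checksum. All names are hypothetical. The receiver re-ACKs the last good sequence number on a corrupt or duplicate segment, and the sender retransmits until it sees an uncorrupted ACK carrying the sequence number it sent.

```python
import random
import zlib

def make_seg(seq, payload):
    """Hypothetical segment: [seq, payload, CRC-32 over seq and payload]."""
    return [seq, payload, zlib.crc32(bytes([seq]) + payload)]

def is_corrupt(seg):
    seq, payload, chk = seg
    return zlib.crc32(bytes([seq]) + payload) != chk

def channel(seg, rng, p_corrupt):
    """Assumed channel: never loses or reorders, sometimes damages a segment."""
    seg = list(seg)
    if rng.random() < p_corrupt:
        seg[2] ^= 1                         # damage shows up as a checksum mismatch
    return seg

class Receiver22:
    def __init__(self):
        self.expected = 0                   # 'wait for call 0 from below'
        self.delivered = []

    def on_segment(self, seg):
        if not is_corrupt(seg) and seg[0] == self.expected:
            self.delivered.append(seg[1])
            ack = make_seg(self.expected, b"ACK")
            self.expected ^= 1              # move to the other 'wait for call' state
            return ack
        # Corrupt or duplicate: repeat the ACK for the last good segment.
        return make_seg(self.expected ^ 1, b"ACK")

def send_all(messages, p_corrupt=0.3, seed=1):
    rng = random.Random(seed)
    rx, seq, transmissions = Receiver22(), 0, 0
    for msg in messages:
        xmtseg = make_seg(seq, msg)
        while True:
            transmissions += 1
            reply = channel(rx.on_segment(channel(xmtseg, rng, p_corrupt)), rng, p_corrupt)
            if not is_corrupt(reply) and reply[0] == seq:
                break                       # ACK(seq): on to the next 'wait for call' state
            # corrupt reply or ACK for the other sequence number: retransmit
        seq ^= 1
    return rx.delivered, transmissions

msgs = [b"a", b"b", b"c", b"d"]
print(send_all(msgs))
```

Every message is delivered exactly once, even though corrupted ACKs cause the sender to retransmit segments the receiver already has.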

[Figure 2: Sender and receiver state machines for rdt2.2. Sender states: wait for call 0 from above, wait for ACK 0, wait for call 1 from above, wait for ACK 1. Receiver states: wait for call 0 from below, wait for call 1 from below.]

Receipt of ACK(1) will cause the sender to retransmit the segment with sequence number 0. This exchange will repeat until the segment is received without error.

Suppose that the segment is received correctly by the receiver but the ACK is corrupted on its way back to the sender. On the receiver's side, everything looks good. It delivers the message to the application and moves to state wait for call 1 from below to await the next segment. The sender receives a corrupt segment and responds by resending the segment with sequence number 0. When the receiver sees the segment with sequence number 0, it discards the segment and repeats ACK(0). This sequence will repeat until ACK(0) is received without error by the sender. When this happens, the sender knows that the segment with sequence number 0 was received without error and it can move to state wait for call 1 from above.

In general, if the sender receives an ACK with a sequence number that does not match the segment it just transmitted, it knows that the receiver did not receive that segment correctly. If the receiver receives a duplicate segment, it knows that the sender did not receive the ACK for that segment. If we want to have many segments in flight, we'll need to use a variable to keep track of the sequence numbers instead of creating more states.

Let's add the second type of error: loss of a segment. In the current Internet, the most common reason is that a router has dropped a packet because of congestion (no space in the transmit buffer for a link). Once either the sender or the receiver recognises that a segment has gone missing, we can easily recover from the error by retransmitting the segment. But how can we detect the absence of something? The first thing to note is that in order to realise something hasn't arrived, we must be expecting something to arrive.
Given that we're expecting something to arrive, we probably have some notion of when, and that allows us to say "the thing I'm expecting hasn't arrived in a reasonable amount of time." The technique is called a timeout. For example, when the sender hands a segment to the network layer for transmission to the receiver, it knows that it should receive an ACK within a reasonable amount of time. It can set a timer to go off at the end of the interval.

Computers don't do well with "reasonable," however, so we'll need to be more specific. One possibility is to keep track of the average time between handing a segment to the network layer and receiving an ACK in reply. This is the average round trip time (RTT), and it's an estimate of the minimum interval before the sender can expect an ACK from the receiver. The trick is to wait long enough to be reasonably certain of loss, but not so long that it takes an unacceptably long time to recover from loss of a segment. This balancing act introduces the possibility of unnecessary retransmissions, but fortunately our protocol already copes with duplicates.

The state machines for rdt3.0 are shown in Figure 3. Each time a segment is handed to the network layer (udt_send), a timer is started. When an ACK indicates that the segment was received without error, the timer is stopped. In each wait for ACK state, the sender now relies completely on the timeout to trigger retransmission. Arguably this is not the best choice: the protocol might recover faster if we kept the behaviour of rdt2.2 and retransmitted the segment on receipt of a corrupted segment or an ACK with the wrong sequence number. The sender must also consider the possibility that an ACK will arrive in one of the wait for call from above states, triggered by an unnecessary retransmission of a segment.

Notice that the receiver's state machine is unchanged from rdt2.2. The receiver already has the capability to recognise and discard duplicate segments. The receiver has no use for a timeout, because it has no way to know when (or whether) another segment should arrive: each segment might be the last. The sender, on the other hand, expects to receive an ACK for each segment it transmits.

The text illustrates the operation of rdt3.0 with four scenarios (Figure 3.16). Only the scenario for a premature timeout by the sender (Figure 3.16(d)) is discussed here.
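A sketch of the rdt3.0 sender's event handlers, replaying the premature-timeout scenario from Figure 3.16(d). The timer is reduced to a flag, udt_send is any callable, and the segment format and names are hypothetical. As in Figure 3, a corrupt or wrong-numbered ACK is ignored and retransmission happens only on timeout; an ACK arriving in a wait for call from above state is discarded.

```python
import zlib

def make_seg(seq, payload):
    """Hypothetical segment: (seq, payload, CRC-32 over seq and payload)."""
    return (seq, payload, zlib.crc32(bytes([seq]) + payload))

def is_corrupt(seg):
    return zlib.crc32(bytes([seg[0]]) + seg[1]) != seg[2]

class Rdt30Sender:
    """rdt3.0 sender as event handlers; the timer is just a flag here."""

    def __init__(self, udt_send):
        self.udt_send = udt_send           # hands a segment to the network layer
        self.seq = 0
        self.xmtseg = None                 # None means 'wait for call from above'
        self.timer_running = False

    def rdt_send(self, msg):
        if self.xmtseg is not None:
            return False                   # still waiting for an ACK
        self.xmtseg = make_seg(self.seq, msg)
        self.udt_send(self.xmtseg)
        self.timer_running = True          # start_timer
        return True

    def rdt_rcv(self, rcvseg):
        if self.xmtseg is None:
            return                         # e.g. a duplicate ACK: ignore (Λ)
        if not is_corrupt(rcvseg) and rcvseg[0] == self.seq:
            self.timer_running = False     # stop_timer
            self.xmtseg = None
            self.seq ^= 1
        # corrupt ACK or wrong sequence number: do nothing, rely on the timeout

    def timeout(self):
        if self.xmtseg is not None:
            self.udt_send(self.xmtseg)     # retransmit
            self.timer_running = True      # start_timer again

# The premature-timeout scenario.
sent = []
s = Rdt30Sender(sent.append)
s.rdt_send(b"first")
s.rdt_rcv(make_seg(0, b"ACK"))
s.rdt_send(b"second")
s.timeout()                                # ACK(1) is late: resend seg(1)
s.rdt_rcv(make_seg(1, b"ACK"))             # original ACK(1)
s.rdt_rcv(make_seg(1, b"ACK"))             # duplicate ACK(1), ignored
print([seg[0] for seg in sent], s.seq)
```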
The timeline is modified slightly to show how the sender might receive an ACK while in a wait for call from above state.

[Figure 3: Sender and receiver state machines for rdt3.0. Sender states: wait for call 0 from above, wait for ACK 0, wait for call 1 from above, wait for ACK 1; each rdt_send and timeout transition calls start_timer, each correct ACK calls stop_timer, and rdt_rcv(rcvseg) in a wait for call state does nothing (Λ). Receiver states (unchanged from rdt2.2): wait for call 0 from below, wait for call 1 from below.]

[Timeline: premature timeout. Sender: send seg(0); rcv ACK(0); send seg(1); timeout!; resend seg(1); rcv ACK(1); rcv ACK(1) (dup); send seg(0). Receiver: rcv seg(0); send ACK(0); rcv seg(1); ...delay...; send ACK(1); rcv seg(1) (dup); send ACK(1).]

The segment with sequence number 0 is transmitted and acknowledged without error. When the segment with sequence number 1 arrives, the receiver is busy and there is some delay before it transmits ACK(1), enough that the sender times out and retransmits the segment with sequence number 1. After the retransmission, the original ACK(1) arrives from the receiver. The sender processes the ACK and moves to state wait for call 0 from above, where it waits for the application to provide another message. Meanwhile, the receiver has processed the duplicate segment with sequence number 1, sending ACK(1) in response. Because the application hasn't generated a new message, the sender is still in state wait for call 0 from above when the duplicate ACK(1) arrives.

The protocols we've just explored are stop-and-wait protocols: the sender will wait for an acknowledgement before sending the next segment. Is there room for improvement? Let's do a quick calculation. Transcontinental distances are on the order of 4000-5000 km; transoceanic distances up to 9000 km. Signal propagation speeds are on the order of $2 \times 10^8$ to $3 \times 10^8$ m/sec. To an order of magnitude, $d_{prop}$ for a segment will be $10^{-2}$ sec: a few tens of msec. This doesn't account for the other nodal delays; some quick tests with ping show an average delay of 80-90 msec to the east coast of North America, and 200 msec across the Pacific.

How long will it take us to transmit a typical segment? The size of a typical Ethernet frame is 1500 bytes, or 12000 bits. For convenience, let's use a segment size of 10 kb and transmit it over gigabit Ethernet links at $10^9$ b/sec. The transmission time for a segment is on the order of $10^{-5}$ sec: about 10 µsec.
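The back-of-the-envelope numbers can be checked with the usual stop-and-wait utilisation formula, $U = d_{trans}/(RTT + d_{trans})$ with $d_{trans} = L/R$. The 30 msec RTT is an assumption standing in for "a few tens of msec", and the function names are made up. The second function estimates how many segments would have to be in flight to keep the link busy until the first ACK returns.

```python
import math

def stop_and_wait_utilisation(seg_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy: d_trans / (RTT + d_trans)."""
    d_trans = seg_bits / rate_bps
    return d_trans / (rtt_s + d_trans)

def segments_to_fill_pipe(seg_bits, rate_bps, rtt_s):
    """Segments in flight needed to keep sending until the first ACK returns."""
    d_trans = seg_bits / rate_bps
    return math.ceil((rtt_s + d_trans) / d_trans)

SEG_BITS = 10_000    # 10 kb segment, as in the text
RATE = 1e9           # gigabit link
RTT = 30e-3          # assumed: "a few tens of msec"
print(stop_and_wait_utilisation(SEG_BITS, RATE, RTT))
print(segments_to_fill_pipe(SEG_BITS, RATE, RTT))
```

The utilisation comes out around 0.0003, and roughly three thousand segments would need to be in flight to fill the pipe.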

Bottom line: it takes about 10 µsec to transmit the segment, after which the sender waits tens of msec for the ACK. We're using less than 1/1000 of the available bandwidth! Surely there must be some way to improve on this!

Using larger segments could help, but that brings its own problems in a network based on store-and-forward routers. Question P31 from Chapter 1 explored the advantage of breaking a single large segment into multiple small segments for transmission. The key lesson is that we want to have many segments in flight, reducing the total transmission time.

Let's explore this idea, and see how to design a reliable data transmission protocol that allows for many segments in flight between two end systems. There are two variations, called go-back-n and selective repeat.

In a go-back-n protocol, the sender puts many segments in flight toward the receiver. The receiver sends back acknowledgements as each segment is received. If a segment is lost or corrupted, the receiver requests retransmission of the missing segment and discards all segments that arrive until the missing segment is received. This simplifies the design of the receiver, but many segments may be transmitted correctly, discarded, and retransmitted in the course of error recovery.

In a selective repeat protocol, the sender puts many segments in flight toward the receiver. The receiver sends back acknowledgements as each segment is received. If a segment is lost or corrupted, the receiver requests retransmission of the missing segment and buffers all segments that arrive without error until the missing segment is received. This complicates the design of the receiver, but error recovery is limited to retransmission of the missing segment(s).

Implementing either protocol has some implications:
As mentioned earlier, if we want to have N-1 segments in flight, we will need at least N sequence numbers.
We will need to introduce variables to handle the bookkeeping; it's not practical to keep adding states to the protocol. Typically, N is determined by the number of bits available to hold the sequence number. Sequence numbers with k bits have $N = 2^k$ distinct values, $0$ to $2^k - 1$, hence the window size is $N - 1 = 2^k - 1$.

The sender must be able to retransmit any segment that has been transmitted but not acknowledged by the receiver. That means that the sender needs enough buffer space to store all segments in flight until they are acknowledged. The buffer requirement at the receiver will depend on the choice of protocol.

Before we talk about the details of either protocol, let's justify the assertion that N sequence numbers are necessary for N-1 segments in flight. The figure below illustrates the proper use of sequence numbers for a window of size N-1 = 7.

[Figure: sequence numbers 7 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6, from past to future, divided into segments acknowledged, segments in flight, and segments not yet transmitted.]

At any one time, we can have a maximum of 7 segments in flight: transmitted but not yet acknowledged. Suppose that the window size was 8 and the sender transmitted an eighth segment. As shown in the following figure, it would have sequence number 3.

[Figure: the same sequence numbers from the sender's view (segments acknowledged, segments in flight, segments not yet transmitted) and from the receiver's view (segments acknowledged, expected sequence number, segments not yet transmitted).]

Now, suppose that the receiver has actually received all the segments transmitted by the sender (segments #4, #5, ..., #2, #3), but for some reason all the acknowledgements have been lost. The receiver is expecting a segment with sequence number 4. If the sender retransmits the oldest unacknowledged segment (with sequence number 4), the receiver will think it's a new segment and will accept the duplicate.
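The sequence-number argument above fits in a few lines. This sketch (hypothetical function names) computes the go-back-n window for k-bit sequence numbers and checks the failure case from the figure: after a full window of W segments starting at oldest is received with every ACK lost, the receiver expects (oldest + W) mod N, which equals oldest only when W is a multiple of N.

```python
def gbn_window(k):
    """k-bit sequence numbers give N = 2**k values and a window of N - 1."""
    return 2**k - 1

def duplicate_accepted(N, W, oldest):
    """Sender has W segments in flight, the oldest numbered `oldest`.
    Every segment arrived but every ACK was lost, so the receiver now expects
    (oldest + W) mod N.  Would a retransmission of the oldest segment be
    mistaken for new data?"""
    expected = (oldest + W) % N
    return expected == oldest

print(gbn_window(3))                       # window for 3-bit sequence numbers
print(duplicate_accepted(8, 8, 4))         # window of 8: the figure's failure
print(any(duplicate_accepted(8, W, o) for W in range(1, 8) for o in range(8)))
```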

The general problem is that errors in the channel can cause the sender and receiver to have very different views of the window of legal sequence numbers. Given N sequence numbers, we can have at most N-1 segments in flight.

The figures invite another interpretation. At any given time, we can have N-1 segments in flight. Considered in terms of the infinite sequence of past, present, and future segments, it's as if we are sliding a window of size N-1 over the sequence of segments. Another name for this type of protocol is a sliding window protocol.

Now let's consider the details for a go-back-n protocol. What actions are required of the sender? There's only one state, so we can dispense with the bubble diagram[3]. Assume that sequence numbers range from 0 to N-1 and the first segment to be transmitted will receive sequence number 0. Create variables oldest and newest to track the sequence numbers of the oldest and newest segments in flight, and inflight to count the number of segments in flight. Let's consider how the sender responds to the possible events: message from application, receive ACK, corrupt segment, and timeout. Initialise oldest to 0, newest to N-1, and inflight to 0.

Message from application: When the application tries to transmit a message, the transport layer must decide if there's room in the window. If so, it can transmit the segment and place a copy in the buffer holding segments in flight. If not, it must refuse the message.

    if inflight < N-1:
        newest = (newest + 1) % N
        segbuffer[newest] = wrap(newest, msg, chksum)
        udt_send(segbuffer[newest])
        inflight = inflight + 1
        if newest == oldest:      # window was empty: this segment is now the oldest
            start_timer()
    else:
        refuse(msg)

Notice that we're only concerned with a timeout on the ACK for the oldest segment; there's no need to keep a timer for other segments.

[3] The presentation of go-back-n and selective repeat in these notes restates the presentation in the text in Python and makes explicit the modulo-N arithmetic used for sequence numbers. You should convince yourself that the presentations are equivalent.

Receive ACK: If an ACK arrives without error, we can take that as an indication that the receiver has correctly received all segments with sequence numbers up to the sequence number in the ACK message. It may well be that some previous ACK has been lost; that's OK. Assume that diffmodn calculates the difference between two sequence numbers using mod N arithmetic and returns a non-negative value[4] between 0 and N-1.

    def diffmodn(a, b):
        return (a - b) % N        # Python's % never returns a negative value for N > 0

    ackseq = seq(rcvseg)
    cnt = diffmodn(ackseq, oldest) + 1
    if cnt < N:
        inflight = inflight - cnt
        oldest = ackseq
        if oldest == newest:      # every segment in flight is now acknowledged
            stop_timer()
        else:
            start_timer()         # restart for the new oldest segment
        oldest = (oldest + 1) % N

It could be that this ACK is a request for retransmission of the segment with sequence number oldest. In this case[5] the ACK will specify a sequence number that's (oldest-1) mod N, so cnt is N. No previously unacknowledged segment is acknowledged, hence the transmit window is unchanged. Otherwise, the ACK will be for some segment in the transmit window. The count of segments in flight is reduced accordingly and the base of the window moves forward to one past the segment just acknowledged. If this ACK acknowledges the newest segment (the one most recently transmitted), then we have no segments in flight, so stop the timer. If there are segments still in flight, restart the timer (this may result in a longer timeout for the oldest segment still in flight).

Corrupt segment: There's nothing to be done when this happens. It may be that we'll receive an uncorrupted ACK in a bit, in which case the loss of this ACK won't matter. Or maybe we won't, in which case the timer will go off and we'll retransmit.

Timeout: We haven't received an ACK for the oldest segment in flight, and it's well past time for that to happen. Assume that there's been an error and resend everything.

    udt_send(segbuffer[oldest])
    start_timer()
    xmtseq = (oldest + 1) % N
    cnt = 1
    while cnt < inflight:
        udt_send(segbuffer[xmtseq])
        cnt = cnt + 1
        xmtseq = (xmtseq + 1) % N

[4] In other words, diffmodn returns the result of counting forward from oldest to ackseq.
[5] To justify the test cnt < N, recall that $(k - 1) \bmod N = (k + (N - 1)) \bmod N$.

What actions are required of the receiver? Really, there are only two events of interest: the arrival of an uncorrupted segment with the expected sequence number (the "correct segment"), and the arrival of any other segment, corrupt or not ("default"). Assume a variable expected that contains the expected sequence number. Initialise expected to 0.

Correct segment: The receiver should extract the message, pass it to the application, and send an acknowledgement to the sender.

    msg = extract(rcvseg)
    deliver_msg(msg)
    ackseg = wrap(expected, ack, chksum)
    udt_send(ackseg)
    expected = (expected + 1) % N

Default: If anything else arrives, it's wrong. Repeat the acknowledgement of the last segment correctly received.

    ackseq = (expected + (N - 1)) % N     # i.e. (expected - 1) mod N
    ackseg = wrap(ackseq, ack, chksum)
    udt_send(ackseg)

We can summarise the windows at the sender and receiver as follows. At the sender, the oldest segment of interest is the oldest segment that's been transmitted but not acknowledged. The window of available sequence numbers is anchored here. The window advances each time the oldest segment is acknowledged by the receiver. At the receiver, the only segment of interest is the expected segment, a trivial window of size 1. The window advances each time the expected segment is received.
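Putting the go-back-n pieces together, here is a small, self-contained simulation of the sender and receiver actions described above. Assumptions: the channel loses segments with probability p_loss but never reorders them (corruption is treated as loss), time advances in ticks of one round trip, the timer counts ticks, and segments are (seq, msg) tuples without checksums. Class and function names are hypothetical; the bookkeeping (oldest, newest, inflight, segbuffer, expected) follows the notes.

```python
import random
from collections import deque

class GBNSender:
    def __init__(self, N, udt_send, timeout_ticks):
        self.N, self.udt_send, self.timeout_ticks = N, udt_send, timeout_ticks
        self.oldest, self.newest, self.inflight = 0, N - 1, 0
        self.segbuffer = [None] * N
        self.timer = None                      # ticks until timeout, or None if stopped

    def rdt_send(self, msg):
        if self.inflight >= self.N - 1:
            return False                       # refuse(msg): window is full
        self.newest = (self.newest + 1) % self.N
        self.segbuffer[self.newest] = (self.newest, msg)
        self.udt_send(self.segbuffer[self.newest])
        self.inflight += 1
        if self.newest == self.oldest:         # window was empty: time this segment
            self.timer = self.timeout_ticks
        return True

    def on_ack(self, ackseq):
        cnt = (ackseq - self.oldest) % self.N + 1    # diffmodn(ackseq, oldest) + 1
        if cnt < self.N:                             # cnt == N: ACK(oldest-1), no news
            self.inflight -= cnt
            self.timer = None if ackseq == self.newest else self.timeout_ticks
            self.oldest = (ackseq + 1) % self.N

    def tick(self):
        if self.timer is not None:
            self.timer -= 1
            if self.timer == 0:                      # timeout: resend the whole window
                for i in range(self.inflight):
                    self.udt_send(self.segbuffer[(self.oldest + i) % self.N])
                self.timer = self.timeout_ticks

class GBNReceiver:
    def __init__(self, N, udt_send):
        self.N, self.udt_send = N, udt_send
        self.expected = 0
        self.delivered = []

    def rdt_rcv(self, seg):
        seq, msg = seg
        if seq == self.expected:                     # correct segment
            self.delivered.append(msg)
            self.udt_send(self.expected)             # ACK(expected)
            self.expected = (self.expected + 1) % self.N
        else:                                        # default: re-ACK last good segment
            self.udt_send((self.expected + self.N - 1) % self.N)

def simulate(messages, N=8, p_loss=0.2, timeout_ticks=3, seed=7):
    """Lossy but in-order channel; one tick is one round trip."""
    rng = random.Random(seed)
    fwd, rev = deque(), deque()

    def lossy(queue):
        def send(seg):
            if rng.random() >= p_loss:               # otherwise the segment is lost
                queue.append(seg)
        return send

    tx = GBNSender(N, lossy(fwd), timeout_ticks)
    rx = GBNReceiver(N, lossy(rev))
    pending, ticks = deque(messages), 0
    while pending or tx.inflight:
        while pending and tx.rdt_send(pending[0]):   # fill the window
            pending.popleft()
        while fwd:
            rx.rdt_rcv(fwd.popleft())
        while rev:
            tx.on_ack(rev.popleft())
        tx.tick()
        ticks += 1
    return rx.delivered, ticks

msgs = [f"m{i}".encode() for i in range(30)]
delivered, ticks = simulate(msgs)
print(delivered == msgs, ticks)
```

With p_loss=0.0 the 30 messages go through in 5 ticks (7 segments per tick). With losses, timeouts resend the window and the run takes longer, but delivery stays complete and in order.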

Ideally, we will get maximum utilisation of the channel if the time required to transmit a full set of segments (where a full set is defined to be the length of the sender's window) is equal to the RTT of the channel, in seconds. The acknowledgement for the first segment transmitted would arrive at the sender just as it finishes transmitting the segment that fills the window. If acknowledgements arrive on schedule, there will be no gaps in the outgoing stream of segments.

The problem with go-back-n is that, in the worst case, error recovery can require retransmission of one entire window of segments. Over a channel with long latency and high bandwidth, this can be a very large amount of data. Can we do better if we adopt the policy that the receiver will acknowledge and keep all segments received without error and the sender will retransmit only those segments that are not acknowledged? To achieve in-order delivery, the receiver will need to buffer segments while it requests retransmission of a missing segment and awaits its arrival. If we can buffer segments at the sender, we can surely manage it at the receiver. The receiver will have a nontrivial window, as we're willing to accept segments that arrive early and buffer them until they can be passed to the application.

There's a subtle problem stemming from the fact that the sender and receiver may not agree on the position of the window. The following figure[6] illustrates the problem.

[6] This is Figure 3.23 from the text, with sequence numbers and annotations to make clear the story told by the figure. The window size (14) is an odd choice, but only because it's not $2^k - 1$.

[Figure: sender's and receiver's views of sequence numbers 0 to 14. Sender: segments acknowledged (past), the window from oldest to newest (used) followed by available sequence numbers, and segments not yet transmitted (future). Receiver: segments acknowledged and delivered to the application, the window from oldest (not received) to newest (not received), and segments not yet received.]

The scenario in the figure shows that the sender thinks it has transmitted all segments up to segment #3 (the portion of the window labelled used) but hasn't received acknowledgements for segments #11 or #12. The receiver is awaiting segment #0. It thinks it has acknowledged all segments up to segment #14, and has also received and acknowledged segments #1, #2, and #3. This implies that ACK(11) and ACK(12) (the acknowledgements for segments #11 and #12) were corrupted or lost, that segment #0 was corrupted or lost, and that ACK(3) is in flight, corrupted, or lost.

Suppose that the sender's timer expires and it decides to retransmit segment #11. The receiver will receive a segment with sequence number 11. The receiver's window extends to sequence number 13, so the arrival of segment #11 is within expectation, if a bit fast. The receiver will incorrectly accept this duplicate segment and buffer it for delivery once the intervening segments arrive. The acknowledgement will satisfy the sender, and the error will go undetected.

But we can't just ignore this error pattern. Acknowledgements will get lost, just as shown here, and there must be some way to clear the lack of acknowledgement at the sender so that it can advance its window.
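A brute-force check makes the overlap concrete. This is an illustrative sketch, and the function name is hypothetical. It places the sender's oldest unacknowledged segment at sequence number 0 and lets the receiver be anywhere from 0 to W segments ahead (the whole window received, every ACK lost). It then asks whether a sequence number the receiver has already delivered also falls inside the receiver's current window, as 11 does in the figure with N = 15 and W = 14.

```python
def window_is_ambiguous(n: int, w: int) -> bool:
    """Hypothetical check: True if a retransmitted, already-delivered segment
    can carry a sequence number the receiver would accept as new."""
    s = 0  # sender's oldest unacknowledged sequence number (any value works)
    # The receiver can be 0..w segments ahead: it may have received the whole
    # sender window while every acknowledgement was lost.
    for r in range(s, s + w + 1):
        past = {x % n for x in range(s, r)}        # delivered, ACKs not yet seen by sender
        future = {y % n for y in range(r, r + w)}  # receiver's current window
        if past & future:
            return True
    return False


print(window_is_ambiguous(15, 14))  # True: the scenario in the figure
print(window_is_ambiguous(4, 3))    # True: Figure 3.27 in the text
print(window_is_ambiguous(4, 1))    # False
```

With W = (N-1)//2 the check always returns False. The two windows together then span at most 2W ≤ N-1 consecutive sequence numbers, so no number can appear in both.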

A bit of thought should convince you that every segment in the combined window must have a unique sequence number. This combined window extends from the leftmost (past) edge of the sender's window to the rightmost (future) edge of the receiver's window. In effect, the sender's and receiver's windows must be treated as one large window. The rule for a selective repeat protocol is that the window at the sender and receiver should be no larger than (N-1)/2, where N is the number of distinct sequence numbers. Figure 3.27 in the text provides another illustration of this error scenario, using sequence numbers from 0 to 3 and a window size of 3.

With the preliminary analysis out of the way, what are the actions for the sender? Assume the following:

- sequence numbers range from 0 to N-1;
- the window at the sender and receiver is of size W = (N-1)/2 (rounded down when N is even);
- the first segment to be transmitted will receive sequence number 0.

As before, oldest holds the sequence number of the oldest unacknowledged segment. The variable newest holds the sequence number of the most recently transmitted segment, and inuse counts the number of sequence numbers in use. Notice that sequence number arithmetic is performed modulo N, even though the window size is limited to W. Initialise oldest to 0, newest to N-1, and inuse to 0.

Message from application: When the application tries to transmit a message, the transport layer must decide if there's room in the window. If so, it can transmit the segment and place a copy in the buffer holding segments in flight. If not, it must refuse the message.

    if inuse < W:
        newest = (newest + 1) % N
        segbuffer[newest] = wrap(newest, msg, chksum)
        udt_send(segbuffer[newest])
        inuse = inuse + 1
        start_timer(newest)
    else:
        refuse(msg)

Because we're only resending segments that are not correctly received (thus not acknowledged), we need a separate timer for each segment.

Receive ACK: If an ACK arrives without error, we can take that as an indication that the receiver has correctly received the referenced segment.
    ackseq = seq(rcvseg)
    stop_timer(ackseq)
    mark_seg_as_acked(ackseq)
    if oldest == ackseq:
        while seg_is_acked(oldest):
            inuse = inuse - 1
            oldest = (oldest + 1) % N

If the ACK is for the oldest unacknowledged segment, we can advance the sender's window to the next unacknowledged segment. We have to check the status (acknowledged or not) of each segment. The complete condition for the while loop would be

    while oldest <= newest and seg_is_acked(oldest):

but testing oldest <= newest is awkward in mod N arithmetic. A segment that's not yet sent cannot be acknowledged, so the while loop must stop, at the latest, when oldest advances past newest. An explicit test for oldest <= newest is therefore not required. This argument relies on the acknowledged mark for a sequence number being cleared before that number is reused. Otherwise, a stale mark left over from the previous trip through the sequence numbers could carry the loop past newest.

Corrupt segment: There's nothing to be done when this happens. It may be that we'll receive an uncorrupted ACK in a bit, in which case the loss of this ACK won't matter. Or maybe we won't, in which case the timer will go off and we'll retransmit.

Timeout(lateseq): We haven't received an ACK for the segment in flight with sequence number lateseq, and it's past time for that to happen. Assume that there's been an error and resend just this one segment.

    udt_send(segbuffer[lateseq])
    start_timer(lateseq)

And the receiver? It becomes a bit more complex because it must now manage buffers and a nontrivial window. The base of the window is held in oldest, the sequence number of the first segment not yet received. Each time a segment arrives without error, there are three cases to consider:

The sequence number of the segment matches oldest. Call this event oldest. In this case, we want to send an ACK and begin to deliver messages to the application. We advance the window through consecutive sequence numbers until we come to a missing segment.

The segment has a sequence number less than oldest, but within the past window (i.e., the W sequence numbers preceding oldest). This segment arrived without error and was delivered to the application, but the ACK was lost and the sender has retransmitted the segment. Call this event past. In this case, we want to send an ACK so the sender will know it's been received, but that's all we need to do.

The segment has a sequence number larger than oldest, but within the future window (i.e., the W-1 sequence numbers following oldest). This segment has arrived early. Call this event future. In this case, we want to send an ACK so the sender will know it's been received. We can't yet deliver it to the application, because an earlier message hasn't arrived. Buffer the segment for later delivery.

In Python, the actions will be as follows.

Oldest: The sequence number of the segment, rcvseq, matches the sequence number in oldest. The receiver should unwrap and deliver the message in this segment, then scan the segment buffer to see if there are additional messages ready for delivery. For uniformity, stash the newly arrived segment in the buffer before starting the scan. As each message is delivered, its slot is emptied (set to None, assumed to be every slot's initial value). This ensures a stale segment is never delivered again when the sequence numbers wrap around.

    rcvseq = seq(rcvseg)
    ackseg = wrap(rcvseq, ack, chksum)
    udt_send(ackseg)
    segbuffer[rcvseq] = rcvseg
    while segbuffer[rcvseq] is not None:
        msg = extract(segbuffer[rcvseq])
        deliver_msg(msg)
        segbuffer[rcvseq] = None
        rcvseq = (rcvseq + 1) % N
    oldest = rcvseq

When the scan reaches an empty slot, set oldest to that sequence number.

Past: All that needs to be done is send an ACK.

    rcvseq = seq(rcvseg)
    ackseg = wrap(rcvseq, ack, chksum)
    udt_send(ackseg)

Future: We need to acknowledge this segment and buffer it. We can't deliver the message to the application yet, because one or more earlier segments are missing.

    rcvseq = seq(rcvseg)
    ackseg = wrap(rcvseq, ack, chksum)
    udt_send(ackseg)
    segbuffer[rcvseq] = rcvseg

Default: If the arriving segment is corrupt, we don't need to do anything. If the arriving segment is not corrupt but has a sequence number outside both the past and the future windows, something is seriously wrong (this shouldn't happen).
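The sender and receiver actions above can be assembled into a small simulation. This is a minimal sketch under stated assumptions, not the notes' implementation:

- The class and method names are hypothetical.
- Timers and checksums are left out; a timeout is simulated by retransmitting by hand.
- Python dicts stand in for segbuffer.
- Unlike the pseudocode above, the sender ignores an ACK whose sequence number lies outside its window, and it clears acknowledged marks as the window advances.

```python
class SRSender:
    """Hypothetical selective-repeat sender: window bookkeeping only."""

    def __init__(self, n: int, w: int) -> None:
        self.n, self.w = n, w
        self.oldest = 0        # oldest unacknowledged sequence number
        self.newest = n - 1    # most recently used sequence number
        self.inuse = 0         # number of sequence numbers in flight
        self.segbuffer = {}    # seq -> message kept for retransmission
        self.acked = set()     # in-window seqs that have been acknowledged

    def send(self, msg: str):
        """Return the sequence number used, or None if the window is full."""
        if self.inuse >= self.w:
            return None                    # refuse(msg)
        self.newest = (self.newest + 1) % self.n
        self.segbuffer[self.newest] = msg
        self.inuse += 1
        return self.newest

    def receive_ack(self, seq: int) -> None:
        # Assumption beyond the notes' pseudocode: ignore ACKs outside the window.
        if (seq - self.oldest) % self.n >= self.inuse:
            return
        self.acked.add(seq)
        # Advance past consecutive acknowledged segments, clearing their marks.
        while self.inuse > 0 and self.oldest in self.acked:
            self.acked.remove(self.oldest)
            del self.segbuffer[self.oldest]
            self.oldest = (self.oldest + 1) % self.n
            self.inuse -= 1


class SRReceiver:
    """Hypothetical selective-repeat receiver for uncorrupted segments."""

    def __init__(self, n: int, w: int) -> None:
        self.n, self.w = n, w
        self.oldest = 0        # first sequence number not yet received
        self.segbuffer = {}    # seq -> message that arrived early
        self.delivered = []    # messages passed to the application, in order

    def receive(self, seq: int, msg: str) -> int:
        """Handle one segment and return the sequence number to ACK."""
        offset = (seq - self.oldest) % self.n
        if offset < self.w:                  # event oldest (offset 0) or future
            self.segbuffer[seq] = msg
            while self.oldest in self.segbuffer:   # deliver the in-order run
                self.delivered.append(self.segbuffer.pop(self.oldest))
                self.oldest = (self.oldest + 1) % self.n
            return seq
        if offset >= self.n - self.w:        # event past: ACK only
            return seq
        raise ValueError(f"sequence number {seq} is outside both windows")


n, w = 7, 3                                  # w = (n - 1) // 2
tx, rx = SRSender(n, w), SRReceiver(n, w)
for m in "abc":
    tx.send(m)                               # sequence numbers 0, 1, 2
print(tx.send("d"))                          # None: window full
for seq in (0, 2):                           # segment 1 is lost in the channel
    tx.receive_ack(rx.receive(seq, tx.segbuffer[seq]))
print(tx.oldest, tx.inuse)                   # 1 2
tx.receive_ack(rx.receive(1, tx.segbuffer[1]))  # timeout: resend only seq 1
print(rx.delivered, tx.oldest, tx.inuse)     # ['a', 'b', 'c'] 3 0
```

Using dict.pop at the receiver empties each slot as it is delivered, so a slot can never hold a stale segment when the sequence numbers wrap around.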

Now that we have a good understanding of the relationship between window size and range of sequence numbers, it's time to admit that we can't always use the minimum range of sequence numbers. Our model of the channel between the sender and receiver allows for segments to be corrupted or lost completely, and retransmission triggered by timeouts can cause duplicates to arrive at the receiver. But our model of the channel assumes that, with the exception of complete loss, segments arrive at the receiver in the order that they were sent. For a sender and receiver connected by a single link, this is trivially true: bits cannot pass one another as they propagate along the link. This extends to the situation where there's a single path between the sender and receiver, even if the path has multiple links.

This assumption does not hold in a large packet-switched network where there are many alternative paths between the sender and receiver. It's possible (if unlikely) for a segment to be delayed for a significant amount of time. The delay can be long enough for the sender to retransmit the segment and for both ends to recover from the loss and move on. When the delayed segment finally appears at the receiver, its sequence number may well be within the current window, and the segment would be accepted in error. The practical solution has two parts. First, place an upper bound on the lifetime of packets in the network (for the Internet, this is estimated to be about three minutes). Second, use a range of sequence numbers large enough to avoid any repetition in that time period.

It's time to summarise what we've learned about reliable data transfer protocols. Our model of the communication channel between the sender and the receiver allows for three types of errors:

Data can be corrupted, so that the bits that arrive at the receiver are not the bits transmitted by the sender. This can occur due to noise in the channel or intermittent equipment failure. With today's technology, data corruption is very rare for guided media and slightly more common with unguided media.

Data can be outright lost, so that nothing arrives at the receiver. This can occur when an intermediate router runs out of buffer space and must discard a datagram.

Data can be delayed for long periods and arrive at the receiver out of order. This is an extremely rare error, caused by extreme congestion delays or transient errors in router forwarding tables.

It's important to keep in mind that terabytes (10^12 bytes) of data are transmitted on the Internet each second. At that rate, an error that affects one byte in a billion (10^9) happens somewhere about a thousand times a second, or once each millisecond.

To achieve reliable data transfer in the presence of these errors, we have a suite of techniques to apply:

Checksums are the result of some calculation performed over the data and transmitted with the data for verification by the receiver. They are used to detect corrupted data.

Timers are used to measure an interval. In reliable data transfer, they time the interval between sending a segment and receiving the acknowledgement. They are used to detect complete data loss.

Sequence numbers are used to identify each unit of data (a segment, for example). They allow the receiver to detect loss or duplication of data. A gap in the sequence numbers seen by the receiver indicates data loss. A repeated sequence number indicates duplication.

Acknowledgements provide positive feedback from the receiver to the sender, so that the sender knows what data the receiver has received. Acknowledgements allow the sender to discard data that's buffered for possible retransmission. Negative acknowledgements are an alternative implementation choice.

A sliding window allows multiple units of data to be in flight between the sender and the receiver. This allows a reliable data transfer protocol to achieve an acceptable data transfer rate by increasing the utilisation of the channel.
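The packet-lifetime argument earlier in this section turns into a small calculation. Suppose at most R units are numbered per second and a packet can survive for T seconds. Then the range of sequence numbers must exceed R×T, so that no number is reused while an old copy may still be in the network. This gives a rough lower bound. The sketch assumes one sequence number per segment. The function name and the figures (100,000 segments per second, and a 180-second lifetime matching the three-minute estimate for the Internet) are illustrative assumptions.

```python
def min_sequence_bits(units_per_second: int, max_lifetime_s: int) -> int:
    """Hypothetical helper: smallest k such that 2**k sequence numbers cannot
    repeat within one maximum packet lifetime at the given sending rate."""
    units_in_lifetime = units_per_second * max_lifetime_s
    # For x >= 0, x.bit_length() is the smallest k with 2**k > x.
    return units_in_lifetime.bit_length()


# Assumed figures: 100,000 numbered segments per second, 180 s lifetime.
bits = min_sequence_bits(100_000, 180)
print(bits)                     # 25
print(2**bits // 100_000)       # 335: seconds before the numbers wrap
```

With 25 bits, the sequence numbers take about 335 seconds to wrap, comfortably longer than the assumed three-minute lifetime.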

UDP

The User Datagram[8] Protocol (UDP), defined in RFC 768, provides a connectionless, best-effort data transfer service. We've mentioned already that the Internet network protocol, IP, provides a connectionless, best-effort data transfer service between hosts. Why do we need UDP? For one thing, it adds the essential transport service: the ability to specify particular application processes on the source and destination hosts. A second added capability is a checksum over the entire UDP segment plus selected items from the IP header. This makes it possible to detect if the header or data has been corrupted in transmission.

There's not a whole lot to a UDP segment (field sizes in bits):

    | source port (16) | destination port (16) |
    | length (16)      | checksum (16)         |
    | payload (max 64KiB)                      |

The source (local) and destination (remote) port numbers are 16-bit values, as explained previously. The 16-bit length field gives the length of the whole segment, including the 8-byte UDP header. The real limit on length is the IP header's length field, also 16 bits, whose largest value is 2^16 - 1 = 65535 bytes. Allowing for 20 bytes of mandatory IP header and 8 bytes of UDP header, the maximum payload is 65535 - 20 - 8 = 65507 bytes.

The checksum is the standard Internet checksum documented in RFC 1071. It is the one's complement of the one's complement sum of the UDP header and data, taken as 16-bit (2-byte) words. If the payload is an odd number of bytes, a byte with value 0 is appended for the purpose of computing the checksum. Use of the checksum is optional, and a checksum value of 0 is interpreted as absence of the checksum[9].

[8] The RFC refers to a UDP segment as a datagram, but these notes will use segment for compatibility with the text.
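The checksum rules above can be sketched in a few lines of Python using the standard struct module. This is a minimal illustration, not a wire-accurate UDP implementation. The helper names are hypothetical. The checksum here covers only the UDP header and payload; it leaves out the IP pseudo-header that the notes describe next. A computed value of 0 is sent as all ones, because 0 means "no checksum".

```python
import struct

MAX_UDP_PAYLOAD = (2**16 - 1) - 20 - 8   # 65507 bytes, as computed above


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                    # pad an odd-length block with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Hypothetical helper: build a UDP segment; checksum omits the pseudo-header."""
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("payload too large for one UDP segment")
    length = 8 + len(payload)              # length field includes the 8-byte header
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0 while summing
    checksum = internet_checksum(header + payload) or 0xFFFF      # 0 means "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
seg = udp_segment(1234, 53, b"hello")
print(internet_checksum(seg))   # 0: summing the whole segment finds no error
```

The eight bytes in the first example are the ones used in RFC 1071's numerical example. The second print shows the receiver's view. The checksum over a segment that includes a correct checksum field comes out as 0.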

To provide additional protection against changes in the IP header, selected fields are collected into a pseudo-header that's prepended to the UDP segment when the checksum is calculated. This provides an extra measure of protection against corruption of the IP header as it's forwarded from router to router. The pseudo-header (field sizes in bits) is:

    | source IP address (32)                                |
    | destination IP address (32)                           |
    | zero (8) | protocol (8; UDP = 17) | UDP length (16)   |

To see why this is useful, we need to look ahead a bit. Each router will modify the IP header; at the least, the router will modify the hop count in the header. The router must then recalculate the IP header checksum. If the router introduces an error as it modifies the IP header, the new checksum will be correct for the erroneous header. The UDP header and data are not modified in transit, so routers do not have the same opportunity to recalculate the UDP checksum and hide an error.

The pseudo-header is not transmitted with the UDP segment, but the UDP checksum is transmitted. The destination end system rebuilds the pseudo-header from the received IP header and recalculates the checksum. A mismatch indicates an error somewhere in the UDP segment or the IP header fields, though it's still not possible to pinpoint the error. The inclusion of the length of the UDP segment in the pseudo-header seems redundant. A likely explanation is symmetry with TCP, which uses the same pseudo-header format and does not provide an explicit length field in the TCP segment header.

[9] Recall that this is not a problem for one's complement representation. If the computed checksum turns out to be 0, we just use the other one's complement representation of 0, which is all 1s.

TCP

TCP, defined in RFC 793, provides reliable end-to-end data transport over an unreliable network layer. Connections are full duplex and point-to-point. More specifically, TCP is designed to run over an unreliable internetwork. It adapts its behaviour to the varying characteristics (bandwidth, delay, maximum transmission unit) of paths through an internetwork.

As explained earlier, each end of a TCP connection is identified by an address:port pair. The four-tuple (local_ip:local_port, remote_ip:remote_port) defines a connection, and it must be unique for each connection. Port numbers are interpreted exactly as for UDP. Many applications are allocated the same port number in both the UDP and TCP port spaces.

The unit of data transfer for TCP is called a segment. It consists of a 20-byte mandatory header, optional header fields, and the data payload. A segment need not carry any data. All told, a segment must fit within the payload limit of IP: 64 KiB minus whatever space is occupied by the IP header.

Recall that TCP provides a byte stream and does not preserve the boundaries between blocks of data as provided by the application. If markers are required to separate messages, the application must supply them. Because it does not preserve these boundaries, TCP is free to group data for maximum efficiency. It can do this before passing data to the network layer for transmission, and again before passing it to the application on the receiving side. The goal is to send as little network overhead as possible per byte of data transferred between applications. The TCP and IP headers have a fixed minimum size, so the only way to reduce overhead is to send as much application data as possible in each segment. The ability to regroup data for efficient transmission allows TCP to avoid silly window syndrome, where the transmitting side is continually sending tiny segments in response to small receiver window increments.

Sometimes a TCP implementation should send even small amounts of data promptly, e.g., for interactive applications where a single keypress or mouse click evokes an action. For this, the protocol defines a push mechanism. Push is a strong suggestion to the local TCP implementation to immediately send whatever data it has accumulated. It is a similar suggestion to the remote TCP implementation to immediately deliver any accumulated data to the process at the other end of the connection, without waiting for more data to build a larger segment.
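Since TCP delivers a byte stream, an application that needs message boundaries has to mark them itself. One common approach is to prefix each message with its length. The receiving side then reassembles messages from whatever pieces the stream happens to deliver. This is a minimal sketch: the 4-byte big-endian length prefix and the names frame and Deframer are assumptions for illustration, not part of TCP.

```python
import struct


def frame(msg: bytes) -> bytes:
    """Hypothetical framing: prefix a message with its 4-byte big-endian length."""
    return struct.pack("!I", len(msg)) + msg


class Deframer:
    """Hypothetical reassembler for length-prefixed messages in a byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()   # bytes received but not yet returned

    def feed(self, chunk: bytes) -> list:
        """Add bytes from the stream; return every message now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= 4:
            (length,) = struct.unpack_from("!I", self._buf, 0)
            if len(self._buf) < 4 + length:
                break                      # rest of this message not here yet
            messages.append(bytes(self._buf[4:4 + length]))
            del self._buf[:4 + length]
        return messages


stream = frame(b"hello") + frame(b"") + frame(b"world!")
d = Deframer()
received = []
for i in range(0, len(stream), 3):         # simulate TCP delivering 3-byte pieces
    received += d.feed(stream[i:i + 3])
print(received)                            # [b'hello', b'', b'world!']
```

The boundaries survive even though the stream was cut into pieces that ignore them. This is exactly the freedom TCP takes when it regroups data.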