Networked Systems and Services, Fall 2018 Chapter 3

Networked Systems and Services, Fall 2018
Chapter 3
Jussi Kangasharju, Markku Kojo, Lea Kutvonen

4. Transport Layer Reliability with TCP
- Transmission Control Protocol (TCP)
  - RFC 793 + more than a hundred other RFCs
- TCP loss recovery mechanisms (not exhaustive):
  - Timer (RTO) Recovery & TCP Reno [RFC 5681]
  - TCP NewReno [RFC 3782, RFC 6582]
  - Limited Transmit [RFC 3042]
  - TCP SACK-based Loss Recovery [RFC 2018, RFC 6675]
2

Remember the Protocol Stack? End-to-end Argument?
- What is the right place to implement reliability?
- Transport is the lowest-level end-to-end protocol (in theory)
[Figure: protocol stacks of two end hosts (User, Application, Transport, Network, Link, Physical) and of an intermediate node (Network, Link, Physical)]

Transport Layer
[Figure: layer stack: Application, Presentation, Session, Transport, Network, Data Link, Physical]
- Function:
  - Demultiplexing (port numbers)
- Optional functions (TCP provides):
  - Creating long-lived connections
  - Reliable, in-order packet delivery
  - Error detection
  - Flow and congestion control
- Key challenges:
  - Efficient data delivery in the presence of losses
  - Detecting and responding to congestion
  - Balancing fairness against high utilization
4

TCP Transport Service to Applications
- Connections, 3 phases of communication
  - Connection establishment, data transfer, connection termination
- Bidirectional byte stream
  - TCP does not provide messages ==> applications have to take care of message boundaries! (cf. Unix pipes)
- Reliable transport
  - No data loss or corruption, data delivered in order, no duplication of data
- Flow control
- Congestion control (with approximate fairness)
5

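TCP Segment Format
Bit ruler: 0, 4, 10, 16, 24, 31 (each row is 32 bits wide)
- Row 1: Source port (16 bits) | Destination port (16 bits)
- Row 2: Sequence number (32 bits)
- Row 3: Acknowledgement number (32 bits)
- Row 4: TCP header length (4 bits) | Reserved (6 bits) | Flags URG, ACK, PSH, RST, SYN, FIN (1 bit each) | Window (16 bits)
- Row 5: Checksum (16 bits) | Urgent pointer (16 bits)
- Row 6: Options (0 or more 32-bit words) (padding)
- Row 7: Payload (optional)
6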

TCP Options
- Options field for optional features
- Option space is limited
  - The TCP header length field (4 bits) gives the length of the header in 32-bit words
    => maximum header length 15*4 bytes = 60 bytes
  - 20 bytes for the fixed header => max. 40 bytes for options
- Option layout: Option type (1 byte) | Option length (1 byte) | Option value (length - 2 bytes)
7
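
The type/length/value layout above can be walked with a short loop. The following is a minimal sketch, not a reference parser: the function name `parse_options` is hypothetical, and the single-byte handling of option kinds 0 (End of Option List) and 1 (No-Operation) comes from RFC 793 rather than from the slide.

```python
MAX_HEADER_WORDS = 15      # largest value of the 4-bit header length field
FIXED_HEADER_BYTES = 20    # fixed part of the TCP header
MAX_OPTION_BYTES = MAX_HEADER_WORDS * 4 - FIXED_HEADER_BYTES  # 60 - 20 = 40


def parse_options(raw: bytes) -> list[tuple[int, bytes]]:
    """Split a TCP options field into (type, value) pairs."""
    if len(raw) > MAX_OPTION_BYTES:
        raise ValueError("options field longer than 40 bytes")
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:          # End of Option List: one byte, stop parsing
            break
        if kind == 1:          # No-Operation: one byte of padding
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]    # counts the type and length bytes as well
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options
```

For example, an MSS option (type 2, length 4) followed by two NOPs and a SACK-permitted option (type 4, length 2) yields two entries, the second with an empty value.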

Connection Setup
- Why do we need connection setup?
  - To establish state on both hosts
  - Most important state: sequence numbers
    - Count the number of bytes that have been sent & received
    - Initial value chosen at random
    - Why?
[Figure: connection setup exchange between Client and Server; isn = initial sequence number]
8

Data Transfer & Sequence Number Space
- TCP uses a byte stream abstraction
  - Each byte in each stream is (implicitly) numbered
  - 32-bit value, wraps around
- Byte stream is broken down into segments (packets)
  - Size limited by the Maximum Segment Size (MSS)
  - MSS set to limit IP fragmentation
  - MSS is based on the local link MTU size and the receiver MSS negotiated during connection setup (MSS option) OR on Path MTU Discovery
- Each segment has a sequence number
[Figure: byte stream with sequence numbers 13450, 14950, 16050, 17550 marking the boundaries of Segment 8, Segment 9 and Segment 10]
9
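
Because the 32-bit sequence number wraps around, ordinary integer comparison gives wrong answers near the wrap point. A minimal sketch of modular sequence arithmetic follows; the helper names `seq_add` and `seq_lt` are hypothetical and only illustrate the idea.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit values


def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2**32."""
    return (seq + nbytes) % SEQ_MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b, assuming they are less than 2**31 apart."""
    return (a - b) % SEQ_MOD >= SEQ_MOD // 2


# A 1500-byte segment starting at 13450 is followed by sequence number 14950.
next_seq = seq_add(13450, 1500)

# Near the wrap point, 0xFFFFFFF0 still counts as "before" 0x00000010.
wrapped = seq_add(0xFFFFFFF0, 0x20)
```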

TCP Connection Termination
[Figure: connection termination exchange between Client and Server over time; both sides Close, and a Timed wait of 2*MSL precedes the Closed state on one side]
10

How are TCP ACKs Generated?
- Acknowledgement number indicates the sequence number that the receiver expects to receive next
  - Highest sequence number received in order + 1
- Acknowledgements are cumulative
  - ACK number k implies ACK of all sequence numbers < k
- Delayed ACKs
  - Receiver does not need to acknowledge each segment separately
  - At least every second (full-sized) segment should be acknowledged
  - Sending an ACK is delayed at most 500 ms if the next segment in order has not arrived
    - Many implementations use a delayed ACK timer of 200 ms
- Out-of-order segments are acknowledged immediately!
  - Send ACK for the highest sequence number received in order
11
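
The cumulative ACK rule can be illustrated with a toy receiver. This is a sketch under simplifying assumptions: the class name `Receiver` is hypothetical, sequence number wraparound and delayed ACKs are ignored, and every arriving segment is answered with an ACK.

```python
class Receiver:
    """Toy receiver that returns the cumulative ACK number per segment."""

    def __init__(self, initial_seq: int):
        self.rcv_nxt = initial_seq              # next byte expected in order
        self.out_of_order: dict[int, int] = {}  # start -> end (exclusive)

    def on_segment(self, seq: int, length: int) -> int:
        end = seq + length
        if end > self.rcv_nxt:
            # Buffer the part of the segment that is not yet delivered.
            start = max(seq, self.rcv_nxt)
            self.out_of_order[start] = max(end, self.out_of_order.get(start, 0))
        # Advance rcv_nxt over every buffered range that is now in order.
        advanced = True
        while advanced:
            advanced = False
            for start, stop in list(self.out_of_order.items()):
                if start <= self.rcv_nxt:
                    del self.out_of_order[start]
                    self.rcv_nxt = max(self.rcv_nxt, stop)
                    advanced = True
        return self.rcv_nxt  # ACK number = highest in-order byte + 1


# With 1-byte segments: segment 2 is delayed, so segments 3 and 4 produce dupacks.
r = Receiver(initial_seq=1)
acks = [r.on_segment(s, 1) for s in (1, 3, 4, 2)]
```

In this run the ACK numbers are 2, 2, 2 and then 5: the two repeated 2s are duplicate ACKs, and the late segment 2 is acknowledged together with the buffered segments 3 and 4.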

TCP Congestion Control
- TCP congestion control is one of the most important functions to ensure stable operation of the Internet
  - When routers become congested, they have to drop packets
- Congestion control is intertwined with loss recovery and thereby with TCP performance
- Congestion window, cwnd, controls how much unacknowledged data can be in flight in the network (FlightSize)
  - Largest allowed sequence number when sending = highest acknowledged sequence number + cwnd
- In the following, the congestion window and other congestion control details are present, but we focus on loss recovery
  - In the Internet Protocol course we focus on congestion control
12

Slow Start
- Slow Start is used when the network state is considered unknown
  - At the beginning of the TCP connection (Initial Slow Start)
  - After the retransmission timer expires (in RTO Recovery)
  - When there has not been anything to send for a while (Restart After Idle)
- Basic idea
  - Increase cwnd until segment loss is detected (= network becomes congested) OR cwnd reaches the Slow Start Threshold (ssthresh)
  - cwnd is increased at most by one MSS per arriving new acknowledgement (an ACK that acknowledges new data)
    - cwnd gets roughly doubled per Round-Trip Time (RTT)
    - Exponential growth as a function of RTT
13

Slow Start
- cwnd starts from 1 MSS for RTO loss recovery
  - cwnd can be larger for Initial Slow Start and Restart After Idle
- Slow Start also effectively diverts RTO Recovery away from go-back-N
[Figure: Sender/Receiver time diagram of data segments and ACKs; cwnd = 1 MSS in the 1st RTT, cwnd = 2 MSS in the 2nd RTT, cwnd = 3 MSS and then 4 MSS in the 3rd RTT]
14
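
The per-ACK growth rule can be traced with a few lines of code. This is a minimal sketch that assumes no loss, one ACK per segment (no delayed ACKs) and a sender that always has data to send; cwnd and ssthresh are counted in MSS units, and the function name `slow_start` is hypothetical.

```python
def slow_start(cwnd: int, ssthresh: int, rtts: int) -> list[int]:
    """Return cwnd (in MSS) at the start of each RTT during Slow Start."""
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd              # one new ACK for every segment in flight
        for _ in range(acks):
            if cwnd >= ssthresh:
                break            # Slow Start ends at ssthresh
            cwnd += 1            # one MSS per new ACK
        history.append(cwnd)
    return history


growth = slow_start(cwnd=1, ssthresh=16, rtts=5)
```

Here `growth` is `[1, 2, 4, 8, 16, 16]`: the window doubles every RTT until it reaches ssthresh.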

Retransmission Timeout (RTO) Recovery
- Retransmission timer is set when a data segment is sent
- When the retransmission timer expires, it starts RTO loss recovery with congestion control actions:
  - cwnd = 1 MSS; ssthresh = max(FlightSize/2, 2*SMSS) (*)
  - Retransmit the first unacknowledged segment
  - Continue (re)transmission in Slow Start until cwnd > ssthresh, after which enter Congestion Avoidance
    - Each new ACK indicates the next sequence number (segment) to retransmit
    - When there are no more segments to retransmit, continue by transmitting new data
(*) FlightSize: the amount of unacknowledged data a TCP sender has in flight
ssthresh (Slow Start Threshold): indicates the previously observed safe sending rate
15

RTO Recovery
For simplicity: MSS = 1 B (byte)
[Figure: Sender/Receiver time diagram. Assume TCP cwnd = 4 MSS. ACKs labelled Ack=2 return to the sender; when the timer (RTO) expires, the sender enters RTO loss recovery with ssthresh = 2 and cwnd = 1, then cwnd = 2, ...]
16
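
The congestion control response to a timer expiry is a two-line computation. The sketch below uses a hypothetical helper `on_rto` and integer division for FlightSize/2; it is only meant to reproduce the arithmetic of the MSS = 1 B illustration.

```python
def on_rto(flight_size: int, smss: int) -> tuple[int, int]:
    """Return (cwnd, ssthresh) in bytes after the retransmission timer expires."""
    ssthresh = max(flight_size // 2, 2 * smss)
    cwnd = smss  # restart from one segment in Slow Start
    return cwnd, ssthresh


# MSS = 1 B and 4 segments in flight, as in the illustration.
cwnd, ssthresh = on_rto(flight_size=4, smss=1)
```

With these inputs the result is cwnd = 1 and ssthresh = 2. With a more typical SMSS of 1460 bytes and ten segments in flight, the same rule gives cwnd = 1460 and ssthresh = 7300.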

Retransmission timer [RFC 6298]
- Retransmission timer runs for the first unacknowledged segment
- Important to find a proper value for the retransmission timer:
  - Too big a timer value: start of the loss recovery is delayed
  - Too small a timer value: timer expires spuriously
    - Results in unnecessary retransmissions; in the worst case a full window of data is unnecessarily retransmitted!
    - Results also in an unnecessary congestion response (transmission rate decreased)
- Initial RTO value >= 1 sec (RFC 6298 lowered it from 3 sec to 1 sec)
- After this, a proper value is estimated (computed) dynamically from the measured Round-Trip Time (RTT)
17

Calculating RTO timer value
- RTT is measured continuously when ACKs arrive
- TCP sender calculates a weighted moving average, to be used as the smoothed RTT: SRTT
  - SRTT is updated each time an RTT sample is measured (at least once per window, i.e., once per RTT)
  - $SRTT = (1-\alpha) \cdot SRTT + \alpha \cdot RTTsample$, where $\alpha = 1/8 = 0.125$
- Calculates also the RTT variation, RTTvar
  - $RTTvar = (1-\beta) \cdot RTTvar + \beta \cdot |RTTsample - SRTT|$, where $\beta = 1/4 = 0.25$
- Timer value: $RTO = SRTT + 4 \cdot RTTvar$
18
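
The estimator can be written as a small class. This is a sketch, not a complete implementation: the class name `RtoEstimator` and the `min_rto` parameter are hypothetical, clock granularity is ignored, and the first-sample initialization (SRTT = sample, RTTvar = sample/2), the update order (RTTvar before SRTT) and the 1-second lower bound are taken from RFC 6298 rather than from the slide.

```python
class RtoEstimator:
    ALPHA = 1 / 8   # weight of a new sample in SRTT
    BETA = 1 / 4    # weight of a new sample in RTTvar
    K = 4           # multiplier of RTTvar in the RTO

    def __init__(self, min_rto: float = 1.0):
        self.min_rto = min_rto  # assumption: RFC 6298 lower bound, in seconds
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0          # initial RTO before any sample

    def on_sample(self, rtt_sample: float) -> float:
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement (RFC 6298).
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            # RTTvar uses the SRTT value from before this sample.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(rtt_sample - self.srtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto


est = RtoEstimator(min_rto=0.0)   # lower bound disabled to expose the formula
first = est.on_sample(0.100)      # SRTT = 0.1, RTTvar = 0.05
second = est.on_sample(0.200)     # SRTT = 0.1125, RTTvar = 0.0625
```

With the lower bound disabled, the two samples give RTO values of about 0.3 s and 0.3625 s. With the default 1-second bound both would be clamped to 1.0 s.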

RTT Sample Ambiguity
[Figure: two time diagrams, each labelled "RTO Sample?"]
What is the RTT of a retransmitted segment?
19

Accurate measurement of RTO
- Solution to acknowledgement ambiguity: Karn's algorithm
  - Don't update the RTT estimate for retransmitted segments
- How often are RTT samples measured?
  - Some implementations take only one sample per window (= one per RTT)
    - RFC 6298 requires at least one sample per RTT
  - Many newer implementations, e.g., Linux, measure RTT for each valid segment
- In practice only one retransmission timer is running at a time
  - For the first unacknowledged segment
  - Timer is restarted each time an ACK that acknowledges new data arrives
    ==> effective timer value = RTO + 1 RTT
20

Fast Retransmit
- Duplicate ACK (dupack)
  - When an out-of-order data segment arrives at a TCP receiver, the receiver immediately acknowledges, with a pure ACK, the highest sequence number received in order (i.e., the same acknowledgement number as in the ACK that acknowledged the last segment received in order)
- Receiving dupacks indicates that
  - Segments are leaving the network
  - A segment has been received out of order, and what the expected sequence number is
- After receiving 3 consecutive dupacks, the TCP sender Fast Retransmits the first unacknowledged segment
  - [ Sets also cwnd & ssthresh (see steps 1 & 2 on slide 23) ]
- After the Fast Retransmit, the sender continues in Fast Recovery
21

NewReno Fast Recovery [RFC 6582 (old: RFC 3782 *)]
- Fast Recovery allows transmission of new data during loss recovery
- NewReno Fast Recovery is able to recover one lost segment per RTT
* See RFC 3782 for a possibly easier-to-understand description
22

Fast Recovery (NewReno)
- The "recover" variable is used to determine when recovery is over (and to avoid multiple false Fast Retransmits)
- Fast Retransmit & Fast Recovery (NewReno) triggered by the 3rd dupack:
  1. Set recover = "highest sequence number transmitted so far" [ Set ssthresh = max(FlightSize / 2, 2*MSS) ]
  2. Retransmit the first unacknowledged segment and [ set cwnd = ssthresh + 3*MSS ]
  3. For each additional duplicate ACK received while in Fast Recovery, increment cwnd by one MSS to potentially allow transmitting a new segment
  4. Transmit a new segment, if allowed by the new value of cwnd (and rwnd)
  5. When an ACK arrives that acknowledges new data,
     a) If this ACK acknowledges all of the data up to and including "recover", then recovery is completed;
     b) Otherwise, the acknowledgement is a Partial ACK and recovery should continue (see next slide)
- Step 3 allows transmitting new data also during loss recovery
23

NewReno/Step 5 b): Partial ACK
- On each Partial ACK
  - Retransmit the first unacknowledged segment
  - [ Deflate cwnd by the amount of new data acknowledged by the Partial ACK. If the Partial ACK acknowledges at least one MSS of new data, then add back MSS bytes to cwnd ]
  - Transmit a new segment, if allowed by the new value of cwnd
- Continue Fast Recovery
  - Repeat steps 3 & 4 on arrival of a dupack
  - Repeat step 5 on arrival of an ACK that acknowledges new data
24

Fast Retransmit & Fast Recovery (NewReno)
For simplicity: MSS = 1 B
[Figure: Sender/Receiver time diagram with the sender's state: cwnd = 6 before loss detection; on entering recovery ssthresh = 3, cwnd = 3+3 = 6, recover = 7; then cwnd = 6-2+1 = 5; cwnd = 5-2+1 = 4; cwnd = 4+1 = 5; Recovery Done, cwnd = 3]
25
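
The cwnd arithmetic of NewReno Fast Recovery can be sketched as a small state holder. The class name `NewRenoRecovery` and the ACK numbers used in the example are assumptions, sequence number wraparound is ignored, and setting cwnd = ssthresh on exit is assumed from the figure (RFC 6582 lists it as one of two allowed choices).

```python
class NewRenoRecovery:
    """cwnd/ssthresh bookkeeping during NewReno Fast Recovery (bytes)."""

    def __init__(self, mss: int):
        self.mss = mss
        self.cwnd = 0
        self.ssthresh = 0
        self.recover = None

    def enter(self, flight_size: int, highest_seq_sent: int) -> None:
        # Steps 1 and 2, triggered by the 3rd dupack.
        self.recover = highest_seq_sent
        self.ssthresh = max(flight_size // 2, 2 * self.mss)
        self.cwnd = self.ssthresh + 3 * self.mss

    def on_dupack(self) -> None:
        # Step 3: each additional dupack inflates cwnd by one MSS.
        self.cwnd += self.mss

    def on_new_ack(self, ack_number: int, newly_acked: int) -> bool:
        """Step 5. Returns True when recovery is complete."""
        if ack_number > self.recover:
            # Step 5 a): everything up to and including recover is ACKed.
            self.cwnd = self.ssthresh   # assumed exit rule, see lead-in
            return True
        # Step 5 b): Partial ACK, deflate and add back one MSS.
        self.cwnd -= newly_acked
        if newly_acked >= self.mss:
            self.cwnd += self.mss
        return False


# MSS = 1 B, six segments in flight, recover = 7.
nr = NewRenoRecovery(mss=1)
nr.enter(flight_size=6, highest_seq_sent=7)
trace = [nr.cwnd]                                  # 3 + 3
nr.on_new_ack(ack_number=4, newly_acked=2); trace.append(nr.cwnd)  # 6 - 2 + 1
nr.on_new_ack(ack_number=6, newly_acked=2); trace.append(nr.cwnd)  # 5 - 2 + 1
nr.on_dupack(); trace.append(nr.cwnd)              # 4 + 1
done = nr.on_new_ack(ack_number=8, newly_acked=2); trace.append(nr.cwnd)
```

The resulting trace of cwnd values is 6, 5, 4, 5, 3, which matches the sequence of cwnd values shown in the illustration.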

Limited Transmit [RFC 3042]
- Problem: if cwnd is small (cwnd < 4), OR several segments are dropped in a single window
  ==> It is possible that the TCP sender cannot receive three dupacks
  ==> The TCP sender has to wait for a retransmission timeout and recover using Slow Start (with drastic cwnd reduction)
  - This delays the start of recovery and is inefficient
- Solution: Limited Transmit
  - Transmit a new data segment on each of the first two dupacks
  - Transmitting new data segments can be allowed, as a dupack indicates that a segment has left the network
  - New data segments trigger more dupacks
26

Limited Transmit
For simplicity: MSS = 1 B
[Figure: Sender/Receiver time diagram with cwnd = 3 and ACKs labelled Ack=2 and Ack=3. No need to wait for the RTO, as three dupacks arrive and trigger Fast Retransmit]
27
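
The effect of Limited Transmit on the dupack count can be checked with a toy model. The sketch assumes that only the first segment of the window is lost, that every later segment arrives and triggers exactly one dupack, and that the receiver window never limits the sender; the function name `dupacks_received` is hypothetical.

```python
def dupacks_received(window: int, limited_transmit: bool) -> int:
    """Count dupacks the sender sees when the first of `window` segments is lost."""
    pending = window - 1   # segments behind the lost one, each causes a dupack
    dupacks = 0
    while pending > 0:
        pending -= 1
        dupacks += 1
        if limited_transmit and dupacks <= 2:
            pending += 1   # new segment sent on this dupack; it causes another
    return dupacks


without_lt = dupacks_received(window=3, limited_transmit=False)
with_lt = dupacks_received(window=3, limited_transmit=True)
```

With cwnd = 3 the sender sees only 2 dupacks without Limited Transmit and has to wait for the RTO, whereas with Limited Transmit it sees 4, enough to trigger Fast Retransmit on the third.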

TCP Selective Acknowledgements [RFC 2018, RFC 6675]
- Duplicate ACKs indicate only one missing segment (next expected)
- Similarly, each cumulative ACK during recovery (i.e., NewReno Partial ACK) indicates only one missing segment (next expected)
  ==> NewReno Fast Recovery can recover only one segment per RTT
- Also, in RTO recovery several segments are often unnecessarily retransmitted
- The Selective Acknowledgement (SACK) option allows identifying several missing segments with a single dupack
28

SACK option (RFC 2018)
- TCP SACK-permitted option
  - Layout: type = 4 (1 byte) | length = 2 (1 byte)
  - Used in connection establishment (with SYN segments) to negotiate the use of the SACK option
- TCP SACK option
  - Carries information about sequence number ranges (SACK blocks) that have arrived successfully, but out of order, at the receiver (stored in the receive buffer)
29

TCP SACK option
Fields, in order:
- type = 5, length = n
- Beginning of the 1st block (seq. no), End of the 1st block (seq. no + 1)
- Beginning of the 2nd block (seq. no), End of the 2nd block (seq. no + 1)
- Beginning of the 3rd block (seq. no), End of the 3rd block (seq. no + 1)
One TCP segment may carry at most 4 SACK blocks, as at most 40 bytes have been reserved for TCP options (use of other TCP options reduces this).
30
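
The option layout and the 4-block limit can be made concrete with `struct`. This is a minimal sketch with hypothetical helper names `encode_sack` and `decode_sack`; each block edge is a 32-bit sequence number in network byte order, so one block takes 8 bytes.

```python
import struct

SACK_KIND = 5
MAX_OPTION_BYTES = 40
# 2 bytes of type and length, then 8 bytes per block: (40 - 2) // 8 = 4
MAX_SACK_BLOCKS = (MAX_OPTION_BYTES - 2) // 8


def encode_sack(blocks: list[tuple[int, int]]) -> bytes:
    """Build a SACK option from (begin, end) pairs; end = last seq. no + 1."""
    if not 1 <= len(blocks) <= MAX_SACK_BLOCKS:
        raise ValueError("a SACK option carries 1 to 4 blocks")
    body = b"".join(struct.pack("!II", begin, end) for begin, end in blocks)
    return struct.pack("!BB", SACK_KIND, 2 + len(body)) + body


def decode_sack(option: bytes) -> list[tuple[int, int]]:
    """Parse a SACK option back into (begin, end) pairs."""
    kind, length = struct.unpack_from("!BB", option)
    if kind != SACK_KIND or length != len(option) or (length - 2) % 8:
        raise ValueError("not a well-formed SACK option")
    return [struct.unpack_from("!II", option, off) for off in range(2, length, 8)]


option = encode_sack([(3, 4), (5, 6)])   # two 1-byte blocks, as with MSS = 1 B
blocks = decode_sack(option)
```

Two blocks give an option of 2 + 2*8 = 18 bytes. Four blocks would need 34 bytes, which fits in the 40-byte option space only if other options use at most 6 bytes.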

Sending SACK option
- Always, when acknowledging an out-of-order segment (i.e., always, when acknowledging other than the highest sequence number that has arrived)
- SACK option includes as many of the latest sequence number ranges as possible
  - Each arrived segment (block) is reported several times (i.e., repeated with the later ACKs)
  - The first block in the SACK option includes the segment that triggered the acknowledgement
- SACK information is only informative for the TCP sender
  - TCP sender must not remove a segment from its send buffer until a cumulative ACK acknowledging it arrives
31

SACK-based Recovery [RFC 6675]
- With the help of the SACK option, a TCP sender may recover more than one lost segment within one RTT (cf. NewReno)
- TCP sender maintains a scoreboard data structure with the retransmission queue (updated on arrival of an ACK and after transmitting a segment)
  - SACKed: information whether a SACK block corresponding to the segment has been received
  - HighACK: sequence number of the highest byte of data that has been cumulatively ACKed
  - HighRxt: highest sequence number that has been retransmitted during the current loss recovery phase
  - HighData: highest sequence number transmitted
  - pipe: an estimate of the number of bytes (segments) outstanding in the network
- cwnd limits transmission of segments during loss recovery; if cwnd - pipe >= 1 SMSS, the sender can transmit segments
  - If there are segments that are considered lost, retransmit as many lost segments as cwnd allows
    - A segment is considered lost if at least 3 discontiguous SACKed sequences have arrived above the segment, or more than 2 * SMSS bytes with sequence numbers above the segment have been SACKed
  - If there are not enough lost segments to transmit, transmit as many new data segments as cwnd allows
  - If there are neither lost nor new segments to transmit, follow the rules in Steps (3) & (4) of NextSeg() in RFC 6675 to retransmit one data segment not considered lost
32
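
The loss test and the pipe estimate can be sketched in segment units. The code assumes MSS-sized segments numbered by integers, so "more than 2 * SMSS bytes SACKed above the segment" reduces to "at least 3 SACKed segments above it"; the function names `is_lost` and `set_pipe` mirror the RFC 6675 routines IsLost() and SetPipe() but are simplified, and the sample scoreboard values are assumptions chosen for illustration.

```python
DUP_THRESH = 3


def is_lost(seq: int, sacked: set[int]) -> bool:
    """A segment is considered lost if at least 3 segments above it are SACKed."""
    return sum(1 for s in sacked if s > seq) >= DUP_THRESH


def set_pipe(high_ack: int, high_data: int, high_rxt: int, sacked: set[int]) -> int:
    """Estimate the number of segments outstanding in the network."""
    pipe = 0
    for seq in range(high_ack + 1, high_data + 1):
        if seq in sacked:
            continue          # has reached the receiver
        if not is_lost(seq, sacked):
            pipe += 1         # original transmission assumed still in flight
        if seq <= high_rxt:
            pipe += 1         # retransmission in flight
    return pipe


# Segments 1..7 sent, segment 1 cumulatively ACKed, segments 3, 5 and 7 SACKed.
sacked = {3, 5, 7}
pipe_before = set_pipe(high_ack=1, high_data=7, high_rxt=1, sacked=sacked)
pipe_after = set_pipe(high_ack=1, high_data=7, high_rxt=2, sacked=sacked)
```

In this scenario segment 2 is considered lost (three SACKed segments above it), while segments 4 and 6 are not. The estimate is pipe = 2 before segment 2 is retransmitted and pipe = 3 once the retransmission is counted, so with cwnd = 3 the sender may send one segment and then has to wait for the next ACK.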

SACK Fast Retransmit (RFC 6675)
If at least 3 segments above HighACK+1 have been SACKed (*):
1. Set RecoveryPoint = HighData
2. [ Set ssthresh = cwnd = FlightSize / 2 ]
3. Retransmit the first unacknowledged segment and set HighRxt = highest sequence number in the retransmitted segment
4. Recalculate a new value for pipe:
   - Includes all data (segments) that have been sent but not ACKed (either cumulatively or SACKed), but not segments that are considered lost (= at least 3 later segments after the segment have reached the receiver and have been SACKed)
   - Includes all retransmitted data (segments) (HighACK < seqno <= HighRxt)
5. If cwnd - pipe >= 1 SMSS, the sender can transmit segments
   - In the first place retransmit lost segments, then transmit new data
     - As many as allowed by cwnd
   - If there are no lost segments nor new data, send one segment as per Steps (3) & (4) of NextSeg()
   - After transmitting, update HighRxt, HighData and pipe
(*) On each ACK with SACKed data, use Limited Transmit to send at most one SMSS of new data (if cwnd - pipe >= 1 SMSS)
33

SACK Fast Recovery (cont'd)
On each arriving ACK:
A. If cumulative ACK number > RecoveryPoint
   - Recovery completed, exit Fast Recovery
B. If cumulative ACK number <= RecoveryPoint
   - Update scoreboard with SACK information
   - Update pipe (like in step 4 of SACK Fast Retransmit)
C. If cwnd - pipe >= 1 SMSS, the sender can transmit segments
   - In the first place retransmit lost segments, then transmit new data
     - As many as allowed by cwnd
   - If there are no lost segments nor new data, send one segment as per Steps (3) & (4) of NextSeg()
   - After transmitting, update HighRxt, HighData and pipe
34

Fast Retransmit & Fast Recovery (SACK)
For simplicity: MSS = 1 B
[Figure: Sender/Receiver time diagram. ACK labels: ack=2; ack=2, SACK 3; ack=2, SACK 3, 5; ack=2, SACK 3, 5, 7; ack=4, SACK 5, 7; ack=4, SACK 5, 7-8; ack=6, SACK 7-8; ack=6, SACK 7-9; ack=6, SACK 7-10; ack=11; ack=12. Sender state: cwnd = 6 before loss detection; on entering recovery RecoveryPoint = 7, ssthresh = 3, cwnd = 3; pipe varies between 1 and 3 during recovery until Recovery Done]
35