DX: Latency-Based Congestion Control for Datacenters
Changhyun Lee, Chunjong Park, Keon Jang, Sue Moon, and Dongsu Han, Member, IEEE

Abstract— Since the advent of datacenter networking, achieving low latency within the network has been a primary goal. Many congestion control schemes have been proposed in recent years to meet the datacenters' unique performance requirements. The nature of congestion feedback largely governs the behavior of congestion control. In datacenter networks, where round-trip times are in hundreds of microseconds, accurate feedback is crucial to achieve both high utilization and low queueing delay. Proposals for datacenter congestion control predominantly leverage explicit congestion notification (ECN) or even explicit in-network feedback to minimize the queuing delay. In this paper, we explore latency-based feedback as an alternative and show its advantages over ECN. Against the common belief that such implicit feedback is noisy and inaccurate, we demonstrate that latency-based implicit feedback is accurate enough to signal a single packet's queuing delay in 10 Gb/s networks. Such high accuracy enables us to design a new congestion control algorithm, DX, that performs fine-grained control to adjust the congestion window just enough to achieve very low queuing delay while attaining full utilization. Our extensive evaluation shows that: 1) the latency measurement accurately reflects the one-way queuing delay at the single-packet level; 2) the latency feedback can be used to perform practical and fine-grained congestion control in high-speed datacenter networks; and 3) DX outperforms DCTCP with 5.33 times smaller median queueing delay at 1 Gb/s and 1.57 times at 10 Gb/s.

Index Terms— Datacenter networks, congestion control, TCP, low latency.

Manuscript received June 5, 2015; revised April 20, 2016; accepted May 31. This work was supported in part by the Institute for Information and communications Technology Promotion within the Ministry of Science, ICT and Future Planning (MSIP) through the Program titled "Development of an NFV-Inspired Networked Switch and an Operating System for Multi-Middlebox Services" and through the Program titled "Creation of PEP based on automatic protocol behavior analysis and Resource management for hyper connected IoT Services," and in part by the National Research Foundation of Korea within MSIP. This paper is an extended version of a previous conference publication [1].
C. Lee is with the National Security Research Institute, Daejeon 34044, South Korea (e-mail: changhnlee@gmail.com).
C. Park and S. Moon are with the School of Computing, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea (e-mail: cjpark87@gmail.com; sbmoon@kaist.edu).
K. Jang was with Intel Labs, Santa Clara, CA, USA. He is now with Google, Mountain View, CA, USA (e-mail: gunjang11@gmail.com).
D. Han is with the School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea (e-mail: dongsu.han@gmail.com).

I. INTRODUCTION

The quality of network congestion control fundamentally depends on the accuracy and granularity of congestion feedback. For the most part, the history of congestion control has largely been about identifying the right form of congestion feedback.
From packet loss and explicit congestion notification (ECN) to explicit in-network feedback [2], [3], the pursuit of accurate and fine-grained feedback has been a central tenet in designing new congestion control algorithms. Novel forms of congestion feedback have enabled innovative congestion control behaviors that formed the basis of a number of flexible and efficient congestion control algorithms [4], [5], as the requirements for congestion control diversified [6].

With the advent of datacenter networking, identifying and leveraging more accurate and fine-grained feedback mechanisms have become even more crucial [7]. Round-trip times (RTTs), which represent the interval of the control loop, are a few hundred microseconds, whereas TCP is designed to work in the wide-area network (WAN) with hundreds of milliseconds of RTTs. The prevalence of latency-sensitive flows in datacenters (e.g., Partition/Aggregate workloads) requires low latency, while the end-to-end latency is dominated by in-network queuing delay [7]. As a result, proposals for datacenter congestion control predominantly leverage ECN (e.g., DCTCP [7] and HULL [8]) or explicit in-network feedback (e.g., RCP-type feedback [3]) to minimize the queuing delay and the flow completion times.

This paper takes a relatively unexplored path of identifying a better form of feedback for datacenter networks. In particular, it explores the prospect of using network latency as congestion feedback in the datacenter environment. We believe latency can be a good form of congestion feedback in datacenters for a number of reasons: (i) by definition, it includes all queuing delay throughout the network, and hence is a good indicator of congestion; (ii) a datacenter is typically owned by a single entity who can enforce all end hosts to use the same latency-based protocol, effectively removing a potential source of errors originating from uncontrolled traffic; and (iii) latency-based feedback does not require any switch hardware modifications.

Although latency-based feedback has been previously explored in the WAN [9], [10], the datacenter environment is very different, posing unique requirements that are difficult to address. Datacenters have much higher bandwidth (10 Gbps to even 40 Gbps) at the end host and very low latency (a few hundred microseconds) in the network. This makes it difficult to measure the queuing delay of individual packets for a number of reasons: (i) I/O batching at the end host, which is essential for high throughput, introduces large measurement error (Section III); (ii) measuring queuing delay requires high precision because a single MSS packet introduces only 0.3 (1.2) microseconds of queuing delay in 40 GbE (10 GbE) networks.

As a result, the common belief is that latency measurement might be too noisy to serve as reliable congestion feedback [7], [11].

On the contrary, we argue that it is possible to accurately measure the queuing delay at the end host, so that even a single packet's queuing delay is detectable. Realizing this requires solving several design and implementation challenges. First, even with very accurate hardware measurement, bursty I/O (e.g., DMA bursts) leads to inaccurate delay measurements. Second, ACK packets on the reverse path may be queued behind data packets and add noise to the latency measurement. To address these issues, we leverage a combination of recent advances in software low-latency packet processing [12], [13] and hardware technology [14] that allows us to measure queuing delay accurately.

Such accurate delay measurements enable a more fine-grained control loop for datacenter congestion control. In particular, we envision a fine-grained feedback control loop that achieves near zero queuing with high utilization. Translating latency into feedback control to achieve high utilization and low queuing is non-trivial. We present DX, a latency-based congestion control that addresses these challenges. DX performs window adaptation to achieve low queuing delay (as low as that of HULL [8] and 6.6 times smaller than DCTCP), while achieving 99.9% utilization. Moreover, it provides advantages over recent works in that it does not require any switch modifications.

To summarize, our contributions in this paper are the following: (i) a thorough evaluation of ECN-based congestion feedback in comparison to latency feedback; (ii) novel techniques to accurately measure in-network queuing delay based on end-to-end latency measurements; (iii) a congestion control logic that exploits latency-based feedback to achieve just a few packets of queuing delay and high utilization without any form of in-network support; and (iv) a prototype that demonstrates the feasibility and its benefits in our testbed.

II. COMPARISON OF CONGESTION FEEDBACK

As congestion control in datacenters needs to react within RTTs orders of magnitude smaller than in the WAN, most proposals for datacenter congestion control leverage ECN or explicit in-network feedback [7], [8], [15]-[17]. We describe and compare them with latency-based feedback.

Explicit Congestion Notification: DCTCP [7] and many other proposals [8], [15], [16] use ECN to detect congestion before the queue actually overflows. Typically, the congestion level is measured by calculating the fraction of ECN-marked ACK packets out of the total ACK packets in each window. To absorb instant fluctuations in queue occupancy, DCTCP takes the moving average of the sample fractions over multiple windows and estimates the probability with which the queue length is higher than the marking threshold. After detecting congestion, it decreases the window size in proportion to the congestion level. This allows DCTCP to keep the average queuing delay small, near the marking threshold K.

Explicit in-network feedback provides a multi-bit congestion indicator that is much more accurate and fine-grained than ECN and has been used in several proposals for datacenter networking [2], [4], [6], [18], [19]. The key difference is that it also notifies how much the network is under-utilized, in addition to signaling congestion.
Such feedback enables multiplicative increase and multiplicative decrease (MIMD), which results in high utilization and fast convergence [2], [3], [6]. However, this idealized feedback requires in-network support. Currently, we are unaware of any commodity switches that are capable of generating explicit in-network feedback [7].

Latency feedback: TCP Vegas [9] introduced latency-based congestion feedback in the wide-area network. If latency can be accurately measured to reflect the network congestion, it has more benefits than other types of feedback. First, it is implicit feedback that does not require any in-network support. Second, it can take on a much larger range of values than ECN or QCN [17], [20], offering a finer-grained form of feedback. The difference from the in-network feedback of RCP [3] or XCP [2] is that latency feedback cannot signal the remaining network capacity when the network is not being fully utilized; it only notifies when and how congested the network is. Our proposal in this paper is to use latency feedback in datacenter networks. In Section III, we show that our measurement methodology effectively captures the network queueing delay in the datacenter environment.

The rest of this section provides a quantitative comparison of the feedback. Note that we only try to evaluate the feedback itself, not the congestion control algorithm using the feedback.

A. Accuracy of ECN-Based Feedback

We quantify the accuracy of DCTCP's ECN feedback with respect to an ideal form of explicit in-network feedback that accurately reflects the congestion level, such as that of RCP or XCP. To do this, we take the queue size as the ground-truth congestion level and plot the measured feedback using ns-2 simulation. We use a simple dumbbell topology of 40 DCTCP senders and receivers with 10 Gbps link capacity. The RTT between nodes is 200 µs, the ECN marking threshold is set to K = 35 [7], and the queue capacity on the bottleneck link is set to 100 packets; the queue does not overflow during the simulation. Each sender starts a DCTCP flow and records the congestion level given by the fraction of ECN-marked packets for each congestion window. We take the average switch queue occupancy during the window as the ground-truth congestion level.

Figure 1 shows the percentage of ECN-marked packets and its moving average as used by DCTCP. The x-axis represents the ground truth, and the y-axis indicates the measured level of congestion (percentage of ECN-marked packets). Along with the ECN congestion feedback, we plot a line for the ideal congestion feedback that informs the exact number of packets in the queue. The ideal congestion feedback models a form of explicit in-network feedback similar to that of RCP [3]. For example, the ideal feedback at 100% congestion, with respect to the maximum queue size, should be 100 queued packets, which is the amount to reduce in the next round to achieve zero queuing delay.
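To make the ECN-based feedback concrete, the following minimal sketch (our own illustration, not the DCTCP implementation; the gain g = 1/16 and the per-window grouping are assumptions) computes the per-window fraction of ECN-marked ACKs and its moving average.

```python
# Minimal sketch (illustrative): DCTCP-style congestion estimate from ECN marks.
# 'acks' is a list of booleans, one per ACK in a congestion window (True = ECN-marked).
def ecn_fraction(acks):
    """Fraction of ECN-marked ACKs in one window -- the raw feedback value."""
    return sum(acks) / len(acks) if acks else 0.0

def update_alpha(alpha, acks, g=1 / 16):
    """Moving average of the per-window fraction: alpha <- (1 - g) * alpha + g * F."""
    return (1 - g) * alpha + g * ecn_fraction(acks)

# Example: a window of 5 ACKs can only yield F in {0, 0.2, 0.4, 0.6, 0.8, 1.0},
# which is why small windows make the feedback coarse-grained.
alpha = 0.0
for window in [[False] * 5, [True, False, False, False, False], [True] * 5]:
    alpha = update_alpha(alpha, window)
    print(ecn_fraction(window), round(alpha, 3))
```

The example also highlights the granularity problem discussed next: with a window of five packets, the raw feedback can take only six distinct values.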

Fig. 1. Congestion level vs. ECN fraction and its moving average.
Fig. 2. Congestion level vs. measured latency.

From this simple experiment, we make the following three key observations:

Accuracy is low. The fraction of ECN-marked packets is not a good indicator of the congestion level. About 50% of the feedback is either 0 or 100; 16% (33%) of the time, the measured congestion level was 0 (100). Values other than 0 and 100 do not reflect the level of congestion accurately either. A wide range of switch queue occupancy shares the same feedback. For example, both queue lengths 28 and 64 can result in a feedback of 80%. As a result, the Pearson correlation coefficient between the actual congestion level and the measured feedback was far lower than that of the latency feedback presented later (1.0 is the highest correlation and 0.0 means no correlation), and the RMSE (root mean square error) with respect to the ideal feedback was about 32 times larger than the 1.05 of the latency feedback.

Granularity is too coarse. The congestion feedback in Figure 1 is very coarse-grained. The fundamental reason is its dependency on the window size. For example, five is the most frequently appearing window size in our simulation. In this case, the feedback (i.e., the ECN-marked fraction) can only take on six values, from 0%, 20%, 40%, 60%, 80%, to 100%, while the actual congestion level is between 9 and 69 packets (61 different levels).

Taking the moving average does not help and even degrades the accuracy, as the measured congestion level stays the same for a wide range of queue lengths (Figure 1). We observe in Figure 1 that the moving average smoothes out the extreme congestion-level values of 0s and 1s. However, very little correlation exists between the ECN-based congestion feedback and the actual queue lengths. The measured congestion level (i.e., the moving average) always stays within a narrow range below 0.755, while the actual queue occupancy (the ground-truth congestion level) varies between 7 and 70 packets. As a result, the correlation coefficient drops even lower, and the RMSE with respect to the ideal feedback remains relatively high, about 23 times that of the latency feedback.

B. Accuracy of Latency-Based Feedback

In the face of the above disparity between the actual queueing and ECN-based feedback, we have turned to latency feedback as an alternative. As both senders and receivers are under the same administrative domain in datacenter networks, we assume that we can instrument both ends and that high-precision latency measurements are feasible. Later, in Section III, we introduce our detailed techniques for accurate latency measurement.

Assuming for now that latency can be measured accurately, we verify that latency measurements accurately reflect the ground-truth congestion level, using ns-2 simulation. The congestion level is measured once for every congestion window in the following way. The sender measures the RTT for every data packet and sets its minimum as the base RTT without queueing delay. The difference between the base RTT and a sample RTT represents the queueing delay, which is the congestion level.
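As a concrete illustration of this measurement rule, the short sketch below (ours, with hypothetical sample values) keeps the minimum RTT as the base RTT and reports the per-window average excess as the queueing-delay feedback.

```python
# Minimal sketch (illustrative): latency feedback as (sample RTT - base RTT).
def queueing_delay_feedback(rtt_samples_us, base_rtt_us=None):
    """Return (queueing-delay feedback in microseconds, updated base RTT)."""
    window_min = min(rtt_samples_us)
    base = window_min if base_rtt_us is None else min(base_rtt_us, window_min)
    # Average excess over the base RTT within this congestion window.
    feedback = sum(r - base for r in rtt_samples_us) / len(rtt_samples_us)
    return feedback, base

# Hypothetical samples: base RTT around 200 us, some samples delayed by queueing.
samples = [200.1, 201.3, 203.7, 200.0, 202.4]
q, base = queueing_delay_feedback(samples)
print(base, round(q, 2))   # base RTT and average queueing delay in microseconds
```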
Figure 2 shows the actual congestion level versus the latency-based congestion-level measurement. For ease of comparison, we consider the maximum possible queueing delay as a congestion level of 100% and translate the measured latency into a congestion level accordingly. Latency feedback (i.e., queuing delay) naturally reflects the average queue lengths. The correlation coefficient is much higher than that of the ECN feedback, and the RMSE against the ideal feedback is only 1.05, which is 32 times smaller than that of the raw ECN fraction feedback and 23 times smaller than that of the moving average. The next section discusses how to achieve accurate latency measurement to capture the congestion level in a real network.

III. ACCURATE QUEUING DELAY MEASUREMENT

Latency measurement can be inaccurate for many reasons, including variability in end-host stack latency, NIC queuing delay, and I/O batching. In this section, we describe several techniques to eliminate such sources of errors. Our goal is to achieve a level of accuracy that can distinguish even a single MSS packet queuing at 10 Gbps, which is 1.2 µs. This is necessary to target near zero queuing, as congestion control should be able to back off even when a single packet is queued.

Before we introduce our solutions to each source of error, we first show how noisy the latency measurement is without any mitigation. Figure 3 shows the round-trip time measured by the sender's kernel when saturating a 10 Gbps link; we generate TCP traffic using iperf [21] on the Linux kernel. The sender and the receiver are connected back to back, so no queueing is expected in the network. Our measurement shows that the round-trip time varies from 23 µs to 733 µs, which potentially gives up to 591 packets of error. The middle 50% of RTT samples still exhibit a wide range of errors, 111 µs, which corresponds to 93 packets. These errors are an order of magnitude larger than our target latency error of 1.2 µs.

Fig. 3. Round-trip time measured in kernel.
Fig. 4. Timeline of timestamp measurement points.
Fig. 5. Example delay calibration for bursty packet reception.
TABLE I. Sources of errors in latency measurement and our techniques for mitigation.

Table I shows four sources of measurement errors and their magnitude. We eliminate each of them to achieve our target accuracy (about 1 µs).

Removing host stack delay: End-host network stack latency variation is over an order of magnitude larger than our target accuracy. Our measurement shows about 80 µs of standard deviation when the RTT is measured in the Linux kernel's TCP stack. Thus, it is crucial to eliminate the host processing delay at both the sender and the receiver. For software timestamping, our implementation choice eliminates the end-host stack delay at the sender, as we timestamp packets right before the TX and right after the RX on top of DPDK [13]. Hardware timestamping innately removes such delay. Now, we need to deal with the end-host stack delay at the receiver. Figure 4 shows how DX timestamps packets when a host sends one data packet and receives back an ACK packet. To remove the end-host stack delay at the receiver, we simply subtract (t3 - t2) from (t4 - t1). The timestamp values are stored and delivered in the option fields of the TCP header.

Burst reduction: The TCP stack is known to transmit packets in bursts. The amount of burst is affected by the window size and TCP Segmentation Offloading (TSO), and ranges up to 64 KB. Bursts affect timestamping because all packets in a TX burst get almost the same timestamp, and yet they are received one by one at the receiver. This results in an error as large as 50 µs. To eliminate packet bursts, we use a software token bucket to pace the traffic at the link capacity. The token bucket is a packet queue drained by polling in SoftNIC [22]. At each poll, the number of packets to drain is calculated based on the link rate and the elapsed time since the last poll. The upper bound is 10 packets, which is enough to saturate 99.99% of the link capacity even in 10 Gbps networks. We note that our token bucket is different from TCP pacing or the pacer in HULL [8], where each and every packet is paced at the target rate; our token bucket is implemented with very small overhead. In addition, we keep a separate queue for each flow to prevent latency increases caused by other flows' queue build-ups.
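The sketch below (our own illustration, not the SoftNIC code; link rate, packet size, and burst cap are assumed parameters) shows the kind of poll-driven token bucket described above.

```python
import time
from collections import deque

class TokenBucketPacer:
    """Poll-driven pacing sketch: each poll releases at most rate * elapsed_time
    worth of packets, capped at max_burst per poll (illustrative only)."""
    def __init__(self, rate_bps=10e9, pkt_bytes=1500, max_burst=10):
        self.rate_pps = rate_bps / (pkt_bytes * 8)    # packets per second at line rate
        self.max_burst = max_burst
        self.queue = deque()
        self.credit = 0.0                             # fractional packet allowance
        self.last_poll = time.monotonic()

    def enqueue(self, pkt):
        self.queue.append(pkt)

    def poll(self):
        now = time.monotonic()
        self.credit = min(self.credit + (now - self.last_poll) * self.rate_pps,
                          self.max_burst)             # never accumulate a large burst
        self.last_poll = now
        n = min(int(self.credit), len(self.queue))
        self.credit -= n
        return [self.queue.popleft() for _ in range(n)]
```

A real pacer following the design above would keep one such queue per flow so that one flow's backlog does not inflate another flow's measured latency.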
Error calibration: Even after the burst reduction, packets can still be batched for TX as well as RX. Interestingly, we find that even hardware timestamping is subject to the noise introduced by packet bursts, due to its implementation. To quantify such noise, we run a simple experiment where a sender is connected to a receiver back to back and sends traffic at a near line rate of 9.5 Gbps. Ideally, all packets should be spaced with a 1.23 µs interval, but the result shows that 68% of the packet gaps for TX and 32% for RX fall below 1.23 µs. The detailed error distribution can be found in our previous paper [1]. The packet gaps of TX are more variable than those of RX, as TX is directly affected by I/O batching, while RX DMA is triggered when a packet is received by the NIC. The noise in the hardware timestamps is caused by the fact that the NIC timestamps packets when it completes the DMA, rather than when the packets are sent or received on the wire. We believe this is not a fundamental problem, and hardware timestamping accuracy can be further improved by minor changes in implementation. In this paper, we employ simple heuristics to reduce the noise by accounting for burst transmission in software. Suppose two packets are received or transmitted in the same batch, as in Figure 5. If the packets carry timestamps whose interval is smaller than what the link capacity allows, we correct the timestamp of the latter packet to be at least one transmission delay away from the former packet's timestamp.
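A minimal sketch of this correction (ours; the link rate and packet size are assumed parameters):

```python
def calibrate_timestamps(timestamps_ns, pkt_bytes=1500, rate_bps=10e9):
    """Enforce a minimum inter-packet gap of one transmission delay (sketch).
    Timestamps are in nanoseconds and assumed to be in send/arrival order."""
    min_gap_ns = pkt_bytes * 8 / rate_bps * 1e9   # ~1200 ns for 1500 B at 10 Gbps
    fixed = list(timestamps_ns)
    for i in range(1, len(fixed)):
        if fixed[i] - fixed[i - 1] < min_gap_ns:
            fixed[i] = fixed[i - 1] + min_gap_ns
    return fixed

# Example: three packets DMA-completed in one batch get nearly identical timestamps.
print(calibrate_timestamps([1000.0, 1010.0, 1025.0]))  # spaced out to 1200 ns apart
```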

One-way queuing delay: So far, we have described techniques to accurately measure RTT. However, the RTT includes the delay on the reverse path, which is another source of noise for determining queuing on the forward path. A simple solution to this is measuring the one-way delay, which requires clock synchronization between the two hosts.

Fig. 6. One-way queuing delay without time synchronization.

PTP (Precision Time Protocol) enables clock synchronization with sub-microsecond precision [23]. However, it requires hardware support, and possibly switch support, to remove errors from queuing delay. It also requires periodic synchronization to compensate for clock drift. Since we are targeting a microsecond level of accuracy, even a short-term drift could affect the queuing delay measurement. For these reasons, we choose not to rely on clock synchronization. Our intuition is that, unlike the one-way delay itself, the queuing delay can be measured simply by subtracting the baseline delay (the skewed one-way delay with zero queuing) from the sample one-way delay, even if the clocks are not synchronized. For example, suppose a clock difference of 5 seconds, as depicted in Figure 6. When we measure the one-way delay from A to B, which takes one second of propagation delay (no queuing), the measured one-way delay would be -4 seconds instead of one second. When we measure another sample that takes 2 seconds due to queuing delay, it would result in -3 seconds. By subtracting -4 from -3, we get one second of queuing delay. Now, there are two remaining issues: first, obtaining an accurate baseline delay, and second, dealing with clock drift. The baseline can be obtained by picking the minimum among many samples. The frequency with which zero queuing is observed depends on the congestion control algorithm's behavior. Since we target near zero queuing, we observe this every few RTTs.

Handling clock drift: A standard clock drifts only 40 ns per ms [24]. This means that the relative error between two measurements (e.g., the base one-way delay and a sample one-way delay) taken from two clocks within a millisecond can only contain tens of nanoseconds of error. Thus, we make sure that the base one-way delay is updated frequently (every few round-trip times). One last caveat with updating the base one-way delay is that clock drift differences can cause one-way delay measurements to continuously increase or decrease. If we simply take the minimum as the base one-way delay, one side updates its base one-way delay continuously, while the other side never updates its base delay because its measurements continuously increase. As a workaround, we update the base one-way delay when the RTT measurement hits a new minimum or re-observes the current minimum; RTT measurements are not affected by clock drift, and the minimum RTT implies no queueing in the network. This event happens frequently enough in DX, and it ensures that clock drift does not cause problems.
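To make the idea concrete, here is a small sketch (ours, not the DX implementation) of tracking queueing delay from unsynchronized one-way delay samples, re-anchoring the base only when the RTT hits or matches its minimum:

```python
class OneWayQueueEstimator:
    """Sketch: queueing delay from unsynchronized clocks.
    One-way delay (owd) samples carry an unknown clock offset, but
    (sample - base) cancels the offset; the base is refreshed only on
    new or re-observed minimum RTTs, which are immune to clock drift."""
    def __init__(self):
        self.base_owd = None
        self.min_rtt = None

    def update(self, owd, rtt):
        if self.min_rtt is None or rtt <= self.min_rtt:
            # Minimum (or re-observed minimum) RTT implies an empty queue:
            # re-anchor the base one-way delay there to absorb clock drift.
            self.min_rtt = rtt
            self.base_owd = owd
        return owd - self.base_owd      # queueing delay estimate

# Example with a 5 s clock offset (as in Figure 6): base sample -4 s, queued sample -3 s.
est = OneWayQueueEstimator()
print(est.update(owd=-4.0, rtt=2.0))   # 0.0  (no queueing)
print(est.update(owd=-3.0, rtt=3.0))   # 1.0  (one second of queueing)
```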
IV. DX: LATENCY-BASED CONGESTION CONTROL

A. Limitations of Existing Algorithms

Fig. 7. Queue length with the increasing number of TCP Vegas flows.

Our latency measurement serves as much more accurate congestion feedback than previous kernel-based latency measurement, so existing latency-based congestion control algorithms can also benefit from it. In this subsection, we study whether existing latency-based algorithms can be used in datacenter networks to meet the low queueing delay requirement. The first latency-based algorithm proposed for wide-area networks is TCP Vegas [9], and later algorithms share the same core idea as TCP Vegas. Therefore, we focus on TCP Vegas and analyze its performance in datacenter networks. If TCP Vegas turns out to work well, we will not need to develop another algorithm and can simply reuse TCP Vegas. If not, we need to figure out why TCP Vegas does not work and use the lessons learned to design a new algorithm.

We conduct ns-2 simulation in a dumbbell topology to test whether TCP Vegas achieves low queueing delay. We have ten idle senders in the beginning, and we activate each sender at 0.5-second intervals so that we have ten active flows in the end. Figure 7 shows the queueing delay evolution as the number of flows increases. We notice that the queueing delay increases with the number of flows in the bottleneck link; each flow adds its own share of queueing to the existing queueing. The queue length is consistently as high as 42 packets from 4.5 s to 5.0 s, where ten flows share the link. As datacenter workloads are very dynamic and the number of flows is not bounded, TCP Vegas cannot always guarantee low queueing delay.

We observe another drawback of TCP Vegas in the fairness among flows. According to the TCP Vegas algorithm, a sender determines the congestion level from (measured RTT - base RTT). This approach can provide fairness only when all the senders maintain the same base RTT value. The simulation result, however, tells us otherwise. The first flow has a base RTT of 205 µs and the last flow 238 µs; later flows in the network observe larger base RTTs. In this case, the last flow under-estimates the congestion level and tries to send faster than it is supposed to.

From the above simulation, we learn two lessons to be used in designing a new congestion control algorithm for datacenters: (i) the algorithm should be able to drop the queue length down to zero quickly as soon as it observes congestion; (ii) the algorithm should take into account the number of flows in the network when decreasing the window size. We explain how we reflect these lessons in our algorithm in the next subsection.

B. DX Algorithm Details

We present a congestion control algorithm for datacenters that targets near zero queueing delay based on implicit feedback, without any form of in-network support. Because latency feedback signals the amount of excess packets in the network, it allows senders to calculate the maximum number of packets to drain from the network while achieving full utilization. This section presents the basic mechanisms and design of our new congestion control algorithm, DX. Our target deployment environment is datacenters, and we assume that all traffic congestion is controlled by DX, similar to previous work [4], [6]-[8], [11].

DX is a window-based congestion control algorithm, and its congestion avoidance follows the popular AIMD (Additive Increase Multiplicative Decrease) rule. The key difference from TCP (e.g., TCP Reno) is its congestion avoidance algorithm. DX uses the queueing delay to decide whether to increase or decrease the congestion window in the next round at every RTT. Zero queueing delay indicates that there is still more room for packets in the network, so the window size is increased by one at a time. On the other hand, any positive queueing delay means that a sender must decrease the window. DX updates the window size once per round trip using the formula below:

\mathrm{new\ CWND} = \begin{cases} \mathrm{CWND} + 1, & \text{if } Q = 0 \\ \mathrm{CWND}\,(1 - Q/V), & \text{if } Q > 0, \end{cases}    (1)

where Q represents the latency feedback, that is, the average queueing delay in the current round trip, and V is a self-updating coefficient whose role is critical in our congestion control. When Q > 0, DX decreases the window in proportion to the current queueing delay. The amount to decrease should be just enough to drain the currently queued packets without affecting utilization. An overly aggressive decrease in the congestion window will cause the network utilization to drop below 100%. For DX, the exact amount depends on the number of flows sharing the bottleneck, because the aggregate sending rate of these flows should decrease to drain the queue. V is the coefficient that accounts for the number of competing flows. We derive the value of V using the analysis below.

We denote the link capacity (packets/sec) as C, the base RTT as R, the single-packet transmission delay as D, the number of flows as N, and the window size and the queueing delay of flow k at time t as W_k^(t) and Q_k^(t), respectively. Without loss of generality, we assume that at time t the bottleneck link is fully utilized and the queue size is zero. We also assume that the flows' behaviors are synchronized to derive a closed-form analysis, and we verify the results using simulations and testbed experiments.

At time t, because the link is fully utilized and the queuing delay is zero, the sum of the window sizes equals the bandwidth-delay product C R:

\sum_{k=1}^{N} W_k^{(t)} = C R    (2)

Since none of the N flows experiences congestion, they all increase their window size by one at time t+1:

\sum_{k=1}^{N} W_k^{(t+1)} = C R + N    (3)

Now all the senders observe a positive queueing delay, and they respond by decreasing the window size using the multiplicative factor 1 - Q/V, as in (1). As a result, at time t+2, we expect fewer packets in the network; we want just enough packets to fully saturate the link and achieve zero queuing delay in the next round. We calculate the total number of packets in the network (in both the link and the queues) at time t+2 from the sum of the window sizes of all the flows.
\sum_{k=1}^{N} W_k^{(t+2)} = \sum_{k=1}^{N} W_k^{(t+1)} \left(1 - \frac{Q_k^{(t+1)}}{V}\right)    (4)

Assuming every flow experiences the maximum queueing delay N D in the worst case, we get:

\sum_{k=1}^{N} W_k^{(t+2)} = \sum_{k=1}^{N} W_k^{(t+1)} \left(1 - \frac{N D}{V}\right) = (C R + N)\left(1 - \frac{N D}{V}\right)    (5)

We want the total number of in-flight packets at time t+2 to equal the bandwidth-delay product:

(C R + N)\left(1 - \frac{N D}{V}\right) = C R    (6)

Solving for V results in:

V = \frac{N D}{1 - \frac{C R}{C R + N}}    (7)

Among the variables required to calculate V, the only unknown is N, the number of concurrent flows. The number of flows can be estimated from the sender's own window size because DX achieves fair-share throughput at steady state; DX is an AIMD algorithm, and previous work [25] has shown that AIMD algorithms converge to fairness. For notational convenience, we denote W_k^(t+1) as W and rewrite (3) as:

\sum_{k=1}^{N} W_k^{(t+1)} = N W = C R + N \;\Rightarrow\; N = \frac{C R}{W - 1}

Using (7) and replacing D, the single-packet transmission delay, with 1/C, we get:

V = \frac{R W}{W - 1}    (8)

In calculating V, the sender only needs to know the base RTT, R, and the previous window size W. No additional measurement is required. We do not need to rely on external configuration or parameter settings either, unlike the ECN-based approaches. Even if the link capacity varies across links in the network, it does not affect our calculation of V.

So far, we have explained how DX handles the event of positive queueing delay. Although DX does not experience packet loss due to queue overflows, a timeout event can still occur due to physical-level failures. In traditional TCP algorithms, these timeout events are treated as congestion alarms and handled by decreasing the window size. In DX, however, such timeout events are not caused by congestion, as confirmed by the measured queueing delay, so the window size can remain the same and the sender keeps utilizing the network link fully. Being able to identify the source of packet loss is one of DX's strengths; retransmitting lost packets and updating the timeout value can be done in the same manner as in TCP.
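Putting (1) and (8) together, the following sketch (our own simplified illustration, not the kernel implementation) shows one per-RTT DX window update; the headroom parameter anticipates the noise guard discussed in Section V and is an assumed addition here.

```python
def dx_update(cwnd, avg_queueing_delay, base_rtt, headroom=0.0):
    """One DX-style window update per RTT (sketch of Eq. (1) with V from Eq. (8)).
    cwnd is in packets; avg_queueing_delay, base_rtt, and headroom share one time unit."""
    q = avg_queueing_delay
    if q <= headroom:                      # treat sub-headroom delay as zero queueing
        return cwnd + 1
    v = base_rtt * cwnd / (cwnd - 1)       # Eq. (8): V = R * W / (W - 1), valid for W > 1
    return max(1.0, cwnd * (1 - q / v))    # Eq. (1): multiplicative decrease

# Example: base RTT 200 us, 20-packet window, 12 us of average queueing delay.
print(dx_update(cwnd=20.0, avg_queueing_delay=12.0, base_rtt=200.0))  # ~18.9 packets
```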

Fig. 8. Steady-state CWND comparison.
Fig. 9. Convergence of two flows with DX and DCTCP.

C. Steady-State Analysis

Here we provide a simple analysis of the steady-state behavior of DX. Our interest is in how close DX is to the ideal congestion control algorithm with completely zero queueing. In our analysis, we compare three kinds of window size: ideal, theoretical, and simulated. The ideal window size is easily computed as the bandwidth-delay product divided by the number of flows, corresponding to 100% link utilization with zero queueing. The theoretical value is computed using the worst-case queue length in our algorithm. DX increases the window size only when the queue length is zero, so the worst case happens when all the flows see a zero queue length and decide to increase their window by one simultaneously. Then we can have as many as n queued packets, where n is the number of flows. The theoretical window size can now be calculated as (bandwidth-delay product + n) divided by n. Finally, the simulated value is the result from ns-2 simulation. We use 10 Gbps link capacity and 200 µs RTT for this analysis. We present the result in Figure 8. To test various scenarios, we increase the number of flows from two to ten and plot the results. We observe that the ideal value, which is the lower bound, is very close to the theoretical value of DX. The simulation result is also close to the theoretical value, as the maximum difference is 2.69 packets at n = 2. The disparity between the theoretical and simulation results comes from the assumption used in the theoretical computation that all flows are synchronized.
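As a quick sanity check of these two quantities, the short computation below (our own, assuming 1500-byte packets) evaluates the ideal and worst-case (theoretical) per-flow windows for the 10 Gbps / 200 µs setting.

```python
# Ideal vs. worst-case (theoretical) per-flow window for DX (sketch, 1500 B packets assumed).
link_bps, rtt_s, pkt_bytes = 10e9, 200e-6, 1500
bdp_pkts = link_bps * rtt_s / (pkt_bytes * 8)          # ~166.7 packets in flight

for n in (2, 5, 10):
    ideal = bdp_pkts / n                                # zero queueing, 100% utilization
    worst = (bdp_pkts + n) / n                          # all n flows add one packet at once
    print(n, round(ideal, 1), round(worst, 1))          # the two differ by about one packet
```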
Next, we observe the convergence behavior of DX in comparison to DCTCP. For this observation, we use the same convergence analysis methodology as a previous AIMD analysis work [25]. In our analysis scenario, a flow (denoted as flow #1) occupies the total link bandwidth in the beginning. Then the second flow (denoted as flow #2) comes into the network, and after a certain amount of time, both flows converge to the fair-share throughput and reach steady state. We plot the change in each flow's window size in Figure 9. The x-axis is the window size of flow #1, and the y-axis is the window size of flow #2. The fairness line represents the condition where the two flows have the same window size, hence fair throughput. The efficiency line represents the condition where the sum of the window sizes of the two flows is exactly the same as the bandwidth-delay product; the right side of the efficiency line means over-utilization (i.e., queueing) and the left side means under-utilization. Starting from the bottom-right corner, both DCTCP and DX converge near the fairness line, but DX takes a more direct path than DCTCP. At steady state, DX is much closer to the efficiency line than DCTCP, so it minimizes unnecessary queueing in the network.

V. IMPLEMENTATION

We have implemented DX in two parts: latency measurement in a DPDK-based NIC driver and latency-based congestion control in the Linux TCP stack. This separation provides a few advantages: (i) it measures latency more accurately than doing so in the Linux kernel; (ii) legacy applications can take advantage of DX without modification; and (iii) it separates the latency measurement from the TCP stack and hides the differences between hardware implementations, such as timestamp clock frequencies or timestamping mechanisms. We present the implementation of software- and hardware-based latency measurements and the modifications to the kernel TCP stack to support latency feedback.

A. Timestamping and Delay Calculation

We measure four timestamp values, as shown in Figure 4 (Section III): t1 and t2 are the transmission and reception times of a data packet, and t3 and t4 are the transmission and reception times of the corresponding ACK packet.

Software timestamping: To eliminate host processing delay, we perform TX timestamping right before pushing packets to the NIC, and RX timestamping right after the packets are received, in the DPDK-based device driver. We use rdtsc to get CPU cycles and transform them into a nanosecond timescale. We correct timestamps using the techniques described in Section III. All four timestamps must be delivered to the sender to calculate the one-way delay and the base RTT. We use TCP's option fields to relay t1, t2, and t3 (Section V-B). To calculate the one-way delay, the DX receiver stores a mapping from the expected ACK number to t1 and t2 when it receives a data packet. It then puts them in the corresponding ACK along with the ACK's transmission time (t3).
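The sketch below (ours; class and field names are hypothetical) illustrates this bookkeeping: the receiver echoes t1, t2, and t3 in the ACK, and the sender derives the forward one-way delay and the stack-delay-free RTT.

```python
# Sketch of the four-timestamp exchange (not the DX driver code; names are illustrative).
class DxReceiver:
    def __init__(self):
        self.pending = {}                       # expected ACK number -> (t1, t2)

    def on_data(self, seq, length, t1, t2):
        self.pending[seq + length] = (t1, t2)   # remember timestamps until the ACK goes out

    def build_ack(self, ack_no, t3):
        t1, t2 = self.pending.pop(ack_no)
        return {"ack": ack_no, "t1": t1, "t2": t2, "t3": t3}   # carried in TCP options

def sender_delays(ack, t4):
    one_way = ack["t2"] - ack["t1"]                      # includes a fixed clock offset
    rtt = (t4 - ack["t1"]) - (ack["t3"] - ack["t2"])     # receiver stack delay removed
    return one_way, rtt

# Example with hypothetical nanosecond timestamps.
rx = DxReceiver()
rx.on_data(seq=1000, length=1448, t1=100, t2=60150)
ack = rx.build_ack(ack_no=2448, t3=61150)
print(sender_delays(ack, t4=121200))   # (60050, 120100)
```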

The memory overhead is proportional to the amount of received data for which the corresponding ACK has not yet been sent. The overhead is negligible, as it requires storing 8 bytes per in-flight packet. In the presence of delayed ACKs, not all timestamps are delivered back to the sender, and some of them are discarded.

Hardware timestamping: We have implemented hardware-based timestamping on a Mellanox ConnectX-3 using a DPDK-ported driver. Although the hardware supports RX/TX timestamping for all packets, its driver did not support TX timestamping. We have modified the driver to timestamp all RX/TX packets. The NIC hardware delivers timestamps to the driver by putting them in the ring descriptor when it completes the DMA. This causes an issue with the previous logic of carrying t1 in the original data packet. To resolve this, we store a mapping from the expected ACK number to t1 at the sender, and retrieve it when the ACK is received.

LRO handling: Large Receive Offload (LRO) is a widely used technique for reducing CPU overhead on the receiver side. It aggregates received TCP data packets into a single large TCP packet and passes it to the kernel. It is crucial for achieving 10 Gbps or beyond in today's Linux TCP stack. This affects DX in two ways. First, it makes the TCP receiver generate fewer ACKs, which in turn reduces the number of t3 and t4 samples. Second, even though t1 and t2 are acquired before LRO bundling at the driver, we cannot deliver all of them back to the kernel TCP stack due to limited space in the TCP option header. To work around the problem, for each ACK that is processed, we scan through the previous t1 and t2 samples and deliver the average one-way delay along with the sample count. In fact, instead of passing all timestamps to the TCP layer, we only pass the one-way delay (t2 - t1) and the RTT ((t4 - t1) - (t3 - t2)).

Burst mitigation: As shown in Section III, burstiness from I/O batching incurs timestamping errors. To control burstiness, we implement a simple token bucket with a burst size of one MTU and the rate set to the link capacity. SoftNIC [22] polls the token bucket to draw packets and passes them to the timestamping module or the NIC. If the polling loop takes longer than the transmission time of a packet, the token bucket emits more than one packet, but limits the number of packets so as to keep up with the link capacity.

B. Congestion Control

We implement the DX congestion control algorithm in the Linux kernel. We add DX as a new TCP option that consumes 14 bytes of additional TCP header. The first 2 bytes are for the option number and the option length required by the TCP option parser. The remaining 12 bytes are divided into three 4-byte fields used for storing timestamps and/or an ACK number. Most of the modifications are made in the tcp_ack() function in the TCP stack, which is triggered when an ACK packet is received. An ACK packet carries the one-way delay and the RTT in the header, pre-calculated by the DPDK-based device driver. For each round-trip time, the received delay samples are averaged and used for the new CWND calculation. The current implementation takes the average one-way delay observed during the last round trip.

Practical considerations: In real-world networks, a transient increase in the queueing delay Q does not always mean network congestion. Reacting to wrong congestion signals results in low link utilization. There are two sources of error: measurement noise and instant queueing due to packet bursts.
Although we have shown that our latency measurement has a low standard deviation, up to about a microsecond, it can still trigger undesirable window reduction, as DX reacts to a positive queueing delay whether large or small. On the other hand, instant queueing can happen with even a very small number of packets. For example, if two packets arrive at the switch at exactly the same moment, one of them will be served after the first packet's transmission delay, hence a positive queueing delay. To tackle such practical issues, we employ two simple techniques. First, to be robust against latency measurement noise, we use headroom when determining congestion; DX does not decrease the window size when Q < headroom. The size of the headroom is determined by the level of measurement noise. For example, if each latency measurement has at most 10 µs of error, the headroom should be set to 10 µs, because any measurement smaller than 10 µs can be a false congestion alarm. Second, to be robust against transient increases in delay measurements, we use the average queueing delay during an RTT period. In an ideal network without packet bursts, the maximum queueing delay is a good indication of excess packets. In real networks, however, taking the maximum is easily affected by instant queueing. Taking the minimum removes the burstiness most effectively, but it detects congestion only when all the packets in the window experience positive queueing delay. Hence we choose the average to balance the two. Note that DCTCP, a previous ECN-based solution, also suffers from bursty instant queueing and requires a higher ECN threshold in practice than the theoretical calculation suggests [7].
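A minimal sketch of this per-RTT decision (ours, with an assumed 10 µs headroom) combining both techniques:

```python
def congestion_signal(delay_samples_us, base_owd_us, headroom_us=10.0):
    """Per-RTT congestion decision sketch: average the queueing-delay samples of the
    last round trip and ignore anything within the measurement-noise headroom."""
    q_samples = [d - base_owd_us for d in delay_samples_us]
    q_avg = sum(q_samples) / len(q_samples)
    congested = q_avg > headroom_us
    return congested, q_avg

# A single 30 us spike (instant queueing) among otherwise idle samples is averaged away.
print(congestion_signal([200, 230, 201, 199, 200], base_owd_us=200))  # (False, 6.0)
```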

VI. EVALUATION

Throughout the evaluation, we answer three main questions: Can DX attain the accuracy of a single packet's queuing delay in high-speed networks? Can DX achieve minimal queuing delay while achieving high utilization? How does DX perform in large-scale networks with realistic workloads?

Using testbed experiments, we show that our noise reduction techniques are effective and that queuing delay can be measured with an accuracy of a single MSS packet at 10 Gbps. We evaluate DX against DCTCP and verify that it reduces queuing in the switch by up to five times. Next, we use ns-2 packet-level simulation to conduct more detailed analysis and to evaluate DX at large scale with realistic workloads. First, we verify DX's effectiveness by looking at queuing delay, utilization, and fairness. We then quantify the impact of measurement errors on DX to evaluate its robustness. Finally, we perform a large-scale evaluation to compare DX's overall performance against the state of the art: DCTCP [7] and HULL [8].

Fig. 10. Improvements with noise reduction techniques.
Fig. 11. Improvement on RTT measurement error compared to kernel's.
Fig. 12. Accuracy of queuing delay measurement. (a) 1 Gbps with software timestamping. (b) 10 Gbps with hardware timestamping.

A. Accuracy of Queuing Delay in Testbed

For testbed experiments, we use Intel 1 GbE/10 GbE NICs for software timestamping and a Mellanox ConnectX-3 40 GbE NIC for hardware timestamping; the Mellanox NIC is used in 10 Gbps mode due to the lack of 40 GbE switches.

Effectiveness of noise reduction techniques: To quantify the benefit of each technique, we apply the techniques one by one and measure the RTT using both software and hardware. Two machines are connected back to back, and we conduct the RTT measurement over a 10 Gbps link. We plot the standard deviation in Figure 10. Ideally, the RTT should remain unchanged since there is no network queueing delay. In the software-based solution, we reduce the measurement error (presented as standard deviation) down to 1.98 µs by timestamping at DPDK and applying burst control and calibration. Among the techniques, burst control is the most effective, cutting down the error by 23.8 times. In the hardware solution, simply timestamping at the NIC achieves noise comparable to the software solution with all techniques applied. After inter-packet interval calibration, the noise drops further, down to 0.53 µs, less than half of a single packet's queueing delay at 10 Gbps, which is within our target accuracy.

Calibration of H/W timestamping: We look further into how calibration affects the accuracy of hardware timestamping. The calibration effectively removes the inter-packet gap samples smaller than the link transmission delay, which originally took up 68% for TX and 32% for RX. The figures for this evaluation can be found in our previous paper [1].

Overall RTT measurement accuracy improvement: Now, we look at how much overall improvement we have made on the accuracy of RTT measurement. We plot the CDF of the RTT measured with our technique using hardware and of the RTT measured in the kernel in Figure 11. The total range of RTT has decreased by 62 times from 710 µs. The standard deviation is improved from 80.7 µs to 0.53 µs, by two orders of magnitude, and falls below a single packet's queuing delay at 10 Gbps.

Verification of queuing delay: Now that we can measure RTT accurately, the remaining question is whether it leads to accurate queuing delay estimation. We conduct a controlled experiment where we have full control over the queuing level. To create such a scenario, we saturate a port in a switch by generating full-throttle traffic from one host, and inject an MTU-sized ICMP packet to the same port at a fixed interval from another host. This way, we increase the queuing by one packet at a fixed interval, and we measure the queuing statistics from the switch to verify our queuing delay measurement. Figure 12 shows the time series of queuing delay measured by DX along with the ground-truth queue occupancy measured at the switch (marked as red squares). We use software and hardware timestamping for 1 Gbps and 10 Gbps, respectively. Every time a new ping packet enters the network, the queueing delay increases by one MTU packet transmission delay: 12 µs at 1 Gbps and 1.2 µs at 10 Gbps.
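These per-packet increments follow directly from the serialization delay of an MTU-sized frame; a tiny check (assuming a 1500-byte MTU):

```python
# Serialization delay of one MTU-sized packet (sketch; 1500-byte MTU assumed).
for rate_bps in (1e9, 10e9):
    print(rate_bps, 1500 * 8 / rate_bps * 1e6, "us")   # 12.0 us at 1 Gbps, 1.2 us at 10 Gbps
```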
The queue length retrieved from the switch also matches our measurement result. The result at 10 Gbps appears noisier than at 1 Gbps due to the smaller transmission delay; note that the scale of the y-axis differs between the two graphs.

B. DX Congestion Control in Testbed

Using the accurate queueing delay measurements, we run our DX prototype with three servers in our testbed; two nodes are senders and the other is a receiver. We use iperf [21] to generate TCP flows for 15 seconds. For comparison, we run DCTCP in the same environment. The ECN marking threshold for DCTCP is set to the recommended value of 20 at 1 Gbps and 65 at 10 Gbps [7].

Fig. 13. Queue length comparison of DX against DCTCP in testbed. (a) 1 Gbps bottleneck. (b) 10 Gbps bottleneck.

During the experiment, the switch queue length is measured every 20 ms by reading the register values from the switch chipset; the queue length is measured in bytes and converted into time. We first present the result at the 1 Gbps bottleneck link in Figure 13. Here we focus on the queue length of each protocol, as the throughput does not exhibit much difference; both protocols successfully saturate the bottleneck link during the experiment, and each flow achieves the fair-share throughput of 500 Mbps in the 1 Gbps link and 5 Gbps in the 10 Gbps link. We observe that DX consistently reduces the switch queue length compared to that of DCTCP. The average queueing delay of DX, 37.8 µs, is 4.85 times smaller than that of DCTCP. DX shows a 5.33x improvement in median queue length over DCTCP (3 packets for DX and 16 packets for DCTCP). DCTCP's maximum queue length goes up to 24 packets, while DX peaks at 8 packets.

We run the same experiment with a 10 Gbps bottleneck. For 10 Gbps, we additionally run DX with hardware timestamping using the Mellanox ConnectX-3 NIC. Figure 13 shows the result. DX (HW) denotes hardware timestamping, and DX (SW) denotes software timestamping. DX (HW) decreases the average queue length by 1.67 times compared to DCTCP, from 43.4 µs to 26.0 µs. DX (SW) achieves 31.8 µs of average queuing delay. The result also shows that DX effectively reduces the 99th-percentile queue length by a factor of 2 with hardware timestamping; DX (HW) and DX (SW) achieve 52 packets and 38 packets respectively, while DCTCP reaches 78 packets.

To summarize, latency feedback is more effective in maintaining low queue occupancy than ECN feedback. DX achieves a 4.85 times smaller average queue size at 1 Gbps and 1.67 times at 10 Gbps compared to DCTCP. DX reacts to congestion much earlier than DCTCP and reduces the congestion window by just the right amount to minimize the queue length while achieving full utilization. DX achieves the lowest queueing delay among existing end-to-end congestion controls with implicit feedback that do not require any switch modifications. In the next section, we also show that DX is even comparable to HULL, a solution that requires in-network support and switch modification.

C. Large-Scale Simulation

1) Dumbbell Topology With More Senders: In this section, we evaluate DX, DCTCP, and HULL in simulation to observe their performance in a larger-scale environment. We run the ns-2 simulator using a dumbbell network topology with 10 Gbps link capacity. The latency measurement in simulation is accurate without any noise. For the scalability test, we vary the number of simultaneous flows from 10 to 30, as queuing delay and utilization are correlated with it; the number of senders has a direct impact on queueing delay, as shown in DCTCP [7]. We measure the queuing delay and utilization, and summarize the findings below. The graphs can be found in our previous paper [1].

Queueing delay: Many distributed applications with short flows are sensitive to tail latency, as the slowest flow that belongs to a task determines the completion time of the task [26]. Hence, we look at the 99th-percentile queuing delay as well as the average queuing delay. On average, DX achieves 6.6x smaller queueing delay than DCTCP with ten senders, and slightly higher queuing delay than HULL.
At the 99th percentile, DX even outperforms HULL by 1.6x to 2.2x. DX achieves such low queuing because it reacts to queuing immediately, whereas both DCTCP and HULL use weighted averaging to reduce the congestion window, which takes multiple round-trip times.

Utilization: DX achieves 99.9% utilization, which is comparable to DCTCP, but with much smaller queuing. HULL sacrifices utilization to reduce the queuing delay, achieving about 90% of the bottleneck link capacity. We note that the low queueing delay of DX does not come at the cost of utilization.

Fairness and throughput stability: To evaluate throughput fairness, we generate 5 identical flows in the 10 Gbps link one by one at 1-second intervals and stop each flow after 5 seconds of transfer. In Figure 14, we see that both protocols offer fair throughput to existing flows at each moment. One interesting observation is that DX flows have more stable throughput than DCTCP flows. This implies that DX provides higher fairness than DCTCP at small time scales. We compute the standard deviation of throughput to quantify the stability: 268 Mbps for DCTCP and 122 Mbps for DX.

Impact of latency noise: We evaluate the impact of latency noise on the headroom size and the average queue length in DX. We generate latency noise using a normal distribution with varying standard deviation. The noise level is a multiple of 1.2 µs, a single packet's transmission delay. As the simulated noise level increases, we need more headroom for full link utilization. Figure 15 shows the required headroom for full link utilization and the resulting average queue length.


More information

Transmission Control Protocol. ITS 413 Internet Technologies and Applications

Transmission Control Protocol. ITS 413 Internet Technologies and Applications Transmission Control Protocol ITS 413 Internet Technologies and Applications Contents Overview of TCP (Review) TCP and Congestion Control The Causes of Congestion Approaches to Congestion Control TCP Congestion

More information

TCP Congestion Control : Computer Networking. Introduction to TCP. Key Things You Should Know Already. Congestion Control RED

TCP Congestion Control : Computer Networking. Introduction to TCP. Key Things You Should Know Already. Congestion Control RED TCP Congestion Control 15-744: Computer Networking L-4 TCP Congestion Control RED Assigned Reading [FJ93] Random Early Detection Gateways for Congestion Avoidance [TFRC] Equation-Based Congestion Control

More information

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz 1 A Typical Facebook Page Modern pages have many components

More information

Network Management & Monitoring

Network Management & Monitoring Network Management & Monitoring Network Delay These materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 Unported license (http://creativecommons.org/licenses/by-nc/3.0/) End-to-end

More information

Handles all kinds of traffic on a single network with one class

Handles all kinds of traffic on a single network with one class Handles all kinds of traffic on a single network with one class No priorities, no reservations required Quality? Throw bandwidth at the problem, hope for the best 1000x increase in bandwidth over 2 decades

More information

Impact of transmission errors on TCP performance. Outline. Random Errors

Impact of transmission errors on TCP performance. Outline. Random Errors Impact of transmission errors on TCP performance 1 Outline Impact of transmission errors on TCP performance Approaches to improve TCP performance Classification Discussion of selected approaches 2 Random

More information

Congestion. Can t sustain input rate > output rate Issues: - Avoid congestion - Control congestion - Prioritize who gets limited resources

Congestion. Can t sustain input rate > output rate Issues: - Avoid congestion - Control congestion - Prioritize who gets limited resources Congestion Source 1 Source 2 10-Mbps Ethernet 100-Mbps FDDI Router 1.5-Mbps T1 link Destination Can t sustain input rate > output rate Issues: - Avoid congestion - Control congestion - Prioritize who gets

More information

Equation-Based Congestion Control for Unicast Applications. Outline. Introduction. But don t we need TCP? TFRC Goals

Equation-Based Congestion Control for Unicast Applications. Outline. Introduction. But don t we need TCP? TFRC Goals Equation-Based Congestion Control for Unicast Applications Sally Floyd, Mark Handley AT&T Center for Internet Research (ACIRI) Jitendra Padhye Umass Amherst Jorg Widmer International Computer Science Institute

More information

RED behavior with different packet sizes

RED behavior with different packet sizes RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,

More information

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter. Glenn Judd Morgan Stanley

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter. Glenn Judd Morgan Stanley Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter Glenn Judd Morgan Stanley 1 Introduction Datacenter computing pervasive Beyond the Internet services domain BigData, Grid Computing,

More information

Hybrid Control and Switched Systems. Lecture #17 Hybrid Systems Modeling of Communication Networks

Hybrid Control and Switched Systems. Lecture #17 Hybrid Systems Modeling of Communication Networks Hybrid Control and Switched Systems Lecture #17 Hybrid Systems Modeling of Communication Networks João P. Hespanha University of California at Santa Barbara Motivation Why model network traffic? to validate

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 14 Congestion Control Some images courtesy David Wetherall Animations by Nick McKeown and Guido Appenzeller The bad news and the good news The bad news: new

More information

Congestion Control. Tom Anderson

Congestion Control. Tom Anderson Congestion Control Tom Anderson Bandwidth Allocation How do we efficiently share network resources among billions of hosts? Congestion control Sending too fast causes packet loss inside network -> retransmissions

More information

Documents. Configuration. Important Dependent Parameters (Approximate) Version 2.3 (Wed, Dec 1, 2010, 1225 hours)

Documents. Configuration. Important Dependent Parameters (Approximate) Version 2.3 (Wed, Dec 1, 2010, 1225 hours) 1 of 7 12/2/2010 11:31 AM Version 2.3 (Wed, Dec 1, 2010, 1225 hours) Notation And Abbreviations preliminaries TCP Experiment 2 TCP Experiment 1 Remarks How To Design A TCP Experiment KB (KiloBytes = 1,000

More information

Flow-start: Faster and Less Overshoot with Paced Chirping

Flow-start: Faster and Less Overshoot with Paced Chirping Flow-start: Faster and Less Overshoot with Paced Chirping Joakim Misund, Simula and Uni Oslo Bob Briscoe, Independent IRTF ICCRG, Jul 2018 The Slow-Start

More information

Data Center TCP (DCTCP)

Data Center TCP (DCTCP) Data Center TCP (DCTCP) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Microsoft Research Stanford University 1

More information

TCP Incast problem Existing proposals

TCP Incast problem Existing proposals TCP Incast problem & Existing proposals Outline The TCP Incast problem Existing proposals to TCP Incast deadline-agnostic Deadline-Aware Datacenter TCP deadline-aware Picasso Art is TLA 1. Deadline = 250ms

More information

TCP and BBR. Geoff Huston APNIC

TCP and BBR. Geoff Huston APNIC TCP and BBR Geoff Huston APNIC Computer Networking is all about moving data The way in which data movement is controlled is a key characteristic of the network architecture The Internet protocol passed

More information

CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007

CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007 CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007 Question 344 Points 444 Points Score 1 10 10 2 10 10 3 20 20 4 20 10 5 20 20 6 20 10 7-20 Total: 100 100 Instructions: 1. Question

More information

Performance Analysis of TCP Variants

Performance Analysis of TCP Variants 102 Performance Analysis of TCP Variants Abhishek Sawarkar Northeastern University, MA 02115 Himanshu Saraswat PES MCOE,Pune-411005 Abstract The widely used TCP protocol was developed to provide reliable

More information

Computer Networking. Queue Management and Quality of Service (QOS)

Computer Networking. Queue Management and Quality of Service (QOS) Computer Networking Queue Management and Quality of Service (QOS) Outline Previously:TCP flow control Congestion sources and collapse Congestion control basics - Routers 2 Internet Pipes? How should you

More information

Performance Consequences of Partial RED Deployment

Performance Consequences of Partial RED Deployment Performance Consequences of Partial RED Deployment Brian Bowers and Nathan C. Burnett CS740 - Advanced Networks University of Wisconsin - Madison ABSTRACT The Internet is slowly adopting routers utilizing

More information

Chapter III. congestion situation in Highspeed Networks

Chapter III. congestion situation in Highspeed Networks Chapter III Proposed model for improving the congestion situation in Highspeed Networks TCP has been the most used transport protocol for the Internet for over two decades. The scale of the Internet and

More information

ADVANCED TOPICS FOR CONGESTION CONTROL

ADVANCED TOPICS FOR CONGESTION CONTROL ADVANCED TOPICS FOR CONGESTION CONTROL Congestion Control The Internet only functions because TCP s congestion control does an effective job of matching traffic demand to available capacity. TCP s Window

More information

Chapter 24 Congestion Control and Quality of Service 24.1

Chapter 24 Congestion Control and Quality of Service 24.1 Chapter 24 Congestion Control and Quality of Service 24.1 Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 24-1 DATA TRAFFIC The main focus of congestion control

More information

CS321: Computer Networks Congestion Control in TCP

CS321: Computer Networks Congestion Control in TCP CS321: Computer Networks Congestion Control in TCP Dr. Manas Khatua Assistant Professor Dept. of CSE IIT Jodhpur E-mail: manaskhatua@iitj.ac.in Causes and Cost of Congestion Scenario-1: Two Senders, a

More information

CS519: Computer Networks. Lecture 5, Part 4: Mar 29, 2004 Transport: TCP congestion control

CS519: Computer Networks. Lecture 5, Part 4: Mar 29, 2004 Transport: TCP congestion control : Computer Networks Lecture 5, Part 4: Mar 29, 2004 Transport: TCP congestion control TCP performance We ve seen how TCP the protocol works Sequencing, receive window, connection setup and teardown And

More information

Chapter II. Protocols for High Speed Networks. 2.1 Need for alternative Protocols

Chapter II. Protocols for High Speed Networks. 2.1 Need for alternative Protocols Chapter II Protocols for High Speed Networks 2.1 Need for alternative Protocols As the conventional TCP suffers from poor performance on high bandwidth delay product links [47] meant for supporting transmission

More information

TCP and BBR. Geoff Huston APNIC

TCP and BBR. Geoff Huston APNIC TCP and BBR Geoff Huston APNIC Computer Networking is all about moving data The way in which data movement is controlled is a key characteristic of the network architecture The Internet protocol passed

More information

Congestion Control End Hosts. CSE 561 Lecture 7, Spring David Wetherall. How fast should the sender transmit data?

Congestion Control End Hosts. CSE 561 Lecture 7, Spring David Wetherall. How fast should the sender transmit data? Congestion Control End Hosts CSE 51 Lecture 7, Spring. David Wetherall Today s question How fast should the sender transmit data? Not tooslow Not toofast Just right Should not be faster than the receiver

More information

Implementing stable TCP variants

Implementing stable TCP variants Implementing stable TCP variants IPAM Workshop on Large Scale Communications Networks April 2002 Tom Kelly ctk21@cam.ac.uk Laboratory for Communication Engineering University of Cambridge Implementing

More information

Fast Retransmit. Problem: coarsegrain. timeouts lead to idle periods Fast retransmit: use duplicate ACKs to trigger retransmission

Fast Retransmit. Problem: coarsegrain. timeouts lead to idle periods Fast retransmit: use duplicate ACKs to trigger retransmission Fast Retransmit Problem: coarsegrain TCP timeouts lead to idle periods Fast retransmit: use duplicate ACKs to trigger retransmission Packet 1 Packet 2 Packet 3 Packet 4 Packet 5 Packet 6 Sender Receiver

More information

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014 1 Congestion Control In The Internet Part 2: How it is implemented in TCP JY Le Boudec 2014 Contents 1. Congestion control in TCP 2. The fairness of TCP 3. The loss throughput formula 4. Explicit Congestion

More information

Investigating the Use of Synchronized Clocks in TCP Congestion Control

Investigating the Use of Synchronized Clocks in TCP Congestion Control Investigating the Use of Synchronized Clocks in TCP Congestion Control Michele Weigle Dissertation Defense May 14, 2003 Advisor: Kevin Jeffay Research Question Can the use of exact timing information improve

More information

Data Center TCP (DCTCP)

Data Center TCP (DCTCP) Data Center Packet Transport Data Center TCP (DCTCP) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Cloud computing

More information

Improving TCP Performance over Wireless Networks using Loss Predictors

Improving TCP Performance over Wireless Networks using Loss Predictors Improving TCP Performance over Wireless Networks using Loss Predictors Fabio Martignon Dipartimento Elettronica e Informazione Politecnico di Milano P.zza L. Da Vinci 32, 20133 Milano Email: martignon@elet.polimi.it

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

Congestion Control for High Bandwidth-delay Product Networks

Congestion Control for High Bandwidth-delay Product Networks Congestion Control for High Bandwidth-delay Product Networks Dina Katabi, Mark Handley, Charlie Rohrs Presented by Chi-Yao Hong Adapted from slides by Dina Katabi CS598pbg Sep. 10, 2009 Trends in the Future

More information

Flow and Congestion Control Marcos Vieira

Flow and Congestion Control Marcos Vieira Flow and Congestion Control 2014 Marcos Vieira Flow Control Part of TCP specification (even before 1988) Goal: not send more data than the receiver can handle Sliding window protocol Receiver uses window

More information

EXPERIENCES EVALUATING DCTCP. Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook

EXPERIENCES EVALUATING DCTCP. Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook EXPERIENCES EVALUATING DCTCP Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook INTRODUCTION Standard TCP congestion control, which only reacts to packet losses has many problems Can

More information

Flow and Congestion Control (Hosts)

Flow and Congestion Control (Hosts) Flow and Congestion Control (Hosts) 14-740: Fundamentals of Computer Networks Bill Nace Material from Computer Networking: A Top Down Approach, 6 th edition. J.F. Kurose and K.W. Ross traceroute Flow Control

More information

CSE 461. TCP and network congestion

CSE 461. TCP and network congestion CSE 461 TCP and network congestion This Lecture Focus How should senders pace themselves to avoid stressing the network? Topics Application Presentation Session Transport Network congestion collapse Data

More information

Chapter 3 Review Questions

Chapter 3 Review Questions Chapter 3 Review Questions. 2. 3. Source port number 6 and destination port number 37. 4. TCP s congestion control can throttle an application s sending rate at times of congestion. Designers of applications

More information

Increase-Decrease Congestion Control for Real-time Streaming: Scalability

Increase-Decrease Congestion Control for Real-time Streaming: Scalability Increase-Decrease Congestion Control for Real-time Streaming: Scalability Dmitri Loguinov City University of New York Hayder Radha Michigan State University 1 Motivation Current Internet video streaming

More information

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers Overview 15-441 Computer Networking TCP & router queuing Lecture 10 TCP & Routers TCP details Workloads Lecture 10: 09-30-2002 2 TCP Performance TCP Performance Can TCP saturate a link? Congestion control

More information

Linux Plumbers Conference TCP-NV Congestion Avoidance for Data Centers

Linux Plumbers Conference TCP-NV Congestion Avoidance for Data Centers Linux Plumbers Conference 2010 TCP-NV Congestion Avoidance for Data Centers Lawrence Brakmo Google TCP Congestion Control Algorithm for utilizing available bandwidth without too many losses No attempt

More information

Recap. TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness

Recap. TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness Recap TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness 81 Feedback Signals Several possible signals, with different

More information

There are 10 questions in total. Please write your SID on each page.

There are 10 questions in total. Please write your SID on each page. Name: SID: Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 to the Final: 5/20/2005 There are 10 questions in total. Please write

More information

Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005

Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005 Name: SID: Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 Final: 5/20/2005 There are 10 questions in total. Please write your SID

More information

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2014 1 Congestion Control In The Internet Part 2: How it is implemented in TCP JY Le Boudec 2014 Contents 1. Congestion control in TCP 2. The fairness of TCP 3. The loss throughput formula 4. Explicit Congestion

More information

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2015

Congestion Control In The Internet Part 2: How it is implemented in TCP. JY Le Boudec 2015 1 Congestion Control In The Internet Part 2: How it is implemented in TCP JY Le Boudec 2015 Contents 1. Congestion control in TCP 2. The fairness of TCP 3. The loss throughput formula 4. Explicit Congestion

More information

Understanding TCP Parallelization. Qiang Fu. TCP Performance Issues TCP Enhancements TCP Parallelization (research areas of interest)

Understanding TCP Parallelization. Qiang Fu. TCP Performance Issues TCP Enhancements TCP Parallelization (research areas of interest) Understanding TCP Parallelization Qiang Fu qfu@swin.edu.au Outline TCP Performance Issues TCP Enhancements TCP Parallelization (research areas of interest) Related Approaches TCP Parallelization vs. Single

More information

ARTICLE IN PRESS. Delay-based early congestion detection and adaptation in TCP: impact on web performance

ARTICLE IN PRESS. Delay-based early congestion detection and adaptation in TCP: impact on web performance Computer Communications xx (2005) 1 14 www.elsevier.com/locate/comcom Delay-based early congestion detection and adaptation in TCP: impact on web performance Michele C. Weigle a, *, Kevin Jeffay b, F.

More information

CS644 Advanced Networks

CS644 Advanced Networks What we know so far CS644 Advanced Networks Lecture 6 Beyond TCP Congestion Control Andreas Terzis TCP Congestion control based on AIMD window adjustment [Jac88] Saved Internet from congestion collapse

More information

Lecture 15: Datacenter TCP"

Lecture 15: Datacenter TCP Lecture 15: Datacenter TCP" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Mohammad Alizadeh Lecture 15 Overview" Datacenter workload discussion DC-TCP Overview 2 Datacenter Review"

More information

Worst-case Ethernet Network Latency for Shaped Sources

Worst-case Ethernet Network Latency for Shaped Sources Worst-case Ethernet Network Latency for Shaped Sources Max Azarov, SMSC 7th October 2005 Contents For 802.3 ResE study group 1 Worst-case latency theorem 1 1.1 Assumptions.............................

More information

CS Transport. Outline. Window Flow Control. Window Flow Control

CS Transport. Outline. Window Flow Control. Window Flow Control CS 54 Outline indow Flow Control (Very brief) Review of TCP TCP throughput modeling TCP variants/enhancements Transport Dr. Chan Mun Choon School of Computing, National University of Singapore Oct 6, 005

More information

Report on Transport Protocols over Mismatched-rate Layer-1 Circuits with 802.3x Flow Control

Report on Transport Protocols over Mismatched-rate Layer-1 Circuits with 802.3x Flow Control Report on Transport Protocols over Mismatched-rate Layer-1 Circuits with 82.3x Flow Control Helali Bhuiyan, Mark McGinley, Tao Li, Malathi Veeraraghavan University of Virginia Email: {helali, mem5qf, taoli,

More information

Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency

Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What

More information

TCP so far Computer Networking Outline. How Was TCP Able to Evolve

TCP so far Computer Networking Outline. How Was TCP Able to Evolve TCP so far 15-441 15-441 Computer Networking 15-641 Lecture 14: TCP Performance & Future Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 Reliable byte stream protocol Connection establishments

More information

Congestion Control in Communication Networks

Congestion Control in Communication Networks Congestion Control in Communication Networks Introduction Congestion occurs when number of packets transmitted approaches network capacity Objective of congestion control: keep number of packets below

More information

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < : A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks Visvasuresh Victor Govindaswamy,

More information

IEEE 1588 PTP clock synchronization over a WAN backbone

IEEE 1588 PTP clock synchronization over a WAN backbone Whitepaper IEEE 1588 PTP clock synchronization over a WAN backbone A field study comparing PTP clock synchronization accuracy against GPS external time reference in a live production WAN environment Contents

More information

CS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP

CS 5520/ECE 5590NA: Network Architecture I Spring Lecture 13: UDP and TCP CS 5520/ECE 5590NA: Network Architecture I Spring 2008 Lecture 13: UDP and TCP Most recent lectures discussed mechanisms to make better use of the IP address space, Internet control messages, and layering

More information

Overview. TCP congestion control Computer Networking. TCP modern loss recovery. TCP modeling. TCP Congestion Control AIMD

Overview. TCP congestion control Computer Networking. TCP modern loss recovery. TCP modeling. TCP Congestion Control AIMD Overview 15-441 Computer Networking Lecture 9 More TCP & Congestion Control TCP congestion control TCP modern loss recovery TCP modeling Lecture 9: 09-25-2002 2 TCP Congestion Control Changes to TCP motivated

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Today Problems with TCP in the Data Center TCP Incast TPC timeouts Improvements

More information

Performance of UMTS Radio Link Control

Performance of UMTS Radio Link Control Performance of UMTS Radio Link Control Qinqing Zhang, Hsuan-Jung Su Bell Laboratories, Lucent Technologies Holmdel, NJ 77 Abstract- The Radio Link Control (RLC) protocol in Universal Mobile Telecommunication

More information

Principles of congestion control

Principles of congestion control Principles of congestion control Congestion: Informally: too many sources sending too much data too fast for network to handle Different from flow control! Manifestations: Lost packets (buffer overflow

More information

Congestion Control. Queuing Discipline Reacting to Congestion Avoiding Congestion. Issues

Congestion Control. Queuing Discipline Reacting to Congestion Avoiding Congestion. Issues Congestion Control Outline Queuing Discipline Reacting to Congestion Avoiding Congestion Issues Two sides of the same coin pre-allocate resources to avoid congestion (e.g. telephone networks) control congestion

More information

Computer Networks. Course Reference Model. Topic. Congestion What s the hold up? Nature of Congestion. Nature of Congestion 1/5/2015.

Computer Networks. Course Reference Model. Topic. Congestion What s the hold up? Nature of Congestion. Nature of Congestion 1/5/2015. Course Reference Model Computer Networks 7 Application Provides functions needed by users Zhang, Xinyu Fall 204 4 Transport Provides end-to-end delivery 3 Network Sends packets over multiple links School

More information

CS CS COMPUTER NETWORKS CS CS CHAPTER 6. CHAPTER 6 Congestion Control

CS CS COMPUTER NETWORKS CS CS CHAPTER 6. CHAPTER 6 Congestion Control COMPUTER NETWORKS CS 45201 CS 55201 CHAPTER 6 Congestion Control COMPUTER NETWORKS CS 45201 CS 55201 CHAPTER 6 Congestion Control P. Farrell and H. Peyravi Department of Computer Science Kent State University

More information

Oscillations and Buffer Overflows in Video Streaming under Non- Negligible Queuing Delay

Oscillations and Buffer Overflows in Video Streaming under Non- Negligible Queuing Delay Oscillations and Buffer Overflows in Video Streaming under Non- Negligible Queuing Delay Presented by Seong-Ryong Kang Yueping Zhang and Dmitri Loguinov Department of Computer Science Texas A&M University

More information

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS 28 CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS Introduction Measurement-based scheme, that constantly monitors the network, will incorporate the current network state in the

More information

WarpTCP WHITE PAPER. Technology Overview. networks. -Improving the way the world connects -

WarpTCP WHITE PAPER. Technology Overview. networks. -Improving the way the world connects - WarpTCP WHITE PAPER Technology Overview -Improving the way the world connects - WarpTCP - Attacking the Root Cause TCP throughput reduction is often the bottleneck that causes data to move at slow speed.

More information

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459

More information

Managing Performance Variance of Applications Using Storage I/O Control

Managing Performance Variance of Applications Using Storage I/O Control Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage

More information

TCP and BBR. Geoff Huston APNIC. #apricot

TCP and BBR. Geoff Huston APNIC. #apricot TCP and BBR Geoff Huston APNIC The IP Architecture At its heart IP is a datagram network architecture Individual IP packets may be lost, re-ordered, re-timed and even fragmented The IP Architecture At

More information