Congestion in InfiniBand Networks


Philip Williams
Stanford University, EE382C

Abstract

The InfiniBand Architecture (IBA) is a relatively new industry-standard networking technology suited for inter-processor and I/O communication. One of the challenges facing IBA networks is how to deal with congestion. The most recent version of the IBA specification includes a Congestion Control Annex (CCA) that relies on end-to-end Explicit Congestion Notification (ECN) packet marking to resolve congestion through traffic injection rate throttling. However, setting the CCA parameters to control congestion in a stable and efficient manner requires some experimentation. There are alternatives to ECN for dealing with congestion, including adaptive routing and Virtual Output Queuing (VOQ). Due to the specific requirements of IBA, adaptive routing and VOQ are tricky to implement, but both can be done. Once implemented, adaptive routing can be effective at preventing congestion by routing around it, and VOQ can be effective at eliminating congestion spreading by allowing unaffected flows to bypass it. However, these two techniques do not address the root cause of congestion as ECN does. This paper examines the advantages and disadvantages of each technique, and suggests that the best approach may be to combine them, in order to both avoid congestion where possible and recover from it when necessary.

1. InfiniBand Architecture

InfiniBand is an industry-standard architecture designed for high bandwidth, low latency, scalability, and reliability. It is particularly suited to SANs for high-performance clusters. Because scalability and industry-wide versatility are defining characteristics of InfiniBand, many design choices, such as topology and routing, are left intentionally unspecified by the IBA in order to accommodate a wide variety of applications. However, certain characteristics of IBA are universal for all applications. The primary goal is high throughput with low latency.
End-to-end latency is typically 10 µs or less. Because of this competitive requirement, InfiniBand traffic is typically required to be lossless; that is, packets cannot be dropped. Dropping and resending a packet would incur too much latency for InfiniBand applications. (Note that there are provisions in a section of the IBA specification [3] that allow setting a Switch Lifetime Limit and a Head of Queue Lifetime Limit to discard packets if they remain in a switch for too long. However, this is not desirable behavior, and most of the research regarding congestion control in InfiniBand networks aims to avoid packet dropping, so lossless traffic is assumed.)

2. Congestion

Another characteristic explicitly specified by IBA is credit-based flow control. Downstream switches send credits to upstream switches to inform them that buffer space is available. If many switches concurrently request the same resource, there may be no buffer space available; the sending switch must hold on to its packet and wait. This is the definition of congestion. Congestion increases latency by requiring packets to wait before being sent to the next node in the network. With lossless traffic, congestion that is initially concentrated at a single hot spot can spread throughout the entire network. A full input buffer causes the upstream switch to wait; if the input buffer of the upstream switch fills up, then the next upstream switch must wait, and so on. This

Figure 1. Head of Line (HOL) Blocking

can continue all the way back to the source nodes. This is particularly undesirable because it can impact traffic flows that are not even destined for the original congested link, due to a phenomenon known as Head of Line (HOL) blocking, which reduces throughput. This is illustrated by an example from [1], shown in Figure 1. Assume the packets from nodes 5 and 6 arrive first, then node 2's, then node 0's. Nodes 5 and 6 both send to port 4, overloading its capacity. When node 2 also attempts to send to port 4, there are no credits available, so it must wait. Since node 0's packet arrived after node 2's, it must wait until node 2's packet is transmitted; thus node 0 is blocked from sending to port 7, even though port 7 is available. The inter-switch link shared by the 2-to-4 and 0-to-7 flows remains idle, despite the fact that there is bandwidth available. Congestion can now spread upstream from node 2 and node 0. If the congestion is not resolved, this pattern of congestion spreading, also known as tree saturation, can ultimately result in total network collapse. In this example, the traffic flows destined for port 4 are hot flows; they are directly causing the congestion. The traffic flows destined for port 7 are cold flows; they contribute to the congestion, but are not directly causing it.

3. Congestion Control

There are numerous ways to deal with congestion.

a) TOPOLOGY - Overprovision the network. Build the network with excess resources so that packets never have to contend for them. This method aims to avoid congestion altogether.

b) ROUTING - Route adaptively. Examine the network state during routing and route around congested areas. This method aims to detect congestion when it occurs, and avoid adding to it.

c) FLOW CONTROL - Create virtual channels. Use multiple virtual channels for each input port that allow the allocation of the channel bandwidth to be decoupled from the allocation of the channel state [6].
This method eliminates HOL blocking, allowing cold flows to bypass congestion.

d) FLOW CONTROL - Reduce the amount of traffic. Decrease the rate at which traffic is injected into congested nodes, giving them time to process and transmit the backed-up traffic. This method attacks the source of the congestion itself, alleviating congestion for both hot flows and cold flows.

The advantages, disadvantages, and applicability to IBA of each of these strategies are discussed in the following sections.

4. Overprovisioning

The simplest way to deal with congestion is to overprovision the network. If an excess of resources (switch nodes, inter-switch links, and buffer space) is provided by the network, packets will rarely contend for the same links or buffer space, and congestion will not occur. Obviously, this is not a cost-effective solution. The network must be designed to handle the maximum possible traffic; most of the time the available resources will not be utilized. It is also not a scalable solution. As [2] points out, as the number of nodes in a network increases, the fraction of each node's traffic required for congestion to form decreases proportionally, which implies that the amount of overprovisioning would need to increase. Since scalability is one of the key desired attributes of IBA, overprovisioning is not an acceptable solution.

5. Adaptive Routing

With adaptive routing, the routing algorithm examines the network state and adapts to it by choosing a path that is underutilized. This technique detects congested links, then routes

Figure 2. Average packet latency vs. accepted traffic; various percentages of adaptive routing; 32-switch network, uniform traffic

around them. To implement adaptive routing, a network must have enough path diversity that alternate paths are available at any congested link. This requirement may or may not be met by a particular InfiniBand network, since topology is completely unspecified in the IBA specification; users are free to implement any network topology, regular or irregular. Assuming a network topology is chosen that has sufficient path diversity for adaptive routing, another requirement must be met: the routing tables must provide multiple paths for each destination, so that the adaptive routing algorithm can choose among them. This is a problem for IBA. Within an InfiniBand subnet, every port is assigned a unique Local ID (LID). When a packet is injected into the subnet, its destination is specified by a Destination LID (DLID). At each node, the packet's DLID is compared with the node's LID to determine whether the packet has arrived at its destination. IBA specifies that each switch contain a single routing table, and that each routing table list a single output port for each DLID. Thus, the routing algorithm does not have any choice as to which path to take; there is only one option. To enable adaptive routing, [4] proposes a solution that takes advantage of an IBA provision called LID Mask Control (LMC). IBA allows each port to be identified with a consecutive range of LIDs, rather than just a single LID. The LMC indicates how many of the least significant bits of the LID should be ignored when comparing a packet's DLID with the present node's LID. Each destination node will accept all packets destined for any address within its LID range.
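The LID-range matching just described can be sketched in a few lines. This is a minimal illustration, not code from the IBA specification; the function name and the forwarding-table layout are hypothetical. A destination accepts any DLID that equals its base LID once the LMC least-significant bits are masked off, while a switch can still map each individual DLID in that range to a different output port.

```python
def lid_matches(dlid: int, base_lid: int, lmc: int) -> bool:
    """A destination port accepts a packet if the DLID equals its base LID
    after ignoring the LMC least-significant bits (sketch of LMC matching)."""
    mask = ~((1 << lmc) - 1) & 0xFFFF   # LIDs are 16-bit values
    return (dlid & mask) == (base_lid & mask)

# A port with base LID 0x0040 and LMC = 2 owns LIDs 0x0040..0x0043.
assert lid_matches(0x0042, base_lid=0x0040, lmc=2)
assert not lid_matches(0x0044, base_lid=0x0040, lmc=2)

# A switch, which treats every LID as distinct, can map each DLID in the
# range to a different output port, yielding up to 2**LMC alternative paths.
forwarding_table = {0x0040: 1, 0x0041: 2, 0x0042: 3, 0x0043: 1}  # DLID -> port
```

The route selection logic can then pick among the ports listed for the destination's DLID range, e.g. the least loaded one.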
Since switches treat every LID as distinct, regardless of the LMC, each switch's forwarding table can list a range of DLIDs for each destination, and each DLID within that range can be associated with a different output port. The result is that a range of paths, all leading to the same destination node, becomes available. The table addressing logic can be designed in such a way that it simultaneously returns all of the available DLIDs to the route selection logic, which can then choose which of the output ports to use. [4] evaluated the performance of this adaptive routing technique by simulating various irregular network topologies with a number of different traffic patterns. Average packet latency and throughput were measured, and performance with adaptive routing was compared with purely deterministic routing. Figure 2 illustrates the latency and throughput improvement achieved with adaptive routing vs. deterministic routing under uniform traffic. (The percentages indicate what percentage of traffic was flagged as available for adaptive routing; packets that were not flagged were limited to a single, deterministic path.) There is a positive linear relationship between the amount of traffic routed adaptively and the saturation point. Figure 3 demonstrates the factor of throughput improvement achieved with adaptive routing compared to deterministic routing. Under uniform traffic, adaptive routing significantly outperformed deterministic routing: throughput increased by 9%-31% on average, depending on network size. However, with 15% hot spot traffic (15% of the source nodes send their traffic to a randomly

Figure 3. Throughput increase factor; 100% adaptive routing vs. 0% adaptive routing; various network sizes and traffic patterns

selected hot spot destination), throughput only improved by a modest 1%-6% with adaptive routing. The researchers attribute this mediocre improvement to limitations of the particular routing algorithm that was used. The simulation used up*/down* routing, which is well-suited for deadlock-free routing in irregular networks but does not allow a very high degree of adaptivity. The algorithm was able to avoid congested areas when they were of low intensity and randomly dispersed throughout the network, but unable to do so when they were highly concentrated in hot spots. Despite this disappointing result, adaptive routing has potential as a congestion avoidance mechanism. At the very least, it can be utilized to avoid congestion under benign traffic. It's possible that another adaptive routing algorithm could be chosen that might yield better results for hot spot traffic. [7] mentions alternative routing algorithms that are also well-suited for irregular networks: adaptive trail routing [8] and minimal adaptive routing [9]. Examination of these routing algorithms and their degree of adaptivity is beyond the scope of this paper, and a possible area for future research.

6. Virtual Channels/Virtual Output Queuing

Virtual channels can be utilized to relieve HOL blocking. This entails creating multiple virtual channels for each input port. If a flow is allocated a virtual channel and then becomes blocked, it does not hold on to the physical link; the link remains available for other virtual channels to utilize. Thus, cold flows can bypass congestion. Note that virtual channels do not relieve hot flows, but as long as the cold flows are taken care of, congestion spreading can be controlled, and network collapse will not occur. If there are enough virtual channels, Virtual Output Queuing (VOQ) can be implemented.
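The virtual-channel idea can be illustrated with a deliberately simplified toy model (not an IBA simulator; here a "packet" is just its destination port number). With a single FIFO per input, a blocked head packet stalls everything behind it; with one virtual channel per flow, the cold flow proceeds.

```python
from collections import deque

def transmittable(queues, blocked_ports):
    """Return the packets that can advance this cycle. From each queue only
    the head packet is eligible, and it advances only if its destination
    port still has credits (toy model of HOL blocking)."""
    sent = []
    for q in queues:
        if q and q[0] not in blocked_ports:
            sent.append(q.popleft())
    return sent

# One shared FIFO: the cold packet (port 7) is stuck behind the hot one (port 4).
single_fifo = [deque([4, 7])]
assert transmittable(single_fifo, blocked_ports={4}) == []   # HOL blocking

# Separate virtual channels per flow: the cold flow bypasses the congestion.
virtual_channels = [deque([4]), deque([7])]
assert transmittable(virtual_channels, blocked_ports={4}) == [7]
```

The hot packet still waits in both cases; virtual channels only keep it from dragging unrelated flows down with it.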
VOQ organizes input buffers into a set of queues such that for every input port, there is a distinct buffer assigned to each output port. This provides better HOL blocking relief than an implementation with fewer virtual channels. IBA does specify the use of Virtual Lanes (VLs), but they are not quite the same as the virtual channels described above. VLs are intended to provide Quality of Service (QoS), not dynamic congestion control. The effect of VLs is that separate virtual networks are created, each of which can be used by a different class of traffic. At the source node, each packet is assigned a Service Level (SL), which cannot be modified. SLs are mapped to VLs at each switch. Thus, when the VL for a particular SL becomes congested, packets belonging to the other SLs can still be transmitted on their virtual networks. There are many aspects of this VL specification which are problematic for the implementation of VOQ in IBA. First, the switch cannot dynamically allocate any packet to any VL; this allocation is deterministically selected by the switch's SL-to-VL mapping table, and a packet's SL can only be assigned at the source node. Second, VLs cannot be mapped to specific output ports of the switch; they are mapped to each packet's SL. Third, the packet is stored in a particular VL when it arrives at the switch, before the output port has even been determined. [5] describes a clever strategy for implementing VOQ in InfiniBand networks that

Figure 4. Average packet latency vs. accepted traffic; 8-switch network, hot spot traffic (1 hot spot, with distinct curves for varying amounts of traffic)

overcomes all of these difficulties. If the SLs assigned at the source node and the SL-to-VL mapping tables in each switch are carefully constructed, the result is that VLs can be mapped to specific output ports, thus implementing VOQ. To achieve this, the network topology must be examined, and every combination (4-tuple) of switch, input port, output port, and output port at the next switch must be determined. Next, the paths between all possible source/destination pairs are traversed. For every 4-tuple visited during this traversal, a locally unique SL is assigned. Locally unique means that all of the 4-tuples that share the same switch, input port, and output port must have distinct SLs. If locally unique SLs can be assigned to every 4-tuple, and the number of VLs is greater than or equal to the number of output ports on each switch, then VOQ can be fully implemented. When VOQ is implemented for every path in the network, it is referred to as full VOQ. If infinitely many distinct SLs and VLs could be used, then full VOQ would always be realizable. However, IBA only allows a maximum of 15 SLs and 15 VLs (plus one VL that is reserved for subnet management). In addition, all 15 SLs and VLs might not be available for VOQ; some of them might be reserved for other purposes. For a given network, only partial VOQ might be possible. In the case of partial VOQ, maximal performance can be achieved by weighting the 4-tuples according to the number of paths in the entire network that include them, then implementing VOQ for the 4-tuples with the highest weights. Similar to the simulations described in Section 5 to evaluate adaptive routing, [5] evaluated the performance of VOQ by simulating various irregular network topologies with a number of different traffic patterns.
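The path-weighting step for partial VOQ described above can be sketched as follows. This is only an illustration of the idea in [5]; the data layout and names are hypothetical. Each end-to-end path is represented as the list of (switch, input port, output port, next input port) 4-tuples it traverses; the heaviest 4-tuples are covered first when SLs and VLs are scarce.

```python
from collections import Counter

def weight_4tuples(paths):
    """Weight each (switch, in_port, out_port, next_in_port) 4-tuple by the
    number of end-to-end paths that traverse it (sketch of partial-VOQ
    prioritization; heavier 4-tuples get VOQ treatment first)."""
    weights = Counter()
    for path in paths:          # each path is the list of 4-tuples it visits
        weights.update(path)
    return weights

# Two hypothetical paths that share their last hop through switch S2.
paths = [
    [("S1", 0, 2, 1), ("S2", 1, 3, 0)],
    [("S1", 1, 2, 1), ("S2", 1, 3, 0)],
]
weights = weight_4tuples(paths)
# ("S2", 1, 3, 0) lies on both paths, so it is the first VOQ candidate.
assert weights.most_common(1)[0] == (("S2", 1, 3, 0), 2)
```

Sorting the counter and assigning locally unique SLs down the ranking until the SL/VL budget is exhausted yields the partial-VOQ configuration.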
Average packet latency and throughput were measured. Performance with VOQ was compared with a network with the same number of SLs and VLs, used in the traditional sense as separate Virtual Networks (VNs). Figure 4 shows the improvement achieved with VOQ under varying amounts of hot spot traffic. Without VOQ, the network saturates early, due to congestion spreading. With VOQ, saturation occurs at a much higher throughput. Higher amounts of hot spot traffic result in

Figure 5. Average packet latency vs. accepted traffic; 8-switch network, hot spot traffic (4 hot spots receiving 20% of all traffic)

reduced throughput for both VOQ and VN, but the ratio of throughput improvement for VOQ vs. VN actually increases (150% improvement for 10% hot spot traffic, 900% improvement for 70% hot spot traffic). Figure 5 analyzes the performance of hot flows and cold flows under hot spot traffic. There is not much improvement on the hot flow itself for VOQ compared with VN. However, the throughput for the cold flows is drastically improved with VOQ, since HOL blocking is reduced or eliminated. The results shown all involve 8 SLs, 8 VLs, and 8-switch networks. With only 4 SLs and 4 VLs, performance improvements are not as large, but are still significant. This is an important result, since full VOQ is often not possible; even with very limited resources, partial VOQ still successfully controls congestion. Also, in larger 16-, 32-, and 64-switch networks, VOQ still delivers consistent improvements. It is interesting to note that up*/down* routing was used in this simulation, the same routing algorithm that was unable to handle hot spot traffic with adaptive routing in Section 5.

7. Injection Rate Throttling/Explicit Congestion Notification

The fourth way to control congestion is to decrease the amount of traffic injected into the congested area. This allows the congested nodes more time to process and transmit packets, and resolves the network congestion. To implement injection rate throttling, congestion must be detected, the offending traffic flows must be notified, and they must reduce their traffic. There must also be a provision to recover from this reduced injection rate and return to the normal rate once congestion has been resolved, in order to utilize the full network bandwidth. [1] proposes a strategy for implementing this congestion control method in IBA using an end-to-end Explicit Congestion Notification (ECN) scheme.
ECN functions as follows, as shown in Figure 6.

1) A switch detects congestion when a Virtual Lane's (VL) input buffer becomes full.

2) The switch sets the ECN bit in the headers of the packets in the congested VL. Once it is set, the ECN bit cannot be modified.

3) The packets are propagated forward. When each one reaches its destination, the end node recognizes the ECN bit, and sends another ECN

Figure 6. Explicit Congestion Notification (ECN) mechanism in IBA

packet back to the source node. (For reliable data streams, where ACK packets are already used, the ECN bit is set in the ACK packet. For unreliable data streams that do not use ACK packets, a small Congestion Notification (CN) packet is created and sent to the source.)

4) When a source node receives an ECN packet, it decreases its injection rate. (For reliable data streams, the source node can decrease the injection rate as well as reduce its window limit. The window limit is the number of packets that are allowed to be sent before an ACK is received.)

5) If a source node continues to receive additional ECN packets, it continues to decrease the injection rate. On the other hand, if a certain amount of time passes without receiving additional ECN packets, it can be assumed that the congestion has abated, and the source will gradually increase the injection rate until it is returned to its default level.

Let us examine some of the specific properties of this mechanism. ECN relies on end-to-end notification, which has a relatively slow reaction time. It would be faster for a switch to send notification upstream without first sending the notification all the way downstream to the destination node, or faster still for a switch to throttle its own packets' traffic rate. The relatively slow reaction time is somewhat mitigated by the fact that end-to-end latency in InfiniBand networks is very small. The benefit of this method is that it is simple to implement within the parameters of the existing IBA hardware specification, and it allows the injection rate to be throttled for the specific traffic flow that is causing the congestion. End-to-end notification is needed because the switches have only limited information about the traffic they are routing.
By allowing the end nodes to control the notification, the specific traffic flow that is causing the congestion can be identified, rather than throttling all traffic flows originating from the source node. The algorithm for increasing and decreasing the injection rate is critical. The rate reduction must be fast enough to respond quickly to congestion and limit spreading, but not so fast that it induces starvation. The rate increase must be fast enough to recover any underutilized bandwidth when congestion has been resolved, but must not be so fast that it prematurely contributes additional traffic before congestion has been resolved. [1] provides guidelines for a source response function to achieve this balance. When an ECN packet is received, the injection rate is decreased. The injection rate is controlled by an Inter-Packet Delay (IPD) value that determines how long to wait between sending packets. (For reliable data streams, the window limit can also be adjusted.) The source response function only needs to decrease the injection rate for ECN packets that resulted from data packets sent since the most recent adjustment to the injection rate, to avoid reacting to congestion notification that reflects old settings of the injection rate. For injection rate increases, the source maintains a Remaining Packets until Increase (RPI) counter. For every packet that is sent without receiving an ECN (or, in the case of reliable data streams, for every unmarked ACK that is received), the RPI counter is reduced by one. Once the RPI counter reaches zero, the injection rate and/or window limit can be increased. Additionally, the number of ECN packets it takes to reduce the injection rate by a given

amount must be less than or equal to the number of unmarked packets it takes to increase the injection rate by the same amount. Traffic flows with lower injection rates must recover more quickly than traffic flows with higher injection rates. Traffic flows operating at the minimum injection rate must increase the injection rate after a single unmarked packet, rather than waiting for the RPI counter to reach zero. These requirements ensure stable and efficient operation under a variety of corner-case conditions. [1] also has a provision for backward compatibility: switches and end nodes that do not support ECN are fully compatible with those that do. If a non-compliant switch becomes congested, the congestion will likely spread to a compliant neighboring switch, which can then initiate the ECN protocol. If a switch marks a packet destined for an end node known to be non-compliant, the switch should also send a CN packet to the subnet manager. If a destination node receives a marked data packet from a source node known to be non-compliant, it should send a CN packet to the subnet manager. When the subnet manager receives CN packets, it notifies the offending source node and adjusts its injection rate. ECN is a robust method for congestion control. It is completely dynamic; it can react to any traffic pattern. Because of its closed-loop, end-to-end nature, it can handle persistent hot spots that last indefinitely.

8. ECN Implementation

[1] was published in 2002 by HP Laboratories, and was adopted as an optional section of the InfiniBand specification in 2004 (version 1.2) as the Congestion Control Annex (CCA) [3]. The actual implementation differs in some regards from the original proposal discussed in Section 7. In some areas, the CCA simplifies ECN. The source response function has been simplified: the window limit is not used as a means of controlling traffic; only the inter-packet injection rate is considered. Also, there is no RPI counter governing rate increase.
Instead, rate increase is controlled by a Congestion Control Table Index (CCTI) Timer. The CCTI Timer is cyclical and its duration can be set anywhere from 1.024 µs to 67 ms. Each time it expires, the injection rate is increased (and the timer is reset). The advantage of this implementation is that the injection rate is guaranteed to return to its default level, even if there is a break in the traffic and no packets are sent. It would seem that determining a balanced CCTI Timer value would be difficult, since the rate increase is a function of time rather than a function of packets, whereas the rate decrease is still a function of packets, as originally proposed. Actually, the CCTI Timer can be set in a manner similar to the RPI counter of the original proposal by setting CCTI Timer duration = RPI counter size * IPD time. Another simplification in the CCA is that no subnet manager enhancement is provided for backward compatibility. Non-compliant switches and end nodes simply ignore ECN packets, with no intervention from the subnet manager; a mix of congestion control aware and unaware devices is supported "with, at most, some loss of Congestion control" [3]. The enhancement may have been abandoned because it was not well-specified in the original proposal; the basic functionality was explained, but details of how to implement high-level congestion control in the subnet manager were not defined. More importantly, the enhancement did not seem scalable. In the extreme case, in a large network with all CCA-aware switches and no CCA-aware end nodes, the subnet manager would need to handle injection rate control for every source node in the network. In other areas, the CCA provides more flexibility than the original proposal. Flexibility is to be expected for an industry-wide specification with a multitude of applications.
For example, a threshold value can be specified for congestion detection, so that buffers are considered congested when they pass a certain limit, rather than only when they are completely full. This allows networks to detect congestion earlier. Because the CCA is flexible, it requires some tuning to determine parameter values (such as

Figure 7. Throughput vs. time WITHOUT congestion control
Figure 8. Throughput vs. time WITH congestion control
(Figures 7-8: 3-stage network, hot spot traffic, 32 sources sending to the hot spot)

buffer threshold, maximum IPD, IPD increment amount, and CCTI Timer duration) that result in effective congestion control. [2] claims to be the first paper not only to develop guidelines for setting these parameters, but to prove that parameter values even exist that can utilize the CCA to control congestion in a realistic network. [2] simulated a variety of InfiniBand networks with regular, folded butterfly topologies, under uniform and hot spot traffic. The simulation results demonstrate that ECN does an excellent job of controlling congestion. Figures 7 and 8 show a different-colored plot for the throughput seen at each of 32 end nodes during a finite period of hot spot traffic. The single red plot at the top of Figure 7 shows that the single hot spot node remained saturated during the congestion period, while throughput for the rest of the nodes was drastically reduced without congestion control, due to congestion spreading. With congestion control enabled, throughput on the cold flows remains high, as shown in Figure 8. These simulations also demonstrate the importance of fine-tuning the CCA parameters. If they are not set correctly, the network throughput can oscillate or become unstable. If the IPD increment amount is set too low (Figure 9), the source nodes react too slowly to the congestion. If the IPD increment amount is set too high (Figure 10), the source nodes react too quickly, making drastic, oscillating changes to the injection rates. If the CCTI Timer expires too quickly (Figure 11), the source nodes are unable to reduce the injection rate, and there is congestion spreading and throughput collapse, just as if there were no congestion control in place.
If the CCTI Timer expires too slowly (Figure 12), the source nodes react quickly but recover slowly, resulting in unnecessary loss of throughput on the hot flow. Thanks to extensive experimentation, [2] is able to offer general guidelines for setting the CCA parameters. Given N = the number of network ports and HSD = the maximum number of sources that could send traffic to a single hot spot, acceptable values are as follows:

Buffer threshold = 90% of switch input buffer size
Maximum IPD = 2/3 * HSD µs
IPD table index increment amount = min(N/6, HSD/2)
CCTI Timer duration = 10 µs

These guidelines are essential for anyone hoping to make use of IBA's built-in ECN mechanism. However, these results are based on regular, butterfly topologies. Further experimentation is required to determine guidelines for torus networks. It is unknown

Figure 9. Throughput vs. time, IPD increment amount set too low
Figure 10. Throughput vs. time, IPD increment amount set too high
Figure 11. Throughput vs. time, CCTI Timer set too fast
Figure 12. Throughput vs. time, CCTI Timer set too slow
(Figures 9-12: hot spot traffic, 32 sources sending to the hot spot)

whether general guidelines can even be established for irregular networks, which are very common among InfiniBand installations.

9. Conclusion

ECN is the best method for congestion control in InfiniBand, for many reasons. It is completely dynamic, requiring little knowledge about the network or the traffic patterns that are expected. It targets both cold flows and hot flows. It is a standard part of the IBA specification; it does not require tricky algorithms to repurpose existing resources for uses they were not originally designed for. However, the tunable parameters may require some tweaking to achieve the best performance. VOQ is the next-best alternative. It does an effective job of controlling congestion spreading by unblocking cold flows during hot spot traffic, and it performs well with just a few dedicated SLs and VLs. However, the traversal and mapping algorithms for determining appropriate SLs and VLs to enable VOQ in InfiniBand switches are complicated, and if there are any changes to the network topology, all of the tables must be recalculated. Adaptive routing is another useful method for avoiding congestion in the general case.

However, it requires specially configured routing tables to be implemented in IBA, and it has not been proven to be very effective in resolving hot spot traffic. An ideal solution would be to combine all three techniques; they are all compatible. Adaptive routing would enable network traffic to avoid congestion under low-intensity conditions. With hot spot traffic, congestion spots would still develop, and VOQ would allow most flows to bypass HOL blocking, maintaining high overall throughput. If the hot spots persisted, ECN would throttle the hot flow traffic to keep the congestion from spreading upstream. This does not imply that the congestion problem for InfiniBand networks is completely solved; there are areas that can be improved. A better adaptive routing algorithm for irregular networks could be developed; as we saw in Section 5, up*/down* routing does not adapt well to hot spot traffic. Another improvement would be to add more SLs and VLs to the IBA specification. This would allow VOQ to approach full VOQ, rather than partial VOQ, and would also create more resources for other uses, such as QoS. For ECN, the backward compatibility functionality that was outlined in [1] but dropped in the CCA could be developed. Also, a more sophisticated injection throttling algorithm could be developed; for example, [10] proposes a power source response function, rather than the standard multiplicative function, to enable faster reaction times and better utilization of available bandwidth. There are already some effective tools available for controlling congestion in IBA, but there are still opportunities for further research.

10. References

[1] Y. Turner, J. R. Santos, and G. Janakiraman, "An Approach for Congestion Control in InfiniBand," Internet Systems and Storage Laboratory, HP Laboratories Palo Alto, HPL (R.1), May.

[2] G. Pfister et al., "Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control,"
[2] G. Pfister et al., "Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control," in High Performance Interconnects for Distributed Computing.
[3] InfiniBand Trade Association, InfiniBand Architecture Specification, Volume 1, Release 1.2 [online document], October 2004.
[4] J. C. Martínez, J. Flich, A. Robles, P. López, and J. Duato, "Supporting Adaptive Routing in InfiniBand Networks," in 11th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, February.
[5] M. E. Gómez, J. Flich, A. Robles, P. López, and J. Duato, "VOQSW: A Methodology to Reduce HOL Blocking in InfiniBand Networks," International Parallel and Distributed Processing Symposium.
[6] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann.
[7] J. C. Sancho, A. Robles, and J. Duato, "A Flexible Routing Scheme for Networks of Workstations," High Performance Computing: Third International Symposium.
[8] W. Qiao and L. M. Ni, "Adaptive Routing in Irregular Networks Using Cut-Through Switches," in Proceedings of the 1996 International Conference on Parallel Processing, Volume 1.
[9] F. Silla and J. Duato, "Improving the Efficiency of Adaptive Routing in Networks with Irregular Topology," in Fourth International Conference on High Performance Computing.
[10] S. Yan, G. Min, and I. Awan, "An Enhanced Congestion Control Mechanism in InfiniBand Networks for High Performance Computing Systems," in 20th International Conference on Advanced Information Networking and Applications, Volume 1, 2006.


LID Assignment In InfiniBand Networks LID Assignment In InfiniBand Networks Wickus Nienaber, Xin Yuan, Member, IEEE and Zhenhai Duan, Member, IEEE Abstract To realize a path in an InfiniBand network, an address, known as Local IDentifier (LID)

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

Congestion Avoidance Overview

Congestion Avoidance Overview Congestion avoidance techniques monitor network traffic loads in an effort to anticipate and avoid congestion at common network bottlenecks. Congestion avoidance is achieved through packet dropping. Among

More information

Congestion Collapse in the 1980s

Congestion Collapse in the 1980s Congestion Collapse Congestion Collapse in the 1980s Early TCP used fixed size window (e.g., 8 packets) Initially fine for reliability But something happened as the ARPANET grew Links stayed busy but transfer

More information

ETSF05/ETSF10 Internet Protocols. Performance & QoS Congestion Control

ETSF05/ETSF10 Internet Protocols. Performance & QoS Congestion Control ETSF05/ETSF10 Internet Protocols Performance & QoS Congestion Control Quality of Service (QoS) Maintaining a functioning network Meeting applications demands User s demands = QoE (Quality of Experience)

More information

Advanced Computer Networks

Advanced Computer Networks Advanced Computer Networks QoS in IP networks Prof. Andrzej Duda duda@imag.fr Contents QoS principles Traffic shaping leaky bucket token bucket Scheduling FIFO Fair queueing RED IntServ DiffServ http://duda.imag.fr

More information

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing J. Flich, M. P. Malumbres, P. López and J. Duato Dpto. Informática de Sistemas y Computadores Universidad Politécnica

More information

Supporting Service Differentiation for Real-Time and Best-Effort Traffic in Stateless Wireless Ad-Hoc Networks (SWAN)

Supporting Service Differentiation for Real-Time and Best-Effort Traffic in Stateless Wireless Ad-Hoc Networks (SWAN) Supporting Service Differentiation for Real-Time and Best-Effort Traffic in Stateless Wireless Ad-Hoc Networks (SWAN) G. S. Ahn, A. T. Campbell, A. Veres, and L. H. Sun IEEE Trans. On Mobile Computing

More information

CS4700/CS5700 Fundamentals of Computer Networks

CS4700/CS5700 Fundamentals of Computer Networks CS4700/CS5700 Fundamentals of Computer Networks Lecture 16: Congestion control II Slides used with permissions from Edward W. Knightly, T. S. Eugene Ng, Ion Stoica, Hui Zhang Alan Mislove amislove at ccs.neu.edu

More information

Abstract. Paper organization

Abstract. Paper organization Allocation Approaches for Virtual Channel Flow Control Neeraj Parik, Ozen Deniz, Paul Kim, Zheng Li Department of Electrical Engineering Stanford University, CA Abstract s are one of the major resources

More information

ETSF05/ETSF10 Internet Protocols. Performance & QoS Congestion Control

ETSF05/ETSF10 Internet Protocols. Performance & QoS Congestion Control ETSF05/ETSF10 Internet Protocols Performance & QoS Congestion Control Quality of Service (QoS) Maintaining a functioning network Meeting applications demands User s demands = QoE (Quality of Experience)

More information

Congestion Avoidance

Congestion Avoidance Congestion Avoidance Richard T. B. Ma School of Computing National University of Singapore CS 5229: Advanced Compute Networks References K. K. Ramakrishnan, Raj Jain, A Binary Feedback Scheme for Congestion

More information