Congestion Management in HPC

Congestion Management in HPC Interconnection Networks
Pedro J. García, Universidad de Castilla-La Mancha (Spain)

Outline
Why may congestion become a problem?
Should we care about congestion in current HPC systems?
How can congestion be managed?
Challenges

Why may congestion become a problem?
For three decades, the goal of computer architects has been to keep the processors busy, since that yields top performance. Interconnects were usually cheap and never a bottleneck. Now, global system performance in large systems is limited by the interconnection network, and network saturation leads to congestion situations that may drastically degrade network performance.

Contention
Several packets from different flows request the same output port in a switch. One packet makes progress while the others wait. (Figure: network contention.)

Congestion
Congestion is persistent contention, occurring mainly when the network is in the saturation state. Buffers containing packets that belong to the flows involved in contention become full. (Figure: persistent network contention.)

Congestion propagation
In saturated lossless networks, congestion is quickly propagated by flow control, forming congestion trees. The root of a congestion tree is the point of persistent contention; its branches extend backward through the switches whose buffers fill up, and its leaves may reach the sources. (Figure: congestion tree structure, showing the root, branches, and leaves.)
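
The backward growth of a congestion tree can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model from the slides: a linear chain of three switches with 4-packet buffers, credit-style lossless flow control that moves a packet only into a buffer with a free slot, a source offering one packet per cycle, and a root whose output accepts one packet every three cycles. The names simulate_chain and first_full_cycle are hypothetical.

```python
def simulate_chain(num_switches=3, capacity=4, drain_period=3, cycles=40):
    """Per-cycle buffer occupancies of a linear chain of switches (toy model)."""
    buffers = [0] * num_switches  # index 0 is next to the source, -1 is the root
    history, blocked = [], 0
    for t in range(cycles):
        # The root's output link accepts only one packet every drain_period cycles.
        if t % drain_period == 0 and buffers[-1] > 0:
            buffers[-1] -= 1
        # Forward downstream-first so a packet moves at most one hop per cycle,
        # and only into a buffer with a free slot (lossless flow control).
        for i in range(num_switches - 2, -1, -1):
            if buffers[i] > 0 and buffers[i + 1] < capacity:
                buffers[i] -= 1
                buffers[i + 1] += 1
        # The source injects one packet per cycle unless flow control stops it.
        if buffers[0] < capacity:
            buffers[0] += 1
        else:
            blocked += 1
        history.append(tuple(buffers))
    return history, blocked


def first_full_cycle(history, index, capacity=4):
    """Cycle at which buffer `index` first becomes full (None if never)."""
    for t, occupancy in enumerate(history):
        if occupancy[index] == capacity:
            return t
    return None


history, blocked = simulate_chain()
print([first_full_cycle(history, i) for i in range(3)])  # [16, 11, 7]
print("injections stopped by flow control:", blocked)
```

With these parameters the root buffer fills first (cycle 7), then the middle switch (cycle 11), then the switch next to the source (cycle 16); from then on, flow control keeps stopping injections at the source, which is the leaf of the tree.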

Congestion trees and Head-of-Line blocking
Congestion trees may cause Head-of-Line (HoL) blocking: non-congested packets that share queues with congested ones wait behind them and advance at the same speed. Congestion therefore also affects sources that do not cause it.
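
HoL blocking at a single input port can be shown with a small sketch, assuming (hypothetically) that packets alternate between a congested output that accepts one packet every four cycles and a non-congested output that accepts one every cycle, and that at most one packet leaves the port per cycle. The sketch compares one shared FIFO with one queue per output, the basic idea behind the HoL blocking elimination techniques discussed later in the talk; all names are illustrative.

```python
from collections import deque

HOT, COLD = "hot", "cold"  # packets to a congested vs. a non-congested output


def output_free(output, t, hot_period=4):
    # Assumption: the congested output accepts a packet from this input only
    # every hot_period cycles; the non-congested output accepts one every cycle.
    return output == COLD or t % hot_period == 0


def last_cold_delivery(arrivals, separate_queues):
    """Cycle at which the last non-congested packet leaves the input port."""
    if separate_queues:
        queues = [deque(p for p in arrivals if p == HOT),
                  deque(p for p in arrivals if p == COLD)]
    else:
        queues = [deque(arrivals)]  # a single shared FIFO
    last_cold, t = None, 0
    while any(queues):
        # At most one packet leaves per cycle, and only a queue head is eligible.
        for q in queues:
            if q and output_free(q[0], t):
                if q.popleft() == COLD:
                    last_cold = t
                break
        t += 1
    return last_cold


arrivals = [HOT, COLD] * 4
print(last_cold_delivery(arrivals, separate_queues=False))  # 13
print(last_cold_delivery(arrivals, separate_queues=True))   # 5
```

In the shared FIFO, every non-congested packet waits behind a congested one at the head, so it advances at the congested flow's pace.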

Network performance at saturation
At saturation, network performance drops dramatically due to congestion situations. (Figure: network performance over time, with markers where the traffic injected to a hot-spot destination, HS, starts and ends.)

Should we currently care about congestion?
There are conflicting interests: cost vs. performance. Saturation was traditionally avoided by overdimensioning the interconnection network.

Network overdimensioning
The network has many more components than are really necessary, so the offered network bandwidth is much higher than the bandwidth requested by end nodes.

Network overdimensioning: advantages and disadvantages
Advantage: low link utilization, so congestion is unlikely. (Figure: latency vs. injected traffic, showing the working zone and the saturation zone.)
Disadvantages: it is expensive, since processors are becoming cheaper relative to interconnects, and power consumption increases with growing link speed.

Should we currently care about congestion? (continued)
Saturation was traditionally avoided by overdimensioning the interconnection network, but this is currently not suitable. What happens without network overdimensioning?

Network not overdimensioned
Only the components strictly necessary to interconnect all the processing nodes are used, so the offered network bandwidth decreases.

Network not overdimensioned: advantages and disadvantages
Advantages: it is cheaper and consumes less power. (Figure: latency vs. injected traffic, showing the working zone and the saturation zone.)
Disadvantage: high link utilization, so congestion is likely.

Should we currently care about congestion? (conclusion)
Without overdimensioning, working with high traffic loads close to the saturation point is dangerous. Network performance (throughput, latency) should be good under very different traffic patterns and load scenarios, yet traffic load may vary significantly over time and reach saturation. Some strategy to deal with congestion is therefore required.

The big picture
(Figure: cause-and-effect diagram.) Growing processor speed and growing link speed increase power consumption. Processor prices drop (driven by demand), so the relative cost of the interconnect increases. Power management and smaller networks decrease the available bandwidth, so the saturation point is reached with a lower traffic load, the probability of congestion grows, and performance degrades. Congestion management strategies counter this performance degradation.

Benefits of congestion management
Stable performance when the network reaches saturation: there is no performance drop, and the network delivers the maximum achievable throughput.
Quick reaction when power management has turned some components off and demand suddenly increases: this prevents performance degradation due to power management and enables more aggressive power-saving strategies without risk.
Performance is preserved when faults occur and fault-tolerance techniques enable alternative paths, which may become congested because fewer resources are available.

How can congestion be managed?
There are different approaches to congestion management: packet dropping, proactive techniques, reactive techniques, HoL blocking elimination techniques, hybrid techniques, and related techniques.

Packet dropping
Packets in congested buffers are discarded. This is suitable for computer networks such as the Internet, but not for most current HPC parallel applications: both congested and non-congested packets may be discarded, and discarded packets must be retransmitted, which increases final packet latency.

Proactive congestion management
Also known as congestion prevention. Paths are set up before data transmission [1], as in ATM and in computer networks providing QoS. Optimal performance requires knowing in advance both the resource requirements of each transmission and the network status, but knowledge about network status is not always available. The approach incurs high overhead, high setup latencies, and poor link utilization, so it is not suitable for HPC.
[1] P. Yew, N. Tzeng, D. H. Lawrie, "Distributing Hot-Spot Addressing in Large-Scale Multiprocessors," IEEE Transactions on Computers, vol. 36, no. 4, pp. 388–395, 1987.

Reactive congestion management
Also known as congestion recovery. It relies on injection limitation techniques (injection throttling) driven by closed-loop feedback. It does not scale well with network size and link bandwidth: the notification delay is proportional to distance (number of hops), while link and buffer capacity are proportional to clock frequency. It may also produce traffic oscillations, since it is a closed-loop system with pure delay.
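
The oscillation problem of a closed loop with pure delay can be reproduced with a toy model. The sketch below assumes a single source feeding a bottleneck that drains one unit per cycle, and a hypothetical additive-increase/multiplicative-decrease law (halve the rate when the observed queue exceeds a threshold, otherwise add 0.05); AIMD is only a familiar stand-in, since the slides do not specify the throttling law. The source sees the queue length `delay` cycles late, standing in for the notification delay.

```python
def simulate_throttling(delay, cycles=400, capacity=1.0, threshold=5.0):
    """Queue length at a bottleneck fed by one throttled source (toy model)."""
    rate, queue = 0.5, 0.0
    samples = [0.0] * (delay + 1)  # queue samples still travelling back to the source
    trace = []
    for _ in range(cycles):
        # The bottleneck drains `capacity` units per cycle.
        queue = max(0.0, queue + rate - capacity)
        samples.append(queue)
        stale = samples[-1 - delay]  # queue length as it was `delay` cycles ago
        if stale > threshold:
            rate *= 0.5                   # throttle on (late) notification
        else:
            rate = min(2.0, rate + 0.05)  # otherwise probe for more bandwidth
        trace.append(queue)
    return trace


for d in (0, 20):
    print(f"delay={d:2d}  peak queue={max(simulate_throttling(d)):.2f}")
```

With no delay the queue barely overshoots the threshold; with a 20-cycle delay the source keeps accelerating long after the queue has crossed it, so the queue overshoots several times further before throttling takes effect.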

Reactive congestion management: an example
The InfiniBand FECN/BECN mechanism [2]. Two bits in the packet header are reserved for congestion notification. If a switch port is considered congested, the Forward Explicit Congestion Notification (FECN) bit is set in the header of packets crossing that port. Upon receiving a FECN-marked packet, the destination returns to the source a Congestion Notification Packet (CNP) whose header has the Backward Explicit Congestion Notification (BECN) bit set. Any source receiving a BECN-marked packet then reduces its packet injection rate for that traffic flow.
[2] E. G. Gran, M. Eimot, S.-A. Reinemo, T. Skeie, O. Lysne, L. Huse, G. Shainer, "First experiences with congestion control in InfiniBand hardware," in Proceedings of IPDPS 2010, pp. 1–12.
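
The FECN/BECN round trip described above can be sketched as follows. This is an illustrative model only: the threshold, the halving of the rate, and the class names are hypothetical, and the actual InfiniBand mechanism uses table-driven, configurable injection delays that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: int
    dst: int
    fecn: bool = False  # Forward Explicit Congestion Notification bit
    becn: bool = False  # Backward Explicit Congestion Notification bit


class SwitchPort:
    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical congestion threshold (packets)
        self.occupancy = 0          # current queue occupancy (packets)

    def forward(self, pkt):
        # A port considered congested sets FECN on every packet crossing it.
        if self.occupancy >= self.threshold:
            pkt.fecn = True
        return pkt


def destination_receive(pkt):
    """Return a Congestion Notification Packet (CNP) with BECN set, or None."""
    if pkt.fecn:
        return Packet(src=pkt.dst, dst=pkt.src, becn=True)
    return None


class Source:
    def __init__(self, node_id, decrease=0.5, min_rate=1 / 64):
        self.node_id = node_id
        self.decrease = decrease  # hypothetical multiplicative decrease factor
        self.min_rate = min_rate
        self.rates = {}           # per-flow injection rate, keyed by destination

    def rate_to(self, dst):
        return self.rates.get(dst, 1.0)  # full link rate until throttled

    def receive(self, pkt):
        if pkt.becn:
            # The CNP comes from the flow's destination: throttle only that flow.
            flow = pkt.src
            self.rates[flow] = max(self.min_rate, self.rate_to(flow) * self.decrease)


src, port = Source(node_id=0), SwitchPort(threshold=4)
port.occupancy = 8  # the port is congested
cnp = destination_receive(port.forward(Packet(src=0, dst=5)))
src.receive(cnp)
print(src.rate_to(5), src.rate_to(7))  # 0.5 1.0
```

Only the flow that crossed the congested port is throttled; the same source keeps full rate toward other destinations.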

HoL blocking elimination techniques
Key idea: the real problem is not congestion itself but its negative effect, HoL blocking. By eliminating HoL blocking, congestion becomes harmless.

Example of HoL blocking due to congestion
Should congested flows be throttled? (Figure: sources Src. 0 to Src. 3 connected through switches Sw. 1 to Sw. 8 to destinations Dst. 1 and Dst. 2, with congested and non-congested flows annotated with link loads of 33 %, 66 %, and 100 %, and the sources marked as sending, stopped, or sending.)

Example of real-life HoL blocking: the A-31 highway metaphor
Traffic flow is affected by the bottleneck on the A-31 highway. (Figure: map of the A-31 and A-43 highways with the bottleneck marked. Map source: Google Maps.)

HoL blocking elimination techniques: general approach
In general, these techniques rely on having different queues at each port to separate different packet flows. They differ mainly in the criteria used to map packets to queues and in the number of queues required per port.

HoL blocking elimination techniques: VOQnet
VOQnet (Virtual Output Queuing at network level) [3] uses a separate queue at each input port for every destination, so packets with the same destination are stored in the same queue: Selected_Queue = Packet_Destination. It completely eliminates HoL blocking, but the number of required buffer resources increases at least quadratically with network size.
[3] W. Dally, P. Carvey, L. Dennison, "Architecture of the Avici terabit switch/router," in Proceedings of the 6th Hot Interconnects Symposium, 1998, pp. 41–50.
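
The quadratic growth can be checked with a quick count. The sketch assumes a k-ary n-tree (the topology family used in the talk's simulations) with k^n end nodes and k^(n-1) switches per level, where every level except the top uses all 2k ports of each switch and the top level uses only its k down ports; it counts switch input-port queues only, ignoring end-node interfaces. The function names are hypothetical.

```python
def voqnet_queue(packet_destination):
    """VOQnet queue selection: Selected_Queue = Packet_Destination."""
    return packet_destination


def voqnet_queues_kary_ntree(k, n):
    """(end nodes, total VOQnet queues at switch input ports) for a k-ary n-tree."""
    nodes = k ** n
    switches_per_level = k ** (n - 1)
    # Levels 1..n-1 use all 2k ports per switch; the top level uses only k.
    input_ports = (n - 1) * switches_per_level * 2 * k + switches_per_level * k
    return nodes, input_ports * nodes  # one queue per destination at every port


for n in (2, 3, 4):
    nodes, queues = voqnet_queues_kary_ntree(k=4, n=n)
    print(f"4-ary {n}-tree: {nodes} nodes, {queues} VOQnet queues")
```

Going from a 4-ary 3-tree (64 nodes, 20480 queues) to a 4-ary 4-tree (256 nodes, 458752 queues) multiplies the node count by 4 but the queue count by more than 22, because both the number of ports and the number of queues per port grow with the network.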

HoL blocking elimination techniques: VOQsw and DAMQs
VOQsw (Virtual Output Queuing at switch level) [4] and DAMQs (Dynamically Allocated Multi-Queues) [5] use a separate queue at every input port for every output port, so packets requesting the same output port are stored in the same queue: Selected_Queue = Requested_Output_Port. They are better than nothing, but they do not completely eliminate HoL blocking, and their effectiveness depends on the topology and the traffic pattern.
[4] T. Anderson, S. Owicki, J. Saxe, C. Thacker, "High-speed switch scheduling for local-area networks," ACM Transactions on Computer Systems, vol. 11, no. 4, pp. 319–352, November 1993.
[5] Y. Tamir, G. Frazier, "Dynamically-allocated multi-queue buffers for VLSI communication switches," IEEE Transactions on Computers, vol. 41, no. 6, June 1992.
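
Why VOQsw leaves some HoL blocking can be seen from which destinations end up sharing a queue. The sketch below uses a hypothetical four-entry routing table at one switch; any destinations routed through the same output port share a queue, so if one of them becomes a hot spot further downstream, packets for the others wait behind its packets.

```python
from collections import defaultdict


def voqsw_queue(requested_output_port):
    """VOQsw / DAMQ queue selection: Selected_Queue = Requested_Output_Port."""
    return requested_output_port


# Hypothetical routing table at one switch: destination -> requested output port.
ROUTE = {0: 0, 1: 3, 2: 3, 3: 1}


def destinations_per_queue(route):
    """Map each selected queue to the set of destinations stored in it."""
    queues = defaultdict(set)
    for dst, port in route.items():
        queues[voqsw_queue(port)].add(dst)
    return dict(queues)


print(destinations_per_queue(ROUTE))  # {0: {0}, 3: {1, 2}, 1: {3}}
```

If destination 1 is a hot spot beyond this switch, packets for destination 2 still suffer HoL blocking in queue 3, which is why effectiveness depends on topology and traffic pattern.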

HoL blocking elimination techniques: DBBM
DBBM (Destination-Based Buffer Management) [6] defines several groups of destinations and uses a separate queue for each group at every port (q queues per port); packets whose destinations fall in the same group are stored in the same queue: Selected_Queue = Packet_Destination MOD q. It does not completely eliminate HoL blocking, and its effectiveness depends on the number of queues, the topology, and the traffic pattern.
[6] T. Nachiondo, J. Flich, J. Duato, "Buffer management strategies to reduce HoL blocking," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, pp. 739–753, 2010.
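
The dependence on q can be made concrete by counting how many destinations share a hot spot's queue. This sketch assumes a hypothetical network of 64 destinations with destination 5 as the hot spot; with the MOD mapping, every destination congruent to 5 modulo q shares its queue at every port.

```python
def dbbm_queue(packet_destination, q):
    """DBBM queue selection: Selected_Queue = Packet_Destination MOD q."""
    return packet_destination % q


def sharing_with(hot_spot, num_destinations, q):
    """Other destinations whose packets share the hot spot's queue."""
    target = dbbm_queue(hot_spot, q)
    return [d for d in range(num_destinations)
            if d != hot_spot and dbbm_queue(d, q) == target]


for q in (2, 4, 8):
    print(q, len(sharing_with(hot_spot=5, num_destinations=64, q=q)))
```

Doubling q halves the number of destinations exposed to the hot spot's HoL blocking (31, 15, and 7 here), but no finite q below the number of destinations removes it entirely.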

HoL blocking elimination techniques: OBQA
OBQA (Output-Based Queue Assignment) [7] is suited to fat trees with DESTRO routing. Queue assignment is linked to the topology and the routing algorithm, which lets OBQA reduce HoL blocking with the minimum number of queues per port (q): Selected_Queue = Requested_Output_Port MOD q, with q smaller than half the switch radix. It does not completely eliminate HoL blocking, and its effectiveness depends on the number of queues.
[7] J. Escudero-Sahuquillo, P. J. García, F. J. Quiles, J. Duato, "An efficient strategy for reducing head-of-line blocking in fat-trees," in Proceedings of the 16th International Euro-Par Conference (Part II), LNCS vol. 6272, pp. 413–427, Ischia, Italy, September 2010.
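
The OBQA mapping rule can be written down directly. This sketch only encodes the formula and the constraint stated above; it does not model DESTRO routing or the reasons the mapping works well in fat trees. The example assumes an 8×8 switch (radix 8, as in the 4-ary n-trees of the simulations) with q = 3.

```python
def obqa_queue(requested_output_port, q, radix):
    """OBQA queue selection: Selected_Queue = Requested_Output_Port MOD q."""
    if not 0 < q < radix / 2:
        raise ValueError("OBQA uses q smaller than half the switch radix")
    return requested_output_port % q


# An 8x8 switch (radix 8) with q = 3 queues per port.
print([obqa_queue(port, q=3, radix=8) for port in range(8)])  # [0, 1, 2, 0, 1, 2, 0, 1]
```

Unlike DBBM, the queue depends on the output port requested at this switch rather than on the final destination, so the number of queues is tied to the switch radix instead of the network size.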

Performance comparison
Uniform-traffic simulation results. (Figure: network latency in cycles vs. normalized generated traffic, for a 4-ary 4-tree with 8×8 switches and a 16-ary 2-tree with 32×32 switches.)

HoL blocking elimination techniques: RECN and FBICM
RECN (Regional Explicit Congestion Notification) [8] has been proposed for networks with source-based routing, while FBICM (Flow-Based Implicit Congestion Management) [9] targets networks with distributed table-based routing. The key difference with respect to the previous techniques is that they completely and dynamically isolate congested flows. Their basics are explicit identification of congested flows, storage of congestion information, and dynamic queue allocation to isolate congested flows.
[8] P. J. García, J. Flich, J. Duato, I. Johnson, F. J. Quiles, F. Naven, "Efficient, scalable congestion management for interconnection networks," IEEE Micro, vol. 26, no. 5, pp. 52–66, September 2006.
[9] J. Escudero-Sahuquillo, P. J. García, F. J. Quiles, J. Flich, J. Duato, "Cost-effective congestion management for interconnection networks using distributed deterministic routing," in Proceedings of ICPADS 2010, Shanghai, China, December 2010.

RECN/FBICM basic procedure
Congested points are detected at any network port by measuring queue occupancy. The location of each detected congested point is stored in a control memory (a CAM line) at the ports forwarding packets towards it: RECN stores an explicit route, while FBICM stores a list of destinations that implicitly locates the point. A special queue associated with the CAM line is also allocated to store exclusively the packets addressed to that congested point. Congestion information is progressively notified to the ports in other switches crossed by congested flows, where new CAM lines and special queues are allocated. A packet arriving at a port is stored in the standard queue only if its routing information does not match any CAM line.
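
The per-port classification step can be sketched in the FBICM style, where each CAM line holds a set of destinations; a RECN-style port would instead match a stored explicit route. This is a hypothetical simplification, not the published design: detection, propagation of notifications between switches, and queue deallocation are omitted, and the class and method names are illustrative.

```python
from collections import deque


class FbicmPort:
    """Hypothetical FBICM-like input port: one standard queue plus a few
    CAM lines, each owning a special queue for one congested point."""

    def __init__(self, num_special_queues):
        self.standard = deque()
        self.cam = []  # CAM lines: (frozenset of destinations, special queue)
        self.num_special_queues = num_special_queues

    def notify_congestion(self, destinations):
        """Allocate a CAM line and its special queue; False if none is left."""
        key = frozenset(destinations)
        if any(dests == key for dests, _ in self.cam):
            return True  # congested point already tracked at this port
        if len(self.cam) >= self.num_special_queues:
            return False
        self.cam.append((key, deque()))
        return True

    def enqueue(self, packet_destination):
        """Store a packet; the standard queue is used only if no CAM line matches."""
        for dests, special_queue in self.cam:
            if packet_destination in dests:
                special_queue.append(packet_destination)
                return "special"
        self.standard.append(packet_destination)
        return "standard"


port = FbicmPort(num_special_queues=2)
port.notify_congestion({4, 5})           # congested point reached via destinations 4 and 5
print(port.enqueue(5), port.enqueue(1))  # special standard
port.notify_congestion({6})
print(port.notify_congestion({7}))       # False: no special queue left
```

The last call illustrates the first drawback of these techniques: with many simultaneous congested points, a port can run out of special queues.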

RECN/FBICM queue requirements
Non-congested packets can share queues without suffering significant HoL blocking, so only one standard queue per port is needed. Special queues are allocated and deallocated when required, so congested packets can be buffered separately using a small number of special queues per port. HoL blocking produced by congested packets is thus eliminated in a scalable way.

RECN/FBICM drawbacks
In scenarios with many different congested points, some ports may run out of special queues. The need for CAMs at switch ports also increases implementation cost and the silicon area required per port.

Hybrid congestion management strategies
Example: combining injection throttling and FBICM [10]. FBICM quickly and locally eliminates HoL blocking, propagating congestion information and allocating queues as necessary, while reactive congestion management slowly eliminates congestion, deallocating FBICM queues whenever possible. FBICM provides an immediate response and allows reactive congestion management to be tuned for slow reaction, thus avoiding oscillations. In turn, reactive congestion management drastically reduces FBICM buffer requirements (just one or two queues per port).
[10] J. Escudero-Sahuquillo, E. G. Gran, P. J. García, J. Flich, T. Skeie, O. Lysne, F. J. Quiles, J. Duato, "Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks," to appear in Proceedings of ICPP 2011.

Performance comparison
Hot-spot scenario simulation results. (Figure: normalized network throughput vs. time, for a 4-ary 3-tree with 1 hot spot and a 4-ary 3-tree with 4 hot spots.)

Related techniques
Adaptive routing and traffic balancing may help to delay the onset of congestion, but they are useless when heavy congestion arises. They also raise problems regarding in-order packet delivery; existing congestion management techniques do not work correctly with adaptive routing, because congested points may vary; and adaptive routing may spread congestion over more links. With virtual channels, performance depends on the channel (queue) assignment.

Challenges
To develop congestion management techniques that react locally and immediately when congestion arises.
To make congestion management techniques truly scalable.
To achieve coordination among end nodes without explicit communication among them.
To eliminate instabilities and oscillatory responses.
To minimize the number of extra resources needed to handle congestion.
To make congestion management compatible with adaptive routing.

Acknowledgements
Jose Duato (Universitat Politecnica de Valencia), who generously gave us the main ideas behind our congestion management proposals. Jose Flich (Universitat Politecnica de Valencia) and Jesus Escudero-Sahuquillo (Universidad de Castilla-La Mancha), who have developed all our congestion management proposals alongside me. The technique combining reactive congestion management and FBICM has been developed in collaboration with Simula Research Laboratory (Oslo).

Thanks! Any questions?