Congestion Management in Lossless Interconnects: Challenges and Benefits

José Duato, Technical University of Valencia (Spain)

HPC Advisory Council Workshop. March 21-23, 2011 - Congestion Management

Outline
- Why is congestion management required?
- Benefits
- Congestion and congestion management strategies
- Challenges
- Enhancing reactive congestion management
- Congestion management & adaptive routing
- HOL blocking elimination techniques
- Hybrid congestion management strategy

Current role of interconnection networks
- For three decades, the goal of computer architects has been to keep the processors busy in order to reach top performance
- Interconnects were usually cheap, and never a bottleneck
- Now, global system performance in large systems is limited by the interconnection network (e.g. Tianhe-1A)
- Network latency directly impacts application performance, and network saturation increases latency by orders of magnitude
- Saturation should be avoided at all costs

Conflicting interests: cost vs. performance
- Saturation was traditionally avoided by overdimensioning the interconnection network, but this is becoming very expensive
- Without overdimensioning, there is danger when working with high traffic loads (close to the saturation point)
- Network performance (throughput, latency) should be good under very different traffic patterns and load scenarios
- Traffic load may vary significantly over time, reaching saturation
- At saturation, network performance drops dramatically

Network throughput at saturation
[Figure: network throughput at saturation. HS = traffic injected to the hot-spot destination; the plot marks where HS starts and where HS ends.]

Should we currently care about congestion?
[Diagram relating the following trends and effects:]
- Growing processor speed
- Growing link speed
- Power consumption increases
- Processor prices drop (demand)
- Relative interconnect cost increases
- Power management
- Smaller networks
- Bandwidth decreases
- Saturation point reached with lower traffic load
- Congestion probability grows
- Performance degradation
- Congestion management strategies

Benefits
- Stable performance when the network reaches saturation
  - No performance drop
  - Delivers maximum achievable throughput
- Reacts quickly when power management has turned some components off and demand suddenly increases
  - Prevents performance degradation due to power management
  - Enables more aggressive power saving strategies without risk
- Helps to maintain performance when faults occur and fault tolerance techniques enable alternative paths
  - Alternative paths may become congested (fewer resources are available)

Contention
- Several packets from different flows request the same output port in a switch
- One packet makes progress, the others wait
[Figure: network contention.]

Congestion
- Persistent contention
- Buffers containing packets belonging to flows involved in contention become full
[Figure: persistent network contention.]

Congestion propagation
- In lossless networks, congestion is quickly propagated by flow control, forming congestion trees
- Congestion trees may cause Head-of-Line (HOL) blocking
- Congestion propagation may reach the sources
- Congestion affects packets belonging to flows that do not cause congestion
[Figure: persistent network contention propagated upstream by flow control.]

Congestion trees
[Figure: congestion tree structure, showing the congestion tree root, its branches, and its leaves.]

Traditional solution
- Overdimensioning the network
  - Many more components than really necessary
  - Offered network bandwidth is much higher than the bandwidth requested by end nodes
  - Advantage: low link utilization, hence low latency
[Figure: latency vs. injected traffic, with a working zone at low load and a congestion zone at high load.]

Classical congestion management strategies
- Proactive congestion management (congestion prevention)
  - Path setup before data transmission
  - Used in ATM, computer networks (QoS)
  - High overhead, high setup latencies, poor link utilization (not suitable for HPC)

Classical congestion management techniques
- Reactive congestion management (congestion recovery)
  - Injection limitation techniques using closed-loop feedback
  - Does not scale well with network size and link bandwidth
    - Notification delay (proportional to distance / number of hops)
    - Link and buffer capacity (proportional to clock frequency)
  - May produce traffic oscillations (closed-loop system with pure delay)

Other approaches
- Adaptive routing
  - May help to delay the occurrence of congestion
  - Useless when heavy congestion arises
  - Problems regarding in-order packet delivery
- Packet dropping
  - Not suitable for most current HPC parallel applications

Challenges
- To develop congestion management techniques that react locally and immediately when congestion arises
- To make congestion management techniques truly scalable
- To achieve coordination among end nodes without explicit communication among them
- To eliminate instabilities and oscillatory responses
- To minimize the number of extra resources needed to handle congestion
- To make congestion management compatible with adaptive routing

Enhancing reactive congestion management
- Stop injecting packets for a while when a BECN is received
- Do not change the injection rate again until feedback from previous changes is received, to prevent oscillations
- Source nodes can dynamically adjust their injection rate to the available bandwidth without communicating among themselves
- Inject exactly one packet when a BECN is received
- New contenders are automatically detected and the injection rate is reduced
- Slightly reduce the above rate to slowly eliminate the congestion tree

Congestion management & adaptive routing
- Existing congestion management techniques do not work correctly with adaptive routing
  - The injection rate is adjusted for a certain congestion, but packets may now follow a different path (unstable behavior)
  - Adaptive routing may spread congestion over more links
- Never use adaptive routing for congested packets when the congestion point is at an end node
  - In this case, adaptive routing does not help, spreads congestion over more links, and increases HOL blocking
- Use adaptive routing otherwise (more research needed here)

HOL blocking elimination techniques
- Key idea: the real problem is not the congestion itself, but its negative effect (HOL blocking)
- By eliminating HOL blocking, congestion becomes harmless
- In general, different buffers are required at each port for separating packet flows
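As a rough illustration of the key idea, the toy model below compares one shared FIFO with two separate queues at a single port. It is not taken from the talk: the function name, the "hot"/"cold" labels, and the timing model (the hot output accepts a packet only every `hot_period` cycles, and the port forwards at most one packet per cycle) are all assumptions chosen to keep the sketch small.

```python
from collections import deque

def cold_packets_delivered(packets, cycles, hot_period, separate_queues):
    """Count "cold" packets forwarded by one port in `cycles` cycles.

    Hypothetical toy model: packets are the strings "hot" or "cold".
    The hot output accepts a packet only when cycle % hot_period == 0;
    the cold output accepts a packet every cycle.
    """
    if separate_queues:
        # One queue per flow class: a blocked hot packet cannot delay cold ones.
        queues = [deque(p for p in packets if p == "hot"),
                  deque(p for p in packets if p == "cold")]
    else:
        # A single FIFO: only the packet at the head may leave.
        queues = [deque(packets)]

    delivered = 0
    for cycle in range(cycles):
        hot_ready = (cycle % hot_period == 0)
        for queue in queues:
            if not queue:
                continue
            head = queue[0]
            if head == "cold" or hot_ready:
                queue.popleft()
                delivered += head == "cold"
                break  # at most one packet leaves the port per cycle
    return delivered

traffic = ["hot", "cold"] * 6
shared = cold_packets_delivered(traffic, 12, 3, separate_queues=False)
separate = cold_packets_delivered(traffic, 12, 3, separate_queues=True)
print(shared, separate)  # 4 6
```

In the shared FIFO, each cold packet waits behind a hot packet whose output is not ready, which is the HOL blocking effect; with separate queues the cold flow is limited only by the port itself.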

HOL blocking example
[Figure: HOL blocking example with eight switches (Sw. 1 to Sw. 8) and two destinations (Dst. 1 and Dst. 2), carrying congested and non-congested flows. Links are annotated with their utilization (33 %, 66 %, 100 %) and sources are marked as "Sending" or "Stopped".]

Real-life HOL blocking example
- The A-31 highway metaphor
- The flow is affected by the bottleneck of the A-31 highway
[Figure: map of the A-31 and A-43 highways with the bottleneck marked. Map source: Google Maps.]

HOL blocking elimination techniques
- VOQnet (Virtual Output Queuing at network level)
  - A separate queue at each input port for every destination
  - Packets with the same destination are stored in the same queue
  - Completely eliminates HOL blocking
  - The number of required buffer resources increases at least quadratically with network size
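The quadratic growth can be checked with a short calculation. The sketch below assumes a k-ary n-tree (k^n end nodes, n·k^(n-1) switches, 2k ports per switch) purely as an example topology, and counts a queue at every switch port; the function names and the fixed queue count `q` are illustrative, not values from the talk.

```python
def fat_tree_size(k, n):
    """End nodes and total switch ports of a k-ary n-tree (assumed topology)."""
    end_nodes = k ** n
    switches = n * k ** (n - 1)
    ports_per_switch = 2 * k
    return end_nodes, switches * ports_per_switch

def total_queues(k, n, scheme, q=4):
    """Total queues in the network for a given queuing scheme.

    "VOQnet": one queue per destination at every port.
    "VOQsw":  one queue per switch output port at every port.
    "fixed":  a fixed number q of queues per port (hypothetical value).
    """
    end_nodes, ports = fat_tree_size(k, n)
    per_port = {"VOQnet": end_nodes, "VOQsw": 2 * k, "fixed": q}[scheme]
    return ports * per_port

for scheme in ("VOQnet", "VOQsw", "fixed"):
    print(scheme, total_queues(4, 4, scheme))
# VOQnet 524288
# VOQsw 16384
# fixed 8192
```

Because both the number of ports and the number of destinations grow with the number of end nodes, the VOQnet total grows with their product, while the other two schemes grow only with the port count.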

HOL blocking elimination techniques
- VOQsw (Virtual Output Queuing at switch level) and DAMQs (Dynamically Allocated Multi-Queues)
  - A separate queue at every input port for every output port
  - Packets requesting the same output are stored in the same queue
  - Better than nothing, but does not eliminate HOL blocking completely; effectiveness depends on the traffic pattern
- Virtual channels
  - Performance depends on channel (queue) assignment

HOL blocking elimination techniques
- DBBM (Destination-Based Buffer Management)
  - Several groups of destinations are defined
  - A separate queue for each group at every port
  - Packets with destinations in the same group are stored in the same queue
- OBQA (Output-Based Queue Assignment)
  - Suitable for fat-trees with DESTRO routing
  - Queue assignment linked with topology and routing algorithm
  - Reduces HOL blocking with the minimum number of queues per port
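A minimal sketch of the DBBM idea follows. The slide does not say how destinations are grouped; grouping by `destination % num_queues` is an assumption made here for illustration, and both function names are hypothetical.

```python
def dbbm_queue(destination, num_queues):
    """Queue index for a packet, assuming groups are formed by modulo."""
    return destination % num_queues

def victims_of(hot_destination, num_destinations, num_queues):
    """Destinations that share a queue with a congested destination.

    Only these destinations can suffer HOL blocking caused by packets
    addressed to hot_destination; all others use different queues.
    """
    hot_queue = dbbm_queue(hot_destination, num_queues)
    return [d for d in range(num_destinations)
            if d != hot_destination and dbbm_queue(d, num_queues) == hot_queue]

print(victims_of(5, 16, 4))  # [1, 9, 13]
```

With 16 destinations and 4 queues per port, a hot spot at destination 5 can delay traffic to only 3 other destinations rather than all 15, which is why a small number of queues already reduces HOL blocking without eliminating it.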

OBQA description
- Logical input port organization
- Each input port has a number of queues (q) smaller than the switch radix
- OBQA assigns packets to queues using this formula:
  Selected_Queue = Requested_Output_Port MOD q
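The formula above translates directly into code. In this sketch the function name and the example values (an 8-port switch with q = 4) are chosen for illustration only.

```python
def obqa_queue(requested_output_port, q):
    """Selected_Queue = Requested_Output_Port MOD q."""
    return requested_output_port % q

# Example: an 8-port switch with q = 4 queues per input port.
assignment = [obqa_queue(port, 4) for port in range(8)]
print(assignment)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Packets requesting output ports 0 and 4 share queue 0, so they can still block each other, but packets requesting ports that map to different queues cannot.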

OBQA evaluation
- Uniform traffic simulation results
[Figures: network latency (cycles) vs. normalized generated traffic for a 4-ary 4-tree with 8x8 switches (configuration #2) and a 16-ary 2-tree with 32x32 switches (configuration #3).]

HOL blocking elimination techniques
- RECN (Regional Explicit Congestion Notification) and FBICM (Flow-Based Implicit Congestion Management)
- Key differences with respect to previous techniques:
  - Explicitly identifies congested points
  - Congestion information storage
  - Dynamic queue allocation for congested flows

Principles of RECN-like solutions
- Congestion becomes harmless if the HOL blocking produced by congested packets is completely eliminated.
- HOL blocking produced by congested packets is completely eliminated if they are buffered separately.
- Non-congested packets can share queues without suffering significant HOL blocking.
- Congested packets can be separately buffered by using a small number of queues per port.
- Congested packets must be explicitly identified (i.e. packets belonging to flows that contribute to some congestion).
- Precise identification of congested packets is based on previous knowledge of the location of existing congestion points.

RECN basic procedure
- Congested points are detected at any input or output switch port of the network
- The routes to detected congested points are progressively notified to the input and output ports crossed by congested flows
- After receiving a notification, a port dynamically allocates a CAM line to store the location of the congested point, and a set-aside queue (SAQ) to store congested packets
- A packet arriving at a port will be stored in a SAQ if it will pass through the congested point associated with that SAQ
- A packet arriving at a port will be stored in the standard ("cold") queue if its route does not match any CAM entry
- SAQs can be dynamically deallocated, and later allocated for other congested points
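The procedure above can be sketched for a single port as follows. This is a simplified model and not the RECN hardware design: the class and method names are hypothetical, a congested point is represented as a tuple of turns (the route from this port to that point), and choosing the longest matching CAM entry is an assumption of this sketch.

```python
from collections import deque

class RecnPort:
    """Toy model of one port with a cold queue and a few SAQs."""

    def __init__(self, max_saqs):
        self.max_saqs = max_saqs
        self.cold_queue = deque()
        self.saqs = {}  # CAM line (route to congested point) -> SAQ

    def notify(self, route_to_congestion):
        """Allocate a CAM line and SAQ for a notified congested point."""
        key = tuple(route_to_congestion)
        if key in self.saqs:
            return True
        if len(self.saqs) >= self.max_saqs:
            return False  # no free SAQ left at this port
        self.saqs[key] = deque()
        return True

    def enqueue(self, packet_id, remaining_route):
        """Store a packet in the matching SAQ, or in the cold queue."""
        route = tuple(remaining_route)
        matches = [key for key in self.saqs if route[:len(key)] == key]
        if matches:
            key = max(matches, key=len)  # assumption: longest match wins
            self.saqs[key].append(packet_id)
            return key
        self.cold_queue.append(packet_id)
        return None

    def release(self, route_to_congestion):
        """Deallocate an empty SAQ so it can serve another congested point."""
        key = tuple(route_to_congestion)
        if key in self.saqs and not self.saqs[key]:
            del self.saqs[key]
            return True
        return False

port = RecnPort(max_saqs=2)
port.notify([4, 2])
hot = port.enqueue("a", [4, 2, 1])   # crosses the congested point
cold = port.enqueue("b", [4, 3, 1])  # does not match any CAM line
```

Here packet "a" lands in the SAQ for the congested point reached through turns (4, 2), while packet "b" stays in the cold queue and is never queued behind it.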

How RECN works
1. A congestion point forms.
2. The cold queue fills over a threshold.
3. An internal notification is sent to each input port sending packets to the congested output.
4. New SAQs are allocated for packets addressed to the congested output port.
5. Notifications are sent when the SAQs fill over a threshold.
6. A new SAQ is allocated for the congested output at each notified output.
7. Internal notifications are sent when the SAQs receive packets and their occupancy is over a threshold.
8. At the end, congestion tree packets are completely stored in SAQs.
9. A cold flow may share some network resources with a branch of the congestion tree.
10. Cold packets are never stored in SAQs, so they never share a queue with congested packets.

RECN basics
- RECN achieves efficiency and scalability in source routing environments
- Packet header information: the routing information (turnpool) is included in the packet header and in congestion notifications, and it is used at each hop (turnpointer)
[Figure: example packet header showing the turnpool and the turnpointer.]

CAM structure
[Figure: a CAM with one line per SAQ (SAQ 0 to SAQ n-1). Each line holds a valid bit (v), a turnpool and bit mask identifying the congested point, a blocked bit (b), and an Xoff bit used for Xon/Xoff flow control.]

Reception of packets after SAQ allocation
[Figure: the header of an incoming packet (turnpool and turn pointer) is compared against the CAM lines of SAQ 0 to SAQ n-1. It matches the CAM line of SAQ 0, so the incoming packet is stored in SAQ 0 instead of the cold queue.]

FBICM operation: key features
- Effective HOL blocking elimination in networks with distributed routing
- Implicit identification of congestion points, detecting flows heading to them just by inspecting the packet destination
- Congestion information is based on destinations instead of turnpools, and it can represent any congested point in the network
- New CAM structure, and new detection, propagation and resource management policies

FBICM operation: IQ switch architecture
- Normal Flow Queues (NFQ)
- Congested Flow Queues (CFQ)
- Separates non-congested flows from congested flows

FBICM operation: CAM structure
- Congested flow identification fields
  - Congested port, Hops to reach, Destination list, Next CFQ (Congested Flow Queue)
- Flow control fields
  - Stop & Go, Sent Stop, Receiving control
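One way to picture such a CAM line is as a small record plus a lookup by destination. The sketch below is an assumption-laden simplification: the field and function names are invented for illustration, only a single `stop` flag stands in for the flow control fields, and the real lookup is done in hardware rather than by a linear scan.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class CamLine:
    """Hypothetical software view of one FBICM CAM line."""
    cong_port: str                      # congested port, e.g. "P6"
    hops: int                           # hops to reach the congested port
    destinations: Set[int] = field(default_factory=set)
    next_cfq: Optional[int] = None      # index of the next CFQ, if any
    stop: bool = False                  # simplified flow control state

def select_queue(destination: int, cam: List[CamLine]) -> str:
    """Pick the CFQ whose destination list contains the packet destination.

    Packets whose destination matches no CAM line go to the normal flow
    queue (NFQ).
    """
    for index, line in enumerate(cam):
        if destination in line.destinations:
            return f"CFQ{index}"
    return "NFQ"

cam = [CamLine(cong_port="P6", hops=1, destinations={3, 7})]
print(select_queue(7, cam), select_queue(2, cam))  # CFQ0 NFQ
```

The point of the destination list is that a switch with distributed routing can classify a packet as congested from its destination field alone, without any turnpool in the header.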

FBICM operation: congestion detection
- The NFQ threshold is exceeded
- A primary CFQ and a CAM line are allocated
- New CAM line information: Active; Cong_Port: P6; Hops: 1; Destination_list; NextCFQ: null
[Figure: Switch 1 and Switch 2, each with ports P4 to P7; the NFQ, CFQ and CAM are shown at the port where congestion is detected.]

FBICM operation: packet processing
- Congested packets are stored in the CFQ
- HOL blocking is avoided

FBICM operation: congestion information propagation
- The threshold is exceeded in the CFQ, and an external Stop is sent
- A new CAM line (CAM 1) is allocated as a copy of CAM 0
- A congested packet reaches the output port and matches the congestion information in CAM 1, which triggers an internal Stop
- A new CAM line (CAM 2) and CFQ are allocated, updating the congested point information
- The allocated resources form a congestion tree branch
- Congestion tree resources will be released dynamically
- CAM line information:
  - CAM 0: Active; Cong_Port: P6; Hops: 1; Destination_list; NextCFQ: null
  - CAM 1: Active; Cong_Port: P6; Hops: 1; Destination_list; NextCFQ: 0; Stop: true
  - CAM 2: Active; Cong_Port: P6; Hops: 2; Destination_list; NextCFQ: 1; Stop: true

FBICM evaluation
[Figures: network throughput vs. load (Config. 1) on a 64 x 64 BMIN, for random uniform traffic and for congested traffic.]

FBICM evaluation
[Figures: network throughput vs. time (Config. 1) on a 64 x 64 BMIN, for real traffic with CF = 20 and with CF = 40.]

Hybrid congestion management strategy
- Key ideas
  - Use FBICM to quickly and locally eliminate HOL blocking, propagating congestion information and allocating buffers as necessary
  - Use reactive congestion management to slowly eliminate congestion, deallocating FBICM buffers whenever possible
- Use of FBICM provides an immediate response and allows reactive congestion management to be tuned for slow reaction, thus avoiding oscillations
- Reactive congestion management drastically reduces FBICM buffer requirements (just one buffer per port)

Conclusions
- Interconnect performance: key when power management is enabled
- Goal: to achieve the best possible behavior with a limited number of resources
- Congestion (HOL blocking): a serious menace
- DBBM and OBQA reduce HOL blocking in specific scenarios with a small set of resources (ad-hoc techniques)
- Reactive congestion management: does not scale well, but can be improved
- RECN: efficiently eliminates HOL blocking in source routing networks
- FBICM: efficiently eliminates HOL blocking in distributed deterministic routing networks
- Hybrid congestion management: the mechanisms help each other

Acknowledgements
- Pedro García (University of Castilla-La Mancha) developed the research on RECN and FBICM under my guidance and prepared part of the slides in this presentation
- The techniques to enhance reactive congestion management are being developed in collaboration with Simula Research Laboratory (Oslo)

Thanks! Any questions?