Congestion Management for Ethernet-based Lossless DataCenter Networks

Similar documents
Congestion Management in HPC

36 IEEE POTENTIALS /07/$ IEEE

Congestion Management in Lossless Interconnects: Challenges and Benefits

An Effective Queuing Scheme to Provide Slim Fly topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing

IEEE P802.1Qcz Proposed Project for Congestion Isolation

Requirement Discussion of Flow-Based Flow Control(FFC)

P802.1Qcz Congestion Isolation

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Jesus Escudero-Sahuquillo Universidad de Castilla-La Mancha (UCLM) SPAIN

InfiniBand Congestion Control

IEEE-SA Industry Connections White Paper IEEE 802 Nendica Report: The Lossless Network for Data Centers

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter. Glenn Judd Morgan Stanley

IEEE-SA Industry Connections Report. The Lossless Network. For Data Centers

Advanced Computer Networks. Flow Control

DIBS: Just-in-time congestion mitigation for Data Centers

Dynamic Network Reconfiguration for Switch-based Networks

THE LOSSLESS NETWORK. For Data Centers

Baidu s Best Practice with Low Latency Networks

15-744: Computer Networking. Data Center Networking II

High Node Count - Scalability Challenges for Interconnection Networks

Advanced Computer Networks. Datacenter TCP

RDMA over Commodity Ethernet at Scale

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department

Switching Architectures for Cloud Network Designs

Advanced Computer Networks. Flow Control

Congestion in InfiniBand Networks

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Future Routing Schemes in Petascale clusters

Lecture 16: Data Center Network Architectures

Router s Queue Management

XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms. Presented by Wei Dai

Mellanox Virtual Modular Switch

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Knowledge-Defined Network Orchestration in a Hybrid Optical/Electrical Datacenter Network

Cisco Data Center Ethernet

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Paving the Road to Exascale Computing. Yossi Avni

Routing protocols behaviour under bandwidth limitation

University of Castilla-La Mancha

DATA CENTER FABRIC COOKBOOK

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Maelstrom: An Enterprise Continuity Protocol for Financial Datacenters

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

Data Center TCP (DCTCP)

Cross-Layer Flow and Congestion Control for Datacenter Networks

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

Lecture 15: Datacenter TCP"

Lecture 21: Congestion Control" CSE 123: Computer Networks Alex C. Snoeren

Adaptive Routing Strategies for Modern High Performance Networks

Extending commodity OpenFlow switches for large-scale HPC deployments

DUE to the increasing computing power of microprocessors

Discussion of Congestion Isolation Changes to 802.1Q

Design of a Tile-based High-Radix Switch with High Throughput

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

CSE 123A Computer Networks

ETHERNET ENHANCEMENTS FOR STORAGE. Sunil Ahluwalia, Intel Corporation

FM4000. A Scalable, Low-latency 10 GigE Switch for High-performance Data Centers

ANALYSIS AND IMPROVEMENT OF VALIANT ROUTING IN LOW- DIAMETER NETWORKS

NVMe Over Fabrics (NVMe-oF)

Advanced Computer Networks. Datacenter TCP

Congestion Control in Datacenters. Ahmed Saeed

Data Center Network Topologies II

Lecture 16: Router Design

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552)

USING HIGH PERFORMANCE NETWORK INTERCONNECTS IN DYNAMIC ENVIRONMENTS

Lecture 14: Congestion Control"

Industry Standards for the Exponential Growth of Data Center Bandwidth and Management. Craig W. Carlson

The Best Ethernet Storage Fabric

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Messaging Overview. Introduction. Gen-Z Messaging

APPLICATION NOTE. XCellAir s Wi-Fi Radio Resource Optimization Solution. Features, Test Results & Methodology

Boosting the Performance of Myrinet Networks

Efficient Switches with QoS Support for Clusters

SUPERNA RPO REPORTING AND BROCADE IP EXTENSION WITH ISILON SYNCIQ

The benefits Arista s LANZ functionality will provide to network administrators: Real time visibility of congestion hotspots at the microbursts level

170 Index. Delta networks, DENS methodology

Basic Low Level Concepts

Micro load balancing in data centers with DRILL

Data center Networking: New advances and Challenges (Ethernet) Anupam Jagdish Chomal Principal Software Engineer DellEMC Isilon

VPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

Transport Protocols for Data Center Communication. Evisa Tsolakou Supervisor: Prof. Jörg Ott Advisor: Lect. Pasi Sarolahti

Arista 7020R Series: Q&A

Huawei CloudFabric Solution Optimized for High-Availability/Hyperscale/HPC Environments

Performance Evaluation of a New Routing Strategy for Irregular Networks with Source Routing

RoGUE: RDMA over Generic Unconverged Ethernet

Revisiting Network Support for RDMA

Enabling High Performance Data Centre Solutions and Cloud Services Through Novel Optical DC Architectures. Dimitra Simeonidou

Deploying Data Center Switching Solutions

Accelerating Development and Troubleshooting of Data Center Bridging (DCB) Protocols Using Xgig

Alizadeh, M. et al., " CONGA: distributed congestion-aware load balancing for datacenters," Proc. of ACM SIGCOMM '14, 44(4): , Oct

Evaluate Data Center Network Performance

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters

SPARTA: Scalable Per-Address RouTing Architecture

SCALABLE STRATEGIES FOR ALLEVIATING THE HOL BLOCKING PRODUCED BY CONGESTION TREES IN LOSSLESS INTERCONNECTION NETWORKS

VM Aware Fibre Channel

Transcription:

Congestion Management for Ethernet-based Lossless DataCenter Networks Pedro Javier Garcia 1, Jesus Escudero-Sahuquillo 1, Francisco J. Quiles 1 and Jose Duato 2 1: University of Castilla-La Mancha (UCLM) 2: Technical University València (UPV) NENDICA DCN: 1-19-0012-00-ICne

Abstract This paper describes congestion phenomena in lossless data center networks and its nega- tive consequences. It explores proposed solutions, analyzing their pros and cons to determine which are suited to the requirements of modern data centers. Conclusions identify important issues that should be addressed in the future.

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

Introduction On-Line Data Intensive (OLDI) Services [Congdon18] Require immediate answers to requests that are coming in at a high rate. End-user experience is highly dependent upon the system responsiveness. The network becomes a significant component of overall DC latency when congestion occurs in the network. Deadline = 250 ms Request Aggregator Deadline = 50 ms Aggregator Aggregator... Aggregator Deadline = 10 ms Worker Worker... Worker Worker Worker... Worker

Introduction Data-Center Networks (DCNs) Todays DCNs require a flexible fabric for carrying in a convergent way traffic from different types of applications, storage of control. Latency is a concern: Fabric design for DCNs must minimize or eliminate packet loss, provide high throughput and maintain low latency. These goals are crucial for applications of OLDI, Deep Learning, NVMe over Fabrics and the Cloudified Central Offices. However, congestion threatens these applications.

Introduction Why congestion isolation is needed? HoL-blocking dramatically degrades the network performance (e.g. PFC has not enough granularity and there is no congested flow identification) [Garcia05]. Classical e2e congestion control for lossless networks is difficult to tune, reacts slowly, and may introduce oscillations and instability [Escudero11]. Network Throughput (normalized) HS starts 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 HS = traffic injected to Hot Spot destination HS ends 1Q ITh VOQnet 0 1e+06 2e+06 3e+06 4e+06 5e+06 Time (nanoseconds) 64-node CLOS network, 4 hot-spots

Introduction Why congestion isolation is needed? Src. A 33% Sw. 1 33% Sw. 5 Congested flows (Dst. X) Non-congested flows (Dst. Y) 33% Non-congested flows (Dst. Z) Src. B 33% Sw. 2 66% Sw. 6 33% Sw. 8 100% Dst. X 33% 33% Src. C Src. D 33% Sw. 3 33% Sw. 7 33% 66% Sw. 9 Dst. Y 33% Dst. Z Src. E 33% Sw. 4 33% High-Order HoL-blocking Low-Order HoL-blocking 33 % Sending 33 % Stopped 33 % Sending 33 % Stopped 33 % Sending 33 % Sending

Introduction Why congestion isolation is needed? We need a congestion isolation (CI) mechanism that reacts quickly when transient congestion situations appear, preventing network performance degradation caused by the HoL blocking. We want a CI mechanism that complements other technologies available in the DCNs, so that CI improves their performance, while the others reduce the CI complexity.

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

Congestion Dynamics in DCNs Appearance of Congestion Congestion Congestion Injection rate at 100% of the link bandwidth (full rate) Injection rate at 100% of the link bandwidth (full rate) Speedup = 1 Speedup = 2 Congestion (t0+t) Congestion (t0) Congestion (t0) Congestion (t0+t) Injection rate at 100% of the link bandwidth (full rate) Injection rate at 100% of the link bandwidth (full rate) Speedup = 2 Speedup = 1.5

Congestion Dynamics in DCNs Growth of Congestion Trees (from root to leaves) Switch 1 Switch 3 Switch speedup = 1.5 Packet flows Congestion point Switch 5 Switch 2 Switch 4

Congestion Dynamics in DCNs Growth of Congestion Trees (from leaves to root) Switch speedup = 1.5 Packet flows Congestion point Switch 1 Switch 5 Switch 2 Switch 7 Switch 3 Switch 6 Switch 4

Congestion Dynamics in DCNs Growth of Congestion Trees (Roots movement) Switch speedup = 1.5 Packet flows (start) Packet flows (after) Congestion point Switch 1 Switch 1 Switch 3 Switch 3 Switch 2 Switch 2

Congestion Dynamics in DCNs Growth of Congestion Trees (in-network roots) Switch 1 Switch 5 Switch 2 Switch 7 Switch 8 X Y Switch 3 Switch 6 Switch speedup = 1.5 Packet flows addressed to X Packet flows addressed to Y Congestion point Switch 4

Congestion Dynamics in DCNs Growth of Congestion Trees (Overlapping) X Switch 1 Switch 4 Switch 8 Switch 2 Switch 5 Switch 7 Y Switch 3 Switch 6 Switch speedup = 1.5 Packet flows addressed to X Packet flows addressed to Y Congestion point Switch 9

Congestion Dynamics in DCNs Growth of Congestion Trees (Vanishing) Switch speedup = 1.5 Permanent packet flows Packet flows disappearing first Congestion point first appeared in the switch Switch 1 Switch 1 Switch 3 Switch 3 Switch 2 Switch 2

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

Reducing Congestion Incast congestion reduction - ECMP

Reducing Congestion In-network congestion reduction - ECN X Switch 1 Switch 4 Switch 8 Switch 2 Switch 5 Switch 7 Y Switch 3 Switch 6 Switch speedup = 1.5 Packet flows addressed to X Packet flows addressed to Y Victim flow Congestion point Switch 9

Reducing Congestion Limitations of current technologies [Escudero19] These technologies may work together to eliminate loss in the cloud data center network. Load-balancing and destination scheduling are end-toend solutions incurring in the RTT delays when congestion appear. However, there is no time for loss in the network due to congestion and congestion trees grow very quickly. Transient congestion may still produce HoL blocking that leads to increase latency, lower throughput and buffers overflow, significantly degrading performance. Even using these mechanisms, we still need something to deal with HOL Blocking locally and fast.

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

Combining Congestion Management Mechanisms CI is needed to react locally and very fast to immediately eliminate HoL blocking. Previous technologies reduce the use of PFC and ECN, but their closed- and open-loop approach cause delays still happening. Congestion trees appear suddenly, are difficult to predict (even worse when load balancing is applied) and grow quickly. New techniques can be applied in combination to the previous technologies, improving their behavior.

Combining Congestion Management Mechanisms Dynamic Virtual Lanes (DVL) Switch A Switch B P1 CFQ ncfq CFQ ncfq P3 CIP P1 CFQ ncfq CFQ ncfq P3 Congestion Root P2 CFQ P4 P2 CFQ P4 ncfq Legend Output port requested by the packet on top. Congestion root. Congestion Isolation Packets (CIP). Packets from congested flows. Packets from non-congested flows. ncfq

Agenda Introduction Congestion Dynamics in DCNs Reducing In-Network and Incast Congestion Combining Congestion Management Mechanisms Conclusions

References [Duato03] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2003. [Garcia05] P. J. Garcia, J. Flich, J. Duato, I. Johnson, F. J. Quiles, and F. Naven, Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture, in High Performance Embedded Architectures and Compilers, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Nov. 2005, pp. 266 285. [Congdon18] Paul Congdon, IEEE 802 Nendica Report: The Lossless Network for Data Centers, IEEE-SA Industry Connections White Paper, August 2018. [Leiserson85] C. E. Leiserson, Fat-trees: Universal networks for hardware-efficient supercomputing, IEEE Transactions on Computers, vol. C-34, pp. 892 901, Oct 1985. [Escudero11] Jesús Escudero-Sahuquillo, Ernst Gunnar Gran, Pedro Javier García, Jose Flich, Tor Skeie, Olav Lysne, Francisco J. Quiles, José Duato: Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks. ICPP 2011: 662-672 [Escudero19] Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José Duato: P802.1Qcz interworking with otherdata center technologies. IEEE 802.1 Plenary Meeting, San Diego, CA, USA July 8, 2018 (cz-escudero-sahuquillo-ci-internetworking-0718-v1.pdf)