Congestion Management for Ethernet-based Lossless DataCenter Networks

1 Congestion Management for Ethernet-based Lossless DataCenter Networks
Pedro Javier Garcia (1), Jesus Escudero-Sahuquillo (1), Francisco J. Quiles (1) and Jose Duato (2)
(1) University of Castilla-La Mancha (UCLM); (2) Technical University València (UPV)
NENDICA DCN: ICne

2 Abstract This paper describes congestion phenomena in lossless data center networks and their negative consequences. It explores proposed solutions, analyzing their pros and cons to determine which are suited to the requirements of modern data centers. The conclusions identify important issues that should be addressed in the future.

3 Agenda: Introduction; Congestion Dynamics in DCNs; Reducing In-Network and Incast Congestion; Combining Congestion Management Mechanisms; Conclusions

5 Introduction: On-Line Data Intensive (OLDI) Services [Congdon18]
OLDI services require immediate answers to requests that arrive at a high rate. End-user experience is highly dependent on system responsiveness. The network becomes a significant component of overall DC latency when congestion occurs in the network.
[Figure: OLDI request tree with per-level deadlines: 250 ms for the request at the top-level aggregator, 50 ms at the aggregators, and 10 ms at the workers.]

6 Introduction: Data-Center Networks (DCNs)
Today's DCNs require a flexible fabric that carries, in a converged way, traffic from different types of applications, storage or control. Latency is a concern: fabric design for DCNs must minimize or eliminate packet loss, provide high throughput and maintain low latency. These goals are crucial for applications such as OLDI, Deep Learning, NVMe over Fabrics and Cloudified Central Offices. However, congestion threatens these applications.

7 Introduction: Why is congestion isolation needed?
HoL blocking dramatically degrades network performance (e.g. PFC lacks sufficient granularity and does not identify congested flows) [Garcia05]. Classical e2e congestion control for lossless networks is difficult to tune, reacts slowly, and may introduce oscillations and instability [Escudero11].
[Figure: normalized network throughput vs. time (nanoseconds, 0 to 5e+06) for the 1Q, ITh and VOQnet schemes in a 64-node Clos network with 4 hot-spots. HS = traffic injected to the hot-spot destinations; the points where HS starts and ends are marked.]

8 Introduction: Why is congestion isolation needed?
[Figure: example network with sources A to E, switches 1 to 9 and destinations X, Y and Z. Congested flows (Dst. X) share links loaded at 33% and 66% with non-congested flows (Dst. Y and Dst. Z). The figure marks low-order and high-order HoL blocking, where part of the traffic is sending and part is stopped.]

9 Introduction: Why is congestion isolation needed?
We need a congestion isolation (CI) mechanism that reacts quickly when transient congestion situations appear, preventing the network performance degradation caused by HoL blocking. We want a CI mechanism that complements the other technologies available in DCNs, so that CI improves their performance while they reduce the complexity of CI.
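The HoL-blocking effect that motivates CI can be illustrated with a minimal sketch. It assumes a toy input port that forwards at most one packet per cycle while destination X is blocked. This is not the switch model behind the slides; the function names and the traffic pattern are hypothetical and chosen only for illustration.

```python
from collections import deque

def fifo_delivered(packets, blocked, cycles):
    """Single FIFO queue: a blocked packet at the head stalls everything behind it."""
    queue = deque(packets)
    delivered = []
    for _ in range(cycles):
        if queue and queue[0] not in blocked:
            delivered.append(queue.popleft())
    return delivered

def per_destination_delivered(packets, blocked, cycles):
    """One queue per destination: only packets to blocked destinations wait."""
    queues = {}
    for dst in packets:
        queues.setdefault(dst, deque()).append(dst)
    delivered = []
    for _ in range(cycles):
        for dst, q in queues.items():
            if q and dst not in blocked:
                delivered.append(q.popleft())
                break  # at most one packet leaves the port per cycle
    return delivered

# Hypothetical arrival order; X is the congested (blocked) destination.
packets = ["Y", "X", "Y", "Z", "Y", "Z"]
fifo = fifo_delivered(packets, blocked={"X"}, cycles=6)
isolated = per_destination_delivered(packets, blocked={"X"}, cycles=6)
```

With the shared queue, the first packet for X stops every later packet for Y and Z. With separate queues, only the X traffic waits.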

11 Congestion Dynamics in DCNs: Appearance of Congestion
[Figure: four panels with injection at 100% of the link bandwidth (full rate) and switch speedups of 1, 1.5 and 2, showing where congestion appears at time t0 and at time t0+t.]
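How the switch speedup influences where congestion first appears can be approximated with a simple fluid model. The sketch below assumes several flows injected at full link rate toward one output port, a crossbar that can move up to `speedup` times the link rate into that output, and unbounded buffers. These are assumptions of this example, not parameters taken from the slides.

```python
def queue_growth(num_flows, speedup, link_rate=1.0):
    """Return (input_growth, output_growth) in link-rate units per unit time.

    Fluid approximation: every flow injects at full link rate toward the
    same output port, and buffers are assumed to be unbounded.
    """
    offered = num_flows * link_rate
    # The crossbar can move at most speedup * link_rate into the output port.
    into_output = min(offered, speedup * link_rate)
    # The output link drains at link_rate; any excess accumulates at the output.
    output_growth = max(0.0, into_output - link_rate)
    # Whatever the crossbar cannot move accumulates at the inputs.
    input_growth = offered - into_output
    return input_growth, output_growth
```

Under these assumptions, two full-rate flows with speedup 1 fill only the input buffers, with speedup 2 only the output buffer, and with speedup 1.5 both. Once a finite buffer fills, backpressure moves the congestion upstream, which this sketch does not model.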

12 Congestion Dynamics in DCNs: Growth of Congestion Trees (from root to leaves)
[Figure: five switches (Switch 1 to Switch 5) with switch speedup = 1.5, showing the packet flows and the congestion points.]

13 Congestion Dynamics in DCNs: Growth of Congestion Trees (from leaves to root)
[Figure: seven switches (Switch 1 to Switch 7) with switch speedup = 1.5, showing the packet flows and the congestion points.]

14 Congestion Dynamics in DCNs: Growth of Congestion Trees (roots movement)
[Figure: Switches 1 to 3 with switch speedup = 1.5, drawn twice to show the packet flows at the start and afterwards, together with the congestion points.]

15 Congestion Dynamics in DCNs: Growth of Congestion Trees (in-network roots)
[Figure: eight switches (Switch 1 to Switch 8) with switch speedup = 1.5, showing the packet flows addressed to X, the packet flows addressed to Y, and the congestion points.]

16 Congestion Dynamics in DCNs: Growth of Congestion Trees (overlapping)
[Figure: nine switches (Switch 1 to Switch 9) with switch speedup = 1.5, showing the packet flows addressed to X, the packet flows addressed to Y, and the congestion points.]

17 Congestion Dynamics in DCNs: Growth of Congestion Trees (vanishing)
[Figure: Switches 1 to 3 with switch speedup = 1.5, drawn twice to show the permanent packet flows, the packet flows that disappear first, and the congestion point that first appeared in the switch.]

19 Reducing Congestion Incast congestion reduction - ECMP
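ECMP-style load balancing is commonly realized by hashing header fields of a flow to choose one of several equal-cost next hops. The following is a minimal sketch of that idea. The CRC32 hash, the 5-tuple layout and the spine names are assumptions made for illustration; real switches use their own hash functions.

```python
import zlib

def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (hash choice is illustrative)."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.1.9", 6, 49152, 80)
spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
chosen = ecmp_next_hop(flow, spines)
```

All packets of one flow take the same path, which preserves ordering. Two heavy flows can still hash onto the same link, so hashing alone does not rule out congestion.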

20 Reducing Congestion: In-network congestion reduction - ECN
[Figure: nine switches (Switch 1 to Switch 9) with switch speedup = 1.5, showing the packet flows addressed to X, the packet flows addressed to Y, a victim flow, and the congestion points.]
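The ECN control loop can be sketched in two parts: a switch marks packets when its queue grows beyond a threshold, and the sender lowers its rate when marks are echoed back. The code below is a simplified illustration. The threshold rule and the decrease and increase factors are assumed values, not those of any standard or of the slides.

```python
def mark_ecn(queue_depth, threshold):
    """Switch side: mark the packet if the egress queue is above the threshold."""
    return queue_depth > threshold

def next_rate(rate, marked, link_rate, decrease=0.5, increase=0.05):
    """Sender side: multiplicative decrease on a mark, additive increase otherwise."""
    if marked:
        return rate * decrease
    return min(link_rate, rate + increase * link_rate)
```

The sender only reacts after a mark has travelled to the receiver and back, so any change in rate lags the congestion by at least one round trip.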

21 Reducing Congestion: Limitations of current technologies [Escudero19]
These technologies may work together to eliminate loss in the cloud data center network. Load balancing and destination scheduling are end-to-end solutions that incur RTT delays when congestion appears. However, there is no time for loss in the network due to congestion, and congestion trees grow very quickly. Transient congestion may still produce HoL blocking that leads to increased latency, lower throughput and buffer overflow, significantly degrading performance. Even with these mechanisms, we still need something that deals with HoL blocking locally and fast.
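The cost of an RTT-long reaction delay can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that every sender keeps injecting at full link rate for one RTT before end-to-end feedback takes effect. The sender count, link speed and RTT in the example are hypothetical values.

```python
def bytes_before_reaction(num_senders, link_gbps, rtt_us):
    """Bytes injected toward a congestion point during one RTT at full rate."""
    bits = num_senders * link_gbps * 1e9 * rtt_us * 1e-6
    return bits / 8

# Hypothetical example: 8 senders at 100 Gb/s with a 10 microsecond RTT.
in_flight = bytes_before_reaction(8, 100, 10)
```

For these example values the result is about 1 MB of traffic that has to be buffered or paused somewhere in the fabric before the sources slow down, which is why a local and fast mechanism is attractive.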

23 Combining Congestion Management Mechanisms
CI is needed to react locally and very fast in order to eliminate HoL blocking immediately. The previous technologies reduce the use of PFC and ECN, but their closed- and open-loop approaches still introduce delays. Congestion trees appear suddenly, are difficult to predict (even more so when load balancing is applied) and grow quickly. New techniques can be applied in combination with the previous technologies, improving their behavior.

24 Combining Congestion Management Mechanisms: Dynamic Virtual Lanes (DVL)
[Figure: Switch A and Switch B, each with ports P1 to P4 holding CFQ and nCFQ queues. The congestion root is in Switch B, and a Congestion Isolation Packet (CIP) is sent toward Switch A. Legend: output port requested by the packet on top; congestion root; Congestion Isolation Packets (CIP); packets from congested flows; packets from non-congested flows.]
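The queue roles in the DVL figure can be illustrated with a small sketch. It assumes that a port keeps one queue for non-congested flows (nCFQ) and one for congested flows (CFQ), that a destination is declared congested when its occupancy in the nCFQ reaches a threshold, and that a CIP is then sent upstream. The detection rule, the class name and the threshold are assumptions of this example and are not taken from the slides or from P802.1Qcz.

```python
from collections import deque

class IsolatingPort:
    """Toy port with a non-congested flow queue (nCFQ) and a congested flow queue (CFQ)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed per-destination occupancy limit
        self.ncfq = deque()
        self.cfq = deque()
        self.congested = set()      # destinations currently isolated
        self.cips_sent = []         # CIPs notified to the upstream switch

    def enqueue(self, dst):
        if dst in self.congested:
            # Packets of an already isolated flow go to the CFQ.
            self.cfq.append(dst)
            return
        self.ncfq.append(dst)
        occupancy = sum(1 for queued in self.ncfq if queued == dst)
        if occupancy >= self.threshold:
            # Assumed detection rule: isolate the flow and notify upstream.
            self.congested.add(dst)
            self.cips_sent.append(dst)

port = IsolatingPort(threshold=3)
for dst in ["X", "Y", "X", "X", "X", "Y", "X"]:
    port.enqueue(dst)
```

After the third queued packet for X, later X packets are stored in the CFQ, so packets for Y no longer queue behind them. The single CIP lets the upstream switch apply the same isolation.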

26 References
[Duato03] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers.
[Garcia05] P. J. Garcia, J. Flich, J. Duato, I. Johnson, F. J. Quiles, and F. Naven, Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture, in High Performance Embedded Architectures and Compilers, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Nov. 2005.
[Congdon18] P. Congdon, IEEE 802 Nendica Report: The Lossless Network for Data Centers, IEEE-SA Industry Connections White Paper, August.
[Leiserson85] C. E. Leiserson, Fat-trees: Universal networks for hardware-efficient supercomputing, IEEE Transactions on Computers, vol. C-34, Oct.
[Escudero11] J. Escudero-Sahuquillo, E. G. Gran, P. J. García, J. Flich, T. Skeie, O. Lysne, F. J. Quiles, and J. Duato, Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks, ICPP 2011.
[Escudero19] J. Escudero-Sahuquillo, P. J. García, F. J. Quiles, and J. Duato, P802.1Qcz interworking with other data center technologies, IEEE Plenary Meeting, San Diego, CA, USA, July 8, 2018 (cz-escudero-sahuquillo-ci-internetworking-0718-v1.pdf).

Congestion Management in HPC

Congestion Management in HPC Congestion Management in HPC Interconnection Networks Pedro J. García Universidad de Castilla-La Mancha (SPAIN) Conference title 1 Outline Why may congestion become a problem? Should we care about congestion

More information

36 IEEE POTENTIALS /07/$ IEEE

36 IEEE POTENTIALS /07/$ IEEE INTERCONNECTION NETWORKS ARE A KEY ELEMENT IN a wide variety of systems: massive parallel processors, local and system area networks, clusters of PCs and workstations, and Internet Protocol routers. They

More information

Congestion Management in Lossless Interconnects: Challenges and Benefits

Congestion Management in Lossless Interconnects: Challenges and Benefits Congestion Management in Lossless Interconnects: Challenges and Benefits José Duato Technical University of Valencia (SPAIN) Conference title 1 Outline Why is congestion management required? Benefits Congestion

More information

An Effective Queuing Scheme to Provide Slim Fly topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing

An Effective Queuing Scheme to Provide Slim Fly topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing An Effective Queuing Scheme to Provide Slim Fly topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing Pedro Yébenes 1, Jesús Escudero-Sahuquillo 1, Pedro J. García 1, Francisco

More information

IEEE P802.1Qcz Proposed Project for Congestion Isolation

IEEE P802.1Qcz Proposed Project for Congestion Isolation IEEE P82.1Qcz Proposed Project for Congestion Isolation IETF 11 London ICCRG Paul Congdon paul.congdon@tallac.com Project Background P82.1Qcz Project Initiation November 217 - Agreed to develop a Project

More information

Requirement Discussion of Flow-Based Flow Control(FFC)

Requirement Discussion of Flow-Based Flow Control(FFC) Requirement Discussion of Flow-Based Flow Control(FFC) Nongda Hu Yolanda Yu hunongda@huawei.com yolanda.yu@huawei.com IEEE 802.1 DCB, Stuttgart, May 2017 www.huawei.com new-dcb-yolanda-ffc-proposal-0517-v01

More information

P802.1Qcz Congestion Isolation

P802.1Qcz Congestion Isolation P802.1Qcz Congestion Isolation IEEE 802 / IETF Workshop on Data Center Networking Bangkok November 2018 Paul Congdon (Huawei/Tallac) The Case for Low-latency, Lossless, Large-Scale DCNs More and more latency-sensitive

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

Jesus Escudero-Sahuquillo Universidad de Castilla-La Mancha (UCLM) SPAIN

Jesus Escudero-Sahuquillo Universidad de Castilla-La Mancha (UCLM) SPAIN Pedro Javier Garcia Jesus Escudero-Sahuquillo Universidad de Castilla-La Mancha (UCLM) SPAIN Universidad de Castilla-La Mancha (UCLM) SPAIN Style Powered tby: Conference itle 1 March 12, Barcelona, Spain

More information

InfiniBand Congestion Control

InfiniBand Congestion Control InfiniBand Congestion Control Modelling and validation ABSTRACT Ernst Gunnar Gran Simula Research Laboratory Martin Linges vei 17 1325 Lysaker, Norway ernstgr@simula.no In a lossless interconnection network

More information

IEEE-SA Industry Connections White Paper IEEE 802 Nendica Report: The Lossless Network for Data Centers

IEEE-SA Industry Connections White Paper IEEE 802 Nendica Report: The Lossless Network for Data Centers IEEE-SA Industry Connections White Paper IEEE 802 Nendica Report: The Lossless Network for Data Centers IEEE 3 Park Avenue New York, NY 10016-5997 USA IEEE 802 Nendica Report: The Lossless Network for

More information

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter. Glenn Judd Morgan Stanley

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter. Glenn Judd Morgan Stanley Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter Glenn Judd Morgan Stanley 1 Introduction Datacenter computing pervasive Beyond the Internet services domain BigData, Grid Computing,

More information

IEEE-SA Industry Connections Report. The Lossless Network. For Data Centers

IEEE-SA Industry Connections Report. The Lossless Network. For Data Centers IEEE-SA Industry Connections Report The Lossless Network For Data Centers IEEE 3 Park Avenue New York, NY 10016-5997 USA The Lossless Network for Data Centers i Trademarks and Disclaimers IEEE believes

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

DIBS: Just-in-time congestion mitigation for Data Centers

DIBS: Just-in-time congestion mitigation for Data Centers DIBS: Just-in-time congestion mitigation for Data Centers Kyriakos Zarifis, Rui Miao, Matt Calder, Ethan Katz-Bassett, Minlan Yu, Jitendra Padhye University of Southern California Microsoft Research Summary

More information

Dynamic Network Reconfiguration for Switch-based Networks

Dynamic Network Reconfiguration for Switch-based Networks Dynamic Network Reconfiguration for Switch-based Networks Ms. Deepti Metri 1, Prof. A. V. Mophare 2 1Student, Computer Science and Engineering, N. B. N. Sinhgad College of Engineering, Maharashtra, India

More information

THE LOSSLESS NETWORK. For Data Centers

THE LOSSLESS NETWORK. For Data Centers THE LOSSLESS NETWORK For Data Centers IEEE 802 Network Enhancements for the Next Decade IEEE-SA Industry Connections Revision 1.0 February 1, 2018 Contents Abstract... 2 Contributors... 2 Our Digital Lives

More information

Baidu s Best Practice with Low Latency Networks

Baidu s Best Practice with Low Latency Networks Baidu s Best Practice with Low Latency Networks Feng Gao IEEE 802 IC NEND Orlando, FL November 2017 Presented by Huawei Low Latency Network Solutions 01 1. Background Introduction 2. Network Latency Analysis

More information

15-744: Computer Networking. Data Center Networking II

15-744: Computer Networking. Data Center Networking II 15-744: Computer Networking Data Center Networking II Overview Data Center Topology Scheduling Data Center Packet Scheduling 2 Current solutions for increasing data center network bandwidth FatTree BCube

More information

High Node Count - Scalability Challenges for Interconnection Networks

High Node Count - Scalability Challenges for Interconnection Networks High Node Count - Scalability Challenges for Interconnection Networks Professor Olav Lysne Simula Research Laboratory Overview Congestion control Fault Tolerance Scalable Modular Routing State Of The Art:

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Today Problems with TCP in the Data Center TCP Incast TPC timeouts Improvements

More information

RDMA over Commodity Ethernet at Scale

RDMA over Commodity Ethernet at Scale RDMA over Commodity Ethernet at Scale Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitendra Padhye, Marina Lipshteyn ACM SIGCOMM 2016 August 24 2016 Outline RDMA/RoCEv2 background DSCP-based

More information

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks

A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks A Multiple LID Routing Scheme for Fat-Tree-Based InfiniBand Networks Xuan-Yi Lin, Yeh-Ching Chung, and Tai-Yi Huang Department of Computer Science National Tsing-Hua University, Hsinchu, Taiwan 00, ROC

More information

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat ACM SIGCOMM 2013, 12-16 August, Hong Kong, China Virtualized Server 1 Application Performance in Virtualized

More information

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity

More information

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department

UNIVERSITY OF CASTILLA-LA MANCHA. Computing Systems Department UNIVERSITY OF CASTILLA-LA MANCHA Computing Systems Department A case study on implementing virtual 5D torus networks using network components of lower dimensionality HiPINEB 2017 Francisco José Andújar

More information

Switching Architectures for Cloud Network Designs

Switching Architectures for Cloud Network Designs Switching Architectures for Cloud Network Designs Networks today require predictable performance and are much more aware of application flows than traditional networks with static addressing of devices.

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 Oriana Riva, Department of Computer Science ETH Zürich 1 Today Flow Control Store-and-forward,

More information

Congestion in InfiniBand Networks

Congestion in InfiniBand Networks Congestion in InfiniBand Networks Philip Williams Stanford University EE382C Abstract The InfiniBand Architecture (IBA) is a relatively new industry-standard networking technology suited for inter-processor

More information

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden Networking Recap Storage Intro CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden Networking Recap Storage Intro Long Haul/Global Networking Speed of light is limiting; Latency has a lower bound (.) Throughput

More information

Future Routing Schemes in Petascale clusters

Future Routing Schemes in Petascale clusters Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract

More information

Lecture 16: Data Center Network Architectures

Lecture 16: Data Center Network Architectures MIT 6.829: Computer Networks Fall 2017 Lecture 16: Data Center Network Architectures Scribe: Alex Lombardi, Danielle Olson, Nicholas Selby 1 Background on Data Centers Computing, storage, and networking

More information

Router s Queue Management

Router s Queue Management Router s Queue Management Manages sharing of (i) buffer space (ii) bandwidth Q1: Which packet to drop when queue is full? Q2: Which packet to send next? FIFO + Drop Tail Keep a single queue Answer to Q1:

More information

XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms. Presented by Wei Dai

XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms. Presented by Wei Dai XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms Presented by Wei Dai Reasons for Congestion in Cloud Cloud operators use virtualization to consolidate

More information

Mellanox Virtual Modular Switch

Mellanox Virtual Modular Switch WHITE PAPER July 2015 Mellanox Virtual Modular Switch Introduction...1 Considerations for Data Center Aggregation Switching...1 Virtual Modular Switch Architecture - Dual-Tier 40/56/100GbE Aggregation...2

More information

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ

Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ Deadlock-free Routing in InfiniBand TM through Destination Renaming Λ P. López, J. Flich and J. Duato Dept. of Computing Engineering (DISCA) Universidad Politécnica de Valencia, Valencia, Spain plopez@gap.upv.es

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Jose Flich 1,PedroLópez 1, Manuel. P. Malumbres 1, José Duato 1,andTomRokicki 2 1 Dpto.

More information

Knowledge-Defined Network Orchestration in a Hybrid Optical/Electrical Datacenter Network

Knowledge-Defined Network Orchestration in a Hybrid Optical/Electrical Datacenter Network Knowledge-Defined Network Orchestration in a Hybrid Optical/Electrical Datacenter Network Wei Lu (Postdoctoral Researcher) On behalf of Prof. Zuqing Zhu University of Science and Technology of China, Hefei,

More information

Cisco Data Center Ethernet

Cisco Data Center Ethernet Cisco Data Center Ethernet Q. What is Data Center Ethernet? Is it a product, a protocol, or a solution? A. The Cisco Data Center Ethernet architecture is a collection of Ethernet extensions providing enhancements

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Paving the Road to Exascale Computing. Yossi Avni

Paving the Road to Exascale Computing. Yossi Avni Paving the Road to Exascale Computing Yossi Avni HPC@mellanox.com Connectivity Solutions for Efficient Computing Enterprise HPC High-end HPC HPC Clouds ICs Mellanox Interconnect Networking Solutions Adapter

More information

Routing protocols behaviour under bandwidth limitation

Routing protocols behaviour under bandwidth limitation 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Routing protocols behaviour under bandwidth limitation Cosmin Adomnicăi

More information

University of Castilla-La Mancha

University of Castilla-La Mancha University of Castilla-La Mancha A publication of the Computing Systems Department Implementing the Advanced Switching Fabric Discovery Process by Antonio Robles-Gomez, Aurelio Bermúdez, Rafael Casado,

More information

DATA CENTER FABRIC COOKBOOK

DATA CENTER FABRIC COOKBOOK Do It Yourself! DATA CENTER FABRIC COOKBOOK How to prepare something new from well known ingredients Emil Gągała WHAT DOES AN IDEAL FABRIC LOOK LIKE? 2 Copyright 2011 Juniper Networks, Inc. www.juniper.net

More information

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc

Scaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC

More information

Maelstrom: An Enterprise Continuity Protocol for Financial Datacenters

Maelstrom: An Enterprise Continuity Protocol for Financial Datacenters Maelstrom: An Enterprise Continuity Protocol for Financial Datacenters Mahesh Balakrishnan, Tudor Marian, Hakim Weatherspoon Cornell University, Ithaca, NY Datacenters Internet Services (90s) Websites,

More information

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox InfiniBand Host Channel Adapters (HCA) enable the highest data center

More information

Data Center TCP (DCTCP)

Data Center TCP (DCTCP) Data Center Packet Transport Data Center TCP (DCTCP) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Cloud computing

More information

Cross-Layer Flow and Congestion Control for Datacenter Networks

Cross-Layer Flow and Congestion Control for Datacenter Networks Cross-Layer Flow and Congestion Control for Datacenter Networks Andreea Simona Anghel, Robert Birke, Daniel Crisan and Mitch Gusat IBM Research GmbH, Zürich Research Laboratory Outline Motivation CEE impact

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

Lecture 15: Datacenter TCP"

Lecture 15: Datacenter TCP Lecture 15: Datacenter TCP" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Mohammad Alizadeh Lecture 15 Overview" Datacenter workload discussion DC-TCP Overview 2 Datacenter Review"

More information

Lecture 21: Congestion Control" CSE 123: Computer Networks Alex C. Snoeren

Lecture 21: Congestion Control CSE 123: Computer Networks Alex C. Snoeren Lecture 21: Congestion Control" CSE 123: Computer Networks Alex C. Snoeren Lecture 21 Overview" How fast should a sending host transmit data? Not to fast, not to slow, just right Should not be faster than

More information

Adaptive Routing Strategies for Modern High Performance Networks

Adaptive Routing Strategies for Modern High Performance Networks Adaptive Routing Strategies for Modern High Performance Networks Patrick Geoffray Myricom patrick@myri.com Torsten Hoefler Indiana University htor@cs.indiana.edu 28 August 2008 Hot Interconnect Stanford,

More information

Extending commodity OpenFlow switches for large-scale HPC deployments

Extending commodity OpenFlow switches for large-scale HPC deployments Extending commodity OpenFlow switches for large-scale HPC deployments Mariano Benito Enrique Vallejo Ramón Beivide Cruz Izu University of Cantabria The University of Adelaide Overview 1.Introduction 1.

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

Discussion of Congestion Isolation Changes to 802.1Q

Discussion of Congestion Isolation Changes to 802.1Q Discussion of Congestion Isolation Changes to 802.1Q Paul Congdon (Huawei), IEEE 802.1 DCB Geneva, Switzerland January 2018 High Level Questions Do we support and define end-station behavior? Should we

More information

Design of a Tile-based High-Radix Switch with High Throughput

Design of a Tile-based High-Radix Switch with High Throughput 2011 2nd International Conference on Networking and Information Technology IPCSIT vol.17 (2011) (2011) IACSIT Press, Singapore Design of a Tile-based High-Radix Switch with High Throughput Wang Kefei 1,

More information

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Peng Wang, Hong Xu, Zhixiong Niu, Dongsu Han, Yongqiang Xiong ACM SoCC 2016, Oct 5-7, Santa Clara Motivation Datacenter networks

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing

More information

ETHERNET ENHANCEMENTS FOR STORAGE. Sunil Ahluwalia, Intel Corporation

ETHERNET ENHANCEMENTS FOR STORAGE. Sunil Ahluwalia, Intel Corporation ETHERNET ENHANCEMENTS FOR STORAGE Sunil Ahluwalia, Intel Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use

More information
