Tagger: Practical PFC Deadlock Prevention in Data Center Networks

1 Tagger: Practical PFC Deadlock Prevention in Data Center Networks
Shuihai Hu* (HKUST), Yibo Zhu, Peng Cheng, Chuanxiong Guo* (Toutiao), Kun Tan* (Huawei), Jitendra Padhye, Kai Chen (HKUST)
Microsoft
CoNEXT 2017, Incheon, South Korea
* Work done while at Microsoft

2 RDMA is Being Widely Deployed
RDMA: Remote Direct Memory Access
- High throughput, low latency with low CPU overhead
- Microsoft, Google, etc. are deploying RDMA
[Figure: two hosts, each with an RDMA application that bypasses the kernel to reach its RDMA NIC; the NICs are connected by a lossless network (with PFC).]

3 Priority Flow Control (PFC)
PAUSE the upstream switch when the PFC threshold is reached
- Avoids packet drops due to buffer overflow
[Figure: a congested switch sends PAUSE upstream; PFC threshold: 3 packets.]
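The PAUSE behaviour on this slide can be sketched as a small queue model. This is a minimal illustration, not the switch implementation: the class name `IngressQueue`, the flag `paused_upstream`, and the rule that the upstream switch is resumed as soon as occupancy falls back below the threshold are all assumptions made for the example; only the 3-packet threshold comes from the slide.

```python
class IngressQueue:
    """Toy model of one lossless ingress queue guarded by PFC (hypothetical)."""

    def __init__(self, pfc_threshold=3):
        self.pfc_threshold = pfc_threshold  # 3 packets, as on the slide
        self.packets = []
        self.paused_upstream = False

    def enqueue(self, packet):
        self.packets.append(packet)
        # PAUSE the upstream switch once the threshold is reached.
        if len(self.packets) >= self.pfc_threshold:
            self.paused_upstream = True

    def dequeue(self):
        packet = self.packets.pop(0)
        # Assumption: resume upstream once occupancy drops below the threshold.
        if len(self.packets) < self.pfc_threshold:
            self.paused_upstream = False
        return packet
```

Because the upstream switch stops sending before the buffer overflows, no packet is dropped; the cost is that a paused upstream queue can itself fill up and pause its own upstream neighbour.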

4 A Simple Illustration of PFC Deadlock
[Figure: Switch A, Switch B, and Switch C each reach the PFC threshold and PAUSE one another.]
Due to Cyclic Buffer Dependency (CBD): A->B->C->A
Not just a theoretical problem: we have seen it in our datacenters too!
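A cyclic buffer dependency such as A->B->C->A is simply a cycle in a directed graph whose edges say "this buffer waits on that buffer". The sketch below checks for such a cycle with a depth-first search; the function name `has_cbd` and the dictionary encoding of the graph are assumptions for illustration, not part of the talk.

```python
def has_cbd(dependencies):
    """Return True if the buffer dependency graph contains a cycle.

    `dependencies` maps each buffer to the buffers it waits on.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = IN_PROGRESS
        for nxt in dependencies.get(node, []):
            nxt_state = state.get(nxt, UNSEEN)
            if nxt_state == IN_PROGRESS:
                return True  # back edge: a cyclic dependency exists
            if nxt_state == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(
        state.get(node, UNSEEN) == UNSEEN and visit(node)
        for node in list(dependencies)
    )


# The three-switch example from the slide: A waits on B, B on C, C on A.
ring = {"A": ["B"], "B": ["C"], "C": ["A"]}
# Removing one edge breaks the cycle.
chain = {"A": ["B"], "B": ["C"], "C": []}
```

Here `has_cbd(ring)` reports a cycle while `has_cbd(chain)` does not, which is why removing even a single dependency edge is enough to rule out this deadlock.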

5 CBD in the Clos Network

6 CBD in the Clos Network
Consider two flows (flow 1 and flow 2) that initially follow shortest UP-DOWN paths.

7 CBD in the Clos Network
Due to link failures, both flows (flow 1 and flow 2) are locally rerouted to non-shortest paths.

8 CBD in the Clos Network
These two DOWN-UP bounced flows (flow 1 and flow 2) create a CBD.
[Figure: buffer dependency graph over the switches' RX queues; CBD: ->->->->]

9 Real in Production Data Centers?
Packet reroute measurements in more than 20 data centers: ~100,000 DOWN-UP reroutes!

10 Handling Deadlock is Important
#1: A transient problem -> PERMANENT deadlock
- Transient loops due to link failures
- Packet flooding
#2: A small deadlock can cause a large deadlock
[Figure: PAUSE frames spreading outward from a deadlock.]

11 Three Key Challenges
What are the challenges in designing a practical deadlock prevention solution?
- No change to existing routing protocols or hardware
- Link failures and routing errors are unavoidable at scale
- Switches support only a limited number of lossless priorities: at most 8 (and typically only two can be used)

12 The Existing Deadlock Prevention Solutions
#1: Deadlock-free routing protocols
- not supported by commodity switches (fail challenge #1)
- do not work with link failures or routing errors (fail challenge #2)
#2: Buffer management schemes
- require a lot of lossless priorities (fail challenge #3)
Our answer: Tagger

13 TAGGER DESIGN

14 Important Observation
[Figure: example topologies: Fat-tree [Sigcomm 08], V [Sigcomm 09], BCube [Sigcomm 09], HyperX [SC 09]; desired path sets shown: all shortest paths, dimension-order paths.]
Takeaway: In a data center, we can ask the operator to supply a set of expected lossless paths (ELP)!

15 Basic Idea of Tagger
1. Ask operators to provide:
   - the topology and the expected lossless paths (ELP)
2. Packets carry tags while in the network
3. Pre-install match-action rules at switches for tag manipulation and packet queueing:
   - packets that travel over the ELP use lossless queues, and CBD never forms
   - packets that deviate from the ELP use the lossy queue, so PFC is not triggered

16 Illustrating Tagger for Clos Topology
Root cause of CBD: packets (flow 1, flow 2) deviate from UP-DOWN routing!
ELP = all shortest paths (CBD-free)

17 Illustrating Tagger for Clos Topology
Under Tagger, packets carry tags when travelling in the network.
Initially, tag value = NoBounce (flow 1: tag = NoBounce).
At switches, Tagger pre-installs match-action rules for tag manipulation.
Match-action rule installed at switches: (Tag = NoBounce, InPort = , OutPort = ) -> NewTag = Bounced

18 Illustrating Tagger for Clos Topology
Packet of flow 1 received by the switch with tag = NoBounce.
Match-action rule installed at switches: (Tag = NoBounce, InPort = , OutPort = ) -> NewTag = Bounced

19 Illustrating Tagger for Clos Topology
DOWN-UP bounce observed for flow 1!
Rewrite the tag once a DOWN-UP bounce is detected: tag = NoBounce -> Bounced.
Match-action rule installed at switches: (Tag = NoBounce, InPort = , OutPort = ) -> NewTag = Bounced

20 Illustrating Tagger for Clos Topology
The packet of flow 1 now carries tag = Bounced.
The receiving switch knows it is a bounced packet that deviates from the ELP -> it is placed in the lossy queue.
No PFC PAUSE is sent for this packet -> the corresponding buffer dependency is removed.
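The two-tag walkthrough on slides 17 to 20 can be sketched as a pair of functions. This is an illustrative model only: the port names and the test "a DOWN-UP bounce is a packet that arrives on an uplink port and leaves on an uplink port" are assumptions made for the example, since the slides' actual InPort and OutPort values are not available here.

```python
NO_BOUNCE = "NoBounce"
BOUNCED = "Bounced"


def rewrite_tag(tag, in_port, out_port, uplink_ports):
    """Apply the rule (Tag = NoBounce, InPort, OutPort) -> NewTag = Bounced.

    Assumption: a packet that came down through an uplink port and is sent
    back up through an uplink port has made a DOWN-UP bounce.
    """
    if tag == NO_BOUNCE and in_port in uplink_ports and out_port in uplink_ports:
        return BOUNCED
    return tag


def select_queue(tag):
    """Packets still on the ELP stay lossless; bounced packets become lossy."""
    return "lossless" if tag == NO_BOUNCE else "lossy"


# Hypothetical leaf switch with two uplinks and one server-facing port.
uplinks = {"up1", "up2"}
normal = rewrite_tag(NO_BOUNCE, "down1", "up1", uplinks)   # ordinary UP hop
bounced = rewrite_tag(NO_BOUNCE, "up1", "up2", uplinks)    # DOWN-UP bounce
```

With these inputs `normal` keeps the NoBounce tag and stays in a lossless queue, while `bounced` is rewritten and lands in the lossy queue at the next switch, so it can no longer trigger a PAUSE.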

21 Illustrating Tagger for Clos Topology
Tagger will do the same for packets of flow 2.
Two buffer dependency edges are removed -> CBD is eliminated.
[Figure: buffer dependency graph over the switches' RX queues; CBD: ->->->->]

22 What If ELP Has CBD?
ELP = shortest paths + 1-bounce paths (ELP has CBD now!)

23 Segmenting ELP into CBD-free Subsets
Two bounced paths (flow 1 and flow 2) are in the ELP now.
- Path segments before the bounce: only UP-DOWN paths, no CBD
- Path segments after the bounce: only UP-DOWN paths, no CBD

24 Isolating Path Segments with Tags
- tag 1 -> path segments before the bounce
- tag 2 -> path segments after the bounce

25 Isolating Path Segments with Tags
Flow 1 carries tag = 1 before the bounce and tag = 2 after it.
Add a rule at the switch: (Tag = 1, InPort = , OutPort = ) -> NewTag = 2

26 No CBD after Segmentation
Packets with tag i -> i-th lossless queue
[Figure: path segments of flow 1 and flow 2 grouped by tag, with the buffer dependency graph; CBD: ->->->->]

27 What If k-bounce Paths are All in ELP?
ELP = shortest up-down paths + 1-bounce paths + ... + k-bounce paths
Solution: segment the ELP into k+1 CBD-free subsets based on the number of bounces!

28 Summary: Tagger Design for Clos Topology
1. Initially, packets carry tag = 1
2. Pre-install match-action rules at switches:
   - DOWN-UP bounce: increase the tag by 1
   - Enqueue packets with tag i to the i-th lossless queue (i <= k+1)
   - Enqueue packets with tag i to the lossy queue (i > k+1)
For Clos topology, Tagger is optimal in terms of the number of lossless priorities.
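The rules in this summary can be traced hop by hop with a short sketch. The function names and the representation of a path as a list of booleans (one per hop, True where the hop is a DOWN-UP bounce) are assumptions for illustration; the tag arithmetic and the k+1 cutoff follow the slide.

```python
LOSSY = "lossy"


def next_tag(tag, is_down_up_bounce):
    """DOWN-UP bounce: increase the tag by 1; otherwise keep it."""
    return tag + 1 if is_down_up_bounce else tag


def queue_for(tag, k):
    """Tag i maps to the i-th lossless queue while i <= k+1, else to lossy."""
    return f"lossless-{tag}" if tag <= k + 1 else LOSSY


def queues_along_path(bounce_flags, k):
    """Return the queue a packet uses after each hop, starting from tag = 1."""
    tag = 1
    queues = []
    for bounced in bounce_flags:
        tag = next_tag(tag, bounced)
        queues.append(queue_for(tag, k))
    return queues


# ELP allows one bounce (k = 1); this hypothetical path bounces twice.
trace = queues_along_path([False, True, False, True], k=1)
```

In this trace the packet stays lossless through its first bounce (tags 1 and 2) and is demoted to the lossy queue only after the second bounce, which takes it outside the ELP.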

29 How to Implement Tagger?
- Use the DSCP field in the IP header as the tag carried in packets
- Build a 3-step match-action pipeline with basic ACL rules available in commodity switches
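Carrying the tag in DSCP means writing it into the upper six bits of the IPv4 ToS byte (the lower two bits are ECN). The sketch below shows that bit manipulation only; the helper names are hypothetical, and using the tag number directly as the DSCP code point is an assumption, since the slide does not say how tags are mapped to DSCP values.

```python
DSCP_SHIFT = 2      # DSCP occupies the upper 6 bits of the ToS byte
ECN_MASK = 0b11     # the lower 2 bits carry ECN and are left untouched
DSCP_MAX = 0b111111


def set_tag(tos_byte, tag):
    """Return the ToS byte with its DSCP bits replaced by `tag`."""
    if not 0 <= tag <= DSCP_MAX:
        raise ValueError("tag does not fit in the 6-bit DSCP field")
    return (tag << DSCP_SHIFT) | (tos_byte & ECN_MASK)


def get_tag(tos_byte):
    """Read the tag back out of the DSCP bits."""
    return (tos_byte >> DSCP_SHIFT) & DSCP_MAX


# Rewrite tag 1 to tag 2 on a packet whose ECN bits are both set.
before = set_tag(0b00000011, 1)
after = set_tag(before, get_tag(before) + 1)
```

A six-bit field leaves room for far more tag values than the at most eight lossless priorities a switch offers, so the tag space is not the limiting resource.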

30 Tagger Meets All Three Challenges
1. Works with existing routing protocols and hardware
2. Works with link failures and routing errors
3. Works with a limited number of lossless queues

31 More Details in the Paper
- Proof of deadlock freedom
- Analysis & discussions
  - Algorithm complexity
  - Optimality
  - Compression of match-action rules

32 Evaluation-1: Tagger Prevents Deadlock
Scenario: two flows form a CBD.
[Figure: flow 1, flow 2; deadlock!]
Tagger avoids the CBD caused by bounced flows and prevents deadlock!

33 Evaluation-2: Scalability of Tagger
[Table: match-action rules and priorities required for Jellyfish topology. * The last entry includes an additional 20,000 random paths.]
Tagger is scalable in terms of the number of lossless priorities and ACL rules.

34 Evaluation-3: Overhead of Tagger
Tagger rules have no impact on throughput and latency.

35 Conclusion
Tagger: a tagging system that guarantees deadlock freedom
Practical:
- requires no change to existing routing protocols
- implementable with existing commodity switching ASICs
- works with a limited number of lossless priorities
General:
- works with any topology
- works with any ELP

36 Thanks!


More information

Data Center Network Topologies II

Data Center Network Topologies II Data Center Network Topologies II Hakim Weatherspoon Associate Professor, Dept of Computer cience C 5413: High Performance ystems and Networking April 10, 2017 March 31, 2017 Agenda for semester Project

More information

Slicing a Network. Software-Defined Network (SDN) FlowVisor. Advanced! Computer Networks. Centralized Network Control (NC)

Slicing a Network. Software-Defined Network (SDN) FlowVisor. Advanced! Computer Networks. Centralized Network Control (NC) Slicing a Network Advanced! Computer Networks Sherwood, R., et al., Can the Production Network Be the Testbed? Proc. of the 9 th USENIX Symposium on OSDI, 2010 Reference: [C+07] Cascado et al., Ethane:

More information

Router Architectures

Router Architectures Router Architectures Venkat Padmanabhan Microsoft Research 13 April 2001 Venkat Padmanabhan 1 Outline Router architecture overview 50 Gbps multi-gigabit router (Partridge et al.) Technology trends Venkat

More information

Advanced Computer Networks. End Host Optimization

Advanced Computer Networks. End Host Optimization Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct

More information

FIBRE CHANNEL OVER ETHERNET

FIBRE CHANNEL OVER ETHERNET FIBRE CHANNEL OVER ETHERNET A Review of FCoE Today Abstract Fibre Channel over Ethernet (FcoE) is a storage networking option, based on industry standards. This white paper provides an overview of FCoE,

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Towards scalable RDMA locking on a NIC

Towards scalable RDMA locking on a NIC TORSTEN HOEFLER spcl.inf.ethz.ch Towards scalable RDMA locking on a NIC with support of Patrick Schmid, Maciej Besta, Salvatore di Girolamo @ SPCL presented at HP Labs, Palo Alto, CA, USA NEED FOR EFFICIENT

More information

Routing in Ad Hoc Wireless Networks PROF. MICHAEL TSAI / DR. KATE LIN 2014/05/14

Routing in Ad Hoc Wireless Networks PROF. MICHAEL TSAI / DR. KATE LIN 2014/05/14 Routing in Ad Hoc Wireless Networks PROF. MICHAEL TSAI / DR. KATE LIN 2014/05/14 Routing Algorithms Link- State algorithm Each node maintains a view of the whole network topology Find the shortest path

More information

EE 382C Interconnection Networks

EE 382C Interconnection Networks EE 8C Interconnection Networks Deadlock and Livelock Stanford University - EE8C - Spring 6 Deadlock and Livelock: Terminology Deadlock: A condition in which an agent waits indefinitely trying to acquire

More information

NVMe Over Fabrics (NVMe-oF)

NVMe Over Fabrics (NVMe-oF) NVMe Over Fabrics (NVMe-oF) High Performance Flash Moves to Ethernet Rob Davis Vice President Storage Technology, Mellanox Santa Clara, CA 1 Access Time Access in Time Micro (micro-sec) Seconds Why NVMe

More information

ETSF05/ETSF10 Internet Protocols Routing on the Internet

ETSF05/ETSF10 Internet Protocols Routing on the Internet ETSF05/ETSF10 Internet Protocols Routing on the Internet 2014, (ETSF05 Part 2), Lecture 1.1 Jens Andersson Circuit switched routing 2014 11 05 ETSF05/ETSF10 Internet Protocols 2 Packet switched Routing

More information

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N. Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,

More information

Exploiting Offload Enabled Network Interfaces

Exploiting Offload Enabled Network Interfaces spcl.inf.ethz.ch S. DI GIROLAMO, P. JOLIVET, K. D. UNDERWOOD, T. HOEFLER Exploiting Offload Enabled Network Interfaces How to We program need an abstraction! QsNet? Lossy Networks Ethernet Lossless Networks

More information

Internetworking Part 1

Internetworking Part 1 CMPE 344 Computer Networks Spring 2012 Internetworking Part 1 Reading: Peterson and Davie, 3.1 22/03/2012 1 Not all networks are directly connected Limit to how many hosts can be attached Point-to-point:

More information

T Computer Networks II Data center networks

T Computer Networks II Data center networks T-110.5116 Computer Networks II Data center networks 29.9.2014 Matti Siekkinen (Sources: S. Kandula et al.: The Nature of Datacenter: measurements & analysis, A. Greenberg: Networking The Cloud, M. Alizadeh

More information

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Peng Wang, Hong Xu, Zhixiong Niu, Dongsu Han, Yongqiang Xiong ACM SoCC 2016, Oct 5-7, Santa Clara Motivation Datacenter networks

More information

Programmable Software Switches. Lecture 11, Computer Networks (198:552)

Programmable Software Switches. Lecture 11, Computer Networks (198:552) Programmable Software Switches Lecture 11, Computer Networks (198:552) Software-Defined Network (SDN) Centralized control plane Data plane Data plane Data plane Data plane Why software switching? Early

More information

The Tofu Interconnect 2

The Tofu Interconnect 2 The Tofu Interconnect 2 Yuichiro Ajima, Tomohiro Inoue, Shinya Hiramoto, Shun Ando, Masahiro Maeda, Takahide Yoshikawa, Koji Hosoe, and Toshiyuki Shimizu Fujitsu Limited Introduction Tofu interconnect

More information

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware

More information

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service

Multi-resource Energy-efficient Routing in Cloud Data Centers with Network-as-a-Service in Cloud Data Centers with Network-as-a-Service Lin Wang*, Antonio Fernández Antaº, Fa Zhang*, Jie Wu+, Zhiyong Liu* *Institute of Computing Technology, CAS, China ºIMDEA Networks Institute, Spain + Temple

More information

Introduction to Infiniband

Introduction to Infiniband Introduction to Infiniband FRNOG 22, April 4 th 2014 Yael Shenhav, Sr. Director of EMEA, APAC FAE, Application Engineering The InfiniBand Architecture Industry standard defined by the InfiniBand Trade

More information

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved. Ethernet Storage Fabrics Using RDMA with Fast NVMe-oF Storage to Reduce Latency and Improve Efficiency Kevin Deierling & Idan Burstein Mellanox Technologies 1 Storage Media Technology Storage Media Access

More information

Deadlock-Free Local Fast Failover for Arbitrary Data Center Networks

Deadlock-Free Local Fast Failover for Arbitrary Data Center Networks Deadlock-Free Local Fast Failover for Arbitrary Data Center Networks Brent Stephens UW-Madison Alan L. Cox Rice University Abstract Today, given data center networks sizes and bursty workloads, it is likely

More information