Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

1 Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks
Peng Wang, Hong Xu, Zhixiong Niu, Dongsu Han, Yongqiang Xiong
ACM SoCC 2016, Oct 5-7, Santa Clara

2 Motivation
Datacenter networks rely on multipathing to provide large bisection bandwidth for a diverse set of apps (web search, big data analytics, etc.).
Efficient load balancing is needed to improve network utilization and app performance.

3 Motivation
Current industry standard: ECMP load balancing.
Hash the flow's 5-tuple to select the outgoing link.
Stateless and oblivious to link utilization; agnostic to congestion.
ECMP blindly puts a flow on a congested path.
Figure: example topology with switches s1 to s5 and 1Gbps links.
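The hashing step described on this slide can be sketched as follows. This is a minimal illustration, not any switch vendor's implementation: the function name `ecmp_select`, the use of CRC32 as the hash, and the example addresses are all assumptions made for the sketch.

```python
import zlib


def ecmp_select(five_tuple, num_links):
    """Pick an outgoing link index by hashing the flow's 5-tuple.

    Note what is missing from the signature: link utilization is never
    consulted, which is why ECMP is oblivious to congestion.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_links


# Hypothetical flow: (src IP, dst IP, src port, dst port, protocol).
flow = ("10.0.0.1", "10.0.1.2", 40000, 80, "tcp")
link_a = ecmp_select(flow, 2)
link_b = ecmp_select(flow, 2)  # same flow, same link, every time
```

Because the choice depends only on the header fields, every packet of a flow stays on one path regardless of how loaded that path is.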

4 Motivation
State-of-the-art load balancing: CONGA (SIGCOMM '14).
Track end-to-end, path-wise congestion information.
Collect the information by piggybacking it on data packets.
Figure: switches s1 to s5 with two tables, Congestion-to-Leaf (indexed by Dst Leaf S2, S3 and path 1, 2) and Congestion-from-Leaf (indexed by Src Leaf S1, S2 and path 1, 2).

9 Motivation
Scalability issue of CONGA.
Designed for 2-tier networks; it does not scale to 3-tier networks, which have many more paths.
Simple per-path feedback needs many concurrent flows to cover all paths in a large-scale network.

10 Challenges
Massive congestion information on all paths.
In a typical 3-tier Clos network, hundreds of paths exist between any pair of ToR switches, and a ToR switch can communicate with thousands of other ToRs.
Each ToR switch needs to track millions of paths for all destination ToRs!
Figure: Facebook datacenter fabric network topology, with core planes 1 to 4, core switches 1 to m (m = 12), aggregation switches (n = 4), ToR switches 1 to r (r = 48), pods 1 to p (p = 96), and 40G links.

12 Expeditus

13 Our Idea
Aggregate congestion information in two stages: ToR-aggregation, then aggregation-core.
Figure: 3-tier Clos topology (core, aggregation, and ToR tiers) with a source ToR and a destination ToR.

16 Overview
One-hop congestion information collection: local congestion monitoring for all uplinks at each ToR and aggregation switch.
Two-stage path selection: select the path tier by tier in the 3-tier Clos network.

17 Information collection
Switches locally monitor egress and ingress link utilization for all uplinks.
Figure: 3-tier Clos topology (core, aggregation, and ToR tiers).
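One way to picture the local monitoring is a discounting rate estimator (DRE), the kind of module the implementation slide later names for measuring link utilization. The sketch below follows the usual DRE update rule (add bytes on each packet, decay the register on a timer). The class name, the parameter values `alpha` and `period_s`, and the traffic pattern are assumptions chosen for illustration, not values taken from the slides.

```python
class DRE:
    """Discounting rate estimator for one link direction (illustrative)."""

    def __init__(self, capacity_bps, alpha=0.5, period_s=50e-6):
        self.capacity_bps = capacity_bps
        self.alpha = alpha          # decay factor applied on every timer tick
        self.period_s = period_s    # timer period
        self.register_bytes = 0.0

    def on_packet(self, size_bytes):
        # Every packet sent (egress) or received (ingress) adds its size.
        self.register_bytes += size_bytes

    def on_timer(self):
        # Periodic multiplicative decay ages out old traffic.
        self.register_bytes *= 1.0 - self.alpha

    def utilization(self):
        # The register approximates rate * tau, with tau = period / alpha.
        tau = self.period_s / self.alpha
        return (self.register_bytes * 8) / (self.capacity_bps * tau)


# Drive a 1 Gbps link at line rate: 6250 bytes arrive in every 50 us period.
egress = DRE(capacity_bps=1e9)
for _ in range(100):
    egress.on_packet(6250)
    estimate = egress.utilization()
    egress.on_timer()
```

A switch would keep one such estimator per uplink and per direction, so that both egress and ingress utilization are available locally.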

19 Stage #1
Starts when the first packet of a flow (the Exp-request) reaches the source ToR switch.
Figure: the Exp-request, shown as a SYN packet, at the source ToR on its way to the destination ToR.

20 Stage #1
The source ToR switch inserts its egress link utilisation into the Exp-request packet.
Figure: the Exp-request (a SYN packet carrying the source ToR egress info) travelling from the source ToR to the destination ToR.

22 Stage #1
For each aggregation switch ID, the destination ToR switch takes the maximum load of the two hops (source ToR egress and destination ToR ingress) as the effective congestion, and chooses the aggregation switch ID with the minimum effective congestion.
Figure: the Exp-request carries the source ToR egress info; the destination ToR combines it with its own ingress info to find the best aggregation switch ID (say 4).
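The min-max rule on this slide can be sketched in a few lines. The function name, the dictionary layout (aggregation switch ID mapped to utilization), and the sample numbers are assumptions for illustration; ties are broken by dictionary order here, which the slides do not specify.

```python
def choose_aggregation_id(src_egress, dst_ingress):
    """Return the aggregation switch ID with the least effective congestion.

    src_egress:  utilization of each source ToR uplink, carried in the Exp-request
    dst_ingress: utilization of each destination ToR uplink, measured locally
    """
    effective = {
        aggr_id: max(src_egress[aggr_id], dst_ingress[aggr_id])
        for aggr_id in src_egress
    }
    best_id = min(effective, key=effective.get)
    return best_id, effective


# Hypothetical utilizations for aggregation switch IDs 1 to 4.
src_egress = {1: 0.7, 2: 0.2, 3: 0.5, 4: 0.1}
dst_ingress = {1: 0.1, 2: 0.9, 3: 0.4, 4: 0.3}
best_id, effective = choose_aggregation_id(src_egress, dst_ingress)
```

With these numbers, ID 2 looks best from the source side alone, but its destination-side link is the most loaded; taking the maximum over both hops steers the flow to ID 4 instead.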

26 Stage #1
The destination ToR switch generates an Exp-response packet and sends it to the selected aggregation switch.
To generate the Exp-response: copy the Exp-request's header and reverse the source and destination addresses.
Figure: the Exp-response leaving the destination ToR towards the chosen destination aggregation switch.
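The header manipulation described here (copy, then swap source and destination) might look like the sketch below. The `ExpHeader` type and its fields are hypothetical; the slides do not give the actual header layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExpHeader:
    """Hypothetical header carrying only the fields this sketch needs."""
    src_addr: str
    dst_addr: str
    flow_id: int
    kind: str  # "Exp-request" or "Exp-response"


def make_exp_response(request):
    # Copy the request header, then reverse source and destination addresses
    # so the response travels back towards the source ToR.
    return replace(
        request,
        src_addr=request.dst_addr,
        dst_addr=request.src_addr,
        kind="Exp-response",
    )


request = ExpHeader("10.1.1.2", "10.3.1.2", 0xF0A3, "Exp-request")
response = make_exp_response(request)
```

The destination ToR would then forward `response` on the uplink to the aggregation switch chosen in stage 1.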

27 Stage #2
The chosen aggregation switch inserts its ingress link utilisation into the Exp-response packet.
Figure: the Exp-response, carrying the destination aggregation switch's ingress info, at the chosen destination aggregation switch.

28 Stage #2
Similarly, the source aggregation switch chooses the core switch ID with the minimum effective congestion.
It further records the selected core switch ID as its path selection result.
Figure: the Exp-response carries the destination aggregation switch's ingress info; the source aggregation switch combines it with its own egress info to find the best core ID (say 1). Its path selection table then holds Flow ID 0xF0A3 with Egress Port 1.

29 Stage #2
The source ToR switch records the ingress port of the Exp-response packet as its path selection result.
Two-stage path selection is done!
Figure: the Exp-response arrives at the source ToR via the chosen core and aggregation switches. The source ToR's path selection table then holds Flow ID 0xF0A3 with Egress Port 4.
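The second stage and the two table updates can be sketched together. The class and function names are hypothetical, the utilization values are made up, and treating the core switch ID directly as an egress port number is a simplifying assumption; only the flow ID 0xF0A3 and the resulting ports (1 and 4) come from the slides.

```python
class PathSelectionTable:
    """Per-switch map from flow ID to the egress port chosen for that flow."""

    def __init__(self):
        self._egress_port = {}

    def record(self, flow_id, port):
        self._egress_port[flow_id] = port

    def lookup(self, flow_id):
        # None means no decision has been recorded for this flow.
        return self._egress_port.get(flow_id)


def choose_core_id(src_aggr_egress, dst_aggr_ingress):
    # Same min-max rule as stage 1, applied to the aggregation-core hops.
    return min(
        src_aggr_egress,
        key=lambda core: max(src_aggr_egress[core], dst_aggr_ingress[core]),
    )


flow_id = 0xF0A3

# Source aggregation switch: combine the ingress info carried in the
# Exp-response with local egress info, then record the chosen core.
src_aggr_table = PathSelectionTable()
dst_aggr_ingress = {1: 0.2, 2: 0.6}
src_aggr_egress = {1: 0.3, 2: 0.1}
core_id = choose_core_id(src_aggr_egress, dst_aggr_ingress)
src_aggr_table.record(flow_id, core_id)

# Source ToR: the port the Exp-response arrived on becomes the flow's uplink.
src_tor_table = PathSelectionTable()
response_ingress_port = 4
src_tor_table.record(flow_id, response_ingress_port)
```

Later packets of the flow would be forwarded by looking up `flow_id` in each table, so the decision is made once per flow.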

30 Implementation - Click
Prototype Expeditus in the Click software router.
Overhead for the new Click modules:
DRE (measures link utilization): ~151 ns/packet
EXPRoute (two-stage path selection): ~473 ns/packet
Figure: packet processing pipeline for a 4-pod fat-tree, built from the elements FromDevice (eth0 to eth3), DRE, LookupIPRoute, EXPRoute, Queue 0 to Queue 3, DRE, and ToDevice (eth0 to eth3).

31 Evaluation
Schemes compared: Expeditus, ECMP (baseline), CONGA (flowlet-level granularity), and CONGA-Flow (flow-level granularity).
Clairvoyant: an ideal scheme that uses complete global congestion information to load balance flows (impractical in reality).
Realistic traffic workloads: web search and data mining.
Performance metric: flow completion time (FCT).

32 Testbed Experiments
Small-scale 3-tier Clos network with 4:1 oversubscription on a real testbed.
1Gbps NIC at & hosts

33 Testbed Results
Figure panels (web search workload): 99th-percentile tail for small flows (<100KB); average for large flows (>1MB).

34 Large-scale Simulations
NS-3 network simulator.
Topologies:
12-pod 10G fat-tree, 36 equal-cost paths, 864 hosts
10G leaf-spine fabric with 128 hosts (8 leaf, 8 spine)
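The fat-tree figures above can be cross-checked with a little arithmetic. The sketch assumes a standard k-ary fat-tree layout (k pods, k/2 ToR and k/2 aggregation switches per pod, k/2 core switches reachable from each aggregation switch) and assumes that 12 hosts attach to each ToR; neither assumption is stated on the slide, and the function name is made up for the example.

```python
def fat_tree_counts(k, hosts_per_tor):
    """Derive path and host counts for an assumed k-pod fat-tree layout."""
    tors_per_pod = k // 2
    aggrs_per_pod = k // 2
    cores_per_aggr = k // 2
    # An inter-pod path is fixed by the aggregation switch and the core chosen.
    inter_pod_paths = aggrs_per_pod * cores_per_aggr
    hosts = k * tors_per_pod * hosts_per_tor
    # Each ToR has one uplink per aggregation switch in its pod.
    oversubscription = hosts_per_tor / aggrs_per_pod
    return inter_pod_paths, hosts, oversubscription


paths, hosts, oversubscription = fat_tree_counts(k=12, hosts_per_tor=12)
```

Under these assumptions the numbers come out as 36 paths and 864 hosts, with each ToR oversubscribed 2:1.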

35 Performance in 3-tier Clos
12-pod fat-tree with network oversubscription 2:1 at the ToR tier.
Figure panels: average for all flows, web search and data mining workloads (annotations: 14%~32%, 19%~29%).
Although Expeditus misses some paths, it is still able to closely track Clairvoyant.

36 Impact of link failure
Reduction by Clairvoyant and Expeditus over ECMP (web search workload, load 0.5).
Figure panels: aggr-core link failure; ToR-aggr link failure.
Expeditus still provides moderate performance gains in asymmetric scenarios.

37 Comparison with CONGA
Leaf-spine with network oversubscription 2:1 at the leaf tier.
Figure: average for all flows.
There may not be enough concurrent flows to cover all paths at all times, which makes CONGA perform worse than Expeditus.

38 Conclusion
Expeditus: a novel data plane congestion-aware load balancing protocol for Clos data centre networks.
It includes simple local information collection and a two-stage path selection mechanism.
It advances the state of the art by enabling congestion-aware load balancing in general 3-tier Clos topologies.

39 Thank you!
Peng Wang, City University of Hong Kong

40 Backup slides

41 Motivation
Local congestion-aware schemes.
Each switch balances load based on local congestion information.
Blind to congestion on downstream links.
Figure: example topology with switches s1 to s5, with links labelled 100Mbps and 1Gbps.

42 Per-pathlet Congestion
Pathlet: a two-hop segment between ToR and core.
Each ToR switch maintains congestion information at the granularity of pathlets.
This still requires many concurrent flows to cover all pathlets.
Figure: 3-tier Clos topology (core, aggregation, and ToR tiers) with a source ToR and a destination ToR.

43 Structural Property
Pathwise connectivity: an aggregation switch with ID i only connects to aggregation switches with the same ID i in other pods.
Figure: 3-tier Clos topology (core, aggregation, and ToR tiers).

Expeditus: Congestion-aware Load Balancing in Clos Data Center Networks
Peng Wang (1), Hong Xu (1), Zhixiong Niu (1), Dongsu Han (2), Yongqiang Xiong (3)
(1) NetX Lab, City University of Hong Kong; (2) KAIST; (3) Microsoft

More information

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters CONGA: Distributed Congestion-Aware Load Balancing for Datacenters By Alizadeh,M et al. Motivation Distributed datacenter applications require large bisection bandwidth Spine Presented by Andrew and Jack

More information

Alizadeh, M. et al., " CONGA: distributed congestion-aware load balancing for datacenters," Proc. of ACM SIGCOMM '14, 44(4): , Oct

Alizadeh, M. et al.,  CONGA: distributed congestion-aware load balancing for datacenters, Proc. of ACM SIGCOMM '14, 44(4): , Oct CONGA Paper Review By Buting Ma and Taeju Park Paper Reference Alizadeh, M. et al., " CONGA: distributed congestion-aware load balancing for datacenters," Proc. of ACM SIGCOMM '14, 44(4):503-514, Oct.

More information

Building Efficient and Reliable Software-Defined Networks. Naga Katta

Building Efficient and Reliable Software-Defined Networks. Naga Katta FPO Talk Building Efficient and Reliable Software-Defined Networks Naga Katta Jennifer Rexford (Advisor) Readers: Mike Freedman, David Walker Examiners: Nick Feamster, Aarti Gupta 1 Traditional Networking

More information

Information-Agnostic Flow Scheduling for Commodity Data Centers

Information-Agnostic Flow Scheduling for Commodity Data Centers Information-Agnostic Flow Scheduling for Commodity Data Centers Wei Bai, Li Chen, Kai Chen, Dongsu Han (KAIST), Chen Tian (NJU), Hao Wang Sing Group @ Hong Kong University of Science and Technology USENIX

More information

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz 1 A Typical Facebook Page Modern pages have many components

More information

Let It Flow Resilient Asymmetric Load Balancing with Flowlet Switching

Let It Flow Resilient Asymmetric Load Balancing with Flowlet Switching Let It Flow Resilient Asymmetric Load Balancing with Flowlet Switching Erico Vanini*, Rong Pan*, Mohammad Alizadeh, Parvin Taheri*, Tom Edsall* * Load Balancing in Data Centers Multi-rooted tree 1000s

More information

Per-Packet Load Balancing in Data Center Networks

Per-Packet Load Balancing in Data Center Networks Per-Packet Load Balancing in Data Center Networks Yagiz Kaymak and Roberto Rojas-Cessa Abstract In this paper, we evaluate the performance of perpacket load in data center networks (DCNs). Throughput and

More information

Cloud networking (VITMMA02) DC network topology, Ethernet extensions

Cloud networking (VITMMA02) DC network topology, Ethernet extensions Cloud networking (VITMMA02) DC network topology, Ethernet extensions Markosz Maliosz PhD Department of Telecommunications and Media Informatics Faculty of Electrical Engineering and Informatics Budapest

More information

Information-Agnostic Flow Scheduling for Commodity Data Centers. Kai Chen SING Group, CSE Department, HKUST May 16, Stanford University

Information-Agnostic Flow Scheduling for Commodity Data Centers. Kai Chen SING Group, CSE Department, HKUST May 16, Stanford University Information-Agnostic Flow Scheduling for Commodity Data Centers Kai Chen SING Group, CSE Department, HKUST May 16, 2016 @ Stanford University 1 SING Testbed Cluster Electrical Packet Switch, 1G (x10) Electrical

More information

HULA: Scalable Load Balancing Using Programmable Data Planes

HULA: Scalable Load Balancing Using Programmable Data Planes : Scalable Load Balancing Using Programmable Data Planes Naga Katta *, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman +, Jennifer Rexford * * Princeton University, VMware, Barefoot Networks, + MIT CSAIL

More information

DATA center networks use multi-rooted Clos topologies

DATA center networks use multi-rooted Clos topologies : Sampling based Balancing in Data Center Networks Peng Wang, Member, IEEE, George Trimponias, Hong Xu, Member, IEEE, Yanhui Geng Abstract Data center networks demand high-performance, robust, and practical

More information

IEEE P802.1Qcz Proposed Project for Congestion Isolation

IEEE P802.1Qcz Proposed Project for Congestion Isolation IEEE P82.1Qcz Proposed Project for Congestion Isolation IETF 11 London ICCRG Paul Congdon paul.congdon@tallac.com Project Background P82.1Qcz Project Initiation November 217 - Agreed to develop a Project

More information

Micro load balancing in data centers with DRILL

Micro load balancing in data centers with DRILL Micro load balancing in data centers with DRILL Soudeh Ghorbani (UIUC) Brighten Godfrey (UIUC) Yashar Ganjali (University of Toronto) Amin Firoozshahian (Intel) Where should the load balancing functionality

More information

Computer Network Architectures and Multimedia. Guy Leduc. Chapter 2 MPLS networks. Chapter 2: MPLS

Computer Network Architectures and Multimedia. Guy Leduc. Chapter 2 MPLS networks. Chapter 2: MPLS Computer Network Architectures and Multimedia Guy Leduc Chapter 2 MPLS networks Chapter based on Section 5.5 of Computer Networking: A Top Down Approach, 6 th edition. Jim Kurose, Keith Ross Addison-Wesley,

More information

Sincronia: Near-Optimal Network Design for Coflows. Shijin Rajakrishnan. Joint work with

Sincronia: Near-Optimal Network Design for Coflows. Shijin Rajakrishnan. Joint work with Sincronia: Near-Optimal Network Design for Coflows Shijin Rajakrishnan Joint work with Saksham Agarwal Akshay Narayan Rachit Agarwal David Shmoys Amin Vahdat Traditional Applications: Care about performance

More information

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters : Distributed Congestion-Aware Load Balancing for Datacenters Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan,

More information

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters

CONGA: Distributed Congestion-Aware Load Balancing for Datacenters : Distributed Congestion-Aware Load Balancing for Datacenters Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam (Google), Francis Matus,

More information

P802.1Qcz Congestion Isolation

P802.1Qcz Congestion Isolation P802.1Qcz Congestion Isolation IEEE 802 / IETF Workshop on Data Center Networking Bangkok November 2018 Paul Congdon (Huawei/Tallac) The Case for Low-latency, Lossless, Large-Scale DCNs More and more latency-sensitive

More information

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552)

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552) Packet Scheduling in Data Centers Lecture 17, Computer Networks (198:552) Datacenter transport Goal: Complete flows quickly / meet deadlines Short flows (e.g., query, coordination) Large flows (e.g., data

More information

DevoFlow: Scaling Flow Management for High-Performance Networks

DevoFlow: Scaling Flow Management for High-Performance Networks DevoFlow: Scaling Flow Management for High-Performance Networks Andy Curtis Jeff Mogul Jean Tourrilhes Praveen Yalagandula Puneet Sharma Sujata Banerjee Software-defined networking Software-defined networking

More information

A closer look at network structure:

A closer look at network structure: T1: Introduction 1.1 What is computer network? Examples of computer network The Internet Network structure: edge and core 1.2 Why computer networks 1.3 The way networks work 1.4 Performance metrics: Delay,

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

Hashing on broken assumptions

Hashing on broken assumptions Hashing on broken assumptions Lorenzo Saino (@lorenzosaino) Fastly Name of Presentation Problem: Spreading traffic across multiple links, paths, hosts Solutions: Link Aggregation Equal Cost Multipath (ECMP)

More information

Waze: Congestion-Aware Load Balancing at the Virtual Edge for Asymmetric Topologies

Waze: Congestion-Aware Load Balancing at the Virtual Edge for Asymmetric Topologies Waze: Congestion-Aware Load Balancing at the Virtual Edge for Asymmetric Topologies Naga Katta Salesforce.com Aran Bergman Technion Aditi Ghag, Mukesh Hira VMware Changhoon Kim Barefoot Networks Isaac

More information

Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers

Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers Fast and Cautious: Leveraging Multi-path Diversity for Transport Loss Recovery in Data Centers Guo Chen Yuanwei Lu, Yuan Meng, Bojie Li, Kun Tan, Dan Pei, Peng Cheng, Layong (Larry) Luo, Yongqiang Xiong,

More information

Data Center Network Topologies II

Data Center Network Topologies II Data Center Network Topologies II Hakim Weatherspoon Associate Professor, Dept of Computer cience C 5413: High Performance ystems and Networking April 10, 2017 March 31, 2017 Agenda for semester Project

More information

6.888: Lecture 4 Data Center Load Balancing

6.888: Lecture 4 Data Center Load Balancing 6.888: Lecture 4 Data Center Load Balancing Mohammad Alizadeh Spring 2016 1 MoDvaDon DC networks need large bisection bandwidth for distributed apps (big data, HPC, web services, etc) Multi-rooted Single-rooted

More information

Interdomain Routing Design for MobilityFirst

Interdomain Routing Design for MobilityFirst Interdomain Routing Design for MobilityFirst October 6, 2011 Z. Morley Mao, University of Michigan In collaboration with Mike Reiter s group 1 Interdomain routing design requirements Mobility support Network

More information

Delay Controlled Elephant Flow Rerouting in Software Defined Network

Delay Controlled Elephant Flow Rerouting in Software Defined Network 1st International Conference on Advanced Information Technologies (ICAIT), Nov. 1-2, 2017, Yangon, Myanmar Delay Controlled Elephant Flow Rerouting in Software Defined Network Hnin Thiri Zaw, Aung Htein

More information

lecture 18: network virtualization platform (NVP) 5590: software defined networking anduo wang, Temple University TTLMAN 401B, R 17:30-20:00

lecture 18: network virtualization platform (NVP) 5590: software defined networking anduo wang, Temple University TTLMAN 401B, R 17:30-20:00 lecture 18: network virtualization platform (NVP) 5590: software defined networking anduo wang, Temple University TTLMAN 401B, R 17:30-20:00 Network Virtualization in multi-tenant Datacenters Teemu Koponen.,

More information

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA

Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Decision Forest: A Scalable Architecture for Flexible Flow Matching on FPGA Weirong Jiang, Viktor K. Prasanna University of Southern California Norio Yamagaki NEC Corporation September 1, 2010 Outline

More information

Outlines. Introduction (Cont d) Introduction. Introduction Network Evolution External Connectivity Software Control Experience Conclusion & Discussion

Outlines. Introduction (Cont d) Introduction. Introduction Network Evolution External Connectivity Software Control Experience Conclusion & Discussion Outlines Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network Singh, A. et al. Proc. of ACM SIGCOMM '15, 45(4):183-197, Oct. 2015 Introduction Network Evolution

More information

DATA CENTER FABRIC COOKBOOK

DATA CENTER FABRIC COOKBOOK Do It Yourself! DATA CENTER FABRIC COOKBOOK How to prepare something new from well known ingredients Emil Gągała WHAT DOES AN IDEAL FABRIC LOOK LIKE? 2 Copyright 2011 Juniper Networks, Inc. www.juniper.net

More information

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era Massimiliano Sbaraglia Network Engineer Introduction In the cloud computing era, distributed architecture is used to handle operations of mass data, such as the storage, mining, querying, and searching

More information

DIBS: Just-in-time congestion mitigation for Data Centers

DIBS: Just-in-time congestion mitigation for Data Centers DIBS: Just-in-time congestion mitigation for Data Centers Kyriakos Zarifis, Rui Miao, Matt Calder, Ethan Katz-Bassett, Minlan Yu, Jitendra Padhye University of Southern California Microsoft Research Summary

More information

DiffFlow: Differentiating Short and Long Flows for Load Balancing in Data Center Networks

DiffFlow: Differentiating Short and Long Flows for Load Balancing in Data Center Networks : Differentiating Short and Long Flows for Load Balancing in Data Center Networks Francisco Carpio, Anna Engelmann and Admela Jukan Technische Universität Braunschweig, Germany Email:{f.carpio, a.engelmann,

More information

Routing Domains in Data Centre Networks. Morteza Kheirkhah. Informatics Department University of Sussex. Multi-Service Networks July 2011

Routing Domains in Data Centre Networks. Morteza Kheirkhah. Informatics Department University of Sussex. Multi-Service Networks July 2011 Routing Domains in Data Centre Networks Morteza Kheirkhah Informatics Department University of Sussex Multi-Service Networks July 2011 What is a Data Centre? Large-scale Data Centres (DC) consist of tens

More information

Cutting the Cord: A Robust Wireless Facilities Network for Data Centers

Cutting the Cord: A Robust Wireless Facilities Network for Data Centers Cutting the Cord: A Robust Wireless Facilities Network for Data Centers Yibo Zhu, Xia Zhou, Zengbin Zhang, Lin Zhou, Amin Vahdat, Ben Y. Zhao and Haitao Zheng U.C. Santa Barbara, Dartmouth College, U.C.

More information

SDN-based Network Obfuscation. Roland Meier PhD Student ETH Zürich

SDN-based Network Obfuscation. Roland Meier PhD Student ETH Zürich SDN-based Network Obfuscation Roland Meier PhD Student ETH Zürich This Talk This thesis vs. existing solutions Alice Bob source: Alice destination: Bob Hi Bob, Hi Bob, Payload encryption ǾǼōĦ

More information

pfabric: Minimal Near-Optimal Datacenter Transport

pfabric: Minimal Near-Optimal Datacenter Transport pfabric: Minimal Near-Optimal Datacenter Transport Mohammad Alizadeh,ShuangYang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar,andScottShenker Stanford University Insieme Networks U.C. Berkeley

More information

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China

Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat. ACM SIGCOMM 2013, August, Hong Kong, China Got Loss? Get zovn! Daniel Crisan, Robert Birke, Gilles Cressier, Cyriel Minkenberg, and Mitch Gusat ACM SIGCOMM 2013, 12-16 August, Hong Kong, China Virtualized Server 1 Application Performance in Virtualized

More information

COCONUT: Seamless Scale-out of Network Elements

COCONUT: Seamless Scale-out of Network Elements COCONUT: Seamless Scale-out of Network Elements Soudeh Ghorbani P. Brighten Godfrey University of Illinois at Urbana-Champaign Simple abstractions Firewall Loadbalancer Router Network operating system

More information

Lecture 10.1 A real SDN implementation: the Google B4 case. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

Lecture 10.1 A real SDN implementation: the Google B4 case. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it Lecture 10.1 A real SDN implementation: the Google B4 case Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it WAN WAN = Wide Area Network WAN features: Very expensive (specialized high-end

More information

Baidu s Best Practice with Low Latency Networks

Baidu s Best Practice with Low Latency Networks Baidu s Best Practice with Low Latency Networks Feng Gao IEEE 802 IC NEND Orlando, FL November 2017 Presented by Huawei Low Latency Network Solutions 01 1. Background Introduction 2. Network Latency Analysis

More information

SOFTWARE DEFINED NETWORKS. Jonathan Chu Muhammad Salman Malik

SOFTWARE DEFINED NETWORKS. Jonathan Chu Muhammad Salman Malik SOFTWARE DEFINED NETWORKS Jonathan Chu Muhammad Salman Malik Credits Material Derived from: Rob Sherwood, Saurav Das, Yiannis Yiakoumis AT&T Tech Talks October 2010 (available at:www.openflow.org/wk/images/1/17/openflow_in_spnetworks.ppt)

More information

Towards Fully Synchronized (and Programmable) Datacenter Networks

Towards Fully Synchronized (and Programmable) Datacenter Networks Towards Fully Synchronized (and Programmable) Datacenter Networks Vishal Shrivastav, Cornell University 28 Jul 2016 University of Cambridge Outline DTP: Datacenter Time Protocol [SIGCOMM 16] SHOAL: Synchronized

More information

Slicing a Network. Software-Defined Network (SDN) FlowVisor. Advanced! Computer Networks. Centralized Network Control (NC)

Slicing a Network. Software-Defined Network (SDN) FlowVisor. Advanced! Computer Networks. Centralized Network Control (NC) Slicing a Network Advanced! Computer Networks Sherwood, R., et al., Can the Production Network Be the Testbed? Proc. of the 9 th USENIX Symposium on OSDI, 2010 Reference: [C+07] Cascado et al., Ethane:

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Today Problems with TCP in the Data Center TCP Incast TPC timeouts Improvements

More information

Exploiting ICN for Flexible Management of Software-Defined Networks

Exploiting ICN for Flexible Management of Software-Defined Networks Exploiting ICN for Flexible Management of Software-Defined Networks Mayutan Arumaithurai, Jiachen Chen, Edo Monticelli, Xiaoming Fu and K. K. Ramakrishnan * University of Goettingen, Germany * University

More information

Lecture 7: Data Center Networks

Lecture 7: Data Center Networks Lecture 7: Data Center Networks CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Nick Feamster Lecture 7 Overview Project discussion Data Centers overview Fat Tree paper discussion CSE

More information

Universal Packet Scheduling. Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley

Universal Packet Scheduling. Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley Universal Packet Scheduling Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley Many Scheduling Algorithms Many different algorithms FIFO, FQ, virtual clocks, priorities Many different

More information

Adaptive Routing Strategies for Modern High Performance Networks

Adaptive Routing Strategies for Modern High Performance Networks Adaptive Routing Strategies for Modern High Performance Networks Patrick Geoffray Myricom patrick@myri.com Torsten Hoefler Indiana University htor@cs.indiana.edu 28 August 2008 Hot Interconnect Stanford,

More information

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks Dr. Vinod Vokkarane Assistant Professor, Computer and Information Science Co-Director, Advanced Computer Networks Lab University

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 1 Oriana Riva, Department of Computer Science ETH Zürich Last week Datacenter Fabric Portland

More information

Extreme Networks How to Build Scalable and Resilient Fabric Networks

Extreme Networks How to Build Scalable and Resilient Fabric Networks Extreme Networks How to Build Scalable and Resilient Fabric Networks Mikael Holmberg Distinguished Systems Engineer Fabrics MLAG IETF TRILL Cisco FabricPath Extreme (Brocade) VCS Juniper QFabric IEEE Fabric

More information

Configuring Local SPAN and ERSPAN

Configuring Local SPAN and ERSPAN This chapter contains the following sections: Information About ERSPAN, page 1 Licensing Requirements for ERSPAN, page 5 Prerequisites for ERSPAN, page 5 Guidelines and Limitations for ERSPAN, page 5 Guidelines

More information

Data Center Network Topologies

Data Center Network Topologies Data Center Network Topologies. Overview 1. Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides and audio/video recordings of this class lecture are at:

More information

Dissemination of Paths in Path-Aware Networks

Christos Pappas Network Security Group, ETH Zurich IETF, November 16, 2017 PANRG Motivation How does path-awareness extend to the edge? 2 PANRG Motivation

States on a (Data) Plane. Jennifer Rexford

Traditional data planes are stateless 1 Software Defined Networks (SDN) Program your network from a logically central point! 2 OpenFlow Rule Tables Prio match

Universal Packet Scheduling. Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley

Packet Scheduling Active research literature with many Algorithms FIFO, DRR, virtual clocks, priorities

A NEW APPROACH TO STOCHASTIC SCHEDULING IN DATA CENTER NETWORKS

ABSTRACT Tingqiu Tim Yuan, Tao Huang, Cong Xu and Jian Li Huawei Technologies, China The Quality of Service (QoS) of scheduling between latency-sensitive

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

Costin Raiciu Department of Computer Science University Politehnica of Bucharest We've gotten used to great applications Enabling Such

Cutting the Cord: A Robust Wireless Facilities Network for Data Centers

Yibo Zhu, Xia Zhou, Zengbin Zhang, Lin Zhou, Amin Vahdat, Ben Y. Zhao and Haitao Zheng U.C. Santa Barbara, Dartmouth College, U.C.

KNOM Tutorial Internet Traffic Matrix Measurement and Analysis. Sue Bok Moon Dept. of Computer Science

KNOM Tutorial 2003 Internet Traffic Matrix Measurement and Analysis Sue Bok Moon Dept. of Computer Science Overview Definition of Traffic Matrix Traffic demand, delay, loss Applications of Traffic Matrix

Enabling Wide-spread Communications on Optical Fabric with MegaSwitch

Li Chen Kai Chen, Zhonghua Zhu, Minlan Yu, George Porter, Chunming Qiao, Shan Zhong Optical Networking in Data Centers Optical networking

TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks

Gwangsun Kim Arm Research Hayoung Choi, John Kim KAIST High-radix Networks Dragonfly network in Cray XC30 system 1D Flattened butterfly

Configure Segment Routing for BGP

Border Gateway Protocol (BGP) is an Exterior Gateway Protocol (EGP) that allows you to create loop-free inter-domain routing between autonomous systems. An autonomous system is a set of routers under a

IP Fabric Architectures for SMPTE 2110 Bits By The Bay 2018 Conference. Ammar Latif Cisco Systems

Industry Challenges and Requirements Video Router COTS Switches Deterministic Network End Point Synchronization

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 18

Final project demo Please do the demo THIS week to the TAs. Or you are allowed to use

Coflow. Recent Advances and What's Next? Mosharaf Chowdhury. University of Michigan

Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow Networking Open Source Apache Spark Open

FOUNDATIONS OF INTENT-BASED NETWORKING

Loris D'Antoni Aditya Akella Aaron Gember-Jacobson Network Policies Enterprise Network Cloud Network Enterprise Network 2 3 Tenant Network Policies Enterprise Network

ALB: Adaptive Load Balancing Based on Accurate Congestion Feedback for Asymmetric Topologies

Qingyu Shi 1, Fang Wang 1,2, Dan Feng 1, and Weibin Xie 1 1 Wuhan National Laboratory for Optoelectronics, Key

Scalable Enterprise Networks with Inexpensive Switches

Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

HPI-DC '09 Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster '09 August 31 New Orleans, USA Outline Scope

NaaS Network-as-a-Service in the Cloud

joint work with Matteo Migliavacca, Peter Pietzuch, and Alexander L. Wolf costa@imperial.ac.uk Motivation Mismatch between app. abstractions & network How the programmers

Towards a Robust Protocol Stack for Diverse Wireless Networks Arun Venkataramani

Arun Venkataramani (in collaboration with Ming Li, Devesh Agrawal, Deepak Ganesan, Aruna Balasubramanian, Brian Levine, Xiaozheng Tie at UMass

Intel Rack Scale Architecture. using Intel Ethernet Multi-host Controller FM10000 Family

White paper Introduction Hyperscale data centers are being deployed with tens of thousands of servers making operating efficiency a key

Data Center Networks. Brighten Godfrey CS 538 April Thanks to Ankit Singla for some slides in this lecture

Data Center Networks Brighten Godfrey CS 538 April 5 2017 Thanks to Ankit Singla for some slides in this lecture Introduction: The Driving Trends Cloud Computing Computing as a utility Purchase however

CSC 401 Data and Computer Communications Networks

Link Layer, Switches, VLANS, MPLS, Data Centers Sec 6.4 to 6.7 Prof. Lina Battestilli Fall 2017 Chapter 6 Outline Link layer and LANs: 6.1 introduction,

Application-Aware SDN Routing for Big-Data Processing

Evaluation by EstiNet OpenFlow Network Emulator Director/Prof. Shie-Yuan Wang Institute of Network Engineering National ChiaoTung University Taiwan

DARD: A Practical Distributed Adaptive Routing Architecture for Datacenter Networks

Xin Wu and Xiaowei Yang Duke-CS-TR-20-0 {xinwu, xwy}@cs.duke.edu Abstract Datacenter networks typically have multiple

Advanced Computer Networks Data Center Architecture. Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015

Advanced Computer Networks 263-3825-00 Data Center Architecture Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 1 MORE ABOUT TOPOLOGIES 2 Bisection Bandwidth Bisection bandwidth: Sum of the

Lecture 16: Data Center Network Architectures

MIT 6.829: Computer Networks Fall 2017 Scribe: Alex Lombardi, Danielle Olson, Nicholas Selby 1 Background on Data Centers Computing, storage, and networking

Small-World Datacenters

2nd ACM Symposium on Cloud Computing Oct 27, 2011 Ji-Yong Shin * Bernard Wong +, and Emin Gün Sirer * * Cornell University + University of Waterloo Motivation Conventional networks

A Scalable, Commodity Data Center Network Architecture

By Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat. Presented by Nanxi Chen, May 5, 20

Implementing VXLAN in DataCenter

LTRDCT-1223 Lilian Quan Technical Marketing Engineering, INSBU Erum Frahim Technical Leader, ecats John Weston Technical Leader, ecats Why Overlays? Robust Underlay/Fabric

Fastpass: A Centralized Zero-Queue Datacenter Network

Jonathan Perry, Amy Ousterhout, Hari Balakrishnan, Devavrat Shah, Hans Fugal M.I.T. Computer Science & Artificial Intelligence Lab Facebook http://fastpass.mit.edu/

Introduction to Segment Routing

Segment Routing (SR) is a flexible, scalable way of doing source routing. Overview of Segment Routing; How Segment Routing Works; Examples for Segment Routing; Benefits of Segment

Data Center architecture trends and their impact on PMD requirements

Mark Nowell, Matt Traverso Cisco Kapil Shrikhande Dell IEEE 802.3 NG100GE Optics Study Group March 2012 1 Scott Kipp Brocade David Warren

RDMA over Commodity Ethernet at Scale

Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, Jianxi Ye, Jitendra Padhye, Marina Lipshteyn ACM SIGCOMM 2016 August 24 2016 Outline RDMA/RoCEv2 background DSCP-based

A Network-aware Scheduler in Data-parallel Clusters for High Performance

Zhuozhao Li, Haiying Shen and Ankur Sarker Department of Computer Science University of Virginia May, 2018 1/61 Data-parallel clusters

Future Internet Architectures

Brighten Godfrey cs598pbg Nov 4 2010 slides 2010 by Brighten Godfrey unless otherwise noted Tussle in Cyberspace What tussles have we studied this semester? Choice in routing

IP Fabric Reference Architecture

Technical Deep Dive jammon@brocade.com Feng Shui of Data Center Design 1. Follow KISS Principle Keep It Simple 2. Minimal features 3. Minimal configuration 4. Configuration

Data Plane Monitoring in Segment Routing Networks Faisal Iqbal Cisco Systems Clayton Hassen Bell Canada

Faisal Iqbal Cisco Systems (faiqbal@cisco.com) Clayton Hassen Bell Canada (clayton.hassen@bell.ca) Reference Topology & Conventions SR control plane is

A Generalized Blind Scheduling Policy

Hanhua Feng 1, Vishal Misra 2,3 and Dan Rubenstein 2 1 Infinio Systems 2 Columbia University in the City of New York 3 Google TTIC SUMMER WORKSHOP: DATA CENTER SCHEDULING

Data Centers. Tom Anderson

Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,

Packet-Based Load Balancing in Data Center Networks

Yagiz Kaymak and Roberto Rojas-Cessa Networking Research Laboratory, Department of Electrical and Computer Engineering, New Jersey Institute of Technology,

Universal Packet Scheduling. Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, Scott Shenker UC Berkeley

Many Scheduling Algorithms Many different algorithms FIFO, FQ, virtual clocks, priorities Many different

Towards High-performance Flow-level Packet Processing on Multi-core Network Processors

Yaxuan Qi (presenter), Bo Xu, Fei He, Baohua Yang, Jianming Yu and Jun Li ANCS 2007, Orlando, USA Outline Introduction