Utilizing Datacenter Networks: Centralized or Distributed Solutions?

Size: px
Start display at page:

Download "Utilizing Datacenter Networks: Centralized or Distributed Solutions?"

Transcription

1 Utilizing Datacenter Networks: Centralized or Distributed Solutions? Costin Raiciu Department of Computer Science University Politehnica of Bucharest

2 We ve gotten used to great applications

3 Enabling Such Apps is Hard Apps Process huge amounts of data Are fast Are reliable One machine is not enough Limited reliability, speed Super computers are expensive Use many commodity machines instead

4 Data Centers Rule the World Cloud computing Economies of scale: networks of tens of thousands of hosts Datacenter apps support web search, online stores Web search, GFS, BigTable, DryadLINQ, MapReduce Dense traffic patterns Intra datacenter traffic is increasing in volume

5 Flexibility is Important in Data Centers Apps distributed across thousands of machines. Flexibility want any machine to be able to play any role.

6 Traditional Data Center Topology Internet Core Switch 10Gbps 10Gbps 1Gbps Aggrega5on Switches Top of Rack Switches Racks of servers

7 Fat Tree Topology [Fares et al., 2008; Clos, 1953] K=4 Aggrega5on Switches 1Gbps 1Gbps K Pods with K Switches each Racks of servers

8 VL2 Topology [Greenberg et al, 2009, Clos topology] 10Gbps 10Gbps 20 hosts

9 BCube Topology [Guo et al, 2009] BCube (4,1)

10 How Do We Route Packets in Data Centers? Traditional Routing Spanning Tree Protocol - kills all redundancy Instead datacenters use one of the following techniques: Multiple VLANs OSPF TRILL (new IETF standard)

11 How Do We Use this Capacity? Need to distribute flows across available paths. Basic solution: Random Load Balancing. Use Equal-Cost Multipath (ECMP) routing (OSPF, TRILL) Hash to a path at random. Sources randomly pick a VLANs. In practice sources have multiple interfaces pick a random source address for the flow

12 Collisions 1Gbps 1Gbps Racks of servers

13 Single-path TCP collisions reduce throughput

14 How bad are collisions? Capacity wasted (worst case): FatTree 60% BCube 50% VL2 25%

15 How do we address this problem? I will discuss two solutions Flow scheduling Multipath TCP

16 Flow Scheduling Hedera Fares et al. NSDI 2010

17 Solving Collisions with Flow Scheduling Centralized Controller 2. Compute flow demands 1. Pull stats, detect large flows 3. Compute placement 4. Place flows 1Gbps OF OF OF OF 1Gbps OF OF OF OF OF OF OF OF OF OF OF OF OF OF OF OF Racks of servers

18 Hedera Main Idea Schedule elephant flows They carry most of the bytes ECMP deals with short flows

19 Detecting Elephants Pull edge switches for byte counts Flows exceeding 100Mb/s are large What if only short flows? ECMP should be good enough

20 Demand Estimation Current flow rates are a poor indicator of flow demand Network could be the bottleneck Hedera s approach: what would this flow get if the network was not a bottleneck?

21 Demand estimation: simple example 1Gb/s 500Mb/s 500Mb/s General Approach: Iterative algorithm

22 Allocating Flows to Paths Multi-Commodity Flow Problem Single path forwarding Expressed as Binary Integer Programming NP-Complete Solvers give exact solution but are impractical for large networks

23 Approximating Multi-Commodity Flow Global First Fit Linearly search all paths until one that can accommodate the traffic is found Flows placed upon detection, are not moved Simulated Annealing Probabilistic search for good solutions that maximize bisection bandwidth

24 Fault Tolerance Scheduler failure all soft state, just fall back to ECMP Link, switch failures Portland notifies the scheduler

25 Does it work?

26 Hedera: One Flow, One Path Centralized Can it scale to really large datacenters? Needs a very tight control loop How often does it need to run to achieve these benefits? Strong assumption: traffic is always bottlenecked by the network What about app-bound traffic, e.g disk reads/writes?

27 Hedera: One Flow, One Path Centralized Can it scale to really large datacenters? MAYBE Needs a very tight control loop FIXABLE This is the wrong place to start How often does it need to run to achieve these benefits? Strong assumption: Only Hosts Know traffic is always bottlenecked by the network What about app-bound traffic, e.g disk reads/writes?

28 Multipath topologies need multipath transport Multipath transport enables better topologies

29 Collision

30

31

32 Not fair

33 Not fair

34

35

36 No matter how you do it, mapping each flow to a path is the wrong goal

37 Instead, we should pool capacity from different links

38 Instead, we should pool capacity from different links

39 Instead, we should pool capacity from different links

40 Instead, we should pool capacity from different links

41 Multipath Transport

42 Multipath Transport can pool datacenter networks Instead of using one path for each flow, use many random paths Don t worry about collisions. Just don t send (much) traffic on colliding paths

43 Multipath TCP Primer [IETF MPTCP WG] MPTCP is a drop in replacement for TCP Works with unmodified applications Over the existing network

44 MPTCP Operation SYN MP_CAPABLE X

45 MPTCP Operation SYN/ACK MP_CAPABLE Y

46 MPTCP Operation STATE 1 CWND Snd.SEQNO Rcv.SEQNO

47 MPTCP Operation STATE 1 CWND Snd.SEQNO Rcv.SEQNO SYN JOIN Y

48 MPTCP Operation STATE 1 CWND Snd.SEQNO Rcv.SEQNO SYN/ACK JOIN X

49 MPTCP Operation STATE 1 CWND Snd.SEQNO Rcv.SEQNO STATE 2 CWND Snd.SEQNO Rcv.SEQNO

50 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO STATE 2 CWND Snd.SEQNO Rcv.SEQNO

51 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO STATE 2 CWND Snd.SEQNO Rcv.SEQNO

52 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO SEQ 5000 options DSEQ DATA STATE 2 CWND Snd.SEQNO Rcv.SEQNO

53 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO SEQ 5000 options DSEQ DATA STATE 2 CWND Snd.SEQNO Rcv.SEQNO

54 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO SEQ 5000 options DSEQ DATA STATE 2 CWND Snd.SEQNO Rcv.SEQNO

55 MPTCP Operation SEQ 1000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO SEQ 5000 options DSEQ DATA STATE 2 CWND Snd.SEQNO Rcv.SEQNO

56 MPTCP Operation ACK 2000 STATE 1 CWND Snd.SEQNO Rcv.SEQNO STATE 2 CWND Snd.SEQNO Rcv.SEQNO

57 MPTCP Operation SEQ 2000 options DSEQ DATA STATE 1 CWND Snd.SEQNO Rcv.SEQNO STATE 2 CWND Snd.SEQNO Rcv.SEQNO

58 Multipath TCP: Congestion Control [NSDI, 2011]

59 MPTCP better utilizes the FatTree network

60 MPTCP on EC2 Amazon EC2: infrastructure as a service We can borrow virtual machines by the hour These run in Amazon data centers worldwide We can boot our own kernel A few availability zones have multipath topologies 2-8 paths available between hosts not on the same machine or in the same rack Available via ECMP

61 Amazon EC2 Experiment 40 medium CPU instances running MPTCP For 12 hours, we sequentially ran all-to-all iperf cycling through: TCP MPTCP (2 and 4 subflows)

62 MPTCP improves performance on EC2

63 Where do MPTCP s benefits come from?

64 Allocating Flows to Paths Multi-Commodity Flow Problem Single path forwarding Expressed as Binary Integer Programming NP-Complete Solvers give exact solution but are impractical for large networks

65 Allocating Flows to Paths Multi-Commodity Flow Problem Single path forwarding Expressed as Binary Integer Programming NP-Complete Solvers give exact solution but are impractical for large networks Multipath forwarding Expressed as Linear Programming problem Solvable in polynomial time

66 How many subflows are needed? How does the topology affect results? How does the traffic matrix affect results?

67 At most 8 subflows are needed Total Throughput TCP

68 MPTCP improves fairness in VL2 topologies VL2 Fairness is important: Jobs finish when the slowest worker finishes

69 MPTCP improves throughput and fairness in BCube BCube Single path TCP optimum

70 Oversubscribed Topologies To saturate full bisectional bandwidth: There must be no traffic locality All hosts must send at the same time Host links must not be bottlenecks It makes sense to under-provision the network core This is what happens in practice Does MPTCP still provide benefits?

71 Performance improvements depend on traffic matrix Underloaded Sweet Spot Overloaded Increase Load

72 MPTCP vs. Centralized Scheduling

73 MPTCP vs Hedera First Fit Centralized Scheduling MPTCP Infinite Scheduling Interval

74 Centralized Scheduling: Setting the Threshold Throughput 1Gbps Hope 17% worse than multipath TCP 100Mbps App Limited

75 Centralized Scheduling: Setting the Threshold Throughput 1Gbps 21% worse than multipath TCP 100Mbps App Limited Hope

76 Centralized Scheduling: Setting the Threshold Throughput 1Gbps 500Mbps 100Mbps 51% 17% 45% 21%

77 MPTCP vs. Hedera MPTCP HEDERA Implementation Distributed Centralized Network changes No Yes, upgrade all switches to OF Hardware needed No Centralized Scheduler Software changes Yes host stack No Scope Schedules more flows Large flows only Convergence Time Scale Invariant, RTTs Tight Control Loop Limits Scalability Fairness Fair Less fair

78 What is an optimal datacenter topology for multipath transport?

79 In single homed topologies: Hosts links are often bottlenecks ToR switch failures wipe out tens of hosts for days Multi-homing servers is the obvious way forward

80 Fat Tree Topology

81 Fat Tree Topology Upper Pod Switch ToR Switch Servers

82 Dual Homed Fat Tree Topology Upper Pod Switch ToR Switch Servers

83 Is DHFT any better than Fat Tree? Not for traffic matrices that fully utilize the core Let s examine random traffic patterns

84 DHFT provides significant improvements when core is not overloaded Core Underloaded Core Overloaded

85 Summary One flow, one path thinking has constrained datacenter design Collisions, unfairness, limited utilization Fixing these is possible, but does not address the bigger issue Multipath transport enables resource pooling in datacenter networks: Improves throughput Improves fairness Improves robustness One flow, many paths frees designers to consider topologies that offer improved performance for similar cost

86 Backup Slides

87 Effect of MPTCP on short flows Flow sizes from VL2 dataset MPTCP enabled for long flows only (timer) Oversubscribed Fat Tree topology Results: TCP/ECMP Completion time: 79ms Core Utilization: 25% MPTCP 97ms 65%

88 Effect of Locality in the Dual Homed Fat Tree

89 Overloaded Fat Tree: better fairness with Multipath TCP

90

91

92 VL2 Topology [Greenberg et al, 2009, Clos topology] 10Gbps 10Gbps 20 hosts

93 BCube Topology [Guo et al, 2009] BCube (4,1)

Improving datacenter performance and robustness with multipath TCP

Improving datacenter performance and robustness with multipath TCP Improving datacenter performance and robustness with multipath TCP Paper #273, 14 pages ABSTRACT The latest large-scale data centers offer higher aggregate bandwidth and robustness by creating multiple

More information

Routing Domains in Data Centre Networks. Morteza Kheirkhah. Informatics Department University of Sussex. Multi-Service Networks July 2011

Routing Domains in Data Centre Networks. Morteza Kheirkhah. Informatics Department University of Sussex. Multi-Service Networks July 2011 Routing Domains in Data Centre Networks Morteza Kheirkhah Informatics Department University of Sussex Multi-Service Networks July 2011 What is a Data Centre? Large-scale Data Centres (DC) consist of tens

More information

GRIN: Utilizing the Empty Half of Full Bisection Networks

GRIN: Utilizing the Empty Half of Full Bisection Networks GRIN: Utilizing the Empty Half of Full Bisection Networks Alexandru Agache University Politehnica of Bucharest Costin Raiciu University Politehnica of Bucharest Abstract Various full bisection designs

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Today Problems with TCP in the Data Center TCP Incast TPC timeouts Improvements

More information

Lecture 7: Data Center Networks

Lecture 7: Data Center Networks Lecture 7: Data Center Networks CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Nick Feamster Lecture 7 Overview Project discussion Data Centers overview Fat Tree paper discussion CSE

More information

Data Center Network Topologies II

Data Center Network Topologies II Data Center Network Topologies II Hakim Weatherspoon Associate Professor, Dept of Computer cience C 5413: High Performance ystems and Networking April 10, 2017 March 31, 2017 Agenda for semester Project

More information

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Flat Datacenter Storage Edmund B. Nightingale, Jeremy Elson, et al. 6.S897 Motivation Imagine a world with flat data storage Simple, Centralized, and easy to program Unfortunately, datacenter networks

More information

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden Networking Recap Storage Intro CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden Networking Recap Storage Intro Long Haul/Global Networking Speed of light is limiting; Latency has a lower bound (.) Throughput

More information

Advanced Computer Networks. Datacenter TCP

Advanced Computer Networks. Datacenter TCP Advanced Computer Networks 263 3501 00 Datacenter TCP Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 1 Oriana Riva, Department of Computer Science ETH Zürich Last week Datacenter Fabric Portland

More information

TCP Sendbuffer Advertising. Costin Raiciu University Politehnica of Bucharest

TCP Sendbuffer Advertising. Costin Raiciu University Politehnica of Bucharest TCP Sendbuffer Advertising Costin Raiciu University Politehnica of Bucharest Problem statement There is only so much we can find about about a connection by looking at in flight packets (losses, retransmissions,

More information

6.888: Lecture 4 Data Center Load Balancing

6.888: Lecture 4 Data Center Load Balancing 6.888: Lecture 4 Data Center Load Balancing Mohammad Alizadeh Spring 2016 1 MoDvaDon DC networks need large bisection bandwidth for distributed apps (big data, HPC, web services, etc) Multi-rooted Single-rooted

More information

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz

DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks. David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz DeTail Reducing the Tail of Flow Completion Times in Datacenter Networks David Zats, Tathagata Das, Prashanth Mohan, Dhruba Borthakur, Randy Katz 1 A Typical Facebook Page Modern pages have many components

More information

DevoFlow: Scaling Flow Management for High-Performance Networks

DevoFlow: Scaling Flow Management for High-Performance Networks DevoFlow: Scaling Flow Management for High-Performance Networks Andy Curtis Jeff Mogul Jean Tourrilhes Praveen Yalagandula Puneet Sharma Sujata Banerjee Software-defined networking Software-defined networking

More information

Shaping the Linux kernel MPTCP implementation towards upstream acceptance

Shaping the Linux kernel MPTCP implementation towards upstream acceptance Shaping the Linux kernel MPTCP implementation towards upstream acceptance Doru-Cristian Gucea Octavian Purdila Agenda MPTCP in a nutshell Use-cases Basics Initial Linux kernel implementation Implementation

More information

Optimizing Network Performance in Distributed Machine Learning. Luo Mai Chuntao Hong Paolo Costa

Optimizing Network Performance in Distributed Machine Learning. Luo Mai Chuntao Hong Paolo Costa Optimizing Network Performance in Distributed Machine Learning Luo Mai Chuntao Hong Paolo Costa Machine Learning Successful in many fields Online advertisement Spam filtering Fraud detection Image recognition

More information

TCP. The TCP Protocol. TCP Header Format. TCP Flow Control. TCP Congestion Control Datacenter TCP

TCP. The TCP Protocol. TCP Header Format. TCP Flow Control. TCP Congestion Control Datacenter TCP The TCP Protocol TCP TCP Congestion Control Datacenter TCP Frame format Connection management Flow control TCP reliable data transfer Congestion control TCP Header Format TCP Flow Control HL 32 bits Source

More information

Outline Computer Networking. TCP slow start. TCP modeling. TCP details AIMD. Congestion Avoidance. Lecture 18 TCP Performance Peter Steenkiste

Outline Computer Networking. TCP slow start. TCP modeling. TCP details AIMD. Congestion Avoidance. Lecture 18 TCP Performance Peter Steenkiste Outline 15-441 Computer Networking Lecture 18 TCP Performance Peter Steenkiste Fall 2010 www.cs.cmu.edu/~prs/15-441-f10 TCP congestion avoidance TCP slow start TCP modeling TCP details 2 AIMD Distributed,

More information

Micro load balancing in data centers with DRILL

Micro load balancing in data centers with DRILL Micro load balancing in data centers with DRILL Soudeh Ghorbani (UIUC) Brighten Godfrey (UIUC) Yashar Ganjali (University of Toronto) Amin Firoozshahian (Intel) Where should the load balancing functionality

More information

Data Centers. Tom Anderson

Data Centers. Tom Anderson Data Centers Tom Anderson Transport Clarification RPC messages can be arbitrary size Ex: ok to send a tree or a hash table Can require more than one packet sent/received We assume messages can be dropped,

More information

From Routing to Traffic Engineering

From Routing to Traffic Engineering 1 From Routing to Traffic Engineering Robert Soulé Advanced Networking Fall 2016 2 In the beginning B Goal: pair-wise connectivity (get packets from A to B) Approach: configure static rules in routers

More information

MPTCP: Design and Deployment. Day 11

MPTCP: Design and Deployment. Day 11 MPTCP: Design and Deployment Day 11 Use of Multipath TCP in ios 7 Multipath TCP in ios 7 Primary TCP connection over WiFi Backup TCP connection over cellular data Enables fail-over Improves performance

More information

L19 Data Center Network Architectures

L19 Data Center Network Architectures L19 Data Center Network Architectures by T.S.R.K. Prasad EA C451 Internetworking Technologies 27/09/2012 References / Acknowledgements [Feamster-DC] Prof. Nick Feamster, Data Center Networking, CS6250:

More information

TCP so far Computer Networking Outline. How Was TCP Able to Evolve

TCP so far Computer Networking Outline. How Was TCP Able to Evolve TCP so far 15-441 15-441 Computer Networking 15-641 Lecture 14: TCP Performance & Future Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 Reliable byte stream protocol Connection establishments

More information

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale

More information

DARD: A Practical Distributed Adaptive Routing Architecture for Datacenter Networks

DARD: A Practical Distributed Adaptive Routing Architecture for Datacenter Networks DARD: A Practical Distributed Adaptive Routing Architecture for Datacenter Networks Xin Wu and Xiaowei Yang Duke-CS-TR-20-0 {xinwu, xwy}@cs.duke.edu Abstract Datacenter networks typically have multiple

More information

Jellyfish: Networking Data Centers Randomly

Jellyfish: Networking Data Centers Randomly Jellyfish: Networking Data Centers Randomly Ankit Singla Chi-Yao Hong Lucian Popa Brighten Godfrey DIMACS Workshop on Systems and Networking Advances in Cloud Computing December 8 2011 The real stars...

More information

Responsive Multipath TCP in SDN-based Datacenters

Responsive Multipath TCP in SDN-based Datacenters Responsive Multipath TCP in SDN-based Datacenters Jingpu Duan, Zhi Wang, Chuan Wu Department of Computer Science, The University of Hong Kong, Email: {jpduan,cwu}@cs.hku.hk Graduate School at Shenzhen,

More information

TCP improvements for Data Center Networks

TCP improvements for Data Center Networks TCP improvements for Data Center Networks Tanmoy Das and rishna M. Sivalingam Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 636, India Emails: {tanmoy.justu@gmail.com,

More information

Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades

Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades Andy Curtis joint work with: S. Keshav Alejandro López-Ortiz Tommy Carpenter Mustafa Elsheikh University of Waterloo Motivation Data

More information

c-through: Part-time Optics in Data Centers

c-through: Part-time Optics in Data Centers Data Center Network Architecture c-through: Part-time Optics in Data Centers Guohui Wang 1, T. S. Eugene Ng 1, David G. Andersen 2, Michael Kaminsky 3, Konstantina Papagiannaki 3, Michael Kozuch 3, Michael

More information

Stateless Datacenter Load Balancing with Beamer

Stateless Datacenter Load Balancing with Beamer Stateless Datacenter Load Balancing with Beamer Vladimir Olteanu, Alexandru Agache, Andrei Voinescu, Costin Raiciu University Politehnica of Bucharest Thanks to Datacenter load balancing Datacenter load

More information

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692)

FLAT DATACENTER STORAGE CHANDNI MODI (FN8692) FLAT DATACENTER STORAGE CHANDNI MODI (FN8692) OUTLINE Flat datacenter storage Deterministic data placement in fds Metadata properties of fds Per-blob metadata in fds Dynamic Work Allocation in fds Replication

More information

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017 CSE 124: THE DATACENTER AS A COMPUTER George Porter November 20 and 22, 2017 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative

More information

Developing Multipath TCP. Damon Wischik, Mark Handley, Costin Raiciu

Developing Multipath TCP. Damon Wischik, Mark Handley, Costin Raiciu Developing Multipath TCP Damon Wischik, Mark Handley, Costin Raiciu 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2 * * 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000

More information

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era Massimiliano Sbaraglia Network Engineer Introduction In the cloud computing era, distributed architecture is used to handle operations of mass data, such as the storage, mining, querying, and searching

More information

Urban dashboardistics Damon Wischik Dept. of Computer Science and Technology

Urban dashboardistics Damon Wischik Dept. of Computer Science and Technology Urban dashboardistics Damon Wischik Dept. of Computer Science and Technology UNIVERSITY OF CAMBRIDGE Wardrop modelled route choice by drivers in a traffic network. Braess discovered paradoxical outcomes

More information

Improving Multipath TCP. PhD Thesis - Christoph Paasch

Improving Multipath TCP. PhD Thesis - Christoph Paasch Improving Multipath TCP PhD Thesis - Christoph Paasch The Internet is like a map... highly connected Communicating over the Internet A 1 A A 2 A 3 A 4 A 5 Multipath communication A 1 A 2 A 3 A 4 A 5 A

More information

Recap. TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness

Recap. TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness Recap TCP connection setup/teardown Sliding window, flow control Retransmission timeouts Fairness, max-min fairness AIMD achieves max-min fairness 81 Feedback Signals Several possible signals, with different

More information

Sinbad. Leveraging Endpoint Flexibility in Data-Intensive Clusters. Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica. UC Berkeley

Sinbad. Leveraging Endpoint Flexibility in Data-Intensive Clusters. Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica. UC Berkeley Sinbad Leveraging Endpoint Flexibility in Data-Intensive Clusters Mosharaf Chowdhury, Srikanth Kandula, Ion Stoica UC Berkeley Communication is Crucial for Analytics at Scale Performance Facebook analytics

More information

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London Multipath TCP How one little change can make: YouTube more robust your iphone service cheaper your home broadband quicker prevent the Internet from melting down enable remote brain surgery cure hyperbole

More information

BLEST: Blocking Estimation-based MPTCP Scheduler for Heterogeneous Networks. Simone Ferlin, Ozgu Alay, Olivier Mehani and Roksana Boreli

BLEST: Blocking Estimation-based MPTCP Scheduler for Heterogeneous Networks. Simone Ferlin, Ozgu Alay, Olivier Mehani and Roksana Boreli BLEST: Blocking Estimation-based MPTCP Scheduler for Heterogeneous Networks Simone Ferlin, Ozgu Alay, Olivier Mehani and Roksana Boreli Multipath TCP: What is it? TCP/IP is built around the notion of a

More information

STMS: Improving MPTCP Throughput Under Heterogenous Networks

STMS: Improving MPTCP Throughput Under Heterogenous Networks STMS: Improving MPTCP Throughput Under Heterogenous Networks Hang Shi 1, Yong Cui 1, Xin Wang 2, Yuming Hu 1, Minglong Dai 1, anzhao Wang 3, Kai Zheng 3 1 Tsinghua University, 2 Stony Brook University,

More information

Making Multipath TCP friendlier to Load Balancers and Anycast

Making Multipath TCP friendlier to Load Balancers and Anycast Making Multipath TCP friendlier to Load Balancers and Anycast Fabien Duchêne Olivier Bonaventure Université Catholique de Louvain ICNP 2017

More information

Hardware Evolution in Data Centers

Hardware Evolution in Data Centers Hardware Evolution in Data Centers 2004 2008 2011 2000 2013 2014 Trend towards customization Increase work done per dollar (CapEx + OpEx) Paolo Costa Rethinking the Network Stack for Rack-scale Computers

More information

SPAIN: High BW Data-Center Ethernet with Unmodified Switches. Praveen Yalagandula, HP Labs. Jayaram Mudigonda, HP Labs

SPAIN: High BW Data-Center Ethernet with Unmodified Switches. Praveen Yalagandula, HP Labs. Jayaram Mudigonda, HP Labs SPAIN: High BW Data-Center Ethernet with Unmodified Switches Jayaram Mudigonda, HP Labs Mohammad Al-Fares, UCSD Praveen Yalagandula, HP Labs Jeff Mogul, HP Labs 1 Copyright Copyright 2010 Hewlett-Packard

More information

Advanced Computer Networks Exercise Session 7. Qin Yin Spring Semester 2013

Advanced Computer Networks Exercise Session 7. Qin Yin Spring Semester 2013 Advanced Computer Networks 263-3501-00 Exercise Session 7 Qin Yin Spring Semester 2013 1 LAYER 7 SWITCHING 2 Challenge: accessing services Datacenters are designed to be scalable Datacenters are replicated

More information

CS244a: An Introduction to Computer Networks

CS244a: An Introduction to Computer Networks Grade: MC: 7: 8: 9: 10: 11: 12: 13: 14: Total: CS244a: An Introduction to Computer Networks Final Exam: Wednesday You are allowed 2 hours to complete this exam. (i) This exam is closed book and closed

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

Congestion Control of MPTCP: Performance Issues and a Possible Solution

Congestion Control of MPTCP: Performance Issues and a Possible Solution Congestion Control of MPTCP: Performance Issues and a Possible Solution Ramin Khalili, T-Labs/TU-Berlin, Germany R.khalili, N. Gast, M. Popovic, J.-Y Le Boudec, draft-khalili-mptcp-performance-issues-02

More information

DevoFlow: Scaling Flow Management for High Performance Networks

DevoFlow: Scaling Flow Management for High Performance Networks DevoFlow: Scaling Flow Management for High Performance Networks SDN Seminar David Sidler 08.04.2016 1 Smart, handles everything Controller Control plane Data plane Dump, forward based on rules Existing

More information

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage A Dell Technical White Paper Dell Database Engineering Solutions Anthony Fernandez April 2010 THIS

More information

Per-Packet Load Balancing in Data Center Networks

Per-Packet Load Balancing in Data Center Networks Per-Packet Load Balancing in Data Center Networks Yagiz Kaymak and Roberto Rojas-Cessa Abstract In this paper, we evaluate the performance of perpacket load in data center networks (DCNs). Throughput and

More information

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers

Overview. TCP & router queuing Computer Networking. TCP details. Workloads. TCP Performance. TCP Performance. Lecture 10 TCP & Routers Overview 15-441 Computer Networking TCP & router queuing Lecture 10 TCP & Routers TCP details Workloads Lecture 10: 09-30-2002 2 TCP Performance TCP Performance Can TCP saturate a link? Congestion control

More information

NaaS Network-as-a-Service in the Cloud

NaaS Network-as-a-Service in the Cloud NaaS Network-as-a-Service in the Cloud joint work with Matteo Migliavacca, Peter Pietzuch, and Alexander L. Wolf costa@imperial.ac.uk Motivation Mismatch between app. abstractions & network How the programmers

More information

Hedera: Dynamic Flow Scheduling for Data Center Networks

Hedera: Dynamic Flow Scheduling for Data Center Networks Hedera: Dynamic Flow Scheduling for Data Center Networks Mohammad Al-Fares Sivasankar Radhakrishnan Barath Raghavan Nelson Huang Amin Vahdat {malfares, sivasankar, nhuang, vahdat}@cs.ucsd.edu barath@cs.williams.edu

More information

Lecture 16: Data Center Network Architectures

Lecture 16: Data Center Network Architectures MIT 6.829: Computer Networks Fall 2017 Lecture 16: Data Center Network Architectures Scribe: Alex Lombardi, Danielle Olson, Nicholas Selby 1 Background on Data Centers Computing, storage, and networking

More information

XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet

XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet XCo: Explicit Coordination for Preventing Congestion in Data Center Ethernet Vijay Shankar Rajanna, Smit Shah, Anand Jahagirdar and Kartik Gopalan Computer Science, State University of New York at Binghamton

More information

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London

Multipath TCP. Prof. Mark Handley Dr. Damon Wischik Costin Raiciu University College London Multipath TCP How one little change can make: Google more robust your iphone service cheaper your home broadband quicker prevent the Internet from melting down enable remote brain surgery cure hyperbole

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

EXPERIENCES EVALUATING DCTCP. Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook

EXPERIENCES EVALUATING DCTCP. Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook EXPERIENCES EVALUATING DCTCP Lawrence Brakmo, Boris Burkov, Greg Leclercq and Murat Mugan Facebook INTRODUCTION Standard TCP congestion control, which only reacts to packet losses has many problems Can

More information

Page 1. Review: Internet Protocol Stack. Transport Layer Services. Design Issue EEC173B/ECS152C. Review: TCP

Page 1. Review: Internet Protocol Stack. Transport Layer Services. Design Issue EEC173B/ECS152C. Review: TCP EEC7B/ECS5C Review: Internet Protocol Stack Review: TCP Application Telnet FTP HTTP Transport Network Link Physical bits on wire TCP LAN IP UDP Packet radio Transport Layer Services Design Issue Underlying

More information

A Two-Phase Multipathing Scheme based on Genetic Algorithm for Data Center Networking

A Two-Phase Multipathing Scheme based on Genetic Algorithm for Data Center Networking A Two-Phase Multipathing Scheme based on Genetic Algorithm for Data Center Networking Lyno Henrique Gonçalvez Ferraz, Diogo Menezes Ferrazani Mattos, Otto Carlos Muniz Bandeira Duarte Grupo de Teleinformática

More information

Today s Agenda. Today s Agenda 9/8/17. Networking and Messaging

Today s Agenda. Today s Agenda 9/8/17. Networking and Messaging CS 686: Special Topics in Big Data Networking and Messaging Lecture 7 Today s Agenda Project 1 Updates Networking topics in Big Data Message formats and serialization techniques CS 686: Big Data 2 Today

More information

Improving the Robustness of TCP to Non-Congestion Events

Improving the Robustness of TCP to Non-Congestion Events Improving the Robustness of TCP to Non-Congestion Events Presented by : Sally Floyd floyd@acm.org For the Authors: Sumitha Bhandarkar A. L. Narasimha Reddy {sumitha,reddy}@ee.tamu.edu Problem Statement

More information

Explicit Path Control in Commodity Data Centers: Design and Applications

Explicit Path Control in Commodity Data Centers: Design and Applications Explicit Path Control in Commodity Data Centers: Design and Applications Shuihai Hu 1 Kai Chen 1 Haitao Wu 2 Wei Bai 1 Chang Lan 3 Hao Wang 1 Hongze Zhao 4 Chuanxiong Guo 2 1 SING Group @ Hong Kong University

More information

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

Implementation Experiments on HighSpeed and Parallel TCP

Implementation Experiments on HighSpeed and Parallel TCP Implementation Experiments on HighSpeed and TCP Zongsheng Zhang Go Hasegawa Masayuki Murata Osaka University Outline Introduction Background of and g Why to evaluate in a test-bed network A refined algorithm

More information

Jellyfish. networking data centers randomly. Brighten Godfrey UIUC Cisco Systems, September 12, [Photo: Kevin Raskoff]

Jellyfish. networking data centers randomly. Brighten Godfrey UIUC Cisco Systems, September 12, [Photo: Kevin Raskoff] Jellyfish networking data centers randomly Brighten Godfrey UIUC Cisco Systems, September 12, 2013 [Photo: Kevin Raskoff] Ask me about... Low latency networked systems Data plane verification (Veriflow)

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

Delay Controlled Elephant Flow Rerouting in Software Defined Network

Delay Controlled Elephant Flow Rerouting in Software Defined Network 1st International Conference on Advanced Information Technologies (ICAIT), Nov. 1-2, 2017, Yangon, Myanmar Delay Controlled Elephant Flow Rerouting in Software Defined Network Hnin Thiri Zaw, Aung Htein

More information

TCP and BBR. Geoff Huston APNIC

TCP and BBR. Geoff Huston APNIC TCP and BBR Geoff Huston APNIC Computer Networking is all about moving data The way in which data movement is controlled is a key characteristic of the network architecture The Internet protocol passed

More information

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in

DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in DB2 is a complex system, with a major impact upon your processing environment. There are substantial performance and instrumentation changes in versions 8 and 9. that must be used to measure, evaluate,

More information

Computer Networks. Sándor Laki ELTE-Ericsson Communication Networks Laboratory

Computer Networks. Sándor Laki ELTE-Ericsson Communication Networks Laboratory Computer Networks Sándor Laki ELTE-Ericsson Communication Networks Laboratory ELTE FI Department Of Information Systems lakis@elte.hu http://lakis.web.elte.hu Based on the slides of Laurent Vanbever. Further

More information

DFFR: A Distributed Load Balancer for Data Center Networks

DFFR: A Distributed Load Balancer for Data Center Networks DFFR: A Distributed Load Balancer for Data Center Networks Chung-Ming Cheung* Department of Computer Science University of Southern California Los Angeles, CA 90089 E-mail: chungmin@usc.edu Ka-Cheong Leung

More information

CS 6453 Network Fabric Presented by Ayush Dubey

CS 6453 Network Fabric Presented by Ayush Dubey CS 6453 Network Fabric Presented by Ayush Dubey Based on: 1. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google s Datacenter Network. Singh et al. SIGCOMM15. 2. Network Traffic

More information

SAFEGUARDING YOUR VIRTUALIZED RESOURCES ON THE CLOUD. May 2012

SAFEGUARDING YOUR VIRTUALIZED RESOURCES ON THE CLOUD. May 2012 SAFEGUARDING YOUR VIRTUALIZED RESOURCES ON THE CLOUD May 2012 THE ECONOMICS OF THE DATA CENTER Physical Server Installed Base (Millions) Logical Server Installed Base (Millions) Complexity and Operating

More information

A Network-aware Scheduler in Data-parallel Clusters for High Performance

A Network-aware Scheduler in Data-parallel Clusters for High Performance A Network-aware Scheduler in Data-parallel Clusters for High Performance Zhuozhao Li, Haiying Shen and Ankur Sarker Department of Computer Science University of Virginia May, 2018 1/61 Data-parallel clusters

More information

One More Bit Is Enough

One More Bit Is Enough One More Bit Is Enough Yong Xia, RPI Lakshmi Subramanian, UCB Ion Stoica, UCB Shiv Kalyanaraman, RPI SIGCOMM 05, Philadelphia, PA 08 / 23 / 2005 Motivation #1: TCP doesn t work well in high b/w or delay

More information

CS 43: Computer Networks. 19: TCP Flow and Congestion Control October 31, Nov 2, 2018

CS 43: Computer Networks. 19: TCP Flow and Congestion Control October 31, Nov 2, 2018 CS 43: Computer Networks 19: TCP Flow and Congestion Control October 31, Nov 2, 2018 Five-layer Internet Model Application: the application (e.g., the Web, Email) Transport: end-to-end connections, reliability

More information

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks Peng Wang, Hong Xu, Zhixiong Niu, Dongsu Han, Yongqiang Xiong ACM SoCC 2016, Oct 5-7, Santa Clara Motivation Datacenter networks

More information

Low Latency via Redundancy

Low Latency via Redundancy Low Latency via Redundancy Ashish Vulimiri, Philip Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, Scott Shenker Presenter: Meng Wang 2 Low Latency Is Important Injecting just 400 milliseconds

More information

SPARTA: Scalable Per-Address RouTing Architecture

SPARTA: Scalable Per-Address RouTing Architecture SPARTA: Scalable Per-Address RouTing Architecture John Carter Data Center Networking IBM Research - Austin IBM Research Science & Technology IBM Research activities related to SDN / OpenFlow IBM Research

More information

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552)

Packet Scheduling in Data Centers. Lecture 17, Computer Networks (198:552) Packet Scheduling in Data Centers Lecture 17, Computer Networks (198:552) Datacenter transport Goal: Complete flows quickly / meet deadlines Short flows (e.g., query, coordination) Large flows (e.g., data

More information

Application of SDN: Load Balancing & Traffic Engineering

Application of SDN: Load Balancing & Traffic Engineering Application of SDN: Load Balancing & Traffic Engineering Outline 1 OpenFlow-Based Server Load Balancing Gone Wild Introduction OpenFlow Solution Partitioning the Client Traffic Transitioning With Connection

More information

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Data Centers and Cloud Computing. Slides courtesy of Tim Wood Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Congestion Control. Brighten Godfrey CS 538 January Based in part on slides by Ion Stoica

Congestion Control. Brighten Godfrey CS 538 January Based in part on slides by Ion Stoica Congestion Control Brighten Godfrey CS 538 January 31 2018 Based in part on slides by Ion Stoica Announcements A starting point: the sliding window protocol TCP flow control Make sure receiving end can

More information

Page 1. Review: Internet Protocol Stack. Transport Layer Services EEC173B/ECS152C. Review: TCP. Transport Layer: Connectionless Service

Page 1. Review: Internet Protocol Stack. Transport Layer Services EEC173B/ECS152C. Review: TCP. Transport Layer: Connectionless Service EEC7B/ECS5C Review: Internet Protocol Stack Review: TCP Application Telnet FTP HTTP Transport Network Link Physical bits on wire TCP LAN IP UDP Packet radio Do you remember the various mechanisms we have

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

On the Efficacy of Fine-Grained Traffic Splitting Protocols in Data Center Networks

On the Efficacy of Fine-Grained Traffic Splitting Protocols in Data Center Networks Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2011 On the Efficacy of Fine-Grained Traffic Splitting Protocols in Data Center Networks

More information

TCP and BBR. Geoff Huston APNIC

TCP and BBR. Geoff Huston APNIC TCP and BBR Geoff Huston APNIC Computer Networking is all about moving data The way in which data movement is controlled is a key characteristic of the network architecture The Internet protocol passed

More information

Mahout: Low-Overhead Datacenter Traffic Management using End-Host-Based Elephant Detection. Vasileios Dimitrakis

Mahout: Low-Overhead Datacenter Traffic Management using End-Host-Based Elephant Detection. Vasileios Dimitrakis Mahout: Low-Overhead Datacenter Traffic Management using End-Host-Based Elephant Detection Vasileios Dimitrakis Vasileios Dimitrakis 2016-03-18 1 Introduction - Motivation (1) Vasileios Dimitrakis 2016-03-18

More information

Distributed Systems. 31. The Cloud: Infrastructure as a Service Paul Krzyzanowski. Rutgers University. Fall 2013

Distributed Systems. 31. The Cloud: Infrastructure as a Service Paul Krzyzanowski. Rutgers University. Fall 2013 Distributed Systems 31. The Cloud: Infrastructure as a Service Paul Krzyzanowski Rutgers University Fall 2013 December 12, 2014 2013 Paul Krzyzanowski 1 Motivation for the Cloud Self-service configuration

More information

Implementation and Evaluation of Coupled Congestion Control for Multipath TCP

Implementation and Evaluation of Coupled Congestion Control for Multipath TCP Implementation and Evaluation of Coupled Congestion Control for Multipath TCP Régel González Usach and Mirja Kühlewind Institute of Communication Networks and Computer Engineering (IKR), University of

More information

Data Center TCP (DCTCP)

Data Center TCP (DCTCP) Data Center TCP (DCTCP) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Microsoft Research Stanford University 1

More information

Computer Networks. Course Reference Model. Topic. Congestion What s the hold up? Nature of Congestion. Nature of Congestion 1/5/2015.

Computer Networks. Course Reference Model. Topic. Congestion What s the hold up? Nature of Congestion. Nature of Congestion 1/5/2015. Course Reference Model Computer Networks 7 Application Provides functions needed by users Zhang, Xinyu Fall 204 4 Transport Provides end-to-end delivery 3 Network Sends packets over multiple links School

More information

Concise Paper: Freeway: Adaptively Isolating the Elephant and Mice Flows on Different Transmission Paths

Concise Paper: Freeway: Adaptively Isolating the Elephant and Mice Flows on Different Transmission Paths 2014 IEEE 22nd International Conference on Network Protocols Concise Paper: Freeway: Adaptively Isolating the Elephant and Mice Flows on Different Transmission Paths Wei Wang,Yi Sun, Kai Zheng, Mohamed

More information

Congestion Control in Datacenters. Ahmed Saeed

Congestion Control in Datacenters. Ahmed Saeed Congestion Control in Datacenters Ahmed Saeed What is a Datacenter? Tens of thousands of machines in the same building (or adjacent buildings) Hundreds of switches connecting all machines What is a Datacenter?

More information

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information