Spanning Tree and Datacenters

Spanning Tree and Datacenters. EE 122, Fall 2013. Sylvia Ratnasamy. http://inst.eecs.berkeley.edu/~ee122/ Material thanks to Mike Freedman, Scott Shenker, Ion Stoica, Jennifer Rexford, and many other colleagues.

Last time: self-learning in Ethernet. Approach: flood the first packet toward the node you are trying to reach; loops are avoided by restricting flooding to a spanning tree. Flooding allows the packet to reach its destination, and in the process the switches learn how to reach the source of the flood.
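
To make the self-learning behavior concrete, here is a minimal Python sketch (an illustrative toy, not how a real switch is implemented; the class and the port numbering are invented):

```python
# A minimal sketch of Ethernet self-learning: flood when the destination
# is unknown, and learn the source's location from every frame that arrives.

class LearningSwitch:
    def __init__(self, ports):
        self.ports = ports              # e.g. [1, 2, 3]
        self.table = {}                 # MAC address -> port

    def handle_frame(self, src, dst, in_port):
        self.table[src] = in_port       # learn: src is reachable via in_port
        if dst in self.table:
            return [self.table[dst]]    # forward out the learned port
        # unknown destination: flood on all (spanning-tree) ports except in_port
        return [p for p in self.ports if p != in_port]

sw = LearningSwitch(ports=[1, 2, 3])
print(sw.handle_frame("A", "B", in_port=1))  # B unknown -> flood: [2, 3]
print(sw.handle_frame("B", "A", in_port=2))  # A was learned -> [1]
```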

Missing piece: building the spanning tree. It must be distributed and self-configuring: no global information, and it must adapt when failures occur.

A convenient spanning tree: the shortest paths to (or from) a node form a tree that covers all nodes in the graph, since no shortest path can have a cycle.

The algorithm has two aspects. Pick a root: choose the switch with the smallest identifier (MAC address); this will be the destination to which shortest paths go. Compute shortest paths to the root: only keep the links on shortest paths, breaking ties by picking the lowest neighbor switch address. Ethernet's spanning tree construction does both with a single algorithm.

Constructing a spanning tree. Messages have the form (Y, d, X): from node X, proposing Y as the root, and advertising a distance d to Y. Switches elect the node with the smallest identifier (MAC address) as root: the Y in the messages. Each switch then determines whether a link is on a shortest path from the root, using the d in the messages, and excludes the link from the tree if not.

Steps in the spanning tree algorithm. Initially, each switch proposes itself as the root; for example, switch X announces (X, 0, X). Switches update their view of the root: upon receiving message (Y, d, Z) from Z, check Y's id, and if Y's id < the current root's, set root = Y. Switches compute their distance from the root by adding 1 to the shortest distance received from a neighbor. If the root or the shortest distance to it changed, flood the updated message (Y, d+1, X).
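
These steps amount to a simple distributed computation. The toy Python simulation below runs the update rule in synchronous rounds on a made-up seven-switch topology (real STP runs asynchronously and also breaks ties on the lowest sender address, which this sketch omits):

```python
# Each switch repeatedly examines its neighbors' (root, distance) claims
# and keeps the best one it hears. The topology is an invented example.

links = {1: [3], 2: [4, 5], 3: [1, 5, 6], 4: [2, 7], 5: [2, 3], 6: [3], 7: [4]}
state = {sw: (sw, 0) for sw in links}   # each switch starts as (root=self, dist=0)

changed = True
while changed:
    changed = False
    for sw, neighbors in links.items():
        for nb in neighbors:
            root, dist = state[nb]
            # adopt the neighbor's claim if it names a smaller root,
            # or a shorter path to the same root
            if (root, dist + 1) < state[sw]:
                state[sw] = (root, dist + 1)
                changed = True

print(state)  # every switch converges on root 1 with its shortest distance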

[Example, with messages written (root, dist, from): seven switches 1-7 each start by announcing themselves, (1,0,1), (2,0,2), ..., (7,0,7); round by round, a switch that hears of a smaller root id adopts it, and after a few rounds every switch has converged on switch 1 as the root, recording its shortest distance to it and the neighbor it heard it from.]

A robust spanning tree algorithm must react to failures: failure of the root node, and failures of other switches and links. The root switch keeps sending messages, periodically re-announcing itself as the root, and failures are detected through timeouts (soft state): if there is no word from the root, time out and claim to be the root!
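
A minimal sketch of that soft-state logic (the interval and timeout values here are assumed for illustration; 802.1D has its own defaults):

```python
import time

ROOT_HELLO_INTERVAL = 2.0              # root re-announces itself this often (assumed)
ROOT_TIMEOUT = 3 * ROOT_HELLO_INTERVAL # declare the root dead after three silences

last_heard_from_root = time.monotonic()

def on_root_message():
    # refresh the soft state whenever a root announcement arrives
    global last_heard_from_root
    last_heard_from_root = time.monotonic()

def check_root_liveness(my_id):
    # if the root has gone silent, time out and claim the root role
    if time.monotonic() - last_heard_from_root > ROOT_TIMEOUT:
        return (my_id, 0, my_id)       # re-announce: (root=me, dist=0, from=me)
    return None
```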

Problems with spanning tree? Delay in re-establishing the spanning tree: the network is down until the tree is rebuilt. Much of the network bandwidth goes unused: forwarding happens only over the spanning tree. This is a real problem for datacenter networks.

Datacenters

What you need to know: the characteristics of a datacenter environment (goals, constraints, workloads, etc.); how and why DC networks are different from the WAN (e.g., latency, geography, autonomy); and how traditional solutions fare in this environment (e.g., IP, Ethernet, TCP, ARP, DHCP). Not the details of how datacenter networks operate.

Disclaimer: this material is emerging (not established) wisdom, and it is incomplete; many details of how and why datacenter networks operate aren't public.

What goes into a datacenter (network)? Servers organized in racks. Each rack has a 'Top of Rack' (ToR) switch. An 'aggregation fabric' interconnects the ToR switches, which connect to the outside via 'core' switches (note: the line between aggregation and core is blurry), with network redundancy of ~2x for robustness.

Example 1: Brocade reference design.

Example 2: Cisco reference design. [Figure: the Internet connects to core routers (CR), which connect to aggregation routers (AR), then to aggregation switches (S) and top-of-rack access switches (A), with ~40-80 servers per rack.]

Observations on DC architecture: a regular, well-defined arrangement; a hierarchical structure with rack/aggregation/core layers; mostly homogeneous within a layer; supports communication between servers, and between servers and the external world. Contrast this with the ad-hoc structure and heterogeneity of WANs.

Datacenters have been around for a while: 1961, the Information Processing Center at the National Bank of Arizona.

What's new?

SCALE!

How big, exactly? 1M servers [Microsoft]; less than Google, more than Amazon. > $1B to build one site [Facebook]. > $20M/month/site in operational costs [Microsoft '09]. But only O(10-100) sites.

What's new? Scale, and the service model: user-facing, revenue-generating services; multi-tenancy; jargon: SaaS, PaaS, DaaS, IaaS, ...

Implications. Scale: we need scalable solutions (duh), and improving efficiency and lowering cost is critical → 'scale out' solutions with commodity technologies. Service model: performance means $$; virtualization for isolation and portability.

Multi-tier applications: applications are decomposed into tasks, with many separate components running in parallel on different machines.

Componentization leads to different types of network traffic. North-South traffic: traffic between external clients and the datacenter, handled by front-end (web) servers, mid-tier application servers, and back-end databases. Traffic patterns are fairly stable, though with diurnal variations.

North-South traffic. [Figure: user requests from the Internet pass through a router to front-end proxies, then to web servers, data caches, and databases.]

Componentization also produces East-West traffic: traffic between machines in the datacenter, e.g., communication within big-data computations (such as MapReduce). This traffic may shift on small timescales (e.g., minutes).

East-West traffic. [Figure: data flows from distributed storage to map tasks, from map tasks to reduce tasks, and from reduce tasks back to distributed storage.]

[Figure: East-West traffic crossing the CR/AR/S/A tree topology between racks.]

East-West traffic and the network: reads from distributed storage into map tasks often don't cross the network (tasks are placed near their data); some fraction of the map-to-reduce traffic (typically 2/3) crosses the network; and writes from reduce tasks back to distributed storage always go over the network.

What's different about DC networks? Characteristics:
- Huge scale: ~20,000 switches/routers (contrast: AT&T has ~500 routers).
- Limited geographic scope.
- High bandwidth: 10/40/100G (contrast: DSL/WiFi).
- Very low RTT: tens of microseconds (contrast: hundreds of milliseconds in the WAN).
- Single administrative domain: you can deviate from standards, invent your own, etc., and green-field deployment is still feasible.
- Control over one or both endpoints: you can change (say) addressing or congestion control, and can add mechanisms for security/policy/etc. at the endpoints (typically in the hypervisor).
- Control over the placement of traffic sources/sinks: e.g., the MapReduce scheduler chooses where tasks run, which alters the traffic pattern (what traffic crosses which links).
- Regular/planned topologies (e.g., trees/fat-trees); contrast: ad-hoc WAN topologies, dictated by real-world geography and facilities.
- Limited heterogeneity: link speeds, technologies, latencies.

What's different about DC networks? Goals: extreme bisection bandwidth requirements. Recall all that East-West traffic; the target is that any server can communicate at its full link speed. The problem: a server's access link is 10 Gbps!

Full bisection bandwidth. [Figure: the tree topology again, with ~40-80 servers/rack on 10 Gbps links; supporting full bisection bandwidth requires on the order of 40x10 Gbps per rack at the aggregation level and 40x10x100 Gbps at the core.] Traditional tree topologies 'scale up': full bisection bandwidth is expensive, so tree topologies are typically oversubscribed.
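
A quick back-of-the-envelope check of those numbers (the rack count and ToR uplink count are assumed to match the figure's rough O(40x10x100) arithmetic):

```python
# Why full bisection bandwidth is expensive in a scale-up tree.

servers_per_rack = 40
link_gbps = 10
racks = 100                                       # assumed, per the O(...) figure

rack_demand_gbps = servers_per_rack * link_gbps   # O(40x10) = 400 Gbps per rack
core_demand_gbps = rack_demand_gbps * racks       # O(40x10x100) = 40,000 Gbps total
print(rack_demand_gbps, core_demand_gbps)

# If each ToR instead has only 4 x 10G uplinks (an assumed, typical design),
# the rack is oversubscribed:
print(rack_demand_gbps / (4 * link_gbps))         # 10.0, i.e. 10:1 oversubscription
```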

A scale-out design: build multi-stage 'fat trees' out of k-port switches, with k/2 ports up and k/2 down. This supports k^3/4 hosts; with 48-port switches, 27,648 hosts. All links are the same speed (e.g., 10 Gbps).
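
The sizing follows directly from the port counts; here is a small sanity check (the switch-count formula is the standard fat-tree result, derived here rather than taken from the slide):

```python
# Fat-tree sizing: k-port switches, k/2 ports up and k/2 down.

def fat_tree_hosts(k):
    # k pods; each pod has k/2 edge switches with k/2 hosts each:
    # k * (k/2) * (k/2) = k^3 / 4 hosts
    return k ** 3 // 4

def fat_tree_switches(k):
    # k pods of k switches (edge + aggregation) plus (k/2)^2 core switches
    return k * k + (k // 2) ** 2

print(fat_tree_hosts(48))     # 27648, matching the slide
print(fat_tree_switches(48))  # 2880 commodity switches (derived, not from slide)
```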

Full bisection bandwidth is not sufficient: to realize full bisection throughput, routing must spread traffic across the paths. Enter load-balanced routing. How? (1) Let the network split traffic/flows at random (e.g., ECMP, RFC 2991/2992). How? (2) Centralized flow scheduling (e.g., with SDN; Dec 2nd lecture). And many more research proposals.
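
Option (1) is typically realized by hashing each flow's 5-tuple, so all packets of a flow follow one path while different flows spread across the equal-cost next hops. A sketch of the idea (real switches hash in hardware; zlib.crc32 here is just a stand-in):

```python
import zlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    # hash the flow's 5-tuple and pick one of the equal-cost next hops;
    # every packet of the same flow gets the same answer
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

paths = ["agg1", "agg2", "agg3", "agg4"]
print(ecmp_next_hop("10.0.1.5", "10.0.9.7", 6, 51112, 80, paths))
print(ecmp_next_hop("10.0.1.5", "10.0.9.7", 6, 51113, 80, paths))  # may differ
```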

What's different about DC networks? Goals (continued): extreme latency requirements. Real money is on the line, and the current target is 1μs RTTs. How? Cut-through switches are making a comeback (lecture 2!), reducing switching time. How? Avoid congestion, reducing queuing delay. How? Fix TCP timers (e.g., the default timeout is 500ms!). How? Fix or replace TCP to fill the pipe more rapidly.

What's different about DC networks? Goals (continued): predictable, deterministic performance. Your packet will arrive within X ms, or not at all; your VM will always see at least Y Gbps of throughput. This resurrects the 'best effort' vs. 'Quality of Service' debates; how to achieve it is still an open question.

What's different about DC networks? Goals (continued): differentiating between tenants is key. E.g., no traffic between VMs of tenant A and tenant B; tenant X cannot consume more than X Gbps; tenant Y's traffic is low priority. We'll see how in the lecture on SDN.
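
A toy sketch of what such per-tenant policy checks might look like (the tenant names, policy-table format, and rate numbers are all invented for illustration):

```python
# Invented per-tenant policy table: isolation pairs and rate caps.
POLICY = {
    ("tenantA", "tenantB"): "drop",     # no traffic between A and B
    "tenantX_rate_cap_gbps": 5,         # X cannot consume more than 5 Gbps
}

def allow(src_tenant, dst_tenant, current_rate_gbps=0.0):
    if POLICY.get((src_tenant, dst_tenant)) == "drop":
        return False                    # tenants are isolated from each other
    cap = POLICY.get(f"{src_tenant}_rate_cap_gbps")
    if cap is not None and current_rate_gbps >= cap:
        return False                    # over the tenant's bandwidth cap
    return True

print(allow("tenantA", "tenantB"))       # False: isolated
print(allow("tenantX", "tenantZ", 6.0))  # False: over the 5 Gbps cap
```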

What's different about DC networks? Goals (continued): scalability, of course. Q: how is Ethernet's spanning tree looking now?

What's different about DC networks? Goals (continued): cost/efficiency, with a focus on commodity solutions and ease of management, though there is some debate over its importance in the network case.

Announcements/Summary. I will not have office hours tomorrow; email me to schedule an alternate time. Recap: datacenters bring new characteristics and goals, some liberating, some constraining; scalability is the baseline requirement; more emphasis on performance; less emphasis on heterogeneity; less emphasis on interoperability.