Data Center Networking


Cloud Computing Data Centers
Prof. Andrzej Duda

What's a Cloud Service Data Center?
Electrical power and economies of scale determine total data center size: 50,000-200,000 servers today. Servers are divided up among hundreds of different services. Scale-out is paramount: some services have 10s of servers, some have 10s of 1000s. (Figure by Advanced Data Centers.)

Data Center Costs

Amortized Cost*   Component              Sub-components
~45%              Servers                CPU, memory, disk
~25%              Power infrastructure   UPS, cooling, power distribution
~15%              Power draw             Electrical utility costs
~15%              Network                Switches, links, transit

*3-yr amortization for servers, 15 yr for infrastructure; 5% cost of money.

Total cost varies, upwards of $1/4B for a mega data center; server costs dominate, and network costs are significant. Long provisioning timescales: new servers are purchased quarterly at best.

Overall Data Center Design Goal: Agility
Any service, any server: turn the servers into a single large fungible pool, and let services "breathe", dynamically expanding and contracting their footprint as needed. We already see how this is done in terms of Google's GFS, BigTable, and MapReduce.
Benefits: increase service developer productivity, lower cost, achieve high performance and reliability. These are the three motivators for most data center infrastructure projects!

Cloud Computing
Elastic resources: expand and contract resources; pay-per-use; infrastructure on demand.
Multi-tenancy: multiple independent users; security and resource isolation; amortize the cost of the (shared) infrastructure.
Flexible service management. Resiliency: isolate failure of servers and storage. Workload movement: move work to other locations.
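As a rough illustration of the amortization arithmetic in the cost table, here is a small sketch of ours (not from the slides): the ~$250M total, the percentage split, and the 3-/15-year periods come from the slide; straight-line amortization is assumed, and the 5% cost of money is ignored.

```python
# Sketch: rough monthly cost shares of a ~$250M mega data center,
# using the split and amortization periods from the table above.
# Assumes straight-line amortization; ignores the 5% cost of money.
TOTAL = 250e6
items = [
    ("servers",              0.45, 3),    # amortized over 3 years
    ("power infrastructure", 0.25, 15),   # amortized over 15 years
    ("power draw",           0.15, None), # utility costs, ongoing opex
    ("network",              0.15, 15),
]
for name, share, years in items:
    if years:
        print(f"{name:22s} ~${TOTAL * share / (years * 12) / 1e6:4.1f}M/month")
    else:
        print(f"{name:22s} {share:.0%} of total cost, paid as ongoing opex")
```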

Web Services
From the traditional web to web services (or SOA): no longer simply file (or web page) downloads; pages are often dynamically generated, with more complicated objects (e.g., Flash videos used in YouTube). HTTP is used simply as a transfer protocol, with many other application protocols layered on top of it: web services & SOA (service-oriented architecture).
A schematic representation of modern web services: web rendering, request routing, database, storage, computing, aggregators, ... behind the data center network, split into front-end and back-end.

Networking Objectives
1. Uniform high capacity: capacity between servers is limited only by their NICs; no need to consider topology when adding servers. In other words, high capacity between any two servers, no matter which racks they are located in!
2. Performance isolation: traffic of one service should be unaffected by others.
3. Ease of management: plug-&-play (layer-2 semantics); flat addressing, so any server can have any IP address; server configuration is the same as in a LAN; legacy applications depending on broadcast must work.

What goes into a datacenter (network)?
Servers organized in racks. Each rack has a 'Top of Rack' (ToR) switch. An 'aggregation fabric' interconnects the ToR switches.

Top-of-Rack Architecture
Rack of servers: a rack has ~20-40 commodity servers and a top-of-rack switch. Modular design: preconfigured racks; power, network, and storage cabling; aggregate to the next level. Example of a ToR switch with 48 ports.

SCALE!
Example 1: Brocade reference design.

Canonical Data Center Architecture
[Figure: Core (L3), Aggregation (L2), Edge (L2), Top-of-Rack switches, application servers.]

How big exactly?
1M servers [Microsoft]; less than Google, more than Amazon. > $1B to build one site [Facebook]. > $20M/month/site operational costs [Microsoft '09]. But only O(10-100) sites.

Cisco Data Center Architecture: Example Configuration
A data center with 11,520 machines, organized in racks and rows: 24 rows, each row with 12 racks, each rack with 40 blades. Machines in a rack are interconnected with a ToR switch (access layer): a ToR switch with 48 GbE ports and 4 10GbE uplinks. ToR switches connect to End-of-Row (EoR) switches via 1-4 10GigE uplinks (aggregation layer); for fault tolerance, a ToR might be connected to EoR switches of different rows. EoR switches are typically 10GbE; to support 12 ToR switches, an EoR switch would have to have 96 ports (4*12*2). Core switch layer: 12 10GigE switches with 96 ports each (24*48 = 1,152 ports in total).

Componentization leads to different types of network traffic.
North-South traffic: traffic between external clients and the datacenter. Handled by front-end (web) servers, mid-tier application servers, and back-end databases. Traffic patterns are fairly stable, though with diurnal variations.
[Figure: wide-area network connecting clients to data centers; routers and DNS servers perform DNS-based site selection.]
[Figure: North-South traffic: user requests flow from a router through a front-end proxy to web servers, data caches, and databases.]
East-West traffic: traffic between machines in the datacenter, e.g., communication within big-data computations (e.g., MapReduce). Traffic may shift on small timescales (e.g., minutes).
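To make the sizing arithmetic concrete, this small sketch of ours rechecks the numbers from the example configuration above; the 1 Gb/s-per-blade figure is our assumption.

```python
# Recheck the example-configuration arithmetic from the slide above.
rows, racks_per_row, blades_per_rack = 24, 12, 40
machines = rows * racks_per_row * blades_per_rack
print(machines)  # 11520, as on the slide

# Each ToR switch: 48 GbE server ports, 4 x 10GbE uplinks.
# Assuming 40 blades at 1 Gb/s each (our assumption):
downlink_gbps = 40 * 1
uplink_gbps = 4 * 10
print(downlink_gbps / uplink_gbps)  # 1.0 -> no oversubscription with all 4 uplinks

# EoR ports needed: 4 uplinks x 12 ToRs x 2 (redundant EoR pairs) = 96
print(4 * 12 * 2)  # 96, matching the slide

# Core layer: 12 switches x 96 ports equals the slide's 24*48 port count.
assert 12 * 96 == 24 * 48
```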

East-West Traffic
[Figure: distributed storage -> map tasks -> reduce tasks -> distributed storage.]
Some of this traffic often doesn't cross the network, some fraction (typically 2/3) crosses the network, and the shuffle between map and reduce tasks always goes over the network.

What makes data center networks different?
- Huge scale: ~20,000 switches/routers (contrast: AT&T ~500 routers).
- High bandwidth: 10/40/100G (contrast: cable/ADSL/WiFi).
- Very low RTT: 10s of microseconds (contrast: 100s of milliseconds in the WAN).
- Limited geographic scope.
- Single administrative domain: can deviate from standards, invent your own, etc.; green-field deployment is still feasible.
- Control over one/both endpoints: can change (say) addressing, congestion control, etc.; can add mechanisms for security/policy/etc. at the endpoints (typically in the hypervisor).
- Control over the placement of traffic sources/sinks: e.g., the map-reduce scheduler chooses where tasks run, which alters the traffic pattern (what traffic crosses which links).
- Regular/planned topologies (e.g., trees/fat-trees); contrast: ad-hoc WAN topologies (dictated by real-world geography and facilities).
- Limited heterogeneity: link speeds, technologies, latencies, ...

High Bandwidth: the DC Network as Just a Giant Switch
Ideal: each server can talk to any other server at its full access link rate. Conceptually, the DC network is one giant switch into which every host (H1...H9 in the figure) plugs directly, with full rate on each host's TX and RX sides. (Slides from: Alizadeh, HotNets 2012.)
Such a switch would require 10 Pbit/s of capacity: 1M ports (one port per server) at 10 Gbps per port. Practical approach: build a network of switches (a "fabric") with high bisection bandwidth, where each switch has a practical number of ports and practical link speeds.

Performance Properties of a Network: Bisection Bandwidth
Bisection bandwidth: the bandwidth across the smallest cut that divides the network into two equal halves, i.e., the bandwidth across the narrowest part of the network. [Figure: example bisection cuts; depending on the topology, bisection bw = link bw or bisection bw = sqrt(N) * link bw; a cut that does not divide the network in half is not a bisection cut.]
Why is it relevant? If traffic is completely random, the probability of a message going across the two halves is 1/2; if all nodes send a message, the bisection will have to carry N/2 of them.

Full Bisection Bandwidth
Extreme bisection bandwidth requirements: recall all that east-west traffic. Target: any server can communicate at its full link speed. Problem: a server's access link is 10 Gbps! [Figure: aggregation levels would have to carry O(40x10) Gbps and O(40x10x100) Gbps.]
Traditional tree topologies "scale up": ~40-80 servers/rack; full bisection bandwidth is expensive; typically, tree topologies are oversubscribed.
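The capacity figures above are simple arithmetic; this short sketch of ours just replays them.

```python
# Sketch: the "giant switch" arithmetic from the slides above.
ports = 1_000_000          # one port per server
port_gbps = 10             # 10 Gb/s per port
total_gbps = ports * port_gbps
print(f"{total_gbps / 1e6:.0f} Pbit/s")  # 10 Pbit/s of switching capacity

# With uniformly random traffic, half of all messages cross the bisection,
# so full bisection bandwidth must carry N/2 flows at full link rate.
bisection_gbps = (ports / 2) * port_gbps
print(f"bisection: {bisection_gbps / 1e6:.0f} Pbit/s")  # 5 Pbit/s
```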

A Scale-Out Design
Build multi-stage "fat trees" out of k-port switches: k/2 ports up, k/2 down. Supports k^3/4 hosts: with 48 ports, 27,648 hosts. All links are the same speed (e.g., 10 Gbps).

Full Bisection Bandwidth Is Not Sufficient
To realize full bisectional throughput, routing must spread traffic across paths. Enter load-balanced routing. How? (1) Let the network split traffic/flows at random (e.g., the ECMP protocol -- RFC 2991/2992). How? (2) Centralized flow scheduling? Many more research proposals.

Goals
Extreme bisection bandwidth requirements. Extreme latency requirements: real money on the line; current target: 1 µs RTTs. How? Cut-through switches are making a comeback. How? Avoid congestion. How? Fix TCP timers (e.g., the default timeout is 500 ms!). How? Fix/replace TCP to more rapidly fill the pipe.

Advanced Data Center Architectures

[Figure: Cisco data center architecture.]

Reminder: Layer 2 vs. Layer 3
Ethernet switching (layer 2): cheaper switch equipment; fixed addresses and auto-configuration; seamless mobility, migration, and failover.
IP routing (layer 3): scalability through hierarchical addressing; efficiency through shortest-path routing; multipath routing through Equal-Cost MultiPath (ECMP).
So, as in enterprises, data centers often connect layer-2 islands by IP routers.
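For the fat-tree sizing quoted above (k-port switches, k^3/4 hosts), here is a small sketch using the standard 3-tier fat-tree formulas; the function itself is ours, for illustration.

```python
# Sketch: standard 3-tier fat-tree sizing from k-port switches.
def fat_tree(k: int):
    hosts = k ** 3 // 4          # k pods, each with (k/2)^2 hosts
    edge = agg = k * (k // 2)    # k pods, k/2 switches per layer per pod
    core = (k // 2) ** 2
    return hosts, edge + agg + core

hosts, switches = fat_tree(48)
print(hosts, switches)  # 27648 hosts (as on the slide), 2880 switches
```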

Load Balancers
Spread load over server replicas: present a single public address (VIP) for a service, and direct each request to a server replica. [Figure: virtual IP (VIP) 192.121.10.1 in front of replicas 10.10.10.1, 10.10.10.2, 10.10.10.3.]

Is the Current DC Architecture Adequate?
Hierarchical network; 1+1 redundancy. Equipment higher in the hierarchy handles more traffic and is more expensive, with more effort made at availability => a scale-up design. Servers connect via 1 Gbps UTP to top-of-rack switches; other links are a mix of 1G and 10G, fiber and copper. Uniform high capacity? Performance isolation (typically via VLANs)? Agility in terms of dynamically adding or shrinking servers? Agility in terms of adapting to failures and to traffic dynamics? Ease of management?
[Figure: data center with layer 3 (L3 core routers, L3 access routers) above layer 2 (L2 switches, load balancers, top-of-rack switches).]

Internal Fragmentation Prevents Applications from Dynamically Growing/Shrinking
VLANs are used to isolate properties from each other; IP addresses are topologically determined by the access routers. Reconfiguration of IPs and VLAN trunks is painful, error-prone, slow, and often manual.

No Performance Isolation
Collateral damage: VLANs typically provide only reachability isolation, so one service sending/receiving too much traffic hurts all services sharing its subtree.

The Network Has Limited Server-to-Server Capacity, and Requires Traffic Engineering to Use What It Has
Capacity mismatch: 10:1 oversubscription or worse (80:1, 240:1). [Figure: oversubscription of roughly 5:1 at the ToR uplinks, ~40:1 at aggregation, and ~200:1 toward the core.]
Data centers run two kinds of applications: outward-facing (serving web pages to users) and internal computation (computing search indexes; think HPC).
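Returning to the load-balancer slide above, here is a minimal sketch of the VIP-to-replica mapping: the addresses come from the figure, while the hash-based replica choice is our illustrative assumption (real load balancers also track replica health and per-flow state).

```python
# Minimal sketch: map requests arriving at a VIP onto server replicas.
import hashlib

VIP = "192.121.10.1"
REPLICAS = ["10.10.10.1", "10.10.10.2", "10.10.10.3"]

def pick_replica(src_ip: str, src_port: int) -> str:
    # Hash the client identity so one flow sticks to one replica.
    key = f"{src_ip}:{src_port}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return REPLICAS[digest % len(REPLICAS)]

print(pick_replica("203.0.113.7", 51812))
```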

The Network Needs Greater Bisection BW, and Requires Traffic Engineering to Use What It Has
Data centers run two kinds of applications: outward-facing (serving web pages to users) and internal computation (computing search indexes; think HPC). Dynamic reassignment of servers and Map/Reduce-style computations mean the traffic matrix is constantly changing; explicit traffic engineering is a nightmare.

Objectives for the Network of a Single Data Center
Developers want network virtualization: a mental model where all their servers, and only their servers, are plugged into an Ethernet switch.
Uniform high capacity: capacity between two servers limited only by their NICs; no need to consider topology when adding servers.
Performance isolation: traffic of one service should be unaffected by others.
Layer-2 semantics: flat addressing, so any server can have any IP address; server configuration is the same as in a LAN; legacy applications depending on broadcast must work.

Monsoon
The Monsoon approach: layer-2 based, using commodity switches. The hierarchy has two types of switches: access switches (top of rack) and load-balancing switches. Eliminate spanning tree: flat routing allows the network to take advantage of path diversity. Prevent MAC address learning: a Monsoon agent distributes data-plane information; ToR switches only need to learn addresses for the intermediate switches, and core switches learn addresses for the ToR switches. Support efficient grouping of hosts (a VLAN replacement).

Monsoon Components
Top-of-Rack switch: aggregates traffic from 20 end hosts in a rack; performs IP-to-MAC translation. Intermediate switch: disperses traffic; balances traffic among switches; used for Valiant load balancing. Decision element: places routes in switches; maintains a directory service of IP-to-MAC mappings. End host: performs IP-to-MAC lookup.
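A minimal sketch of the IP-to-MAC directory lookup just described, with invented addresses and names: in the Monsoon design above, the decision element maintains the directory, and end hosts resolve through it instead of relying on ARP-style MAC learning.

```python
# Sketch: directory service of IP -> MAC mappings (entries are made up).
DIRECTORY = {
    "10.0.0.11": "02:00:0a:00:00:0b",
    "10.0.0.12": "02:00:0a:00:00:0c",
}

def resolve(ip: str) -> str:
    """End-host IP-to-MAC lookup against the directory service."""
    try:
        return DIRECTORY[ip]
    except KeyError:
        # In Monsoon terms: ask the decision element to populate the entry.
        raise LookupError(f"no mapping for {ip}")

print(resolve("10.0.0.11"))
```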

Valiant Load Balancing
[Figure: N nodes (1, 2, 3, 4, ..., N) in a full mesh; each node has ingress/egress capacity r, and each mesh link has capacity 2r/N.]

Exercise: interconnection structure. You must set up a network peering N x N, N = 10, where each connected source can generate traffic up to 1 Gb/s. What would be an interconnection structure based on Ethernet switches that have the following characteristics: 1 port of 1 Gb/s and 10 ports of 200 Mb/s?

Exercise: interconnection structure. You have N Ethernet switches with 100 ports of 1 Gb/s. You need to design an interconnection structure that can support any traffic matrix. What is the largest single network you can build (maximum number of server-facing ports R)? How many switches N are required to build the largest possible network?

Switch Topology and Forwarding
[Figure: example topology of layer-2 switches connecting 103,680 servers; it uses Valiant Load Balancing to support any feasible traffic matrix.]
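A toy sketch of the two-phase Valiant Load Balancing idea behind the figure and exercises above: every flow is first sent to a randomly chosen intermediate node, then on to its destination, which is what lets a full mesh with per-link capacity 2r/N carry any feasible traffic matrix. The names and values here are ours.

```python
# Sketch: two-phase VLB routing over a full mesh of N nodes.
import random

N = 10    # nodes in the mesh
r = 1.0   # ingress capacity per node, Gb/s

def vlb_path(src: int, dst: int) -> list:
    """Route src -> random intermediate -> dst (skip trivial hops)."""
    mid = random.randrange(N)
    if mid in (src, dst):
        return [src, dst]
    return [src, mid, dst]

print(vlb_path(0, 7))
print(f"required per-link capacity: {2 * r / N:.1f} Gb/s")  # 0.2 Gb/s
```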

IEEE 802.1ah (Provider Backbone Bridges): Context
[Figure: a Provider Backbone Bridge Network (802.1ah) with 802.1ad interfaces interconnecting several Provider Bridge Networks (802.1ad); PB = Provider Bridge (as defined by 802.1ad); the edge device is the Provider Backbone Bridge Edge (as defined by 802.1ah); 802.1aj links also appear.] CFM, Connectivity Fault Management (802.1ag), runs end-to-end.

Provider Backbone Bridge Network: Tunnels Between PBNs
[Figure: S-VLANs 1-4 from several PBNs carried across the backbone in B-VLAN X and B-VLAN Y between Provider Backbone Bridge Edges.] Each B-VLAN carries many S-VLANs. S-VLANs may be carried on a subset of a B-VLAN (i.e., all point-to-point S-VLANs could be carried on a single multipoint B-VLAN providing connection to all end points).

Agreed Terminology for External Connections
IEEE 802.1ad terminology:
C-TAG: Customer VLAN tag; C-VLAN: Customer VLAN; C-VID: Customer VLAN ID.
S-TAG: Service VLAN tag; S-VLAN: Service VLAN; S-VID: Service VLAN ID.
Additional Provider Backbone Bridge terminology:
I-TAG: Extended service tag; I-SID: Extended service ID; C-MAC: Customer MAC address; B-MAC: Backbone MAC address; B-VLAN: Backbone VLAN (tunnel); B-TAG: Backbone tag field; B-VID: Backbone VLAN ID (tunnel).

Network path for external connections: ECMP provides resiliency at layer 3 for access router failures; traffic is routed to nodes inside the data center with the help of ingress servers.
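To make the terminology concrete, here is a simplified sketch of 802.1ah "MAC-in-MAC" encapsulation: a customer frame (C-MAC addresses, S-VID) is wrapped in a backbone header (B-MAC addresses, B-VID, I-SID). The field layout is our illustration, not a bit-accurate frame format.

```python
# Sketch: 802.1ah encapsulation, simplified to the fields named above.
from dataclasses import dataclass

@dataclass
class CustomerFrame:
    c_dst_mac: str   # C-MAC destination
    c_src_mac: str   # C-MAC source
    s_vid: int       # S-VID from the S-TAG
    payload: bytes

@dataclass
class BackboneFrame:
    b_dst_mac: str   # B-MAC of the egress backbone edge
    b_src_mac: str   # B-MAC of the ingress backbone edge
    b_vid: int       # B-VID: which backbone VLAN (tunnel) to use
    i_sid: int       # I-SID: extended service instance ID
    inner: CustomerFrame

def encapsulate(frame: CustomerFrame, b_dst: str, b_src: str,
                b_vid: int, i_sid: int) -> BackboneFrame:
    return BackboneFrame(b_dst, b_src, b_vid, i_sid, frame)

inner = CustomerFrame("aa:00:00:00:00:01", "aa:00:00:00:00:02", 3, b"data")
outer = encapsulate(inner, "be:00:00:00:00:01", "be:00:00:00:00:02", 100, 1234)
print(outer.b_vid, outer.i_sid, outer.inner.s_vid)
```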

Equal-Cost Multi-Path (ECMP) Routing
[Figure: routers R1, R2, R3, R4, with two equal-cost paths from R1 to R4, one via R2 and one via R3.] Three packets arrive at R1 for destination R4: P1: IP dst=R4, TCP dst port=22; P2: IP dst=R4, TCP dst port=80; P3: IP dst=R4, TCP dst port=80. ECMP hashes packet header fields to pick among the equal-cost next hops, so packets of the same flow take the same path while different flows may be spread across R2 and R3.

How Routing Works (Monsoon)
The end host checks its flow cache for the MAC of the flow; if not found, it asks the Monsoon agent to resolve it. The agent returns a list of MACs for the server and MACs for the intermediate routers. The host sends traffic to the top-of-rack switch; traffic is triple encapsulated: it is sent to an intermediate destination, then to the top-of-rack switch of the destination.

Monsoon Agent Lookup
[Figure: the networking stack of a host.] The M
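A toy sketch of ours for the ECMP example above: hash the 5-tuple and use it to index the list of equal-cost next hops, which keeps a flow's packets on one path. Real routers do this in hardware; the addresses and ports below are made up.

```python
# Sketch: per-flow ECMP next-hop selection for the R1 -> R4 example.
import hashlib

NEXT_HOPS = ["R2", "R3"]   # the two equal-cost paths from R1 toward R4

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port):
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    h = int(hashlib.sha256(key).hexdigest(), 16)
    return NEXT_HOPS[h % len(NEXT_HOPS)]

# The three packets from the slide (source fields are invented):
print(ecmp_next_hop("10.0.0.1", "R4", "tcp", 40000, 22))  # P1
print(ecmp_next_hop("10.0.0.1", "R4", "tcp", 40001, 80))  # P2
print(ecmp_next_hop("10.0.0.1", "R4", "tcp", 40001, 80))  # P3: same 5-tuple
# as P2, so it hashes to the same next hop and stays on the same path.
```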