L19 Data Center Network Architectures


L19 Data Center Network Architectures by T.S.R.K. Prasad EA C451 Internetworking Technologies 27/09/2012

References / Acknowledgements
- [Feamster-DC] Prof. Nick Feamster, Data Center Networking, CS6250: Computer Networking, Fall 2011.
- [Al-Fares] Mohammad Al-Fares, Alexander Loukissas and Amin Vahdat, A Scalable, Commodity Data Center Network Architecture, SIGCOMM 2008.
- [Hamilton] James Hamilton, An Architecture for Modular Data Centers, CIDR 2007.
- [Benson] Theophilus Benson, Aditya Akella and David A. Maltz, Network Traffic Characteristics of Data Centers in the Wild, IMC 2010.

Optional Readings
- [Gill] Phillipa Gill, Navendu Jain and Nachiappan Nagappan, Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications, SIGCOMM 2011.
- [Cisco-DCI] Cisco Data Center Infrastructure 2.5 Design Guide, 2007.
- [Greenberg] Albert Greenberg, James Hamilton, David A. Maltz and Parveen Patel, The Cost of a Cloud: Research Problems in Data Center Networks.

Self-Study
- [Greenberg2] Albert Greenberg, Parantap Lahiri, David A. Maltz, Parveen Patel and Sudipta Sengupta, Towards a Next Generation Data Center Architecture: Scalability and Commoditization, PRESTO 2008.

Presentation Overview
- Motivation
- Basic Architecture
- Architectural Improvements
- Traffic Characteristics

Cloud Computing
- Elastic resources
  - Expand and contract resources
  - Pay-per-use
  - Infrastructure on demand
- Multi-tenancy
  - Multiple independent users
  - Security and resource isolation
  - Amortize the cost of the (shared) infrastructure
- Flexible service management
  - Resiliency: isolate failure of servers and storage
  - Workload movement: move work to other locations

Trend of Data Centers
- Data centers will grow larger and larger in the cloud computing era to benefit from economies of scale
  - From tens of thousands to hundreds of thousands of servers
- Most important things in data center management
  - Economies of scale
  - High utilization of equipment
  - Maximize revenue
  - Amortize administration cost
  - Low power consumption (http://www.energystar.gov/ia/partners/prod_development/downloads/epa_datacenter_report_congress_final1.pdf)
- Sources
  - J. Nicholas Hoover, InformationWeek, June 17, 2008
  - http://gigaom.com/cloud/google-builds-megadata-center-in-finland/ (200 million Euro)
  - http://www.datacenterknowledge.com

Cloud Service Models
- Software as a Service (SaaS)
  - Provider licenses applications to users as a service
  - e.g., customer relationship management, email, ...
  - Avoid costs of installation, maintenance, patches, ...
- Platform as a Service (PaaS)
  - Provider offers software platform for building applications
  - e.g., Google's App Engine
  - Avoid worrying about scalability of platform
- Infrastructure as a Service (IaaS)
  - Provider offers raw computing, storage, and network
  - e.g., Amazon's Elastic Compute Cloud (EC2)
  - Avoid buying servers and estimating resource needs

Multi-Tier Applications
- Applications consist of tasks
  - Many separate components
  - Running on different machines
- Commodity computers
  - Many general-purpose computers
  - Not one big mainframe
  - Easier scaling
[Figure: a front-end server fans out to aggregators, which fan out to workers]

Enabling Technology: Virtualization
- Multiple virtual machines on one physical machine
- Applications run unmodified, as on a real machine
- A VM can migrate from one computer to another

Status Quo: Virtual Switch in Server

Presentation Overview (next section: Basic Architecture)

Common Data Center Topology
- Internet
- Data center
  - Core: layer-3 router
  - Aggregation: layer-2/3 switch
  - Access: layer-2 switch
  - Servers

Top-of-Rack (ToR) Architecture
- Rack of servers
  - Commodity servers
  - And top-of-rack switch
- Modular design
  - Preconfigured racks
  - Power, network, and storage cabling
- Aggregate to the next level

Modularity
- Containers
- Many containers

Data Center Network Topology

Problems with Common Topologies
- Single point of failure
- Oversubscription of links higher up in the topology
  - Tradeoff between cost and provisioning
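To make the oversubscription point concrete, here is a minimal sketch of how the ratio is usually computed: worst-case traffic entering a switch from below, divided by its capacity toward the layer above. The port counts and link speeds are hypothetical and are not taken from the lecture.

```python
def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Worst-case load arriving from below divided by capacity toward the layer above."""
    if uplink_gbps <= 0:
        raise ValueError("uplink capacity must be positive")
    return downlink_gbps / uplink_gbps


# Hypothetical access (ToR) switch: 40 servers at 1 Gbps, 2 uplinks at 10 Gbps.
tor_ratio = oversubscription_ratio(40 * 1, 2 * 10)

# Hypothetical aggregation switch: 20 downlinks at 10 Gbps, 4 uplinks at 10 Gbps.
agg_ratio = oversubscription_ratio(20 * 10, 4 * 10)

# Traffic crossing both layers sees the product of the per-layer ratios.
end_to_end = tor_ratio * agg_ratio

print(f"ToR {tor_ratio}:1, aggregation {agg_ratio}:1, end to end {end_to_end}:1")
```

A ratio of 1:1 means every server can send at full rate through that layer; lowering the ratio requires more or faster uplinks, which is the cost versus provisioning tradeoff named on the slide.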

Requirements for Future Data Centers
To keep up with the trend toward mega data centers, data center network (DCN) technology should meet the following requirements:
- High scalability
- Transparent VM migration (high agility)
- Easy deployment requiring less human administration
- Efficient communication
- Loop-free forwarding
- Fault tolerance
Current DCN technology cannot meet these requirements:
- Layer-3 protocols cannot support transparent VM migration.
- Current layer-2 protocols do not scale, because of the size of the forwarding table and the native broadcasting used for address resolution.

Capacity Mismatch

Data-Center Routing

Reminder: Layer 2 vs. Layer 3
- Ethernet switching (layer 2)
  - Cheaper switch equipment
  - Fixed addresses and auto-configuration
  - Seamless mobility, migration, and failover
- IP routing (layer 3)
  - Scalability through hierarchical addressing
  - Efficiency through shortest-path routing
  - Multipath routing through equal-cost multipath
- So, like in enterprises
  - Data centers often connect layer-2 islands by IP routers

Need for Layer 2
- Certain monitoring apps require servers with the same role to be on the same VLAN
- Using the same IP on dual-homed servers
- Allows organic growth of server farms
- Migration is easier

Review of Layer 2 & Layer 3
- Layer 2
  - One spanning tree for the entire network
    - Prevents loops
    - Ignores alternate paths
- Layer 3
  - Shortest-path routing between source and destination
  - Best-effort delivery
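The contrast between one spanning tree and layer-3 multipath can be sketched on a toy fabric. Everything in this example is an illustrative assumption: the four-switch topology, the use of a plain breadth-first tree in place of the real Spanning Tree Protocol (which elects a root by bridge ID and compares path costs), and the SHA-256 flow hash standing in for whatever hash a real router applies for equal-cost multipath.

```python
import hashlib
from collections import deque

# Hypothetical two-tier fabric: each ToR switch connects to both aggregation switches.
ADJACENCY = {
    "agg1": ["tor1", "tor2"],
    "agg2": ["tor1", "tor2"],
    "tor1": ["agg1", "agg2"],
    "tor2": ["agg1", "agg2"],
}


def spanning_tree_links(adjacency, root):
    """Return the links kept by a breadth-first tree (simplified stand-in for STP)."""
    seen = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                tree.add(frozenset((node, neighbor)))
                queue.append(neighbor)
    return tree


def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop by hashing the flow identifier."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


all_links = {frozenset((a, b)) for a, nbrs in ADJACENCY.items() for b in nbrs}
tree_links = spanning_tree_links(ADJACENCY, "agg1")
blocked_links = all_links - tree_links  # loop-free, but this capacity sits idle

# Hypothetical flow: (source IP, destination IP, protocol, source port, destination port).
flow = ("10.0.1.5", "10.0.2.7", 6, 49152, 80)
chosen = ecmp_next_hop(flow, ADJACENCY["tor1"])

print(f"{len(tree_links)} of {len(all_links)} links forward traffic under the tree")
print(f"ECMP sends this flow from tor1 via {chosen}")
```

Hashing per flow keeps all packets of one connection on the same path, which avoids reordering, while different flows spread across both uplinks.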

Data Center Challenges
- Traffic load balance
- Support for VM migration
- Achieving bisection bandwidth
- Power savings / cooling
- Network management (provisioning)
- Security (dealing with multiple tenants)

Data Center Costs (Monthly Costs)
- Servers: 45% (CPU, memory, disk)
- Infrastructure: 25% (UPS, cooling, power distribution)
- Power draw: 15% (electrical utility costs)
- Network: 15% (switches, links, transit)
Source: http://perspectives.mvdirona.com/2008/11/28/costofpowerinlargescaledatacenters.aspx

Presentation Overview (next section: Architectural Improvements)

Fat-Tree-Based Solution
- Connect end hosts together using a fat-tree topology
  - Infrastructure consists of cheap devices
    - Each port supports the same speed as an end host
  - All devices can transmit at line speed if packets are distributed along existing paths
  - A k-port fat tree can support k^3/4 hosts
- Naming controversy
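The k^3/4 host count follows from the structure described in [Al-Fares]: k pods, each holding k/2 edge and k/2 aggregation switches, with every edge switch serving k/2 hosts, plus (k/2)^2 core switches. A short sketch of that arithmetic follows; the function name and the dictionary keys are made up for this example.

```python
def fat_tree_dimensions(k: int) -> dict:
    """Element counts for a fat tree built entirely from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer of at least 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k * half * half,          # k pods * k/2 edge switches * k/2 hosts = k^3/4
    }


for ports in (4, 24, 48):
    print(ports, fat_tree_dimensions(ports))
```

With 48-port switches the same formula gives 27,648 hosts, which is why commodity switches are enough for a large data center.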

Fat-Tree Topology

DC Architecture with Load Balancers (LB)

Monsoon Architecture
- Aggregation and access layers form a full-mesh topology

Presentation Overview (next section: Traffic Characteristics)

The Importance of Speed
- A 1-millisecond advantage in trading applications can be worth $100 million a year to a major brokerage firm

Dataset: Data Centers Studied
- 10 data centers in 3 classes: universities, private enterprise, clouds
- Internal users (universities and private enterprise)
  - Small
  - Local to campus
- External users (clouds)
  - Large
  - Globally diverse

DC Role             DC Name   Location     Number of Devices
Universities        EDU1      US-Mid       22
                    EDU2      US-Mid       36
                    EDU3      US-Mid       11
Private Enterprise  PRV1      US-Mid       97
                    PRV2      US-West      100
Commercial Clouds   CLD1      US-West      562
                    CLD2      US-West      763
                    CLD3      US-East      612
                    CLD4      S. America   427
                    CLD5      S. America   427

Canonical Data Center Architecture: Capture Points
- Layers: core (L3), aggregation (L2), edge (L2, top-of-rack), application servers
- SNMP and topology data from ALL links
- Packet sniffers

Traffic by Applications
[Figure: stacked bars showing the share of traffic (0% to 100%) per application (AFS, NCP, SMB, LDAP, HTTPS, HTTP, OTHER) for PRV2_1, PRV2_2, PRV2_3, PRV2_4, EDU1, EDU2, EDU3]
- Differences between various bars
- Clustering of applications
  - PRV2_2 hosts secured portions of applications
  - PRV2_3 hosts unsecured portions of applications

Analyzing Packet Traces
- Transmission patterns of the applications
- ON-OFF traffic at edges
  - Binned in 15 ms and 100 ms intervals
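One way to read the binning step: cut the trace into fixed-width time bins, call a bin ON if it holds at least one packet, and measure the lengths of consecutive ON and OFF runs. The sketch below implements that reading; it is an assumption about the method rather than the procedure used in [Benson], and the timestamps are invented.

```python
from itertools import groupby


def on_off_periods(timestamps_ms, bin_ms):
    """Return (on_durations, off_durations) in ms for a list of packet arrival times."""
    if not timestamps_ms:
        return [], []
    start = min(timestamps_ms)
    # Index of every bin that contains at least one packet.
    busy_bins = {int((t - start) // bin_ms) for t in timestamps_ms}
    states = [i in busy_bins for i in range(max(busy_bins) + 1)]
    on, off = [], []
    # groupby yields maximal runs of identical ON/OFF states.
    for is_on, run in groupby(states):
        duration = sum(1 for _ in run) * bin_ms
        (on if is_on else off).append(duration)
    return on, off


# Hypothetical packet arrival times in milliseconds.
trace = [0, 3, 9, 14, 47, 52, 58, 120]
on_15, off_15 = on_off_periods(trace, bin_ms=15)
on_100, off_100 = on_off_periods(trace, bin_ms=100)

print("15 ms bins :", on_15, off_15)
print("100 ms bins:", on_100, off_100)
```

The same trace looks bursty at 15 ms and continuously ON at 100 ms, which is a reason to examine more than one bin width.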

Data-Center Traffic is Bursty
- Understanding the arrival process
  - Range of acceptable models
- What is the arrival process?
  - Heavy-tailed for the 3 distributions: ON times, OFF times, inter-arrival times
  - Lognormal across all data centers
- Different from the Pareto of WAN traffic
  - Need new models

Data Center   OFF Period Dist.   ON Period Dist.   Inter-arrival Dist.
Prv2_1        Lognormal          Lognormal         Lognormal
Prv2_2        Lognormal          Lognormal         Lognormal
Prv2_3        Lognormal          Lognormal         Lognormal
Prv2_4        Lognormal          Lognormal         Lognormal
EDU1          Lognormal          Weibull           Weibull
EDU2          Lognormal          Weibull           Weibull
EDU3          Lognormal          Weibull           Weibull

Packet Size Distribution
- Bimodal (200 B and 1400 B)
- Small packets
  - TCP acknowledgements
  - Keep-alive packets
- Persistent connections important to apps

Intra-Rack Versus Extra-Rack Results
[Figure: bar chart of the percentage of traffic (0 to 100) that is intra-rack versus extra-rack for EDU1, EDU2, EDU3, PRV1, PRV2, CLD1, CLD2, CLD3, CLD4, CLD5]
- Clouds: most traffic stays within a rack (75%)
  - Colocation of apps and dependent components
- Other DCs: > 50% leaves the rack
  - Un-optimized placement
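The intra-rack share is a byte-weighted fraction: bytes whose source and destination sit in the same rack, divided by all bytes. A minimal sketch follows, assuming flow records of the form (source rack, destination rack, bytes); the record layout and the sample values are hypothetical.

```python
def intra_rack_fraction(flows):
    """Fraction of bytes whose source and destination are in the same rack."""
    total = sum(nbytes for _, _, nbytes in flows)
    if total == 0:
        return 0.0
    local = sum(nbytes for src, dst, nbytes in flows if src == dst)
    return local / total


# Hypothetical flow records: (source rack, destination rack, bytes).
sample_flows = [
    ("rack1", "rack1", 600),
    ("rack1", "rack2", 200),
    ("rack2", "rack2", 150),
    ("rack2", "rack1", 50),
]
share = intra_rack_fraction(sample_flows)
print(f"{share:.0%} of bytes stay within a rack")
```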

Observations from Interconnect
- Link utilization is low at the edge and aggregation layers
- The core is the most utilized
  - Hot-spots exist (> 70% utilization)
  - < 25% of links are hotspots
- Loss occurs on less-utilized links (< 70%)
  - Implicating momentary bursts
- Time-of-day variations exist
  - Variation is an order of magnitude larger at the core
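The hotspot definition on this slide (utilization above 70%) translates directly into a filter over per-link utilization. The sketch below assumes utilization is already expressed as a fraction of link capacity, as could be derived from SNMP byte counters; the link names and values are invented.

```python
HOTSPOT_THRESHOLD = 0.70  # from the slide: hotspots are links above 70% utilization


def find_hotspots(utilization, threshold=HOTSPOT_THRESHOLD):
    """Return the sorted names of links whose utilization exceeds the threshold."""
    return sorted(link for link, value in utilization.items() if value > threshold)


# Hypothetical per-link utilization (fraction of capacity) for one polling interval.
link_utilization = {
    "core-1": 0.82,
    "core-2": 0.35,
    "agg-1": 0.10,
    "edge-1": 0.05,
}
hotspots = find_hotspots(link_utilization)
hotspot_fraction = len(hotspots) / len(link_utilization)

print(f"hotspots: {hotspots} ({hotspot_fraction:.0%} of links)")
```

Average utilization over a polling interval can hide momentary bursts, which fits the slide's observation that loss also occurs on links below the threshold.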

Argument for Larger Bisection
- Need for larger bisection
  - VL2 [SIGCOMM 09], Monsoon [PRESTO 08], Fat-Tree [SIGCOMM 08], PortLand [SIGCOMM 09], Hedera [NSDI 10]
- Congestion at oversubscribed core links
  - Increase core links and eliminate congestion

Calculating Bisection Demand
[Figure: core, aggregation, edge, and application-server layers; the bisection links (bottleneck) sit above the app links]
- If Σ traffic(app links) / Σ capacity(bisection links) > 1, then more devices are needed at the bisection
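The test on this slide is a single ratio, so it is easy to check numerically. The sketch below sums traffic over the application links and capacity over the bisection links; the function name and all numbers are hypothetical.

```python
def bisection_demand(app_traffic_gbps, bisection_capacity_gbps):
    """Sum of traffic on app links divided by sum of capacity on bisection links."""
    capacity = sum(bisection_capacity_gbps)
    if capacity <= 0:
        raise ValueError("total bisection capacity must be positive")
    return sum(app_traffic_gbps) / capacity


# Hypothetical measurements: traffic on three app links, capacity of two bisection links.
demand = bisection_demand([3.0, 2.0, 1.0], [10.0, 10.0])
needs_more_bisection = demand > 1

print(f"bisection demand = {demand:.2f}, more bisection needed: {needs_more_bisection}")
```

A ratio below 1 means the existing bisection could carry all application traffic if it were spread evenly, so the remaining problem is load balancing rather than capacity.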

Bisection Demand
- Given our data: current applications and DC design
  - NO, more bisection is not required
  - Aggregate bisection is only 30% utilized
- Need to better utilize the existing network
  - Load balance across paths
  - Migrate VMs across racks

Insights Gained
- 75% of traffic stays within a rack (clouds)
  - Applications are not uniformly placed
- Half of the packets are small (< 200 B)
  - Keep-alives are integral to application design
- At most 25% of core links are highly utilized
  - Effective routing algorithm to reduce utilization
  - Load balance across paths and migrate VMs
- Questioned popular assumptions
  - Do we need more bisection? No
  - Is centralization feasible? Yes