Connecting the World:

Similar documents
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

internet technologies and standards

Content Deployment and Caching Techniques in Africa

Delivering Microservices Securely and at Scale with NGINX in Red Hat OpenShift. November, 2017

IP Fabric Reference Architecture

IETF 81 World IPv6 Day Operators Review

Untrusting the network. Aijay Adams Jose Leitao Production Network Engineers

Lecture 4: Intradomain Routing. CS 598: Advanced Internetworking Matthew Caesar February 1, 2011

DESIGN, IMPLEMENTATION, AND OPERATION OF IPV6-ONLY IAAS SYSTEM WITH IPV4-IPV6 TRANSLATOR FOR TRANSITION TOWARD THE FUTURE INTERNET DATACENTER

Minimum Viable FIB. a.k.a Route Scale on ToR. VP, Infrastructure Minimum Viable FIB

Zero to Microservices in 5 minutes using Docker Containers. Mathew Lodge Weaveworks

Advanced Computer Networks Exercise Session 7. Qin Yin Spring Semester 2013

Cisco ACI Multi-Pod/Multi-Site Deployment Options Max Ardica Principal Engineer BRKACI-2003

SaaS Providers. ThousandEyes for. Summary

Automating Cloud Networking with RedHat OpenStack

ThousandEyes for. Application Delivery White Paper

BGP MIGRATIONS IN A LIVE DATACENTER

Edge Datacenter Placement BY ABHISHEK GUPTA FRIDAY GROUP MEETING JUNE 10, 2016

ACI Multi-Site Architecture and Deployment. Max Ardica Principal Engineer - INSBU

Network Service Description

Looking Beyond the Internet

Introduction to Segment Routing

Hierarchical Fabric Designs The Journey to Multisite. Lukas Krattiger Principal Engineer September 2017

Making the Internet fast, reliable and secure

A Library and Proxy for SPDY

Internet Measurements. Motivation

Akamai's V6 Rollout Plan and Experience from a CDN Point of View. Christian Kaufmann Director Network Architecture Akamai Technologies, Inc.

Design and implementation of an MPLS based load balancing architecture for Web switching

Akamai's V6 Rollout Plan and Experience from a CDN Point of View. Christian Kaufmann Director Network Architecture Akamai Technologies, Inc.

KillTest ᦝ䬺 䬽䭶䭱䮱䮍䭪䎃䎃䎃ᦝ䬺 䬽䭼䯃䮚䮀 㗴 㓸 NZZV ]]] QORRZKYZ PV ٶ瀂䐘މ悹伥濴瀦濮瀃瀆ݕ 濴瀦

Extreme Networks How to Build Scalable and Resilient Fabric Networks

A New Approach to Fixing Internet Application Performance. Elad Rave, Founder and CEO

A Security Orchestration System for CDN Edge Servers

Oracle IaaS, a modern felhő infrastruktúra

How Teridion Works. A Teridion Technical Paper. A Technical Overview of the Teridion Virtual Network. Teridion Engineering May 2017

Datacenter Wide- area Enterprise

Student ID: CS457: Computer Networking Date: 3/20/2007 Name:

Seagate Kinetic Open Storage Platform. Mayur Shetty - Senior Solutions Architect

Robotron: Top-down Network Management at. Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, Hongyi Zeng ACM SIGCOMM 2016 August 25, 2016

Solution Guide. Infrastructure as a Service: EVPN and VXLAN. Modified: Copyright 2016, Juniper Networks, Inc.

Outlines. Introduction (Cont d) Introduction. Introduction Network Evolution External Connectivity Software Control Experience Conclusion & Discussion

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

DATA CENTER FABRIC COOKBOOK

Huawei CloudFabric and VMware Collaboration Innovation Solution in Data Centers

Testing & Assuring Mobile End User Experience Before Production Neotys

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Hochverfügbarkeit in Campusnetzen

Data Infrastructure at LinkedIn. Shirshanka Das XLDB 2011

Distributed Systems. 21. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2018

Service Mesh and Microservices Networking

NGFW Security Management Center

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

Building an on premise Kubernetes cluster DANNY TURNER

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Load Balancing Technology White Paper

Building a Kubernetes on Bare-Metal Cluster to Serve Wikipedia. Alexandros Kosiaris Giuseppe Lavagetto

Data Centers. Tom Anderson

CSE 124: CONTENT-DISTRIBUTION NETWORKS. George Porter December 4, 2017

Torc. Applications, Microservices, VNFs controlled by Top-of-Rack Controller AT&T Foundry, where ideas are made. Marcel Neuhausler.

Service Description Platform.sh by Orange

2nd SIG-NOC meeting and DDoS Mitigation Workshop Scrubbing Away DDOS Attacks. 9 th November 2015

Mellanox Virtual Modular Switch

Mul$media Networking. #9 CDN Solu$ons Semester Ganjil 2012 PTIIK Universitas Brawijaya

ENTERPRISE MPLS. Kireeti Kompella

Multi-site Datacenter Network Infrastructures

Datacenter Wide- area Enterprise

Expeditus: Congestion-Aware Load Balancing in Clos Data Center Networks

Evolution of Rack Scale Architecture Storage

LISP Locator/ID Separation Protocol

Intuit Application Centric ACI Deployment Case Study

Introduction: PURPOSE BUILT HARDWARE. ARISTA WHITE PAPER HPC Deployment Scenarios

Cato Cloud. Software-defined and cloud-based secure enterprise network. Solution Brief

How Cisco IT Deployed Enterprise Messaging on Cisco UCS

A Cloud Gateway - A Large Scale Company s First Line of Defense. Mikey Cohen Manager - Edge Gateway Netflix

SDN Controllers in the WAN: protocols and applications

Distributed Network Function Virtualization

Facebook s Data Center Network Architecture

THE NETWORK AND THE CLOUD

Network Virtualization. Duane de Witt

Network Security: Network Flooding. Seungwon Shin GSIS, KAIST

Data Center Network Topologies

Data Center Network Topologies II

Chapter 2 Application Layer

Internet Architecture and Experimentation

STATEFUL TCP/UDP traffic generation and analysis

Understanding the Evolving Internet

How Teridion Works. A Teridion Technical Paper. A Technical Overview of the Teridion Virtual Network. Teridion Engineering March 2017

Migration Technologies. Dual Stack and Tunneling Using GRE, 6to4, and 6in4.

CS November 2018

Building a Big IaaS Cloud. David /

Building NFV Solutions with OpenStack and Cisco ACI

CSE 4/589 Midterm Review. Hengtong Zhang SUNY Buffalo 10/30/2018

Networking and Internetworking 1

ADX Software Updates and the Application Resource Broker (ARB) Introduction

NETWORK DEPLOYMENT WITH SEGMENT ROUTING (SPRING)

Cisco Tetration Platform: Network Performance Monitoring and Diagnostics

IPv6 World View and World IPv6 Day A global view of IPv6 commercial readiness

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017

GÉANT IP Service Description. High Performance IP Services to Support Advanced Research

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era

Transcription:

Connecting the World: A look inside Facebook s Networking Infrastructure Arun Moorthy arunm@fb.com https://fb.me/arun.moorthy

about:me B. Tech CSE - Indian Institute of Technology, 1997 MSCS, University of North Carolina, 1999 Current: Facebook Inc, (2012 - ) Engg. Mgr on Network Team: Load-balancing, Transport Security Previous: Intel Corp, Nexsi Systems, Microsoft Corp, 1999-2011!

Agenda Challenges Facebook s Global Network Software Load Balancing POP Network Architecture Data Center Networking Concluding Remarks

The Challenges 1.4+ Billion users 1+ Tb/s egress 4B+ video views/day 1+ Million Requests/sec Worldwide user-base (80+% users outside US and Canada) Highly available + reliable

Did I mention Highly Available?

Datacenter

Datacenter

Datacenter Edge PoP

Datacenter Edge PoP

International RTT circa 11/2011

Seoul -> Oregon TCP Connect: 150ms

HTTPS Seoul -> Oregon 75ms TCP conn established: 150 SYN ACK ClientHello SYN+ACK ServerHello SSL session established: 450 ChangeCipherSpec ChangeCipherSpec Response Received 600 GET HTTP 1.1

Seoul -> Tokyo -> Oregon TCP Connect: 30ms SSL Session:?? HTTP Response:?? NRT

HTTPS Seoul->Tokyo->Oregon 15ms 60ms Sessions established: 90 ms (vs 450 ms) Request Received GET GET Response Received: 240 HTTP 1.1 HTTP 1.1 200

Seoul -> Oregon TCP Connect: 150ms 30ms 90ms SSL Session: 450ms 240ms HTTP Response: 600ms NRT

POP

POP Cluster TCP Routing (ip/port) TCP/SSL HTTP 31.13.76.107 L4LB POP L7LB CDN

DC POP

DC Cluster TCP Routing (ip/port)! L4LB TCP/SSL HTTP L7LB HHVM CDN STORAGE

Benefits of the Edge Reduced latencies (Edge Termination) Caching static content (CDN)

Sonar: Measuring Closeness SEA ORD NRT HKG LAX

Cartographer Internet ISP DNS resolver Facebook capacity resolvers/latency cartographer DNS map DNS Server routing health offline processing

Proxygen HTTP Framework High-performance C++ Server & Client Customizable Forward/Reverse Proxy Open-source Mobile Proxygen: Cross-platform Deep instrumentation Modular components: DNS TCP TLS HTTP SPDY

And More Shiv: L4 Load-balancer based on IPVS + python Edge-Fabric: Intelligent Interface utilization in POPs FB CDN: BigCache

Previous PoP Architecture Backbone Routers Peering Routers Links to FB Backbone Links to ISPs

Previous PoP Architecture Backbone Routers Peering Routers Links to FB Backbone Links to ISPs

Fabric-style PoP Architecture Backbone Routers Peering Routers Links to FB Backbone Peering/Transit Links PSW PSW PSW PSW

Edge Cluster Upgrade ~2X Gb/s 1X Gb/s

Rise of the @scale data center network Machine-to-Machine Machine-to-User

The 4-post cluster - our old design TOR switch 4 x Large Cluster Core Switches igp = BGP Rack Switches Server Rack

Box size limited cluster size TOR switch 4 x Large Cluster Core Switches igp = BGP Rack Switches Server Rack

Cluster size limited application size major tiers are pushing the limits of cluster size 90 80.9 81.3 68 57.5 56.3 45 40.7 30.7 34.5 34.8 23 0 12.9 17.7 15.7 15.9 16.1 11.9 13.3 12.1 13 7.3 8.6 3.3 2.5 2.7 0.2 0 Overall Hadoop Frontend Service Cache DB Intra-Rack Intra-Cluster Intra-DC Inter-DC

The Vision: The Whole Data Center, Redone.

The topology Facebook Fabric: an innovative network topology for data centers

The Fabric: one datacenter-wide network Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster Cluster

The Fabric: one datacenter-wide network TOR switch = fabric infra switches Server Rack small & simple boxes

Server Pod: a [small] unit of deployment 4 fabric switches spine planes 1 2 3 4 1 48 48 rack switches pods pods interconnected by parallel spines

Many paths between servers Server equal performance paths Server

Advantages of Fabric Modular/scalable building block More bandwidth capacity - future proof Distributed load Resilient to failures Individual devices and links are not important

The top-of-rack switch Facebook Wedge

The top-of-rack switch Facebook Wedge

The software FBOSS:Facebook Open Switching System

FBOSS

6-pack - Core/Spine Switch

6-pack Switch First open hardware modular switching platform! 128x40GE non-blocking switch! Runs FBOSS over Linux! Modular! 12 independent Wedges! 4 fabric, 8 front-panel! 100G ready

Data Center Networking Summary FBOSS & BMC 1! From Wedge 2! We built 6-pack 3! FBOSS! & BMC 4! 5! OCP based! Open hardware eco system & software

Summary Facebook s network infrastructure Datacenters, Edge, CDN, Load-balancers, @Scale Data Center Networking Redefined Software & Hardware Open Modular Ready for experimentation!

For more information https://code.facebook.com/posts/networking http://www.opencompute.org/ https://github.com/facebook Email: arunm@fb.com

TRUE, OPEN NETWORK SW ECOSYSTEM

Backup

Traffic Characteristics

Weekly Cycle Egress Ingress 7 Days

Daily Cycle Egress 11 3 PM 24 hours

Sum of timezones Canada United Kingdom Indonesia 7 PM 9 AM 24 hours

Load-balancing: Deep Dive

FB Request -- one web server rps = requests per second www? A 1.2.3.4 GET / DNS <html>... www? HHVM low rps how do we get more rps?!

Add a load balancer! rps = requests per second www? A 1.2.3.4 DNS www? GET / <html>... L7LB (proxygen) GET / <html>... PHP PHP HHVM lots more rps low rps how do we get more rps?!

Add another load balancer! www? DNS www? GET / <html>... A 1.2.3.4 L4LB (shiv) GET / <html>... L7LB (proxygen L7LB (proxygen L7LB (proxygen GET / <html>... PHP PHP PHP PHP HHVM network bound lots more rps low rps how do we get more rps?!

Add another load balancer! ECMP www? A 1.2.3.4 DNS GET / <html>... L4LB L4LB (shiv) (shiv) network bound GET / <html>... L7LB (proxygen L7LB (proxygen L7LB (proxygen L7LB (proxygen lots more rps GET / <html>... PHP PHP PHP PHP HHVM PHP PHP PHP PHP HHVM low rps

Front end Cluster ~10 ~100 Thousands

cont. x 10 or more

w More RPS? Add another cluster! www? A 1.2.3.4 DNS GET / <html>... L4LB L4LB (shiv) (shiv) GET / <html>... L7LB (proxygen L7LB (proxygen L7LB (proxygen GET / <html>... PHP PHP HHVM GET / <html>... L4LB L4LB (shiv) (shiv) GET / <html>... L7LB (proxygen L7LB (proxygen L7LB (proxygen GET / <html>... PHP PHP HHVM how do we get more rps?!

Add another datacenter! www? A 1.2.3.4 DNS w G G G < < < G G G < < < G G G < < < G G G < < < G G G < < < G G G < < <

L4LB BGP ECMP Hash 1.2.3.4 1.2.3.4 L4LB L4LB State Table + Hash 1.2.3.4 1.2.3.4 1.2.3.4 1.2.3.4 L7LB L7LB L7LB L7LB

L4LB Routing ECMP Hash A 1.2.3.4 1.2.3.4 D L4LB L4LB State Table + Hash 1.2.3.4 1.2.3.4 1.2.3.4 1.2.3.4 L7LB L7LB L7LB L7LB

L4LB Routing ECMP Hash A 1.2.3.4 1.2.3.4 D L4LB L4LB A State Table + Hash 1.2.3.4 1.2.3.4 1.2.3.4 1.2.3.4 L7LB L7LB L7LB L7LB

L4LB Routing ECMP Hash B 1.2.3.4 1.2.3.4 L4LB L4LB State Table + Hash 1.2.3.4 1.2.3.4 1.2.3.4 1.2.3.4 L7LB L7LB L7LB L7LB

L4LB Routing ECMP Hash B 1.2.3.4 1.2.3.4 L4LB L4LB State Table + Hash 1.2.3.4 1.2.3.4 1.2.3.4 1.2.3.4 L7LB L7LB L7LB L7LB