Content Delivery Networks


Content Delivery Networks
Richard T. B. Ma
School of Computing, National University of Singapore
CS 4226: Internet Architecture

Motivation
Serving web content from one location raises problems:
- scalability: the flash-crowd problem
- reliability
- performance
Key ideas: cache content and serve requests from multiple servers at the network edge
- reduces demand on the site's infrastructure
- provides faster service to users

Web cache and caching proxy

Replication and load balancing

The middle mile problem
- The last mile problem is largely solved by high levels of global broadband penetration, but that penetration raises a new question of scale driven by demand
- The first mile is easy in terms of performance and reliability
- Traffic gets stuck in the middle

Inside the Internet
[diagram: Tier 1 ISPs and large content distributors interconnected through IXPs]

Inside the Internet
[diagram: Tier 1 and Tier 2 ISPs, IXPs, and large content distributors]

The middle mile problem
- The last mile is largely solved by broadband penetration; the first mile is easy
- Stuck in the middle; potential solutions:
  - big-data-center CDNs
  - highly distributed CDNs
  - how about P2P?

The challenge

Distance (server to user)     RTT     Packet loss  Throughput                    4 GB DVD download time
Local: <100 mi                1.6 ms  0.6%         44 Mbps (high-quality HDTV)   12 min
Regional: 500-1,000 mi        16 ms   0.7%         4 Mbps (basic HDTV)           2.2 hrs
Cross-continent: ~3,000 mi    48 ms   1.0%         1 Mbps (TV)                   8.2 hrs
Multi-continent: ~6,000 mi    96 ms   1.4%         0.4 Mbps (poor)               20 hrs

The fat file paradox: though bits are transmitted at the speed of light, the distance between user and server is critical, because latency and throughput are coupled by TCP.
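The latency/throughput coupling in the table can be sketched with the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS/RTT)·(1/√p). The numbers below are illustrative, not the model behind the slide's measured figures; the approximation reproduces the trend (roughly an order-of-magnitude drop per distance step), though not the exact values.

```python
from math import sqrt

def tcp_throughput_mbps(rtt_s, loss, mss_bytes=1460):
    """Mathis et al. approximation of steady-state TCP throughput:
    rate <= (MSS / RTT) * (C / sqrt(p)), with the constant C ~ 1."""
    return (mss_bytes * 8 / rtt_s) / sqrt(loss) / 1e6

# RTT and loss figures taken from the slide's distance table:
for label, rtt_ms, loss in [("local", 1.6, 0.006),
                            ("regional", 16, 0.007),
                            ("cross-continent", 48, 0.010),
                            ("multi-continent", 96, 0.014)]:
    print(f"{label}: ~{tcp_throughput_mbps(rtt_ms / 1000, loss):.1f} Mbps")
```

Note that even with zero loss, a larger RTT caps throughput via the congestion window, which is why moving servers closer to users matters.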

Major CDNs (by '15 revenue)
- Akamai: $1.03B revenue, $700M of CDN
- Limelight: $174M revenue, $120M of CDN
- Amazon: $6B revenue, $1.8B of CDN, but a big % on storage; cloud provider
- Level 3: $8B revenue, $235M of CDN; tier-1 transit provider

Major CDNs (by '15 revenue)
- EdgeCast: $180M revenue, $125M of CDN
- Highwinds: $135M revenue, $95M of CDN
- Fastly: $60M revenue, $9M of CDN
- ChinaCache: $270M revenue, $81M of CDN; also a cloud provider
- Rest of the smaller regional CDNs (MaxCDN, CDN77, etc.): ~$100M combined

References
- Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross, "Measuring and Evaluating Large-Scale CDNs," Internet Measurement Conference, 2008.
- Erik Nygren, Ramesh K. Sitaraman, and Jennifer Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," ACM SIGOPS Operating Systems Review 44(3), July 2010.

How can we understand a CDN?
We don't know their internal structures, but we can infer them via a measurement approach.
We know that CDNs use a DNS trick. For example:
- the end-user types www.youtube.com
- the IP address is resolved via the local DNS (LDNS) server
- the LDNS queries YouTube's authoritative DNS
- YouTube uses a CDN if the reply is a CNAME like a1105.b.akamai.net or move.vo.llnwd.net
- the LDNS then queries the CNAME's authoritative DNS server and gets the IP address of the content server

DNS records
DNS: a distributed database storing resource records (RRs)
RR format: (name, value, type, ttl)
- Type=A (Address): name is a hostname; value is its IP address
- Type=NS (Name Server): name is a domain (e.g., foo.com); value is the hostname of the authoritative name server for this domain
- Type=CNAME (Canonical NAME): name is an alias for some canonical (real) name, e.g., www.ibm.com is really servereast.backup2.ibm.com; value is the canonical name
- Type=MX (Mail eXchange): value is the name of the mail server associated with name
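The CNAME-chasing behavior described above can be sketched with a toy record table. The names, values, and addresses here are hypothetical; a real resolver would also follow NS delegations and honor TTLs.

```python
# Toy RR table keyed by (name, type), mimicking the (name, value, type, ttl)
# format above with the ttl omitted. All entries are hypothetical.
RECORDS = {
    ("www.example.com", "CNAME"): "a1105.b.cdn-example.net",
    ("a1105.b.cdn-example.net", "A"): "203.0.113.7",
}

def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record yields an IP address."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        if (name, "CNAME") in records:
            name = records[(name, "CNAME")]  # alias -> canonical name
        else:
            raise LookupError(f"no record for {name}")
    raise LookupError("CNAME chain too long")

print(resolve("www.example.com"))  # 203.0.113.7
```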

Content server assignment
The returned content server will be close to the issuing local DNS (LDNS) server.

Measurement framework
Assumptions:
- the CDN chooses a nearby content server based on the location of the LDNS that originates the query
- the same LDNS might get different content servers for the same query at different times
Steps:
1. Determine all the CNAMEs of a CDN
2. Query a large number of LDNSs all over the world, at different times of the day, for all of the CNAMEs found in step 1

Finding CNAMEs and LDNSs
Find all the CNAMEs of a CDN:
- use over 16 million web hostnames
- a DNS query tells whether a hostname resolves to a CNAME, and whether the CNAME belongs to the target CDN
- thousands of CNAMEs found for Akamai and Limelight
Locate a large number of distributed LDNSs:
- need open recursive DNS servers
- use over 7 million unique client IP addresses and over 16 million web hostnames
- reverse DNS lookups and trial DNS queries
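The "does this CNAME belong to the target CDN" check is essentially a domain-suffix match. A minimal sketch, with the suffix lists assumed from the CNAME examples seen earlier (a real study would need a fuller, verified list per CDN):

```python
# Hypothetical CNAME suffix lists for the two CDNs studied in the paper.
CDN_SUFFIXES = {
    "akamai": (".akamai.net", ".akadns.net", ".akamaiedge.net"),
    "limelight": (".llnwd.net",),
}

def classify_cname(cname):
    """Return the CDN a CNAME belongs to, or None if no suffix matches."""
    for cdn, suffixes in CDN_SUFFIXES.items():
        if cname.endswith(suffixes):
            return cdn
    return None

print(classify_cname("a1105.b.akamai.net"))  # akamai
print(classify_cname("move.vo.llnwd.net"))   # limelight
print(classify_cname("www.example.com"))     # None
```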

Open recursive DNS servers
- many different DNS servers map to the same IP addresses
- obtained 282,700 unique open recursive DNS servers

Measurement platform
- 300 PlanetLab nodes, 3 DNS queries per second
- more than 1 day needed for the measurement

 
The Akamai Network

Type                  # of CNAMEs  # of IPs         Usage
(a) *.akamai.net      1964         ~11,500          conventional content distribution
(b) *.akadns.net      757          a few per CNAME  load balancing for customers who have their own networks
(c) *.akamaiedge.net  539          ~36,000          dynamic content distribution / secure service

- Type (a): returns 2 IP addresses, different for different locations; hundreds of IPs behind a CNAME, ~11,500 content servers in total
- Type (c): returns only 1 IP address; 20-100 IPs for each CNAME; the study guesses virtualization is used for isolated environments

The Akamai Network
- ~27K content servers, ~6K of which also run DNS
- 60% in the US, 90% in the top 10 countries
- flat distribution across ISPs: 15% in the top 7

The Limelight Network
- easier to measure, as it is a single Autonomous System (AS): just obtain the IP addresses of that AS
- only ~4K servers

Measuring performance
Two metrics:
- availability: how reliable are the CDN servers?
- delay: how fast can content be retrieved?
Performance results are controversial:
- do the metrics sufficiently match overall system performance goals?
- how does a performance metric map to a specific customer's performance perception?
- both Akamai and Limelight issued statements to correct the research results

Availability
- monitor all servers for 2 months, pinging once every hour
- if a server does not respond for 2 consecutive hours, it is considered down
- but does a down server necessarily affect availability?
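The "down after 2 consecutive missed hourly pings" rule can be sketched as a simple pass over a ping log. The trace below is hypothetical; as the slide notes, hours counted this way need not correspond to user-visible unavailability, since the mapping system can route around a dead server.

```python
def downtime_hours(ping_ok, threshold=2):
    """Count hours in which a server is considered down: an hour counts
    once the server has failed to respond for `threshold` consecutive
    hourly pings, the rule used in the measurement study."""
    down = 0
    streak = 0
    for ok in ping_ok:
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            down += 1
    return down

# 1 = responded, 0 = no response; hypothetical 8-hour trace.
trace = [1, 0, 1, 0, 0, 0, 1, 1]
print(downtime_hours(trace))  # 2: only the 2nd and 3rd consecutive misses count
```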

Delay
Different possible causes:
- number of content servers?
- optimality (for delay) of routing?

More detailed delay comparison

Akamai's statement
- availability cannot be inferred from server uptime alone
- Akamai's CDN has more servers, but it is not necessarily harder to maintain
- the use of open resolvers misses many Akamai servers, hence over-estimating delay in Akamai's case
- Akamaiedge is not a virtualized network

Limelight's statement
- overall performance can't be represented by just two dimensions (availability and delay)
- server downtime does not necessarily affect availability; suggested another way to measure it and claimed availability in the 99.9% range
- the RTT of a packet can't represent the delay for objects; suggested using different object sizes
- a more authoritative performance study should be based on customer trials

Akamai vs. Limelight

                       Akamai              Limelight
# of servers           ~27K                ~4K
# of clusters          1158                18
95th-percentile delay  ~100 ms             ~200 ms
average delay          ~30 ms              ~80 ms
penetration in ISPs    high                low
cost                   high                low
complexity             high                low
approach               highly distributed  big data center

Facts about Akamai (2014-2015)
- a CDN company that evolved from MIT research to invent better ways to deliver Internet content and tackle the "flash crowd" problem
- earned over US$1B in revenue in 2015, 25% of the whole CDN market
- ran on 150,000 servers in 1,200 networks across 92 countries

Internet delivery challenge
The % of access traffic from top networks has a long-tail distribution:
- the largest network carries only 5% of traffic
- over 650 networks are needed to reach 90%

Other challenges
- peering point congestion: little economic incentive to invest in the middle mile
- inefficient routing protocols: how does BGP work?
- unreliable networks: de-peering between ISPs
- inefficient communication protocols
- scalability
- app limitations and slow rate of adoption

Delivery network as a virtual network
Works as an overlay:
- compatible
- transparent to users
- adaptive to changes
The untaken clean-slate approach:
- adoption problem
- development cost

The Akamai Network at ~2010
A large distributed system, consisting of:
- ~60,000 servers
- ~1,000 networks
- ~70 countries
Can also be regarded as multiple delivery networks for different types of content:
- static web content
- streaming media
- dynamic applications

Anatomy of a delivery network
- edge servers: global deployment across thousands of sites
- mapping system: assigns requests to edge servers, using historic data and system conditions

Anatomy of a delivery network
- transport system: moves content from origin to edge; may cache data
- communication and control system: disseminates status and control messages, configuration updates

Anatomy of a delivery network
- data collection and analysis: collects and processes data (e.g., logs); used for monitoring, analytics, and billing
- management portal: customer visibility and fine-grained control; updates edge servers

System design principles
Goals:
- scalable and fast data collection and management
- safe, quick, and consistent configuration updates
- enterprise visibility and fine-grained control
Assumption: a significant number of failures is expected to be occurring at all times (machine, rack, cluster, connectivity, or network).
Philosophy: failures are normal, and the delivery network must operate seamlessly despite them.

System design principles
- Design for reliability: ~100% end-to-end availability; full redundancy and fault-tolerance protocols
- Design for scalability: handle large volumes of traffic, data, and control
- Limit the necessity for human management: automation is needed to scale and to respond to faults
- Design for performance: improve bottlenecks, response time, cache hit rate, resource utilization, and energy efficiency

Streaming and content delivery
Architectural considerations for cacheable web content and streaming media:
- principle: minimize long-haul communication through the middle-mile bottleneck of the Internet; feasible with pervasive, distributed architectures where servers sit as close to users as possible
- key question: how distributed does it need to be?

How distributed does it need to be?
Akamai's approach: deploy server clusters not only in Tier 1 and Tier 2 data centers but also at network edges, in thousands of locations, despite more complexity and cost.
Reasons:
- highly fragmented Internet traffic, e.g., the top 45 networks only account for half of access traffic
- the distance between server and users is the bottleneck for video throughput, due to TCP
- P2P is not good for management and control

Video-grade scalability
Content providers' problem:
- YouTube receives 2 billion views per day
- high rates for video, e.g., 2-40 Mbps for HDTV
- need to scale with user requests
- high capital and operational costs to overprovision so as to absorb on-demand spikes
Akamai's throughput was 3.45 Tbps in April 2010; ~50-100 Tbps of throughput is needed now.

Akamai's challenges
- need to consider throughput along the entire path; bottlenecks are everywhere: origin data centers, peering points, networks' backhaul capacity, ISPs' upstream connectivity
- a data center's egress capacity has little impact on real throughput to end users; even 50 well-provisioned, well-connected data centers cannot achieve ~100 Tbps
- IP-layer multicast does not work in practice, so Akamai needs its own transport system

Transport system for content
Tiered content distribution:
- targets cold or infrequently accessed content
- efficient caching strategy with high hit rates
- well-provisioned and highly connected parent clusters are utilized
- origin servers are offloaded in the high 90s (percent)
- helpful in flash crowds for large objects

Tiered distribution
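The origin-offload effect of tiered distribution can be sketched with two cache tiers: requests that miss at the edge go to a parent cluster, and only parent misses reach the origin. The hit rates below are assumptions for illustration, not Akamai's measured figures.

```python
def origin_load(requests, edge_hit, parent_hit):
    """Requests that fall through both cache tiers and reach the origin."""
    edge_misses = requests * (1 - edge_hit)
    return round(edge_misses * (1 - parent_hit))

# Assumed hit rates: even moderate edge caching plus a well-provisioned
# parent tier pushes origin offload into the high 90s percent.
reqs = 1_000_000
to_origin = origin_load(reqs, edge_hit=0.80, parent_hit=0.90)
print(f"{to_origin} of {reqs} requests reach the origin "
      f"({100 * (1 - to_origin / reqs):.0f}% offload)")
```

Because the parent tier aggregates misses from many edge clusters, a cold object is fetched from the origin roughly once per parent rather than once per edge, which is what makes the scheme valuable in flash crowds.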

Transport system for streaming
An overlay network for live streaming:
- once a stream is captured and encoded, it is sent to a cluster of servers called the entrypoint
- automatic failover among multiple entrypoints
- within an entrypoint cluster, distributed leader election is used to tolerate machine failure
- publish-subscribe (pub-sub) model: the entrypoint publishes available streams, and each edge server subscribes to the streams that it requires

Transport system for streaming
An overlay network for live streaming:
- reflectors act as intermediaries between the entrypoints and the edge clusters
- scaling: enables rapidly replicating a stream to a large number of edge clusters to serve popular events
- quality: provides alternate paths between each entrypoint and edge cluster, enhancing end-to-end quality via path optimization
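The pub-sub fan-out performed by a reflector can be sketched minimally: one upstream copy of each stream packet is replicated only to the edge clusters that subscribed to that stream. Stream and edge names are hypothetical; the real system's failover and leader election are omitted.

```python
from collections import defaultdict

class Reflector:
    """Intermediary that replicates each published stream packet only to
    the edge clusters subscribed to that stream (pub-sub model)."""
    def __init__(self):
        self.subscribers = defaultdict(set)   # stream -> set of edge names
        self.delivered = defaultdict(list)    # edge name -> packets received

    def subscribe(self, edge, stream):
        self.subscribers[stream].add(edge)

    def publish(self, stream, packet):
        # Fan out the single upstream copy to every subscribed edge cluster.
        for edge in self.subscribers[stream]:
            self.delivered[edge].append((stream, packet))

r = Reflector()
r.subscribe("edge-tokyo", "live/worldcup")
r.subscribe("edge-paris", "live/worldcup")
r.subscribe("edge-paris", "live/news")
r.publish("live/worldcup", "seg-001")  # reaches both edges
r.publish("live/news", "seg-001")      # reaches only edge-paris
print(len(r.delivered["edge-tokyo"]), len(r.delivered["edge-paris"]))  # 1 2
```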

- can use multiple link-disjoint paths
- needs efficient algorithms for path selection

Application delivery network
Targets dynamic web applications and non-cacheable content.
Two complementary approaches:
- speed up long-haul communications by using the Akamai platform as a high-performance overlay network, i.e., the transport system
- push application logic from the origin server out to the edge of the Internet

Transport system for app acceleration
Path optimization:
- overcomes BGP; collects topology and performance data from the mapping system
- dynamically selects potential intermediate nodes for a particular path, or multiple paths
- ~30-50% performance improvement from the overlay
- also used for packet-loss reduction, e.g., during the Middle East cable cut in 2008
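The core of path optimization is comparing the direct (BGP-chosen) path against one-hop relays through overlay nodes and taking whichever is faster. A sketch with hypothetical latencies in place of the mapping system's real measurements:

```python
def best_path(direct_ms, relay_ms):
    """Choose the direct path or a one-hop relay through an overlay node,
    whichever gives the lower end-to-end latency.
    relay_ms maps node name -> (src-to-node ms, node-to-dst ms)."""
    best = ("direct", direct_ms)
    for node, (src_to_node, node_to_dst) in relay_ms.items():
        total = src_to_node + node_to_dst
        if total < best[1]:
            best = (node, total)
    return best

# Hypothetical measurements, standing in for mapping-system data (ms):
relays = {"overlay-A": (30, 45), "overlay-B": (60, 70)}
print(best_path(direct_ms=110, relay_ms=relays))  # ('overlay-A', 75)
```

The same machinery supports loss reduction: sending duplicate packets over a second, link-disjoint relay path lets the receiver use whichever copy arrives.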

Transport system for app acceleration
Transport protocol optimizations (a proprietary transport-layer protocol):
- use pools of persistent connections to eliminate connection setup and teardown overhead
- optimal TCP window sizing with global knowledge
- intelligent retransmission after packet loss
Application optimizations:
- parse HTML and prefetch embedded content
- content compression reduces the # of round trips
- implement app logic at the edge, e.g., authentication

Distributing applications to the edge
EdgeComputing services of Akamai:
- e.g., deploy and execute request-driven Java J2EE apps on Akamai's edge servers
- not all apps can be run entirely on the edge
Some use cases:
- content aggregation/transformation
- static databases
- data collection
- complex applications

Platform components

Other platform components
- edge server platform
- mapping system
- communications and control system
- data collection and analysis system
- additional systems and services

Edge server platform
Functionality is controlled by metadata:
- origin server location and response to failures
- cache control and indexing
- access control
- header alteration (HTTP)
- EdgeComputing
- performance optimization

Mapping system
A global traffic director:
- uses historic and real-time data about the health of the Akamai network and the Internet
- objective: create maps that are used to direct traffic on the Akamai network in a reliable, efficient, and high-performance manner
- a fault-tolerant distributed platform: runs in multiple independent sites and leader-elects based on the current health status of each site
- two parts: scoring system + real-time mapping

Mapping system
Scoring system: creates the current Internet topology
- collects/processes data: ping, BGP, traceroute
- frequently monitors latency, loss, and connectivity
Real-time mapping: creates the actual maps used to direct end users' requests to the best edge servers; also selects intermediates for tiered distribution and the overlay network
- first step: map to a cluster, based on scoring system info, updated every minute
- second step: map to a server, based on content locality, load changes, etc.
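The two-step map can be sketched as: pick a cluster by score, then pick a server within it. Hashing the URL stands in here for the real content-locality policy (so repeated requests for one object hit the same server and its cache); the cluster names and scores are hypothetical.

```python
from hashlib import md5

def map_request(url, clusters):
    """Step 1: pick the highest-scoring cluster (scoring system output).
    Step 2: pick a server within it by hashing the URL, so requests for
    the same object land on the same server (content locality)."""
    cluster = max(clusters, key=lambda c: clusters[c]["score"])
    servers = clusters[cluster]["servers"]
    idx = int(md5(url.encode()).hexdigest(), 16) % len(servers)
    return servers[idx]

# Hypothetical per-cluster scores, refreshed every minute in the real system:
clusters = {
    "cluster-sg": {"score": 0.93, "servers": ["sg-1", "sg-2", "sg-3"]},
    "cluster-hk": {"score": 0.71, "servers": ["hk-1", "hk-2"]},
}
a = map_request("http://example.com/video.mp4", clusters)
b = map_request("http://example.com/video.mp4", clusters)
print(a == b, a.startswith("sg"))  # True True
```

A production mapper would also weigh load and spill traffic to lower-scoring clusters rather than always taking the maximum.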

Communications and control system
- real-time distribution of status and control information: small real-time messages throughout the network; solution: pub-sub model
- point-to-point RPC and web services
- dynamic configuration updates: quorum-based replication ("another whole paper")
- key management infrastructure
- software/machine configuration management
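The idea behind quorum-based replication of configuration updates is that a write commits only once a majority of replicas acknowledge it, so any later majority read must overlap the write and see it. This is a minimal sketch of that invariant, not Akamai's actual protocol (which, as the slide notes, is a whole paper of its own).

```python
class QuorumConfig:
    """Config value replicated over n nodes; writes need a strict majority,
    and any majority read then overlaps the write quorum and sees the
    latest committed version."""
    def __init__(self, n):
        self.majority = n // 2 + 1
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]

    def write(self, value, reachable):
        if len(reachable) < self.majority:
            return False  # not enough replicas acked; update rejected
        version = max(r["version"] for r in self.replicas) + 1
        for i in reachable:
            self.replicas[i] = {"version": version, "value": value}
        return True

    def read(self, reachable):
        assert len(reachable) >= self.majority, "read quorum not met"
        # The highest version among any majority is the committed value.
        return max((self.replicas[i] for i in reachable),
                   key=lambda r: r["version"])["value"]

cfg = QuorumConfig(5)
ok = cfg.write("edge-config-v2", reachable=[0, 1, 2])    # majority: accepted
bad = cfg.write("edge-config-v3", reachable=[3, 4])      # minority: rejected
print(ok, bad, cfg.read(reachable=[2, 3, 4]))  # overlapping quorum sees v2
```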

Data collection and analysis system
Log collection:
- over 10 million HTTP log lines/sec, 100 TB/day
- compression, aggregation, pipelining, and filtering
- used for reporting and billing
Real-time data collection and monitoring:
- a distributed real-time relational database that supports SQL queries ("another whole paper")
Analytics and reporting:
- enables customers to view traffic and performance
- uses the log and query systems, plus e.g. MapReduce