Second Half of the Course Content Distribu-on Networks (CDNs) Mike Freedman COS 461: Computer Networks h@p://www.cs.princeton.edu/courses/archive/spr14/cos461/ Applica-on case studies Content distribu-on, peer- to- peer systems and distributed hash tables (DHTs), and overlay networks Network case studies Enterprise, wireless, cellular, datacenter, and backbone networks; sooware- defined networking Network security Securing communica-on protocols Interdomain rou-ng security 2 Single Server, Poor Performance Skewed Popularity of Web Traffic Zipf or power- law distribu-on Single Popular content Single point of failure Popular site Easily overloaded Flash crowd (aka Far from most clients Slashdot effect ) Denial of Service a@ack Characteris*cs of WWW Client- based Traces Carlos R. Cunha, Azer Bestavros, Mark E. Crovella, BU- CS- - 01 4 1
Caches 6 Web Caching client request response request response request response origin client 6 Forward Cache close to the client Under administra-ve control of client- side AS Explicit proxy Requires configuring browser client request response request response client Implicit proxy Service provider deploys an on path proxy that intercepts and handles Web requests Reverse Cache close to Either by proxy run by or in third- party content distribu-on network (CDN) Direc-ng clients to the proxy Map the site name to the IP address of the proxy request response request response origin origin 2
Servers... Router Requests Google Design Reverse Client Data Centers Private Backbone Internet Client Reverse Client Router Servers Caches (A) Forward (B) Reverse (C) Both (D) Neither Reac-vely replicates popular content Reduces origin costs Reduces client ISP costs Intelligent load balancing between origin s Offload form submissions (POSTs) and user auth Content reassembly or transcoding on behalf of origin Smaller round- trip -mes to clients Maintain persistent connec-ons to avoid TCP setup delay (handshake, slow start) 10 Caches (A) Forward (B) Reverse (C) Both (D) Neither Reac-vely replicates popular content (C) Reduces origin costs (C) Reduces client ISP costs (A) Intelligent load balancing between origin s (B) Offload form submissions (POSTs) and user auth (D) Content reassembly, transcoding on behalf of origin (C) Smaller round- trip -mes to clients (C) Maintain persistent connec-ons to avoid TCP setup delay (handshake, slow start) (C) 11 Limita-ons of Web Caching Much content is not cacheable Dynamic data: stock prices, scores, web cams CGI scripts: results depend on parameters Cookies: results may depend on passed data SSL: encrypted data is not cacheable Analy-cs: owner wants to measure hits Stale data Or, overhead of refreshing the cached data 12
Modern Video- on- Demand Download content manifest from origin List of video segments belonging to video Each segment 1-2 seconds in length Client can know -me offset associated with each Standard naming for different video resolu-ons and formats: e.g., 20dpi, 20dpi, 1040dpi, Client downloads video segment (at certain resolu-on) using standard request. request can be sa-sfied by cache: it s a sta-c object Client observes download -me vs. segment dura-on, increases/decreases resolu-on if appropriate 1 Content Distribu-on Networks 14 Content Distribu-on Network Proac-ve content replica-on Content provider (e.g., CNN) contracts with a CDN CDN replicates the content On many s spread throughout the Internet Upda-ng the replicas Updates pushed to replicas when the content changes origin in North America CDN distribution node CDN CDN in S. America CDN in Asia in Europe Live For availability Server Selec-on Policy Requires con*nuous monitoring of liveness, load, and performance Lowest load To balance load across the s Closest Nearest geographically, or in round- trip -me Best performance Throughput, latency, Cheapest bandwidth, electricity, 1 16 4
Server Selec-on Mechanism Server Selec-on Mechanism Applica-on redirec-on GET Redirect GET OK Advantages Fine- grain control Selec-on based on client IP address Disadvantages Extra round- trips for TCP connec-on to Overhead on the Rou-ng Anycast rou-ng 1.2..0/24 1.2..0/24 Advantages No extra round trips Route to nearby Disadvantages Does not consider network or load Different packets may go to different s Used only for simple request- response apps 1 1 Server Selec-on Mechanism Naming DNS- based selec-on 1.2..4 Advantages Avoid TCP set- up delay DNS caching reduces overhead Rela-vely fine control How Works DNS query local 1.2.. Disadvantage Based on IP address of local Hidden load effect DNS TTL limits adapta-on 1 20
Sta-s-cs How Uses DNS 22 Distributed s Servers: ~100,000 Networks: ~1,000 Countries: ~0 Many customers Apple, BBC, FOX, GM IBM, MTV, NASA, NBC, NFL, NPR, Puma, Red Bull, Rutgers, SAP, Client requests 20+M per second Half in the top 4 networks 20% of all Web traffic worldwide 21 GET index. html http://cache.cnn.com/ foo.jpg DNS root global regional How Uses DNS 2 How Uses DNS 24 DNS lookup cache.cnn.com 4 ALIAS: g.akamai.net global regional ALIAS a.g.akamai.net DNS lookup g.akamai.net global regional 6
How Uses DNS 2 How Uses DNS 26 DNS a.g.akamai.net Address 1.2..4 global regional GET /foo.jpg Host: cache.cnn.com global regional How Uses DNS 2 How Uses DNS 2 GET foo.jpg 11 12 global regional 11 12 global regional GET /foo.jpg Host: cache.cnn.com 10
How Works: Cache Hit 2 Mapping System Equivalence classes of IP addresses IP addresses experiencing similar performance Quan-fy how well they connect to each other 4 global regional Collect and combine measurements Ping, traceroute, BGP routes, logs E.g., over 100 TB of logs per days Network latency, loss, and connec-vity 6 0 Mapping System Map each IP class to a preferred Based on performance, health, etc. Updated roughly every minute Map client request to a in the Load balancer selects a specific E.g., to maximize the cache hit rate Adap-ng to Failures Failing hard drive on a Suspends aoer finishing in progress requests Failed Another takes over for the IP address Low- level map updated quickly Failed High- level map updated quickly Failed path to customer s origin Route packets through an intermediate node 1 2
Transport Op-miza-ons Bad Internet routes Overlay rou-ng through an intermediate Packet loss Sending redundant data over mul-ple paths TCP connec-on set- up/teardown Pools of persistent connec-ons TCP conges-on window and round- trip -me Es-mates based on network latency measurements Applica-on Op-miza-ons Slow download of embedded objects Prefetch when HTML page is requested Large objects Content compression Slow applica-ons Moving applica-ons to edge s E.g., content aggrega-on and transforma-on E.g., sta-c databases (e.g., product catalogs) 4 Conclusion Content distribu-on is hard Many, diverse, changing objects Clients distributed all over the world Reducing latency is king Contribu-on distribu-on solu-ons Reac-ve caching Proac-ve content distribu-on networks