Computer Networking, Lecture 16: Delivering Content: Peer to Peer and CDNs. Peter Steenkiste

Transcription:

5-44 / 5-64 Computer Networking
Lecture 16: Delivering Content: Peer to Peer and CDNs
Peter Steenkiste
Fall 06
www.cs.cmu.edu/~prs/5-44-f6

Overview
- Web
- Consistent hashing
- Peer-to-peer
  - Motivation
  - Architectures
  - Discussion
- CDN
- Video

Distributing Load across Servers
- Given a document, we need to choose a server to use
  - E.g., in a data center
- Suppose we use simple hashing: modulo n of a hash of the name of the document
  - Number the n servers
  - Place the document on server (hash mod n)
- What happens when a server fails? n -> n-1
  - Same if different people have different measures of n
- Why might this be bad?

Consistent Hash: Goals
- View = subset of all hash buckets that are candidate locations
  - Each bucket corresponds to a real server
- Desired features
  - Load: all hash buckets have a similar number of objects assigned to them
  - Smoothness: little impact on hash bucket contents when buckets are added or removed
  - Spread: only a small set of hash buckets may hold an object, regardless of views
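The question "Why might this be bad?" about simple modulo hashing can be checked numerically. The following is a minimal sketch, assuming SHA-1 as the hash function and hypothetical document names (`doc-0.html`, ...) and server counts (10, then 9); none of these come from the slides. It counts how many documents map to a different server index after one server fails.

```python
import hashlib


def stable_hash(name: str) -> int:
    """Deterministic integer hash of a document name (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def server_for(name: str, n: int) -> int:
    """Simple hashing: index (0..n-1) of the server that holds the document."""
    return stable_hash(name) % n


def fraction_moved(names, n_before: int, n_after: int) -> float:
    """Fraction of documents whose server index changes when n changes."""
    changed = sum(
        1 for d in names if server_for(d, n_before) != server_for(d, n_after)
    )
    return changed / len(names)


# Hypothetical workload: 10,000 documents, 10 servers, then one server fails.
docs = [f"doc-{i}.html" for i in range(10_000)]
moved = fraction_moved(docs, 10, 9)
print(f"{moved:.0%} of documents change servers when n goes from 10 to 9")
```

With modulo hashing, most documents end up on a different server after a single failure, so most cached copies become useless at once. This is the behavior the smoothness goal of consistent hashing is meant to avoid.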

Consistent Hash Example
- Construction
  - Assign each of C hash buckets to random points on a mod 2^n circle, where hash key size = n
  - Map object to random position on unit interval
  - Hash of object = closest bucket
- Monotone: addition of a bucket does not cause movement between existing buckets
- Spread & Load: small set of buckets that lie near object
- Balance: no bucket is responsible for a large number of objects
[Figure: hash circle with buckets placed at random points.]

Consistent Hashing: Ring
- Use consistent hashing to map both keys and nodes to an m-bit identifier in the same (metric) identifier space
  - For example, use SHA-1 hashes
  - Node identifier: SHA-1 hash of IP address
  - Key identifier: SHA-1 hash of key
- Also need a rule for assigning keys to nodes
  - For example: closest, higher, lower, ..
[Figure: IP="98.0.0." -> SHA-1 -> node ID; Key="LetItBe" -> SHA-1 -> ID=60.]

Consistent Hashing Example
- Rule: a key is stored at its successor, the node with the next higher or equal ID
[Figure: circular 7-bit ID space with nodes (e.g., N90) and keys (e.g., K5, K60 for Key="LetItBe") placed on the ring.]

Consistent Hashing Properties
- Load balance: all nodes receive roughly the same number of keys
  - For N nodes and K keys, with high probability each node holds at most (1 + ε)K/N keys
  - Provided that K is large compared to N
- When a server is added, it receives its initial work load from neighbors on the ring
  - Local operation: no other servers are affected
  - Similar property when a server is removed
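The successor rule and the "local operation" property can be illustrated with a small ring. This is a sketch under stated assumptions, not the lecture's implementation: the class name `Ring`, the helper `sha1_id`, and the node and key IDs used below (32, 90, 123, 70 and 5, 20, 60, 101, 125) are hypothetical values chosen for illustration.

```python
import bisect
import hashlib


def sha1_id(value: str, m_bits: int) -> int:
    """Map a string (IP address or key) to an m-bit identifier using SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << m_bits)


class Ring:
    """Nodes on a circular m-bit ID space; a key is stored at its successor."""

    def __init__(self, m_bits: int = 7):
        self.m_bits = m_bits
        self.node_ids = []  # kept sorted in increasing order

    def add_node(self, node_id: int) -> None:
        if not 0 <= node_id < (1 << self.m_bits):
            raise ValueError("node ID outside the identifier space")
        if node_id in self.node_ids:
            raise ValueError("node ID already on the ring")
        bisect.insort(self.node_ids, node_id)

    def remove_node(self, node_id: int) -> None:
        self.node_ids.remove(node_id)

    def successor(self, key_id: int) -> int:
        """First node with ID >= key_id, wrapping around past the largest ID."""
        if not self.node_ids:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_left(self.node_ids, key_id)
        return self.node_ids[i % len(self.node_ids)]


ring = Ring(m_bits=7)
for node_id in (32, 90, 123):  # hypothetical node IDs
    ring.add_node(node_id)

key_ids = [5, 20, 60, 101, 125]  # hypothetical key IDs
before = {k: ring.successor(k) for k in key_ids}

ring.add_node(70)  # a new server joins between nodes 32 and 90
after = {k: ring.successor(k) for k in key_ids}
moved = [k for k in key_ids if before[k] != after[k]]
print("keys that changed node:", moved)
```

Only keys between the new node and its predecessor change owner; every other key stays where it was. In a real deployment the IDs would come from something like `sha1_id(ip, m_bits)` with a much larger `m_bits`, so that collisions between node IDs are unlikely.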

Finer Grain Load Balancing
- Redirector knows all server IDs s_i
- It can also track approximate load for more precise load balancing
  - Need to define load and be able to track it
- To balance load:
  - W_i = Hash(URL, ip of s_i) for all i
  - Sort W_i from high to low
  - Find first server with low enough load
- Benefits and drawbacks?

Consistent Hashing Used in Many Contexts
- Distribute load across servers in a data center
  - The redirector sits in the data center
- Finding the storage cluster for an object in a CDN uses centralized knowledge
  - Why?
  - Can use consistent hashing in the cluster
- Consistent hashing can also be used in a distributed setting
  - P2P systems can use it to find files (DHTs)

Scaling Problem
- Millions of clients -> server and network meltdown

P2P System
- Leverage the resources of client machines (peers)
  - Computation, storage, bandwidth
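The finer-grain load balancing steps (compute W_i = Hash(URL, ip of s_i), sort from high to low, take the first server with low enough load) can be sketched as follows. The function names, the use of SHA-1, the load scale (0.0 to 1.0), the `max_load` threshold, and the IP addresses are all assumptions made for illustration; the slides leave the definition of load open.

```python
import hashlib


def weight(url: str, server_ip: str) -> int:
    """W_i = Hash(URL, ip of s_i); SHA-1 over both values is an assumption."""
    data = f"{url}|{server_ip}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def pick_server(url: str, loads: dict, max_load: float):
    """Return the highest-weight server whose load is at most max_load.

    loads maps server IP -> current load estimate tracked by the redirector.
    Returns None when every server is above the threshold.
    """
    ranked = sorted(loads, key=lambda ip: weight(url, ip), reverse=True)
    for ip in ranked:
        if loads[ip] <= max_load:
            return ip
    return None


# Hypothetical servers and load estimates.
loads = {"10.0.0.1": 0.2, "10.0.0.2": 0.9, "10.0.0.3": 0.5}
chosen = pick_server("http://example.com/a.html", loads, max_load=0.8)
print("serve from", chosen)
```

Because the ranking depends only on the URL and the server addresses, every request for the same URL prefers the same server while it is lightly loaded, and falls back to the same second choice when it is not. The cost is that the redirector has to track load and compute one hash per server for each request.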

Why P2P?
- Harness lots of spare capacity
  - Big Fast Server: Gbit/s, $0k/month++
  - ,000 cable modems: Gbit/s, $??
  - M end-hosts: Uh, wow.
  - Capacity grows with the number of users!
- Build very large-scale, self-managing systems
  - Same techniques useful for companies and P2P apps
    - E.g., Akamai's 4,000+ nodes, Google's 00,000+ nodes
  - Many differences to consider
    - Servers versus arbitrary nodes
    - Hard state (backups!) versus soft state (caches)
    - Security, fairness, freeloading, ..

Common P2P Framework
[Figure: peers N1 to N6 connected over the Internet. Operations shown: Join (new peer), Publish (Key="title", Value=MP3 data), Search (client issues Lookup("title")), Fetch content.]

What is (was) out there?

              Central      Flood      Supernode flood   Route
Whole File    Napster      Gnutella                     Freenet
Chunk Based   BitTorrent              KaZaA             DHTs, eDonkey2000

When are P2P Useful?
- Works well for caching and soft-state, read-only data
  - Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
- Difficult to extend to persistent data
  - Nodes come and go: need to create multiple copies for availability and replicate more as nodes leave
- Not appropriate for search-engine-style searches
  - Complex intersection queries ("the" + "who"): billions of hits for each term alone
  - Sophisticated ranking: must compare many results
  - Search time tends to be unpredictable

Overview
- Web
- Consistent hashing
- Peer-to-peer
- CDN
  - Motivation
  - Edge servers
  - Content delivery
  - Mapping
  - Impact on Internet
- Video
Some slides based on presentation by Patrick Gilmore

Content Distribution Networks (CDNs)
- The content providers are the CDN customers
- Content replication
  - CDN company installs hundreds of CDN servers throughout the Internet
    - Close to users
  - CDN replicates its customers' content in CDN servers
  - When the provider updates content, the CDN updates its servers
[Figure: origin server in North America feeds a CDN distribution node, which serves CDN servers in S. America, Europe, and Asia.]

Case Study on Reliability and Scalability: The 2000 Election
http://www.akamai.com/html/technology/nui/news/index.html
[Figure: customer visits (millions) over time, with the "crash zone" marked.]
Without Akamai this site could not have served customers above their crash zone.

What is the CDN?
- Edge caches: work with ISPs and networks everywhere to install edge caches
  - Edge = close to customers
- Content delivery: getting content to the edge caches
  - Content can be objects, video, or entire web sites
- Mapping: find the closest edge server for each user and deliver content from that server
  - Network proximity is not the same as geographic proximity
  - Focus is on performance as observed by the user (quality)

Potential Benefits
- Very good scalability
  - Near infinite if deployed properly
- Good economies at large scales
  - Infrastructure is shared efficiently by customers
  - Statistical multiplexing: hot sites use more resources
- Can reduce latency, more predictable performance
  - Through mapping to closest server
  - Avoids congestion and long latencies
- Can be extremely reliable
  - Very high degree of redundancy
  - Can mitigate some DoS attacks

Edge Caches
- Region: set of caches managed as a cluster
  - May have a specific function: http, streaming, ...
- Availability is a major concern in architecture design
  - Redundancy at the network level
    - See next slide
  - Dealing with server failures
    - Servers do fail occasionally
    - Each server has a buddy which is constantly trading hellos
    - If hellos stop, the buddy starts to respond directly to requests for the primary server
    - Users in the middle of a download may have to hit reload
    - No one else will notice any interruption

Example Configuration
[Figure: example region configuration.]

Content Delivery: Possible Bottlenecks
[Figure: path between the host server and the end user across the Internet, with four bottlenecks marked: first mile problem, backbone problem, peering problem, last mile problem.]

Process Flow
1. User wants to download distributed web content
2. User is directed through Akamai's dynamic mapping to the closest edge cache
3. Edge cache searches its local hard drive for the content
3b. If the requested object is not on the local hard drive, the edge cache checks other edge caches in the same region for the object. If the requested object is not cached or not fresh, the edge cache sends an HTTP GET to the origin server
3c. Origin server delivers the object to the edge cache over an optimized connection
4. Edge server delivers the content to the end user

Core Hierarchy Regions
1. User requests content and is mapped to the optimal edge Akamai server
2. If the content is not present in the region, it is requested from the most optimal core region
3. Core region makes one request back to the origin server
4. Core region can serve many edge regions with one request to the origin server

Core Hierarchy Features
- Reduces traffic back to origin server
  - Reduces infrastructure needs of customer
  - Provides best protection against flash crowds
  - Especially important for large files (e.g., operating system updates or video files)
- Improved end-user response time
  - Core regions are well connected
  - Optimized connection speeds object delivery
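The edge-cache process flow (local disk first, then other caches in the same region, then the origin server) can be sketched as a small lookup chain. This is an illustrative model only: the class `EdgeCache`, the fixed time-to-live used to decide freshness, and the `origin` callable standing in for an HTTP GET are assumptions, not Akamai's actual design.

```python
import time


class EdgeCache:
    """Toy edge cache: local store, then region peers, then the origin."""

    def __init__(self, name, origin, ttl=60.0, clock=time.monotonic):
        self.name = name
        self.origin = origin  # callable url -> content (stands in for HTTP GET)
        self.ttl = ttl        # assumed freshness lifetime in seconds
        self.clock = clock
        self.store = {}       # url -> (content, time fetched from origin)
        self.region = []      # other edge caches in the same region

    def _fresh_entry(self, url):
        """Return the cached (content, fetched_at) pair if present and fresh."""
        entry = self.store.get(url)
        if entry is None or self.clock() - entry[1] > self.ttl:
            return None
        return entry

    def get(self, url):
        """Return (content, source) where source is local, region, or origin."""
        entry = self._fresh_entry(url)
        if entry is not None:
            return entry[0], "local"
        for peer in self.region:
            entry = peer._fresh_entry(url)
            if entry is not None:
                self.store[url] = entry  # keep the original fetch time
                return entry[0], "region"
        content = self.origin(url)
        self.store[url] = (content, self.clock())
        return content, "origin"


origin_hits = []


def origin(url):
    """Hypothetical origin server that records every request it receives."""
    origin_hits.append(url)
    return f"<content of {url}>"


a = EdgeCache("edge-a", origin)
b = EdgeCache("edge-b", origin)
a.region, b.region = [b], [a]

sources = [a.get("/index.html")[1], a.get("/index.html")[1], b.get("/index.html")[1]]
print(sources, "origin requests:", len(origin_hits))
```

In this run the origin is contacted once, and the second cache in the region is filled from its neighbor. A core region in the hierarchy plays the same role one level up: many edge regions can be served from a single request back to the origin.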