CS 43: Computer Networks BitTorrent & Content Distribution. Kevin Webb Swarthmore College September 28, 2017


Agenda: BitTorrent (cooperative file transfers). Briefly: Distributed Hash Tables (finding things without central authority). Content distribution networks (CDNs): add hosts to the network to exploit locality. Video streaming (DASH).

File Transfer Problem: You want to distribute a file to a large number of people as quickly as possible.

Traditional Client/Server (figure labels: "Free Capacity", "Heavy Congestion")

P2P Solution

Minimum Distribution Time, client-server vs. P2P example (figure: minimum distribution time, axis from 0 to 3.5, plotted against N (participants), axis from 0 to 35, with one curve for P2P and one for client-server). Let $F$ = file size, $u$ = client upload rate, $u_s$ = server rate, and $d$ = client download rate. Assumptions: $F/u = 1$ hour, $u_s = 10u$, $d_{\min} \ge u_s$.
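The two curves can be reproduced with a short calculation. The sketch below assumes the standard textbook lower bounds on distribution time, which the slide itself does not spell out: client-server is limited by the server uploading all N copies, while P2P is limited by the total upload capacity of server plus peers. The function and parameter names are illustrative only.

```python
def client_server_time(n, file_size, server_rate, min_download_rate):
    """Lower bound when the server alone uploads all n copies."""
    return max(n * file_size / server_rate, file_size / min_download_rate)


def p2p_time(n, file_size, server_rate, min_download_rate, peer_upload_rate):
    """Lower bound when every peer also contributes upload capacity."""
    total_upload = server_rate + n * peer_upload_rate
    return max(
        file_size / server_rate,          # server must send each bit at least once
        file_size / min_download_rate,    # slowest downloader must receive the file
        n * file_size / total_upload,     # n copies through the aggregate upload rate
    )


# Slide assumptions, in units where F = 1 and u = 1, so F/u = 1 hour.
F, u = 1.0, 1.0
u_s = 10 * u
d_min = u_s  # the slide only requires d_min >= u_s; equality is an assumption here

for n in (5, 10, 20, 30):
    print(n, client_server_time(n, F, u_s, d_min), p2p_time(n, F, u_s, d_min, u))
```

Under these assumptions the client-server time grows linearly with N, while the P2P time stays below one hour because each new peer adds upload capacity along with demand.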

P2P Solution (figure label: "Am I helpful?")

Do we need a centralized server at all? Would you use one for something? (Figure label: "Am I helpful?") A. Unnecessary, would not use one. B. Unnecessary, would still use one. C. Necessary, would have to use it. D. Something else.

P2P file distribution: BitTorrent. The file is divided into chunks (commonly 256 KB), and peers in the torrent send and receive file chunks. Tracker: tracks the peers participating in a torrent. Torrent: the group of peers exchanging chunks of a file. When Alice arrives, she obtains a list of peers from the tracker and begins exchanging file chunks with peers in the torrent.

.torrent files: Contain the address of the tracker for the file (where can I find other peers?). Contain a list of file chunks and their cryptographic hashes, which lets a peer detect chunks that have been modified.
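The integrity check described above can be sketched as follows. This is a minimal illustration, not the real .torrent format: it assumes SHA-1 as the chunk hash and uses hypothetical helper names, and it skips the file encoding and tracker address entirely.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # the commonly used chunk size mentioned in the slides


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """What the publisher would store: one hash per chunk."""
    return [hashlib.sha1(c).digest() for c in split_into_chunks(data, chunk_size)]


def verify_chunk(index, chunk, expected_hashes):
    """What a downloader does before accepting a chunk from a peer."""
    return hashlib.sha1(chunk).digest() == expected_hashes[index]


# Tiny demonstration with a 4-byte chunk size so the example stays readable.
original = b"hello, swarm!"
hashes = chunk_hashes(original, chunk_size=4)
chunks = split_into_chunks(original, chunk_size=4)
print(verify_chunk(0, chunks[0], hashes))   # chunk as published
print(verify_chunk(0, b"hELL", hashes))     # tampered chunk
```

Because each chunk is checked on its own, a peer can discard one bad chunk and request it again from someone else without restarting the whole download.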

P2P file distribution: BitTorrent. A peer joining the torrent has no chunks, but will accumulate them over time from other peers. It registers with the tracker to get a list of peers and connects to a subset of them ("neighbors"). While downloading, a peer uploads chunks to other peers. A peer may change the peers with whom it exchanges chunks. Churn: peers may come and go. Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain in the torrent.

Requesting Chunks At any given time, peers have different subsets of file chunks. Periodically, each asks peers for list of chunks that they have.

If you're trying to receive a file, which chunk should you request next? A. Random chunk. B. Most common chunk. C. Least common chunk. D. Some other chunk. E. It doesn't matter.

Requesting Chunks: In BitTorrent, peers request the rarest chunks first.
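Rarest-first selection can be sketched in a few lines. The data layout below (a dictionary from peer name to the set of chunk indices it holds) and the function name are assumptions made for illustration; real clients track this information incrementally.

```python
from collections import Counter


def rarest_first(my_chunks, peer_chunks):
    """Return the missing chunk held by the fewest neighbors, or None.

    my_chunks:   set of chunk indices this peer already has
    peer_chunks: dict mapping peer name -> set of chunk indices that peer has
    """
    counts = Counter()
    for chunks in peer_chunks.values():
        counts.update(chunks)  # each peer contributes one count per chunk it holds
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None
    # Fewest copies first; break ties by lowest index so the result is deterministic.
    return min(candidates, key=lambda c: (counts[c], c))


neighbors = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
print(rarest_first({0}, neighbors))
```

Requesting rare chunks first spreads them through the torrent before the few peers holding them leave.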

Sending Chunks: A node sends chunks to the four peers currently sending it chunks at the highest rate; other peers are choked (they do not receive chunks). The top four are re-evaluated every ~10 seconds. Every 30 seconds, the node randomly selects another peer and starts sending it chunks, "optimistically unchoking" that peer. The newly chosen peer may join the top four.
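The choking rule above can be sketched as two small functions, one for the periodic top-four selection and one for the optimistic unchoke. The names and the rate dictionary are hypothetical, and timers are left out: a real client would call the first roughly every 10 seconds and the second every 30.

```python
import random


def choose_unchoked(download_rates, optimistic=None, top_n=4):
    """Peers we upload to: the top_n fastest senders plus the optimistic pick.

    download_rates: dict mapping peer name -> rate at which that peer sends to us
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:top_n])
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked


def pick_optimistic(download_rates, unchoked, rng=random):
    """Randomly pick one currently choked peer to unchoke optimistically."""
    choked = sorted(p for p in download_rates if p not in unchoked)
    return rng.choice(choked) if choked else None


rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 0}
top_four = choose_unchoked(rates)
lucky = pick_optimistic(rates, top_four, random.Random(0))
print(sorted(top_four), lucky)
```

If the optimistically unchoked peer starts sending quickly in return, it will rank in the top four at the next re-evaluation, which is how new peers get a foothold.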

Academic Interest in BitTorrent: BitTorrent was enormously successful (large user base, lots of aggregate traffic) and was invented relatively recently. Academic projects: modifications to improve performance, modeling peer communications (auctions), and gaming the system (BitTyrant).

Getting rid of that server: Distribute the tracker information using a Distributed Hash Table (DHT). A DHT is a lookup structure that maps keys to an arbitrary value. It works a lot like, well, a hash table.

Recall: Hash Function. A mapping of any data to an integer, e.g., md5sum, sha1, etc. Example md5 value: 04c3416cadd85971a129dd1de86cee49. With a good (cryptographic) hash function, hash values are very likely to be unique, it is near-impossible to find collisions, and hashes are spread out.

Recall: Hash table. There are N buckets. A key-value pair is assigned to bucket i, where i = HASH(key) % N. It is easy to look up a value based on its key. Multiple key-value pairs may be assigned to each bucket.
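As a refresher, the bucket rule i = HASH(key) % N can be written out directly. This is a minimal sketch that assumes MD5 as the hash (one of the examples on the previous slide) and uses illustrative class and function names; it is not how production hash tables are built.

```python
import hashlib


def hash_to_int(key):
    """Map a string to a large integer using MD5."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def bucket_index(key, n_buckets):
    """i = HASH(key) % N"""
    return hash_to_int(key) % n_buckets


class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket is a list, since several pairs can land in the same bucket.
        self.buckets = [[] for _ in range(n_buckets)]

    def put(self, key, value):
        bucket = self.buckets[bucket_index(key, len(self.buckets))]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite the stored value
                return
        bucket.append((key, value))

    def get(self, key):
        bucket = self.buckets[bucket_index(key, len(self.buckets))]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        return None


table = HashTable()
table.put("Led Zeppelin IV", "tracker-peer-list")
print(table.get("Led Zeppelin IV"))
```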

Distributed Hash Table (DHT): A DHT is a distributed P2P database that distributes the (k, v) pairs across the peers. Examples: key = Social Security number, value = human name; key = file name, value = BitTorrent tracker peer(s). It has the same interface as a standard hash table, based on (key, value) pairs. get(key): send the key to the DHT, get back the value. put(key, value): modify the stored value at the given key.

Challenges: How do we assign (key, value) pairs to nodes? How do we find them again quickly? What happens if nodes join or leave? Basic idea: convert each key to an integer via a hash, assign an integer to each peer via a hash, and store each (key, value) pair at the peer closest to the key.
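The basic idea can be sketched as a mapping from key to responsible peer. Two assumptions are made here that the slide leaves open: "closest" is taken to mean the first peer at or after the key's ID on a ring, and the ID space is only 4 bits wide so the numbers match the ring used in the following slides. SHA-1 and the function names are likewise illustrative.

```python
import hashlib
from bisect import bisect_left


def key_to_id(key, bits=4):
    """Hash a key into the ID space 0 .. 2**bits - 1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def responsible_peer(key_id, peer_ids):
    """First peer at or after key_id, wrapping around the end of the ring."""
    ring = sorted(peer_ids)
    return ring[bisect_left(ring, key_id) % len(ring)]


PEERS = [1, 3, 4, 5, 8, 10, 12, 15]
print(responsible_peer(6, PEERS))                      # key ID 6 lands on peer 8
print(responsible_peer(key_to_id("some file"), PEERS))  # whichever peer owns this hash
```

With this rule, a peer joining or leaving only changes the ownership of keys between it and its neighbor on the ring, rather than reshuffling everything.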

Circular DHT Overlay (figure: ring of peers with IDs 1, 3, 4, 5, 8, 10, 12, 15). Simplest form: each peer is only aware of its immediate successor and predecessor.


Circular DHT Overlay (figure: ring of peers with IDs 1, 3, 4, 5, 8, 10, 12, 15, animated over several slides). Example: Node 1 wants the key "Led Zeppelin IV". Hash the key (suppose it gives us 6). The query is passed around the ring from peer to peer. One peer concludes: "If anybody has it, it's my successor." That successor checks the key, and the value data is returned to node 1.
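The walk around the ring can be simulated in a few lines. This is a sketch under the same assumption as the example (the key belongs to the first peer at or after its ID); the function names are hypothetical, and the global sorted list stands in for the successor and predecessor pointers that each real peer would hold locally.

```python
PEERS = [1, 3, 4, 5, 8, 10, 12, 15]


def in_arc(k, a, b):
    """True if k lies in the half-open ring interval (a, b]."""
    if a < b:
        return a < k <= b
    return k > a or k <= b  # the interval wraps past the highest ID


def successor_lookup(start, key_id, peer_ids):
    """Forward the query successor by successor; return the peers visited."""
    ring = sorted(peer_ids)
    n = len(ring)
    succ = {p: ring[(i + 1) % n] for i, p in enumerate(ring)}
    pred = {p: ring[(i - 1) % n] for i, p in enumerate(ring)}
    node, path = start, [start]
    # A peer owns every key ID between its predecessor (exclusive) and itself.
    while not in_arc(key_id, pred[node], node):
        node = succ[node]
        path.append(node)
    return path


path = successor_lookup(1, 6, PEERS)
print(path, "messages:", len(path) - 1)
```

For the slide's example the query visits peers 1, 3, 4, 5 and finally 8, so four forwarding messages are needed before the value can be sent back.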

Given N nodes, what is the complexity (number of messages) of finding a value when each peer knows its successor? A. O(log n) B. O(n) C. O(n^2) D. O(2^n). Can we do better? How?

Reducing Message Count (figure: ring of peers with IDs 1, 3, 4, 5, 8, 10, 12, 15). Store successors that are 1, 2, 4, 8, ..., N/2 away. A query can then jump up to halfway across the ring at once. Each hop cuts the remaining search space in half, so lookups take O(log N) messages.
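A lookup using these shortcut pointers can be sketched as below. The sketch follows the Chord-style convention, which is an assumption beyond what the slide states: distances are measured in the ID space, so the pointer for distance d at peer n is the first peer at or after n + d, and each hop goes to the farthest pointer that does not pass the key. Names are illustrative.

```python
from bisect import bisect_left

BITS = 4  # ID space of 16, matching the ring in the slides
PEERS = [1, 3, 4, 5, 8, 10, 12, 15]


def owner(ident, ring, bits=BITS):
    """First peer at or after ident, wrapping around the ring."""
    ident %= 2 ** bits
    return ring[bisect_left(ring, ident) % len(ring)]


def in_arc(k, a, b, closed_end):
    """True if k lies clockwise after a and before b (b included if closed_end)."""
    if a < b:
        return a < k < b or (closed_end and k == b)
    return k > a or k < b or (closed_end and k == b)


def finger_lookup(start, key_id, peers, bits=BITS):
    """Return the peers visited when each peer knows successors 1, 2, 4, ... away."""
    ring = sorted(peers)
    node, path = start, [start]
    while True:
        successor = owner(node + 1, ring, bits)
        if in_arc(key_id, node, successor, closed_end=True):
            path.append(successor)  # the successor is responsible for the key
            return path
        fingers = [owner(node + 2 ** i, ring, bits) for i in range(bits)]
        # Jump to the farthest pointer that still precedes the key.
        node = next(
            (f for f in reversed(fingers) if in_arc(f, node, key_id, closed_end=False)),
            successor,
        )
        path.append(node)


path = finger_lookup(1, 6, PEERS)
print(path, "messages:", len(path) - 1)
```

For the running example the query goes from peer 1 straight to peer 5 and then to peer 8: two messages instead of four. The cost is that each peer stores about log N pointers instead of two.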


More DHT Info: How do nodes join/leave? How does cryptographic hashing work? How much state does each node store? Further reading: "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications"; "Dynamo: Amazon's Highly Available Key-value Store".

High-Performance Content Distribution Problem: You have a service that supplies lots of data. You want good performance for all users! (often lots of data means media files)

High-Performance Content Distribution: CDNs are applied to all sorts of traffic. You pay for the service (e.g., Akamai), and they'll host your content very close to many users. Major challenges: How do we direct the user to a nearby replica instead of the centralized source? How do we determine which replica is the best one to send them to?

Finding the CDN: Three main options: application redirect (e.g., HTTP), anycast routing, and DNS resolution (most popular in practice). Example: CNN + Akamai.

CNN + Akamai (figure, built up over several slides: the DNS hierarchy with root DNS servers; com, org, and edu DNS servers; cnn.com, pbs.org, swarthmore.edu, and akamai.net DNS servers; the www.cnn.com web server; and content servers). Request: cnn.com/article. Response: HTML with a link to cache.cnn.com media. Akamai's DNS response directs the user to the selected server (how to choose?). The user then retrieves the media file. Content servers: serve media.

Which metric is most important when choosing a server (CDN or otherwise)? A. RTT latency B. Data transfer rate / throughput C. Hardware ownership D. Geographic location E. Some other metric(s) (such as?). This is the CDN operator's secret sauce!
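To make the trade-off concrete, here is one possible selection rule. It is purely hypothetical, since real CDN policies are proprietary as noted above: it prefers the lowest measured RTT and breaks ties by higher throughput. The replica names and measurements are invented for the example.

```python
def choose_replica(measurements):
    """Pick a replica by lowest RTT, then by highest throughput.

    measurements: dict mapping replica name -> (rtt_ms, throughput_mbps)
    """
    return min(
        measurements,
        key=lambda name: (measurements[name][0], -measurements[name][1]),
    )


# Hypothetical measurements as seen from one client.
replicas = {
    "east": (20, 100),
    "west": (80, 500),
    "eu": (20, 300),
}
print(choose_replica(replicas))
```

A different weighting (for example, favoring throughput for large video files) would pick a different replica from the same measurements, which is why there is no single right answer to the question.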

Streaming Media: The straightforward approach is a simple GET. Challenges: dynamic network characteristics, varying user device capabilities, and user mobility.

Dynamic Adaptive Streaming over HTTP (DASH): Encode several versions of the same media file (low / medium / high / ultra quality). Break each file into chunks. Create a manifest to map file versions to chunks / video time offsets.

Dynamic Adaptive Streaming over HTTP (DASH): The client requests the manifest file and chooses a version. It requests new chunks as it plays existing ones, and it can switch between versions at any time!
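The client-side choice can be sketched as a simple rate-based rule. Everything concrete here is an assumption for illustration: the manifest is reduced to a dictionary of version name to bitrate, the bitrates are invented, and the 0.8 safety factor is an arbitrary margin. Real players use richer manifests and also consider buffer occupancy.

```python
def choose_version(bitrates_kbps, throughput_kbps, safety=0.8):
    """Pick the highest-bitrate version that fits the measured throughput."""
    budget = throughput_kbps * safety
    affordable = [name for name, rate in bitrates_kbps.items() if rate <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest-quality version.
        return min(bitrates_kbps, key=bitrates_kbps.get)
    return max(affordable, key=bitrates_kbps.get)


def plan_playback(bitrates_kbps, throughput_samples_kbps):
    """Choose a version for each chunk from the throughput seen before it."""
    return [
        (chunk_index, choose_version(bitrates_kbps, throughput))
        for chunk_index, throughput in enumerate(throughput_samples_kbps)
    ]


# Hypothetical manifest: version name -> bitrate in kbit/s.
manifest = {"low": 500, "medium": 1500, "high": 4000, "ultra": 8000}
print(plan_playback(manifest, [12000, 6000, 2000, 300]))
```

Because every chunk is an independent HTTP request, the client can drop to a lower version when the network degrades and climb back up later without restarting playback.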

Summary: Peer-to-peer architectures for high performance (BitTorrent) and decentralized lookup (DHTs). CDNs: locating a good replica for a media server. DASH: streaming despite dynamic conditions.