Peer-to-Peer Protocols and Systems. TA: David Murray Spring /19/2006

Similar documents
Outline. Peer-to-Peer. P2p file-sharing. Wither p2p? What s out there? The p2p challenge C1: Search(human s goals) -> file

Scaling Problem Computer Networking. Lecture 23: Peer-Peer Systems. Fall P2P System. Why p2p?

Scaling Problem Millions of clients! server and network meltdown. Peer-to-Peer. P2P System Why p2p?

Scaling Problem Millions of clients! server and network meltdown. Peer-to-Peer. P2P System Why p2p?

Lecture 21 P2P. Napster. Centralized Index. Napster. Gnutella. Peer-to-Peer Model March 16, Overview:

416 Distributed Systems. Mar 3, Peer-to-Peer Part 2

Reti P2P per file distribution: BitTorrent

Miscellaneous. Name Service. Examples (cont) Examples. Name Servers Partition hierarchy into zones. Domain Naming System

Special Topics: CSci 8980 Edge History

Peer-to-Peer Applications Reading: 9.4

Overview Computer Networking Lecture 17: Delivering Content Peer to Peer Examples Peter Steenkiste

Flooded Queries (Gnutella) Centralized Lookup (Napster) Routed Queries (Freenet, Chord, etc.) Overview N 2 N 1 N 3 N 4 N 8 N 9 N N 7 N 6 N 9

EE 122: Peer-to-Peer (P2P) Networks. Ion Stoica November 27, 2002

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

Scalable overlay Networks

CPS 214: Computer Networks and Distributed Systems Networked Environments: Grid and P2P systems

CS 3516: Advanced Computer Networks

EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Overlay Networks: Motivations

CS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications

EE 122: Peer-to-Peer Networks

Peer-to-Peer Internet Applications: A Review

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste

Page 1. How Did it Start?" Model" Main Challenge" CS162 Operating Systems and Systems Programming Lecture 24. Peer-to-Peer Networks"

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

12/5/16. Peer to Peer Systems. Peer-to-peer - definitions. Client-Server vs. Peer-to-peer. P2P use case file sharing. Topics

Overlay Networks: Motivations. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Motivations (cont d) Goals.

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

Peer to Peer Systems and Probabilistic Protocols

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Content Overlays. Nick Feamster CS 7260 March 12, 2007

CIS 700/005 Networking Meets Databases

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

15-744: Computer Networking P2P/DHT

Distributed Systems Peer to Peer

Content Search. Unstructured P2P. Jukka K. Nurminen

CSCI-1680 P2P Rodrigo Fonseca

Peer-to-Peer Systems. Internet Computing Workshop Tom Chothia

CSE 486/586 Distributed Systems Peer-to-Peer Architectures

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Content Search. Unstructured P2P

CSE 124 Finding objects in distributed systems: Distributed hash tables and consistent hashing. March 8, 2016 Prof. George Porter

Telematics Chapter 9: Peer-to-Peer Networks

Introduction to P2P Computing

DISTRIBUTED SYSTEMS CSCI 4963/ /4/2015

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Making Gnutella-like P2P Systems Scalable

Distributed Systems Peer to Peer

Lecture 8: Application Layer P2P Applications and DHTs

P2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems

Introduction on Peer to Peer systems

P2P Computing. Nobuo Kawaguchi. Graduate School of Engineering Nagoya University. In this lecture series. Wireless Location Technologies

Peer-peer and Application-level Networking. CS 218 Fall 2003

Peer-to-peer systems and overlay networks

Distributed Systems Peer to Peer

Last Lecture SMTP. SUNY at Buffalo; CSE 489/589 Modern Networking Concepts; Fall 2010; Instructor: Hung Q. Ngo 1

CPSC156a: The Internet Co-Evolution of Technology and Society

CS 3516: Computer Networks

Scalable overlay Networks

Ossification of the Internet

Overlay and P2P Networks. Unstructured networks: Freenet. Dr. Samu Varjonen

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Introduction to Peer-to-Peer Systems

Distributed Systems. peer-to-peer Johan Montelius ID2201. Distributed Systems ID2201

Distributed Systems Peer to Peer

Distributed Knowledge Organization and Peer-to-Peer Networks

Architectures for Distributed Systems

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..

INF5071 Performance in distributed systems: Distribution Part III

Peer-to-Peer Networks

Early Measurements of a Cluster-based Architecture for P2P Systems

Peer-to-Peer Systems

Implementation of the Gnutella Protocol

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli

A Survey of Peer-to-Peer Content Distribution Technologies

Main Challenge. Other Challenges. How Did it Start? Napster. Model. EE 122: Peer-to-Peer Networks. Find where a particular file is stored

CS December 2017

Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

Peer-to-Peer Systems and Distributed Hash Tables

Searching for Shared Resources: DHT in General

Peer to Peer Networks

PEER-TO-PEER NETWORKS, DHTS, AND CHORD

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza

Peer-to-Peer (P2P) Systems

Scalable overlay Networks

Distributed Systems. 25. Distributed Caching & Some Peer-to-Peer Systems Paul Krzyzanowski. Rutgers University. Fall 2017

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

Searching for Shared Resources: DHT in General

Adaptive Routing of QoS-Constrained Media Streams over Scalable Overlay Topologies

Today. Architectural Styles

Extreme Computing. BitTorrent and incentive-based overlay networks.

*Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)

Peer to Peer Networks

Advanced Computer Networks

Peer to Peer I II 1 CS 138. Copyright 2015 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved.

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd

Transcription:

Peer-to-Peer Protocols and Systems TA: David Murray 15-441 Spring 2006 4/19/2006

P2P - Outline What is P2P? P2P System Types 1) File-sharing 2) File distribution 3) Streaming Uses & Challenges 2

Problem: Scalability Hundreds of clients => 1 server OK Thousands of clients => 1 server Maybe OK Millions/billions of clients => 1 server What happens?... 3

4

Solution: Distribute the cost among the end users 5

Three Classes of P2P Systems 1) File-sharing (old) Napster (centralized) Gnutella (flooding) KaZaA (intelligent flooding) DHTs/Chord (structured overlay routing) 2) File distribution BitTorrent 3) Streaming End System Multicast (a.k.a. Overlay Multicast) 6

1) P2P File-sharing Systems

1) P2P File-sharing Systems Centralized Database (old) Napster Query Flooding Gnutella Intelligent Query Flooding KaZaA Structured Overlay Routing Distributed Hash Tables 8

File searching N 1 N 2 N3 Key= title Value=MP3 data Publisher Internet? Client Lookup( title ) N 4 N5 N 6 9

File searching Needles vs. Haystacks Searching for top 40, or an obscure punk track from 1981 that nobody s heard of? Search expressiveness Whole word? Regular expressions? File names? Attributes? Whole-text search? (e.g., p2p gnutella or p2p google?) 10

File-sharing: Framework Common Primitives: Join: how do I begin participating? Publish: how do I advertise my file? Search: how do I find a file? Fetch: how do I retrieve a file? 11

P2P File-sharing Systems Centralized Database (old) Napster Query Flooding Gnutella Intelligent Query Flooding KaZaA Structured Overlay Routing Distributed Hash Tables 12

(old) Napster: History 1999: Sean Fanning launches Napster Peaked at 1.5 million simultaneous users Jul 2001: Napster shuts down [2003: Napster s name reused for an online music service (no relation)] 13

(old) Napster: Overview Centralized Database: Join: on startup, client contacts central server Publish: reports list of files to central server Search: query the server => return someone that stores the requested file Fetch: get the file directly from peer 14

Napster: Publish insert(xenathemesong.mp3, 123.2.21.23) Insert(HerculesThemeSong.mp3, 123.2.21.23)... Publish I have lots of TV show theme song files! 123.2.21.23 15

Napster: Search Where is the Xena Theme song? Fetch Query search(xena) Reply 16

Napster: Discussion Pros: Simple Search scope is O(1) Controllable (pro or con?) Cons: Server maintains O(N) State Server does all processing Single point of failure 17

P2P File-sharing Systems Centralized Database (old) Napster Query Flooding Gnutella Intelligent Query Flooding KaZaA Structured Overlay Routing Distributed Hash Tables 18

Gnutella: History In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella Soon many other clients: Bearshare, Morpheus, LimeWire, etc. In 2001, many protocol enhancements including ultrapeers 19

Gnutella: Overview Query Flooding: Join: on startup, client contacts a few other nodes; these become its neighbors Publish: (no need) Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender. TTL limits propagation Fetch: get the file directly from peer 20

Gnutella: Search I have XenaEpisode1.mpg I have XenaEpisode1.mpg Reply Query Where is XenaEpisode1.mpg? 21

Gnutella: Discussion Pros: Fully de-centralized Search cost distributed Processing @ each node permits powerful search semantics Cons: Search scope is O(N) Search time is O(???) Nodes leave often, network unstable TTL-limited search works well for haystacks. For scalability, does NOT search every node. May have to re-issue query later 22

P2P File-sharing Systems Centralized Database (old) Napster Query Flooding Gnutella Intelligent Query Flooding KaZaA Structured Overlay Routing Distributed Hash Tables 23

KaZaA: History In 2001, KaZaA created by Dutch company Kazaa BV Single network called FastTrack used by other clients as well: Morpheus, gift, etc. Eventually protocol changed so other clients could no longer talk to it 24

KaZaA: Overview Smart Query Flooding: Join: on startup, client contacts a supernode... may at some point become one itself Publish: send list of files to supernode Search: send query to supernode, supernodes flood query amongst themselves. Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers 25

KaZaA: Network Design Super Nodes 26

KaZaA: File Insert insert(xena.mp3, Publish 123.2.21.23) Insert(Hercules.mp3, 123.2.21.23) 123.2.21.23 I have lots of files! 27

KaZaA: File Search search(xena.mp3) --> 123.2.22.50 123.2.22.50 Query Replies search(xena.mp3) --> 123.2.0.18 Where is Xena.mp3? 123.2.0.18 28

KaZaA: Fetching More than one node may have requested file... How to tell? Must be able to distinguish identical files Not necessarily same filename Same filename not necessarily same file... Use Hash of file KaZaA uses its own UUHash : fast, but not secure Alternatives: MD5, SHA-1 How to fetch? Get bytes [0..1000] from A, [1001...2000] from B Alternative: Erasure Codes 29

KaZaA: Discussion Pros: Tries to take into account node heterogeneity: Bandwidth Host Computational Resources Host Availability (?) Rumored to take into account network locality Cons: Mechanisms easy to circumvent Still no real guarantees on search scope or search time Similar behavior to Gnutella, but better. 30

Stability and Superpeers Why supernodes? Query consolidation Many connected nodes may have only a few files Propagating a query to a sub-node would take more b/w than answering it yourself Caching effect Requires network stability Supernode selection is time-based How long you ve been on is a good predictor of how long you ll be around. 31

P2P File-sharing Systems Centralized Database (old) Napster Query Flooding Gnutella Intelligent Query Flooding KaZaA Structured Overlay Routing Distributed Hash Tables 32

Distributed Hash Tables Academic answer to P2P Goals Guaranteed lookup success Provable bounds on search time Provable scalability Makes some things harder Fuzzy queries / full-text search / etc. Read-write, not read-only Hot Topic in networking since introduction in ~2000/2001 33

DHT: Overview Abstraction: a distributed hash-table (DHT) data structure: put(id, item); item = get(id); Implementation: nodes in system form a distributed data structure Can be Ring, Tree, Hypercube, Skip List, Butterfly Network,... 34

DHT: Overview (continued) Structured Overlay Routing: Join: On startup, contact a bootstrap node and integrate yourself into the distributed data structure; get a node id Publish: Route publication for file id toward a close node id along the data structure Search: Route a query for file id toward a close node id. Data structure guarantees that query will meet the publication. (Note: cannot do keyword search) Fetch: Two options: Publication contains actual file => fetch from where query stops Publication says I have file X => query tells you 128.2.1.3 has X, use IP routing to get X from 128.2.1.3 35

DHT: Example Chord Associate to each node and file a unique id in an uni-dimensional space (a Ring) E.g., pick from the range [0...2 m ] Usually the hash of the file or IP address Properties: It allows a distributed set of participants to agree on a single node as a rendezvous point for a given key, without any central coordination. (Chord site) Can find data using O(log N) messages, where N is the total number of nodes (Why? We ll see ) from MIT in 2001 36

DHT: Consistent Hashing Node 105 N105 Key 5 K5 K20 Circular ID space N32 N90 K80 A key is stored at its successor: node with next higher ID 37

DHT: Chord Basic Lookup N105 N120 N10 Where is key 80? N90 has K80 N32 K80 N90 N60 38

DHT: Chord Finger Table 1/4 1/2 1/8 1/16 1/32 1/64 1/128 N80 Entry i in the finger table of node n is the first node that succeeds or equals n + 2 i In other words, the ith finger points 1/2 n-i way around the ring (This is what makes O(log N) messages for a retrieval possible!) 39

DHT: Chord Join Assume an identifier space [0..8] Node n1 joins 7 0 1 Succ. Table i id+2 i succ 0 2 1 1 3 1 2 5 1 6 2 5 4 3 40

DHT: Chord Join Node n2 joins 7 0 1 Succ. Table i id+2 i succ 0 2 2 1 3 1 2 5 1 6 2 Succ. Table 5 4 3 i id+2 i succ 0 3 1 1 4 1 2 6 1 41

DHT: Chord Join Succ. Table Nodes n0, n6 join Succ. Table 7 0 i id+2 i succ 0 1 1 1 2 2 2 4 0 1 Succ. Table i id+2 i succ 0 2 2 1 3 6 2 5 6 i id+2 i succ 0 7 0 1 0 0 2 2 2 6 2 Succ. Table 5 4 3 i id+2 i succ 0 3 6 1 4 6 2 6 6 42

DHT: Chord Join Nodes: n1, n2, n0, n6 Succ. Table i id+2 i succ 0 1 1 1 2 2 2 4 0 Items 7 Items: f7, f2 0 Succ. Table 7 1 i id+2 i succ 0 2 2 1 3 6 2 5 6 Items 1 Succ. Table 6 2 i id+2 i succ 0 7 0 1 0 0 2 2 2 5 4 3 Succ. Table i id+2 i succ 0 3 6 1 4 6 2 6 6 43

DHT: Chord Routing Upon receiving a query for item id, a node: Checks whether stores the item locally If not, forwards the query to the largest node in its successor table that does not exceed id Succ. Table i id+2 i succ 0 7 0 1 0 0 2 2 2 6 0 Succ. Table 7 1 i id+2 i succ query(7) 0 2 2 1 3 6 2 5 6 5 4 Succ. Table i id+2 i succ 0 1 1 1 2 2 2 4 0 3 2 Items 7 Succ. Table i id+2 i succ 0 3 6 1 4 6 2 6 6 44 Items 1

DHT: Chord Summary Routing table size? Log N fingers Routing time? Each hop expects to 1/2 the distance to the desired id => expect O(log N) hops. 45

DHT: Discussion Pros: Guaranteed Lookup O(log N) per node state and search scope Cons: Does *not* support keyword search No one uses them? (only one file sharing app) Supporting non-exact match search is hard 46

1) P2P File-sharing: Summary Many different styles; remember pros and cons of each centralized, flooding, intelligent flooding, overlay routing Lessons learned: Single points of failure are very bad Flooding messages to everyone is bad Underlying network topology is important Not all nodes need be equal Need incentives to discourage freeloading Privacy and security matter Structure can provide theoretical bounds and guarantees 47

2) P2P File Distribution Systems (i.e. BitTorrent)

BitTorrent: History In 2002, B. Cohen debuted BitTorrent Key Motivation: Popularity exhibits temporal locality (Flash Crowds) E.g., Slashdot effect, CNN on 9/11, new movie/game release Focused on Efficient Fetching, not Searching: Distribute the same file to all peers Single publisher, multiple downloaders Has many real publishers: Example: Blizzard Entertainment uses it for World of Warcraft update distribution 49

BitTorrent: Overview Focused in efficient fetching, not searching Swarming: Join: contact centralized tracker server, get a list of peers. Publish: Run a tracker server. Search: n/a (need to find elsewhere, i.e. Google) Fetch: Download chunks of the file from your peers. Upload chunks you have to them. Improvements from old Napster-era: Chunk based downloading; few large files Anti-freeloading mechanisms 50

BitTorrent: Publish/Join Tracker 51

BitTorrent: Fetch 52

BitTorrent: Summary Pros: Works reasonably well in practice Gives peers incentive to share resources; avoids freeloaders Cons: No search; only content distribution Central tracker server needed to bootstrap swarm 53

3) P2P Streaming Systems (i.e. Overlay Multicast a.k.a. End System Multicast)

Live Broadcast: Pre-Internet Tower/Cable/Satellite TV, Radio Problems Limitations # of channels Physical reach Cost Content production monopolized by big corps. Content distribution monopolized by big corps. 55

Live Internet Broadcast: Pre-P2P Unicast streaming Small-scale video conferencing Large-scale streaming (AOL Live, etc.) Problems Unicast streaming requires lots of bandwidth Example: AOL webcast of Live 8 concert, July 2, 2005: 1500 servers in 90 locations = $$$ 56

Solution...? Use IP Multicast!...? 57

Solution: IP Multicast? On a single LAN: GREAT! Can distribute traffic to everyone at once without duplicating streams, set TTL=1, no problem! Cross-LAN Problem Requires multicast-enabled routers Must allocate resources toward processing multicast packets As a result, *MANY, MANY* computers can t receive IP multicast packets from outside their LAN 58

Solution (again!): Don t depend on routers Distribute the cost among the end users 59

Internet broadcasting Structure Router Source Application end-point 60

End System Multicast (ESM) (a.k.a Overlay Multicast) + Instantly deployable + Enables ubiquitous broadcast Router Source Application end-point 61

Example ESM Tree http://esm.cs.cmu.edu 62

Single Overlay Distribution Tree A C D B E G F 63

Single Overlay Distribution Tree A IN: n kb/sec C OUT: 2n kb/sec D B E G F 64

Multiple Overlay Distribution Trees A C D B E G F 65

Multiple Overlay Distribution Trees A C D B E G F 66

Multiple Overlay Distribution Trees A C D B E G F 67

Multiple Overlay Distribution Trees A C D N kb/sec B G F E N/2 kb/sec N/2 kb/sec D A E G B C B A F C F G D E 68

My Research with ESM Can we combine the best parts of multicast with ESM? My solution: Integrate LAN Multicast (i.e. IP Multicast with TTL=1) with ESM Each LAN has 1 or more forwarders from the outside receiving data which gets forwarded on multicast with TTL=1 Nodes of overlay trees are now LANs instead of individual hosts 69

ESM + LAN Multicast LAN A LAN C LAN D LAN B LAN E LAN G LAN F 70

P2P Systems: Summary 3 types of P2P systems File-sharing Centralized ((old) Napster), Flooding (Gnutella), Intelligent Flooding (KaZaA) Overlay Routing (DHTs/Chord) File distribution BitTorrent Streaming End System Multicast a.k.a. Overlay Multicast Lessons Single points of failure are very bad Underlying network topology is important Not all nodes are equal Can t depend on routers to satisfy all of your networking desires Room for growth Privacy & Security Research is ongoing 71

Extra Slides (From previous P2P lectures)

KaZaA: Usage Patterns KaZaA is more than one workload! Many files < 10MB (e.g., Audio Files) Many files > 100MB (e.g., Movies) from Gummadi et al., SOSP 2003 73

KaZaA: Usage Patterns (2) KaZaA is not Zipf! FileSharing: Request-once Web: Requestrepeatedly from Gummadi et al., SOSP 2003 74

KaZaA: Usage Patterns (3) What we saw: A few big files consume most of the bandwidth Many files are fetched once per client but still very popular Solution? Caching! from Gummadi et al., SOSP 2003 75

Freenet: History In 1999, I. Clarke started the Freenet project Basic Idea: Employ Internet-like routing on the overlay network to publish and locate files Addition goals: Provide anonymity and security Make censorship difficult 76

Freenet: Overview Routed Queries: Join: on startup, client contacts a few other nodes it knows about; gets a unique node id Publish: route file contents toward the file id. File is stored at node with id closest to file id Search: route query for file id toward the closest node id Fetch: when query reaches a node containing file id, it returns the file to the sender 77

Freenet: Routing Tables id file identifier (e.g., hash of file) next_hop another node that stores the file id file file identified by id being stored on the local node Forwarding of query for file id If file id stored locally, then stop Forward data back to upstream requestor If not, search for the closest id in the table, and forward the message to the corresponding next_hop If data is not found, failure is reported back Requestor then tries next closest match in routing table id next_hop file 78

Freenet: Routing query(10) n1 4 n1 f4 12 n2 f12 5 n3 1 n2 9 n3 f9 4 4 n4 n5 n3 2 3 14 n5 f14 13 n2 f13 3 n6 5 4 n1 f4 10 n5 f10 8 n6 3 n1 f3 14 n4 f14 5 n3 79

Freenet: Routing Properties Close file ids tend to be stored on the same node Why? Publications of similar file ids route toward the same place Network tend to be a small world Small number of nodes have large number of neighbors (i.e., ~ six-degrees of separation ) Consequence: Most queries only traverse a small number of hops to find the file 80

Freenet: Anonymity & Security Anonymity Randomly modify source of packet as it traverses the network Can use mix-nets or onion-routing Security & Censorship resistance No constraints on how to choose ids for files => easy to have to files collide, creating denial of service (censorship) Solution: have a id type that requires a private key signature that is verified when updating the file Cache file on the reverse path of queries/publications => attempt to replace file with bogus data will just cause the file to be replicated more! 81

Freenet: Discussion Pros: Intelligent routing makes queries relatively short Search scope small (only nodes along search path involved); no flooding Anonymity properties may give you plausible deniability Cons: Still no provable guarantees! Anonymity features make it hard to measure, debug 82