
Lecture 2: Content Sharing in P2P Networks and Different P2P Protocols
Zhou Shuigeng, March 10, 2005

Outline
- Classification of content sharing P2P systems
- Content sharing P2P systems: Napster, Gnutella, Freenet; Chord, CAN; Pastry, Tapestry
- Summary
- References
2006-3-10, Dept. of Computer Sci. & Eng.

Classification of Content Sharing P2P Systems

How to Classify?
We can classify content sharing P2P systems based on two general aspects:
- Degree of decentralization
- Network structure

Degree of Decentralization (1)
Degree of decentralization: the extent to which a system relies on one or more servers to facilitate the interaction between peers.
- Purely decentralized
- Partially decentralized
- Hybrid decentralized

Degree of Decentralization (2)
Purely decentralized
- Examples: original Gnutella, Freenet
- All nodes in the network perform exactly the same tasks, acting both as servers and clients
- There is no central coordination of the nodes' activities
- The nodes of such networks are termed servents (SERVers + cliENTS)

Purely Decentralized
(figure: peers connected directly to one another, with no server)

Degree of Decentralization (3)
Partially decentralized
- Examples: Kazaa, Morpheus, and newer versions of Gnutella
- Supernodes assume a more important role than the rest of the nodes, acting as local central indexes for files shared by local peers
- Supernodes are dynamically reassigned on failure or malicious attack, so they are not a single point of failure

Partially Decentralized
(figure: a supernode layer above several peer clusters)

Degree of Decentralization (4)
Hybrid decentralized
- Example: Napster
- A central server facilitates the interaction between peers by maintaining metadata, performing lookups, and identifying desirable nodes
- The end-to-end interaction is between any two peer clients
- Weaknesses: single point of failure, vulnerability to malicious attack

Hybrid Decentralized
(figure: peers connected through a central server)

BestPeer
- Location-independent global lookup (LIGLO)
(figure: a layer of LIGLO servers above the peer layer)

Different Classifications of P2P Systems
- Purely decentralized, partially decentralized, hybrid decentralized
- Purely P2P, hybrid P2P, centralized P2P
A question: in terms of degree of decentralization, how should we classify CAN, Chord, etc.?

Network Structure (1)
- A P2P system corresponds to an overlay network, a logical topology that may be totally unrelated to the underlying physical network
- The P2P network structure indicates how the content of the network is located with respect to the network topology: from knowing directly on which nodes specific content resides, to randomly searching the entire network for the desired content

Network Structure (2)
P2P systems can be differentiated by the degree to which their overlay networks contain structure or are created ad hoc:
- Unstructured
- Structured
- Loosely structured

Network Structure (3)
Unstructured
- Example: Gnutella
- Features: data placement is unrelated to the overlay topology; finding data relies on random search, distributing queries from node to node; the overlay topology is constructed ad hoc
- Advantage: easily accommodates a highly transient node population
- Disadvantages: desired files are hard to find without distributing queries widely; poor scalability

Network Structure (4)
Structured
- Examples: Chord, CAN, Pastry, Tapestry
- Addresses the scalability problem of unstructured networks
- Features: the topology is tightly controlled; files (or pointers to them) are placed at specified locations via a mapping between file identifier and location
- Advantage: a scalable solution for exact-match queries
- Disadvantage: hard to maintain the structure when nodes join and leave at a high rate

Network Structure (5)
Loosely structured
- Examples: Freenet (Napster?)
- The network structure lies between structured and unstructured
- File locations are affected by routing hints, but they are not completely specified, so not all searches succeed

A Combined View
(figure: systems placed along both the decentralization and structure axes; where does BestPeer fit?)

P2P Content Sharing Systems

Unstructured Systems
- Hybrid decentralized unstructured systems (Napster)
- Purely decentralized unstructured systems (Gnutella: original architecture)
- Partially decentralized unstructured systems (Kazaa, Morpheus, Gnutella: more recent architecture)

Napster: Architecture
(figure: Napster's central-server architecture; compare a Web search engine?)

Napster: the Central Server
The central directory server maintains:
- An index with metadata (file name, time of creation, etc.) of all the files in the network (document info)
- A table listing the files that each user holds and shares in the network (peer-document info)
- A table of registered users' connection information: IP addresses, connection speeds, etc. (peer info)

Napster: Operational Mode
- On startup, the client contacts the central server and reports the list of files it maintains
- When the server receives a query from a user, it searches for matches in its index and returns a list of users that hold the matching file
- The user then opens a direct connection to the peer that holds the requested file and downloads it
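The central-index interaction above can be sketched in a few lines of Python; the class and method names are illustrative, not Napster's actual protocol:

```python
# Hypothetical sketch of a Napster-style central index (names are
# illustrative, not Napster's real protocol messages).
class CentralServer:
    def __init__(self):
        self.index = {}  # file name -> set of peer addresses sharing it

    def register(self, peer_addr, files):
        # On startup, a client reports the list of files it shares.
        for name in files:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, query):
        # The server answers queries from its index; the download itself
        # then happens over a direct peer-to-peer connection.
        return sorted(self.index.get(query, set()))

server = CentralServer()
server.register("10.0.0.1:6699", ["songA.mp3", "songB.mp3"])
server.register("10.0.0.2:6699", ["songB.mp3"])
print(server.search("songB.mp3"))  # both peers hold this file
```

Note that the server only ever handles metadata; the files themselves never pass through it.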

Napster: Strengths vs. Weaknesses
- Advantages: simple and efficient
- Disadvantages: vulnerable to censorship, malicious attack, and technical failure; inherently limited in scalability; not an acceptable pure P2P solution, since it is controlled by the single institution, company, or user maintaining the central server

Gnutella: General Information
- First introduced in March 2000 by two employees of AOL's Nullsoft division
- The goal is to provide a purely distributed file-sharing solution
- A communication protocol used to search for and share files among users

Gnutella: A Snapshot of the Network
(figure: a snapshot of the Gnutella network on January 27, 2000)

Gnutella: Operational Mode (1)
- Users connect to each other directly through Gnutella software (referred to as a servent)
- An instance of this software running on a particular machine is also referred to as a host
- Four message types are used for communication between hosts: Ping, Pong, Query, and QueryHit

Gnutella: Operational Mode (2)
- Ping: a request for a certain host to announce itself
- Pong: reply to a Ping message; contains the IP and port of the responding host and the number and size of the files it shares
- Query: a search request; contains the search string and the minimum speed requirement for responses
- QueryHit: reply to a Query message; contains the IP, port, and speed of the responding host, the number of matching files found, and their indexed result set

Gnutella: Operational Mode (3)
- A node joins Gnutella by contacting well-known hosts such as gnutellahosts.com
- After joining the network, a node sends a Ping message to every node it is connected to
- Those nodes send back Pong messages identifying themselves, and also propagate the Ping to their own neighbors
- Gnutella originally uses TTL (Time-to-Live)-limited flooding (broadcast) to distribute Ping and Query messages

Gnutella: Operational Mode (4)
- Each message is labeled with a unique identifier
- Each host keeps a dynamic routing table of message identifiers
- Since response messages carry the same ID as the original message, a host checks its routing table to determine along which link a response should be forwarded
- Once a node receives a QueryHit message, it initiates an out-of-network download over a direct connection between the source and target nodes
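The TTL-limited flooding and duplicate suppression described above can be sketched as follows; the topology and function names are invented for illustration:

```python
# Minimal sketch of Gnutella-style TTL-limited flooding with duplicate
# detection via message IDs (topology and names are made up).
def flood(graph, start, msg_id, ttl, seen=None):
    # Each host remembers message IDs it has handled, so the same query
    # is not re-flooded when it arrives again over another link.
    if seen is None:
        seen = set()
    if ttl == 0 or (msg_id, start) in seen:
        return seen
    seen.add((msg_id, start))
    for neighbor in graph[start]:
        flood(graph, neighbor, msg_id, ttl - 1, seen)
    return seen

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
reached = {host for _, host in flood(graph, "A", msg_id=1, ttl=2)}
print(reached)  # with TTL 2, D (two hops away) is never reached
```

The TTL bounds how far a query travels, which is exactly why (as noted later) the TTL also segments the network into reachable subnets.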

Gnutella: Operational Mode (5)
(figure: message flow through the Gnutella network)

Free-riding on Gnutella (1)
In a 24-hour sampling period:
- 70% of Gnutella users share no files
- 50% of all responses are returned by the top 1% of sharing hosts
Free-riding is a social problem, not a technical one. Consequences:
- Degradation of system performance, possibly collapse
- Increased system vulnerability
- Does Gnutella cause copyright issues?

Free-riding on Gnutella (2)
Most Gnutella users are free riders. Of 33,335 hosts:
- The top 1 percent (333) of hosts share 37% (1,142,645) of the total files shared
- The top 5 percent (1,667) of hosts share 70% (1,142,645) of the total files shared
- The top 10 percent (3,334) of hosts share 87% (2,692,082) of the total files shared

Free-riding on Gnutella (3)
Many servents share files nobody downloads. Of 11,585 sharing hosts:
- The top 1% of sites provide nearly 47% of all answers
- The top 25% of sites provide 98% of all answers
- 7,349 (63%) never provide a query response

Gnutella: Popularity of Queries
- Very popular documents are approximately equally popular
- Less popular documents follow a Zipf-like distribution, i.e., the probability of seeing a query for the i-th most popular query is proportional to 1/i^α
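A small sketch of the Zipf-like distribution just mentioned; the exponent value here is only illustrative, not a measured Gnutella fit:

```python
# Zipf-like query popularity: the i-th most popular query has probability
# proportional to 1 / i**alpha (alpha = 1.0 chosen for illustration).
def zipf_probs(n, alpha=1.0):
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_probs(5, alpha=1.0)
# The head of the distribution dominates; probabilities decay as 1/i.
print([round(p, 3) for p in probs])
```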

Gnutella Topology (1)
- Small-world properties verified ("find everything close by")
- Structure: backbone + outskirts

Gnutella Topology (2)
(figure: the backbone of the Gnutella network)

Gnutella: Strengths & Weaknesses
- Advantages: fully embodies the nature of P2P
- Disadvantages: the TTL segments the Gnutella network into subnets; heavy query traffic on the network
- Improvement measures: parallel random walks; object replication (passive vs. proactive)

Kazaa, Morpheus, and New Gnutella
Use supernodes, which:
- are dynamically assigned the task of servicing a small subpart of the peer network
- index and cache the files contained in the part of the network they are assigned to
- have sufficient bandwidth and processing power
Kazaa and Morpheus are proprietary systems. Features:
- More efficient than old Gnutella
- More robust than Napster

Freenet: Loosely Structured Systems (1)
- A purely decentralized, loosely structured, self-organizing P2P system
- Pools unused disk space on peer computers to create a collaborative virtual file system
- Provides a file-storage service (vs. a file-sharing service): files are pushed to other nodes for storage, replication, and persistence
- Provides both security and publisher anonymity: a peer does not know the true source of retrieved files, and may not even know the content it holds

Freenet: Loosely Structured Systems (2)
Peers maintain:
- a local datastore, available to the network for reading and writing
- a dynamic routing table: addresses of other nodes and the keys (file identifiers) they are thought to hold
Files are identified by binary keys: keyword-signed keys, signed-subspace keys, and content-hash keys. To retrieve a file, a user sends a request message specifying the key and a timeout (hops-to-live) value.

Freenet: Loosely Structured Systems (3)
Messages always include an ID (for loop detection), a hops-to-live value, and the source and destination. Message types:
- Data request; additional field: key
- Data reply; additional field: data
- Data failed; additional fields: location and reason
- Data insert; additional fields: key and data

Freenet: Loosely Structured Systems (4)
Each Freenet node maintains a common stack storing:
- ID: file identifier
- Next hop: another node that stores this ID
- File: the file identified by the ID, stored on the local node

Freenet: Loosely Structured Systems (5)
Joining Freenet: find an existing peer and send it a message.
Inserting a new file:
- The inserting node calculates the file key and sends an insert message containing the key and a hops-to-live value
- When a node receives the message, it checks whether the key is already taken
- If yes, it returns the pre-existing file as if a request had been made for it
- If no, it looks up the nearest key in its routing table and forwards the insert message to the corresponding node

Freenet: Loosely Structured Systems (6)
Freenet uses a chain-mode discovery mechanism:
- If a node receives a request for a file it stores locally, the search stops and the data is forwarded back to the requestor
- If the node does not store the file, it forwards the request to the neighbor most likely to have it (the one with the closest ID in its stack)
- If the file is found at some node, a reply is passed back to the original node, and the data is cached at all intermediate nodes
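The chain-mode search can be sketched as follows, under simplifying assumptions (numeric keys, a static routing table, and no caching on the reply path); all node and key names are invented:

```python
# Illustrative sketch of Freenet's chain-mode search: a node answers
# from its local store or forwards to the neighbor whose routing-table
# entry is numerically closest to the key; hops-to-live bounds the chain.
def search(nodes, start, key, hops_to_live):
    node = nodes[start]
    if hops_to_live == 0:
        return None
    if key in node["store"]:
        return node["store"][key]
    if not node["routing"]:
        return None
    # Forward toward the neighbor thought to hold the closest key.
    closest = min(node["routing"], key=lambda k: abs(k - key))
    return search(nodes, node["routing"][closest], key, hops_to_live - 1)

nodes = {
    "n1": {"store": {}, "routing": {40: "n2", 90: "n3"}},
    "n2": {"store": {42: "file-42"}, "routing": {}},
    "n3": {"store": {88: "file-88"}, "routing": {}},
}
print(search(nodes, "n1", 42, hops_to_live=3))  # found via n2
```

In real Freenet the reply also updates routing tables and caches the file along the chain, which is what drives the key-specialization described below.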

Freenet: Loosely Structured Systems (7)
(figure: the Freenet chain-mode file discovery mechanism)

Freenet: Loosely Structured Systems (8)
To keep the data source anonymous, two further measures are taken:
- Any node along the reply path can change the reply message and claim to be the source of the data
- The hops-to-live counter is randomly initialized in order to obscure the distance from the originator

Freenet: Loosely Structured Systems (9)
Some further features:
- Nodes tend to specialize in searching for similar keys over time, as they receive queries from other nodes for similar keys
- Nodes come to store similar keys over time, due to the caching of files from successful queries
- Similarity of keys does not reflect similarity of files
- Routing does not reflect the underlying network topology

Structured Networks
- Chord
- CAN
- Tapestry
- Pastry
- OceanStore

Document Routing
Structured P2P systems adopt a document-routing mechanism to store and discover files. Routing challenges:
- Define a useful key-nearness metric
- Keep the hop count small
- Keep the routing tables small
- Stay robust despite rapid change

Distributed Hash Table
Problem: given an ID, map it to a host. Challenges:
- Scalability: hundreds, thousands, or millions of machines
- Instability: changing routes, traffic, and machine availability
- Heterogeneity: latency from 1 ms to 1000 ms; bandwidth from 32 kb/s to 1 Gb/s; presence from tens of seconds to a year
- Trust: selfish users; malicious users

Chord: Overview
Provides peer-to-peer hash lookup: Lookup(key) -> IP. The problems to solve:
- How to locate a node?
- How to route lookups?
- How to maintain routing tables?

Chord: Performance
- Efficiency: O(log N) messages per lookup, where N is the total number of servers
- Scalability: O(log N) state per node
- Robustness: survives massive failures

Chord: IDs
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key); e.g. key "LetItBe" -> SHA-1 -> ID = 5
- Node identifier = SHA-1(IP address); e.g. IP 198.10.10.1 -> SHA-1 -> ID = 105
- Both are uniformly distributed
- How do we map key IDs to node IDs?
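A minimal sketch of this ID assignment, truncating SHA-1 digests to an m-bit space (m = 7 here only to match the small ring in these slides; Chord proper uses the full 160-bit digest):

```python
import hashlib

# Chord-style identifier assignment: keys and node IP addresses are both
# hashed with SHA-1, then reduced to the m-bit identifier circle.
def chord_id(value, m=7):
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

key_id = chord_id("LetItBe")      # a file key
node_id = chord_id("198.10.10.1") # a node, from its IP address
# Both identifiers land in the same 0 .. 2**m - 1 space, so keys can be
# assigned to nodes by position on the ring.
print(key_id, node_id)
```

Using one hash for both keys and nodes is what lets the "key is stored at its successor" rule on the next slide work at all.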

Chord: Basic Routing (1)
(figure: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80)
- As nodes enter the network, they are assigned unique IDs by hashing their IP address (e.g. 198.10.10.1 -> N105)
- A key is stored at its successor: the node with the next-higher ID

Chord: Basic Routing (2)
(figure: "Where is key 80?" walks the ring N105 -> N120 -> N10 -> N32 -> N60 -> N90; "N90 has K80")
- Every node knows its successor in the ring

Chord: Basic Routing (3)
The lookup algorithm:

  Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id            // n still precedes the key
      call Lookup(key-id) on node n  // next hop
    else
      return my successor            // done

Correctness depends only on successors.
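The successor-only lookup above can be made runnable on a toy ring; `in_interval` is a helper (not in the slide) that handles the wrap-around at the top of the ID space:

```python
# Successor-only Chord lookup on a small ring. `successor` maps each node
# ID to the next node clockwise; traversal is O(N) because every hop
# advances only one node.
def in_interval(x, a, b):
    # True if x lies in the half-open ring interval (a, b].
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(successor, my_id, key_id):
    succ = successor[my_id]
    if in_interval(key_id, my_id, succ):
        return succ  # the successor stores the key
    return lookup(successor, succ, key_id)

ring = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}
print(lookup(ring, 105, 80))  # key 80 is stored at N90
```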

Chord: Basic Routing (4)
Routing information maintenance:
- When a node n joins the network, certain keys previously assigned to n's successor become assigned to n
- When node n leaves the network, all keys assigned to it are reassigned to its successor

Chord: Finger-table Based Routing (1)
Finger table (FT): m additional entries per node; the i-th entry points to the successor of n + 2^i.
To look up key k at node n:
- In the FT, identify the highest node n' whose ID lies between n and k
- If such a node exists, the lookup is repeated starting from n'
- Otherwise, the successor of n is returned

Chord: Finger-table Based Routing (2)
(figure, m = 7: the fingers of N80 point to the successors of 80+2^0 through 80+2^6, landing on nodes such as N96, N112, N16)

Chord: Finger-table Based Routing (3)
(figure: from N80, each finger halves the remaining distance around the ring: 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128)

Chord: Finger-table Based Routing (4)
Algorithm:

  Lookup(my-id, key-id)
    look in local finger table for
      highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n  // next hop
    else
      return my successor            // done
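A runnable sketch of the finger-table lookup, using the N32/K19 example ring from the next slide; note that picking `max(candidates)` is a simplification of "closest preceding finger" that still always makes progress:

```python
# Finger-table Chord lookup: instead of stepping one successor at a time,
# forward to a finger that still precedes the key, shrinking the
# remaining interval each hop (O(log N) hops in a real Chord ring).
def between(x, a, b):
    # True if x lies strictly inside the ring interval (a, b).
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(fingers, successor, my_id, key_id):
    candidates = [n for n in fingers[my_id] if between(n, my_id, key_id)]
    if candidates:
        return lookup(fingers, successor, max(candidates), key_id)
    return successor[my_id]  # no finger precedes the key: done

# Ring and finger tables matching the slide's Lookup(K19) example.
successor = {5: 10, 10: 20, 20: 32, 32: 60, 60: 80, 80: 99,
             99: 110, 110: 5}
fingers = {
    32: [60, 80, 99], 99: [110, 5, 60], 5: [10, 20, 32, 60, 80],
    10: [20, 32, 60, 80], 20: [32, 60, 99], 60: [80, 99],
    80: [99, 110], 110: [5, 10],
}
print(lookup(fingers, successor, 32, 19))  # K19 lives at N20
```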

Chord: Finger-table Based Routing (5)
An example: Lookup(K19), with finger tables
- N32: N60, N80, N99
- N99: N110, N5, N60
- N5: N10, N20, N32, N60, N80
- N10: N20, N32, N60, N80
- N20: N32, N60, N99
(figure: the lookup for K19 hops around the ring from N32 toward N20)

Chord: A New Node Joining (1)
(figure, step 1: joining node N36 issues Lookup(36); N25 and N40 are on the ring, with N40 holding K30 and K38)

Chord: A New Node Joining (2)
(figure, step 2: N36 sets its own successor pointer to N40)

Chord: A New Node Joining (3)
(figure, step 3: keys 26..36, here K30, are copied from N40 to N36)

Chord: A New Node Joining (4)
(figure, step 4: N25's successor pointer is set to N36)
- Finger pointers are updated in the background
- Correct successors produce correct lookups

Chord: Finger Table Update (1)
Assume an identifier space 0..7 (m = 3). Node n1 (ID 1) joins; all entries in its finger table are initialized to itself:

  Succ. table of n1:  i | id+2^i | succ
                      0 |   2    |  1
                      1 |   3    |  1
                      2 |   5    |  1

Chord: Finger Table Update (2)
Node n2 (ID 2) joins:

  Succ. table of n1:  i | id+2^i | succ
                      0 |   2    |  2
                      1 |   3    |  1
                      2 |   5    |  1

  Succ. table of n2:  i | id+2^i | succ
                      0 |   3    |  1
                      1 |   4    |  1
                      2 |   6    |  1

Chord: Finger Table Update (3)
Nodes n3 (ID 0) and n4 (ID 6) join:

  Succ. table of n0:  i | id+2^i | succ
                      0 |   1    |  1
                      1 |   2    |  2
                      2 |   4    |  6

  Succ. table of n1:  i | id+2^i | succ
                      0 |   2    |  2
                      1 |   3    |  6
                      2 |   5    |  6

  Succ. table of n2:  i | id+2^i | succ
                      0 |   3    |  6
                      1 |   4    |  6
                      2 |   6    |  6

  Succ. table of n6:  i | id+2^i | succ
                      0 |   7    |  0
                      1 |   0    |  0
                      2 |   2    |  2
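The tables above can be reproduced mechanically; this sketch computes each node's i-th finger as the successor of (n + 2^i) mod 2^m among the live nodes:

```python
# Reconstructing the m = 3 finger-table example: the i-th finger of node
# n is the first live node at or after (n + 2**i) mod 2**m on the ring.
def successor_of(ident, nodes, m=3):
    space = 2 ** m
    for step in range(space):
        candidate = (ident + step) % space
        if candidate in nodes:
            return candidate

def finger_table(n, nodes, m=3):
    return [successor_of((n + 2 ** i) % (2 ** m), nodes, m)
            for i in range(m)]

nodes = {0, 1, 2, 6}  # the slide's final membership
for n in sorted(nodes):
    print(n, finger_table(n, nodes))
```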

Chord: Node Failures
(figure: Lookup(90) at N80, whose finger table lists N85, N102, N120, N10)
- If N80 cannot find its correct successor, the lookup is incorrect

Chord: Successor Lists
- Each node knows its r immediate successors
- After a failure, it will know the first live successor
- Correct successors guarantee correct lookups
- The guarantee holds with some probability

Chord: Successor List Length
Assume every node fails with probability 1/2:
- P(successor list all dead) = (1/2)^r = P(this node breaks the Chord ring), assuming independent failures
- P(no broken nodes) = (1 - (1/2)^r)^N
- Choosing r = 2 log2(N) makes this probability about 1 - 1/N
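A quick numeric check of this claim, assuming independent failures with probability 1/2:

```python
import math

# With r = 2*log2(N), (1/2)**r = 1/N**2, so the probability that no
# node's entire successor list is dead is (1 - 1/N**2)**N ~ 1 - 1/N.
def p_ring_intact(n):
    r = 2 * math.log2(n)
    p_list_dead = 0.5 ** r          # one node's whole list fails
    return (1 - p_list_dead) ** n   # assuming independent failures

for n in (100, 1000, 10000):
    print(n, round(p_ring_intact(n), 6), round(1 - 1 / n, 6))
```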

Chord: Lookup with Fault Tolerance

  Lookup(my-id, key-id)
    look in local finger table and successor list for
      highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n   // next hop
      if call failed,
        remove n from finger table
        return Lookup(my-id, key-id)  // retry
    else
      return my successor             // done

Chord: Discussion
- Network proximity (should routing consider latency?)
- Protocol security: malicious data insertion; malicious Chord routing-table information
- Keyword search and indexing
- ...

CAN: Content-Addressable Network
An Internet-scale hash table. Interface:
- insert(key, value)
- value = retrieve(key)
Properties:
- scalable
- operationally simple
- good performance
Related systems: Chord, Pastry, Tapestry, Buzz, Plaxton, ...

CAN: Problem Scope
Design a system that provides the interface, with:
- scalability
- robustness
- performance
- security
Application-specific, higher-level primitives: keyword searching, anonymity.

CAN: Basic Idea (1)
- A distributed, Internet-scale hash table that maps file names to their locations in the network
- Each CAN node stores a part (called a zone) of the hash table, as well as information about a small number of adjacent zones
- Requests to insert, look up, or delete a particular key are routed via intermediate nodes to the node that maintains the zone containing that key

CAN: Basic Idea (2)
(figures: a key-value pair (K1, V1) is routed through the distributed hash table by insert(K1, V1), stored at the responsible node, and later found by retrieve(K1))

CAN: Solution
A virtual Cartesian coordinate space:
- the entire space is partitioned amongst all the nodes
- every node owns a zone in the overall space
The abstraction:
- data can be stored at points in the space
- routing goes from one point to another; a point corresponds to the node that owns the enclosing zone
Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address).

CAN: A Simple Example
(figures: the coordinate space as nodes 1, 2, 3, 4 join in turn; each join splits an existing zone and assigns half to the newcomer)

CAN: A Simple Example (continued)
Insert, at node I:

  node I::insert(K, V)
    (1) a = hx(K); b = hy(K)
    (2) route (K, V) toward point (a, b)
    (3) the node owning (a, b) stores (K, V)

Retrieve, at node J:

  node J::retrieve(K)
    (1) a = hx(K); b = hy(K)
    (2) route retrieve(K) to (a, b)

(figures: the insert and retrieve paths through the coordinate space)

CAN: Routing
- A node only maintains state for its immediate neighboring nodes
(figure: a greedy route from (x, y) toward (a, b) through neighboring zones)

CAN: Node Insertion
- The new node identifies a node already in the CAN, using some bootstrap mechanism
- Using the CAN routing mechanism, it randomly chooses a point P in the space and sends a JOIN request to the node whose zone contains P
- That zone is split, and half is assigned to the new node
- The new node learns the IP addresses of its neighbors, and the neighbors of the split zone are notified so that routing can include the new node

CAN: Node Insertion (figures)
1) The new node discovers some node I already in the CAN, via a bootstrap node
2) It picks a random point (p, q) in the space
3) I routes to (p, q) and discovers node J, whose zone contains the point
4) J's zone is split in half; the new node owns one half
Inserting a new node affects only a single other node and its immediate neighbors.

CAN: Node Failures
Need to repair the space:
- recover the database: soft-state updates; use replication, rebuilding the database from replicas
- repair routing: a takeover algorithm
Only the failed node's immediate neighbors are required for recovery.

CAN: Node Failures
Simple failures:
- know your neighbors' neighbors
- when a node fails, one of its neighbors takes over its zone
More complex failure modes:
- simultaneous failure of multiple adjacent nodes
- scoped flooding to discover neighbors
- hopefully a rare event
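The takeover rule for simple failures can be sketched as below. In the CAN design each neighbor of the failed node starts a timer proportional to its own zone volume, so the neighbor with the smallest zone claims the failed zone first; the timers are simulated here by taking a direct minimum.

```python
def zone_volume(zone):
    """Area of a 2-d zone (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = zone
    return (x1 - x0) * (y1 - y0)

def takeover(neighbor_zones):
    """Sketch of CAN's takeover: every neighbor of a failed node
    starts a timer proportional to its own zone volume; the neighbor
    with the smallest volume fires first and claims the failed zone.
    Timers are collapsed into a direct minimum here."""
    return min(neighbor_zones, key=zone_volume)
```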

CAN: Evaluation (Scalability)
For a uniformly partitioned space with n nodes and d dimensions:
- the number of neighbors per node is 2d
- the average routing path is (d/4) * n^(1/d) hops
- simulations show that these results hold in practice
The network can scale without increasing per-node state
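The two formulas above are easy to check numerically; this is a direct transcription of the slide, not a simulation.

```python
def can_neighbors(d: int) -> int:
    """Per-node state in a uniformly partitioned d-dimensional CAN."""
    return 2 * d

def can_avg_path(n: int, d: int) -> float:
    """Average routing path length, (d/4) * n^(1/d) hops."""
    return (d / 4) * n ** (1 / d)

# With d = 2 and a million nodes: only 4 neighbors, but ~500 hops.
# Raising d to 10 keeps state at just 20 entries while cutting the
# average path to about 10 hops -- state grows with d, not with n.
```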

CAN: Evaluation (Latency)
Problem:
- latency stretch = (CAN routing delay) / (IP routing delay)
- application-level routing may lead to high stretch
Solutions:
- increase the number of dimensions
- heuristics: RTT (round-trip-time)-weighted routing; multiple nodes per zone (peer nodes); deterministically replicated entries

CAN: Evaluation (Hot Spots)
Dealing with hot spots, i.e. popular (key,value) pairs:
- nodes cache recently requested entries
- an overloaded node replicates popular entries at its neighbors
Uniform coordinate-space partitioning:
- uniformly spreads (key,value) entries
- uniformly spreads out routing load

CAN: Evaluation (Robustness)
- completely distributed: no single point of failure
- resilient routing: can route around trouble

CAN: Improvements
- multi-dimensional coordinate spaces, improving network latency and fault tolerance at a small routing-table size overhead
- multiple coordinate spaces (realities) for fault tolerance
- better routing metrics, taking the underlying IP topology and connection latency into account alongside the Cartesian distance between source and destination
- overloading coordinate zones, allowing multiple nodes to share the same zone for improved fault tolerance and reduced per-hop latency and path length
- multiple hash functions that map the same key onto different points in the coordinate space, for replication
- topologically-sensitive network construction, assuming the existence of a set of machines that act as landmarks on the Internet
- caching and replication techniques, etc.

CAN: Summary
- CAN is an Internet-scale hash table and a potential building block for Internet applications
- scalability: O(d) per-node state
- low-latency routing: simple heuristics help a lot
- robust: decentralized, can route around trouble

Combination of Chord & CAN
- C&C (or C^2) stands for Chord and CAN
- It is an overlay network that combines the structural characteristics of Chord and CAN
- It is like CAN, except that each dimension is a ring similar to Chord
- C&C surpasses CAN in both routing efficiency and fault tolerance
- C^2 was first published at GCC 2003; an extended version appears in the International Journal of High Performance Computing and Networking, vol. 3, no. 4, pages 248-261, 2005

Pastry: Introduction
- A scalable, distributed application-level object location and routing substrate for wide-area peer-to-peer applications, in a potentially very large overlay network of nodes connected via the Internet
- Pastry takes network locality into account
- Scalability: up to 100,000 nodes
- Applications: global file sharing, file storage, group communication and naming systems

Pastry Design
- Node identification space: 128-bit node identifier (nodeId)
- The nodeId indicates a node's position in a circular nodeId space ranging from 0 to 2^128 - 1
- In a network of N nodes, Pastry can route to the node numerically closest to a given key in fewer than ceil(log_{2^b} N) steps
- Delivery is guaranteed unless floor(|L|/2) nodes with adjacent nodeIds fail simultaneously
- |L| and b are configuration parameters
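The routing bound above can be evaluated directly; this is a transcription of the slide's formula, nothing more.

```python
import math

def pastry_hops(n: int, b: int) -> int:
    """Upper bound on Pastry routing steps: ceil(log_{2^b} N)."""
    return math.ceil(math.log(n, 2 ** b))

# At the slide's stated scale of 100,000 nodes, b = 4 (hexadecimal
# digits) gives at most 5 hops; b = 2 (base-4 digits) gives 9.
```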

Pastry Routing
- Routing idea: route each message to the node whose nodeId is numerically closest to the given key
- In each routing step, a node normally forwards the message to a node whose nodeId shares with the key a prefix that is at least one digit (b bits) longer than the prefix the key shares with the present node's id
- If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node's id does, but is numerically closer to the key

Pastry Node State
- Each Pastry node maintains a routing table, a neighborhood set and a leaf set
- The routing table has ceil(log_{2^b} N) rows with 2^b - 1 entries each
- The 2^b - 1 entries at row n each refer to a node whose nodeId shares the present node's nodeId in the first n digits, but whose (n+1)-th digit has one of the 2^b - 1 possible values other than the (n+1)-th digit of the present node's id
- Each entry contains the IP address of one of the potentially many nodes whose nodeIds have the appropriate prefix
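The row/column rule can be written down concretely. Ids are equal-length strings of base-2^b digits, and the helper below (a hypothetical name, for illustration) returns the routing-table cell where another node's id belongs.

```python
def routing_table_slot(node_id: str, other_id: str):
    """Locate the routing-table cell for `other_id` as seen from
    `node_id`: row = length of the shared digit prefix, column = the
    first digit where the ids differ.  Ids are equal-length strings
    of base-2^b digits (b = 2 here, so base-4 digits)."""
    row = 0
    while row < len(node_id) and node_id[row] == other_id[row]:
        row += 1
    if row == len(node_id):
        return None                # identical ids: no slot needed
    return row, int(other_id[row])

# For the slide's example node 10233102: id 10233033 shares the
# 5-digit prefix "10233", so it belongs at row 5, column 0.
```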

Pastry Node State (cont'd)
- The neighborhood set M contains the nodeIds and IP addresses of the |M| nodes closest (according to the proximity metric) to the local node; it is useful for maintaining locality properties
- The leaf set L is the set of the |L|/2 nodes with numerically closest larger nodeIds and the |L|/2 nodes with numerically closest smaller nodeIds, relative to the present node's nodeId
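A minimal sketch of leaf-set selection, using plain integer ids and ignoring the wrap-around of the circular id space for brevity:

```python
def leaf_set(node_id: int, known_ids, l: int = 8):
    """The l/2 numerically closest smaller and l/2 closest larger
    nodeIds relative to node_id (non-wrapping simplification; real
    Pastry treats the id space as circular)."""
    smaller = sorted(i for i in known_ids if i < node_id)[-(l // 2):]
    larger = sorted(i for i in known_ids if i > node_id)[:l // 2]
    return smaller, larger
```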

An Example State
A node with nodeId = 10233102; b = 2, |L| = 8

Routing Algorithm
Some notation:
- R_{l,i}: the entry in routing table R at column i, 0 <= i < 2^b, and row l, 0 <= l < ceil(log_{2^b} N)
- L_i: the i-th closest nodeId in the leaf set L, -floor(|L|/2) <= i <= floor(|L|/2), where negative/positive indices indicate nodeIds smaller/larger than the present nodeId, respectively
- D_l: the value of the l-th digit in the key D
- shl(A,B): the length of the prefix shared by A and B, in digits

Routing Algorithm (cont'd)
For a message with key D arriving at node A:
- if D falls within the range of A's leaf set, forward it to the leaf-set node numerically closest to D (possibly A itself)
- otherwise, let l = shl(D, A); if the routing-table entry R_{l,D_l} exists, forward the message to it
- otherwise (a rare case), forward to any known node that shares a prefix with D at least as long as A does and is numerically closer to D
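The procedure can be sketched in code, as a reconstruction of the published algorithm under simplifying assumptions: ids and keys are equal-length base-4 digit strings (b = 2, matching the example state), and routing_table[row][col] holds an id or None.

```python
def shl(a, b):
    """shl(A,B): length of the prefix shared by A and B, in digits."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def pastry_route(node, key, leaf_set, routing_table, base=4):
    """One Pastry forwarding decision (a sketch): returns the id the
    message with the given key should be sent to next from `node`."""
    val = lambda x: int(x, base)              # numeric value of an id
    dist = lambda x: abs(val(x) - val(key))   # numeric distance to key
    members = leaf_set + [node]
    # (1) key within the leaf-set range: deliver to the numerically
    #     closest member (possibly the present node itself).
    if val(min(members, key=val)) <= val(key) <= val(max(members, key=val)):
        return min(members, key=dist)
    # (2) routing table: forward to a node sharing a strictly longer
    #     prefix with the key.
    l = shl(node, key)
    entry = routing_table[l][int(key[l])]
    if entry is not None:
        return entry
    # (3) rare case: forward to any known node with an equally long
    #     shared prefix that is numerically closer to the key.
    known = members + [e for row in routing_table for e in row if e]
    closer = [x for x in known if shl(x, key) >= l and dist(x) < dist(node)]
    return min(closer, key=dist) if closer else node
```

For example, node "10" routing key "23" with leaf set ["03", "11"] falls outside the leaf-set range, so it consults row 0 of the routing table (no shared prefix) at column 2 (the key's first digit).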

Pastry APIs
- nodeId = pastryInit(Credentials, Application)
- route(msg, key)
- deliver(msg, key)
- forward(msg, key, nextId)
- newLeafs(leafSet)

Locality
- Pastry's notion of network proximity is based on a scalar proximity metric, such as the number of IP routing hops or geographic distance
- It is assumed that the application provides a function that allows each Pastry node to determine the distance between itself and any node with a given IP address

Summary

Unstructured P2P (1)
Advantages:
- decentralization
- autonomy
- keyword searching
Disadvantage: scalability
When to use unstructured P2P:
- keyword searching is the common operation
- most content is typically replicated at a fair fraction of participating sites
- the node population is highly transient

Unstructured P2P (2)
Open problems:
- improving scalability
- intelligent routing algorithms
- workload balancing
- security & privacy

Structured P2P
Advantages:
- scalability
- efficiency
Disadvantages:
- supports only exact-match lookups
- hashing destroys the locality of keys and of the underlying network

References (1)
1. A. Rowstron and P. Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, 2001.
2. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. ACM SIGCOMM, 2001.
3. B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing. Technical Report UCB/CSD-01-1141, Computer Science Division, University of California, Berkeley, April 2001.
4. P. Druschel and A. Rowstron. PAST: A Large-Scale, Persistent Peer-to-Peer Storage Utility. HotOS VIII, Schloss Elmau, Germany, May 2001.

References (2)
5. S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content-Addressable Network. Proc. ACM SIGCOMM, 2001.
6. D. S. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, B. Richard, S. Rollins, and Z. Xu. Peer-to-Peer Computing. Technical Report HPL-2002-57, HP Labs, March 2002.
7. S. Androutsellis-Theotokis. A Survey of Peer-to-Peer File Sharing Technologies. White paper, ELTRUN, Athens University of Economics and Business, Greece, 2002.
8. A. Oram (ed.). Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly, 2001.
9. Gnutella website. http://www.gnutella.com
10. Freenet homepage. http://freenet.sourceforge.com/
11. Napster homepage. http://www.napster.com/
12. BestPeer homepage. http://xena1.ddns.comp.nus.sg/p2p/