EE 122: Peer-to-Peer (P2P) Networks
Ion Stoica
November 27, 2002
How Did It Start?
A killer application: Napster - free music over the Internet
Key idea: share the storage and bandwidth of individual (home) users
istoica@cs.berkeley.edu
Model
Each user stores a subset of files
Each user can access (download) files from all other users in the system
Main Challenge
Find where a particular file is stored
[Figure: six machines A-F; one machine asks "E?" to locate the machine storing file E]
Other Challenges
Scale: up to hundreds of thousands or millions of machines
Dynamicity: machines can come and go at any time
Napster
Assume a centralized index system that maps files (songs) to machines that are alive
How to find a file (song):
- Query the index system, which returns a machine that stores the required file
  Ideally this is the closest/least-loaded machine
- ftp the file
Advantages:
- Simplicity; easy to implement sophisticated search engines on top of the index system
Disadvantages:
- Robustness, scalability (?)
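The centralized index can be sketched as a small in-memory service. This is a minimal sketch, not Napster's actual protocol; the class and method names are illustrative.

```python
# Hypothetical sketch of a Napster-style central index: a map from
# file name to the set of machines currently storing a copy.
from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.locations = defaultdict(set)  # file name -> machines storing it

    def register(self, machine, files):
        # A machine announces the files it stores when it joins.
        for f in files:
            self.locations[f].add(machine)

    def unregister(self, machine):
        # A machine that leaves is removed; its copies are no longer reachable.
        for holders in self.locations.values():
            holders.discard(machine)

    def query(self, file):
        # Return any machine holding the file (ideally closest/least loaded);
        # the requester then fetches the file from that machine directly.
        holders = self.locations.get(file)
        return next(iter(holders)) if holders else None
```

Note how the data transfer stays peer-to-peer; only the lookup is centralized, which is exactly what makes the scheme simple but fragile.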
Napster: Example
[Figure: central index maps m1:A, m2:B, m3:C, m4:D, m5:E, m6:F; m1 sends "E?" to the index, which answers "m5", and m1 then downloads E directly from m5]
Gnutella
Distribute the file location
Idea: multicast the request
How to find a file:
- Send the request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request, and it sends back the answer
Advantages:
- Totally decentralized, highly robust
Disadvantages:
- Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
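The scheme above can be sketched as a breadth-first flood bounded by a TTL. The graph, file placement, and function name are illustrative, and duplicate suppression is simplified to a visited set:

```python
# Sketch of Gnutella-style flooding: forward the request to all neighbors,
# recursively, until the TTL runs out; machines holding the file answer.
def flood_query(graph, files, start, wanted, ttl):
    """Return the set of machines that answer, flooding up to ttl hops."""
    answers, visited, frontier = set(), {start}, [start]
    while frontier and ttl >= 0:
        next_frontier = []
        for node in frontier:
            if wanted in files.get(node, ()):
                answers.add(node)          # this machine has the file
            for nbr in graph.get(node, ()):
                if nbr not in visited:     # suppress duplicate requests
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        ttl -= 1                           # TTL limits how far the flood spreads
    return answers
```

On the slide's example topology (m1 neighbors m2, m3; m3 neighbors m4, m5; m5 stores E), a TTL of 2 reaches m5, while a TTL of 1 does not: the TTL trades completeness for bounded traffic.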
Gnutella: Example
Assume m1's neighbors are m2 and m3, and m3's neighbors are m4 and m5
[Figure: m1 sends "E?" to m2 and m3; m3 forwards it to m4 and m5; m5, which stores E, sends back the answer]
FastTrack
Use the concept of a supernode - a combination of Napster and Gnutella
When a user joins the network, it connects to a supernode
A supernode acts like a Napster server for all users connected to it
Queries are broadcast among supernodes (like Gnutella)
Other Solutions to the Location Problem
Goal: make sure that an item (file), if present, is always found
Abstraction: a distributed hash-table data structure
- insert(id, item);
- item = query(id);
- Note: item can be anything: a data object, document, file, pointer to a file
Proposals (also called structured p2p networks):
- CAN (ACIRI/Berkeley)
- Chord (MIT/Berkeley)
- Pastry (Rice)
- Tapestry (Berkeley)
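One way to picture the abstraction: hash the item's name into a flat id space and assign it to the first node at or after that id. This is an illustrative sketch, not any particular proposal; the function name, SHA-1 choice, and 2^32-sized space are assumptions.

```python
import hashlib

def node_for(key, node_ids, space=2 ** 32):
    """Map a key into the id space and pick the node responsible for it."""
    kid = int(hashlib.sha1(key.encode()).hexdigest(), 16) % space
    for nid in sorted(node_ids):
        if nid >= kid:          # first node at or after the key's id
            return nid
    return min(node_ids)        # wrap around the id space
```

Every node that applies the same rule agrees on where an item lives, so insert(id, item) and query(id) both reduce to routing toward node_for(id).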
Content Addressable Network (CAN)
Associate with each node and item a unique id in a d-dimensional space
Properties:
- Routing table size O(d)
- Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
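For intuition on the bound: with d = 2 and n = 16 nodes the space is roughly a 4x4 grid of zones, and a query crosses at most about 2 * sqrt(16) = 8 of them. A one-line check:

```python
# Evaluate the CAN routing bound d * n**(1/d) for a given configuration.
def can_bound(d, n):
    return d * n ** (1.0 / d)
```

Raising d shrinks the per-hop bound but grows each routing table, which is the O(d) trade-off stated above.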
CAN Example: Two-Dimensional Space
The space is divided between the nodes; together, the nodes cover the entire space
Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
Example:
- Assume the space is 8 x 8
- Node n1:(1, 2) is the first node to join; it covers the entire space
[Figure: 8x8 grid with n1 at (1, 2)]
CAN Example: Two-Dimensional Space
Node n2:(4, 2) joins; the space is divided between n1 and n2
[Figure: grid now split between n1 at (1, 2) and n2 at (4, 2)]
CAN Example: Two-Dimensional Space
Node n3:(3, 5) joins; the space is divided again
[Figure: grid with n1, n2, and n3 at (3, 5)]
CAN Example: Two-Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: grid with n1 through n5]
CAN Example: Two-Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: grid with all nodes and items placed at their coordinates]
CAN Example: Two-Dimensional Space
Each item is stored by the node that owns the zone the item's id maps to
[Figure: f1 and f3 fall in n1's zone, f2 in n2's zone, f4 in n5's zone]
CAN: Query Example
Each node knows its neighbors in the d-dimensional space
Forward the query to the neighbor closest to the query id
Example: assume n1 queries f4
[Figure: the query is routed greedily from n1 toward f4 at (7, 5)]
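Greedy forwarding can be sketched as follows. The neighbor lists and the use of zone-center coordinates are assumptions matching the running example (a real CAN node knows only the zones adjacent to its own, and compares against zone boundaries rather than centers):

```python
# Sketch of CAN greedy routing: at each step, hop to the neighbor whose
# zone center is closest to the target point; stop when no neighbor improves.
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def can_route(neighbors, coords, start, target):
    """Return the sequence of nodes visited while routing toward target."""
    path, node = [start], start
    while True:
        best = min(neighbors[node], key=lambda n: dist2(coords[n], target))
        if dist2(coords[best], target) >= dist2(coords[node], target):
            return path          # no neighbor is closer: target is in our zone
        node = best
        path.append(node)
```

On the slide's layout, a query from n1 for f4 at (7, 5) hops n1, n3, n4, n5, ending at n5, whose zone contains f4.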
Chord
Associate with each node and item a unique id in a one-dimensional space
Properties:
- Routing table size O(log(N)), where N is the total number of nodes
- Guarantees that a file is found in O(log(N)) steps
Data Structure
Assume the identifier space is 0..2^m
Each node maintains:
- Finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- Predecessor node
An item identified by id is stored on the successor node of id
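The finger-table rule can be sketched directly from the definition above, computing each entry as the first node that succeeds or equals n + 2^i (mod 2^m); the function and variable names are illustrative:

```python
def finger_table(n, nodes, m=3):
    """Finger table of node n: entry i points to the first node >= n + 2^i (mod 2^m)."""
    ring = sorted(nodes)
    def successor(k):
        # first node that succeeds or equals k, wrapping around the ring
        return next((x for x in ring if x >= k), ring[0])
    return [(i, (n + 2 ** i) % 2 ** m, successor((n + 2 ** i) % 2 ** m))
            for i in range(m)]
```

With m = 3 and nodes 0, 1, 3, 6 (the configuration of the example that follows), node 1's entries are (2 -> 3), (3 -> 3), (5 -> 6), and node 3's last entry wraps around to node 0.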
Chord Example
Assume an identifier space 0..8 (m = 3)
Node n1:(1) joins; all entries in its finger table are initialized to itself
[Figure: ring 0..7 with node 1; node 1's succ. table: (i=0: id 2 -> 1), (i=1: id 3 -> 1), (i=2: id 5 -> 1)]
Chord Example
Node n2:(3) joins
[Figure: ring with nodes 1 and 3; node 1's succ. table: (2 -> 3), (3 -> 3), (5 -> 1); node 3's succ. table: (4 -> 1), (5 -> 1), (7 -> 1)]
Chord Example
Nodes n3:(0) and n4:(6) join
[Figure: ring with nodes 0, 1, 3, 6; succ. tables: node 0: (1 -> 1), (2 -> 3), (4 -> 6); node 1: (2 -> 3), (3 -> 3), (5 -> 6); node 3: (4 -> 6), (5 -> 6), (7 -> 0); node 6: (7 -> 0), (0 -> 0), (2 -> 3)]
Chord Examples
Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
Items: f1:(7), f2:(2)
[Figure: f1 (id 7) is stored at its successor, node 0; f2 (id 2) is stored at node 3]
Query
Upon receiving a query for item id, node n:
- Checks whether the item is stored at its successor node s, i.e., whether id belongs to (n, s]
- If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: node 1 issues query(7); the query is forwarded via node 6 to node 0, which stores the item]
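The query rule can be sketched as an iterative lookup. Here `succ` and `fingers` are plain dictionaries standing in for each node's state, an assumption made to keep the sketch self-contained:

```python
def between(x, a, b):
    """True if x lies in the ring interval (a, b], identifiers mod 2^m."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def chord_lookup(n, key, succ, fingers):
    """Iterative Chord lookup: return (node storing key, nodes visited)."""
    path = [n]
    while not between(key, n, succ[n]):    # is the item at n's successor?
        nxt = succ[n]                      # fall back to a plain successor hop
        for f in fingers[n]:               # fingers listed in increasing distance
            if between(f, n, key) and f != key:
                nxt = f                    # keep the furthest finger preceding key
        n = nxt
        path.append(n)
    return succ[n], path
```

On the example ring (nodes 0, 1, 3, 6), a query for id 7 issued at node 1 is forwarded to node 6 and resolves to node 0, the successor of id 7.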
Discussion
Query can be implemented:
- Iteratively
- Recursively
Performance: routing in the overlay network can be more expensive than in the underlying network
- Because usually there is no correlation between node ids and their locality; a query can repeatedly jump from Europe to North America, even though both the initiator and the node storing the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies of each entry in their routing tables and choose the closest one in terms of network distance
Discussion (cont'd)
Gnutella, Napster, and FastTrack can resolve powerful queries, e.g.:
- Keyword searching, approximate matching
Natively, CAN, Chord, Pastry, and Tapestry support only exact matching
- On-going work to support more powerful queries
Discussion
Robustness:
- Maintain multiple copies associated with each entry in the routing tables
- Replicate an item on nodes with close ids in the identifier space
Security:
- Can be built on top of CAN, Chord, Tapestry, and Pastry
Conclusions
The key challenge of building wide-area P2P systems is a scalable and robust location service
Solutions covered in this lecture:
- Napster: centralized location service
- Gnutella: broadcast-based decentralized location service
- Freenet: intelligent-routing decentralized solution (but correctness not guaranteed; queries for existing items may fail)
- CAN, Chord, Tapestry, Pastry: intelligent-routing decentralized solutions that guarantee correctness
  Tapestry (Pastry?) provides efficient routing, but is more complex