EE 122: Peer-to-Peer (P2P) Networks. Ion Stoica November 27, 2002

How Did it Start?
A killer application: Napster
- Free music over the Internet
Key idea: share the storage and bandwidth of individual (home) users

Model
Each user stores a subset of files
Each user has access to (can download) files from all users in the system

Main Challenge
Find where a particular file is stored
(Diagram: nodes A-F; one node asks "E?" to locate the node storing file E)

Other Challenges
Scale: up to hundreds of thousands or millions of machines
Dynamicity: machines can come and go at any time

Napster
Assume a centralized index system that maps files (songs) to the machines that are alive and store them
How to find a file (song):
- Query the index system, which returns a machine that stores the required file (ideally the closest/least-loaded machine)
- ftp the file from that machine
Advantages:
- Simplicity; easy to implement sophisticated search engines on top of the index system
Disadvantages:
- Robustness, scalability (?)
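
To make the centralized scheme concrete, here is a minimal Python sketch of such an index; the CentralIndex class and its method names are illustrative, not Napster's actual protocol, and the transfer step (ftp) is left out.

```python
# Minimal sketch of a Napster-style centralized index (illustrative only).
class CentralIndex:
    def __init__(self):
        self.index = {}      # file name -> set of machines that advertise it
        self.alive = set()   # machines currently connected

    def register(self, machine, files):
        """A machine joins and advertises the files it stores."""
        self.alive.add(machine)
        for f in files:
            self.index.setdefault(f, set()).add(machine)

    def unregister(self, machine):
        """A machine leaves; its entries are no longer returned."""
        self.alive.discard(machine)

    def query(self, file_name):
        """Return some alive machine storing the file, or None."""
        candidates = self.index.get(file_name, set()) & self.alive
        # A real system would pick the closest / least-loaded machine here.
        return next(iter(candidates), None)

# Usage: m5 registers song E; a client asks the index and then downloads E from m5.
idx = CentralIndex()
idx.register("m5", ["E"])
idx.register("m6", ["F"])
print(idx.query("E"))   # -> "m5"
```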

Napster: Example
(Diagram: machines m1-m6 store files A-F respectively; the central index holds the mapping m1:A, m2:B, m3:C, m4:D, m5:E, m6:F. A client queries the index for E, is told m5, and downloads E directly from m5.)

Gnutella
Distributed file location
Idea: multicast the request
How to find a file:
- Send the request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request and sends back the answer
Advantages:
- Totally decentralized, highly robust
Disadvantages:
- Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
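
For illustration, a minimal Python sketch of TTL-limited flooding over an in-memory graph of peers; the Peer class, its field names, and the recursion standing in for real network messages are all simplifications.

```python
# Sketch of Gnutella-style flooding with a TTL (illustrative, in-memory only).
class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []

def flood_query(node, file_name, ttl, visited=None):
    """Return a peer that stores file_name, or None if not found within ttl hops."""
    if visited is None:
        visited = set()
    if node.name in visited:
        return None
    visited.add(node.name)
    if file_name in node.files:      # this peer has the file: answer the query
        return node
    if ttl == 0:                     # TTL exhausted: stop the flood here
        return None
    for neighbor in node.neighbors:  # recursively multicast to all neighbors
        hit = flood_query(neighbor, file_name, ttl - 1, visited)
        if hit is not None:
            return hit
    return None

# Usage, matching the example on the next slide: m1 queries for E, which m5 stores.
m1, m2, m3 = Peer("m1", ["A"]), Peer("m2", ["B"]), Peer("m3", ["C"])
m4, m5 = Peer("m4", ["D"]), Peer("m5", ["E"])
m1.neighbors = [m2, m3]
m3.neighbors = [m4, m5]
print(flood_query(m1, "E", ttl=3).name)   # -> "m5"
```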

Gnutella: Example
Assume: m1's neighbors are m2 and m3; m3's neighbors are m4 and m5
(Diagram: m1 floods the query "E?" to m2 and m3; m3 forwards it to m4 and m5; m5, which stores E, sends back the answer.)

FastTrack
Uses the concept of a supernode
A combination of Napster and Gnutella
When a user joins the network, it attaches to a supernode
A supernode acts like a Napster server for all users connected to it
Queries are broadcast among supernodes (like Gnutella)
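
A minimal Python sketch of the two-tier idea, assuming a toy Supernode class (names are illustrative, not the real FastTrack protocol): each supernode indexes its attached peers Napster-style and consults the other supernodes when a local lookup misses (shown as one hop rather than a real broadcast).

```python
# Sketch of a FastTrack-style two-tier lookup (illustrative only).
class Supernode:
    def __init__(self, name):
        self.name = name
        self.index = {}       # file name -> peer, for peers attached to this supernode
        self.others = []      # other supernodes this one knows about

    def attach(self, peer_name, files):
        """A regular peer joins this supernode and advertises its files (Napster-like)."""
        for f in files:
            self.index[f] = peer_name

    def lookup(self, file_name):
        """Answer locally if possible, otherwise ask the other supernodes
        (Gnutella-like, simplified to a single hop instead of a broadcast)."""
        if file_name in self.index:
            return self.index[file_name]
        for sn in self.others:
            if file_name in sn.index:
                return sn.index[file_name]
        return None

# Usage: two supernodes; a query at s1 is resolved via s2's index.
s1, s2 = Supernode("s1"), Supernode("s2")
s1.others, s2.others = [s2], [s1]
s1.attach("m1", ["A"])
s2.attach("m5", ["E"])
print(s1.lookup("E"))   # -> "m5"
```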

Other Solutions to the Location Problem
Goal: make sure that an item (file), identified by an id, is always found
Abstraction: a distributed hash-table (DHT) data structure
- insert(id, item);
- item = query(id);
- Note: item can be anything: a data object, document, file, pointer to a file
Proposals (also called structured p2p networks):
- CAN (ACIRI/Berkeley)
- Chord (MIT/Berkeley)
- Pastry (Rice)
- Tapestry (Berkeley)
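
The abstraction amounts to a two-call interface. Below is a Python sketch showing only that interface with a single-node stand-in (no networking; class and key names are illustrative); in a real DHT such as CAN, Chord, Pastry, or Tapestry, the id determines which node actually stores the item.

```python
# Sketch of the DHT abstraction (single-node stand-in, illustrative only).
class LocalHashTable:
    """Exposes the interface a DHT offers: insert(id, item) and query(id)."""

    def __init__(self):
        self.store = {}

    def insert(self, id, item):
        # item can be anything: a data object, document, file, or pointer to a file.
        self.store[id] = item

    def query(self, id):
        return self.store.get(id)

# Usage: the hypothetical id "song-E" maps to a pointer telling where the file lives.
dht = LocalHashTable()
dht.insert("song-E", "stored on machine m5")
print(dht.query("song-E"))   # -> "stored on machine m5"
```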

Content Addressable Network (CAN)
Associate to each node and item a unique id in a d-dimensional space
Properties:
- Routing table size O(d)
- Guarantee that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes

CAN Example: Two-Dimensional Space
The space is divided between nodes; together, all nodes cover the entire space
Each node covers either a square or a rectangular area of ratio 1:2 or 2:1
Example:
- Assume a space of size 8 x 8
- Node n1:(1, 2) is the first node to join and covers the entire space
(Diagram: 8 x 8 grid with n1 at (1, 2) owning the whole space)

CAN Example: Two-Dimensional Space
Node n2:(4, 2) joins; the space is divided between n1 and n2
(Diagram: 8 x 8 grid split between n1 and n2)

CAN Example: Two-Dimensional Space
Node n3:(3, 5) joins; the space is divided again
(Diagram: 8 x 8 grid now split among n1, n2, and n3)

CAN Example: Two-Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join
(Diagram: 8 x 8 grid now split among n1, n2, n3, n4, and n5)

CAN Example: Two-Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
(Diagram: nodes and items placed at their coordinates in the 8 x 8 grid)

CAN Example: Two-Dimensional Space
Each item is stored by the node that owns the zone its id maps into
(Diagram: each item shown inside the zone of the node that stores it)

CAN: Query Example
Each node knows its neighbors in the d-dimensional space
Forward the query to the neighbor that is closest to the query id
Example: assume n1 queries f4
(Diagram: the query is forwarded hop by hop from n1 toward the zone containing f4:(7, 5))
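
To make greedy forwarding concrete, here is a minimal Python sketch of CAN-style routing over rectangular zones in two dimensions; the CanNode class, zone representation, and helper names are illustrative, and zone splitting on join and neighbor maintenance are omitted.

```python
# Sketch of CAN-style greedy routing in a 2-D space (illustrative only).
class CanNode:
    def __init__(self, name, zone):
        self.name = name
        self.zone = zone          # ((x_lo, x_hi), (y_lo, y_hi)): the area this node owns
        self.neighbors = []       # nodes whose zones border this one

    def owns(self, point):
        (xl, xh), (yl, yh) = self.zone
        x, y = point
        return xl <= x < xh and yl <= y < yh

def zone_distance(zone, point):
    """Distance from a point to the nearest point of a rectangular zone."""
    (xl, xh), (yl, yh) = zone
    x, y = point
    dx = max(xl - x, 0, x - xh)
    dy = max(yl - y, 0, y - yh)
    return (dx * dx + dy * dy) ** 0.5

def route(node, item_point):
    """Forward the query to the neighbor whose zone is closest to the item's point."""
    while not node.owns(item_point):
        node = min(node.neighbors, key=lambda nb: zone_distance(nb.zone, item_point))
    return node   # the node whose zone contains the item

# Usage with hypothetical zones: n1 owns the left half of an 8 x 8 space, n2 the right half.
n1 = CanNode("n1", ((0, 4), (0, 8)))
n2 = CanNode("n2", ((4, 8), (0, 8)))
n1.neighbors, n2.neighbors = [n2], [n1]
print(route(n1, (7, 5)).name)   # -> "n2": the query for an item at (7, 5) ends at n2
```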

Chord
Associate to each node and item a unique id in a uni-dimensional space
Properties:
- Routing table size O(log(N)), where N is the total number of nodes
- Guarantees that a file is found in O(log(N)) steps

Data Structure
Assume the identifier space is 0..2^m
Each node maintains:
- A finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- Its predecessor node
An item identified by id is stored on the successor node of id
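
A minimal Python sketch of this data structure for a static set of node ids, assuming an m-bit identifier space; joins, departures, and stabilization are ignored, and the function names are illustrative.

```python
# Sketch: successor and finger-table computation for a static Chord ring (illustrative).
def successor(node_ids, k, m):
    """First node id that succeeds or equals k on the ring of size 2**m."""
    k %= 2 ** m
    nodes = sorted(node_ids)
    for n in nodes:
        if n >= k:
            return n
    return nodes[0]               # wrap around the ring

def finger_table(n, node_ids, m):
    """Entry i of node n's finger table is successor(n + 2**i)."""
    return [successor(node_ids, n + 2 ** i, m) for i in range(m)]

# Usage with the lecture's example: m = 3, nodes 0, 1, 3, 6.
nodes = [0, 1, 3, 6]
for n in nodes:
    print(n, finger_table(n, nodes, 3))
# e.g. node 1's fingers are successor(2) = 3, successor(3) = 3, successor(5) = 6
```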

Chord Example
Assume an identifier space of size 8 (ids 0..7, m = 3)
Node n1:(1) joins; all entries in its finger table are initialized to itself
(Ring diagram, ids 0..7; node 1's finger table: i=0: 1+1=2 -> 1; i=1: 1+2=3 -> 1; i=2: 1+4=5 -> 1)

Chord Example
Node n2:(3) joins
(Ring diagram: the finger tables of nodes 1 and 3 are updated accordingly)

Chord Example
Nodes n3:(0) and n4:(6) join
(Ring diagram: the finger tables of nodes 0, 1, 3, and 6 after all four have joined)

Chord Example
Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
Items: f1:(7), f2:(2)
(Ring diagram: by the rule above, each item is stored at the successor of its id, so f2:(2) is stored at node 3 and f1:(7) at node 0)

Query
Upon receiving a query for item id, node n:
- Checks whether the item is stored at its successor node s, i.e., whether id belongs to (n, s]
- If not, forwards the query to the largest node in its finger (successor) table that does not exceed id
(Ring diagram: a query for id 7 issued at node 1 is forwarded along the ring until it reaches node 0, the successor of 7, which stores f1)
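
A minimal Python sketch of this forwarding rule on a static ring; the interval test is modular, the finger tables are written out by hand as successor(n + 2^i) for the lecture's example, and the function names are illustrative.

```python
# Sketch of Chord-style query forwarding on a static ring (illustrative only).
def in_interval(x, a, b, m):
    """True if x lies in the half-open ring interval (a, b] modulo 2**m."""
    x, a, b = x % 2 ** m, a % 2 ** m, b % 2 ** m
    if a < b:
        return a < x <= b
    return x > a or x <= b            # interval wraps past 0

def lookup(n, key, succ, fingers, m):
    """Starting at node n, return the node responsible for key."""
    while not in_interval(key, n, succ[n], m):
        # Forward to the largest finger that does not overshoot the key.
        candidates = [f for f in fingers[n] if in_interval(f, n, key, m)]
        n = max(candidates, key=lambda f: (f - n) % 2 ** m) if candidates else succ[n]
    return succ[n]

# Usage with the lecture's ring: m = 3, nodes 0, 1, 3, 6.
m = 3
succ = {0: 1, 1: 3, 3: 6, 6: 0}                       # each node's immediate successor
fingers = {0: [1, 3, 6], 1: [3, 3, 6], 3: [6, 6, 0], 6: [0, 0, 3]}
print(lookup(1, 7, succ, fingers, m))                 # -> 0: node 0 stores item f1:(7)
print(lookup(1, 2, succ, fingers, m))                 # -> 3: node 3 stores item f2:(2)
```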

Discussion
The query can be implemented:
- Iteratively
- Recursively
Performance: routing in the overlay network can be more expensive than in the underlying network
- Because there is usually no correlation between node ids and their locality; a query can repeatedly jump between Europe and North America even though both the initiator and the node that stores the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest one in terms of network distance

Discussion (cont'd)
Gnutella, Napster, and FastTrack can resolve powerful queries, e.g.:
- Keyword searching, approximate matching
Natively, CAN, Chord, Pastry, and Tapestry support only exact matching
- Ongoing work aims to support more powerful queries

Discussion
Robustness:
- Maintain multiple copies associated with each entry in the routing tables
- Replicate an item on nodes with close ids in the identifier space
Security:
- Can be built on top of CAN, Chord, Tapestry, and Pastry

Conclusions
The key challenge in building wide-area P2P systems is a scalable and robust location service
Solutions covered in this lecture:
- Napster: centralized location service
- Gnutella: broadcast-based decentralized location service
- Freenet: intelligent-routing decentralized solution (but correctness is not guaranteed; queries for existing items may fail)
- CAN, Chord, Tapestry, Pastry: intelligent-routing decentralized solutions that guarantee correctness; Tapestry (and Pastry?) provide efficient routing, but are more complex