Introduction to Peer-to-Peer Systems

Similar documents
Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

Peer-to-peer systems and overlay networks

Introduction to P2P Computing

Content Overlays. Nick Feamster CS 7260 March 12, 2007

CS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications

Peer-to-Peer Systems. Chapter General Characteristics

Architectures for Distributed Systems

Telematics Chapter 9: Peer-to-Peer Networks

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

A Survey of Peer-to-Peer Content Distribution Technologies

Ossification of the Internet

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Overlay networks. To do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. Turtles all the way down. q q q

Making Gnutella-like P2P Systems Scalable

The Design and Implementation of a Next Generation Name Service for the Internet (CoDoNS) Presented By: Kamalakar Kambhatla

Advanced Distributed Systems. Peer to peer systems. Reference. Reference. What is P2P? Unstructured P2P Systems Structured P2P Systems

Peer-to-Peer Systems and Distributed Hash Tables

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Peer to Peer Networks

Distributed Hash Tables: Chord

EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Overlay Networks: Motivations

Distributed Knowledge Organization and Peer-to-Peer Networks

CSE 124 Finding objects in distributed systems: Distributed hash tables and consistent hashing. March 8, 2016 Prof. George Porter

Introduction on Peer to Peer systems

Lecture 8: Application Layer P2P Applications and DHTs

CIS 700/005 Networking Meets Databases

Peer to Peer Networks

Overlay Networks: Motivations. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Motivations (cont d) Goals.

PEER-TO-PEER NETWORKS, DHTS, AND CHORD

Agent and Object Technology Lab Dipartimento di Ingegneria dell Informazione Università degli Studi di Parma. Distributed and Agent Systems

Overlay networks. Today. l Overlays networks l P2P evolution l Pastry as a routing overlay example

Assignment 5. Georgia Koloniari

Unit 8 Peer-to-Peer Networking

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

L3S Research Center, University of Hannover

Overlay networks. T o do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. q q q. Turtles all the way down

Distributed Hash Tables

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

*Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)

Peer-to-Peer (P2P) Systems

Telecommunication Services Engineering Lab. Roch H. Glitho

Searching for Shared Resources: DHT in General

Searching for Shared Resources: DHT in General

12/5/16. Peer to Peer Systems. Peer-to-peer - definitions. Client-Server vs. Peer-to-peer. P2P use case file sharing. Topics

CSCI-1680 P2P Rodrigo Fonseca

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Information Processing

Content Search. Unstructured P2P. Jukka K. Nurminen

Today. Architectural Styles

Scaling Problem Millions of clients! server and network meltdown. Peer-to-Peer. P2P System Why p2p?

Slides for Chapter 10: Peer-to-Peer Systems

Motivation for peer-to-peer

Distributed Hash Table

EE 122: Peer-to-Peer Networks

Page 1. How Did it Start?" Model" Main Challenge" CS162 Operating Systems and Systems Programming Lecture 24. Peer-to-Peer Networks"

Distributed Systems Final Exam

Naming in Distributed Systems

Content Search. Unstructured P2P

Chapter 10: Peer-to-Peer Systems

Peer to Peer I II 1 CS 138. Copyright 2015 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved.

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Fast Topology Management in Large Overlay Networks

Scaling Problem Millions of clients! server and network meltdown. Peer-to-Peer. P2P System Why p2p?

Overlay and P2P Networks. Introduction and unstructured networks. Prof. Sasu Tarkoma

GNUnet Distributed Data Storage

Handling Churn in a DHT

Today. Architectural Styles

Scalable overlay Networks

416 Distributed Systems. Mar 3, Peer-to-Peer Part 2

: Scalable Lookup

Scaling Problem Computer Networking. Lecture 23: Peer-Peer Systems. Fall P2P System. Why p2p?

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza

Overlay Networks. Behnam Momeni Computer Engineering Department Sharif University of Technology

6. Peer-to-peer (P2P) networks I.

Debunking some myths about structured and unstructured overlays

EE 122: Peer-to-Peer (P2P) Networks. Ion Stoica November 27, 2002

Opportunistic Application Flows in Sensor-based Pervasive Environments

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste

Security for Structured Peer-to-peer Overlay Networks. Acknowledgement. Outline. By Miguel Castro et al. OSDI 02 Presented by Shiping Chen in IT818

CS 347 Parallel and Distributed Data Processing

Detecting and Recovering from Overlay Routing Attacks in Peer-to-Peer Distributed Hash Tables

Georges Da Costa Introduction on Peer to Peer systems

08 Distributed Hash Tables

Kademlia: A P2P Informa2on System Based on the XOR Metric

CDNs and Peer-to-Peer

Cooperation in Open Distributed Systems. Stefan Schmid

Architectures for distributed systems (Chapter 2)

Simulations of Chord and Freenet Peer-to-Peer Networking Protocols Mid-Term Report

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli

Peer-to-Peer Architectures and Signaling. Agenda

L3S Research Center, University of Hannover

Lecture 13: P2P Distributed Systems

Today s Objec2ves. Kerberos. Kerberos Peer To Peer Overlay Networks Final Projects

Lecture-2 Content Sharing in P2P Networks Different P2P Protocols

Distributed Meta-data Servers: Architecture and Design. Sarah Sharafkandi David H.C. Du DISC

Transcription:

Introduction Introduction to Peer-to-Peer Systems Peer-to-peer (PP) systems have become extremely popular and contribute to vast amounts of Internet traffic PP basic definition: A PP system is a distributed collection of peer nodes Ozalp Babaoglu Each node is both a server and a client: May provide services to other peers May consume services from other peers Very different from the client-server model ALMA MATER STUDIORUM UNIVERSITA DI BOLOGNA PP History: - PP History: 1 - today The origins: In the beginning, all nodes in Arpanet/Internet were peers Each node was capable of: Performing routing (locate machines) Accepting ftp connections (file sharing) Accepting telnet connections (distributed computation) The advent of Napster: Jan 1: the first version of Napster was released by Shawn Fanning, student at Northeastern University July 1: Napster Inc. founded Feb 01: Napster closed down

Napster Client/Server vs. Peer-to-Peer Napster Client 1 Napster Client Copy: song.mp Query: song.mp Your Computer Napster Client Napster Central Index Server Client Napster Client - Client/Server search - PP download - Napster is not pure PP Servers well connected to the core of the Internet Servers carry out critical tasks Clients only talk to servers Nodes located at the periphery of the Internet Tasks distributed across all nodes Clients talk to other clients Example Video sharing Example Video sharing Client-Server: YouTube Advantages Peer-to-peer: BitTorrent Advantages Client can disconnect after upload Does not depend on a central server Uploader needs little bandwidth Other users can find the file easily (just use search on server webpage) Bandwidth shared across nodes (s also act as uploaders) High scalability, low cost uploader Disadvantages seeder Disadvantages Server may not accept file or remove it later (according to content policy) Seeder must remain on-line to guarantee file availability client-server Whole system depends on the server (what if shut down like Napster?) Server storage and bandwidth are expensive! peer-to-peer Content is more difficult to find (s must find.torrent file) Freeloaders cheat in order to download without uploading

PP vs. client-server PP and Overlay Networks Client-server Asymmetric: client and servers carry out different tasks Global knowledge: servers have a global view of the network Centralization: communications and management are centralized Single point of failure: a server failure brings down the system Limited scalability: servers easily overloaded Expensive: server storage and bandwidth capacity is not cheap Peer-to-peer Symmetric: No distinction between node; they are peers Local knowledge: nodes only know a small set of other nodes Decentralization: no global knowledge, only local interactions Robustness: several nodes may fail with little or no impact High scalability: high aggregate capacity, load distribution Low-cost: storage and bandwidth are contributed by users Peer-to-Peer systems are usually structured as overlays Logical structures built on top of a physical routed communication infrastructure (IP) that creates the allusion of a completely-connected graph Links based on logical knows relationships rather than physical connectivity Overlay networks Overlay networks Physical network: who has a communication link to whom Logical network: who can communicate with whom

Overlay networks Overlay networks Overlay network (ring): who knows whom Overlay network (binary tree): who knows whom 1 PP Environment Why PP? Completely decentralized control with limited local states High latency, low bandwidth communication Churn Nodes may disconnect temporarily New nodes are continuously joining the system, while others leave permanently Security PP clients runs on machines under the total control of their owners Malicious users may try to bring down the system Selfishness Users may run hacked clients in order to avoid contributing resources Decentralization enables deployment of applications that are: Highly available Fault-tolerant Self organizing Scalable Difficult or impossible to shut down This results in a grassroots approach and democratization of the Internet 1

PP Problems PP Applications Overlay construction and maintenance e.g., random, two-level, ring, etc. Data location locate a given data object among a large number of nodes Data dissemination propagate data in an efficient and robust manner Global reasoning with local information maintain local views with small per node state Tolerance to churn maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures Sharing of content: File sharing, content delivery networks Gnutella, emule, Akamai Sharing of storage: Distributed file systems Sharing of CPU time: Parallel computing, Grid Seti@home, Folding@home, FightAids@home (typically not pure PP) 1 PP Topologies Evaluating topologies Unstructured Structured Centralized Hierarchical Hybrid Manageability How hard is it to keep working? Information coherence How authoritative is info? Extensibility How easy is it to grow? Fault tolerance How well can it handle failures? Censorship How hard is it to shut down? 1

Unstructured Structured: Centralized Manageable Difficult, many owners Coherent Difficult, unreliable peers Extensible Anyone can join in Fault tolerance redundancy Censorship Difficult to shut down Manageable System is all in one place Coherent Information is centralized Extensible No Fault tolerance Single point of failure Censorship Easy to shut down Structured: Hierarchical Hybrid: Superpeers Manageable Chain of authority Coherent Cache consistency Extensible Add more leaves, rebalance Fault tolerance Root is vulnerable Censorship Just shut down the root Manageable Same as decentralized Coherent Better than decentralized Extensible Anyone can join in Fault tolerance redundancy Censorship Difficult to shut down

Some Common Topologies Data location in unstructured networks: Flooding Flat unstructured: a node can connect to any other node only constraint: maximum degree dmax fast join procedure good for data dissemination, bad for location Two-level unstructured: nodes connect to a superpeer superpeers form a small overlay used for indexing and forwarding high load on superpeer Flat structured: constraints based on node ids allows for efficient data location constraints require long join and leave procedures Problem: find the set of nodes S that store a copy of object O Flooding: forward the search message to all neighbors (first Gnutella protocol) A search message contains either keywords or an object id Advantages: simplicity no topology constraints Disadvantages: high network overhead (huge traffic generated by each search request) flooding stopped by TTL (which produces search horizon) only applicable to small number of nodes Data location in unstructured networks: Flooding Data location in unstructured networks: Superpeers Flooding (cont.) Flooding in a flat unstructured network: obj Two-level overlay: use superpeers to track the locations of an object [Gnutella, BitTorrent] Each node connects to a superpeer and advertises the list of objects it stores Search requests are sent to the supernode, which forwards them to other supernodes Advantages: highly scalable search horizon for TTL = Objects that lie outside of the horizon are not found Disadvantages: superpeers must be reliable, powerful and well connected to the Internet (expensive) superpeers must maintain large state the system relies on a small number of superpeers

Data location in unstructured networks: Superpeers Two-Level Overlay (cont.) obj A two-level overlay is a partially centralized system request response In some systems, superpeers may be disconnected (e.g., BitTorrent) Data location in structured networks: Key-Based Routing Structured networks: use a routing algorithm that implements Key- Based Routing (KBR) [Chord, Pastry, Overnet, Kad, emule] KBR (also known as Distributed Hash Tables) works as follows: nodes are (randomly) assigned unique node identifiers (nodeid) given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k given a key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log n) the location of an object with id objectid is tracked by the root of k = objectid thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object Structured overlay network: Chord Structured overlay network: Chord Basics Basics Each peer is assigned a unique m-bit identifier id. Every peer is assumed to store data contained in a file. Each file has a unique m-bit key k. Peer with smallest identifier id storing file with key k. k is responsible for succ(k): The peer (i.e., node) with the smallest identifier p k. Each peer is assigned a unique m-bit identifier id. Every peer is assumed to store data contained in a file. Each file has a unique m-bit key k. Peer with smallest identifier id storing file with key k. k is responsible for succ(k): The peer (i.e., node) with the smallest identifier p k. Note Note All arithmetic is done modulo M = m. In other words, if x = k M + y, then x mod M = y. All arithmetic is done modulo M = m. In other words, if x = k M + y, then x mod M = y. / /

Example Efficient lookups Peer 1 stores files with keys,,, 0, 1 Peer stores file with key Peer stores files with keys,,,, Partial view = finger table Each node p maintains a finger table FT p [] with at most m entries: FT p [i]=succ(p + i 1 ) Note: FT p [i] points to the first node succeeding p by at least i 1. To look up a key k, node p forwards the request to node with index j satisfying 1 1 1 1 q = FT p [j] apple k < FT p [j + 1] If p < k < FT p [1], the request is also forwarded to FT p [1] / / Example finger tables Example lookup: 1@ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i i-1 succ(p + ) 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple 1 < FT [] )! 1 p = 1 < 1 < FT p [1] ) 1! / /

Example lookup: 1@ Example lookup: 1@ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple 1 < FT [] )! 1 p = 1 < 1 < FT p [1] ) 1! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple 1 < FT [] )! 1 p = 1 < 1 < FT p [1] ) 1! / / Example lookup: @ Example lookup: @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple )! FT [1] apple < FT [] )! p = < < FT [1] )! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple )! FT [1] apple < FT [] )! p = < < FT [1] )! / /

Example lookup: @ Example lookup: @ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple )! FT [1] apple < FT [] )! p = < < FT [1] )! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 FT [] apple )! FT [1] apple < FT [] )! p = < < FT [1] )! / / Example lookup: @ Example lookup: @ 1 1 1 1 1 1 1 1 1 p = < FT p [1] 1 1 1 1 1 1 1 1 1 1 1 )! FT [] < )! FT [] apple < FT [] )! 1 p = 1 < < FT p [1] ) 1! 1 1 1 1 1 1 1 1 1 p = < FT p [1] 1 1 1 1 1 1 1 1 1 1 1 )! FT [] < )! FT [] apple < FT [] )! 1 p = 1 < < FT p [1] ) 1! / /

Example lookup: @ Example lookup: @ 1 1 1 1 1 1 1 1 1 p = < FT p [1] 1 1 1 1 1 1 1 1 1 1 1 )! FT [] < )! FT [] apple < FT [] )! 1 p = 1 < < FT p [1] ) 1! 1 1 1 1 1 1 1 1 1 p = < FT p [1] 1 1 1 1 1 1 1 1 1 1 1 )! FT [] < )! FT [] apple < FT [] )! 1 p = 1 < < FT p [1] ) 1! / / Example lookup: @ The Chord graph 1 1 1 1 1 1 1 1 1 p = < FT p [1] 1 1 1 1 1 1 1 1 1 1 1 )! FT [] < )! FT [] apple < FT [] )! 1 p = 1 < < FT p [1] ) 1! Essence Each peer represented by a vertex; if FT p [i]=j, add arc h! i,ji, but keep directed graph strict. 1 1 / /

Chord: path lengths Chord: degree distribution Observation With d n (i,j)=min{ i j,n i j }, we can see that every peer is joined with another peer at distance 1 n, 1 n, 1 n,...,1. 00 00 Average path length..0..0. Occurrences 00 00 0 0 0 Occurrences 00 100 00 1 Outdegree 1 Network size (x 00) 0 0 0 0 Indegree / / Chord: clustering coefficient Data location in structured networks: Key-Based Routing Note 0. 0. 0. 0.0 1 1 CC is computed over undirected Chord graph; x-axis shows number of 00 nodes. Structured networks Advantages: completely decentralized (no need for superpeers) routing algorithm achieves low hop count for large network sizes Disadvantages: each object must be tracked by a different node objects are tracked by unreliable nodes (which may disconnect) keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid) the overlay must be structured according to a given topology in order to achieve a low hop count routing tables must be updated every time a node joins or leaves the overlay /

Effects of Churn Churn - Preventing Partitions Churn can have several effects on a PP system: data objects may be become unavailable if all replicas disconnect A naïve approach to preventing partitions is to increase the average node degree routing tables may become inconsistent (e.g., entries may point to disconnected nodes) the overlay may become partitioned if many nodes suddenly disconnect: Ring partitions can be avoided by keep a list of successor nodes Churn Tolerance Security Node arrivals and departures must not disrupt the normal behavior of the system system invariants must be maintained connected overlay (i.e., no partitions), low average path length data objects accessible from anywhere in the network two types of churn tolerance: dynamic recovery: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions) static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths) Security in PP systems is hard to enforce: Users have full control of their computers Modified clients may not follow the standard protocol Data may be corrupted Private data stored on remote computers may disclosed

Security - Weak identities The user may leave the system and rejoin it with a new identity (different user id) If an attack is detected, the attacker can re-enter the system with a new id An attacker may create a large number of false identities (Sybil attack) S S1 S A S S S Example of Sybil attack: - Nodes S1 to S are actually instances of the PP client running on the same machine - The attacker can intercept all traffic coming from or going to node A Security - Strong identities The user cannot change its identity Solution: use a centralized, trusted Certification Authority (CA) Each new user must obtain an identity certificate The certificate is digitally signed by the CA, whose public key is known by all users A certificate cannot be forged (require the CA s private key) To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA Strong identities prevent Sybil Attacks If an attacker is caught, it cannot easily rejoin the system Security - Weak vs. Strong identities Strong identities require a centralized CA New nodes must contact the CA before joining the network: The CA response may be slow If the CA is unavailable, new nodes cannot join The security of the system depends on CA: The CA must correctly verify the identity of the requester The CA s private key must be secret Many PP systems use weak identities IP addresses already gives some identity information Some systems ensure anonymity