Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications

Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
March 4, 2013

One-slide Summary
Problem: In a peer-to-peer network, how does one efficiently locate the node that stores a desired data item?
Solution: Chord, a scalable, distributed protocol that efficiently locates the desired node even in such a dynamic network.

Other efforts in the same direction
DNS
1 While DNS requires a set of special root servers, Chord has no such requirement.
2 DNS requires manual management of NS records; Chord automatically corrects its routing information.
3 DNS works best when hostnames are structured to reflect administrative boundaries; Chord imposes no naming structure.
Napster, Gnutella, DC++
1 Napster and DC++ use a central index, which is a single point of failure.
2 Gnutella floods the entire network with each query.
3 Chord offers no keyword search, only lookup of unique identifiers.

Content Addressable Network (CAN)
Problem identification
Scalability bottleneck: a centralized hash table.
Scheme
A d-dimensional coordinate space; completely logical, with no bearing on physical coordinates.
Map each key deterministically to a point P using a uniform hash function.
Space creation. Bootstrapping. Node joins/departures. Message routing.
Key facts
The information maintained by each node is independent of N.
How does one fix d?

CHORD: Design Requirements
Network assumptions
1 Symmetric: if A can reach B, then B can reach A.
2 Transitive: if A can reach B and B can reach C, then A can reach C.
Targets
1 Load balance: a distributed hash function spreads keys evenly over the nodes.
2 Decentralization: no node is more important than any other.
3 Scalability: achieved without any parameter tuning.
4 Availability: handles most network failures.
5 Flexible naming: a flat, unstructured key space.

The big picture

Consistent Hashing
How do you do it?
1 Assign an m-bit identifier to each node and to each key.
2 Use a base hash function such as SHA-1 so that identifiers are evenly distributed.
3 Chord ring: an identifier circle modulo 2^m. (Example: m = 6, 10 nodes, 6 keys.)
Theorem
1 Each node is responsible for at most (1 + ε)K/N keys.
2 Only O(K/N) keys change hands when the (N + 1)st node joins or leaves.
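As a concrete illustration (not from the slides), here is a minimal Python sketch of how node and key identifiers might be placed on the 2^m circle with SHA-1; the address and key name used are hypothetical.

import hashlib

M = 6                      # identifier bits in the slide example; the paper uses m = 160 with SHA-1
RING_SIZE = 2 ** M

def chord_id(value: str) -> int:
    # hash a node address or a key name onto the identifier circle modulo 2^m
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

node_id = chord_id("10.0.0.1:8000")   # a node is typically identified by hashing its IP address and port
key_id = chord_id("my-file.txt")      # a key is identified by hashing its name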

Naive Key Lookup
Naive algorithm
// ask node n to find the successor of id
n.find_successor(id)
    if (id ∈ (n, successor])
        return successor;
    else
        // forward the query around the circle
        return successor.find_successor(id);
Performance: O(N)
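One subtlety the pseudocode hides is that the interval (n, successor] wraps around the identifier circle. A small helper, used by the Python sketches below (my own illustration, not code from the paper), makes the modular interval test explicit:

def in_interval(x, a, b, inclusive_right=True):
    # is x in the circular interval (a, b] (or (a, b) if inclusive_right is False)?
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b    # the interval wraps past zero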

Scalable Key Lookup
Finger table: m entries, of which only O(log N) are distinct.
The i-th entry is the first node that succeeds the current node by at least 2^(i-1) on the identifier circle: n.finger[i], a.k.a. the i-th finger of n.
Successor: the next node on the circle, n.finger[1].
Predecessor: the previous node p, for which p.finger[1] = n.
Important observations
1 Each node stores only a small amount of information.
2 Each node knows more about nodes close to it on the circle than about far-off ones.
3 A node's finger table generally does not contain enough information to directly determine the successor of an arbitrary key k.
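The start of the i-th finger of node n is (n + 2^(i-1)) mod 2^m. A quick sketch (the helper name is my own) that reproduces the finger starts of node N8 in the slide's m = 6 example:

def finger_starts(n, m):
    # identifier that the i-th finger of node n must succeed, for i = 1..m
    return [(n + 2 ** (i - 1)) % (2 ** m) for i in range(1, m + 1)]

print(finger_starts(8, 6))   # [9, 10, 12, 16, 24, 40]: entry i points at successor(8 + 2^(i-1))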

Sample Finger Table

The Lookup Algorithm
Example: N8 looks up K54.
Algorithm
// ask node n to find the successor of id
n.find_successor(id)
    if (id ∈ (n, successor])
        return successor;
    else
        n' = closest_preceding_node(id);
        return n'.find_successor(id);

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
    for i = m downto 1
        if (finger[i] ∈ (n, id))
            return finger[i];
    return n;
Theorem
The number of nodes that must be contacted to find a successor in an N-node network is O(log N).
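Below is a single-process Python sketch of the same lookup logic, where "remote" calls are plain method calls. The Node class, its attributes, and the in_interval helper above are assumptions of this sketch, not the paper's code.

class Node:
    def __init__(self, ident, m):
        self.id = ident
        self.m = m
        self.successor = self           # filled in by join/stabilize
        self.predecessor = None
        self.fingers = [self] * m       # fingers[i - 1] is the i-th finger

    def find_successor(self, ident):
        # in a real deployment this would be an RPC to a remote node
        if in_interval(ident, self.id, self.successor.id):
            return self.successor
        n_prime = self.closest_preceding_node(ident)
        if n_prime is self:             # no closer node known; fall back to the successor
            return self.successor
        return n_prime.find_successor(ident)

    def closest_preceding_node(self, ident):
        # scan the finger table from farthest to nearest
        for finger in reversed(self.fingers):
            if in_interval(finger.id, self.id, ident, inclusive_right=False):
                return finger
        return self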

Node Join and Stabilization
Every node periodically runs the stabilize algorithm to learn about newly joined nodes. The algorithm, in essence: ask the successor for its predecessor p, and decide whether p should become the node's successor. In the process, the successor also gets a chance to check its own predecessor.
Each node periodically fixes its finger table by essentially reconstructing it. Similarly, each node periodically checks whether its predecessor is alive; if it is not, the predecessor pointer is set to nil.
Theorem
If any sequence of join operations is interleaved with stabilizations, then eventually the successor pointers form a cycle over all the nodes in the network.
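A sketch of these periodic maintenance routines as methods of the hypothetical Node class above (is_alive() stands in for whatever liveness probe, such as a ping, an implementation would use):

    def stabilize(self):
        # ask the successor for its predecessor and adopt it if it lies between us and the successor
        x = self.successor.predecessor
        if x is not None and in_interval(x.id, self.id, self.successor.id, inclusive_right=False):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # candidate believes it might be our predecessor
        if self.predecessor is None or in_interval(candidate.id, self.predecessor.id, self.id, inclusive_right=False):
            self.predecessor = candidate

    def fix_fingers(self):
        # periodically rebuild the finger table entries
        for i in range(1, self.m + 1):
            start = (self.id + 2 ** (i - 1)) % (2 ** self.m)
            self.fingers[i - 1] = self.find_successor(start)

    def check_predecessor(self):
        if self.predecessor is not None and not self.predecessor.is_alive():
            self.predecessor = None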

Impact of Node Joins on Lookups
Case 1: finger table entries are reasonably correct.
Theorem: the node responsible for the key is correctly located in O(log N) time.
Case 2: successor pointers are correct but finger tables are inaccurate.
Lookups are still correct, just slower.
Case 3: successor pointers are incorrect.
The lookup may fail. The higher-level application can retry after a short pause; it will not take long for the successor pointers to be corrected by stabilization.

Failure and Replication
Invariant assumed so far: each node knows its successor. To increase robustness, each node maintains a successor list containing its r nearest successors. The probability that all r nodes fail concurrently is p^r.
Modified stabilize algorithm
Copy the successor's list, remove its last entry, and prepend the successor itself. If the successor has failed, do the same with the first live entry in the node's own successor list.
Modified closest_preceding_node
Search not just the finger table but also the successor list for the most immediate predecessor of id.
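Continuing the same sketch, the successor-list reconciliation might look like this (successor_list, r, and is_alive() are assumed attributes of the hypothetical Node class, not names from the paper):

    def reconcile_successor_list(self):
        # if the current successor has failed, fall back to the first live entry in our own list
        if not self.successor.is_alive():
            live = [s for s in self.successor_list if s.is_alive()]
            if not live:
                return                   # no known live successor; wait for other repairs
            self.successor = live[0]
        # copy the successor's list, drop its last entry, and prepend the successor itself
        self.successor_list = ([self.successor] + self.successor.successor_list[:-1])[:self.r]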

Robustness Guarantee
Theorem
If we use a successor list of length r = Ω(log N) in a network that is initially stable, and every node then fails with probability 1/2, then with high probability find_successor returns the closest living successor to the query key.
Theorem
In a network that is initially stable, if every node then fails with probability 1/2, the expected time to execute find_successor is O(log N).

Voluntary Node Departures
Treating a departure as a node failure is wasteful. A node that is about to leave may transfer its keys to its successor as it departs, and it can notify its predecessor and successor before departing. The predecessor can remove the departing node from its successor list and append the last node of the departing node's successor list to its own. Similarly, the departing node's successor can update its predecessor pointer to reflect the departure.
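A minimal sketch of such a voluntary departure, again on the hypothetical Node class (store is an assumed per-node key/value dictionary, not a name from the paper):

    def leave(self):
        # hand our keys to the successor, then splice ourselves out of the ring
        self.successor.store.update(self.store)
        self.store.clear()
        if self.predecessor is not None:
            self.predecessor.successor = self.successor
        self.successor.predecessor = self.predecessor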

Simulation Environment
Successor list size = 1.
When the predecessor of a node changes, it notifies its old predecessor about the new predecessor.
Packet delay is modeled with an exponential distribution with mean 50 ms.
A node is declared dead if it does not respond within 500 ms.
The simulation is not concerned with actual data; a lookup is considered successful if the current successor of the key's identifier holds the desired key.

Load Balance
Without virtual nodes / with virtual nodes.
Parameter settings
Number of nodes = 10^4.
Number of keys varied from 10^5 to 10^6, in increments of 10^5.
20 runs per number of keys.

Path Length
Node count = 2^k; key count = 100 × 2^k, for 3 ≤ k ≤ 14.
A random set of keys is picked and the query path length is measured.

Improving Routing Latency
Nodes that are close on the identifier ring can be quite far apart in the underlying network, so actual lookup latency can be large even though the average path length (in hops) is small.
Maintain alternative nodes for each finger and route the query to the alternative that is closest in the underlying network.
Topologies
1 3-d space: network distance is modeled as geometric distance in a 3-dimensional space.
2 Transit-stub: a transit-stub topology with 5000 nodes; 50 ms latency for intra-transit-domain links, 20 ms for transit-stub links, and 1 ms for intra-stub links.
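A small sketch of the finger-selection heuristic on the hypothetical Node class (alternatives is an assumed per-finger list of candidate nodes and rtt a callable returning a measured round-trip-time estimate; neither name comes from the paper):

    def lowest_latency_next_hop(self, ident, alternatives, rtt):
        # among the candidates that still precede ident on the ring,
        # forward the query to the one with the lowest estimated network latency
        candidates = [n for n in alternatives
                      if in_interval(n.id, self.id, ident, inclusive_right=False)]
        return min(candidates, key=rtt, default=self)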

Summary
Major contributions
Load balance: consistent hashing spreads keys roughly evenly over the nodes.
Decentralization: each node needs to know about only O(log N) other nodes for efficient lookup.
Scalability: handles large numbers of nodes joining and leaving the system.
Availability: performance degrades gracefully; a single correct piece of routing information (the successor pointer) is enough for correctness.
Efficiency: each node resolves lookups with O(log N) messages.
Possible extensions
Deal with network partitions.
Deal with adversarial or faulty nodes.

Questions?