Distributed Hash Tables


Smruti R. Sarangi, Department of Computer Science, Indian Institute of Technology, New Delhi, India. Smruti R. Sarangi 1/34

Outline 1 2 Smruti R. Sarangi 2/34

Normal Hashtables. A hashtable contains a set of key-value pairs: if the user supplies a key, the hashtable returns the associated value. Basic operations: insert(key, value) inserts the (key, value) pair into the hashtable; lookup(key) returns the value, or null if there is no value; delete(key) deletes the key. Time complexity: approximately Θ(1) per operation. We need a good hash function to map keys to (nearly) unique locations, and collisions must be resolved through chaining or rehashing. Smruti R. Sarangi 3/34
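
As a quick illustration (not from the slides), the three operations map directly onto a dictionary; the small class below is a hypothetical sketch of the same interface in Python.

    class HashTable:
        """Minimal sketch of the insert/lookup/delete interface."""
        def __init__(self):
            self._table = {}          # Python dicts already handle hashing and collisions

        def insert(self, key, value):
            self._table[key] = value  # expected O(1)

        def lookup(self, key):
            return self._table.get(key)   # returns None if the key is absent

        def delete(self, key):
            self._table.pop(key, None)

    ht = HashTable()
    ht.insert("alice", 42)
    print(ht.lookup("alice"))   # 42
    ht.delete("alice")
    print(ht.lookup("alice"))   # None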

Distributed Hashtables (DHT). [Diagram: a client issues the same three operations, lookup, insert, and delete, against an ordinary hashtable and against a distributed hashtable; the interface is unchanged.] Smruti R. Sarangi 4/34

Salient Points of DHTs. They can store more data than centralized databases; DHTs are the only feasible option for web-scale data (Facebook, LinkedIn, Google). Assume that a bank has 10 crore (0.1 billion) customers, and each customer requires storage roughly equal to the size of this LaTeX file (about 8 KB). Total storage requirement: 8 KB × 0.1 billion = 0.8 TB. Now assume each user of a file-sharing system shares 100 songs, i.e. 500 MB/user, and there are 10 crore (0.1 billion) users. Storage: 500 MB × 0.1 billion = 50 PB (petabytes). There is a difference of several orders of magnitude between the two cases! Smruti R. Sarangi 5/34

Advantages of DHTs. DHTs scale, and are ideal candidates for web-scale storage. They are more resilient to node failures because they use extensive data replication. DHTs also scale in terms of the number of users: different users are redirected to different nodes based on their keys (better load balancing). In the case of torrent-style applications, they reduce legal liability since there is no dedicated central server. Major proposals: Pastry, Chord, Tapestry, CAN, FAWN. Smruti R. Sarangi 6/34

Outline 1 2 Smruti R. Sarangi 7/34

Salient Points. Pastry is a scalable, distributed object location service. It uses a ring-based overlay network, and the overlay takes network locality into account. It automatically adapts to the arrival, departure, and failure of nodes. Pastry has been used as the substrate for a large-scale storage service (PAST) and a scalable publish/subscribe system (SCRIBE). PAST is a large-scale file storage service. SCRIBE stores a massive number of topics in the DHT; when a topic changes, the list of subscribers is notified. Smruti R. Sarangi 8/34

Design of Pastry. Each node has a unique 128-bit nodeId. The nodes are conceptually organized as a ring, arranged in ascending order of nodeIds. nodeIds are generated by computing a cryptographic hash of the node's IP address or public key. Basic idea of routing: given a key, compute its 128-bit hash, find the node whose nodeId is numerically closest to the hash value, and send the request to that node. Smruti R. Sarangi 9/34
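
A minimal sketch (not from the slides) of this mapping: hash the key to a 128-bit identifier and pick the numerically closest nodeId. The node list and the hash choice (MD5, which happens to produce exactly 128 bits) are illustrative assumptions, and wrap-around on the ring is ignored for simplicity.

    import hashlib

    def to_id(data: bytes) -> int:
        # 128-bit identifier from a cryptographic hash (MD5 gives exactly 128 bits)
        return int.from_bytes(hashlib.md5(data).digest(), "big")

    def closest_node(key: str, node_ids: list[int]) -> int:
        key_id = to_id(key.encode())
        # the responsible node is the one whose nodeId is numerically closest to the key's hash
        # (real Pastry also handles wrap-around on the ring; ignored in this sketch)
        return min(node_ids, key=lambda nid: abs(nid - key_id))

    nodes = [to_id(ip.encode()) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]]
    print(hex(closest_node("some-key", nodes)))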

Advantages of Pastry. Pastry can route a request to the right node in fewer than log_{2^b}(N) steps; b is typically 4. Eventual delivery is guaranteed unless |L|/2 nodes with adjacent nodeIds fail simultaneously (|L| = 16 or 32). Both nodeIds and keys are viewed as base-2^b numbers; if we assume b = 4, these are hexadecimal numbers. In each step, the request is forwarded to a node whose shared prefix with the key is at least one digit (b bits) longer than the key's shared prefix with the current node's nodeId. If no such node is found, the request is forwarded to a node that shares a prefix of the same length but is numerically closer to the key. Smruti R. Sarangi 10/34
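
As a small illustrative sketch (assuming b = 4 and 128-bit identifiers, so ids are read as 32 hex digits; the example ids are made up), the shared-prefix length that drives each forwarding decision can be computed like this:

    def digits(node_id: int, b: int = 4, id_bits: int = 128) -> str:
        # view a 128-bit identifier as a string of base-2^b digits (hex digits for b = 4)
        width = id_bits // b
        return format(node_id, f"0{width}x")

    def shared_prefix_len(a: int, b_id: int) -> int:
        # number of leading base-16 digits that two identifiers have in common
        da, db = digits(a), digits(b_id)
        n = 0
        while n < len(da) and da[n] == db[n]:
            n += 1
        return n

    a  = int("65a1fc" + "0" * 26, 16)   # illustrative ids written as 32 hex digits
    b_ = int("65b2d3" + "0" * 26, 16)
    print(shared_prefix_len(a, b_))     # 2: both ids start with the digits "65"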

Outline 1 2 Smruti R. Sarangi 11/34

Structure of a Node. A node maintains a routing table, a neighborhood set, and a leaf set. [Diagram of the routing table: K = O(log N) rows, 2^b - 1 columns per row, and each cell holds the IP address of a node.] Smruti R. Sarangi 12/34

Routing Table (R). The routing table has up to 32 rows for 128-bit ids and b = 4 (numbered 0...31), of which roughly log_{2^b}(N) are populated. The entries in row i (counting from 1) point to nodes that share the first (i - 1) digits of their prefix with the present node's nodeId. Each row contains 2^b - 1 columns, and each cell corresponds to one base-2^b digit. If the digit associated with a cell matches the i-th digit of the key, then that cell holds a node that matches the key with a longer prefix, and we should route the request to that node. Smruti R. Sarangi 13/34
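
A hypothetical sketch (the representation and names are mine, not the paper's) of how a known node is filed into the routing table: its row is the length of the prefix it shares with the local nodeId (0-indexed here, whereas the slide counts rows from 1), and its column is its next digit.

    ID_DIGITS = 32           # 128-bit ids, b = 4  =>  32 hex digits
    NUM_COLS = 16            # one column per possible next digit

    def hex_digits(node_id: int) -> str:
        return format(node_id, f"0{ID_DIGITS}x")

    def table_slot(my_id: int, other_id: int):
        """Return (row, column) where other_id belongs in my routing table."""
        mine, other = hex_digits(my_id), hex_digits(other_id)
        row = 0
        while row < ID_DIGITS and mine[row] == other[row]:
            row += 1
        if row == ID_DIGITS:
            return None                 # identical id: not stored in the routing table
        col = int(other[row], 16)       # the first digit where the two ids differ
        return row, col

    me   = int("65a1" + "0" * 28, 16)
    peer = int("65b3" + "0" * 28, 16)
    print(table_slot(me, peer))         # (2, 11): shares 2 digits, next digit is 'b' = 11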

Some Maths. Assume 2^b = 16. The probability that two hashes have the same first m digits is 16^{-m}. The probability that a key does not match the first m digits of a given node's id: 1 - 16^{-m}. The probability that the key has no prefix match of length m with any of the n nodes: (1 - 16^{-m})^n. Let m = c + log_16(n). We have: prob = (1 - 16^{-m})^n = (1 - 16^{-c - log_16(n)})^n = (1 - 16^{-c}/n)^n. Writing λ = 16^{-c}, this is ((1 - λ/n)^{n/λ})^λ → e^{-λ} as n → ∞. Smruti R. Sarangi 14/34

Some Maths-II. As c becomes larger (say 5), λ = 16^{-c} becomes very small; for example, 16^{-5} ≈ 9.5 × 10^{-7}. Essentially, λ → 0, so the probability e^{-λ} → 1. [Plot: the probability e^{-λ} as a function of c; it rises from about 0.37 at c = 0 and approaches 1 for c ≥ 2.] Smruti R. Sarangi 15/34

Neighborhood Set and Leaf Set. Neighborhood Set (M): contains the |M| nodes that are closest to the node according to a proximity metric; typically |M| = 2^{b+1} entries. Leaf Set (L): contains the |L|/2 nodes with the numerically closest larger nodeIds and the |L|/2 nodes with the numerically closest smaller nodeIds; typically |L| = 2^b. Smruti R. Sarangi 16/34
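
Putting the three structures together, a hypothetical, simplified picture of the per-node state (names and representation are mine), with the typical sizes quoted above for b = 4:

    from dataclasses import dataclass, field

    B = 4
    ROWS = 128 // B                  # up to 32 rows in the routing table
    COLS = 2 ** B                    # 16 columns here; the slide counts 2^b - 1 because
                                     # a node's own digit needs no entry
    LEAF_SIZE = 2 ** B               # |L|, split between smaller and larger nodeIds
    NEIGHBOR_SIZE = 2 ** (B + 1)     # |M|, closest nodes by network proximity

    @dataclass
    class PastryNode:
        node_id: int
        # routing table R: ROWS x COLS grid of nodeIds (None where no entry is known)
        routing_table: list = field(
            default_factory=lambda: [[None] * COLS for _ in range(ROWS)])
        leaf_set: list = field(default_factory=list)          # numerically closest nodeIds
        neighborhood_set: list = field(default_factory=list)  # closest by proximity metric

    n = PastryNode(node_id=0x65A1 << 112)
    print(len(n.routing_table), len(n.routing_table[0]))      # 32 16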

Routing Algorithm. Algorithm 1 (routing): Input: key K, routing table R, hash of the key D. Output: value V. If L_{-|L|/2} ≤ D ≤ L_{|L|/2} (i.e., the key is within the range of the leaf set), forward the message to the leaf L_i such that |L_i - D| is minimal. Otherwise, let l = common_prefix(D, nodeId); if R(l + 1, D_{l+1}) ≠ null, forward to R(l + 1, D_{l+1}); else forward to some T ∈ L ∪ R ∪ M such that common_prefix(T, D) ≥ l and |T - D| < |nodeId - D|. Smruti R. Sarangi 17/34

Explanation. The node first checks whether the key is within the range of its leaf set. If so, it forwards the message to the closest node (by nodeId) in the leaf set. Otherwise, it forwards the message to a node with one more matching digit in the common prefix. In the rare case where neither of the first two criteria can be satisfied, it forwards the request to any known node that is numerically closer to the key than the current nodeId; note that this node must still share a prefix of at least l digits with the key. Smruti R. Sarangi 18/34
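
A runnable sketch of this decision (my own simplification, not the paper's code): identifiers are equal-length hex strings, the routing table is a dict keyed by (row, digit) with 0-indexed rows (the slide counts from 1), the leaf-set range check ignores ring wrap-around, and the rare fallback case scans a list of all known nodes.

    def prefix_len(a: str, b: str) -> int:
        n = 0
        while n < len(a) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(D: str, node_id: str, leaf_set: list, routing_table: dict, known: list):
        """Return the nodeId the message should be forwarded to (or node_id itself)."""
        # 1. If D falls within the leaf set's range, deliver to the numerically closest leaf.
        ids = leaf_set + [node_id]
        if min(ids) <= D <= max(ids):
            return min(ids, key=lambda t: abs(int(t, 16) - int(D, 16)))
        # 2. Otherwise try the routing-table entry with one more matching digit.
        l = prefix_len(D, node_id)
        entry = routing_table.get((l, D[l]))
        if entry is not None:
            return entry
        # 3. Rare case: any known node with prefix >= l that is numerically closer to D.
        for t in known:
            if prefix_len(t, D) >= l and abs(int(t, 16) - int(D, 16)) < abs(int(node_id, 16) - int(D, 16)):
                return t
        return node_id   # this node is the closest one we know of

    # toy example with 4-digit ids: one more digit matches at routing-table cell (1, '5')
    print(next_hop("65a1", "6000", [], {(1, "5"): "6500"}, ["6500"]))   # 6500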

Performance and Reliability. If at least |L|/2 nodes in the leaf set are alive, the message can always be passed on to some other node. At every step we are (with high probability) either searching in the leaf set or moving one digit closer: if the key is within the range of the leaf set, it is at most one hop away; otherwise, each step extends the matched prefix by one digit (b bits), shrinking the set of candidate nodes by a factor of 2^b. Routing time complexity: the average routing time is thus O(log_{2^b}(N)). Smruti R. Sarangi 19/34
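
As a quick sanity check (my own numbers, not from the slides), with b = 4 and a network of one million nodes the bound works out to about five hops:

    \log_{2^b} N = \log_{16} 10^{6} = \frac{6}{\log_{10} 16} \approx \frac{6}{1.204} \approx 5 \text{ hops}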

Outline 1 2 Smruti R. Sarangi 20/34

Node Arrival. Assume that node X wants to join the network. It locates a nearby node A (A can also be found with an expanding-ring multicast). A routes the join request, using X's nodeId as the key, towards Z, the node whose nodeId is numerically closest to X's. Nodes A, Z, and all the nodes on the path from A to Z send their routing tables to X. X uses all of this information to initialize its own tables. Smruti R. Sarangi 21/34

Table Initialization. Neighborhood set: since A is physically close to X, X copies A's neighborhood set. Leaf set: Z is the closest to X by nodeId, so X uses Z's leaf set to form its own leaf set. Smruti R. Sarangi 22/34

Table Initialization - II. Routing table: assume that A and X do not share any digits of their prefix (the general case). The first row of the routing table is then independent of the nodeId (no match required), so X can use A's first row to initialize its own first row. Every node on the path from A to Z has one additional digit matching with X. Let B_i be the i-th node on the path from A to Z; observation: B_i shares i digits of its prefix with X, so X uses the (i + 1)-th row of B_i's routing table to populate the (i + 1)-th row of its own routing table. Finally, X transmits its state to all the nodes in L ∪ R ∪ M. Smruti R. Sarangi 23/34
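
A small sketch of this initialization under the earlier simplified representation (routing tables as dicts keyed by (row, digit), rows 0-indexed; the function name, the path labeling with A as B_0, and the toy ids are my assumptions):

    def init_routing_table(x_id: str, path_tables: list) -> dict:
        """path_tables[i] is the routing table (a dict) of B_i, the i-th node on the
        path from A (i = 0) towards Z; row i of B_i's table becomes row i of X's table,
        because B_i shares i digits with X."""
        table = {}
        for i, b_table in enumerate(path_tables):
            for (row, digit), node in b_table.items():
                if row == i and node is not None:
                    table[(row, digit)] = node
        return table

    # toy example: A shares 0 digits with X, the next node B_1 shares 1 digit
    a_table  = {(0, "3"): "3f00", (0, "9"): "9c00"}
    b1_table = {(1, "2"): "6200", (1, "7"): "6700"}
    x_table = init_routing_table("65a1", [a_table, b1_table])
    print(x_table)   # rows 0 and 1 of X's table filled from A and B_1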

Node Departure - Leaf Set. Nodes might simply fail, or leave without notifying anyone. Repairing the leaf set: assume that a leaf L_k fails, with -|L|/2 < k < 0 (i.e., on the smaller-nodeId side). In this case the node contacts L_{-|L|/2}, the extreme live leaf on that side, obtains its leaf set, and merges it with its own leaf set. For any new nodes added, it verifies their existence by pinging them. Smruti R. Sarangi 24/34
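
A hedged sketch of that repair step (the smaller-nodeId half of the leaf set is a list of hex-string ids, the remote fetch is passed in as a plain argument, and is_alive stands in for a ping; all names are mine):

    def repair_leaf_set(my_smaller_leaves: list, failed: str,
                        extreme_leaf_set: list, is_alive) -> list:
        """Replace a failed leaf on the smaller-nodeId side.
        extreme_leaf_set is the leaf set obtained from L_{-|L|/2}."""
        merged = [n for n in my_smaller_leaves if n != failed]
        for candidate in extreme_leaf_set:
            if candidate not in merged and is_alive(candidate):
                merged.append(candidate)
        # keep the |L|/2 nodes numerically closest from below (largest ids first)
        return sorted(merged, reverse=True)[:len(my_smaller_leaves)]

    leaves = ["64ff", "64b0", "6433", "63a0"]            # all smaller than our id 65a1
    from_extreme = ["6433", "63a0", "6390", "6377"]      # leaf set of the extreme leaf
    print(repair_leaf_set(leaves, "64b0", from_extreme, lambda n: True))
    # ['64ff', '6433', '63a0', '6390']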

Node Departure - Routing Table and Neighborhood Set. Repairing the routing table: assume that the entry R(l, d) fails. Try to obtain a replacement by contacting another node R(l, d'), d' ≠ d, in the same row and asking it for its entry at (l, d). If no candidate can be found this way, cast a wider net by asking the nodes R(l + 1, d'), d' ≠ d, in the next row. Repairing the neighborhood set (M): a node periodically pings its neighbors; if a neighbor is found to be dead, it gets M from its other neighbors and repairs its state. Smruti R. Sarangi 25/34
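
A sketch of the routing-table repair, again using the dict-of-(row, digit) representation; ask_for_entry stands in for the remote query and, like the other names here, is an assumption of mine:

    def repair_entry(table: dict, row: int, digit: str, ask_for_entry) -> None:
        """Replace the failed entry table[(row, digit)].
        ask_for_entry(peer, row, digit) queries a remote node and may return None."""
        hex_digits = "0123456789abcdef"
        for r in (row, row + 1):                  # same row first, then cast a wider net
            for d in hex_digits:
                if r == row and d == digit:
                    continue                      # skip the failed entry itself
                peer = table.get((r, d))
                if peer is None:
                    continue
                replacement = ask_for_entry(peer, row, digit)
                if replacement is not None:
                    table[(row, digit)] = replacement
                    return
        table.pop((row, digit), None)             # no replacement found

    # toy usage: the peer stored at (2, '7') knows a replacement for the failed (2, '3')
    tbl = {(2, "3"): "dead", (2, "7"): "6570"}
    repair_entry(tbl, 2, "3", lambda peer, r, d: "6530" if peer == "6570" else None)
    print(tbl[(2, "3")])    # 6530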

Maintaining Locality. Assume that before node X is added there is good locality: all the entries in every routing table point to nearby nodes. We now add a new node X. We start with a nearby node A and move towards Z (the closest node by nodeId). Let B_i be the i-th node on the path (induction assumption: B_i has locality). B_i is fairly close to X because it is fairly close to A. Since we obtain the i-th row of the routing table from B_i, and B_i has locality, the i-th row of X's routing table also points to nearby nodes. Thus X has locality, and the induction step is proved. Smruti R. Sarangi 26/34

Tolerating Byzantine Failures Have multiple entries in each cell of the routing table. Randomize the routing strategy. Periodically send IP broadcasts and multicasts (expanding ring) to connect disconnected networks. Smruti R. Sarangi 27/34

Outline 1 2 Smruti R. Sarangi 28/34

Setup. 100,000 nodes; each node runs on a Java-based VM; each node is assigned a coordinate in a Euclidean plane. Smruti R. Sarangi 29/34

[Slides 30-34 (figures I-V): evaluation results, reproduced from Source [1].] Smruti R. Sarangi 30-34/34

[1] Antony Rowstron and Peter Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," Middleware 2001. Smruti R. Sarangi 34/34