P2P Network. Structured Networks: Distributed Hash Tables. Pedro García López, Universitat Rovira i Virgili, Pedro.garcia@urv.net

Index: Introduction to DHTs; Origins of structured overlays; Case studies: Chord, Pastry, CAN; Conclusions.

Introduction to DHTs: locating content. Simple strategy: expanding ring search until the content is found ("Who has this paper?" / "I have it"). If r of the N nodes hold a copy, the expected search cost is at least N / r, i.e., O(N). Many copies are needed to keep the overhead small.

Directed Searches. Idea: assign particular nodes to hold particular content (or to know where it is); when a node wants this content, it goes to the node that is supposed to hold it (or to know where it is). Challenges: avoid bottlenecks by distributing the responsibilities evenly among the existing nodes, and adapt to nodes joining or leaving (or failing), giving responsibilities to joining nodes and redistributing the responsibilities of leaving nodes.

Idea: Hash Tables. A hash table associates data with keys: a key is hashed to find its bucket in the hash table, and each bucket is expected to hold #items/#buckets items. In a Distributed Hash Table (DHT), the nodes are the hash buckets: a key is hashed to find the responsible peer node, so data and load are balanced across the nodes. (Figure: lookup(key) and insert(key, data) on a hash table with buckets 0 .. N-1 and on a DHT with N nodes; the hash function h(key) % N maps a key such as "Beatles" to a bucket or node position.)
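
The analogy can be made concrete with a minimal sketch (the names h, insert, lookup and the bucket layout below are illustrative assumptions, not code from the slides): N buckets, and h(key) % N picks the bucket; in a DHT, the buckets would be peer nodes.

    import hashlib

    N = 4                                    # number of buckets (nodes, in the DHT reading)
    buckets = [dict() for _ in range(N)]     # each bucket is a small key -> data store

    def h(key: str) -> int:
        """Hash a key to a large integer (SHA-1, as several DHTs do)."""
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def insert(key: str, data):
        buckets[h(key) % N][key] = data      # hash the key to find its bucket

    def lookup(key: str):
        return buckets[h(key) % N].get(key)  # the same hash finds the same bucket

    insert("Beatles", "Abbey Road")
    print(lookup("Beatles"))                 # -> 'Abbey Road'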

DHTs: Problems. Problem 1 (dynamicity): adding or removing nodes. With hash mod N, virtually every key changes its location: h(k) mod m, h(k) mod (m+1) and h(k) mod (m-1) generally all differ. Solution: use consistent hashing. Define a fixed hash space; all hash values fall within that space and do not depend on the number of peers (hash buckets). Each key goes to the peer closest to its ID in the hash space (according to some proximity metric).
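
A minimal consistent-hashing sketch (the Ring class, the node names, and the SHA-1-sized space below are assumptions made for illustration): keys and node IDs live in one fixed hash space, and a key is assigned to the nearest node ID at or after it on the ring, so adding or removing a node only affects nearby keys.

    import hashlib
    from bisect import bisect_left

    SPACE = 2 ** 160                                   # fixed hash space (SHA-1 sized)

    def h(value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % SPACE

    class Ring:
        def __init__(self, nodes):
            self.ids = sorted(h(n) for n in nodes)     # node positions on the ring
            self.node_at = {h(n): n for n in nodes}

        def node_for(self, key: str) -> str:
            """First node at or after the key's position, wrapping around the ring."""
            i = bisect_left(self.ids, h(key)) % len(self.ids)
            return self.node_at[self.ids[i]]

    ring = Ring(["node-A", "node-B", "node-C"])
    print(ring.node_for("Beatles"))                    # unchanged if an unrelated node joins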

DHTs: Problems (cont'd). Problem 2 (size): if every node must be known in order to insert or look up data, the scheme only works for small and static server populations. Solution: each peer knows only a few neighbors, and messages are routed through neighbors via multiple hops (overlay routing).

What Makes a Good DHT Design. For each object, the node(s) responsible for that object should be reachable via a short path (small diameter); the different DHTs differ fundamentally only in their routing approach. The number of neighbors of each node should remain reasonable (small degree). DHT routing mechanisms should be decentralized (no single point of failure or bottleneck). The design should gracefully handle nodes joining and leaving: repartition the affected keys over the existing nodes, reorganize the neighbor sets, and provide bootstrap mechanisms to connect new nodes into the DHT. To achieve good performance, the DHT must provide low stretch, i.e., minimize the ratio of DHT routing latency to unicast latency.

DHT Interface. Minimal interface (data-centric): Lookup(key) → IP address. It supports a wide range of applications because it imposes few restrictions: keys have no semantic meaning, and the value is application dependent. DHTs do not store the data themselves; data storage can be built on top of a DHT, offering Lookup(key) → data and Insert(key, data).
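
A hedged sketch of these two layers (the class and method names, and the transport helper, are assumptions for illustration): the bare DHT only maps a key to the responsible node, while a storage layer built on top of it uses that mapping to actually put and get data.

    from abc import ABC, abstractmethod

    class DHT(ABC):
        @abstractmethod
        def lookup(self, key: bytes) -> str:
            """Return the address (e.g. IP) of the node responsible for key."""

    class DHTStorage:
        """Data storage built on top of a DHT: route first, then store or fetch."""
        def __init__(self, dht: DHT, transport):
            self.dht = dht
            self.transport = transport           # assumed RPC-style helper, not a real API

        def insert(self, key: bytes, data: bytes):
            node = self.dht.lookup(key)
            self.transport.send(node, "store", key, data)

        def get(self, key: bytes) -> bytes:
            node = self.dht.lookup(key)
            return self.transport.send(node, "fetch", key)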

DHTs in Context. (Layer diagram: the user application calls a file system such as CFS, which retrieves and stores files (store_file, load_file) and maps files to blocks; CFS uses DHash for reliable block storage (store_block, load_block), handling replication and caching; DHash uses the Chord DHT for lookup and routing; Chord runs over TCP/IP, the transport layer, for communication (send, receive).)

DHTs Support Many Applications. File sharing [CFS, OceanStore, PAST, ...]; web caches [Squirrel, ...]; censor-resistant stores [Eternity, FreeNet, ...]; application-layer multicast [Narada, ...]; event notification [Scribe]; naming systems [ChordDNS, INS, ...]; query and indexing [Kademlia, ...]; communication primitives [I3, ...]; backup stores [HiveNet]; web archives [Herodotus].

Origins of Structured Overlays. "Accessing Nearby Copies of Replicated Objects in a Distributed Environment", by Greg Plaxton, Rajmohan Rajaraman, and Andrea Richa, SPAA 1997. The paper proposes an efficient search routine (similar to the evangelist papers): search, insert, delete, and storage costs are all logarithmic, with the base of the logarithm as a parameter. Prefix routing, distance, and coordinates! A theory paper.

Evolution

Hypercubic topologies. Hypercube: Plaxton, Chord, Kademlia, Pastry, Tapestry. Butterfly/Benes: Viceroy, Mariposa. De Bruijn graph: Koorde. Skip list: SkipGraph, SkipNet. Pancake graph. Cube-connected cycles.

DHT Case Studies. Case studies: Chord, Pastry, CAN. Questions: How is the hash space divided evenly among nodes? How do we locate a node? How do we maintain routing tables? How do we cope with (rapid) changes in membership?

Chord (MIT). Circular m-bit ID space for both keys and nodes: Node ID = SHA-1(IP address), Key ID = SHA-1(key). A key is mapped to the first node whose ID is equal to or follows the key ID. Each node is responsible for O(K/N) keys, and O(K/N) keys move when a node joins or leaves. (Figure: example ring with m = 6, IDs 0 to 2^m - 1, and nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; key K10 is stored at N14, K24 and K30 at N32, K38 at N38, K54 at N56.)

Chord State and Lookup (1). Basic Chord: each node knows only 2 other nodes on the ring, its successor and its predecessor (the latter for ring management). Lookup is achieved by forwarding requests around the ring through successor pointers, which requires O(N) hops. (Figure: lookup(K54) issued at N8 on the m = 6 ring walks the successor pointers N8 → N14 → N21 → ... until it reaches N56, the successor of K54.)

Chord State and Lookup (2). Each node knows m other nodes on the ring: finger i of node n points to the node at n + 2^i (or its successor), plus the predecessor (for ring management). This is O(log N) state per node. Lookup is achieved by following the closest preceding fingers, then the successor: O(log N) hops. (Figure: finger table of N8 on the m = 6 ring: N8+1 → N14, N8+2 → N14, N8+4 → N14, N8+8 → N21, N8+16 → N32, N8+32 → N42; lookup(K54) issued at N8 follows the +32 finger to N42, then closest preceding fingers to N51, and finally the successor N56.)
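
The lookup rule can be sketched in a few lines of Python. This is a single-process toy (the class and helper names are assumptions; real Chord nodes are separate processes and every step is a remote call), but it implements the same logic: jump through the closest preceding finger until the key falls between a node and its successor.

    M = 6                              # bits in the ID space, matching the slides' example
    SPACE = 2 ** M

    def in_interval(x, a, b):
        """True if x lies in the circular interval (a, b] of the ID ring."""
        if a < b:
            return a < x <= b
        return x > a or x <= b         # the interval wraps around 0

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self      # fixed up by build_ring / join
            self.predecessor = None
            self.fingers = [self] * M  # fingers[i] ~ successor(id + 2**i)

        def closest_preceding_finger(self, key):
            for f in reversed(self.fingers):
                if in_interval(f.id, self.id, (key - 1) % SPACE):  # f between id and key
                    return f
            return self

        def find_successor(self, key):
            n = self
            while not in_interval(key, n.id, n.successor.id):      # until succ(key) is found
                n = n.closest_preceding_finger(key)                # O(log N) finger hops
            return n.successor

    def build_ring(ids):
        """Wire successors, predecessors, and fingers for a static set of IDs (test helper)."""
        nodes = [Node(i) for i in sorted(set(ids))]
        def succ_of(x):
            return next((n for n in nodes if n.id >= x % SPACE), nodes[0])
        for k, n in enumerate(nodes):
            n.successor = succ_of((n.id + 1) % SPACE)
            n.predecessor = nodes[k - 1]
            n.fingers = [succ_of((n.id + 2 ** i) % SPACE) for i in range(M)]
        return nodes

    ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    print(ring[1].find_successor(54).id)   # node N8 resolves K54 -> 56, as in the figure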

Chord Ring Management. For correctness, Chord needs to maintain the following invariants: for every key k, succ(k) is responsible for k, and successor pointers are correctly maintained. Finger tables are not necessary for correctness: one can always fall back to successor-based lookup, so finger tables can be updated lazily.

Joining the Ring. A three-step process: initialize all fingers of the new node, update the fingers of existing nodes, and transfer keys from the successor to the new node.

Joining the Ring, Step 1: initialize the new node's finger table. Locate any node n already in the ring and ask n to look up the peers responsible for j+2^0, j+2^1, j+2^2, ...; use the results to populate the finger table of j.
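
Reusing the toy Node sketch from the lookup slide, step 1 might look as follows (init_finger_table is an assumed helper name, and in the real protocol each find_successor call is a remote lookup performed by n):

    def init_finger_table(j, n):
        """Populate joining node j's fingers by asking existing ring member n."""
        j.fingers = [n.find_successor((j.id + 2 ** i) % SPACE) for i in range(M)]
        j.successor = j.fingers[0]       # finger 0 is j's successor on the ring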

Joining the Ring, Step 2: updating the fingers of existing nodes. The new node j calls an update function on the existing nodes that must now point to j, i.e., the nodes in the ranges [j - 2^i, pred(j) - 2^i + 1]; only O(log N) nodes need to be updated. (Figure: after N28 joins the m = 6 ring, the N8+16 entry of N8's finger table, which previously pointed to N32, now points to N28.)

Joining the Ring, Step 3: transfer key responsibility. Connect to the successor, copy the keys from the successor to the new node, then update the successor pointer and remove the copied keys. Only the keys in the new node's range are transferred. (Figure: when N28 joins between N21 and N32, key K24 moves from N32 to N28, while K30 stays at N32.)
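
A hedged sketch of the hand-off, again in the style of the toy Node class (the per-node store dict and the integer key IDs are assumptions added for illustration): only key IDs in (pred(j), j] move from the successor to j.

    def transfer_keys(j, pred_id):
        """Move the keys now owned by j, i.e. key IDs in (pred_id, j.id], off j's successor."""
        succ = j.successor
        moved = {k: v for k, v in succ.store.items() if in_interval(k, pred_id, j.id)}
        j.store.update(moved)
        for k in moved:
            del succ.store[k]            # the successor keeps only keys outside j's range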

Stabilization. Case 1: finger tables are reasonably fresh. Case 2: successor pointers are correct, but fingers are not. Case 3: successor pointers are inaccurate or key migration is incomplete; this MUST BE AVOIDED! A stabilization algorithm periodically verifies and refreshes node pointers (including fingers). Basic principle (at node n): x = n.succ.pred; if x ∈ (n, n.succ) then n.succ = x; then notify n.succ. This eventually stabilizes the system when no node joins or fails.
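
In the style of the toy Node sketch, the stabilize/notify pair might look like this (the open-interval helper and the function names are assumptions; in real Chord, succ.pred and notify are remote calls):

    def between(x, a, b):
        """True if x lies in the open circular interval (a, b)."""
        if a < b:
            return a < x < b
        return x > a or x < b          # interval wraps around 0; (a, a) means "anyone but a"

    def stabilize(n):
        """Run periodically at node n: adopt a closer successor if one has appeared."""
        x = n.successor.predecessor
        if x is not None and between(x.id, n.id, n.successor.id):
            n.successor = x            # x sits between n and its old successor
        notify(n.successor, n)

    def notify(succ, n):
        """n tells succ: 'I might be your predecessor.'"""
        if succ.predecessor is None or between(n.id, succ.predecessor.id, succ.id):
            succ.predecessor = n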

Dealing With Failures. Node failures can cause incorrect lookups: if N8 does not know its correct successor, the lookup of K19 fails. Solution: a successor list. Each node n knows its r immediate successors; after a failure, n knows the first live successor and updates its successor list. Correct successors guarantee correct lookups. (Figure: lookup(K19) on the m = 6 ring; K19 belongs to N21, the first node following ID 19.)
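
A small sketch of the successor-list fallback (the value of r and the liveness check are assumptions):

    R = 3   # r, the number of immediate successors each node remembers (assumed value)

    def first_live_successor(successor_list, is_alive):
        """Return the first reachable node among the r known successors."""
        for s in successor_list[:R]:
            if is_alive(s):
                return s
        raise RuntimeError("all r known successors failed; rejoin via a bootstrap node")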

Chord and Network Topology. Nodes that are numerically close are not topologically close (1M nodes = 10+ hops).

Pastry (MSR). Circular m-bit ID space for both keys and nodes; addresses are written in base 2^b with m/b digits. Node ID = SHA-1(IP address), Key ID = SHA-1(key). A key is mapped to the node whose ID is numerically closest to the key ID. (Figure: example ring with m = 8 and b = 2, i.e., 4-digit base-4 IDs: nodes N0002, N0201, N0322, N1113, N2001, N2120, N2222, N3001, N3033, N3200 and keys such as K0220, K1201, K1320, K2120, K3122.)

Pastry Lookup. Prefix routing from A to B: at the h-th hop, the message arrives at a node that shares a prefix of length at least h digits with B. Example: 5324 routes to 0629 via 5324 → 0748 → 0605 → 0620 → 0629. If there is no such node, forward the message to a neighbor numerically closer to the destination (the successor), e.g., 5324 → 0748 → 0605 → 0609 → 0620 → 0629. O(log_{2^b} N) hops.
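
One routing decision can be sketched as follows, assuming the 4-digit base-4 IDs of the m = 8, b = 2 ring example; the routing_table (a dict keyed by (row, next digit)) and leaf_set structures are assumptions for illustration, not Pastry's actual data layout.

    BASE = 4          # digits are base 2**b; b = 2 here, matching the N0201-style IDs above

    def numeric(x: str) -> int:
        return int(x, BASE)

    def shared_prefix_len(a: str, b: str) -> int:
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(local_id: str, key: str, routing_table, leaf_set):
        """One Pastry routing step: longer shared prefix if possible, else numerically closer."""
        p = shared_prefix_len(local_id, key)
        if p == len(key):
            return None                                    # we are the destination
        candidate = routing_table.get((p, key[p]))         # row p, column = key's next digit
        if candidate is not None:
            return candidate
        # Fallback: any known node no worse on prefix length and numerically closer to the key.
        known = set(leaf_set) | set(routing_table.values())
        closer = [n for n in known
                  if shared_prefix_len(n, key) >= p
                  and abs(numeric(n) - numeric(key)) < abs(numeric(local_id) - numeric(key))]
        return min(closer, key=lambda n: abs(numeric(n) - numeric(key)), default=None)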

Pastry State and Lookup. For each prefix, a node knows some other node (if any) with the same prefix and a different next digit. For instance, for N0201: empty prefix: N1???, N2???, N3???; prefix N0: N00??, N01??, N03??; prefix N02: N021?, N022?, N023?; prefix N020: N0200, N0202, N0203. When multiple such nodes exist, choose the topologically closest one; this maintains good locality properties (more on that later). (Figure: lookup(K2120) on the m = 8, b = 2 ring is routed from N0201 through nodes sharing progressively longer prefixes with the key until it reaches N2120.)

A Pastry Routing Table. Example for node 10233102 with m = 16 and b = 2, so node IDs are base 4 with m/b = 8 digits. Leaf set: the nodes numerically closest to the local node (smaller and larger); it MUST BE UP TO DATE. In the example, the smaller side holds 10233033, 10233021, 10233001, 10233000 and the larger side holds 10233120, 10233122, 10233230, 10233232. Routing table: m/b rows with 2^b - 1 entries per row; the entries in row n share the first n digits with the current node ([common prefix | next digit | rest]), the entry in a given column has that column's digit as its next digit, and entries with no suitable node ID are left empty. Neighborhood set: the nodes closest to the local node according to the proximity metric; in the example, 13021022, 10200230, 11301233, 31301233, 02212102, 22301203, 31203203, 33213321.
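
A small sketch of where a known node belongs in such a table (it reuses shared_prefix_len from the lookup sketch above; the dict-of-(row, digit) layout is an assumption, and the sample IDs are taken from the example figure):

    def table_slot(local_id: str, other_id: str):
        """Routing-table slot for other_id: row = shared prefix length, column = next digit."""
        p = shared_prefix_len(local_id, other_id)
        return None if p == len(local_id) else (p, other_id[p])

    routing_table = {}
    for n in ["02212102", "22301203", "31203203", "10031203", "10132102", "10233001"]:
        routing_table.setdefault(table_slot("10233102", n), n)   # keep the first (ideally closest)
    print(routing_table)   # e.g. {(0, '0'): '02212102', (0, '2'): '22301203', ...}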

Pastry and Network Topology. The expected node distance increases with the row number in the routing table: smaller and smaller numerical jumps, bigger and bigger topological jumps.

Joining. X (ID 0629) joins; X knows A (ID 5324), and A is topologically close to X. A routes a join message to the node numerically closest to X's ID. Along the route, X collects its state: row 0 of its routing table from A, row 1 from B (0748), row 2 from C (0605), row 3 from D (0620), plus A's neighborhood set and D's leaf set.

Locality. The joining phase preserves the locality property. First, A must be near X: the entries in row zero of A's routing table are close to A, and A is close to X, so X's row 0 can be A's row 0. The distance from B to the nodes in B's row 1 is much larger than the distance from A to B (B is in A's row 0), so B's row 1 is a reasonable choice for X's row 1, C's row 2 for X's row 2, etc. To avoid cascading errors, X requests the state from each node in its routing table and updates its own entries with any closer node. This scheme works quite well in practice: it minimizes the distance of the next routing step with no sense of global direction, and the stretch is around 2-3.

Node Departure. A node is considered failed when its immediate neighbors in the node ID space can no longer communicate with it. To replace a failed node in the leaf set, a node contacts the live node with the largest index on the side of the failed node and asks for its leaf set. To repair a failed routing table entry R^l_d (row l, column d), a node first contacts the node referred to by another entry R^l_i, i ≠ d, of the same row, and asks for that node's entry for R^l_d. If a member of the neighborhood set M is not responding, the node asks the other members for their M sets, checks the distance of each newly discovered node, and updates its own M set.

CAN (Berkeley). Cartesian d-dimensional space; the space wraps around, forming a d-torus. The space is split incrementally between the nodes that join. The node (cell) responsible for a key k is determined by hashing k once per dimension, e.g., (h_x(k), h_y(k)) for d = 2. (Figure: a 2-dimensional space partitioned into cells; insert(k, data) and retrieve(k) hash k to the same point and therefore reach the same cell.)
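
A hedged sketch of this key-to-point mapping for d = 2 (the per-dimension hash, the unit-square coordinates, and the zone dictionary are assumptions for illustration):

    import hashlib

    def h_dim(key: str, dim: int) -> float:
        """Hash key into [0, 1) for one dimension, salting the hash with the dimension index."""
        digest = hashlib.sha1(f"{dim}:{key}".encode()).hexdigest()
        return int(digest, 16) / 16 ** len(digest)

    def point_for(key: str, d: int = 2):
        return tuple(h_dim(key, i) for i in range(d))

    def owner(point, zones):
        """zones: {node: (lower_corner, upper_corner)} rectangles covering the space."""
        for node, (lo, hi) in zones.items():
            if all(lo[i] <= point[i] < hi[i] for i in range(len(point))):
                return node
        return None

    # Both insert(k, data) and retrieve(k) would call owner(point_for(k), zones).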

CAN State and Lookup. A node A only maintains state for its immediate neighbors (N, S, E, W for d = 2): 2d neighbors per node. Messages are routed to the neighbor that minimizes the Cartesian distance to the destination. More dimensions mean faster routing but also more state: (d/4) * N^(1/d) hops on average. Having multiple choices of next hop also means we can route around failures.
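
Greedy forwarding can be sketched as below (a simplification that compares zone centers rather than exact zone boundaries; the center_of and neighbors_of callbacks are assumptions):

    def torus_dist(p, q):
        """Euclidean distance on the unit d-torus (each coordinate wraps at 1.0)."""
        return sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5

    def route(start, target, center_of, neighbors_of, max_hops=64):
        """Greedy CAN forwarding: hand the message to the neighbor closest to the target point."""
        node = start
        for _ in range(max_hops):
            best = min(neighbors_of(node), key=lambda n: torus_dist(center_of(n), target),
                       default=node)
            if torus_dist(center_of(best), target) >= torus_dist(center_of(node), target):
                return node          # no neighbor is closer: assume this zone holds the target
            node = best
        return node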

CAN Landmark Routing. CAN nodes do not have a pre-defined ID, so nodes can be placed according to locality. Use a well-known set of m landmark machines (e.g., the root DNS servers). Each CAN node measures its RTT to each landmark and orders the landmarks by increasing RTT: m! possible orderings. CAN construction: place nodes with the same ordering close together in the CAN. To do so, partition the space into m! zones (m zones on x, m-1 on y, etc.); a node interprets its ordering as the coordinate of its zone.
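
A small sketch of the landmark binning (the landmark hostnames, the RTT values, and the zone numbering are all assumptions): two nodes that measure similar RTTs get the same ordering, hence the same zone, and thus end up close in the CAN.

    from itertools import permutations

    LANDMARKS = ["l1.example.net", "l2.example.net", "l3.example.net"]   # m = 3 (assumed)

    def landmark_ordering(rtts):
        """rtts: {landmark: measured RTT in ms}; landmarks sorted by increasing RTT."""
        return tuple(sorted(rtts, key=rtts.get))

    def zone_index(ordering):
        """Map one of the m! orderings to a zone number 0 .. m!-1."""
        return sorted(permutations(LANDMARKS)).index(tuple(ordering))

    rtts = {"l1.example.net": 12, "l2.example.net": 45, "l3.example.net": 30}
    print(zone_index(landmark_ordering(rtts)))   # nearby nodes tend to print the same index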

CAN and Network Topology. Use the m landmarks to split the space into m! zones; each node picks a random point within its zone. Topologically close nodes therefore tend to be in the same zone. (Figure: for landmarks A, B, C, the six zones correspond to the orderings A;B;C, A;C;B, B;A;C, B;C;A, C;A;B, C;B;A.)

Conclusion. The DHT is a simple yet powerful abstraction and a building block of many distributed services (file systems, application-layer multicast, distributed caches, etc.). There are many DHT designs, with various pros and cons: a balance between state (degree), speed of lookup (diameter), and ease of management. The system must support rapid changes in membership, and dealing with joins, leaves and failures is not trivial; the dynamics of a P2P network are difficult to analyze. Many open issues are worth exploring.