Distributed Hash Tables (DHT)
Jukka K. Nurminen
*Adapted from slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen)
The Architectures of 1st and 2nd Gen. P2P

Client-Server:
1. The server is the central entity and the only provider of service and content; the network is managed by the server.
2. The server is the higher-performance system.
3. The clients are the lower-performance systems.
Example: WWW.

Peer-to-Peer:
1. Resources are shared between the peers.
2. Resources can be accessed directly from other peers.
3. Each peer is both provider and requestor ("servent" concept).

Unstructured P2P:
- Centralized P2P (1st gen.):
  1. All features of peer-to-peer included.
  2. A central entity is necessary to provide the service.
  3. The central entity is some kind of index/group database.
  Example: Napster.
- Pure P2P (1st gen.):
  1. All features of peer-to-peer included.
  2. Any terminal entity can be removed without loss of functionality.
  3. No central entities.
  Examples: Gnutella 0.4, Freenet.
- Hybrid P2P (2nd gen.):
  1. All features of peer-to-peer included.
  2. Any terminal entity can be removed without loss of functionality.
  3. Dynamic central entities.
  Examples: Gnutella 0.6, JXTA.

Structured P2P:
- DHT-based (2nd gen.):
  1. All features of peer-to-peer included.
  2. Any terminal entity can be removed without loss of functionality.
  3. No central entities.
  4. Connections in the overlay are "fixed".
  Examples: Chord, CAN.
Addressing in Distributed Hash Tables
Step 1: Mapping of content/nodes into a linear space
- Usually: 0, ..., 2^m - 1, with 2^m >> number of objects to be stored
- Mapping of data and nodes into an address space (with a hash function)
- E.g., Hash(String) mod 2^m: H("my data") = 2313
- Association of parts of the address space to DHT nodes
[Figure: the address space as a ring partitioned into ranges among the nodes; e.g., H(Node X) = 2906 and H(Node Y) = 3485, so a data item D with H(D) = 3107 falls into the range managed by node Y]
Often, the address space is viewed as a circle.
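As a rough illustration of Step 1, the sketch below hashes both content keys and node addresses into the same linear ID space. The function name, the use of SHA-1, and the toy size m = 12 (just large enough for the figure's example IDs) are assumptions, not part of the original slides.

```python
# Minimal sketch of Step 1, assuming SHA-1 and a toy m = 12 ID space
# (0 ... 4095, which fits the figure's example IDs such as 3485).
import hashlib

M = 12

def dht_hash(s: str) -> int:
    """Map a string (content key or node address) into 0 ... 2^m - 1."""
    digest = hashlib.sha1(s.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

print(dht_hash("my data"))          # content is mapped into the ID space ...
print(dht_hash("192.0.2.1:4711"))   # ... and so are the nodes that store it
```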
Step 2: Routing to a Data Item
Routing to a K/V-pair:
- Start the lookup at an arbitrary node of the DHT
- Route to the requested data item (key)
- put(key, value) and value = get(key)
[Figure: the lookup for key = H("my data") = 3107 starts at an arbitrary initial node and is routed to node 3485, which manages keys 2907-3485; the stored pair (3107, (ip, port)) holds a value that is a pointer to the location of the data]
Step 3: Routing to a Data Item - Getting the Content
- The K/V-pair is delivered to the requester
- The requester analyzes the K/V-tuple (and downloads the data from its actual location in case of indirect storage)
[Figure: node 3485 sends (3107, (ip, port)) to the requester; in case of indirect storage, the requester, now knowing the actual location, issues Get_Data(ip, port) to fetch the data]
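A minimal sketch of Steps 2 and 3 under the successor convention used above. The node IDs, the helper names (managing_node, put, get), and the in-memory dictionaries are illustrative assumptions; the point is indirect storage: the DHT holds only an (ip, port) pointer, and the requester downloads the data from that location itself.

```python
from bisect import bisect_left

# Toy DHT: each key is managed by its clockwise successor node; the stored
# value is an (ip, port) pointer to the data's actual location.
M = 12
nodes = {610: {}, 2906: {}, 3485: {}}      # node ID -> local key/value store
ring = sorted(nodes)

def managing_node(key: int) -> int:
    i = bisect_left(ring, key % (2 ** M))  # first node ID >= key ...
    return ring[i % len(ring)]             # ... wrapping around past the end

def put(key: int, ip: str, port: int) -> None:
    nodes[managing_node(key)][key] = (ip, port)    # pointer, not the data

def get(key: int):
    return nodes[managing_node(key)].get(key)      # requester then fetches
                                                   # the data from (ip, port)

put(3107, "192.0.2.1", 4711)
print(get(3107))   # ('192.0.2.1', 4711) -- key 3107 is managed by node 3485
```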
Chord
Chord: Topology
- Keys and IDs on a ring, i.e., all arithmetic modulo 2^160
- (key, value) pairs are managed by the clockwise next node: the successor
[Figure: example Chord ring with identifiers 0-7; nodes 0, 1, and 3; keys 1, 2, and 6; successor(1) = 1, successor(2) = 3, successor(6) = 0]
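A small sketch of the topology rule on the figure's example ring, using m = 3 for readability rather than the full 2^160 space; the names are illustrative.

```python
# Successor on the example ring: nodes 0, 1, 3 in a 3-bit ID space.
M = 3
NODES = sorted([0, 1, 3])

def successor(ident: int) -> int:
    ident %= 2 ** M                     # all arithmetic is modulo 2^m
    return next((n for n in NODES if n >= ident), NODES[0])

print(successor(1), successor(2), successor(6))   # 1 3 0, as in the figure
```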
Chord: Primitive Routing
Primitive routing:
- Forward the query for key x until successor(x) is found
- Return the result to the source of the query
Pros: simple; little node state
Cons: poor lookup efficiency, O(1/2 * N) hops on average (with N nodes); a node failure breaks the circle
[Figure: a query for key 6 forwarded hop by hop around the example ring]
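The loop below sketches primitive routing on the same example ring and counts hops; in_range and the successor-pointer dictionary are assumed helpers.

```python
# Primitive routing: each node knows only its successor, so a query
# simply walks the circle -- O(1/2 * N) hops on average.
SUCC = {0: 1, 1: 3, 3: 0}               # successor pointers (m = 3)

def in_range(key: int, lo: int, hi: int, m: int = 3) -> bool:
    """True if key lies in the half-open ring interval (lo, hi]."""
    key, lo, hi = key % 2**m, lo % 2**m, hi % 2**m
    if lo < hi:
        return lo < key <= hi
    return key > lo or key <= hi        # the interval wraps past 0

def primitive_lookup(start: int, key: int):
    n, hops = start, 0
    while not in_range(key, n, SUCC[n]):    # forward until successor(key)
        n, hops = SUCC[n], hops + 1
    return SUCC[n], hops

print(primitive_lookup(1, 6))   # (0, 1): node 1 forwards to 3, answer is 0
```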
Chord: Routing
Chord's routing table: the finger table
- Stores log(N) links per node
- Covers exponentially increasing distances: at node n, entry i points to successor(n + 2^i) (the i-th finger)
[Figure: finger tables for the example ring (m = 3, nodes 0, 1, 3):
  Node 0 (keys: 6):  i=0: start 1 -> succ. 1 | i=1: start 2 -> succ. 3 | i=2: start 4 -> succ. 0
  Node 1 (keys: 1):  i=0: start 2 -> succ. 3 | i=1: start 3 -> succ. 3 | i=2: start 5 -> succ. 0
  Node 3 (keys: 2):  i=0: start 4 -> succ. 0 | i=1: start 5 -> succ. 0 | i=2: start 7 -> succ. 0]
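The sketch below reconstructs those finger tables from the rule "entry i of node n is successor(n + 2^i)"; the successor() helper is the one from the topology sketch above.

```python
# Building finger tables for the example ring (m = 3, nodes 0, 1, 3).
M = 3
NODES = sorted([0, 1, 3])

def successor(ident: int) -> int:
    ident %= 2 ** M
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n: int):
    return [successor(n + 2 ** i) for i in range(M)]

for n in NODES:
    print(n, finger_table(n))
# 0 [1, 3, 0]   1 [3, 3, 0]   3 [0, 0, 0] -- matching the tables above
```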
Chord: Routing
Chord's routing algorithm: each node n forwards the query for key k clockwise
- To the farthest finger preceding k
- Until n = predecessor(k) and successor(n) = successor(k)
- Then successor(n) is returned to the source of the query
[Figure: example lookup(44) on a 6-bit identifier ring (IDs 0-63), showing the finger tables (columns i, 2^i, target, link) of the nodes along the path; the query is forwarded along ever-closer fingers until it reaches successor(44) = node 45]
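A hedged sketch of this lookup rule on the m = 3 example (the figure's 6-bit ring works the same way, only with more fingers). The finger tables are the ones computed above; between() is an assumed helper for open ring intervals.

```python
# Scalable lookup: forward to the farthest finger preceding the key until
# the current node is predecessor(key); then return its successor.
SUCC = {0: 1, 1: 3, 3: 0}
FINGERS = {0: [1, 3, 0], 1: [3, 3, 0], 3: [0, 0, 0]}  # from the sketch above

def between(x: int, lo: int, hi: int) -> bool:
    """True if x lies strictly inside the ring interval (lo, hi)."""
    if lo < hi:
        return lo < x < hi
    return x > lo or x < hi             # the interval wraps past 0

def lookup(n: int, key: int) -> int:
    while not (key == SUCC[n] or between(key, n, SUCC[n])):
        for f in reversed(FINGERS[n]):  # farthest finger preceding key
            if between(f, n, key):
                n = f
                break
        else:
            n = SUCC[n]                 # fall back to the plain successor
    return SUCC[n]                      # = successor(key)

print(lookup(1, 6))   # 0: node 1 forwards to node 3; successor(6) = 0
```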
Comparison of Lookup Concepts

System                   | Per-Node State | Communication Overhead | Fuzzy Queries | No False Negatives | Robustness
Central Server           | O(N)           | O(1)                   | yes           | yes                | no
Flooding Search          | O(1)           | O(N²)                  | yes           | no                 | yes
Distributed Hash Tables  | O(log N)       | O(log N)               | no            | yes                | yes
Extra slides
Summary of DHT
- Use of routing information for efficient search for content
- Self-organizing system
Advantages:
- Theoretical models and proofs about complexity (lookup and memory: O(log N))
- Simple and flexible: <key, value> pairs can represent anything
- Supports a wide spectrum of applications
Disadvantages:
- No notion of node proximity and no proximity-based routing optimizations
- Chord rings may become disjoint in realistic settings
- No wildcard or range searches
- Performance under high churn, especially the handling of node departures
- Key deletion vs. refresh
Many improvements have been published, e.g., proximity, bi-directional links, load balancing, etc.
Different Kinds of DHTs
Specific examples of distributed hash tables:
- Chord (UC Berkeley, MIT)
- Pastry (Microsoft Research, Rice University)
- Tapestry (UC Berkeley)
- CAN (UC Berkeley, ICSI)
- P-Grid (EPFL Lausanne)
- Kademlia, Symphony, Viceroy, ...
A number of uses:
- Distributed tracker
- P2P SIP
- ePOST