Structured Peer-to-Peer Networks

Prof. Dr. Thomas Schmidt, http://www.informatik.haw-hamburg.de/~schmidt

Outline:
- The P2P Scaling Problem
- Unstructured P2P Revisited
- Distributed Indexing
- Fundamentals of Distributed Hash Tables
- DHT Algorithms: Chord, Pastry, CAN
- Programming a DHT

Graphics repeatedly taken from: R. Steinmetz, K. Wehrle: Peer-to-Peer Systems and Applications, Springer LNCS 3485, 2005.

Demands of P2P Systems
- Instant deployment: independent of infrastructural provisions
- Flexibility: seamless adaptation to changing member requirements
- Reliability: robustness against node or infrastructure failures
- Scalability: resources per node do not (significantly) increase as the P2P network grows

The Challenge in Peer-to-Peer Systems
(Figure: a provider asks "I have item D, where shall I place it?" and a requester asks "I want item D, where can I find it?" in a distributed system of peers such as peer-to-peer.info, berkeley.edu, planet-lab.org.)
- The location of resources (data items) is distributed among systems
- Where shall an item be stored by its provider?
- How does a requester find the actual location of an item?
- Scalability: limit the complexity for communication and storage
- Robustness and resilience in case of faults and frequent changes

Unstructured P2P Revisited
Basically two approaches:
- Centralized: simple, flexible searches at the server (O(1)); but single point of failure and O(N) node state at the server
- Decentralized flooding: fault tolerant, O(1) node state; but communication overhead O(N^2), and searches may fail
But: no reference structure between nodes is imposed.

Unstructured P2P: Complexities
(Chart: node state vs. communication overhead, both axes scaled O(1), O(log N), O(N).)
- Flooding: communication overhead O(N), node state O(1); bottleneck is communication overhead, and false negatives occur
- Central server: communication overhead O(1), node state O(N); bottlenecks are memory, CPU, network, and availability
- Is there a scalable solution between both extremes?

Idea: Distributed Indexing
- Initial ideas from distributed shared memories (1987 ff.)
- Nodes are structured according to some address space
- Data is mapped into the same address space
- Intermediate nodes maintain routing information to target nodes
  - Efficient forwarding to the destination (content, not location)
  - Definitive statement about the existence of content
(Figure: a query for H("my data") = 3107 is forwarded across overlay nodes 611, 709, 1008, 1622, 2011, 2207, 2906, 3485.)

Scalability of Distributed Indexing
- Communication effort: O(log N) hops
- Node state: O(log N) routing entries
- Routing in O(log N) steps to the node storing the data
- Each node stores O(log N) routing entries to other nodes
(Figure: the overlay nodes map onto physical hosts such as berkeley.edu, planet-lab.org, and peer-to-peer.info.)

Distributed Indexing: Complexities
(Chart: node state vs. communication overhead, both axes scaled O(1), O(log N), O(N).)
- Flooding: communication overhead O(N); bottleneck is communication, false negatives occur
- Central server: node state O(N); bottlenecks are memory, CPU, network, availability
- Distributed hash table: scalability with O(log N) overhead and state, no false negatives, resistant against changes (failures, attacks, short-time users)

Fundamentals of Distributed Hash Tables
Desired characteristics: flexibility, reliability, scalability.
Challenges for designing DHTs:
- Equal distribution of content among nodes (crucial for efficient lookup of content)
- Permanent adaptation to faults, joins, and exits of nodes
- Assignment of responsibilities to new nodes
- Re-assignment and re-distribution of responsibilities in case of node failure or departure
- Maintenance of routing information

Distributed Management of Data
1. Mapping of nodes and data into the same address space
- Peers and content are addressed using flat identifiers (IDs)
- Nodes are responsible for data in certain parts of the address space
- The association of data to nodes may change since nodes may disappear
2. Storing / looking up data in the DHT
- Search for data = routing to the responsible node
- The responsible node is not necessarily known in advance
- Deterministic statement about the availability of data

Addressing in Distributed Hash Tables
Step 1: Map content and nodes into a linear space
- Usually 0, ..., 2^m - 1, chosen much larger than the number of objects to be stored
- Data and nodes are mapped into the address space with a hash function, e.g. Hash(string) mod 2^m: H("my data") = 2313
- Parts of the address space are associated with DHT nodes
(Figure: address ranges 611-709, 1008-1621, 1622-2010, 2011-2206, 2207-2905, 2906-3484, and 3485-610 assigned to nodes; H(Node X) = 2906, H(Node Y) = 3485, data item D with H(D) = 3107.)
Often the address space is viewed as a circle. A small hashing sketch follows below.
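
The following is a minimal sketch of this mapping, assuming SHA-1 and m = 160 as on the Chord slides later in this deck; the function and variable names are illustrative, not part of any particular DHT implementation.

```python
# Minimal sketch: map node addresses and data names into the same m-bit
# identifier space using a hash function (SHA-1 is one common choice).
import hashlib

M = 160                    # identifier length in bits (assumption: SHA-1)
ID_SPACE = 2 ** M          # size of the circular address space

def dht_id(value: str) -> int:
    """Hash an arbitrary string (node address or data name) into [0, 2^m)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE

node_id = dht_id("134.2.11.140:4711")   # illustrative node address
key_id = dht_id("my data")              # illustrative data item name
print(node_id, key_id)
```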

Mapping Address Space to Nodes
- Each node is responsible for part of the value range, often with redundancy (overlapping parts)
- Continuous adaptation
- Real (underlay) and logical (overlay) topology are so far uncorrelated
- Example: node 3485 is responsible for data items in the range 2907 to 3485 (in case of a Chord DHT)
(Figure: logical view of the distributed hash table ring and its mapping onto the real topology.)

Routing to a Data Item
Step 2: Locating the data (content-based routing)
Goal: small and scalable effort
- O(1) with a centralized hash table
- Minimum overhead with distributed hash tables:
  - O(log N) DHT hops to locate an object
  - O(log N) keys and routing entries per node (N = number of nodes)

Routing to a Data Item (2)
Routing to a key-value pair:
- Start the lookup at an arbitrary node of the DHT
- Route to the requested data item (key) recursively according to the node tables
(Figure: lookup of key H("my data") = 3107; node 3485 manages keys 2907-3485 and holds (3107, (ip, port)); the value is a pointer to the location of the data.)

Routing to a Data Item (3)
Getting the content:
- The key-value pair is delivered to the requester
- The requester analyzes the K/V tuple (and downloads the data from its actual location in case of indirect storage)
(Figure: node 3485 sends (3107, (ip, port)) to the requester; with indirect storage, the data is then requested from its actual location via Get_Data(ip, port).)

Data Storage
Direct storage:
- The content itself is stored at the node responsible for H("my data")
- Inflexible for large content; o.k. for small data (< 1 KB)
Indirect storage:
- Nodes in the DHT store tuples (key, value)
- Key = Hash("my data") = 2313
- Value is often the real storage address of the content: (IP, port) = (134.2.11.140, 4711)
- More flexible, but one extra step to reach the content (see the sketch below)
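
A small sketch of the indirect-storage idea follows, assuming some DHT object that offers put/get as on the interface slide near the end of this deck; dht_key() and download() are illustrative helpers, not part of any real API.

```python
# Sketch of indirect storage: the DHT keeps only a pointer (IP, port) to
# where the content actually lives; the transfer itself happens outside.
import hashlib

def dht_key(name: str) -> int:
    """Hash a data name into the DHT key space (SHA-1 assumed)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def download(ip: str, port: int) -> bytes:
    """Placeholder for the actual content transfer outside the DHT."""
    raise NotImplementedError

def publish_indirect(dht, name: str, ip: str, port: int) -> None:
    dht.put(dht_key(name), (ip, port))      # value = location, not content

def fetch_indirect(dht, name: str) -> bytes:
    ip, port = dht.get(dht_key(name))       # resolve the pointer via the DHT
    return download(ip, port)               # then fetch the content directly
```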

Dynamics of a DHT: Node Arrival
Bootstrapping/joining of a new node:
1. Calculation of the node ID
2. The new node contacts the DHT via an arbitrary node
3. Assignment of a particular hash range
4. Copying of the K/V pairs of that hash range (usually with redundancy)
5. Binding into the routing environment (of the overlay)
(Figure: a new node with ID 3485 at 134.2.11.68 joins the ring.)

Node Failure / Departure
Failure of a node:
- Use redundant K/V pairs
- Use redundant / alternative routing paths
- A key-value pair is usually still retrievable if at least one copy remains
Departure of a node:
- Partitioning of its hash range to neighbor nodes
- Copying of K/V pairs to the corresponding nodes
- Unbinding from the routing environment

DHT Algorithms
- Lookup algorithm for nearby objects (Plaxton et al. 1997): predates P2P, later used in Tapestry
- Chord (Stoica et al. 2001): straightforward one-dimensional DHT
- Pastry (Rowstron & Druschel 2001): proximity neighbour selection
- CAN (Ratnasamy et al. 2001): route optimisation in a multidimensional identifier space
- Kademlia (Maymounkov & Mazières 2002)

Chord: Overview
- Early and successful algorithm
- Simple & elegant: easy to understand and implement; many improvements and optimizations exist
Main responsibilities:
- Routing
  - Flat logical address space: l-bit identifiers instead of IPs
  - Efficient routing in large systems: O(log N) hops, where N is the total number of nodes
- Self-organization
  - Handles node arrival, departure, and failure

Chord: Topology
Hash-table storage:
- put(key, value) inserts data into Chord
- value = get(key) retrieves data from Chord
Identifiers from consistent hashing:
- Uses a monotonic, load-balancing hash function, e.g. SHA-1 with 160-bit output: 0 <= identifier < 2^160
- Key associated with a data item, e.g. key = sha1(value)
- ID associated with a host, e.g. id = sha1(IP address, port)

Chord: Topology
- Keys and IDs live on a ring, i.e. all arithmetic is modulo 2^160
- A (key, value) pair is managed by the clockwise next node: its successor
(Figure: 3-bit example Chord ring with nodes 0, 1, 3 and keys 1, 2, 6: successor(1) = 1, successor(2) = 3, successor(6) = 0.)

Chord: Topology
- The topology is determined by links between nodes
- A link is knowledge about another node, stored in the routing table of each node
- Simplest topology: a circular linked list, where each node has a link to its clockwise next node

Routing on the Ring?
Primitive routing:
- Forward the query for key x until successor(x) is found
- Return the result to the source of the query
Pros: simple, little node state
Cons:
- Poor lookup efficiency: O(N/2) hops on average (with N nodes)
- A node failure breaks the circle
(Figure: a query for key 6, issued at node 0, travels hop by hop around the example ring.)

Improved Routing on the Ring?
Improved routing:
- Store links to the z next neighbors; forward queries for key k to the farthest known predecessor of k
- For z = N: fully meshed routing system, lookup efficiency O(1), per-node state O(N)
- Still poor scalability due to linear routing progress
Scalable routing requires a mix of short- and long-distance links:
- Accurate routing in the node's vicinity
- Fast routing progress over large distances
- Bounded number of links per node

Chord: Routing
Chord's routing table is the finger table:
- Stores O(log N) links per node
- Covers exponentially increasing distances: on node n, entry i points to successor(n + 2^i), the i-th finger (see the sketch below)
(Figure: finger tables on the 3-bit example ring. Node 0, keys {6}: starts 1, 2, 4 with successors 1, 3, 0. Node 1, keys {1}: starts 2, 3, 5 with successors 3, 3, 0. Node 3, keys {2}: starts 4, 5, 7 with successors 0, 0, 0.)
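
A minimal sketch of finger-table construction for the 3-bit toy ring above; successor() over a known node list is a simplification for illustration, since a real node only learns other IDs via lookups.

```python
# Build the finger tables of the 3-bit example ring (nodes 0, 1, 3).
M = 3                                   # identifier bits in the toy example
NODES = sorted([0, 1, 3])

def successor(ident: int) -> int:
    """First node whose ID is >= ident, wrapping around the ring."""
    ident %= 2 ** M
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n: int):
    """Entry i points to successor(n + 2^i), for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]

print(finger_table(0))   # [1, 3, 0], matching the table on the slide
print(finger_table(1))   # [3, 3, 0]
print(finger_table(3))   # [0, 0, 0]
```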

Chord: Routing
Chord's routing algorithm:
- Each node n forwards a query for key k clockwise to the farthest finger preceding k
- Until n = predecessor(k) and successor(n) = successor(k)
- Return successor(n) to the source of the query
(Figure: an example lookup(44) on a 64-identifier ring, forwarded along fingers; the finger tables list, per entry i, the distance 2^i, the target ID, and the actual link.)
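
The following single-process sketch walks through this lookup rule on the 3-bit example ring (nodes 0, 1, 3): hop to the farthest finger that precedes the key until the key falls into (current node, successor]. All names are illustrative; a real Chord node would issue these calls as remote queries.

```python
M = 3
RING = 2 ** M
NODE_IDS = sorted([0, 1, 3])

def between_open(x, a, b):
    """True if x lies in the open ring interval (a, b)."""
    a, b, x = a % RING, b % RING, x % RING
    if a < b:
        return a < x < b
    if a > b:
        return x > a or x < b
    return x != a                       # a == b: whole ring except a

def successor(ident):
    """First node whose ID is >= ident, wrapping around the ring."""
    ident %= RING
    return next((n for n in NODE_IDS if n >= ident), NODE_IDS[0])

def fingers(n):
    """Finger i of node n points to successor(n + 2^i)."""
    return [successor(n + 2 ** i) for i in range(M)]

def find_successor(n, key):
    """Node responsible for `key`, with the lookup starting at node n."""
    succ = successor((n + 1) % RING)            # clockwise next node of n
    if key % RING == succ or between_open(key, n, succ):
        return succ                             # key lies in (n, successor(n)]
    for f in reversed(fingers(n)):              # farthest preceding finger first
        if between_open(f, n, key):
            return find_successor(f, key)       # next routing hop
    return succ                                 # defensive fallback

print(find_successor(1, 6))   # -> 0: node 0 stores key 6, as on the slides
```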

Chord: Self-Organization
Handle a changing network environment:
- Failure of nodes, network failures
- Arrival of new nodes, departure of participating nodes
Maintain a consistent system state for routing:
- Keep routing information up to date
- Routing correctness depends on correct successor information
- Routing efficiency depends on correct finger tables
- Failure tolerance is required for all operations

Chord: Failure Tolerance in Storage
Layered design:
- The Chord DHT is mainly responsible for routing
- Data storage (persistence, consistency) is managed by the application
Chord's soft-state approach:
- Nodes delete (key, value) pairs after a timeout
- Applications need to refresh (key, value) pairs periodically
- Worst case: data is unavailable for one refresh interval after a node failure

Chord: Failure Tolerance in Routing
Finger failures during routing:
- If a query cannot be forwarded to a finger, forward it to the previous finger (do not overshoot the destination node)
- Trigger the repair mechanism: replace the finger with its successor
Active finger maintenance:
- Periodically check fingers (fix_fingers) and replace them with correct nodes on failures
- Trade-off: maintenance traffic vs. correctness & timeliness

Chord: Failure Tolerance in Routing
Successor failure during routing:
- The last step of routing can return a node failure to the source of the query, so all queries for that successor fail
- Store n successors in a successor list: if successor[0] fails, use successor[1], and so on (see the sketch below)
- Routing fails only if n consecutive nodes fail simultaneously
Active maintenance of the successor list:
- Periodic checks similar to finger table maintenance (stabilize), using the predecessor pointer
- Crucial for correct routing
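
A tiny sketch of the failover rule, assuming a hypothetical is_alive() liveness probe (the slide's periodic checks stand in for it in a real system).

```python
def is_alive(node) -> bool:
    """Placeholder for a ping/heartbeat check of the remote node."""
    return True

def next_live_successor(successor_list):
    """Try successor[0], successor[1], ... until a live node is found."""
    for succ in successor_list:
        if is_alive(succ):
            return succ
    raise RuntimeError("all successors in the list failed simultaneously")
```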

Chord: Node Arrival
- The new node picks an ID
- Contacts an existing node
- Constructs its finger table via standard routing/lookup()
- Retrieves (key, value) pairs from its successor
(Figure: node 6 joining the 3-bit example ring; finger tables of nodes 0, 1, 3 and the new node 6.)

Chord: Node Arrival
Examples for choosing new node IDs:
- Random ID: equal distribution assumed but not guaranteed
- Hash of IP address & port
- External observables
Retrieval of existing node IDs:
- Controlled flooding
- DNS aliases
- Published through the web, etc.
(Figure: the joining node picks ID = rand() = 6 and resolves a DNS entry point, e.g. entrypoint.chord.org -> 182.84.10.23.)

Chord: Node Arrival
Construction of the finger table:
- Iterate over the finger table rows
- For each row: query the entry point for the successor (standard Chord routing via the entry point)
Construction of the successor list:
- Add the immediate successor from the finger table
- Request the successor list from that successor
(Figure: the new node 6 issues succ(7)? = 0, succ(0)? = 0, succ(2)? = 3 and builds its finger table and successor list accordingly.)

Chord: Node Departure
Deliberate node departure: clean shutdown instead of failure.
For simplicity, treat it as a failure:
- The system is already failure tolerant
- Soft state: automatic state restoration; state is lost briefly
- Invalid finger table entries: reduced routing efficiency
For efficiency, handle it explicitly:
- Notification by the departing node to its successor, predecessor, and the nodes at finger distances
- Copy (key, value) pairs before shutdown

Chord: Performance
Impact of node failures on the lookup failure rate:
- The lookup failure rate is roughly equivalent to the node failure rate
(Figure.)

Chord: Performance
Moderate impact of the number of nodes on lookup latency:
- Consistent average path length
(Figure.)

Chord: Performance
Lookup latency (number of hops/messages): ~ 1/2 * log2(N)
- Confirms the theoretical estimation
(Figure: lookup path length vs. number of nodes.)

Chord: Summary
Complexity:
- Messages per lookup: O(log N)
- Memory per node: O(log N)
- Messages per management action (join/leave/fail): O(log^2 N)
Advantages:
- Theoretical models and proofs about complexity
- Simple & flexible
Disadvantages:
- No notion of node proximity and no proximity-based routing optimizations
- Chord rings may become disjoint in realistic settings
Many improvements have been published, e.g. proximity, bi-directional links, load balancing, etc.

Pastry: Overview
Similar to Chord: organises nodes & keys on a ring of flat hash IDs (0 <= ID < 2^128).
Uses prefix-based routing:
- Interprets identifiers as digit strings of base 2^b, typically with b = 4
- Routing according to the longest prefix match
- Result: routing down a tree
The routing table is built according to proximity selection: enhanced routing efficiency due to locality.

Pastry: Identifier Mapping
- Pastry views l-bit identifiers as digit strings of base 2^b
- Example: l = 4, b = 2
- Keys (K..) are stored at the closest node (N..) according to the prefix metric (see the sketch below)
- In case of equal distance, the key is stored on both neighbouring nodes (K22)
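
A small sketch of this identifier view for the slide's example (l = 4, b = 2): an l-bit ID is read as l/b digits of base 2^b, and prefix closeness is measured by the length of the shared digit prefix. Helper names are illustrative.

```python
L_BITS = 4          # identifier length from the example
B = 2               # digit width: digits of base 2^b = 4

def digits(ident: int):
    """Split an l-bit identifier into l/b digits, most significant first."""
    base, n_digits = 2 ** B, L_BITS // B
    return [(ident >> (B * i)) & (base - 1) for i in reversed(range(n_digits))]

def shared_prefix_len(a: int, b: int) -> int:
    """Number of leading digits two identifiers have in common."""
    count = 0
    for x, y in zip(digits(a), digits(b)):
        if x != y:
            break
        count += 1
    return count

print(digits(0b1101))                      # ID 13 -> digits [3, 1]
print(shared_prefix_len(0b1101, 0b1110))   # -> 1 (both start with digit 3)
```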

Pastry Routing Table
- Contains l/b rows (the range of prefix lengths) and 2^b - 1 columns (the digits; the column matching the node's own digit represents the node itself)
- The cell position approximates Pastry node v within the overlay, using an index transformation that concatenates the row prefix with the column digit
- The cell value maps to the corresponding network address
- As there are several nodes with the same prefix match, the topologically closest one is selected for the routing table: Proximity Neighbour Selection (PNS)

Prefix-Match Routing Table
Node ID v = 103220, l = 12, b = 2 (base-4 digits). Row i holds entries sharing the first i digits with v; the single-digit cell in each row is v's own digit at that position.

row | col 0   | col 1   | col 2   | col 3
 0  | 031120  | 1       | 201303  | 312201
 1  | 0       | 110003  | 120132  | 132012
 2  | 100221  | 101203  | 102303  | 3
 3  | 103031  | 103112  | 2       | 103302
 4  | 103200  | 103210  | 2       | 103233
 5  | 0       | 103221  | 103222  | 103223

Routing & Lookup Tables
Three tables:
- Routing table: prefix match
- Leaf set: closest nodes in the overlay
- Neighbourhood set: closest nodes in the physical network according to a given metric (RTT, hops, ...)

Pastry Routing
Step 1: Check whether key k is within the range of the leaf set; if so, the request is forwarded to the closest node in the leaf set.
Step 2: If k is not in the range of the leaf set, look up the routing table: try to identify an entry with a longer common prefix; if none is available, route to an entry numerically closer to the key.
Note: routing is loop-free, as forwarding is strictly done according to numerical closeness. A sketch of this decision follows below.
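
A hedged sketch of the two steps above, using simplified stand-ins for the real data structures: IDs are equal-length base-4 digit strings, the leaf set is a plain list, the routing table is a list of rows indexed by the next digit of the key, and ring wrap-around is ignored for brevity.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common digit prefix of two IDs given as digit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def numeric(ident: str) -> int:
    return int(ident, 4)                       # base-4 digit strings assumed

def route(key: str, node_id: str, leaf_set, routing_table):
    """Return the next hop for `key`, or node_id if this node is responsible."""
    # Step 1: key within the leaf-set range -> numerically closest candidate.
    candidates = leaf_set + [node_id]
    if min(candidates) <= key <= max(candidates):
        return min(candidates, key=lambda n: abs(numeric(n) - numeric(key)))
    # Step 2: routing-table entry with a longer common prefix, if present.
    p = shared_prefix_len(key, node_id)
    entry = routing_table[p].get(key[p])       # row p, column = next key digit
    if entry is not None:
        return entry
    # Fallback: any known node numerically closer to the key than we are.
    known = [n for row in routing_table for n in row.values()] + leaf_set
    closer = [n for n in known
              if abs(numeric(n) - numeric(key)) < abs(numeric(node_id) - numeric(key))]
    return min(closer, key=lambda n: abs(numeric(n) - numeric(key))) if closer else node_id

# Toy call around the slide's node 103220 (table contents are illustrative):
table = [dict() for _ in range(6)]
table[4]["0"] = "103200"
print(route("103200", "103220", ["103210", "103233"], table))   # -> 103200
```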

Pastry Routing Examples
Node ID v = 103220; example keys k = 103200, k = 102022, k = 103000.
(Figure.)

Pastry: Node Arrival
- The new node n picks a Pastry ID and contacts a Pastry node k nearby w.r.t. the proximity metric
- As k is nearby, its neighbourhood set is copied to n
- The leaf set is copied from the numerically closest overlay node c, which n reaches by a join message via k
- The join message is forwarded along nodes with increasingly longer prefixes common with n and triggers routing updates from intermediate nodes to n
- Finally, n sends its state to all nodes in its routing tables (active route propagation incl. time stamps)

Pastry: Node Failure
- A node failure shows up as contact failures of tabulated nodes (lazy failure detection)
- Pastry provides several redundancies: routing tables may include several equivalent entries, and forwarding may take place to an adjacent entry
Routing & neighbourhood table repair:
- Query nodes neighbouring in the table rows
- If unsuccessful: query entries from previous rows
- Live routing tables are advertised by new nodes

Pastry: Hop Performance
(Figure.)

Pastry: Delay Stretch
(Figure.)

Pastry: Summary
Complexity:
- Messages per lookup: O(log_{2^b} N)
- Messages per management action (join/leave/fail): O(log_{2^b} N) / O(log_b N)
- Memory per node: O(b * log_{2^b} N)
Advantages:
- Exploits proximity neighbouring
- Robust & flexible
Disadvantages:
- Complex; theoretical modelling & analysis are more difficult
- Pastry admits a constant delay stretch w.r.t. the number of overlay nodes, but it depends on the network topology; Chord's delay stretch remains independent of the topology, but depends on the overlay size

CAN: Overview
- Maps node IDs to regions that partition a d-dimensional space
- Keys correspondingly are coordinate points in a d-dimensional torus: <k_1, ..., k_d>
- Routing proceeds from neighbour to neighbour; neighbourhood is enhanced in high dimensionality
- d is a tuning parameter of the system

CAN: Space Partitioning
- Keys are mapped into [0,1]^d (or another numerical interval)
- The nodes' regions always cover the entire torus
- Data is placed on the node that owns the zone of its key
- Zone management is done by splitting / re-merging regions
- Dimensional ordering is used to retain spatial coherence

CAN Routing
- Each node maintains a coordinate neighbour set (neighbours overlap in d-1 dimensions and abut in the remaining dimension)
- Routing is done from neighbour to neighbour along the straight-line path from source to destination: forwarding goes to the neighbour whose coordinate zone is closest to the destination (see the sketch below)
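
A minimal sketch of this greedy forwarding on the unit torus, assuming zones are represented by their centre points and neighbour discovery is already done; all names are illustrative.

```python
import math

def torus_dist(p, q):
    """Euclidean distance on the unit torus (per-axis wrap-around)."""
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2
                         for a, b in zip(p, q)))

def next_hop(my_zone_centre, neighbour_centres, destination):
    """Forward to the neighbour closer to `destination` than we are, if any."""
    best = min(neighbour_centres, key=lambda c: torus_dist(c, destination))
    if torus_dist(best, destination) < torus_dist(my_zone_centre, destination):
        return best
    return None   # no closer neighbour: this node owns the destination's zone

# Toy 2-d example: our zone centre, two neighbours, and a destination key point.
print(next_hop((0.25, 0.25), [(0.75, 0.25), (0.25, 0.75)], (0.9, 0.2)))
```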

CAN Node Arrival
The new node:
1. Picks a random coordinate
2. Contacts any CAN node and routes a join to the owner of the corresponding zone
3. Splits that zone to acquire the region of its picked point & learns the neighbours from the previous owner (a split sketch follows below)
4. Advertises its presence to the neighbours
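
The split in step 3 can be sketched as follows, assuming axis-aligned zones and the dimensional ordering mentioned on the previous slide (halving along the dimension given by the zone's split depth); this is an illustrative simplification, not CAN's exact bookkeeping.

```python
def split_zone(lo, hi, depth, new_point):
    """Split the box [lo, hi) along dimension depth % d; return (kept, given)."""
    d = depth % len(lo)
    mid = (lo[d] + hi[d]) / 2.0
    lower = (lo, tuple(mid if i == d else hi[i] for i in range(len(hi))))
    upper = (tuple(mid if i == d else lo[i] for i in range(len(lo))), hi)
    # The new node takes the half that contains the coordinate it picked.
    return (lower, upper) if new_point[d] >= mid else (upper, lower)

# Toy 2-d example: the whole torus is split for a node that picked (0.7, 0.2).
kept, given = split_zone((0.0, 0.0), (1.0, 1.0), depth=0, new_point=(0.7, 0.2))
print("previous owner keeps:", kept)
print("new node acquires:  ", given)
```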

Node Failure / Departure
- A node failure is detected by missing update messages
- When leaving gracefully, a node notifies its neighbours and copies its content
- On a node's disappearance, its zone needs re-occupation in a size-balancing approach: neighbours start timers inversely proportional to their zone size; on timeout, a neighbour requests takeover, which is answered only by nodes with smaller zone sizes

CAN Optimisations
- Redundancy: multiple simultaneous coordinate spaces ("realities")
- Expedited routing: Cartesian distance weighted by network-level measures
- Path-length reduction: overloading coordinate zones
- Proximity neighbouring: topologically sensitive construction of the overlay (landmarking)

CAN Path Length Evaluation
(Figure.)

CAN Path Length Evaluation (2)
(Figure.)

CAN: Summary
Complexity:
- Messages per lookup: O(N^(1/d))
- Messages per management action (join/leave/fail): O((d/2) * N^(1/d)) / O(2d)
- Memory per node: O(d)
Advantages:
- Performance parametrisable through the dimensionality
- Simple basic principle, easy to analyse & improve
Disadvantages:
- Lookup complexity is not logarithmically bounded
Due to its simple construction, CAN is open to many variants, improvements and customisations.

Implementations / Deployment
Many concepts & implementations:
- Storage systems, content distribution, indexing/naming, DB query processing, ...
Real deployment:
- Public DHT service: OpenDHT
- Filesharing: Overnet (eDonkey), BitTorrent (newer versions)
- Media conferencing: P2P-SIP
- Music indexing: freedb
- Web caching: Coral
Problems: overload + starvation; fairness needs to be balanced.

Programming a DHT
Two communication interfaces:
- One towards the application layer (the user of the DHT)
- One towards other nodes within the DHT
The functions are similar. The node-to-node interface must be network transparent; choices include:
- An application layer protocol (using TCP or UDP sockets)
- Remote procedure calls (RPCs)
- Remote Method Invocation (RMI/CORBA)
- Web services
The application layer interface may be local or distributed.

DHT Data Interfaces
Generic interface of distributed hash tables:
- Provisioning of information: Publish(key, value)
- Requesting information (search for content): Lookup(key), reply: value
- DHT approaches are interchangeable with respect to this interface (see the sketch below)
(Figure: a distributed application calls Put(Key, Value) and Get(Key) -> Value on the distributed hash table (CAN, Chord, Pastry, Tapestry, ...), which spans nodes 1 to N.)
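
A minimal sketch of this generic data interface, kept independent of the concrete DHT; class and method names are illustrative, mirroring the Put/Get resp. Publish/Lookup operations named above.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

class DistributedHashTable(ABC):
    """Interface a distributed application programs against."""

    @abstractmethod
    def put(self, key: int, value: Any) -> None:
        """Publish a (key, value) pair into the overlay."""

    @abstractmethod
    def get(self, key: int) -> Optional[Any]:
        """Look up the value stored under `key`, or None if absent."""

class LocalToyDHT(DistributedHashTable):
    """Single-process stand-in; any real Chord/Pastry/CAN backend could replace it."""
    def __init__(self):
        self._store = {}
    def put(self, key, value):
        self._store[key] = value
    def get(self, key):
        return self._store.get(key)

dht = LocalToyDHT()
dht.put(2313, ("134.2.11.140", 4711))   # value = pointer to the content
print(dht.get(2313))
```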

DHT Self-Organisation Interface
- Join(mykey): retrieve ring successor & predecessor (neighbours), initiate the key transfer
- Leave(): transfer predecessor and keys to the successor
- Maintain the predecessor & successor (neighbour) list, e.g. stabilize in Chord
- Maintain the routing table, e.g. fix_fingers in Chord
A skeleton of these hooks is sketched below.
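
The following skeleton only names the maintenance hooks listed above (join/leave plus periodic stabilize and fix_fingers, as in Chord); the bodies are placeholders and the class is an assumption for illustration, not a prescribed API.

```python
import threading

class RingMaintenance:
    def __init__(self, node, period_s: float = 30.0):
        self.node = node            # local node holding successor/predecessor state
        self.period_s = period_s    # maintenance interval (illustrative default)

    def join(self, my_key: int, bootstrap) -> None:
        """Retrieve ring successor & predecessor via a bootstrap node,
        then initiate the key transfer for our range (placeholder)."""

    def leave(self) -> None:
        """Hand the predecessor pointer and stored keys over to the successor."""

    def stabilize(self) -> None:
        """Verify and repair the successor/predecessor (neighbour) links."""

    def fix_fingers(self) -> None:
        """Refresh routing-table (finger) entries."""

    def run_periodically(self) -> None:
        """Run the maintenance tasks on a timer, as structured overlays do."""
        self.stabilize()
        self.fix_fingers()
        threading.Timer(self.period_s, self.run_periodically).start()
```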

Dabek Model
- Layered approach towards unified overlay routing
- Core idea: a key-based routing (KBR) layer (Tier 0) as a routing abstraction on top of (interchangeable) structured schemes
- Tier 1: general services
- Tier 2: higher-layer services and applications

Common KBR API
(Figure.)

References
- C. Plaxton, R. Rajaraman, A. Richa: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. Proc. of the 9th ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 311-330, June 1997.
- I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proc. of ACM SIGCOMM 2001, pp. 149-160, ACM Press, 2001.
- A. Rowstron, P. Druschel: Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), pp. 329-350, Springer, 2001.
- S. Ratnasamy, P. Francis, M. Handley, R. Karp: A Scalable Content-Addressable Network. Proc. of ACM SIGCOMM 2001, pp. 161-172, ACM Press, 2001.
- F. Dabek et al.: Towards a Common API for Structured Peer-to-Peer Overlays. IPTPS 2003, LNCS Vol. 2735, pp. 33-44, Springer, 2003.