Searching for Shared Resources: DHT in General

1 ELT-53207 P2P & IoT Systems
Searching for Shared Resources: DHT in General
Mathieu Devos, Tampere University of Technology, Department of Electronics and Communications Engineering
Based on the original slides provided by A. Surak (TUT), K. Wehrle, S. Götz, S. Rieche (University of Tübingen), Jani Peltotalo (TTY), Olivier Lamotte (TTY), and the OSDI team 2006

2 Distributed Management and Retrieval of Data
Challenges in P2P systems:
- Location of data among the distributed system: where to store the data, and how to request and recover it?
- Scalability of the topology: keep a low complexity (Big O) and scalable storage capabilities
- Fault tolerance and resilience: frequent changes, heterogeneous network
[Figure: overlay layer on top of physical hosts (peer-to-peer.info, berkeley.edu, planet-lab.org, ...); one peer asks "I have D, where to store D?" and another asks "Where can I find D?" for data item D]

3 Big O Notation
- Big O notation is widely used by computer scientists to concisely describe the behavior of algorithms
- Specifically, it describes the worst-case scenario and can be used to describe the execution time required or the space used by an algorithm
- Common orders: O(1) constant, O(log n) logarithmic, O(n) linear, O(n²) quadratic

4 Comparison of Strategies for Data Retrieval
Strategies for storage and retrieval of data items scattered over distributed systems:
- Central server (Napster)
- Flooding search (Gnutella, unstructured overlays)
- Distributed indexing (Chord, Pastry)

5 Central Server
Simple strategy: the server stores information about locations
1. Node A (provider) tells server S that it stores item D
2. Node B (requester) asks server S for the location of D
3. Server S tells B that node A stores item D
4. Node B requests item D from node A
[Figure: node A ("A stores D") registers with server S; node B learns from S that A stores D and fetches D directly from A]
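As a minimal sketch (not Napster's actual protocol), the central-server strategy can be modelled as a single in-memory index that providers register with and requesters query; all names below are hypothetical:

```python
# Minimal sketch of the central-server strategy (hypothetical names, not Napster's protocol).
class CentralIndex:
    def __init__(self):
        self.locations = {}            # item id -> set of provider addresses

    def register(self, item, provider):
        self.locations.setdefault(item, set()).add(provider)   # "A stores D"

    def lookup(self, item):
        return self.locations.get(item, set())                 # O(1) search, but O(n) state on the server

index = CentralIndex()
index.register("D", "nodeA:4711")      # node A announces that it stores D
print(index.lookup("D"))               # node B asks the server, then fetches D from node A directly
```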

6 Central Server: Pros and Cons
Pros:
- Search complexity O(1): just ask the server
- Complex and fuzzy queries possible
- Easy to implement
Cons:
- Fault tolerance and resilience? Scalability of the topology?
- No scalability, since the server is stateful: O(n) node state in the server, O(n) network and server load
- Easy target (single point of failure), also for lawsuits (Napster, TPB)
- Costs of maintenance, availability, scalability; not suitable for systems with massive numbers of users
- No self-sustainability, needs moderation
- Location of data among the distributed system?
Yet: the best approach for small and simple applications!

7 Flooding Search
- Applied in unstructured P2P systems: no information about the location of requested data in the overlay; content is only stored in the node providing it
- Fully distributed approach
- Retrieval of data: no routing information for content, so it is necessary to ask as many systems as possible/necessary
Approaches:
- Flooding: high traffic load on the network, does not scale
- Highest-degree search (here degree = number of connections): quick search through large areas using highly connected nodes, but a large number of messages is needed for unique identification (node ID)

8 Flooding Search
- No information about the location of data in the intermediate systems
- Necessity for broad search (graph theory)
- Node B (requester) asks its neighboring nodes for item D; the query is forwarded with loop detection until a node holding D answers
[Figure: Gnutella-style message exchange between Node B and Node A (Connect, Ping, Pong, Query, Query Hit messages) over the overlay topology]
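The sketch below is a hypothetical illustration of flooding with a hop limit (TTL) and duplicate-query detection, not Gnutella's actual wire protocol:

```python
# Hypothetical sketch of flooding search with a TTL and loop detection (not Gnutella's wire protocol).
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    content: set = field(default_factory=set)
    neighbours: list = field(default_factory=list)

def flood_query(node: Node, item: str, ttl: int, seen=None) -> set:
    """Ask `node` and, recursively, its neighbours for `item`; return the IDs of nodes holding it."""
    seen = set() if seen is None else seen
    if node.id in seen or ttl < 0:          # loop detection / hop limit
        return set()
    seen.add(node.id)
    hits = {node.id} if item in node.content else set()
    for neighbour in node.neighbours:       # forward the query to every neighbour: heavy message load
        hits |= flood_query(neighbour, item, ttl - 1, seen)
    return hits

# Tiny example topology: B -- C -- A, where A provides item "D"
a, b, c = Node("A", {"D"}), Node("B"), Node("C")
b.neighbours, c.neighbours = [c], [a, b]
print(flood_query(b, "D", ttl=3))           # -> {'A'}
```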

9 Motivations for a Better Solution
Communication overhead vs. node state:
- Flooding search: O(1) node state but O(n) communication overhead; bottlenecks are communication overhead and false negatives (poor reliability)
- Central server: O(1) communication overhead but O(n) node state; bottlenecks are memory, CPU, network, and availability
- Distributed Hash Table: the scalable solution in between
[Figure: chart of communication overhead vs. node state, with flooding, DHT, and central server plotted against axes marked O(1), O(log n), O(n)]

10 Distributed Indexing
Approach of distributed indexing schemes:
- Data and nodes are mapped into the same address space
- Intermediate nodes maintain routing information to target nodes: efficient forwarding to the destination, definitive statement about the existence of content
Issues:
- How to join the topology? Maintenance of routing information required
- Network events: joins, leaves, failures, attacks!
- Fuzzy queries (e.g., wildcard searches) not primarily supported

11 Distributed Indexing
Goal is scalable complexity for:
- Communication effort: O(log n) hops
- Node state: O(log n) routing entries
[Figure: overlay of n nodes (IDs 611, 709, 1008, 1622, 2011, 2207, 2906, 3485) on top of physical hosts (berkeley.edu, planet-lab.org, peer-to-peer.info, ...); a query for H("my data") = 3107 is routed in O(log n) steps]

12 Fundamentals of Distributed Hash Tables
Desired characteristics: flexibility, reliability, scalability
Design challenges:
- Hash tables use a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., a number)
- Equal distribution of content among nodes: crucial for efficient lookup of content; consistent hashing function (with its advantages and disadvantages)
- Permanent adaptation to faults, joins and departures of nodes: assignment of responsibilities to new nodes, re-assignment and re-distribution of responsibilities in case of node failure or departure

13 Distributed Management of Data
- Nodes and keys of the hash table share the same address space (using IDs)
- Nodes are responsible for data in certain parts of the address space; the data a node is in charge of may change as nodes join and leave
Looking for data means:
- Finding the responsible node via intermediate nodes; the query is routed to the responsible node, which returns the key/value pair
- Closeness is mathematical, obtained by comparing hashes: a node in China might be "next to" a node in the USA (hashing!)
- The target node is not necessarily known in advance (is it even available?), yet the DHT gives a deterministic statement about the availability of the data

14 Addressing in Distributed Hash Tables
Step 1: Mapping of content/nodes into a linear space: consistent hashing
- m-bit identifier space [0, ..., 2^m - 1] for both keys and nodes
- The hash function result has to fit in the space: mod 2^m; here m = 6, i.e. 64 possible IDs
- E.g., Hash("ELT-53206-lec4.pdf") mod 2^m = 54
- E.g., Hash("129.100.16.93") mod 2^m = 21
- The address space is commonly viewed as a circle
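A minimal sketch of this mapping in Python, assuming SHA-1 as the hash function and m = 6; the exact IDs depend on the hash function chosen, so the values 54 and 21 above are only illustrative:

```python
import hashlib

M = 6  # identifier space of 2**6 = 64 IDs

def dht_id(name: str, m: int = M) -> int:
    """Map a key or a node address into the m-bit identifier space."""
    digest = hashlib.sha1(name.encode()).hexdigest()   # assumption: SHA-1 as the hash function
    return int(digest, 16) % (2 ** m)                  # fit the result into [0, 2**m - 1]

print(dht_id("ELT-53206-lec4.pdf"))   # key ID of a document
print(dht_id("129.100.16.93"))        # node ID derived from an IP address
```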

15 Addressing in Distributed Hash Tables
Step 2: Association of address ranges to the nodes
- Often with small redundancy (overlapping of parts)
- Continuous adaptation due to changes
- Real (underlay) and logical (overlay) topology are usually uncorrelated (consistent hashing)
- Keys are assigned to their successor node in the identifier circle, i.e. the node with the next higher ID (remember the circular namespace)
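A small sketch of the successor rule, assuming a sorted list of node IDs in a 6-bit space; this is an assumption-level illustration, not any particular DHT's implementation:

```python
import bisect

def successor(key_id: int, node_ids: list) -> int:
    """Return the node responsible for key_id: the first node ID >= key_id,
    wrapping around the identifier circle if necessary."""
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]          # wrap around past the largest ID

nodes = [4, 12, 21, 33, 47, 58]           # hypothetical node IDs in a 6-bit space
print(successor(54, nodes))               # -> 58: node 58 is responsible for keys in (47, 58]
print(successor(60, nodes))               # -> 4: wrap-around on the circle
```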

16 Association of Address Space with Nodes
[Figure: logical view of the Distributed Hash Table (overlay layer) with node IDs 611, 709, 1008, 1622, 2011, 2207, 2906, 3485 on the identifier circle, and their mapping on the real topology (physical layer)]

17 Addressing in Distributed Hash Tables
Step 3: Locating the data (content-based routing)
Goal: minimum overhead with distributed hash tables
- O(1) with a centralized hash table (aka an expensive server)
- O(n) DHT hops without a finger table (left figure)
- O(log n) DHT hops to locate an object, with routing information about O(log n) other nodes stored at each node (right figure)

18 Routing to a Data Item
Routing to a key/value pair:
- Start the lookup at an arbitrary node of the DHT (or at dedicated bootstrap nodes)
- Route to the requested data item (key)
- The key/value pair is delivered to the requester
In our case, the value is a pointer to the location of the file: we are indexing resources for P2P exchanges
[Figure: key = H("TLT2626-lec4.pdf") = 54, stored as (54, (ip, port)); the lookup starts at an initial node (arbitrary or bootstrap), and the value is a pointer to the location of the file (indirect storage)]

19 Routing Table Example: Routing in Chord
- Start the lookup at an arbitrary node of the DHT (or at bootstrap nodes)
- Route to the requested data item (key); the key/value pair is delivered to the requester
- Finger table of node N8 (entry i points to the successor of N8 + 2^i):

  Target      Responsible node   Node address
  N8 + 1      N14                IP address (N14)
  N8 + 2      N14                IP address (N14)
  N8 + 4      N14                IP address (N14)
  N8 + 8      N21                IP address (N21)
  N8 + 16     N32                IP address (N32)
  N8 + 32     N42                IP address (N42)
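As an illustrative sketch (node IDs taken from this example, helper functions repeated so the snippet stands alone), a Chord-style finger table for node N8 can be built like this:

```python
import bisect

def successor(key_id, node_ids):
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id)
    return nodes[i % len(nodes)]

def finger_table(node_id, node_ids, m=6):
    """Chord-style finger table: entry i points to the successor of node_id + 2**i (mod 2**m)."""
    return [((node_id + 2 ** i) % (2 ** m),
             successor((node_id + 2 ** i) % (2 ** m), node_ids)) for i in range(m)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]      # node IDs from this example
for target, node in finger_table(8, nodes):
    print(f"N8 finger towards {target:2d} -> N{node}")   # N14, N14, N14, N21, N32, N42
```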

20 Routing Table Example: Routing in Pastry
- Identifier space of 2^128 IDs, written as 32 hex symbols
- The routing table has one row per shared prefix length and one column per hex digit (0 ... f); the entry matching the node's own ID is marked SELF
- Each hop extends the matched prefix by one digit, narrowing the remaining space by a factor of 1/16 at every step
[Figure: Pastry routing table rows with entries such as 0: IP 0, 1: IP 1, ..., f: IP f and 30: IP 30, 31: IP 31, ..., 3f: IP 3f, plus example node IDs 32a, dab3, d0c9]

21 How Is Content Stored: Direct / Indirect
Direct storage:
- Content is stored in the node responsible for H("my data")
- Inflexible for large content; o.k. for small amounts of data (< 1 KB), example: DNS queries
Indirect storage:
- The value is often the real storage address of the content: (IP, port) = (134.2.11.140, 4711)
- More flexible, but one step more to reach the content
[Figure: identifier circle with node IDs 611 ... 3485; data item D with H_SHA-1(D) = 3107 is stored either directly on the responsible node or as a pointer to host 134.2.11.68]

22 Node Arrival
Joining of a new node:
- Calculation of the node ID
- New node contacts the DHT via an arbitrary bootstrap node
- Assignment of a particular hash range
- Binding into the routing environment
- Copying of the key/value pairs of the hash range
[Figure: new node with ID 3485 (host 134.2.11.68) joining the identifier circle of nodes 611 ... 2906]

23 Node Arrival Chord Example

24 Node Arrival Chord Example

25 Node Arrival Chord Example

26 Node Failure and Departure
Failure of a node:
- Use of redundant/replicated data if a node fails
- Use of redundant/alternative routing paths if the routing environment fails
Departure of a node:
- Partitioning of the hash range to neighbor nodes
- Copying of key/value pairs to the corresponding nodes
- Unbinding from the routing environment

27 Reliability in Distributed Hash Tables
Erasure codes and redundancy:
- An erasure code transforms a message of k symbols into a longer message (code word) of n symbols such that the original message can be recovered from a subset of the n symbols
- Every time a node crashes, a piece of the data is destroyed; after some time, the data may no longer be recoverable
- Therefore, the idea of redundancy also needs replication of the data
- A form of Forward Error Correction (the future of storage?)
Replication:
- Several nodes should manage the same range of keys
- Introduces new possibilities for underlay-aware routing
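A toy illustration of the erasure-code idea, using a single XOR parity block (k = 2 data blocks stored as n = 3 blocks); real deployments would use stronger codes such as Reed-Solomon:

```python
# Toy erasure code: k = 2 data blocks, n = 3 stored blocks (2 data + 1 XOR parity).
# Any single lost block can be reconstructed from the other two.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d1, d2 = b"hello wo", b"rld!!!!!"        # two equally sized data blocks on two nodes
parity = xor_blocks(d1, d2)              # parity block stored on a third node

# Suppose the node holding d1 crashes; d1 can be recomputed from the surviving blocks:
recovered = xor_blocks(parity, d2)
assert recovered == d1
```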

28 Replication Example: Multiple Nodes in One Interval
- Each interval of the DHT may be maintained by several nodes at the same time
- A fixed positive number K indicates how many nodes must at least act within one interval
- Each data item is therefore replicated at least K times
[Figure: identifier intervals 1 ... 10, each maintained by several nodes]
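A small sketch of K-fold replication under the successor rule (hypothetical helper names): each key is stored on the K nodes that follow it on the identifier circle.

```python
import bisect

def replica_nodes(key_id: int, node_ids: list, k: int = 3):
    """Return the K successor nodes responsible for replicas of key_id."""
    nodes = sorted(node_ids)
    i = bisect.bisect_left(nodes, key_id)
    return [nodes[(i + j) % len(nodes)] for j in range(min(k, len(nodes)))]

nodes = [4, 12, 21, 33, 47, 58]
print(replica_nodes(54, nodes, k=3))   # -> [58, 4, 12]: the key survives up to two node failures
```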

29 Load Balancing in Distributed Hash Tables
Initial assumption: uniform key distribution
- The hash function gives every node an equal load, so load balancing is not needed
- Optimal distribution of documents across nodes: nodes equally distributed across the address space, data equally distributed across nodes
Is this assumption justifiable?
- Example: analysis of the distribution with 4096 Chord nodes and 500,000 documents
- Optimum would be ~122 documents per node, yet the observed distribution could benefit from load balancing
[Figure: frequency distribution of DHT nodes storing a certain number of documents]
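The skew can be reproduced with a quick simulation (a sketch, assuming randomly hashed node and document names rather than the measurement behind the original slide):

```python
# Sketch: how skewed is the load when 500,000 documents hash onto 4,096 Chord nodes?
# Names like "node-i" / "doc-i" and the 32-bit space are purely illustrative assumptions.
import bisect, hashlib
from collections import Counter

M = 2 ** 32                                            # assumed identifier space size

def sha_id(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

node_ids = sorted(sha_id(f"node-{i}") for i in range(4096))

def responsible(key: int) -> int:
    return bisect.bisect_left(node_ids, key) % len(node_ids)   # successor rule

load = Counter(responsible(sha_id(f"doc-{i}")) for i in range(500_000))
per_node = [load.get(i, 0) for i in range(len(node_ids))]
print("optimum per node ~", 500_000 / 4096)            # ~122 documents
print("actual min/max:", min(per_node), max(per_node)) # noticeably skewed without load balancing
```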

30 Load Balancing Algorithms
Several techniques have been proposed to ensure an equal data distribution (possible subject for a paper assignment):
1. Power of Two Choices: John Byers, Jeffrey Considine, and Michael Mitzenmacher, "Simple Load Balancing for Distributed Hash Tables"
2. Virtual Servers: Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, and Ion Stoica, "Load Balancing in Structured P2P Systems"
3. Thermal-Dissipation-based Approach: Simon Rieche, Leo Petrak, and Klaus Wehrle, "A Thermal-Dissipation-based Approach for Balancing Data Load in Distributed Hash Tables"
4. Simple Address-Space and Item Balancing: David Karger and Matthias Ruhl, "Simple, Efficient Load Balancing Algorithms for Peer-to-Peer Systems"

31 DHT Interfaces
Generic interface of Distributed Hash Tables:
- Provisioning of information: Put(key, value)
- Requesting of information (search for content): Get(key)
- Reply: value
- DHT approaches are interchangeable (with respect to this interface)
[Figure: a distributed application calls Put(key, value) and Get(key) on the Distributed Hash Table (CAN, Chord, Pastry, Tapestry, ...), which spans Node 1, Node 2, Node 3, ..., Node N and returns the value]
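A hedged sketch of what this generic interface looks like from the application's side; the `DHT` class below is a hypothetical local stand-in, since a real DHT would route each Put/Get to the responsible peer over the network:

```python
# Hypothetical, purely local stand-in for the generic DHT interface.
# A real DHT (CAN, Chord, Pastry, Tapestry, ...) would route these calls to the responsible node.
import hashlib

class DHT:
    def __init__(self, m: int = 160):
        self.m = m
        self.store = {}                                   # key ID -> value

    def _key_id(self, key: str) -> int:
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** self.m)

    def put(self, key: str, value) -> None:
        self.store[self._key_id(key)] = value             # provisioning of information

    def get(self, key: str):
        return self.store.get(self._key_id(key))          # requesting of information

dht = DHT()
dht.put("TLT2626-lec4.pdf", ("134.2.11.140", 4711))       # indirect storage: value is an (IP, port) pointer
print(dht.get("TLT2626-lec4.pdf"))
```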

32 Comparison: DHT vs. DNS
- Traditional name services follow a fixed mapping: DNS maps a logical node name to an IP address
- DHTs offer a flat/generic mapping of addresses, not bound to particular applications or services: the value in (key, value) may be an address, a document, or other data

33 Comparison: DHT vs. DNS
Domain Name System:
- Mapping: symbolic name -> IP address
- Built on a hierarchical structure with root servers
- Names refer to administrative domains
- Specialized to search for computer names and services
Distributed Hash Table:
- Mapping: key -> value; can easily realize DNS
- Does not need a special server
- Does not require a special name space
- Can find data that are located independently of computers

34 Comparison of Lookup Concepts

  System                    Per Node State   Communication Overhead   Fuzzy Queries   No False Negatives   Robustness
  Central Server            O(n)             O(1)                     yes             yes                  no
  Flooding Search           O(1)             O(n²)                    yes             no                   yes
  Distributed Hash Tables   O(log n)         O(log n)                 no              yes                  yes

35 Properties of DHTs
- Use of routing information for efficient search for content
- Keys are evenly (or not?) distributed across the nodes of the DHT: no bottlenecks, and a continuous increase in the number of stored keys is admissible
- Failure of nodes can be tolerated; survival of attacks is possible
- Self-organizing system
- Simple and efficient realization
- Supports a wide spectrum of applications: flat (hash) key without semantic meaning, value depends on the application

36 Learning Outcomes
Things to know:
- Differences between lookup concepts
- Fundamentals of DHTs and how a DHT works
- Be ready for specific examples of DHT algorithms
Try some yourself! Simulation using Python here: http://bit.ly/szazxt

37 Any questions? mathieu.devos@tut.fi