Distributed lookup services
Badri Nath, Rutgers University, badri@cs.rutgers.edu

References:
1. CAN: A Scalable Content-Addressable Network, Sylvia Ratnasamy et al., SIGCOMM 2001.
2. Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications, Ion Stoica et al., IEEE/ACM Transactions on Networking (TON), Volume 11, Issue 1, pages 17-32, February 2003.
3. The Impact of DHT Routing Geometry on Resilience and Proximity, Krishna P. Gummadi, Ramakrishna Gummadi, Steven D. Gribble, Sylvia Ratnasamy, Scott Shenker, and Ion Stoica, Proceedings of ACM SIGCOMM 2003, Karlsruhe, Germany, August 2003.

Distributed lookup services: a set of nodes cooperating as peers. Peers run special-purpose algorithms/software; the service doesn't have to be deployed at every node, and the overlay ignores the underlay network.

Search or lookup service: find a host that satisfies some property, in a distributed, dynamic, Internet-scale manner, with no central state and with nodes coming and going. Applications include storage, overlay networks, databases, and diagnosis; the service may locate an item, a node, a tuple, or an event.

Lookup services classification: centralized (Napster), flooding (Gnutella), and distributed hashing (DHT): CAN, Chord, Tapestry, Pastry, Kademlia.

Centralized (figure): a central server holds the directory; peers IP1..IP5 register their files, and a query such as "Slumdog Millionaire?" returns the peers that hold it (IP3, IP5). Example: Napster.

Napster: a client-server protocol over TCP. Connect to a well-known Napster server and upload the <file names, keywords> you want to share. To download, select the best client/peer; selection is done by pinging peers and choosing the one with the lowest latency or best transfer rate. Napster was popular for downloading music among peers, ran into copyright issues, and was shut down for legal reasons.

Napster pros and cons. Pros: simple; easy to control; the server can be made to scale with clusters, etc. Cons: the server is a bottleneck; it needs open servers (firewall, NAT); no security, no authentication, no anonymity.

Flooding: Gnutella is a file-sharing service based on flooding. A well-known node acts as an anchor, and nodes that have files inform this node of their existence. Nodes select other nodes as peers, forming an overlay based on some criteria, and each node stores a select number of files. If a file is not found locally, a query is sent to peers; peers respond to the query if they have the file, otherwise they reroute the query to their own neighbor peers, and so on. In effect, the network is flooded to search for the file.

Gnutella search and flooding. Search: a request by node NodeID for string S. Check the local system; if not found, form a descriptor <NodeID, S, N, T>, where N is a unique request ID and T is the TTL (time-to-live). Flood: send the descriptor to a fixed number (6 or 7) of peers; if S is not found, each peer decrements the TTL and forwards the descriptor to its own (6 or 7) peers. Flooding stops if the request ID has already been seen or the TTL expires.

Gnutella back propagation. When node B receives a request (NodeA, S, ReqID, T): if B has already seen ReqID or TTL = 0, do nothing; otherwise look up locally and, if found, send the response to NodeA. Eventually, by back propagation along the reverse path, the response reaches the originator.
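A minimal sketch of this query handling and back propagation, assuming hypothetical node objects with seen, parent, peers, local_files, node_id, and send() members (none of these names come from the slides):

def handle_query(node, sender, descriptor):
    # descriptor follows the slides: (origin_id, search_string, req_id, ttl)
    origin, s, req_id, ttl = descriptor
    if req_id in node.seen or ttl == 0:
        return                                  # drop duplicates and expired queries
    node.seen.add(req_id)
    node.parent[req_id] = sender                # remember the reverse path
    if s in node.local_files:
        node.send(sender, ("query-hit", req_id, node.node_id))
    else:
        for peer in node.peers:                 # flood to the fixed set of peers (6 or 7)
            if peer != sender:
                node.send(peer, ("query", (origin, s, req_id, ttl - 1)))

def handle_query_hit(node, req_id, holder_id):
    # back-propagate the hit along the reverse path toward the originator
    parent = node.parent.get(req_id)
    if parent is not None:
        node.send(parent, ("query-hit", req_id, holder_id))
    # else: this node originated the query and can now GET the file from holder_id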

Gnutella protocol messages. Broadcast messages: PING (initiate contact with peers, a hello; if booting, sent to the anchor node) and QUERY <NodeID, S, ReqID, TTL>. Response messages: PONG (reply to a ping, announcing self, with sharing information) and QUERY-HIT (contains the NodeID that has the requested file). Point-to-point messages: GET (retrieve the requested file) and PUSH (upload the file to me).

Gnutella flooding (figure): a query starts with TTL=3 and is flooded outward, with the TTL decremented at each hop (TTL=2, then TTL=1); query-hits travel back along the reverse path to the requester, which then downloads the file directly from the responding peer.

Gnutella issues: peers do not stay on all the time, yet all peers are treated the same; modem users were the culprits. A more intelligent overlay is needed.

Hashing: search for data based on a key. A hash function maps keys to a range [0, ..., N-1], e.g. h(x) = x mod N, and an item with key k is stored at index i = h(k). Issues: collisions and the selection of the hash function. Collisions are handled by chaining, linear probing, or double hashing.
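As a concrete illustration, a tiny mod-N hash table that handles collisions by chaining (one list per bucket); the bucket count of 13 matches the example on the next slide:

class ChainedHashTable:
    def __init__(self, n=13):
        self.n = n
        self.buckets = [[] for _ in range(n)]    # chaining: each bucket is a list

    def insert(self, k, v):
        self.buckets[k % self.n].append((k, v))  # index i = h(k) = k mod N

    def lookup(self, k):
        for key, v in self.buckets[k % self.n]:
            if key == k:
                return v
        return None

t = ChainedHashTable()
t.insert(18, "a"); t.insert(44, "b")             # 18 and 44 both hash to bucket 5
print(t.lookup(44))                              # -> b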

Conventional hashing (vs. search trees, etc.): map a key/index to the bucket holding the data, H(k) = k mod m (m prime), with a secondary hash p - (k mod p) for double hashing. Example with m = 13: H(18) = 18 mod 13 = 5, so 18 goes to bucket 5; H(41) = 41 mod 13 = 2, so 41 goes to bucket 2.

What if the number of buckets changes? When buckets increase or decrease, keys must be remapped: add buckets and H(k) = k mod 17, or remove two buckets and H(k) = k mod 11, and items move to the new buckets. What do we do in a distributed setting (Internet scale), where each node stores a portion of the key space?
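A quick experiment (not from the slides) showing why naive remapping is painful: when the modulus changes, almost every key lands in a different bucket, which is what consistent hashing later avoids.

# How many of 1000 sample keys change buckets when the bucket count changes?
keys = range(1000)
grow   = sum(1 for k in keys if k % 13 != k % 17)   # 13 -> 17 buckets
shrink = sum(1 for k in keys if k % 13 != k % 11)   # 13 -> 11 buckets
print(grow, shrink)   # the vast majority of keys move in both cases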

What is a DHT? Distributed Hash Table: the table is distributed among a set of nodes, and all nodes use the same hashing function. key = Hash(data); lookup(key) -> node ID that holds the data. Two problems: partitioning of the data, and lookup (routing).

CAN design: a key hashes to a point in d-dimensional space. A joining node randomly decides on a point P and uses existing CAN nodes to determine the node that owns the zone to which P belongs; that zone is then split between the existing CAN node and the new node.

Dynamic zone creation (figure): rotate the dimension on which to split; nodes n1..n5 end up owning rectangular zones of the [0,X] x [0,Y] coordinate space.

CAN key assignment (figure): keys K1, K2, K3 hash to points in the coordinate space and are stored at the nodes (n1..n5) whose zones contain those points.

CAN: inserting a key
node I::insert(K,V)
(1) a = h_x(K); b = h_y(K)     // hash the key along each dimension to a point (a, b)
(2) route (K,V) toward (a, b)
(3) the node whose zone contains (a, b) stores (K,V)

CAN: lookup
node J::retrieve(K)
(1) a = h_x(K); b = h_y(K)
(2) route retrieve(K) to (a, b); the node that owns (a, b) returns (K,V) to J

CAN: routing/lookup A node only maintains state for its immediate neighboring nodes

CAN: routing/lookup (figure): forward to the neighbor that minimizes the distance to the destination; in the example, a query for K3 is forwarded zone by zone across the [0,X] x [0,Y] space until it reaches the node whose zone contains K3's point.
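A minimal sketch of CAN-style hashing and greedy forwarding in two dimensions, assuming hypothetical node and zone objects (zone.contains, zone.center, neighbors, deliver); the per-dimension hashes h_x and h_y are simulated with salted SHA-1:

import hashlib

def can_point(key, side=1024):
    # hash the key along each dimension to get a point (a, b) in a side x side space
    a = int(hashlib.sha1((key + ":x").encode()).hexdigest(), 16) % side
    b = int(hashlib.sha1((key + ":y").encode()).hexdigest(), 16) % side
    return (a, b)

def route(node, point, message):
    # deliver if this node's zone contains the destination point
    if node.zone.contains(point):
        node.deliver(message)
        return
    # greedy step: forward to the neighbor whose zone center is closest to the point
    def dist(n):
        cx, cy = n.zone.center()
        return (cx - point[0]) ** 2 + (cy - point[1]) ** 2
    route(min(node.neighbors, key=dist), point, message)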

CAN: routing table

CAN: routing (figure): a message originating in the zone at (a, b) is forwarded greedily toward the zone containing the destination point (x, y).

CAN: node insertion
1) The new node discovers some node I already in CAN (via a bootstrap node).
2) The new node picks a random point (p, q) in the space.
3) I routes to (p, q) and discovers node J, the current owner of that zone.
4) J's zone is split in half; the new node owns one half.

CAN: node insertion Inserting a new node affects only a single other node and its immediate neighbors

CAN: node failures. For simple failures, know your neighbors' neighbors; when a node fails, one of its neighbors takes over its zone. Only the failed node's immediate neighbors are required for recovery.

CAN: performance. For a uniformly partitioned space with n nodes and d dimensions, the per-node number of neighbors is 2d, and the average routing path is O(d n^(1/d)) hops, i.e. O(sqrt(n)) for two dimensions; simulations confirm the analysis. The network can therefore scale without increasing per-node state. Next, Chord: log(n) fingers with log(n) hops.

CAN: discussion. Scalable: state information is O(d) at each node. Locality: nodes are neighbors in the overlay, not in the physical network, which causes latency stretch; the paper suggests other routing metrics, such as RTT relative to the distance progress made.

Consistent hashing: hash both keys and bucket IDs into some uniform name space (in the figure, points such as 0.1, 0.15, ..., 0.8 on the unit interval), and assign each key to the first bucket encountered going around the name space; collisions among bucket IDs should be rare. When servers/buckets come and go, only small local movements of keys are needed. Maintain a directory to quickly locate the server holding an item.
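A minimal consistent-hashing sketch under the assumption that SHA-1 of a name serves as its point on the ring; a key is assigned to the first bucket/node ID at or after it, wrapping past zero:

import bisect, hashlib

def ring_id(name, bits=16):
    # hash a key or a bucket/node name onto the same circular ID space
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

class ConsistentHash:
    def __init__(self, nodes):
        self.ring = sorted((ring_id(n), n) for n in nodes)
        self.ids = [i for i, _ in self.ring]

    def lookup(self, key):
        # first bucket encountered at or after the key's point, wrapping past zero
        pos = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[pos][1]

ch = ConsistentHash(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ch.lookup("slumdog-millionaire.mp4"))   # maps to the same node while the ring is unchanged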

Consistent hashing (figure): a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80. A key is stored at its successor, the node with the next higher ID (e.g., K80 at N90; K5 and K20 at N32).

Routing structures. Centralized directory: one node knows the cache points; O(n) state, fixed distance, single point of failure (Napster). Flat: every node knows about every other node; O(n^2) state, O(n^2) communication, fixed distance (RON). Hierarchical: maintain a tree; log(n) distance, but the root is a bottleneck.

Distributed hash table (DHT) layering (figure): a distributed application (e.g., file sharing) calls put(key, data) and get(key) on the distributed hash table (e.g., DHash), which in turn calls lookup(key) on the lookup service (e.g., Chord) to obtain the IP address of the node responsible for the key. The application may be distributed over many nodes, and the DHT distributes data storage over many nodes.
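A sketch of that layering, assuming a lookup(id) routine (Chord's job) that returns the node responsible for an ID and hypothetical node objects with a store dict; the DHT layer (DHash-style) just hashes the key and forwards put/get to the responsible node:

import hashlib

def key_id(key, bits=16):
    # hash the application key into the lookup service's ID space
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** bits)

class DHT:
    """Storage layer (DHash-style) built on top of a lookup service (Chord-style)."""
    def __init__(self, lookup):
        self.lookup = lookup                  # lookup(id) -> node responsible for id

    def put(self, key, data):
        self.lookup(key_id(key)).store[key] = data

    def get(self, key):
        return self.lookup(key_id(key)).store.get(key)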

DHT = Distributed Hash Table. A hash table allows you to insert, look up, and delete objects with keys; a distributed hash table allows you to do the same in a distributed setting (objects = files). Performance concerns: load balancing, fault tolerance, efficiency of lookups and inserts, and locality. Examples: Chord, Pastry, Tapestry, Plaxton, Viceroy, Kademlia, SkipNet, Symphony, Koorde, Ulysses, Apocrypha, Land, ORDI; "yet another DHT" (YADHT) papers have started appearing.

Variations. Geometry: the underlying graph structure (ring, tree, butterfly). Distance: the distance function between nodes. Routing: the protocol for selecting neighbors and routes.

The lookup problem (figure): a publisher inserts (key='boyle', value='Slumdog Millionaire') somewhere among nodes N1..N6 on the Internet; a client issues Lookup('boyle'). How does the lookup find the right node?

Centralized lookup, Napster (figure): the publisher at N4 registers SetLoc('boyle', N4) with a central DB; the client's Lookup('boyle') queries the DB. Simple, but O(N) state at the server and a single point of failure.

Flooded queries, Gnutella (figure): the publisher at N4 holds (key='boyle', value='Slumdog Millionaire'); the client's Lookup('boyle') is flooded through the overlay until it reaches N4. Robust, but worst case O(N) messages per lookup.

Routed queries, Chord (figure): the publisher at N4 holds (key='boyle', value='Slumdog Millionaire'); the client's Lookup('boyle') is routed through a small number of intermediate nodes directly toward N4.

Comparative performance

           Memory                  Lookup latency   #Messages per lookup
Napster    O(1) (O(N) at server)   O(1)             O(1)
Gnutella   O(N)                    O(N)             O(N)
Chord      O(log N)                O(log N)         O(log N)

Routing challenges: define a useful key nearness metric, keep the hop count small, keep the tables small, and stay robust despite rapid change. Chord emphasizes efficiency and simplicity.

Chord overview: Chord provides peer-to-peer hash lookup, Lookup(key) -> IP address; Chord itself does not store the data. How does Chord route lookups, and how does it maintain its routing tables?

O(n) lookup, O(1) state (figure): with only successor pointers, the query "Where is key 80?" is passed around the circle (nodes N10, N32, N60, N90, N105, N120) until it reaches N90, which holds K80.

A Simple Key Lookup. If each node knows only how to contact its current successor node on the identifier circle, all nodes can be visited in linear order. Queries for a given identifier are passed around the circle via these successor pointers until they encounter the node that contains the key.

Simple lookup algorithm
Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id (compared on the circle)
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
Correctness depends only on successors.
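A runnable version of this successor-only lookup, assuming each node is a dict with 'id' and 'successor' fields on a 2^m identifier circle (an assumption for illustration); the interval test handles the wraparound at zero:

def in_interval(x, a, b, m=7):
    # True if x lies in the half-open circular interval (a, b] on a 2^m ring
    size = 2 ** m
    x, a, b = x % size, a % size, b % size
    if a < b:
        return a < x <= b
    return x > a or x <= b                    # the interval wraps past zero

def lookup(node, key_id):
    # follow successor pointers until key_id falls between a node and its successor
    while not in_interval(key_id, node["id"], node["successor"]["id"]):
        node = node["successor"]
    return node["successor"]                  # this node is responsible for key_id

# example: four nodes on a 2^7 identifier circle
n1, n2, n3, n4 = ({"id": i} for i in (16, 45, 80, 112))
n1["successor"], n2["successor"], n3["successor"], n4["successor"] = n2, n3, n4, n1
print(lookup(n1, 40)["id"])                   # -> 45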

Join and Departure. When a node n joins the network, certain keys previously assigned to n's successor now become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n's successor.

Direct lookup, O(n) state (figure): if every node knows every other node, the node asking "Where is key 80?" contacts N90, which holds K80, directly.

Direct Key Lookup. If each node knows all current nodes on the identifier circle, the node that has the key can be visited directly, but node leaves and joins become expensive.

Chord IDs. Key identifier = SecureHash(key); node identifier = SecureHash(IP address). Both are uniformly distributed and exist in the same ID space. How do we map key IDs to node IDs, and what directory structure should each node maintain?

Chord properties. Efficient: O(log N) messages per lookup, where N is the total number of servers. Scalable: O(log N) state per node. Robust: survives massive failures. Since Chord, a number of other DHT proposals have appeared.

Routing table size vs. worst-case distance (trade-off figure)
- Routing table size n (maintain full state): worst-case distance O(1).
- Routing table size log n (Plaxton et al., Chord, Pastry, Tapestry): worst-case distance O(log n).
- Routing table size <= d (Ulysses, de Bruijn graphs such as Koorde, Viceroy with d = 7, CAN): worst-case distances between O(log n / log log n) and O(d n^(1/d)).
- Routing table size 0 (maintain no state): worst-case distance O(n).
Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network, ICNP 2003.

Node entry (figure): identifier circle 0..7 (m = 3); when a new node joins, the keys that now fall in its arc (keys 4, 5, 6) move to it from its successor, while keys 1 and 2 stay at their current nodes.

Node departure (figure): identifier circle 0..7; when a node leaves, all of its keys (keys 4, 5, 6) are reassigned to its successor, while keys 1, 2, and 7 remain at other nodes.

Scalable Key Location To accelerate lookups, Chord maintains additional routing information. A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node. The first finger of n is the immediate successor of n on the circle.

Scalable Key Location: Finger Tables. Each node n maintains a routing table with up to m entries (m is the number of bits in the identifiers), called the finger table. The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle: s = successor(n + 2^(i-1)). s is called the i-th finger of node n, denoted n.finger(i).
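A sketch of finger-table construction under the simplifying assumption that the full sorted list of node IDs is known (a real Chord node fills the entries via lookups); entry i is successor(n + 2^i), using the 0-indexed convention of the examples below:

def successor(node_ids, x, m=7):
    # first node ID clockwise from x on the 2^m circle (node_ids must be sorted)
    x %= 2 ** m
    for nid in node_ids:
        if nid >= x:
            return nid
    return node_ids[0]                        # wrap around past zero

def finger_table(n, node_ids, m=7):
    # entry i (0-indexed, as in the examples below) points to successor(n + 2^i)
    return [successor(node_ids, n + 2 ** i, m) for i in range(m)]

nodes = sorted([16, 32, 45, 80, 96, 112])     # node set from the N80 example below
print(finger_table(80, nodes))                # -> [96, 96, 96, 96, 96, 112, 16]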

Building a finger table: example with m = 3, identifiers <0..7>. Node n1 joins; all entries in its finger table are initialized to itself.
Succ. table at n1:  i = 0: id+2^i = 2, succ = 1;  i = 1: id+2^i = 3, succ = 1;  i = 2: id+2^i = 5, succ = 1.

Example (cont.): node n2 joins.
Succ. table at n1:  i = 0: id+2^i = 2, succ = 2;  i = 1: id+2^i = 3, succ = 1;  i = 2: id+2^i = 5, succ = 1.
Succ. table at n2:  i = 0: id+2^i = 3, succ = 1;  i = 1: id+2^i = 4, succ = 1;  i = 2: id+2^i = 6, succ = 1.

Example (cont.): nodes n0 and n6 join.
Succ. table at n0:  i = 0: id+2^i = 1, succ = 1;  i = 1: id+2^i = 2, succ = 2;  i = 2: id+2^i = 4, succ = 6.
Succ. table at n1:  i = 0: id+2^i = 2, succ = 2;  i = 1: id+2^i = 3, succ = 6;  i = 2: id+2^i = 5, succ = 6.
Succ. table at n2:  i = 0: id+2^i = 3, succ = 6;  i = 1: id+2^i = 4, succ = 6;  i = 2: id+2^i = 6, succ = 6.
Succ. table at n6:  i = 0: id+2^i = 7, succ = 0;  i = 1: id+2^i = 0, succ = 0;  i = 2: id+2^i = 2, succ = 2.

Scalable key location, example query (figure): the path of a query for key 54 starting at node 8.

Finger table example: finger table at N80, with m = 7 (the i-th entry at a peer with id n is the first peer with id >= n + 2^i (mod 2^m)). Nodes on the ring: N16, N32, N45, N80, N96, N112.
i = 0: 80 + 2^0 = 81,  ft[0] = 96
i = 1: 80 + 2^1 = 82,  ft[1] = 96
i = 2: 80 + 2^2 = 84,  ft[2] = 96
i = 3: 80 + 2^3 = 88,  ft[3] = 96
i = 4: 80 + 2^4 = 96,  ft[4] = 96
i = 5: 80 + 2^5 = 112, ft[5] = 112
i = 6: 80 + 2^6 = 144 mod 128 = 16, ft[6] = 16

Lookup with fingers
Lookup(my-id, key-id)
  look in the local finger table for the highest node n such that my-id < n < key-id (on the circle)
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
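A runnable sketch of the finger-based lookup, assuming each node is a dict with 'id', 'successor', and 'fingers' (ordered as built in the finger-table sketch above; all hypothetical names); each hop jumps to the closest preceding finger until the key falls between a node and its successor:

def between_open(x, a, b, m=7):
    # True if x lies strictly inside the circular interval (a, b) on a 2^m ring
    size = 2 ** m
    x, a, b = x % size, a % size, b % size
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(node, key_id, m=7):
    # done when key_id falls in (node, successor]; otherwise hop via the
    # closest preceding finger, falling back to the plain successor
    while not (between_open(key_id, node["id"], node["successor"]["id"], m)
               or key_id == node["successor"]["id"]):
        nxt = node["successor"]
        for f in node["fingers"]:                         # fingers are ordered clockwise
            if between_open(f["id"], node["id"], key_id, m):
                nxt = f                                   # keep the last one before key_id
        node = nxt
    return node["successor"]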

Lookups take O(log N) hops (figure): a Lookup(K19) starting at N32 follows finger pointers part-way around the ring (nodes N5, N10, N20, N60, N80, N99, N110) and reaches N20, the successor of K19, in a few hops.

Simulation results (figure): the average number of messages per lookup grows as (1/2) log2(N) with the number of nodes N; error bars mark the 1st and 99th percentiles.

Node join (say m = 7): the introducer directs N40 to N32; N40 initializes its successor to N45, and N32 updates its successor to N40. N40 periodically talks to its neighbors to update its finger table. (Figure: ring with N16, N32, N40, N45, N80, N96, N112.)

Node join (cont.): N40 may need to copy some files/keys from N45 (files with file IDs between 32 and 40). In the figure, N45 holds K34, K38, K42; K34 and K38 move to N40.
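A simplified join sketch reusing lookup() and between_open() from the finger-based lookup sketch above, with a per-node 'store' dict (all hypothetical names): the new node finds its successor via any existing node, splices itself in, and pulls the keys it is now responsible for; real Chord repairs the other pointers lazily via periodic stabilization.

def join(new_node, introducer, m=7):
    # find the node currently responsible for new_node's ID and splice in before it
    succ = lookup(introducer, new_node["id"], m)
    new_node["successor"] = succ
    new_node.setdefault("store", {})
    # keys that no longer fall in (new_node, succ] move to the new node
    # (in the figure: K34 and K38 move from N45 to N40, K42 stays)
    for k in list(succ["store"]):
        stays = between_open(k, new_node["id"], succ["id"], m) or k == succ["id"]
        if not stays:
            new_node["store"][k] = succ["store"].pop(k)
    # real Chord then fixes the predecessor's successor pointer (N32 -> N40)
    # and everyone's finger tables through periodic stabilization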

Intermediate peer failure (say m = 7, figure): the file "Trump" with key K40 is stored at N45. When the intermediate node N32 has failed, the lookup for K40 fails, because N16 does not know N45.

Replicate successor pointers (figure): each node maintains r successor pointers, so even with N32 failed, the lookup for K40 can skip over it and still reach N45, which stores "Trump" with key K40.

Node failure (figure): here the node that stores the file has itself failed; the lookup for K40 fails because N45 is not responding.

Node failure, with replication (figure): replicate files at k successor and predecessor nodes; even with N45 down, "Trump" with key K40 is also stored at N32, so the lookup still succeeds.

Chord latency stretch (figure): neighbors on the virtual ring can be physically far apart (e.g., a node in the US whose ring neighbor is in Japan), so overlay hops may mix wide-area links of roughly 100 ms with local links of 5-10 ms.

Chord Summary Chord provides peer-to-peer hash lookup Efficient: O(log(n)) messages per lookup Robust as nodes fail and join Good primitive for peer-to-peer systems

CAN vs. Chord: from Chord to CAN means going from one-dimensional hashing to d-dimensional hashing.
CAN: nodes form an overlay in d-dimensional space; node IDs are chosen randomly from the d-space, and object IDs (keys) are chosen from the same d-space; zones are split and merged as nodes join and leave.
Chord: nodes form an overlay of a ring; finger tables are used to improve lookup to O(log N) and are adjusted as nodes join and leave.
In both systems, joins and leaves cause only local movement of data objects.