6.824 2006: Scalable Lookup

Prior focus has been on traditional distributed systems
  e.g. NFS, DSM/Hypervisor, Harp
  Machine room: well maintained, centrally located.
  Relatively stable population: can be known in its entirety.
  Focus on performance, semantics, recovery.
  Biggest system might be Porcupine.

Now: Internet-scale systems
  Machines owned by you and me: no central authority.
  Huge number of distributed machines: can't know everyone.
  e.g. from e-mail to Napster.

Problems
  How do you name nodes and objects?
  How do you find other nodes in the system (efficiently)?
  How should data be split up between nodes?
  How to prevent data from being lost? How to keep it available?
  How to provide consistency?
  How to provide security? Anonymity?

What structure could be used to organize nodes?
  Central contact point: Napster
  Hierarchy: DNS for e-mail, WWW
  Flat? Let's look at a system with a flat interface: a DHT.

Scalable lookup: provide an abstract interface to store and find data.
Typical DHT interface:
  put(key, value)
  get(key) -> value
  loose guarantees about keeping data alive
  O(log n) hops, even for new nodes
  guarantees about load balance, even for new nodes

Potential DHT applications:
  publishing: DHT keys are like links
  file system: use the DHT as a sort of distributed disk drive;
    keys are like block numbers (Petal is a little bit like this)
  location tracking: keys are e.g. cell phone numbers;
    a value is a phone's current location

Basic idea
  Two layers: routing (lookup) and data storage.
  The routing layer handles naming and arranging nodes, and then finding them.
  The storage layer handles actually putting and maintaining data.
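
To make the two-layer split concrete, here is a toy, single-process sketch (class and function names are made up for illustration, not taken from any real DHT): the storage layer's put/get are written purely in terms of a routing layer's lookup(), and the rest of the lecture is about making that lookup scalable.

import hashlib

def key_id(key):
    # Map an application key to a 160-bit ID (SHA-1, as the notes suggest).
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

class LocalNode:
    """Stand-in for one node's storage layer: just a dict of key ID -> value."""
    def __init__(self):
        self.local_store = {}

class OneNodeRouting:
    """Trivial routing layer: every key maps to the single local node.
    A real DHT replaces lookup() with the scalable algorithm below."""
    def __init__(self, node):
        self.node = node
    def lookup(self, kid):
        return self.node

class DHT:
    """Storage layer: put/get expressed only in terms of the routing layer's lookup()."""
    def __init__(self, routing):
        self.routing = routing
    def put(self, key, value):
        self.routing.lookup(key_id(key)).local_store[key_id(key)] = value
    def get(self, key):
        return self.routing.lookup(key_id(key)).local_store.get(key_id(key))

# Usage:
dht = DHT(OneNodeRouting(LocalNode()))
dht.put("song.mp3", b"...bytes...")
assert dht.get("song.mp3") == b"...bytes..."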

What does a complete algorithm have to do?
  1. Define IDs, and the document ID to node ID assignment.
  2. Define per-node routing table contents.
  3. A lookup algorithm that uses the routing tables.
  4. A join procedure to reflect new nodes in the tables.
  5. Failure recovery.
  6. Move data around when nodes join.
  7. Make new replicas when nodes fail.

Typical approach:
  Give each node a unique ID.
  Have a rule for assigning keys to nodes based on node/key IDs,
    e.g. key X goes on the node with the ID nearest to X.
  Now how, given X, do we find that node?
  Arrange the nodes in an ID space based on their IDs, i.e. use the ID as coordinates.
  Build a global sense of direction. Examples:
    1D line, 2D square, tree based on bits, hypercube, or ID circle.
  Build routing tables to allow ID-space navigation.
    Each node knows about its ID-space neighbors,
      i.e. knows the neighbors' IDs and IP addresses.
    Perhaps each node knows a few farther-away nodes,
      to move long distances quickly.

The "Chord" peer-to-peer lookup system
  By Stoica, Morris, Karger, Kaashoek and Balakrishnan
  http://pdos.csail.mit.edu/chord/
  An example system of this type.

ID-space topology
  Ring: all IDs are 160-bit numbers, viewed in a ring.
  Everyone agrees on how the ring is divided between nodes,
    just based on the ID bits.

Assignment of key IDs to node IDs?
  A key is stored on the first node whose ID is equal to or greater than the key ID.
  Closeness is defined as the "clockwise distance".
  If node and key IDs are uniform, we get reasonable load balance.
  Node IDs can be assigned, chosen randomly, a SHA-1 hash of the IP address, ...
  Key IDs can be derived from the data, or chosen by the user.

Routing?
  The query is at some node.
  That node needs to forward the query to a node "closer" to the key.
  Simplest system: either you are the "closest", or your neighbor is closer.
  Hand off queries in a clockwise direction until done.
  The only state necessary is the "successor":

    n.find_successor(k):
      if k in (n, successor]:
        return successor
      else:
        return successor.find_successor(k)

  Slow but steady; how can we make this faster?
  This looks like a linked list: O(n).
  Can we make it more like a binary search?
    We need to be able to halve the distance at each step.

Finger table routing:
  Keep track of nodes exponentially further away.
  New state: succ(n + 2^i).
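
As a concrete illustration of this state, here is a small sketch (not Chord's actual implementation; the node set is chosen to match the example finger table shown below): given a full list of node IDs, compute the successor of an ID on the ring and the succ(n + 2^i) finger targets for one node.

M = 6  # ID bits for the toy example; Chord uses 160

def succ(ids, x):
    """First node ID clockwise from x (i.e. >= x, wrapping around the ring)."""
    ids = sorted(ids)
    for nid in ids:
        if nid >= x:
            return nid
    return ids[0]  # wrapped past the highest ID

def finger_targets(ids, n):
    """succ(n + 2^i) for i = 0 .. M-1, with arithmetic mod 2^M."""
    return [succ(ids, (n + 2**i) % 2**M) for i in range(M)]

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(succ(nodes, 26))           # key 26 is stored on node 32
print(finger_targets(nodes, 8))  # [14, 14, 14, 21, 32, 42] for offsets 1, 2, 4, 8, 16, 32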

Many of these entries will be the same; in a full system, expect O(lg N) distinct fingers.

    n.find_successor(k):
      if k in (n, successor]:
        return successor
      else:
        n' = closest_preceding_node(k)
        return n'.find_successor(k)

  Maybe node 8's finger table looks like this:
    succ(8 + 1)  = 14
    succ(8 + 2)  = 14
    succ(8 + 4)  = 14
    succ(8 + 8)  = 21
    succ(8 + 16) = 32
    succ(8 + 32) = 42

  There's a complete tree rooted at every node:
    it starts at that node's row 0,
    and is threaded through other nodes' row 1, &c.
  Every node acts as a root, so there's no root hotspot.
  This is *better* than simply arranging the nodes in one tree.

How does a new node acquire correct tables?
  General approach:
    Assume the system starts out with correct routing tables.
    Use the routing tables to help the new node find information.
    Add the new node in a way that maintains correctness.
  The new node issues a lookup for its own key to any existing node.
    This finds the new node's successor.
    Ask that node for its finger table.
  At this point the new node can forward queries correctly,
    tweaking its own finger table as necessary.
  Does routing *to* us now work?
    If the new node doesn't do anything, a query will go to where it would
      have gone before we joined, i.e. to the existing node numerically closest to us.
    So, for correctness, we need to let people know that we are here.
    Each node keeps track of its current predecessor.
    When you join, tell your successor that its predecessor has changed.
    Periodically ask your successor who its predecessor is:
      if that node is closer to you, switch to it.
  Is that enough?
    Everyone must also continue to update their finger tables:
      periodically look up your (n + 2^i)-th key.
  What about concurrent joins?
    E.g. two new nodes with very close IDs might have the same successor,
      e.g. 44 and 46: both may find node 48... a spiky tree!
    Good news: periodic stabilization takes care of this.
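
Here is a minimal, single-process sketch of the join and periodic maintenance just described. The stabilize/notify/fix_fingers names follow the Chord paper, but the code is deliberately simplified (successor-only routing, sequential joins, no failures, no data movement), so treat it as an illustration rather than the real protocol.

M = 6  # ID bits for the toy example (Chord uses 160)

def between(x, a, b):
    # True if x lies strictly inside the clockwise ring interval (a, b).
    # When a == b the interval is the whole ring (the single-node case).
    if a == b:
        return x != a
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self      # a lone node is its own successor
        self.predecessor = None
        self.finger = [self] * M

    def find_successor(self, k):
        # Successor-only routing, as in the simple lookup above.
        if between(k, self.id, self.successor.id) or k == self.successor.id:
            return self.successor
        return self.successor.find_successor(k)

    def join(self, existing):
        # Look up our own ID via any existing node; that node's answer is our successor.
        self.successor = existing.find_successor(self.id)

    def stabilize(self):
        # Ask our successor for its predecessor; if that node sits between us
        # and our successor, it becomes our new successor. Then notify it.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        # "other" thinks it may be our predecessor.
        if self.predecessor is None or between(other.id, self.predecessor.id, self.id):
            self.predecessor = other

    def fix_fingers(self):
        # Periodically refresh finger i by looking up (id + 2^i) mod 2^M.
        for i in range(M):
            self.finger[i] = self.find_successor((self.id + 2**i) % 2**M)

# Toy driver: sequential joins with a few stabilization rounds in between.
ring = [Node(1)]
for nid in (8, 14, 21, 32, 42):
    n = Node(nid)
    n.join(ring[0])
    ring.append(n)
    for _ in range(3):
        for m in ring:
            m.stabilize()
            m.fix_fingers()

print([(m.id, m.successor.id) for m in sorted(ring, key=lambda m: m.id)])
# -> [(1, 8), (8, 14), (14, 21), (21, 32), (32, 42), (42, 1)]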

What about node failures?
  Assume nodes fail without warning; this is strictly harder than graceful departure.
  Two issues:
    Other nodes' routing tables refer to the dead node.
    The dead node's predecessor has no successor.
  If you try to route via the dead node, detect the timeout and treat it as an
    empty table entry, i.e. route to the numerically closer entry instead.
  Repair: ask any node on the same row for a copy of its corresponding entry,
    or any node on the rows below; all of these share the right prefix.
  For the missing successor:
    The failed node might have been the closest to the key ID! Need to know the next-closest.
    Maintain a _list_ of successors: r successors.
    If you expect really bad luck, maintain O(log N) successors.
  We can route around failure; the system is effectively self-correcting.

Locality
  A lookup takes O(log n) messages,
    but they go to random nodes on the Internet, which will often be very far away.
  Can we route through nodes close to us on the underlying network?
  This boils down to whether we have choices:
    if there are multiple correct next hops, we can try to choose the closest.
    Chord doesn't allow much choice.
    Observe: the strict successor for a finger is not necessary.
    Sample the nodes in the successor list of the true finger, and pick the closest.
  What's the effect?
    Individual hops have lower latency,
      but there is less and less choice (lower node density) as you get close in ID space,
      so the last few hops are likely to be very long.
    Thus you don't *end up* close to the initiating node; you just get there quicker.
  How fast could proximity be?
    1 + 1/4 + 1/16 + 1/64 + ... , a geometric series that sums to about 4/3.
    Not as good as real shortest-path routing!
  Any downside to locality routing?
    Harder to prove independent failure.
      Maybe no big deal, since successor lists aren't chosen for locality.
    Easier to trick me into using malicious nodes in my tables.

What about security?
  Self-authenticating data, e.g. key = SHA1(value),
    so a DHT node can't forge data.
    Of course it's annoying to have immutable data...
  Can a DHT node claim that data doesn't exist?
    Yes, though perhaps you can check other replicas.
  Can a host join with IDs chosen to sit under every replica?
    Or "join" many times, so it is most of the DHT nodes?
    How are IDs chosen?
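
A quick sketch of the self-authenticating-data idea (the put/get helpers here are just for illustration): because the key is the SHA-1 hash of the value, a client can verify whatever an untrusted node returns.

import hashlib

store = {}  # stands in for whatever untrusted DHT node holds the block

def put(value: bytes) -> str:
    key = hashlib.sha1(value).hexdigest()   # key = SHA1(value)
    store[key] = value
    return key

def get(key: str) -> bytes:
    value = store[key]                      # fetched from an untrusted node
    if hashlib.sha1(value).hexdigest() != key:
        raise ValueError("node returned forged or corrupted data")
    return value

k = put(b"some immutable block")
assert get(k) == b"some immutable block"

store[k] = b"tampered"                      # a malicious node rewrites the block...
try:
    get(k)
except ValueError as err:
    print(err)                              # ...and the client detects it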

Why not just keep complete routing tables?
  So you can always route in one hop?
  Danger in large systems: timeouts, and the cost of keeping the tables up to date.
  Accordion (NSDI '05) trades off between keeping complete state and performance,
    subject to a bandwidth budget.

Are there any reasonable applications for DHTs?
  For example, could you build a DHT-based Gnutella?

Next time: wide-area storage.