Distributed Hash Tables

Distributed Hash Tables. CS6450: Distributed Systems, Lecture 11. Ryan Stutsman. Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed for use under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Some material taken/derived from MIT 6.824 by Robert Morris, Frans Kaashoek, and Nickolai Zeldovich.

Today: 1. Peer-to-Peer Systems (Napster, Gnutella, BitTorrent, challenges) 2. Distributed Hash Tables 3. The Chord Lookup Service 4. Concluding thoughts on DHTs and P2P

What is a Peer-to-Peer (P2P) system? (Figure: several nodes connected to one another over the Internet.) A distributed system architecture: no centralized control, nodes that are roughly symmetric in function, and a large number of unreliable nodes.

Why might P2P be a win? High capacity for services through parallelism: many disks, many network connections, many CPUs. The absence of a centralized server or servers may mean: less chance of service overload as load increases, easier deployment, a single failure won't wreck the whole system, and the system as a whole is harder to attack.

P2P adoption: successful adoption in some niche areas. 1. Client-to-client (legal and illegal) file sharing: popular data, but the owning organization has no money. 2. Digital currency: no natural single owner (Bitcoin). 3. Voice/video telephony: user-to-user anyway; the issues there are privacy and control.

Example: Classic BitTorrent. 1. The user clicks on a download link and gets a torrent file with the content hash and the IP address of a tracker. 2. The user's BitTorrent (BT) client talks to the tracker; the tracker tells it the list of peers who have the file. 3. The user's BT client downloads the file from one or more peers. 4. The user's BT client tells the tracker it has a copy now, too. 5. The user's BT client serves the file to others for a while. This provides huge download bandwidth without expensive servers or network links.

The lookup problem. (Figure: a publisher at N4 calls put("Star Wars.mov", [content]); a client elsewhere on the Internet calls get("Star Wars.mov"). Which of the nodes N1..N6 should the client ask?)

Centralized lookup (Napster). (Figure: the publisher N4, holding key = "Star Wars.mov", value = [content], registers SetLoc("Star Wars.mov", IP address of N4) with a central DB; the client asks the DB Lookup("Star Wars.mov").) Simple, but O(N) state at the directory and a single point of failure.

Flooded queries (original Gnutella). (Figure: the client floods Lookup("Star Wars.mov") to its neighbors, which forward it until it reaches the publisher N4 holding key = "Star Wars.mov", value = [content].) Robust, but O(N) messages per lookup, where N is the number of peers.

Routed DHT queries (Chord). (Figure: the client issues Lookup(H(audio data)); the query is routed hop by hop toward the publisher N4, which stores key = H(audio data), value = [content].) Can we make lookup robust, with reasonable state and a reasonable number of hops?

Today: 1. Peer-to-Peer Systems 2. Distributed Hash Tables 3. The Chord Lookup Service 4. Concluding thoughts on DHTs and P2P

What is a DHT (and why)? A local hash table: key = Hash(name); put(key, value); get(key) → value. The service it provides is constant-time insertion and lookup. How can I do (roughly) this across millions of hosts on the Internet? That is a Distributed Hash Table (DHT).

What is a DHT (and why)? Distributed Hash Table: key = hash(data); lookup(key) → IP addr (the Chord lookup service); send-rpc(IP address, put, key, data); send-rpc(IP address, get, key) → data. The goal is partitioning data in truly large-scale distributed systems: tuples in a global database engine, data blocks in a global file system, files in a P2P file-sharing system.
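
A minimal Python sketch of how these two layers might fit together; the Node class, lookup(), put(), and get() here are illustrative stand-ins with a placeholder partitioning rule, not the actual Chord/DHash code:

    import hashlib

    class Node:
        """One DHT participant holding a slice of the key space."""
        def __init__(self, name):
            self.name = name
            self.store = {}               # this node's local hash table

    NODES = [Node("n%d" % i) for i in range(4)]

    def key_id(data: bytes) -> int:
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def lookup(key: int) -> Node:
        # Placeholder partitioning; real Chord routes to the key's successor node.
        return NODES[key % len(NODES)]

    def put(data: bytes) -> int:
        key = key_id(data)
        lookup(key).store[key] = data     # in practice: send-rpc(node IP, put, key, data)
        return key

    def get(key: int) -> bytes:
        return lookup(key).store[key]     # in practice: send-rpc(node IP, get, key)

    k = put(b"a data block")
    assert get(k) == b"a data block"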

Cooperative storage with a DHT. (Figure: a distributed application calls put(key, data) and get(key) on the distributed hash table (DHash), which in turn calls lookup(key) on the lookup service (Chord) to obtain a node's IP address.) The app may be distributed over many nodes; the DHT distributes data storage over many nodes.

BitTorrent over DHT. BitTorrent can use a DHT instead of (or along with) a tracker. BT clients use the DHT with key = file content hash ("infohash") and value = IP address of a peer willing to serve the file; the DHT can store multiple values (i.e., IP addresses) for a key. A client does get(infohash) to find other clients willing to serve, and put(infohash, my-ipaddr) to identify itself as willing.
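
As a sketch of how a client might drive that interface; the dht object and announce_and_find_peers() are hypothetical stand-ins, not a real BitTorrent client API:

    def announce_and_find_peers(dht, infohash: bytes, my_ip: str) -> list:
        """Find peers for one torrent and register ourselves as willing to serve."""
        peers = dht.get(infohash) or []      # IP addresses already stored under this infohash
        dht.put(infohash, my_ip)             # add our own address as another value for the key
        return [ip for ip in peers if ip != my_ip]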

Why might a DHT be a win for BitTorrent? The DHT comprises a single giant tracker, less fragmented than many separate trackers, so peers are more likely to find each other. Also, a classic tracker may be too exposed to legal and other attacks.

Why the put/get DHT interface? The API supports a wide range of applications: the DHT imposes no structure or meaning on keys, and key/value pairs are persistent and global. You can store keys inside other DHT values, and thus build complex data structures.

Why might DHT design be hard? Decentralized: no central authority. Scalable: low network traffic overhead. Efficient: find items quickly (latency). Dynamic: nodes fail, new nodes join.

Today: 1. Peer-to-Peer Systems 2. Distributed Hash Tables 3. The Chord Lookup Service (basic design; integration with the DHash DHT; performance)

Chord lookup algorithm properties. Interface: lookup(key) → IP address. Efficient: O(log N) messages per lookup, where N is the total number of servers. Scalable: O(log N) state per node. Robust: survives massive failures. Simple to analyze.

Chord identifiers. Key identifier = SHA-1(key); node identifier = SHA-1(IP address). SHA-1 distributes both uniformly. How does Chord partition data, i.e., map key IDs to node IDs?
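
As a concrete illustration of the identifier scheme, a small Python sketch; the chord_id() helper and the 7-bit M are assumptions chosen to match the toy figures, whereas Chord itself uses the full 160-bit SHA-1 output:

    import hashlib

    M = 7                          # identifier bits for the toy examples; Chord uses 160

    def chord_id(text: str) -> int:
        """Map a key name or an IP address onto the 2^M-point identifier ring."""
        digest = hashlib.sha1(text.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** M)

    print(chord_id("Star Wars.mov"))    # a key identifier
    print(chord_id("10.0.0.4:8000"))    # a node identifier for one peer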

Consistent hashing [Karger 97]. (Figure: a circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80 placed on it.) A key is stored at its successor: the node with the next-higher ID.
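
A minimal sketch of that placement rule, assuming node and key identifiers are already integers on the ring; the successor() helper is an illustrative local function, not Chord's RPC:

    import bisect

    def successor(node_ids, key):
        """First node ID at or after key on the ring, wrapping past the top."""
        ids = sorted(node_ids)
        return ids[bisect.bisect_left(ids, key) % len(ids)]

    nodes = [32, 90, 105]                 # the node IDs from the figure
    assert successor(nodes, 80) == 90     # K80 is stored at N90
    assert successor(nodes, 20) == 32     # K20 at N32
    assert successor(nodes, 110) == 32    # a key past N105 wraps around to N32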

Chord: Successor pointers. (Figure: nodes N10, N32, N60, N90, N105, N120 on the ring, each with a pointer to its successor; key K80 falls between N60 and N90.)

Basic lookup. (Figure: a node asks "Where is K80?" and the query is forwarded around the ring via successor pointers until it reaches K80's successor, N90.)

Simple lookup algorithm:

    Lookup(key-id)
        succ ← my successor
        if my-id < succ < key-id        // next hop
            call Lookup(key-id) on succ
        else                            // done
            return succ

Correctness depends only on successors.
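
A runnable Python sketch of the same rule on a toy in-memory ring; the Node class and between() helper are assumptions, and the interval test handles the ring wraparound that the one-line inequality on the slide glosses over:

    class Node:
        def __init__(self, node_id: int):
            self.id = node_id
            self.successor = None            # set after all nodes are created

    def between(x: int, a: int, b: int) -> bool:
        """True if x lies in the half-open ring interval (a, b]."""
        return (a < x <= b) if a < b else (x > a or x <= b)

    def lookup(node: Node, key_id: int) -> Node:
        """Walk successor pointers until we reach the key's successor: O(N) hops."""
        while not between(key_id, node.id, node.successor.id):
            node = node.successor            # next hop
        return node.successor                # done

    ids = [10, 32, 60, 90, 105, 120]
    ring = [Node(i) for i in ids]
    for a, b in zip(ring, ring[1:] + ring[:1]):
        a.successor = b
    assert lookup(ring[0], 80).id == 90      # K80 is stored at N90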

Improving performance. Problem: forwarding through successors is slow; the data structure is a linked list, so lookups take O(N) hops. Idea: can we make it more like a binary search? We need to be able to halve the distance to the target at each step.

Finger tables allow log N-time lookups. (Figure: node N80 keeps finger pointers to the nodes ½, ¼, 1/8, 1/16, 1/32, and 1/64 of the way around the ring.)

Finger i points to the successor of n + 2^i. (Figure: from N80, the finger for 80 + 2^5 = 112 (K112) points to N120, its successor; the remaining fingers cover ½, ¼, 1/8, ... of the ring.)
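
A sketch of how one node's finger table might be computed, assuming a 7-bit identifier space to match the figures; finger_table() and successor() are illustrative helpers, not the Chord RPCs:

    import bisect

    def successor(node_ids, key):
        ids = sorted(node_ids)
        return ids[bisect.bisect_left(ids, key) % len(ids)]

    def finger_table(node_ids, n, m=7):
        """finger[i] = successor of (n + 2^i) mod 2^m, for i = 0..m-1."""
        return [successor(node_ids, (n + 2 ** i) % (2 ** m)) for i in range(m)]

    nodes = [10, 32, 60, 80, 90, 105, 120]
    print(finger_table(nodes, 80))    # the i = 5 finger targets (80 + 32) mod 128 = 112 -> N120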

Implication of finger tables: a binary lookup tree rooted at every node, threaded through other nodes' finger tables. This is better than simply arranging the nodes in a single tree: every node acts as a root, so there's no root hotspot and no single point of failure, but there is a lot more state in total.

Lookup with finger table:

    Lookup(key-id)
        look in local finger table for highest n: my-id < n < key-id
        if n exists
            call Lookup(key-id) on node n    // next hop
        else
            return my successor              // done
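
A runnable Python sketch of finger-based routing on the same kind of toy ring as before; names such as in_open(), in_half_open(), and succ_of() are assumptions made for illustration:

    M = 7                                    # identifier bits for the toy ring

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = None
            self.fingers = []                # fingers[i] = successor of (id + 2^i) mod 2^M

    def in_open(x, a, b):                    # x in (a, b) on the ring
        return (a < x < b) if a < b else (x > a or x < b)

    def in_half_open(x, a, b):               # x in (a, b] on the ring
        return (a < x <= b) if a < b else (x > a or x <= b)

    def lookup(node, key_id):
        """Route to key_id's successor, roughly halving the remaining distance per hop."""
        while not in_half_open(key_id, node.id, node.successor.id):
            nxt = node.successor
            for f in reversed(node.fingers):         # highest finger preceding the key
                if in_open(f.id, node.id, key_id):
                    nxt = f
                    break
            node = nxt                               # next hop
        return node.successor                        # done

    # Build the toy ring and each node's finger table.
    ids = [10, 32, 60, 80, 90, 105, 120]
    ring = {i: Node(i) for i in ids}
    for a, b in zip(ids, ids[1:] + ids[:1]):
        ring[a].successor = ring[b]
    def succ_of(x):
        return ring[min((i for i in ids if i >= x), default=ids[0])]
    for n in ring.values():
        n.fingers = [succ_of((n.id + 2 ** i) % (2 ** M)) for i in range(M)]

    assert lookup(ring[80], 19).id == 32     # K19's successor on this toy ring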

Lookups take O(log N) hops. (Figure: a ring with N5, N10, N20, N32, N60, N80, N99, N110; N80 issues Lookup(K19), which is routed via fingers to K19's successor, N20.)

An aside: is log(N) fast or slow? For a million nodes, it's 20 hops. If each hop takes 50 milliseconds, lookups take a second. If each hop has a 10% chance of failure, that's a couple of timeouts per lookup. So in practice log(N) is much better than O(N), but not great.
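
The arithmetic behind those numbers, as a quick back-of-the-envelope check:

    import math

    n = 1_000_000
    hops = math.log2(n)                   # ~19.9, i.e. about 20 hops
    latency = hops * 0.050                # 50 ms per hop -> ~1 second per lookup
    expected_failures = hops * 0.10       # ~2 hops per lookup hit a failed node
    print(round(hops, 1), round(latency, 2), round(expected_failures, 1))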

Joining: linked list insert. (Figure: new node N36 arrives between N25 and N40; step 1: N36 does Lookup(36) to find its successor, N40, which currently holds keys K30 and K38.)

Join (2). (Figure: step 2: N36 sets its own successor pointer to N40.)

Join (3). (Figure: step 3: copy keys 26..36 from N40 to N36, so K30 now also lives at N36.)

Notify messages maintain predecessors. (Figure: N25 sends notify(N25) to N36 and N36 sends notify(N36) to N40, so each node learns of its new predecessor.)

Stabilize messages fix successors. (Figure: N25 sends stabilize to N40, which replies "My predecessor is N36"; N25 then updates its successor pointer to N36.)

Joining: summary. (Figure: after the join, the ring runs N25 → N36 → N40; keys 26..36, here K30, have been copied to N36, and N40 still holds K38.) The predecessor pointer allows linking in the new node; finger pointers are updated in the background; correct successors are enough to produce correct lookups.
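
A compressed Python sketch of the join steps above, under toy-ring assumptions; real Chord runs stabilize and notify periodically and concurrently, whereas here stabilize also performs the notify and all names are illustrative:

    class Node:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = self
            self.predecessor = None
            self.store = {}

    def in_half_open(x, a, b):                        # x in (a, b] on the ring
        return (a < x <= b) if a < b else (x > a or x <= b)

    def find_successor(start, key_id):                # successor-pointer walk, as earlier
        node = start
        while not in_half_open(key_id, node.id, node.successor.id):
            node = node.successor
        return node.successor

    def join(new, existing):
        """1. Lookup(new.id)  2. set our successor  3. copy the keys that now map to us."""
        new.successor = find_successor(existing, new.id)
        for k in list(new.successor.store):
            if not in_half_open(k, new.id, new.successor.id):
                new.store[k] = new.successor.store[k]

    def stabilize(node):
        """Adopt a newly joined node as successor, then notify it that we may be its predecessor."""
        x = node.successor.predecessor
        if x is not None and x is not node.successor and in_half_open(x.id, node.id, node.successor.id):
            node.successor = x
        succ = node.successor
        if succ.predecessor is None or in_half_open(node.id, succ.predecessor.id, succ.id):
            succ.predecessor = node

    # Ring of N25 <-> N40, with K30 and K38 stored at N40.
    n25, n40 = Node(25), Node(40)
    n25.successor, n40.successor = n40, n25
    n25.predecessor, n40.predecessor = n40, n25
    n40.store = {30: "data30", 38: "data38"}

    n36 = Node(36)
    join(n36, n25)                  # N36 links in after N25 and copies K30
    stabilize(n36)                  # N36 notifies N40
    stabilize(n25)                  # N25 learns its successor is now N36
    assert n25.successor is n36 and n36.successor is n40 and 30 in n36.store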

Failures may cause incorrect lookups. (Figure: a ring segment with N80, N85, N102, N113, N120, N10; a Lookup(K90) arrives at N80 after some of the nodes just past it have failed.) N80 does not know its correct successor, so the lookup goes wrong.

Successor lists. Each node stores a list of its r immediate successors; after a failure, it will still know its first live successor. Correct successors guarantee correct lookups, though the guarantee holds only with some probability.

Choosing the successor list length. Assume one half of the nodes fail. Then P(successor list all dead) = (1/2)^r, i.e., the probability that this node breaks the Chord ring; this depends on failures being independent. A successor list of size r = O(log N) makes this probability 1/N: low for large N.
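
A quick numeric check of that bound, assuming independent failures with each node dead with probability 1/2:

    import math

    n = 1_000_000
    r = math.ceil(math.log2(n))          # r = O(log N); here 20 successors
    p_ring_break = 0.5 ** r              # (1/2)^r, roughly 1/N, about 1e-6 per node
    print(r, p_ring_break)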

Lookup with fault tolerance:

    Lookup(key-id)
        look in local finger table and successor list for highest n: my-id < n < key-id
        if n exists
            call Lookup(key-id) on node n                      // next hop
            if call failed,
                remove n from finger table and/or successor list
                return Lookup(key-id)
        else
            return my successor                                // done

Today: 1. Peer-to-Peer Systems 2. Distributed Hash Tables 3. The Chord Lookup Service (basic design; integration with the DHash DHT; performance)

The DHash DHT builds key/value storage on Chord. It replicates blocks for availability, storing k replicas at the k successors after the block on the Chord ring. It caches blocks for load balancing: the client sends a copy of the block to each of the servers it contacted along the lookup path. It also authenticates block contents.

DHash data authentication. Two types of DHash blocks: content-hash blocks, where key = SHA-1(data); and public-key blocks, where the key is a cryptographic public key and the data are signed by the corresponding private key. (Figure: Chord File System example: a signed root block named by a public key points via H(D) to directory block D, which points via H(F) to inode block F, which points via H(B1) and H(B2) to data blocks B1 and B2.)
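
A minimal sketch of the content-hash block type, showing why a fetched block can be verified against its own key; dht_put_content_block() and dht_get_content_block() are hypothetical stand-ins for DHash calls, and STORE is a local dict standing in for the DHT:

    import hashlib

    STORE = {}                                     # stand-in for the distributed store

    def dht_put_content_block(data: bytes) -> bytes:
        key = hashlib.sha1(data).digest()          # key = SHA-1(data)
        STORE[key] = data
        return key

    def dht_get_content_block(key: bytes) -> bytes:
        data = STORE[key]
        if hashlib.sha1(data).digest() != key:     # any tampering changes the hash
            raise ValueError("block failed content-hash check")
        return data

    root_key = dht_put_content_block(b"directory block D")
    assert dht_get_content_block(root_key) == b"directory block D"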

DHash replicates blocks at r successors. (Figure: Block 17, whose successor is N20, is replicated at the next several nodes on the ring.) Replicas are easy to find if the successor fails; hashed node IDs ensure independent failure.

Experimental overview: quick lookups in large systems, low variation in lookup costs, and robustness despite massive failure. Goal: experimentally confirm the theoretical results.

Chord lookup cost is O(log N). (Figure: average messages per lookup vs. number of nodes; the curve grows logarithmically, with constant roughly 1/2.)

Failure experiment setup: start 1,000 Chord servers, each server's successor list having 20 entries, and wait until they stabilize. Insert 1,000 key/value pairs, with five replicas of each. Then stop X% of the servers and immediately make 1,000 lookups.

Massive failures have little impact. (Figure: failed lookups (percent) vs. failed nodes (percent), for 5% to 50% of nodes failed; even at 50% failures, well under 1.5% of lookups fail. Note that (1/2)^6 is 1.6%.)

Today: 1. Peer-to-Peer Systems 2. Distributed Hash Tables 3. The Chord Lookup Service (basic design; integration with the DHash DHT; performance) 4. Concluding thoughts on DHTs and P2P

DHTs: impact. The original DHTs (CAN, Chord, Kademlia, Pastry, Tapestry) were proposed in 2001-02. The following 5-6 years saw a proliferation of DHT-based applications: filesystems (e.g., CFS, Ivy, OceanStore, Pond, PAST), naming systems (e.g., SFR, Beehive), DB query processing [PIER, Wisc], content distribution systems (e.g., Coral), and distributed databases (e.g., PIER).

Why don't all services use P2P? 1. High latency and limited bandwidth between peers (compared with servers within a datacenter cluster). 2. User computers are less reliable than managed servers. 3. Lack of trust in peers' correct behavior: securing DHT routing is hard and unsolved in practice.

DHTs in retrospect: they seem promising for finding data in large P2P systems, and decentralization seems good for load and fault tolerance. But the security problems are difficult, and churn is a problem, particularly if log(N) is big. So DHTs have not had the impact that many hoped for.

What DHTs got right: consistent hashing, an elegant way to divide a workload across machines, very useful in clusters and actively used today in Amazon Dynamo and other systems; replication for high availability and efficient recovery after node failure; incremental scalability: add nodes and capacity increases; self-management with minimal configuration; and a unique trait: no single server to shut down or monitor.