CS 347 Parallel and Distributed Data Processing


CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 9: Peer-to-Peer Systems

Previous Topics
  - Data
  - Database design
  - Queries
  - Query processing
  - Localization
  - Operators
  - Optimization
  - Transactions
  - Concurrency control
  - Reliability
  - Replication
  Assumptions so far:
  - Client-server architecture
  - Relational data
  - Good understanding of what the data is and where the data is

Next Topic
  - Client-server architecture?
  - Relational data?
  - Good understanding of what the data is and where the data is?

Peer-to-Peer Systems
  [Figure: timeline of peer-to-peer systems, with entries at 1999, 2000, 2001, 2003, 2006, and 2009]

Peer-to-Peer Systems
  Distributed applications where nodes are:
  - Autonomous
  - Very loosely coupled
  - Equal in role or functionality
  - Sharing and exchanging resources with each other

Peer-to-Peer Systems: Related concepts
  - File sharing: P2P is one option
  - Grid computing: focus is on computing
  - Autonomic computing: focus is on self-management

Peer-to-Peer Systems: Search
  The essential problem to solve.
  Each node holds resources: Node 1 holds R1,1, R1,2, ...; Node 2 holds R2,1, R2,2, ...; Node 3 holds R3,1, R3,2, ...
  1. Query: "Who has X?"
  2. Nodes that have X send answers.
  3. The querying node requests the resource, and the answering node provides it.

Distributed Lookup
  Have <k, v> pairs; each of n nodes holds some pairs.
  Given key k, find the matching values {v1, v2, ...}.

  Example pairs:
    k   v
    1   a
    1   b
    4   a
    7   c
    3   a
    1   a
    4   d

  lookup(4) = {a, d}
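A minimal sketch of the lookup problem, assuming three nodes and an arbitrary split of the example pairs among them (the slide does not say which node holds which pair). The `Node` class and the `lookup` helper are illustrative names, and the lookup here naively asks every node, which is the cost the rest of these notes try to avoid.

```python
from collections import defaultdict

class Node:
    """A node holding some <k, v> pairs locally."""
    def __init__(self, pairs):
        self.store = defaultdict(set)
        for k, v in pairs:
            self.store[k].add(v)

    def get_values(self, k):
        # .get avoids creating empty entries in the defaultdict
        return set(self.store.get(k, ()))

def lookup(nodes, k):
    """Naive lookup: ask every node and union the answers."""
    result = set()
    for node in nodes:
        result |= node.get_values(k)
    return result

# Assumed split of the example pairs across three nodes.
nodes = [
    Node([(1, "a"), (1, "b"), (4, "a")]),
    Node([(7, "c"), (3, "a")]),
    Node([(1, "a"), (4, "d")]),
]
print(sorted(lookup(nodes, 4)))  # ['a', 'd']
```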

Distributed Lookup: Communication overlay network
  Structured
    (+) Efficient distributed lookup
  Unstructured
    (+) Easy, robust
    (+) Can handle high churn rates
    (-) Flood queries
  Hybrid
    E.g., centralized search, decentralized exchange

Distributed Lookup: Distributed hashing
  Most common way to create a structured overlay network.
  - H(k) is an m-bit number (k is a key)
  - H(X) is an m-bit number (X is a node identifier)
  - Hash function H is good
  [Figure: number line from 0 to 2^m - 1 with H(k) lying between H(X) and H(Y); the pair <k, v> is stored at Y]

Distributed Lookup: Distributed hashing
  Two approaches:
  - Chord
  - Replicated hash table (RHT)

Chord: The Chord circle
  m = 6, so identifiers lie on a circle of 2^6 = 64 positions.
  Nodes (clockwise): N1, N8, N14, N21, N32, N38, N42, N48, N51, N56
  Keys: K10, K24, K30, K38, K54
  Using hashed values, e.g., N56 is the node whose id hashes to 56.

Chord: Ownership rule
  Consider nodes X, Y such that Y follows X clockwise.
  Node Y owns the keys k such that H(k) is in (H(X), H(Y)].
  Example: if N3 follows N54, then N3 stores K55, K56, ..., K3 (the interval wraps around 0).
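The hashing and ownership rule can be sketched as follows. This is only an illustration: SHA-1 truncated to m bits is an assumed choice for H (the slides only require that H is "good"), and `owner` is a hypothetical helper that sees the whole ring at once, which no real Chord node does.

```python
import hashlib
from bisect import bisect_left

M = 6  # identifier bits, as on the slides

def H(name, m=M):
    """Map a string to an m-bit identifier (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def owner(node_ids, key_id):
    """First node id at or after key_id clockwise, wrapping past 0."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
for k in (10, 24, 30, 38, 54):
    print(f"K{k} -> N{owner(ring, k)}")

# Wrap-around case from the ownership slide: N3 follows N54.
print(owner([3, 54], 55), owner([3, 54], 3))  # 3 3
```

With the example circle, K10 maps to N14, K24 and K30 to N32, K38 to N38, and K54 to N56.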

Chord: Notation
  X.function(...)   (remote) call of function at node X
  X.A               returns value A at node X
  If X is omitted, the reference is to the current node.

Chord: Successor/predecessor links
  [Figure: Chord circle (m = 6) with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; N1.succ points to N8 and N1.pred points to N56]

Chord: Search for owner using successor links
    X.find_succ(k):
      if k in (pred, X]
        return X
      else if k in (X, succ]
        return succ
      else
        return succ.find_succ(k)
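A runnable sketch of the successor-link search on the example circle. The `hops` counter and the `build_ring` helper are additions for illustration, and remote calls are modeled as ordinary method calls.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past 0 (a == b covers the whole circle)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.pred = None
        self.succ = None

    def find_succ(self, k, hops=0):
        if in_half_open(k, self.pred.id, self.id):
            return self, hops
        if in_half_open(k, self.id, self.succ.id):
            return self.succ, hops
        return self.succ.find_succ(k, hops + 1)  # forward one step clockwise

def build_ring(ids):
    """Wire up correct pred/succ links for a static set of node ids."""
    nodes = [Node(i) for i in sorted(ids)]
    for i, n in enumerate(nodes):
        n.succ = nodes[(i + 1) % len(nodes)]
        n.pred = nodes[i - 1]
    return {n.id: n for n in nodes}

ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
owner, hops = ring[14].find_succ(52)
print(owner.id, hops)  # 56 6
```

Starting at N14, the request is forwarded six times before N51 answers with N56, which shows why successor links alone give lookups that are linear in the number of nodes.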

Chord: Value lookup
    X.lookup(k):
      Y := X.find_succ(k)
      return Y.get_value(k)

    X.get_value(k):
      // Return local value v for k, if it exists

Chord: Example, searching for K52 (m = 6)
  N14.find_succ(K52) is forwarded along successor links until it reaches N51, where N51.find_succ(K52) = N56.

Chord: Finger table
  Finger table for N8 (m = 6):
    Start      Node
    N8 + 1     N14
    N8 + 2     N14
    N8 + 4     N14
    N8 + 8     N21
    N8 + 16    N32
    N8 + 32    N42
  Each entry is the node that owns the key at that offset; e.g., N42 is the node that owns key 8 + 32 = 40.

Chord: Search using the finger table
    X.find_succ(k):
      if k in (pred, X]
        return X
      if k in (X, succ]
        return succ
      else
        Y := closest_preceding(k)
        return Y.find_succ(k)

    X.closest_preceding(k):
      for i := m downto 1
        if finger[i] in (X, k]
          return finger[i]
      return nil
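The finger-table search can be sketched as below. The `build_ring` helper computes every finger from global knowledge of the ring, which is an assumption made only to keep the example short; `path` is an added argument that records which nodes handled the request. Fingers are indexed from 0 here, so `finger[i]` owns key `id + 2**i`.

```python
M = 6
SIZE = 2 ** M

def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.pred = self.succ = None
        self.finger = []  # finger[i] owns key id + 2**i, for i = 0 .. M-1

    def find_succ(self, k, path=None):
        path = (path or []) + [self.id]
        if in_half_open(k, self.pred.id, self.id):
            return self, path
        if in_half_open(k, self.id, self.succ.id):
            return self.succ, path
        nxt = self.closest_preceding(k) or self.succ
        return nxt.find_succ(k, path)

    def closest_preceding(self, k):
        for f in reversed(self.finger):  # farthest finger first
            if in_half_open(f.id, self.id, k):
                return f
        return None

def build_ring(ids):
    """Static ring with correct links and fingers (global view, for illustration)."""
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}

    def owner(key):
        return nodes[next((i for i in ids if i >= key), ids[0])]

    for pos, i in enumerate(ids):
        n = nodes[i]
        n.succ = nodes[ids[(pos + 1) % len(ids)]]
        n.pred = nodes[ids[pos - 1]]
        n.finger = [owner((i + 2 ** j) % SIZE) for j in range(M)]
    return nodes

ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print([f.id for f in ring[8].finger])  # [14, 14, 14, 21, 32, 42]
owner, path = ring[8].find_succ(54)
print(owner.id, path)                  # 56 [8, 42, 51]
```

The lookup for K54 from N8 visits N8, N42, and N51, the same nodes that appear in the K54 example slides.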

Chord: Example, looking up K54 (m = 6)
  1. N8.find_succ(K54)
  2. N42.find_succ(K54)
  3. N51.find_succ(K54), which returns N56

Chord: Adding nodes
  Joining node: N26 (m = 6; existing nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56)
  Need to:
  1. Update links
  2. Move data
  For now, assume nodes never die.

Chord: Adding nodes
    X.join(Y):
      // Node Y is known to belong to the circle
      pred := nil
      succ := Y.find_succ(X)

Chord: Periodic stabilization
    X.stabilize():
      Y := succ.pred
      if Y in (X, succ)
        succ := Y
      succ.notify(X)

    X.notify(Y):
      if pred = nil or Y in (pred, X)
        pred := Y
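A small simulation of join, stabilize, and notify, assuming a two-node ring (ids 21 and 32, chosen arbitrarily) that a node with id 26 joins. Finger tables are omitted, `find_succ` walks successor links only, and the guard against a missing predecessor in `stabilize` is an addition for safety.

```python
def in_open(x, a, b):
    """True if x lies in the circular interval (a, b), both ends excluded."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.pred = None
        self.succ = self  # a lone node is its own successor

    def find_succ(self, k):
        # Successor-link walk only; fingers are omitted in this sketch.
        n = self
        while not (k == n.succ.id or in_open(k, n.id, n.succ.id)):
            n = n.succ
        return n.succ

    def join(self, known):
        self.pred = None
        self.succ = known.find_succ(self.id)

    def stabilize(self):
        y = self.succ.pred
        if y is not None and in_open(y.id, self.id, self.succ.id):
            self.succ = y
        self.succ.notify(self)

    def notify(self, y):
        if self.pred is None or in_open(y.id, self.pred.id, self.id):
            self.pred = y

np_, ns, nx = Node(21), Node(32), Node(26)
np_.succ, np_.pred = ns, ns
ns.succ, ns.pred = np_, np_

nx.join(np_)     # nx.succ = ns, nx.pred = None
nx.stabilize()   # ns.pred becomes nx
np_.stabilize()  # np_.succ becomes nx, nx.pred becomes np_
print(np_.succ.id, nx.pred.id, nx.succ.id, ns.pred.id)  # 26 21 32 26
```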

Chord: Join example (Nx joins between Np and Ns)
  Before updating links: Np.succ = Ns and Ns.pred = Np.
  After Nx.join(): Nx.pred = nil and Nx.succ = Ns.
  After Nx.stabilize(): Ns.pred = Nx (Nx.pred is still nil).
  After Np.stabilize(): Np.succ = Nx and Nx.pred = Np.
  Exercise: fix finger table

Chord: Moving data
  When?
  Before the move, Ns holds the keys in (Np, Ns]; the keys in (Np, Nx] belong at Nx.

Chord: Moving data
  After Ns.notify(Nx) (Nx.pred is still nil):
  Send all keys in (Np, Nx] when Ns.pred gets updated.

Chord: Moving data, revised notify()
    X.notify(Y):
      if pred = nil or Y in (pred, X)
        Y.add(data in (pred, Y])
        temp := pred
        pred := Y
        X.remove(data in (temp, Y])
  Glossing over concurrency issues, e.g., what happens to lookups while moving data?

Chord: Moving data, revised notify()
  Exercise: when pred = nil, what data gets moved?

Chord: Moving data, lookup at wrong node
  Nx holds the keys in (Np, Nx] and Ns holds the keys in (Nx, Ns]; Nx.pred is still nil.
  Lookups for all k in (Np, Nx] are directed to Ns.

Chord: Moving data, revised lookup()
    X.lookup(k):
      ok := false
      while not ok
        Y := X.find_succ(k)
        [ok, v] := Y.get_value(k)
      return v

    X.get_value(k):
      if k in (pred, X]
        return [true, value for k]
      else
        return [false, nil]
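The retry loop can be exercised with a small scenario, sketched here under several assumptions: node ids 21, 26, and 32 are arbitrary; a node whose predecessor is nil answers "not ok"; the `between_tries` hook stands in for the periodic stabilization that would run concurrently; and the loop is bounded by `max_tries`, whereas the slide loops until ok.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.pred = None
        self.succ = None
        self.data = {}

    def find_succ(self, k):
        # Simplified: consults only the (possibly stale) successor link.
        if self.pred is not None and in_half_open(k, self.pred.id, self.id):
            return self
        if in_half_open(k, self.id, self.succ.id):
            return self.succ
        return self.succ.find_succ(k)

    def get_value(self, k):
        if self.pred is not None and in_half_open(k, self.pred.id, self.id):
            return True, self.data.get(k)
        return False, None  # this node does not (yet) own k

    def lookup(self, k, between_tries=lambda: None, max_tries=10):
        for _ in range(max_tries):
            y = self.find_succ(k)
            ok, v = y.get_value(k)
            if ok:
                return v
            between_tries()  # stand-in for periodic stabilization
        raise RuntimeError("links did not converge")

np_, nx, ns = Node(21), Node(26), Node(32)
np_.succ, nx.succ, ns.succ = ns, ns, np_   # np_ has not yet learned about nx
np_.pred, nx.pred, ns.pred = ns, None, nx  # state after ns.notify(nx)
nx.data[24] = "v24"                        # K24 has already moved to nx

def fix_links():
    np_.succ, nx.pred = nx, np_            # effect of np_.stabilize()

print(np_.lookup(24, between_tries=fix_links))  # v24
```

The first attempt is routed to the node with id 32, which rejects the key because its predecessor is now 26; after the links are repaired, the second attempt reaches the node with id 26.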

Chord: Moving data
  Revised lookup() works:
  - pred and succ links are eventually correct
  - Data ends up at the correct node
  - Finger pointers speed up searches, but do not cause problems

Chord: Performance
  n = number of live nodes
  - With high probability, the number of nodes that must be contacted to find a successor is O(log n).
  - Although the finger table has room for m entries, only O(log n) need to be stored.
  - Experimental results show that the average lookup time is ~(log n)/2.
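As a worked instance of these bounds, assuming base-2 logarithms and a ring of about one million nodes (a size chosen only for illustration):

```latex
n = 2^{20} \approx 10^{6}
\quad\Longrightarrow\quad
\log_2 n = 20,
\qquad
\tfrac{1}{2}\log_2 n = 10 .
```

Under these assumptions a lookup contacts on the order of 20 nodes, about 10 on average, and each node stores on the order of 20 distinct fingers.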

Chord: Node failures
  Nx (between Np and Ns) holds the pairs <k7, v7>, <k8, v8>, <k9, v9>.
  Assume Nx dies:
  - Links become invalid
  - Data gets lost

Chord: Node failures, fixing links
    X.check_pred():
      if pred failed
        pred := nil
  Also, keep backup links to s > 1 successors.
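A sketch of link repair with s = 2 backup successors. The `alive` flag stands in for a real failure detector such as a timeout, and `repair_succ` is a hypothetical, reduced version of stabilization that only drops failed successors and notifies the first live one.

```python
def in_open(x, a, b):
    """True if x lies in the circular interval (a, b), both ends excluded."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.alive = True      # stand-in for a real failure detector
        self.pred = None
        self.succ_list = []    # s successors, nearest first

    def check_pred(self):
        if self.pred is not None and not self.pred.alive:
            self.pred = None

    def notify(self, y):
        if self.pred is None or in_open(y.id, self.pred.id, self.id):
            self.pred = y

    def repair_succ(self):
        """Drop failed successors, then notify the first live one."""
        self.succ_list = [n for n in self.succ_list if n.alive]
        self.succ_list[0].notify(self)

np_, nx, ns = Node(21), Node(26), Node(32)
np_.succ_list = [nx, ns]   # backup successor link (s = 2)
nx.pred, ns.pred = np_, nx

nx.alive = False           # nx dies
ns.check_pred()            # ns.pred becomes None
np_.repair_succ()          # np_ falls back to ns, and ns.pred becomes np_
print(np_.succ_list[0].id, ns.pred.id)  # 32 21
```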

Chord: Node failure example (Nx, between Np and Ns, fails)
  State before failure: Np.succ = Nx, and Np keeps a backup successor link to Ns (s = 2).
  After failure, after Ns.check_pred(): Ns.pred = nil.
  After Np discovers that Nx is down: Np.succ = Ns; Ns.pred is still nil.
  After stabilization: Ns.pred = Np.

Chord: Node failure
  Protection from data loss: e.g., robust nodes (see replication notes).
  Robust node X consists of Node X1, Node X2, and Node X3, each holding a full copy of the data:
    k7   v7
    k8   v8
    k9   v9
  - Takeover protocols
  - Backup protocols

Chord: Summary
  - Finding owner nodes
    - Successor and predecessor links
    - Finger table
  - Adding nodes
    - Updating links
    - Moving data
  - Coping with node failures
    - Fixing links
    - Protecting from data loss
  - Performance

Replicated Hash Table
  Every node (N0, N1, N2, N3) holds a full copy of the table:
    Hash   Node
    0      N0
    1      N1
    2      N2
    3      N3
  N0 stores the data for keys that hash to 0, N1 for keys that hash to 1, N2 for keys that hash to 2, and N3 for keys that hash to 3.

Replicated Hash Table: Adding nodes
  E.g., N0 is overloaded. Initially N0 and N2 hold the same table:
    Hash   Node
    0      N0
    1      N0
    2      N2
    3      N2
  N0 stores the data for keys that hash to 0 and 1; N2 stores the data for keys that hash to 2 and 3.

  1. First, set up N1 with a copy of the table (no data yet).
  2. Next, copy the data for keys that hash to 1 from N0 to N1.
  3. Next, change control: N0 and N1 change entry 1 from N0 to N1. N2 still maps 1 to N0.
  4. Next, remove the data for keys that hash to 1 from N0.
  5. Finally, update the other nodes: N2 changes entry 1 from N0 to N1.
     Eagerly, by N0 or N1? Lazily, during future lookups?
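The steps for adding a node can be sketched as below. This illustration picks the lazy option for the final step (a stale node is corrected during a later lookup), which is only one of the two choices the slide leaves open. The names `RHTNode`, `split_bucket`, and `lookup`, and the hash function `key % 4`, are assumptions made for the example.

```python
NUM_BUCKETS = 4

class RHTNode:
    def __init__(self, name, table):
        self.name = name
        self.table = dict(table)  # full replica: hash value -> owner name
        self.data = {}            # hash value -> {key: value}

def split_bucket(cluster, old, new, h):
    src = cluster[old]
    dst = RHTNode(new, src.table)            # 1. set up the new node
    cluster[new] = dst
    dst.data[h] = dict(src.data.get(h, {}))  # 2. copy data
    src.table[h] = dst.table[h] = new        # 3. change control
    src.data.pop(h, None)                    # 4. remove data from the old node
    # 5. other nodes keep stale entries until a lookup corrects them

def lookup(cluster, start, key):
    """One-hop lookup; follows a single redirect if the local table is stale."""
    node = cluster[start]
    h = key % NUM_BUCKETS                    # assumed hash function
    target = cluster[node.table[h]]
    if target.table[h] != target.name:       # target no longer owns bucket h
        node.table[h] = target.table[h]      # lazy table update
        target = cluster[node.table[h]]
    return target.data.get(h, {}).get(key)

table = {0: "N0", 1: "N0", 2: "N2", 3: "N2"}
cluster = {"N0": RHTNode("N0", table), "N2": RHTNode("N2", table)}
cluster["N0"].data = {0: {8: "h"}, 1: {5: "e"}}
cluster["N2"].data = {2: {6: "f"}, 3: {7: "g"}}

split_bucket(cluster, "N0", "N1", 1)
print(cluster["N2"].table[1])    # N0 (stale)
print(lookup(cluster, "N2", 5))  # e
print(cluster["N2"].table[1])    # N1 (corrected lazily)
```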

Replicated Hash Table: Adding nodes
  What about inserts while adding a node (e.g., during the data copy from N0 to N1)?
  - Apply at N0, then copy?
  - Apply at both?
  - Redirect to N1?

Chord vs. Replicated Hash Table
  Which is simpler to implement?
  Cost of operations
  - Looking up values: O(log n) vs. O(1)
  - Adding nodes
  - Recovering from node failures
  Storage cost
  - Routing table size: log n vs. n

Distributed Lookup: Communication overlay network
  Structured
  - Chord
  - Replicated hash table
  Unstructured
  - Neighborhood search

Neighborhood Search
  - Each node stores its own data
  - Searches nearby nodes
  - E.g., Gnutella
  [Figure: nodes, each with a local Key/Value table holding pairs such as <41, g>, <99, c>, <14, d>, <12, a>, <7, b>, <13, c>]

Neighborhood Search: Example
  N1.lookup(13), TTL = 4
  [Figure: network of nodes, each with a local key-value table; the query spreads outward from N1, reaching nodes at TTL = 3, TTL = 2, and TTL = 1]
  Answer so far = {c, f, x}
  Final answer = {c, d, f, x}
  Incorrect/incomplete: e.g., the pair <13, t> shown in the figure is not part of the answer.

Neighborhood Search: Optimization
  - Queries have unique identifiers
  - Nodes keep a cache of recent queries (query identifier and TTL)

Neighborhood Search: Example
  N1.lookup(13), TTL = 4, id = 77
  [Figure: as the query spreads, N1 caches [77, 4] and the nodes reached at TTL = 3, 2, and 1 cache [77, 3], [77, 2], and [77, 1]; a node that already holds a cache entry for query 77 does not reprocess it]
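A sketch of TTL-limited flooding with a query cache. The topology and the placement of pairs are invented for illustration (the figure's layout is not recoverable from the transcript), and the flood is simulated breadth-first so that each peer is first reached with the highest TTL available to it.

```python
from collections import deque

class Peer:
    def __init__(self, name, pairs=()):
        self.name = name
        self.pairs = list(pairs)  # local (key, value) pairs
        self.neighbors = []
        self.cache = {}           # query id -> TTL when first processed

def connect(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)

def lookup(start, key, ttl, qid):
    answers, messages = set(), 0
    queue = deque([(start, ttl)])
    while queue:
        peer, t = queue.popleft()
        if qid in peer.cache:     # do not reprocess a query already seen
            continue
        peer.cache[qid] = t
        answers.update(v for k, v in peer.pairs if k == key)
        if t > 1:                 # a peer reached with TTL 1 does not forward
            for n in peer.neighbors:
                queue.append((n, t - 1))
                messages += 1
    return answers, messages

# Hypothetical topology: n1-a, n1-b, a-b (a cycle), then a chain b-c-d-e.
n1 = Peer("N1", [(12, "a"), (7, "b")])
a = Peer("A", [(13, "c"), (14, "d")])
b = Peer("B", [(13, "f")])
c = Peer("C", [(13, "x")])
d = Peer("D", [(13, "d")])
e = Peer("E", [(13, "t")])  # too far away to be reached with TTL = 4
for u, v in [(n1, a), (n1, b), (a, b), (b, c), (c, d), (d, e)]:
    connect(u, v)

answers, messages = lookup(n1, 13, ttl=4, qid=77)
print(sorted(answers))      # ['c', 'd', 'f', 'x']
print(77 in e.cache)        # False
```

In this made-up network the answer misses the value t held by the peer beyond the TTL horizon, which mirrors the incomplete result in the example slides.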

Neighborhood Search: Bootstrapping
  A bootstrap server keeps a set of known nodes, S = {N1, N2, ...}.
  A joining node contacts the bootstrap server to:
  - Be added to S
  - Get neighbors (e.g., N5, N1, N2, N7)

Neighborhood Search: Problems
  - Unnecessary messages
  - High load and traffic
    E.g., if nodes have p neighbors, each search sends ~p^TTL messages
  - Low-capacity nodes are a bottleneck
  - May not find all answers
  [Figure: the area searched covers only the part of the network within TTL hops of the querying node]
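A rough count behind the ~p^TTL estimate, assuming every node forwards the query to all p of its neighbors for TTL hops, that p > 1, and that no duplicates are suppressed:

```latex
\sum_{i=1}^{\mathrm{TTL}} p^{\,i}
\;=\; \frac{p\,\bigl(p^{\mathrm{TTL}} - 1\bigr)}{p - 1}
\;=\; \Theta\!\bigl(p^{\mathrm{TTL}}\bigr),
\qquad
\text{e.g. } p = 4,\ \mathrm{TTL} = 4:\quad 4 + 16 + 64 + 256 = 340 .
```

The last term dominates the sum, so the total grows like p^TTL; a query cache lowers the count in practice because repeated deliveries are not forwarded again.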

Neighborhood Search: Advantages
  - Can handle complex queries
  - Simple, robust algorithm
  - Works well if data is highly replicated
  [Figure: when many nodes hold the data, some of them fall inside the area searched within TTL hops]

Open Problems
  - Performance
  - Availability
  - Efficiency
  - Load balancing
  - Authenticity
  - DoS prevention
  - Incentives
  - Anonymity
  - Correctness
  - Participation

Peer-to-Peer Systems: Summary
  Structured networks
  - Chord
  - Replicated hash table
  Unstructured networks
  - Neighborhood search
  Open problems