Distributed File Systems: An Overview of Peer-to-Peer Architectures


Distributed File Systems
In a distributed file system, data is spread across many sources (for example, distributed database systems). Such systems frequently rely on a centralized lookup server for addressing. A completely distributed approach, by contrast, has no centralized services or information.

Peer-to-Peer Systems
What is a P2P system? A network of nodes with equivalent capabilities and responsibilities; common examples include Gnutella and Freenet. P2P is a convenient paradigm that can serve a variety of applications, most popularly file sharing. The central problem in P2P systems is locating the particular node that stores the requested data.

Node Location: Napster
Napster uses a central index to search for nodes holding a desired file. If the central server goes down, service is lost for all users, so Napster is not a true P2P system.

Node Location: Gnutella
Search algorithm: random. Gnutella broadcasts file requests to its neighbors. This scales extremely poorly, so requests cannot be broadcast to all nodes; as a result, a Gnutella search may fail even though the file exists in the system.

Node Location: Freenet
Search algorithm: random. In Freenet, no particular node is responsible for a file; searches instead look for cached copies of the file. The problems mirror Gnutella's: existing files are not guaranteed to be retrieved, and there is no bound on search cost.

Problems with Random Search Algorithms
Random search works poorly for uncommon data and offers no theoretical bound on the overhead required. How can we introduce determinism without centralized mechanisms? Answer: hash functions.

Hash Functions
A hash function is (usually) a one-way function that produces the same key for the same input. The hash functions considered here distribute keys uniformly over the keyspace.

Hash-Based P2P Systems
Chord (Pastry is very similar) uses a ring-based overlay. CAN uses a hyperspace overlay model.
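Both properties (determinism and uniform spread) can be seen with an ordinary cryptographic hash truncated to an m-bit identifier space; the function name and the m = 8 demo size below are illustrative choices, not from the slides:

```python
import hashlib

def chord_id(name: str, m: int = 8) -> int:
    """Hash a string to an m-bit identifier using SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Same input always yields the same key (one-way, deterministic)...
assert chord_id("movie.mp4") == chord_id("movie.mp4")

# ...and distinct inputs scatter roughly uniformly over the 2^m keyspace.
ids = [chord_id(f"file-{i}") for i in range(1000)]
print(min(ids), max(ids))  # values spread across [0, 255] for m = 8
```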

What Chord Can Contribute
Chord is truly distributed: no node is more important than another. If a requested object exists in the system, it will definitely be found, and Chord gives provable performance bounds.

Consistent Hashing
Using a consistent hash function (SHA-1), Chord maps a given key to a particular node, so requests for an object with a particular key are easily forwarded to the correct node. Consistent hashing also provides natural load balancing.

Mapping of Objects
[Figure: a small identifier circle, e.g. 3-bit identifiers 0-7 with a few nodes placed on it.] Node identifiers are arranged in a circle. Each key is assigned to the node with the next-highest identifier value (its successor).

Routing
If every node knows its successor node, requests can be propagated along the ring. The problem is that a lookup may have to traverse all N nodes to find the object. The solution is to use finger tables to optimize routing.
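The key-to-node assignment described under "Mapping of Objects" can be sketched directly; the 3-bit ring and the node set {0, 2, 5} below are assumed for illustration:

```python
import bisect

def successor(key: int, nodes: list[int], m: int = 3) -> int:
    """Return the node responsible for `key`: the first node whose
    identifier is >= key on the 2^m circle, wrapping past zero."""
    ring = sorted(n % (2 ** m) for n in nodes)
    i = bisect.bisect_left(ring, key % (2 ** m))
    return ring[i % len(ring)]  # wrap around to the lowest node id

# Hypothetical 3-bit ring with nodes 0, 2, and 5:
nodes = [0, 2, 5]
assert successor(1, nodes) == 2   # key 1 -> next node clockwise
assert successor(3, nodes) == 5
assert successor(6, nodes) == 0   # wraps around the ring
```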

Finger Tables
Given m bits in the key/node identifiers, every node keeps a routing table with m entries (each entry holds a node identifier, IP address, and port number). The i-th entry in the table at node n contains the identity of the first node that succeeds n by at least 2^(i-1). If s is the i-th finger of node n, then s = successor(n + 2^(i-1)), denoted n.finger[i].node.

Finger Tables (cont.)
The first entry of the finger table is simply the successor of n; subsequent entries are spaced farther and farther apart. If a node doesn't know the successor of a key k, it passes the request on to a node whose identifier is closer. Each such hop covers at least half the remaining distance to the responsible node.
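A finger table can be built by applying the successor function to n + 2^(i-1) for i = 1..m; the 6-bit identifier space and the node identifiers below are made up for the example:

```python
import bisect

def successor(key, ring, m=6):
    """First node id >= key on the 2^m circle, wrapping past zero."""
    ring = sorted(ring)
    i = bisect.bisect_left(ring, key % (2 ** m))
    return ring[i % len(ring)]

def finger_table(n, ring, m=6):
    """Entry i (1-based) points at successor(n + 2^(i-1))."""
    return [successor((n + 2 ** (i - 1)) % (2 ** m), ring, m)
            for i in range(1, m + 1)]

# Hypothetical 6-bit ring:
ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(finger_table(8, ring))  # -> [14, 14, 14, 21, 32, 42]
```

Note how the first entries repeat (nearby targets share a successor) while later entries jump roughly a quarter and half of the ring away; this geometric spacing is what makes lookups logarithmic.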

Finding Successors
To find the successor of a key, first find the predecessor of the key; the successor is then taken from that node's finger table.

n.find_successor(key)
  n' = find_predecessor(key);
  return n'.successor;

Finding Predecessors (1)
To find the predecessor of a key, the request is passed along to the next closest node until the key falls within the appropriate interval.

n.find_predecessor(key)
  n' = n;
  while (key ∉ (n', n'.successor])
    n' = n'.closest_preceding_finger(key);
  return n';

Finding Predecessors (2)

n.closest_preceding_finger(key)
  for i = m downto 1
    if (finger[i].node ∈ (n, key))
      return finger[i].node;
  return n;

Performance
Since the distance to the successor of a key is at least halved with each step, the number of steps required for a lookup is bounded by O(log N).
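The lookup procedure above can be exercised end to end in a small simulation. Everything here is a demo assumption: 16-bit identifiers (real Chord uses 160-bit SHA-1 ids), 50 random nodes, and global dictionaries standing in for per-node network state:

```python
import bisect
import random

M = 16                 # identifier bits (demo assumption; Chord uses 160)
RING = 2 ** M

def in_interval(x, a, b, closed_right=False):
    """Is x in (a, b) (or (a, b] if closed_right) on the circular id space?"""
    if a == b:                          # interval spans the whole ring
        return x != a or closed_right
    if a < b:
        return a < x < b or (closed_right and x == b)
    return x > a or x < b or (closed_right and x == b)

def successor_of(key, ids):
    """First node identifier >= key, wrapping around the ring."""
    i = bisect.bisect_left(ids, key % RING)
    return ids[i % len(ids)]

def build_fingers(n, ids):
    """finger[k] = successor(n + 2^k) for k = 0 .. M-1 (0-based here)."""
    return [successor_of((n + 2 ** k) % RING, ids) for k in range(M)]

random.seed(1)
ids = sorted(random.sample(range(RING), 50))          # 50 random nodes
succ = {n: successor_of((n + 1) % RING, ids) for n in ids}
fingers = {n: build_fingers(n, ids) for n in ids}

def lookup(key, n):
    """Greedy Chord routing via closest preceding fingers."""
    hops = 0
    while not in_interval(key, n, succ[n], closed_right=True):
        nxt = succ[n]                   # fall back to the plain successor
        for f in reversed(fingers[n]):  # largest finger first
            if in_interval(f, n, key):
                nxt = f
                break
        n = nxt
        hops += 1
    return succ[n], hops                # predecessor found; return its successor

key = random.randrange(RING)
owner, hops = lookup(key, ids[0])
assert owner == successor_of(key, ids)  # routed to the correct node
print(hops)                             # stays near log2(50), i.e. about 6
```

Because each finger hop covers at least half the remaining distance, the hop count never exceeds M and is typically close to log2 of the node count.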

Node Joins: Invariants
Every node maintains its correct successor, and for every key k, node successor(k) is responsible for k.

Node Joins: Process
First, initialize the predecessor and fingers of the new node n (to simplify joins and leaves, every node also keeps a pointer to its predecessor). Second, update the fingers and predecessors of existing nodes. Finally, notify the higher-layer software so that the state of keys is transferred appropriately (note that n is the only node to which any transfers occur).

Concurrent Operations
Aggressively maintaining the finger tables of all nodes is difficult under concurrent joins. Since very few finger-table entries need to change when a single node joins, Chord uses a periodic stabilization protocol instead of the aggressive correctness protocol.

Stabilization Pseudocode
A joining node n knows of some other node n'.

n.join(n')
  predecessor = nil;
  successor = n'.find_successor(n);

Pseudocode (cont.)
Successors are verified periodically.

n.stabilize()
  x = successor.predecessor;
  if (x ∈ (n, successor))
    successor = x;
  successor.notify(n);

n.notify(n')
  if (predecessor is nil or n' ∈ (predecessor, n))
    predecessor = n';

Pseudocode (cont.)
Finger tables are refreshed periodically.

n.fix_fingers()
  i = random index > 1 into finger[];
  finger[i].node = find_successor(finger[i].start);
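The stabilize/notify pair can be watched repairing pointers in memory. The three-node ring (ids 1, 20, 45), the joining id 32, and the shortcut of setting the new node's successor directly (in place of a real find_successor call) are all assumptions of this sketch:

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

def between(x, a, b):
    """x in the open interval (a, b) on the circular id space."""
    if a == b:
        return x != a
    return a < x < b if a < b else x > a or x < b

def notify(n, cand):
    """cand believes it might be n's predecessor."""
    if n.predecessor is None or between(cand.id, n.predecessor.id, n.id):
        n.predecessor = cand

def stabilize(n):
    """Periodic check: adopt the successor's predecessor if it sits between."""
    x = n.successor.predecessor
    if x is not None and between(x.id, n.id, n.successor.id):
        n.successor = x
    notify(n.successor, n)

# A stable three-node ring: 1 -> 20 -> 45 -> 1
a, b, c = Node(1), Node(20), Node(45)
for node, s, p in [(a, b, c), (b, c, a), (c, a, b)]:
    node.successor, node.predecessor = s, p

# Node 32 joins: predecessor nil, successor set as find_successor(32) would.
d = Node(32)
d.successor = c

for _ in range(3):          # a few periodic stabilization rounds
    for n in (a, b, c, d):
        stabilize(n)

assert b.successor is d and d.successor is c   # 20 -> 32 -> 45 repaired
assert c.predecessor is d and d.predecessor is b
```

After the first round, node 45 learns its new predecessor via notify; in the second round, node 20 adopts 32 as its successor via stabilize. No global coordination is needed.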

Node Failures
For failure recovery, queries must still succeed while the system stabilizes, which requires nodes to have correct knowledge of their successors. To that end, every node keeps a list of its r nearest successors.

A Quick Note on Pastry
The Chord and Pastry designs are similar for the most part. The notable difference between the two is that Pastry provides locality while Chord does not.
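The successor list is easy to picture: each node records the next r nodes clockwise, and on a failed successor it falls back to the next live entry. The ring below (with r = 3) is a made-up example:

```python
def r_successors(n, ids, r=3):
    """The r nodes that follow n clockwise on the ring."""
    s = sorted(ids)
    i = s.index(n)
    return [s[(i + k) % len(s)] for k in range(1, r + 1)]

ids = [1, 8, 14, 21, 32, 38]
assert r_successors(8, ids) == [14, 21, 32]

# If the immediate successor (14) fails, fall back to the next live entry.
alive = set(ids) - {14}
next_live = next(x for x in r_successors(8, ids) if x in alive)
assert next_live == 21
```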

CAN
CAN uses a d-dimensional hyperspace to distribute files. A node is assigned to a point in the hyperspace using d hash functions. The hyperspace is divided into partitions according to node placement, which should be uniformly distributed thanks to the hash functions.