Peer-to-Peer (P2P) Distributed Storage. Dennis Kafura CS5204 Operating Systems

Similar documents
Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Distributed Hash Tables

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..

Peer-to-Peer Systems and Distributed Hash Tables

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

INF5070 media storage and distribution systems. to-peer Systems 10/

Distributed Hash Tables: Chord

Distributed File Systems: An Overview of Peer-to-Peer Architectures. Distributed File Systems

A Survey of Peer-to-Peer Content Distribution Technologies

Architectures for Distributed Systems

CS 268: Lecture 22 DHT Applications

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

DISTRIBUTED SYSTEMS CSCI 4963/ /4/2015

Peer-to-peer computing research a fad?

Peer to Peer I II 1 CS 138. Copyright 2015 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved.

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search

Chord : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications

08 Distributed Hash Tables

CSE 124 Finding objects in distributed systems: Distributed hash tables and consistent hashing. March 8, 2016 Prof. George Porter

Providing File Services using a Distributed Hash Table

Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

Introduction to Peer-to-Peer Systems

Chord: A Scalable Peer-to-peer Lookup Service For Internet Applications

CS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications

A Framework for Peer-To-Peer Lookup Services based on k-ary search

EARM: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems.

CSE 5306 Distributed Systems

Topics in P2P Networked Systems

Back-Up Chord: Chord Ring Recovery Protocol for P2P File Sharing over MANETs

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

CSE 5306 Distributed Systems. Naming

PEER-TO-PEER NETWORKS, DHTS, AND CHORD

Vivaldi Practical, Distributed Internet Coordinates

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems

Decentralized Object Location In Dynamic Peer-to-Peer Distributed Systems

Slides for Chapter 10: Peer-to-Peer Systems

Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications

Chapter 6 PEER-TO-PEER COMPUTING

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Distributed Information Processing

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Overlay networks. To do. Overlay networks. P2P evolution DHTs in general, Chord and Kademlia. Turtles all the way down. q q q

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Naming WHAT IS NAMING? Name: Entity: Slide 3. Slide 1. Address: Identifier:

CS 347 Parallel and Distributed Data Processing

Kademlia: A P2P Informa2on System Based on the XOR Metric

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Some of the peer-to-peer systems take distribution to an extreme, often for reasons other than technical.

P2P: Distributed Hash Tables

L3S Research Center, University of Hannover

Simulations of Chord and Freenet Peer-to-Peer Networking Protocols Mid-Term Report

Early Measurements of a Cluster-based Architecture for P2P Systems

Slides for Chapter 10: Peer-to-Peer Systems. From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design

Cloud Computing CS

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Naming. Distributed Systems IT332

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

A Survey on Peer-to-Peer File Systems

Large-Scale Data Stores and Probabilistic Protocols

Ken Birman. Cornell University. CS5410 Fall 2008.

Presented By: Devarsh Patel

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Scaling Out Key-Value Storage

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems

Lecture 8: Application Layer P2P Applications and DHTs

File Management. Chapter 12

Searching for Shared Resources: DHT in General

Dynamo: Key-Value Cloud Storage

Introduction to P2P Systems

Peer-to-peer systems and overlay networks

Searching for Shared Resources: DHT in General

Peer-to-Peer (P2P) Systems

: Scalable Lookup

Distributed Meta-data Servers: Architecture and Design. Sarah Sharafkandi David H.C. Du DISC

Naming. Naming. Naming versus Locating Entities. Flat Name-to-Address in a LAN

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Modern Technology of Internet

LECT-05, S-1 FP2P, Javed I.

Page 1. Key Value Storage"

Peer to Peer Systems and Probabilistic Protocols

Telematics Chapter 9: Peer-to-Peer Networks

CSE 123b Communications Software

INF5071 Performance in distributed systems: Distribution Part III

Today s class. CSE 123b Communications Software. Telnet. Network File System (NFS) Quick descriptions of some other sample applications

Flooded Queries (Gnutella) Centralized Lookup (Napster) Routed Queries (Freenet, Chord, etc.) Overview N 2 N 1 N 3 N 4 N 8 N 9 N N 7 N 6 N 9

Ivy: A Read/Write Peer-to-Peer File System

Unit 8 Peer-to-Peer Networking

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Chapter 10: Peer-to-Peer Systems

A Comparison Of Replication Strategies for Reliable Decentralised Storage

Secure Distributed Storage in Peer-to-peer networks

Overlay and P2P Networks. Introduction and unstructured networks. Prof. Sasu Tarkoma

Load Sharing in Peer-to-Peer Networks using Dynamic Replication

CIS 700/005 Networking Meets Databases

Scaling KVS. CS6450: Distributed Systems Lecture 14. Ryan Stutsman

Subway : Peer-To-Peer Clustering of Clients for Web Proxy

Transcription:

Peer-to-Peer (P2P) Distributed Storage Dennis Kafura CS5204 Operating Systems 1

Peer-to-Peer Systems Definition: Peer-to-peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities, and all communication is symmetric. Rowstron- Popular Examples: Napster Gnutella Goals (from Dabek, et. al.) Symmetric and decentralized Operate with unmanaged voluntary participants Fast location of data Tolerate frequent joining/leaving by servers Balanced load CS 5204 Operating Systems 2

CFS: Properties Decentralized control (use ordinary Internet hosts) Scalability (overhead at most logarithmic in the number of servers) Availability (placement of replicas on unrelated servers) Load balance (block distribution and caching) Persistence (renewable lifetimes) Quotas (source-limited insertions) Efficiency (comparable to FTP access) CS 5204 Operating Systems 3

CFS: Architecture CFS Read-only file system interface client server DHash Chord Block distribution/fetching Caching/replication Quota enforcement Block lookup A generic, distributed block store CS 5204 Operating Systems 4

CFS: Content-hash indexing Each block (except for the root block of a file system) is identified by an index obtained from a hash (e.g., SHA-1) of its contents A root block is signed by the author; the index of the root block is a hash of the user s public key data block H( public key ) root block H(D) directory block D H(F) Inode block F H(B1) B1 H(B2) B2 timestamp data block signature CS 5204 Operating Systems 5

Chord: Mapping server s stores all values indexed by key k for which s is the successor of k (successor(k) is the node whose identifier is the smallest one greater than k ) each Chord server maintains two lists: a finger table for searching r immediate successors and their latency information successor H(block) H(IP address + virtual index) block server CS 5204 Operating Systems 6

Chord: Searching (1) D n3 M finger table for n1 start interval successor C A = n1 + 2 0 [A,B) n2 B = n1 + 2 1 [B,C) n3 C = n1 + 2 2 [C,D) n3 B n2 A n1 M = n1 + 2 m-1 CS 5204 Operating Systems 7

Chord: Searching (2) CS 5204 Operating Systems 8

Chord: performance CS 5204 Operating Systems 9

Chord: Adding Servers (1) Two Invariants maintained: Successor information is correct Successor(k) is responsible for key k Steps: 1. By out-of-band means, locate an existing server, n 2. Update tables Update successor/predecessor links Creates finger tables for new server Update other server s finger tables 3. Redistribute responsibility for keys to n from its successor Call higher (DHash) layer CS 5204 Operating Systems 10

Chord: Adding Servers (2) Adding a new node at 6 assuming that node 6 knows, by out-of-band means, of node 2 CS 5204 Operating Systems 11

DHash: Interface put_h(block) stores block using content-hashing put_s(block, pubkey) stores block as a root block; key is hash of pubkey get(key) finds/returns block associated with key CS 5204 Operating Systems 12

DHash: replication Places replicas on k servers following successor Note: each Chord server maintains a list of r immediate successors. By keeping r >= k, it is easy for DHash to determine replica locations Existence of replicas eases reallocation when node leaves the system By fetching the successor list from Chord, the DHash layer can select the most efficient node from which to access a replica of a desired block CS 5204 Operating Systems 13

DHash: caching, load balancing, quotas Caching is effective because searches from different clients converge toward the end of the search Virtual servers hosted on one machine allow for more capable machines to store a larger portion of the identifier space Each server enforces a fixed, per-ip address quota on publishing nodes CS 5204 Operating Systems 14

DHash: replication and caching server target server identifier replicas CS 5204 Operating Systems 15