Peer-to-peer computing research: a fad? Frans Kaashoek (kaashoek@lcs.mit.edu), NSF Project IRIS, http://www.project-iris.net (Berkeley, ICSI, MIT, NYU, Rice)

What is a P2P system? [Diagram: several nodes connected over the Internet.] A distributed system architecture with no centralized control, nodes that are symmetric in function, and a large number of unreliable nodes, enabled by technology improvements.

P2P: an exciting social development. Internet users cooperating to share, for example, music files: Napster, Gnutella, Morpheus, KaZaA, etc. Lots of attention from the popular press: "the ultimate form of democracy on the Internet"; "the ultimate threat to copyright protection on the Internet."

How to build robust services? Many critical services use the Internet: hospitals, government agencies, etc. These services need to be robust against node and communication failures, load fluctuations (e.g., flash crowds), and attacks (including DDoS).

Example: a peer-to-peer data archiver. Back up your hard disk to other users' machines. Why? Backup is usually difficult, and many machines have lots of spare disk space. Requirements for a cooperative archiver: divide data evenly among many computers, find the data again, don't lose any data, and deliver high performance (backups are big). More challenging than sharing music!

The promise of P2P computing: reliability (no central point of failure, many replicas, geographic distribution); high capacity through parallelism (many disks, many network connections, many CPUs); automatic configuration; useful in both public and proprietary settings.

Traditional distributed computing: client/server. [Diagram: a server and several clients connected over the Internet.] A successful architecture, and it will continue to be so, but tremendous engineering is necessary to make server farms scalable and robust.

Application-level overlays. [Diagram: overlay nodes at several sites connected across multiple ISPs.] There is one overlay per application; the nodes are decentralized, but the NOC is centralized. P2P systems are overlay networks without central control.

Distributed hash table (DHT). [Layered architecture: a distributed application (e.g., Backup) calls put(key, data) and get(key) on the distributed hash table (e.g., DHash), which in turn calls lookup(key) on a lookup service (e.g., Chord) to obtain the IP address of the node responsible for the key.] A DHT distributes data storage over perhaps millions of nodes, and many applications can use the same DHT infrastructure.
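A minimal sketch of this layering in Python, assuming a lookup service (such as Chord) that maps a key to the node responsible for it; the node handle, its store/fetch RPCs, and the hashing details are placeholders for illustration, not the actual DHash/Chord code:

```python
import hashlib

def key_id(key: bytes) -> int:
    """Hash an application key into the DHT's identifier space (160-bit, as in Chord)."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")

class DHT:
    """Storage layer (DHash-like) built on top of a lookup service (Chord-like)."""

    def __init__(self, lookup_service):
        self.lookup = lookup_service   # lookup(key_id) -> node handle (hypothetical interface)

    def put(self, key: bytes, data: bytes) -> None:
        node = self.lookup(key_id(key))    # find the node responsible for this key
        node.store(key_id(key), data)      # placeholder RPC to that node

    def get(self, key: bytes) -> bytes:
        node = self.lookup(key_id(key))
        return node.fetch(key_id(key))     # placeholder RPC
```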

A DHT has a good interface: put(key, value) and get(key) → value. A simple interface! The API supports a wide range of applications; the DHT imposes no structure or meaning on keys; key/value pairs are persistent and global; and you can store keys inside other DHT values, and thus build complex data structures.
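To illustrate the last point, a hedged sketch of building a linked structure by storing one value's key inside another value; it assumes only the put/get interface above (an in-memory stand-in is included for testing), and the key derivation and encoding are illustrative choices, not anything from the talk:

```python
import hashlib, json

class FakeDHT:
    """In-memory stand-in with the put/get interface, just for local testing."""
    def __init__(self):
        self.store = {}
    def put(self, key, value):
        self.store[key] = value
    def get(self, key):
        return self.store[key]

def append_block(dht, prev_key, payload: str) -> str:
    """Store a block whose value embeds the key of the previous block,
    building a linked list entirely out of key/value pairs."""
    record = json.dumps({"prev": prev_key, "payload": payload}).encode()
    key = hashlib.sha1(record).hexdigest()   # content-derived key (one common convention)
    dht.put(key, record)
    return key

def walk(dht, head_key):
    """Follow the embedded keys to traverse the whole list, newest first."""
    key = head_key
    while key is not None:
        record = json.loads(dht.get(key))
        yield record["payload"]
        key = record["prev"]

dht = FakeDHT()
head = append_block(dht, None, "first")
head = append_block(dht, head, "second")
print(list(walk(dht, head)))   # ['second', 'first']
```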

A DHT makes a good shared infrastructure. Many applications can share one DHT service, much as applications share the Internet. This eases deployment of new applications, pools resources from many participants, is efficient due to statistical multiplexing, and is fault-tolerant due to geographic distribution.

Many applications for DHTs: streaming [SplitStream, ...]; content distribution networks [Coral, Squirrel, ...]; file systems [CFS, OceanStore, PAST, Ivy, ...]; archival/backup stores [HiveNet, Mojo, Pastiche]; censor-resistant stores [Eternity, FreeNet, ...]; DB query and indexing [PIER, ...]; event notification [Scribe]; naming systems [ChordDNS, Twine, ...]; communication primitives [I3, ...]. The common thread: data is location-independent.

DHT implementation challenges: 1. scalable lookup; 2. handling failures; 3. network-awareness for performance; 4. data integrity; 5. coping with systems in flux; 6. balancing load (flash crowds); 7. robustness with untrusted participants; 8. heterogeneity; 9. anonymity; 10. indexing. Goal: simple, provably-good algorithms. This talk covers the first three.

1. The lookup problem: how do you find the node responsible for a key? [Diagram: a publisher calls put(key, data) and a client calls get(key) across the Internet, with nodes N1 through N6 in between.]

Centralized lookup (Napster): a central server knows where all the keys are. [Diagram: the client sends lookup(key) to the central DB, which answers "key is at N4".] Simple, but O(N) state for the server, and the server can be attacked (a lawsuit killed Napster).

Flooding queries (Gnutella): look up by asking every node. [Diagram: the client's lookup(key) is flooded from node to node until it reaches the node holding the key/value.] Robust but expensive: O(N) messages per lookup.
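A hedged sketch of Gnutella-style flooding, assuming each node only knows its neighbors and has a local key/value store; the TTL bound, the node attributes, and the recursive formulation are illustrative simplifications:

```python
def flood_lookup(node, key, ttl=7, seen=None):
    """Flood a query: check this node, then forward to all its neighbors.

    Each contacted node costs one message, so in the worst case the query
    reaches every node, i.e. O(N) messages per lookup.
    """
    if seen is None:
        seen = set()
    if node in seen or ttl == 0:
        return None
    seen.add(node)
    if key in node.local_store:            # hypothetical per-node key/value store
        return node.local_store[key]
    for neighbor in node.neighbors:        # hypothetical neighbor list
        result = flood_lookup(neighbor, key, ttl - 1, seen)
        if result is not None:
            return result
    return None
```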

Routed lookups: each node has a numeric identifier (ID), and lookup(key) is routed through the ID space. [Diagram: Client1's put(key=50, data=...) and Client2's get(key=50) are routed hop by hop through nodes with IDs between 5 and 100 until they reach the node responsible for key 50.]

Routing algorithm goals: fair (balanced) key range assignments; an easy-to-maintain routing table (dead nodes lead to timeouts); a small number of hops to route a message; low stretch; and robustness despite rapid change. Solutions fall into two classes: small table and O(log N) hops (CAN, Chord, Kademlia, Pastry, Tapestry, Koorde, etc.) or big table and one/two hops (Kelips, EpiChord, etc.).

Chord key assignments: each node has a 160-bit ID; the ID space is circular; data keys are also IDs; and a key is stored on the next higher node. This gives good load balance and makes it easy to find keys, if slowly. [Diagram: circular ID space with nodes N32, N60, N90, N105 and keys K5, K20, K80; N90 is responsible for keys K61 through K90.]
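A minimal sketch of this successor rule (consistent hashing), assuming a sorted list of node IDs on a circular space; the 160-bit size follows the slide, everything else is illustrative:

```python
import bisect

ID_BITS = 160
ID_SPACE = 2 ** ID_BITS

def successor(node_ids: list[int], key: int) -> int:
    """Return the ID of the node responsible for `key`: the next node at or
    after the key on the circle, wrapping around past the highest ID."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key % ID_SPACE)
    return ids[i % len(ids)]   # wrap to the lowest ID if the key is past the last node

# Example matching the slide: node N90 is responsible for keys K61..K90.
nodes = [32, 60, 90, 105]
assert successor(nodes, 80) == 90
assert successor(nodes, 106) == 32   # wraps around the circle
```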

Chord's routing table (finger table) lists nodes 1/2 of the way around the circle, 1/4 of the way around, 1/8 of the way around, and so on, down to the next node around the circle. The table is small: log N entries. [Diagram: fingers of node N80 at distances 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the circle.]

Chord lookups take O(log N) hops: each step goes at least halfway (in ID space) to the destination, so lookups are fast, finishing in log N steps. [Diagram: nodes N5, N10, N20, N32, N60, N80, N99, N110 on the circle; node N32 looks up key K19, which is stored at N20, halving the remaining ID distance at each hop.]
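A simplified, self-contained sketch of finger-table routing under these rules, using a small 8-bit ID space so the slide's example IDs fit; real Chord uses 160-bit IDs, maintains successor pointers, and runs the lookup over RPC, all of which is omitted here:

```python
import bisect

ID_BITS = 8                      # small space for the example; real Chord uses 160 bits
ID_SPACE = 2 ** ID_BITS

def successor(ids, key):
    """Node responsible for key: next node ID at or after the key, wrapping around."""
    i = bisect.bisect_left(ids, key % ID_SPACE)
    return ids[i % len(ids)]

def dist(a, b):
    """Clockwise distance from a to b on the circle."""
    return (b - a) % ID_SPACE

def fingers(node_id, ids):
    """Finger i is the successor of node_id + 2^i: nodes roughly 1/2, 1/4, 1/8, ...
    of the way around the circle, giving about log N distinct entries."""
    return [successor(ids, (node_id + 2 ** i) % ID_SPACE) for i in range(ID_BITS)]

def lookup(start, key, ids):
    """Greedily route toward the node responsible for key; each hop at least
    halves the remaining clockwise distance, so the hop count is O(log N)."""
    ids = sorted(ids)
    target = successor(ids, key)
    current, hops = start, 0
    while current != target:
        # farthest finger that does not overshoot the target (clockwise)
        current = max((f for f in fingers(current, ids)
                       if dist(current, f) <= dist(current, target)),
                      key=lambda f: dist(current, f))
        hops += 1
    return current, hops

nodes = [5, 10, 20, 32, 60, 80, 99, 110]
print(lookup(32, 19, nodes))     # key K19 is stored at N20, reached in a couple of hops
```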

2. Handling failures: redundancy. Each node knows about the next r nodes on the circle, and each key is stored by the r nodes after it on the circle. To save space, each node stores only a piece of the block; collecting half the pieces is enough to reconstruct the block. [Diagram: key K19 is stored at N20 and replicated at its successors N32 and N40.]
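A sketch of the replica-placement part (successor-list replication); the erasure-coded fragments the slide mentions would replace full copies at each of these nodes, but the coding itself is omitted here for brevity:

```python
import bisect

def replica_nodes(node_ids: list[int], key: int, r: int) -> list[int]:
    """Return the r nodes after the key on the circle; each stores a copy
    (or, as on the slide, an erasure-coded fragment) of the block."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)
    return [ids[(i + j) % len(ids)] for j in range(r)]

# Matching the slide's picture: key K19 is held by N20 and its successors.
print(replica_nodes([5, 10, 20, 32, 40, 60, 80, 99, 110], 19, 3))  # [20, 32, 40]
```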

Redundancy handles failures. Experiment: 1000 DHT nodes, average of 5 runs, 6 replicas for each key; kill a fraction of the nodes, then measure how many lookups fail. All replicas must be killed for a lookup to fail. [Plot: fraction of failed lookups vs. fraction of failed nodes.]

3. Reducing Chord lookup delay. [Diagram: nodes N20, N40, N41, N80.] N20's routing table may point to nodes that are distant in the network; any node with a closer ID could be used in the route instead, so knowing about network proximity could help performance. Key challenge: avoid O(N^2) pings.

Estimate latency using a synthetic coordinate system. Each node estimates its position; position = (x, y): synthetic coordinates whose x and y units are time (milliseconds). The distance between two nodes' coordinates predicts their network latency. Challenges: triangle-inequality violations, etc.

Vivaldi synthetic coordinates. Each node starts with a random, incorrect position. Each node pings a few other nodes to measure network latency (distance), and then moves to make the measured distances match the coordinate distances, minimizing the force in a spring network. [Diagrams: nodes at initial coordinates such as (2,3), (1,2), (0,1), (3,0); measured latencies of 1 ms and 2 ms between pairs; the nodes relax toward positions consistent with those distances.]
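A rough sketch of the spring-relaxation idea in two dimensions; the fixed step size, round count, and random neighbor sampling are illustrative choices, not Vivaldi's actual adaptive-timestep algorithm:

```python
import math, random

def relax(coords, latency, rounds=200, step=0.05):
    """coords: node -> [x, y] (milliseconds); latency: sorted (a, b) tuple -> measured RTT in ms.
    Each round, every node nudges itself along the 'spring' between its current
    coordinate distance and the measured latency to a sampled neighbor."""
    nodes = list(coords)
    for _ in range(rounds):
        for a in nodes:
            b = random.choice([n for n in nodes if n != a])
            rtt = latency[tuple(sorted((a, b)))]
            dx = coords[a][0] - coords[b][0]
            dy = coords[a][1] - coords[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            error = dist - rtt                      # spring is stretched (+) or compressed (-)
            # move a small step along the unit vector toward/away from b
            coords[a][0] -= step * error * dx / dist
            coords[a][1] -= step * error * dy / dist
    return coords

# Three nodes with pairwise RTTs of 1 ms and 2 ms, echoing the slide's example.
coords = {"A": [2.0, 3.0], "B": [1.0, 2.0], "C": [0.0, 1.0]}
rtt = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0}
print(relax(coords, rtt))
```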

Vivaldi in action: execution on 86 PlanetLab Internet hosts, using 3D coordinates. Each host pings only a few other random hosts, and most hosts find useful coordinates after a few dozen pings.

Vivaldi vs. network coordinates: Vivaldi's coordinates match geography well, although over-sea distances shrink (faster than over-land) and the orientation of Australia and Europe comes out wrong. Simulations confirm these results for larger networks.

Vivaldi predicts latency well. [Scatter plot: actual latency (ms) vs. predicted latency (ms), 0 to 600 ms on both axes; points cluster around the y = x line, with NYU and AUS hosts called out.]

DHT fetches with Vivaldi (1): choose the lowest-latency copy of the data. [Bar chart: fetch time (ms), split into lookup and download, without Vivaldi vs. with replica selection.]

DHT fetches with Vivaldi (2): choose the lowest-latency copy of the data and route the Chord lookup through nearby nodes. [Bar chart: fetch time (ms), split into lookup and download, for without Vivaldi, with replica selection, and with proximity routing.]

DHT implementation summary: Chord for looking up keys; replication at successors for fault tolerance; fragmentation and erasure coding to reduce storage space; and the Vivaldi network coordinate system for server selection and proximity routing.

Backup system on a DHT: store file system image snapshots as hash trees. Daily images can be accessed directly, yet images share storage for common blocks, so there is only incremental storage cost. Data is encrypted. A user-level NFS server parses the file system images to present the dump hierarchy. The application is ignorant of the DHT's challenges: the DHT is just a reliable block store.
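A hedged sketch of the sharing idea on top of a content-addressed block store: each block is stored under the hash of its contents, so identical blocks in successive snapshots map to the same key and are stored only once. The chunk size, the flat one-level "tree", and the omission of encryption are illustrative simplifications, and the dht is any object with the put/get interface above:

```python
import hashlib

CHUNK = 8192  # illustrative block size

def put_block(dht, data: bytes) -> str:
    """Store a block under the hash of its (possibly encrypted) contents."""
    key = hashlib.sha1(data).hexdigest()
    dht.put(key, data)
    return key

def snapshot(dht, image: bytes) -> str:
    """Split a disk image into blocks, store each, and store the list of block
    keys as the root of a (one-level) hash tree. Blocks unchanged since the
    previous snapshot hash to the same key, so they cost no extra storage."""
    keys = [put_block(dht, image[i:i + CHUNK]) for i in range(0, len(image), CHUNK)]
    return put_block(dht, "\n".join(keys).encode())   # root key identifies the snapshot

def restore(dht, root_key: str) -> bytes:
    keys = dht.get(root_key).decode().splitlines()
    return b"".join(dht.get(k) for k in keys)
```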

Future work. DHTs: improve performance; handle untrusted nodes. Vivaldi: does it scale to larger and more diverse networks? Apps: we need lots of interesting applications.

Philosophical questions. How decentralized should systems be? Gnutella versus a content distribution network, or a bit of both (e.g., CDNs)? Why does the distributed systems community have more problems with decentralized systems than the networking community? "A distributed system is a system in which a computer you don't know about renders your own computer unusable," versus the Internet (BGP, NetNews).

Related work. Lookup algorithms: CAN, Kademlia, Koorde, Pastry, Tapestry, Viceroy, ... DHTs: DOLR, PAST, OpenHash. Network coordinates and springs: GNP, Hoppe's mesh relaxation. Applications: Ivy, OceanStore, Pastiche, Twine, ...

Conclusions. Peer-to-peer promises some great properties, and DHTs are a good way to build peer-to-peer applications: easy to program; a single, shared infrastructure for many applications; robust in the face of failures; scalable to large numbers of servers; self-configuring; and capable of high performance. http://www.project-iris.net