Vivaldi: Practical, Distributed Internet Coordinates

Vivaldi: Practical, Distributed Internet Coordinates. Russ Cox, Frank Dabek, Frans Kaashoek, Robert Morris, and many others. rsc@mit.edu. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. January 11, 2005; presented February 7, 2005 at IDA-CCS, Bowie, Maryland.

Overview Circa 2001: file sharing networks are extremely popular; they harness huge numbers of machines but don't scale well (at least asymptotically). Academics study how to build scalable peer-to-peer systems: Chord, Pastry, OceanStore, ... How can we make them perform better in practice? Predict round trip times with Vivaldi.

P2P circa 2001: an exciting social development. Internet users cooperating to share, for example, music files: Napster, Gnutella, Morpheus, KaZaA, ... Lots of attention from the popular press (and the RIAA!). The ultimate form of democracy on the Internet. The ultimate threat to copyright protection on the Internet. Not much new technologically.

What is a P2P System? A system without any central servers: every node is a server, no particular node is vital to the network, and all nodes have the same functionality. Huge numbers of nodes, many node failures. Enabled by technology improvements.

P2P: useful for building reliable services? Many critical services use the Internet: hospitals, some government agencies, etc. (but not other Agencies). Non-Internet systems exist at large scales too: corporations, government agencies, etc. Can we build large-scale robust distributed services, despite node and communication failures, load fluctuations (e.g., flash crowds), and attacks (including DDoS)?

The promise of P2P computing. Reliability: no central point of failure, many replicas, geographic distribution. High capacity through parallelism: many disks, many network connections, many CPUs. Automatic configuration. Useful in public and proprietary settings.

Distributed Hash Table. A building block for principled peer-to-peer systems. DHTs provide a location service: map a key to a set of machines.

DHT as Location and Rendezvous Service. Put(K,D) @ t = 0; Get(K) @ t = 1.

DHT as Location and Rendezvous Service. Many useful primitives can be and have been built on top of this rendezvous: file sharing, web caches, archival/backup storage, censor-resistant stores, DB query and indexing, event notification, naming systems, communication primitives. We will concentrate on storage. But first, how do we implement a DHT?
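As a concrete (and deliberately toy) sketch of the rendezvous interface above: put(K, D) stores data under key K and get(K) retrieves it. Here a single dictionary stands in for the whole network of machines; a real DHT spreads the mapping across nodes.

```python
# Toy illustration of the DHT put/get rendezvous interface. This is a
# made-up sketch, not any real system's API: one dict stands in for the
# network, and replication/routing are ignored.
class ToyDHT:
    def __init__(self):
        self.store = {}

    def put(self, key, data):
        """Append data under key (several writers may share one key)."""
        self.store.setdefault(key, []).append(data)

    def get(self, key):
        """Return everything stored under key, or an empty list."""
        return self.store.get(key, [])

dht = ToyDHT()
dht.put("K", "D")              # Put(K,D) @ t = 0
assert dht.get("K") == ["D"]   # Get(K) @ t = 1
```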

Chord: One Peer-to-Peer System. Developed at MIT in 2000. The recipe differs from other DHTs, but the flavor is the same. Assign nodes and keys big identifiers (SHA-1 hashes). Traverse the identifier space during lookup. Provide lookup first; layer storage on top.

Chord: Successor Defines Ownership. The identifier space is arranged in a logical ring. A key's successor is the first node clockwise at or after the key on the ring; that node is the owner of the key. [Figure: ring with nodes N1, N43, N48 and key K45.]
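The successor rule can be sketched in a few lines. This is an illustrative reimplementation on a tiny 6-bit ring, not the paper's code; Chord actually uses 160-bit SHA-1 identifiers, and the node IDs below are the ones from the slide.

```python
import hashlib

RING_BITS = 6  # tiny ring (identifiers 0..63) for readability

def ident(name):
    """Map a name to a ring identifier (SHA-1 hash truncated to the ring)."""
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return h % (1 << RING_BITS)

def successor(key, node_ids):
    """First node clockwise at or after `key`; wraps around the ring."""
    ring = sorted(node_ids)
    for n in ring:
        if n >= key:
            return n
    return ring[0]  # wrapped past the largest identifier

nodes = [1, 43, 48]                 # N1, N43, N48 from the slide
assert successor(45, nodes) == 48   # K45 is owned by N48
assert successor(50, nodes) == 1    # wraps around to N1
```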

Chord: Successors Ensure Correctness. Nodes maintain a list of r successors. On failure of a node n, its successor takes over n's keys. Nodes ping their predecessors regularly to detect failure. The ring can be traversed by following successors, though that is a bit slow if there are millions of nodes. [Figure: ring with N1, N43, N48 and key K45.]

Chord: Fingers Speed Lookups. Nodes maintain pointers to larger hops around the ring, like skip lists in a linked list: O(log N) exponentially-distributed fingers at 1/2, 1/4, 1/8, 1/16, 1/32, ... of the way around. [Figure: ring showing the fingers of N1, with key K45 near N48.]

Chord: Lookup Requires log N Hops. Each hop halves the distance to the key. [Figure: a lookup for K47 hops around the ring; N1 answers that N50 is the successor of K47.]
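The halving behavior can be sketched on a toy ring. This is an illustrative reimplementation, not the paper's code: the 8-bit identifier space and the evenly spaced node placement are made up for readability.

```python
BITS = 8
RING = 1 << BITS          # identifier space 0..255

def between(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return x != a and (x - a) % RING <= (b - a) % RING

def succ_of(key, ring_nodes):
    """First node at or after key, wrapping around the ring."""
    for n in ring_nodes:
        if n >= key % RING:
            return n
    return ring_nodes[0]

def fingers(node, ring_nodes):
    """Finger i points at the successor of node + 2^i."""
    return [succ_of((node + (1 << i)) % RING, ring_nodes) for i in range(BITS)]

def lookup(start, key, nodes):
    """Hop along closest-preceding fingers; count the hops taken."""
    ring_nodes = sorted(nodes)
    target = succ_of(key, ring_nodes)      # true owner of the key
    node, hops = start, 0
    while node != target:
        best = node
        for f in fingers(node, ring_nodes):
            # the closest finger that still precedes the key jumps furthest
            if between(f, node, key) and (key - f) % RING < (key - best) % RING:
                best = f
        node = best if best != node else target   # final hop: the successor
        hops += 1
    return node, hops

nodes = list(range(0, 256, 16))            # 16 evenly spaced nodes
owner, hops = lookup(0, 200, nodes)
assert owner == 208 and hops <= 8          # ~log2(16) hops, not 16
```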

DHash: a Storage Infrastructure. Now we have lookup; add storage (and then apps). To fetch a block in DHash: the block is replicated on the r successors of its key; use Chord (the DHT) to find the servers responsible for the block; then download the block from one of those servers. Which one?

Problem: Predicting Round Trip Times on the Internet. Example: server selection in a system where there is no centralized infrastructure, nodes act as both servers and clients, there are many thousands of nodes, exchanges with a server are short, and the server choice changes for each exchange. We want to choose the server with the lowest round trip time to the client. How? [Figure: client C choosing among servers S1-S5 across the Internet.]

Possible Solutions. We can avoid predictions, at a cost in time or bandwidth: measure RTT on demand, measure RTT in advance, or talk to multiple servers at once. Or we can predict RTTs using synthetic coordinates, as in GNP (Infocom 2002).

Synthetic Coordinates with GNP. GNP assigns Euclidean coordinates to nodes such that coordinate distance predicts round trip time. Node A pings a fixed set of landmarks to compute its own position: given the landmark coordinates (L1 at (40,320), L2 at (60,180), L3 at (160,250), L4 at (250,160), L5 at (280,300)) and the measured RTTs to them (117 ms, 201 ms, 110 ms, 223 ms, 143 ms), A solves for coordinates whose distances to the landmarks best match the measured RTTs. Node B does the same. The RTT between A and B is then predicted by the distance between their coordinates, without direct measurement.
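The landmark-fitting step can be sketched as a least-squares minimization. A minimal sketch, with assumptions: the landmark coordinates are the slide's, but the node's "true" position, the synthetic RTTs derived from it, and the plain gradient-descent solver (step size, iteration count) are all made up for illustration; GNP's actual solver differs.

```python
import math

# Landmark coordinates from the slide; the true node position is invented
# so we can generate geometrically consistent RTTs and check the fit.
landmarks = [(40, 320), (60, 180), (160, 250), (250, 160), (280, 300)]
true_pos = (120, 220)
rtts = [math.dist(true_pos, l) for l in landmarks]   # synthetic RTTs (ms)

def fit(landmarks, rtts, steps=2000, lr=0.05):
    """Gradient descent on sum_i (dist(p, L_i) - rtt_i)^2."""
    x, y = 150.0, 150.0                              # arbitrary start
    for _ in range(steps):
        gx = gy = 0.0
        for (lx, ly), r in zip(landmarks, rtts):
            d = math.hypot(x - lx, y - ly) or 1e-9   # avoid divide-by-zero
            err = d - r
            gx += 2 * err * (x - lx) / d
            gy += 2 * err * (y - ly) / d
        x -= lr * gx
        y -= lr * gy
    return x, y

x, y = fit(landmarks, rtts)
assert math.dist((x, y), true_pos) < 5.0   # recovers the position closely
```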

Vivaldi Overview. Vivaldi is a decentralized method for computing synthetic coordinates. It piggybacks on application traffic: a node updates its own coordinates in response to each sample, and each node need only contact a small fraction of the other nodes.

Vivaldi Example. Follow node A through a sequence of communications. A exchanges an application message with B and obtains B's coordinates (110, 250) and the measured RTT (20 ms). A computes its distance to B in coordinate space, then adjusts its coordinates so that the coordinate distance matches the actual RTT. A then exchanges a message with C, obtaining C's coordinates (85, 125) and an RTT of 50 ms. A again computes the coordinate distance to C and adjusts its coordinates so the distance matches the actual RTT. (Now A is the wrong distance from B.)

Challenges of Decentralization Without centralized control, must consider: will the system converge to an accurate coordinate set? how long will the system take to converge? will the system be disturbed by new nodes joining the system?

Tuning Vivaldi: Convergence. Run Vivaldi on round trip times derived from a grid. As described, the algorithm never converges. To cause convergence, damp motion. To speed convergence, vary the damping with an estimate of prediction accuracy.

Tuning Vivaldi: Naive Newcomers. Run Vivaldi on round trip times derived from a grid. Blue nodes start first and stabilize; red nodes then join the system. High-accuracy nodes are displaced by new, low-accuracy nodes joining the system. To avoid this, vary the damping with the ratio of the local node's accuracy to the sampled node's accuracy.

Vivaldi Algorithm. Given the coordinates, round trip time, and accuracy estimate of a sampled node: update the local accuracy estimate; compute the ideal location; compute the damping constant δ using the local and remote accuracy estimates; move δ of the way toward the ideal location. [Figure: B sends A an app msg carrying rtt, coords, and accuracy; A moves δ · (rtt − dist) of the way from its current location toward the ideal location.]
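One update step of this recipe can be sketched as follows. A minimal sketch with assumptions: the constants and the moving-average form of the accuracy update are in the spirit of the SIGCOMM '04 paper but are not values stated in this talk, and real Vivaldi tracks errors slightly differently.

```python
import math

C_E = 0.25   # weight for the local accuracy (error) moving average (assumed)
C_C = 0.25   # cap on the adaptive damping constant delta (assumed)

def vivaldi_update(local_xy, local_err, remote_xy, remote_err, rtt):
    """Return the local node's new coordinates and accuracy estimate."""
    dx = local_xy[0] - remote_xy[0]
    dy = local_xy[1] - remote_xy[1]
    dist = math.hypot(dx, dy) or 1e-9
    ux, uy = dx / dist, dy / dist              # unit vector away from remote

    # weight this sample by relative confidence in the two accuracy estimates
    w = local_err / (local_err + remote_err)

    # step 1: update local accuracy estimate toward this sample's error
    sample_err = abs(dist - rtt) / rtt
    local_err = sample_err * C_E * w + local_err * (1 - C_E * w)

    # step 3: damping constant from the accuracy estimates
    delta = C_C * w

    # steps 2 and 4: the ideal location is rtt away from the remote node;
    # move delta of the way toward it
    nx = local_xy[0] + delta * (rtt - dist) * ux
    ny = local_xy[1] + delta * (rtt - dist) * uy
    return (nx, ny), local_err

# repeated samples from a neighbor at (40, 0) with a 100 ms RTT push the
# node out until the coordinate distance matches the measured RTT
pos, err = (0.0, 0.0), 1.0
for _ in range(50):
    pos, err = vivaldi_update(pos, err, (40.0, 0.0), 0.5, 100.0)
assert abs(math.dist(pos, (40.0, 0.0)) - 100.0) < 5.0
```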

Evaluating Synthetic Coordinates on the Internet. We cannot evaluate by comparing to a correct coordinate set; instead, evaluate the predictions made using a coordinate set. Predictions of the Internet will never be perfect: violations of the triangle inequality, ...

Evaluating Vivaldi on the Internet. How accurate are Vivaldi's predictions? How quickly does Vivaldi converge to a coordinate set? How quickly can Vivaldi adapt to network changes? How does the choice of coordinate space affect error? How does Vivaldi work in real-world apps?

Evaluation Methodology. Use a simulator seeded with real Internet measurements: pairwise RTTs for 192 PlanetLab nodes, used as the input matrix for running the various algorithms. Each Vivaldi node queries others as fast as it can, with one message outstanding at a time; each node has a small fixed neighbor set.

Vivaldi's Absolute Prediction Error. Look at the distribution of absolute prediction error, defined as |actual RTT − predicted RTT|, over all node pairs in the system. [Figure: CDF of Vivaldi's absolute error, Pr[node pair error <= x] versus absolute error from 0 to 200 ms.]

Vivaldi's Relative Prediction Error. Look at relative error, defined as |actual RTT − predicted RTT| / min(actual RTT, predicted RTT), over all node pairs in the system. [Figure: CDF of Vivaldi's relative error, Pr[node pair error <= x] versus relative error from 0 to 3.]
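As a concrete check on the metric, the definition above translates directly into code (the sample values here are made up):

```python
# Relative prediction error as defined on the slide: symmetric, so a
# prediction that is 2x too high scores the same as one 2x too low.
def rel_error(actual, predicted):
    return abs(actual - predicted) / min(actual, predicted)

assert rel_error(100.0, 50.0) == 1.0   # predicted half the actual RTT
assert rel_error(50.0, 100.0) == 1.0   # predicted double the actual RTT
assert rel_error(80.0, 80.0) == 0.0    # perfect prediction
```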

Vivaldi Compared to GNP on Relative Error. Compare to GNP's predictions. GNP is sensitive to landmark choice, so use the best of 64 random landmark sets. [Figure: CDF of relative error for Vivaldi and for the best GNP landmark set.]

Vivaldi's Convergence Time. Convergence time depends on the choice of δ, the damping constant. Using adaptive δ, Vivaldi converges in under 20 seconds (60 measurements per node). [Figure: absolute error (ms) versus time (s) for δ = 0.01, δ = 1.0, and adaptive δ.]

Vivaldi's Time to Adapt to Network Changes. Vivaldi nodes are always adjusting their coordinates. Test adaptation speed with a synthetic topology change: lengthen one link by a factor of ten. Vivaldi adapts in about twenty seconds. [Figure: median error (ms) versus time (sec), marked where the topology changes and later reverts.]

Other Coordinate Spaces. A priori, it's not clear why any coordinate system should fit the Internet well. GNP showed that Euclidean coordinates work well. Why do they work? Are there better coordinate systems? Obvious other candidates: globe, 3D, 4D, ...

Vivaldi's 2D Assignment for PlanetLab. Placement in 2D mirrors physical geography. [Figure: 2D coordinate plot with clusters labeled Korea, China, West Coast, East Coast, Europe, and Australia.]

Globe Coordinates vs. 2D Euclidean. Globe coordinates (latitude, longitude) place nodes on the surface of a sphere; the great-circle distance between two nodes depends on the sphere's radius. The coordinate sets end up using one part of the sphere as a rough approximation to a 2D plane. [Figure: CDF of relative error for 2D Euclidean and for spheres of radius 40 ms to 240 ms.]

Higher Euclidean Dimensions. If two dimensions are good, more should be better. Why are they better? [Figure: CDF of relative error for 2D, 3D, 4D, 5D, and 9D Euclidean coordinates.]

Higher Euclidean Dimensions Explained. In 2D, some nodes need to be farther away from all others. In 3D, these hard-to-place nodes can move up or down from the 2D plane to get away from everyone. Each new dimension adds an independent direction, which accommodates per-node overhead: server load, access links. Problem: how can we accommodate hard-to-place nodes without an arbitrary number of dimensions?

Height Vectors. Give hard-to-place nodes their own way to get away from everyone. Height vectors place nodes at some height above a 2D transit plane, directly modeling per-node overhead. The distance from (x, y, h) to (x′, y′, h′) is h + sqrt((x − x′)² + (y − y′)²) + h′.
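The height-vector distance is worth seeing in code, because unlike a Euclidean third dimension the heights never cancel: even two co-located nodes pay both access-link penalties. The coordinate values below are made up for illustration.

```python
import math

def height_dist(a, b):
    """Height-vector distance between (x, y, h) and (x', y', h')."""
    (x1, y1, h1), (x2, y2, h2) = a, b
    return h1 + math.hypot(x1 - x2, y1 - y2) + h2

core = height_dist((0, 0, 0), (30, 40, 0))     # pure core distance: 50
dsl = height_dist((0, 0, 20), (30, 40, 20))    # both behind slow links: 90
assert core == 50 and dsl == 90
# co-located nodes still pay h + h', unlike 3D Euclidean where the
# third coordinate could simply match
assert height_dist((0, 0, 20), (0, 0, 20)) == 40
```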

Height Vectors Work Well. Height vectors outperform 2D and 3D Euclidean coordinates. It works to view the Internet as a geographically-accurate core with access links attached. [Figure: CDF of relative error for height vectors versus 2D, 3D, 4D, and 9D Euclidean.]

Evaluating Vivaldi on Chord Chord performance is O(log n) but sluggish. Can Vivaldi help?

Watching Traffic. Lookups move randomly in the network: node IDs and block keys are assigned (pseudo-)randomly. [Figure: a lookup plotted on a latitude/longitude map of node locations. Node 0x123, looking for key 0xabcd, forwards through nodes 0xa000, 0xab00, and 0xabc0, which returns the successors of 0xabcd; the hops criss-cross the globe.]

Chord: Shortening Lookup Hops Using Vivaldi. Relax the definition of a finger from the immediate successor to any node in the finger's range. Given more choices, a node can choose nearer neighbors. [Figure: ring showing the range of candidates for N1's 1/2 finger during a lookup for K200.]
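The selection step can be sketched as follows. A hypothetical sketch: the node IDs and coordinates are invented, and `pick_finger` simply assumes Vivaldi coordinate distance as the RTT predictor, per the talk.

```python
import math

def predicted_rtt(a, b):
    """Vivaldi's prediction: coordinate distance approximates RTT."""
    return math.dist(a, b)

def pick_finger(candidates, my_coords):
    """Among nodes whose IDs fall in the finger's allowed range, pick the
    one with the lowest predicted RTT instead of the immediate successor."""
    return min(candidates, key=lambda c: predicted_rtt(my_coords, c[1]))

me = (0.0, 0.0)
# (node_id, vivaldi_coords) pairs, all assumed to lie in the finger's range
cands = [(201, (80.0, 10.0)), (214, (5.0, 5.0)), (230, (60.0, 40.0))]
node_id, _ = pick_finger(cands, me)
assert node_id == 214   # nearest in coordinate space, not first on the ring
```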

Chord: Lookup Using Neighbor Selection. Each hop now has a range of choices. [Figure: lookup for K200 from N1; at each hop, several candidate nodes lie in the allowed range.]

Vivaldi Improves Chord and DHash. [Figure: median latency (ms), split into lookup time and fetch time, under cumulative latency-optimization techniques: Baseline, Download, Routing, Avoid Last Hop.]

Vivaldi Improves Chord and DHash. Vivaldi improved the performance of Chord and DHash: Chord lookups can use nearby neighbors instead of hopping all over the globe, and DHash, which replicates blocks on the r successors of the block key, can download a block from the nearest replica. Net effect: Vivaldi reduced block fetch time by close to 50% on PlanetLab. Vivaldi is also used by other systems: the Bamboo distributed hash table, the SWORD resource discovery system, and others in development.

Related Work Other location techniques (IDMaps, IP2Geo) use static data (AS maps, guesses at physical location). Centralized coordinate systems (GNP, Lighthouse) need well-known landmark nodes. Decentralized coordinate systems (PIC, NPS) have been developed concurrently. Tang and Crovella (IMC 2003) analyze best Euclidean models to use; Shavitt and Tankel (Infocom 2004) suggest using hyperbolic geometries.

Conclusions. Chord is a scalable peer-to-peer system. Vivaldi accurately predicts round trip times between node pairs that are not directly measured, works without centralized infrastructure, and improves the performance of Chord and DHash. Height vectors are a promising coordinate space for the Internet.

Links. Chord home page: http://pdos.lcs.mit.edu/chord. Project IRIS (peer-to-peer research): http://project-iris.net. Email: rsc@mit.edu