A Scalable Content- Addressable Network

Similar documents
Distributed Hash Table

Distriubted Hash Tables and Scalable Content Adressable Network (CAN)

A Directed-multicast Routing Approach with Path Replication in Content Addressable Network

A Scalable Content- Addressable Network

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

Making Gnutella-like P2P Systems Scalable

Building a low-latency, proximity-aware DHT-based P2P network

A Scalable Content-Addressable Network

Relaxing Routing Table to Alleviate Dynamism in P2P Systems

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Athens University of Economics and Business. Dept. of Informatics

Query Processing Over Peer-To-Peer Data Sharing Systems

RAQ: A Range-Queriable Distributed Data Structure

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

08 Distributed Hash Tables

A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Telematics Chapter 9: Peer-to-Peer Networks

Motivation. The Impact of DHT Routing Geometry on Resilience and Proximity. Different components of analysis. Approach:Component-based analysis

The Impact of DHT Routing Geometry on Resilience and Proximity. Acknowledgement. Motivation

Architectures for Distributed Systems

CS514: Intermediate Course in Computer Systems

Peer to peer systems: An overview

Early Measurements of a Cluster-based Architecture for P2P Systems

Structured Superpeers: Leveraging Heterogeneity to Provide Constant-Time Lookup

Should we build Gnutella on a structured overlay? We believe

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Mill: Scalable Area Management for P2P Network based on Geographical Location

Application Layer Multicast For Efficient Peer-to-Peer Applications

Decentralized Object Location In Dynamic Peer-to-Peer Distributed Systems

A Framework for Peer-To-Peer Lookup Services based on k-ary search

Structured Peer-to-Peer Networks

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma

Survey of DHT Evaluation Methods

Symphony. Symphony. Acknowledgement. DHTs: : The Big Picture. Spectrum of DHT Protocols. Distributed Hashing in a Small World

Data-Centric Query in Sensor Networks

Load Sharing in Peer-to-Peer Networks using Dynamic Replication

Protocol for Tetherless Computing

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Internet Indirection Infrastructure (i3) Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, Sonesh Surana. UC Berkeley SIGCOMM 2002

ReCord: A Distributed Hash Table with Recursive Structure

Brocade: Landmark Routing on Peer to Peer Networks. Ling Huang, Ben Y. Zhao, Yitao Duan, Anthony Joseph, John Kubiatowicz

A Scalable Content-Addressable Network

CSE 5306 Distributed Systems

Distributed Meta-data Servers: Architecture and Design. Sarah Sharafkandi David H.C. Du DISC

Turning Heterogeneity into an Advantage in Overlay Routing

R/Kademlia: Recursive and Topology-aware Overlay Routing

Adaptive Load Balancing for DHT Lookups

Coordinate-based Routing:

A DHT-Based Grid Resource Indexing and Discovery Scheme

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks

EARM: An Efficient and Adaptive File Replication with Consistency Maintenance in P2P Systems.

A Super-Peer Based Lookup in Structured Peer-to-Peer Systems

Badri Nath Rutgers University

CSE 5306 Distributed Systems. Naming

Distributed Knowledge Organization and Peer-to-Peer Networks

CIS 700/005 Networking Meets Databases

Structured P2P. Complexity O(log N)

Lecture-2 Content Sharing in P2P Networks Different P2P Protocols

Effect of Links on DHT Routing Algorithms 1

Introduction to Peer-to-Peer Systems

Load Balancing in Structured P2P Systems

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..

Understanding Chord Performance

0!1. Overlaying mechanism is called tunneling. Overlay Network Nodes. ATM links can be the physical layer for IP

PChord: Improvement on Chord to Achieve Better Routing Efficiency by Exploiting Proximity

Module SDS: Scalable Distributed Systems. Gabriel Antoniu, KERDATA & Davide Frey, ASAP INRIA

Peer- Peer to -peer Systems

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

Data Replication under Latency Constraints Siu Kee Kate Ho

Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

: Scalable Lookup

CRESCENDO GEORGE S. NOMIKOS. Advisor: Dr. George Xylomenos

Searching for Shared Resources: DHT in General

Resilient GIA. Keywords-component; GIA; peer to peer; Resilient; Unstructured; Voting Algorithm

Searching for Shared Resources: DHT in General

ASPEN: An Adaptive Spatial Peer-to-Peer Network

CSE 124: QUANTIFYING PERFORMANCE AT SCALE AND COURSE REVIEW. George Porter December 6, 2017

Topologically-Aware Overlay Construction and Server Selection

Fixing the Embarrassing Slowness of OpenDHT on PlanetLab

A Survey of Peer-to-Peer Systems

A Top Catching Scheme Consistency Controlling in Hybrid P2P Network

Peer-to-Peer Systems. Chapter General Characteristics

Part 1: Introducing DHTs

Providing Administrative Control and Autonomy in Structured Peer-to-Peer Overlays

Reducing Outgoing Traffic of Proxy Cache by Using Client-Cluster

Evaluation Study of a Distributed Caching Based on Query Similarity in a P2P Network

Minimizing Churn in Distributed Systems

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

Topics in P2P Networked Systems

Distributed lookup services

EAD: An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems

IN recent years, the amount of traffic has rapidly increased

Page 1. Key Value Storage"

P2P: Distributed Hash Tables

DISTRIBUTED HASH TABLE PROTOCOL DETECTION IN WIRELESS SENSOR NETWORKS

CS 43: Computer Networks BitTorrent & Content Distribution. Kevin Webb Swarthmore College September 28, 2017

A Hybrid Peer-to-Peer Architecture for Global Geospatial Web Service Discovery

Transcription:

A Scalable Content- Addressable Network In Proceedings of ACM SIGCOMM 2001 S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker Presented by L.G. Alex Sung 9th March 2005 for CS856 1

Outline CAN basics Improving the basic CAN Evaluations Extensions Comment Discussion 2

CAN Basics Virtual d-dimensional coordinate space Each node holds a zone Key is hashed to a point P located in a zone hold by a node Insertion Lookup Deletion 3

Finding the key Location of Key K Hash(K) Point P By looking at the routing table of neighbours IP addresses virtual coordinate zones Determine the neighbour with the closest coordinate to P Greedily forward the msg[p(k), dst_coordinates] through that neighbour 4

Routing illustrated Routing the request until it reaches the node in which zone P lies P 5

Joining CAN Find a node currently in the system (by DNS) Ask for IPs of some other CAN nodes Randomly choose a point P in the space Send a JOIN request to the CAN node at point P through any CAN node Current occupant of point P splits its zone Being handed over the key-value pairs of that zone IP addresses and coordinates of neighbours Inform all old node s neighbours 6

Joining illustrated Randomly find a point; split that zone; get the keys and inform the neighbours P 7

Leaving CAN Departure Goal: give the keys to a neighbour Combine the zone with a neighbour to form a valid single zone OR temporarily hand over to the neighbour with the smallest zone Node Failure Identified by prolonged absence of update message The zone will be take over by the neighbour with the smallest zone volume 8

Improving the basic CAN Goal: reduce lookup latency Nodes can be physically far away The tradeoff (+) higher routing performance (+) system more robust ( ) higher per-node states ( ) higher system complexity 9

Multiple independent coordinate spaces (Realities) Allocation of multiple zones per node each zone in a different reality (Hash tables are replicated on every reality) Route to the neighbor closest to destination in all realities. (+) lower path length and path latency (+) higher data availability (+) routing fault tolerance ( ) more states per node 10

Multi-dimensioned coordinate spaces More dimensions more neighbours per node (+) routing fault tolerance more paths can be chosen ( ) more states per node routing table Multi-dimension is better. 11

Refinement of CAN routing metrics Goal: reduction of per-hop latency When selecting the next hop take into account the RTT and not just closer coordinate Simulation results: 24-40% improvement 12

Overloading coordinate zones When joining, zone sharing (if < MAXPEERS) instead of splitting More state info: neighbour list + peer list Neighbour selection by lowest measured RTT Hash tables: replication vs. partitioning (+) higher data availability ( ) need consistency mechanism ( ) larger size of data stored (+) lower path latency (+) higher fault tolerance ( ) higher system complexity ( ) additional control traffic 13

Multiple hash functions Mapping a single key to multiple nodes (replication) parallel queries (+) lower query latency (+) higher data availability ( ) larger size of the <key, value> database ( ) higher query traffic 14

Topologically-sensitive CAN construction Node insertion based on RTT from landmarks (instead of random insertion) (+) lower path latency ( ) uneven load distribution load balancing needed 15

Uniform partitioning Volume-based zone splitting (+) some form of load balancing each zone holding similar # of keys ( ) Hot spot problem: some <key, value> pairs are more popular network congestion 16

Caching & Replication for hot spot management Caching recently accessed keys (which belongs to other nodes) Replication: actively pushing popular keys to neighbours (+) higher data availability (+) lower query latency (+) load balancing ( ) cache management 17

Evaluation Critical factors: 1. increase in # of dimensions d reduction of path length 2. Use of RTT-weighted routing optimization of next-hop forwarding reduction of path latency 18

Evaluation (2) Effect of link delay distribution on CAN latency latency stretch = CAN latency/ip latency 1. Increase in # of nodes slow increase in latency stretch 2. Random delay the largest latency stretch 3. Larger backbone lower density of CAN nodes less effect of RTT-weighted routing degraded gains 19

CAN vs other DHTs RDP = Overlay network latency / IP latency CAN seems to be better than Full SkipNet Source: the SkipNet paper 20

CAN Extensions Application level multicasting [3] Spatial Data Query Support [2] 21

Summary CAN an Internet-scale hash table potential building block in Internet applications Scalability (basic CAN) O(d) per-node state O(n 1/d ) average path length Low-latency routing simple heuristics help a lot Robust decentralized, can route around trouble 22

Comment Strength Pioneer work in DHT (same as Chord) Intuitive presentation of formal concepts Taken into account the RTT in neighbour selection Weakness High computational and memory requirement (it s a trade-off) The CPU and memory usage statistics are not given 23

Discussion Network capacity is not taken into account when assigning keys. (hot-spot problem) Can we divide the zone based on networking capacity (rather than zone size only)? Can we predict the probability of congestion? How many levels of replication is reasonable? Many optimizations involve replicating the (K,V) pairs (and require more CPU cycles) Replication limit under reasonable assumptions? How about CPU limit? Which is cheaper: network delay or CPU/Memory? 24

Discussion - DHTs Internet users are heterogeneous. Memory and CPU power are relatively cheaper than routing cost. Would it be better to build CAN as a service to lower the heterogeneity and select the best balance point for optimizations vs CPU power & memory requirement? What can other DHT schemes do to reduce path latency? Which CAN optimization can be applied? Shall we give priority to maintenance or routing? What does O(log N) really mean? Would the average IP network latency be more important? What metrics shall we use to compare DHTs? 25

References 1. N.J.A. Harvey, M.B. Jones, S. Saroiu, M. Theimer, and A. Wolman, SkipNet: A Scalable Overlay Network with Practical Locality Properties, In Proc. 4th USENIX Symp. on Internet Tech. and Syst. (USITS), 2003. 2. Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang. Spatial Data Query Support in Peer-to-Peer Systems. COMPSAC 2004. 3. Sylvia Ratnasamy, Mark Handley, Richard Karp, and Scott Shenker. Application-Level Multicast Using Content- Addressable Networks. 4. Zacharias Boufidis. http://www.srdc.metu.edu.tr/webpage/seminars/p2p/can.ppt 26