Fault Resilience of Structured P2P Systems

Similar documents
Survive Under High Churn in Structured P2P Systems: Evaluation and Strategy

Building a low-latency, proximity-aware DHT-based P2P network

Dorina Luminiţa COPACI, Constantin Alin COPACI

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Relaxing Routing Table to Alleviate Dynamism in P2P Systems

P2P: Distributed Hash Tables

Effects of Churn on Structured P2P Overlay Networks

Cycloid: A constant-degree and lookup-efficient P2P overlay network

Effect of Links on DHT Routing Algorithms 1

P2P Overlay Networks of Constant Degree

Early Measurements of a Cluster-based Architecture for P2P Systems

Cycloid: A Constant-Degree and Lookup-Efficient P2P Overlay Network

A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol

Cycloid: A Constant-Degree and Lookup-Efficient P2P Overlay Network

MULTI-DOMAIN VoIP PEERING USING OVERLAY NETWORK

Implications of Neighbor Selection on DHT Overlays

Survey of DHT Evaluation Methods

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

DYNAMIC TREE-LIKE STRUCTURES IN P2P-NETWORKS

Degree Optimal Deterministic Routing for P2P Systems

A Directed-multicast Routing Approach with Path Replication in Content Addressable Network

LECT-05, S-1 FP2P, Javed I.

Efficient Multi-source Data Dissemination in Peer-to-Peer Networks

Distributed Hash Table

ReCord: A Distributed Hash Table with Recursive Structure

DRing: A Layered Scheme for Range Queries over DHTs

CSE 486/586 Distributed Systems

: Scalable Lookup

Motivation. The Impact of DHT Routing Geometry on Resilience and Proximity. Different components of analysis. Approach:Component-based analysis

The Impact of DHT Routing Geometry on Resilience and Proximity. Acknowledgement. Motivation

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

Hybrid Overlay Structure Based on Random Walks

Should we build Gnutella on a structured overlay? We believe

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables

An Overlapping Structured P2P for REIK Overlay Network

DISTRIBUTED HASH TABLE PROTOCOL DETECTION IN WIRELESS SENSOR NETWORKS

Back-Up Chord: Chord Ring Recovery Protocol for P2P File Sharing over MANETs

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

EFFICIENT ROUTING OF LOAD BALANCING IN GRID COMPUTING

Design of a New Hierarchical Structured Peer-to-Peer Network Based On Chinese Remainder Theorem

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems

L3S Research Center, University of Hannover

Proximity Based Peer-to-Peer Overlay Networks (P3ON) with Load Distribution

UC Berkeley UC Berkeley Previously Published Works

Towards Scalable and Robust Overlay Networks

Understanding Chord Performance

A Simple Fault Tolerant Distributed Hash Table

Comparing the performance of distributed hash tables under churn

Telematics Chapter 9: Peer-to-Peer Networks

Chord: A Scalable Peer-to-peer Lookup Service For Internet Applications

Athens University of Economics and Business. Dept. of Informatics

Three Layer Hierarchical Model for Chord

PEER-TO-PEER (P2P) systems are now one of the most

PAPER A Proximity-Based Self-Organizing Hierarchical Overlay Framework for Distributed Hash Tables

Routing Table Construction Method Solely Based on Query Flows for Structured Overlays

Badri Nath Rutgers University

Distributed Meta-data Servers: Architecture and Design. Sarah Sharafkandi David H.C. Du DISC

Architectures for Distributed Systems

The Cost of Inconsistency in DHTs

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

A Framework for Peer-To-Peer Lookup Services based on k-ary search

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

DATA. The main challenge in P2P computing is to design and implement LOOKING UP. in P2P Systems

Distriubted Hash Tables and Scalable Content Adressable Network (CAN)

PChord: Improvement on Chord to Achieve Better Routing Efficiency by Exploiting Proximity

P2P Network Structured Networks: Distributed Hash Tables. Pedro García López Universitat Rovira I Virgili

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Searching for Shared Resources: DHT in General

Providing File Services using a Distributed Hash Table

A Method for Designing Proximity-aware Routing Algorithms for Structured Overlays

Detecting and Recovering from Overlay Routing. Distributed Hash Tables. MS Thesis Defense Keith Needels March 20, 2009

Making Chord Robust to Byzantine Attacks

Diminished Chord: A Protocol for Heterogeneous Subgroup Formation in Peer-to-Peer Networks

A DHT-Based Grid Resource Indexing and Discovery Scheme

Simulations of Chord and Freenet Peer-to-Peer Networking Protocols Mid-Term Report

FPN: A Distributed Hash Table for Commercial Applications

Simple Determination of Stabilization Bounds for Overlay Networks. are now smaller, faster, and near-omnipresent. Computer ownership has gone from one

Understanding Disconnection and Stabilization of Chord

Peer-To-Peer Techniques

OVER the last few years, peer-to-peer networks have

Overlay Partition: Iterative Detection and Proactive Recovery

BAKE: A Balanced Kautz Tree Structure for Peer-to-Peer Networks

Chord-based Key Establishment Schemes for Sensor Networks

L3S Research Center, University of Hannover

IMPLEMENTING P2P RESOURCE SHARING APPLICATIONS IN WIRELESS MESH NETWORKS

Handling Network Partitions and Mergers in Structured Overlay Networks

Searching for Shared Resources: DHT in General

The Impact of DHT Routing Geometry on Resilience and Proximity

Last Time. CSE 486/586 Distributed Systems Distributed Hash Tables. What We Want. Today s Question. What We Want. What We Don t Want C 1

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems

RAQ: A Range-Queriable Distributed Data Structure

A Hybrid Peer-to-Peer Architecture for Global Geospatial Web Service Discovery

C 1. Last Time. CSE 486/586 Distributed Systems Distributed Hash Tables. Today s Question. What We Want. What We Want. What We Don t Want

CS514: Intermediate Course in Computer Systems

Impact of Neighbor Selection on Performance and Resilience of Structured P2P Networks

Distributed Hash Tables: Chord

NodeId Verification Method against Routing Table Poisoning Attack in Chord DHT

Implementing Range Queries with a Decentralized Balanced Tree Over Distributed Hash Tables

SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks

Peer-to-Peer Systems and Distributed Hash Tables

Transcription:

Fault Resilience of Structured P2P Systems Zhiyu Liu 1, Guihai Chen 1, Chunfeng Yuan 1, Sanglu Lu 1, and Chengzhong Xu 2 1 National Laboratory of Novel Software Technology, Nanjing University, China 2 Department of Electrical and Computer Engineering, Wayne State University, USA Abstract. A fundamental problem that confronts structured peer-topeer system that use DHT technologies to map data onto nodes is the performance of the network under the circumstance that a large percentage of nodes join and fail frequently and simultaneously. A careful examination of some typical peer-to-peer networks will contribute a lot to choosing and using certain kind of topology in special applications. This paper analyzes the performance of Chord [7] and Koorde [2], and find out the crash point of each network through the simulation experiment. 1 Introduction Peer-to-peer systems are distributed systems without any centralized control or hierarchical organization. However, unlike traditional distributed systems, nodes in peer-to-peer system are continuously joining and leaving the system. Thus, data items must migrate as nodes come and go, and routing tables must be updated constantly. To deal with these problems that arise over time, all realistic peer-to-peer systems implement some kind of stabilization routine which continuously repairs the overlay as nodes come and go, updating control information and routing tables to ensure that the overlay remains connected and support efficient lookups. However, when facing large percentage of nodes joins and leaves, different peerto-peer proposals may have very different performance due to the geometric properties of the topology they are built up on. In general, there is a tradeoff between degree and resilience. High degree usually induces strong connectivity and therefore good resilience, vice versa. However, other aspects like clustering properties also have impact on resilience, so, through careful design, even network with constant degree can perform well under a certain percentage of node failure. Now there is the question: if we know how often the nodes come and go, given the demanding fault tolerance, which kind of peer-to-peer network should we choose? A lot of related work addressed to this problem has been done. Some [4,6,3,5] did execellent theoretical analysis and some did careful selected simulation [3].So far, however, simulation of dynamic peer-to-peer systems to detect their crash point hasn t been done yet. For there is no widely accepted definition of crash point, we here define it as follows: If x percent of nodes fails, and half (50%) look-ups failed, then x is defined as the crash point. By this definition, we can ignore the difference between two kinds of crash. One kind is a node become X. Zhou et al. (Eds.): WISE 2004, LNCS 3306, pp. 736 741, 2004. c Springer-Verlag Berlin Heidelberg 2004

Fault Resilience of Structured P2P Systems 737 isolated, all look-ups from/to it will fail. The other kind is the whole network break up into some disconnected subnets, look-ups between each net will fail, however look-ups within the subnet can still success. Then we can compare different protocols under the same criterion. In this paper we will simulate to find out the crash point of 2 typical kind of networks Chord [7] and Koorde [2]. 2 Related Works Fault tolerance of peer-to-peer networks is an important topic. Liben-Nowell et al. [4] examine error resilience dynamics of Chord when nodes join/leave the system and derive lower bounds on the degree necessary to maintain a connected graph with high probability. Fiat et al. [8] build a Censorship Resistant Network that can tolerate massive adversarial node failures and random object deletions. Saia et al. [9] create another highly fault-resilient structure with O(log 3N) state at each node and O(log 3N) hops per message routing overhead. Unfortunately, very few studies examine the resilience of existing graphs in comparison with each other, especially when nodes join/leave at a high rate. We are aware of only few comparison study, including that Gummadi et al. [3] find that ringbased graphs (such as Chord) offer more flexibility with route selection and provide better performance under random node, and Dmitri Loguinov et al. [5] do some theoretical analysis of the existing graph. 3 Analysis One of the reasons DHTs are seen as an excellent platform for large scale distributed systems is that they are resilient in the presence of node failure. This resilience has two different aspects [3]: Static resilience: we keep the routing table static, only delete the items of failed nodes to see whether the DHTs can route correctly. Thus, static resilience gives a measure of how quickly the recovery algorithm has to work. Routing recovery: they repopulate the routing table with live nodes, deleting the items of failed nodes. Later you will see, Chord and Koorde nearly have the same routing recovery algorithm. However, since Koorde has only (log 1) neighbor, it has much weaker static resilience than Chord. Even if it increases its neighbors to O(log n), it is still not as good as Chord, because unlike Chord, the geometry of Koorde (de bruijn graph) doesn t naturally support sequential neighbor. Let s first discuss flexibility and path overlap, which have impact on static resilience.

738 Z. Liu et al. 3.1 Flexibility When basic routing geometry has been chosen, more flexibility means more freedom in the selection of neighbors and routes. Let s discuss in turn: Neighbor Selection. DHTs have a routing table comprised of neighbors. Some algorithms make purely deterministic neighbors (i.e., Koorde), others allow some freedom to choose neighbors based on other criteria in addition to the identifiers; most notably, latencies have been used to select neighbors. (i.e., ChordFingerPNS). Route Selection. Given a set of neighbors, and a destination, the routing algorithm determines the choice of the next hop. However, when the determined next-hop is down, flexibility will describe how many other options are there for the next-hop. If there are none, or only a few, then the routing algorithm is likely to fail. Then we examine Chord and Koorde from the aspect of flexibility. Chord. Chord provides both neighbor selection flexibility and routing selection flexibility. Gummadi et al. [3] mentioned that Chord has 2 i possible options in picking its ith neighbors for a total of approximately n{(log n)/2} possible routing tables for each node, and having selected one of its possible routing tables, the flexibility in route selection now available is (log n)! for two nodes that are initially O(n) distance apart. What s more, Chord also allows paths that are much longer than log n. This is accomplished by taking multiple hops of smaller spans instead of a single hop of large span. For example, one could take two successive hops using (i 1)th neighbors of span 2 i 1 each instead of a single hop using ith neighbor of span 2 i. This feature will reduce the probability of path failure when large percentage nodes fails. Koorde. Koorde has no flexibility in either neighbor selection or route selection. When large percentage of nodes fails simultaneously, the lookup through de Bruijn path will easily fail. Fortunately like finger pointers in Chord, Koorde s de Bruijn pointer is merely an important performance optimization; a query can always reach its destination slowly by following successors. Because of this property, Koorde can use Chord s join algorithm. Similarly, to keep the ring connected in the presence of nodes that leave, Koorde can use Chord s successor list and stabilization algorithm. However, when facing a large percentage of nodes leaving, the path length will increase fast, and more failure due to time out will happen. 3.2 Path Overlap Dmitri et.al [5] have done simulation to compare the path overlap of Chord and de Burijn graph. They drew the conclusion that de Bruijn graph select backup

Fault Resilience of Structured P2P Systems 739 Fig. 1. Percentage of failed failures across Chord and Koorde. Koorde has 8 fingers, which is equal to Chord with 256 nodes. The stabilization time are both 20s. Successors are both 4. paths that do not overlap with the best shortest path or with each other, while Chord has certain neighbors that show tendency to construct shortest paths that always overlap with the already-failed ones. However, in sparse Koorde, one real node is responsible for forwarding request for many imaginary nodes. It s another kind of path overlap. Unfortunately, in Koorde, all neighbors of a certain node usually have the same predecessor. The advantage of non-overlap path could only be seen in dense de Bruijn graph. 4 Simulation Since Chord and Koorde use the same stabilization algorithm, We will see from the simulation: when Chord and Koorde execute the stabilization routine under the same frequency, which one has a better performance. To speed up the development, we build the simulators on top of an existing tools p2psim [1]. The protocol of Chord and Koorde are already implemented, we rewrite parts of code, mainly the eventgenerator() to do the experiments in this paper. To test how Chord and Koorde perform when mass nodes fail simultaneously, we use the same network events but with different protocols. The network events includes join, fail, look-up. First, we build a stable network of 256 (2 8 ) nodes. Each node executes look-up events which are exponentially distributed about the given meantime (0.5 second). At the 5th second we randomly select p percentage of nodes. And from the 5th second till the end of the experiment, these nodes will fail or join simultaneously every 20 seconds. That is to say at the 5th second, they all fail, then at the 25th second, they all join, and so on. p is variated from 5-percent to 95 percent in increments of 5 percent. The total simulation time is 120s, during which the network is continuously changing, and we record the percentage of failure lookups. It is obviously that Chord performs better than Koorde under the same network circumstance (see Figure 1), which are not surprising after our discussion above. By our definition, the crash point of Koorde is 55%, while the crash point of Chord is about 70%. And in Koorde, when 15% nodes fails, failure start

740 Z. Liu et al. Fig. 2. Chord: Percentage of failed failures. When successors increase from 4 to 8, the crash point increases from 65% to 80% Fig. 3. Chord: Percentage of failed failures. When stabilization time decrease from 40s to 20s, the crash point increases from 65% to 70% Fig. 4. Koorde: Percentage of failed failures. When successors increase from 4 to 8, the crash point increases from 50% to 75%. Fig. 5. Koorde: Percentage of failed failures with different stabilization time:10s, 20s, 30s and 40s to emerge. While in Chord, look-up failures are very few when failed nodes are under 40%. This can be attributed to successor back-ups, when 40% nodes fail, the probability of a node losing all it four successors is only 0.4 4 =0.025.In both protocols when the number of fail nodes increases, the percentage of look-up failure increasing fast. It is because when lots of fingers are not correct, the path length is growing fast, and more nodes are contacted, hence, the probability of encountering failed nodes increases fast. When the failed nodes are above 80%, the failed look-ups are decreasing because at this time, most of the nodes are failed, the number of look-ups are very few, and some nearby look-ups can reach its destination quickly. Then let us have a look at how increasing successors and speedup stabilization have impact on the performance of Chord and Koorde. The result get from the simulation is that both increasing successors and speedup stabilization obviously enhance the fault resilience of Chord (see Figure 2 and Figure 3). However, in Koorde while increasing successors greatly enhances the fault re-

Fault Resilience of Structured P2P Systems 741 silience of Koorde (see Figure 4), the influence of speedup stabilization is not that obvious (see Figure 5). It tells that concurrent nodes failure is sensitive to Koorde, and when large percent of nodes fail simutaneously, stabilization routine couldn t help a lot. 5 Conclusions In this paper, we compare the dynamic resilience of Chord and Koorde from theoretical analysis and simulation. The result is that: Chord has better dynamic resilience than Koorde. We also find out the crash point of both protocol under some given circumstance. However we haven t compare the path overlap through simulation. This is left for further work. Due to time limitation, we just finish some initiating work in comparing the fault resilient of different peer-to-peer networks, Further work will be done to compare more networks under more complex circumstance. Acknowledgement. The work is partly supported by China NSF grant (No. 60073029), China 973 project (No. 2002CB312002), TRAPOYT award of China Ministry of Education, and US NSF grant (No. ACI-0203592). References 1. http://pdos.lcs.mit.edu/p2psim/howto.html 2. M. F. Kaashoek and R. Karger: Koorde: A simple degreeoptpimal distributed hash table. In 2nd International workshop on P2P Systems(IPTPS 03),(2003) 3. K.P. Gummadi, R. Gummadi, S.D. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica: The Impact of DHT Routing Geometry on Resilience and Proximity. ACM SIGCOMM (2003) 4. D. Liben-Nowell, H. Balakrishnan, and D. Karger: Analysis of the Evolution of Peer-to-Peer Networks. ACM PODC (2002) 5. D. Loguinov, A. Kumar, V. Rai, S. Ganesh: Graph-Theoretic Analysis of Structured Peer-to-Peer Systems: Routing Distances and Fault Resilience. ACM SIG- COMM (2003) 6. J. Aspnes, Z. Diamadi, and G. Shah: Fault-Tolerant Routing in Peer-to-Peer Systems. ACM PODC (2002) 7. I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, and H. Balakrishnan: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. ACM SIGCOMM (2001) 8. A. Fiat and J. Saia: Censorship Resistant Peer-to-Peer Content Addressable Networks. ACM/SIAM Symposium on Discrete Algorithms (2002). 9. J. Saia, A. Fiat, S. Gribble, A.R. Karlin, and S. Saroiu: Dynamically Fault-Tolerant Content Addressable Networks. IPTPS (2002).