Effect of Links on DHT Routing Algorithms 1

Futai Zou, Liang Zhang, Yin Li, Fanyuan Ma
Department of Computer Science and Engineering
Shanghai Jiao Tong University, 200030 Shanghai, China
zoufutai@cs.sjtu.edu.cn

Abstract. Various DHT routing algorithms have been proposed in recent years. All of them try to keep a uniform structured geometry while nodes join and leave. In this paper, we use links to capture the dynamic characteristics of the geometry and suggest that it contains three kinds of links: basic short links, redundant short links, and long links. We investigate several current DHT systems to argue that these links are inherent in them, and we point out possible directions for improving their performance based on the characteristics of the links. We analyze how links affect routing performance and observe the effect in simulation experiments. Our experimental results show that each kind of link makes its own contribution to the performance of P2P systems, so the effect of links should be taken into account when designing DHT routing algorithms.

1 Introduction

A peer-to-peer networked system is a collaborating group of Internet nodes that construct their own special-purpose network on top of the Internet. Peer-to-peer (P2P) systems provide the capability to organize and utilize the huge amounts of resources in the Internet. Generally, these decentralized systems fall into two categories: structured and unstructured P2P systems. In structured P2P systems, nodes organize themselves in an orderly fashion and searches are routed; in unstructured P2P systems, nodes organize themselves randomly and searches are blind. Structured P2P systems provide an efficient lookup mechanism by means of DHTs (Distributed Hash Tables), while unstructured P2P systems mostly use broadcast search.
Systems like CAN [3], Chord [4], Tapestry [6] and Koorde [7] are examples of the former, and Gnutella [2] and Freenet [5] belong to the latter. The most important difference between structured and unstructured P2P systems is deterministic routing. Deterministic routing guarantees that an object can be located and finds it in a predetermined manner. In contrast, search in unstructured P2P systems is nondeterministic and often fails to find rare objects.

1 Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz15027 and by The Science & Technology Committee of Shanghai Municipality Key Project Grant 025115032.

Considering the basic topology construction, we argue that the difference in routing guarantees between unstructured and structured systems is based on the topological geometry. Structured systems provide an efficient, uniform geometry, while unstructured systems lack one. The uniform geometry may be a tree, a ring, or a hypercube, and the DHT technique distributes P2P nodes onto its vertices. In this way, an edge of the geometry embodies the relationship between a pair of nodes in the P2P system. A link corresponds to an edge but differs from it in its dynamic character: a link can be adjusted dynamically as a node's neighbors change, and it reflects how well the node senses the system. We call the node from which a link originates the owner of the link.

Although the details depend on the requirements of each DHT, in essence every DHT geometry can provide three kinds of links: basic short links, redundant short links, and long links. Basic short links are the links between a node and the adjacent nodes that are exactly one hop away. Redundant short links sequentially follow the basic short links; that is, they are the links between a node and nearby nodes that are more than one hop away. Both kinds of short links maintain the connectivity of the DHT geometry so that a request can be routed to any node in the P2P system. The difference is that basic short links are common to all DHT geometries and must exist for the geometry to be constructed, whereas redundant short links may be absent because they only enhance the connectivity of the underlying geometry. Long links connect the current node to distant nodes so as to shorten the network diameter, so that a request can be routed much faster.
We use links to capture the dynamic characteristics of a structured peer-to-peer network; in essence, they reflect the dynamic geometry underlying the network. A DHT system can forward a request using only its basic short links, but doing so is usually inefficient and unreliable. Therefore, the designer of a DHT system should add further links to the bare DHT geometry to enhance system performance. As mentioned above, redundant short links enhance the connectivity of the geometry, which improves fault tolerance, and long links shorten the diameter of the geometry, which reduces the average path length. In this way, we obtain a link model with which to dissect DHT algorithms. In the next section, we investigate several DHT systems to argue that these link structures are inherent in them and to indicate directions for improving their performance.

The remainder of the paper is organized as follows. Section 2 investigates several popular DHT systems and offers insight into the links of their geometries. Section 3 analyzes how links affect routing performance and gives basic methods for establishing links in DHT geometries. Section 4 discusses our experiments, and the last section concludes and proposes future work.

2 Links in DHTs

We discuss several current DHT designs in this section. We examine their inherent geometric construction and how links affect their routing performance. We consider the following DHTs: Chord, CAN, and Koorde, and expect these typical designs to offer insight into the links of DHT geometries.

2.1 Chord

Chord [4] arranges all $n$ nodes on a uniform circle. It uses a one-dimensional circular identifier space and forwards messages based on the numerical difference from the destination address. Chord maintains two sets of neighbors. Each node has a successor list of the nodes that immediately follow it in the identifier space and a finger list of $\log n$ contact nodes. The finger list is formed as follows: a node with identifier $x$ maintains $\log n$ contacts, where the $i$-th contact is the node whose identifier is closest to $x + 2^i$ on the circle. From this description, the successor list consists of short links and the finger list consists of long links, so there are $k$ short links ($k$ is the size of the successor list) and $\log n$ long links.

The geometry of Chord is a ring. A minimally connected ring needs only one link per node as its basic short link, which means Chord could perform a deterministic search with a single link per node. However, such a design is very inefficient because of its weak connectivity and its large network diameter of $O(n)$. The design therefore adds redundant short links to make connectivity robust and long links to shorten the network diameter: in Chord, the successor list supplies the redundant short links and the finger list supplies the long links. With this design, Chord achieves a network diameter of $O(\log n)$.

2.2 CAN

Ratnasamy et al. [3] proposed the Content Addressable Network (CAN) as a distributed infrastructure that provides hash-table-like functionality on Internet-like scales. CAN uses a $d$-torus that is partitioned among the nodes such that every node owns a distinct zone within the space. Each node has $2d$ neighbors; neighbor $i$ differs from the given node on only the $i$-th bit.
Using its neighbor coordinate set, a node routes a message toward its destination by simple greedy forwarding to the neighbor whose coordinates are closest to the destination coordinates. CAN embodies the hypercube geometry. To keep the hypercube minimally connected, it needs $2d$ links per node. As described above, the design of CAN provides only these $2d$ basic short links and no other links. Therefore, CAN lacks robust connectivity and has a longer network diameter of $(d/2)\,n^{1/d}$.

2.3 Koorde

Koorde [7] is a DHT that exploits the de Bruijn graph [1]. It looks up a key by contacting $\log_2 n$ nodes with only 2 neighbors per node. A de Bruijn graph has a node for each binary number of $b$ bits. In this geometry each node has two outgoing edges: node $m$ has an edge to node $2m \bmod 2^b$ and an edge to node $(2m + 1) \bmod 2^b$. It is easy to extend Koorde to $k$ neighbors.
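As a rough illustration of the de Bruijn edges described above, the following Python sketch computes the two out-neighbors of a node and routes by shifting the destination's bits in one at a time. It assumes that all $2^b$ identifiers are occupied, which a real Koorde deployment does not require; the function names `de_bruijn_neighbors` and `de_bruijn_route` are hypothetical and are not taken from Koorde [7].

```python
from typing import List, Tuple


def de_bruijn_neighbors(m: int, b: int) -> Tuple[int, int]:
    """Out-neighbors of node m in a binary de Bruijn graph on b-bit IDs."""
    size = 1 << b  # 2**b identifiers; assumed to be fully populated
    return (2 * m) % size, (2 * m + 1) % size


def de_bruijn_route(src: int, dst: int, b: int) -> List[int]:
    """Route from src to dst by shifting in the bits of dst, most significant first."""
    size = 1 << b
    path = [src]
    cur = src
    for i in range(b - 1, -1, -1):
        bit = (dst >> i) & 1
        # Shifting left and appending `bit` follows one of the two outgoing edges.
        cur = ((cur << 1) | bit) % size
        path.append(cur)
    return path


if __name__ == "__main__":
    print(de_bruijn_neighbors(5, 3))
    print(de_bruijn_route(5, 3, 3))
```

With $b = 3$, routing from node 5 to node 3 takes $b = 3$ hops, and every hop follows one of the two outgoing edges of the current node.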

The de Bruijn graph is a compact geometry with an optimal diameter. In essence, it is a minimally connected graph, so it has only basic short links and yet achieves an optimal diameter. This is a desirable geometry because it keeps a constant overhead with a constant number of short links, without relying on long links to shorten the diameter. We think it is a good choice to design a new DHT algorithm on a geometry that has an inherently optimal diameter. However, the pure de Bruijn geometry is less resilient because it lacks redundant short links, so Koorde adds $k$ redundant short links in a way similar to Chord.

3 Link Analysis and Establishment

Having surveyed classic DHT geometries, we give a more comprehensive account of links in this section. First, we analyze how links affect the routing performance of P2P systems. Second, we give basic methods for establishing links in DHT geometries.

3.1 Link Analysis

We use links to capture the dynamic relationship of nodes in structured P2P systems. These links form a structured geometry. We believe that this well-organized, connected geometry is the fundamental difference between structured P2P systems and loosely organized unstructured P2P systems. According to their function in the geometry, we distinguish three kinds of links: the basic short link, the redundant short link, and the long link. Although each kind has its function, the basic short link is the fundamental link of the geometry and is determined by the geometry itself. Hence we concentrate on how long links and redundant short links affect the routing performance of P2P systems. We analyze two metrics, average path length and resilience, as follows.

3.1.1 Average path length

The average path length is the average number of hops between every pair of nodes. It indicates how quickly a request is forwarded to its destination. Long links are an efficient way to reduce the average path length.
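The metric can be written out explicitly. The following worked example rests on simplifying assumptions made only for illustration: all $n = 2^b$ identifiers on a ring are occupied, links are followed clockwise only, and $h(u, v)$ denotes the hop count of a shortest path from $u$ to $v$.

```latex
\bar{h} = \frac{1}{n(n-1)} \sum_{u \neq v} h(u, v)

% Ring with only the clockwise successor link: h(u, v) = (v - u) \bmod n
\bar{h}_{\text{short}} = \frac{1}{n-1} \sum_{k=1}^{n-1} k = \frac{n}{2}

% Ring with additional links to offsets 1, 2, 4, \dots, 2^{b-1}:
% h(u, v) is the number of 1-bits in (v - u) \bmod n
\bar{h}_{\text{long}} = \frac{1}{n-1} \sum_{k=1}^{n-1} \operatorname{popcount}(k)
                      = \frac{b \, 2^{b-1}}{2^{b} - 1} \approx \frac{\log_2 n}{2}
```

For $n = 8$ this gives 4 hops on average with short links only and $12/7 \approx 1.71$ hops with the long links.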
Chord adds long links to its basic geometry to obtain a short average path length. CAN has no long links and therefore has a longer average path length. There are many ways to add long links, and much of this design space remains to be explored. We point out, however, that the number of links must be traded off against the overhead of maintaining them.

3.1.2 Resilience

Resilience measures the extent to which a DHT can route around trouble even without the aid of recovery mechanisms that fix it. Basic short links are the inner structure of the DHT geometry, and redundant short links provide a way to enhance its connectivity. Connectivity embodies the routing resilience to node failure. A geometry that lacks redundant short links is less resilient: it is fragile under node failure, or requests must take a long detour around failed nodes. Resilience is an important aspect of P2P systems. For a geometry without redundant short links, we suggest adding them to improve resilience. Koorde is an example, as described in Section 2.3.

3.2 Link Establishment

As investigated in Section 2, DHT systems build their routing algorithms on a structured geometry, and links capture the dynamic characteristics of that geometry in three kinds: the basic short link, the redundant short link, and the long link. We have pointed out that basic short links are the inner structure of the various DHT geometries and that improved performance relies on long links and additional redundant short links. Hence we discuss how to add long links and redundant short links to geometries that have only inner basic short links, so as to improve the performance of P2P systems.

3.2.1 Long links

Long links shorten the average path length, as mentioned above. They should be arranged so as to reach a good tradeoff between the number of long links and the overhead of maintaining them. We assume the node ID space has size $N$. A general technique is to split the space into $w$ non-overlapping sub-intervals and establish one long link per sub-interval. That is, a node $x$ may establish its $w$ long links as follows: a node whose ID is $x$ establishes $\log_2 N$ long links to the nodes whose IDs are $(x + 2^i) \bmod N$, $i \in (1, \log_2 N)$. Chord uses this method and shortens the path length to $O(\log_2 N)$. We can extend this method from one dimension to multiple dimensions. For a $d$-dimensional torus, we may assume that the torus is of size $m^d$, where $m^d = N$ and $m$ is the width of each dimension of the torus.
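A minimal Python sketch of the one-dimensional rule $(x + 2^i) \bmod N$ follows. It assumes that $N$ is a power of two, that every ID is occupied by a node, and that the offset $2^0$ (the basic short link to the successor) is included so that greedy forwarding always makes progress. The function names are hypothetical, and the code is not Chord's implementation.

```python
from typing import List


def long_link_targets(x: int, n: int) -> List[int]:
    """IDs reached from x at clockwise offsets 1, 2, 4, ... below n."""
    targets = []
    offset = 1  # assumption: the offset-1 successor link is included
    while offset < n:
        targets.append((x + offset) % n)
        offset *= 2
    return targets


def greedy_route(src: int, dst: int, n: int) -> List[int]:
    """Forward clockwise along the longest link that does not overshoot dst."""
    path = [src]
    cur = src
    while cur != dst:
        remaining = (dst - cur) % n
        candidates = [t for t in long_link_targets(cur, n)
                      if (t - cur) % n <= remaining]
        cur = max(candidates, key=lambda t: (t - cur) % n)
        path.append(cur)
    return path


if __name__ == "__main__":
    print(long_link_targets(3, 16))
    print(greedy_route(3, 14, 16))
```

In a 16-node ID space, a request from node 3 to node 14 covers the clockwise distance 11 in three hops (offsets 8, 2 and 1), and no pair of nodes needs more than $\log_2 16 = 4$ hops.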
In each dimension, the node establishes $\log_2 m$ long links to the nodes whose coordinate in that dimension is $(x + 2^i) \bmod m$, $i \in (1, \log_2 m)$. In this way, each node still maintains $O(\log_2 N)$ long links. This method can be applied to reconstruct the CAN topology. However, we want to keep CAN's merit of constant degree. Hence we add only one long link along each dimension: node $x$ establishes one long link along each dimension to the node whose coordinate in that dimension is $(x + m/2) \bmod m$. This reduces the average path length while keeping a constant degree of $O(3d)$.

3.2.2 Redundant short links

Redundant short links improve resilience because they allow routing to continue toward the destination even when the basic short links have not yet recovered from a failure. In one dimension, such as a circle, redundant short links are established to the clockwise successors of the node. In multiple dimensions, redundant short links can be established from the node to nearby nodes in the torus that are 2, 3, 4, etc. hops away.

4 Experiments

In this section, we observe by simulation how links affect DHT routing performance. To isolate the effect of links, we use CAN as the base of our simulation, for two reasons: CAN has only the $2d$ basic short links, and it is a well-known DHT system. In Sections 4.1 and 4.2, we add long links and redundant short links to CAN, respectively, and observe their effect. We focus on two performance metrics: average path length and resilience.

4.1 Effect of long links

[Figure 1: two plots of average path length (hops) against the number of nodes (64 to 4096). Left series: CAN(D=2,L=0), CAN(D=2,L=2). Right series: CAN(D=4,L=4), CAN(D=6,L=0).]

Fig. 1. Effect of long links. Left: Average path length of 2-CAN with 0 and with 2 long links (L). Right: Average path length with additional long links compared with the additional short links of a higher dimension. Note that 4-CAN with additional long links has the same number of links as 6-CAN.

We add long links to CAN according to Section 3.2.1, so each node has $3d$ links, with the $d$ additional long links distributed one per dimension. We run simulations on networks of 64 to 4096 nodes. First, we observe the improvement after long links are added to 2-CAN. Second, we add 4 long links to 4-CAN so that it has the same number of links as 6-CAN, and then compare their routing performance. We want to know whether long links contribute more than the short links of a higher dimension to reducing the average path length. Fig. 1 shows the simulation results. The left graph in Fig. 1 presents the improvement in path length with additional long links. As can be seen, long links decrease the average path length more significantly as the number of nodes increases.
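The setup of this experiment can be approximated by the Python sketch below. It is not the simulator used for Fig. 1: it assumes a perfectly regular $d$-dimensional torus with $m$ nodes per dimension instead of CAN's actual zone splitting, places the single long link of each dimension half-way around that dimension (offset $m/2$ rounded down, an assumed reading of Section 3.2.1), and measures shortest paths by breadth-first search rather than by CAN's greedy forwarding. All function names are hypothetical.

```python
from collections import deque
from itertools import product


def torus_links(d, m, long_links):
    """Map each node (a d-tuple of coordinates) to the set of nodes it links to."""
    links = {}
    for node in product(range(m), repeat=d):
        out = set()
        for dim in range(d):
            offsets = [1, -1]            # the two basic short links of this dimension
            if long_links:
                offsets.append(m // 2)   # assumed long link: half-way around
            for off in offsets:
                neighbor = list(node)
                neighbor[dim] = (neighbor[dim] + off) % m
                out.add(tuple(neighbor))
        out.discard(node)                # drop self-links on degenerate tiny tori
        links[node] = out
    return links


def average_path_length(links):
    """Average hop count over all ordered pairs, by BFS from every node."""
    n = len(links)
    total = 0
    for src in links:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in links[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


if __name__ == "__main__":
    for m in (4, 8):
        plain = average_path_length(torus_links(2, m, long_links=False))
        extra = average_path_length(torus_links(2, m, long_links=True))
        print(f"2-D torus, {m * m} nodes: {plain:.2f} hops without long links, "
              f"{extra:.2f} with")
```

On a 4 x 4 torus the average drops from 32/15 to 24/15 hops. Larger networks can be tried by raising `m`, at a cost that grows roughly quadratically with the number of nodes.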
This can be explained by the fact that long links have more room to shorten the diameter of the network as the number of nodes increases. As shown in the right graph in Fig. 1, performance with additional long links is better than with the additional short links of a higher dimension. Hence, long links may be the preferred option when the goal is to reduce the average path length.

4.2 Effect of redundant short links

We add redundant short links with the methods described in Section 3.2.2. As mentioned earlier, short links concern the connectivity of the network: basic short links are the inner structure of the DHT geometry, and redundant short links provide a way to enhance its connectivity, which embodies the routing resilience to node failure. To observe how redundant short links affect resilience, we let a fixed fraction of uniformly chosen nodes fail and disable the failure recovery mechanism. In this setting, we define a failed routing as a pair of live nodes that cannot be connected. Fig. 2 shows the simulation results. The failed-routing rates are high because the failure recovery mechanism is disabled. The left graph in Fig. 2 shows that resilience improves gradually as the number of redundant short links increases, because they enhance connectivity. The right graph in Fig. 2 plots resilience for CANs of different dimensions. It clearly shows that the improvement is less significant in higher dimensions. This is because a higher-dimensional CAN has more basic short links than a lower-dimensional one, so the effect of redundant short links decreases accordingly.

[Figure 2: two plots of failed routing (%) against failed nodes (%), 0 to 90. Left series: CAN(D=2,RSL=0), CAN(D=2,RSL=1), CAN(D=2,RSL=2), CAN(D=2,RSL=3), CAN(D=2,RSL=4). Right series: CAN(D=2,RSL=0), CAN(D=2,RSL=2), CAN(D=4,RSL=0), CAN(D=4,RSL=2), CAN(D=6,RSL=0), CAN(D=6,RSL=2).]

Fig. 2. Effect of redundant short links.
Left: Percentage of failed routing for varying percentages of node failures and varying numbers of redundant short links (RSL) added to a CAN of fixed dimension, in a network of 1024 nodes. Right: Failed routing with 0 and with 2 redundant short links for CANs of different dimensions, in a network of 1024 nodes.

5 Conclusions and Future work

In this paper, we have studied the effect of links on DHT routing algorithms. First, we investigate several current DHT algorithms, offer insight into the links in these algorithms, and indicate directions for improving their performance. Second, we analyze how links in DHT routing algorithms affect the routing performance of peer-to-peer systems. Then we give basic methods for establishing links in DHT geometries. Our simulation results demonstrate the strong effect of links on routing performance and provide a basic scheme for designing new DHT algorithms and redesigning current ones. There are several directions in which to extend our approach: (1) More current DHT algorithms may need to be investigated to provide a deeper understanding of how links affect routing performance and to indicate possible improvements based on the effect of the different kinds of links. (2) Considering the effect of links, more metrics, such as load balance, need to be investigated besides path length and resilience. (3) It would be interesting and worthwhile to explore how links affect the physical delay on the underlying network.

References

[1] N. de Bruijn. Lambda-calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indag. Math. 34, 5, 381-392, 1972.
[2] Gnutella. http://www.gnutella.co.uk.
[3] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. ACM SIGCOMM, 2001.
[4] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A peer-to-peer lookup service for internet applications. ACM SIGCOMM, 2001.
[5] I. Clarke, O. Sandberg, B. Wiley, and T.W. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009, 2001.
[6] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-resilient wide-area location and routing. Technical Report UCB//CSD-01-1141, University of California at Berkeley, April 2001.
[7] M. Frans Kaashoek and David R. Karger. Koorde: A simple degree-optimal distributed hash table. In 2nd International Workshop on Peer-to-Peer Systems (IPTPS'03), Berkeley, CA, USA, 2003.