On Degree-Based Decentralized Search in Complex Networks

Similar documents
Modeling and Analysis of Random Walk Search Algorithms in P2P Networks

arxiv:cs/ v1 [cs.ir] 26 Apr 2002

Optimal Resource Discovery Paths of Gnutella2

A Hybrid Searching Scheme in Unstructured P2P Networks

Performance Analysis of Restricted Path Flooding Scheme in Distributed P2P Overlay Networks

Evaluation of Models for Analyzing Unguided Search in Unstructured Networks

Simulating Search Strategies for Gnutella

The Role of Homophily and Popularity in Informed Decentralized Search

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Distributed Network Routing Algorithms Table for Small World Networks

Peer-to-Peer Streaming

SEARCHING WITH MULTIPLE RANDOM WALK QUERIES

Simulation Study of Language Specific Web Crawling

Peer-to-Peer (P2P) Communication

ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

On Veracious Search In Unsystematic Networks

Response Network Emerging from Simple Perturbation

Local Search in Unstructured Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

Web Crawling As Nonlinear Dynamics

A New Version of K-random Walks Algorithm in Peer-to-Peer Networks Utilizing Learning Automata

Performance Evaluation of Active Route Time-Out parameter in Ad-hoc On Demand Distance Vector (AODV)

AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems

Implementation of Resource Discovery Mechanisms onto PeerSim

CCP: Conflicts Check Protocol for Bitcoin Block Security 1

An Optimal Overlay Topology for Routing Peer-to-Peer Searches

DYNAMIC SEARCH TECHNIQUE USED FOR IMPROVING PASSIVE SOURCE ROUTING PROTOCOL IN MANET

Characterizing Gnutella Network Properties for Peer-to-Peer Network Simulation

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

The Complex Network Phenomena. and Their Origin

Sentiment Analysis for Customer Review Sites

A P2P-based Incremental Web Ranking Algorithm

Characterizing Traffic Demand Aware Overlay Routing Network Topologies

Power Aware Hierarchical Epidemics in P2P Systems Emrah Çem, Tuğba Koç, Öznur Özkasap Koç University, İstanbul

Properties of Biological Networks

Complex networks: A mixture of power-law and Weibull distributions

Chedar: Peer-to-Peer Middleware

Shaking Service Requests in Peer-to-Peer Video Systems

Peer Clustering and Firework Query Model

A FEEDBACK-BASED APPROACH TO REDUCE DUPLICATE MESSAGES IN UNSTRUCTURED PEER-TO-PEER NETWORKS

Topology Affects the Efficiency of Network Coding in Peer-to-Peer Networks

Investigation on OLSR Routing Protocol Efficiency

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Addressed Issue. P2P What are we looking at? What is Peer-to-Peer? What can databases do for P2P? What can databases do for P2P?

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

A FAST COMMUNITY BASED ALGORITHM FOR GENERATING WEB CRAWLER SEEDS SET

A Square Root Topologys to Find Unstructured Peer-To-Peer Networks

arxiv: v1 [cs.si] 21 Nov 2011

An Agent-Based Adaptation of Friendship Games: Observations on Network Topologies

Resilient GIA. Keywords-component; GIA; peer to peer; Resilient; Unstructured; Voting Algorithm

Search in weighted complex networks

SELF-ORGANIZING TRUST MODEL FOR PEER TO PEER SYSTEMS

Dynamic Search Technique Used for Improving Passive Source Routing Protocol in Manet

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

SIL: Modeling and Measuring Scalable Peer-to-Peer Search Networks

Creating a Classifier for a Focused Web Crawler

Parallel Routing Method in Churn Tolerated Resource Discovery

TELCOM2125: Network Science and Analysis

Load Sharing in Peer-to-Peer Networks using Dynamic Replication

Optimizing Random Walk Search Algorithms in P2P Networks

Semantic Overlay Networks

TAON: A Topology-Oriented Active Overlay Network Protocol

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network

Evaluation Methods for Focused Crawling

How to explore big networks? Question: Perform a random walk on G. What is the average node degree among visited nodes, if avg degree in G is 200?

Automatically Constructing a Directory of Molecular Biology Databases

Decentralized Search

Example 1: An algorithmic view of the small world phenomenon

Automatic Web Image Categorization by Image Content:A case study with Web Document Images

Creating and maintaining replicas in unstructured peer-to-peer systems

Global dynamic routing for scale-free networks

ECS 289 / MAE 298, Lecture 9 April 29, Web search and decentralized search on small-worlds

CS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications

Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches

Characterizing Gnutella Network Properties for Peer-to-Peer Network Simulation

M.E.J. Newman: Models of the Small World

Implementation of Resource Discovery Mechanisms on PeerSim: Enabling up to One Million Nodes P2P Simulation

Evaluating Unstructured Peer-to-Peer Lookup Overlays

Column Generation Method for an Agent Scheduling Problem

Towards Limited Scale-free Topology with Dynamic Peer Participation

Degree Distribution, and Scale-free networks. Ralucca Gera, Applied Mathematics Dept. Naval Postgraduate School Monterey, California

The Robustness of Content-Based Search in Hierarchical Peer to Peer Networks

Flat Routing on Curved Spaces

The missing links in the BGP-based AS connectivity maps

Attack Vulnerability of Network with Duplication-Divergence Mechanism

Exploiting Content Localities for Efficient Search in P2P Systems

A Joint Replication-Migration-based Routing in Delay Tolerant Networks

A ROUTING MECHANISM BASED ON SOCIAL NETWORKS AND BETWEENNESS CENTRALITY IN DELAY-TOLERANT NETWORKS

How Routing Algorithms Work

Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques

Ant-inspired Query Routing Performance in Dynamic Peer-to-Peer Networks

Improving Information Retrieval Effectiveness in Peer-to-Peer Networks through Query Piggybacking

Peer-to-peer networks: pioneers, self-organisation, small-world-phenomenons

Preliminary results from an agent-based adaptation of friendship games

A Low-Overhead Hybrid Routing Algorithm for ZigBee Networks. Zhi Ren, Lihua Tian, Jianling Cao, Jibi Li, Zilong Zhang

Assessing and Safeguarding Network Resilience to Nodal Attacks

Minimizing bottleneck nodes of a substrate in virtual network embedding

A Hybrid Load Balance Mechanism for Distributed Home Agents in Mobile IPv6

Web Structure & Content (not a theory talk)

Adaptively Routing P2P Queries Using Association Analysis

Transcription:

1 On Degree-Based Decentralized Search in Complex Networks Shi Xiao Gaoxi Xiao Division of Communication Engineering School of Electrical and Electronic Engineering Nanyang technological University, Singapore 639798 Email: shixiao@pmail.ntu.edu.sg, egxxiao@ntu.edu.sg Abstract Decentralized search aims to find the target node in a large network by using only local information. The applications of it include peer-to-peer file sharing, web search and anything else that requires locating a specific target in a complex system. In this paper, we examine the degree-based decentralized search method. Specifically, we evaluate the efficiency of the method in different cases with different amounts of available local information. In addition, we propose a simple refinement algorithm for significantly shortening the length of the route that has been found. Some insights useful for the future developments of efficient decentralized search schemes have been achieved. Index Terms Decentralized search, complex network, scale-free network. I. INTRODUCTION Recently there has been increasing research interest in studying large-scaled real-world systems by formulating them into various network models. It is found that different systems, varying from Internet, world-wide web (WWW), airline transportation systems, scientific co-authorship, food web, to protein-protein reactions, even terrorism organizations, when formulated into network models, share some stunning common features. The above observations have spawn a new area called complex networks [1, 2]. One of the most important complex-network models is the scale-free network [1, 2]. In such networks, the nodal-degree distribution (i.e., the fraction of nodes with k connections) behaves as: P( k) k α, m k M where α is the exponent, m is the lower cutoff, and M is the upper cutoff. It is claimed (though not without arguments) that all the systems we mentioned above can be formulated as the scale-free networks [1, 2]. An important operation in complex networks is decentralized search, i.e., the operation of locating a certain target without a central controller to provide global information. The applications include peer-to-peer file sharing as that in Gnutella and Napster [3, 4], web crawlers search [5, 6], and anything else that requires locating a specific target in a complex system. The methods without considering the special features of the corresponding network models include breadth-first searching methods based on limited flooding [7, 8], random walks [9], and various informed searching methods with available information ranging from simple forwarding hints to exact object locations [1, 11]. By making use of the special features of the system topologies, most notably the six-degree separation in small-world networks [12], the well-known methods include the degree-based searching [13, 14] and the similarity-based searching methods [15-17]. The degree-based search methods typically make the assumptions that (i) each node knows its own neighborhood network topology; and (ii) each node can locate the target if and only if the target is within a certain range of its neighborhood. If the target is not found, the request will be forwarded to a chosen set of adjacent nodes for further search. In the degree-based method, generally speaking the request is forwarded to high-degree nodes. In fact, the most typical method simply forwards the request to the highest-degree adjacent node [14]. The similarity-based method, on the other hand, explores the similarity features existing in many real-world systems. For example, people like to watch men s basketball games may know more about women s basketball as well. Therefore it may be a good idea to ask for information This work was partly supported by Microsoft Research Asia (MSRA).

2 about some female basketball players to a community of men s basketball fans. From an algorithm-development point of view, however, it can be highly tricky to develop proper measurements of similarity, which sometimes have to be application dependent [16, 17]. Recently, there are research efforts aiming to combine the degree-based and similarity-based searching methods [13]. The results turn out to be quite encouraging. We in this report go back to study the very fundamental degree-based methods, with two major objectives: (1) to better understand how the performance of such methods can be affected by the amount of the local information being available to each node; (2) to develop a simple distributed refinement algorithm that can shorten the route between the source-destination nodes. The reason we try to shorten each route is that a shorter route generally speaking is more effective in supporting information exchanges between source-destination nodes, and usually leads to better robustness as well. Our main observations can be summarized as follows: Comparing the cases where each node can identify the target node within one, two or three hops away from itself respectively (We call such cases as having one-, two-, and three-hop information, respectively.), we find that the efficiency of the degree-based searching method can be drastically improved when more information is available. Having complete two-hop information plus partial three-hop information helps to improve the performance of the degree-based search compared to the case with only complete two-hop information. However, it does not easily improve the searching performance to be comparable to that of the case with complete three-hop information. The proposed simple refinement algorithm can shorten the average hop length of each route to be close to the theoretical optimal solution, i.e., the average length of the shortest paths between all the source-destination node pairs. The rest parts of this paper are organized as follows. In Section II, we define in details the degree-based searching methods we would investigate. These methods are evaluated by extensive simulations in Section III. The refinement algorithm for shortening the route is reported and evaluated in Section IV. Finally, Section V concludes the paper. II. DEGREE-BASED DECENTRALIZED SEARCHING METHODS The first degree-based searching method we will study is the simple method proposed in [14]. Specifically, the operation procedure is as follows: Upon the arrival of a searching request, each node will check the one-hop, two-hop or three-hop information it has. If the target is identified among its neighborhood, the search stops; otherwise, there are two different cases: 1) If some nodes adjacent to the current node have never been visited by the searching request, the searching request will be forwarded to the largest-degree node among them. When there is a tie, break it randomly. 2) If no such adjacent node exists (In other words, all the adjacent nodes have been visited by the searching request.), the searching request will be deflected back to the node where it came from. The searching operation will be terminated if the request has been forwarded/deflected many times yet still cannot find the target. In our simulations, we terminate the search when the number of forward/deflection operations equals to the number of nodes in the network, which seldom happens in our experiences. A slightly different method is reported in [14]: in case 2), if all the adjacent nodes have been visited by the searching request before, we find among them those nodes whose single-hop neighborhood has not been exhaustively visited by the searching request. Then we forward the request to the highest-degree node among them. Our experiences show that the two methods perform nearly the same. The latter one only occasionally helps to shorten the searching procedures. Therefore in this paper, we discuss the former method only. As aforementioned, we will evaluate the performance of the method with one-hop, two-hop and three-hop information, respectively (while in [14], it has always been assumed that the two-hop information is available). In many real-world systems, however, it is not easy to acquire complete

3 two-hop information, let alone three-hop information. For example, in the Internet, a big hub may have hundreds of thousands of neighborhood nodes within three-hop distance from itself. To test the possibility of loosening the request of having complete three-hop information without significantly sacrificing searching performance, we consider the following partial three-hop information-based (Partial-3) searching method: If the target cannot be found by a node within its two-hop neighborhood, this node will consult a few adjacent nodes whether the target is within their two-hop neighborhood. If a positive reply comes back, the search stops; otherwise, the search request is forwarded (or deflected) in the same way as that in the original method we discussed earlier. In our simulations, we let each node to consult its largest-degree adjacent nodes that have never been consulted before. III. EVALUATIONS OF THE EFFECTS OF DIFFEERNT INFORMATION AVAILABILITIES We first evaluate the performance of the simple method proposed in [14] with one-, two-, and three-hop information, respectively. Numerical simulations are conducted on the Barabási-Albert (BA) model [1] containing 1, nodes and a real-world Internet model containing 6,47 nodes [18]. In each model, we simulate decentralized search between pairs of randomly chosen source-destination nodes. The average results of 1 rounds of such simulations are presented in Fig. 1. 3 3 2 2 1 1 1-hop neighborhood information average hop length=14.14 average hop length=78.3 3-hop neighborhood information average hop length=7.64-5 5-1 1-2 2-3 3-4 4- -6 6-7 Hop Length 7-8 8-9 9-1 1- > (a) BA model 4 3 3 2 2 1 1-hop neighborhood information average hop length=.66 average hop length=42.97 3-hop neighborhood information average hop length=7.262 1-5 5-1 1-2 2-3 3-4 4-5 5-6 6-7 H op length 7-8 8-9 9-1 1-5 > 5 (b) 647-node Internet model Fig. 1. Performance of the classic degree-based searching method with different amounts of local information. The simulation results evidently show that having more local information helps to drastically improve the efficiency of the searching operations. For example, in the BA model, with single-hop information, the average number of hops each search operation goes through is as large as 14.14. This number can be reduced to 78.3 when two-hop information is available; and

4 further reduced to 7.64 when three-hop information is available. The numbers of those search operations going through fewer than 1 hops account for 1.8%, 55%, and 94.4% among all the operations in the three different cases, respectively. Similar conclusions hold in the Internet model. 2 2 1 1 Two adjacent nodes consulted average hop length=4.818 Three adjacent nodes consulted average hop length=33.364 Five adjacent nodes consulted average hop length=21.468-5 5-1 1-2 2-3 3-4 4- -6 6-7 Hop Length 7-8 8-9 9-1 1- > (a) BA model 3 3 2 2 1 1 Two adjacent nodes consulted average hop length=2.74 Three adjacent nodes consulted average hop length=17.8 Five adjacent nodes consulted average hop length=14.266-5 5-1 1-2 2-3 3-4 4- -6 6-7 Hop length 7-8 8-9 9-1 1- > (b) 647-node Internet model Fig. 2. Performance evaluations of the Partial-3 method. The results in Fig. 1 show that, having complete two-hop information is not sufficient to achieve comparable performance to that of the centralized search (where the shortest path between each node pair can be calculated). Considering the fact that it may not be feasible to acquire three-hop information in many real-world systems, we test the Partial-3 method where each node can consult at most 2, 3 or 5 of its adjacent nodes respectively. The simulation results are presented in Fig. 2. We see that compared to the case with complete two-hop information, the Partial-3 method performs much better. However, it does not improve the performance to be close to that of the case with complete three-hop information. In the BA model, even when each node can consult 5 adjacent nodes, the average hop length of each search operation is still 21.468 (With complete three-hop information, the average path length would be a much shorter 7.64.). The high sensitivity of searching performance to the amount of available local information may imply necessity of combining the degree-based and the similarity-based searching methods: searching based on network topology information (such as nodal-degree information) alone may simply never be good enough in certain circumstances. IV. REFINEMENT METHOD AND ITS PERFORMANCE For many applications, finding the target is not the end of the story. Once the target is found, a path needs to be set up between the source and the target. From the discussions in the last section, we see that, unless we have extensive three-hop information, the searching request may have to go

5 through a long path before it reaches the destination. If a short path between the source-destination nodes cannot be found at the first place, for many applications, it is helpful to shorten the path afterwards. For example, for peer-to-peer multimedia file sharing, some delay in connection setup generally speaking can be tolerated; yet it is important to make sure that the set-up connection is short, robust and with sufficient bandwidth. Motivated by such applications, we propose the following simple distributed refinement method with the objective of shortening a route that has been found. Assume the searching request finally reaches the target node carrying a list of all the nodes it has passed through. Denote the node list as { sn, 1, n2,..., nk, t }, where s is the ID of the source, t the ID of the target, and ni, i = 1,2,..., k the intermediate nodes. Find among the node list the first one that is adjacent to the destination. Denote this node as node m. Then find among the node list the first one adjacent to node m. Repeat the procedure under the node s is finally reached. 4 3 3 2 2 1 average hop length=78.3 2-hop neighborhood informaton with refinment average hop length=5.534 1-5 5-1 1-2 2-3 3-4 4- -6 6-7 Hop Length 7-8 8-9 9-1 1- > (a) BA model 4 3 3 2 2 1 average hop length=42.97 with refinement average hop length=3.916 1-5 5-1 1-2 2-3 3-4 4- -6 6-7 Hop length 7-8 8-9 9-1 1- > (b) 647-node Internet model Fig. 3. Performance evaluations of the refinement method. Apparently, the refinement method adopts a simple greedy approach to take shortcuts of the route that the searching request has gone through. The performance of the method is presented in Fig. 3, where we assume that the complete two-hop information is available to every node. Simulation results show that, simple as it is, the refinement method drastically shortens the average hop length of each path from 78.3 to 5.53 in the BA model, close to the average length of the shortest paths between all the node pairs (which is 4.88). More significantly, not a single path after refinement has more than 1 hops. Therefore, in applications where refinements are allowed, the simple algorithm can effectively achieve close-to-optimal solutions.

6 V. CONCLUSION We studied the degree-based decentralized searching methods in complex networks. Extensive simulations results show that the performance of such methods is rather sensitive to the amount of available local information. Such high sensitivity cannot be easily eliminated by limited information exchanges between neighborhood nodes, even if these nodes are of high degrees and are in scale-free networks with relatively large numbers of hub nodes. However, by using a simple refinement algorithm, we can shorten most of the routes to be close to the shortest lengths they could ever achieve. These insights would be helpful for the future developments of more efficient decentralized searching methods. REFERENCES [1] S. Bornholdt, H. G. Schuster (Eds.), Handbook of Graphs and Networks: From the Genome to the Internet, Wiley-VCH, 23. [2] E. Ben-Naim, H. Frauenfelder, and Z. Toroczkai (Eds.), Complex Networks, SpringerVerlag Berline Heidelberg, 24. [3] Gnutella website: http://www.gnutella.com [4] Napster website: http://www.napster.com [5] S. Chakrabarti, M. van den Berg, and B. Dom, Focused Crawling: A New Approach to Topic-specific Web Resource Discovery, In Proceedings of the 8 th International World Wide Web Conference, 1999. [6] M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori, Focused Crawling Using Context Graphs, In 26 th International Conference on Very Large Database (VLDB), 2. [7] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, A Local Search Mechanism for Peer-to-Peer networks, In Proceedings of the 11th international Conference on Infomation and Knowledge Management (CIKM), 22. [8] B. Yang and H. Garcia-Molina, Improving Search in Peer-to-Peer Networks, In IEEE International Conference on Distributed Computing Systems (ICDCS), 22. [9] C. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and Replication in Unstructured Peer-to-Peer Networks, In Proceedings of the 16th annual ACM International Conference on supercomputing (ICS), 22. [1] D. Tsoumakos and N. Roussopoulos, Adaptive Probabilistic Search (APS) for Peer-to-Peer Networks, Technical Report CS-TR-4451, University of Maryland, 23. [11] A. Crespo and H. Garcia-Molina, Routing Indices for Peer-to-Peer Systems, In IEEE International Conference on Distributed Computing Systems (ICDCS), 22. [12] J. Travers and S. Milgram, An Experimental Study of the Small World Problem, Sociometry, 32, 1969. [13] O. Simsek and D. Jensen, Decentralized Search in Networks Using Homophily and Degree Disparity, In Proc. 19th International Joint Conference on Artificial Intelligence, 25. [14] L. A. Adamic, R. M. Lucose, A. R. Puniyani, and B. A. Huberman, Search in power-law networks, Physical Review E, 64, 21. [15] J. Kleinberg, Navigation in a small world, Nature, 46:845, 2. [16] J. Kleinberg, The Small-World Phenomena: An Algorithmic Perspective, In Proceedings of the 32 nd ACM Symposium on Theory of Computing, 2. [17] J. Kleinberg, Small-world Phenomena and the Dynamics of Information, In Advanced in Neural Information Processing Systems (NIPS), vol. 14, 21. [18] http://moat.nlanr.net/routing/rawdata