Peer Clustering and Firework Query Model

Similar documents
Peer Clustering and Firework Query Model in the Peer-to-Peer Network

Early Measurements of a Cluster-based Architecture for P2P Systems

Peer Clustering and Firework Query Model in Peer-to-Peer Networks

Building a low-latency, proximity-aware DHT-based P2P network

A Hybrid Peer-to-Peer Architecture for Global Geospatial Web Service Discovery

P2P Information Retrieval: A Self-Organizing Paradigm

Architectures for Distributed Systems

A Server-mediated Peer-to-peer System

Small World Overlay P2P Networks

A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol

A Framework for Peer-To-Peer Lookup Services based on k-ary search

Athens University of Economics and Business. Dept. of Informatics

DYNAMIC TREE-LIKE STRUCTURES IN P2P-NETWORKS

A Square Root Topologys to Find Unstructured Peer-To-Peer Networks

Should we build Gnutella on a structured overlay? We believe

Introduction to P2P Computing

A LIGHTWEIGHT FRAMEWORK FOR PEER-TO-PEER PROGRAMMING *

Understanding Chord Performance

INF5071 Performance in distributed systems: Distribution Part III

Subway : Peer-To-Peer Clustering of Clients for Web Proxy

Comparing Chord, CAN, and Pastry Overlay Networks for Resistance to DoS Attacks

Application Layer Multicast For Efficient Peer-to-Peer Applications

Scalable overlay Networks

A Transpositional Redundant Data Update Algorithm for Growing Server-less Video Streaming Systems

Neighborhood Signatures for Searching P2P Networks

AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems

Simulations of Chord and Freenet Peer-to-Peer Networking Protocols Mid-Term Report

Design of a New Hierarchical Structured Peer-to-Peer Network Based On Chinese Remainder Theorem

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Improved Dominating Set Indices for Mobile Peer-to-Peer Networks

Evaluating Unstructured Peer-to-Peer Lookup Overlays

Chapter 6 PEER-TO-PEER COMPUTING

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Peer-to-Peer (P2P) Systems

Chord : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications

A General Framework for Searching in Distributed Data Repositories

Evolution of Peer-to-peer algorithms: Past, present and future.

Semantic-laden Peer-to-Peer Service Directory

SplitQuest: Controlled and Exhaustive Search in Peer-to-Peer Networks

Exploiting Semantic Clustering in the edonkey P2P Network

Update Propagation Through Replica Chain in Decentralized and Unstructured P2P Systems

ENSC 835: HIGH-PERFORMANCE NETWORKS CMPT 885: SPECIAL TOPICS: HIGH-PERFORMANCE NETWORKS. Scalability and Robustness of the Gnutella Protocol

Evaluation Study of a Distributed Caching Based on Query Similarity in a P2P Network

ReCord: A Distributed Hash Table with Recursive Structure

: Scalable Lookup

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

IN recent years, the amount of traffic has rapidly increased

A Small World Overlay Network for Semantic Based Search in P2P Systems

A Survey of Peer-to-Peer Systems

Distributed Hash Table

Multi-level Hashing for Peer-to-Peer System in Wireless Ad Hoc Environment

Adaptively Routing P2P Queries Using Association Analysis

EE 122: Peer-to-Peer (P2P) Networks. Ion Stoica November 27, 2002

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen

OMNIX: A topology-independent P2P middleware

Advanced Computer Networks

Peer-to-Peer Semantic Search Engine

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables

Flooded Queries (Gnutella) Centralized Lookup (Napster) Routed Queries (Freenet, Chord, etc.) Overview N 2 N 1 N 3 N 4 N 8 N 9 N N 7 N 6 N 9

Modeling and Analysis of Random Walk Search Algorithms in P2P Networks

Ossification of the Internet

A Directed-multicast Routing Approach with Path Replication in Content Addressable Network

A DHT-Based Grid Resource Indexing and Discovery Scheme

Peer-to-Peer Systems. Chapter General Characteristics

Lecture-2 Content Sharing in P2P Networks Different P2P Protocols

A Scalable and Robust QoS Architecture for WiFi P2P Networks

12/5/16. Peer to Peer Systems. Peer-to-peer - definitions. Client-Server vs. Peer-to-peer. P2P use case file sharing. Topics

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

Interest-Based Lookup Protocols for Mobile Ad Hoc Networks *

P2P Based Architecture for Global Home Agent Dynamic Discovery in IP Mobility

Hybrid Overlay Structure Based on Random Walks

Peer-To-Peer Techniques

A New Version of K-random Walks Algorithm in Peer-to-Peer Networks Utilizing Learning Automata

An Intelligent Home Environment based on Service Planning over Peer-to-Peer Overlay Network

APSALAR: Ad hoc Protocol for Service-Aligned Location Aware Routing

A Peer-to-Peer Architecture to Enable Versatile Lookup System Design

A Super-Peer Based Lookup in Structured Peer-to-Peer Systems

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd

Peer-to-Peer Signalling. Agenda

Searching in variably connected P2P networks

An Analysis of the Overhead and Energy Consumption in Flooding, Random Walk and Gossip based Resource Discovery Protocols in MP2P Networks

A Search Theoretical Approach to P2P Networks: Analysis of Learning

Flexible Information Discovery in Decentralized Distributed Systems

L3S Research Center, University of Hannover

A Top Catching Scheme Consistency Controlling in Hybrid P2P Network

Effect of Links on DHT Routing Algorithms 1

BOOTSTRAPPING LOCALITY-AWARE P2P NETWORKS

Proximity Based Peer-to-Peer Overlay Networks (P3ON) with Load Distribution

Mill: Scalable Area Management for P2P Network based on Geographical Location

A New Adaptive, Semantically Clustered Peer-to-Peer Network Architecture

Aggregation of a Term Vocabulary for P2P-IR: a DHT Stress Test

Finding Good Peers in Peer-to-Peer Networks

Efficient Broadcast in P2P Grids

Telematics Chapter 9: Peer-to-Peer Networks

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search

Location Efficient Proximity and Interest Clustered P2p File Sharing System

Three Layer Hierarchical Model for Chord

Searching for Shared Resources: DHT in General

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Transcription:

Peer Clustering and Firework Query Model Cheuk Hang Ng, Ka Cheung Sia Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin, N.T., Hong Kong SAR {chng,kcsia}@cse.cuhk.edu.hk ABSTRACT In this paper, we present a new strategy for information retrieval over the peer-to-peer (P2P) network. To avoid query messages flooding and saving resources in handling irrelevant queries in the P2P network, we propose a method for clustering peers that share similar properties together, thus, data inside the P2P network will be organized in a fashion similar to a Yellow Pages. In order to make use of our clustered P2P network efficiently, we also propose a new intelligent query routing strategy, the Firework Query Model. In contrast to broadcasting query message, our query message will first walk around the network from peer to peer randomly, once it reaches the target cluster, the query message explodes and is broadcasted by peers inside the cluster. We present our algorithm, give analysis and experimental results to demonstrate our method. Keywords Peer-to-Peer, Information Retrieval, Peer Clustering, Intelligent Query Routing 1 Introduction The appearance of P2P applications such as Gnutella [1] and Napster [2] has demonstrated the significance of distributed information sharing systems. Users can access the information distributed among peers around the world. These models offer advantage of decentralization by distributing the storage, information and computation cost among the peers. However, since there is no centralized server to keep track of what data are being stored in what peer, we do not know which peer contains the information we want. Therefore, when a peer wants to search for a file, he needs to broadcast query to all his peers. Likewise, his peers propagate the query to their peers and so on. There are two problems associated with current strategies. Since every query is broadcasted to every peer in the network, each peer has to waste resources in handling irrelevant query. Moreover, broadcasting query messages across the network also increases network traffic. Portmann et. al. [3] investigated the problem of scalability in P2P network due to network traffic cost. Rowstron et. al. [5] and Stoica et. al. [6] proposed their algorithm targeting at reducing network traffic in P2P network. Ramanathan et. al. [4] proposed a clustering strategy to reduce query traffic, while increasing the goodness of search result. In order to solve the problems above, our proposed strategy will cluster peers of similar properties together, just like the organization in a Yellow Pages, which makes query more systematic. In addition, our intelligent Firework Query Model will help in reducing network traffic by forwarding query selectively rather than the original Brute- Force-Search (BFS) broadcasting. Under our query model, irrelevant query subjects to a particular peer will not go into its cluster, once the query arrives at a matching peer, it is broadcasted inside its cluster. As a result, we avoid unnecessary traffic while fully utilize each query message.

Figure 1. Attractive Link Selection (a) left and Firework Query Algorithm (b) right 2 Peer Clustering and Firework Query Model 2.1 Peer Clustering Inside our P2P network, each peer is connected by two types of link, random link and attractive link. Before continuing on how to manipulate these two types of link, we introduce the following terms: Random link - link random (p): The connection which a peer p makes randomly to another peer in the network. It is chosen by the user. Attractive link - link attractive (p): The connection which a peer p makes explicitly to another peer, which they share similar data. Cat(p),Cat(Q): A signature value representing the characteristic of a peer p, and a query Q respectively. Sim(p, q): The similarity between two peers p, q, which is a distance measure between Cat(p) and Cat(q). Sim(p, Q): The similarity between a peer p and a query Q,,which is a distance measure between Cat(p) and Cat(Q). P eer(p, t): The set of peers which a peer p can reach within t hops. Based on the above definition, we introduce an algorithm to choose the attractive link as illustrated in Fig. 1(a). Referring to Fig. 2(a), we illustrate how our peer clustering strategy works. In the beginning, let us assume every peer can be represented by a value Cat(p) based on the characteristic of that peer p respectively. In the mean time, we consider this value to be geographical location. When a peer joins the network, it will connect to another peer randomly. Through ping-pong messages [1], it learns the set of peers within a certain number of hops away from it (P eer(p, t)). Based on the attractive link algorithm, it will connect to a peer that shares similar properties through attractive links. In Fig. 2 (a), the peer CN 4 first connects to UK 3 by random link, and by ping-pong messages, it learns the location of other peers belong to category CN, then it makes an attractive link to CN 2 to perform peer clustering. Our algorithm will choose a similar peer with largest hop count to connect, so CN 2 is chosen. As a result, you notice that peers of similar characteristic will connect together to form a cluster, which makes information retrieval much more systematic. In the example we used, geographical information is used to determine the similarity between peers. However, our algorithm is generic in the sense that, as long as we can choose a good descriptor to describe a peer, for example, the major file type it shares, the images content information it shares, will also work properly. 2.2 Firework Query Model To make use of our clustered P2P network, we propose the Firework Query Model. In this model, a query message will first walk around the network from peer to peer by random link, then once it reaches the target cluster, the query message will be broadcasted by peers using attractive link insides the cluster as shown in Fig. 2 (b).

To decide which peer a query is being sent to, we introduce an algorithm to determine when and how a query message explodes in Fig. 1 (b). Referring to Fig. 2 (a) again, we illustrate an example to show how the firework query model works. Assume the new peer CN 4, whose geographical location is China, initiates a search to find an American song. Since the geographical information of his query is far away from his geographical location, therefore, CN 4 will send the query to UK 3 through random link. When UK 3 receives the query, it will forward the query to US 2 through random link again. After walking around the network randomly, the query message reaches its target cluster and starts to explode. During the explosion, US 2 will broadcast the query to US 1 through attractive link. Likewise, US 1 will broadcast the query to US 2, CN 1 and so on. Figure 2. Illustration of peer clustering (a) left and firework query (b) right Similar to IP packets and Gnutella messages, our query messages also have a Time-To-Live (TTL). This avoids messages from circling around the network forever. Once the TTL decreases to zero, the message will be dropped and no longer forwarded. The only difference between our query messages and the gnutella messages is that TTL will not be decreased when the messages are sent through attractive link. The reason is that, when a query reaches its target cluster, all peers inside the cluster are highly related, in order to get as more hits as possible, there is no reason to decrease the TTL and prevent further searching inside the highly related cluster. 3 Experiment We develop a program to simulate the P2P environment and verify the effectiveness of our proposed strategy. We investigate the effect on query efficiency subjects to increasing number of peers, different values of query message s TTL. We formulate five cases to compare the difference in performance subject to different network set-up methods and query strategies, described in the following. Random BFS (BFS): Like pure P2P network (Gnutella) [1], network topology is built randomly and queries are broadcasted to all connected peers. Centralized fuzzy (CF): Peer acquires knowledge of other peers from a centralized server to determine how to make attractive link. Queries are forwarded to attractive link if Sim(p, q) < ɛ, otherwise to random link. We consider this to be fuzzy because queries are considered similar when the similarity measures are still within a range. In our experiments, for example, category 5 peer will treat category 4 and 6 queries as matching queries and forward them through attractive link. Learning fuzzy (LF): Peer acquires knowledge of others through ping-pong messages to learn a partial picture of the network, and establishes attractive link based on this information. Queries are forwarded in the same way as Centralized fuzzy.

Centralized discrete (CD): Network setting-up method is the same as Centralized fuzzy. Queries are forwarded to attractive link if Sim(p, q) = 0, otherwise to random link. Learning discrete (LD): Network setting-up method is the same as Learning fuzzy. Queries are forwarded in the same way as Centralized discrete. In the experiments, we assume there are ten categories of peers, and assign different Cat(p) values evenly. Also, queries generated only fall in these ten categories, when a peer receives a query of the same category, it will fire a hit count. We measure the recall of a particular query as the number of hit counts over total number of peers under that category. We record three measures, the number of query message, recall, and recall per query message (efficiency of query) against the number of peers and TTL. Referring to Fig. 3(b), our proposed strategy shows a promising sub-linear increases in number of query messages with increasing TTL, while BFS increases exponentially. In addition, we show that our strategy has much improvement on recall per query message than traditional BFS as shown in Fig. 3(b) and 4(b), also, the recall performance is shown in Fig. 4(a), which indicates our algorithm still outperforms BFS while we use less number of query messages. Detailed data are listed in Table 1 and 2. 4 Conclusion and Future Work We empirically examine several aspects of the performance of clustered P2P network. Among the many possible ways to extend the current work, perhaps the most challenging one is to choose one or more descriptor(s) to describe a particular peer for use in clustering, where this may be a single value, a vector or even multi-dimension vectors to precisely describe a peer. Figure 3. Number of query packet (a) left and query efficiency (b) right against TTL Figure 4. Recall measure (a) left and Query efficiency (b) right against number of peers

Table 1. Average number of query messages and recall per query message against increasing TTL over 10 simulation runs number of query messages recall per query message ( 10 6 ) TTL BFS CF LF CD LD BFS CF LF CD LD 2 272 33986 191 4039 173 1.845 20.382 8.967 49.366 6.896 3 1245 51781 22337 10612 1500 1.508 17.374 12.365 46.984 30.994 4 6053 53635 56235 16593 12632 1.506 16.847 16.281 48.211 32.951 5 28959 53825 62263 17496 14895 1.557 16.733 16.061 45.688 40.219 6 63896 60504 59419 24179 25566 1.877 16.528 16.830 41.358 39.114 7 125749 64548 63070 29476 25162 1.842 15.492 15.855 33.926 39.742 Table 2. Average recall and recall per query message ( 10 6 ) against increasing number of peers over 10 simulation runs Recall Recall per query message ( 10 6 ) number of peers BFS CF LF CD LD BFS CF LF CD LD 30000 0.003407 1.000000 0.954259 0.699008 0.163576 6 61 55 158 143 27000 0.005353 0.798179 0.861528 0.797958 0.346832 5 61 66 172 138 24000 0.004764 1.000000 0.741765 0.500960 0.368099 8 71 70 194 160 20000 0.004273 1.000000 0.799729 0.499800 0.349756 9 88 92 244 204 15000 0.008146 0.902563 0.693782 0.501639 0.304515 10 113 113 307 236 10000 0.009970 1.000000 0.833564 0.500250 0.461937 19 162 174 376 340 5000 0.016126 1.000000 0.967871 0.513788 0.395052 33 342 340 905 737 2000 0.052970 1.000000 0.993896 0.899050 0.275470 101 731 729 2076 1198 500 0.134000 1.000000 0.689796 0.699797 0.490239 395 2890 2815 5598 3954 5 Acknowledgements This project is supported in part by grants from the Hong Kong s Research Grants Council (RGC) under CUHK 4407/99E and #2050259. References [1] The gnutella homepage. In http://www.gnutella.com. [2] The napster homepage. In http://www.napster.com. [3] M. Portmann, P. Sookavatana, S. Ardon, and A. Seneviratne. The cost of peer discovery and searching in the gnutella peer-to-peer file sharing protocol. In Proceedings to the International Conference on Networks, volume 1, 2001. [4] M. K. Ramanathan, V. Kalogeraki, and J. Pruyne. Finding good peers in peer-to-peer networks. In International Parallel and Distributed and Computing Symposium, April 2002. [5] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proceedings of the 18th IFIP/ACM International Conference on Distributed Systems Platforms, November 2001. [6] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of ACM SIGCOMM, pages 149 160, August 2001.