An Efficient Neighbor Searching Scheme of Distributed Collaborative Filtering on P2P Overlay Network

Similar documents
A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol

Early Measurements of a Cluster-based Architecture for P2P Systems

A Super-Peer Based Lookup in Structured Peer-to-Peer Systems

Athens University of Economics and Business. Dept. of Informatics

Privacy-Preserving Collaborative Filtering using Randomized Perturbation Techniques

Architectures for Distributed Systems

Building a low-latency, proximity-aware DHT-based P2P network

Distributed Hash Table

A Survey on Various Techniques of Recommendation System in Web Mining

A Constrained Spreading Activation Approach to Collaborative Filtering

A Scalable Content- Addressable Network

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables

The Tourism Recommendation of Jingdezhen Based on Unifying User-based and Item-based Collaborative filtering

A Recursive Prediction Algorithm for Collaborative Filtering Recommender Systems

Project Report. An Introduction to Collaborative Filtering

PUB-2-SUB: A Content-Based Publish/Subscribe Framework for Cooperative P2P Networks

Collaborative Filtering based on User Trends

A Framework for Peer-To-Peer Lookup Services based on k-ary search

Should we build Gnutella on a structured overlay? We believe

Effect of Links on DHT Routing Algorithms 1

A Directed-multicast Routing Approach with Path Replication in Content Addressable Network

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Collaborative Filtering using a Spreading Activation Approach

amount of available information and the number of visitors to Web sites in recent years

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Aggregation of a Term Vocabulary for P2P-IR: a DHT Stress Test

IN recent years, the amount of traffic has rapidly increased

Design of a New Hierarchical Structured Peer-to-Peer Network Based On Chinese Remainder Theorem

Peer-to-Peer Systems and Distributed Hash Tables

DYNAMIC TREE-LIKE STRUCTURES IN P2P-NETWORKS

SimRank : A Measure of Structural-Context Similarity

Visualizing Recommendation Flow on Social Network

Application of Dimensionality Reduction in Recommender System -- A Case Study

Structured Superpeers: Leveraging Heterogeneity to Provide Constant-Time Lookup

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Peer Clustering and Firework Query Model

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize

Load Balancing in Structured P2P Systems

Robustness and Accuracy Tradeoffs for Recommender Systems Under Attack

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Distributed Hash Tables and Scalable Content Addressable Network (CAN)

Survey of DHT Evaluation Methods

Comparing Chord, CAN, and Pastry Overlay Networks for Resistance to DoS Attacks

A Hybrid Peer-to-Peer Architecture for Global Geospatial Web Service Discovery

Evolution of Peer-to-peer algorithms: Past, present and future.

Problems in Reputation based Methods in P2P Networks

PChord: Improvement on Chord to Achieve Better Routing Efficiency by Exploiting Proximity

Recommender System using Collaborative Filtering Methods: A Performance Evaluation

Query Processing Over Peer-To-Peer Data Sharing Systems

Collaborative Filtering Using Random Neighbours in Peer-to-Peer Networks

Semantic feedback for hybrid recommendations in Recommendz

Shaking Service Requests in Peer-to-Peer Video Systems

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm

P2P Overlay Networks of Constant Degree

The Design and Implementation of an Intelligent Online Recommender System

Peer-to-peer computing research a fad?

Chapter 10: Peer-to-Peer Systems

Making Gnutella-like P2P Systems Scalable

LOOKING UP DATA in P2P Systems. The main challenge in P2P computing is to design and implement

CIS 700/005 Networking Meets Databases

Distributed Information Processing

Scalable P2P architectures

Aggregation of a Term Vocabulary for Peer-to-Peer Information Retrieval: a DHT Stress Test

Scalable and Self-configurable Eduroam by using Distributed Hash Table

Content-Based Recommendation for Web Personalization

Content-based Dimensionality Reduction for Recommender Systems

Mill: Scalable Area Management for P2P Network based on Geographical Location

Distributed Hash Tables: Chord

ReCord: A Distributed Hash Table with Recursive Structure

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

Proposing a New Metric for Collaborative Filtering

Distributed Hash Tables

08 Distributed Hash Tables

A Peer-to-Peer Architecture to Enable Versatile Lookup System Design

Update Propagation Through Replica Chain in Decentralized and Unstructured P2P Systems

Efficient Resource Management for the P2P Web Caching

Relaxing Routing Table to Alleviate Dynamism in P2P Systems

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

Searching for Shared Resources: DHT in General

PERFORMANCE ANALYSIS OF R/KADEMLIA, PASTRY AND BAMBOO USING RECURSIVE ROUTING IN MOBILE NETWORKS

Kademlia: A P2P Information System Based on the XOR Metric

INF5070 Media Storage and Distribution Systems: Peer-to-Peer Systems

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

On User Recommendations Based on Multiple Cues

EECS 426. Multimedia Streaming with Caching on Pure P2P-based Distributed e-learning System using Mobile Agent Technologies

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma

A P2P File Sharing Technique by Indexed-Priority Metric

Introduction to Peer-to-Peer Systems

Performance Modelling of Peer-to-Peer Routing

Solving the Sparsity Problem in Recommender Systems Using Association Retrieval

Multi-level Hashing for Peer-to-Peer System in Wireless Ad Hoc Environment

Telematics Chapter 9: Peer-to-Peer Networks

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

Transcription:

An Efficient Neighbor Searching Scheme of Distributed Collaborative Filtering on P2P Overlay Network

Bo Xie, Peng Han, Fan Yang, Ruimin Shen
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
{Bxie, phan, fyang, rmshen}@sjtu.edu.cn
Supported by the National Natural Science Foundation of China under Grant No. 60372078.

Abstract. Distributed Collaborative Filtering (DCF) has gained more and more attention as an alternative implementation scheme of CF-based recommender systems because of its advantages in scalability and privacy protection. However, as there is no central user database in DCF systems, the task of neighbor searching becomes much more difficult. In this paper, we first propose an efficient distributed user profile management scheme based on the distributed hash table (DHT) method, one of the most popular and effective routing algorithms in Peer-to-Peer (P2P) overlay networks. Then, we present a heuristic neighbor searching algorithm to locate potential neighbors of active users in order to reduce network traffic and execution cost. The experimental data show that our DCF algorithm with the neighbor searching scheme has much better scalability than traditional centralized ones, with comparable prediction efficiency and accuracy.

1 Introduction

While the rapid development of network and information technology provides people with unprecedentedly abundant information resources, it also brings the problem of information overload. How to help people find the resources they are interested in has therefore attracted much attention from both researchers and vendors. Among the technologies proposed, Collaborative Filtering (CF) has proved to be one of the most effective for its simplicity in both theory and implementation. Since Goldberg et al. [1] published the first account of using it for information filtering, CF has achieved great success in both research [2, 3] and application [4, 5, 6].

The key idea of CF is that users will prefer those items that people with similar interests prefer. According to the techniques used to describe and calculate the similarities between users, CF algorithms are often divided into two general classes [3]: memory-based algorithms and model-based algorithms. Memory-based algorithms directly calculate the similarities between the active user and the other users, and then use the K most similar users (the K nearest neighbors) to make predictions. In contrast, model-based algorithms first construct a predictive model from the user database and then use it to make predictions. As the computational cost of both the similarity calculation and the model construction grows quickly in time and space with the number of records in the database, both kinds of algorithms suffer from limited efficiency and scalability.

In recent years, Distributed CF (DCF) has therefore gained more and more attention as an alternative implementation scheme of CF-based recommender systems [8, 9] because of its advantage in scalability. However, as there is no central user database in DCF systems, the task of neighbor searching becomes much more difficult. In [8], Tveit uses a routing algorithm similar to Gnutella [15] to forward neighbor queries. As this is a broadcast routing algorithm, it causes extremely heavy network traffic when the number of users is large. In [9], Olsson improves on this by only exchanging information between neighbors, but this also reduces the efficiency of finding similar users. Moreover, as these systems use entirely different prediction mechanisms, their performance is hard to analyze, and existing improvements to CF algorithms can no longer be applied.

In this paper, we first propose an efficient distributed user profile management scheme based on the distributed hash table (DHT) method, one of the most popular and effective routing algorithms in Peer-to-Peer (P2P) overlay networks. Then, we present a heuristic neighbor searching algorithm to locate potential neighbors of active users. The main advantages of our algorithm are:

1. Both the user database management and the prediction computation can be done in a decentralized way, which increases the algorithm's scalability dramatically.
2. The implementation of our neighbor searching algorithm on a DHT-based P2P overlay network is quite straightforward and obtains efficient retrieval time and excellent performance at the same time.
3. It keeps all the other features of the traditional memory-based CF algorithm, so the system's performance can be analyzed both empirically and theoretically, and improvements to traditional memory-based algorithms can also be applied here.

The rest of this paper is organized as follows. In Section 2, several related works are presented and discussed. In Section 3, we give the architecture and key features of our algorithm, and we describe its implementation on a DHT-based P2P overlay network in Section 4. In Section 5, the experimental results of our system are presented and analyzed. Finally, we make a brief concluding remark in Section 6.

2 Related Works

2.1 Memory-Based CF Algorithm

Generally, the task of CF is to predict the votes of the active user from a user database consisting of a set of votes $v_{i,j}$, the vote of user i on item j. The memory-based CF algorithm calculates this prediction as a weighted average of other users' votes on that item through the following formula:

$$p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} \varpi(a,i)\,(v_{i,j} - \bar{v}_i) \qquad (1)$$

where $p_{a,j}$ denotes the prediction of the vote for the active user a on item j, and n is the number of users in the user database. $\bar{v}_i$ is the mean vote for user i:

$$\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j} \qquad (2)$$

where $I_i$ is the set of items on which user i has voted. The weights $\varpi(a,i)$ reflect the similarity between the active user and each user i in the user database, and $\kappa$ is a normalizing factor that makes the absolute values of the weights sum to unity.
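To make formulas (1) and (2) concrete, here is a minimal Python sketch of a memory-based prediction. It assumes dictionary-based rating vectors and a Pearson-style weight function, which is one common choice for $\varpi(a,i)$ (Section 5 evaluates both vector and Pearson similarity); all names are illustrative, not part of the original system.

```python
import math

def mean_vote(votes):
    # votes: dict item_id -> rating for one user; implements formula (2).
    return sum(votes.values()) / len(votes)

def weight(a_votes, i_votes):
    # Pearson correlation over co-rated items, one common choice for w(a, i).
    common = set(a_votes) & set(i_votes)
    if not common:
        return 0.0
    va, vi = mean_vote(a_votes), mean_vote(i_votes)
    num = sum((a_votes[j] - va) * (i_votes[j] - vi) for j in common)
    da = math.sqrt(sum((a_votes[j] - va) ** 2 for j in common))
    di = math.sqrt(sum((i_votes[j] - vi) ** 2 for j in common))
    return num / (da * di) if da and di else 0.0

def predict(a_votes, neighbor_votes, item):
    # Formula (1): p_aj = mean(v_a) + kappa * sum_i w(a,i) * (v_ij - mean(v_i)),
    # with kappa = 1 / sum_i |w(a,i)| so the weights' absolute values sum to one.
    va = mean_vote(a_votes)
    acc = norm = 0.0
    for i_votes in neighbor_votes:
        if item not in i_votes:
            continue
        w = weight(a_votes, i_votes)
        acc += w * (i_votes[item] - mean_vote(i_votes))
        norm += abs(w)
    return va + acc / norm if norm else va
```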

2.2 P2P System and DHT Routing Algorithm

The term Peer-to-Peer refers to a class of systems and applications that employ distributed resources to perform a critical function in a decentralized manner. With the pervasive deployment of computers, P2P is receiving increasing attention in research, and more and more P2P systems have been deployed on the Internet. Benefits of the P2P approach include improved scalability by avoiding dependency on centralized points, elimination of the need for costly infrastructure by enabling direct communication among clients, and resource aggregation.

As the main purpose of P2P systems is to share resources among a group of computers (peers) in a distributed way, efficient and robust routing algorithms for locating wanted resources are critical to the performance of P2P systems. Among these algorithms, the distributed hash table (DHT) algorithm is one of the most popular and effective, and it is supported by many P2P systems such as CAN [11], Chord [12], Pastry [13], and Tapestry [14].

A DHT overlay network is composed of several DHT nodes, each of which keeps a set of resources (e.g., files or ratings of items). Each resource is associated with a key (produced, for instance, by hashing the file name), and each node in the system is responsible for storing a certain range of keys. The network offers two basic operations, put(key, value) and lookup(key), built on two layers: route(key, message) and key/value storage. Peers can announce the resources they hold by issuing a put(key, value) message, or locate a wanted resource by issuing a lookup(key) request, which returns the identity (e.g., the IP address) of the node that stores the resource with that key.

The primary goals of DHT are to provide an efficient, scalable, and robust routing algorithm that reduces the number of P2P hops involved in locating a resource, and to reduce the amount of routing state that must be kept at each peer. In Chord [12], each peer keeps track of log N other peers, where N is the total number of peers in the community. When a peer joins or leaves the overlay network, this highly optimized version of the DHT algorithm only requires notifying log N peers of the change.

3 Our Neighbor Searching Algorithm

3.1 Distributed User Profile Management Scheme

Distributed user profile management has two key points: division and location. In our scheme, we wish to divide the original centralized user database into fractions (for concision, we call such fractions buckets in the rest of this paper) in such a manner that potential neighbors are put into the same bucket. We can then access just a few buckets to fetch the useful user profiles for the active user, since retrieving all the buckets would cause very heavy network traffic and is usually unnecessary.

We solve the first problem with a division strategy that makes each bucket hold the records of the users who share a particular <ITEM_ID, VOTE> tuple, meaning that users in the same bucket have voted on at least one common item with the same rating. Figure 1 illustrates our division strategy.

Fig. 1. User Database Division Strategy

When we later want to make a prediction for a particular user, we only need to retrieve the buckets that contain the active user's record. This strategy is based on the heuristic that people with similar interests will rate at least one item with similar votes. As Figure 3 of Section 5.3.1 shows, this strategy has a very high hit ratio. Moreover, it saves about 50% of the computation of the traditional CF algorithm while obtaining comparable predictions, as shown in Figure 4 in Section 5; a minimal sketch of the bucket construction is given below.

After settling the division and bucket choosing strategy, we still need an efficient way to locate and retrieve the needed buckets from the distributed network. As mentioned in Section 2.2, DHT provides an efficient infrastructure for this task, which we discuss in detail in Section 4.
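The following Python sketch makes the division strategy concrete: user IDs are grouped into buckets keyed by a hash of each <ITEM_ID, VOTE> tuple, as Section 4 describes. The in-memory dictionary standing in for the DHT and all function names are illustrative assumptions.

```python
import hashlib

def bucket_key(item_id, vote):
    # Section 4: each bucket is keyed by a hash of the <ITEM_ID, VOTE> tuple.
    return hashlib.md5(f"{item_id}:{vote}".encode()).hexdigest()

def divide_into_buckets(user_db):
    # user_db: dict user_id -> {item_id: vote}.  A user's ID is added to one
    # bucket per <ITEM_ID, VOTE> tuple in his rating vector, so users sharing
    # a bucket rated at least one common item with the same rating.
    buckets = {}  # stands in for the DHT's put(key, value) storage
    for user_id, votes in user_db.items():
        for item_id, vote in votes.items():
            buckets.setdefault(bucket_key(item_id, vote), []).append(user_id)
    return buckets

def candidate_buckets(active_votes):
    # Prediction for the active user only needs the buckets that contain one
    # of his own <ITEM_ID, VOTE> tuples.
    return [bucket_key(i, v) for i, v in active_votes.items()]
```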

3.2 Neighbor Searching in DCF

3.2.1 Return All vs. Return K

In the bucket choosing strategy described in Section 3.1, we return all users that share at least one bucket with the active user. As shown in Figure 5 in Section 5.3.2, with this strategy the number of fetched users grows as O(N), where N is the total number of users. In fact, as Breese observed in [3] with the term inverse user frequency, universally liked items are not as useful as less common items in capturing similarity. We therefore introduce a new concept, significance refinement (SR), which reduces the number of returned users by limiting how many users each bucket returns. We term the strategy improved by SR Return K, meaning that for every rated item we return no more than K users from each bucket, and we call the original strategy Return All. The experimental results in Figures 5 and 6 of Section 5.3.2 show that this method reduces the number of returned users dramatically and also improves the prediction accuracy.

3.2.2 Improved Return K Strategy

Although the Return K strategy proves to increase the scalability of our DCF algorithm, it becomes unstable as the number of users per bucket increases, because the K users are chosen at random. As shown in Figure 7, when the total number of users exceeds 40,000, the average bucket size is more than 60, so the randomness of neighbor choosing causes considerable uncertainty in prediction. There are two natural ways to address this: increase the value of K, or filter at the node holding the bucket. However, the first method increases traffic dramatically, while the second causes much more computation.

Instead, we add a merging step to our neighbor choosing strategy, which we call the Improved Return K strategy. Before retrieving the user records, we first get the user ID list from each bucket, which causes little traffic. We then merge the duplicate user IDs locally and generate a list of user IDs ranked by their number of occurrences. The user IDs with the most occurrences, i.e., the users who share the most identical votes with the active user, appear at the top of the list. After that, we fetch only the records of the top K users in the ranked list and make predictions based on them. As Figures 5 and 6 in Section 5 show, the Improved Return K strategy achieves both better scalability and better prediction accuracy; a sketch of this merging step follows.
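A minimal sketch of the Improved Return K merging step, assuming the bucket_key helper from the previous sketch and a get_user_ids callable that returns the user IDs stored in a bucket (it plays the role of the getuserid operation of Section 4); all names are illustrative.

```python
from collections import Counter

def improved_return_k(active_votes, get_user_ids, k=30):
    # Step 1: fetch only the lightweight user ID lists of the active user's
    # buckets, not the full user records.
    ids = []
    for item_id, vote in active_votes.items():
        ids.extend(get_user_ids(bucket_key(item_id, vote)))
    # Step 2: merge duplicate IDs locally and rank users by occurrence count,
    # i.e. by how many <ITEM_ID, VOTE> tuples they share with the active user.
    ranked = Counter(ids).most_common()
    # Step 3: fetch full rating vectors only for the top K ranked users.
    return [user_id for user_id, _ in ranked[:k]]
```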

4 Implementation of Our Neighbor Searching Algorithm on a DHT-Based P2P Overlay Network

4.1 System Architecture

Figure 2 shows the system architecture of our implementation of DCF on the DHT-based P2P overlay network. Here, we view the users' ratings as resources, and the system generates a unique key for each particular <ITEM_ID, VOTE> tuple through the hash algorithm, where ITEM_ID denotes the identity of the item the user votes on and VOTE is the user's rating on that item. As different users may vote on a particular item with the same rating, each key corresponds to a set of users who have the same <ITEM_ID, VOTE> tuple in their rating vectors. As stated in Section 3, we call such a set of user records a bucket.

As shown in Figure 2, each peer in the distributed CF system is responsible for storing one or several buckets using the distributed storage strategy described in Figure 1. Peers are connected through a DHT-based P2P overlay network and can find the buckets they want by key efficiently through the DHT-based routing algorithm, which is the foundation of our implementation. As Figures 1 and 2 suggest, the implementation of our PipeCF on a DHT-based P2P overlay network is quite straightforward. DHT provides a basic infrastructure for the following:

1. Defining IDs and the vote-ID-to-node-ID assignment
2. Defining per-node routing table contents
3. A lookup algorithm that uses the routing tables
4. A join procedure that reflects new nodes in the tables
5. Failure recovery

For our DHT-based CF scenario, IDs are 128-bit numbers. Vote IDs are chosen as the MD5 hash of the user's <ITEM_ID, VOTE> tuple, and node IDs are chosen as the MD5 hash of the peer's <IP address, MAC address>. Each key is stored on the node with the numerically closest ID; if node and key IDs are uniform, we get a reasonable load balance. The DHT overlay network already provides the two basic operations put(key, value) and lookup(key), and for our special DHT-based CF scenario we add two new operations: getuserid(votekey) and getuservalue(votekey, userkey). The first returns the IDs of the users who share the <ITEM_ID, VOTE> tuple behind votekey with the active user, and the second returns a candidate user's real rating vector.
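The sketch below illustrates this ID scheme and the two added operations layered on a generic DHT with put/lookup. The dictionary-backed DhtStub and all helper names are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib

def md5_id(text):
    # IDs are 128-bit numbers obtained via MD5.
    return int(hashlib.md5(text.encode()).hexdigest(), 16)

def vote_id(item_id, vote):
    # Vote ID: MD5 hash of the <ITEM_ID, VOTE> tuple.
    return md5_id(f"{item_id}:{vote}")

def node_id(ip, mac):
    # Node ID: MD5 hash of the peer's <IP address, MAC address>.
    return md5_id(f"{ip}:{mac}")

class DhtStub:
    """In-memory stand-in for the overlay's put(key, value) / lookup(key)."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store.setdefault(key, []).append(value)

    def lookup(self, key):
        return self.store.get(key, [])

def getuserid(dht, votekey):
    # Returns the IDs of users sharing the <ITEM_ID, VOTE> tuple behind votekey.
    return [uid for uid, _ in dht.lookup(votekey)]

def getuservalue(dht, votekey, userkey):
    # Returns the candidate user's full rating vector, if stored under votekey.
    for uid, vector in dht.lookup(votekey):
        if uid == userkey:
            return vector
    return None
```

Under these assumptions, a peer would announce its profile by calling dht.put(vote_id(item, vote), (user_id, vector)) once per rated item.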

[Figure 2 shows example peers (USER_3, USER_4, USER_5, USER_K, USER_T), each holding its own local vote vector over Item_ID_1 ... Item_ID_n plus cached vote vectors for the buckets it stores.]

Fig. 2. System Architecture of Distributed CF Recommender System

5 Experimental Results

5.1 Data Set

We use the EachMovie data set [7] to evaluate the performance of the improved algorithm. The EachMovie data set is provided by the Compaq Systems Research Center, which ran the EachMovie recommendation service for 18 months to experiment with a collaborative filtering algorithm. The information gathered during that period consists of 72,916 users, 1,628 movies, and 2,811,983 numeric ratings ranging from 0 to 5.

5.2 Metrics and Methodology

We evaluate accuracy with statistical accuracy metrics, which assess a predictor by comparing predicted values with user-provided values. More specifically, we use the Mean Absolute Error (MAE), as it is the most commonly used and easiest to understand:

$$\mathrm{MAE} = \frac{\sum_{(a,j) \in T} \left| p_{a,j} - v_{a,j} \right|}{|T|} \qquad (3)$$

where $v_{a,j}$ is the rating given to item j by user a, $p_{a,j}$ is the predicted value of user a on item j, T is the test set, and |T| is the size of the test set.
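A one-line rendering of formula (3), under the assumption that predictions and actual ratings are given as parallel sequences (names illustrative):

```python
def mae(predictions, actuals):
    # Formula (3): mean absolute deviation over the test set T.
    return sum(abs(p - v) for p, v in zip(predictions, actuals)) / len(actuals)
```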

We select 2,000 users and choose one user at a time as the active user, with the remaining users as his candidate neighbors, because every user computes his own recommendations locally. We use the ALL-BUT-ONE strategy [3] and take the mean prediction accuracy over all 2,000 users as the system's prediction accuracy.

5.3 Experimental Results

We design several experiments to evaluate our algorithm and analyze the effect of various factors by comparison. All experiments are run on a Windows 2000 PC with an Intel Pentium 4 processor at 1.8 GHz and 512 MB of RAM.

5.3.1 The Efficiency of Neighbor Choosing

Using a data set of 2,000 users, Figure 3 shows how many of the users chosen by the Return All neighbor searching scheme fall within the top 100 users most similar to the active user, as computed by the traditional memory-based CF algorithm. When the training set grows above 1,000 users, more than 80 of the most similar users are chosen by our neighbor choosing scheme.

5.3.2 Performance Comparison

We compare the prediction accuracy of the traditional CF algorithm and our Return All algorithm; the results are shown in Figure 4. Our algorithm has better prediction accuracy than the traditional CF algorithm. This result looks surprising at first sight, as traditional CF selects similar users from the whole user database while we use only a fraction of it. Looking deeper into our strategy, however, we find that we may filter out users who have high correlations with the active user but no identical ratings; we found in [16] that such users are bad predictors, which explains our result.

We then compare the performance of Return K and Return All, setting K to 5. From Figure 6 we can see that by eliminating users who merely share ratings on popular items, the prediction accuracy can be increased further.

The DHT-based bucket size distribution is illustrated in Figure 7: the more users, the larger the buckets, and when the total number of users exceeds 40,000 the average bucket size is more than 60.

Finally, we examine the Improved Return K strategy. Figure 6 shows that it obtains better performance while retrieving only 30 user records.

[Figure 3 plots the number of chosen users falling in the top 100 (predicting 20%), ranging from 76 to 92, against the total size of the training set (500 to 5,000 users), for vector similarity and Pearson similarity.]

Fig. 3. How Many Users Chosen by DHT-based CF Fall in Traditional CF's Top 100

[Figure 4 plots the Mean Absolute Error (0.845 to 0.875) against the total size of the training set (400 to 1,800 users) for traditional CF and DHT-based CF (Return All).]

Fig. 4. DHT-based CF vs. Traditional CF

[Figure 5 plots the number of fetched DHT user records (100 to 1,000) against the total size of the training set (400 to 1,800 users) for Return All, Return K (K=5), and Improved Return K (K=30).]

Fig. 5. The Effect on Scalability of Our Neighbor Searching Strategy

[Figure 6 plots the Mean Absolute Error (0.82 to 0.865) against the total size of the training set (400 to 1,800 users) for the same three strategies.]

Fig. 6. The Effect on Prediction Accuracy of Our Neighbor Searching Strategy

[Figure 7 plots the number of buckets against the number of users per bucket (0 to 300) for 4,000 and 40,000 EachMovie users.]

Fig. 7. The DHT Bucket Size Distribution

6 Conclusion

In this paper, we first propose an efficient distributed user profile management scheme based on the distributed hash table (DHT) method in order to solve the scalability problem of the centralized KNN-based CF algorithm. Then, we present a heuristic neighbor searching algorithm to locate potential neighbors of active users in order to reduce network traffic and execution cost. The experimental data show that our DCF algorithm with the neighbor searching scheme has much better scalability than traditional centralized ones, with comparable prediction efficiency and accuracy.

References

1. Goldberg, D., Nichols, D., Oki, B. M., Terry, D.: Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), pp. 61-70, Dec. 1992.
2. Herlocker, J. L., Konstan, J. A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230-237, 1999.
3. Breese, J., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.
4. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pp. 175-186, Chapel Hill, NC, Oct. 1994.
5. Shardanand, U., Maes, P.: Social information filtering: algorithms for automating word of mouth. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 210-217, Denver, CO, May 1995.
6. Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1), pp. 76-80, Jan. 2003.
7. EachMovie collaborative filtering data set: http://research.compaq.com/src/eachmovie
8. Tveit, A.: Peer-to-peer based recommendations for mobile commerce. In Proceedings of the First International Mobile Commerce Workshop, ACM Press, Rome, Italy, pp. 26-29, July 2001.
9. Olsson, T.: Bootstrapping and Decentralizing Recommender Systems. Licentiate Thesis 2003-006, Department of Information Technology, Uppsala University and SICS, 2003.
10. Canny, J.: Collaborative filtering with privacy. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 45-57, Oakland, CA, May 2002.
11. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In ACM SIGCOMM, Aug. 2001.
12. Stoica, I., et al.: Chord: a scalable peer-to-peer lookup service for Internet applications. In ACM SIGCOMM, San Diego, CA, pp. 149-160, 2001.
13. Rowstron, A., Druschel, P.: Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems. In IFIP/ACM Middleware, Heidelberg, Germany, 2001.
14. Zhao, B. Y., et al.: Tapestry: an infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, UC Berkeley, EECS, 2001.
15. Ripeanu, M.: Peer-to-peer architecture case study: Gnutella network. Technical Report, University of Chicago, 2001.
16. Han, P., Xie, B., Yang, F., Shen, R.: A scalable P2P recommender system based on distributed collaborative filtering. Expert Systems with Applications, 27(2), Elsevier, Sep. 2004 (to appear).