A P2P REcommender system based on Gossip Overlays (PREGO)

Similar documents
FROM PEER TO PEER...

A P2P REcommender system based on Gossip Overlays (PREGO)

Kleinberg s Small-World Networks. Normalization constant have to be calculated:

A POTPOURRI OF DISTRIBUTED COMPUTING RESEARCHES. Patrizio Dazzi. lunedì 25 giugno 12

Finding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing

Fast Topology Management in Large Overlay Networks

Being Prepared In A Sparse World: The Case of KNN Graph Construction. Antoine Boutet DRIM LIRIS, Lyon

Gossip-based Networking for Internet-Scale Distributed Systems

Consistency and Replication (part b)

Collaborative Filtering using a Spreading Activation Approach

Intro to Peer-to-Peer Search

Hybrid Recommendation System Using Clustering and Collaborative Filtering

Project Report. An Introduction to Collaborative Filtering

Gozar: NAT friendly Peer Sampling with One Hop Distributed NAT Traversal

Shuffling with a Croupier: Nat Aware Peer Sampling

Block lectures on Gossip-based protocols

Constructing Overlay Networks through Gossip

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Cloudy Weather for P2P

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

PARALLEL AND DISTRIBUTED PLATFORM FOR PLUG-AND-PLAY AGENT-BASED SIMULATIONS. Wentong CAI

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju

AVMEM: Availability-Aware Overlays for Management Operations in Non-cooperative Distributed Systems

Overlapping Ring Monitoring Algorithm in TIPC

An Analytical Model of Information Dissemination for a Gossip-based Protocol

Clustering: Classic Methods and Modern Views

BUFFER management is a significant component

User-Relative Names for Globally Connected Personal Devices

Functionality, Challenges and Architecture of Social Networks

Replica Placement. Replica Placement

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Optimizing Capacity-Heterogeneous Unstructured P2P Networks for Random-Walk Traffic

over Multi Label Images

Unsupervised Learning

Gossiping in Distributed Systems Foundations

Building a Recommendation System for EverQuest Landmark s Marketplace

Predicting User Ratings Using Status Models on Amazon.com

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Predictive Indexing for Fast Search

Real-time Collaborative Filtering Recommender Systems

Performance Evaluation of NoSQL Databases

A Constrained Spreading Activation Approach to Collaborative Filtering

An Adaptive Stabilization Framework for DHT

SIPCache: A Distributed SIP Location Service for Mobile Ad-Hoc Networks

A Scalable Content- Addressable Network

MASTER DEGREE COMPUTER SCIENCE COMPUTER SCIENCE AND NETWORKING. Peer to Peer Systems LAURA RICCI 2/5/2011

Some Interesting Applications of Theory. PageRank Minhashing Locality-Sensitive Hashing

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Intuitive distributed algorithms. with F#

Stochastic Models of Pull-Based Data Replication in P2P Systems

Jeff Howbert Introduction to Machine Learning Winter

DISTRIBUTED HASH TABLE PROTOCOL DETECTION IN WIRELESS SENSOR NETWORKS

GOSSIP ARCHITECTURE. Gary Berg css434

Community-Based Recommendations: a Solution to the Cold Start Problem

A Case For OneSwarm. Tom Anderson University of Washington.

Developing Focused Crawlers for Genre Specific Search Engines

PUBLISHER Subscriber system is an event notification service

DIMPLE-II: Dynamic Membership Protocol for Epidemic Protocols 1

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

GLive: The Gradient overlay as a market maker for mesh based P2P live streaming

ICT 6544 Distributed Systems Lecture 2: ARCHITECTURES

Peer-to-Peer Support of Massively Multiplayer Games

CS514: Intermediate Course in Computer Systems

Learning independent, diverse binary hash functions: pruning and locality

PeerSim: A Brief Introduction

Minimizing Churn in Distributed Systems

Collaborative Filtering using Euclidean Distance in Recommendation Engine

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

Overlay and P2P Networks. Unstructured networks: Freenet. Dr. Samu Varjonen

Self-optimised Tree Overlays using Proximity-driven Self-organised Agents

Introduction to Data Mining

Architectural Approaches for Social Networks. Presenter: Qian Li

Mining Web Data. Lijun Zhang

A Constrained Spreading Activation Approach to Collaborative Filtering

Cluster Analysis. Ying Shen, SSE, Tongji University

Discovery of Stable Peers in a Self-Organising Peer-to-Peer Gradient Topology

10. Replication. Motivation

Gene Clustering & Classification

Distriubted Hash Tables and Scalable Content Adressable Network (CAN)

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

BIG DATA AND CONSISTENCY. Amy Babay

Making Gnutella-like P2P Systems Scalable

SMART: Lightweight Distributed Social Map Based Routing in Delay Tolerant Networks

CS 514: Transport Protocols for Datacenters

COSC6376 Cloud Computing Homework 1 Tutorial

Exploiting Communities for Enhancing Lookup Performance in Structured P2P Systems

Using Geometrical Routing for Overlay Networking in MMOGs

COMP6237 Data Mining Making Recommendations. Jonathon Hare

Efficient Block Based Motion Estimation

Data Distribution in Large-Scale Distributed Systems

Lesson 9 Applications of DHT: Bittorrent Mainline DHT, the KAD network

Resource Allocation Strategies for Multiple Job Classes

WSN Routing Protocols

Sentiment analysis under temporal shift

Coriolis: Scalable VM Clustering in Clouds

Salford Systems Predictive Modeler Unsupervised Learning. Salford Systems

It also performs many parallelization operations like, data loading and query processing.

TRANSACTIONAL CLUSTERING. Anna Monreale University of Pisa

Transcription:

10 th IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY Bradford,UK, 29 June - 1 July, 2010 Ranieri Baraglia, Patrizio Dazzi, Matteo Mordacchini ISTI,CNR, Pisa,Italy Laura Ricci University of Pisa,Italy

PRESENTATION OUTLINE Recommender Systems: general characteristics Our Approach: PREGO Gossip Protocols: basic issues PREGO User profile definition Similarity metrics Architecture of the P2P overlays Recommendations spreading Experimental Results Conclusions

RECOMMENDER SYSTEMS: CHARACTERISTICS Recommender systems (recommendation systems): information filtering systems attempting to recommend information items that are likely to be of interest to the user compare the user profile to some reference characteristics to predict the 'rating' that a user would give to an item he does not know yet Content Based Approaches The user profile is compared with some characteristics of the item Collaborative Filtering (Social Information Filtering) Exploits the social environment of the user Automates the 'word-of-mouth' process: items are recommended to a user by other people with similar tastes

RECOMMENDER SYSTEMS: PREGO A P2P system exploiting the collaborative exchange of information between users in order to build a system able to cluster users with similar interests spread useful suggestions among them PREGO distinguishing features: builds a user profile exploiting content accessed from the user (purchase items, visited pages,...). Based on the fact that a user will show interest in the future for content accessed in the past exploits a set of gossip protocols to cluster users with similar interests defines a distributed recommandation service no centralized authority storing users profiles and ratings: preserve user privacy no centralized/controlled recommandations: a 'democratic' suggestion system

GOSSIP PROTOCOLS: CENTRALIZED APPROACHES Gossip (epidemic) protocols: originally introduced in 1987 for propagating updates in loosely replicated databases to maintain replica consistency The original gossip protocol: each node has a complete view of the nodes of the network periodically picks a random node N from the view and exchange some data with N Main characteristics of the approach information known by any one node of the network is spread to all other nodes with very high probability. requires a complete knowledge of the network at each node and this is infeasible for large scale, highly dynamic networks, like P2P systems.

A P2P RANDOM SAMPLING SERVICE A P2P gossip protocol requires the definition of a distributed service that returns to each peer a random sample of the nodes A distributed Random Samplice Service implemented through a gossip approach: each peer P knows a small, continuously changing set of other peers (its neighbours) defining its view periodically chooses a random node Q from its view P and Q exchange a set of neighbours randomly picked from their views. the continous mixing of neighbours generates random overlays The random sampling service is exploited by higher level gossiping protocols to epidemically spread information over the network.

A P2P RANDOM SAMPLING SERVICE when a peer P (ex: 2) choose a peer Q (ex: 9) to gossip, the neighbour relation between P and Q is reversed P sends its descriptor to Q, and deletes the descriptor of Q from its view

CYCLON: AN ENHANCED P2P SAMPLING SERVICE Cyclon [Voulgaris 2006]: A P2P random sampling service improving the quality of the overlay in terms of randomness View = set of node descriptors including the contact information of a peer a numeric age field: number of gossip cycles since the moment the descritor was created (descriptor age) Enhanced protocol: A peer P select the neighbour Q with the highest age sends its 'fresh descriptor' to Q, i.e. a descriptor of age 0 With respect to the basic gossip protocol, Cyclon prevents links to dead nodes from lingering indefinitively imposes a predictible lifetime to each descriptor this implies an even spreading of links across the overlay

OUR PROPOSAL Extract a profile for each peer according to the past interests of the user. The profile may be constructed by considering resources recently accessed items recently purchased pages visited Exploit a gossip protocol to define clusters of peers characterized by similar interests definition of a proper similarity metrics exploits the distributed sampling service defined by Cyclon Spread recommendations of new acquired content to proper cluster of peers

PREGO: THE OVERLAYS A peer may be characterized by a set of interests An overlay for each interest (sport, music,films..) within a single overlay clusters correspond to highly similar peers, for instance: an overlay for peers interested in films different clusters for different film genres (action, westerns,...)

THE DEFINITION OF THE USER PROFILE Let I be the set of items of the system and I p I the subset of items belonging to a peer p The profile π p is defined as follows: π p = { ( i, C( i ), R( i ) ) i I p } C( i ) is the content associated with the item i R( i ) is the rating given by p to i The set of items I p can be partitioned according to the set of interests of the peer p if the peer p has a set I p = {I p,..., 1 Ip } of interests, π k p (Ip ) is the set of j items associated with the interest I p j Different overlays associated to different interests

THE DEFINTION OF A SIMILARITY METRICS Jaccard similarity. Given a pair of peer p 1 and p 2 and a pair of interests

THE GOSSIP PROTOCOLS Each interest based P2P overlay is constructed through a gossip protocol A two-layered gossip framework PREGO PEER SAMPLING The Sampling Layer is responsible for feeding the PREGO-layer with nodes uniformely randomly selected from the whole overlay The PREGO Layer in charge of discovering peers characterized by similar interests

GOSSIP PROTOCOL: THE ACTIVE THREAD

GOSSIP PROTOCOL: THE PASSIVE THREAD

SPREADING RECOMMENDATIONS when an overlay is stabilized, a peer can consider its neighbours as the representatives of a community of 'friends' with similar interests when a peer p of the community becomes acquainted with a new item h p discovers the interest group IG corresponding to h p forwards a recommendation for h on the overlay corresponding to IG p exploits the similarity function to evaluate the interest of its neighbours for the item h recommendation is forwarded only iff the similarity between h and the neighbour is above a given threshold. for instance, a new film is recommended to a neighbour iff the intersection among the genres of the film and the profile of the neighbour is above a given threshold.

EXPERIMENTAL ENVIRONMENT A prototype implemented by the Overlay Weaver framework rapid prototyping high scalability experiments replication possibility of monitoring the status of the network Experiments exploit the Movie-lens Data Set A DataBase of movies used for user recommendations Includes 1 millions ratings for 3900 movies by 6040 users Each user rates a few movies a very sparse matrix of preferences a complex scenario to build communities User profile: movies seen by the corresponding user

GOSSIP PROTOCOLS COMPARISON Vicinity, TwinFinder two similar gossip algorithm based on interest similarity Reports average similarity (Jaccard) between a peer and its neighbourhood

GOSSIP PROTOCOLS COMPARISON Twinfinder obtains the smallest deviation, i.e. a more uniform overlay with respect to similarity

COVERAGE EVALUATION Goal of the test: measuring the ability of gossip protocols to provide recommendations to interested communities Coverage: percentage of film genres which may be recommended to a user by its neighbours

COVERAGE EVALUATION Coverage measure for different network sizes Best result for networks including high number of nodes

CONCLUSIONS Definition of a gossip algorithm for the clustering of peers characterized by similar interests Creation of communities characterized by similar interests Spreading of recommendations Future work: definition of a distributed algorithm to detect a profile characterizing a community of peers distributed voting algorithms to detect the 'most representative peer' of a community investigating the structured overlays (DHT) approach to register community profiles LSH (Locality Sensitive Hashing) to support the search of community with profiles similar to that of a given user