Recommendation Algorithms: Collaborative Filtering
CSE 6111 Presentation, Advanced Algorithms, Fall 2013
Presented by: Farzana Yasmeen, 2013.11.29
Contents
What are recommendation algorithms?
  Recommendations
  Recommender Systems
  Recommender Algorithms
Focus: Collaborative Filtering
  Nearest Neighbour
Conclusions
  Pros and Cons
  Discussion
Recommendations
What are recommendations?
  Alternative to search
  Relevant content: movies, online shopping, radio stations
Customer is looking for a product
  Receive tips
  Receive personal offerings
Recommendations: Benefits?
Recommender Systems
What are recommender systems? Systems that predict the opinion of users based on prior knowledge.
Recommender system model:
  1. Data collection and processing
  2. Relevance and preference ordering
  3. Display recommendations
  4. Self-learning and improving capabilities
Recommendation Algorithms (1)
The best-fitting algorithms are selected after careful analysis of the data, the given recommendation problem, and the corresponding optimization task.
  given (input) = recommendation problem
  (task) = corresponding optimization
  required (output) = a recommendation
Recommendation Algorithms (2)
Content-based Filtering (CBF):
  item triggered (user, item metadata)
  keyword matching
  key problem: learn and apply cross-content (e.g., decision trees)
Collaborative Filtering (CF):
  event triggered (VOD purchase, live channel watching)
  finds similarities between users and between items (VOD content, live schedule)
  key problem: how to combine and weight the preferences of user neighbors (e.g., nearest neighbor)
Hybrid Model:
  combines the watching and searching habits of similar users (collaborative filtering)
  with offering content that shares characteristics with items a user has rated highly (content-based filtering)
Content-based Filtering
User's Favorite Movies
A user: "Already saw it..." "Nothing special..."
"Well, because you saw most of the major horror movies, here are some minor horror movies."
Collaborative Filtering
Collaborative Filtering CF Algorithms
Recommender Space
Ratings or vote data = m x n sparse matrix (binary in the purchase case)
  n columns = products, e.g., books for purchase or movies for viewing
  m rows = users
Interpretation:
  Explicit ratings: v(i,j) = user i's rating of product j (e.g., on a scale of 1 to 5)
  Implicit ratings (purchases): v(i,j) = 1 if user i purchased product j
  entry = 0 if no purchase or rating
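The matrix described on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the user names, product names, and ratings are hypothetical toy data, and a dense list of lists stands in for what would normally be a sparse structure.

```python
# Hypothetical toy data: (user, product, rating) triples on a 1-to-5 scale.
ratings = [
    ("ann", "book_a", 5),
    ("ann", "book_c", 3),
    ("bob", "book_a", 4),
    ("bob", "book_b", 1),
    ("cat", "book_c", 2),
]

users = sorted({u for u, _, _ in ratings})      # m rows
products = sorted({p for _, p, _ in ratings})   # n columns


def build_matrix(triples, users, products):
    """Return an m x n list of lists; 0 means 'no purchase or rating'."""
    row = {u: i for i, u in enumerate(users)}
    col = {p: j for j, p in enumerate(products)}
    v = [[0] * len(products) for _ in users]
    for u, p, r in triples:
        v[row[u]][col[p]] = r
    return v


def to_purchases(v):
    """Collapse ratings into the binary view: 1 if an entry exists, else 0."""
    return [[1 if x else 0 for x in r] for r in v]


v = build_matrix(ratings, users, products)
purchases = to_purchases(v)

# Fraction of empty cells; real recommender data is typically far sparser.
sparsity = sum(x == 0 for r in v for x in r) / (len(users) * len(products))
```

Here `v` plays the role of the explicit-ratings matrix and `purchases` the implicit, binary one.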
The Recommender Space as a Bipartite Graph
Nodes: Users, Items
User-Item links: observed preferences (ratings, purchases, page views, play lists, bookmarks, etc.)
User-User links: derived from similar attributes, explicit connections
Item-Item links: derived from similar attributes, similar content, explicit cross references
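Near-Neighbor Algorithms for Collaborative Filtering
$r_{i,k}$ = rating of user $i$ on item $k$
$I_i$ = items for which user $i$ has generated a rating
Mean rating for user $i$ is $\bar{r}_i = \frac{1}{|I_i|} \sum_{k \in I_i} r_{i,k}$
Predicted vote for user $i$ on item $j$ is a weighted sum: $p_{i,j} = \bar{r}_i + \kappa \sum_{u=1}^{K} w(i,u) \, (r_{u,j} - \bar{r}_u)$
  $\kappa$ = normalization constant (e.g., the reciprocal of the total sum of weights)
  $w(i,u)$ = weights of the $K$ most similar users
Value of $K$ can be optimized on a validation data set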
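A minimal sketch of the weighted-sum prediction, assuming the common mean-centered form: the user's own mean rating plus a normalized, weighted sum of the neighbours' deviations from their means. The function names, the toy ratings, and the weight values are hypothetical, and the weights are passed in rather than computed so that any similarity measure can be plugged in.

```python
def mean_rating(user_ratings):
    """Mean of the ratings one user has given (dict: item -> rating)."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(ratings, weights, i, j, k=2):
    """Predict user i's vote on item j from the K highest-weighted neighbours.

    ratings: dict user -> {item: rating}
    weights: dict neighbour -> similarity weight w(i, u), supplied by the caller
    """
    base = mean_rating(ratings[i])
    # Only neighbours who actually rated item j can contribute.
    candidates = [(w, u) for u, w in weights.items()
                  if u != i and j in ratings.get(u, {})]
    # Keep the K neighbours with the largest absolute weight.
    neighbours = sorted(candidates, key=lambda t: abs(t[0]), reverse=True)[:k]
    norm = sum(abs(w) for w, _ in neighbours)  # normalization constant
    if norm == 0:
        return base  # no usable neighbours: fall back to the user's mean
    offset = sum(w * (ratings[u][j] - mean_rating(ratings[u]))
                 for w, u in neighbours)
    return base + offset / norm


# Hypothetical toy data and weights.
toy = {
    "ann": {"a": 4, "b": 2},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 2, "b": 4, "c": 1},
}
w_ann = {"bob": 1.0, "cat": -1.0}
p = predict(toy, w_ann, "ann", "c", k=2)
```

Normalizing by the sum of absolute weights is one possible choice of normalization constant; it keeps the prediction well behaved when some weights are negative.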
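K-nearest neighbor: Near-Neighbor Weighting
Pearson correlation coefficient (Resnick '94, GroupLens):
$w(i,u) = \frac{\sum_{k \in I_i \cap I_u} (r_{i,k} - \bar{r}_i)(r_{u,k} - \bar{r}_u)}{\sqrt{\sum_{k \in I_i \cap I_u} (r_{i,k} - \bar{r}_i)^2 \; \sum_{k \in I_i \cap I_u} (r_{u,k} - \bar{r}_u)^2}}$
Sums are over items rated by both users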
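As an illustration, the Pearson weight between two users can be sketched as below. Assumptions: each user is a dict from item to rating, the sums run only over items both users rated, and each user's mean is taken over all of that user's ratings (some implementations use the mean over co-rated items instead). The function name and data are hypothetical.

```python
from math import sqrt


def pearson_weight(r_i, r_u):
    """Pearson-style weight between two users, summed over co-rated items."""
    common = [k for k in r_i if k in r_u]
    if not common:
        return 0.0  # no overlap: treat the users as unrelated
    mean_i = sum(r_i.values()) / len(r_i)
    mean_u = sum(r_u.values()) / len(r_u)
    num = sum((r_i[k] - mean_i) * (r_u[k] - mean_u) for k in common)
    den = sqrt(sum((r_i[k] - mean_i) ** 2 for k in common)
               * sum((r_u[k] - mean_u) ** 2 for k in common))
    # A zero denominator means one user's co-rated ratings all equal their mean.
    return num / den if den else 0.0


# Hypothetical toy users.
ann = {"a": 1, "b": 2, "c": 3}
bob = {"a": 2, "b": 4, "c": 6}   # same taste, different scale
cat = {"a": 3, "b": 2, "c": 1}   # opposite taste
dan = {"d": 5}                   # nothing in common with ann
```

Note that `bob` rates everything twice as high as `ann` yet still gets the maximum weight, which is the property that makes correlation attractive when users use the scale differently.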
How do I find someone similar?
Manhattan distance: $d(x, y) = \sum_{k=1}^{n} |x_k - y_k|$
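How do I find someone similar? (N dimensions)
Euclidean distance: $d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$
Minkowski distance: $d(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p}$, where $p = 1$ gives Manhattan and $p = 2$ gives Euclidean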
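A small sketch of the Minkowski distance between two users, with Manhattan and Euclidean as the special cases p = 1 and p = 2. Restricting the sum to items both users have rated, and returning `None` when there is no overlap, are assumptions made for this example; the names and ratings are hypothetical.

```python
def minkowski(r_a, r_b, p):
    """Minkowski distance of order p over the items both users rated."""
    common = [k for k in r_a if k in r_b]
    if not common:
        return None  # no shared items: distance is undefined in this sketch
    return sum(abs(r_a[k] - r_b[k]) ** p for k in common) ** (1 / p)


def nearest(target, others, p=2):
    """Return (distance, name) pairs sorted from most to least similar."""
    scored = []
    for name, r in others.items():
        d = minkowski(target, r, p)
        if d is not None:
            scored.append((d, name))
    return sorted(scored)


# Hypothetical toy ratings.
me = {"x": 1, "y": 5, "z": 2}
others = {
    "far": {"x": 4, "y": 1, "w": 3},   # differs by 3 and 4 on x and y
    "near": {"x": 1, "y": 4, "z": 2},  # differs by 1 on y only
    "none": {"w": 5},                  # no items in common
}
ranked = nearest(me, others, p=2)
```

Smaller distances mean more similar users, so the first entry of `ranked` is the nearest neighbour.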
Problem 1: entry matching
Problem 2: blame the users
Pearson's Correlation Coefficient
Pearson's Correlation Coefficient (ranges from 1 to -1)
1 = perfect agreement, -1 = perfect disagreement
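Pearson's Correlation Coefficient (1 to -1)
Approximation (single pass over the $n$ paired ratings $x$, $y$):
$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}} \; \sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}}$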
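The single-pass form of Pearson's coefficient can be sketched as follows. It accumulates five running sums over the paired ratings and never needs the means in advance. The function name and the sample lists are hypothetical, the inputs are assumed to be equal-length lists of ratings for the same items, and the clamp to zero before the square root is an added guard against floating-point round-off.

```python
from math import sqrt


def pearson_single_pass(x, y):
    """Pearson correlation of paired lists x, y using running sums only."""
    n = len(x)
    if n == 0:
        return 0.0
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0.0
    for a, b in zip(x, y):
        sum_x += a
        sum_y += b
        sum_xy += a * b
        sum_x2 += a * a
        sum_y2 += b * b
    num = sum_xy - sum_x * sum_y / n
    # max(..., 0.0) guards against tiny negative values from round-off.
    var_x = max(sum_x2 - sum_x ** 2 / n, 0.0)
    var_y = max(sum_y2 - sum_y ** 2 / n, 0.0)
    den = sqrt(var_x) * sqrt(var_y)
    return num / den if den else 0.0


# Hypothetical paired ratings for the same four items.
same_taste = pearson_single_pass([1, 2, 3, 4], [2, 4, 6, 8])
opposite = pearson_single_pass([1, 2, 3, 4], [8, 6, 4, 2])
```

The trade-off is numerical: subtracting two large, nearly equal sums can lose precision, which is one reason this form is treated as an approximation.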
What to Use?
Comments on Neighbor-based Methods
Here we emphasized user-user similarity
  Can also do this with item-item similarity, i.e., find similar items (across users) to the item we need a rating for
Simple and intuitive
  Easy to provide the user with explanations of recommendations
Computational issues
  In theory we need to calculate all pairwise user-user weights, on the order of $m^2$ for $m$ users
  So scalability is an issue (e.g., real-time)
Cold start and data sparsity
For recent advances in neighbor-based approaches see Y. Koren, "Factor in the neighbors: scalable and accurate collaborative filtering", ACM Transactions on Knowledge Discovery from Data, 2010
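The item-item variant mentioned on this slide can be sketched by inverting the user-to-item ratings and comparing items across the users who rated both. This is a hedged illustration only: the function names and toy data are hypothetical, and centering each item on its mean over the shared users is an assumption of this example.

```python
from math import sqrt


def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    return by_item


def item_similarity(a, b):
    """Pearson correlation of two items over users who rated both."""
    common = [u for u in a if u in b]
    if len(common) < 2:
        return 0.0  # too little overlap to say anything
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    num = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    den = sqrt(sum((a[u] - mean_a) ** 2 for u in common)
               * sum((b[u] - mean_b) ** 2 for u in common))
    return num / den if den else 0.0


def most_similar_items(ratings, target):
    """Rank all other items by similarity to the target item."""
    by_item = invert(ratings)
    scores = [(item_similarity(by_item[target], v), item)
              for item, v in by_item.items() if item != target]
    return sorted(scores, reverse=True)


# Hypothetical toy data.
toy_ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 3, "b": 2, "c": 4},
    "cat": {"a": 1, "b": 1, "c": 5},
}
ranked_items = most_similar_items(toy_ratings, "a")
```

Because item-item similarities tend to change more slowly than user tastes, they are often precomputed offline, which eases the scalability concern raised above.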
Technology thrives simply because we create the supply as opposed to the demand first
References
Corporate presentations of random company executives
Lots of Googling and YouTube
Wikipedia
Programmer's Guide to Data Mining