CompSci 1: Overview CS0. Collaborative Filtering: A Tutorial. Collaborative Filtering. Everyday Examples of Collaborative Filtering...

Similar documents
Collaborative Filtering. William W. Cohen Center for Automated Learning and Discovery Carnegie Mellon University

Overview of this week

Recommender Systems. Collaborative Filtering & Content-Based Recommending

Always start off with a joke, to lighten the mood. Workflows 3: Graphs, PageRank, Loops, Spark

Recommendation Algorithms: Collaborative Filtering. CSE 6111 Presentation Advanced Algorithms Fall Presented by: Farzana Yasmeen

Introduction to Data Mining

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

How to organize the Web?

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Distributed Itembased Collaborative Filtering with Apache Mahout. Sebastian Schelter twitter.com/sscdotopen. 7.

Part 11: Collaborative Filtering. Francesco Ricci

COMP6237 Data Mining Making Recommendations. Jonathon Hare

Web Personalization & Recommender Systems

BBS654 Data Mining. Pinar Duygulu

Graphs (Part II) Shannon Quinn

Part 11: Collaborative Filtering. Francesco Ricci

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Jeff Howbert Introduction to Machine Learning Winter

Data Mining Lecture 2: Recommender Systems

Recommender Systems - Introduction. Data Mining Lecture 2: Recommender Systems

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

Web Personalization & Recommender Systems

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS 124/LINGUIST 180 From Languages to Information

CS249: ADVANCED DATA MINING

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Mining Web Data. Lijun Zhang

CC PROCESAMIENTO MASIVO DE DATOS OTOÑO Lecture 7: Information Retrieval II. Aidan Hogan

Mining Web Data. Lijun Zhang

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information

Big Data Analytics CSCI 4030

Author(s): Rahul Sami, 2009

Data Mining. Lecture 03: Nearest Neighbor Learning

Based on Raymond J. Mooney s slides

VECTOR SPACE CLASSIFICATION

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

Introduction. Chapter Background Recommender systems Collaborative based filtering

Performance Comparison of Algorithms for Movie Rating Estimation

CISC 4631 Data Mining

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Recommender Systems: User Experience and System Issues

Knowledge Discovery and Data Mining 1 (VO) ( )

Collaborative Filtering using a Spreading Activation Approach

A Recursive Prediction Algorithm for Collaborative Filtering Recommender Systems

Part 1: Link Analysis & Page Rank

Automatic Domain Partitioning for Multi-Domain Learning

Collaborative Filtering and Recommender Systems. Definitions. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

Hybrid Recommendation System Using Clustering and Collaborative Filtering

Information Networks: PageRank

second_language research_teaching sla vivian_cook language_department idl

A Constrained Spreading Activation Approach to Collaborative Filtering

CS570: Introduction to Data Mining

Recommender Systems. Techniques of AI

Influence in Ratings-Based Recommender Systems: An Algorithm-Independent Approach

Ranked Retrieval. Evaluation in IR. One option is to average the precision scores at discrete. points on the ROC curve But which points?

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Recap: Project and Practicum CS276B. Recommendation Systems. Plan for Today. Sample Applications. What do RSs achieve? Given a set of users and items

CSE 158, Fall 2017: Midterm

Recommender Systems (RSs)

Recommendation Systems

A Time-based Recommender System using Implicit Feedback

Recommender Systems 6CCS3WSN-7CCSMWAL

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018

Semi-Supervised Clustering with Partial Background Information

Privacy-Preserving Collaborative Filtering using Randomized Perturbation Techniques

Network-based recommendation: Using graph structure in user-product rating networks to generate product recommendations

amount of available information and the number of visitors to Web sites in recent years

A comprehensive survey of neighborhood-based recommendation methods

Automatic Summarization

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Node Importance and Neighborhoods

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Neighborhood-Based Collaborative Filtering

Big Data Analytics CSCI 4030

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

EigenRank: A Ranking-Oriented Approach to Collaborative Filtering

Semi-Supervised Learning: Lecture Notes

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Author(s): Rahul Sami, 2009

Community-Based Recommendations: a Solution to the Cold Start Problem

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

Information Retrieval and Web Search

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Youtube Graph Network Model and Analysis Yonghyun Ro, Han Lee, Dennis Won

Collaborative Filtering: A Comparison of Graph-Based Semi-Supervised Learning Methods and Memory-Based Methods

CS371R: Final Exam Dec. 18, 2017

Information Retrieval Spring Web retrieval

Reducing the Cold-Start Problem by Explicit Information with Mathematical Set Theory in Recommendation Systems Haider Khalid 1, Shengli Wu 2

Web Search. Lecture Objectives. Text Technologies for Data Science INFR Learn about: 11/14/2017. Instructor: Walid Magdy

Scope of Recommenders. Recommender Systems: User Experience and System Issues. Wide Range of Algorithms. About me. Classic Collaborative Filtering

Part I: Data Mining Foundations

Personalized Recommendations using Knowledge Graphs. Rose Catherine Kanjirathinkal & Prof. William Cohen Carnegie Mellon University

SOFIA: Social Filtering for Niche Markets

Chapter 27 Introduction to Information Retrieval and Web Search

Refining searches. Refine initially: query. Refining after search. Explicit user feedback. Explicit user feedback

Transcription:

CompSci 1: Overview CS0 Collaborative Filtering: A Tutorial! Audioscrobbler and last.fm! Collaborative filtering! What is a neighbor?! What is the network? Drawn from tutorial by William W. Cohen Center for Automated Learning and Discovery Carnegie Mellon University HarambeNet Workshop 2006 1 HarambeNet Workshop 2006 2 Collaborative Filtering Everyday Examples of Collaborative Filtering! Goal: predict the utility of an item to a particular user based on a database of user profiles! User profiles contain user preference information! Preference may be explicit or implicit Explicit means that a user votes explicitly on some scale Implicit means that the system interprets user behavior or selections to impute a vote! Problems! Missing data: voting is neither complete nor uniform! Preferences may change over time! Interface issues HarambeNet Workshop 2006 3 HarambeNet Workshop 2006 4

Everyday Examples of Collaborative Filtering Rate it? The Dark Star's crew is on a 20-year mission..but unlike Star Trek the nerves of this crew are frayed to the point of psychosis. Their captain has been killed by a radiation leak that also destroyed their toilet paper. "Don't give me any of that 'Intelligent Life' stuff," says Commander Doolittle when presented with the possibility of alien life. "Find me something I can blow up. HarambeNet Workshop 2006 5 HarambeNet Workshop 2006 6 Everyday Examples of Collaborative Filtering HarambeNet Workshop 2006 7 HarambeNet Workshop 2006 8

Google s PageRank a b c HarambeNet Workshop 2006 9 HarambeNet Workshop 2006 10 a b c pdq pdq.. Inlinks are good (recommendations) Inlinks from a good are better than inlinks from a bad but inlinks from s with many outlinks are not as good Good and bad are relative. Google s PageRank Google s PageRank (Brin & Page, http://www-db.stanford.edu/~backrub/google.html) Imagine a pagehopper that always either Imagine a pagehopper that always either a b c follows a random link, or jumps to random page a b c follows a random link, or jumps to random page a b c pdq pdq.. a b c pdq pdq.. PageRank ranks pages by the amount of time the pagehopper spends on a page: or, if there were many pagehoppers, PageRank is the expected crowd size HarambeNet Workshop 2006 11 HarambeNet Workshop 2006 12

Everyday Examples of Collaborative Filtering BellCore s MovieRecommender! Bestseller lists! Top 40 music lists! The recent returns shelf at the library! Unmarked but well-used paths thru the woods! The printer room at work! Many logs! Read any good books lately?!.! Common insight: personal tastes are correlated:! If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y! especially (perhaps) if Bob knows Alice! Participants sent email to videos@bellcore.com! System replied with a list of 500 movies to rate New participant P sends in rated movies via email! System compares ratings for P to ratings of (a random sample of) previous users! Most similar users are used to predict scores for unrated movies! Empirical Analysis of Predictive Algorithms for Collaborative Filtering Breese, Heckerman, Kadie, UAI98! System returns recommendations in an email message. HarambeNet Workshop 2006 13 HarambeNet Workshop 2006 14 BellCore s MovieRecommender MovieRecommender Goals! Recommending And Evaluating Choices In A Virtual Community Of Use. Will Hill, Larry Stead, Mark Rosenstein and George Furnas, Bellcore; CHI 1995 By virtual community we mean "a group of people who share characteristics and interact in essence or effect only". In other words, people in a Virtual Community influence each other as though they interacted but they do not interact. Thus we ask: "Is it possible to arrange for people to share some of the personalized informational benefits of community involvement without the associated communications costs?" HarambeNet Workshop 2006 15 Recommendations should:! simultaneously ease and encourage rather than replace social processes.should make it easy to participate while leaving in hooks for people to pursue more personal relationships if they wish.! be for sets of people not just individualsmulti-person recommending is often important, for example, when two or more people want to choose a video to watch together.! be from people not a black box machine or so-called "agent".! tell how much confidence to place in them, in other words they should include indications of how accurate they are. HarambeNet Workshop 2006 16

BellCore s MovieRecommender! Participants sent email to videos@bellcore.com! System replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular)! Only subset need to be rated! New participant P sends in rated movies via email! System compares ratings for P to ratings of (a random sample of) previous users! Most similar users are used to predict scores for unrated movies (more later)! System returns recommendations in an email message. Suggested Videos for: John A. Jamus. Your must-see list with predicted ratings: 7.0 "Alien (1979)" 6.5 "Blade Runner" 6.2 "Close Encounters Of The Third Kind (1977)" Your video categories with average ratings: 6.7 "Action/Adventure" 6.5 "Science Fiction/Fantasy" 6.3 "Children/Family" 6.0 "Mystery/Suspense" 5.9 "Comedy" 5.8 "Drama" HarambeNet Workshop 2006 17 HarambeNet Workshop 2006 18 The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer: 0.59 viewer-130 (unlisted@merl.com) 0.55 bullert,jane r (bullert@cc.bellcore.com) 0.51 jan_arst (jan_arst@khdld.decnet.philips.nl) 0.46 Ken Cross (moose@denali.ee.cornell.edu) 0.42 rskt (rskt@cc.bellcore.com) 0.41 kkgg (kkgg@athena.mit.edu) 0.41 bnn (bnn@cc.bellcore.com) By category, their joint ratings recommend: Action/Adventure: "Excalibur" 8.0, 4 viewers "Apocalypse Now" 7.2, 4 viewers "Platoon" 8.3, 3 viewers Science Fiction/Fantasy: "Total Recall" 7.2, 5 viewers Children/Family: "Wizard Of Oz, The" 8.5, 4 viewers "Mary Poppins" 7.7, 3 viewers Mystery/Suspense: "Silence Of The Lambs, The" 9.3, 3 viewers Comedy: "National Lampoon's Animal House" 7.5, 4 viewers "Driving Miss Daisy" 7.5, 4 viewers "Hannah and Her Sisters" 8.0, 3 viewers Drama: "It's A Wonderful Life" 8.0, 5 viewers "Dead Poets Society" 7.0, 5 viewers "Rain Man" 7.5, 4 viewers Correlation of predicted ratings with your actual ratings is: 0.64 This number measures ability to evaluate movies accurately for you. 0.15 means low ability. 0.85 means very good ability. 0.50 means fair ability. HarambeNet Workshop 2006 19 BellCore s MovieRecommender! Evaluation:! Withhold 10% of the ratings of each user to use as a test set! Measure correlation between predicted ratings and actual ratings for test-set movie/user pairs HarambeNet Workshop 2006 20

Another key observation: rated movies tend to have positive ratings: i.e., people rate what they watch, and watch what they like Question: Can observation replace explicit rating? HarambeNet Workshop 2006 21 HarambeNet Workshop 2006 22 HarambeNet Workshop 2006 23 HarambeNet Workshop 2006 24

HarambeNet Workshop 2006 25 HarambeNet Workshop 2006 26 HarambeNet Workshop 2006 27 HarambeNet Workshop 2006 28

TrustMail (back) HarambeNet Workshop 2006 29 HarambeNet Workshop 2006 30 Email Filtering! Boykin and Roychowdhury (2004) use social networks derived from email folders to classify messages as spam or not spam! 50% of messages can be classified How does CF actually work? HarambeNet Workshop 2006 31 HarambeNet Workshop 2006 32

Memory-Based Methods (Breese et al, UAI98)! v i,j = vote of user i on item j! I i = items for which user i has voted! Mean vote for i is Computing the weight! K-nearest neighbor # 1 if i $ neighbors( a) w( a, i) = "! 0 else! Pearson correlation coefficient (Resnick 94, Grouplens):! Predicted vote for active user a is weighted sum! Cosine distance (from IR) normalizer weights of n similar users HarambeNet Workshop 2006 33 HarambeNet Workshop 2006 34 Calculating the weight Visualizing Cosine Distance! Cosine with inverse user frequency f i = log(n/n j ), where n is number of users, n j is number of users voting for item j v( a, j) similarity of doc a to doc b = sim( a, b) = "! 2 word i r " v ( a, j') Let A =<, v( a, j), > j' r r r A A Let A' = r = 2 A! v ( a, j') word 1 j' word 2 v( b, j) " j' 2 v ( b, j') = A'! B' doc c doc d doc a word j doc b HarambeNet Workshop 2006 35 word n HarambeNet Workshop 2006 36

Visualizing Cosine Distance distance from user a to user i = Visualizing Cosine Distance Approximating Matrix Multiplication for Pattern Recognition Tasks, Cohen & Lewis, SODA 97 explores connection between cosine distance/inner product and random walks item 1 item 2 item 1 item 2 user a item j user i user a item j user i Suppose user-item links were probabilities of following a link item n Then w(a,i) is probability of a and i meeting Suppose user-item links were probabilities of following a link item n Then w(a,i) is probability of a and i meeting HarambeNet Workshop 2006 37 HarambeNet Workshop 2006 38 Model-based methods! Really what we want is the expected value of the user s vote! Cluster Models Users belong to certain classes in C with common tastes Naive Bayes Formulation Calculate Pr(v i C=c) from training set! Bayesian Network Models Other issues, not addressed much! Combining and weighting different types of information sources! How much is a page link worth vs a link in a newsgroup?! Spamming how to prevent vendors from biasing results?! Efficiency issues how to handle a large community?! What do we measure when we evaluate CF?! Predicting actual rating may be useless!! Example: music recommendations: Beatles, Eric Clapton, Stones, Elton John, Led Zep, the Who,! What s useful and new? for this need model of user s prior knowledge, not just his tastes. Subjectively better recs result from poor distance metrics HarambeNet Workshop 2006 39 HarambeNet Workshop 2006 40