Collaborative Filtering and Recommender Systems: Definitions
Spring 2009, CSC 466: Knowledge Discovery from Data
Alexander Dekhtyar

Similar documents
Recommender Systems New Approaches with Netflix Dataset

Introduction to Data Mining

Movie Recommender System - Hybrid Filtering Approach

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

Recommender Systems (RSs)

Overview. Lab 5: Collaborative Filtering and Recommender Systems. Assignment Preparation. Data

Jeff Howbert Introduction to Machine Learning Winter

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Using Social Networks to Improve Movie Rating Predictions

Mining Web Data. Lijun Zhang

Document Clustering: Comparison of Similarity Measures

Part 11: Collaborative Filtering. Francesco Ricci

Research Article Novel Neighbor Selection Method to Improve Data Sparsity Problem in Collaborative Filtering

Mining Web Data. Lijun Zhang

Collaborative Filtering using a Spreading Activation Approach

Refining searches. Refine initially: query. Refining after search. Explicit user feedback. Explicit user feedback

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Neighborhood-Based Collaborative Filtering

CS7267 MACHINE LEARNING NEAREST NEIGHBOR ALGORITHM. Mingon Kang, PhD Computer Science, Kennesaw State University

A Time-based Recommender System using Implicit Feedback

Part 11: Collaborative Filtering. Francesco Ricci

Collaborative Filtering Applied to Educational Data Mining

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Collaborative Filtering using Euclidean Distance in Recommendation Engine

The Principle and Improvement of the Algorithm of Matrix Factorization Model based on ALS

Topic 1 Classification Alternatives

Recommendation Algorithms: Collaborative Filtering. CSE 6111 Presentation Advanced Algorithms Fall Presented by: Farzana Yasmeen

Recommendation Systems

A comprehensive survey of neighborhood-based recommendation methods

A probabilistic model to resolve diversity-accuracy challenge of recommendation systems

A comprehensive survey of neighborhood-based recommendation methods

The k-means Algorithm and Genetic Algorithm

Recommender Systems - Content, Collaborative, Hybrid

Available online at ScienceDirect. Procedia Technology 17 (2014 )

Link Prediction for Social Network

Knowledge Discovery and Data Mining 1 (VO) ( )

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods

7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech

Distributed Itembased Collaborative Filtering with Apache Mahout. Sebastian Schelter twitter.com/sscdotopen. 7.

Study and Implementation of CHAMELEON algorithm for Gene Clustering

Web Personalization & Recommender Systems

Recommender Systems - Introduction. Data Mining Lecture 2: Recommender Systems

Data Mining Lecture 2: Recommender Systems

Towards a hybrid approach to Netflix Challenge

A PERSONALIZED RECOMMENDER SYSTEM FOR TELECOM PRODUCTS AND SERVICES

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

Real-time Collaborative Filtering Recommender Systems

SOCIAL MEDIA MINING. Data Mining Essentials

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering

Supervised vs unsupervised clustering

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

COMP6237 Data Mining Making Recommendations. Jonathon Hare

CS570: Introduction to Data Mining

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Fast Contextual Preference Scoring of Database Tuples

CS5670: Computer Vision

Gene Clustering & Classification

Performance Comparison of Algorithms for Movie Rating Estimation

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Review on Techniques of Collaborative Tagging

Inverted Index for Fast Nearest Neighbour

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

NLMF: NonLinear Matrix Factorization Methods for Top-N Recommender Systems

Naïve Bayes for text classification

Query Answering Using Inverted Indexes

Machine Problem 8 - Mean Field Inference on Boltzman Machine

Personalize Movie Recommendation System CS 229 Project Final Writeup

Recommendation System for Netflix

Predicting Gene Function and Localization

Based on Raymond J. Mooney s slides

Supervised Learning Classification Algorithms Comparison

Clustering and Recommending Services based on ClubCF approach for Big Data Application

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Web Personalization & Recommender Systems

Statistics Class 25. May 14 th 2012

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Using PageRank in Feature Selection

BBS654 Data Mining. Pinar Duygulu

Contents. Preface to the Second Edition

Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset

Introduction. Chapter Background Recommender systems Collaborative based filtering

Multi-domain Predictive AI. Correlated Cross-Occurrence with Apache Mahout and GPUs

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

Problem 1: Complexity of Update Rules for Logistic Regression

highest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate

CS249: ADVANCED DATA MINING

Instance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.

CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs

Data Preprocessing. Supervised Learning

IMPROVING SCALABILITY ISSUES USING GIM IN COLLABORATIVE FILTERING BASED ON TAGGING

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Proposing a New Metric for Collaborative Filtering

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS570: Introduction to Data Mining


Definitions

Recommendation generation problem. Given a set of users and their (incomplete) preferences over a set of items, find, for each user, new items for which that user would have high preferences.

Users. C = {c_1, ..., c_M}: a set of users.

Items. S = {s_1, ..., s_n}: a set of items.

Utility. u : C × S → R. u(c, s) is the utility (also called the rating or preference) of user c for item s. Typically, the utility function is incomplete.

The utility function u(c, s) can also be viewed as a utility matrix u[., .], where u[i, j] = u(c_i, s_j). The matrix u[] is typically sparse.

Problem. The main problem solved by collaborative filtering methods/recommender systems can be phrased in a number of ways:

User-based recommendations. Given a user c, find items s_1, ..., s_k (such that each u(c, s_i) is undefined) for which c is predicted to have the highest utility.

Individual ratings. Given a user c and an item s, predict u(c, s).

Item-based recommendations. Given an item s, find users c_1, ..., c_k (such that each u(c_i, s) is undefined) for whom u(c_i, s) is predicted to be the highest.
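As a concrete illustration of these definitions, a sparse utility matrix can be stored with NaN marking undefined entries. The users, items, and ratings below are made up for the example, not taken from the notes:

```python
import numpy as np

# Toy utility matrix u[i, j] = u(c_i, s_j); np.nan marks undefined ratings.
users = ["c1", "c2", "c3"]
items = ["s1", "s2", "s3", "s4"]
u = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [np.nan, 1.0, 5.0, 4.0],
])

# User-based recommendation: for user c1 the candidate items are those
# with an undefined utility -- here only s3.
candidates = [items[j] for j in range(len(items)) if np.isnan(u[0, j])]
print(candidates)  # ['s3']
```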

Recommender Systems

Recommender System: a system which, given C, S, and a partial utility function u, solves one or more of the recommendation generation problems.

Content-based recommendation systems: recommend items similar to the ones the user preferred in the past.

Collaborative recommendation systems: recommend items that other users with similar preferences find to be of high utility.

Hybrid recommendation systems: combine content-based and collaborative recommendations.

Content-based recommendation systems use methodology similar to that used in Information Retrieval. These methods will be covered separately.

Collaborative Filtering in Recommender Systems

Idea. Given c ∈ C and s ∈ S, estimate u(c, s) based on the known utilities u(c', s) of item s for other users c' ∈ C' ⊆ C.

Types. There are two types of collaborative filtering approaches:

1. Memory-based methods. These methods use different heuristics to construct utility predictions.

2. Model-based methods. These methods use the utility function u to learn a model of a specific type; the model is then used to generate predictions.

Memory-based Collaborative Filtering

Aggregation of utilities. Memory-based collaborative filtering methods aggregate the known utilities u(c_i, s) of item s to predict the utility (rating) of s for user c (user-based aggregation):

    u(c, s) = aggregate_{c_i ∈ C} u(c_i, s).

A similar aggregation exists for items (item-based aggregation):

    u(c, s) = aggregate_{s_i ∈ S} u(c, s_i).

Notation. Let s ∈ S be some item. By C_s we denote the set

    C_s = {c ∈ C | u(c, s) is defined},

i.e., the set of all users who have an existing rating (utility) for item s.
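The set C_s (and the analogous per-user set S_c, defined next) can be read directly off the columns and rows of the utility matrix. A minimal sketch with made-up ratings:

```python
import numpy as np

# Toy ratings matrix (made up for illustration); np.nan marks undefined entries.
u = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 1.0, 5.0],
])

def C_s(u, j):
    """Indices of users with a defined rating for item s_j (column j)."""
    return np.where(~np.isnan(u[:, j]))[0]

def S_c(u, i):
    """Indices of items rated by user c_i (row i)."""
    return np.where(~np.isnan(u[i, :]))[0]

print(C_s(u, 1))  # [0 2]: users c_1 and c_3 rated item s_2
print(S_c(u, 0))  # [0 1]: user c_1 rated items s_1 and s_2
```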

Similarly, for a user c ∈ C,

    S_c = {s ∈ S | u(c, s) is defined}.

Notation. Let c ∈ C and let S = {s_1, ..., s_n}. By u[c] we denote the (sparse) vector

    u[c] = (u(c, s_1), u(c, s_2), ..., u(c, s_n)).

Note. In the computations below, whenever a value u(c, s) is not defined in the dataset, we assume its value is 0.

Mean utility. The simplest collaborative filtering predictor is the mean utility:

    u(c, s) = (1 / |C_s|) Σ_{c_i ∈ C_s} u(c_i, s).

This is a very simplistic prediction, as it ignores the information about the current user's preferences that is available to us.

Weighted sum. This predictor is one of the most commonly used. It assumes the existence of a similarity function sim(., .) which reports the proximity between the utility vectors of two users:

    u(c, s) = k · Σ_{c' ≠ c} sim(u[c], u[c']) · u(c', s),

where k, the normalization factor, is typically set to k = 1 / Σ_{c' ≠ c} |sim(u[c], u[c'])|, giving

    u(c, s) = ( Σ_{c' ≠ c} sim(u[c], u[c']) · u(c', s) ) / ( Σ_{c' ≠ c} |sim(u[c], u[c'])| ).

The weighted sum predictor has one weakness: it is insensitive to the fact that different users employ the rating/utility scale differently when reporting their preferences.

Adjusted weighted sum. In predicting the utilities for a specific user, we take into account the user's approach to rating the items. First, we compute the user's average rating ū_c:

    ū_c = (1 / |S_c|) Σ_{s' ∈ S_c} u(c, s').

We then predict u(c, s) for some item s ∈ S as follows:

    u(c, s) = ū_c + k · Σ_{c' ≠ c} sim(u[c], u[c']) · (u(c', s) − ū_{c'}).

Here, k is the same normalizing factor as above.
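The three predictors above can be sketched as follows. This is illustrative code, not from the notes: cosine similarity stands in for sim(., .), undefined ratings are treated as 0 inside the similarity computation (per the notes' convention), and only users with a defined rating for the target item contribute to the sums:

```python
import numpy as np

u = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 1.0, 5.0],
])

def cosine(x, y):
    """Cosine similarity; undefined (NaN) ratings are treated as 0."""
    x, y = np.nan_to_num(x), np.nan_to_num(y)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def mean_utility(u, j):
    """Mean-utility predictor: average of all defined ratings for item j."""
    return np.nanmean(u[:, j])

def weighted_sum(u, i, j, sim):
    """Weighted-sum predictor; k = 1 / sum of |sim| over contributing users."""
    num = den = 0.0
    for i2 in range(u.shape[0]):
        if i2 == i or np.isnan(u[i2, j]):
            continue
        s = sim(u[i], u[i2])
        num += s * u[i2, j]
        den += abs(s)
    return num / den if den else np.nan

def adjusted_weighted_sum(u, i, j, sim):
    """Adjusted weighted sum: aggregate deviations from each user's mean rating."""
    num = den = 0.0
    for i2 in range(u.shape[0]):
        if i2 == i or np.isnan(u[i2, j]):
            continue
        s = sim(u[i], u[i2])
        num += s * (u[i2, j] - np.nanmean(u[i2]))
        den += abs(s)
    return np.nanmean(u[i]) + (num / den if den else 0.0)

print(mean_utility(u, 2))                                 # 3.5
print(round(weighted_sum(u, 0, 2, cosine), 2))            # 2.35
print(round(adjusted_weighted_sum(u, 0, 2, cosine), 2))   # 3.35
```

Note how the adjusted prediction for user c_1 lands near that user's own mean rating (4.0), while the plain weighted sum is pulled toward the raw ratings of the neighbors.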

N Nearest Neighbors Predictors

All predictors discussed above can be updated to include only the N nearest neighbors of the user c in the computation. Let

    C_c = {c' ∈ C | rank(sim(u[c], u[c'])) ≤ N}

be the set of the N nearest neighbors of user c according to the similarity function sim(., .).

Average N-nn rating.

    u(c, s) = (1 / N) Σ_{c' ∈ C_c} u(c', s).

Weighted N-nn sum.

    u(c, s) = k · Σ_{c' ∈ C_c} sim(u[c], u[c']) · u(c', s),    k = 1 / Σ_{c' ∈ C_c} |sim(u[c], u[c'])|.

Adjusted weighted N-nn sum.

    u(c, s) = ū_c + k · Σ_{c' ∈ C_c} sim(u[c], u[c']) · (u(c', s) − ū_{c'}).

Similarity Measures

Two similarity measures are typically used in collaborative filtering.

Pearson correlation.

    sim(u[c], u[c']) = Σ_{i=1..n} (u(c, s_i) − ū_c)(u(c', s_i) − ū_{c'}) / sqrt( Σ_{i=1..n} (u(c, s_i) − ū_c)² · Σ_{i=1..n} (u(c', s_i) − ū_{c'})² ).

This measure reflects the statistical correlation between the two (sparse) vectors of data.

Cosine similarity.

    sim(u[c], u[c']) = cos(u[c], u[c']) = (u[c] · u[c']) / (‖u[c]‖ · ‖u[c']‖) = Σ_{i=1..n} u(c, s_i) u(c', s_i) / sqrt( Σ_{i=1..n} u(c, s_i)² · Σ_{i=1..n} u(c', s_i)² ).

Cosine similarity measures the collinearity of the two vectors (it is 1 if the vectors are collinear, and 0 if they are orthogonal).
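Pearson correlation and neighbor selection can be sketched as below. This takes one simple reading of the notes' conventions: NaN ratings are replaced by 0 and the means run over all n items; other variants restrict the sums to co-rated items only.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two rating vectors; undefined (NaN)
    entries are taken as 0 and means run over all n items (one reading
    of the notes' convention)."""
    x, y = np.nan_to_num(x), np.nan_to_num(y)
    xd, yd = x - x.mean(), y - y.mean()
    denom = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    return float(xd @ yd / denom) if denom else 0.0

def n_nearest(u, i, N, sim):
    """Indices of the N users most similar to user i under sim(., .)."""
    scores = sorted(((sim(u[i], u[i2]), i2)
                     for i2 in range(u.shape[0]) if i2 != i), reverse=True)
    return [i2 for _, i2 in scores[:N]]

u = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 1.0, 5.0],
])
print(n_nearest(u, 0, 2, pearson))  # [1, 2]: neighbors of c_1, nearest first
```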

Default Voting

Default voting extends the Pearson correlation similarity measure by substituting a default vote for all items not explicitly rated by the user:

    sim(u[c], u[c']) = [ (n + k)(Σ_j u(c, s_j) u(c', s_j) + k d²) − (Σ_j u(c, s_j) + kd)(Σ_j u(c', s_j) + kd) ] /
        sqrt( [ (n + k)(Σ_j u²(c, s_j) + kd²) − (Σ_j u(c, s_j) + kd)² ] · [ (n + k)(Σ_j u²(c', s_j) + kd²) − (Σ_j u(c', s_j) + kd)² ] ).

Here, in addition to all items rated by both users c and c', we assign a score of d to k items that neither user has rated. Usually, d is a neutral or mildly negative score.

Inverse User Frequency

The inverse user frequency f_j of an item s_j is defined as

    f_j = log₂ ( |C| / |C_{s_j}| ).

The more frequently an item is rated, the lower its inverse user frequency. The idea is that items that are rated infrequently should play a larger role in identifying the similarity between users. The transformed vote v(c, s_j) is defined as

    v(c, s_j) = f_j · u(c, s_j).

We can transform both the cosine similarity and the Pearson correlation measure to use transformed votes instead of the original ratings. For cosine similarity, the new formula is

    sim(u[c], u[c']) = (v[c] · v[c']) / (‖v[c]‖ · ‖v[c']‖) = Σ_{i=1..n} v(c, s_i) v(c', s_i) / sqrt( Σ_{i=1..n} v(c, s_i)² · Σ_{i=1..n} v(c', s_i)² ).

For Pearson correlation, the new formula is

    sim(u[c], u[c']) = [ (Σ_j f_j)(Σ_j f_j u(c, s_j) u(c', s_j)) − (Σ_j f_j u(c, s_j))(Σ_j f_j u(c', s_j)) ] / sqrt(U · V),

where

    U = (Σ_j f_j)(Σ_j f_j u²(c, s_j)) − (Σ_j f_j u(c, s_j))²;
    V = (Σ_j f_j)(Σ_j f_j u²(c', s_j)) − (Σ_j f_j u(c', s_j))².
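Inverse user frequency and the transformed cosine similarity can be sketched as follows (illustrative code, not from the notes; it assumes every item has at least one rating, and treats NaN ratings as 0 per the notes' convention):

```python
import numpy as np

def inverse_user_frequency(u):
    """f_j = log2(|C| / |C_{s_j}|): one weight per item (column).
    Assumes every item has been rated by at least one user."""
    M = u.shape[0]
    rated = (~np.isnan(u)).sum(axis=0)  # |C_{s_j}| for each column j
    return np.log2(M / rated)

def iuf_cosine(u, i, i2):
    """Cosine similarity on transformed votes v = f_j * u(c, s_j);
    NaN ratings are treated as 0."""
    f = inverse_user_frequency(u)
    x = f * np.nan_to_num(u[i])
    y = f * np.nan_to_num(u[i2])
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

u = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 1.0, 5.0],
])
# Each item is rated by 2 of the 3 users, so every f_j = log2(3/2).
print(inverse_user_frequency(u))
```

When every item has the same inverse user frequency, as in this toy matrix, the transformed cosine coincides with the plain cosine; the weights matter only when rating frequencies differ across items.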

Case Amplification

Case amplification is a technique that modifies the ratings u(c, s) before they are used in collaborative filtering methods. The amplification is usually applied for u(c, s) ∈ [−1, 1]; it rewards weights close to 1 and punishes negative weights (ratings). A typical case amplification scheme is:

    u_amp(c, s) = u(c, s)^ρ,          if u(c, s) ≥ 0;
    u_amp(c, s) = −(−u(c, s))^ρ,      if u(c, s) < 0.

The u_amp(c, s) ratings are then used in the computations.
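The case amplification scheme above can be written as a one-line transform. The value ρ = 2.5 below is a commonly cited choice in the literature, not one fixed by these notes:

```python
def case_amplify(w, rho=2.5):
    """Case amplification for weights/ratings in [-1, 1]:
    w^rho for w >= 0 and -((-w)^rho) for w < 0, preserving the sign."""
    return w ** rho if w >= 0 else -((-w) ** rho)

print(case_amplify(0.9))   # ~0.768: weights close to 1 stay large
print(case_amplify(0.1))   # ~0.003: small weights shrink toward 0
print(case_amplify(-0.5))  # ~-0.177: negative weights are punished
```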