Using Social Networks to Improve Movie Rating Predictions

Similar documents
Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS224W Project: Recommendation System Models in Product Rating Predictions

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Data Mining Techniques

Introduction to Data Mining

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Data Mining Techniques

Introduction. Chapter Background Recommender systems Collaborative based filtering

Recommender Systems New Approaches with Netflix Dataset

Recommendation Systems

Performance Comparison of Algorithms for Movie Rating Estimation

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

CS 124/LINGUIST 180 From Languages to Information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

CS 124/LINGUIST 180 From Languages to Information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Part 11: Collaborative Filtering. Francesco Ricci

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering for Netflix

CS 124/LINGUIST 180 From Languages to Information

Jeff Howbert Introduction to Machine Learning Winter

COMP 465: Data Mining Recommender Systems

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

Singular Value Decomposition, and Application to Recommender Systems

CS249: ADVANCED DATA MINING

Towards a hybrid approach to Netflix Challenge

Machine Learning and Data Mining. Collaborative Filtering & Recommender Systems. Kalev Kask

Privacy Preserving Collaborative Filtering

Collaborative Filtering and Recommender Systems. Definitions. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm

On hybrid modular recommendation systems for video streaming

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1

Progress Report: Collaborative Filtering Using Bregman Co-clustering

BBS654 Data Mining. Pinar Duygulu

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

Movie Recommender System - Hybrid Filtering Approach

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007

Countering Sparsity and Vulnerabilities in Reputation Systems

Performance of Alternating Least Squares in a distributed approach using GraphLab and MapReduce

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Improved Neighborhood-based Collaborative Filtering

Sampling PCA, enhancing recovered missing values in large scale matrices. Luis Gabriel De Alba Rivera 80555S

Announcements. Image Matching! Source & Destination Images. Image Transformation 2/ 3/ 16. Compare a big image to a small image

Collaborative Filtering based on User Trends

Part 11: Collaborative Filtering. Francesco Ricci

Available online at ScienceDirect. Procedia Technology 17 (2014 )

CSE 454 Final Report TasteCliq

Recommender Systems 6CCS3WSN-7CCSMWAL

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering

Recommender System Optimization through Collaborative Filtering

Mining Web Data. Lijun Zhang

Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

Supervised Random Walks

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

AI Dining Suggestion App. CS 297 Report Bao Pham ( ) Advisor: Dr. Chris Pollett

Disclaimer. Lect 2: empirical analyses of graphs

Web Personalization & Recommender Systems

Social Interaction Based Video Recommendation: Recommending YouTube Videos to Facebook Users

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

Graph Structure Over Time

Matrix Decomposition for Recommendation System

Mining Web Data. Lijun Zhang

Recommender Systems (RSs)

Overcompressing JPEG images with Evolution Algorithms

arxiv: v4 [cs.ir] 28 Jul 2016

Recommendation Algorithms: Collaborative Filtering. CSE 6111 Presentation Advanced Algorithms Fall Presented by: Farzana Yasmeen

Community-Based Recommendations: a Solution to the Cold Start Problem

Centroid Decomposition Based Recovery for Segmented Time Series

Project Report. An Introduction to Collaborative Filtering

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Music Recommendation with Implicit Feedback and Side Information

Web Personalization & Recommender Systems

Recommender Systems - Content, Collaborative, Hybrid

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

Parallel machine learning using Menthor

Predicting User Ratings Using Status Models on Amazon.com

Recommender Systems by means of Information Retrieval

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

The Principle and Improvement of the Algorithm of Matrix Factorization Model based on ALS

Using Data Mining to Determine User-Specific Movie Ratings

A Time-based Recommender System using Implicit Feedback

Supplementary Material

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Recommendation System for Location-based Social Network CS224W Project Report

CONTENTS. 1.1 Recommender System Collaborative filtering 1 2. LITERATURE REVIEW 5 3. NARRATIVE Software Requirements Specification 11

Achieving Better Predictions with Collaborative Neighborhood

Link Prediction for Social Network

JPEG Compression Using MATLAB

International Journal of Advance Engineering and Research Development. A Facebook Profile Based TV Shows and Movies Recommendation System

1-3 Square Roots. Warm Up Lesson Presentation Lesson Quiz. Holt Algebra 2 2

Collaborative Filtering using Euclidean Distance in Recommendation Engine

CS294-1 Assignment 2 Report

Transcription:

Introduction Using Social Networks to Improve Movie Rating Predictions Suhaas Prasad Recommender systems based on collaborative filtering techniques have become a large area of interest ever since the Netflix Prize was announced, in which the system must be able to predict how a user will rate a movie based on the rating histories of that user along with many others. These types of recommendation systems are become ever more popular throughout web- based applications, including e- commerce, news, video streaming, and similar sites. However, it does not appear as though any of these systems have attempted to incorporate a user s social network in determining how they might rate a movie. The collaborative filtering approach attempts to learn past user- item relationships by treating the problem as a large- scale data mining problem. A common collaborative filtering method is the k- nearest neighbor (knn) method in which the user- item preference is determined by looking at the ratings of similar users or items. While most neighborhood based method for the Netflix challenge find the item- based approach a better option, in order to incorporate social networks, the approach outlined in this paper uses user similarity to predict user- item preferences. In addition, several methods for using a user s friend network to weigh user similarities and their results are described. The data for the users ratings and social networks have been provided by Flixster, a social movie platform where users can rate and review movies with their friends. To formally describe the problem, given a set of users, U, movies, M, a set of training examples of the format (user, movie, rating), and a set of user connections (user1, user2), the task is to make a prediction for a rating for a user and movie. Then the social network is represented by the matrix A, where A u1,u2 is 1 if there exists an edge between user u1 and u2 (i.e. u1 and u2 are friends) and 0 otherwise. Moreover the ratings matrix R, holds the user- movie ratings, where R u,m indicates the rating given by user u for movie m, which is undefined if no rating exists in the training set and is on a 1 to 5 scale otherwise. The effectiveness of an algorithm given a training set is based on the root mean square error (RMSE) on a test set: rmse 1 S test (m,u) S test (R m,u P m,u ) 2 where S test is the number of test cases (m,u) in the test set S.

K-Nearest Neighbor The premise of the k- nearest neighbor approach is that similar users will rate items similarly and similar items will be rated similarly; however, we will only be examining the user- user similarities in the following knn implementation. The method assumes that user rating sets are independent of each other, so one might average the ratings of similar users for a movie to predict how a user would rate that movie. Rather than simply average the ratings, knn uses a weighted average, where the weights are given by how similar the users are. Therefore, the predicted rating is given by: v U m k v U m k sim(u,v)r v,m sim(u,v) where U m k is the set of k users most similar to user u that have rated movie m, and sim(u,v) is the similarity measure of users u and v. The two most common similarity measures are the cosine similarity and Pearson s correlation coefficient, both of which are implemented and compared here. Cosine similarity is defined as: sim(u,v) m M ( R u,m ) 2 m M R u,m R v,m ( ) 2 R v,m m M while the closely related Pearson s coefficient ρ ij is defined below: where µ i is the average rating given by user i. The values of both cosine similarity and Pearson s coefficient lie in the range [- 1,1], but can be converted to the [0,1] range if

necessary. The two similarity measures essentially determine how linearly related the two ratings distributions are. Incorporating Social Networks In addition to the traditional knn, in which the similarity of two users is determined by their ratings histories, one new approach examined involves increasing the similarity if the two users are friends. The premise here is that users who are friends with each other will tend to rate similar movies similarly. Therefore the new similarity s given sim(u,v) from the previous section can be defined as: s sim(u,v) + ( 1 sim(u,v) ) w where w and sim(u,v) lie on the range [0,1] and w 0 if u,v are not friends and > 0 otherwise. This essentially adjusts the similarity closer to 1 if users u and v are friends. Another approach investigated involves interpolating the ratings of a user s extended friends network. Rather than use knn, this method looks only at the ratings of users within three degrees of separation from the user and averages the ratings of those users: 1 F(u) v F(u) R v,m where F(u) is the set of extended friends of user u that have rated movie m. knn Post Processing Some of the modifications for knn that are necessary to improve the RMSE include addressing special cases such as users with no recorded ratings, rounding predictions that are close to an integer, and more. For the case of newcomers, in which a user in the test set has no recorded ratings in the training set, the traditional approach is to use the average rating for that movie, since the similarity measure is useless: 1 U(m) v U(m) where U(m) is the set of users that have rated movie m. However, in this case, we can use the user s friends to gather a set of similar users from which to take an average rating. R v,m

Results Using the Flixster data with 8000 users and 5600 movies, the results of running the knn algorithms with the following parameters are as follows. For the baseline item- based knn, the RMSE seen was approximately 0.989. Using this as a comparison for the user- based method, we see that item- based knn far outperforms any user- based approach. Pearson Coefficient: Cosine Similarity: Pearson /w Friend Weighting of 0.25

As far as can be seen, increasing the weight of friends ratings in knn does not seem to have any significant effect on the RSME. The use of Pearson s coefficient clearly surpasses that of the cosine similarity, and post- processing tricks appear to only slightly increase the RSME. Moreover the approach that interpolates one s friends ratings rather than using knn performs significantly worse with an RSME of 1.48. Conclusion and Future Work While the results show no improvement by taking one s social network into account, the results are not conclusive. The similar results with and without increasing the similarity of users who are friends may be due to the sparseness of the social graph in the data. Given that the average degree of the network is 4.38, the principles of knn would likely overshadow the use of a user s friends ratings. Moreover, Flixster users may not necessarily represent one s true social graph. A better approach would be to use Facebook users who use the Flixster network; however, this data was unavailable due to Facebook restrictions. One of the primary goals for future work would be to construct a data set of only socially active users to see if there is a significant increase in accuracy when testing on these users. Another goal is to try blended models using item- based collaborative filtering methods in addition to these user- based methods that take a user s social graph into account. References 1. J. Golbeck, J Hendler, FilmTrust: Movie Recommendations using Trust in Web- based Social Networks, Proceedings of the IEEE Consumer Communications and Networking Conference, January 2006. 2. Z. Wen, Recommendation System Based on Collaborative Filtering, Stanford CS229 Projects, 2008. 3. T. Hong, D. Tsamis, Use of KNN for the Netflix Prize, CS229 Projects. 2006. 4. Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan. Large- Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM 2008: 337-348.