CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Size: px
Start display at page:

Download "CS246: Mining Massive Datasets Jure Leskovec, Stanford University"

Transcription

1 CS6: Mining Massive Datasets Jure Leskovec, Stanford University

2 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets Training data 100 million ratings, 80,000 users, 17,770 movies 6 years of data: Test data Last few ratings of each user (.8 million) Evaluation criterion: Root Mean Square Error (RMSE) Netflix Cinematch RMSE: Competition 700+ teams $1 million prize for 10% improvement on Cinematch

3 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 17,700 movies 80,000 users

4 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 80,000 users 17,700 movies ??? 1?? 1 Test Data Set SSE = r uu r uu (i,u) R

5 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 5 Basically the winner of the Netflix Challenge Multi-scale modeling of the data: Combine top level, regional modeling of the data, with a refined, local view: Global: Overall deviations of users/movies Factorization: Addressing regional effects CF (k-nn): Extract local patterns Global effects Factorization CF/NN

6 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 6 Global: Mean movie rating:.7 stars The Sixth Sense is 0.5 stars above avg. Joe rates 0. stars below avg. Baseline estimation: Joe will rate The Sixth Sense stars Local neighborhood (CF/NN): Joe didn t like related movie Signs Final estimate: Joe will rate The Sixth Sense.8 stars

7 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 7 Earliest and most popular collaborative filtering method Derive unknown ratings from those of similar movies (item-item variant) Define similarity measure s ij of items i and j Select k-nearest neighbors, compute the rating N(i; u): items most similar to i that were rated by u rˆ ui = j N ( i; u) s j N ( i; u) ij s r ij uj s ij similarity of items i and j r uj rating of user u on item j N(i;u) set of similar items

8 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 8 In practice we get better estimates if we model deviations: ^ r ui ( ) s r b (; ) ij uj uj = b + j N iu ui s j N(; iu) ij baseline estimate for r ui μ = overall mean rating b u = rating deviation of user u = avg. rating of user u μ b i = avg. rating of movie i μ Problems: 1) Similarity measures are arbitrary ) Pairwise similarities neglect interdependencies among neighbors ) Taking a weighted average is restricting

9 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 9 Use a weighted sum rather than weighted avg.: ( ) ^ r = b + w j N(; iu) r b How to set w ij? ui ui ij uj uj Allow: Remember, error metric is SSE: r uu r uu (i,u) R Find w ij that minimize SSE on training data! min v r vv b vi + w ii r vj b vj w j N i;v Models relationships between item i and its neighbors j w ij can be learnt through gradient decent based on u and all other users v that rated i j N(; iu) w ij 1

10 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 10 Find w ij that minimize SSE on training data! min r vv b vi + w ii r vj b vj w v Gradient decent j N i;v Iterate until convergence: w w - η w Where: η learning rate w ii = r vv b vv + w ik r vk b vk v k N i;v for j N i; v (else w ii = 0) r vv b vv

11 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 11 So far: Weights w ij derived based on their role; no use of an arbitrary similarity measure (w ij s ij ) Explicitly account for interrelationships among the neighboring movies Next: Latent factor model Extract regional correlations ( ) r = b + w r b ^ ui ui j N(; iu) ij uj uj Global effects Factorization CF/NN

12 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Geared towards males The Lion King The Princess Diaries funny Independence Day Dumb and Dumber

13 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1 SVD on Netflix data: R Q P T users items P T Q For now let s assume we can approximate the rating matrix R as a product of thin Q P T There are important differences between SVD and the real SVD. We will get to them later

14 How to estimate the missing rating? items users? r ii = q i p u T 5 1 items users P T Q /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1

15 How to estimate the missing rating? items users? r ii = q i p u T 5 1 items users P T Q /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 15

16 How to estimate the missing rating? items users. 1 5 r ii = q i p u T 5 1 items users P T Q /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 16

17 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males The Lion King The Princess Diaries Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 17

18 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males The Lion King The Princess Diaries Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 18

19 Remember SVD: A: Input data matrix U: Left singular vecs V: Right singular vecs Σ: Singular values U SVD gives minimum reconstruction error (MSE!) min U,V,Σ ii m A ii UΣV T ii So in our case, SVD on Netflix data: R Q P T A = R, Q = U, P T = Σ V T r ii = q i pt u But, we are not done yet! R has missing entries! n A /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 19 m Σ The sum goes over all entries. Our A/R has missing entries! n V T

20 [Bellkor Team] /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets items SVD isn t defined when entries are missing Use specialized methods to find P, Q min r ii q i pt u P,Q i,u R r ii = q i pt u Don t require cols of P, Q to be orthogonal/unit length P, Q map users/movies to a latent space Q The most popular model among Netflix contestants users P T

21 Want to minimize SSE for test data Idea: Minimize SSE on Training data Want large f (# of factors) to capture all the signals But, test SSE begins to rise for f > Regularization is needed Allow rich model where there are sufficient data Shrink aggressively where data are scarce min P, Q training ( r ui q error λ regularization parameter i p T u ) + λ /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1 u p u + length i q i ??? 1?? 1

22 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males min P, Q training ( r ui q p The Princess Diaries i T u ) + λ min factors error + λ length u p u + i q i The Lion King Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

23 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males min P, Q training ( r ui q p The Princess Diaries i T u ) + λ min factors error + λ length u p u + i q i The Lion King Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

24 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males min P, Q training ( r ui q p The Princess Diaries i T u ) + λ min factors error + λ length u p u + i q i The Lion King Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

25 The Color Purple serious Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s 11 Lethal Weapon Factor 1 Geared towards males min P, Q training ( r ui q p The Princess Diaries i T u ) + λ min factors error + λ length u p u + i q i The Lion King Factor funny Independence Day Dumb and Dumber /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 5

26 Want to find matrices P and Q: T + + ( rui qi pu ) λ pu qi, Q training u i Online stochastic gradient decent: Initialize P and Q (random, using SVD) Then iterate over the ratings and update factors: For each r ui : min P T ε uu = r uu q i p u q i q i + η ε uu p u λ q i p u p u + η ε uu q i λ p u (derivative of the error ) (update equation) (update equation) η learning rate /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 6

27 Koren, Bell, Volinksy, IEEE Computer, 009 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 7

28 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 8 user bias movie bias user-movie interaction Baseline predictor Separates users and movies Benefits from insights into user s behavior Among the main practical contributions of the competition User-Movie interaction Characterizes the matching between users and movies Attracts most research in the field Benefits from algorithmic and mathematical innovations μ = overall mean rating b u = bias of user u b i = bias of movie i

29 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 9 We have expectations on the rating by user u of movie i, even without estimating u s attitude towards movies like i Rating scale of user u Values of other ratings user gave recently (day-specific mood, anchoring, multi-user accounts) (Recent) popularity of movie i Selection bias; related to number of ratings user gave on the same day ( frequency )

30 r uu = μ + b u + b i + q i p u T Overall mean rating Bias for user u Bias for movie i User-Movie interaction Example: Mean rating: µ =.7 You are a critical reviewer: your ratings are 1 star lower than the mean: b u = -1 Star Wars gets a mean rating of 0.5 higher than average movie: b i = Predicted rating for you on Star Wars: = =. /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 0

31 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1 Solve: min Q, P ( u, i) R ( T r ( + b + b + q p )) µ ui + λ λ Is typically selected via grid-search on a validation set u i i goodness of fit ( ) q + p + b + b i regularization Stochastic gradient decent to find parameters Note: Both biases (b u, b i ) as well as interactions (q i, p u ) are treated as parameters (we estimate them) u u u i

32 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets Global average: User average: Movie average: 1.05 Netflix: Basic Collaborative filtering: 0.9 Collaborative filtering++: 0.91 Latent factors: 0.90 Latent factors+biases: 0.89 Final BellKor: Grand Prize: 0.856

33 CF (no time bias) Basic Latent Factors Latent Factors w/ Biases 0.91 RMSE Millions of parameters /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

34

35 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 5 Sudden rise in the average movie rating (early 00) Improvements in Netflix GUI improvements Meaning of rating changed Movie age Users prefer new movies without any reasons Older movies are just inherently better than newer ones Y. Koren, Collaborative filtering with temporal dynamics, KDD 09

36 Original model: r ui = µ +b u + b i + q i p u T Add time dependence to biases: r ui = µ +b u (t)+ b i (t) +q i p u T Make parameters b u and b i to depend on time (1) Parameterize time-dependence by linear trends () Each bin corresponds to 10 consecutive weeks Add temporal dependence to factors p u (t) user preference vector on day t Y. Koren, Collaborative filtering with temporal dynamics, KDD 09 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 6

37 RMSE CF (no time bias) Basic Latent Factors CF (time bias) Latent Factors w/ Biases + Linear time factors + Per-day user biases + CF Millions of parameters /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 7

38 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 8 Many options for modeling Variants of the ideas we have seen so far Different numbers of factors Different ways to model time Different ways to handle implicit information Other models (not described here) Nearest-neighbor models Restricted Boltzmann machines Model averaging is useful. Linear model combining

39 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 9

40 June 6 th submission triggers 0-day last call /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 0

41 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 1 Ensemble team formed Group of other teams on leaderboard forms a new team Relies on combining their models Quickly also get a qualifying score over 10% BellKor Continue to get small improvements in their scores Realize that they are in direct competition with Ensemble Strategy Both teams carefully monitoring the leaderboard Only sure way to check for improvement is to submit a set of predictions This alerts the other team of your latest score

42 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets Submissions limited to 1 a day Only 1 final submission could be made in the last h hours before deadline BellKor team member in Austria notices (by chance) that Ensemble posts a score that is slightly better than BellKor s Frantic last hours for both teams Much computer time on final optimization run times carefully calibrated to end about an hour before deadline Final submissions BellKor submits a little early (on purpose), 0 mins before deadline Ensemble submits their final entry 0 mins later.and everyone waits.

43 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

44 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets

45 /6/01 Jure Leskovec, Stanford C6: Mining Massive Datasets 5 Some slides and plots borrowed from Yehuda Koren, Robert Bell and Padhraic Smyth Further reading: Y. Koren, Collaborative filtering with temporal dynamics, KDD

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Training data 00 million ratings, 80,000 users, 7,770 movies 6 years of data: 000 00 Test data Last few ratings of

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu //8 Jure Leskovec, Stanford CS6: Mining Massive Datasets Training data 00 million ratings, 80,000 users, 7,770 movies

More information

COMP 465: Data Mining Recommender Systems

COMP 465: Data Mining Recommender Systems //0 movies COMP 6: Data Mining Recommender Systems Slides Adapted From: www.mmds.org (Mining Massive Datasets) movies Compare predictions with known ratings (test set T)????? Test Data Set Root-mean-square

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University We need your help with our research on human interpretable machine learning. Please complete a survey at http://stanford.io/1wpokco. It should be fun and take about 1min to complete. Thanks a lot for your

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 60 - Section - Fall 06 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Recommender Systems The Long Tail (from: https://www.wired.com/00/0/tail/)

More information

Data Mining Techniques

Data Mining Techniques Data Mining Techniques CS 6 - Section - Spring 7 Lecture Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS6) Project Project Deadlines Feb: Form teams of - people 7 Feb:

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu Customer X Buys Metalica CD Buys Megadeth CD Customer Y Does search on Metalica Recommender system suggests Megadeth

More information

Recommendation and Advertising. Shannon Quinn (with thanks to J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)

Recommendation and Advertising. Shannon Quinn (with thanks to J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Recommendation and Advertising Shannon Quinn (with thanks to J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Lecture breakdown Part : Advertising Bipartite Matching AdWords Part : Recommendation

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu //8 Jure Leskovec, Stanford CS6: Mining Massive Datasets High dim. data Graph data Infinite data Machine learning

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys Metallica

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu /7/0 Jure Leskovec, Stanford CS6: Mining Massive Datasets, http://cs6.stanford.edu High dim. data Graph data Infinite

More information

CS 572: Information Retrieval

CS 572: Information Retrieval CS 7: Information Retrieval Recommender Systems : Implementation and Applications Acknowledgements Many slides in this lecture are adapted from Xavier Amatriain (Netflix), Yehuda Koren (Yahoo), and Dietmar

More information

Recommendation Systems

Recommendation Systems Recommendation Systems CS 534: Machine Learning Slides adapted from Alex Smola, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Lester Mackey, Dietmar Jannach, and Gerhard Friedrich Recommender Systems (RecSys)

More information

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize) CSE 158 Lecture 8 Web Mining and Recommender Systems Extensions of latent-factor models, (and more on the Netflix prize) Summary so far Recap 1. Measuring similarity between users/items for binary prediction

More information

Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on

Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on Recommender Systems Collaborave Filtering and Matrix Factorizaon Narges Razavian Thanks to lecture slides from Alex Smola@CMU Yahuda Koren@Yahoo labs and Bing Liu@UIC We Know What You Ought To Be Watching

More information

Machine Learning and Data Mining. Collaborative Filtering & Recommender Systems. Kalev Kask

Machine Learning and Data Mining. Collaborative Filtering & Recommender Systems. Kalev Kask Machine Learning and Data Mining Collaborative Filtering & Recommender Systems Kalev Kask Recommender systems Automated recommendations Inputs User information Situation context, demographics, preferences,

More information

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize) CSE 258 Lecture 8 Web Mining and Recommender Systems Extensions of latent-factor models, (and more on the Netflix prize) Summary so far Recap 1. Measuring similarity between users/items for binary prediction

More information

Recommender Systems New Approaches with Netflix Dataset

Recommender Systems New Approaches with Netflix Dataset Recommender Systems New Approaches with Netflix Dataset Robert Bell Yehuda Koren AT&T Labs ICDM 2007 Presented by Matt Rodriguez Outline Overview of Recommender System Approaches which are Content based

More information

Collaborative Filtering Applied to Educational Data Mining

Collaborative Filtering Applied to Educational Data Mining Collaborative Filtering Applied to Educational Data Mining KDD Cup 200 July 25 th, 200 BigChaos @ KDD Team Dataset Solution Overview Michael Jahrer, Andreas Töscher from commendo research Dataset Team

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #16: Recommenda2on Systems

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #16: Recommenda2on Systems CS 6: (Big) Data Management Systems B. Aditya Prakash Lecture #6: Recommendaon Systems Example: Recommender Systems Customer X Buys Metallica CD Buys Megadeth CD Customer Y Does search on Metallica Recommender

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys CD of

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

Performance Comparison of Algorithms for Movie Rating Estimation

Performance Comparison of Algorithms for Movie Rating Estimation Performance Comparison of Algorithms for Movie Rating Estimation Alper Köse, Can Kanbak, Noyan Evirgen Research Laboratory of Electronics, Massachusetts Institute of Technology Department of Electrical

More information

CS 124/LINGUIST 180 From Languages to Information

CS 124/LINGUIST 180 From Languages to Information CS /LINGUIST 80 From Languages to Information Dan Jurafsky Stanford University Recommender Systems & Collaborative Filtering Slides adapted from Jure Leskovec Recommender Systems Customer X Buys CD of

More information

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data Nicola Barbieri, Massimo Guarascio, Ettore Ritacco ICAR-CNR Via Pietro Bucci 41/c, Rende, Italy {barbieri,guarascio,ritacco}@icar.cnr.it

More information

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007 CptS 570 Machine Learning Project: Netflix Competition Team: MLSurvivors Parisa Rashidi Vikramaditya Jakkula Wednesday, December 12, 2007 Introduction In current report, we describe our efforts put forth

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

Real-time Recommendations on Spark. Jan Neumann, Sridhar Alla (Comcast Labs) DC Spark Interactive Meetup East May

Real-time Recommendations on Spark. Jan Neumann, Sridhar Alla (Comcast Labs) DC Spark Interactive Meetup East May Real-time Recommendations on Spark Jan Neumann, Sridhar Alla (Comcast Labs) DC Spark Interactive Meetup East May 19 2015 Who am I? Jan Neumann, Lead of Big Data and Content Analysis Research Teams This

More information

Collaborative Filtering for Netflix

Collaborative Filtering for Netflix Collaborative Filtering for Netflix Michael Percy Dec 10, 2009 Abstract The Netflix movie-recommendation problem was investigated and the incremental Singular Value Decomposition (SVD) algorithm was implemented

More information

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem Additive Regression Applied to a Large-Scale Collaborative Filtering Problem Eibe Frank 1 and Mark Hall 2 1 Department of Computer Science, University of Waikato, Hamilton, New Zealand eibe@cs.waikato.ac.nz

More information

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive

More information

CS224W Project: Recommendation System Models in Product Rating Predictions

CS224W Project: Recommendation System Models in Product Rating Predictions CS224W Project: Recommendation System Models in Product Rating Predictions Xiaoye Liu xiaoye@stanford.edu Abstract A product recommender system based on product-review information and metadata history

More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University  Infinite data. Filtering data streams /9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them

More information

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University {tedhong, dtsamis}@stanford.edu Abstract This paper analyzes the performance of various KNNs techniques as applied to the

More information

THE goal of a recommender system is to make predictions

THE goal of a recommender system is to make predictions CSE 569 FUNDAMENTALS OF STATISTICAL LEARNING 1 Anime Recommer System Exploration: Final Report Scott Freitas & Benjamin Clayton Abstract This project is an exploration of modern recommer systems utilizing

More information

Yelp Recommendation System

Yelp Recommendation System Yelp Recommendation System Jason Ting, Swaroop Indra Ramaswamy Institute for Computational and Mathematical Engineering Abstract We apply principles and techniques of recommendation systems to develop

More information

Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model

Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model Yehuda Koren AT&T Labs Research 180 Park Ave, Florham Park, NJ 07932 yehuda@research.att.com ABSTRACT Recommender systems

More information

Advances in Collaborative Filtering

Advances in Collaborative Filtering Chapter 5 Advances in Collaborative Filtering Yehuda Koren and Robert Bell Abstract The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #7: Recommendation Content based & Collaborative Filtering Seoul National University In This Lecture Understand the motivation and the problem of recommendation Compare

More information

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies. CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct

More information

Advances in Collaborative Filtering

Advances in Collaborative Filtering Advances in Collaborative Filtering Yehuda Koren and Robert Bell 1 Introduction Collaborative filtering (CF) methods produce user specific recommendations of items based on patterns of ratings or usage

More information

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth Under The Guidance of Dr. Richard Maclin Outline Problem Statement Background Proposed Solution Experiments & Results Related Work Future

More information

Using Social Networks to Improve Movie Rating Predictions

Using Social Networks to Improve Movie Rating Predictions Introduction Using Social Networks to Improve Movie Rating Predictions Suhaas Prasad Recommender systems based on collaborative filtering techniques have become a large area of interest ever since the

More information

Factor in the Neighbors: Scalable and Accurate Collaborative Filtering

Factor in the Neighbors: Scalable and Accurate Collaborative Filtering 1 Factor in the Neighbors: Scalable and Accurate Collaborative Filtering YEHUDA KOREN Yahoo! Research Recommender systems provide users with personalized suggestions for products or services. These systems

More information

Collaborative Filtering with Temporal Dynamics

Collaborative Filtering with Temporal Dynamics Collaborative Filtering with Temporal Dynamics Yehuda Koren Yahoo! Research, Haifa, Israel yehuda@yahoo-inc.com ABSTRACT Customer preferences for products are drifting over time. Product perception and

More information

Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation

Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation Kiran Kannar A53089098 kkannar@eng.ucsd.edu Saicharan Duppati A53221873 sduppati@eng.ucsd.edu Akanksha Grover A53205632 a2grover@eng.ucsd.edu

More information

CS294-1 Assignment 2 Report

CS294-1 Assignment 2 Report CS294-1 Assignment 2 Report Keling Chen and Huasha Zhao February 24, 2012 1 Introduction The goal of this homework is to predict a users numeric rating for a book from the text of the user s review. The

More information

Recommender System Optimization through Collaborative Filtering

Recommender System Optimization through Collaborative Filtering Recommender System Optimization through Collaborative Filtering L.W. Hoogenboom Econometric Institute of Erasmus University Rotterdam Bachelor Thesis Business Analytics and Quantitative Marketing July

More information

Sparse Estimation of Movie Preferences via Constrained Optimization

Sparse Estimation of Movie Preferences via Constrained Optimization Sparse Estimation of Movie Preferences via Constrained Optimization Alexander Anemogiannis, Ajay Mandlekar, Matt Tsao December 17, 2016 Abstract We propose extensions to traditional low-rank matrix completion

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Collaborative Filtering Nearest es Neighbor Approach Jeff Howbert Introduction to Machine Learning Winter 2012 1 Bad news Netflix Prize data no longer available to public. Just after contest t ended d

More information

Progress Report: Collaborative Filtering Using Bregman Co-clustering

Progress Report: Collaborative Filtering Using Bregman Co-clustering Progress Report: Collaborative Filtering Using Bregman Co-clustering Wei Tang, Srivatsan Ramanujam, and Andrew Dreher April 4, 2008 1 Introduction Analytics are becoming increasingly important for business

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Improved Neighborhood-based Collaborative Filtering

Improved Neighborhood-based Collaborative Filtering Improved Neighborhood-based Collaborative Filtering Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park Ave, Florham Park, NJ 07932 {rbell,yehuda}@research.att.com ABSTRACT Recommender systems

More information

Web Personalisation and Recommender Systems

Web Personalisation and Recommender Systems Web Personalisation and Recommender Systems Shlomo Berkovsky and Jill Freyne DIGITAL PRODUCTIVITY FLAGSHIP Outline Part 1: Information Overload and User Modelling Part 2: Web Personalisation and Recommender

More information

COSC6376 Cloud Computing Homework 1 Tutorial

COSC6376 Cloud Computing Homework 1 Tutorial COSC6376 Cloud Computing Homework 1 Tutorial Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Outline Homework1 Tutorial based on Netflix dataset Homework 1 K-means

More information

Assignment 5: Collaborative Filtering

Assignment 5: Collaborative Filtering Assignment 5: Collaborative Filtering Arash Vahdat Fall 2015 Readings You are highly recommended to check the following readings before/while doing this assignment: Slope One Algorithm: https://en.wikipedia.org/wiki/slope_one.

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Machine Learning 2015-II Universidad Nacional de Colombia F. González NMF for MM IR ML 2015-II 1 / 54 Outline 1 The

More information

BBS654 Data Mining. Pinar Duygulu

BBS654 Data Mining. Pinar Duygulu BBS6 Data Mining Pinar Duygulu Slides are adapted from J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org Mustafa Ozdal Example: Recommender Systems Customer X Buys Metallica

More information

Computational Intelligence Meets the NetFlix Prize

Computational Intelligence Meets the NetFlix Prize Computational Intelligence Meets the NetFlix Prize Ryan J. Meuth, Paul Robinette, Donald C. Wunsch II Abstract The NetFlix Prize is a research contest that will award $1 Million to the first group to improve

More information

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017 CPSC 340: Machine Learning and Data Mining Recommender Systems Fall 2017 Assignment 4: Admin Due tonight, 1 late day for Monday, 2 late days for Wednesday. Assignment 5: Posted, due Monday of last week

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Recommender Systems II Instructor: Yizhou Sun yzsun@cs.ucla.edu May 31, 2017 Recommender Systems Recommendation via Information Network Analysis Hybrid Collaborative Filtering

More information

Predicting Popular Xbox games based on Search Queries of Users

Predicting Popular Xbox games based on Search Queries of Users 1 Predicting Popular Xbox games based on Search Queries of Users Chinmoy Mandayam and Saahil Shenoy I. INTRODUCTION This project is based on a completed Kaggle competition. Our goal is to predict which

More information

Singular Value Decomposition, and Application to Recommender Systems

Singular Value Decomposition, and Application to Recommender Systems Singular Value Decomposition, and Application to Recommender Systems CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Recommendation

More information

Hybrid Recommendation Models for Binary User Preference Prediction Problem

Hybrid Recommendation Models for Binary User Preference Prediction Problem JMLR: Workshop and Conference Proceedings 18:137 151, 2012 Proceedings of KDD-Cup 2011 competition Hybrid Recommation Models for Binary User Preference Prediction Problem Siwei Lai swlai@nlpr.ia.ac.cn

More information

Seminar Collaborative Filtering. KDD Cup. Ziawasch Abedjan, Arvid Heise, Felix Naumann

Seminar Collaborative Filtering. KDD Cup. Ziawasch Abedjan, Arvid Heise, Felix Naumann Seminar Collaborative Filtering KDD Cup Ziawasch Abedjan, Arvid Heise, Felix Naumann 2 Collaborative Filtering Recommendation systems 3 Recommendation systems 4 Recommendation systems 5 Recommendation

More information

Achieving Better Predictions with Collaborative Neighborhood

Achieving Better Predictions with Collaborative Neighborhood Achieving Better Predictions with Collaborative Neighborhood Edison Alejandro García, garcial@stanford.edu Stanford Machine Learning - CS229 Abstract Collaborative Filtering (CF) is a popular method that

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/12/2014 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 3/12/2014 Jure

More information

Recommender Systems 6CCS3WSN-7CCSMWAL

Recommender Systems 6CCS3WSN-7CCSMWAL Recommender Systems 6CCS3WSN-7CCSMWAL http://insidebigdata.com/wp-content/uploads/2014/06/humorrecommender.jpg Some basic methods of recommendation Recommend popular items Collaborative Filtering Item-to-Item:

More information

CSE 546 Machine Learning, Autumn 2013 Homework 2

CSE 546 Machine Learning, Autumn 2013 Homework 2 CSE 546 Machine Learning, Autumn 2013 Homework 2 Due: Monday, October 28, beginning of class 1 Boosting [30 Points] We learned about boosting in lecture and the topic is covered in Murphy 16.4. On page

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab Recommender System What is it? How to build it? Challenges R package: recommenderlab 1 What is a recommender system Wiki definition: A recommender system or a recommendation system (sometimes replacing

More information

Towards a hybrid approach to Netflix Challenge

Towards a hybrid approach to Netflix Challenge Towards a hybrid approach to Netflix Challenge Abhishek Gupta, Abhijeet Mohapatra, Tejaswi Tenneti March 12, 2009 1 Introduction Today Recommendation systems [3] have become indispensible because of the

More information

Music Recommendation with Implicit Feedback and Side Information

Music Recommendation with Implicit Feedback and Side Information Music Recommendation with Implicit Feedback and Side Information Shengbo Guo Yahoo! Labs shengbo@yahoo-inc.com Behrouz Behmardi Criteo b.behmardi@criteo.com Gary Chen Vobile gary.chen@vobileinc.com Abstract

More information

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu 1 Introduction Yelp Dataset Challenge provides a large number of user, business and review data which can be used for

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford

More information

Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset

Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset Jussa Klapuri, Ilari Nieminen, Tapani Raiko, and Krista Lagus Department of Information and Computer Science, Aalto University,

More information

Non-negative Matrix Factorization for Multimodal Image Retrieval

Non-negative Matrix Factorization for Multimodal Image Retrieval Non-negative Matrix Factorization for Multimodal Image Retrieval Fabio A. González PhD Bioingenium Research Group Computer Systems and Industrial Engineering Department Universidad Nacional de Colombia

More information

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize

Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize Extension Study on Item-Based P-Tree Collaborative Filtering Algorithm for Netflix Prize Tingda Lu, Yan Wang, William Perrizo, Amal Perera, Gregory Wettstein Computer Science Department North Dakota State

More information

Multiple-Choice Questionnaire Group C

Multiple-Choice Questionnaire Group C Family name: Vision and Machine-Learning Given name: 1/28/2011 Multiple-Choice naire Group C No documents authorized. There can be several right answers to a question. Marking-scheme: 2 points if all right

More information

Document Information

Document Information Horizon 2020 Framework Programme Grant Agreement: 732328 FashionBrain Document Information Deliverable number: D5.3 Deliverable title: Early Demo for Trend Prediction Deliverable description: This early

More information

Hotel Recommendation Based on Hybrid Model

Hotel Recommendation Based on Hybrid Model Hotel Recommendation Based on Hybrid Model Jing WANG, Jiajun SUN, Zhendong LIN Abstract: This project develops a hybrid model that combines content-based with collaborative filtering (CF) for hotel recommendation.

More information

Collaborative filtering models for recommendations systems

Collaborative filtering models for recommendations systems Collaborative filtering models for recommendations systems Nikhil Johri, Zahan Malkani, and Ying Wang Abstract Modern retailers frequently use recommendation systems to suggest products of interest to

More information

Introduction. Chapter Background Recommender systems Collaborative based filtering

Introduction. Chapter Background Recommender systems Collaborative based filtering ii Abstract Recommender systems are used extensively today in many areas to help users and consumers with making decisions. Amazon recommends books based on what you have previously viewed and purchased,

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

Lecture on Modeling Tools for Clustering & Regression

Lecture on Modeling Tools for Clustering & Regression Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

arxiv: v4 [cs.ir] 28 Jul 2016

arxiv: v4 [cs.ir] 28 Jul 2016 Review-Based Rating Prediction arxiv:1607.00024v4 [cs.ir] 28 Jul 2016 Tal Hadad Dept. of Information Systems Engineering, Ben-Gurion University E-mail: tah@post.bgu.ac.il Abstract Recommendation systems

More information

3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today

3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today 3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

More information

Clustering-Based Personalization

Clustering-Based Personalization Western University Scholarship@Western Electronic Thesis and Dissertation Repository September 2015 Clustering-Based Personalization Seyed Nima Mirbakhsh The University of Western Ontario Supervisor Dr.

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering

BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering BordaRank: A Ranking Aggregation Based Approach to Collaborative Filtering Yeming TANG Department of Computer Science and Technology Tsinghua University Beijing, China tym13@mails.tsinghua.edu.cn Qiuli

More information

Performance of Recommender Algorithms on Top-N Recommendation Tasks

Performance of Recommender Algorithms on Top-N Recommendation Tasks Performance of Recommender Algorithms on Top- Recommendation Tasks Paolo Cremonesi Politecnico di Milano Milan, Italy paolo.cremonesi@polimi.it Yehuda Koren Yahoo! Research Haifa, Israel yehuda@yahoo-inc.com

More information

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as

More information

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Sumedh Sawant sumedh@stanford.edu Team 38 December 10, 2013 Abstract We implement a personal recommendation

More information

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.

More information

Lecture 26: Missing data

Lecture 26: Missing data Lecture 26: Missing data Reading: ESL 9.6 STATS 202: Data mining and analysis December 1, 2017 1 / 10 Missing data is everywhere Survey data: nonresponse. 2 / 10 Missing data is everywhere Survey data:

More information

Personalize Movie Recommendation System CS 229 Project Final Writeup

Personalize Movie Recommendation System CS 229 Project Final Writeup Personalize Movie Recommendation System CS 229 Project Final Writeup Shujia Liang, Lily Liu, Tianyi Liu December 4, 2018 Introduction We use machine learning to build a personalized movie scoring and recommendation

More information

Information Networks: PageRank

Information Networks: PageRank Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the

More information