Data Mining Techniques CS 6220 - Fall 2016 - Lecture by Jan-Willem van de Meent (credit: Andrew Ng, Alex Smola, Yehuda Koren, Stanford CS246)
Recommender Systems
The Long Tail (from: https://www.wired.com/2004/10/tail/)
Problem Setting Task: Predict user preferences for unseen items
Content-based Filtering
[Figure: movies placed on two axes, serious vs. escapist and geared towards females vs. geared towards males: The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Ocean's, Lethal Weapon, Dave, The Princess Diaries, The Lion King, Independence Day, Gus, Dumb and Dumber]
Idea: Predict rating using item features on a per-user basis
Idea: Predict rating using user features on a per-item basis
Collaborative Filtering
[Figure: Joe's ratings of several movies alongside other users' ratings]
Idea: Predict rating based on similarity to other users
Problem Setting Task: Predict user preferences for unseen items. Content-based filtering: model user/item features. Collaborative filtering: use implicit similarity of users and items.
Recommender Systems Movie recommendation (Netflix) Related product recommendation (Amazon) Web page ranking (Google) Social recommendation (Facebook) News content recommendation (Yahoo) Priority inbox & spam filtering (Google) Online dating (OK Cupid) Computational Advertising (Everyone)
Challenges: Scalability (millions of objects, 100s of millions of users); Cold start (changing user base, changing inventory); Imbalanced dataset (user activity and item reviews are power-law distributed; ratings are not missing at random)
Running Example: Netflix Data
[Table: training data as (user, movie, date, score) tuples; test data as (user, movie, date) tuples with score withheld]
Released as part of $1M competition by Netflix in 2006
Prize awarded to BellKor's Pragmatic Chaos in 2009
Running Yardstick: RMSE
$\mathrm{rmse}(S) = \sqrt{\frac{1}{|S|} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$
(doesn't tell you how to actually do recommendation)
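The yardstick is a one-liner in practice. A minimal sketch (function name is ours, not from the slides):

```python
import numpy as np

def rmse(r_hat, r):
    """Root mean squared error between predicted and true ratings."""
    r_hat, r = np.asarray(r_hat, float), np.asarray(r, float)
    return np.sqrt(np.mean((r_hat - r) ** 2))
```

For example, predictions of 0 against true ratings 3 and 4 give $\sqrt{(9+16)/2} \approx 3.54$.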
Ratings aren't everything [Screenshots: Netflix then vs. Netflix now]
Content-based Filtering
Item-based Features
Per-user Regression: Learn a set of regression coefficients for each user, $w_u = \operatorname{argmin}_w \| r_u - X w \|^2$
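A minimal sketch of the per-user fit, solving the least-squares problem above over the items one user has rated (the feature matrix, toy values, and function name are hypothetical):

```python
import numpy as np

def fit_user_weights(X, r_u):
    """Least-squares fit of one user's regression weights.

    X   : (n_rated, d) feature vectors of the items this user rated
    r_u : (n_rated,)   the user's ratings for those items
    """
    w_u, *_ = np.linalg.lstsq(X, r_u, rcond=None)
    return w_u

# Hypothetical toy data: 3 items, 2 features (say, "serious", "escapist")
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
r_u = np.array([5.0, 1.0, 6.0])
w_u = fit_user_weights(X, r_u)   # this user weighs "serious" far more heavily
```

Each user gets their own $w_u$; prediction for an unseen item with features $x$ is simply $x^\top w_u$.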
Bias
[Figure: rating matrix example featuring Moonrise Kingdom]
Problem: Some movies are universally loved / hated, and some users are more picky than others
Solution: Introduce a per-movie and per-user bias
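A rough sketch of estimating such biases, using the baseline form $b_{ui} = \mu + b_u + b_i$ that appears later in the lecture. The sequential mean-residual scheme below is a simplification for illustration (real systems typically fit the biases jointly, with regularization):

```python
import numpy as np
from collections import defaultdict

def fit_biases(ratings):
    """Estimate global, per-item, and per-user biases from (user, item, r) triples,
    so that the baseline prediction is b_ui = mu + b_u[u] + b_i[i]."""
    mu = np.mean([r for _, _, r in ratings])
    # Per-item bias: mean residual after removing the global mean
    item_res = defaultdict(list)
    for u, i, r in ratings:
        item_res[i].append(r - mu)
    b_i = {i: float(np.mean(v)) for i, v in item_res.items()}
    # Per-user bias: mean residual after removing global and item effects
    user_res = defaultdict(list)
    for u, i, r in ratings:
        user_res[u].append(r - mu - b_i[i])
    b_u = {u: float(np.mean(v)) for u, v in user_res.items()}
    return mu, b_u, b_i

mu, b_u, b_i = fit_biases([("a", "x", 5), ("a", "y", 3), ("b", "x", 4)])
```

A universally loved movie shows up as a large positive $b_i$; a picky user as a negative $b_u$.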
Temporal Effects
Changes in user behavior: Netflix changed rating labels in 2004
Movies get better with time?
Temporal Effects Solution: Model temporal effects in bias not weights
Neighborhood Methods
Neighborhood Based Methods
[Figure: Joe and other users connected to the movies they rated]
Users and items form a bipartite graph (edges are ratings)
Neighborhood Based Methods
(user, user) similarity: predict rating based on average from k-nearest users; good if item base is smaller than user base; good if item base changes rapidly
(item, item) similarity: predict rating based on average from k-nearest items; good if the user base is small; good if user base changes rapidly
Parzen-Window Style CF
$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij} (r_{uj} - b_{uj})}{\sum_{j \in S^k(i,u)} s_{ij}}$ with $b_{ui} = \mu + b_u + b_i$
Define a similarity $s_{ij}$ between items. Find the set $S^k(i,u)$ of k-nearest neighbors to $i$ that were rated by user $u$. Predict the rating using a weighted average over this set. How should we define $s_{ij}$?
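The prediction step is just a similarity-weighted average of bias-corrected ratings, mirroring the formula above. A minimal sketch (inputs are assumed to be precomputed; helper name is ours):

```python
import numpy as np

def knn_predict(b_ui, sims, residuals):
    """Neighborhood prediction: r_ui = b_ui + sum_j s_ij (r_uj - b_uj) / sum_j s_ij.

    b_ui      : baseline prediction mu + b_u + b_i for this (user, item) pair
    sims      : similarities s_ij to the k nearest items j the user has rated
    residuals : the user's bias-corrected ratings (r_uj - b_uj) on those items
    """
    sims = np.asarray(sims, float)
    residuals = np.asarray(residuals, float)
    return b_ui + sims @ residuals / sims.sum()
```

If the neighbors' residuals cancel out, the prediction falls back to the baseline $b_{ui}$.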
Pearson Correlation Coefficient
[Figure: user ratings for item i and item j, aligned by user]
$s_{ij} = \frac{\mathrm{Cov}[r_{ui}, r_{uj}]}{\mathrm{Std}[r_{ui}]\,\mathrm{Std}[r_{uj}]}$
(item, item) similarity
Empirical estimate of the Pearson correlation coefficient:
$\hat{\rho}_{ij} = \frac{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})(r_{uj} - b_{uj})}{\sqrt{\sum_{u \in U(i,j)} (r_{ui} - b_{ui})^2 \sum_{u \in U(i,j)} (r_{uj} - b_{uj})^2}}$
Regularize towards 0 for small support:
$s_{ij} = \frac{|U(i,j)|}{|U(i,j)| + \lambda}\,\hat{\rho}_{ij}$
Regularize towards the baseline for small neighborhoods:
$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in S^k(i,u)} s_{ij} (r_{uj} - b_{uj})}{\lambda + \sum_{j \in S^k(i,u)} s_{ij}}$
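The shrunk correlation is straightforward to compute once the bias-corrected ratings over the common support $U(i,j)$ are in hand. A sketch (the default shrinkage strength `lam` is an arbitrary illustrative value):

```python
import numpy as np

def shrunk_similarity(res_i, res_j, lam=50.0):
    """Empirical Pearson correlation between two items' bias-corrected ratings,
    shrunk towards 0 when the common support is small.

    res_i, res_j : residuals (r_ui - b_ui) over users who rated BOTH items
    lam          : shrinkage strength (hypothetical default)
    """
    res_i = np.asarray(res_i, float)
    res_j = np.asarray(res_j, float)
    n = len(res_i)
    rho = (res_i @ res_j) / np.sqrt((res_i @ res_i) * (res_j @ res_j))
    return n / (n + lam) * rho
```

With only 3 common raters and `lam=50`, even a perfect correlation of 1 is shrunk to 3/53: tiny supports carry little evidence.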
Similarity for binary labels
Pearson correlation is not meaningful for binary labels (e.g. views, purchases, clicks)
Jaccard similarity: $s_{ij} = \frac{m_{ij}}{m_i + m_j - m_{ij}}$
Observed / expected ratio: $s_{ij} = \frac{\text{observed}}{\text{expected}} = \frac{m_{ij}}{m_i m_j / m}$
where $m_i$ = users acting on $i$, $m_{ij}$ = users acting on both $i$ and $j$, $m$ = total number of users
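Both binary similarities reduce to set operations on each item's audience. A minimal sketch (function names are ours):

```python
def jaccard(users_i, users_j):
    """Jaccard similarity m_ij / (m_i + m_j - m_ij) between two item audiences."""
    users_i, users_j = set(users_i), set(users_j)
    m_ij = len(users_i & users_j)
    return m_ij / (len(users_i) + len(users_j) - m_ij)

def observed_expected(users_i, users_j, m):
    """Ratio of observed co-occurrence m_ij to the count m_i * m_j / m
    expected if the two items were acted on independently."""
    users_i, users_j = set(users_i), set(users_j)
    m_ij = len(users_i & users_j)
    return m_ij / (len(users_i) * len(users_j) / m)
```

An observed/expected ratio above 1 means the two items co-occur more often than chance would predict.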
Matrix Factorization Methods
Matrix Factorization
[Figure: rating matrix example featuring Moonrise Kingdom]
Idea: pose as (biased) matrix factorization problem
Matrix Factorization
[Figure: users-by-items rating matrix factored into user factors times item factors, a low-rank SVD approximation]
Prediction
[Figure: a missing entry (?) in the rating matrix filled in from the low-rank SVD approximation]
Predict an unseen rating from the inner product of the corresponding user and item factors
SVD with missing values
[Figure: partially observed rating matrix approximated by a low-rank factorization]
Pose as a regression problem over the observed entries; regularize using the Frobenius norm:
$\min_{W,X} \sum_{(u,i)\,\text{observed}} (r_{ui} - x_i^\top w_u)^2 + \lambda\,(\|W\|_F^2 + \|X\|_F^2)$
Alternating Least Squares
[Figure: rating matrix approximated by user factors times item factors]
Alternate: regress $w_u$ given $X$, then regress $x_i$ given $W$
L2: closed form solution $w = (X^\top X + \lambda I)^{-1} X^\top y$ (remember ridge regression?)
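One full ALS sweep can be sketched as two loops of ridge regressions, each restricted to the observed entries (data layout, names, and the toy rank-1 matrix below are illustrative):

```python
import numpy as np

def als_step(R, M, W, X, lam=0.1):
    """One ALS sweep on a partially observed ratings matrix.

    R : (n_users, n_items) ratings;  M : boolean mask of observed entries
    W : (n_users, d) user factors;   X : (n_items, d) item factors
    """
    d = W.shape[1]
    for u in range(R.shape[0]):               # regress w_u given X
        J = M[u]                              # items rated by user u
        A = X[J].T @ X[J] + lam * np.eye(d)
        W[u] = np.linalg.solve(A, X[J].T @ R[u, J])
    for i in range(R.shape[1]):               # regress x_i given W
        U = M[:, i]                           # users who rated item i
        A = W[U].T @ W[U] + lam * np.eye(d)
        X[i] = np.linalg.solve(A, W[U].T @ R[U, i])
    return W, X

# Toy example: a fully observed rank-1 matrix, recovered from random init
rng = np.random.default_rng(0)
R = np.outer([1.0, 2.0, 3.0], [2.0, 1.0, 3.0])
M = np.ones_like(R, dtype=bool)
W, X = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
for _ in range(30):
    W, X = als_step(R, M, W, X, lam=0.01)
```

Each half-sweep is exactly the ridge closed form $(X^\top X + \lambda I)^{-1} X^\top y$, solved per row.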
Stochastic Gradient Descent
[Figure: rating matrix approximated by user factors times item factors]
No need for locking: multiple cores update asynchronously (Recht, Ré, Wright, 2011 - Hogwild!)
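The per-rating SGD update is cheap enough that, as Hogwild observes, cores can apply it without locks and tolerate occasional stale reads. A single-threaded sketch of the update itself (step size and regularization defaults are illustrative):

```python
import numpy as np

def sgd_step(w_u, x_i, r_ui, eta=0.05, lam=0.0):
    """One SGD update on a single observed rating for the objective
    (r_ui - w_u . x_i)^2 + lam * (|w_u|^2 + |x_i|^2).
    Both factors are updated from their old values."""
    e = r_ui - w_u @ x_i                       # prediction error on this rating
    w_new = w_u + eta * (e * x_i - lam * w_u)
    x_new = x_i + eta * (e * w_u - lam * x_i)
    return w_new, x_new

# Repeatedly fitting a single rating drives the prediction towards it
w, x = np.array([0.1, 0.1]), np.array([0.2, 0.1])
for _ in range(2000):
    w, x = sgd_step(w, x, 1.0)
```

Because each rating touches only one row of $W$ and one row of $X$, concurrent updates rarely collide, which is what makes the lock-free scheme work.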
Netflix Prize
Netflix Prize
Training data: 100 million ratings, 480,000 users, 17,770 movies; 6 years of data (2000-2005)
Test data: last few ratings of each user (2.8 million)
Evaluation criterion: Root Mean Square Error (RMSE)
Competition: 2,700+ teams; Netflix's system RMSE: 0.9514; $1 million prize for 10% improvement on Netflix
Improvements: Add biases
[Plot: factor models, RMSE vs. millions of parameters, for NMF, BiasSVD, SVD++ and SVD variants]
Do SGD, but also learn the biases μ, b_u and b_i
Improvements: Who rated what
[Plot: factor models, RMSE vs. millions of parameters, for NMF, BiasSVD, SVD++ and SVD variants]
Account for the fact that ratings are not missing at random
Improvements: Temporal effects
[Plot: factor models, RMSE vs. millions of parameters, for NMF, BiasSVD, SVD++ and SVD variants]
Account for drift in user and item biases
Improvements
[Plot: factor models, RMSE vs. millions of parameters, for NMF, BiasSVD, SVD++ and SVD variants]
Still pretty far from the 0.8563 grand prize target
Winning Solution from BellKor
Last 30 days: June 26th submission triggers 30-day last call
BellKor fends off competitors by a hair