Recommender Systems Collabora2ve Filtering and Matrix Factoriza2on

Recommender Systems Collaborave Filtering and Matrix Factorizaon Narges Razavian Thanks to lecture slides from Alex Smola@CMU Yahuda Koren@Yahoo labs and Bing Liu@UIC

We Know What You Ought To Be Watching This Summer

Amazon.com

score movie user 768 76 68 76 6 6 6 score movie user? 6? 96? 7?? 7??? 8? 9? 7? 69 6? 8 6 Training data Test data An example

Two basic approaches Content- based recommendaons: The user will be recommended items based on profile informaon or similar to the ones the user preferred in the past; Collaborave filtering (or collaborave recommendaons): The user will be recommended items that people with similar tastes and preferences liked in the past. Hybrids: Combine collaborave and content- based methods.

Road Map Introducon Content- based recommenda@on Collaborave filtering based recommendaon K- nearest neighbor Matrix factorizaon 6

Content- Based Recommendaon Recommend items that matches the User Profile. The Profile is based on items user has liked in the past or explicit interests that he defines. A content- based recommender system matches the profile of the item to the user profile to decide on its relevancy to the user. 7

Road Map Introducon Content- based recommendaon Collaborave filtering based recommendaons K- nearest neighbor Matrix factorizaon 8

Collaborave Filtering Idea User Database A 9 B C : : Z A B C 9 : : Z 0 A B C : : Z 7 A B C 8 : : Z A 6 B C : : Z A 0 B C 8.. Z Correlaon Match A 9 B C : : Z A 0 B C 8.. Z

Collaborave filtering Collaborave filtering (CF): most widely- used recommendaon approach in pracce. k- nearest neighbor, matrix factorizaon Key characterisc of CF: it predicts the ulity of items for a user based on the items previously rated by other like- minded users. 0

k- Nearest Neighbor knn : ulizes the enre user- item database to generate predicons directly, i.e., there is no model building. This approach includes both User- based methods Item- based methods Two primary phases: the neighborhood formaon phase and the recommendaon phase.

Neighborhood formaon phase The similarity between the target user, u, and a neighbor, v, can be calculated using the Pearson s correla@on coefficient: r u,i is the rang given to item I by user u. C is the list of items rated by BOTH users, u and v

Recommendaon Phase Then we can compute the rang predicon of item i for target user u where V is the set of k similar users(could be all users), r v,i is the rang of user v given to item i,

Issue with the user- based knn CF Lack of scalability: it requires the real- me comparison of the target user to all user records in order to generate predicons. Any suggesons to improve this? A variaon of this approach that remedies this problem is called item- based CF.

Item- based CF The item- based approach works by comparing items based on their pacern of rangs across users. The similarity of items i and j is computed as follows:

Recommendaon phase Ader compung the similarity between items we select a set of k most similar items to the target item and generate a predicted value of user u s rang where J is the set of k similar items 6

Praccal Issues : Cold Start New user Rate some inial items Non- personalized recommendaons Describe tastes Demographic info. New Item Non- CF : content analysis, metadata

Road Map Introducon Content- based recommendaon Collaborave filtering based recommendaons K- nearest neighbor Matrix factoriza@on 8

Latent factor models serious The Color Purple Amadeus Braveheart Geared towards females Sense and Sensibility Ocean s Lethal Weapon Geared towards males Dave The Princess Diaries The Lion King escapist Independence Day Dumb and Dumber Gus

Latent factor models items. -....6 -... -.... -. -.7..7 - -.9... -..8 -. -.. -... -.. -.7.9. -....7 -.8. -.6.7.8. -..9..7.6 -.. ~ ~ items users users

Esmate unknown rangs as inner- products of factors: items. -....6 -... -.... -. -.7..7 - -.9... -..8 -. -.. -... -.. -.7.9. -....7 -.8. -.6.7.8. -..9..7.6 -.. ~ ~ items users users?

Esmate unknown rangs as inner- products of factors: items. -....6 -... -.... -. -.7..7 - -.9... -..8 -. -.. -... -.. -.7.9. -....7 -.8. -.6.7.8. -..9..7.6 -.. ~ ~ items users. users

Challenges Similar to SVD, but less constrained: Factorize with missing values! Re- define objecve funcon: To avoid over- filng Can use gradient descent to deal with missing values

Stochasc Gradient Descent For each data point, Derivaves on variables (q and p) are used for update: Both p and q are unknown, so we have to alternate Will converge to local opma

Incorporang bias Some users rate movies higher than others Some movies get hyped and get higher rangs The new model: The new objecve funcon Derivaves:

Further modeling assumpons Changing preferences over me? Varying confidence levels in rangs? Other ideas?

Summary Recommendaon based on Content Collaborave filtering Collaborave filtering Neighborhood method Matrix Factorizaon Possible Further topics Hybrid models of content and collaborave to impute missing values and deal with cold start