Seminar Collaborative Filtering KDD Cup Ziawasch Abedjan, Arvid Heise, Felix Naumann
2 Collaborative Filtering
Recommendation systems 3
Recommendation systems 4
Recommendation systems 5
Recommendation systems 6 Content-based recommendations Relevance is based on content Similar items Uses also metadata Collaborative filtering Recommend items suiting one s taste Based on community ratings Serendipity
Collaborative Filtering Algorithms 7 Memory-based CF Item/user-based top N New items and user can be added easily Scales well on dense data sets Limited scalability on sparse data sets Model-based CF Singular Value Decomposition (SVD) Better scalability on sparse data sets Little evidence for the model Hybrid recommenders Content-boosted CF Mixture of memory-based and model-based techniques
Neighbor-based CF 8 Process: filtering -> prediction -> recommendation User-based approach Create cluster of similar users Recommendations depend on ratings of similar users Problem: missing ratings Item-based approach Create item-item matrix based on rating similarities Retrieve top K similar items and aggregating their ratings by the user
Singular Value Decomposition 9 Reduce dimensions of user/item matrix by factorization Use overlapping features (e.g. gender, genre) items features items user r 11 r ij r 1n r m1 r mn = user r i1 f 11 f 12 f m1 f m2 X feature g 11 g 12 g 21 g 22 g 1n g 2n Source: Koren, Yehuda, Bell, Robert & Volinsky, Chris: "Matrix Factorization Techniques for Recommender Systems
10 KDD Cup
Overview 11 ACM SIGKDD Knowledge Discovery and Data Mining Variety of topics 2010 - Student performance evaluation 2008 - Breast cancer 2007 - Consumer recommendations 2004 - Particle physics; plus protein homology prediction 2001 - Molecular bioactivity; plus protein locale prediction 1997 - Direct marketing for lift curve optimization 2 tracks with three places
KDD Cup 2011 12 On Yahoo Music Dataset Artists, Albums, Songs, Genres Track 1: predict user rating Track 2: decide whether a user rates a song or not Album * Song * Genre Artist
Track 1 13 Typical collaboration filtering task Predict the rating of a specific user for unrated songs Includes hierarchy of items User might have rated other songs of same album/artist No user might have rated the song but the same album/artist Includes time stamp of rating Rating behavior might have changed over time Older songs rated differently than newer songs? #Users #Items #Ratings #Train Ratings #Validation Ratings #Test Ratings 1,000,990 624,961 262,810,175 252,800,275 4,003,960 6,005,940
Track 2 14 Predict if a user would rate a given song highly or not at all Need a model for rate behavior Machine learning? Ratings on songs only No timestamps Given six songs, which three will most likely not be rated #Users #Items #Ratings #Train Ratings #Test Ratings #Users 249,012 296,111 62,551,438 61,944,406 607,032 249,012
15 Seminar
Approaches Track 1 16 Koren, Y., Bell, R. & Volinsky, C.: "Matrix Factorization Techniques for Recommender Systems Similar to Singular Value Decomposition Create small abstracted matrix Soft clustering Sarwar, B., Karypis, G., Konstan, J. & Reidl, J.: "Item-based collaborative filtering recommendation algorithms Classic approach Improvements in Robert M. Bell & Yehuda Koren: "Improved Neighborhood-based Collaborative Filtering" Need to be adjusted to include time stamp and hierarchy
Approaches Track 2 17 Kurucz, M., Benczúr, A.A., Kiss, T., Nagy, I., Szabó, A. & Torma, B.: "Who Rated What: a Combination of SVD, Correlation and Frequent Sequence Mining" Combination of four predictions (only two relevant) SVD and base prediction Needs estimation on how many ratings Sueiras, J.: "A classical predictive modeling approach for Task Who rated what? of the KDD CUP 2007" Machine learning (logistic regression) User-specific, movie-related, and user-movie pair features SVD for movie-related
Application 18 Send mail to ziawasch.abedjan@hpi.uni-potsdam.de Subject: [CF Seminar] Deadline: April 18 th Notification: April 19 th Limited to 8 participants = 4 teams Random selection if more applicants Send top 3 wishes on tasks and approaches We are open to other approaches
Time Schedule 19 April 14th: first seminar, topic presentations April 16th: application deadline, team/paper preferences April 17th: team/paper notification April 21th: mandatory consultation April 28th: paper presentation May 12th: initial implementation/idea presentation June 9th: intermediate presentation/project consolidation June 23th: final presentations June 30th: KDD cup submission deadline (results only) July 9th: hand in short paper (6 pages)
Grading 20 3 LP (half semester, project seminar) Paper and final presentations Participation during intermediate presentations / discussions Implementation strategies/proposed extensions Short paper Bonus: good results in KDD cup
Options 21 Mahout or stand-alone Single or combined repository We recommend cooperation Many similar issues
References 22 Official KDD Cup: http://kddcup.yahoo.com/ Su, Xiaoyuan & Khoshgoftaar, Taghi M.: "A survey of collaborative filtering techniques" Koren, Yehuda; Bell, Robert & Volinsky, Chris: "Matrix Factorization Techniques for Recommender Systems" Sarwar, Badrul; Karypis, George, Konstan, Joseph & Reidl, John: "Itembased collaborative filtering recommendation algorithms Improvements in Robert M. Bell & Yehuda Koren: "Improved Neighborhood-based Collaborative Filtering Miklós Kurucz, András A. Benczúr, Tamás Kiss, István Nagy, Adrienn Szabó & Balázs Torma: "Who Rated What: a Combination of SVD, Correlation and Frequent Sequence Mining" Jorge Sueiras: "A classical predictive modeling approach for Task Who rated what? of the KDD CUP 2007