Recommendation Systems

Similar documents
CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Data Mining Techniques

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

CS 124/LINGUIST 180 From Languages to Information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

COMP 465: Data Mining Recommender Systems

Introduction to Data Mining

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Performance Comparison of Algorithms for Movie Rating Estimation

Recommender Systems 6CCS3WSN-7CCSMWAL

CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp

BBS654 Data Mining. Pinar Duygulu

General Instructions. Questions

CptS 570 Machine Learning Project: Netflix Competition. Parisa Rashidi Vikramaditya Jakkula. Team: MLSurvivors. Wednesday, December 12, 2007

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

Applying Supervised Learning

CSE 158. Web Mining and Recommender Systems. Midterm recap

Recommender Systems New Approaches with Netflix Dataset

Opportunities and challenges in personalization of online hotel search

Recommendation and Advertising. Shannon Quinn (with thanks to J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)

Mining Web Data. Lijun Zhang

COMP6237 Data Mining Making Recommendations. Jonathon Hare

CS224W Project: Recommendation System Models in Product Rating Predictions

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Yelp Recommendation System

Machine Learning using MapReduce

Collaborative Filtering for Netflix

Using Social Networks to Improve Movie Rating Predictions

Lecture on Modeling Tools for Clustering & Regression

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

CSC 411 Lecture 18: Matrix Factorizations

Recommender Systems - Content, Collaborative, Hybrid

Introduction. Chapter Background Recommender systems Collaborative based filtering

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

By Atul S. Kulkarni Graduate Student, University of Minnesota Duluth. Under The Guidance of Dr. Richard Maclin

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

Singular Value Decomposition, and Application to Recommender Systems

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

Collaborative Filtering Applied to Educational Data Mining

CS249: ADVANCED DATA MINING

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

Machine Learning and Data Mining. Collaborative Filtering & Recommender Systems. Kalev Kask

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

CS 572: Information Retrieval

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods

Recommender Systems (RSs)

Link Prediction for Social Network

Advances in Collaborative Filtering

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

Jeff Howbert Introduction to Machine Learning Winter

arxiv: v4 [cs.ir] 28 Jul 2016

CS570: Introduction to Data Mining

node2vec: Scalable Feature Learning for Networks

Variational Bayesian PCA versus k-nn on a Very Sparse Reddit Voting Dataset

CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Data Mining. Lecture 03: Nearest Neighbor Learning

Part 11: Collaborative Filtering. Francesco Ricci

Predicting User Ratings Using Status Models on Amazon.com

Supervised Learning: K-Nearest Neighbors and Decision Trees

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

Gene Clustering & Classification

Hotel Recommendation Based on Hybrid Model

Orange3 Data Fusion Documentation. Biolab

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

CSE 573: Artificial Intelligence Autumn 2010

Assignment 5: Collaborative Filtering

Web Personalization & Recommender Systems

Recommender Systems. Master in Computer Engineering Sapienza University of Rome. Carlos Castillo

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

Recommender Systems. Techniques of AI

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

Deep Learning for Recommender Systems

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CS294-1 Assignment 2 Report

Collaborative Filtering based on User Trends

CSE 158 Lecture 2. Web Mining and Recommender Systems. Supervised learning Regression

Akarsh Pokkunuru EECS Department Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

Collaborative Filtering and Recommender Systems. Definitions. .. Spring 2009 CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

Missing Data. Where did it go?

Recommendation Systems CS 534: Machine Learning Slides adapted from Alex Smola, Jure Leskovec, Anand Rajaraman, Jeff Ullman, Lester Mackey, Dietmar Jannach, and Gerhard Friedrich

Recommender Systems (RecSys). Search vs. recommendations. Items: products, web sites, blogs, news items,

RecSys is Everywhere. A system that provides or suggests items to end users.

Long Tail Phenomenon Source: Chris Anderson (2004)

Physical vs Online Presence

RecSys: Tasks. Task 1: predict the user rating. Task 2: top-N recommendation.

RecSys: Paradigms. Personalized recommendation: collaborative, content-based, knowledge-based.

RecSys: Evolution. Item hierarchy: you bought a printer, so you will also need ink. Collaborative filtering & user-user similarity: people like you who bought beer also bought diapers. Social + interest graph based: your friends like Lady Gaga, so you will like Lady Gaga. Attribute based: you like action movies starring Clint Eastwood, so you will also like The Good, the Bad and the Ugly. Collaborative filtering & item-item similarity: you liked The Godfather, so you will like Scarface. Model based: training SVM, LDA, or SVD for implicit features.

RecSys: Basic Techniques
Collaborative. Pros: no knowledge engineering effort; serendipity of results; learns market segments. Cons: requires some form of rating feedback; cold start for new users and new items.
Content-based. Pros: no community required; comparison between items possible. Cons: content descriptions necessary; cold start for new users; no surprises.
Knowledge-based. Pros: deterministic recommendations; assured quality; no cold start; can resemble a sales dialogue. Cons: knowledge engineering effort to bootstrap; basically static; does not react to short-term trends.

RecSys: Challenges. Scalability: millions of objects and users. Cold start. Changing user base. Changing inventory (movies, stories, goods). Changing attributes. Imbalanced dataset: user activity and item reviews are power-law distributed.

Netflix Prize: $1 M (2006-2009)

Netflix Movie Recommendation. Training data: 480,000 users, 17,700 movies, 6 years of data (2000-2005). Test data: the most recent ratings of each user.

Evaluation Metrics. Evaluate error on an unseen test set $S$, not on training error. Root mean square error: $\mathrm{RMSE} = \sqrt{\frac{1}{|S|}\sum_{(i,u)\in S}(\hat{r}_{ui} - r_{ui})^2}$. Mean absolute error: $\mathrm{MAE} = \frac{1}{|S|}\sum_{(i,u)\in S}|\hat{r}_{ui} - r_{ui}|$. Rank-based objectives (e.g., what fraction of the true top-10 preferences are in the predicted top 10?).
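As a concrete illustration, here is a minimal NumPy sketch of the two error metrics; the predicted and true ratings are invented toy values:

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error over the (i, u) pairs of a test set."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return np.sqrt(np.mean((pred - actual) ** 2))

def mae(pred, actual):
    """Mean absolute error over the same pairs."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return np.mean(np.abs(pred - actual))

# Toy example: predicted vs. true ratings for four test pairs.
p = [3.5, 4.0, 2.0, 5.0]
a = [4.0, 4.0, 1.0, 4.0]
print(rmse(p, a))  # sqrt((0.25 + 0 + 1 + 1) / 4) = 0.75
print(mae(p, a))   # (0.5 + 0 + 1 + 1) / 4 = 0.625
```

Note how RMSE penalizes the two one-star misses more heavily than MAE does, which is why the two metrics can rank models differently.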

Netflix Prize. Evaluation criterion: RMSE. Cinematch (Netflix's own system) RMSE: 0.9514. Competition: 2,700+ teams; $1M prize for a 10% improvement over Cinematch.

Netflix Winner: BellKor Multi-scale modeling of the data: Global: Overall deviations of users & movies Factorization: Regional effects Collaborative filtering: Extract local patterns

Normalization / Global Bias. Mean movie rating across all movies. Some users tend to give higher ratings than others. Some movies tend to receive higher ratings than others.

Example: Global & Local Effects. Global effect: the mean movie rating is 3.7 stars; The Sixth Sense is 0.5 stars above average; Joe rates 0.2 stars below average. Baseline estimate: 3.7 + 0.5 - 0.2 = 4 stars. Local effect: Joe didn't like the related movie Signs. Final estimate: 3.8 stars.
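The arithmetic of this example can be spelled out directly (all numbers taken from the slide):

```python
# Baseline estimate: global mean plus movie and user deviations.
mu = 3.7       # mean rating across all movies
b_movie = 0.5  # The Sixth Sense is rated 0.5 stars above average
b_user = -0.2  # Joe rates 0.2 stars below average

baseline = mu + b_movie + b_user   # 4.0 stars

# Local (neighborhood) effect: the slide's 0.2-star downward
# correction because Joe didn't like the related movie Signs.
final = baseline - 0.2
print(round(final, 1))  # 3.8
```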

Netflix Performance (RMSE). Global average: 1.1296. User average: 1.0651. Movie average: 1.0533. Cinematch: 0.9514.

Neighborhood Methods: Basic Idea

Review: k-NN. Examine the k closest training data points to the new point x. "Closest" depends on the distance metric used. Assign the object the most frequently occurring class (majority vote) or, for regression, the average value. http://cs.nyu.edu/~dsontag/courses/ml13/slides/lecture11.pdf

k-NN: User-based. Intuition: similar users will rate the item similarly. Represent each user as an incomplete vector of item ratings. Find the set N of users whose ratings are most similar to Joe's. Estimate Joe's ratings based on the ratings of the users in N.

k-NN: User-based. What is the right distance metric? Jaccard similarity, $D(x, y) = \frac{|x \cap y|}{|x \cup y|}$, ignores the values of the ratings. Cosine similarity, $D(x, y) = \frac{x \cdot y}{\|x\|_2 \|y\|_2}$, treats missing ratings as negative (zero). Pearson correlation coefficient over the co-rated items $S_{xy}$: $D(x, y) = \frac{\sum_{s \in S_{xy}} (x_s - \bar{x})(y_s - \bar{y})}{\sqrt{\sum_{s \in S_{xy}} (x_s - \bar{x})^2}\,\sqrt{\sum_{s \in S_{xy}} (y_s - \bar{y})^2}}$.
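These three measures can be sketched on sparse {item: rating} vectors; the two example users and their ratings below are made up for illustration:

```python
import math

def jaccard(x, y):
    """Jaccard similarity of the *sets* of rated items (ignores rating values)."""
    xs, ys = set(x), set(y)
    return len(xs & ys) / len(xs | ys)

def cosine(x, y):
    """Cosine similarity, treating missing ratings as zeros."""
    dot = sum(x[s] * y[s] for s in x if s in y)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny)

def pearson(x, y):
    """Pearson correlation computed over the co-rated items S_xy only."""
    common = [s for s in x if s in y]
    mx = sum(x[s] for s in common) / len(common)
    my = sum(y[s] for s in common) / len(common)
    num = sum((x[s] - mx) * (y[s] - my) for s in common)
    den = (math.sqrt(sum((x[s] - mx) ** 2 for s in common))
           * math.sqrt(sum((y[s] - my) ** 2 for s in common)))
    return num / den

# Two users as sparse {item: rating} vectors (invented ratings).
joe = {"Matrix": 5, "Signs": 1, "Alien": 4}
ann = {"Matrix": 4, "Signs": 2, "Alien": 5, "Titanic": 5}
print(jaccard(joe, ann))  # 3 shared items out of 4 distinct = 0.75
```

Jaccard sees only which items were rated, cosine mixes in the magnitudes (and implicitly scores unrated items as zero), and Pearson centers each user's co-ratings, which removes per-user rating bias.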

k-nn: Item-based Intuition: Users rate similar items similarly Represent each item as incomplete vector of user ratings Find other similar items Estimate rating for item based on ratings for similar items

k-NN: Item-based. [Figure: a 6 movies x 12 users utility matrix with sparse ratings 1-5 and one missing entry "?" to predict.]

k-NN: Item-based. [Figure: the same utility matrix with a similarity column sim(1, m) giving each movie's similarity to movie 1: sim(1,2) = -0.18, sim(1,3) = 0.41, sim(1,4) = -0.10, sim(1,5) = -0.31, sim(1,6) = 0.59.] Use a weighted average over the most similar movies to predict the missing rating.
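The weighted-average step can be sketched in plain Python, reusing the sim(1, m) values from the slide; the user's ratings of the other movies are illustrative:

```python
def predict(user_ratings, sims, k=2):
    """Predict the target item's rating as a weighted average of the user's
    ratings of the k most similar items (positive similarities only)."""
    neighbors = sorted(
        ((sims[j], r) for j, r in user_ratings.items() if sims.get(j, 0) > 0),
        reverse=True)[:k]
    num = sum(s * r for s, r in neighbors)
    den = sum(s for s, _ in neighbors)
    return num / den

# Similarities of movies 2..6 to movie 1 (from the slide) and one
# user's ratings of some of those movies (made up for the example).
sims = {2: -0.18, 3: 0.41, 4: -0.10, 5: -0.31, 6: 0.59}
user = {2: 4, 3: 2, 6: 3}
print(round(predict(user, sims, k=2), 2))  # (0.41*2 + 0.59*3) / (0.41 + 0.59) = 2.59
```

Dropping negative similarities and keeping only the top k neighbors is one common convention; production systems also subtract baseline estimates before averaging.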

k-NN: Advantages. Intuitive interpretation: you will like what your neighbors like. Easy to implement, with zero training time. No feature selection needed; works for any kind of item.

k-NN: Disadvantages. Cold start: need enough users in the system to find a match; new items and esoteric items may not have any ratings. Sparse, high-dimensional similarity search is not easy. Tends to recommend popular items. Need to store all item or user vectors in memory.

Netflix Performance (RMSE). Global average: 1.1296. User average: 1.0651. Movie average: 1.0533. Basic k-NN: 0.94. k-NN + bias + learned weights: 0.91. Cinematch: 0.9514.

Review: Dimensionality Reduction. Generate a low-dimensional encoding of a high-dimensional space. Purposes: data compression / visualization; robustness to noise and uncertainty; potentially easier interpretation.

Review: Matrix Factorization. The data matrix X (samples x features) is factorized as X ≈ WH, where W contains the regressors (activation coefficients, expansion coefficients) and H the dictionary (patterns, topics, basis, explanatory variables).

Dimensionality Reduction

Review: SVD. Any matrix can be decomposed using the singular value decomposition (SVD): $X_{n \times p} = U_{n \times p} D_{p \times p} V^\top_{p \times p}$, where U has orthonormal columns (the normalized PC scores), V has orthonormal columns (the principal components), and D is a diagonal matrix; squaring each diagonal element and dividing by n gives the variance explained by each component.
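The decomposition and its properties can be checked numerically; a small NumPy sketch on random, column-centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 4
X = rng.standard_normal((n, p))  # data matrix: n samples, p features
X = X - X.mean(axis=0)           # center columns so variances match PCA

U, d, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(d) V^T

assert np.allclose(X, U @ np.diag(d) @ Vt)  # the factorization is exact
assert np.allclose(U.T @ U, np.eye(p))      # U: orthonormal columns (normalized PC scores)
assert np.allclose(Vt @ Vt.T, np.eye(p))    # V: orthonormal columns (principal components)

# Squared singular values divided by n give the variance explained per component.
var_explained = d ** 2 / n
print(var_explained.round(3))
```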

SVD to MF. Create two new matrices (a user matrix and an item matrix) by distributing the square root of the singular values to each side: $P = U D^{1/2}$ and $Q = V D^{1/2}$. Interpretation: $p_u$ indicates how much user u likes each latent factor f; $q_i$ indicates the contribution of item i to each of the latent factors f.

RecSys: SVD. SVD is attractive because it minimizes SSE, which is monotonically related to RMSE. But conventional SVD is undefined for missing entries. A missing rating could be treated as a zero rating, but is that right?

RecSys: SVD. One idea: expectation maximization as a form of imputation. Fill in the unknown entries with a best guess, apply SVD, and repeat. This can be expensive, and inaccurate imputation can distort the data.
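A sketch of this fill / decompose / repeat loop; the toy ratings matrix, the rank, and the iteration count are all chosen arbitrarily for illustration:

```python
import numpy as np

def svd_impute(R, mask, rank=2, iters=20):
    """EM-style imputation: fill the unknown entries with a guess, take a
    truncated SVD, refill the unknowns from the reconstruction, repeat."""
    filled = np.where(mask, R, R[mask].mean())  # initial guess: global mean
    for _ in range(iters):
        U, d, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * d[:rank]) @ Vt[:rank]  # rank-k reconstruction
        filled = np.where(mask, R, approx)  # keep observed, update missing
    return filled

# Tiny demo: a ratings matrix with three missing entries (mask False = missing).
R = np.array([[5., 4., 1., 0.],
              [4., 5., 0., 1.],
              [1., 0., 5., 4.]])
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 1],
                 [1, 0, 1, 1]], bool)
print(np.round(svd_impute(R, mask), 2))
```

Observed entries are clamped back after every iteration, so only the missing cells move; as the slide warns, a bad initial guess can still pull the low-rank fit in the wrong direction.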

SVD w/ Missing Values. New idea: model only the observed entries and avoid overfitting via regularization. Two methods for solving the new model: stochastic gradient descent, and alternating least squares, which is easier to parallelize (each $q_i$ is independent) and more memory efficient.
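A minimal stochastic-gradient sketch of the "model only the observed entries, regularize" idea; the hyperparameters and the toy (user, item, rating) triples are invented, and a real run would also fold in the bias terms:

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.1, epochs=500, seed=0):
    """Minimize sum over observed (u, i) of (r_ui - p_u . q_i)^2
    + reg * (||p_u||^2 + ||q_i||^2), by SGD on one rating at a time."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors p_u
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors q_i
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]       # prediction error on this entry
            pu = P[u].copy()            # update both factors from old values
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Only these (user, item, rating) triples are modeled; all else is missing.
obs = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 1)]
P, Q = mf_sgd(obs, n_users=3, n_items=3)
print(round(float(P[0] @ Q[0]), 2))  # close to the observed rating of 5
```

The regularizer shrinks the fitted values slightly below the observed ratings, which is exactly the overfitting control the slide describes; ALS would instead solve for all P rows and all Q rows in alternating closed-form steps.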

Netflix Results: Latent Factors. [Figure: movies plotted on two latent-factor axes, one running from lowbrow comedies and horror films for a male or adolescent audience to serious drama or comedy with strong female leads, the other from mainstream, formulaic films to independent, critically acclaimed, quirky films.]

SVD with Bias. Baseline predictor: separates users and movies; benefits from insights into users' behavior; among the main practical contributions of the competition. User-movie interaction: characterizes the matching between users and movies; attracts most research in the field; benefits from algorithmic and mathematical innovations.

Netflix Performance (RMSE). Global average: 1.1296. User average: 1.0651. Movie average: 1.0533. Basic k-NN: 0.94. k-NN + bias + learned weights: 0.91. SVD: 0.90. SVD w/ bias: 0.89. Cinematch: 0.9514.

Implicit Feedback. We may have access to binary information reflecting implicit user preferences, e.g., is a movie in a user's queue? The test set can be such a source: we know that user u rated item i, we just don't know the rating. The data is not missing at random: the very fact that a user rated an item provides information.

Netflix: Temporal Bias

Netflix: Temporal Bias. Items: grow and fade in popularity; selection bias for the time of viewing. Users: seasonal effects; changed review labels; public perception (Oscars, SAG, etc.); anchoring (relative to the previous movie).

Temporal SVD. Linear modeling of user biases, movie-related temporal effects, and single-day effects (sudden spikes).

Netflix Performance (RMSE). Global average: 1.1296. User average: 1.0651. Movie average: 1.0533. Basic k-NN: 0.94. k-NN + bias + learned weights: 0.91. SVD: 0.90. SVD w/ bias: 0.89. SVD w/ bias and time: 0.876. Cinematch: 0.9514.

Netflix Results: RMSE. [Figure: RMSE decreases as bias terms, implicit feedback, and temporal effects are added to the model.]

Final Solution: Kitchen Sink Approach

Winning Solution. Beat Cinematch by 10.06% in RMSE. Tied with another team, but won because it was submitted 20 minutes earlier. Computationally intensive and impractical.

Many More Ideas. Cold start (new users). Different regularization for different parameter groups and different users. Sharing of statistical strength between users. Hierarchical matrix co-clustering / factorization. Incorporating social networks, user profiles, and item profiles.

RecSys: Challenges. Relevant objectives: predicting the actual rating may be useless! We may care more about the ranking of items. Missing-at-random assumption: how can our models capture the information in which items users choose to rate? Handling users and items with few ratings.

RecSys: Challenges. Multiple individuals using the same account obscures individual preferences. Preference versus intention: distinguish between liking and being interested in seeing / purchasing; it is worthless to recommend an item a user already has. Scalability: simple and parallelizable algorithms are preferred.