CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems

Similar documents
CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

Singular Value Decomposition, and Application to Recommender Systems

Mining Human Trajectory Data: A Study on Check-in Sequences. Xin Zhao Renmin University of China,

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

A probabilistic model to resolve diversity-accuracy challenge of recommendation systems

Mining Web Data. Lijun Zhang

Data Mining Techniques

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Mining Web Data. Lijun Zhang

Link Prediction for Social Network

A Brief Review of Representation Learning in Recommender 赵鑫 RUC

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Opportunities and challenges in personalization of online hotel search

ECS289: Scalable Machine Learning

Slides for Data Mining by I. H. Witten and E. Frank

Data Mining Techniques

MIT Samberg Center Cambridge, MA, USA. May 30th - June 2nd, by C. Rea, R.S. Granetz MIT Plasma Science and Fusion Center, Cambridge, MA, USA

Information Retrieval. (M&S Ch 15)

CSE 158 Lecture 2. Web Mining and Recommender Systems. Supervised learning Regression

Recommender Systems. Collaborative Filtering & Content-Based Recommending

COMP6237 Data Mining Making Recommendations. Jonathon Hare

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

Data Mining Concepts & Tasks

CS249: ADVANCED DATA MINING

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi

STREAMING RANKING BASED RECOMMENDER SYSTEMS

CSE255 Assignment 1 Improved image-based recommendations for what not to wear dataset

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

A Constrained Spreading Activation Approach to Collaborative Filtering

Data Mining Lecture 2: Recommender Systems

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

Recommendation Systems

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

CSE 417 Network Flows (pt 4) Min Cost Flows

Recommender Systems - Introduction. Data Mining Lecture 2: Recommender Systems

CS145: INTRODUCTION TO DATA MINING

Lecture 25: Review I

Spatial Localization and Detection. Lecture 8-1

Data Warehouse and Data Mining

Tutorials Case studies

node2vec: Scalable Feature Learning for Networks

Extracting Rankings for Spatial Keyword Queries from GPS Data

System For Product Recommendation In E-Commerce Applications

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Knowledge Discovery and Data Mining 1 (VO) ( )

Data Mining Concepts & Tasks

Automatic Identification of User Goals in Web Search [WWW 05]

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

Introduction to Automated Text Analysis. bit.ly/poir599

AI Dining Suggestion App. CS 297 Report Bao Pham ( ) Advisor: Dr. Chris Pollett

Introduction to Data Mining

Evaluating Classifiers

Prowess Improvement of Accuracy for Moving Rating Recommendation System

Music Recommendation with Implicit Feedback and Side Information

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

Recommendation System for Location-based Social Network CS224W Project Report

CS249: ADVANCED DATA MINING

CS 124/LINGUIST 180 From Languages to Information

Scalable Network Analysis

Machine Learning Techniques for Data Mining

Learning video saliency from human gaze using candidate selection

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

Learning Graph-based POI Embedding for Location-based Recommendation

Data Mining: Concepts and Techniques. Chapter 9 Classification: Support Vector Machines. Support Vector Machines (SVMs)

Tree-based methods for classification and regression

Evaluating Classifiers

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

Recommender Systems 6CCS3WSN-7CCSMWAL

A Neuro Probabilistic Language Model Bengio et. al. 2003

GOOGLE'S MOST-SEARCHED ONLINE PRODUCTS AND SERVICES JULY Perfect Search Media

Topology Inference from Co-Occurrence Observations. Laura Balzano and Rob Nowak with Michael Rabbat and Matthew Roughan

Spatial Index Keyword Search in Multi- Dimensional Database

Information Retrieval: Retrieval Models

Machine Learning using MapReduce

Multimodal topic model for texts and images utilizing their embeddings

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

A Constrained Spreading Activation Approach to Collaborative Filtering

Machine Learning 13. week

Project 3 Q&A. Jonathan Krause

Linear Classification and Perceptron

Every Picture Tells a Story: Generating Sentences from Images

Diversity in Recommender Systems Week 2: The Problems. Toni Mikkola, Andy Valjakka, Heng Gui, Wilson Poon

Review on Techniques of Collaborative Tagging

Part I: Data Mining Foundations

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Predicting your Next Stop-Over from Location-Based Social Network Data with Recurrent Neural Networks RecTour workshop 2017 RecSys 2017, Como, Italy

Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle

CSCI 5417 Information Retrieval Systems. Jim Martin!

Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle

Relevance Feedback and Query Reformulation. Lecture 10 CS 510 Information Retrieval on the Internet Thanks to Susan Price. Outline

CSEP 573: Artificial Intelligence

2. Design Methodology

CS 124/LINGUIST 180 From Languages to Information

Part 11: Collaborative Filtering. Francesco Ricci

Transcription:

CSE 258 Web Mining and Recommender Systems Advanced Recommender Systems

This week Methodological papers: Bayesian Personalized Ranking; Factorizing Personalized Markov Chains; Personalized Ranking Metric Embedding; Translation-based Recommendation

This week Goals:

This week Application papers (Wednesday): Recommending Product Sizes to Customers; Playlist Prediction via Metric Embedding; Efficient Natural Language Response Suggestion for Smart Reply; Personalized Itinerary Recommendation with Queuing Time Awareness; Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

This week We (hopefully?) know enough by now to read academic papers on recommender systems and understand most of the models and evaluations used. See also CSE291

Bayesian Personalized Ranking

Bayesian Personalized Ranking Goal: Estimate a personalized ranking function for each user

Bayesian Personalized Ranking Why? Compare to the traditional approach of replacing missing values by 0. But 0s aren't necessarily negative!

Bayesian Personalized Ranking Why? Compare to traditional approach of replacing missing values by 0: This suggests a possible solution based on ranking

Bayesian Personalized Ranking Defn: AUC (for a user u), based on a scoring function that compares an item i to an item j for a user u. The AUC essentially counts how many times the model correctly identifies that u prefers the item they bought (positive feedback) over the item they did not

Bayesian Personalized Ranking Defn: AUC (for a user u). AUC = 1: we always guess correctly among two potential items i and j. AUC = 0.5: we guess no better than random

Bayesian Personalized Ranking Defn: AUC = Area Under the ROC Curve

Bayesian Personalized Ranking Summary: Goal is to count how many times we identified i as being more preferable than j for a user u

Bayesian Personalized Ranking Idea: Replace the counting function with a smooth function, where the scoring function is any function that compares the compatibility of i and j for a user u; e.g. it could be based on matrix factorization:

Bayesian Personalized Ranking Idea: Replace the counting function by a smooth function

Bayesian Personalized Ranking Experiments: Rossmann (online drug store) Netflix (treated as a binary problem)

Bayesian Personalized Ranking Experiments:

Bayesian Personalized Ranking Morals of the story: Given a one-class prediction task (like purchase prediction) we might want to optimize a ranking function rather than trying to factorize a matrix directly. The AUC is one such measure: it counts, across users u, items they consumed i, and items they did not consume j, how often we correctly guessed that i was preferred by u. We can optimize this approximately by maximizing a smooth (sigmoid) approximation of the counting function
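The idea above can be sketched in a few lines of numpy. This is a minimal, illustrative BPR-MF implementation, not the paper's code: dimensions, learning rate, and variable names are my own, and the compatibility is x(u,i,j) = gamma_u . (gamma_i - gamma_j), trained by stochastic gradient ascent on ln sigmoid(x(u,i,j)).

```python
import numpy as np

# Minimal BPR-MF sketch (illustrative hyperparameters and names).
rng = np.random.default_rng(0)
n_users, n_items, k = 10, 20, 5
gamma_user = rng.normal(scale=0.1, size=(n_users, k))  # user factors
gamma_item = rng.normal(scale=0.1, size=(n_items, k))  # item factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(u, i, j, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigmoid(x(u,i,j)): u consumed i but not j."""
    gu, gi, gj = gamma_user[u].copy(), gamma_item[i].copy(), gamma_item[j].copy()
    d = sigmoid(-(gu @ (gi - gj)))  # derivative of ln sigmoid at x(u,i,j)
    gamma_user[u] += lr * (d * (gi - gj) - reg * gu)
    gamma_item[i] += lr * (d * gu - reg * gi)
    gamma_item[j] += lr * (-d * gu - reg * gj)

def auc_u(u, consumed, not_consumed):
    """Fraction of (consumed, non-consumed) pairs ranked correctly for u."""
    scores = gamma_item @ gamma_user[u]
    return np.mean([scores[i] > scores[j] for i in consumed for j in not_consumed])
```

Iterating `bpr_step` over sampled (u, i, j) triples drives the per-user AUC on the training pairs toward 1.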

Factorizing Personalized Markov Chains for Next-Basket Recommendation

Factorizing Personalized Markov Chains for Next-Basket Recommendation Goal: build temporal models just by looking at the item the user purchased previously

Factorizing Personalized Markov Chains for Next-Basket Recommendation Assumption: all of the information contained by temporal models is captured by the previous action; this is what's known as a first-order Markov property

Factorizing Personalized Markov Chains for Next-Basket Recommendation Is this assumption realistic?

Factorizing Personalized Markov Chains for Next-Basket Recommendation Data setup: Rossmann basket data

Factorizing Personalized Markov Chains for Next-Basket Recommendation Prediction task:

Factorizing Personalized Markov Chains for Next-Basket Recommendation Could we try to compute such probabilities just by counting? Seems okay, as long as the item vocabulary is small (I^2 possible item/item combinations to count). But it's not personalized

Factorizing Personalized Markov Chains for Next-Basket Recommendation What if we try to personalize? Now we would have U*I^2 counts to compare. Clearly not feasible, so we need to estimate/model this quantity (e.g. by matrix factorization)

Factorizing Personalized Markov Chains for Next-Basket Recommendation What if we try to personalize?

Factorizing Personalized Markov Chains for Next-Basket Recommendation Prediction task:

Factorizing Personalized Markov Chains for Next-Basket Recommendation F@5. FMC: not personalized; MF: personalized, but not sequentially aware

Factorizing Personalized Markov Chains for Next-Basket Recommendation Morals of the story: We can improve performance by modeling third-order interactions between the user, the item, and the previous item. This is simpler than temporal models but makes a big assumption. Given the blowup in the interaction space, this can be handled by tensor decomposition techniques
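The factorization idea can be sketched as follows (a hedged illustration; factor-matrix names and sizes are mine): instead of U*I^2 explicit counts, the (user, next item, previous item) interaction is decomposed into three pairwise inner products.

```python
import numpy as np

# FPMC-style scoring sketch (illustrative dimensions and names).
rng = np.random.default_rng(1)
n_users, n_items, k = 5, 8, 4
V_UI = rng.normal(size=(n_users, k))  # user factors (vs. item)
V_IU = rng.normal(size=(n_items, k))  # item factors (vs. user)
V_IL = rng.normal(size=(n_items, k))  # item factors (vs. previous item)
V_LI = rng.normal(size=(n_items, k))  # previous-item factors (vs. item)
V_UL = rng.normal(size=(n_users, k))  # user factors (vs. previous item)
V_LU = rng.normal(size=(n_items, k))  # previous-item factors (vs. user)

def fpmc_score(u, i, l):
    """Score of item i as u's next purchase, given previous item l."""
    return (V_UI[u] @ V_IU[i]     # long-term user preference
          + V_IL[i] @ V_LI[l]     # sequential (Markov) effect
          + V_UL[u] @ V_LU[l])    # user/previous-item term (constant in i)

def next_item_ranking(u, l):
    """Rank all items as candidates for u's next purchase after l."""
    return sorted(range(n_items), key=lambda i: -fpmc_score(u, i, l))
```

Note that the user/previous-item term does not depend on the candidate i, so it has no effect on the ranking; it is included only for completeness.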

Personalized Ranking Metric Embedding for Next New POI Recommendation

Personalized Ranking Metric Embedding for Next New POI Recommendation Goal: Can we build better sequential recommendation models by using the idea of metric embeddings (vs. inner products)?

Personalized Ranking Metric Embedding for Next New POI Recommendation Why would we expect this to work (or not)?

Personalized Ranking Metric Embedding for Next New POI Recommendation Otherwise, goal is the same as the previous paper:

Personalized Ranking Metric Embedding for Next New POI Recommendation Data

Personalized Ranking Metric Embedding for Next New POI Recommendation Qualitative analysis

Personalized Ranking Metric Embedding for Next New POI Recommendation Qualitative analysis

Personalized Ranking Metric Embedding for Next New POI Recommendation Basic model (not personalized)

Personalized Ranking Metric Embedding for Next New POI Recommendation Personalized version

Personalized Ranking Metric Embedding for Next New POI Recommendation Learning

Personalized Ranking Metric Embedding for Next New POI Recommendation Results

Personalized Ranking Metric Embedding for Next New POI Recommendation Morals of the story: In some applications, metric embeddings might be better than inner products Examples could include geographical data, but also others (e.g. playlists?)
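A PRME-style score can be sketched like this (a hedged illustration; names, dimensions, and the default alpha are mine): candidate next POIs are ranked by a weighted sum of squared Euclidean distances in a user-preference space and a sequential-transition space, with a hyperparameter balancing the two components.

```python
import numpy as np

# PRME-style personalized metric score sketch (illustrative setup).
rng = np.random.default_rng(2)
n_users, n_items, k = 5, 8, 3
X_P_user = rng.normal(size=(n_users, k))  # users in preference space
X_P_item = rng.normal(size=(n_items, k))  # items in preference space
X_S_item = rng.normal(size=(n_items, k))  # items in sequential space

def prme_distance(u, l, i, alpha=0.5):
    """Distance-based score (smaller = better) for next POI i,
    given user u and previous POI l."""
    d_pref = np.sum((X_P_user[u] - X_P_item[i]) ** 2)  # user-item distance
    d_seq = np.sum((X_S_item[l] - X_S_item[i]) ** 2)   # prev-item distance
    return alpha * d_pref + (1 - alpha) * d_seq

def rank_next_pois(u, l, alpha=0.5):
    """Rank candidates by ascending distance (best first)."""
    return sorted(range(n_items), key=lambda i: prme_distance(u, l, i, alpha))
```

Compared to the inner-product models above, distances obey the triangle inequality, which is arguably a better fit for geographical transitions.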

Translation-based Recommendation

Translation-based Recommendation Goal: e.g., which movie is this user going to watch next, given their viewing history? Want models that consider the characteristics/preferences of each user and the local context, i.e., the last consumed item(s)

Translation-based Recommendation Goal: e.g., which movie is this user going to watch next, given their viewing history? Option 1: Matrix Factorization

Translation-based Recommendation Goal: e.g., which movie is this user going to watch next, given their viewing history? Option 2: Markov Chains

Translation-based Recommendation Idea: Considering the two simultaneously means modeling the interactions between a user and adjacent items: a user transition from the previous item to the next item

Translation-based Recommendation Compare: Factorized Personalized Markov Chains (earlier today): a user preference term plus a local context term

Translation-based Recommendation Compare: Personalized Ranking Metric Embedding (earlier today): an additional hyperparameter balances the two components

Translation-based Recommendation Compare: Hierarchical Representation Model (HRM) Wang et al., 2015 (earlier today) average/max pooling, etc.

Translation-based Recommendation Compare: Hierarchical Representation Model (HRM) Wang et al., 2015 (earlier today): average/max pooling, etc. Goal: Try to get the best of both worlds, by modeling third-order interactions and using metric embeddings

Translation-based Recommendation Detour: translation models in knowledge bases. Data: entities; links (multiple types of relationships). State-of-the-art method: model relationships as translations. Goal: predict unseen links. Training example: (entity h, relation r, entity t). Basic idea: h + r ≈ t in the embedding space. E.g. [Bordes et al., 2013], [Wang et al., 2014], [Lin et al., 2015]

Translation-based Recommendation Embedding space: items as points; users as translation vectors. Training triplet: (previous item, user, next item). Objective:

Translation-based Recommendation Embedding space: items as points; users as translation vectors

Translation-based Recommendation Benefits: uses metric embeddings; models (u, i, j) with a single component (plus an item bias); recommendations can be made by a simple nearest-neighbor (NN) search
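This scoring rule can be sketched as follows (a hedged illustration; names and dimensions are mine): items live at points, each user is a translation vector applied to the previous item, and candidates are ranked by an item bias minus squared distance, so recommendation reduces to a nearest-neighbor search around the translated point.

```python
import numpy as np

# TransRec-style scoring sketch (illustrative setup).
rng = np.random.default_rng(3)
n_users, n_items, k = 4, 10, 3
gamma = rng.normal(size=(n_items, k))          # items as points
t = rng.normal(scale=0.1, size=(n_users, k))   # users as translation vectors
beta = rng.normal(scale=0.1, size=n_items)     # item biases

def transrec_score(u, i, j):
    """Score of j as the next item after i for user u:
    bias minus squared distance from the translated point gamma_i + t_u."""
    return beta[j] - np.sum((gamma[i] + t[u] - gamma[j]) ** 2)

def recommend_next(u, i, n=3):
    """Top-n candidates via a vectorized nearest-neighbor search."""
    scores = beta - np.sum((gamma[i] + t[u] - gamma) ** 2, axis=1)
    return list(np.argsort(-scores)[:n])
```

Because every (u, i, j) interaction is handled by this single component, there is no need to balance separate preference and sequential terms with an extra hyperparameter.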

Translation-based Recommendation

Translation-based Recommendation Datasets: Automotive; Office Products; Toys & Games; Video Games; Cell Phones & Accessories; Clothing, Shoes, and Jewelry; Electronics (May 1996 - July 2014)

Translation-based Recommendation Datasets: check-ins at different venues (Dec. 2011 - Apr. 2012); movie ratings (Nov. 2005 - Nov. 2009); user reviews (Jan. 2001 - Nov. 2013). (All available online.)

Translation-based Recommendation 11.4M reviews & ratings of 4.5M users on 3.1M local businesses restaurants, hotels, parks, shopping malls, movie theaters, schools, military recruiting offices, bird control, mediation services... Characteristics: vast vocabulary of items, variability, and sparsity http://cseweb.ucsd.edu/~jmcauley/

Translation-based Recommendation

Translation-based Recommendation varying sparsity

Translation-based Recommendation Unified

Translation-based Recommendation TransRec

Translation-based Recommendation Works well with / Doesn't work well with

Overview Morals of the story: Today we looked at two main ideas that extend the recommender systems we saw in class: 1. Sequential Recommendation: Most of the dynamics due to time can be captured purely by knowing the sequence of items 2. Metric Recommendation: In some settings, using inner products may not be the correct assumption

Assignment 1

CSE 258 Web Mining and Recommender Systems Real-world applications of recommender systems

Recommending product sizes to customers

Recommending product sizes to customers Goal: Build a recommender system that predicts whether an item will fit:

Recommending product sizes to customers Challenges: Data sparsity: people have very few purchases from which to estimate size Cold-start: How to handle new customers and products with no past purchases? Multiple personas: Several customers may use the same account

Recommending product sizes to customers Data: Shoe transactions from Amazon.com. For each shoe j, we have a reported size c_j (from the manufacturer), but this may not be correct! Need to estimate the customer's size (s_i), as well as the product's true size (t_j)

Recommending product sizes to customers Loss function:

Recommending product sizes to customers Model fitting:
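A simplified sketch of the idea (an assumed form, not the paper's exact loss; the weight w and margin here are illustrative): each customer i has a latent size s_i, each product j a latent "true" size t_j, and the fit signal is f = w * (s_i - t_j). A hinge-style loss penalizes transactions whose recorded outcome disagrees with f.

```python
def hinge_fit_loss(s_i, t_j, outcome, w=1.0, margin=0.5):
    """outcome: 'fit', 'small' (item too small for the customer), or 'large'.
    Illustrative hinge loss on the fit signal f = w * (s_i - t_j)."""
    f = w * (s_i - t_j)
    if outcome == 'fit':
        return max(0.0, abs(f) - margin)   # want f close to 0
    if outcome == 'small':
        return max(0.0, margin - f)        # want f clearly positive
    return max(0.0, margin + f)            # 'large': want f clearly negative
```

Summing this loss over a customer's transactions and alternately minimizing over the s_i and t_j values gives one plausible model-fitting procedure under these assumptions.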

Recommending product sizes to customers Extensions: Multi-dimensional sizes Customer and product features User personas

Recommending product sizes to customers Experiments:

Recommending product sizes to customers Experiments: Online A/B test

Playlist prediction via Metric Embedding

Playlist prediction via Metric Embedding Goal: Build a recommender system that recommends sequences of songs Idea: Might also use a metric embedding (consecutive songs should be nearby in some space)

Playlist prediction via Metric Embedding Basic model: (compare with metric model from last lecture)

Playlist prediction via Metric Embedding Basic model ( single point ):

Playlist prediction via Metric Embedding Dual-point model
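The dual-point idea can be sketched as follows (a hedged illustration; names and dimensions are mine): each song gets an "entry" vector and an "exit" vector, and the probability of transitioning from song a to song b decays with the squared distance from a's exit point to b's entry point, normalized over all candidates.

```python
import numpy as np

# Dual-point playlist transition model sketch (illustrative setup).
rng = np.random.default_rng(4)
n_songs, k = 6, 2
entry = rng.normal(size=(n_songs, k))  # where transitions into a song land
exit_ = rng.normal(size=(n_songs, k))  # where transitions out of a song start

def transition_probs(a):
    """P(next song | current song a): softmax of negative squared distance
    from a's exit point to each candidate's entry point."""
    d2 = np.sum((entry - exit_[a]) ** 2, axis=1)
    w = np.exp(-d2)
    return w / w.sum()
```

The single-point model is the special case where each song's entry and exit vectors coincide; splitting them lets asymmetric transitions (a then b, but not b then a) be modeled.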

Playlist prediction via Metric Embedding Extensions: Popularity biases

Playlist prediction via Metric Embedding Extensions: Personalization

Playlist prediction via Metric Embedding Extensions: Semantic Tags

Playlist prediction via Metric Embedding Extensions: Observable Features

Playlist prediction via Metric Embedding Experiments: Yes.com playlists, Dec. 2010 - May 2011. Small dataset: 3,168 songs; 134,431 + 1,191,279 transitions. Large dataset: 9,775 songs; 172,510 + 1,602,079 transitions

Playlist prediction via Metric Embedding Experiments:

Playlist prediction via Metric Embedding Experiments: small dataset vs. big dataset

Efficient Natural Language Response Suggestion for Smart Reply

Efficient Natural Language Response Suggestion for Smart Reply Goal: Automatically suggest common responses to e-mails

Efficient Natural Language Response Suggestion for Smart Reply Basic setup

Efficient Natural Language Response Suggestion for Smart Reply Previous solution (KDD 2016) Based on a seq2seq method

Efficient Natural Language Response Suggestion for Smart Reply Idea: Replace this (complex) solution with a simple multiclass classification-based solution

Efficient Natural Language Response Suggestion for Smart Reply Model: S(x,y)
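A dot-product score S(x, y) between a message embedding and candidate-response embeddings can be sketched like this (heavily hedged: the encoders here are toy stand-ins for trained networks, and the response list is invented for illustration):

```python
import numpy as np

# Dot-product response scoring sketch (toy stand-ins for learned encoders).
rng = np.random.default_rng(5)
k = 8
responses = ["Sounds good!", "Thanks!", "I'll take a look.", "See you then."]
resp_vecs = rng.normal(size=(len(responses), k))  # stand-in response embeddings

def encode_message(text):
    """Toy deterministic 'encoder': hash-seeded random vector (illustrative)."""
    h = abs(hash(text)) % (2 ** 32)
    return np.random.default_rng(h).normal(size=k)

def suggest(text, n=3):
    """Rank candidate responses by the dot-product score S(x, y)."""
    x = encode_message(text)
    scores = resp_vecs @ x
    return [responses[i] for i in np.argsort(-scores)[:n]]
```

The practical appeal is that, once responses are pre-embedded, suggestion is a fast nearest-neighbor-style search rather than sequence decoding.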

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v1

Efficient Natural Language Response Suggestion for Smart Reply Model: Architecture v2

Efficient Natural Language Response Suggestion for Smart Reply Model: Extensions

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (offline)

Efficient Natural Language Response Suggestion for Smart Reply Experiments: (online)

Personalized Itinerary Recommendation with Queuing Time Awareness

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences Goal: Identify items that might be purchased together

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences browsed together (substitutable); bought together (complementary)

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences Four types of relationship: 1) People who viewed X also viewed Y 2) People who viewed X eventually bought Y 3) People who bought X also bought Y 4) People bought X and Y together Substitutes (1 and 2), and Complements (3 and 4)

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 1) Data collection

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 2) Training data generation

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 2) Training (simpler models)

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 2) Training (simpler models)

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 3) Siamese CNNs
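The Siamese training objective can be sketched with a contrastive loss (an assumed, standard form; the margin value and names are illustrative): both images pass through the same embedding function, and compatible (co-occurring) pairs are pulled together while incompatible pairs are pushed at least a margin apart.

```python
import numpy as np

def contrastive_loss(f1, f2, compatible, margin=1.0):
    """Contrastive loss on a pair of embeddings f1, f2 from a shared network:
    squared distance for compatible pairs, hinged margin for incompatible."""
    d = np.linalg.norm(np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float))
    if compatible:
        return float(d ** 2)                 # pull compatible pairs together
    return float(max(0.0, margin - d) ** 2)  # push incompatible pairs apart
```

Once trained, recommendation reduces to finding items whose embeddings are close to (i.e., stylistically compatible with) a query item's embedding.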

Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences 4) Recommendation
