STREAMING RANKING BASED RECOMMENDER SYSTEMS

Similar documents
Streaming Ranking Based Recommender Systems

Improving Top-N Recommendation with Heterogeneous Loss

Bayesian Personalized Ranking for Las Vegas Restaurant Recommendation

Collaborative Metric Learning

Recommender Systems New Approaches with Netflix Dataset

CS249: ADVANCED DATA MINING

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data

Real-time Collaborative Filtering Recommender Systems

arxiv: v1 [cs.ir] 19 Dec 2018

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

ECS289: Scalable Machine Learning

Real-time Collaborative Filtering Recommender Systems

CSE 158 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

BPR: Bayesian Personalized Ranking from Implicit Feedback

Learning Graph-based POI Embedding for Location-based Recommendation

CS224W Project: Recommendation System Models in Product Rating Predictions

Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation

AI Dining Suggestion App. CS 297 Report Bao Pham ( ) Advisor: Dr. Chris Pollett

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Rating Prediction Using Preference Relations Based Matrix Factorization

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

A Brief Review of Representation Learning in Recommender 赵鑫 RUC

Recommender System. What is it? How to build it? Challenges. R package: recommenderlab

Stable Matrix Approximation for Top-N Recommendation on Implicit Feedback Data

Music Recommendation with Implicit Feedback and Side Information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1

Document Information

Chapter 2 Basic Structure of High-Dimensional Spaces

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

Understanding and Recommending Podcast Content

Machine Learning (BSMC-GA 4439) Wenke Liu

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Part 11: Collaborative Filtering. Francesco Ricci

Improving Implicit Recommender Systems with View Data

arxiv: v1 [cs.ir] 27 Jun 2016

CSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems

Effective Latent Models for Binary Feedback in Recommender Systems

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

Query Independent Scholarly Article Ranking

Jeff Howbert Introduction to Machine Learning Winter

arxiv: v1 [cs.ir] 2 Oct 2017

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback (HetRec 2011)

Batch-Incremental vs. Instance-Incremental Learning in Dynamic and Evolving Data

A Collaborative Method to Reduce the Running Time and Accelerate the k-nearest Neighbors Search

Content-based Dimensionality Reduction for Recommender Systems

Community-Based Recommendations: a Solution to the Cold Start Problem

Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*

System For Product Recommendation In E-Commerce Applications

ECS289: Scalable Machine Learning

Data Mining Techniques

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

Investigating the performance of matrix factorization techniques applied on purchase data for recommendation purposes

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE

Information Integration of Partially Labeled Data

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

Project Report. An Introduction to Collaborative Filtering

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Ranking by Alternating SVM and Factorization Machine

Improving the Accuracy of Top-N Recommendation using a Preference Model

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Data Mining Techniques

On hybrid modular recommendation systems for video streaming

Towards a hybrid approach to Netflix Challenge

Fast Matrix Factorization for Online Recommendation with Implicit Feedback

Cold-Start Web Service Recommendation Using Implicit Feedback

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Personalized Interactive Faceted Search

Author(s): Rahul Sami, 2009

Know your neighbours: Machine Learning on Graphs

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Using Social Networks to Improve Movie Rating Predictions

ExcUseMe: Asking Users to Help in Item Cold-Start Recommendations

Big Data Analytics CSCI 4030

Recommendation System for Location-based Social Network CS224W Project Report

arxiv: v1 [cs.ir] 22 Nov 2017

Machine Learning (BSMC-GA 4439) Wenke Liu

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Mining Web Data. Lijun Zhang

Mining Data that Changes. 17 July 2015

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

Implicit Documentation

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

Tracking. Hao Guan( 管皓 ) School of Computer Science Fudan University

Data Analytics with HPC. Data Streaming

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

Recommender Systems (RSs)

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Collaborative Filtering using Euclidean Distance in Recommendation Engine

Prowess Improvement of Accuracy for Moving Rating Recommendation System

Improved Recommendation System Using Friend Relationship in SNS

goccf: Graph-Theoretic One-Class Collaborative Filtering Based on Uninteresting Items

Improving Implicit Recommender Systems with View Data

CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul

arxiv: v1 [cs.ir] 15 Sep 2017

Sentiment Analysis for Amazon Reviews

A Bayesian Approach to Hybrid Image Retrieval

Transcription:

STREAMING RANKING BASED RECOMMENDER SYSTEMS
Weiqing Wang, Hongzhi Yin, Zi Huang, Qinyong Wang, Xingzhong Du, Quoc Viet Hung Nguyen
University of Queensland, Australia & Griffith University, Australia
July 17, 2018

Outline
- Introduction
- Challenges of Streaming Recommender Systems
- Effective Streaming Models
- Evaluation

Introduction
- Recommender systems
- Streaming data everywhere
  - eBay: more than 10 million transactions per day. [1]
  - Twitter: half a billion tweets are generated every day (around 6,000 tweets every second). [2]

[1] https://www.ebayinc.com/our-company/privacy-center/gdpr/
[2] https://en.wikipedia.org/wiki/netflix

Introduction
- Streaming data everywhere
  - eBay: more than 10 million transactions per day. [1]
  - Netflix: more than 3 million subscribers gained from mid-March 2013 to April 2013. [2]
  - Twitter: half a billion tweets are generated every day (around 6,000 tweets every second).
- Characteristics
  - Temporally ordered
  - Large volume
  - High velocity

[1] https://www.ebayinc.com/our-company/privacy-center/gdpr/
[2] https://en.wikipedia.org/wiki/netflix

Introduction
- Traditional recommender systems are static
  - Collaborative filtering & matrix factorization
  - Nearest neighbors and correlation coefficients are precomputed
  - Newly arriving data in the streams cannot be integrated into the trained model efficiently

Introduction
- Traditional recommender systems are static
  - Collaborative filtering & matrix factorization
  - Nearest neighbors and correlation coefficients are precomputed
  - Newly arriving data in the streams cannot be integrated into the trained model efficiently
- Hence the pressing need for streaming recommender systems

Outline
- Introduction
- Challenges of Streaming Recommender Systems
- Effective Streaming Models
- Evaluation

Challenges
- One possible way [1, 2]
  - Learn parameters online with SGD
  - Update users' preferences based on each new instance in the stream (a minimal sketch follows below)
- Challenge: short-term memory
  - The updates are only based on the most recent data
  - Tend to forget past observations
  - Cannot capture users' long-term interests

[1] Steffen Rendle and Lars Schmidt-Thieme. 2008. Online-updating Regularized Kernel Matrix Factorization Models for Large-scale Recommender Systems. In RecSys. 251-258.
[2] Ernesto Diaz-Aviles, Lucas Drumond, Lars Schmidt-Thieme, and Wolfgang Nejdl. 2012. Real-time Top-N Recommendation in Social Streams. In RecSys. 59-66.
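A minimal sketch of such an online SGD update, assuming a plain matrix-factorization model; the learning rate, regularization constant, and function name are illustrative, not the cited papers' exact update rule:

    import numpy as np

    def online_sgd_update(U, V, user, item, rating, lr=0.05, reg=0.01):
        # U, V: latent-factor matrices of shape (num_users, z) and (num_items, z)
        err = rating - U[user] @ V[item]      # error on this single new stream instance
        u_old = U[user].copy()
        # One gradient step on this observation only; repeated over a stream,
        # the factors drift toward the most recent data (the "short-term memory" issue).
        U[user] += lr * (err * V[item] - reg * U[user])
        V[item] += lr * (err * u_old - reg * V[item])
        return U, V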

Challenges
- Capture users' drifting interests & model new users/items
  - Data in streams are temporally ordered
  - Users' preferences tend to drift over time
    - A mother tends to be interested in different products for children as her child grows up
    - Avoid being overwhelmed by past observations
  - New users and new items arrive continuously
    - From 2007 to 2015, the number of registered users on Amazon increased from 76 million to 304 million
- Challenges
  - Large volume and high velocity
  - Identify and model new users and new items

Challenges
- Overload [1, 2, 3]
  - Input rate > available computing capacity
  - Example: processing time for one activity is 2 seconds, but the input rate is 1 activity per second
  - Impossible to update the model on every activity

[1] Ling Liu and M. Tamer Özsu (Eds.). 2009. Encyclopedia of Database Systems. Springer US.
[2] Steffen Rendle and Lars Schmidt-Thieme. 2008. Online-updating Regularized Kernel Matrix Factorization Models for Large-scale Recommender Systems. In RecSys. 251-258.
[3] Nesime Tatbul, Uğur Çetintemel, and Stanley B. Zdonik. 2007. Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing. In VLDB. 159-170.

Challenges
- Reservoir based [1]
  - Maintain and update the model only based on the instances in a reservoir
  - Aim of the reservoir: keep a memory of long-term records
  - Sampling rate: the t-th instance is sampled with probability 1/t
  - Shortcoming: cannot capture users' drifting interests or model new users and items

[1] Ernesto Diaz-Aviles, Lucas Drumond, Lars Schmidt-Thieme, and Wolfgang Nejdl. 2012. Real-time Top-N Recommendation in Social Streams. In RecSys. 59-66.
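For reference, a minimal sketch of standard reservoir sampling, which the maintenance rule above describes: for a reservoir of capacity k, the t-th arriving instance is admitted with probability k/t (the slide's 1/t rate corresponds to each individual slot). Names are illustrative:

    import random

    def reservoir_update(reservoir, instance, t, capacity):
        # Keep the first `capacity` instances, then admit the t-th instance
        # with probability capacity / t, evicting a uniformly random slot.
        if len(reservoir) < capacity:
            reservoir.append(instance)
        elif random.random() < capacity / t:
            reservoir[random.randrange(capacity)] = instance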

Outline
- Introduction
- Challenges of Streaming Recommender Systems
- Effective Streaming Models
- Evaluation

Effective Streaming Models
Offline model
- PMF model with BPR optimization
  - Ratings -> binary
  - Each user and item is assigned a z-dimensional latent factor vector, denoted u_i and v_j respectively.
  - Assumption: all scores are conditionally independent of each other given the latent factors of users and items.
  - Score function of a rating: Eq. (1), where Eq. (2) (the equations appear only as images on the slide; see the sketch below)
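Equations (1) and (2) are not recoverable from the transcription; under the usual PMF-with-sigmoid-link formulation that the surrounding description suggests (binary ratings, z-dimensional factors u_i and v_j), they would read approximately as follows, where the Gaussian form, the link g, and the variance σ² are assumptions:

    $$p(r_{ij} \mid u_i, v_j) = \mathcal{N}\big(r_{ij} \,\big|\, g(u_i^{\top} v_j),\ \sigma^2\big) \qquad (1)$$
    $$g(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$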

Effective Streaming Models
Offline model
- Introduce Gaussian priors over the latent parameters Θ; the generative process of the model is shown on the slide.
- BPR optimization framework (see the objective below).
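For reference, the standard BPR criterion (Rendle et al., UAI 2009), which this offline model optimizes, maximizes over training triples (u, i, j) in which user u is observed to prefer item i over item j:

    $$\text{BPR-OPT} = \sum_{(u,i,j)} \ln \sigma\big(\hat{x}_{ui} - \hat{x}_{uj}\big) \;-\; \lambda_{\Theta}\,\lVert \Theta \rVert^{2}$$

where σ is the logistic sigmoid, x̂_ui is the predicted score of user u for item i under the latent factors, and λ_Θ is a regularization constant.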

Effective Streaming Models
Offline model
- How to sample the negative instances? (a sketch of the uniform option follows below)
  - Uniform sampling (simple but efficient)
  - GAN-based sampling (expensive to maintain a generator)
  - Statistic-based sampling (the observed data are far from complete)
- BPR optimization framework.
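A minimal sketch of the uniform option: draw an item the user has not interacted with, uniformly at random. The data structure and names below are illustrative, and the loop assumes the user has not interacted with every item:

    import random

    def sample_negative(user, interacted, num_items):
        # interacted: dict mapping user -> set of item ids with positive feedback
        while True:
            j = random.randrange(num_items)
            if j not in interacted[user]:
                return j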

Effective Streaming Models
Online model - SPMF: a general framework
- Negative sampling: treat all the data outside W and R (the input window and the reservoir) as negative
- How to sample positive data instances under the overload assumption?

Effective Streaming Models
Online model
- Efficiently and correctly sample positive instances

Naïve idea: sample the data instances that are expected to change the model most
1. Compute the current model's predictability of each data instance (defined by f)
2. Maintain the reservoir
3. Rank the data according to f
4. Compute the sampling probability of each data instance according to its rank
5. Sample and update the model
(A sketch of this rank-based sampling follows below.)
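A minimal sketch of the naïve rank-based positive sampling described above; the predictability function f, the rank-to-probability weighting, and all names are assumptions for illustration, and the actual SPMF formulation in the paper may differ:

    import numpy as np

    def sample_positive(instances, predictability, num_samples):
        # instances: candidate positive instances drawn from W and the reservoir R
        # predictability: f(x) for each instance; higher means already well predicted
        ranks = np.argsort(predictability)                 # worst-predicted instances first
        weights = 1.0 / (np.arange(len(instances)) + 1.0)  # assumed rank-based weighting
        probs = np.empty(len(instances))
        probs[ranks] = weights / weights.sum()             # poorly predicted -> higher probability
        idx = np.random.choice(len(instances), size=num_samples, replace=False, p=probs)
        return [instances[i] for i in idx]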

Outline
- Introduction
- Challenges of Streaming Recommender Systems
- Effective Streaming Models
- Evaluation

Evaluation
Important settings
- Datasets
  - Netflix: 100 million ratings, 480 thousand users, 17 thousand movies
  - Movielens: about 1 million ratings, 4 thousand movies, 6 thousand users
  - Taobao: 14 million ratings, 462 thousand clothes, 1 million users
- Comparative methods (state-of-the-art)
  - sRec [1]: latest; updates based on all the incoming data and the posterior probabilities at the previous moment.
  - RMFX [2]: pairwise, reservoir based, no positive sampling.
  - RKMF [3]: updates the model on selected data and assigns larger probabilities to new users and new items.
  - WRMF [4]: not designed for streaming data; we assume it can access the whole data for training. It is expected to set an upper bound for the online approaches.

[1] S. Chang, Y. Zhang, J. Tang, D. Yin, Y. Chang, M. A. Hasegawa-Johnson, and T. S. Huang. 2017. Streaming Recommender Systems. In WWW. 381-389.
[2] E. Diaz-Aviles, L. Drumond, L. Schmidt-Thieme, and W. Nejdl. 2012. Real-time Top-N Recommendation in Social Streams. In RecSys. 59-66.
[3] S. Rendle and L. Schmidt-Thieme. 2008. Online-updating Regularized Kernel Matrix Factorization Models for Large-scale Recommender Systems. In RecSys. 251-258.
[4] Y. Hu, Y. Koren, and C. Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM. 263-272.

Evaluation
Important settings
- Comparative methods: three variant versions of our SPMF model
  - SPMF-S1: samples the positive data instances randomly
  - SPMF-S2: does not maintain the reservoir; the positive data are sampled from W only
  - SPMF-S3: does not exploit W; the positive data are sampled from R only
- Dataset division
  - Order all the instances based on temporal information

Evaluation
Dataset division
- Order all the instances based on temporal information
- Divide all data into halves
  - Base training set: considered as the historical data pool to train the hyper-parameters
  - Candidate set: used to mimic the online streaming inputs
- Mimic the online stream
  - Divide the candidate set into five slices sequentially
  - Each slice is treated as a test set D_test
  - All the data prior to the target test set in the candidate set are used for online learning
    - E.g., if the target test set is D^i_test, all the sets D^j_test with j < i are used for online learning, and each of them is treated as an input window W.
  - For users who have no activities before D^i_test, we recommend the most popular items for all comparison methods
(A sketch of this division follows below.)
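A minimal sketch of this division protocol, assuming the data arrive as (timestamp, user, item) tuples; names and the exact slicing of any remainder are illustrative:

    def temporal_split(instances, num_slices=5):
        # instances: iterable of (timestamp, user, item) rating events
        ordered = sorted(instances, key=lambda x: x[0])   # order by temporal information
        half = len(ordered) // 2
        base_training = ordered[:half]                    # historical pool for hyper-parameter tuning
        candidate = ordered[half:]                        # mimics the online streaming inputs
        step = len(candidate) // num_slices
        test_slices = [candidate[k * step:(k + 1) * step] for k in range(num_slices)]
        return base_training, test_slices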

Evaluation
Important settings
- Evaluation metric: the popular Hits@n = #hit@n / |D_test| (a sketch follows below)
- Recommendation effectiveness
  - Simulation of the overload situation: the time budget is the time used to update RMFX (one comparison method) for |W|/2 iterations.
  - Size of reservoir: |D|/20
- Statistical soundness
  - Ran all the methods 30 times and used the average results
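A minimal sketch of the Hits@n metric as defined above; `recommend` is a hypothetical function returning a user's top-n recommended items:

    def hits_at_n(test_set, recommend, n=10):
        # test_set: list of (user, ground_truth_item) pairs
        hits = sum(1 for user, item in test_set if item in recommend(user, n))
        return hits / len(test_set)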

Evaluation results
1. On both Movielens and Netflix, WRMF performs best among all the methods.
2. SPMF achieves very competitive results compared with WRMF under the limitations on memory and training time.
3. All methods perform worse on Taobao, which is much sparser. Another possible reason is that Taobao has a much larger number of test cases containing new items (about 45.86% on Taobao, versus less than 1% on the other two datasets).
4. SPMF performs even better than WRMF in many cases on Taobao, as WRMF is popularity based and is biased toward popular items.

Evaluation results
1. SPMF consistently outperforms the three variant versions, indicating the benefit brought by each factor.
2. SPMF-S2 performs second best on Movielens, demonstrating that the advantage of wisely sampling positive instances outweighs the advantage of using the reservoir.

Evaluation results
1. New users: tested on the cases that contain users who have no historical activities in the base training set D_train.
2. New items: no historical records in D_train.
3. SPMF performs best in recommendation for new users and new items, even compared with the batch-based WRMF.
4. RKMF performs best among the other streaming methods, as it also prefers to update its model based on data related to less popular users or items.

Evaluation results
1. The results on the second test slice, D^2_test, on Movielens.
2. As the number of iterations increases, the recommendation performance first improves rapidly, and then the values become steady.

Another line of work:
Qinyong Wang, Hongzhi Yin, Zhiting Hu, Defu Lian, Hao Wang and Zi Huang. "Neural Memory Streaming Recommender Networks with Adversarial Training". KDD'18.

Thank you!