Recommendation with Differential Context Weighting

Similar documents
Tutorial: Context In Recommender Systems

User-Oriented Context Suggestion

Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Demystifying movie ratings 224W Project Report. Amritha Raghunath Vignesh Ganapathi Subramanian

A PROPOSED HYBRID BOOK RECOMMENDER SYSTEM

Seminar Collaborative Filtering. KDD Cup. Ziawasch Abedjan, Arvid Heise, Felix Naumann

CS249: ADVANCED DATA MINING

Towards a hybrid approach to Netflix Challenge

Content-based Dimensionality Reduction for Recommender Systems

Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem

Influence Maximization in Location-Based Social Networks Ivan Suarez, Sudarshan Seshadri, Patrick Cho CS224W Final Project Report

Recommendation System for Location-based Social Network CS224W Project Report

A Recommender System Based on Improvised K-Means Clustering Algorithm

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

List of Exercises: Data Mining 1 December 12th, 2015

Local Search Insights

arxiv: v4 [cs.ir] 28 Jul 2016

Assignment 5: Collaborative Filtering

Social Data Exploration

Web Personalization & Recommender Systems

Traffic Signal Control Based On Fuzzy Artificial Neural Networks With Particle Swarm Optimization

Vector Semantics. Dense Vectors

Machine Learning using MapReduce

Hybrid Recommendation System Using Clustering and Collaborative Filtering

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Diversity in Recommender Systems Week 2: The Problems. Toni Mikkola, Andy Valjakka, Heng Gui, Wilson Poon

Using Social Networks to Improve Movie Rating Predictions

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Efficient Mining Algorithms for Large-scale Graphs

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

More Efficient Classification of Web Content Using Graph Sampling

Text clustering based on a divide and merge strategy

The OTT Co-Viewing Experience: 2017 November 2017

Statistical Disclosure Control meets Recommender Systems: A practical approach

Recommender Systems New Approaches with Netflix Dataset

Efficient Search for Inputs Causing High Floating-point Errors

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan

Performance Comparison of Algorithms for Movie Rating Estimation

Semantically Enhanced Collaborative Filtering on the Web

Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback. HetRec 2011

A Constrained Spreading Activation Approach to Collaborative Filtering

Graph Mining: Overview of different graph models

Project Report. An Introduction to Collaborative Filtering

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

Recommender Systems - Content, Collaborative, Hybrid

Justified Recommendations based on Content and Rating Data

WHITE PAPER Application Performance Management. The Case for Adaptive Instrumentation in J2EE Environments

Recommender Systems: User Experience and System Issues

Introduction to Data Mining

Recommendation System Using Yelp Data CS 229 Machine Learning Jia Le Xu, Yingran Xu

What is Learning? CS 343: Artificial Intelligence Machine Learning. Raymond J. Mooney. Problem Solving / Planning / Control.

Project Participants

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety

Hotel Recommendation Based on Hybrid Model

Semantic Clickstream Mining

The UK and the U.S.A.

Constrained Classification of Large Imbalanced Data

Predict Topic Trend in Blogosphere

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Travel Time Estimation of a Path using Sparse Trajectories

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges

CS435 Introduction to Big Data Spring 2018 Colorado State University. 3/21/2018 Week 10-B Sangmi Lee Pallickara. FAQs. Collaborative filtering

PARTICLE SWARM OPTIMIZATION (PSO)

GeoTemporal Reasoning for the Social Semantic Web

Know your neighbours: Machine Learning on Graphs

DISTANCE EVALUATED SIMULATED KALMAN FILTER FOR COMBINATORIAL OPTIMIZATION PROBLEMS

Path Optimization in Stream-Based Overlay Networks

An Empirical Comparison of Collaborative Filtering Approaches on Netflix Data

Final Exam Study Guide

CS 124/LINGUIST 180 From Languages to Information

arxiv: v2 [cs.lg] 15 Nov 2011

CS229 Final Project: Predicting Expected Response Times

Collaborative Filtering for Netflix

Unsupervised learning on Color Images

Experiences from Implementing Collaborative Filtering in a Web 2.0 Application

Decentralised and Privacy-Aware Learning of Traversal Time Models

BaggTaming Learning from Wild and Tame Data

Algorithm Design (4) Metaheuristics

DENSITY BASED AND PARTITION BASED CLUSTERING OF UNCERTAIN DATA BASED ON KL-DIVERGENCE SIMILARITY MEASURE

I Travel on mobile / UK

Music Recommendation with Implicit Feedback and Side Information

Algorithm Collections for Digital Signal Processing Applications Using Matlab

Visual Query Suggestion

AN IMPROVED DENSITY BASED k-means ALGORITHM

COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA

Collaborative Filtering based on User Trends

José Miguel Hernández Lobato Zoubin Ghahramani Computational and Biological Learning Laboratory Cambridge University

Clustering and Recommending Services based on ClubCF approach for Big Data Application

SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS

Part 11: Collaborative Filtering. Francesco Ricci

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Trademark Matching and Retrieval in Sport Video Databases

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Transcription:

Recommendation with Differential Context Weighting. Yong Zheng, Robin Burke, Bamshad Mobasher. Center for Web Intelligence, DePaul University, Chicago, IL, USA. Conference on UMAP, June 12, 2013.

Overview Introduction (RS and Context-aware RS) Sparsity of Contexts and Relevant Solutions Differential Context Relaxation & Weighting Experimental Results Conclusion and Future Work

Introduction Recommender Systems Context-aware Recommender Systems

Recommender Systems (RS) Information Overload Recommendations

Context-aware RS (CARS). Traditional RS: Users × Items → Ratings. Context-aware RS: Users × Items × Contexts → Ratings. Examples of contexts in different domains: Food: time (lunch, dinner), occasion (business lunch, family dinner). Movie: time (weekend, weekday), location (home, cinema), companion, etc. Music: time (morning, evening), activity (study, sports, party), etc. Book: a book as a gift for kids or for a mother, etc. Recommendation cannot stand alone without considering contexts.

Research Problems Sparsity of Contexts Relevant Solutions

Sparsity of Contexts. The assumption behind context-aware RS: it is better to use preferences expressed in the same contexts when making predictions. But what counts as the same contexts? And what happens with multiple context dimensions and sparse data? An example in the movie domain:

User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?

Are there any rating profiles in the exact contexts <Weekday, Home, Sister>?

Relevant Solutions.

User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?

Context matching: are there ratings in exactly the contexts <Weekday, Home, Sister>? Three ways around the sparsity:
1. Context selection: use only the influential dimensions.
2. Context relaxation: use a relaxed set of dimensions, e.g., time only.
3. Context weighting: use all dimensions, but measure how similar the contexts are (continued later).
The difference between context selection and context relaxation: context selection is conducted through surveys or statistics, whereas context relaxation is optimized directly for prediction quality. Finding the optimal context relaxation or weighting is a learning process!

DCR and DCW: Differential Context Relaxation (DCR), Differential Context Weighting (DCW), and Particle Swarm Optimization as the optimizer.

Differential Context Relaxation. Differential Context Relaxation (DCR) is our first attempt to alleviate the sparsity of contexts, and Differential Context Weighting (DCW) is a finer-grained improvement over DCR. There are two notions in DCR:
Differential part (algorithm decomposition): separate an algorithm into its functional components; apply an appropriate context constraint to each component; maximize the global contextual effect of the components together.
Relaxation part (context relaxation): use a relaxed set of context dimensions instead of all of them.
References:
Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware Travel Recommendation". In EC-WEB, 2012.
Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware Recommendation using Differential Relaxation". In RecSys Workshop on CARS, 2012.

DCR Algorithm Decomposition. Take User-Based Collaborative Filtering (UBCF) as the example.

      Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1    4                            4                 2                2
U2    3                            4                 2                1
U3    2                            2                 4                4
U4    4                            4                 1                ?

Standard process in UBCF (top-K UserKNN; K = 1 for example):
1) Find neighbors based on user-user similarity.
2) Aggregate the neighbors' contributions.
3) Make the final prediction.
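To make the decomposition concrete, here is a minimal Python sketch of that three-step pipeline on the toy matrix above. This is our own illustrative code, not the authors'; the pearson and predict helpers and the item keys are assumptions.

```python
import math

# Toy rating matrix from the slide (users x items; missing = unrated).
ratings = {
    "U1": {"Pirates4": 4, "KungFuPanda2": 4, "HarryPotter6": 2, "HarryPotter7": 2},
    "U2": {"Pirates4": 3, "KungFuPanda2": 4, "HarryPotter6": 2, "HarryPotter7": 1},
    "U3": {"Pirates4": 2, "KungFuPanda2": 2, "HarryPotter6": 4, "HarryPotter7": 4},
    "U4": {"Pirates4": 4, "KungFuPanda2": 4, "HarryPotter6": 1},
}

def pearson(u, v):
    """Step 1 helper: user-user similarity over co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = (math.sqrt(sum((a - mu) ** 2 for a in ru))
           * math.sqrt(sum((b - mv) ** 2 for b in rv)))
    return num / den if den else 0.0

def predict(user, item, k=1):
    """Steps 2-3: aggregate neighbor deviations around the user baselines."""
    neighbors = sorted(((pearson(user, v), v) for v in ratings
                        if v != user and item in ratings[v]), reverse=True)[:k]
    baseline = lambda u: sum(ratings[u].values()) / len(ratings[u])
    num = sum(s * (ratings[v][item] - baseline(v)) for s, v in neighbors)
    den = sum(abs(s) for s, v in neighbors)
    return baseline(user) + num / den if den else baseline(user)

print(predict("U4", "HarryPotter7", k=1))  # -> 2.0 (U1 is the closest neighbor)
```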

DCR Algorithm Decomposition. UBCF decomposes into four components: 1) neighbor selection; 2) neighbor contribution; 3) user baseline; 4) user similarity. All components contribute to the final prediction, and we assume that an appropriate contextual constraint on each component can leverage the contextual effect inside it, e.g., use only neighbors who rated the item in the same contexts.

DCR Context Relaxation.

User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?

The notion of context relaxation:
Use {Time, Location, Companion}: 0 records matched!
Use {Time, Location}: 1 record matched!
Use {Time}: 2 records matched!
In DCR, we choose an appropriate context relaxation for each component, balancing the number of matched ratings against quality: enough ratings for the best performance, but few enough to keep out noise.

DCR Context Relaxation. The four components (neighbor selection, neighbor contribution, user baseline, user similarity) are each given their own relaxed contexts C1, C2, C3, C4, where c is the original context, e.g., <Weekday, Home, Sister>. Each selection is modeled as a binary vector; e.g., <1, 0, 0> denotes that only the first context dimension is selected. Take neighbor selection as the example: originally, neighbors are the users who rated the same item; DCR further filters those neighbors by the contextual constraint C1. With C1 = <1, 0, 0>, the constraint becomes Time = Weekday, i.e., a neighbor u must have rated item i on weekdays. A sketch of this filtering follows below.
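A hypothetical sketch of how such a binary relaxation vector filters matching ratings, reproducing the match counts from the previous slide (the matches helper and the data layout are our own assumptions):

```python
DIMS = ("Time", "Location", "Companion")

# Contextual rating profiles from the table: (user, item, context, rating).
profiles = [
    ("U1", "Titanic", ("Weekend", "Home", "Girlfriend"), 4),
    ("U2", "Titanic", ("Weekday", "Home", "Girlfriend"), 5),
    ("U3", "Titanic", ("Weekday", "Cinema", "Sister"), 4),
]

def matches(context, target, relaxation):
    """A rating matches iff it agrees with the target on every selected dimension."""
    return all(c == t for c, t, keep in zip(context, target, relaxation) if keep)

target = ("Weekday", "Home", "Sister")  # the context we must predict in

for relaxation in [(1, 1, 1), (1, 1, 0), (1, 0, 0)]:
    kept = [d for d, keep in zip(DIMS, relaxation) if keep]
    hits = [p for p in profiles if matches(p[2], target, relaxation)]
    print(kept, "->", len(hits), "record(s) matched")
# ['Time', 'Location', 'Companion'] -> 0, ['Time', 'Location'] -> 1, ['Time'] -> 2
```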

DCR Drawbacks. Looking again at the four components (neighbor selection, neighbor contribution, user baseline, user similarity):
1. Context relaxation is still a strict filter, especially when the data is sparse.
2. The components are interdependent. For example, neighbor contribution depends on neighbor selection: if neighbors are selected by C1: Location = Cinema, it is not guaranteed that those neighbors also have ratings under C2: Time = Weekend.
A finer-grained solution is required: Differential Context Weighting.

Differential Context Weighting.

User  Movie    Time     Location  Companion   Rating
U1    Titanic  Weekend  Home      Girlfriend  4
U2    Titanic  Weekday  Home      Girlfriend  5
U3    Titanic  Weekday  Cinema    Sister      4
U1    Titanic  Weekday  Home      Sister      ?

Goal: use all context dimensions, but measure how similar the contexts are. Assumption: the more similar two contexts are, the more useful the corresponding ratings are for prediction. The similarity of contexts is measured by weighted Jaccard similarity, where c and d are two contexts (for instance, U2's row and the target row in the table above) and σ = <w1, w2, w3> is the weighting vector over the three dimensions:

J(c, d, σ) = (sum of weights on matched dimensions) / (sum of all weights)

Assuming equal weights w1 = w2 = w3 = 1, J(c, d, σ) = # of matched dimensions / # of all dimensions = 2/3.
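A minimal sketch of this weighted Jaccard computation, assuming contexts are tuples of categorical values (the function name is ours, not from the paper):

```python
def weighted_jaccard(c, d, sigma):
    """Weight mass of the dimensions on which contexts c and d agree,
    divided by the total weight mass."""
    total = sum(sigma)
    matched = sum(w for a, b, w in zip(c, d, sigma) if a == b)
    return matched / total if total else 0.0

c = ("Weekday", "Home", "Sister")      # the target context
d = ("Weekday", "Home", "Girlfriend")  # U2's rating context
print(weighted_jaccard(c, d, (1.0, 1.0, 1.0)))  # equal weights -> 0.666...
```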

Differential Context Weighting. For the same four components (neighbor selection, neighbor contribution, user baseline, user similarity):
1. Differential part: the components are exactly the same as in DCR.
2. Context weighting part (for each individual component): σ is the weighting vector and ϵ is a threshold on the similarity of contexts, i.e., only records with sufficiently similar (≥ ϵ) contexts are included.
3. In the calculations, the context similarities serve as weights. For example, the neighbor contribution becomes a similarity-weighted average of the neighbor's ratings; the other components are computed analogously.
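A hedged sketch of how the similarity-weighted neighbor contribution could look under these definitions (names and data layout are illustrative assumptions, not the authors' code):

```python
def weighted_jaccard(c, d, sigma):
    total = sum(sigma)
    return sum(w for a, b, w in zip(c, d, sigma) if a == b) / total if total else 0.0

def neighbor_contribution(ctx_ratings, target_ctx, sigma, eps):
    """ctx_ratings: (context, rating) pairs for one neighbor on one item.
    Keep only contexts with similarity >= eps; use similarities as weights."""
    weighted = [(weighted_jaccard(ctx, target_ctx, sigma), r)
                for ctx, r in ctx_ratings]
    weighted = [(s, r) for s, r in weighted if s >= eps]
    if not weighted:
        return None  # nothing similar enough: the neighbor contributes nothing
    return sum(s * r for s, r in weighted) / sum(s for s, _ in weighted)

u2 = [(("Weekday", "Home", "Girlfriend"), 5)]  # U2's contextual ratings of Titanic
print(neighbor_contribution(u2, ("Weekday", "Home", "Sister"),
                            sigma=(1.0, 1.0, 1.0), eps=0.5))  # -> 5.0
```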

Particle Swarm Optimization (PSO). The remaining work is to find the optimal context-relaxation vectors for DCR and context-weighting vectors for DCW. PSO is derived from swarm intelligence, in which a group (fish, birds, bees) achieves a goal collaboratively. Why PSO? 1) It is easy to implement as a non-linear optimizer. 2) It has been used in weighted CF before and was demonstrated to work better than other non-linear optimizers, e.g., genetic algorithms. 3) Our previous work successfully applied Binary PSO (BPSO) to DCR.

Particle Swarm Optimization (PSO). The analogy:
Swarm = a flock of birds = one run of the algorithm.
Particle = each bird.
Vector = a bird's position in the space = the vectors we need.
Goal = the location of the pizza = the lowest prediction error.
So, how does the swarm find the goal?
1. Looking for the pizza: assume a machine can tell each bird its distance to it.
2. Each iteration is an attempt or move.
3. Cognitive learning, from the particle itself: "Am I closer to the pizza than at my best previous locations?"
4. Social learning, from the swarm: "Hey, my distance is 1 mile; it is the closest. Follow me!" Then the other birds move toward that position.
DCR is feature selection, modeled by binary vectors: Binary PSO. DCW is feature weighting, modeled by real-valued vectors: standard PSO.
How does it work? Take DCR and Binary PSO for example. Assume there are 4 components and 3 contextual dimensions, so there are 4 binary vectors, one per component. We merge them into a single vector of size 3 × 4 = 12, and this merged vector is the particle's position vector in the PSO process, as in the sketch below.
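A toy Binary PSO sketch under that setup: a 12-bit position vector, cognitive and social velocity updates, and a sigmoid-based bit flip. The fitness function here is a stand-in of our own invention; a real run would score each DCR configuration by its prediction error.

```python
import math, random

random.seed(0)

DIM = 12  # 4 components x 3 context dimensions, merged into one position vector

def fitness(x):
    # Stand-in objective (lower is better): pretend the ideal relaxation keeps
    # only bits 0, 4, and 8. A real run would evaluate RMSE instead.
    ideal = [1 if j % 4 == 0 else 0 for j in range(DIM)]
    return sum(a != b for a, b in zip(x, ideal))

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

n_particles, iters = 3, 100
X = [[random.randint(0, 1) for _ in range(DIM)] for _ in range(n_particles)]
V = [[0.0] * DIM for _ in range(n_particles)]  # velocities
pbest = [x[:] for x in X]                      # each particle's best position
gbest = min(pbest, key=fitness)                # the swarm's best position

for _ in range(iters):
    for i in range(n_particles):
        for j in range(DIM):
            # cognitive pull (own best) + social pull (swarm best), clamped
            v = (V[i][j] + 2.0 * random.random() * (pbest[i][j] - X[i][j])
                         + 2.0 * random.random() * (gbest[j] - X[i][j]))
            V[i][j] = max(-4.0, min(4.0, v))
            # in Binary PSO, velocity sets the probability that a bit is 1
            X[i][j] = 1 if random.random() < sigmoid(V[i][j]) else 0
        if fitness(X[i]) < fitness(pbest[i]):
            pbest[i] = X[i][:]
    gbest = min(pbest, key=fitness)

print(gbest, fitness(gbest))
```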

Experimental Results Data Sets Predictive Performance Performance of Optimizer

Context-aware Data Sets.

                 AIST Food Data                      Movie Data
# of Ratings     6360                                1010
# of Users       212                                 69
# of Items       20                                  176
Contexts         real hunger (full/normal/hungry),   time (weekend, weekday),
                 virtual hunger                      location (home, cinema),
                                                     companions (friends, alone, etc.)
Other Features   user gender; food genre,            user gender; year of the movie
                 food style, food stuff
Density          dense                               sparse

Context-aware data sets are usually difficult to obtain; these two were collected through surveys.

Evaluation Protocols. Metrics: root-mean-square error (RMSE), and coverage, which denotes the percentage of predictions for which neighbors can be found. Our goal is to improve RMSE (i.e., reduce errors) while retaining decent coverage. We allow some decline in coverage because applying contextual constraints usually lowers it (the sparsity of contexts again!). Baselines: context-free CF, i.e., the original UBCF; and contextual pre-filtering CF, which applies the contextual constraints only to the neighbor-selection component, with no constraints on the other components as in DCR and DCW. Other settings in DCR & DCW: K = 10 for UserKNN, evaluated with 5-fold cross-validation; T = 100 as the maximal iteration limit in the PSO process; weights range over [0, 1]; and we use the same similarity threshold for every component, iterated from 0.0 to 1.0 in 0.1 increments for DCW. A sketch of the two metrics follows below.
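A small sketch of the two metrics, assuming the predictor returns None when no neighbors survive the contextual constraints (the helper and the toy numbers are our own):

```python
import math

def rmse_and_coverage(pairs):
    """pairs: list of (prediction_or_None, actual_rating)."""
    covered = [(p, a) for p, a in pairs if p is not None]
    coverage = len(covered) / len(pairs)  # fraction of cases we could predict
    if not covered:
        return float("nan"), coverage
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in covered) / len(covered))
    return rmse, coverage

preds = [(3.8, 4), (None, 5), (2.1, 2), (4.4, 4)]  # toy predictions
print(rmse_and_coverage(preds))                    # -> (~0.26, 0.75)
```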

Predictive Performance. (In the figure, blue bars are RMSE values and red lines are coverage curves.) Findings: 1) DCW works better than DCR and the two baselines. 2) Significance t-tests show that DCW's improvement is significant on the movie data, whereas DCR's was not significant over the two baselines; DCW can therefore further alleviate the sparsity of contexts and compensate for DCR. 3) DCW offers better coverage than the baselines!

Performance of the Optimizer. (Running time is measured in seconds.) Using 3 particles is the best configuration for both data sets here. Factors influencing running time: more particles bring quicker convergence but usually higher cost per iteration; more contextual variables generally mean slower runs; and a denser data set requires more similarity calculations in DCW. DCW typically costs more than DCR because it uses all contextual dimensions, and computing the similarity of contexts is time-consuming, especially on dense data like the Food data.

Other Results (Optional). 1. The optimal threshold for the similarity of contexts: 0.6 for the Food data set and 0.1 for the Movie data set. 2. The optimal weighting vectors (e.g., Movie data). Note: darker cells denote smaller weights; lighter cells denote larger weights.

It is gonna end: Conclusions and Future Work

Conclusions. We propose DCW, a finer-grained improvement over DCR; it further improves predictive accuracy while retaining decent coverage. PSO is demonstrated to be an efficient optimizer, and we identified the underlying factors that influence its running time. Stay tuned: DCR and DCW together form a general framework, differential context modeling (DCM), which can be applied to any recommendation algorithm that can be decomposed into multiple components. We have successfully extended it to item-based collaborative filtering and the slope-one recommender. Reference: Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in Collaborative Filtering". In SOCRS-2013, Chicago, IL, USA, 2013.

Future Work. Try other context-similarity measures instead of simple Jaccard; introduce semantics into the similarity of contexts to further alleviate their sparsity (e.g., Rome is closer to Florence than to Paris); and parallelize PSO or run it on MapReduce to speed up the optimizer. Acknowledgement: student travel support from the US NSF (UMAP Platinum Sponsor). See you later at the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, USA, Aug 11-14, 2013.

Thank You! Center for Web Intelligence, DePaul University, Chicago, IL USA