Collaborative Metric Learning

Size: px

Start display at page:

Download "Collaborative Metric Learning"

Alfred Malone
5 years ago
Views:

1 Collaborative Metric Learning Andy Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, Deborah Estrin Connected Experience Lab, Cornell Tech AOL CONNECTED EXPERIENCES LAB CORNELL TECH 1

2 Collaborative Metric Learning A different perspective on collaborative filtering Better accuracy Extremely efficient Top-K recommendations Easy to interpret and extend 2

3 User-Item Matrix Items Users 3

4 Matrix Factorization (MF) Items Items Users Users 4

5 Implicit Feedback Ubiquitous in today s online services Only positive feedback is available Traditional MF does not work Click Thumbs up Like 5

6 Matrix Factorization for Implicit Feedback Weighted Regularized Matrix Factorization (WRMF) [Hu08] Probabilistic Matrix Factorization (PMF) [Salakhutdinov08] Bayesian Personalized Ranking (BPR) [Rendle09] and many more 6

7 Think Beyond Matrix Explicit Implicit No longer about estimating ratings But about modeling the relationships between different user/item pairs 7

8 Think Beyond Matrix Explicit Implicit No longer about estimating ratings But about modeling the relationships between different user/item pairs 8

9 Metric Learning Unknown relationships Known relationships 9

10 Collaborative Metric Learning Learn a joint user-item distance metric. The Euclidean distances reflect the relationships between users/items. 10

11 Based on the inherent Triangular Inequality of Metric Learning If A is close to B, and B is close to C, then A is close to C. Fit the model with implicit feedback 1. An user is pulled closer to the items she liked 2. Other similar users are pulled closer. 3. The items users liked are also pulled closer. Top-K recommendations are simply KNN search (a well-optimized task) 11

12 Based on the inherent Triangular Inequality of Metric Learning If A is close to B, and B is close to C, then A is close to C. Fit the model with implicit feedback 1. An user is pulled closer to the items she liked 2. Other similar users are pulled closer. 3. The items users liked are also pulled closer. Top-K recommendations are simply KNN search (a well-optimized task) 12

Based on the inherent Triangular Inequality of Metric Learning If A is close to B, and B is close to C, then A is close to C. Fit the model with implicit feedback 1.

13 Based on the inherent Triangular Inequality of Metric Learning If A is close to B, and B is close to C, then A is close to C. Fit the model with implicit feedback 1. An user is pulled closer to the items she liked 2. Other similar users are pulled closer. 3. The items users liked are also pulled closer. Top-K recommendations are simply KNN search (a well-optimized task) 13

14 Collaborative Large Margin Nearest Neighbor Before Safety Margin After User Positive item Imposter Gradients * The outline of figure is inspired by Weinberger, Kilian Q., John Blitzer, and Lawrence Saul. "Distance metric learning for large margin nearest neighbor classification." Advances in neural information processing systems 18 (2006):

15 Pitfalls of Matrix Factorization (Dot-Product) Dot-Product violates triangle inequality misleading embedding. 15

16 Pitfalls of Matrix Factorization (Dot-Product) Dot-Product violates triangle inequality misleading embedding. V # $ V % = 0: does not reflect that they are both liked by U * U # $ U % = 0: does not reflect that they both share the same interest as U * 16

17 Collaborative Metric Learning Embedding Euclidian distance faithfully reflects the relative relationships. 17

18 Integrating Item Features Use a learnable function (e.g. Multi-Layer Perceptron) to project features into user-item embedding. Treat the projections as a prior for items' locations. 18

19 Evaluation 6 Datasets from Different Domains Papers - CiteULike Books - BookCrossing Photography - Flickr Articles - Medium Movies - MovieLens Music - EchoNest 19

20 Accuracy 100 Improvements Over BPR (%) * * * * CiteULike BookCX Flickr Medium MovieLens EchoNest WRMF WARP CML * Indicate that CML > the second best algorithm is statistically significant according to Wilcoxon signed rank test 20

21 Accuracy (with Item Features) 120 Improvements Over Factorization Machine (%) * * * CiteULike BookCX Flickr Medium MovieLens VBPR CDL CML+F * Indicate that CML > the second best algorithm is statistically significant according to Wilcoxon signed rank test 21

22 Efficiency All optimized with LSHs 8x faster CML s throughput is improved by 106x with only 2% reduction in accuracy Over 8x faster than (optimized) MF models given the same accuracy s are brute force search 22

23 Embedding Interpretability A A B C B C 23

24 Conclusions The notion of user-item matrix and matrix factorization becomes less applicable with implicit feedback. CML is a metric learning model that has better accuracy, efficiency, interpretability, and extensibility. Applying metric-based algorithms, such as K-means, and SVMs, to other recommendation problems. 24

25 Thank you! AOL CONNECTED EXPERIENCES LAB CORNELL TECH 25

A Brief Review of Representation Learning in Recommender 赵鑫 RUC

A Brief Review of Representation Learning in Recommender Systems @ 赵鑫 RUC batmanfly@qq.com Representation learning Overview of recommender systems Tasks Rating prediction Item recommendation Basic models