A Brief Review of Representation Learning in Recommender Systems
Wayne Xin Zhao (赵鑫), RUC
batmanfly@qq.com
Representation learning
Overview of recommender systems
Tasks: rating prediction, item recommendation
Basic models: MF, LibFM
Rating Prediction
User-item rating matrix ("?" = unknown):
       i1   i2
  u1   1    ?
  u2   ?    5
  u3   3    ?
  u4   ?    2
Evaluation: online test vs. offline test
Item Recommendation
User-item interaction matrix ("?" = unknown):
       i1   i2
  u1   yes  ?
  u2   ?    yes
  u3   no   ?
  u4   yes  yes
Evaluation: online test vs. offline test; retrieval-based metrics, e.g., P@k, R@k, MAP
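Since the slide only names the offline metrics, here is a minimal Python sketch of how P@k, R@k, and AP (whose mean over test users is MAP) are typically computed; the function names are illustrative.

```python
# Minimal sketches of the retrieval-based metrics named above.
def precision_at_k(ranked_items, relevant, k):
    """P@k: fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked_items, relevant, k):
    """R@k: fraction of all relevant items retrieved within the top k."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def average_precision(ranked_items, relevant):
    """AP: average of P@k over the ranks k where a relevant item appears;
    MAP is the mean of AP over all test users."""
    score, hits = 0.0, 0
    for k, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

# Example: 4 recommended items, 2 of them relevant.
print(precision_at_k(["i3", "i7", "i1", "i9"], {"i7", "i9"}, k=2))  # 0.5
```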
Context-Aware Recommendation: when you have more information about users and items
More complicated tasks
Practical Considerations
Rating Prediction
User-item rating matrix ("?" = unknown):
       i1   i2
  u1   1    ?
  u2   ?    5
  u3   3    ?
  u4   ?    2
Evaluation: online test vs. offline test
Latent Factor Models
Matrix factorization
A Basic Model
A Basic Model: another formulation
Probabilistic Matrix Factorization
Probabilistic Matrix Factorization
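For reference, the basic latent factor model and its probabilistic (PMF) reading, following Mnih and Salakhutdinov, can be written as:

```latex
% Basic MF: a rating is approximated by an inner product of latent factors
\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i
% Learning: regularized squared error over the observed ratings \mathcal{K}
\min_{P, Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i \right)^2
    + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right)
% PMF: Gaussian likelihood with Gaussian priors on the factors;
% MAP estimation recovers the regularized objective above
r_{ui} \sim \mathcal{N}(\mathbf{p}_u^\top \mathbf{q}_i, \sigma^2), \quad
\mathbf{p}_u \sim \mathcal{N}(\mathbf{0}, \sigma_P^2 \mathbf{I}), \quad
\mathbf{q}_i \sim \mathcal{N}(\mathbf{0}, \sigma_Q^2 \mathbf{I})
```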
Context-Aware Recommendation: when you have more information about users and items
LibFM
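For reference, LibFM implements Rendle's factorization machines; the degree-2 FM model equation is:

```latex
% Degree-2 factorization machine: linear terms plus pairwise interactions,
% with each interaction weight factorized as an inner product of k-dim
% vectors v_i, so sparse context features can be handled in O(kn)
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
    + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```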
Outline of the approaches
- Recommendation by network embedding
- Recommendation by word embedding
- Embedding as regularization
- Recommendation by TransE
- Recommendation by metric learning
- Recommendation by multi-modality fusion
What is network embedding?
- Map each node in a network into a low-dimensional space
- Distributed representation for nodes
- Similarity between nodes indicates the link strength
- Encode network information and generate node representations
Example: Zachary's Karate Network
Framework
LINE: First-order Proximity
[Figure: example graph with vertices 1-10; vertices 6 and 7 have a large first-order proximity]
The local pairwise proximity between the vertices, determined by the observed links.
However, many links between vertices are missing, so first-order proximity alone is not sufficient for preserving the entire network structure.
(From Jian Tang's slides)
LINE: Second-order Proximity
[Figure: example graph; vertices 5 and 6 have a large second-order proximity]
  $\vec{p}_5 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)$
  $\vec{p}_6 = (1, 1, 1, 1, 0, 0, 5, 0, 0, 0)$
The proximity between the neighborhood structures of the vertices.
Mathematically, the second-order proximity between a pair of vertices $(u, v)$ is determined by their neighborhood weight vectors:
  $\vec{p}_u = (w_{u1}, w_{u2}, \ldots, w_{u|V|})$ and $\vec{p}_v = (w_{v1}, w_{v2}, \ldots, w_{v|V|})$
(From Jian Tang's slides)
LINE: Preserving the First-order Proximity
Given an undirected edge $(v_i, v_j)$, the joint probability of $(v_i, v_j)$:
  $p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^\top \vec{u}_j)}$, where $\vec{u}_i$ is the embedding of vertex $v_i$
Empirical distribution: $\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{\sum_{(i',j')} w_{i'j'}}$
Objective (KL-divergence between the two distributions):
  $O_1 = d(\hat{p}_1(\cdot,\cdot), p_1(\cdot,\cdot)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$
(From Jian Tang's slides)
LINE: Preserving the Second-order Proximity
Given a directed edge $(v_i, v_j)$, the conditional probability of $v_j$ given $v_i$:
  $p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^\top \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^\top \vec{u}_i)}$
  $\vec{u}_i$: embedding of vertex $i$ as a source node; $\vec{u}_i'$: embedding of vertex $i$ as a target node
Empirical distribution: $\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{\sum_{k \in V} w_{ik}}$
Objective: $O_2 = \sum_{i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i), p_2(\cdot \mid v_i)) \propto -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)$
  $\lambda_i$: prestige of vertex $i$ in the network, $\lambda_i = \sum_j w_{ij}$
(From Jian Tang's slides)
LINE: Preserving Both Proximities
Concatenate the embeddings learned individually for the two proximities (first-order and second-order).
(From Jian Tang's slides)
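Below is a minimal numpy sketch of optimizing LINE's first-order objective with negative sampling (the practical approximation used in the LINE paper, rather than the full KL form above); the toy graph and all names are illustrative.

```python
# Minimal sketch of LINE, first-order proximity with negative sampling.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim, lr, n_neg = 10, 8, 0.05, 3
# Toy weighted edge list (i, j, w_ij), 0-indexed: the (5, 6) edge mirrors
# the strongly tied vertices "6" and "7" from the example slide.
edges = [(5, 6, 5.0),
         (0, 4, 1.0), (1, 4, 1.0), (2, 4, 1.0), (3, 4, 1.0),
         (0, 5, 1.0), (1, 5, 1.0), (2, 5, 1.0), (3, 5, 1.0)]
U = rng.normal(scale=0.1, size=(num_nodes, dim))  # one embedding per vertex

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):
    for i, j, w in edges:
        ui, uj = U[i].copy(), U[j].copy()
        # Positive edge: raise p1(vi, vj) = sigma(ui . uj), weighted by w_ij.
        g = w * (1.0 - sigmoid(ui @ uj))
        U[i] += lr * g * uj
        U[j] += lr * g * ui
        # Negative samples: lower sigma(ui . uk) for random vertices k.
        for k in rng.integers(0, num_nodes, size=n_neg):
            g = w * sigmoid(ui @ U[k])
            U[i] -= lr * g * U[k]
            U[k] -= lr * g * ui

print(sigmoid(U[5] @ U[6]))  # strongly tied vertices end up close
```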
Recommendation by network embedding: Learning Distributed Representations for Recommender Systems with a Network Embedding Approach (Zhao et al., AIRS 2016). Motivation
Recommendation by network embedding Given any edge in the network
Recommendation by network embedding User-item recommendation
Recommendation by network embedding User-item-tag recommendation
Outline of the approaches
- Recommendation by network embedding
- Recommendation by word embedding
- Embedding as regularization
- Recommendation by TransE
- Recommendation by metric learning
- Recommendation by multi-modality fusion
Recommendation by word embedding
Recall word2vec:
- Input: a sequence of words from a vocabulary V
- Output: a fixed-length vector v_w for each term w in the vocabulary
- It implements the idea of distributional semantics using a shallow neural network model.
Recommendation by word embedding
Generalized token2vec:
- Input: a sequence of symbol tokens from a vocabulary V
- Output: a fixed-length vector v_w for each symbol w in the vocabulary
- Any sequence whose tokens are sensitive to their surrounding contexts can potentially be modeled with word2vec.
Recommendation by word embedding
POI data modeling:
- Check-in information: user ID, location ID, check-in time, category label/name, GPS information
- User connections
A sequential way to model POI data
Given a user u, a trajectory is the sequence of check-in records related to u:
  User ID   Location ID   Check-in Timestamp
  u1        l181          2016-08-26 9:26am
  u1        l32           2016-08-26 10:26am
  u1        l323          2016-08-25 11:26am
  u1        l32323        2016-08-25 1:26pm
  u2        l345          2016-08-26 9:16am
  u2        l13           2016-08-26 10:36am
A sequential way to model POI data
Given a user u, a trajectory is the sequence of check-in records related to u:
  User ID   Location ID   Check-in Timestamp
  u1        l181          2016-08-26 9:26am
  u1        l32           2016-08-26 10:26am
  u1        l323          2016-08-25 11:26am
  u1        l32323        2016-08-25 1:26pm
  u2        l345          2016-08-26 9:16am
  u2        l13           2016-08-26 10:36am
Resulting location sequences:
  u1: l181 → l32 → l323 → l32323
  u2: l345 → l13
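As a concrete illustration of the token2vec idea, each trajectory can be fed to an off-the-shelf word2vec implementation as a "sentence" of location tokens; this gensim sketch is illustrative only and is not the customized model of the papers discussed here.

```python
# Treat each user's trajectory as a sentence of location tokens and train
# skip-gram word2vec on the corpus of trajectories.
from gensim.models import Word2Vec

trajectories = [
    ["l181", "l32", "l323", "l32323"],  # user u1
    ["l345", "l13"],                    # user u2
]
model = Word2Vec(sentences=trajectories, vector_size=32, window=2,
                 min_count=1, sg=1, negative=5, epochs=50)
print(model.wv["l32"])                       # embedding of location l32
print(model.wv.most_similar("l32", topn=2))  # nearby locations in the space
```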
Task
- Input: check-in sequences together with user relations
- Output: embedding representations for users, locations, and other related information
(Zhao et al., ACM TKDD 2017)
Recall CBOW
CBOW predicts the current word from its surrounding contexts: $\Pr(w_t \mid \mathrm{context}(w_t))$, with window size $2c$ and $\mathrm{context}(w_t) = [w_{t-c}, \ldots, w_{t+c}]$.
Modeling sequential relatedness: a direct application of doc2vec
Modeling social connectedness: a skip-gram way to model all of a user's friends
A joint model to characterize trajectories and links: jointly optimizing the two loss functions
Modeling multi-grained sequential contexts
A long trajectory sequence can be split into multiple segments:
  User ID   Location ID   Check-in Timestamp
  u1        l181          2016-08-26 9:26am
  u1        l32           2016-08-26 10:26am
  u1        l323          2016-08-25 11:26am
  u1        l32323        2016-08-25 1:26pm
Segments:
  u1: s1 → s2
  s1: l181 → l32
  s2: l323 → l32323
Modeling multi-grained sequential contexts
Modeling segment-level relatedness:
  u1: s1 → s2
  s1: l181 → l32
  s2: l323 → l32323
Modeling multi-grained sequential contexts
Modeling location-level relatedness:
  u1: s1 → s2
  s1: l181 → l32
  s2: l323 → l32323
The joint hierarchical model: jointly optimizing the three objective functions
Recommendation by word embedding
Token2vec for product recommendation via doc2vec (Zhao et al., IEEE TKDE 2016):
  doc → user, word → product
A user-profiling approach.
Recommendation by word embedding Token2vec for next-basket recommendation (Wang et al., SIGIR 2015)
Outline of the approaches
- Recommendation by network embedding
- Recommendation by word embedding
- Embedding as regularization
- Recommendation by TransE
- Recommendation by metric learning
- Recommendation by multi-modality fusion
Matrix factorization: motivation
- MF mainly captures user-item interactions
- Item co-occurrence across users is ignored
(Liang et al., RecSys 2016)
Item embedding: motivation
- Levy and Goldberg showed an equivalence between skip-gram word2vec trained with k negative samples and implicitly factorizing the pointwise mutual information (PMI) matrix shifted by log k.
- Analogously, we can factorize the item co-occurrence matrix to obtain item embeddings.
The joint model: MF with embedding regularization
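In CoFactor (Liang et al., 2016), MF's squared loss is combined with factorizing this shifted PMI matrix using the same item factors. Below is a minimal numpy sketch of building the shifted positive PMI (SPPMI) matrix from item co-occurrence counts (illustrative, not the authors' code; k is the negative-sampling value from the Levy-Goldberg equivalence).

```python
import numpy as np

def sppmi(cooc, k=1):
    """Shifted positive PMI from an item-item co-occurrence count matrix."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)   # marginal counts of item i
    col = cooc.sum(axis=0, keepdims=True)   # marginal counts of item j
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    pmi = np.where(cooc > 0, pmi, -np.inf)   # undefined PMI clips to zero
    return np.maximum(pmi - np.log(k), 0.0)  # shift by log k, keep positives

# Toy example: items 0 and 1 co-occur often across users' histories.
cooc = np.array([[0., 8., 1.],
                 [8., 0., 1.],
                 [1., 1., 0.]])
print(sppmi(cooc, k=2))
```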
TransE: characterizing triple relations. TransE models a knowledge triple (h, r, t) by requiring h + r ≈ t in the embedding space.
Next recommendation scenario: what's the next movie to watch? (He et al., RecSys 2017)
Next recommendation scenario: a traditional method uses Markov chains and factorized Markov chains.
Next recommendation scenario: a TransE-based approach.
Next recommendation scenario: a TransE-based approach (cont.)
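A minimal sketch of the translation-based scoring idea in He et al. (2017): each user is a translation vector, and a plausible next item lies close to (previous item + user vector). This omits the paper's bias terms and training details; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 3, 5, 8
T = rng.normal(scale=0.1, size=(n_users, dim))  # per-user translation vectors
E = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings

def next_item_scores(u, prev_item):
    """Score candidates by -||e_prev + t_u - e_j||: closer means better."""
    target = E[prev_item] + T[u]
    return -np.linalg.norm(E - target, axis=1)

scores = next_item_scores(u=0, prev_item=2)
print(np.argsort(-scores)[:3])  # top-3 next-item recommendations
```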
Outline of the approaches
- Recommendation by network embedding
- Recommendation by word embedding
- Embedding as regularization
- Recommendation by TransE
- Recommendation by metric learning
- Recommendation by multi-modality fusion
Metric learning for recommendation
Metric: a metric on a set X is a function d: X × X → [0, ∞) satisfying, for all x, y, z ∈ X:
- d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y
- d(x, y) = d(y, x) (symmetry)
- d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
Metric learning for recommendation
Metric learning: the original approach learns a Mahalanobis distance $d_A(x, y) = \sqrt{(x - y)^\top A (x - y)}$ with $A \succeq 0$, and the objective function pulls similar pairs together while keeping dissimilar pairs apart.
Metric learning for recommendation
Metric learning for kNN: large margin nearest neighbor (LMNN), trained with a pull loss and a push loss (standard forms below).
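For reference, the standard LMNN losses (Weinberger and Saul) take the following form, where j ⇝ i denotes a target neighbor of x_i and y_il indicates whether x_i and x_l share a label:

```latex
% Pull loss: draw each point toward its target neighbors j ~> i
\mathcal{L}_{\mathrm{pull}} = \sum_{j \rightsquigarrow i} d(\mathbf{x}_i, \mathbf{x}_j)^2
% Push loss: hinge penalty on differently labeled impostors l
\mathcal{L}_{\mathrm{push}} = \sum_{j \rightsquigarrow i} \sum_{l} (1 - y_{il})
    \left[ 1 + d(\mathbf{x}_i, \mathbf{x}_j)^2 - d(\mathbf{x}_i, \mathbf{x}_l)^2 \right]_+
```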
Metric learning for recommendation
Representation-based metric learning: learn user and item points in a joint metric space, with a distance function between users and items and a hinge-style loss function over observed pairs (Hsieh et al., WWW 2017).
Metric learning for recommendation
Representation-based metric learning: improving the representations by integrating item features as a regularizer; the joint loss combines the metric loss with this regularization (a minimal training sketch follows).
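Below is a minimal numpy sketch of the CML hinge objective (Hsieh et al., 2017); it omits the paper's rank-based weighting, covariance regularization, and item-feature loss, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim, margin, lr = 4, 6, 8, 0.5, 0.05
U = rng.normal(scale=0.1, size=(n_users, dim))        # user points
V = rng.normal(scale=0.1, size=(n_items, dim))        # item points
positives = [(0, 1), (0, 2), (1, 3), (2, 0), (3, 5)]  # observed (user, item)

def clip_to_unit_ball(X):
    """CML bounds the space by keeping all points inside the unit sphere."""
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
    return X / norms

for _ in range(500):
    u, i = positives[rng.integers(len(positives))]
    j = int(rng.integers(n_items))                    # random negative item
    if (u, j) in positives:
        continue
    d_pos, d_neg = U[u] - V[i], U[u] - V[j]
    # Hinge loss: [margin + ||u - i||^2 - ||u - j||^2]_+
    if margin + d_pos @ d_pos - d_neg @ d_neg > 0:
        U[u] -= lr * 2 * (d_pos - d_neg)
        V[i] += lr * 2 * d_pos
        V[j] -= lr * 2 * d_neg
    U, V = clip_to_unit_ball(U), clip_to_unit_ball(V)

print(np.linalg.norm(U[0] - V[1]))  # observed pair pulled close
```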
Outline of the approaches
- Recommendation by network embedding
- Recommendation by word embedding
- Embedding as regularization
- Recommendation by TransE
- Recommendation by metric learning
- Recommendation by multi-modality fusion
Multi-modality representation Rich side information
Multi-modality representation Rich side information Zhang et al., KDD 2016
Multi-modality representation Rich side information Modeling KB information
Multi-modality representation Rich side information Modeling text information
Multi-modality representation Rich side information Modeling image information
Multi-modality representation Rich side information Generative process
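In CKE (Zhang et al., KDD 2016), the item latent vector sums a collaborative offset with the three modality encodings. A schematic sketch, with random placeholder vectors standing in for the learned TransR/SDAE/SCAE encoders:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
offset = rng.normal(scale=0.1, size=dim)      # item-specific latent offset
structural = rng.normal(scale=0.1, size=dim)  # TransR entity embedding (KB)
textual = rng.normal(scale=0.1, size=dim)     # SDAE encoding of the text
visual = rng.normal(scale=0.1, size=dim)      # SCAE encoding of the image

item_vec = offset + structural + textual + visual  # fused item representation
user_vec = rng.normal(scale=0.1, size=dim)
print(user_vec @ item_vec)  # preference score, as in plain MF
```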
Multi-modality representation Complementary effect of visual and textual features Chen et al., to appear in AIRS 2017
Multi-modality representation A Multi-task learning method Chen et al., to appear in AIRS 2017
Future work
ItemKNN → MF (SVD++) → BPR → FM → ?
- Why do SVD++, BPR, and FM perform so consistently well across various datasets?
- How can recommender systems borrow ideas from representation learning and deep learning?
- What are the future directions for recommender systems?
Thanks

References:
- Wayne Xin Zhao, Sui Li, Yulan He, Edward Y. Chang, Ji-Rong Wen, Xiaoming Li. Connecting Social Media to E-Commerce: Cold-Start Product Recommendation Using Microblogging Information. IEEE Trans. Knowl. Data Eng. 28(5): 1147-1159 (2016)
- Pengfei Wang, Jiafeng Guo, Yanyan Lan, Jun Xu, Shengxian Wan, Xueqi Cheng. Learning Hierarchical Representation Model for Next Basket Recommendation. SIGIR 2015: 403-412
- Xu Chen, Yongfeng Zhang, Wayne Xin Zhao, Zheng Qin. A Collaborative Neural Model for Rating Prediction by Leveraging User Reviews and Product Images. To appear in AIRS 2017
- Wayne Xin Zhao, Feifan Fan, Ji-Rong Wen, Edward Y. Chang. Joint Representation Learning for Location-based Social Networks with Multi-Grained Sequential Contexts. To appear in ACM TKDD
- Dawen Liang, Jaan Altosaar, Laurent Charlin, David M. Blei. Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence. RecSys 2016
- Ruining He, Wang-Cheng Kang, Julian McAuley. Translation-based Recommendation. RecSys 2017: 161-169
- Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge J. Belongie, Deborah Estrin. Collaborative Metric Learning. WWW 2017: 193-201
- Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, Wei-Ying Ma. Collaborative Knowledge Base Embedding for Recommender Systems. KDD 2016: 353-362