Hierarchical Mixed Neural Network for Joint Representation Learning of Social-Attribute Network


Weizheng Chen 1(B), Jinpeng Wang 2, Zhuoxuan Jiang 1, Yan Zhang 1, and Xiaoming Li 1
1 School of Electronics Engineering and Computer Science, Peking University, Beijing, China
cwz.pku@gmail.com, jzhx1211@gmail.com, zhyzhy001@gmail.com, lxm.at.pku@gmail.com
2 Microsoft Research, Beijing, China
jinpwa@microsoft.com

© Springer International Publishing AG 2017. J. Kim et al. (Eds.): PAKDD 2017, Part I, LNAI 10234.

Abstract. Most existing network representation learning (NRL) methods are designed for homogeneous networks and consider only the topological properties of networks. However, in real-world networks, text or categorical attributes are usually associated with nodes, providing another description of the network from a different perspective. In this paper, we present a joint learning approach which learns the representations of nodes and attributes in the same low-dimensional vector space simultaneously. In particular, we show that more discriminative node representations can be acquired by leveraging attribute features. Experiments conducted on three social-attribute network datasets demonstrate that our model significantly outperforms several state-of-the-art baselines on the node classification and network visualization tasks.

Keywords: Social-attribute network · Representation learning · Joint learning

1 Introduction

The growth of online social media produces massive amounts of user-generated content, such as tweets posted on Twitter and personal profiles on LinkedIn. These data and the social relations among users make up a complex heterogeneous information network [16], which is an effective way of organizing multi-source data. Mining such heterogeneous information networks is crucial for various research tasks and commercial applications, for example, node classification [9] and product recommendation [15].

During the last few years, representation learning, also known as embedding, has become a promising and powerful tool in network analysis. Since the density of a typical real-world social network is usually quite small, traditional network representations such as the adjacency matrix suffer from the data sparsity problem. Thus, most statistical machine learning algorithms cannot be applied directly to network analysis tasks. To overcome this problem, many NRL methods which aim to project the nodes into a low-dimensional continuous vector space have been proposed. Among the representative methods, DeepWalk [14], LINE [17], GraRep [3] and node2vec [8] learn general node representations for homogeneous networks, not tuned for any specific task, by considering only structural features.

However, both text and categorical attributes play a crucial role in real-life networks. For example, papers or authors in citation networks are associated with text content, and users on Twitter or Facebook have profiles with categorical attributes such as gender and job. It is therefore necessary to uncover the potential effect of node attributes in the NRL process. To utilize the rich text content of nodes, Yang et al. [20] present Text-associated DeepWalk (TADW), which learns network representations from both network structure and text attributes in an inductive matrix completion framework. Because of the high dimensionality of text attributes (namely words), TADW performs singular value decomposition on the node-attribute matrix to obtain robust attribute features. However, TADW has two serious weaknesses:

1. Unlike text attributes, the space of categorical attributes is low-dimensional. For example, only up to dozens of demographic attributes appear in the mobile social network [6] and the Twitter network [4], which makes TADW unsuitable in such a scenario.
2. The performance of TADW falls quickly if the texts of some nodes are missing. Note that text information is often incomplete. For example, in online social media, the text features of some users are difficult to obtain because of their privacy settings or because they never publish any text.
To overcome the above problems, in this paper we propose Social-Attribute Network Representation Learning (SANRL), a scalable joint NRL framework which preserves both structural and attribute information in unified representations. Compared with TADW, which only learns representations of nodes, our model also learns representations of attributes in the same low-dimensional vector space. A hierarchical mixed neural network model is adopted to model the interactive relationship between the nodes and the attributes. We conduct experiments with three real-world network datasets: a social network with categorical attributes, a co-author network with text attributes and a citation network with text attributes. In summary, this paper makes the following three major contributions:

1. We propose a network representation learning model for heterogeneous social-attribute networks, which can handle either categorical attributes or text attributes. Our model can make use of limited attribute information through a coupled architecture, which makes it more flexible in real scenarios.
2. Our model maps the nodes and the attributes to the same space, which provides meaningful features for various applications.
3. The experimental results show that our model significantly outperforms other competitive baselines.

2 Related Work

Most existing unsupervised NRL models only consider network structure information. These models are mainly based on two classical assumptions. The first, called the smoothness assumption, is only suitable for undirected networks: two linked nodes should have close representations. Laplacian Eigenmaps [1], LINE with first-order proximity and SDNE [19] all adopt the smoothness assumption. The second is also applicable to directed networks: two nodes should have close representations if they have similar neighbors. Here, the neighbors can be high-order. DeepWalk, GraRep and node2vec all adopt this assumption. To incorporate text attribute information, TADW adopts the inductive matrix completion technique to incorporate the tf-idf matrix into DeepWalk. As far as we know, how to use the categorical attributes of nodes has not been studied yet.

Recently, learning network representations in a semi-supervised way by incorporating label information has drawn much attention [11, 18, 21]. All these models are trained in a transductive way, which means that they combine an unsupervised NRL model with a classifier trained on the labeled nodes. However, all these models only learn distributed representations of the nodes. Unlike semi-supervised NRL models, which leverage label information to get more discriminative node representations, our motivation is to enhance node representations with rich categorical or text attributes by jointly projecting the nodes and the attributes into the same vector space. In this paper, we follow the line of the unsupervised NRL models. In the future, we will explore a semi-supervised extension of our model by adding a classifier to it.

3 Problem Definition

We adopt a social-attribute network (SAN) [7] $G = (V, E, A, C)$ to represent an attributed network in this work. $V$ is the set of social nodes, each representing a data object.
$E$ is the set of social links, each representing an edge between two social nodes. $A$ is the set of attribute nodes, each representing an attribute. $C$ is the set of attribute links, each representing an affiliation relationship between an attribute node and a social node. For simplicity, we use $A(v) = \{a \mid (a, v) \in C\}$ to denote the attribute set of node $v \in V$. All attribute links are undirected, while social links can be directed or undirected and weighted or unweighted, depending on the data type. Moreover, the weight of an attribute link is defined as the relative importance of that attribute for the corresponding social node. The left side of Fig. 1 shows a sample SAN with five social nodes and four attribute nodes.

Our research target is to learn an informative continuous vector representation $u \in \mathbb{R}^d$ for each social node of a SAN, where $d \ll |V|$. In the vector space $\mathbb{R}^d$, both structural information and attribute information should be preserved. The learned low-dimensional representations can then be used as input features for a variety of machine learning models, such as logistic regression for classification or k-means for clustering. For simplicity and clarity, in the rest of the paper "node" refers to a social node and "attribute" refers to an attribute node.

Fig. 1. Illustration of learning representations jointly for a toy social-attribute network. (Left: an attribute layer with attribute nodes a1 to a4 and a social layer with social nodes v1 to v5, connected by attribute links and social links. Right: the vector space obtained by representation learning.)

Fig. 2. Mixed binary Huffman tree used in SANRL. (Leaves: social nodes v1 to v4 and attribute nodes a1 to a4. Input: $u_v$ or $u_a$.)

4 SANRL Model

4.1 Loss Function for Structural Information

We adapt DeepWalk to learn representations of social nodes based on the assumption that nodes which have similar neighborhoods will have similar representations. Here, neighborhoods refer to both direct neighbors and higher-order neighbors. Next, we give a brief outline of DeepWalk.

Given a social network $G = (V, E)$ (e.g., the social layer of Fig. 1), DeepWalk generates many node sequences as training data. More specifically, starting from a node $v_1$, DeepWalk generates $\gamma$ fixed-length sequences of nodes $S_k^{v_1} = \{v_1, v_2, \ldots, v_t\}$ through random walks, for $1 \le k \le \gamma$. Repeating this process for every node $v \in V$ gives a set $S_{seq}$ which contains $\gamma |V|$ node sequences. By feeding the shuffled $S_{seq}$ to Skip-Gram [12], an efficient and scalable method for learning word representations, the following objective loss function is minimized:

$$
\begin{aligned}
O_1 &= -\log P(S_{seq}) = -\sum_{v \in V} \sum_{1 \le k \le \gamma} \log P(S_k^v) \\
    &= -\sum_{v \in V} \sum_{1 \le k \le \gamma} \sum_{v_i \in S_k^v} \log P(v_{i-w}, \ldots, v_{i-1}, v_{i+1}, \ldots, v_{i+w} \mid v_i) \\
    &= -\sum_{v \in V} \sum_{1 \le k \le \gamma} \sum_{v_i \in S_k^v} \sum_{i-w \le j \le i+w,\, j \ne i} \log P(v_j \mid v_i),
\end{aligned}
\tag{1}
$$

where $w$ is the size of the sliding window and $P(v_j \mid v_i)$ is formulated using the softmax function:

$$
P(v_j \mid v_i) = \frac{\exp({u'_{v_j}}^{T} u_{v_i})}{\sum_{v \in V} \exp({u'_{v}}^{T} u_{v_i})},
\tag{2}
$$

where $u_v$ and $u'_v$ are the input and output representations of social node $v$.
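As an illustration of how the training data described above could be produced, the following Python sketch generates $\gamma$ random walks of length at most $t$ per node and extracts the $(v_i, v_j)$ pairs that fall inside a sliding window of size $w$. The function names, the dictionary-based adjacency format and the toy graph are assumptions made for this example and are not taken from the authors' implementation. Edge weights are used as transition weights, so an unweighted graph simply uses weight 1.

```python
import random


def random_walk(adj, start, t, rng):
    """Return one random walk of at most t nodes starting at `start`.

    adj maps each node to a dict {neighbor: edge weight} (assumed format).
    """
    walk = [start]
    while len(walk) < t:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def build_corpus(adj, gamma, t, seed=0):
    """Generate gamma walks per node and shuffle them, giving gamma * |V| sequences."""
    rng = random.Random(seed)
    corpus = [random_walk(adj, v, t, rng) for v in adj for _ in range(gamma)]
    rng.shuffle(corpus)
    return corpus


def skipgram_pairs(walk, w):
    """List the (center, context) pairs with |i - j| <= w and j != i."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - w), min(len(walk), i + w + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy undirected, weighted social layer (hypothetical data).
adj = {
    "v1": {"v2": 1.0, "v3": 2.0},
    "v2": {"v1": 1.0},
    "v3": {"v1": 2.0, "v4": 1.0},
    "v4": {"v3": 1.0},
}
corpus = build_corpus(adj, gamma=2, t=4)
training_pairs = [p for walk in corpus for p in skipgram_pairs(walk, w=2)]
```

Each pair in `training_pairs` contributes one $\log P(v_j \mid v_i)$ term to the innermost sum of Eq. (1).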

In the context of a SAN, we replace Eq. (2) with the following Eq. (3) in $O_1$ to take attribute nodes into consideration:

$$
P(v_j \mid v_i) = \frac{\exp({u'_{v_j}}^{T} u_{v_i})}{\sum_{v \in V} \exp({u'_{v}}^{T} u_{v_i}) + \sum_{a \in A} \exp({u'_{a}}^{T} u_{v_i})}.
\tag{3}
$$

Here $u_a$ and $u'_a$ are the input and output representations of attribute node $a$. In addition, the original DeepWalk is only applicable to unweighted networks. Therefore, weighted random walks are adopted in the sampling process to handle weighted networks in our model.

4.2 Loss Function for Attribute Information

Considering a social node $v$ with categorical attribute set $A(v)$, we use co-occurrence patterns to model $P(A(v))$:

$$
\log P(A(v)) = \sum_{a_i \in A(v)} \Big\{ \log P(a_i \mid v) + \log P(v \mid a_i) + \sum_{a_j \in A(v),\, j \ne i} \log P(a_j \mid a_i) \Big\},
\tag{4}
$$

where $P(a_i \mid v_j)$, $P(v_j \mid a_i)$ and $P(a_j \mid a_i)$ are defined as:

$$
P(a_i \mid v_j) = \frac{\exp({u'_{a_i}}^{T} u_{v_j})}{\sum_{v \in V} \exp({u'_{v}}^{T} u_{v_j}) + \sum_{a \in A} \exp({u'_{a}}^{T} u_{v_j})},
\tag{5}
$$

$$
P(v_j \mid a_i) = \frac{\exp({u'_{v_j}}^{T} u_{a_i})}{\sum_{v \in V} \exp({u'_{v}}^{T} u_{a_i}) + \sum_{a \in A} \exp({u'_{a}}^{T} u_{a_i})},
\tag{6}
$$

$$
P(a_j \mid a_i) = \frac{\exp({u'_{a_j}}^{T} u_{a_i})}{\sum_{v \in V} \exp({u'_{v}}^{T} u_{a_i}) + \sum_{a \in A} \exp({u'_{a}}^{T} u_{a_i})}.
\tag{7}
$$

These three kinds of conditional probability capture different similarities in the SAN. Firstly, by maximizing $P(a_i \mid v_j)$, nodes with many similar attributes will tend to have close representations, which is the key idea of the PV-DBOW model [10]. Secondly, by maximizing $P(a_j \mid a_i)$, paradigmatic relations are modeled. As a consequence, attributes with many similar contexts will tend to have close representations, which is the key idea of the Word2Vec model [12]. Finally, by maximizing $P(v_j \mid a_i)$, syntagmatic relations are modeled, so attributes that often co-occur will tend to have close representations.

Unlike categorical attributes, text attributes have word order and word frequency as essential properties, which are not captured in Eq. (4). To make Eq. (4) suitable for both categorical and text attributes, we replace $A(v)$ by an attribute set sampled from $A(v)$.
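To make the shared normalization in Eqs. (3) and (5)-(7) concrete, the sketch below evaluates $P(\text{target} \mid \text{source})$ with a single softmax whose denominator runs over every social node and every attribute node. It is a minimal illustration under stated assumptions: the function name `mixed_softmax`, the dictionary layout and the two-dimensional toy vectors are invented for this example, and the full normalization is only practical for tiny inputs.

```python
import math


def mixed_softmax(target, source, emb_in, emb_out):
    """P(target | source) normalized over all nodes and attributes.

    emb_in / emb_out map every node and attribute id to its input (u)
    and output (u') vector; both tables share one key space (assumed layout).
    """
    u = emb_in[source]
    scores = {
        key: sum(a * b for a, b in zip(vec, u))  # u'_key^T u_source
        for key, vec in emb_out.items()
    }
    shift = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - shift) for s in scores.values())
    return math.exp(scores[target] - shift) / z


# Hypothetical toy embeddings: two social nodes and two attribute nodes.
emb_in = {"v1": [0.2, -0.1], "v2": [0.0, 0.3], "a1": [0.5, 0.5], "a2": [-0.4, 0.1]}
emb_out = {"v1": [0.1, 0.0], "v2": [-0.2, 0.4], "a1": [0.3, -0.3], "a2": [0.0, 0.2]}

p_node_given_node = mixed_softmax("v2", "v1", emb_in, emb_out)  # form of Eq. (3)
p_attr_given_node = mixed_softmax("a1", "v1", emb_in, emb_out)  # form of Eq. (5)
p_node_given_attr = mixed_softmax("v1", "a1", emb_in, emb_out)  # form of Eq. (6)
p_attr_given_attr = mixed_softmax("a2", "a1", emb_in, emb_out)  # form of Eq. (7)
```

Because nodes and attributes share one denominator, the four probabilities differ only in which vectors play the source and target roles.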
More specifically, given a node $v \in V$, we generate $\gamma$ fixed-length sequences of attributes $T_k^v = \{a_1, \ldots, a_w\}$ through random sampling, for $1 \le k \le \gamma$. For text attributes, $T_k^v$ is a text window of length $w$ sampled from the corresponding document of $v$. For categorical attributes, $T_k^v$ is generated by sampling $w$ attributes independently from $A(v)$. Using $T$ to denote the $\gamma |V|$ attribute sequences acquired in this procedure, we define the following objective loss function for attribute information:

$$
\begin{aligned}
O_2 &= -\log P(T) = -\sum_{v \in V} \sum_{1 \le k \le \gamma} \log P(T_k^v) \\
    &= -\sum_{v \in V} \sum_{1 \le k \le \gamma} \sum_{a_i \in T_k^v} \Big\{ \log P(a_i \mid v) + \log P(v \mid a_i) + \sum_{a_j \in A(v),\, j \ne i} \log P(a_j \mid a_i) \Big\}.
\end{aligned}
\tag{8}
$$

4.3 Loss Function of SANRL

To integrate both structure and attribute information into a joint representation learning framework, we use a weighted linear combination of $O_1$ and $O_2$ as the objective loss function of the SANRL model:

$$
O_{joint} = O_1 + \lambda O_2,
\tag{9}
$$

where $\lambda$ is a trade-off parameter. If the network has rich discriminative attributes, $\lambda$ should be large. Otherwise, a small $\lambda$ is suitable. By minimizing $O_{joint}$, we obtain two vectors $u_v$ and $u'_v$ for each node $v$, and two vectors $u_a$ and $u'_a$ for each attribute $a$. Finally, $u_v$ is used as the feature vector of $v$.

4.4 Learning and Complexity Analysis

The optimization scheme of the SANRL model is similar to that of DeepWalk: Skip-Gram is applied to maximize the co-occurrence probability among node-node pairs, node-attribute pairs, attribute-attribute pairs and attribute-node pairs. In practice, all the representation vectors are first initialized randomly. Then node sequences and attribute sequences are sampled alternately and iteratively as the input data streams. In our implementation, the effect of $\lambda$ is realized by controlling the sampling probability of attribute sequences. Using the back-propagation algorithm to estimate the derivatives, Eq. (9) can be optimized with the asynchronous stochastic gradient descent (ASGD) algorithm. Directly computing the softmax functions defined in Eqs. (3), (5), (6) and (7) is very expensive, so we use the hierarchical softmax technique to speed up training.
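Hierarchical softmax replaces the flat normalization with a sequence of binary decisions along a path in a Huffman tree, so the cost of one probability grows with the code length rather than with the number of leaves. The sketch below builds Huffman codes for a single tree whose leaves are both social nodes and attribute nodes. It is only an illustration: the function name, the frequency counts (assumed here to be occurrence counts in the sampled sequences) and the toy data are hypothetical, and the per-branch classifiers used during training are omitted.

```python
import heapq
import itertools


def huffman_codes(freq):
    """Return {item: binary code string} for a dict of positive frequencies."""
    if len(freq) == 1:
        return {item: "0" for item in freq}
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(counter), {item: ""}) for item, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f_left, _, left = heapq.heappop(heap)
        f_right, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code below them.
        merged = {item: "0" + code for item, code in left.items()}
        merged.update({item: "1" + code for item, code in right.items()})
        heapq.heappush(heap, (f_left + f_right, next(counter), merged))
    return heap[0][2]


# Hypothetical occurrence counts for social nodes (v*) and attribute nodes (a*).
freq = {"v1": 40, "v2": 25, "v3": 10, "a1": 15, "a2": 10}
codes = huffman_codes(freq)

# The code length of a leaf is the number of binary decisions needed to
# reach it from the root.
path_lengths = {item: len(code) for item, code in codes.items()}
```

Frequent items receive short codes, which is why a Huffman tree is a common choice for hierarchical softmax.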
In previous works [5, 13], hierarchical softmax is widely used to train similar neural networks, in which only one type of object, such as words or nodes, is mapped to the leaves of one binary Huffman tree. However, the application of hierarchical softmax in our model is very different. As shown in Fig. 2, every social node and every attribute node is associated with one leaf of a single mixed Huffman tree, and the input of this neural network can be the representation vector of either a node or an attribute. Thus, given an arbitrary training instance in the form of an input-output pair, the computational complexity of computing $P(\text{output} \mid \text{input})$ is reduced from $O(|V| + |A|)$ to $O(\log(|V| + |A|))$. Overall, the time complexity of SANRL is on the order of $\gamma |V| d (t + \lambda w) \log(|V| + |A|)$. Because $|A|$ is much smaller than $|V|$ in most cases, the computational complexity of SANRL is still acceptable.

5 Experiments

5.1 Dataset

Weibo Dataset. We crawled the detailed information of 294,634 users from Sina Weibo, the most popular microblogging service in China, including their profiles and 3,183,187 following relationships among them. In our Weibo dataset, the user profile contains four demographic variables:

Gender: male; female.
Age: 1-11; 12-17; 18-29; 30-44; 45-59; 60+.
Education: literature, history; natural science; engineering; economics; medicine; art.
Job: Internet industry; creative design industry; cultural media industry; public service industry; manufacturing industry; scientific research industry; pharmaceutical industry; business management industry.

Here, we regard every possible value of each demographic variable as a categorical attribute. Moreover, a user may have multiple education or job attributes, but the values of gender and age are exclusive. We build two SANs based on the Weibo dataset for the node classification task, i.e., the Weibo-education network and the Weibo-job network. In the Weibo-education network, education attributes are treated as labels, which are not used in the representation learning process. The Weibo-job network is organized in the same manner.

DBLP Dataset. We use the DBLP-four-area dataset provided in [9] to build a weighted co-author network. This dataset contains 20 major computer science conferences from four related areas, i.e., data mining, database, information retrieval and machine learning, together with 27,199 authors and all their publications in these conferences. If two authors have co-authored a paper, we add an undirected edge between them.
The weight of the edge is the number of papers they have co-authored. The resulting network has 66,832 edges. The titles of all the papers published by an author are treated as his or her text attributes. The size of the word vocabulary is 12,091. If an author publishes a paper in a certain conference, the research area of this conference is added to the author's label set.

5.2 Compared Algorithms

DeepWalk [14]. DeepWalk is the first work which adopts the neural network language model to solve the NRL problem.

LINE [17]. LINE learns two representation vectors for each node by optimizing two carefully designed objective functions that preserve the first-order proximity and the second-order proximity. The two representations are then concatenated as the final representation.

LDA [2]. Latent Dirichlet Allocation (LDA) is a classical probabilistic topic model. Each node can be represented as a topic distribution vector.

TADW [20]. TADW is a state-of-the-art NRL algorithm based on matrix decomposition. First, singular value decomposition is performed on the tf-idf matrix to get robust text features of nodes. Then, the text features and a node relation matrix are fed to an inductive matrix completion framework to get node representations.

SANRL. Our proposed method. For SANRL, the sliding window size is $w = 10$, the length of each node sequence is $t = 40$, and the number of node sequences per node is $\gamma = 80$. We set the trade-off parameter $\lambda = 2$ for the two Weibo networks and $\lambda = 8$ for the DBLP-author network.

SANRL_doc. A simplified version of SANRL, in which the final objective loss function is Eq. (8). We treat the attribute set of a node as its pseudo document. The parameter settings of SANRL_doc are the same as those of SANRL.

5.3 Node Classification

Table 1. Macro-F1 (%) of node classification on the Weibo-education network. (Columns: percentage of labeled nodes, from 10% to 90% in steps of 10%. Rows: DeepWalk, LINE, SANRL_doc, SANRL.)

Table 2. Macro-F1 (%) of node classification on the Weibo-job network. (Columns: percentage of labeled nodes, from 10% to 90% in steps of 10%. Rows: DeepWalk, LINE, SANRL_doc, SANRL.)

Following the settings in previous works [14, 17], we use the multi-label node classification task to evaluate the quality of the representation vectors learned by different models. The one-vs-the-rest logistic regression classifier implemented in LibLinear is used in our experiments. Macro-F1 is chosen as the evaluation metric.
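For readers unfamiliar with the metric, Macro-F1 computes an F1 score for each label separately and then averages the scores with equal weight, so rare labels count as much as frequent ones. The following Python sketch shows the computation for multi-label predictions. The function name and the example label sets are assumptions made for illustration, and the classifier itself is not covered.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores.

    y_true and y_pred are lists of label sets, one set per node.
    A label with no true and no predicted occurrence scores 0.0
    (a convention chosen for this sketch).
    """
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical multi-label example with four research-area labels.
labels = ["DM", "DB", "ML", "IR"]
y_true = [{"DM"}, {"DB", "ML"}, {"IR"}]
y_pred = [{"DM"}, {"DB"}, {"DM"}]
score = macro_f1(y_true, y_pred, labels)
```

In this example the per-label F1 scores are 2/3 for DM, 1 for DB and 0 for both ML and IR, so the average is 5/12.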
We follow the suggested parameter settings in the original papers for the baseline models. Note that, since there are no text attributes in the Weibo dataset, LDA and TADW are inapplicable to Weibo. Tables 1, 2, and 3 show the classification results with different training ratios on the three networks when $d = 128$ for all the models. All reported results are averaged over 10 runs.

Table 3. Macro-F1 (%) of node classification on the DBLP-author network. (Columns: percentage of labeled nodes, from 10% to 90% in steps of 10%. Rows: DeepWalk, LINE, SANRL_doc, SANRL, LDA, TADW.)

Firstly, we observe that SANRL always significantly outperforms the baselines. Compared with DeepWalk, SANRL achieves improvements of nearly 6% and 10% on the two Weibo networks and 13% on the DBLP-author network, which shows that network representations can be enhanced with either categorical attributes or text attributes.

Secondly, on the DBLP-author network, the two content-based methods, SANRL_doc and LDA, perform better than the structure-based methods, DeepWalk and LINE. By incorporating both attribute information and structure information, SANRL and TADW outperform the other four methods. The relative improvement of SANRL over TADW is around 3.5%, since attribute information and structure information are better balanced by the tunable trade-off parameter $\lambda$ in SANRL.

5.4 Parameter Sensitivity

We also explore the sensitivity of the performance with respect to the dimension $d$ and the trade-off parameter $\lambda$, taking the DBLP-author network as an example. With the training ratio set to 20%, we report the grid search results over $d$ and $\lambda$ on the DBLP-author network in Fig. 3. We observe that SANRL achieves its best performance when $\lambda = 8$ and $d = 256$. To further investigate the effect of increasing the dimension $d$, Fig. 4 shows the Micro-F1 and Macro-F1 curves of different models. As $d$ varies from 64 to 1024, SANRL consistently performs best.
In SANRL, we add a constraint to DeepWalk so that nodes with similar attributes have similar representations. Our motivation is based on the statistics shown in Fig. 5. The Jaccard coefficient is calculated between the attribute sets of two social nodes sampled from the network at random. We then compute the conditional probability that two nodes share a label given the corresponding discretized Jaccard coefficient. Our assumption is validated on all three networks. Note that the DBLP-author network has much more discriminative attributes than the two Weibo networks. That is why a smaller $\lambda$ is more suitable for the Weibo dataset.

Fig. 3. Grid search over dimension $d$ and $\lambda$. (Panels: Micro-F1 and Macro-F1 versus $\lambda$, one curve per dimension $d$.)

Fig. 4. Performance over dimension $d$. (Panels: Micro-F1 and Macro-F1 versus dimension for DeepWalk, LDA, LINE, SANRL_doc, TADW and SANRL.)

Fig. 5. Illustration of interdependencies between labels of users and attributes of users. (Panels: Weibo education, Weibo job, DBLP author. Horizontal axis: Jaccard coefficient in the bins [0, 0.2), [0.2, 0.4), [0.4, 0.6), [0.6, 0.8), [0.8, 1.0]. Vertical axis: probability of sharing labels.)

Fig. 6. Parameter sensitivity w.r.t. $\alpha$. (Macro-F1 versus $\alpha$ for DeepWalk, TADW, LINE and SANRL.)

Next, we test the robustness of SANRL in an incomplete data scenario. We randomly choose some users and remove their text contents, using $\alpha$ to denote the percentage of users whose texts are removed. LDA and SANRL_doc cannot work in this situation. With $\alpha$ ranging from 0.5 to 0.9, we report the Macro-F1 of different models when the training ratio is 0.1 in Fig. 6. SANRL is clearly more robust than TADW, whose performance is very poor. The reason is that the tf-idf matrix then has many all-zero rows, so the text content matrix calculated from it is uninformative. In contrast, SANRL benefits from limited text information through its coupled design.

5.5 Network Visualization

Network visualization is an essential task in network analysis. In this part, we use the t-SNE package, which takes the node representation vectors as inputs, to generate network layouts in a 2D space. Our target is to compare the properties of the layouts of DeepWalk and SANRL qualitatively.
Because our previous three networks are not mono-labeled, we build a paper-citation network by using a large-scale DBLP database 1. We select three computer science conferences to represent different research fields: SIGMOD for databases, ICML for machine learning, and ACL for natural language processing. Papers published at these three conferences and the citation relationships among them are extracted to build this DBLP-citation network. We also use the words in the title of each paper as its text attributes. Finally, this network has 1,280 papers from SIGMOD, 568 papers from ICML, 1,206 papers from ACL, 6,647 edges and 3,814 unique attributes.

Fig. 7. t-SNE 2D representations on the DBLP-citation network. We use blue, green and red to indicate papers from SIGMOD, ICML and ACL respectively. Words are colored black. (Color figure online)

Under the same parameter configuration, the 2D layouts of the DBLP-citation network are shown in Fig. 7. In Fig. 7, (a) and (b) are generated by feeding only the node representations to the t-SNE toolkit, while (c) and (d) are generated by feeding the node representations and the attribute representations together to the t-SNE toolkit. We have three observations. First, the node distributions in (b) and (a) are very similar, but the groups in (b) seem to be tighter. Second, after plotting the papers and the words in the same space simultaneously in (c), the words are spread throughout the space, but most of them are densely populated in the center while the different paper groups surround the word groups. Finally, the paper groups become denser and more separable in (d), which is obtained by removing the words from (c). These observations show that the distinguishability of node representations is improved significantly by jointly learning the representations of their associated attributes in the SANRL model.

1 Citation.

Table 4. Top-10 related words for selected papers.
Paper title: Dynamic multidimensional histograms. Recommended words: Histogram, filesystem, stholes, braid, histograms, lag, partiqle, frequencies, filtered, multidimensional.
Paper title: Computing weakest readings. Recommended words: Weakest, readings, ambiguities, scope, semantically, formalisms, polysemic, hole, dominance, distributions.

5.6 Case Study

To explore the correlations among the node vectors and the attribute vectors learned by SANRL, we provide a case study on the DBLP-citation dataset. We recommend the top-10 most related words for a paper by calculating the cosine similarity between the paper vector and each word vector. As shown in Table 4, although the titles of the two selected papers both have only three words, SANRL can find much richer related words by considering the citation relationships between the papers and the interactive relationships between the papers and the words.

6 Conclusion and Future Work

In this paper, we propose SANRL, an efficient model which integrates both structure and attribute information into the NRL task. By embedding nodes and attributes into the same vector space, the quality of the node representations is improved significantly. The experimental results show that SANRL outperforms competitive baselines on different data mining tasks. In the future, we will adapt SANRL to learn more discriminative representations by using semi-supervised learning techniques.

Acknowledgments. This work is supported by 973 Program with Grant No. 2014CB Yan Zhang is supported by NSFC with Grant Nos and , and MOE-RCOE with Grant No. 2016ZD201. We thank the anonymous reviewers for their comments.

References

1. Belkin, M., Niyogi, P.: Laplacian eigenmaps and spectral techniques for embedding and clustering. In: NIPS, vol. 14 (2001)
2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003)
3. Cao, S., Lu, W., Xu, Q.: GraRep: learning graph representations with global structural information. In: CIKM. ACM (2015)
4. Culotta, A., Ravi, N.K., Cutler, J.: Predicting the demographics of Twitter users from website traffic data. In: ICWSM. AAAI Press, Menlo Park (2015, in press)
5. Djuric, N., Wu, H., Radosavljevic, V., Grbovic, M., Bhamidipati, N.: Hierarchical neural language models for joint representation of streaming documents and their content. In: WWW (2015)
6. Dong, Y., Yang, Y., Tang, J., Yang, Y., Chawla, N.V.: Inferring user demographics and social strategies in mobile social networks. In: SIGKDD. ACM (2014)
7. Gong, N.Z., Talwalkar, A., Mackey, L., Huang, L., Shin, E.C.R., Stefanov, E., Shi, E.R., Song, D.: Joint link prediction and attribute inference using a social-attribute network. ACM TIST 5(2), 27 (2014)
8. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: SIGKDD. ACM, New York (2016)
9. Ji, M., Sun, Y., Danilevsky, M., Han, J., Gao, J.: Graph regularized transductive classification on heterogeneous information networks. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD. LNCS (LNAI), vol. 6321. Springer, Heidelberg (2010)
10. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents (2014). arXiv preprint
11. Li, J., Zhu, J., Zhang, B.: Discriminative deep random walk for network classification. In: ACL (2016)
12. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS (2013)
13. Mnih, A., Hinton, G.E.: A scalable hierarchical distributed language model. In: Advances in Neural Information Processing Systems (2009)
14. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: SIGKDD. ACM (2014)
15. Shi, C., Zhang, Z., Luo, P., Yu, P.S., Yue, Y., Wu, B.: Semantic path based personalized recommendation on weighted heterogeneous information networks. In: CIKM. ACM (2015)
16. Sun, Y., Han, J.: Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explor. Newslett. 14(2) (2013)
17. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: WWW. ACM (2015)
18. Tu, C., Zhang, W., Liu, Z., Sun, M.: Max-margin DeepWalk: discriminative learning of network representation. In: IJCAI (2016)
19. Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: SIGKDD. ACM, New York (2016)
20. Yang, C., Liu, Z., Zhao, D., Sun, M., Chang, E.Y.: Network representation learning with rich text information. In: IJCAI. AAAI Press (2015)
21. Yang, Z., Cohen, W., Salakhutdinov, R.: Revisiting semi-supervised learning with graph embeddings (2016). arXiv preprint
