Network Representation Learning: Enabling Network Analytics and Inference in Vector Space. Peng Cui, Wenwu Zhu, Daixin Wang. Tsinghua University

1 Network Representation Learning: Enabling Network Analytics and Inference in Vector Space. Peng Cui, Wenwu Zhu, Daixin Wang. Tsinghua University

2 2 A Survey on Network Embedding. Peng Cui, Xiao Wang, Jian Pei, Wenwu Zhu. A Survey on Network Embedding. arXiv preprint.

3 3 Big Data Challenge. Data volume: grows exponentially. Computing power: grows exponentially. Data always grows faster than computing power, but the gap is not serious.

4 4 What if data is linked? Computing power grows exponentially, but the required computation grows doubly exponentially. The real big-data problem lies in linked data.

5 5 Data is linked, in real cases Social Networks

6 6 Data is linked, in real cases Biology Networks

7 7 Data is linked, in real cases Finance Networks

8 8 Data is linked, in real cases Internet of Things Networks are natural representations for such linked big data.

9 9 Network Analytics: descriptive and predictive. Tasks: node importance, community detection, network distance, link prediction, node classification, network evolution, ... Network analytics technologies have a vital impact on many real applications.

10 10 The voices from industry. Social networks: Facebook, ~2 billion active users; WeChat, ~1 billion active users. E-commerce networks: Amazon, 400M active customers and 400M products; Taobao, 500M customers and 800M products. Networks have grown to such a scale that no sophisticated network analytics is feasible.

11 11 What is the bottleneck? G = (V, E). Iterative and combinatorial algorithms limit complexity; the coupling of links limits parallelizability; the dependency among nodes limits the applicability of ML methods.

12 12 Revisit network representation: instead of working on G = (V, E), embed the network into a vector space and work on G = (V) with node vectors. Easy to parallelize; classical ML methods apply.

13 13 The ultimate goal of network embedding: network inference in vector space, covering node importance, community detection, network distance, link prediction, node classification, network evolution, ...

14 14 How? The vector space should be able to: Goal 1, reconstruct the original network; Goal 2, support network inference. This is the task of network embedding.

15 15 Network Embedding, Goal 1: reconstruct the original network. Matrix factorization decomposes the adjacency matrix into an embedding matrix. Aiming to reconstruct all the links causes overfitting, and the network inference ability is seriously limited.

16 16 Network Embedding, Goal 1: reconstruct the original network. Problems still exist: network reconstruction alone does not yield network inference (node importance, community detection, network distance, link prediction, node classification, network evolution, ...).

17 17 Network Embedding, Goal 2: support network inference, i.e., enable network inference in vector space (node importance, community detection, network distance, link prediction, node classification, network evolution, ...).

18 18 Network Embedding, Goal 2: support network inference. The embedding should reflect network structure and maintain network properties, e.g., transitivity: if A is linked to B and B to C, then A should be close to C.

19 19 Graph Embedding vs. Network Embedding. What is the difference between graph embedding and network embedding? Why can graph embedding not fulfill the goals of network embedding?

20 20 Graph vs. Network. In some sense, they are different. Graphs exist in mathematics (data structures): mathematical structures used to model pairwise relations between objects. Networks exist in the real world (data): social networks, logistic networks, biology networks, transaction networks, etc. A network can be represented by a graph, but a dataset that is not a network can also be represented by a graph.

21 21 Graph Embedding. Graph embedding was initially proposed as a dimensionality reduction technique. A graph, represented as an n*n proximity matrix, is first constructed based on features of the n samples. Low-dimensional representations for the n samples are then derived from the constructed graph.

22 22 Multidimensional Scaling (MDS). MDS learns representations that minimize the strain, i.e., the mismatch between inner products of the representations and the normalized proximities. Algorithm steps: Step 1. Compute squared proximities from the distance matrix D: take -1/2 of the entrywise squares of D. Step 2. Remove means from rows and columns, i.e., double centering; the result is a normalized proximity matrix M. Step 3. Pick the d largest eigenvalues of M and their eigenvectors E_d. Step 4. Get representations via X = E_d Lambda_d^(1/2), where Lambda_d is the diagonal matrix of the d largest eigenvalues. A minimal sketch follows.
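To make the four steps concrete, here is a minimal classical-MDS sketch in NumPy (not from the original slides; variable names are illustrative), assuming D holds pairwise Euclidean distances between the n samples:

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed n points into d dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    M = -0.5 * D ** 2                        # Step 1: squared proximities
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    M = J @ M @ J                            # Step 2: remove row/column means
    vals, vecs = np.linalg.eigh(M)           # Step 3: eigendecomposition
    idx = np.argsort(vals)[::-1][:d]         # keep the d largest eigenvalues
    L = np.diag(np.sqrt(np.maximum(vals[idx], 0.0)))
    return vecs[:, idx] @ L                  # Step 4: X = E_d Lambda_d^(1/2)
```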

23 23 Isomap. Construct a graph from the dataset, then use MDS for graph embedding, with the aim of dimensionality reduction. Step 1. Construct a neighborhood graph: for each point, pick its K nearest points as neighbors, connect neighboring points, and use the Euclidean distance between features as edge length. Step 2. Compute the shortest path between every pair of nodes and store the results as an n*n distance matrix (high complexity). Step 3. Apply MDS on the distance matrix to learn representations.

24 24 Locally Linear Embedding (LLE). Avoids explicit pairwise proximity calculation. Step 1. Construct a neighborhood graph (same as Isomap). Step 2. Set edge weights: fix W_ij = 0 between non-neighboring points; learn W_ij for neighboring points by minimizing sum_i ||x_i - sum_j W_ij x_j||^2 under the constraint sum_j W_ij = 1. Step 3. Learn representations by minimizing sum_i ||y_i - sum_j W_ij y_j||^2. A sketch of the weight-learning step follows.
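A sketch of LLE's Step 2 for a single point, assuming X holds the feature vectors and `neighbors` indexes the K nearest neighbors of point i (the regularization term is a standard numerical safeguard, not from the slides):

```python
import numpy as np

def lle_weights(X, i, neighbors, reg=1e-3):
    """Reconstruction weights of point i from its neighbors (sum to 1)."""
    Z = X[neighbors] - X[i]                          # shift neighbors to origin
    G = Z @ Z.T                                      # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(neighbors))  # regularize if singular
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()                               # enforce sum-to-one constraint
```

Solving the same small linear system for every point fills the sparse weight matrix W used in Step 3.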

25 25 Applying graph embedding to networks. Graph embedding can learn node representations for a real-world network, e.g., a social network. W is no longer constructed from features but from network topology; for example, we can use the adjacency matrix as W.

26 26 Why does graph embedding perform poorly? The success of graph embedding depends on the quality of W. For graph embedding, W is constructed from the dataset, so W_uv captures quite precisely how similar u and v are. Yet the adjacency matrix A of a real-world network is not a good proximity matrix, and using A to approximate W performs poorly: A_uv describes how frequently u and v interact, which correlates with similarity(u, v) only roughly; A_uv = 0 frequently happens even when u and v are similar; and A_uv < A_uw does not guarantee that w is more similar to u than v. We need a better approximation of W than A.

27 So many ways to define network proximities Co-occurrence (neighborhood) 27

28 So many ways to define network proximities High-order proximities 28

29 So many ways to define network proximities Communities 29

30 So many ways to define network proximities Labels and attributes 30

31 So many ways to define network proximities Heterogeneous networks 31

32 32 Graph Embedding vs. Network Embedding. Graph embedding: the proximity W is well defined, and the pipeline is W -> embedding. Network embedding: the proximity needs to be subtly designed, and the pipeline is network -> proximity approximation W-hat -> embedding.

33 The Literature of Network Embedding 33

34 34 Outline: why revisit network representation? Network embedding 1.0: graph embedding. Structure-preserving network embedding. Property-preserving network embedding. Network embedding with side information. Application-oriented network embedding. Discussions and future directions.

35 35 Network Structures Nodes & Links Node Neighborhood Pair-wise Proximity Community Structures

36 36 Network Structures Nodes & Links Node Neighborhood Pair-wise Proximity Community Structures

37 37 Node neighborhood. Node neighborhood is important in networks, and node embeddings should be able to reflect it. How to define neighborhood in networks?

38 38 DeepWalk exploits truncated random walks to define the neighborhood of a node. (figure: random walks on a graph produce node sequences, e.g., v_a, v_b, v_c, ...) B. Perozzi et al. Deepwalk: Online learning of social representations. KDD 2014.

39 39 DeepWalk. Random walks on networks generate vertex-frequency distributions that follow a power law, similar to word distributions in text. B. Perozzi et al. Deepwalk: Online learning of social representations. KDD 2014.

40 40 DeepWalk: preserve social relations in vector space. Context: truncated random walks. Model: word embedding (Skip-gram), producing dense vectors. A minimal sketch follows. B. Perozzi et al. Deepwalk: Online learning of social representations. KDD 2014.
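A minimal DeepWalk-style sketch (hyperparameters and the toy graph are illustrative, not the paper's settings), combining truncated random walks with gensim's Skip-gram:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def truncated_walks(G, walks_per_node=10, walk_length=40):
    """Sample truncated random walks; each walk is a 'sentence' of node ids."""
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(v) for v in walk])
    return walks

G = nx.karate_club_graph()
model = Word2Vec(truncated_walks(G), vector_size=64, window=5, sg=1, min_count=0)
embedding_of_node_0 = model.wv["0"]   # dense vector for node 0
```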

41 41 DeepWalk advantages: a generalization of language modeling; scalable. B. Perozzi et al. Deepwalk: Online learning of social representations. KDD 2014.

42 42 Node2vec. Social relations: homophily (community), captured by DFS-like exploration; structural equivalence, captured by BFS-like exploration. How to preserve both relationships in a unified method? A. Grover et al. node2vec: Scalable Feature Learning for Networks. KDD 2016.

43 43 Node2vec: biased random walks. Parameters p and q control the interpolation between DFS and BFS; semi-supervised learning can decide p and q from labeled nodes; the optimization model is similar to DeepWalk's. A sketch of the biased transition weights follows. A. Grover et al. node2vec: Scalable Feature Learning for Networks. KDD 2016.
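A sketch of node2vec's biased second-order transition weights, assuming an unweighted graph; `prev` is the node visited before `cur`, and normalizing the returned weights gives the sampling distribution:

```python
import networkx as nx

def biased_weights(G, prev, cur, p=1.0, q=1.0):
    """Unnormalized node2vec transition weights out of cur, given prev."""
    weights = {}
    for nxt in G.neighbors(cur):
        if nxt == prev:                  # distance 0 from prev: return step
            weights[nxt] = 1.0 / p
        elif G.has_edge(nxt, prev):      # distance 1 from prev: BFS-like step
            weights[nxt] = 1.0
        else:                            # distance 2 from prev: DFS-like step
            weights[nxt] = 1.0 / q
    return weights
```

Small q biases walks outward (DFS-like, homophily); small p keeps walks local (BFS-like, structural equivalence).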

44 44 Node2vec A. Grover, J. Leskovec. node2vec: Scalable Feature Learning for Networks. KDD 2016.

45 45 DeepWalk & node2vec are heuristic methods for node-context definition, with no explicit and direct optimization objective on network embedding. What should the objective be? Pair-wise proximity.

46 46 Network Structures Nodes & Links Node Context Pair-wise Proximity Community Structures

47 47 LINE. Idea: nodes 6 and 7 have first-order proximity (directly linked); nodes 5 and 6 have second-order proximity (shared neighbors). Embedding should preserve such structure. Challenges: how to deal with large-scale networks? How to preserve first- and second-order proximity in various types of networks (undirected, directed, weighted)? Jian Tang et al. LINE: Large-scale Information Network Embedding. WWW 2015.

48 48 LINE with first-order proximity models local pairwise closeness; LINE with second-order proximity models neighborhood structures. A sketch of the first-order objective follows. Jian Tang et al. LINE: Large-scale Information Network Embedding. WWW 2015.
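A sketch of one SGD step on LINE's first-order objective for an observed undirected edge (i, j), maximizing log sigma(u_i . u_j); the full method adds negative sampling, omitted here for brevity:

```python
import numpy as np

def line_first_order_step(U, i, j, lr=0.025):
    """One gradient-ascent step on log sigma(u_i . u_j) for edge (i, j)."""
    s = 1.0 / (1.0 + np.exp(-U[i] @ U[j]))   # sigma(u_i . u_j)
    coef = lr * (1.0 - s)                    # gradient coefficient of log-sigmoid
    ui_old = U[i].copy()                     # keep u_i before it is updated
    U[i] += coef * U[j]
    U[j] += coef * ui_old
```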

49 49 LINE experiments: word analogy and multi-label classification. Jian Tang et al. LINE: Large-scale Information Network Embedding. WWW 2015.

50 50 Network Structures Nodes & Links Node Context Pair-wise Proximity Community Structures

51 51 Community-preserving: M-NMF. Network structure spans levels: microscopic structure, i.e., pairwise first-order and second-order proximities; and mesoscopic structure, i.e., community structure, proximities among a group. Community structure is an important feature of networks: it reveals the organizational structure and imposes higher-level structural constraints on node representations. Xiao Wang et al. Community Preserving Network Embedding. AAAI, 2017.

52 52 Community-preserving: M-NMF. Microscopic structure: pairwise first-order and second-order proximities. Mesoscopic structure: community structure, proximities among a group. How to incorporate community structure in the embedding space?

53 53 Community-preserving: M-NMF. Idea: NMF preserves the microscopic structure (consensus relationship), while modularity-based community detection captures the mesoscopic structure; modularity compares the actual number of edges between i and j against the expected number under random rewiring, together with a community-membership indicator. How to link them up? A sketch of the modularity matrix follows.
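For concreteness, a sketch of the modularity matrix that modularity-based detection builds on, B_ij = A_ij - k_i k_j / 2m: the observed number of edges between i and j minus the number expected under random, degree-preserving rewiring:

```python
import numpy as np

def modularity_matrix(A):
    """B = A - k k^T / 2m for a symmetric adjacency matrix A."""
    k = A.sum(axis=1)            # degree vector
    two_m = k.sum()              # 2m, twice the number of edges
    return A - np.outer(k, k) / two_m
```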

54 54 Community-preserving: M-NMF. Modularized NMF: preserve the first- and second-order proximities, identify the community structure, and link the node embeddings with the community structure.

55 55 Community-preserving: M-NMF. Evaluation, node clustering: achieves the best results on seven of nine networks.

56 56 Community-preserving: M-NMF. Evaluation, node classification: achieves the best results on eight of nine networks and consistently outperforms the M-NMF0 variant.

57 57 Community-preserving: M-NMF Parameter analysis

58 58 A fundamental thought: intrinsically, the problem of network embedding is to learn a mapping function from the original network space to a latent vector space. However, most previous methods assume a linear transformation. Network structure is quite complicated; is a linear transformation adequate?

59 A deep model for network embedding - SDNE 59

60 60 Structural Deep Network Embedding (SDNE). Idea: use a deep neural network to solve the non-linearity problem (deep network embedding) while preserving structure via first- and second-order proximity constraints (structural network embedding). Challenges: how to feed network data to deep neural networks? How to preserve the first- and second-order proximity in deep neural networks? Daixin Wang et al. Structural Deep Network Embedding. KDD, 2016.

61 61 Framework of SDNE: a supervised constraint preserves first-order proximity, and two parameter-sharing unsupervised autoencoders preserve second-order proximity.

62 62 Preserve second-order proximity: reconstruct the neighborhood structure (adjacency vector) of each vertex through a deep autoencoder. (figure: two parameter-sharing autoencoders encode adjacency vectors x_i, x_j through layers y^1 ... y^K into embeddings and decode reconstructions x-hat_i, x-hat_j.) The loss is L_2nd = ||(X-hat - X) ⊙ B||_F^2, where X is the adjacency matrix, X-hat its reconstruction, and B a balancing weight matrix.

63 63 Preserve first-order proximity via a Laplacian-eigenmaps-style penalty, incurred when similar vertices are mapped far apart in the embedding space: L_1st = sum_{i,j=1}^n s_ij ||y_i - y_j||_2^2, where s_ij is the similarity between the i-th and j-th vertices and y_i is the embedding of the i-th vertex.

64 64 Algorithm. Objective function: L = L_2nd + alpha L_1st + nu L_reg = ||(X-hat - X) ⊙ B||_F^2 + alpha sum_{i,j=1}^n s_ij ||y_i - y_j||_2^2 + nu L_reg, combining the second-order loss, the first-order loss, and a regularization term. A loss sketch follows.
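A sketch of SDNE's combined loss on a mini-batch, assuming x is a batch of adjacency-vector rows, x_hat their autoencoder reconstructions, y the corresponding embeddings, and s the pairwise similarities within the batch (names are illustrative; the actual model trains this end to end with deep autoencoders):

```python
import numpy as np

def sdne_loss(x, x_hat, y, s, alpha=0.1, beta=10.0):
    """L_2nd + alpha * L_1st; the nu * L_reg weight-decay term is omitted."""
    B = np.where(x > 0, beta, 1.0)                  # penalize non-zero entries more
    l_2nd = np.sum(((x_hat - x) * B) ** 2)          # ||(X_hat - X) o B||_F^2
    diff = y[:, None, :] - y[None, :, :]            # all pairwise y_i - y_j
    l_1st = np.sum(s * np.sum(diff ** 2, axis=-1))  # sum_ij s_ij ||y_i - y_j||^2
    return l_2nd + alpha * l_1st
```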

65 65 Out-of-sample nodes. If the new vertex's relations to existing vertices are known, feed its adjacency vector through the trained SDNE encoder to get its embedding (constant complexity). If the relations are unknown, state-of-the-art methods cannot handle it; resort to side information (attributes, content, etc.).

66 66 Complexity. Training complexity: O(ncdI), where n is the number of vertices, c the average degree, d the embedding dimension, and I the number of iterations. Training complexity is linear in the number of vertices.

67 67 Experiments. Datasets: BlogCatalog, Flickr, and Youtube (social networks), Arxiv GR-QC (citation network), and NewsGroup (language network, 1720 nodes, fully connected).

68 68 Experiments. Evaluation tasks: network reconstruction (goal: reconstruction), node classification and link prediction (goal: inference), and visualization (goal: insights). Baselines: DeepWalk (B. Perozzi et al.), random walks + skip-gram; LINE (J. Tang et al.), first- and second-order information; GraRep (S. Cao et al.), first- and high-order information; Laplacian Eigenmaps (M. Belkin et al.), factorizing the Laplacian matrix; Common Neighbor (D. Liben-Nowell et al.), for link prediction.

69 69 Task 1: Network Reconstruction. Setup: learn network representations, then predict the existing links (nearest neighbors). Results: improvements of 16% and 85%.

70 70 Task 2: Node Classification. Setup: learn network representations, train a linear classifier, classify the test data. Results: averaged improvements of 12% and 14%; individual improvements of 10%, 18%, 25%, and 15% (classification performance on BlogCatalog).

71 71 Task 3: Link Prediction. Setup: learn network representations, then predict future links (nearest neighbors). Results: precision stays at least 0.9, an improvement of at least 15%.

72 72 Task 4: Visualization. Setup: learn network representations and lay out the network with t-SNE, coloring different categories differently. Results: small within-class distance, large between-class distance. Daixin Wang et al. Structural Deep Network Embedding. KDD, 2016.

73 73 Summary: structure preserving is essential for network representation learning; deep models have large potential in network representation learning; it is possible to realize various network applications in the embedding space.

74 74 Outline: why revisit network representation? Network embedding 1.0: graph embedding. Structure-preserving network embedding. Property-preserving network embedding. Network embedding with side information. Application-oriented network embedding. Discussions and future directions.

75 75 Maintain network properties: transitivity. In the embedding space, the triangle inequality D(A, C) <= D(A, B) + D(B, C) enforces transitivity: if A is close to B and B is close to C, then A is relatively close to C. However, real network data is more complex.

76 76 Non-transitivity: the co-existence of transitivity and non-transitivity. Image network: A (dog on lawn) is similar to B (cat on lawn), and B to C (cat on floor), but A and C are dissimilar. Social network: colleague vs. classmate relations. Word network: Apple relates to both Cellphone and Banana, which are unrelated. How to incorporate non-transitivity in the embedding space?

77 77 Asymmetric transitivity in directed networks: A -> B and B -> C imply A -> C, but not C -> A; transitivity holds forward, not backward (e.g., Tencent Microblog, Twitter). Distance metrics in embedding spaces are symmetric; how to incorporate asymmetric transitivity?

78 78 Non-transitivity. The source of non-transitivity: each node has multiple similarity components. In the object component, B (cat) and C (cat) are similar; in the scene component, A (lawn) and B (lawn) are similar; similarity is transitive within a component but non-transitive across components. Non-transitive embedding: represent non-transitive data with multiple latent similarity components.

79 79 Non-transitivity challenges: how to extract the latent similarity components (e.g., objects, scenes)? How to identify which similarity component each similarity relationship corresponds to?

80 80 Non-transitivity framework: Multiple Component Hashing (MuCH). M. Ou, et al. Non-transitive Hashing with Latent Similarity Components. SIGKDD, 2015.

81 81 Non-transitivity: similarity component allocation. Loss: how well the learned overall similarities under an allocation reconstruct the input similarities, L_E = sum log(1 + e^{-R_ij S_ij}), where R_ij is the input similarity between i and j and S_ij the learned overall similarity under an allocation. Overall similarity hypothesis: two items are similar if they are similar on at least one similarity component, and dissimilar otherwise; this yields the non-transitive maximum S_ij = max{S_ij^1, S_ij^2, ..., S_ij^M}, with S_ij^m = (h_i^m)^T h_j^m the similarity between i and j on the m-th component and h_i^m the hash code on the m-th component. A sketch follows.
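A sketch of the overall-similarity hypothesis, assuming H is an [M, n, k] array of +/-1 hash codes (M components, n items, k bits), so the overall similarity is the maximum of the per-component inner products:

```python
import numpy as np

def overall_similarity(H, i, j):
    """S_ij = max over m of (h_i^m)^T h_j^m across the M latent components."""
    return max(float(H[m, i] @ H[m, j]) for m in range(H.shape[0]))
```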

82 82 Non-transitivity: similarity component extraction. A similarity component is a Hamming space extracted from the feature space via a hash function: h_A^m = f^m(x_A) = sign(W_m^T x_A), where h_A^m is the hash code of A on the m-th component, W_m the hash projection for the m-th component, and x_A the feature vector of A. Hash functions are learned by preserving the allocated similarity relationships.

83 83 Non-transitivity: final objective and optimization. Regularizer: keep the similarity components independent of each other and avoid overfitting, L_R = gamma_1 sum_{i != j} (w_i^T w_j)^2 + gamma_2 ||W||_F^2. Final objective: min L = L_E + L_R, where L_E = sum log(1 + e^{-R_ij S_ij}). Optimization: block gradient descent.

84 84 Non-transitivity evaluation: synthetic data. 1000 samples whose features consist of 2 similarity components, with 4 clusters in each component. Task: retrieve entities within Hamming radius 1. (figure: the learned hash projections W_1, W_2 recover the two components SC1, SC2.) Improvements: up 9.3% and up 236%.

85 85 Non-transitivity evaluation: real data. Public datasets: DBLP (2500 authors from 12 research fields) and NUS-WIDE (images from 10 concepts). Task: retrieve the top-100 closest entities; metrics: MAP (ranking performance) and HRS, the Hamming radius to search (search efficiency). Results: MuCH-S achieves the best MAP; MuCH-F improves the search radius by at least 33%.

86 86 Asymmetric transitivity: all existing methods fail. A single vector per node yields symmetric distances, so asymmetry fails; naive double vectors (source/target) capture asymmetry but fail to preserve transitivity.

87 87 Asymmetric transitivity, inspired by high-order proximities, which are asymmetric transitivity metrics. Katz similarity: the weighted sum of paths between two nodes, where the weights decrease as the path length increases, S^Katz = sum_{l=1}^inf (beta A)^l. (figure: a directed source/target example with asymmetric Katz scores, 0.18 in one direction and 0 in the other.)

88 88 Asymmetric transitivity: a straightforward solution factorizes the high-order proximity matrix into a source embedding matrix and a target embedding matrix. However, the calculation of high-order proximities is not scalable (O(N^3) for Katz). A closed-form Katz sketch follows.
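For reference, the straightforward (non-scalable) computation: the Katz series sums to a closed form, valid when beta is smaller than the reciprocal of A's spectral radius; the dense matrix inverse is what makes this O(N^3):

```python
import numpy as np

def katz_matrix(A, beta=0.1):
    """S^Katz = sum_{l>=1} (beta*A)^l = (I - beta*A)^{-1} (beta*A)."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) @ (beta * A)
```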

89 Asymmetric Transitivity The general formulation of high-order proximities 89 M. Ou, P. Cui, J. Pei, W. Zhu. Asymmetric Transitivity Preserving Graph Embedding. KDD, 2016.

90 90 Asymmetric transitivity: decompose S without ever calculating S, via generalized SVD.

91 91 Asymmetric transitivity: JDGSVD time complexity is O(K^2 L m), where K is the embedding dimension (constant), L the number of iterations (constant), and m the edge number. Linear complexity w.r.t. the volume of data (edge number) -> a scalable algorithm, suitable for large-scale data.

92 92 Asymmetric transitivity: theoretical guarantee on the approximation error (upper bound).

93 93 Asymmetric transitivity. HOPE: High-Order Proximity preserved Embedding. Algorithm framework: write the high-order proximity in the general form S = Mg^{-1} Ml and apply a generalized SVD to obtain source and target embeddings directly from Mg and Ml. A simplified sketch follows.
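A simplified HOPE sketch: factorize the Katz proximity matrix with a truncated SVD and split the singular values between source and target embeddings. The actual algorithm uses a generalized SVD (JDGSVD) on S = Mg^{-1} Ml without ever forming S; this dense version is only illustrative:

```python
import numpy as np
from scipy.sparse.linalg import svds

def hope(A, d=16, beta=0.1):
    """Source/target embeddings from a rank-d SVD of the Katz matrix."""
    n = A.shape[0]
    S = np.linalg.inv(np.eye(n) - beta * A) @ (beta * A)  # Katz proximity
    U, sigma, Vt = svds(S, k=d)
    U_s = U * np.sqrt(sigma)        # source embeddings
    U_t = Vt.T * np.sqrt(sigma)     # target embeddings
    return U_s, U_t
```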

94 94 Experiment setting: datasets. Synthetic (Syn): generated with the Forest Fire model; Cora: citation network of academic papers; SN-Twitter: Twitter social network; SN-TWeibo: Tencent Weibo social network.

95 95 Experiment setting: tasks. Approximation accuracy: high-order proximity approximation, i.e., how well the embedded vectors approximate the high-order proximity. Reconstruction: graph reconstruction, i.e., how well the embedded vectors reconstruct the training sets. Inference: link prediction, i.e., how well the embedded vectors predict missing edges; and vertex recommendation, i.e., how well the embedded vectors recommend vertices for each node.

96 96 Experiment setting: baselines. Graph embedding: PPE, which approximates high-order proximity by selecting landmarks and using a sub-block of the proximity matrix; LINE, which preserves first- and second-order proximity (LINE1 and LINE2, respectively); DeepWalk, random walks on graphs + the Skip-gram model. Task-specific: Common Neighbors and Adamic-Adar, used for the link prediction and vertex recommendation tasks.

97 97 Experiment result: high-order proximity approximation. HOPE achieves a much smaller RMSE (up to 10x and 4x smaller) -> generalized SVD achieves a good approximation.

98 98 Experiment result: graph reconstruction. Improvements of up to +400%. Conclusion: HOPE successfully captures the information of the training sets.

99 99 Experiment result: link prediction. +100% improvements on both datasets. Conclusion: HOPE has good inference ability, based on asymmetric transitivity.

100 100 Experiment result: vertex recommendation. Improvements of up to +81%. Conclusion: HOPE significantly outperforms all state-of-the-art baselines in all these experiments.

101 101 Summary: it is crucial to pay attention to network properties in network embedding; the properties of real networks are far more complicated than our common assumptions; by maintaining network properties, the inference ability of the embedding space can be significantly improved.

102 102 Outline: why revisit network representation? Network embedding 1.0: graph embedding. Structure-preserving network embedding. Property-preserving network embedding. Network embedding with side information. Application-oriented network embedding. Discussions and future directions.

103 103 Side information. Some real-world networks carry rich side information: node content, node labels, and node & edge types (heterogeneous information networks, HIN).

104 104 Node content: information from node attributes or features, e.g., user profiles in social networks or text in citation networks. Ways to leverage node content: as constraints on the embedding vectors, or as an extra source of similarity.

105 105 Node content as constraints on embedding vectors: the embedding is a linear transformation of the features. Yang C., et al. Network Representation Learning with Rich Text Information. IJCAI 2015.

106 106 Node content: objective function. The authors prove DeepWalk is equivalent to factorizing the matrix M_ij = log [N(v_i, c_j) / N(v_i)] = log ([e_i (A + A^2 + ... + A^t)]_j / t), where A is the transition matrix and e_i the indicator vector of node i. Inductive matrix completion then substitutes the embedding vectors with a linear transformation of the features. A sketch of the factorized target follows. Yang C., et al. Network Representation Learning with Rich Text Information. IJCAI 2015.
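A sketch of the proximity matrix the TADW paper factorizes in practice, the t = 2 case with A row-normalized; the text features T then enter through the inductive factorization M ~ W^T H T:

```python
import numpy as np

def tadw_target(adj):
    """M = (A + A^2) / 2 with A the row-normalized adjacency matrix."""
    A = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    return (A + A @ A) / 2.0
```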

107 107 Node content: experimental results. Yang C., et al. Network Representation Learning with Rich Text Information. IJCAI 2015.

108 108 Node label: supervision information, e.g., categories of documents or research topics of papers. Idea: nodes with different labels should be separable.

109 109 Node label: max-margin, borrowed from SVM. Tu C., et al. Max-Margin DeepWalk: Discriminative Learning of Network Representation. IJCAI 2016.

110 110 Node label: max-margin, borrowed from SVM. DeepWalk is equivalent to matrix factorization; the objective function combines DeepWalk with a soft-margin SVM. Tu C., et al. Max-Margin DeepWalk: Discriminative Learning of Network Representation. IJCAI 2016.

111 111 Node label: experimental results. Tu C., et al. Max-Margin DeepWalk: Discriminative Learning of Network Representation. IJCAI 2016.

112 112 Node content + label: sometimes both kinds of information are available.

113 113 Node content + label: utilize both simultaneously. Node content: maximize the co-occurrence of content words given a node, as in doc2vec. Node label: maximize the probability of the word sequence given the label. Pan S., et al. Tri-Party Deep Network Representation. IJCAI 2016.

114 114 Node content + label: experimental results. Pan S., et al. Tri-Party Deep Network Representation. IJCAI 2016.

115 115 Heterogeneous information networks: nodes and edges of different types, e.g., DBLP citation and collaboration networks.

116 116 Heterogeneous information networks: proximity measured in the original space via meta-paths. Sun Y., et al. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. VLDB 2011.

117 117 Heterogeneous information networks. Meta-path based proximities: PathCount, Path-Constrained Random Walk, PathSim, etc. Truncated proximity calculation via dynamic programming; the objective function is similar to LINE's. Huang Z., et al. Heterogeneous Information Network Embedding for Meta Path based Proximity. arXiv 2017.

118 118 Heterogeneous information networks: experimental results. Huang Z., et al. Heterogeneous Information Network Embedding for Meta Path based Proximity. arXiv 2017.

119 119 Heterogeneous information networks: proximity measured in a mapped common space. A deep model handles two data modalities: images go through a CNN, text through fully connected layers, and both are then mapped into a common embedding space. Chang S., et al. Heterogeneous Network Embedding via Deep Architectures. KDD 2015.

120 120 Heterogeneous information networks: experimental results. Chang S., et al. Heterogeneous Network Embedding via Deep Architectures. KDD 2015.

121 Heterogeneous Networks are Ubiquitous 121 Social Networks Educational Networks E-commerce Networks Biological Networks

122 122 Heterogeneous network embedding (heterogeneous network hashing): map objects from different domains into binary hash codes so that they become directly comparable. M. Ou et al. Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing. KDD, 2013.

123 123 Existing methods. Hashing for a single domain [Yair Weiss et al. 08], [Wei Liu et al. 11], [Jun Wang et al. 10]: generates hash codes by learning, but cannot handle different types of features or fit heterogeneous relations; works only for homogeneous data. Hashing for multi-modal data [Shaishav Kumar et al. 11], [Yi Zhen et al. 12]: handles multi-modal data, a special case of heterogeneous hashing, but does not target general heterogeneous networks.

124 124 Challenges. Heterogeneous comparison: objects are represented by different features and cannot be compared directly. Sparsity: the density of the relation matrix is far below 0.1%. Imbalance: the heterogeneous domains differ in complexity, so a single unified representation is not flexible.

125 125 RaHH: Relation-aware Heterogeneous Hashing. Components: similarity-preserving homogeneous hashing within each domain, plus a linear heterogeneous mapping that bridges domains.

126 126 Homogeneous embedding. Hypothesis: similar homogeneous objects will be related to similar heterogeneous objects, complementary to the heterogeneous relations. Preserve the homogeneous similarity in Hamming space: J_H = sum_{p=1}^P sum_{i,j=1}^{m_p} A_ij^p ||h_i^p - h_j^p||^2, where P is the number of domains, m_p the number of objects in domain p, A_ij^p the similarity of object pair (i, j) in domain p, and h_i^p the hash code of object i in domain p.

127 127 Heterogeneous mapping: bridge the hash codes from different domains based on the heterogeneous relations. Heterogeneous domains are very different and are projected into different Hamming spaces; mappings are introduced to bridge them. (figure: pairwise mapping matrices W^12, W^21, W^13, W^31, W^23, W^32 between three Hamming spaces.)

128 128 Heterogeneous mapping: heterogeneous loss. For each bit of the target domain, a logistic-regression-like loss is used: l_ijk^pq = ln(1 + exp(-R_ij^pq H_jk^q (w_k^pq)^T h_i^p)), mapping hash code h_i^p to the k-th bit of h_j^q (i.e., H_jk^q), where R_ij^pq is the heterogeneous relation value and w_k^pq the linear mapping vector. The loss over all heterogeneous relations: J_Q(H^p, {W^pq}) = sum_{p~q} (sum_{i,j,k} l_ijk^pq + lambda sum_k ||w_k^pq||^2).

129 129 Optimization. Final objective: combine the homogeneous and heterogeneous losses, J = J_H + J_Q. Optimize hash codes and mappings alternately by block gradient descent: update the hash codes, then update the mappings.

130 130 Experimental results. Varying the number of hash bits, we test MAP on the 50 results closest to the query in Hamming distance. Tencent Weibo: user query for similar posts, up to 13.8% improvement; post query for similar users, up to 15.8% improvement. NUS-WIDE: text query for similar images, up to 65.7% improvement; image query for similar texts, up to 50% improvement.

131 131 Summary. The general idea of fusing side information is through the proximity measure: node content, nodes similar in feature/attribute space should have large proximity; node labels, nodes with the same label should have large proximity; node & edge types, nodes deemed similar in the heterogeneous information network should have large proximity.

132 132 Outline: why revisit network representation? Network embedding 1.0: graph embedding. Structure-preserving network embedding. Property-preserving network embedding. Network embedding with side information. Application-oriented network embedding. Discussions and future directions.

133 133 Application-oriented network embedding. Some application-specific approaches rely heavily on feature engineering: features are carefully, and sometimes mysteriously, designed, and are hard to generalize. Previous network embedding methods are general-purpose: independent of the specific applications and lacking domain knowledge. Challenge: how to embed the network under the guidance of a specific application.

134 134 Cascade prediction: predict the increment of a cascade's size after a given time interval. Idea: an end-to-end model learns representations of cascade graphs so as to optimize the prediction accuracy of their future sizes. Li Cheng, et al. DeepCas: an End-to-end Predictor of Information Cascades. WWW 2017.

135 135 Cascade prediction: the end-to-end pipeline of DeepCas. Sample the cascade graph by random walks, encode the resulting sequences with a GRU, and use attention to assemble the representation of the graph. Li Cheng, et al. DeepCas: an End-to-end Predictor of Information Cascades. WWW 2017.

136 136 Anchor link prediction. Problem definition: given a source network G_s, a target network G_t, and a set of observed anchor links T, identify the hidden anchor links across G_s and G_t. Idea: use the observed anchor links as supervision; network embedding captures the major and specific structural regularities and further learns a cross-network mapping. Man T., Shen H., Liu S., et al. Predict Anchor Links across Social Networks via an Embedding Approach. IJCAI 2016.

137 137 Anchor link prediction: cross-network extension. Based on the observation that, for a pair of users with anchor links, a connection in one network implies a connection between their counterparts in the other network. For example, the node pair (b, c) is not linked in the source network, but the counterpart pair (B, C) is linked in the target network; this offers a clue to extend the source network with the edge (b, c).

138 138 Anchor link prediction: loss function. Notation: the representation of node i is denoted U_i. The loss preserves the structures of G_s and G_t while learning the mapping function phi parameterized by theta. By optimizing these loss functions simultaneously, the learned representations both preserve the network structure and respect the observed anchor links.

139 139 Summary. Application-oriented network embedding incorporates additional supervised information, e.g., cascade sizes or anchor links.

140 140 Outline: why revisit network representation? Network embedding 1.0: graph embedding. Structure-preserving network embedding. Property-preserving network embedding. Network embedding with side information. Application-oriented network embedding. Discussions and future directions.

141 141 Summary and key messages. Network data is under-exploited in real applications; the bottleneck is complexity and scalability; a fundamental solution is network representation learning. Network embedding has been demonstrated to be a promising way forward, via network reconstruction, structure preserving, and property maintaining, but there is still a long way to go.

142 142 Future directions: dynamic networks. How to update the embedding when networks change dynamically: node adding/deleting, link adding/deleting, community merging/splitting?

143 143 New edges: SVD restarts on dynamic networks. SVD is widely used in network embedding, e.g., HOPE (Ou et al., KDD 2016) and GraRep (Cao et al., CIKM 2015). Incremental SVD updates SVD results on dynamic networks, but errors accumulate in the long run. Solution: SVD restarts; the question is finding the appropriate restart time points. (figure: snapshots A_0, A_1, ..., A_t with SVD U_0 S_0 V_0 updated incrementally to U_t S_t V_t; when to restart?) Zhang Z., Cui P., Pei J., et al. TIMERS: Error-Bounded SVD Restart on Dynamic Networks. AAAI 2018.

144 144 New nodes: out-of-sample nodes in dynamic networks. Nonparametric probabilistic modeling + deep learning. Ma J., Cui P., Zhu W. DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks. AAAI 2018.

145 145 Future Directions Hyper-Networks How to preserve hyper-network structure in embedding space?

146 146 Structural deep embedding for hyper-networks. First layer: one autoencoder per node type (a, b, c), an unsupervised heterogeneous component preserving second-order proximity, encoding inputs A_i, A_j, A_k into X_i, X_j, X_k. Second layer: a non-linear mapping L_ijk over the concatenated embeddings. Third layer: a supervised binary component with a tuple-wise similarity function S_ijk (labels +1/-1) preserving first-order proximity. Tu K., Cui P., Wang X., et al. Structural Deep Embedding for Hyper-Networks. AAAI 2018.

147 147 Future directions: more complex structure. Many effective network structures are not yet considered in network embedding, e.g., network motifs (higher-order network structure). How to preserve motif structure when doing network embedding?

148 148 Future directions: preserve special properties. Network embedding for social networks: the power-law degree distribution is a classic and ubiquitous phenomenon in social networks. Problem: it is hard to learn effective representations for nodes with limited information. How to take the power-law distribution into account in network embedding?

149 149 Future directions: the role of side information. The existing assumption is that side information benefits network embedding. Does this assumption always hold in real-world applications? We should explore the relationships between topology structure and side information for specific networks and applications.

150 150 Future directions: embedding space. Existing methods embed the network into Euclidean space. Explore other embedding spaces that fit the properties of networks, such as hyperbolic space; let the model learn the distance measure itself; and seek spaces with good interpretability for the embedding.

151 151 Future directions: end-to-end solutions. Network embedding for social influence analysis: how to exploit network embedding to analyze social influence, e.g., finding the top-K opinion leaders or influence maximization?

152 152 Future directions: scalability. Scaling existing network embedding methods to billion-scale networks is still challenging. Assuming that embeddings, edge lists, and node content all fit in memory is unrealistic for billion-scale networks, and real-world billion-scale networks are evolving and stored in distributed systems. How can we design distributed network embedding methods?

153 153 A Survey on Network Embedding. Peng Cui, Xiao Wang, Jian Pei, Wenwu Zhu. A Survey on Network Embedding. arXiv preprint.

154 154 References
Ke Tu, Peng Cui, Xiao Wang, Fei Wang, Wenwu Zhu. Structural Deep Embedding for Hyper-Networks. AAAI, 2018.
Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, Wenwu Zhu. TIMERS: Error-Bounded SVD Restart on Dynamic Networks. AAAI, 2018.
Jianxin Ma, Peng Cui, Wenwu Zhu. DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks. AAAI, 2018.
Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, Shiqiang Yang. Community Preserving Network Embedding. AAAI, 2017.
Daixin Wang, Peng Cui, Wenwu Zhu. Structural Deep Network Embedding. KDD, 2016.
Mingdong Ou, Peng Cui, Jian Pei, Wenwu Zhu. Asymmetric Transitivity Preserving Graph Embedding. KDD, 2016.
Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu. Non-transitive Hashing with Latent Similarity Components. KDD, 2015.
Mingdong Ou, Peng Cui, Fei Wang, Jun Wang, Wenwu Zhu, Shiqiang Yang. Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing. KDD, 2013.

155 155 References
J. Tenenbaum, V. de Silva, and J. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, vol. 290, Dec. 2000.
S. Roweis and L. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, vol. 290, Dec. 2000.
M. Belkin and P. Niyogi. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. Advances in Neural Information Processing Systems, vol. 14, 2001.
B. Perozzi, R. Al-Rfou, S. Skiena. DeepWalk: Online Learning of Social Representations. KDD 2014.
A. Grover, J. Leskovec. node2vec: Scalable Feature Learning for Networks. KDD 2016.
Jian Tang, Meng Qu, et al. LINE: Large-scale Information Network Embedding. WWW 2015.
Shaosheng Cao et al. GraRep: Learning Graph Representations with Global Structural Information. CIKM 2015.

156 156 Acknowledgement. Students: Xiao Wang, Mingdong Ou, Daixin Wang, Ziwei Zhang, Jianxin Ma, Ke Tu (Tsinghua University). Collaborators: Jian Pei (SFU), Fei Wang (Cornell), Shiqiang Yang (Tsinghua University), Wenwu Zhu (Tsinghua University).

157 157 Thanks! NRL page and homepage: search "Peng Cui" on Google.


无监督学习中的选代表和被代表问题 - AP & LLE 张响亮. Xiangliang Zhang. King Abdullah University of Science and Technology. CNCC, Oct 25, 2018 Hangzhou, China 无监督学习中的选代表和被代表问题 - AP & LLE 张响亮 Xiangliang Zhang King Abdullah University of Science and Technology CNCC, Oct 25, 2018 Hangzhou, China Outline Affinity Propagation (AP) [Frey and Dueck, Science, 2007]

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

Graphite: Iterative Generative Modeling of Graphs

Graphite: Iterative Generative Modeling of Graphs Graphite: Iterative Generative Modeling of Graphs Aditya Grover* adityag@cs.stanford.edu Aaron Zweig* azweig@cs.stanford.edu Stefano Ermon ermon@cs.stanford.edu Abstract Graphs are a fundamental abstraction

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

QUINT: On Query-Specific Optimal Networks

QUINT: On Query-Specific Optimal Networks QUINT: On Query-Specific Optimal Networks Presenter: Liangyue Li Joint work with Yuan Yao (NJU) -1- Jie Tang (Tsinghua) Wei Fan (Baidu) Hanghang Tong (ASU) Node Proximity: What? Node proximity: the closeness

More information

GraphNet: Recommendation system based on language and network structure

GraphNet: Recommendation system based on language and network structure GraphNet: Recommendation system based on language and network structure Rex Ying Stanford University rexying@stanford.edu Yuanfang Li Stanford University yli03@stanford.edu Xin Li Stanford University xinli16@stanford.edu

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

Non-transitive Hashing with Latent Similarity Components

Non-transitive Hashing with Latent Similarity Components Non-transitive Hashing with Latent Similarity Components Mingdong Ou 1, Peng Cui 1, Fei Wang 2, Jun Wang 3, Wenwu Zhu 1 1 Tsinghua National Laboratory for Information Science and Technology Department

More information

The Analysis of Parameters t and k of LPP on Several Famous Face Databases

The Analysis of Parameters t and k of LPP on Several Famous Face Databases The Analysis of Parameters t and k of LPP on Several Famous Face Databases Sujing Wang, Na Zhang, Mingfang Sun, and Chunguang Zhou College of Computer Science and Technology, Jilin University, Changchun

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Multi-label Classification. Jingzhou Liu Dec

Multi-label Classification. Jingzhou Liu Dec Multi-label Classification Jingzhou Liu Dec. 6 2016 Introduction Multi-class problem, Training data (x $, y $ ) ( ), x $ X R., y $ Y = 1,2,, L Learn a mapping f: X Y Each instance x $ is associated with

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019

Exploring the Structure of Data at Scale. Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Exploring the Structure of Data at Scale Rudy Agovic, PhD CEO & Chief Data Scientist at Reliancy January 16, 2019 Outline Why exploration of large datasets matters Challenges in working with large data

More information

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS Overview of Networks Instructor: Yizhou Sun yzsun@cs.ucla.edu January 10, 2017 Overview of Information Network Analysis Network Representation Network

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Network Lasso: Clustering and Optimization in Large Graphs

Network Lasso: Clustering and Optimization in Large Graphs Network Lasso: Clustering and Optimization in Large Graphs David Hallac, Jure Leskovec, Stephen Boyd Stanford University September 28, 2015 Convex optimization Convex optimization is everywhere Introduction

More information

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association

More information

Deep Learning on Graphs

Deep Learning on Graphs Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 22 March 2017 joint work with Max Welling (University of Amsterdam) BDL Workshop @ NIPS 2016

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 2????? Machine Learning Node

More information

5 Machine Learning Abstractions and Numerical Optimization

5 Machine Learning Abstractions and Numerical Optimization Machine Learning Abstractions and Numerical Optimization 25 5 Machine Learning Abstractions and Numerical Optimization ML ABSTRACTIONS [some meta comments on machine learning] [When you write a large computer

More information

Graph based machine learning with applications to media analytics

Graph based machine learning with applications to media analytics Graph based machine learning with applications to media analytics Lei Ding, PhD 9-1-2011 with collaborators at Outline Graph based machine learning Basic structures Algorithms Examples Applications in

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Data Matrices and Vector Space Model Denis Helic KTI, TU Graz Nov 6, 2014 Denis Helic (KTI, TU Graz) KDDM1 Nov 6, 2014 1 / 55 Big picture: KDDM Probability

More information

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data

Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Graph-based Techniques for Searching Large-Scale Noisy Multimedia Data Shih-Fu Chang Department of Electrical Engineering Department of Computer Science Columbia University Joint work with Jun Wang (IBM),

More information

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

TriRank: Review-aware Explainable Recommendation by Modeling Aspects TriRank: Review-aware Explainable Recommendation by Modeling Aspects Xiangnan He, Tao Chen, Min-Yen Kan, Xiao Chen National University of Singapore Presented by Xiangnan He CIKM 15, Melbourne, Australia

More information

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks Pramod Srinivasan CS591txt - Text Mining Seminar University of Illinois, Urbana-Champaign April 8, 2016 Pramod Srinivasan

More information

Non-linear dimension reduction

Non-linear dimension reduction Sta306b May 23, 2011 Dimension Reduction: 1 Non-linear dimension reduction ISOMAP: Tenenbaum, de Silva & Langford (2000) Local linear embedding: Roweis & Saul (2000) Local MDS: Chen (2006) all three methods

More information

GraRep: Learning Graph Representations with Global Structural Information

GraRep: Learning Graph Representations with Global Structural Information GraRep: Learning Graph Representations with Global Structural Information Shaosheng Cao Xidian University shelsoncao@gmail.com Wei Lu Singapore University of Technology and Design luwei@sutd.edu.sg Qiongkai

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

LINE: Large-scale Information Network Embedding

LINE: Large-scale Information Network Embedding LINE: Large-scale Information Network Embedding Jian Tang 1, Meng Qu 2, Mingzhe Wang 2, Ming Zhang 2, Jun Yan 1, Qiaozhu Mei 3 1 Microsoft Research Asia, {jiatang, junyan}@microsoft.com 2 School of EECS,

More information

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

Predicting User Ratings Using Status Models on Amazon.com

Predicting User Ratings Using Status Models on Amazon.com Predicting User Ratings Using Status Models on Amazon.com Borui Wang Stanford University borui@stanford.edu Guan (Bell) Wang Stanford University guanw@stanford.edu Group 19 Zhemin Li Stanford University

More information

Scalable Influence Maximization in Social Networks under the Linear Threshold Model

Scalable Influence Maximization in Social Networks under the Linear Threshold Model Scalable Influence Maximization in Social Networks under the Linear Threshold Model Wei Chen Microsoft Research Asia Yifei Yuan Li Zhang In collaboration with University of Pennsylvania Microsoft Research

More information

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft

More information