CS224W: Analysis of Networks Jure Leskovec, Stanford University

Size: px

Start display at page:

Download "CS224W: Analysis of Networks Jure Leskovec, Stanford University"

Thomasine Thornton
5 years ago
Views:

1 CS224W: Analysis of Networks Jure Leskovec, Stanford University

2 Jure Leskovec, Stanford CS224W: Analysis of Networks, 2????? Machine Learning Node classification

3 Jure Leskovec, Stanford CS224W: Analysis of Networks, 3 (Supervised) Machine Learning Lifecycle: This feature, that feature. Every single time! Raw Data Structured Data Learning Algorithm Model Feature Engineering Automatically learn the features Downstream prediction task

4 Goal: Efficient task-independent feature learning for machine learning in networks! node 2 u f: u R & vec R & Feature representation, embedding Jure Leskovec, Stanford CS224W: Analysis of Networks, 4

Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.

5 Jure Leskovec, Stanford CS224W: Analysis of Networks, 5 We map each node in a network into a lowdimensional space Distributed representation for nodes Similarity between nodes indicates link strength Encode network information and generate node representation 17

6 Zachary s Karate Club network: Jure Leskovec, Stanford CS224W: Analysis of Networks, 6

(CNNs) Text is linear Sliding window (word2vec) Graphs are neither of these!

7 Jure Leskovec, Stanford CS224W: Analysis of Networks, 7 Graph representation learning is hard: Images are fixed size Convolutions (CNNs) Text is linear Sliding window (word2vec) Graphs are neither of these! Node numbering is arbitrary (node isomorphism problem) Much more complicated structure

8 Jure Leskovec, Stanford CS224W: Analysis of Networks, 8 node2vec: Random Walk Based (Unsupervised) Feature Learning node2vec: Scalable Feature Learning for Networks A. Grover, J. Leskovec. KDD 2016.

9 Jure Leskovec, Stanford CS224W: Analysis of Networks, 9 Goal: Embed nodes with similar network neighborhoods close in the feature space. We frame this goal as prediction-task independent maximum likelihood optimization problem. Key observation: Flexible notion of network neighborhood N S (u) of node u leads to rich features. Develop biased 2 nd order random walk procedure S to generate network neighborhood N S (u) of node u.

10 Jure Leskovec, Stanford CS224W: Analysis of Networks, 10 Intuition: Find embedding of nodes to d-dimensions that preserves similarity Idea: Learn node embedding such that nearby nodes are close together Given a node u, how do we define nearby nodes? N + u neighbourhood of u obtained by some strategy S

11 Jure Leskovec, Stanford CS224W: Analysis of Networks, 11 Given G = (V, E), Our goal is to learn a mapping f: u R d. Log-likelihood objective: max u V log Pr(N S (u) f u ) f where N S (u) is neighborhood of node u. Given node u, we want to learn feature representations predictive of nodes in its neighborhood N S (u).

12 Jure Leskovec, Stanford CS224W: Analysis of Networks, 12 max? log Pr(N S (u) f u ) f u V Assumption: Conditional likelihood factorizes over the set of neighbors. log Pr(N + (u f u ) =? log Pr (f(n A ) f u ) B C D E (F) Softmax parametrization: Pr(f(n i ) f u ) = exp(f n i f(u)) v V exp(f v f(u)))

13 Jure Leskovec, Stanford CS224W: Analysis of Networks, 13 exp(f n A f(u)) max?? log L M N exp(f v f(u))) F N B D O (F) Maximize the objective using Stochastic Gradient descent with negative sampling. Computing the summation is expensive Idea: Just sample a couple of negative nodes This means at each iteration only embeddings of a few nodes will be updated at a time Much faster training of embeddings

14 Two classic strategies to define a neighborhood N + u of a given node u: s 1 u s 2 s 7 s 6 s 8 BFS DFS s 3 s 4 s 5 s 9 Figure 1: BFS and DFS search strategies from node ( N PQ+ u = { s T, s U, s V } N XQ+ u = { s Y, s Z, s [ } Jure Leskovec, Stanford CS224W: Analysis of Networks, Local microscopic view Global macroscopic view 14

Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.

15 Jure Leskovec, Stanford CS224W: Analysis of Networks, 15 u BFS: Micro-view of neighbourhood DFS: Macro-view of neighbourhood

16 Jure Leskovec, Stanford CS224W: Analysis of Networks, 16 Biased random walk S that given a node u generates neighborhood N + u Two parameters: Return parameter p: Return back to the previous node In-out parameter q: Moving outwards (DFS) vs. inwards (BFS)

17 Jure Leskovec, Stanford CS224W: Analysis of Networks, 17 N S (u): Biased 2 nd -order random walks explore network neighborhoods: u s 1 1 s 5 s 4 1/p 1/q BFS-like: low value of p DFS-like: low value of q u s 4? p, q can learned in a semi-supervised way u s 1 s 5

18 Jure Leskovec, Stanford CS224W: Analysis of Networks, ) Compute random walk probs. 2) Simulate r random walks of length l starting from each node u 3) Optimize the node2vec objective using Stochastic Gradient Descent Linear-time complexity. All 3 steps are individually parallelizable

19 Jure Leskovec, Stanford CS224W: Analysis of Networks, 19 Interactions of characters in a novel: p=1, q=2 Microscopic view of the network neighbourhood p=1, q=0.5 Macroscopic view of the network neighbourhood

20 Jure Leskovec, Stanford CS224W: Analysis of Networks, 20

21 Jure Leskovec, Stanford CS224W: Analysis of Networks, Macro-F1 score 0.10 Macro-F1 score Fraction of missing edges Fraction of additional edges

22 General-purpose feature learning in networks: An explicit locality preserving objective for feature learning. Biased random walks capture diversity of network patterns. Scalable and robust algorithm with excellent empirical performance. Future extensions would involve designing random walk strategies entailed to network with specific structure such as heterogeneous networks and signed networks. Jure Leskovec, Stanford CS224W: Analysis of Networks, 22

23 Jure Leskovec, Stanford CS224W: Analysis of Networks, 23 OhmNet: Extension to Hierarchical Networks

24 Jure Leskovec, Stanford CS224W: Analysis of Networks, 24 Let s generalize node2vec to multilayer networks!

25 Jure Leskovec, Stanford CS224W: Analysis of Networks, 25 Each network is a layer G A = (V A, E A ) Similarities between layers are given in hierarchy M, map π encodes parent-child relationships

26 Jure Leskovec, Stanford CS224W: Analysis of Networks, 26 Computational framework that learns features of every node and at every scale based on: Edges within each layer Inter-layer relationships between nodes active on different layers

27 27 Input Output: embeddings of nodes in layers as well as internal levels of the hierarchy G A G e G f G g 1 2 3

28 28 OhmNet: Given layers G i and hierarchy M, learn node features captured by functions f i Functions f i embed every node in a d- dimensional feature space A multi-layer network with four layers and a two-level hierarchy M

29 Jure Leskovec, Stanford CS224W: Analysis of Networks, 29 Given: Layers G A, hierarchy M Layers G A AhT..j are in leaves of M Goal: Learn functions: f A : V A R &

30 Jure Leskovec, Stanford CS224W: Analysis of Networks, 30 Approach has two components: Per-layer objectives: Nodes with similar network neighborhoods in each layer are embedded close together Hierarchical dependency objectives: Nodes in nearby layers in hierarchy are encouraged to share similar features

31 Jure Leskovec, Stanford CS224W: Analysis of Networks, 31 Intuition: For each layer, find a mapping of nodes to d-dimensions that preserves node similarity Approach: Similarity of nodes u and v is defined based on similarity of their network neighborhoods Given node u in layer i we define nearby nodes N A (u) based on random walks starting at node u

32 Jure Leskovec, Stanford CS224W: Analysis of Networks, 32 Given node u in layer i, learn u s representation such that it predicts nearby nodes N A (u):! i (u) =logpr(n i (u) f i (u)), Given T layers, maximize: i = X! i (u), for i =1, 2,...,T. u2v i Notice: Nodes in different networks representing the same entity have different features

33 Jure Leskovec, Stanford CS224W: Analysis of Networks, 33 So far, we did not consider hierarchy M Node representations in different layers are learned independently of each other How to model dependencies between layers when learning node features?

34 Jure Leskovec, Stanford CS224W: Analysis of Networks, 34 We use regularization to share information across the hierarchy We want to enforce similarity between feature representations of networks that are located nearby in the hierarchy

35 Given node u, learn u s representation in layer i to be close to u s representation in parent π(i): c i (u) = 1 2 kf i(u) f (i) (u)k 2 2. Multi-scale: Repeat at every level of M C i = X u2l i c i (u), L A has all layers appearing in sub-hierarchy rooted at i Jure Leskovec, Stanford CS224W: Analysis of Networks, 35

36 Jure Leskovec, Stanford CS224W: Analysis of Networks, 36 Nodes in different layers representing the same entity have the same features in hierarchy ancestors We learn feature representations at multiple scales: features of nodes in the layers features of nodes in non-leaves in the hierarchy This model is more efficient than the fully pairwise model, where dependencies between layers are modeled by pairwise comparisons of nodes across all pairs of layers

37 Jure Leskovec, Stanford CS224W: Analysis of Networks, 37 Learning node features in multi-layer networks Solve maximum likelihood problem: X max f 1,f 2,...,f M i2t i X j2m C j, Per-layer network objectives Hierarchical dependency objectives

Proteins are worker molecules Understanding protein function has great biomedical and pharmaceutical implications Function of proteins depends on

38 Proteins are worker molecules Understanding protein function has great biomedical and pharmaceutical implications Function of proteins depends on their tissue context [Greene et al., Nat Genet 15] G1 G2 G3 G4 Jure Leskovec, Stanford CS224W: Analysis of Networks, 38

39 G1 G2 G3 G4 Tissue-specific protein interaction networks The precise function of proteins depends on their tissue context (Greene et al., Nat Genet 2015) Diseases result from the failure of tissue-specific processes (Hu et al., Nat Rev Genet 2016) Current models assume that protein functions are constant across tissues 39

40 Jure Leskovec, Stanford CS224W: Analysis of Networks, 40 A multi-layer tissue network has many network layers (tissues) Each layer corresponds to one tissue-specific protein interaction network Hierarchy M encodes biological similarities between the tissues at multiple scales

41 Jure Leskovec, Stanford CS224W: Analysis of Networks, 41 Brain 107 genome-wide tissue-specific protein interaction networks Cerebellum Frontal lobe Midbrain Brainstem Substantia nigra 584 tissue-specific cellular functions Examples (tissue, cellular function): Parietal lobe Pons Occipital lobe Medulla oblongata Temporal lobe (renal cortex, cortex development) (artery, pulmonary artery morphogenesis)

42 Jure Leskovec, Stanford CS224W: Analysis of Networks, 42 Brain Brainstem Cerebellum Frontal lobe Parietal lobe Occipital lobe Temporal lobe Midbrain Substantia nigra Pons Medulla oblongata 9 brain tissue PPI networks in two-level hierarchy

43 Jure Leskovec, Stanford CS224W: Analysis of Networks, 43

44 Jure Leskovec, Stanford CS224W: Analysis of Networks, 44 Cellular function prediction is a multi-label node classification task Every node (protein) is assigned one or more labels (cellular functions) Setup: We apply OhmNet, which for every node in every layer learns a separate feature vector in an unsupervised way. For every layer and every function we then train a separate onevs-all regularized linear classifier using the modified Huber loss During the training phase, we observe only a certain fraction of proteins and all their cellular functions across the layers The task is then to predict the tissue-specific functions for the remaining proteins

45 Tissues Jure Leskovec, Stanford CS224W: Analysis of Networks, 45

46 Jure Leskovec, Stanford CS224W: Analysis of Networks, % improvement over state-of-the-art on the same dataset

47 Transfer functions to unannotated tissues Task: Predict functions in target tissue without access to any annotation/label in that tissue Target tissue OhmNet Tissue nonspecific Improvement Placenta % Spleen % Liver % Forebrain % Blood plasma % Smooth muscle % Average % Reported are AUC values Jure Leskovec, Stanford CS224W: Analysis of Networks, 47

This Talk. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured

This Talk. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW 2018 1 This Talk 1) Node embeddings Map nodes to low-dimensional embeddings. 2) Graph neural networks Deep learning architectures