This Talk. 1) Node embeddings. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured

Size: px

Start display at page:

Download "This Talk. 1) Node embeddings. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured"

Chloe Simon
5 years ago
Views:

1 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW This Talk 1) Node embeddings Map nodes to low-dimensional embeddings. 2) Graph neural networks Deep learning architectures for graphstructured data 3) pplications

2 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Part 2: Graph Neural Networks

3 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Embedding Nodes Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

4 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Embedding Nodes Goal: similarity(u, v) z > v z u Need to define!

5 Two Key omponents Encoder maps each node to a lowdimensional vector. enc(v) =z v node in the input graph Similarity function specifies how relationships in vector space map to relationships in the original network. Similarity of u and v in the original network d-dimensional embedding similarity(u, v) z > v z u dot product between node embeddings Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW

6 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW From Shallow to Deep So far we have focused on shallow encoders, i.e. embedding lookups: embedding matrix embedding vector for a specific node Z = Dimension/size of embeddings one column per node

7 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW From Shallow to Deep Limitations of shallow encoding: O( V ) parameters are needed: there no parameter sharing and every node has its own unique embedding vector. Inherently transductive : It is impossible to generate embeddings for nodes that were not seen during training. Do not incorporate node features: Many graphs have features that we can and should leverage.

8 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW From Shallow to Deep We will now discuss deeper methods based on graph neural networks. enc(v) = complex function that depends on graph structure. In general, all of these more complex encoders can be combined with the similarity functions from the previous section.

9 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section We will now discuss deeper methods based on graph neural networks. 1. The asics 2. Graph onvolutional Networks (GNs) 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

10 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW The asics: Graph Neural Networks ased on material from: Hamilton et al Representation Learning on Graphs: Methods and pplications. IEEE Data Engineering ulletin on Graph Systems. Scarselli et al The Graph Neural Network Model. IEEE Transactions on Neural Networks.

11 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Setup ssume we have a graph G: V is the vertex set. is the adjacency matrix (assume binary). X R m V is a matrix of node features. ategorical attributes, text, image data E.g., profile information in a social network. Node degrees, clustering coefficients, etc. Indicator vectors (i.e., one-hot encoding of each node)

12 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation Key idea: Generate node embeddings based on local neighborhoods. TRGET NODE D E F D E F INPUT GRPH

13 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation Intuition: Nodes aggregate information from their neighbors using neural networks TRGET NODE D E F D E F INPUT GRPH

14 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation Intuition: Network neighborhood defines a computation graph Every node defines a unique computation graph!

15 Neighborhood ggregation Nodes have embeddings at each layer. Model can be arbitrary depth. layer-0 embedding of node u is its input feature, i.e. x u. TRGET NODE D E INPUT GRPH F Layer-2 Layer-1 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW D Layer-0 x F E x x x x E x F x

16 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood onvolutions Neighborhood aggregation can be viewed as a center-surround filter. Mathematically related to spectral graph convolutions (see ronstein et al., 2017)

17 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation Key distinctions are in how different approaches aggregate information across the layers. what s in the box!? TRGET NODE? D E INPUT GRPH F??? D?? E F

18 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation asic approach: verage neighbor information and apply a neural network. TRGET NODE 1) average messages from neighbors D E F D E F INPUT GRPH 2) apply neural network

19 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW The Math asic approach: verage neighbor messages and apply a neural network. h 0 v = x v 0 h k v k kth layer embedding non-linearity (e.g., of v ReLU or tanh) Initial layer 0 embeddings are equal to node features X u2n(v) h k 1 u average of neighbor s previous layer embeddings previous layer embedding of v 1 N(v) + kh k v 1, 8k >0

20 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Training the Model How do we train the model to generate high-quality embeddings? z Need to define a loss function on the embeddings, L(z u )!

21 Training the Model h 0 v = x v 0 h k v k X u2n(v) trainable matrices (i.e., what we learn) h k 1 u 1 N(v) + kh k v 1, 8k 2 {1,...,K} z v = h K v fter K-layers of neighborhood aggregation, we get output embeddings for each node. We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters. Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW

22 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Training the Model Train in an unsupervised manner using only the graph structure. Unsupervised loss function can be anything from the last section, e.g., based on Random walks (node2vec, DeepWalk) Graph factorization i.e., train the model so that similar nodes have similar embeddings.

23 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Training the Model lternative: Directly train the model for a supervised task (e.g., node classification): Human or bot? Human or bot? e.g., an online social network

24 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Training the Model lternative: Directly train the model for a supervised task (e.g., node classification): Human or bot? L = X v2v classification weights y v log( (z > v )) + (1 y v )log(1 (z > v )) output node embedding node class label

25 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Overview of Model Design 1) Define a neighborhood aggregation function. z 2) Define a loss function on the embeddings, L(z u )

26 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Overview of Model Design 3) Train on a set of nodes, i.e., a batch of compute graphs

27 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Overview of Model 4) Generate embeddings for nodes as needed Even for nodes we never trained on!!!!

28 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Inductive apability The same aggregation parameters are shared for all nodes. The number of model parameters is sublinear in V and we can generalize to unseen nodes! shared parameters W k k D E F shared parameters INPUT GRPH ompute graph for node ompute graph for node

Inductive node embedding generalize to new graph generalize to entirely unseen

29 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Inductive apability z u train on one graph Inductive node embedding generalize to new graph generalize to entirely unseen graphs e.g., train on protein interaction graph from model organism and generate embeddings on newly collected data about organism

30 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Inductive apability z u train with snapshot new node arrives generate embedding for new node Many application settings constantly encounter previously unseen nodes. e.g., Reddit, YouTube, GoogleScholar,. Need to generate new embeddings on the fly

31 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Quick Recap Recap: Generate node embeddings by aggregating neighborhood information. llows for parameter sharing in the encoder. llows for inductive learning. We saw a basic variant of this idea now we will cover some state of the art variants from the literature.

32 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation Key distinctions are in how different approaches aggregate messages TRGET NODE D E INPUT GRPH What else can we put in the box? F??? D??? E F

33 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section 1. The asics 2. Graph onvolutional Networks 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

34 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Graph onvolutional Networks ased on material from: Kipf et al., Semisupervised lassification with Graph onvolutional Networks. ILR.

35 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Graph onvolutional Networks Kipf et al. s Graph onvolutional Networks (GNs) are a slight variation on the neighborhood aggregation idea: h k v = k X u2n(v)[v 1 h k u 1 p N(u) N(v)

36 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Graph onvolutional Networks asic Neighborhood ggregation h k v = h k v = k k X u2n(v) X u2n(v)[v h k 1 u VS. GN Neighborhood ggregation 1 N(v) + kh k v 1 1 h k u 1 p N(u) N(v) same matrix for self and neighbor embeddings per-neighbor normalization

37 Graph onvolutional Networks Empirically, they found this configuration to give the best results. More parameter sharing. Down-weights high degree neighbors. h k v = k X u2n(v)[v 1 h k u 1 p N(u) N(v) use the same transformation matrix for self and neighbor embeddings instead of simple average, normalization varies across neighbors Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW

38 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW atch Implementation an be efficiently implemented using sparse batch operations: H (k+1) = D 1 2 ÃD 1 2 H (k) W k where Ã = + I D ii = X j i,j O( E ) time complexity overall.

39 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section 1. The asics 2. Graph onvolutional Networks 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

40 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW GraphSGE ased on material from: Hamilton et al., Inductive Representation Learning on Large Graphs. NIPS.

41 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW GraphSGE Idea So far we have aggregated the neighbor messages by taking their (weighted) average, can we do better? TRGET NODE? D E INPUT GRPH F??? D?? E F

42 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW GraphSGE Idea TRGET NODE D E INPUT GRPH F ny differentiable function that maps set of vectors to a single vector. D E F h k v = k agg({h k u 1, 8u 2 N(v)}), k h k v 1

43 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW GraphSGE Differences Simple neighborhood aggregation: 0 1 X h k h v k u 1 k N(v) + kh k v 1 GraphSGE: u2n(v) generalized aggregation concatenate self embedding and neighbor embedding h k v = W k agg {h k u 1, 8u 2 N(v)}, k h k v 1

44 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW GraphSGE Variants Mean: Pool Transform neighbor vectors and apply symmetric vector function. element-wise mean/max agg = {Qh k u 1, 8u 2 N(v)} LSTM: agg = X u2n(v) h k 1 u N(v) pply LSTM to random permutation of neighbors. agg = LSTM [h k 1 u, 8u 2 (N(v))]

45 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section 1. The asics 2. Graph onvolutional Networks 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

46 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks ased on material from: Li et al., Gated Graph Sequence Neural Networks. ILR.

47 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation asic idea: Nodes aggregate messages from their neighbors using neural networks TRGET NODE D E F D E F INPUT GRPH

48 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation GNs and GraphSGE generally only 2-3 layers deep. TRGET NODE D E F D E F INPUT GRPH

49 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Neighborhood ggregation ut what if we want to go deeper? TRGET NODE 10+ layers!? D E F D.. INPUT GRPH

50 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks How can we build models with many layers of neighborhood aggregation? hallenges: Overfitting from too many parameters. Vanishing/exploding gradients during backpropagation. Idea: Use techniques from modern recurrent neural networks!

51 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks Idea 1: Parameter sharing across layers. same neural network across layers TRGET NODE D E F E F. INPUT GRPH D

52 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks Idea 2: Recurrent state update. TRGET NODE RNN module! RNN module D E F E F. INPUT GRPH D

53 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW The Math Intuition: Neighborhood aggregation with RNN state update. 1. Get message from neighbors at step k: m k v = W X u2n(v) h k 1 u aggregation function does not depend on k 2. Update node state using Gated Recurrent Unit (GRU). New node state depends on the old state and the message from neighbors: h k v =GRU(h k 1 v, m k v)

54 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks TRGET NODE D E F RNN module E F. INPUT GRPH D an handle models with >20 layers. Most real-world networks have small diameters (e.g., less than 7). llows for complex information about global graph structure to be propagated to all nodes.

55 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Gated Graph Neural Networks TRGET NODE D E F RNN module E F. INPUT GRPH D Useful for complex networks representing: Logical formulas. Programs.

56 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section 1. The asics 2. Graph onvolutional Networks 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

57 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Summary so far Key idea: Generate node embeddings based on local neighborhoods. TRGET NODE D E F D E F INPUT GRPH

58 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Summary so far Graph convolutional networks verage neighborhood information and stack neural networks. GraphSGE Generalized neighborhood aggregation. Gated Graph Neural Networks Neighborhood aggregation + RNNs

59 Recent advances in graph neural nets (not covered in detail here) Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW ttention-based neighborhood aggregation: Graph ttention Networks (Velickovic et al., 2018) GeniePath (Liu et al., 2018) Generalizations based on spectral convolutions: Geometric Deep Learning (ronstein et al., 2017) Mixture Model NNs (Monti et al., 2017) Speed improvements via subsampling: FastGNs (hen et al., 2018) Stochastic GNs (hen et al., 2017)

60 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Outline for this Section 1. The asics 2. Graph onvolutional Networks 3. GraphSGE 4. Gated Graph Neural Networks 5. Subgraph Embeddings

61 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW Subgraph Embeddings ased on material from: Duvenaud et al onvolutional Networks on Graphs for Learning Molecular Fingerprints. IML. Li et al Gated Graph Sequence Neural Networks. ILR.

62 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW (Sub)graph Embeddings So far we have focused on nodelevel embeddings

63 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW (Sub)graph Embeddings ut what about subgraph embeddings?

64 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW pproach 1 Simple idea: Just sum (or average) the node embeddings in the (sub)graph z S = X v2s z v Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.

65 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW pproach 2 Idea: Introduce a virtual node to represent the subgraph and run a standard graph neural network. Proposed by Li et al., 2016 as a general technique for subgraph embedding.

66 Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW (Sub)graph Embeddings Still an open research area! How to embed (sub)graphs with millions or billions of nodes? How to do analogue of NN pooling on networks?

Deep Learning on Graphs

Deep Learning on Graphs with Graph Convolutional Networks Hidden layer Hidden layer Input Output ReLU ReLU, 6 April 2017 joint work with Max Welling (University of Amsterdam) The success story of deep