TGNet: Learning to Rank Nodes in Temporal Graphs
Qi Song 1, Bo Zong 2, Yinghui Wu 1,3, Lu-An Tang 2, Hui Zhang 2, Guofei Jiang 2, Haifeng Chen 2
Node Ranking in Temporal Graphs
Temporal graphs have been widely applied to model dynamic networks, e.g.:
- Automated systems → anomaly detection
- Computer networks → attack detection
- Citation networks → author ranking
Ranking models are desired to suggest and maintain the nodes with high priority in dynamic graphs. Node ranks can be influenced by rich context from heterogeneous node information, network structures, and temporal perspectives.
Motivating example: alert ranking in a heterogeneous system alert network.
[Figure: a fraction of a real-world heterogeneous temporal information network from data center monitoring, with hosts (h1, h2), services ("Database", "Network"), logs (l1–l7), and alerts (a1–a7) across snapshots G1 (09:44 am), G2 (09:45 am), and G3. E.g., log l1 at 3:33 am on host h1 with service "Database" and content 'login failure'; alert a5 at 9:45 am, relevant to logs l5 and l'5, reason: 'unusual small # of logs'.]
- Node attributes: encode the input information of a node.
- Structural context: captures the features in the neighborhood of a node at each snapshot.
- Temporal context: suggests the impact of historical or future events on current decisions.
Challenges
Learning to rank nodes in temporal graphs is more challenging than its counterpart over static graphs:
- Context learning: structural and temporal context can be used to differentiate the roles that different nodes play in node ranking. → Need an automated approach to learn context features for node ranking.
- Context dynamics: context features bear constant changes over time. → Need effective models that can capture context evolution over time.
- Label sparsity: labels provided by users are sparse and cover a small fraction of nodes. → Need effective learning that can generalize from sparsely labeled data.
- Time cost: fast learning and ranking upon the arrival of new nodes. → Need effective optimization and provable performance guarantees.
Learning to Rank Nodes in Temporal Graphs
Input: training data consisting of a temporal graph $\mathcal{G}$ and labeled data $Y$, a ranking function space $\mathcal{F}$, and an error function $\epsilon(\cdot)$ that measures the ranking error.
Output: an optimal function $f^* \in \mathcal{F}$ such that the ranking error $\epsilon(f^*, \mathcal{G}, Y)$ is minimized.
Pipeline: user-ranked nodes → model learning → TGNet model → model inference → node ranking.
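To make the labeled input concrete, here is a minimal Python sketch of how a user-provided partial ranking could be expanded into ordered training pairs (u, v), meaning "u should outrank v". The node IDs and the pair-expansion scheme are illustrative assumptions, not necessarily the paper's exact labeling protocol.

```python
from itertools import combinations

def ranking_to_pairs(ranked_nodes):
    """ranked_nodes: list of node IDs, highest priority first."""
    return [(u, v) for u, v in combinations(ranked_nodes, 2)]

# e.g., a user ranks alerts a1 > a3 > a2 by priority:
pairs = ranking_to_pairs(["a1", "a3", "a2"])
print(pairs)  # [('a1', 'a3'), ('a1', 'a2'), ('a3', 'a2')]
```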
Overview of TGNet Model
Four components:
1. Initialization layer: projects the input feature vector into the hidden state space;
2. Structural propagation layer: exchanges neighborhood information within a snapshot;
3. Temporal propagation layer: propagates temporal influence between snapshots;
4. Output layer: transforms hidden states into ranking scores.
[Figure: input feature vectors are mapped to hidden state vectors, propagated within and across snapshots G_{i-1}, G_i, G_{i+1}, and transformed into ranking scores.]
TGNet: Initialization Layer
When a node $v$ first occurs in snapshot $G_i$:
$$h_v^{(i,0)} = \tanh(W^{in} x_v + b^{in})$$
where $h_v^{(i,0)}$ is the hidden state vector, $x_v$ is the input feature vector, and $W^{in}, b^{in}$ are parameters.
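A minimal NumPy sketch of this projection; the dimensions, random weights, and function name are illustrative assumptions, not the paper's values.

```python
import numpy as np

def init_layer(x_v, W_in, b_in):
    """Project node v's input feature vector into the hidden state space."""
    return np.tanh(W_in @ x_v + b_in)

d_in, d_h = 8, 16                                      # assumed dimensions
rng = np.random.default_rng(0)
W_in, b_in = rng.normal(size=(d_h, d_in)), np.zeros(d_h)
h_v0 = init_layer(rng.normal(size=d_in), W_in, b_in)   # h_v^{(i,0)}
```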
TGNet: Structural Propagation Layer
For all nodes within a snapshot, iteratively perform local information propagation (for $L$ times):
$$h_v^{(i,l+1)} = \tanh\Big(\sum_{u \in N(v)} \alpha_{u,v}^{(i,l)}\, h_u^{(i,l)}\Big)$$
where $h_u^{(i,l)}$ is the hidden state vector of neighbor $u$ and $\alpha_{u,v}^{(i,l)}$ is the structural influence factor.
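A minimal NumPy sketch of one propagation step under this formula. The toy adjacency list and the uniform influence factors are illustrative assumptions; in TGNet the factors come from the influence network described later.

```python
import numpy as np

def structural_step(H, neighbors, alpha):
    """One propagation step: each node aggregates its neighbors' hidden
    states, weighted by structural influence factors alpha[(u, v)]."""
    H_next = np.zeros_like(H)
    for v, nbrs in neighbors.items():
        agg = sum(alpha[(u, v)] * H[u] for u in nbrs)
        H_next[v] = np.tanh(agg)
    return H_next

H = np.random.default_rng(1).normal(size=(3, 4))   # 3 nodes, d_h = 4
neighbors = {0: [1, 2], 1: [0], 2: [0]}            # snapshot adjacency (toy)
alpha = {(u, v): 0.5 for v in neighbors for u in neighbors[v]}  # uniform
for _ in range(2):                                 # L = 2 iterations
    H = structural_step(H, neighbors, alpha)
```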
TGNet: Temporal Propagation Layer
For nodes that exist in two consecutive snapshots $G_i$ and $G_{i+1}$:
$$h_v^{(i+1,0)} = \gamma_{v,t_i,t_{i+1}} \cdot h_v^{(i,L)}$$
where $\gamma_{v,t_i,t_{i+1}}$ is the temporal influence factor and $h_v^{(i,L)}$ is the final hidden state of $v$ in snapshot $G_i$.
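A minimal NumPy sketch of carrying a node's state across snapshots; treating gamma as a per-dimension vector applied elementwise is an illustrative assumption.

```python
import numpy as np

def temporal_step(h_v_last, gamma):
    """Carry node v's final hidden state of snapshot G_i into its initial
    hidden state in snapshot G_{i+1}, scaled by the temporal factor."""
    return gamma * h_v_last            # elementwise if gamma is a vector

h_v = np.array([0.3, -0.7, 0.1])       # h_v^{(i,L)} after structural propagation
gamma = np.array([0.9, 0.5, 1.0])      # temporal influence factor (assumed)
h_v_next = temporal_step(h_v, gamma)   # h_v^{(i+1,0)}
```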
TGNet: Output Layer
For each node, in the last snapshot $G_i$ in which it appears:
$$\hat{y}_v = \sigma(W^{out}\, h_v^{(i,L)} + b^{out})$$
where $\hat{y}_v$ is the ranking value, $h_v^{(i,L)}$ is the hidden state vector, and $W^{out}, b^{out}$ are parameters.
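A minimal NumPy sketch of the score computation; the weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_layer(h_v, W_out, b_out):
    """Map a node's final hidden state to a scalar ranking score in (0, 1)."""
    return sigmoid(W_out @ h_v + b_out)

d_h = 4
rng = np.random.default_rng(2)
score = output_layer(rng.normal(size=d_h), rng.normal(size=(1, d_h)), 0.0)
```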
Influence Modeling
Edge-centric: parameters are associated with edges; this cannot deal with context changes, and it is daunting for users to choose edge types.
Node-centric (influence network): the influence between two nodes is conditioned on their contexts; a node's context is determined by its hidden state.
Structural influence:
$$\alpha_{u,v}^{(i,l)} = \mathrm{softmax}\big(\sigma(W^{st}\,[h_u^{(i,l)};\, h_v^{(i,l)}] + b^{st})\big), \text{ normalized over } u \in N(v)$$
Temporal influence:
$$\gamma_{v,t_i,t_{i+1}} = \sigma(W^{tp}\, h_v^{(i,L)} + b^{tp})$$
Inference cost: $O(|E|)$. Neighbor sampling: randomly sample a subset of the neighbors of each node to reduce this cost.
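A sketch of how node-centric structural influence factors could be computed from the endpoints' hidden states, assuming a one-layer scoring network with softmax normalization over the neighborhood; the influence network's exact architecture is an assumption here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def structural_influence(H, v, nbrs, W_st, b_st):
    """alpha_{u,v}: score each neighbor u from [h_u; h_v], then normalize
    over the neighborhood N(v)."""
    scores = np.array([np.tanh(W_st @ np.concatenate([H[u], H[v]]) + b_st)
                       for u in nbrs])
    return dict(zip(nbrs, softmax(scores)))

d_h = 4
rng = np.random.default_rng(3)
H = rng.normal(size=(3, d_h))
alpha = structural_influence(H, 0, [1, 2], rng.normal(size=2 * d_h), 0.0)
```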
Parameter Learning
Objective function (over parameters $\Theta$):
$$\mathcal{L}(\Theta, Y) = \sum_{(u,v) \in Y} \ell(\Theta, u, v) + \lambda_1\, \Omega_1(\Theta) + \lambda_2\, \Omega_2(\Theta, \mathcal{G})$$
- Loss function: measures the error made by $\Theta$ on a labeled pair $(u, v)$, as a function of the difference between the predicted scores $\hat{y}_u$ and $\hat{y}_v$.
- $L_2$ regularization: penalizes model complexity, $\Omega_1(\Theta) = \|\Theta\|_2^2$.
- Graph smoothness: penalizes high dissimilarity between a node and its neighbors, $\Omega_2(\Theta, \mathcal{G}) = \sum_i \sum_{(u,v) \in E_i} \|h_u^{(i,L)} - h_v^{(i,L)}\|_2^2$.
Gradient descent: iteratively compute the error resulting from the current $\Theta$ and update $\Theta$ by its gradients.
Learning cost: $O(|\mathcal{G}| \cdot |Y|)$.
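A minimal NumPy sketch of the three objective terms, computed directly on given scores and hidden states (the real model backpropagates through all layers); the logistic pairwise loss and the lambda values are illustrative assumptions.

```python
import numpy as np

def pairwise_loss(y_hat, pairs):
    """Penalize each labeled pair (u, v) where u should outrank v."""
    return sum(np.log1p(np.exp(-(y_hat[u] - y_hat[v]))) for u, v in pairs)

def smoothness(H, edges):
    """Graph smoothness: penalize dissimilar hidden states of neighbors."""
    return sum(np.sum((H[u] - H[v]) ** 2) for u, v in edges)

def objective(y_hat, pairs, H, edges, theta, lam1=1e-4, lam2=1e-3):
    return (pairwise_loss(y_hat, pairs)
            + lam1 * np.sum(theta ** 2)      # L2 regularization
            + lam2 * smoothness(H, edges))   # graph smoothness

y_hat, H = np.array([0.9, 0.4, 0.7]), np.eye(3)
print(objective(y_hat, [(0, 1), (2, 1)], H, [(0, 2)], theta=np.ones(4)))
```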
Experiment Setting
Datasets:
| Dataset | #Snapshots | #Nodes | #Edges | #Training pairs |
|---|---|---|---|---|
| System log data (SLD) | 19.4K | 61.4K | 258.1K | 20K |
| ISCX intrusion detection data (IDS) | 66.7K | 5.7M | 11.5M | 100K |
| Microsoft academic graph (MAG) | 12K | 2.5M | 16.2M | 100K |
| Amazon co-purchasing network (APC) | 1.5K | 2.1M | 9.5M | 100K |
Baselines: TGNet_BA, TGNet_IN (TGNet variants); Supervised PageRank (SPR); Dynamic PageRank (DPR); SVMRank; alert fusion for intrusion detection (LRT); network anomaly detection (BGCM).
Metric: top-k Normalized Discounted Cumulative Gain (NDCG@k), with k determined by domain experts.
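For reference, a minimal sketch of the NDCG@k metric used for evaluation; the relevance grades in the toy example are illustrative.

```python
import numpy as np

def dcg_at_k(rels, k):
    """Discounted cumulative gain of the first k relevance grades."""
    rels = np.asarray(rels, dtype=float)[:k]
    return np.sum(rels / np.log2(np.arange(2, rels.size + 2)))

def ndcg_at_k(pred_rels, k):
    """pred_rels: relevance grades of items in the predicted ranking order."""
    ideal = dcg_at_k(sorted(pred_rels, reverse=True), k)
    return dcg_at_k(pred_rels, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 3, 0, 1], k=5))   # ~0.97 for this toy ranking
```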
Experiment Results: Accuracy
Accuracy comparison (NDCG@k):
| Dataset | BGCM | LRT | SVMRank | SPR | DPR | TGNet_BA | TGNet_IN | TGNet |
|---|---|---|---|---|---|---|---|---|
| SLD | 0.35 | 0.72 | 0.74 | 0.78 | 0.56 | 0.88 | 0.90 | 0.92 |
| IDS | 0.32 | 0.59 | 0.70 | 0.72 | 0.67 | 0.77 | 0.85 | 0.87 |
| MAG | - | - | 0.56 | 0.69 | 0.61 | 0.79 | 0.86 | 0.91 |
| APC | - | - | 0.52 | 0.55 | 0.45 | 0.73 | 0.82 | 0.88 |
[Plots: NDCG@k of TGNet, TGNet_IN, TGNet_BA (and SPR, SVMRank, LRT) while varying d_h, L, and labeled data size.]
- Varying d_h (hidden state dimension): TGNet achieves good accuracy with small d_h.
- Varying L (number of structural propagation iterations): prediction performance tends to converge when L is small.
- Varying labeled data size: TGNet needs the least amount of labeled data to achieve the highest accuracy.
Experiment Results: Efficiency and Case Study
Learning cost:
[Plot: training time (minutes) vs. graph size (up to 60K) on synthetic SLD, comparing TGNet (GPU), TGNet (CPU), TGNet (CPU, optimized), and SPR.]
- Varying graph size on synthetic SLD: training is 10 times faster than SPR.
Case study (alert ranking on SLD):
[Figure: hosts (h1), services (SQL Server, SQL Server Agent), logs (l1, l2: 'SAP DB connection error'; l3, l4: 'Login failure'; l5: 'Service started'), and alerts with predicted ranking scores.]
1. Critical alert a1 starts at 12:18:41, ends at 12:18:42, with reason 'Excessive number of login failure messages', relevant to logs l3, l4.
2. Ignored alert a2 starts at 12:18:42, ends at 12:18:42, with reason 'Number of service started messages below threshold', relevant to log l5.
Conclusion
TGNet: a novel graph neural network that explicitly tackles the challenges of node ranking in temporal graphs:
- Influence network: boosts the capability of modeling context dynamics;
- Graph smoothness: copes with label sparsity;
- End-to-end learning algorithm with optimization techniques.
Sponsored by: [sponsor logos]
Thank you!