TGNet: Learning to Rank Nodes in Temporal Graphs
Qi Song, Bo Zong, Yinghui Wu, Lu-An Tang, Hui Zhang, Guofei Jiang, Haifeng Chen

Node ranking in temporal graphs

Temporal graphs are widely used to model dynamic networks:
- Automated systems: anomaly detection
- Computer networks: attack detection
- Citation networks: author ranking

Ranking models are needed to suggest and maintain the high-priority nodes in dynamic graphs. Node ranks are influenced by rich context drawn from heterogeneous node information, network structure, and the temporal dimension.

Alert ranking in a heterogeneous system alert network

[Figure: a fraction of a real-world heterogeneous temporal information network from data center monitoring. Hosts (h1, h2) run services ("Database", "Network") that emit logs (l1-l7, l'5) and alerts (a1-a7) across snapshots G1 (03:33 am), G2 (09:44 am), and G3 (09:45 am). Examples: log l1 at 3:33 am, host h1, service "Database", content "login failure"; log l3 at 9:44 am, host h2, service "Network", content "connection failed"; alert a5 at 9:45 am, relevant to logs l5 and l'5, reason "unusual small # of logs".]

- Node attributes: encode the input information of a node.
- Structural context: captures the features in the neighborhood of a node at each snapshot.
- Temporal context: suggests the impact of historical or future events on current decisions.

Challenges

Learning to rank nodes in temporal graphs is more challenging than its counterpart over static graphs:
- Context learning: structural and temporal context can be used to differentiate the roles that nodes play in ranking. Needed: an automated approach to learn context features for node ranking.
- Context dynamics: context features change constantly over time. Needed: effective models that capture context evolution over time.
- Label sparsity: labels provided by users are sparse and cover a small fraction of nodes. Needed: learning that generalizes from sparsely labeled data.
- Time cost: learning and ranking must be fast upon the arrival of new nodes. Needed: efficient optimization and provable performance guarantees.

Learning to rank nodes in temporal graphs

Input: training data consisting of a temporal graph $\mathcal{G}$ and labeled data $Y$ (user-ranked node pairs), a ranking function space $\mathcal{F}$, and an error function $\epsilon(\cdot)$ that measures the ranking error.
Output: an optimal function $f^* \in \mathcal{F}$ such that the ranking error $\epsilon(f^*, Y)$ is minimized.
Pipeline: user-ranked node pairs feed TGNet model learning; model inference then produces the node ranking (see the sketch below for a concrete input representation).
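
To make the input concrete, here is a minimal sketch of how the training data could be represented, using the example nodes from the earlier slide (hypothetical structures, not the paper's actual API):

    # Snapshots of the temporal graph: each snapshot is a set of edges,
    # here written as (source, target) pairs over hosts, logs, and alerts.
    snapshots = [
        {("h1", "l1"), ("l1", "a1")},   # G_1 at 03:33 am
        {("h2", "l3"), ("l3", "a3")},   # G_2 at 09:44 am
    ]

    # Labeled data Y: user-ranked pairs (u, v) meaning "u ranks above v".
    ranked_pairs = [("a1", "a2"), ("a5", "a3")]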

Overview of the TGNet model

Four components:
1. Initialization layer: projects input feature vectors into the hidden state space;
2. Structural propagation layer: exchanges neighborhood information within a snapshot;
3. Temporal propagation layer: propagates temporal influence between snapshots;
4. Output layer: transforms hidden states into ranking scores.

[Figure: input feature vectors flow through the initialization layer, L rounds of structural propagation within each snapshot, and temporal propagation across snapshots G_{i-1}, G_i, G_{i+1}; the output layer emits ranking scores.]

TGNet: initialization layer

When a node $v$ first occurs in snapshot $G_i$, its hidden state is initialized from its input feature vector $x_v$:

$h_v^{(i,0)} = \tanh(W_{in}\, x_v + b_{in})$

where $h_v^{(i,0)}$ is the hidden state vector and $W_{in}$, $b_{in}$ are learned parameters.
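
A minimal NumPy sketch of this layer (shapes and the example call are placeholders, not from the paper):

    import numpy as np

    def init_hidden_state(x_v, W_in, b_in):
        # Project the input feature vector of a newly appearing node into
        # the hidden state space: h_v = tanh(W_in @ x_v + b_in).
        return np.tanh(W_in @ x_v + b_in)

    # Example: a 5-dimensional input feature mapped to a 3-dimensional state.
    rng = np.random.default_rng(0)
    h_v = init_hidden_state(rng.normal(size=5), rng.normal(size=(3, 5)), np.zeros(3))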

TGNet: structural propagation layer

For all nodes within a snapshot, local information propagation is performed iteratively (L times):

$h_v^{(i,l+1)} = \tanh\Big(\sum_{u \in N(v)} \alpha_{u,v}^{(i,l)}\, h_u^{(i,l)}\Big)$

where $h_u^{(i,l)}$ is the hidden state vector of node $u$ and $\alpha_{u,v}^{(i,l)}$ is the structural influence factor of $u$ on $v$.
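
A sketch of the propagation loop under simplifying assumptions: hidden states held in a dict of arrays, and an `alpha` callback standing in for the learned influence factors described on the influence-modeling slide.

    import numpy as np

    def structural_propagation(H, neighbors, alpha, L):
        # L rounds of local information exchange within one snapshot:
        # h_v <- tanh(sum over u in N(v) of alpha(u, v, H) * h_u);
        # assumes every node has at least one neighbor.
        for _ in range(L):
            H = {v: np.tanh(sum(alpha(u, v, H) * H[u] for u in nbrs))
                 for v, nbrs in neighbors.items()}
        return H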

TGNet: temporal propagation layer

For nodes that exist in two consecutive snapshots:

$h_v^{(i+1,0)} = \beta_{v,i,i+1}\, h_v^{(i,L)}$

where $\beta_{v,i,i+1}$ is the temporal influence factor that carries $v$'s final hidden state in $G_i$ into its initial state in $G_{i+1}$.
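
The corresponding sketch, where `beta` is a hypothetical per-node container of temporal influence factors:

    def temporal_propagation(H_last, beta):
        # Carry each persisting node's final hidden state in snapshot G_i
        # into its initial state in G_{i+1}, scaled by its temporal factor;
        # nodes absent from beta (i.e., absent from G_{i+1}) are dropped.
        return {v: beta[v] * h for v, h in H_last.items() if v in beta}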

TGNet: output layer

For each node in the last snapshot where it appears:

$\hat{p}_v = \sigma\big(w_{out}^{\top} h_v^{(i,L)} + b_{out}\big)$

where $\hat{p}_v$ is the ranking score, $h_v^{(i,L)}$ the final hidden state vector, and $w_{out}$, $b_{out}$ learned parameters.
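
A sketch of the score computation (a sigmoid over an affine map of the final hidden state, matching the equation above):

    import numpy as np

    def ranking_score(h_v, w_out, b_out):
        # p_v = sigmoid(w_out . h_v + b_out): the node's ranking score in (0, 1).
        return 1.0 / (1.0 + np.exp(-(w_out @ h_v + b_out)))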

Influence modeling

Edge-centric: parameters are associated with edges. This cannot deal with context changes, and it is daunting for users to choose edge types.
Node-centric (influence network): the influence between two nodes is conditioned on their contexts, and a node's context is determined by its hidden state. The structural influence factors $\alpha_{u,v}^{(i,l)}$ are computed from the hidden states $h_u^{(i,l)}, h_v^{(i,l)}$ by a small network and normalized with a softmax over $v$'s neighbors; the temporal influence factors $\beta_{v,i,i+1}$ are computed from $h_v^{(i,L)}$ through a sigmoid.

Inference cost: $O(|\mathcal{G}|)$. Neighbor sampling: randomly sample a subset of the neighbors of each node to reduce the cost further.
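
A sketch of the node-centric idea and of neighbor sampling. The influence function below is a stand-in: the slide only establishes that influence is scored from the two hidden states and softmax-normalized, so the exact parameterization (here a weight vector W and scalar bias b) is an assumption.

    import numpy as np

    def unnormalized_influence(h_u, h_v, W, b):
        # Node-centric influence: scored from the two nodes' hidden states
        # (their "contexts") by a small network; normalizing these scores
        # over v's neighbors with a softmax yields alpha_{u,v}.
        # W: weight vector of length 2 * hidden_dim; b: scalar bias.
        return np.exp(np.tanh(W @ np.concatenate([h_u, h_v]) + b))

    def sample_neighbors(nbrs, k, rng):
        # Neighbor sampling: keep at most k random neighbors per node so
        # propagation cost no longer depends on the full node degree.
        nbrs = list(nbrs)
        idx = rng.choice(len(nbrs), size=min(k, len(nbrs)), replace=False)
        return [nbrs[i] for i in idx]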

Parameter learning

Objective function over the parameters $\Theta$ and labeled pairs $Y$:

$J(\Theta, Y) = \sum_{(u,v) \in Y} \ell(\Theta, u, v) + \lambda_2\, \Omega_2(\Theta) + \lambda_g\, \Omega_g(\Theta, \mathcal{G})$

- Loss function: measures the error made by $\Theta$, $\ell(\Theta, u, v) = h(\hat{p}_v - \hat{p}_u)$, a monotone penalty on the score difference of a labeled pair where $u$ should rank above $v$.
- $L_2$ regularization: penalizes model complexity, $\Omega_2(\Theta) = \|\Theta\|_2^2$.
- Graph smoothness: penalizes high dissimilarity between a node and its neighbors, $\Omega_g(\Theta, \mathcal{G}) = \sum_i \sum_{(u,v) \in E_i} \|h_u^{(i,L)} - h_v^{(i,L)}\|^2$.

Gradient descent: iteratively compute the error made by the current $\Theta$ and update $\Theta$ along its gradients. Learning cost: $O(|Y| \cdot |\mathcal{G}|)$.
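
A sketch of the objective under stated assumptions: a hinge-style pairwise loss stands in for the penalty $h(\cdot)$, and the smoothness term uses uniform edge weights.

    import numpy as np

    def objective(scores, ranked_pairs, params, edges, H, lam2, lamg):
        # Pairwise ranking loss: penalize a labeled pair (u, v), where u
        # should rank above v, by how much the scores violate that order.
        loss = sum(max(0.0, 1.0 - (scores[u] - scores[v]))
                   for u, v in ranked_pairs)
        # L2 regularization: penalize model complexity.
        l2 = sum(np.sum(W ** 2) for W in params)
        # Graph smoothness: penalize dissimilarity between the hidden
        # states of neighboring nodes, spreading sparse labels along edges.
        smooth = sum(np.sum((H[u] - H[v]) ** 2) for u, v in edges)
        return loss + lam2 * l2 + lamg * smooth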

Experiment setting

Datasets:
- System log data (SLD)
- ISCX intrusion detection data (IDS)
- Microsoft academic graph (MAG)
- Amazon co-purchasing network (APC)

Dataset   #Snapshots   #Nodes   #Edges    #Training pairs
SLD       19.4K        61.4K    258.1K    20K
IDS       66.7K        5.7M     11.5M     100K
MAG       12K          2.5M     16.2M     100K
APC       1.5K         2.1M     9.5M      100K

Baselines:
- TGNet variants: TGNet_BA, TGNet_IN
- Supervised PageRank (SPR)
- Dynamic PageRank (DPR)
- SVMRank
- Alert fusion for intrusion detection (LRT)
- Network anomaly detection (BGCM)

Metric: top-k Normalized Discounted Cumulative Gain (NDCG@k), where k is determined by domain experts.
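
For reference, a common formulation of the metric (linear gain; graded-relevance variants with exponential gain are also widely used):

    import numpy as np

    def ndcg_at_k(relevance, k):
        # relevance: graded relevance of items in predicted rank order.
        rel = np.asarray(relevance, dtype=float)
        discounts = np.log2(np.arange(2, min(k, rel.size) + 2))
        dcg = np.sum(rel[:k] / discounts)
        # Ideal DCG: the same items in the best possible order.
        ideal = np.sort(rel)[::-1][:k]
        idcg = np.sum(ideal / discounts)
        return dcg / idcg if idcg > 0 else 0.0

    # Example: NDCG@3 for a ranking whose true relevances are 3, 1, 2.
    print(ndcg_at_k([3, 1, 2], k=3))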

Experiment results

Accuracy comparison (NDCG@k):

       BGCM   LRT    SVMRank   SPR    DPR    TGNet_BA   TGNet_IN   TGNet
SLD    0.35   0.72   0.74      0.78   0.56   0.88       0.90       0.92
IDS    0.32   0.59   0.70      0.72   0.67   0.77       0.85       0.87
MAG    -      -      0.56      0.69   0.61   0.79       0.86       0.91
APC    -      -      0.52      0.55   0.45   0.73       0.82       0.88

Accuracy varying parameters:

[Plots: NDCG@k for TGNet, TGNet_IN, TGNet_BA (and SPR, SVMRank, LRT) when varying d_h (1-15), L (1-5), and the labeled data size (20%-100%).]

- Varying d_h (hidden state dimension): TGNet achieves good accuracy with a small d_h.
- Varying L (number of structural propagation iterations): prediction performance converges when L is small.
- Varying labeled data size: TGNet needs the least amount of labeled data to achieve the highest accuracy.

Experiment results

Learning cost:

[Plot: training time in minutes (up to 500) vs. graph size (10K-60K) on synthetic SLD, for TGNet(GPU), TGNet(CPU), TGNet(CPU)_opt, and SPR.]

Varying graph size on synthetic SLD: TGNet training is 10 times faster than SPR.

Case study:

[Figure: a subgraph around host h1 and the services "SQL Server" and "SQL Server Agent", with logs l1, l2 ("SAP DB connection error"), l3, l4 ("Login failure"), l5 ("Service Started") and alerts a1, a2 between 12:18:38 and 12:18:42, annotated with ranking scores.]

1. Critical alert a1 starts at 12:18:41, ends at 12:18:42, with reason "Excessive number of login failure messages", relevant to logs l3, l4.
2. Ignored alert a2 starts at 12:18:42, ends at 12:18:42, with reason "Number of service started messages below threshold", relevant to log l5.

Conclusion

TGNet: a novel graph neural network that explicitly tackles the challenges of node ranking in temporal graphs:
- Influence network: boosts the capability to model context dynamics;
- Graph smoothness: copes with label sparsity;
- End-to-end learning algorithm with optimization techniques.

Thank you!