Graph Mining: Overview of different graph models

Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course, Winter Semester 2016

Lecture road
- Anomaly detection (previous lecture)
- Representatives of probabilistic (uncertain) graphs
- Introduction to signed networks

Graph models
Graphs are everywhere! There are various interesting models that we have not yet analyzed in the lecture:
- graph streams
- evolving graphs
- attributed graphs
- probabilistic graphs
- signed graphs
- colored graphs
- ...

Definitions
- Graph stream: a sequence $S = (e_1, e_2, \ldots, e_m)$ of unordered pairs $e = \{u, v\}$, where $u, v \in [n]$.
- Time-evolving graph: a sequence of static graphs $\{G_1, G_2, \ldots, G_n\}$, where $G_t = (V_t, E_t)$ is a snapshot of the evolving graph at timestamp $t$.
- Attributed graph: $G = (V, E, A)$, where $V$ is the vertex set, $E$ is the edge set, and $A$ is the attribute set, which contains a unary attribute $a_i$ linked to each node $n_i$ and a binary attribute $a_{ij}$ linked to each edge $e_k = (n_i, n_j) \in E$.
- Colored graph: a graph $G = (V, E)$ in which each vertex is assigned a color. In a properly colored graph, the color assignment conforms to the coloring rules applied to the graph (for a proper vertex coloring: no two adjacent vertices share a color).
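To make the "properly colored" rule concrete, here is a minimal sketch that checks the standard proper vertex-coloring rule on an edge list. The function name and the example graph are hypothetical, chosen only for illustration; other coloring rules (for example on signed graphs) would need a different test.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Illustrative sketch; the helper name is hypothetical, not from the slides.
Edge = Tuple[Hashable, Hashable]


def is_properly_colored(edges: Iterable[Edge], color: Dict[Hashable, str]) -> bool:
    """Return True if no edge joins two vertices that share a color.

    Only the proper *vertex* coloring rule is checked here.
    """
    return all(color[u] != color[v] for u, v in edges)


# A triangle a-b-c plus a pendant vertex d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(is_properly_colored(edges, {"a": "red", "b": "green", "c": "blue", "d": "red"}))   # True
print(is_properly_colored(edges, {"a": "red", "b": "green", "c": "blue", "d": "blue"}))  # False
```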

Probabilistic graphs - Outline
- Uncertainty in data
- Introduction to uncertain graphs: model definition, applications, problems
- Finding representatives in probabilistic graphs: problem definition, algorithms

Uncertainty in data
- Noise in generation: sensors
- Noise in collection: missing instances
- Biological data: protein-protein interaction probabilities
- The problem's nature: risk, trust, influence, status
- Anonymized data: privacy preservation of user-generated data

What is an uncertain graph?
A graph in which each edge has an associated probability $p \in [0, 1]$ of existing.
Figure 1: (left) an unweighted probabilistic graph $\mathcal{G}$; (right) $\mathcal{G}$ with the expected vertex degrees (in italics) associated with each node.
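The expected degrees shown in Figure 1 follow from linearity of expectation: the expected degree of a vertex is the sum of the probabilities of its incident edges. A minimal sketch, assuming an undirected uncertain graph stored as a dictionary from edge to probability; the helper name and the three-edge example are hypothetical (not the graph of Figure 1).

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

# Illustrative sketch; the helper name is hypothetical, not from the slides.
Edge = Tuple[Hashable, Hashable]


def expected_degrees(prob: Dict[Edge, float]) -> Dict[Hashable, float]:
    """Expected degree of every vertex of an undirected uncertain graph.

    E[deg(u)] = sum of p_e over the edges e incident to u.
    """
    exp_deg: Dict[Hashable, float] = defaultdict(float)
    for (u, v), p in prob.items():
        exp_deg[u] += p  # edge (u, v) contributes p to both endpoints
        exp_deg[v] += p
    return dict(exp_deg)


prob = {("a", "b"): 0.5, ("b", "c"): 0.75, ("a", "c"): 0.25}
print(expected_degrees(prob))  # {'a': 0.75, 'b': 1.25, 'c': 1.0}
```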

Possible applications and problems
- Modelling probabilities in protein-protein interaction graphs
- Modelling relationships in social graphs
- Problems that apply to deterministic graphs: algorithms need to be redesigned to incorporate uncertainty
  - Data anonymization: one of the possible worlds corresponds to the original data
  - Frequent subgraph mining: frequency is redefined using the edge probabilities
  - Queries based on shortest paths: they may return paths with very low probabilities

Graph model definition
A probabilistic graph is represented as $\mathcal{G} = (V, E, W, p)$, where:
- $V$ is the set of vertices and $E$ is the set of edges;
- for weighted graphs, $W: V \times V \to \mathbb{R}$ denotes the weights associated with the edges;
- $p$ maps every pair of nodes to a real number in $[0, 1]$, and $p_{uv}$ is the probability that edge $(u, v)$ exists in the uncertain network.
From a probabilistic graph $\mathcal{G}$, $2^{|E|}$ deterministic graphs can be generated; these graphs are called possible worlds.
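The $2^{|E|}$ count comes from choosing, for each edge, whether it is present. The sketch below enumerates every possible world of a tiny uncertain graph together with its probability. It assumes independent edges (the common assumption discussed on the next slide); the helper name and the example are hypothetical. Enumeration is exponential, so this is only for very small graphs.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, List, Tuple

# Illustrative sketch; the helper name is hypothetical, not from the slides.
Edge = Tuple[Hashable, Hashable]


def possible_worlds(prob: Dict[Edge, float]) -> List[Tuple[FrozenSet[Edge], float]]:
    """Enumerate all 2^|E| possible worlds and their probabilities.

    Assumes independent edges: a world keeps a subset of E, and its
    probability is the product of p_e over kept edges and (1 - p_e)
    over dropped edges.
    """
    edges = list(prob)
    worlds = []
    for mask in product((False, True), repeat=len(edges)):
        kept = frozenset(e for e, keep in zip(edges, mask) if keep)
        pr = 1.0
        for e, keep in zip(edges, mask):
            pr *= prob[e] if keep else 1.0 - prob[e]
        worlds.append((kept, pr))
    return worlds


prob = {("a", "b"): 0.5, ("b", "c"): 0.75, ("a", "c"): 0.25}
worlds = possible_worlds(prob)
print(len(worlds))                          # 8 == 2**3
print(round(sum(p for _, p in worlds), 9))  # 1.0
```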

Possible world semantics [1]
- The literature often assumes that edge probabilities are independent. Is this always the case?
- For simplicity, various approaches treat the edge probabilities as weights.
- Others consider only the edges whose probability exceeds a threshold, $p > t$.
- These assumptions are not valid in many scenarios!
[1] S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. SIGMOD 1987.

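Sampling
Assuming independent edges, the probability that a certain (deterministic) graph $G = (V, E)$ is sampled from $\mathcal{G}$ is
$P[G] = \prod_{(u,v) \in E} p_{uv} \cdot \prod_{(u,v) \in (V \times V) \setminus E} (1 - p_{uv})$
where each unordered pair is counted once.
Given $\mathcal{G}$ and its expected vertex degrees, we can also calculate the vertex discrepancies
$\mathrm{dis}_u(G) = \deg_u(G) - \mathbb{E}[\deg_u(\mathcal{G})]$, where $u$ is a node in $G$.
The discrepancy of $G$ is the sum of the absolute node discrepancies, $\Delta(G) = \sum_{u \in V} |\mathrm{dis}_u(G)|$, and the representative is
$G^* = \arg\min_{G \in W(\mathcal{G})} \Delta(G)$, where $W(\mathcal{G})$ is the set of possible worlds of $\mathcal{G}$.
Figure 2: (left) $\mathcal{G}$ with the expected vertex degrees associated with each node; (right) a certain instance $G$ of $\mathcal{G}$ with the vertex discrepancies.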
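A minimal sketch of sampling one possible world and computing $\mathrm{dis}_u(G)$ and $\Delta(G)$, assuming independent edges and an undirected graph given as an edge-to-probability dictionary. The helper names and the example are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# Illustrative sketch; helper names are hypothetical, not from the slides.
Edge = Tuple[Hashable, Hashable]


def sample_world(prob: Dict[Edge, float], rng: random.Random) -> Set[Edge]:
    """Sample one possible world: keep each edge independently with probability p_e."""
    return {e for e, p in prob.items() if rng.random() < p}


def discrepancies(prob: Dict[Edge, float], world: Iterable[Edge]) -> Dict[Hashable, float]:
    """dis_u(G) = deg_u(G) - E[deg_u] for every vertex of the uncertain graph."""
    dis: Dict[Hashable, float] = defaultdict(float)
    for (u, v), p in prob.items():  # subtract the expected degrees
        dis[u] -= p
        dis[v] -= p
    for u, v in world:  # add the degrees in the sampled world
        dis[u] += 1
        dis[v] += 1
    return dict(dis)


def total_discrepancy(prob: Dict[Edge, float], world: Iterable[Edge]) -> float:
    """Delta(G): sum of the absolute vertex discrepancies."""
    return sum(abs(d) for d in discrepancies(prob, world).values())


prob = {("a", "b"): 0.5, ("b", "c"): 0.75, ("a", "c"): 0.25}
world = {("a", "b"), ("b", "c")}
print(discrepancies(prob, world))      # {'a': 0.25, 'b': 0.75, 'c': 0.0}
print(total_discrepancy(prob, world))  # 1.0
sampled = sample_world(prob, random.Random(0))
```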

What if we could work on a deterministic graph instead?
How do we benefit?
- Computational complexity would be much lower: there is no need to reason over up to $2^{|E|}$ possible worlds.
- Traditional data mining algorithms could be applied directly.
Which characteristics should this certain graph retain from the uncertain one?
- The same number of vertices.
- But which edges should be included?

Finding representatives in probabilistic graphs [2]
A representative $G^*$ of a probabilistic graph $\mathcal{G}$ is a deterministic graph whose vertices exhibit the least possible discrepancy.
More formally: given an undirected uncertain graph $\mathcal{G} = (V, E, W, p)$, the representative is an exact instance $G^*$ of $\mathcal{G}$ (a possible world) in which each vertex degree deviates as little as possible from its expected value.
[2] Panos Parchas et al. The Pursuit of a Good Possible World: Extracting Representative Instances of Uncertain Graphs. ACM SIGMOD 2014.
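For a tiny graph, the representative can be found by brute force: score every possible world by its total discrepancy (sum of absolute vertex discrepancies) and keep the minimum. This is a hedged illustration of the problem definition only, not one of the algorithms from [2]; it is exponential in $|E|$. The helper names and example are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Tuple

# Illustrative brute-force sketch; helper names are hypothetical.
Edge = Tuple[Hashable, Hashable]


def total_discrepancy(prob: Dict[Edge, float], world: Iterable[Edge]) -> float:
    """Sum over vertices of |deg_u(G) - E[deg_u]|."""
    dis: Dict[Hashable, float] = defaultdict(float)
    for (u, v), p in prob.items():
        dis[u] -= p
        dis[v] -= p
    for u, v in world:
        dis[u] += 1
        dis[v] += 1
    return sum(abs(d) for d in dis.values())


def best_representative(prob: Dict[Edge, float]) -> Tuple[Tuple[Edge, ...], float]:
    """Exhaustively search all 2^|E| worlds for the one with minimum discrepancy."""
    edges = list(prob)
    best: Tuple[Edge, ...] = ()
    best_delta = float("inf")
    for size in range(len(edges) + 1):
        for world in combinations(edges, size):
            delta = total_discrepancy(prob, world)
            if delta < best_delta:  # strict: the first minimum found is kept
                best, best_delta = world, delta
    return best, best_delta


prob = {("a", "b"): 0.5, ("b", "c"): 0.75, ("a", "c"): 0.25}
print(best_representative(prob))  # ((('b', 'c'),), 1.0)
```

The representative need not be unique: in this example the world $\{(a,b), (b,c)\}$ also reaches $\Delta = 1.0$.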

Introduced algorithms
- Baseline 1, Greedy probability: an edge $e = (u, v)$ is added to $G$ if it decreases the total discrepancy.
- Baseline 2, Most probable: an edge $e = (u, v)$ is added to $G$ if $p_e \geq 0.5$.
- ADR (average degree rewiring): aims at preserving the expected average degree of $\mathcal{G}$.
- ABM (approximate b-matching): preserves the expected vertex degrees.
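A minimal sketch of the two baselines, assuming independent edges and an edge-to-probability dictionary. The slides do not say in which order Greedy probability visits the edges; this sketch assumes decreasing probability. Function names and the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Illustrative sketch; function names are hypothetical, not from the slides.
Edge = Tuple[Hashable, Hashable]


def greedy_probability(prob: Dict[Edge, float]) -> List[Edge]:
    """Baseline 1: add an edge only if it lowers the total discrepancy.

    Assumption: edges are visited in decreasing order of probability.
    """
    dis: Dict[Hashable, float] = defaultdict(float)
    for (u, v), p in prob.items():  # empty world: dis_u = -E[deg_u]
        dis[u] -= p
        dis[v] -= p
    world: List[Edge] = []
    for u, v in sorted(prob, key=prob.get, reverse=True):
        # Change in sum |dis| if (u, v) is added (both degrees grow by one).
        gain = (abs(dis[u] + 1) - abs(dis[u])) + (abs(dis[v] + 1) - abs(dis[v]))
        if gain < 0:
            world.append((u, v))
            dis[u] += 1
            dis[v] += 1
    return world


def most_probable(prob: Dict[Edge, float], threshold: float = 0.5) -> List[Edge]:
    """Baseline 2: keep every edge whose probability is at least the threshold."""
    return [e for e, p in prob.items() if p >= threshold]


prob = {("a", "b"): 0.5, ("b", "c"): 0.75, ("a", "c"): 0.25}
print(greedy_probability(prob))  # [('b', 'c')]
print(most_probable(prob))       # [('a', 'b'), ('b', 'c')]
```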

ADR: average degree rewiring
What is the expected average degree? $\deg_{avg}(\mathcal{G}) = 2P/|V|$, where $P = \sum_{e \in E} p_e$ is the sum of all edge probabilities in $\mathcal{G}$.
To preserve it, the representative $G$ should contain exactly $P$ edges (rounded to an integer).
Main steps of ADR:
- Construct a seed set $E_0$ of edges of $\mathcal{G}$.
- Repeat $k$ times: swap edges in $E_0$ with edges in $E \setminus E_0$ whenever the swap decreases the overall discrepancy of the representative.
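The $2P/|V|$ formula follows from linearity of expectation: every edge contributes its probability once to the expected degree of each of its two endpoints. A short derivation:

```latex
\begin{aligned}
\sum_{u \in V} \mathbb{E}[\deg_u(\mathcal{G})]
  &= \sum_{u \in V} \sum_{v : (u,v) \in E} p_{uv}
   = 2 \sum_{(u,v) \in E} p_{uv} = 2P, \\
\deg_{avg}(\mathcal{G})
  &= \frac{1}{|V|} \sum_{u \in V} \mathbb{E}[\deg_u(\mathcal{G})] = \frac{2P}{|V|}.
\end{aligned}
```

A deterministic graph with $m$ edges has average degree $2m/|V|$, so matching the expected average degree requires $m = P$.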

ADR pseudocode
    Initialization: compute P; sort E in decreasing order of edge probability
    for each e in E:
        draw x uniformly from [0, 1]
        if x <= p_e: insert e into E0 and update G
    C = E \ E0
    repeat k times:
        for each node u in G:
            I = edges of E0 incident to u
            choose randomly e1 in I and e2 in C as swap candidates
            compute the overall discrepancy before and after the potential swap
            if it improves: swap e1 and e2 between E0 and C, and update the discrepancies

ADR example: edge probabilities [figure]

ADR: a possible world and the discrepancies [figure]

ADR: first iterations [figure]

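Why the condition $d_1 + d_2 < 0$?
Consider replacing edge $(u, v)$ with edge $(x, y)$, where the four endpoints are distinct:
$\mathrm{Sum}_{uv}^{\mathrm{bef}} = |\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|$, $\mathrm{Sum}_{uv}^{\mathrm{after}} = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1|$
$\mathrm{Sum}_{xy}^{\mathrm{bef}} = |\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|$, $\mathrm{Sum}_{xy}^{\mathrm{after}} = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1|$
$d_1 = \mathrm{Sum}_{uv}^{\mathrm{after}} - \mathrm{Sum}_{uv}^{\mathrm{bef}} = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1| - (|\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|)$
$d_2 = \mathrm{Sum}_{xy}^{\mathrm{after}} - \mathrm{Sum}_{xy}^{\mathrm{bef}} = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1| - (|\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|)$
If $d_1$ and $d_2$ are both positive, then $\mathrm{Sum}_{uv}^{\mathrm{after}} > \mathrm{Sum}_{uv}^{\mathrm{bef}}$ and $\mathrm{Sum}_{xy}^{\mathrm{after}} > \mathrm{Sum}_{xy}^{\mathrm{bef}}$: none of the affected nodes benefits from the swap. Since $d_1 + d_2$ is exactly the change in the total discrepancy, the swap is performed only when $d_1 + d_2 < 0$.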
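The two terms can be evaluated directly. This sketch computes $d_1 + d_2$ for two made-up discrepancy assignments, assuming the four endpoints are distinct; the function name and numbers are hypothetical.

```python
from typing import Dict, Hashable

# Illustrative sketch; the function name and values are hypothetical.


def swap_gain(dis: Dict[Hashable, float], u: Hashable, v: Hashable,
              x: Hashable, y: Hashable) -> float:
    """d1 + d2 for replacing edge (u, v) with (x, y); assumes u, v, x, y are distinct."""
    d1 = abs(dis[u] - 1) + abs(dis[v] - 1) - (abs(dis[u]) + abs(dis[v]))  # removing (u, v)
    d2 = abs(dis[x] + 1) + abs(dis[y] + 1) - (abs(dis[x]) + abs(dis[y]))  # adding (x, y)
    return d1 + d2


# u, v have too many edges and x, y too few: the swap helps (d1 = -0.5, d2 = -1.0).
print(swap_gain({"u": 0.5, "v": 0.75, "x": -0.5, "y": -1.25}, "u", "v", "x", "y"))  # -1.5
# The opposite situation: d1 = d2 = 2.0, so no node benefits.
print(swap_gain({"u": -0.5, "v": -0.5, "x": 0.5, "y": 0.5}, "u", "v", "x", "y"))    # 4.0
```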

References
Uncertain data
- On the representation and querying of sets of possible worlds
- A survey of uncertain data algorithms and applications
Uncertain graphs
- The pursuit of a good possible world: extracting representative instances of uncertain graphs
- Uncertain graph sparsification
- Uncertain graph processing through representative instances
- Triangle-based representative possible worlds of uncertain graphs
- Clustering large probabilistic graphs
- Algorithms for mining uncertain graph data
- K-nearest neighbors in uncertain graphs

What is a signed network?
- A graph $G = (V, E)$ in which each edge is mapped to a sign, either positive or negative.
- The sign of a path is the product of the signs of its edges.
- A signed network is typically denoted by $\Sigma = (V, E, \sigma)$, where $\sigma: E \to \{+, -\}$ is the signature of the graph.
[Figure: vertices u, v, k joined by edges labeled +/-]
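The sign of a path is just a product of $\pm 1$ values. A minimal sketch, storing each undirected signed edge under a `frozenset` key; the function name is hypothetical, and the signs encode the P-O-X scenario discussed in this lecture (P likes O, P dislikes X, X endorses O).

```python
from math import prod
from typing import Dict, FrozenSet, Hashable, Sequence

# Illustrative sketch; the function name is hypothetical, not from the slides.


def path_sign(sign: Dict[FrozenSet[Hashable], int], path: Sequence[Hashable]) -> int:
    """Product of the signs (+1 / -1) of consecutive edges along `path`."""
    return prod(sign[frozenset((a, b))] for a, b in zip(path, path[1:]))


sign = {
    frozenset(("P", "O")): +1,  # P thinks highly of O
    frozenset(("P", "X")): -1,  # P dislikes X
    frozenset(("X", "O")): +1,  # X endorses O
}
print(path_sign(sign, ["P", "O", "X"]))       # 1
print(path_sign(sign, ["P", "X"]))            # -1
print(path_sign(sign, ["P", "O", "X", "P"]))  # -1 (a negative cycle)
```

The two paths from P to X have different signs, which is exactly the situation that balance theory calls unbalanced.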

What is balance?
"The enemy of my enemy is my friend!"
History: Fritz Heider (psychologist) and Frank Harary (mathematician) laid the foundations of signed graphs and balance theory.
The original idea is the P-O-X model: how are social relations modeled, and are they balanced?
[Figure: P-O-X triangle with signed edges]

Example of the P-O-X model
Imagine that you are person P and that O is someone you think highly of (P-O: +). Now imagine that X is a presidential candidate you dislike (P-X: -), but X vehemently endorses O (X-O: +). What do you suspect would happen?
The triangle contains exactly one negative edge, so the situation is unbalanced: P needs either to agree with his friend O about X, or to unfriend O!

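Balance theory
A signed graph is balanced if every cycle is positive, i.e. contains an even number of negative edges.
- Theorem 1: A signed graph is balanced if and only if, for every pair of vertices u, v, all paths between u and v have the same sign.
- Theorem 2: A signed graph is balanced if and only if V can be bipartitioned (one part may be empty) such that each edge between the parts is negative and each edge within a part is positive.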
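Theorem 2 gives a linear-time test: try to 2-color the vertices so that positive edges stay within a side and negative edges cross sides. A breadth-first search either produces such a bipartition or finds a conflict, which corresponds to a negative cycle. A minimal sketch; the function name and examples are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Illustrative sketch; the function name is hypothetical, not from the slides.
SignedEdge = Tuple[Hashable, Hashable, int]  # (u, v, +1 or -1)


def balanced_partition(
    nodes: Iterable[Hashable], edges: Iterable[SignedEdge]
) -> Optional[Dict[Hashable, int]]:
    """Return a side (0 or 1) per node if the signed graph is balanced, else None.

    A positive edge keeps both endpoints on the same side; a negative edge
    forces them onto opposite sides. A conflict means some cycle is negative.
    """
    nodes = list(nodes)
    adj: Dict[Hashable, List[Tuple[Hashable, int]]] = {n: [] for n in nodes}
    for u, v, s in edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side: Dict[Hashable, int] = {}
    for start in nodes:
        if start in side:
            continue
        side[start] = 0  # each connected component starts on side 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                want = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None
    return side


# The P-O-X triangle has one negative edge, so it is unbalanced.
print(balanced_partition(["P", "O", "X"], [("P", "O", 1), ("P", "X", -1), ("X", "O", 1)]))  # None
# Two friend groups {a, b} and {c, d} that dislike each other are balanced.
groups = [("a", "b", 1), ("c", "d", 1), ("a", "c", -1), ("b", "d", -1)]
print(balanced_partition(["a", "b", "c", "d"], groups))  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```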

Status theory [3]
The signs in balance theory are interpreted as likes/dislikes. Can they also indicate another relation?
In the context of directed social networks, the intention of the user creating the link matters:
- a positive link from P to O means "I think O has a higher status than I do";
- a negative link from P to O means "I think O has a lower status than I do".
[3] Jure Leskovec et al. Signed Networks in Social Media. SIGCHI 2010.

Some possible applications
- Modelling interactions in chemical/biological networks
- Social network analysis
- Political and economic relations
(Source: Graph Algorithms, Applications and Implementations, Charles Phillips)

References
More material
- Signed graphs, Matthias Beck
- Graph Algorithms, Applications and Implementations, Charles Phillips
- Harary: On the notion of balance of a signed graph
- Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Chapter 5: Positive and Negative Relationships
Research problems on signed graphs
- Signed graphs in social media
- Community Mining in Signed Social Networks: An Automated Approach
- Polarity Related Influence Maximization in Signed Social Networks
- Node Classification in Signed Social Networks
- Predicting Positive and Negative Links in Online Social Networks

In the next episodes
- 3rd presentation date
- Course evaluation
- Exams
- and maybe more!

Questions?
