Graph Mining: Overview of different graph models
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course, Winter Semester 2016

Lecture road
  Anomaly detection (previous lecture)
  Representatives of probabilistic (uncertain) graphs
  Introduction to signed networks

Graph models
Graphs are everywhere!
Various interesting models that we haven't analyzed in the lecture:
  graph streams
  evolving graphs
  attributed graphs
  probabilistic graphs
  signed graphs
  colored graphs
  ...

Definitions
Graph stream: a sequence of unordered pairs $e = \{u, v\}$ with $u, v \in [n]$, $S = (e_1, e_2, \ldots, e_m)$
Time-evolving graph: a sequence of static graphs $\{G_1, G_2, \ldots, G_n\}$, where $G_t = (V_t, E_t)$ is a snapshot of the evolving graph at timestamp $t$
Attributed graph: $G = (V, E, A)$, where $V$ is the vertex set, $E$ is the edge set, and $A$ is the attribute set, which contains a unary attribute $a_i$ for each node $n_i$ and a binary attribute $a_{ij}$ for each edge $e_k = (n_i, n_j) \in E$
Colored graph: $G = (V, E)$ in which each vertex is assigned a color
  properly colored graph: the color assignments conform to the coloring rules applied to the graph

Probabilistic graphs - Outline
Uncertainty in data
Introduction to uncertain graphs
  Model definition
  Applications
  Problems
Finding representatives in probabilistic graphs
  Problem definition
  Algorithms
GRAPH MINING WS 2016

Uncertainty in data
Noise in generation: sensors
Noise in collection: missing instances
Biological data: protein-protein interaction probability
Problem's nature: risk, trust, influence, status
Anonymized data: privacy preservation of user-generated data

What is an uncertain graph?
A graph in which each edge has an associated existence probability $p \in [0, 1]$
Figure 1: (left) An unweighted probabilistic graph $\mathcal{G}$; (right) $\mathcal{G}$ with the expected vertex degrees (in italics) next to each node

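The expected degree of a vertex is the sum of the probabilities of its incident edges. A minimal sketch, assuming the uncertain graph is stored as a dictionary from edges to probabilities; the toy graph and the name `expected_degrees` are illustrative and are not the graph of Figure 1.

```python
from collections import defaultdict

def expected_degrees(edge_probs):
    """Expected degree of each vertex: sum of the probabilities of its incident edges."""
    exp_deg = defaultdict(float)
    for (u, v), p in edge_probs.items():
        exp_deg[u] += p
        exp_deg[v] += p
    return dict(exp_deg)

# Hypothetical toy graph: three vertices, three uncertain edges.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
degrees = expected_degrees(edge_probs)
```

Here vertex a has expected degree 0.9 + 0.5 = 1.4, b has 1.1, and c has 0.7.
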
Possible applications and problems
Modelling of probabilities in protein-protein interaction graphs
Modelling relationships in social graphs
Problems that apply to deterministic graphs: algorithms need to be redesigned to incorporate uncertainty
Data anonymization: one of the possible worlds corresponds to the original data
Frequent subgraph mining: frequency is redefined using the edge probabilities
Queries based on shortest paths: may return paths with very low probabilities

Graph model definition
A probabilistic graph is represented as $\mathcal{G} = (V, E, W, p)$, where
  $V$ is the set of vertices
  $E$ is the set of edges
  for weighted graphs, $W: V \times V \to \mathbb{R}$ denotes the weight associated with every edge
  $p: V \times V \to [0, 1]$ maps every pair of nodes to a probability
$p_{uv}$ is the probability that edge $(u, v)$ exists in the uncertain network
From a probabilistic graph $\mathcal{G}$, $2^{|E|}$ deterministic graphs can be generated
  these graphs are called possible worlds

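Each possible world keeps some subset of the uncertain edges, which is where the count of 2^|E| comes from. A small sketch that enumerates them for a toy edge list; the helper name `possible_worlds` is an assumption, and the approach is only feasible for very small graphs.

```python
from itertools import combinations

def possible_worlds(edges):
    """Yield every subset of `edges` as a frozenset; there are 2**len(edges) of them."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            yield frozenset(subset)

# Hypothetical toy graph with 3 uncertain edges, hence 2**3 possible worlds.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
worlds = list(possible_worlds(edges))
```

The exponential growth is the reason exact computation over all worlds is rarely an option.
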
Possible world semantics [1]
The literature often assumes that the edge probabilities are independent
  is this always the case?
For simplicity, various approaches treat the edge probabilities as weights
Others consider only the edges with probability $p > t$ for some threshold $t$
  these are not valid assumptions in many scenarios!
[1] S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. SIGMOD 1987

Sampling
The probability that a certain (deterministic) graph $G = (V, E_G)$ is sampled from the uncertain graph $\mathcal{G}$ is
  $P[G] = \prod_{(u,v) \in E_G} p_{uv} \cdot \prod_{(u,v) \in (V \times V) \setminus E_G} (1 - p_{uv})$
Given $G$ and the expected vertex degrees, we can also calculate the vertex discrepancies
  $dis_u(G) = deg_u(G) - \mathbb{E}[deg_u(\mathcal{G})]$, where $u$ is a node of $G$
The discrepancy of $G$ is the sum of all node discrepancies, each taken in absolute value
  $\Delta(G) = \sum_{u \in V} |dis_u(G)|$
The world we are looking for minimizes it
  $G^* = \arg\min_{G \text{ world of } \mathcal{G}} \Delta(G)$
Figure 2: (left) $\mathcal{G}$ with the expected vertex degrees next to each node; (right) a certain instance $G$ of $\mathcal{G}$ with the vertex discrepancies

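The two quantities on this slide can be checked by brute force on a toy graph. The sketch below assumes independent edges and stores only pairs with non-zero probability (all other pairs contribute a factor of 1 to the product); the function names and the toy probabilities are illustrative.

```python
from itertools import combinations

def world_probability(world, edge_probs):
    """P[G]: product of p_e over kept edges and (1 - p_e) over dropped edges."""
    prob = 1.0
    for edge, p in edge_probs.items():
        prob *= p if edge in world else (1.0 - p)
    return prob

def discrepancy(world, edge_probs):
    """Delta(G): sum over vertices of |degree in the world - expected degree|."""
    dis = {}
    for (u, v), p in edge_probs.items():
        present = 1.0 if (u, v) in world else 0.0
        for node in (u, v):
            dis[node] = dis.get(node, 0.0) + present - p
    return sum(abs(d) for d in dis.values())

# Hypothetical toy graph.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
edges = list(edge_probs)
worlds = [frozenset(c) for k in range(len(edges) + 1) for c in combinations(edges, k)]

total = sum(world_probability(w, edge_probs) for w in worlds)  # sums to 1
best = min(worlds, key=lambda w: discrepancy(w, edge_probs))   # argmin of Delta
```

On this toy graph the minimizer keeps the edges (a, b) and (a, c), with total discrepancy 0.6 + 0.1 + 0.3 = 1.0. Exhaustive search like this does not scale, which motivates the algorithms that follow.
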
What if we could work on a deterministic graph instead?
How do we benefit?
  Computational complexity would be much lower!
  Traditional data mining algorithms could be applied
Which characteristics should this certain graph maintain from the uncertain one?
  same number of vertices
  which edges should be included?

Finding representatives in probabilistic graphs [2]
A representative $G$ of a probabilistic graph $\mathcal{G}$ is a deterministic graph whose vertices show the least possible discrepancy
More formally
  Given an undirected uncertain graph $\mathcal{G} = (V, E, W, p)$, the representative is an exact instance $G$ of $\mathcal{G}$ (a possible world) in which each vertex degree deviates as little as possible from its expected value
[2] Panos Parchas et al. The Pursuit of a Good Possible World: Extracting Representative Instances of Uncertain Graphs. ACM SIGMOD 2014

Introduced algorithms
Baseline 1: Greedy probability
  each edge $e = (u, v)$ is added to $G$ if it decreases the total discrepancy
Baseline 2: Most probable
  each edge $e = (u, v)$ is added to $G$ if $p_e \geq 0.5$ holds
ADR (average degree rewiring)
  aims at preserving the expected average degree of $\mathcal{G}$
ABM (approximate b-matching)
  preserves the expected vertex degrees

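The two baselines are simple enough to sketch directly. The code below is an illustration under assumptions, not the authors' implementation: edges are visited in decreasing order of probability for the greedy baseline, and the star-shaped toy graph and function names are hypothetical.

```python
def most_probable(edge_probs, threshold=0.5):
    """Baseline 2: keep every edge whose probability is at least the threshold."""
    return {edge for edge, p in edge_probs.items() if p >= threshold}

def greedy_probability(edge_probs):
    """Baseline 1: add an edge only if doing so lowers the total discrepancy."""
    # Start from the empty graph: each discrepancy is minus the expected degree.
    dis = {}
    for (u, v), p in edge_probs.items():
        dis[u] = dis.get(u, 0.0) - p
        dis[v] = dis.get(v, 0.0) - p
    chosen = set()
    # Assumption: edges are considered in decreasing order of probability.
    for u, v in sorted(edge_probs, key=edge_probs.get, reverse=True):
        gain = abs(dis[u] + 1) + abs(dis[v] + 1) - abs(dis[u]) - abs(dis[v])
        if gain < 0:
            chosen.add((u, v))
            dis[u] += 1
            dis[v] += 1
    return chosen

# Hypothetical star graph: four unlikely edges around a centre vertex.
star = {("c", "l1"): 0.3, ("c", "l2"): 0.3, ("c", "l3"): 0.3, ("c", "l4"): 0.3}
mp_world = most_probable(star)
gp_world = greedy_probability(star)
```

On the star, the most-probable baseline keeps no edge although the centre has expected degree 1.2, while the greedy baseline keeps one edge. This is the kind of degree distortion the representative-based algorithms try to avoid.
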
ADR: average degree rewiring
What is the expected average degree?
  $deg_{avg}(\mathcal{G}) = 2P / |V|$, where $P$ is the sum of all edge probabilities in $\mathcal{G}$
To preserve it, $G$ should contain exactly $P$ edges (with $P$ rounded to an integer)
Main steps of ADR
  Construct a seed set $E_0$ from the edges of $\mathcal{G}$
  Repeat $k$ times: swap edges in $E_0$ with edges in $E \setminus E_0$ whenever the swap decreases the overall discrepancy of the representative

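As a quick numeric check of the formula, the sketch below computes the expected average degree and the number of edges a representative should keep. The toy graph and the function name are assumptions, and rounding to the nearest integer is one possible choice.

```python
def expected_average_degree(edge_probs):
    """Return (2P / |V|, round(P)), where P is the sum of all edge probabilities."""
    vertices = {node for edge in edge_probs for node in edge}
    total_p = sum(edge_probs.values())
    return 2 * total_p / len(vertices), round(total_p)

# Hypothetical toy graph: P = 0.9 + 0.5 + 0.2 = 1.6 on three vertices.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
avg_deg, n_edges = expected_average_degree(edge_probs)
```

With P = 1.6 the instance keeps 2 edges, so its average degree of 4/3 only approximates the expected value of about 1.07; the gap comes from rounding P.
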
Pseudocode
Initialization: compute P, sort E in decreasing order of edge probability
For each e in E
    draw a random x in [0, 1]
    if x <= p_e: insert e into E0, update G
C = E \ E0
Repeat k times
    For each node u in G
        I = edges of E0 incident to u
        choose at random e1 in I and e2 in C as swap candidates
        compute the overall discrepancy before and after the potential swap
        if it improves: swap e1 and e2 between E0 and C, update the discrepancies

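The pseudocode can be turned into a short runnable sketch. Several details are assumptions rather than part of the slide: the seeding loop rescans the sorted edges until round(P) edges are chosen, the gain is computed over all affected endpoints so that swaps sharing a vertex are handled, and the names `adr` and `total_discrepancy` as well as the toy graph are illustrative.

```python
import random

def total_discrepancy(chosen, edge_probs):
    """Sum over vertices of |degree in the instance - expected degree|."""
    dis = {}
    for (u, v), p in edge_probs.items():
        present = 1.0 if (u, v) in chosen else 0.0
        for node in (u, v):
            dis[node] = dis.get(node, 0.0) + present - p
    return sum(abs(d) for d in dis.values())

def adr(edge_probs, k=10, seed=0):
    rng = random.Random(seed)
    edges = sorted(edge_probs, key=edge_probs.get, reverse=True)
    target = round(sum(edge_probs.values()))  # number of edges to keep

    # Seed set E0: keep each edge with probability p_e, rescanning until
    # `target` edges are chosen (assumption, see lead-in).
    e0 = set()
    while len(e0) < target:
        for e in edges:
            if len(e0) < target and e not in e0 and rng.random() <= edge_probs[e]:
                e0.add(e)
    c = [e for e in edges if e not in e0]  # candidate edges E \ E0

    # Current vertex discrepancies of the instance defined by E0.
    dis = {}
    for (u, v), p in edge_probs.items():
        present = 1.0 if (u, v) in e0 else 0.0
        for node in (u, v):
            dis[node] = dis.get(node, 0.0) + present - p

    for _ in range(k):
        for u in sorted(dis):
            incident = sorted(e for e in e0 if u in e)
            if not incident or not c:
                continue
            e1, e2 = rng.choice(incident), rng.choice(c)
            # Degree change per endpoint if e1 is removed and e2 is added.
            change = {}
            for node in e1:
                change[node] = change.get(node, 0) - 1
            for node in e2:
                change[node] = change.get(node, 0) + 1
            gain = sum(abs(dis[n] + d) - abs(dis[n]) for n, d in change.items())
            if gain < 0:  # the swap lowers the overall discrepancy
                e0.remove(e1)
                e0.add(e2)
                c.remove(e2)
                c.append(e1)
                for n, d in change.items():
                    dis[n] += d
    return e0

# Hypothetical toy graph: P = 2.9, so the representative keeps 3 edges.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2,
              ("c", "d"): 0.7, ("b", "d"): 0.6}
seed_world = adr(edge_probs, k=0, seed=1)
rewired = adr(edge_probs, k=20, seed=1)
```

Because only improving swaps are accepted, the rewired instance never has a larger discrepancy than the seed set it started from, and the number of edges stays fixed throughout.
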
ADR example: edge probabilities (figure)
ADR: a possible world and the discrepancies (figure)
ADR: first iterations (figure)

d1 + d2 < 0 explanation
For replacing $(u, v)$ with $(x, y)$:
  $d_1 = |dis_u(G) - 1| + |dis_v(G) - 1| - (|dis_u(G)| + |dis_v(G)|)$
  $d_2 = |dis_x(G) + 1| + |dis_y(G) + 1| - (|dis_x(G)| + |dis_y(G)|)$
Written with the sums before and after the swap:
  $Sum^{uv}_{bef} = |dis_u(G)| + |dis_v(G)|$
  $Sum^{uv}_{after} = |dis_u(G) - 1| + |dis_v(G) - 1|$
  $Sum^{xy}_{bef} = |dis_x(G)| + |dis_y(G)|$
  $Sum^{xy}_{after} = |dis_x(G) + 1| + |dis_y(G) + 1|$
  $d_1 = Sum^{uv}_{after} - Sum^{uv}_{bef}$
  $d_2 = Sum^{xy}_{after} - Sum^{xy}_{bef}$
If $d_1$ and $d_2$ are positive, then
  $Sum^{uv}_{after} > Sum^{uv}_{bef}$
  $Sum^{xy}_{after} > Sum^{xy}_{bef}$
  none of the underlying nodes benefits from the swap...
The swap is accepted only when $d_1 + d_2 < 0$, i.e. when the overall discrepancy decreases

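A worked instance of the test makes the sign convention concrete. This sketch assumes the four endpoints are distinct and uses made-up discrepancy values; `swap_delta` is a hypothetical helper name.

```python
def swap_delta(dis, removed_edge, added_edge):
    """Return (d1, d2) for removing edge (u, v) and adding edge (x, y)."""
    (u, v), (x, y) = removed_edge, added_edge
    d1 = abs(dis[u] - 1) + abs(dis[v] - 1) - (abs(dis[u]) + abs(dis[v]))
    d2 = abs(dis[x] + 1) + abs(dis[y] + 1) - (abs(dis[x]) + abs(dis[y]))
    return d1, d2

# Hypothetical discrepancies: u and v have too many edges, x and y too few.
dis = {"u": 0.75, "v": 0.75, "x": -0.75, "y": -0.75}
d1, d2 = swap_delta(dis, ("u", "v"), ("x", "y"))
accept = d1 + d2 < 0
```

Here d1 = 0.25 + 0.25 - 1.5 = -1.0 and d2 = 0.25 + 0.25 - 1.5 = -1.0, so the swap lowers the total discrepancy by 2 and is accepted.
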
References
Uncertain data
  On the representation and querying of sets of possible worlds
  A survey of uncertain data algorithms and applications
Uncertain graphs
  The pursuit of a good possible world: extracting representative instances of uncertain graphs
  Uncertain graph sparsification
  Uncertain graph processing through representative instances
  Triangle-based representative possible worlds of uncertain graphs
  Clustering large probabilistic graphs
  Algorithms for mining uncertain graph data
  K-nearest neighbors in uncertain graphs

Lecture road
  Anomaly detection
  Representatives of probabilistic (uncertain) graphs
  Introduction to signed networks

What is a signed network?
It is a graph $G = (V, E)$ in which each edge is mapped to a sign
A sign can be positive or negative
The sign of a path is the product of the signs of its edges
Typically a signed network is denoted by $\Sigma = G(V, E, \sigma)$, where $\sigma$, the signature of the graph, is the function $\sigma: E \to \{+, -\}$
Figure: nodes u, v, k with each edge labeled +/-

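The path-sign rule is a one-line product. A minimal sketch, assuming signs are stored as +1 and -1 on undirected edges keyed by `frozenset`; the concrete signs on the nodes u, v, k are made up for illustration.

```python
from math import prod

def path_sign(path, sign):
    """Sign of a path given as a vertex sequence: product of its edge signs."""
    return prod(sign[frozenset((a, b))] for a, b in zip(path, path[1:]))

# Hypothetical signature on three nodes.
sign = {
    frozenset(("u", "v")): +1,
    frozenset(("v", "k")): -1,
    frozenset(("u", "k")): -1,
}
open_path = path_sign(["u", "v", "k"], sign)       # (+1) * (-1)
cycle = path_sign(["u", "v", "k", "u"], sign)      # (+1) * (-1) * (-1)
```

The path u, v, k is negative, while the closed walk around all three nodes is positive because it contains an even number of negative edges.
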
What is balance?
"The enemy of my enemy is my friend!"
History
  Fritz Heider (psychologist) and Frank Harary (mathematician) laid the foundations of signed graphs and balance theory
Original idea of the P-O-X model
  how are social relations modeled?
  are they balanced?
Figure: the P-O-X triangle with signed edges (+, -, +)

Example of the P-O-X model
Imagine that you are person P and that O is someone you think highly of. Now imagine that X is a presidential candidate you dislike, but X vehemently endorses O. What do you suspect would happen?
  P needs to agree with his friend O, or needs to unfriend O!
  the situation is unbalanced: two positive relations (P-O, X-O) and one negative relation (P-X)
Figure: the P-O-X triangle with edge signs +, -, +

Balance theory
Theorem 1: A signed graph $G$ is balanced if and only if, for every pair of vertices $u, v$, all paths between $u$ and $v$ have the same sign
Theorem 2: A signed graph is balanced if and only if $V$ can be bipartitioned such that each edge between the two parts is negative and each edge within a part is positive (one of the parts may be empty)

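Theorem 2 suggests a direct test: try to build the bipartition with a breadth-first search, putting the endpoints of a positive edge on the same side and those of a negative edge on opposite sides. The sketch below is one way to do this; the function name and the two example triangles are assumptions.

```python
from collections import deque

def is_balanced(signed_edges):
    """Return True if the vertices can be split into two sides such that positive
    edges stay within a side and negative edges cross between sides."""
    adj = {}
    for (u, v), s in signed_edges.items():
        adj.setdefault(u, []).append((v, s))
        adj.setdefault(v, []).append((u, s))
    side = {}
    for start in adj:                      # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                expected = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:  # contradiction: no valid bipartition
                    return False
    return True

# P likes O, P dislikes X, X endorses O: one negative edge in the triangle.
pox = {("P", "O"): +1, ("P", "X"): -1, ("X", "O"): +1}
# "The enemy of my enemy is my friend": two negative edges in the triangle.
enemy_of_enemy = {("a", "b"): -1, ("b", "c"): -1, ("a", "c"): +1}
```

`is_balanced(pox)` is False and `is_balanced(enemy_of_enemy)` is True, matching the intuition that a triangle is balanced when it has an even number of negative edges.
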
Status theory [3]
The signs in balance theory are perceived as likes/dislikes
Can they also indicate another relation?
  in the context of directed social networks, the intention of the user creating the link matters
  a positive link from P to O: "I think O has a higher status than I do"
  a negative link from P to O: "I think O has a lower status than I do"
Figure: P, O, X with edges labeled + and -
[3] Jure Leskovec et al. Signed Networks in Social Media. SIGCHI 2010

Some possible applications
Modelling interactions in chemical/biological networks
Social network analysis
Political and economic relations
Graph Algorithms, Applications and Implementations, Charles Phillips

References
More material
  Signed graphs, Matthias Beck
  Graph Algorithms, Applications and Implementations, Charles Phillips
  Harary: On the notion of balance of a signed graph
  Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Chapter 5: Positive and Negative Relationships
Research problems on signed graphs
  Signed graphs in social media
  Community Mining in Signed Social Networks: An Automated Approach
  Polarity Related Influence Maximization in Signed Social Networks
  Node Classification in Signed Social Networks
  Predicting Positive and Negative Links in Online Social Networks

In the next episodes
  3rd presentation date
  Course evaluation
  Exams
  and maybe more!

Questions?