Capturing Topology in Graph Pattern Matching

Similar documents
A Strong Simulation: Capturing Topology in Graph Pattern Matching

Distributed Exact Subgraph Matching in Small Diameter Dynamic Graphs

Graph Pattern Matching Revised for Social Network Analysis

Binary Decision Diagrams

Efficient Subgraph Matching by Postponing Cartesian Products

Mining Summaries for Knowledge Graph Search

gspan: Graph-Based Substructure Pattern Mining

A Polynomial Algorithm for Submap Isomorphism: Application to Searching Patterns in Images

Construction of Minimum-Weight Spanners Mikkel Sigurd Martin Zachariasen

Answering Pattern Queries Using Views

Strong Bridges and Strong Articulation Points of Directed Graphs

Efficiently Evaluating Complex Boolean Expressions

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

Treewidth and graph minors

Learning Path Queries on Graph Databases

On Graph Query Optimization in Large Networks

Mining frequent Closed Graph Pattern

Lecture 5: Duality Theory

Subgraph Isomorphism. Artem Maksov, Yong Li, Reese Butler 03/04/2015

Graph Isomorphism. Algorithms and networks

Graph Pattern Matching: From Intractable to Polynomial Time

I. An introduction to Boolean inverse semigroups

Adding Regular Expressions to Graph Reachability and Pattern Queries

Diversified Top-k Graph Pattern Matching

Distributed Graph Algorithms for Planar Networks

V Advanced Data Structures

CSE 417 Network Flows (pt 4) Min Cost Flows

Data Structures for Moving Objects

Simulation of Scale-Free Networks

NED: An Inter-Graph Node Metric Based On Edit Distance

V Advanced Data Structures

Error correction guarantees

Discrete mathematics II. - Graphs

Rewriting Ontology-Mediated Queries. Carsten Lutz University of Bremen

Ontology-based Subgraph Querying

a standard database system

Surfaces, meshes, and topology

Recitation 4: Elimination algorithm, reconstituted graph, triangulation

Pattern Recognition Using Graph Theory

Constraint Satisfaction Problems

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

CCD: Efficient Customized Content Dissemination in Distributed Publish/Subscribe

Nearest Neighbor with KD Trees

MAS341 Graph Theory 2015 exam solutions

CHAPTER 10 GRAPHS AND TREES. Alessandro Artale UniBZ - artale/

Mining Minimal Contrast Subgraph Patterns

Chapter 9: Elementary Graph Algorithms Basic Graph Concepts

Example of a Demonstration that a Problem is NP-Complete by reduction from CNF-SAT

Locality- Sensitive Hashing Random Projections for NN Search

Local Context Selection for Outlier Ranking in Graphs with Multiple Numeric Node Attributes

R-Trees. Accessing Spatial Data

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Fixed-Parameter Algorithm for 2-CNF Deletion Problem

Symbol Detection Using Region Adjacency Graphs and Integer Linear Programming

Towards Energy Proportionality for Large-Scale Latency-Critical Workloads

Induction Review. Graphs. EECS 310: Discrete Math Lecture 5 Graph Theory, Matching. Common Graphs. a set of edges or collection of two-elt subsets

COMS 4771 Support Vector Machines. Nakul Verma

Geometric data structures:

Groups and Graphs Lecture I: Cayley graphs

Outsourcing Privacy-Preserving Social Networks to a Cloud

γ(ɛ) (a, b) (a, d) (d, a) (a, b) (c, d) (d, d) (e, e) (e, a) (e, e) (a) Draw a picture of G.

Discrete mathematics

Business Club. Decision Trees

Neural Networks: What can a network represent. Deep Learning, Fall 2018

Neural Networks: What can a network represent. Deep Learning, Spring 2018

CMSC th Lecture: Graph Theory: Trees.

Efficient Aggregation for Graph Summarization

Multiprocessor Interconnection Networks- Part Three

Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions. --Lei Zou Joint work with Shuo Han and Jeffrey Xu 2018

Introduction to Computers and Programming. Concept Question

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

Balanced Binary Search Trees. Victor Gao

Processing Regular Path Queries Using Views or What Do We Need for Integrating Semistructured Data?

Chapter 4: Trees. 4.2 For node B :

Topology and Topological Spaces

On the Minimum k-connectivity Repair in Wireless Sensor Networks

Pattern Mining in Frequent Dynamic Subgraphs

Week 4. COMP62342 Sean Bechhofer, Uli Sattler

Recitation 9. Prelim Review

THE GROWTH OF LIMITS OF VERTEX REPLACEMENT RULES

Multidimensional Indexes [14]

Hyper-Butterfly Network: A Scalable Optimally Fault Tolerant Architecture

54 Years of Graph Isomorphism Testing

Throughput-Optimal Broadcast in Wireless Networks with Point-to-Multipoint Transmissions

Chapter 7 TOPOLOGY CONTROL

Finding Dense and Connected Subgraphs in Dual Networks

Hamilton paths & circuits. Gray codes. Hamilton Circuits. Planar Graphs. Hamilton circuits. 10 Nov 2015

Cofinite Induced Subgraph Nim

Support Vector Machines. James McInerney Adapted from slides by Nakul Verma

Dynamic Collision Detection

Reductions of Graph Isomorphism Problems. Margareta Ackerman. Technical Report 08

Clustering Using Graph Connectivity

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

Red-Black trees are usually described as obeying the following rules :

GADDI: Distance Index based Subgraph Matching in Biological Networks

Answering Graph Pattern Queries Using Views

Signatures of Combinatorial Maps

Lecture 3: Conditional Independence - Undirected

All-Pairs Nearly 2-Approximate Shortest-Paths in O(n 2 polylog n) Time

Transcription:

Capturing Topology in Graph Pattern Matching Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, Tianyu Wo University of Edinburgh

Graphs are everywhere, and quite a few are huge graphs! File systems Databases World Wide Web Social Networks Graph searching is a key to social searching engines! 2

Graph Pattern Matching Given two directed graphs G1 (pattern graph) and G2 (data graph), decide whether G1 matches G2 (Boolean queries); identify subgraphs of G2 that match G1 Applications Web mirror detection/ Web site classification Complex object identification Software plagiarism detection Social network/biology analyses Matching Models Traditional: Subgraph Isomorphism Emerging applications: Graph Simulation and its extensions, etc.. A variety of emerging real-life applications! 3

Subgraph Isomorphism Pattern graph Q, subgraph G s of data graph G Q matches G s if there exists a bijective function f: V Q V Gs such that for each node u in Q, u and f(u) have the same label An edge (u, u ) in Q if and only if (f(u), f(u')) is an edge in G s Goodness: Keep exact structure topology between Q and G s Badness: Decision problem is NP-complete May return exponential many matched subgraphs In certain scenarios, too restrictive to find matches These hinder the usability in emerging applications, e.g., social networks 4

Graph Simulation Given pattern graph Q(Vq, Eq) and data graph G(V, E), a binary relation R Vq V is said to be a match if (1) for each (u, v) R, u and v have the same label; and (2) for each edge (u, u ) Eq, there exists an edge (v, v ) in E such that (u, v ) R. Graph G matches pattern Q via graph simulation, if there exists a total match relation M for each u Vq, there exists v V such that (u, v) M. Goodness: Quadratic time solvable Badness: Lose structure topology (how much? open question) Return a single unique matched subgraph Subgraph isomorphism (NP-complete) vs. graph simulation (O(n 2 ))! 5

Graph Simulation Set up a team to develop a new software product Graph simulation returns F3, F4 and F5; Subgraph isomorphism returns empty! Subgraph Isomorphism is too strict for emerging applications! 6

Graph Simulation Loses Structures Connected pattern graphs match disconnected subgraphs S(HR) = {HR} S(SE) = {SE} S(Bio) = {Bio 1, Bio 2 } Q Gs Cyclic pattern graphs match tree subgraphs S(HR) = {HR} S(SE) = {SE} S(Bio) = {Bio 1, Bio 2 } Q Gs These motivate us to propose a new matching model! 7

Strong Simulation: A New Model Strong Simulation = Graph Simulation + Duality + Locality Duality (dual simulation) Both child and parent relationships Simulation considers only child relationships Locality Restricting matches within a ball When social distance increases, the closeness of relationships decreases and the relationships may become irrelevant The semantics of strong simulation is well defined The matching results are unique Striking a balance between expressiveness and complexity! 8

Duality and Locality Pattern graph Q matches data graph G via dual simulation if there exists a binary match relation S V Q V such that: for each (u, v) S, u and v have the same label; and for each u V q, there exits v V such that (u, v) S; and for each edge (u, u1) in E q, there is an edge (v, v1) in E with (u1; v1) S; -> Child relationships for each edge (u2, u) in E q, there is an edge (v2; v) in E with (u2, v2) S. -> Parent relationships Graph Simulation Dual simulation: bring duality intro graph simulation! The matched subgraph must be a connected subgraph falling into a ball with center v and radius d Q (diameter of Q) containing the ball center v Strong simulation: bring locality intro dual simulation! 9

Properties of Strong Simulation Connectivity: Connected pattern graphs match disconnected subgraphs S(HR) = {HR} S(SE) = {SE} S(Bio) = {Bio 1, Bio 2 } Q Gs Cycles: Cyclic pattern graphs match tree subgraphs S(HR) = {HR} S(SE) = {SE} S(Bio) = {Bio 1, Bio 2 } Q Q Gs Gs Strong simulation preserves more topology structures! 10

Properties of Strong Simulation Locality: the diameter of matched subgraphs is bounded by 2*d Q Graph simulation does NOT have this property Subgraph isomorphism does have this property Bounded matches: The number of matched subgraphs is bounded by V Graph simulation finds at most one matched subgraph Subgraph isomorphism may finds exponential number of matched subgraphs Bounded cycles: The length of cycles in matched subgraphs is bounded by the ones in pattern graphs. Graph simulation and strong simulation does NOT have this property Subgraph isomorphism does have this property! Strong simulation preserves more topology structures! 11

Properties of Strong Simulation

Properties of Strong Simulation

Properties of Strong Simulation A balance between expressiveness and complexity! 14

Properties of Strong Simulation If Q matches G, via subgraph isomorphism, then Q matches G, via strong simulation If Q matches G, via strong simulation, then Q matches G, via dual simulation If Q matches G, via dual simulation, then Q matches G, via graph simulation Subgraph Strong Dual Graph Isomorphism Simulation Simulation Simulation 15

Algorithms for Strong Simulation A cubic time algorithm Graph simulation: Quadratic time Subgraph isomorphism: NP-Complete A distributed algorithm Using the data locality property Real life graphs are typically distributed Connectivity theorem: Q matches G, via dual simulation for any connected component G c of the match graph w.r.t. the maximum match relation of Q and G, Q matches G c, G c is exactly the match graph w.r.t. the maximum match relation of Q and G c Nontrivial extension of the algorithm for graph simulation! 16

Optimization Techniques Minimizing pattern graphs (Q Qm): An quadratic time algorithm Given pattern graph Q, we compute a minimized equivalent pattern graph Qm such that for any data graph G, G matches Q iff G matches Qm, via strong simulation. Dual simulation filtering First compute the matched subgraph of dual simulation, Then project on each ball of the data graph Connectivity pruning Based on the connectivity theorem 17

Experimental Study Real life datasets: Amazon product co-buy network: 548,552 nodes and 1,788,725 edges YouTube video network: 155,513 nodes and 3,110,120 edges Synthetic graph generator: (10 7 nodes and 251,188,643 edges) Three parameters: 1. The number n of nodes; 2. The number n α of edges; and 3. The number l of node labels Algorithms: Strong simulation algorithm Match and its optimized version Match + Graph simulation algorithm Sim [HHK, FOCS 95] Approximate matching algorithm TALE [TP, ICDE 08] Maximum common subgraph algorithm VF2 [CN, 2006] Machines: PC machines with Intel Core i7 860 CPUs and 16GB memory 18

Experiments - Quality The results of strong simulation are more realistic! 19

Experiments - Quality The results of strong simulation are more compact! 20

Experiments - Quality 70%-80% found by subgraph isomorphism are retrieved by strong simulation Up to 50% found by subgraph isomorphism are retrieved by graph simulation 21

Experiments - Quality Strong simulation effectively reduces the number of match results! 22

Experiments - Quality Pattern graphs have 10 nodes Graph simulation (returns a single graph) Amazon: 103 YouTube: 177 Synthetic: 311 The sizes of matched subgraphs are small! 23

Experiments - Efficiency 1. Our algorithms scale well; 2. Optimization techniques are effective (reduce about 1/3 time); 3. The time gap between Sim and Match is tolerable, considering the matching quality that Match improves. 24

Summary We have proposed and investigated strong simulation to rectify the problems of subgraph isomorphism and graph simulation. the duality to preserve the parent relationships the locality to eliminate excessive matches. We identify a set of criteria for topology preservation, and show that strong simulation preserves the topology of pattern graphs and data graphs. Children, Parent, Connectivity, Cycles, Bounded matches Bounded cycles, Bisimilarity We show that strong simulation retains the same complexity as earlier extensions of simulation (a cubic-time algorithm) Optimizations: minimization, dual simulation filtering, connectivity pruning We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs A new matching model with a balance between complexity and expressiveness 25