Graph similarity. Laura Zager and George Verghese EECS, MIT. March 2005

Similar documents
Trust-Based Recommendation Based on Graph Similarity

Measuring Similarity of Graph Nodes by Neighbor Matching

Using Metrics to Identify Design Patterns in Object-Oriented Software EPM

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Recovery of Design Pattern from source code

CSCI5070 Advanced Topics in Social Computing

Basics of Network Analysis

Graph Similarity for Data Mining Lot 6 of Data Analytics FY17/18

Volume 2, Issue 11, November 2014 International Journal of Advance Research in Computer Science and Management Studies

Extracting Information from Complex Networks

Embedded Subgraph Isomorphism and Related Problems

Networks in economics and finance. Lecture 1 - Measuring networks

Graphs. Motivations: o Networks o Social networks o Program testing o Job Assignment Examples: o Code graph:

Introduction to Graph Theory

Laura Zager. B.A., Mathematics (2003) B.S., Engineering (2003) Swarthmore College. at the. May Professor of Electrical Engineering and

Signal Processing for Big Data

Lecture 3: Recap. Administrivia. Graph theory: Historical Motivation. COMP9020 Lecture 4 Session 2, 2017 Graphs and Trees

CAIM: Cerca i Anàlisi d Informació Massiva

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

Minimum-Cost Spanning Tree. as a. Path-Finding Problem. Laboratory for Computer Science MIT. Cambridge MA July 8, 1994.

Lecture 2: From Structured Data to Graphs and Spectral Analysis

Graph Theory. Part of Texas Counties.

Mathematics of Networks II

Lecture 2: Geometric Graphs, Predictive Graphs and Spectral Analysis

Social Data Management Communities

AMS /672: Graph Theory Homework Problems - Week V. Problems to be handed in on Wednesday, March 2: 6, 8, 9, 11, 12.

Characterizing Graphs (3) Characterizing Graphs (1) Characterizing Graphs (2) Characterizing Graphs (4)

Alignment of Trees and Directed Acyclic Graphs

Unit 2: Graphs and Matrices. ICPSR University of Michigan, Ann Arbor Summer 2015 Instructor: Ann McCranie

Introduction to Networks and Business Intelligence

NETWORK ANALYSIS. Duygu Tosun-Turgut, Ph.D. Center for Imaging of Neurodegenerative Diseases Department of Radiology and Biomedical Imaging

CSE 190 Lecture 16. Data Mining and Predictive Analytics. Small-world phenomena

Social Network Analysis

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.

Paths, Circuits, and Connected Graphs

arxiv:cs/ v1 [cs.ir] 26 Apr 2002

CSE 158 Lecture 11. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

1. a graph G = (V (G), E(G)) consists of a set V (G) of vertices, and a set E(G) of edges (edges are pairs of elements of V (G))

Introduction to Engineering Systems, ESD.00. Networks. Lecturers: Professor Joseph Sussman Dr. Afreen Siddiqi TA: Regina Clewlow

Elements of Graph Theory

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Examples of Complex Networks

Introduction III. Graphs. Motivations I. Introduction IV

DS504/CS586: Big Data Analytics Graph Mining Prof. Yanhua Li

Math 776 Graph Theory Lecture Note 1 Basic concepts

Graphs and Isomorphisms

V10 Metabolic networks - Graph connectivity

PageRank and related algorithms

EECS 1028 M: Discrete Mathematics for Engineers

Assoc.Department of Computer Applications Professor SVCET, Chittoor, Andhra Pradesh, India

M.E.J. Newman: Models of the Small World

An Edge-Swap Heuristic for Finding Dense Spanning Trees

Lecture 5: Graphs. Rajat Mittal. IIT Kanpur

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

Case Studies in Complex Networks

Machine Learning and Modeling for Social Networks

Graphs (MTAT , 6 EAP) Lectures: Mon 14-16, hall 404 Exercises: Wed 14-16, hall 402

The 3-Steiner Root Problem

Searching the Web [Arasu 01]

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Math.3336: Discrete Mathematics. Chapter 10 Graph Theory

UNIVERSITA DEGLI STUDI DI CATANIA FACOLTA DI INGEGNERIA

Graph Theory and Network Measurment

COMP5331: Knowledge Discovery and Data Mining

Small Survey on Perfect Graphs

Decreasing the Diameter of Bounded Degree Graphs

Response Network Emerging from Simple Perturbation

Path Analysis References: Ch.10, Data Mining Techniques By M.Berry, andg.linoff Dr Ahmed Rafea

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

CSE 255 Lecture 13. Data Mining and Predictive Analytics. Triadic closure; strong & weak ties

Chapter 2 Graphs. 2.1 Definition of Graphs

Properties of Biological Networks

Discrete Mathematics

CSE 158 Lecture 13. Web Mining and Recommender Systems. Triadic closure; strong & weak ties

HIPRank: Ranking Nodes by Influence Propagation based on authority and hub

Random Generation of the Social Network with Several Communities

Graph Theory Review. January 30, Network Science Analytics Graph Theory Review 1

Topology Generation for Web Communities Modeling

13: Betweenness Centrality

Graphs. Introduction To Graphs: Exercises. Definitions:

Graph Theory for Network Science

MAE 298, Lecture 9 April 30, Web search and decentralized search on small-worlds

Graph Mining and Social Network Analysis

Mathematics of networks. Artem S. Novozhilov

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

On the edit distance from a cycle- and squared cycle-free graph

A visual analytics approach to compare propagation models in social networks

Lecture 17 November 7

Mining Social Network Graphs

Graph Grammars. Marc Provost McGill University February 18, 2004

A Relational View of Subgraph Isomorphism

EFFECT OF VARYING THE DELAY DISTRIBUTION IN DIFFERENT CLASSES OF NETWORKS: RANDOM, SCALE-FREE, AND SMALL-WORLD. A Thesis BUM SOON JANG

A Community-Based Peer-to-Peer Model Based on Social Networks

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Data mining --- mining graphs

Graph and Digraph Glossary

Abstract. 1. Introduction

Personalized Information Retrieval

6. Overview. L3S Research Center, University of Hannover. 6.1 Section Motivation. Investigation of structural aspects of peer-to-peer networks

Transcription:

Graph similarity Laura Zager and George Verghese EECS, MIT March 2005

Words you won t hear today impedance matching thyristor oxide layer VARs

Some quick definitions GV (, E) a graph G V the set of vertices or nodes E V V the set of edges can be directed or undirected. ex 1 2 3 4 5 a directed graph and its node-node adjacency matrix 1 2 3 4 5 1 2 3 4 5 0 1 1 0 0 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0

Graph theory: some perspective The Königsberg bridge problem (18 th c.) The Four Color Theorem (1976)

Graph theory: some perspective The Königsberg bridge problem (18 th c.) The Four Color Theorem (1976) Erdös and Rényi random graph models (1959)

Graph theory: some perspective The Königsberg bridge problem (18 th c.) The Four Color Theorem (1976) Erdös and Rényi random graph models (1959) present and future: graphs that arise in the natural world

Applications Comparing biological networks Deriving phylogenetic trees from metabolic pathway data [Heymans, Singh, 2003]. Social network mapping Small world phenomena [Milgram, 1967; Watts, 1999]. Web searching Improving searching results using WWW structure [Kleinberg, 1999]. Chemical structure matching Finding similar structures in a chemical database [Hattori et al., 2003].

Applications Comparing biological networks Deriving phylogenetic trees from metabolic pathway data [Heymans, Singh, 2003]. Social network mapping Small world phenomena [Milgram, 1967; Watts, 1999]. Web searching Improving searching results using WWW structure [Kleinberg, 1999]. Chemical structure matching Finding similar structures in a chemical database [Hattori et al., 2003]. one common thread: similarity

Notions of similarity Isomorphism identifying a bijection between the nodes of two graphs which preserves (directed) adjacency. Corneil & Gotlieb, Journal of the ACM, 1970. Pelillo, Neural Computation, 1999. Ullman, Journal of the Assoc. of Computing Machinery, 1976.

Notions of similarity Isomorphism identifying a bijection between the nodes of two graphs which preserves (directed) adjacency. a c a d c b d b Corneil & Gotlieb, Journal of the ACM, 1970. Pelillo, Neural Computation, 1999. Ullman, Journal of the Assoc. of Computing Machinery, 1976.

Notions of similarity isomorphism Edit distance given a cost function on edit operations (e.g. addition/deletion of nodes and edges), determine the minimum cost transformation from one graph to another. Bunke, IEEE Trans. Pattern Analysis and Machine Int., 1999. Messmer & Bunke, IEEE Trans. Pattern Analysis and Machine Int., 1998.

Notions of similarity edit distance isomorphism Maximum common subgraph identifying the `largest isomorphic subgraphs of two graphs. Minimum common supergraph identifying the `smallest graph that contains both graphs. max. com. sub Fernandez & Valiente, Pattern Recognition Letters, 2001. Bunke, Jiang & Candel, Computing, 2000.

Notions of similarity edit distance isomorphism max (min) common sub(super)graph Statistical methods assessing aggregate measures of graph structure (e.g. degree distribution, diameter, betweenness measures). frequency degree Albert, Barabasi, Reviews of Modern Physics, 2002 Dill, Kumar, et al., ACM Transactions on Internet Technology, 2002. Watts, Small Worlds, 1999.

Notions of similarity edit distance? isomorphism max (min) common sub(super)graph? statistical comparison Iterative methods: Two graph elements (e.g., edges or nodes) are similar if their neighborhoods are similar. Kleinberg, Journal of the ACM, 1999. Blondel, Van Dooren, et al., SIAM Review, 2004. Jeh & Widom, 8 th Intl. Conf. on Knowledge Discovery and Data Mining, 2002. Melnik, Garcia-Molina, 18 th Intl. Conf. on Data Engineering, 2002. Heymans & Singh, Bioinformatics, 2003.

Kleinberg, 1999 * Motivated by demands of web searching Step 1: Use text-based search methods to identify a candidate graph containing relevant websites and their neighbors. text-based search results plus the local neighborhood leinberg, J.M. Authoritative sources in a hyperlinked environment. Journal of the ACM. 1999

Kleinberg, 1999 Relevant search results might be: Hubs pages which point to many good authorities Authorities pages which are pointed to by many good hubs } good hubs } good authorities Step 2: Compute hub and authority scores for every node in the candidate graph.

Kleinberg, 1999 Denote: x 1p (k) = hub score of node p at iteration k x 2p (k) = authority score of node p at iteration k Update rule: x ( k + 1) = x ( k) 2p 1q q:( q, p) E i.e. the sum of hub scores of nodes that point to node p x ( k + 1) = x ( k) 1p 2q q:( p, q) E i.e. the sum of authority scores of nodes that are pointed to by node p Normalize the scores so that p x ip = 1 and repeat.

Kleinberg, 1999 Denote: x 1p (k) = hub score of node p at iteration k x 2p (k) = authority score of node p at iteration k Update rule: x ( k + 1) = x ( k) 2p 1q q:( q, p) E i.e. the sum of hub scores of nodes that point to node p x ( k + 1) = x ( k) 1p 2q q:( p, q) E i.e. the sum of authority scores of nodes that are pointed to by node p Normalize the scores so that p x ip = 1 and repeat.

Kleinberg, 1999 Denote: x 1p (k) = hub score of node p at iteration k x 2p (k) = authority score of node p at iteration k Update rule: Stack the scores x 1p (k) into a vector [x 1 ] k, then stack [x 1 ] k and [x 2 ] k. Let B be the node-node adjacency matrix of the candidate graph. Then: x x 1 = 2 k+ 1 0 B B x 0 x 1 2 k

Kleinberg, 1999 Ex. 1 2 5 3 6 7 nodes x 1 hub scores x 2 authority scores 1 0.374 0 2 0.242 0 3 0.467 0 4 0 0.365 5 0 0.467 4 6 0 0.365 7 0 0.308 for a good read, see The Ongoing Search for Efficient Web Search Algorithms, SIAM News, November 2004.

Blondel, Van Dooren, et al., 2004 * Views Kleinberg s iteration as a comparison between the web graph and a hub-authority graph: 1 hub node 2 authority node A = 0 1 0 0 Observe that the matrix form of Kleinberg s update can be written as follows: x x 1 = 0 B B x 0 x 1 ( A B A B ) x = + x 2 k+ 1 2 k 2 k Is this generalizable to any two graphs G A and G B? 1 * Blondel, V., Gajardo, A., Heymans, M., Senellart, P., Van Dooren, P. A measure of similarity between graph vertices: applications to synonym extraction and web searching. SIAM Review, v. 46(4), 647-666. 2004.

Blondel, Van Dooren, et al., 2004 A first step toward generalizing Kleinberg s approach: consider comparing the graph G B to the following graph using a similar update: 1 2 3 hub node central node authority node A = 0 1 0 0 0 1 0 0 0 x ( k + 1) = x ( k) 1p 2q q:( p, q) E p x ( k + 1) = x ( k) + x ( k) 2p 1q 3q q:( q, p) E q:( p, q) E x ( k + 1) = x ( k) 3p 2q q:( q, p) E

Blondel, Van Dooren, et al., 2004 A first step toward generalizing Kleinberg s approach: consider comparing the graph G B to the following graph using a similar update: 1 2 3 hub node central node authority node A = 0 1 0 0 0 1 0 0 0 x ( k + 1) = x ( k) 1p 2q q:( p, q) E p x ( k + 1) = x ( k) + x ( k) 2p 1q 3q q:( q, p) E q:( p, q) E x ( k + 1) = x ( k) 3p 2q q:( q, p) E

Blondel, Van Dooren, et al., 2004 A first step toward generalizing Kleinberg s approach: consider comparing the graph G B to the following graph using a similar update: 1 2 3 hub node central node authority node A = 0 1 0 0 0 1 0 0 0 x ( k + 1) = x ( k) 1p 2q q:( p, q) E p x ( k + 1) = x ( k) + x ( k) 2p 1q 3q q:( q, p) E q:( p, q) E x ( k + 1) = x ( k) 3p 2q q:( q, p) E

Blondel, Van Dooren, et al., 2004 A first step toward generalizing Kleinberg s approach: consider comparing the graph G B to the following graph using a similar update: 1 2 3 hub node central node authority node A = 0 1 0 0 0 1 0 0 0 x ( k + 1) = x ( k) 1p 2q q:( p, q) E p x ( k + 1) = x ( k) + x ( k) 2p 1q 3q q:( q, p) E q:( p, q) E x ( k + 1) = x ( k) 3p 2q q:( q, p) E

Blondel, Van Dooren, et al., 2004 A first step toward generalizing Kleinberg s approach: consider comparing the graph G B to the following graph using a similar update: 1 2 3 hub node central node authority node A = 0 1 0 0 0 1 0 0 0 x x x 1 2 0 B 0 x = B 0 B x 0 B 0 x 1 2 ( ) = A B + A B x x x 3 k+ 1 3 k 3 k 1 2 (use this construction for automatic synonym extraction)

Blondel, Van Dooren, et al., 2004 In general, the nodes of two graphs G A and G B can be compared via the following update: ( ) xk+ 1 = A B + A B x k Ex. G A G B 1 2 3 1 2 3 similarity scores nodes 1 2 3 1 0.443 0.104 0 2 0.280 0.396 0.086 3 0.086 0.396 0.280 4 0.222 0.049 0.222 5 0 0.104 0.443 4 5

Coupled edge and node scoring Idea: use this iterative approach to assign edge similarity scores as well as node similarity scores. Couple the definitions in the following manner: x ij = similarity between node i in G B and node j in G A = sum of pairwise similarities between adjacent edges y ij = similarity between edge i in G B and edge j in G A. = sum of similarities of source and terminal nodes

Coupled edge and node scoring Idea: use this iterative approach to assign edge similarity scores as well as node similarity scores. Couple the definitions in the following manner: x ij = similarity between node i in G B and node j in G A = sum of pairwise similarities between adjacent edges y ij = similarity between edge i in G B and edge j in G A. = sum of similarities of source and terminal nodes

Coupled edge and node scoring Idea: use this iterative approach to assign edge similarity scores as well as node similarity scores. Couple the definitions in the following manner: x ij = similarity between node i in G B and node j in G A = sum of pairwise similarities between adjacent edges y ij = similarity between edge i in G B and edge j in G A. = sum of similarities of source and terminal nodes

Coupled edge and node scoring Idea: use this iterative approach to assign edge similarity scores as well as node similarity scores. Couple the definitions in the following manner: x ij = similarity between node i in G B and node j in G A = sum of pairwise similarities between adjacent edges y ij = similarity between edge i in G B and edge j in G A. = sum of similarities of source and terminal nodes [ ] x + 1 = A B + A B y k S S T T k [ ] + y A B A B x k+ 1 = S S T T k [ A ] S ij = 1 0 s( j) = else i [ A ] T ij = 1 0 t( j) = else i

Example 1 G A 1 2 3 G B 2 3 4 5 Blondel, Van Dooren, et al. similarity scores nodes 1 2 3 1 0.443 0.104 2 0.280 0.396 0.086 3 0.086 0.396 0.280 4 0.222 0.049 0.222 5 0 0.104 0.443 0 Coupled model similarity scores nodes 1 2 3 1 0.324 0.054 2 0.177 0.587 0.018 3 0.018 0.587 0.177 4 0.127 0.010 0.127 5 0 0.054 0.324 0

Application: Graph Matching Assign a correspondence between nodes and/or edges of each graph to maximize some performance criteria. The Approach: apply Hungarian algorithm to node similarity matrix to maximize the sum of matched scores. 1 3 7 1 3 7 3 2 4 3 2 4 4 8 3 4 8 3

Application: Graph Matching Task: subgraph matching Generate a random graph, G. Select a subgraph, S. Compute the node similarity matrices between G and S. Apply the Hungarian algorithm to `best match the nodes of S to those in G by finding a matching the maximizes the sum of matched scores. Record successes for nodes that are matched with their original identifier. G d c a b d S c a b

Application: Graph Matching Task: subgraph matching Generate a random graph, G Select a subgraph, S Compute the node similarity matrices between G and S Apply the Hungarian algorithm to `best match the nodes of S to those in G by finding a matching the maximizes the sum of matched weights. Record successes for nodes that are matched with their original identifier G a* a S d c Yields a lower bound on the success of the matching process b d c a b

Application: Graph Matching Using local edge similarity to improve scores: perform a a local edge matching a a x aa x aa * = x aa + m aa

Application: Graph Matching

Application: Graph Matching Exploring the impact of node labeling: cannot be matched

Application: Graph Matching

Current/future work How does graph structure (e.g., cycles, paths, completeness) impact similarity scores? What can be inferred about a pair of graphs from a similarity measurement? What kinds of tasks is this measure appropriate for?

Acknowledgments George Verghese, MIT Sandip Roy, WSU Paul Van Dooren, Université catholique de Louvain Work supported by a NSF Graduate Research Fellowship.