Small World Graph Clustering

Similar documents
CAIM: Cerca i Anàlisi d Informació Massiva

Solving Linear Recurrence Relations (8.2)

Overlapping Communities

Characteristics of Preferentially Attached Network Grown from. Small World

Lesson 4. Random graphs. Sergio Barbarossa. UPC - Barcelona - July 2008

Heuristics for the Critical Node Detection Problem in Large Complex Networks

Introduction to network metrics

(Social) Networks Analysis III. Prof. Dr. Daning Hu Department of Informatics University of Zurich

Minimum Spanning Trees My T. UF

CS 220: Discrete Structures and their Applications. graphs zybooks chapter 10

Networks in economics and finance. Lecture 1 - Measuring networks

Models of Network Formation. Networked Life NETS 112 Fall 2017 Prof. Michael Kearns

Small-World Models and Network Growth Models. Anastassia Semjonova Roman Tekhov

Online Social Networks and Media. Community detection

Sequential Monte Carlo Method for counting vertex covers

6. Overview. L3S Research Center, University of Hannover. 6.1 Section Motivation. Investigation of structural aspects of peer-to-peer networks

Oh Pott, Oh Pott! or how to detect community structure in complex networks

CS 322: (Social and Information) Network Analysis Jure Leskovec Stanford University

Computing minimum distortion embeddings into a path for bipartite permutation graphs and threshold graphs

Modularity CMSC 858L

V 2 Clusters, Dijkstra, and Graph Layout"

V 2 Clusters, Dijkstra, and Graph Layout

Discovery of Community Structure in Complex Networks Based on Resistance Distance and Center Nodes

TELCOM2125: Network Science and Analysis

Summary: What We Have Learned So Far

Introduction to Algorithms. Lecture 24. Prof. Patrick Jaillet

Peer-to-Peer Data Management

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

Introduction to Bioinformatics

Example for calculation of clustering coefficient Node N 1 has 8 neighbors (red arrows) There are 12 connectivities among neighbors (blue arrows)

Social Data Management Communities

Graph Theory for Network Science

Minimum Spanning Trees

Analysis of Biological Networks. 1. Clustering 2. Random Walks 3. Finding paths

Faster parameterized algorithms for Minimum Fill-In

1 Random Graph Models for Networks

On Covering a Graph Optimally with Induced Subgraphs

6.207/14.15: Networks Lecture 5: Generalized Random Graphs and Small-World Model

Distribution-Free Models of Social and Information Networks

Hierarchical Clustering

How Do Real Networks Look? Networked Life NETS 112 Fall 2014 Prof. Michael Kearns

CS 6824: The Small World of the Cerebral Cortex

Announcements Problem Set 5 is out (today)!

Lecture 5. Figure 1: The colored edges indicate the member edges of several maximal matchings of the given graph.

A Partition Method for Graph Isomorphism

V4 Matrix algorithms and graph partitioning

The Computational Complexity of Graph Contractions II: Two Tough Polynomially Solvable Cases

Lecture 4: Walks, Trails, Paths and Connectivity

CSE 100 Minimum Spanning Trees Prim s and Kruskal

Network Clustering. Balabhaskar Balasundaram, Sergiy Butenko

Some Graph Theory for Network Analysis. CS 249B: Science of Networks Week 01: Thursday, 01/31/08 Daniel Bilar Wellesley College Spring 2008

Faster parameterized algorithms for Minimum Fill-In

A CSP Search Algorithm with Reduced Branching Factor

CS200: Graphs. Rosen Ch , 9.6, Walls and Mirrors Ch. 14

Graph Theory. Graph Theory. COURSE: Introduction to Biological Networks. Euler s Solution LECTURE 1: INTRODUCTION TO NETWORKS.


ECS 253 / MAE 253, Lecture 8 April 21, Web search and decentralized search on small-world networks

1 Degree Distributions

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

Persistent Homology in Complex Network Analysis

Highway Hierarchies (Dominik Schultes) Presented by: Andre Rodriguez

Lecture Note: Computation problems in social. network analysis

On a perfect problem

Nick Hamilton Institute for Molecular Bioscience. Essential Graph Theory for Biologists. Image: Matt Moores, The Visible Cell

Approximation slides 1. An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

DEFINITION OF GRAPH GRAPH THEORY GRAPHS ACCORDING TO THEIR VERTICES AND EDGES EXAMPLE GRAPHS ACCORDING TO THEIR VERTICES AND EDGES

Minimum-Spanning-Tree problem. Minimum Spanning Trees (Forests) Minimum-Spanning-Tree problem

Community Detection. Community

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

The Pre-Image Problem in Kernel Methods

Total communicability as a centrality measure

M.E.J. Newman: Models of the Small World

Structural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort

Basics of Network Analysis

This course is intended for 3rd and/or 4th year undergraduate majors in Computer Science.

On Structural Parameterizations for the 2-Club Problem

A Survey of the Algorithmic Properties of Simplicial, Upper Bound and Middle Graphs

Exercise set #2 (29 pts)

Signal Processing for Big Data

ECE 158A - Data Networks

Extracting Information from Complex Networks

Finding a winning strategy in variations of Kayles

Kernelization Upper Bounds for Parameterized Graph Coloring Problems

Mining Social Network Graphs

Community Structure and Beyond

16 - Networks and PageRank

Overlay (and P2P) Networks

Week 12: Minimum Spanning trees and Shortest Paths

Small World Networks

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

RANDOM-REAL NETWORKS

2. True or false: even though BFS and DFS have the same space complexity, they do not always have the same worst case asymptotic time complexity.

Chordal Probe Graphs (extended abstract)

Quick Review of Graphs

CS249: SPECIAL TOPICS MINING INFORMATION/SOCIAL NETWORKS

Collaboration networks and innovation in Canada s ICT Hardware Cluster. Catherine Beaudry and Melik Bouhadra Polytechnique Montréal

Introduction to Engineering Systems, ESD.00. Networks. Lecturers: Professor Joseph Sussman Dr. Afreen Siddiqi TA: Regina Clewlow

ALTERNATIVES TO BETWEENNESS CENTRALITY: A MEASURE OF CORRELATION COEFFICIENT

Extracting Communities from Networks

Transcription:

Small World Graph Clustering Igor Kanovsky, Lilach Prego University of Haifa, Max Stern Academic College Israel 1

Clustering Problem Clustering problem : How to recognize communities within a huge graph. There is no exact definition of the cluster. Our aim was to design an effective algorithm for community recognition in special class of graphs called Small World graph. 2

Outline Small World Graph definition. Link weighting. Iterated Clustering Algorithm. Small World Simulation. Zachary's Karate Club Study. Spidering the Web. Conclusion. 3

Small World Graph (1/3) Def.1. The characteristic path length L(G) of a graph G = (V,E) is the average length of the shortest path between two vertices in G Def.2. The clustering coefficient C(G)=<C(v)> of a graph G = (V,E) is the average clustering coefficient of its vertices C(v); 4

Small World Graph (2/3) Def. 3. Clustering coefficient for vertex v: C( v) = number _ of _ edges _ in _ k( v) ( k( v) 1) [ ( v) ] G N 2 where N(v) neighbourhood of v, k(v)= N(v) (degree of v). The clustering coefficient C(v) of a vertex v is the edge density of the graph G[N(v)] induced by N(v). 5

Small World Graph (3/3) Def. 4. Small World graph is a graph G(V,E) with L~L R ~log V and C>>C R, where G R (V R,E R ) - a random graph with V R = V, E R = E. A lot of real world graphs are Small World graphs: 1. Social relationships. 2. Business (organization) collaborations. 3. The Web. The Internet. 4. Biological data (DNA structure, cells metabolism etc.). 6

Link Weighting In Small World Definition: For the link connecting nodes v 1,v 2 the edge weighting parameter β as: we define Where N(v) is the neighborhood of the vertex v. 7

Iterated Clustering Algorithm (ICA) Definition: Two adjacent nodes v 1, v 2 belong to the same cluster if β(v 1, v 2 ) > α Where α is the level of the cluster separation for a Small World graph (threshold value which is a parameter of the algorithm). By simple iterated procedure a Small World graph can be divided into k clusters, where k is between 1 to V depends on α and the graph link structure. 8

Iterated Clustering Algorithm ICA Flow : (ICA) Cont. Input: graph G=(V,E), level of cluster separation α ; Output: clustering V, V,... V } ; { 1 2 k 1. i=0; 2. loop while V is not empty a. find an arbitrary cluster C in G[V]; b. i++; c. V i =C; V=V-C; 3. k=i; 9

Iterated Clustering Algorithm Finding an arbitrary cluster flow : (ICA) Cont. Input: graph G=(V,E); Output: A cluster C G. 1. put arbitrary vertex v into queue Q; C= ; 2. loop while Q is not empty a. get vertex u from Q; add u to C; b. loop for each w N(u) if β ( u, w) > α and w Q C put w into Q; end if. 10

ICA Example 11

Small World Simulation Extension of Watts and Strogatz Small World Model : Collection of ring lattices with randomly reconnected P in links inside the ring lattice and P out reconnected links between the rings. 12

Small World simulation Results Typical behavior of the ICA ability to recognize clusters under increasing link randomization. Average Percentage of correctly recognized nodes 13 120.00 100.00 80.00 60.00 40.00 20.00 0.00 0.00 6.60 13.20 19.80 ICA algorithm w ith p out =6% 26.40 Percentage of Internal links 33.00 39.60 46.20 52.80 59.40 66.00 72.60 79.20 120.00 100.00 Average Percentage 80.00 of correctly 60.00 recognized nodes 40.00 20.00 0.00 0.00 4.80 9.60 14.40 ICA algorithm with p in =6% 19.20 Percentage of external links 24.00 28.80 33.60 38.40 43.20 48.00 52.80 57.60

Small World simulation Results We focus on ICA recognition of at least 80% of the nodes. We define P critical as the value of p out when the percentage of correctly recognized nodes starts to go down below 80%. p in =0.05%. (For different p in tested changes are not significant). 14

Small World simulation Results Exists a cluster separation level α c that provides best results of ICA cluster recognition. For our simulation model α c =0.1. 15

Small World simulation Results 16

Small World simulation Results Cont. As the value of z grows, there is cluster identification for higher values of p out 17

Zachary's Karate Club Study The real-world friendship network from the well Known Zachary's Karate Club : 18

Zachary's Karate Club Study Cont. The hierarchical tree of clustering produced by our method : 19

Spidering the Web Node degree distributions of Oxford university (www.ox.ac.uk) subgraph (consist of 10014 pages) that was spidered from the web. In- degree distribution Out- degree distribution 4.5 4 1.8 1.6 Log(Number Of Pages) 3.5 3 2.5 2 1.5 1 0.5 0 0 0.5 1 1.5 2 2.5 3 Log(Number Of Pages) 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 0.5 1 1.5 2 2.5 3 Log(in- degree) Log(out- degree) 20

Web Subgraphs Results Example for spidering 4001 pages from Harvard University. Spidered graph CC=0.007, after the link removal for α = 0.2 the CC=0.046. 21

Advantages 1. Efficiency: ICA has a polynomial complexity O( E ), or more exactly, an average complexity O(z 2 V ), where z is the average number of links per node in the graph. 2. Locality: No need to know full graph or number of clusters for local cluster recognition. 3. Applicability: Applicable for graphs with different nature ("big" and "small", power-law. etc.) 4. Tool ability: Helps to recognize a Small World subgraphs in not Small World graph. 22

Conclusion The success of this method demonstrate that the local link order are connected to the cluster structure of the graph. ICA has significant advantages compare to other clustering algorithms. Still, it must be emphasized that ICA applicability is for special graphs. 23

Thank you. For contacts: igor kanovsky, igork@yvc.ac.il, http://www.yvc.ac.il/ik/ 24