On the Permanence of Vertices in Network Communities. Tanmoy Chakraborty Google India PhD Fellow IIT Kharagpur, India

Similar documents
arxiv: v1 [physics.soc-ph] 10 Jun 2014

Networks: Methods and Applications

Community Detection: Comparison of State of the Art Algorithms

Community Detection in Bipartite Networks:

arxiv: v1 [cs.si] 17 Sep 2016

Generalized Measures for the Evaluation of Community Detection Methods

Network community detection with edge classifiers trained on LFR graphs

Community detection. Leonid E. Zhukov

CEIL: A Scalable, Resolution Limit Free Approach for Detecting Communities in Large Networks

Community detection using boundary nodes in complex networks

Edge Weight Method for Community Detection in Scale-Free Networks

Expected Nodes: a quality function for the detection of link communities

arxiv: v1 [cs.si] 6 Dec 2017

EXTREMAL OPTIMIZATION AND NETWORK COMMUNITY STRUCTURE

Local Community Detection in Dynamic Graphs Using Personalized Centrality

Modeling and Detecting Community Hierarchies

Community Detection: A Bayesian Approach and the Challenge of Evaluation

Revealing Multiple Layers of Deep Community Structure in Networks

An Efficient Algorithm for Community Detection in Complex Networks

Social Data Management Communities

Using Stable Communities for Maximizing Modularity

Comparative Evaluation of Community Detection Algorithms: A Topological Approach

CUT: Community Update and Tracking in Dynamic Social Networks

Finding Hierarchical Communities in Complex Networks Using Influence-Guided Label Propagation

Local higher-order graph clustering

A FUSION CONDITION FOR COMMUNITY

Perspective on Measurement Metrics for Community Detection Algorithms

Basics of Network Analysis

Dynamic Structural Similarity on Graphs

University of Alberta. Justin Fagnan. Master of Science. Department of Computing Science

Chapter 2 Community Detection in Bipartite Networks: Algorithms and Case studies

arxiv: v2 [cs.si] 22 Mar 2013

SLPA: Uncovering Overlapping Communities in Social Networks via A Speaker-listener Interaction Dynamic Process

Big Data Analytics Influx of data pertaining to the 4Vs, i.e. Volume, Veracity, Velocity and Variety

AN ANT-BASED ALGORITHM WITH LOCAL OPTIMIZATION FOR COMMUNITY DETECTION IN LARGE-SCALE NETWORKS

Relative Centrality and Local Community Detection

An Empirical Analysis of Communities in Real-World Networks

Holistic Analysis of Multi-Source, Multi- Feature Data: Modeling and Computation Challenges

Label propagation with dams on large graphs using Apache Hadoop and Apache Spark

A Survey on Community Mining in Social Networks

A Simple Acceleration Method for the Louvain Algorithm

Crawling and Detecting Community Structure in Online Social Networks using Local Information

Adaptive Modularity Maximization via Edge Weighting Scheme

arxiv: v2 [physics.soc-ph] 16 Sep 2010

SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS

On community detection in very large networks

TELCOM2125: Network Science and Analysis

Hierarchical community structure in complex (social) networks. arxiv: v2 [physics.soc-ph] 10 Mar 2014

Subspace Based Network Community Detection Using Sparse Linear Coding

A new Pre-processing Strategy for Improving Community Detection Algorithms

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

MCL. (and other clustering algorithms) 858L

Fast Parallel Algorithm For Unfolding Of Communities In Large Graphs

Local Community Detection Based on Small Cliques

Web Structure Mining Community Detection and Evaluation

Fast Community Detection For Dynamic Complex Networks

Supplementary material to Epidemic spreading on complex networks with community structures

Detecting Community Structure for Undirected Big Graphs Based on Random Walks

Distributed community detection in complex networks using synthetic coordinates

High Quality, Scalable and Parallel Community Detection for Large Real Graphs

Spectral Graph Multisection Through Orthogonality

Community Preserving Network Embedding

arxiv: v2 [cs.si] 20 Jan 2017

Genetic Algorithm with a Local Search Strategy for Discovering Communities in Complex Networks

C omplex networks are widely used for modeling real-world systems in very diverse areas, such as sociology,

COMMUNITY is used to represent a group of nodes

EECS730: Introduction to Bioinformatics

Toward Small Community Discovery in Social Networks. Ran Wang

Statistical Physics of Community Detection

Graph analytics approach to analyse Enterprise Architecture models

A Fast Method of Detecting Overlapping Community in Network Based on LFM

A Scalable Distributed Louvain Algorithm for Large-scale Graph Community Detection

Static community detection algorithms for evolving networks

Generalized Louvain method for community detection in large networks

Studying the Properties of Complex Network Crawled Using MFC

Edge Representation Learning for Community Detection in Large Scale Information Networks

Inferring Coarse Views of Connectivity in Very Large Graphs

Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks

Scalable Community Detection Benchmark Generation

Maximizing edge-ratio is NP-complete

Relative Validity Criteria for Community

Community detection in Social Media

Metropolitan Community Deliniation. Community detection algorithms. Nathaniel Pritchard

arxiv: v1 [cs.si] 25 Sep 2015

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Understanding Community Effects on Information Diffusion

MSA220 - Statistical Learning for Big Data

A Novel Parallel Hierarchical Community Detection Method for Large Networks

arxiv: v1 [cs.si] 7 Jul 2016

A Fast Algorithm to Find Overlapping Communities in Networks

Communities Validity:

Algorithm Engineering Applied To Graph Clustering

FlowPro: A Flow Propagation Method for Single Community Detection

A Dynamic Algorithm for Local Community Detection in Graphs

Network Traffic Measurements and Analysis

Community Detection in Social Networks

On the Min-Max 2-Cluster Editing Problem

Online Social Networks and Media. Community detection

A Comparison of Community Detection Algorithms on Artificial Networks

Large-scale networks with thousands to millions of. Community detection in large-scale networks: a survey and empirical evaluation

Transcription:

On the Permanence of Vertices in Network Communities Tanmoy Chakraborty Google India PhD Fellow IIT Kharagpur, India 20 th ACM SIGKDD, New York City, Aug 24-27, 2014

Tanmoy Chakraborty Niloy Ganguly IIT Kharagpur, India Animesh Mukherjee Sriram Srinivasan Sanjukta Bhowmick University of Nebraska, Omaha

A Shoplifting Non-addictive Drug

Heuristic I Total Internal connections > maximum external connections to any one of the external communities Modularity, Conductance, Cut-ration consider total external connections

A Shoplifting Drug

Heuristic II Internal neighbors should be highly connected => high clustering coefficient among internal neighbors Modularity, conductance and cut-ratio do not consider clustering coefficient

Permanence Perm( v) = [ I( v) E ( v) max 1 ] D( v) (1 C in ( v)) I(v)=internal deg of v D(v)=degree of v E max (v)=max connection to an external neighbor C in (v)=clustering coefficient of internal neighbors Perm(v)=0.12 I(v)=4, D(v)=7, E max (v)=2 C in (v)=5/6

Permanence Permanence ~ 1 Permanence = 0 Wrong vertex-to-community assignment Permanence ~ -1

Research Questions 1. Assigning every vertex in a community is reasonable? 2. No one measures the intensity of belongingness of a vertex to a community Only try to detect best community structure in the network Never ask for whether a network possesses a strong community Permanence answers all these questions

Test Suite of Networks Synthetic Networks LFR networks with different values of mixing parameter (µ) Real-world Networks (Lancichinetti & Fortunato, PRE, 11) Football Network Nodes: teams, Edges: matches, Communities: team-conference (Girvan & Newman, PNAS, 02) Railway Network Nodes: station, Edges: train-connections, Communities: state/provinces (Ghosh et al., Acta Physica, 11) Coauthorship Network Nodes: authors, Edges: coauthorships, Communities: research field (Chakraborty et al., ASONAM, 13)

Baseline Algorithms Modularity based FastGreedy (Newman, PRE, 04) Louvain (Blondel et al, J. Stat. Mech., 08) CNM (Clauset et al, PRE, 04) Random-walk based WalkTrap (Pons & Latapy, J. Graph Algo and Appln, 06) Compression based InfoMod (Rosvall & Bergstrom, PNAS, 07) InfoMap (Rosvall & Bergstrom, PNAS, 08)

Permanence: A Better Community Scoring Function

Methodology Approach (Steinhaeuser & Chawla, PRL, 10): 1. Consider a network 2. Run N community detection algorithms (here N=6) 3. Compute community scoring functions on these outputs 4. Rank the algos based on these values 5. Compare outputs with the ground-truth using validation metrics and rank algos again 6. Find rank-correlation FastGreedy Louvain CNM WalkTrap InfoMod InfoMap Football Network Mod Per Con Cut 0.2 0.6 0.8 0.4 0.5 0.1 5 2 1 4 3 6 correlation NMI ARI PU 0.3 4 0.5 2 0.8 1 0.2 5 0.4 3 0.1 6 Intuition: Ranking of good scoring function and the validation measures should be high

Results Lighter color is better Table : Performance of the community scoring functions averaged over all the validation measures Fig. : Heat maps depicting pairwise Spearman's rank correlation

Developing Community Detection Algorithm

Major Limitations Limitations of optimization algorithms Resolution limit (Fortunato & Barthelemy, PNAS, 07) Degeneracy of solutions (Good et al., PRE, 10) Asymptotic growth (Good et al., PRE, 10)

Community Detection Based on Maximizing Permanence Follow similar strategy used in Louvain algorithm (a greedy modularity maximization) (Blondel et al., J. Stat. Mech, 07) Selecting seed nodes helps converge the process faster We only consider those communities having size >=3 Communities having size<3 remain as singleton

Experimental Results Algo LFR (µ=0.1) LFR (µ=0.3) LFR (µ=0.6) Why???? Football Railway Coauthors hip Louvain 0.02 0.00-0.75 0.02 0.14 0.00 FastGrdy 0.00 0.87 0.02 0.01 0.37 0.14 CNM 0.14 0.40-0.13 0.30 0.00 0.05 WalkTrap 0.00 0.00-0.50 0.02 0.02 0.01 Infomod 0.06 0.08-0.20 0.19 0.04 0.00 Infomap 0.00 0.00-0.72 0.02-0.02 0.03 Table: Differences of our algorithm with the other algorithms averaged over all validation measures

LFR (µ = 0.1) vs. LFR (µ = 0.6) LFR (µ = 0.1) LFR (µ = 0.6) µ = Avg. ratio between the number of external connections to its degree

Permanence is Nice Permanence is not very sensitive to minor perturbation, but very sensitive after a certain threshold LFR (µ=0.1) Football Values Perturbation intensity (p) Permanence finds small-size communities Identify singleton (act as junction in Railway n/w) and small communities (subfields in Coauthorship n/w)

Issues Related with Modularity Maximization Resolution limit If a vertex is very tightly connected to a community and very loosely connected to another community, highest permanence is obtained when it joins the community to which it is more connected. Degeneracy of solution if a vertex is sufficiently loosely connected to its neighbouring communities and has equal number of connections to each community, then in most cases it will remain as singleton, rather than arbitrarily joining any of its neighbour groups. Asymptotic growth of value All the parameters of parameters are independent of the symmetric growth of network size and the number of communities. Analytical proofs: http://cnerg.org/permanence

Take Away Permanence a better community scoring function sensitive to perturbation after a certain threshold indicates the eligibility of a network for community detection Maximizing permanence a better community detection algorithm can detect small-size communities ameliorates existing limitations Future work Recast permanence for overlapping communities Recast permanence for weighted graphs Recast permanence for dynamic community detection

Thank you http://cnerg.org/permanence