Community Detection in Bipartite Networks:

Community Detection in Bipartite Networks: Algorithms and Case Studies
Kathy Horadam and Taher Alzahrani
Mathematical and Geospatial Sciences, RMIT Melbourne, Australia
IWCNA 2014
Community Detection, 30/05/2014 Kathy Horadam 1 / 34

Outline
1 Bipartite Networks
2 Community Detection Algorithms
3 Algorithms for Bipartite Networks
4 Our Approach: Apply Infomap Algorithm to Weighted Projection
5 Case Studies
6 Conclusions and Future Work

Bipartite Networks
The network has two node sets, P (Primary, of most interest) and S (Secondary). Edges do not occur between nodes in the same set.

Why do we care about bipartiteness?
Many real world examples are naturally bipartite:
- actors and events (in social networks)
- authors and papers (in collaboration networks)
- trains and railway stations (in railway networks)
- companies and goods (in financial networks)
Guillaume & Latapy (2006) [9] argue that any complex network can be represented as a bipartite network through decomposition, and propose a random bipartite graph model (the configuration model).

Community Detection Algorithms
Types of algorithms
Aim of community detection algorithms: to derive a coarse-grained depiction of large real-world networks.
There is a vast number of community detection algorithms available. Two algorithmic approaches:
- Structural: examine the strength of within-community connectivity versus between-community connectivity. Based on an underlying stochastic model of network formation, e.g. measured by conductance, modularity, link density.
- Flow-based: examine flows across the network, from which communities emerge as regions that the flow visits repeatedly more often than it jumps out of them. Based on how the underlying structure constrains flow across the network, e.g. measured by random walks with teleportation, spectral methods.

Performance of algorithms
Performance on benchmark networks
Is there a "best" algorithm?
Public benchmark graph data (Lancichinetti et al. [11]), which offers:
- Ability to test larger networks of 10^3 to 10^5 nodes.
- Power law distributions for the node degree and community sizes.
- Overlapping communities.
- Directed and weighted graphs.
Each test compares the partition found by the tested algorithm against the actual ("planted") partition.

Comparison of performance of 12 algorithms [12]
The authors conclude that "Infomap", "Louvain" and a Potts model method are the best performing algorithms on these benchmarks.
Infomap and Louvain are also very fast, linear in network size, so they were further tested on benchmark graphs with:
- 50,000 and 100,000 nodes
- a range of community sizes, from 20 to 1,000 nodes
- maximum degree 200
On these graphs the performance of Louvain is worse than on smaller graphs, whereas that of Infomap is stable.

Modularity-based algorithms
Modularity-based algorithms: CNM and Louvain
CNM (Clauset-Newman-Moore) algorithm [4]: optimizes a quality function, the modularity Q (of a partition M of a graph with N vertices). OK for graphs up to N = 10^6 vertices.
Problem: the resolution limit of modularity [8].
Maximisation of Q can fail to identify communities whose number of internal edges is small relative to the total edge number E of the network (even when they are cliques), e.g. if m >> p (clique sizes in the example of [8]), a joined pair of cliques has higher modularity than the cliques themselves.
So it may not find important small communities. There have been various attempts to overcome this.
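The resolution limit can be checked numerically. The sketch below is an illustration only, not part of the talk: it swaps in a ring of cliques (a standard construction for this effect) for the m >> p example, and the helper names `ring_of_cliques` and `modularity` are hypothetical. With 30 cliques of 5 nodes each, merging adjacent cliques in pairs yields a higher Q than treating every clique as its own community.

```python
from itertools import combinations


def ring_of_cliques(n_cliques, clique_size):
    """Edge list of n_cliques complete graphs joined in a ring by single edges."""
    edges = []
    for c in range(n_cliques):
        first = c * clique_size
        nodes = range(first, first + clique_size)
        edges.extend(combinations(nodes, 2))  # all edges inside clique c
        next_first = ((c + 1) % n_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))  # bridge to next clique
    return edges


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    n_edges = len(edges)
    internal = {}  # community -> number of edges inside it
    degree = {}    # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / n_edges - (d / (2 * n_edges)) ** 2
               for c, d in degree.items())


n_cliques, clique_size = 30, 5
edges = ring_of_cliques(n_cliques, clique_size)
nodes = range(n_cliques * clique_size)
one_per_clique = {v: v // clique_size for v in nodes}        # the "true" cliques
merged_pairs = {v: v // (2 * clique_size) for v in nodes}    # adjacent cliques joined

q_single = modularity(edges, one_per_clique)
q_pairs = modularity(edges, merged_pairs)
print(f"Q, one community per clique: {q_single:.4f}")
print(f"Q, adjacent cliques merged:  {q_pairs:.4f}")
```

For this construction the merged partition wins whenever the number of cliques exceeds m(m-1)+2, where m is the clique size, so a modularity maximiser prefers the coarser answer even though each clique is as dense as a community can be.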

"Louvain": Blondel et al., fast unfolding algorithm (2008)
Multistep technique: local optimization of Q in the neighbourhood of each node [3].

Visit-frequency algorithm: Infomap
"Infomap": Rosvall and Bergstrom 2008, Minimum Description Length [16]
- Minimises a different quality function from Q: the "map equation" L of a partition M of the graph.
- L is the average description length of the binary codewords describing a single step in a random walk on the graph.
- L is a sum of weighted entropies.
- Exploits the duality between compressing information representing a flow on the network and detecting/extracting significant structures in the network.
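To make the map equation concrete, here is a minimal sketch of the two-level L for an undirected, unweighted graph. It is not the Infomap software: it assumes no teleportation (node visit rates are taken as degree / 2m, and each edge between modules carries exit flow 1/2m in each direction), and the function name `map_equation` and the toy graph are hypothetical. It evaluates L for a given partition and does not search for the minimising one.

```python
from math import log2


def plogp(x):
    """x * log2(x), with the usual convention that 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length L (bits per step) of a partition."""
    two_m = 2 * len(edges)
    visit = {}      # node -> stationary visit rate, degree / 2m
    exit_rate = {}  # module -> per-step probability of leaving that module
    for u, v in edges:
        visit[u] = visit.get(u, 0.0) + 1 / two_m
        visit[v] = visit.get(v, 0.0) + 1 / two_m
        mu, mv = module_of[u], module_of[v]
        if mu != mv:
            # an edge between modules is crossed in each direction at rate 1/2m
            exit_rate[mu] = exit_rate.get(mu, 0.0) + 1 / two_m
            exit_rate[mv] = exit_rate.get(mv, 0.0) + 1 / two_m
    within = {}     # module -> total visit rate of its nodes
    for node, p in visit.items():
        within[module_of[node]] = within.get(module_of[node], 0.0) + p
    total_exit = sum(exit_rate.values())
    return (plogp(total_exit)
            - 2 * sum(plogp(q) for q in exit_rate.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(exit_rate.get(mod, 0.0) + p) for mod, p in within.items()))


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = {v: 0 for v in range(6)}
two_modules = {v: 0 if v < 3 else 1 for v in range(6)}

l_one = map_equation(edges, one_module)
l_two = map_equation(edges, two_modules)
print(f"L, all nodes in one module: {l_one:.3f} bits")
print(f"L, one module per triangle: {l_two:.3f} bits")
```

With everything in one module, L reduces to the entropy of the node visit rates. Splitting the graph into its two triangles gives a shorter description, which is the sense in which compressing the walk reveals the communities.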

[Figure: Infomap (Rosvall and Bergstrom 2008 [16])]

Algorithms for Bipartite Networks
Structural algorithms for Bipartite Networks
Different modularity-based algorithms have been developed by Barber [1], Guimera et al. [10] and Crampes and Plantie [5].
Label Propagation Algorithm for bipartite networks (LPAb) [2].

Table: Numbers of communities of women (P) in the benchmark Southern women network

Algorithm      Optimised      Network               Modules
Guimera [10]   modularity     weighted projection   2
Crampes [5]    bimodularity   bipartite             3
Barber [1]     bimodularity   bipartite             4
LPAb(+)        bimodularity   bipartite             4

Minimum message length algorithms
Problem: Infomap can't be applied to a bipartite network
Example: if you start in one set of a bipartite network, then you will always be back in that set after an even number of steps, so the probability of being at any particular vertex of that set is zero at odd time steps.
In terms of Markov chains, the random walk on a bipartite graph is periodic.
The random walk has a stationary distribution on a bipartite graph, but it won't converge to it.
Thus we cannot implement Infomap directly on a bipartite graph, because of periodicity.
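The periodicity argument can be seen on a very small example. The sketch below is illustrative only: the five-node bipartite graph and the names `step`, `adj` and `mass_on_P` are hypothetical. It starts a simple random walk at a node of P and records how much probability sits on P at each time step, then checks that the degree-proportional distribution is nevertheless stationary.

```python
def step(dist, adj):
    """One step of the simple random walk: move to a uniformly chosen neighbour."""
    new = {node: 0.0 for node in adj}
    for node, p in dist.items():
        for nb in adj[node]:
            new[nb] += p / len(adj[node])
    return new


# Toy bipartite graph: P = {p1, p2}, S = {s1, s2, s3}
adj = {
    "p1": ["s1", "s2"],
    "p2": ["s2", "s3"],
    "s1": ["p1"],
    "s2": ["p1", "p2"],
    "s3": ["p2"],
}
P = {"p1", "p2"}

# Start with all probability on p1 and track the total probability on P
dist = {node: 0.0 for node in adj}
dist["p1"] = 1.0
mass_on_P = []
for t in range(7):
    mass_on_P.append(sum(dist[v] for v in P))
    dist = step(dist, adj)
print("probability of being in P at t = 0..6:", mass_on_P)

# The degree-proportional distribution is stationary (one step leaves it unchanged)
total_degree = sum(len(nbs) for nbs in adj.values())
stationary = {v: len(nbs) / total_degree for v, nbs in adj.items()}
after_one_step = step(stationary, adj)
print("stationary distribution preserved:",
      all(abs(after_one_step[v] - stationary[v]) < 1e-12 for v in adj))
```

The probability on P alternates between 1 and 0 forever, so the walk started at a single node never settles to the stationary distribution, even though that distribution exists.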

A different MDL algorithm CAN be applied directly to bipartite networks: inference using MDL on a stochastic blockmodel (Peixoto 2013 [15]). BUT
1 There is again a theoretical resolution limit for the detection of communities, depending on the network size N.
2 The algorithm has only been tested on small networks.
However, the inferred communities fully reflect the bipartite nature: the communities partition P and S separately.
This supports our decision to apply Infomap to the weighted networks projected from P (or from S).

Our Approach: Apply Infomap Algorithm to Weighted Projection
Projection based on common neighbors
Method: Multiple links - weighted self-connections
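A common-neighbour projection can be sketched as follows. This is one plausible reading of the slide, not a confirmed description of the authors' method: the weight between two P nodes is assumed to be the number of S nodes they share, and each P node is assumed to get a self-connection weighted by its own number of S neighbours (the diagonal of B Bᵀ for the biadjacency matrix B). The function name `project_onto_P` and the example data are hypothetical.

```python
from itertools import combinations_with_replacement


def project_onto_P(bipartite_edges):
    """Weighted projection onto P: weight = number of shared S neighbours.

    Assumption: a node paired with itself yields a self-connection whose
    weight equals the node's degree in the bipartite graph.
    """
    neighbours = {}  # P node -> set of its S neighbours
    for p, s in bipartite_edges:
        neighbours.setdefault(p, set()).add(s)
    weights = {}
    for a, b in combinations_with_replacement(sorted(neighbours), 2):
        shared = len(neighbours[a] & neighbours[b])
        if shared:
            weights[(a, b)] = shared
    return weights


# Hypothetical data: three people (P) attending three events (S)
bipartite_edges = [
    ("Ann", "e1"), ("Ann", "e2"),
    ("Bea", "e1"), ("Bea", "e2"), ("Bea", "e3"),
    ("Cat", "e3"),
]
projection = project_onto_P(bipartite_edges)
for (a, b), w in sorted(projection.items()):
    print(a, b, w)
```

The projected network is no longer bipartite, so a random walk on it is not forced to alternate between two node sets, and a flow-based method can be run on it.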

Results: Real world bipartite networks
Experiments on five real world bipartite networks

Table: Network sizes, where P and S are the number of primary set nodes and secondary set nodes respectively and m is the total number of edges.

Network                               P       S       m
Southern women [6]                    18      14      89
NSW Crimes [14]                       155     22      9611
Noordin Top Terrorists                74      45      276
Scientific collaboration [13]         16726   22016   58595
Australian government contracts [7]   11924   1655    70019

Table: Community numbers in P, where L is the (minimum) code length and Q is the (maximum) modularity.

                                  Infomap           Louvain
Network                           Comm.   L         Comm.   Q
Southern women                    4       3.992     2       0.352
NSW Crimes                        2       7.276     1       0.0
Noordin Top Terrorists            5       5.846     4       0.343
Australian government contracts   1114    8.340     836     0.530
Scientific collaboration          2131    6.164     1266    0.877

Case Studies
Benchmark - Southern Women
Southern women network [6] (women P and events S)
The four communities of women found in the Southern women dataset. Red nodes represent S, the events the women attended, and the four other colors represent four communities within P, with nodes labelled by first name.
The 2 women in each of the smallest communities are core members of their social networks.

NSW Crimes
[Figure: NSW crimes (LGAs P and crimes S)]

Noordin Top Terrorist Network
Noordin Top Terrorists (persons P, affiliations S)
The two small cliques we find are the Bali II (2005) bombers (Community 5) and Ring Baten members involved in bombing the Australian embassy in 2004 (Community 4).
In Community 1, 17 of the 25 members belonged to Jemaah Islamiyah, a transnational Southeast Asian militant Islamist terrorist organisation linked to Al-Qaeda.

Conclusions and Future Work
- Integration of projection with Infomap results in more valuable information about small strong communities than the high-performance modularity-based algorithm Louvain.
- Neither Louvain nor Infomap finds overlapping communities. We may merge communities found separately in P and S to do this and recover communities in the bipartite graph.
- We will compare the communities found this way with those found by modularity and message-length algorithms using e.g. NMI (normalised mutual information).
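For reference, NMI between two partitions of the same node set can be computed as in the sketch below. The talk does not say which normalisation is intended; this version assumes the common choice 2 I(A;B) / (H(A) + H(B)), and the function name `nmi` and the label lists are hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalised mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # convention: both partitions put every node in one community
    return 2 * mutual / (h_a + h_b)


# Community label of each node under three hypothetical partitions
found = [0, 0, 1, 1]
same_up_to_relabelling = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]

nmi_same = nmi(found, same_up_to_relabelling)
nmi_unrelated = nmi(found, unrelated)
print(nmi_same, nmi_unrelated)
```

NMI depends only on how nodes are grouped, not on the community labels, so two algorithms that number their communities differently can still be compared directly.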

Acknowledgements: Thanks to Serdar Boztas, Chris Bellman, Sarah Taylor and Murray Aitken. The speaker is partly supported by Department of Defence of Australia Agreement 4500743680.

References
[1] Barber, M.J.: Modularity and community detection in bipartite networks. Physical Review E 76, 066102 (2007).
[2] Barber, M.J., Clark, J.W.: Detecting network communities by propagating labels under constraints. Physical Review E 80, 026129 (2009).
[3] Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008).
[4] Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Physical Review E 70, 066111 (2004).
[5] Crampes, M., Plantie, M.: A Unified Community Detection, Visualization and Analysis method. arXiv preprint arXiv:1301.7006 (2013).
[6] Davis, A., Gardner, B.B., Gardner, M.R.: Deep South: A Social Anthropological Study of Caste and Class. University of Chicago Press, Chicago (1941).
[7] Department of Finance and Deregulation: Historical Australian Government Contract Data. Dataset [Online] (2013, February 27). Available: http://data.gov.au/dataset/historical-australian-government-contract-data/
[8] Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 36-41 (2007).
[9] Guillaume, J.-L., Latapy, M.: Bipartite graphs as models of complex networks. Physica A 371, 795-813 (2006).
[10] Guimera, R., Sales-Pardo, M., Amaral, L.A.N.: Module identification in bipartite and directed networks. Physical Review E 76, 036102 (2007).
[11] Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Physical Review E 78, 046110 (2008).
[12] Lancichinetti, A., Fortunato, S.: Community detection algorithms: A comparative analysis. Physical Review E 80, 056117 (2009).
[13] Newman, M.E.: The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98, 404-409 (2001).
[14] NSW Bureau of Crime Statistics and Research: NSW Crime data. Dataset [Online] (2008 December). Available: http://data.gov.au/dataset/nsw-crime-data/
[15] Peixoto, T.: Parsimonious module inference in large networks. Physical Review Letters 110, 148701 (2013).
[16] Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105, 1118-1123 (2008).