Community Detection in Bipartite Networks: Algorithms and Case Studies
Kathy Horadam and Taher Alzahrani
Mathematical and Geospatial Sciences, RMIT Melbourne, Australia
IWCNA 2014
Community Detection, 30/05/2014, Kathy Horadam

Outline
1 Bipartite Networks
2 Community Detection Algorithms
3 Algorithms for Bipartite Networks
4 Our Approach: Apply Infomap Algorithm to Weighted Projection
5 Case Studies
6 Conclusions and Future Work

1 Bipartite Networks

The network has two node sets, P (Primary, of most interest) and S (Secondary). Edges occur only between P and S, never between two nodes in the same set.
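
The definition above can be checked mechanically. The following is a minimal sketch, assuming the network is stored as an edge list; the function name and the toy data are hypothetical and not taken from the talk.

```python
def is_bipartite_edge_list(edges, primary, secondary):
    """Return True if every edge joins one node in `primary` to one in `secondary`."""
    primary, secondary = set(primary), set(secondary)
    if primary & secondary:
        return False  # the two node sets must be disjoint
    for u, v in edges:
        crosses = (u in primary and v in secondary) or (u in secondary and v in primary)
        if not crosses:
            return False
    return True


# Hypothetical toy data: three people (P) and two events (S).
P = {"Ann", "Bea", "Cat"}
S = {"e1", "e2"}
edges = [("Ann", "e1"), ("Bea", "e1"), ("Bea", "e2"), ("Cat", "e2")]

valid = is_bipartite_edge_list(edges, P, S)
broken = is_bipartite_edge_list(edges + [("Ann", "Bea")], P, S)  # adds a P-P edge
print(valid, broken)
```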

Why do we care about bipartiteness?

Many real-world networks are naturally bipartite:
- actors and events (in social networks)
- authors and papers (in collaboration networks)
- trains and railway stations (in railway networks)
- companies and goods (in financial networks)

Guillaume & Latapy (2006) [9] argue that any complex network can be viewed as a bipartite network through decomposition, and propose a random bipartite graph model (the configuration model).

2 Community Detection Algorithms

Types of algorithms

Aim of community detection algorithms: to derive a coarse-grained depiction of large-scale real networks. A vast number of community detection algorithms are available.

Two algorithmic approaches:
- Structural: examine the strength of within-community connectivity versus between-community connectivity.
  Based on an underlying stochastic model of network formation; e.g. measured by conductance, modularity, link density.
- Flow-based: examine flows across the network, from which structure emerges as communities that are visited more frequently within than they are jumped from.
  Based on how the underlying structure constrains flow across the network; e.g. measured by random walks + teleportation, spectral methods.

Performance of algorithms: performance on benchmark networks

Is there a "best" algorithm?

Public benchmark graph data (Lancichinetti et al. [11]), which offers:
- the ability to test larger networks of 10^3 to 10^5 nodes
- power-law distributions for node degree and community size
- overlapping communities
- directed and weighted graphs

The benchmark compares the partition found by the tested algorithm against the actual ("planted") partition.
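
One common way to score a detected partition against a planted one is normalised mutual information (NMI), which the conclusions of this talk also name as a comparison tool. The sketch below assumes the arithmetic-mean normalisation 2 I(A;B) / (H(A) + H(B)) for non-overlapping partitions; whether the benchmark study uses exactly this variant is an assumption, and the function name and label lists are hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalised mutual information 2*I(A;B) / (H(A) + H(B)) of two partitions.

    labels_a[i] and labels_b[i] are the community labels of node i in each partition.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("partitions must label the same non-empty node list")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in one community
    mutual_info = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return 2 * mutual_info / (h_a + h_b)


planted = [0, 0, 0, 1, 1, 1]
same_up_to_relabelling = ["x", "x", "x", "y", "y", "y"]
unrelated = nmi([0, 1, 0, 1], [0, 0, 1, 1])
print(nmi(planted, same_up_to_relabelling), unrelated)
```

NMI depends only on how nodes are grouped, not on the label names, which is why the relabelled partition scores as a perfect match.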

Comparison of performance of 12 algorithms [12]

The authors conclude that "Infomap", "Louvain" and a Potts model method are the best-performing algorithms on these benchmarks.

Infomap and Louvain are also very fast, linear in network size, so they were further tested on benchmark graphs with:
- 50,000 and 100,000 nodes
- a range of community sizes, from 20 to 1,000 nodes
- maximum degree 200

The performance of Louvain is worse than on smaller graphs, whereas that of Infomap is stable.

Modularity-based algorithms: CNM and Louvain

CNM (Clauset-Newman-Moore) algorithm [4]: optimises a quality function, the modularity Q (of a partition M of a graph with N vertices). OK for graphs with up to N = 10^6 vertices.

Problem: the resolution limit of modularity [8]. Maximisation of Q can fail to identify communities whose edge number is small relative to the total edge number E (even when they are cliques); e.g. if m >> p, a joined pair of cliques has higher modularity than the cliques themselves. So it may not find important small communities. There have been various attempts to overcome this.
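
The resolution limit can be seen numerically. The sketch below does not reproduce the m >> p example from the slide; it uses a different standard illustration, a ring of identical cliques, and assumes the usual formula Q = sum over communities of [ l_c / E - (d_c / 2E)^2 ], where l_c is the number of internal edges and d_c the total degree of community c. The function names and the choice of 30 cliques of 5 nodes are illustrative assumptions.

```python
def modularity(edges, community_of):
    """Modularity Q = sum_c [ l_c / E - (d_c / (2E))^2 ] of an undirected, unweighted graph."""
    total_edges = len(edges)
    internal = {}    # l_c: edges with both ends in community c
    degree_sum = {}  # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / total_edges - (d / (2 * total_edges)) ** 2
        for c, d in degree_sum.items()
    )


def ring_of_cliques(num_cliques, clique_size):
    """Cliques arranged in a ring, each joined to the next by a single edge."""
    edges = []
    for c in range(num_cliques):
        nodes = [(c, i) for i in range(clique_size)]
        edges += [
            (nodes[i], nodes[j])
            for i in range(clique_size)
            for j in range(i + 1, clique_size)
        ]
        # one edge links this clique to the next one around the ring
        edges.append(((c, 0), ((c + 1) % num_cliques, 1)))
    return edges


edges = ring_of_cliques(num_cliques=30, clique_size=5)
nodes = {node for edge in edges for node in edge}
each_clique_alone = {node: node[0] for node in nodes}
cliques_in_pairs = {node: node[0] // 2 for node in nodes}

q_single = modularity(edges, each_clique_alone)
q_pairs = modularity(edges, cliques_in_pairs)
print(q_single, q_pairs)
```

Under these assumptions the partition that merges neighbouring cliques in pairs scores a higher Q than the partition that keeps each clique separate, even though the individual cliques are the natural communities.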

"Louvain" (Blondel et al., 2008 [3])

Fast unfolding algorithm. Multistep technique: local optimisation of Q in the neighbourhood of each node.

Visit-frequency algorithm: "Infomap" (Rosvall and Bergstrom, 2008 [16]), Minimum Description Length

- Minimises a different quality function from Q: the "map equation" L of a partition M of the graph.
- L is the average description length of the binary codewords describing a single step of a random walk on the graph.
- L is a sum of weighted entropies.
- Exploits the duality between compressing information that represents a flow on the network and detecting significant structures in the network.
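
To make "a sum of weighted entropies" concrete, here is a minimal sketch of the two-level map equation, L(M) = q H(Q) + sum over modules i of p_i H(P_i), where q is the total rate of module exits, H(Q) the entropy of the exit rates, p_i the rate of using module i's codebook and H(P_i) the entropy of that codebook. It assumes an undirected, unweighted, connected graph, so node visit rates are degree / 2E and no teleportation is needed. This only evaluates L for a given partition; it is not the Infomap search itself, and the function names and toy graph are hypothetical.

```python
from math import log2


def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level map equation L(M), in bits per step, for an undirected unweighted graph."""
    two_e = 2 * len(edges)
    visit = {}      # stationary visit rate of each node: degree / 2E
    exit_rate = {}  # rate at which the walker leaves each module
    for u, v in edges:
        visit[u] = visit.get(u, 0.0) + 1 / two_e
        visit[v] = visit.get(v, 0.0) + 1 / two_e
        if module_of[u] != module_of[v]:
            for node in (u, v):
                mod = module_of[node]
                exit_rate[mod] = exit_rate.get(mod, 0.0) + 1 / two_e

    # Index codebook term: q * H(Q); zero when the walker never changes module.
    q_total = sum(exit_rate.values())
    index_term = 0.0
    if q_total > 0:
        index_term = -q_total * sum(plogp(q / q_total) for q in exit_rate.values())

    # Module codebook terms: p_i * H(P_i), over the exit code and the node codes.
    module_term = 0.0
    for mod in set(module_of.values()):
        rates = [exit_rate.get(mod, 0.0)]
        rates += [p for node, p in visit.items() if module_of[node] == mod]
        usage = sum(rates)
        if usage > 0:
            module_term -= usage * sum(plogp(r / usage) for r in rates)
    return index_term + module_term


# Hypothetical toy graph: two triangles joined by one bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = {n: "all" for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

l_one = map_equation(edges, one_module)
l_two = map_equation(edges, two_modules)
print(l_one, l_two)
```

On this toy graph the two-triangle partition gives the shorter description length, which is the sense in which compressing the walk reveals the structure.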

[Figure: Infomap (Rosvall and Bergstrom, 2008 [16])]

3 Algorithms for Bipartite Networks

Structural algorithms for bipartite networks

- Different modularity-based algorithms have been developed by Barber [1], Guimera et al. [10] and Crampes and Plantie [5].
- Label Propagation Algorithm for bipartite networks (LPAb) [2].

Table: Numbers of communities of women (P) found in the benchmark Southern women network.

Algorithm                 Optimised      Network               Modules
Guimera [10]              modularity     weighted projection   2
Crampes and Plantie [5]   bimodularity   bipartite             3
Barber [1]                bimodularity   bipartite             4
LPAb(+)                   bimodularity   bipartite             4

Minimum message length algorithms

Problem: Infomap can't be applied to a bipartite network

Example: if you start in one set of a bipartite network, you will always be back in that set after an even number of steps, so the probability of being at any particular vertex of that set is zero at odd time steps.

- In terms of Markov chains, the random walk on a bipartite graph is periodic.
- The random walk has a stationary distribution on a bipartite graph, but it won't converge to it.
- Thus we cannot apply Infomap directly to a bipartite graph, because of this periodicity.
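
The periodicity described above is easy to observe on a small example. The following sketch runs the simple random walk on a hypothetical five-node bipartite graph (the node names and the graph are illustrative assumptions) and records how much probability mass sits on P after each step.

```python
def step(distribution, neighbours):
    """One step of the simple random walk: each node spreads its mass evenly over its neighbours."""
    nxt = {node: 0.0 for node in neighbours}
    for node, mass in distribution.items():
        for nb in neighbours[node]:
            nxt[nb] += mass / len(neighbours[node])
    return nxt


# Hypothetical toy bipartite graph: P = {p1, p2}, S = {s1, s2, s3}.
neighbours = {
    "p1": ["s1", "s2"],
    "p2": ["s2", "s3"],
    "s1": ["p1"],
    "s2": ["p1", "p2"],
    "s3": ["p2"],
}
P = {"p1", "p2"}

# Start the walker at p1 with probability 1.
dist = {node: 0.0 for node in neighbours}
dist["p1"] = 1.0

mass_on_P = []
for t in range(1, 9):
    dist = step(dist, neighbours)
    mass_on_P.append(sum(dist[node] for node in P))
print(mass_on_P)  # mass on P alternates: none at odd steps, all of it at even steps
```

Because the mass on P keeps switching between 0 and 1, the distribution never settles, which is the failure of convergence the slide refers to.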

A different MDL algorithm CAN be applied directly to bipartite networks

Inference using MDL on a stochastic blockmodel (Peixoto 2013 [15]). BUT:
1 There is again a theoretical resolution limit, tied to the network size N, on the communities that can be detected.
2 The algorithm has only been tested on small networks.

However, the inferred communities fully reflect the bipartite nature: the communities partition P and S separately.

This supports our decision to apply Infomap to the weighted networks projected from P (or from S).

4 Our Approach: Apply Infomap Algorithm to Weighted Projection

Projection based on common neighbors

Method: multiple links, weighted self-connections
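
The slide gives only the headline of the method, so the following is one plausible reading rather than the authors' confirmed procedure. It assumes that each secondary node shared by two primary nodes contributes one link between them (so multiple links become an integer weight equal to the number of common neighbours), and that each primary node receives a self-connection weighted by its own number of secondary neighbours. The self-connection rule, the function name and the toy data are all assumptions.

```python
from itertools import combinations


def weighted_projection(neighbours_in_S):
    """Project a bipartite network onto its primary set P.

    neighbours_in_S maps each P node to the set of S nodes it is linked to.
    Returns {(u, v): weight} with u <= v; entries (u, u) are self-connections.
    """
    weights = {}
    for u, s_nodes in neighbours_in_S.items():
        # ASSUMPTION: self-connection weight = number of S neighbours of u.
        weights[(u, u)] = len(s_nodes)
    for u, v in combinations(sorted(neighbours_in_S), 2):
        # Link weight = number of S nodes that u and v have in common.
        common = len(neighbours_in_S[u] & neighbours_in_S[v])
        if common:
            weights[(u, v)] = common
    return weights


# Hypothetical toy data: three people (P) and the events (S) they attended.
attendance = {
    "Ann": {"e1", "e2"},
    "Bea": {"e1", "e2", "e3"},
    "Cat": {"e3"},
}
projection = weighted_projection(attendance)
print(projection)
```

A walk on a connected projected graph with self-connections is no longer forced to alternate between two node sets, so the periodicity problem of the bipartite graph does not arise there.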

Results: real-world bipartite networks

Experiments on five real-world bipartite networks

Table: Network sizes, where P and S are the numbers of primary-set and secondary-set nodes respectively, and m is the total number of edges.

Network                               P       S       m
Southern women [6]                    18      14      89
NSW Crimes [14]                       155     22      9611
Noordin Top Terrorists                74      45      276
Scientific collaboration [13]         16726   22016   58595
Australian government contracts [7]   11924   1655    70019

Table: Numbers of communities found in P, where L is the (minimum) code length and Q is the (maximum) modularity.

                           Infomap           Louvain
Network                    Comm.   L         Comm.   Q
Southern women             4       3.992     2       0.352
NSW crime                  2       7.276     1       0.0
Noordin Top Terrorists     5       5.846     4       0.343
Australian government      1114    8.340     836     0.530
Scientific collaboration   2131    6.164     1266    0.877

5 Case Studies

Benchmark: Southern women network [6] (women P and events S)

[Figure: The four communities of women found in the Southern women dataset. Red nodes represent S, the events the women attended, and the four other colours represent the four communities within P, with nodes labelled by first name.]

The 2 women in each of the smallest communities are core members of their social networks.

NSW crimes: Local Government Areas (LGAs) as P, crimes as S

Noordin Top Terrorists (persons P, affiliations S)

- The two small cliques we find are the Bali II (2005) bombers (Community 5) and the Ring Banten members involved in bombing the Australian embassy in 2004 (Community 4).
- In Community 1, 17 of the 25 members belonged to Jemaah Islamiyah, a transnational Southeast Asian militant Islamist terrorist organisation linked to Al-Qaeda.

6 Conclusions and Future Work

- Integrating projection with Infomap yields more valuable information about small, strong communities than the high-performance modularity-based algorithm Louvain.
- Neither Louvain nor Infomap finds overlapping communities. We may merge communities found separately in P and S to do this and recover communities in the bipartite graph.
- We will compare the communities found this way with those found by modularity and message-length algorithms, using e.g. NMI (normalised mutual information).

Acknowledgements: Thanks to Serdar Boztas, Chris Bellman, Sarah Taylor and Murray Aitken. The speaker is partly supported by Department of Defence of Australia Agreement 4500743680.

References

[1] Barber, M.J.: Modularity and community detection in bipartite networks. Physical Review E 76, 066102 (2007).
[2] Barber, M.J., Clark, J.W.: Detecting network communities by propagating labels under constraints. Physical Review E 80, 026129 (2009).
[3] Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, P10008 (2008).
[4] Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Physical Review E 70, 066111 (2004).
[5] Crampes, M., Plantie, M.: A unified community detection, visualization and analysis method. arXiv preprint arXiv:1301.7006 (2013).
[6] Davis, A., Gardner, B.B., Gardner, M.R.: Deep South: A Social Anthropological Study of Caste and Class. University of Chicago Press, Chicago (1941).
[7] Department of Finance and Deregulation: Historical Australian Government Contract Data. Dataset [Online] (2013, February 27). Available: http://data.gov.au/dataset/historical-australian-government-contract-data/
[8] Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 36-41 (2007).
[9] Guillaume, J.-L., Latapy, M.: Bipartite graphs as models of complex networks. Physica A 371, 795-813 (2006).
[10] Guimera, R., Sales-Pardo, M., Amaral, L.A.N.: Module identification in bipartite and directed networks. Physical Review E 76, 036102 (2007).
[11] Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Physical Review E 78, 046110 (2008).
[12] Lancichinetti, A., Fortunato, S.: Community detection algorithms: A comparative analysis. Physical Review E 80, 056117 (2009).
[13] Newman, M.E.: The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences 98, 404-409 (2001).
[14] NSW Bureau of Crime Statistics and Research: NSW Crime data. Dataset [Online] (2008, December). Available: http://data.gov.au/dataset/nsw-crime-data/
[15] Peixoto, T.: Parsimonious module inference in large networks. Physical Review Letters 110, 148701 (2013).
[16] Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105, 1118-1123 (2008).