Social Data Management Communities

Similar documents
INTRODUCTION SECTION 9.1

CAIM: Cerca i Anàlisi d Informació Massiva

Basics of Network Analysis

Community detection. Leonid E. Zhukov

NETWORK SCIENCE COMMUNITIES

Modularity CMSC 858L

A new Pre-processing Strategy for Improving Community Detection Algorithms

Non Overlapping Communities

Online Social Networks and Media. Community detection

Community Detection. Community

Crawling and Detecting Community Structure in Online Social Networks using Local Information

Using Stable Communities for Maximizing Modularity

Community Detection: Comparison of State of the Art Algorithms

Clusters and Communities

EXTREMAL OPTIMIZATION AND NETWORK COMMUNITY STRUCTURE

Community Detection in Bipartite Networks:

Network community detection with edge classifiers trained on LFR graphs

2007 by authors and 2007 World Scientific Publishing Company

Expected Nodes: a quality function for the detection of link communities

Community Structure and Beyond

Oh Pott, Oh Pott! or how to detect community structure in complex networks

On community detection in very large networks

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Betweenness Measures and Graph Partitioning

Community detection using boundary nodes in complex networks

Discovery of Community Structure in Complex Networks Based on Resistance Distance and Center Nodes

CUT: Community Update and Tracking in Dynamic Social Networks

Community Detection in Social Networks

Keywords: dynamic Social Network, Community detection, Centrality measures, Modularity function.

A Simple Acceleration Method for the Louvain Algorithm

Social-Network Graphs

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

Generalized Modularity for Community Detection

Clustering CS 550: Machine Learning

Web Structure Mining Community Detection and Evaluation

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Mining Social Network Graphs

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

Cluster Analysis: Agglomerate Hierarchical Clustering

Community Detection Algorithm based on Centrality and Node Closeness in Scale-Free Networks

Network Science: Principles and Applications

Clustering Algorithms on Graphs Community Detection 6CCS3WSN-7CCSMWAL

Maximizing edge-ratio is NP-complete

Community Detection in Directed Weighted Function-call Networks

Spectral Methods for Network Community Detection and Graph Partitioning

My favorite application using eigenvalues: partitioning and community detection in social networks

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

Finding and Evaluating Fuzzy Clusters in Networks

An Efficient Algorithm for Community Detection in Complex Networks

Community structures evaluation in complex networks: A descriptive approach

Community Detection by Affinity Propagation

Hierarchical Clustering

MCL. (and other clustering algorithms) 858L

Structure of biological networks. Presentation by Atanas Kamburov

Uncertain Data Management Non-relational models: Graphs

A New Vertex Similarity Metric for Community Discovery: A Local Flow Model

Overlapping Communities

Approximation of the Maximal α Consensus Local Community detection problem in Complex Networks

Social and Technological Network Analysis. Lecture 4: Community Detec=on and Overlapping Communi=es. Prof. Cecilia Mascolo

Overview Of Various Overlapping Community Detection Approaches

Persistent Homology in Complex Network Analysis

Hierarchical Clustering

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

Detection of Communities and Bridges in Weighted Networks

TELCOM2125: Network Science and Analysis

Understanding complex networks with community-finding algorithms

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection

Chapter 2 Detecting the Overlapping and Hierarchical Community Structure in Networks 2.1 Introduction

Graph Coarsening and Clustering on the GPU

Statistical Physics of Community Detection

LICOD: Leaders Identification for Community Detection in Complex Networks

What is Clustering? Clustering. Characterizing Cluster Methods. Clusters. Cluster Validity. Basic Clustering Methodology

Hierarchical Overlapping Community Discovery Algorithm Based on Node purity

Collaborative Filtering based on Dynamic Community Detection

Review on Different Methods of Community Structure of a Complex Software Network

Graph Theory S 1 I 2 I 1 S 2 I 1 I 2

Detecting Community Structure for Undirected Big Graphs Based on Random Walks

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

MULTI-SCALE COMMUNITY DETECTION USING STABILITY AS OPTIMISATION CRITERION IN A GREEDY ALGORITHM

Hierarchical Clustering 4/5/17

Dynamic Clustering in Social Networks using Louvain and Infomap Method

V4 Matrix algorithms and graph partitioning

Introduction to network metrics

On the Approximability of Modularity Clustering

CS224W Final Report: Study of Communities in Bitcoin Network

Research on Community Structure in Bus Transport Networks

Clustering on networks by modularity maximization

Extracting Communities from Networks

Communities and Balance in Signed Networks: A Spectral Approach

Extracting Information from Complex Networks

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

Netplexity The Complexity of Interactions in the Real World

Edge Weight Method for Community Detection in Scale-Free Networks

Cluster Analysis. Ying Shen, SSE, Tongji University

Understanding Topological Mesoscale Features in Community Mining

arxiv: v2 [cs.si] 22 Mar 2013

Applications. Foreground / background segmentation Finding skin-colored regions. Finding the moving objects. Intelligent scissors

Lesson 3. Prof. Enza Messina

Transcription:

Social Data Management Communities Antoine Amarilli 1, Silviu Maniu 2 January 9th, 2018 1 Télécom ParisTech 2 Université Paris-Sud 1/20

Table of contents Communities in Graphs 2/20

Graph Communities Communities in the Belgium call graph 3/20

Graph Communities Communities in Zachary s Karate club 4/20

Detecting Communities Hypothesis 1 Hypothesis A graph s community structure is uniquely encoded in its topology. 5/20

Detecting Communities Hypothesis 2 Hypothesis A community is a locally dense connected subgraph in a network. 6/20

Detecting Communities A few approaches: 1. Maximum Cliques: a community is a subgraph whose nodes are all connected to each other. 2. Strong Communities: relaxation of cliques, depending on the internal degree (number of neighbors in the community) vs. external degree (neighbours outside of the community) a strong community has a greater internal degree than external degree. 7/20

Detecting Communities A naïve algorithm for detecting 2 communities: 1. divide the graph in two (find a cut) and decide if they are strong communities, and 2. choose the best cut over all possible cuts. This can generalize to more communities, but it needs to generate an exponential number of cuts. 8/20

Polynomial Algorithms We need polynomial algorithms to be able to detect efficiently the communities in a graph. One approach is hierarchical clustering: uses a similarity matrix X, where x ij encodes the similarity between nodes i and j, and based on this, agglomerative algorithms merge nodes into the same community, while divisive algorithms isolate communities by removing low similarity links. 9/20

Agglomerative Algorithm Ravasz 1. Define the similarity matrix: various ways, but the algorithm uses the topological overlap matrix, encoding the number of common neighbors over the maximum possible. 2. Define group similarity: computed as the average cluster similarity the average of x ij over all node pairs 3. Apply the Hierarchical Clustering: 3.1 assign each node to a community of their own, 3.2 find the community pairs with highest similarity and merge them, 3.3 compute the similarity between all communities 3.4 repeat until only one community exists. 4. The community structure will be encoded in the Dendogram, showing the order in which communities were merged (see next slide). 10/20

Agglomerative Algorithm Ravasz 11/20

Divisive Algorithm Girvan-Newman Divisive algorithms remove edges: 1. Define centrality: x ij needs to select nodes in different communities, e.g., betweenness. 2. Apply the Hierarchical Clustering: 2.1 remove the link with the largest centrality 2.2 recompute the centrality of all other links 2.3 repeat until no links exist 3. The community structure will be encoded in the Dendogram, showing the order in which edges were removed (see next slide). 12/20

Divisive Algorithm Girvan-Newman 13/20

Detecting Communities Hypothesis 3 Hypothesis Random networks lack a community structure. 14/20

Modularity Consider a graph having some partition into communities C having L C links. If L C is greater than the number of links expected by a random wiring having the same degree distribution, then it is a potential community. This is measured by the modularity: M C = 1 2L (i,j) C (A ij p ij ), where p ij can be computed by randomizing the original network, e.g.,: p ij = k ik j 2L. For the entire graph: we sum the modularities over all communities. 15/20

Modularity Examples 16/20

Modularity Algorithm Hypothesis The partition with maximum modularity corresponds to the optimal community structure. 17/20

Modularity Algorithm For computation efficiency concerns, all algorithms use a greedy approach: 1. Assign each node to its own community. 2. Inspect each community pair connected by at least one link and merge the ones having the highest increase in modularity M for the whole network. 3. Repeat until all nodes are in a single community. 4. Choose the partition with the highest modularity. 18/20

Community Detection Algorithms Complexity algorithm type complexity Ravazs agglomerative O(N 2 ) Girvan Newman divisive O(N 3 ) greedy optimized modularity O(N log 2 N) Louvain modularity O(L) Infomap flow O(N log N) 19/20

Open Issues in Community Detection Do communities really exist?: given a network, do we know it is always organized in communities? Are the hypotheses valid?: is a community only identified by its wiring diagram? Does everybody belong to a community? How do we know which measure is the valid one?: centrality, similarity, modularity, flow, etc. 20/20

Acknowledgments Figures in slides 3, 4, 11, 13, and 16 taken from the book Network Science by A.-L. Barabási. The contents is partly inspired by the flow of Chapter 9 of the same book. http://barabasi.com/networksciencebook/

References i Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10). Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821 7826. Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E, 69.

References ii Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., and Barabási, A.-L. (2002). Hierarchical organization of modularity in metabolic networks. Science, 297(5586):1551 1555.