Non Overlapping Communities


Non Overlapping Communities
Davide Mottin, Konstantina Lazaridou
Hasso Plattner Institute
Graph Mining course, Winter Semester 2016

Acknowledgements
Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides
Some other material: https://www.ismll.uni-hildesheim.de/lehre/cmie-11w/script/lecture5.pdf

Lecture road
1. Introduction to graph clustering
2. Hierarchical approaches
3. Spectral clustering

Networks & Communities
We often think of networks as being organized into modules, clusters, or communities.

Goal: Find Densely Linked Clusters

Micro-Markets in Sponsored Search
Find micro-markets by partitioning the query-to-advertiser graph.
[Figure: query-to-advertiser graph; axes: query, advertiser]
Andersen, R. and Lang, K.J. Communities from seed sets. WWW, 2006.

Movies and Actors
Clusters in the movies-to-actors graph.
Andersen, R. and Lang, K.J. Communities from seed sets. WWW, 2006.

Twitter & Facebook
Discovering social circles, circles of trust.
McAuley, J. and Leskovec, J. Discovering social circles in ego networks. TKDD, 2014.


Community Detection
How to find communities?
We will work with undirected (unweighted) networks.

Method 1: Strength of Weak Ties
Edge betweenness: the number of shortest paths passing over the edge.
Intuition: [Figure: example edges with betweenness b=16 and b=7.5]
[Figure panels: edge strengths (call volume) in a real network; edge betweenness in a real network]

Method 1: Girvan-Newman [Girvan-Newman 02]
Divisive hierarchical clustering based on the notion of edge betweenness: the number of shortest paths passing through the edge.
Girvan-Newman algorithm (undirected, unweighted networks):
Repeat until no edges are left:
-- Calculate the betweenness of the edges
-- Remove the edges with the highest betweenness
-- Recompute the betweenness
Connected components are communities.
Gives a hierarchical decomposition of the network.

Girvan-Newman: Example
[Figure: example network with edge betweenness values 1, 12, 33, 49]
Need to recompute betweenness at every step.

Girvan-Newman: Example
[Figure: Step 1, Step 2, Step 3, and the resulting hierarchical network decomposition]

Girvan-Newman: Results
Communities in physics collaborations.

Girvan-Newman: Results
Zachary's Karate Club: hierarchical decomposition.

We need to resolve 2 questions
1. How to compute betweenness?
2. How to select the number of clusters?

How to Compute Betweenness?
We want to compute the betweenness of paths starting at node A.
Breadth-first search starting from A: [Figure: BFS tree rooted at A with levels 0, 1, 2, 3, 4]

Betweenness centrality
Betweenness is a measure of the brokerage power of a node: the higher the betweenness, the greater the node's capacity to act as a broker and spread a particular piece of information.
It also indicates how easily a graph can be disconnected.
$$\mathrm{betweenness}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}(v)$ is the number of shortest paths from $s$ to $t$ that include $v$, and $\sigma_{st}$ is the number of shortest paths from $s$ to $t$.
Requires all the shortest paths: in the worst case, a shortest-path computation between all $O(n^2)$ pairs of nodes.
This version is defined for nodes.

How to Compute Betweenness?
Count the number of shortest paths from A to all other nodes of the network.
This can be done with a variant of the Dijkstra or Floyd-Warshall algorithms.
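For unweighted graphs, the counting step can be done with a plain BFS instead of Dijkstra. The following is a minimal sketch under that assumption; the function name `count_shortest_paths` and the five-node example graph are hypothetical and not taken from the slides.

```python
from collections import deque

def count_shortest_paths(adj, source):
    """Return (dist, sigma): BFS level of each node and the number of
    shortest paths from `source` to it, for an unweighted graph."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit: v sits one level below u
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u is a BFS parent of v
                sigma[v] += sigma[u]     # every shortest path to u extends to v
    return dist, sigma

# Hypothetical example: a 4-cycle A-B-D-C-A with a pendant node E attached to D.
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
dist, sigma = count_shortest_paths(adj, "A")
print(sigma)  # D and E are each reached by two shortest paths (via B and via C)
```

A node's count is the sum of its parents' counts, which is why the counts are final by the time the node leaves the queue.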

How to Compute Betweenness?
$$\sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
Compute betweenness by working up the BFS tree. If there are multiple shortest paths, count them fractionally.
The algorithm:
-- Add edge flows: node flow = 1 + the flows on its child edges
-- Split the flow among the parent edges based on the parent values (the number of shortest paths reaching each parent)
-- Repeat the BFS procedure for each starting node U
[Figure annotations: 1 path to K, split evenly; 1+0.5 paths to J, split 1:2; 1+1 paths to H, split evenly]


Summary of Betweenness Computation
1. Build one BFS structure for each node.
2. Determine the flow values for each edge using the previous procedure.
3. Sum the flow values of each edge over all BFS structures to get its betweenness value. Note: the flow between each pair of nodes X and Y is counted twice (once in the BFS from X and once in the BFS from Y), so at the end we divide everything by two.
4. Use these betweenness values to identify the edges of highest betweenness, which are then removed in the Girvan-Newman method.
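The four steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the lecture's reference implementation: the function names, the `frozenset` edge keys, and the example graph (two triangles joined by a bridge) are assumptions. For simplicity it removes one highest-betweenness edge at a time and stops at the first split rather than building the full hierarchy.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {node: set(neighbors)}."""
    bet = {}
    for s in adj:
        # Step 1: BFS from s, recording levels, shortest-path counts and parents.
        dist, sigma, parents, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], parents[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    parents[v].append(u)
        # Step 2: work up the tree, deepest nodes first.
        flow = {v: 1.0 for v in order}       # node flow = 1 + flows of child edges
        for v in reversed(order):
            for p in parents[v]:
                share = flow[v] * sigma[p] / sigma[v]   # split by parent path counts
                edge = frozenset((p, v))
                bet[edge] = bet.get(edge, 0.0) + share  # Step 3: sum over all BFS runs
                flow[p] += share
    # Each pair (X, Y) was counted from both endpoints, so halve the totals.
    return {e: b / 2 for e, b in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Step 4: remove highest-betweenness edges until the graph splits further."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)               # recompute after every removal
        if not bet:                               # no edges left
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by the bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
bet = edge_betweenness(graph)
communities = girvan_newman_split(graph)
```

In this example the bridge carries every shortest path between the two triangles, so it has the highest betweenness and is removed first. Applying `girvan_newman_split` recursively to each resulting component would give the hierarchical decomposition.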


Network Communities
Communities: sets of tightly connected nodes.
Define: modularity $Q$, a measure of how well a network is partitioned into communities.
Given a partitioning of the network into groups $s \in S$:
$$Q \propto \sum_{s \in S} \big[ (\text{\# edges within group } s) - (\text{expected \# edges within group } s) \big]$$
Need a null model!

Null Model: Configuration Model
Given a real graph $G$ on $n$ nodes and $m$ edges, construct a rewired network $G'$:
-- Same degree distribution but random connections
-- Consider $G'$ as a multigraph (multiple edges allowed for the same pair of nodes)
Question: what is the expected number of edges between two nodes? Imagine splitting each edge into two half-edges attached to its endpoint nodes.
The expected number of edges between nodes $i$ and $j$ of degrees $k_i$ and $k_j$ equals:
$$k_i \cdot \frac{k_j}{2m} = \frac{k_i k_j}{2m}$$
The expected number of edges in the (multigraph) $G'$:
$$\frac{1}{2} \sum_{i \in N} \sum_{j \in N} \frac{k_i k_j}{2m} = \frac{1}{2} \cdot \frac{1}{2m} \sum_{i \in N} k_i \sum_{j \in N} k_j = \frac{1}{4m} \cdot 2m \cdot 2m = m$$
Note: $\sum_{i \in N} k_i = 2m$.
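As a quick numeric check of the derivation above, the sketch below sums $k_i k_j / 2m$ over all ordered pairs and halves the result. The degree sequence is a hypothetical example, and `expected_edges` is an illustrative helper name.

```python
def expected_edges(k_i, k_j, m):
    """Expected number of edges between two nodes in the configuration model."""
    return k_i * k_j / (2 * m)

# Hypothetical degree sequence; the degrees sum to 2m.
degrees = [3, 2, 2, 2, 1]
m = sum(degrees) // 2

# Summing over all ordered pairs counts every pair twice, hence the factor 1/2.
total = 0.5 * sum(expected_edges(ki, kj, m) for ki in degrees for kj in degrees)
print(m, total)
```

The total matches $m$ up to floating-point rounding, so the null model preserves the number of edges as well as the degrees.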

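Modularity
Modularity of a partitioning $S$ of graph $G$:
$$Q \propto \sum_{s \in S} \big[ (\text{\# edges within group } s) - (\text{expected \# edges within group } s) \big]$$
$$Q(G, S) = \frac{1}{2m} \sum_{s \in S} \sum_{i \in s} \sum_{j \in s} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$$
Here $A$ is the adjacency matrix and $\frac{1}{2m}$ is the normalizing constant, which gives $-1 < Q < 1$.
Modularity values lie in the range $[-1, 1]$.
It is positive if the number of edges within groups exceeds the expected number.
$0.3 < Q < 0.7$ means significant community structure.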
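The formula for $Q(G, S)$ translates almost literally into code. The sketch below assumes a simple undirected graph without self-loops given as an edge list; the function name `modularity` and the example graph (two triangles joined by one edge) are illustrative choices, not part of the lecture.

```python
def modularity(edges, communities):
    """Q(G, S) = 1/(2m) * sum over groups s and i, j in s of (A_ij - k_i*k_j/(2m))."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    edge_set = {frozenset(e) for e in edges}
    q = 0.0
    for group in communities:
        for i in group:
            for j in group:
                # A_ij is 1 for an edge and 0 otherwise (including i == j).
                a_ij = 1 if frozenset((i, j)) in edge_set else 0
                q += a_ij - degree.get(i, 0) * degree.get(j, 0) / (2 * m)
    return q / (2 * m)

# Hypothetical example: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
q_two_groups = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_one_group = modularity(edges, [{1, 2, 3, 4, 5, 6}])
print(q_two_groups, q_one_group)
```

Splitting along the two triangles gives $Q = 5/14 \approx 0.36$, while putting every node into a single group gives $Q = 0$, because the observed and expected edge counts then coincide.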

Reasoning about modularity
Can modularity be equal to 1? Answer: NO! Range: $[-\frac{1}{2}, 1)$.
What is the modularity of two completely connected components of $n/2$ nodes each?
Can you find an example of a network partitioning with modularity < 0?

Modularity: Number of clusters
Modularity is useful for selecting the number of clusters.
[Figure: plot of modularity Q]
Why not optimize modularity directly?


Graph Partitioning
Undirected graph $G(V, E)$. [Figure: example graph on nodes 1 to 6]
Bi-partitioning task: divide the vertices into two disjoint groups $A$, $B$. [Figure: A = {1, 2, 3}, B = {4, 5, 6}]
Questions:
-- How can we define a good partition of $G$?
-- How can we efficiently identify such a partition?

Graph Partitioning
What makes a good partition?
-- Maximize the number of within-group connections
-- Minimize the number of between-group connections
[Figure: example graph on nodes 1 to 6 split into groups A and B]

Graph Cuts
Express partitioning objectives as a function of the edge cut of the partition.
Cut: the set of edges with only one endpoint in a group.
[Figure: example graph on nodes 1 to 6 split into groups A and B, with cut(A, B) = 2]
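Counting the cut edges for a given bi-partition takes one pass over the edge list. The sketch below is illustrative: the edge list is an assumption, chosen so that the groups {1, 2, 3} and {4, 5, 6} are separated by two edges as in the slide's example (the figure's actual edges do not survive in the transcript), and `cut_size` is a hypothetical helper name.

```python
def cut_size(edges, group_a, group_b):
    """Number of edges with one endpoint in group_a and the other in group_b."""
    a, b = set(group_a), set(group_b)
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

# Assumed edge list for the 6-node example graph.
edges = [(1, 2), (1, 3), (2, 3), (1, 5), (3, 4), (4, 5), (4, 6), (5, 6)]
cut_ab = cut_size(edges, {1, 2, 3}, {4, 5, 6})
print(cut_ab)  # the crossing edges are (1, 5) and (3, 4)
```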

Graph Cut Criterion
Criterion: Minimum-cut. Minimize the weight of connections between groups:
$$\arg\min_{A,B} \mathrm{cut}(A, B)$$
Degenerate case: [Figure: a graph in which the minimum cut differs from the optimal cut]
Problem:
-- Only considers external cluster connections
-- Does not consider internal cluster connectivity

Graph Cut Criteria
Criterion: Normalized-cut. Connectivity between groups relative to the density of each group.
$\mathrm{vol}(A)$: total weight of the edges with at least one endpoint in $A$:
$$\mathrm{vol}(A) = \sum_{i \in A} k_i$$
Why use this criterion? It produces more balanced partitions.
How do we efficiently find a good partition? Problem: computing the optimal cut is NP-hard.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. TPAMI, 22(8), 888-905.
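The normalized-cut formula itself does not survive in the slide text. Shi and Malik define it as $\mathrm{ncut}(A, B) = \mathrm{cut}(A, B)/\mathrm{vol}(A) + \mathrm{cut}(A, B)/\mathrm{vol}(B)$, and the sketch below evaluates that definition. The edge list is an assumed 6-node example graph and the function names are hypothetical.

```python
def cut_size(edges, group_a, group_b):
    """Number of edges with one endpoint in each group."""
    a, b = set(group_a), set(group_b)
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))

def volume(edges, group):
    """vol(A): sum of the degrees k_i of the nodes in the group."""
    g = set(group)
    return sum((u in g) + (v in g) for u, v in edges)

def normalized_cut(edges, group_a, group_b):
    """ncut(A, B) = cut(A, B) / vol(A) + cut(A, B) / vol(B)."""
    c = cut_size(edges, group_a, group_b)
    return c / volume(edges, group_a) + c / volume(edges, group_b)

# Assumed edge list for a 6-node example graph.
edges = [(1, 2), (1, 3), (2, 3), (1, 5), (3, 4), (4, 5), (4, 6), (5, 6)]
balanced = normalized_cut(edges, {1, 2, 3}, {4, 5, 6})
lopsided = normalized_cut(edges, {6}, {1, 2, 3, 4, 5})
print(balanced, lopsided)
```

Both partitions cut exactly two edges, so the minimum-cut criterion cannot tell them apart. The normalized cut is lower for the balanced split because isolating node 6 leaves a group with very small volume.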

Spectral Graph Partitioning
$A$: adjacency matrix of the undirected graph $G$; $A_{ij} = 1$ if $(i, j)$ is an edge, else 0.
$x$ is a vector in $\mathbb{R}^n$ with components $(x_1, \ldots, x_n)$. Think of it as a label/value for each node of $G$.
What is the meaning of $A \cdot x$?
$$y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$$
Entry $y_i$ is the sum of the labels $x_j$ of the neighbors of $i$.
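A tiny example makes the "sum of neighbor labels" reading concrete. The path graph and the label values below are hypothetical, and `neighbor_sums` is an illustrative name for the matrix-vector product written out with plain lists.

```python
def neighbor_sums(adj_matrix, x):
    """Compute y = A x, where y[i] is the sum of the labels of i's neighbors."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in adj_matrix]

# Hypothetical path graph on three nodes: 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
x = [10, 20, 30]
y = neighbor_sums(A, x)
print(y)  # [20, 40, 20]
```

The middle node receives 10 + 30 from its two neighbors, while each end node receives only the middle node's label.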

What is the meaning of $A \cdot x$?
The $j$-th coordinate of $A \cdot x$ is the sum of the $x$-values of the neighbors of $j$. Make this the new value at node $j$:
$$A \cdot x = \lambda \cdot x$$
Spectral graph theory: analyze the spectrum of the matrix representing $G$.
Spectrum: the eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$.
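One way to see $A \cdot x = \lambda \cdot x$ in action is to repeat the "replace each label by the sum of its neighbors' labels" step and rescale after every round. This is power iteration, a technique swapped in here for illustration; it is not described in the slides. It only finds the eigenvector of the largest eigenvalue and assumes a connected, non-bipartite graph. The example graph (a triangle with one pendant node) is hypothetical.

```python
import math

def power_iteration(adj_matrix, steps=200):
    """Approximate the largest eigenvalue of A and its eigenvector."""
    n = len(adj_matrix)
    x = [1.0 + i for i in range(n)]               # arbitrary positive start vector
    for _ in range(steps):
        # New label of each node = sum of its neighbors' labels.
        y = [sum(a * xj for a, xj in zip(row, x)) for row in adj_matrix]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]                 # rescale to unit length
    y = [sum(a * xj for a, xj in zip(row, x)) for row in adj_matrix]
    lam = sum(yi * xi for yi, xi in zip(y, x))    # Rayleigh quotient for unit x
    return lam, x

# Hypothetical graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, x = power_iteration(A)
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
residual = max(abs(axi - lam * xi) for axi, xi in zip(Ax, x))
print(lam, residual)
```

After convergence, one more neighbor-sum step only rescales the labels by `lam`, which is exactly the eigenvector condition. Spectral partitioning methods rely on other eigenvectors as well, which in practice are computed with a linear-algebra library rather than this loop.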

Questions?

References
Andersen, R. and Lang, K.J. Communities from seed sets. WWW, 2006.
McAuley, J. and Leskovec, J. Discovering social circles in ego networks. TKDD, 2014.
Shi, J. and Malik, J. Normalized cuts and image segmentation. TPAMI, 22(8), 888-905, 2000.
Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2), 298-305, 1973.