Modularity CMSC 858L

Similar documents
Community Structure Detection. Amar Chandole Ameya Kabre Atishay Aggarwal

My favorite application using eigenvalues: partitioning and community detection in social networks

Extracting Communities from Networks

TELCOM2125: Network Science and Analysis

Community Structure and Beyond

Social Data Management Communities

Mining Social Network Graphs

Spectral Methods for Network Community Detection and Graph Partitioning

Non Overlapping Communities

MCL. (and other clustering algorithms) 858L

V4 Matrix algorithms and graph partitioning

Discovery of Community Structure in Complex Networks Based on Resistance Distance and Center Nodes

Oh Pott, Oh Pott! or how to detect community structure in complex networks

Community Detection Methods using Eigenvectors of Matrices

Clustering Algorithms on Graphs Community Detection 6CCS3WSN-7CCSMWAL

Overlapping Communities

CS224W: Analysis of Networks Jure Leskovec, Stanford University

Lesson 2 7 Graph Partitioning

Introduction to Machine Learning

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Statistical Physics of Community Detection

SOMSN: An Effective Self Organizing Map for Clustering of Social Networks

Community Detection. Community

Clustering CS 550: Machine Learning

CSE 258 Lecture 6. Web Mining and Recommender Systems. Community Detection

Problem Definition. Clustering nonlinearly separable data:

Clustering Algorithms for general similarity measures

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #21: Graph Mining 2

CSE 158 Lecture 6. Web Mining and Recommender Systems. Community Detection

On the Approximability of Modularity Clustering

Social-Network Graphs

SGN (4 cr) Chapter 11

Lecture 19: Graph Partitioning

Types of general clustering methods. Clustering Algorithms for general similarity measures. Similarity between clusters

Web Structure Mining Community Detection and Evaluation

Online Social Networks and Media. Community detection

An Efficient Algorithm for Community Detection in Complex Networks

Extracting Information from Complex Networks

Clusters and Communities

arxiv: v1 [physics.data-an] 27 Sep 2007

Jure Leskovec, Cornell/Stanford University. Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

CMPSCI 311: Introduction to Algorithms Practice Final Exam

Applications. Foreground / background segmentation Finding skin-colored regions. Finding the moving objects. Intelligent scissors

Clustering. Informal goal. General types of clustering. Applications: Clustering in information search and analysis. Example applications in search

Lecture 9 - Matrix Multiplication Equivalences and Spectral Graph Theory 1

CSE 5243 INTRO. TO DATA MINING

NP Completeness. Andreas Klappenecker [partially based on slides by Jennifer Welch]

Basics of Network Analysis

Clustering Part 3. Hierarchical Clustering

Network Community Detection

Machine Learning for Data Science (CS4786) Lecture 11

Hierarchical Clustering

1 More stochastic block model. Pr(G θ) G=(V, E) 1.1 Model definition. 1.2 Fitting the model to data. Prof. Aaron Clauset 7 November 2013

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 11

CS 534: Computer Vision Segmentation and Perceptual Grouping

Clustering: Classic Methods and Modern Views

Maximizing edge-ratio is NP-complete

A Novel Parallel Hierarchical Community Detection Method for Large Networks

Clustering on networks by modularity maximization

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

CSE 5243 INTRO. TO DATA MINING

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Persistent Homology in Complex Network Analysis

Detecting community structure in networks

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

CSE 158. Web Mining and Recommender Systems. Midterm recap

Functions 3.6. Fall Math (Math 1010) M / 13

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)

Behavioral Data Mining. Lecture 18 Clustering

Math 778S Spectral Graph Theory Handout #2: Basic graph theory

Community Detection by Affinity Propagation

Community detection. Leonid E. Zhukov

Cluster Analysis. Ying Shen, SSE, Tongji University

CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning

Network Clustering. Balabhaskar Balasundaram, Sergiy Butenko

This is not the chapter you re looking for [handwave]

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

Spectral Graph Multisection Through Orthogonality. Huanyang Zheng and Jie Wu CIS Department, Temple University

INTRODUCTION SECTION 9.1

Hardware-Software Codesign

Community Detection in Social Networks

Unsupervised Learning: Clustering

Spectral Clustering. Presented by Eldad Rubinstein Based on a Tutorial by Ulrike von Luxburg TAU Big Data Processing Seminar December 14, 2014

Cluster Analysis. Angela Montanari and Laura Anderlucci

Clustering Lecture 3: Hierarchical Methods

Image Segmentation continued Graph Based Methods

Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Source. Sink. Chapter 10: Iterative Programming Maximum Flow Problem. CmSc250 Intro to Algorithms

CAIM: Cerca i Anàlisi d Informació Massiva

Identifying network modules

CSCI-B609: A Theorist s Toolkit, Fall 2016 Sept. 6, Firstly let s consider a real world problem: community detection.

Community Structure in Graphs

COT 6936: Topics in Algorithms! Giri Narasimhan. ECS 254A / EC 2443; Phone: x3748

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning

Segmentation. Bottom Up Segmentation

Lecture 11: Clustering and the Spectral Partitioning Algorithm A note on randomized algorithm, Unbiased estimates

Visual Representations for Machine Learning

1 Large-scale network structures

1 Homophily and assortative mixing

Transcription:

Modularity CMSC 858L

Module-detection for Function Prediction Biological networks generally modular (Hartwell+, 1999) We can try to find the modules within a network. Once we find modules, we can look at over-represented functions within a module, e.g.: - If a majority of the proteins within a module have annotation A, predict annotation A for the other proteins in the module. Graph clustering methods - Min Multiway Cut, Graph Summarization, VI-Cut: examples we ve already seen. - Methods often borrowed from other community detection applications.

Modularity

Modularity e ii = % edges in module i e ii = {(u,v) : u V i, v V i, (u,v) E} / E 2 a i = % edges with at least 1 end in module i i ai= {(u,v) : u Vi, (u,v) E} / E Modularity is: Q = k i=1 eii a 2 i probability a random edge would fall into module i probability edge is in module i 3 High modularity more edges within the module that you expect by chance.

Examples 5 6 22 3 27 28 4 7 8 16 21 23 19 12 13 14 3 2 1 10 9 10 20 29 24 5 30 2 11 17 1 26 7 9 25 8 Communities Assigned to a small graph Note: maximizing modularity will find it s own # of clusters 15 6 4 Communities assigned to a random graph 18

Modularity Algorithm #1 Modularity is NP-hard to optimize (Brandes, 2007) Greedy Heuristic: (Newman, 2003) - C = trivial clustering with each node in its own cluster - Repeat: Merge the two clusters that will increase the modularity by the largest amount Stop when all merges would reduce the modularity.

Karate Club (again) Newman-Girvan, 2004 Only 3 is in the wrong community.

Maximizing Modularity via a Spectral Technique

Another View of Modularity normalization Q = 1 4m i,j in same module adjacency matrix A ij k ik j 2m probability a random edge would go between i and j Consider the case of only 2 modules. Let s i = 1 if node i is in module 1; -1 if node i is in module 2 Q = = 1 4m 1 4m i,j i,j A ij k ik j 2m A ij k ik j 2m m = # edges in graph k i = degree(i) (s i s j + 1) s i s j

Goal: Maximize modularity Try to find ±1 vector s that maximizes the modularity. Start with the case above: only two groups. Then show how to extend to 2 groups. Will use some ideas from linear algebra.

Q = = 1 4m 1 i,j 4m st Bs modularity matrix A ij k ik j 2m s i s j s is a {-1,1} membership vector Let u i (i = 1,...,n) be the eigenvectors of matrix B with eigenvalue β i for vector u i. (Assume β 1 β 2 β 3 β 4... β n ) Write s as: where: s = i a i u i a i = u T i s

s = i a i u i a i = u T i s drop the (1/4m) Q = = = = i 1 4m st Bs a i u T i B a j u j i j a i u T i B a j u j i j a i a j u T i Bu j Note: 1. Bu j = β i u j 2. When i j, u it Bu j = 0 because u i u j j Q = i (u T i s) 2 β i

To Maximize Q Q = i (u T i s) 2 β i If we were allowed to choose any s we d pick the one that is parallel to u 1. But: s i must be +1 or -1. This is a severe restriction. So: maximize u 1 s, the projection of s along vector u 1. To do this: choose s i = 1 if u 1 > 0, and s i = -1 if u 1 0.

Subsequent Splits The modularity if this module was split according to s The modularity of module g as it stands now = 1 2 i,j g B ij s i s j + 1 2 B ij i,j g i,j g B ij g B ij = s i s j δ i,j i,j g i,j g k g B ik +1-1

Karate Club Results: Exactly Right (Newman, 2006)

Greedy Improvement Given a partition of the network Repeat: - Find the vertex that would yield the largest modularity increase if it were moved into a different community AND that has not yet been moved - Move the vertex into that new community Return the best partitioning ever observed largest increase might be negative Similar to the Kernighan-Lin graph partitioning heuristic (details in a few slides)

Additional Results Girvan-Newman (betweenness) Newman Spectral Greedy Hierarchical Newman, 2006

Krebs Political Books Nodes = political books; shape = conservative (squares) / liberal (circles) / centrist (triangles) Edges = books frequently bought by the same readers on Amazon.com

Complexes Biological Processes + indicates parameters tuned to maximize precision All GS predictions are Pareto optimal Many unique predictions made by each algorithm

% Modules Enriched A lower % of GS modules are enriched for some annotation, but not indicative of predictive performance. Easy to get legitimate statistical signibicant enrichment.

Summary: Modularity Modularity is widely used as a measure for how good a clustering is. Particularly popular in social network analysis, but used in other contexts as well (e.g. Brain networks). Has a resolution preference: for a given network, will tend to prefer clusters of a particular size. Often this means the clusters are too big. A good example of where a spectral clustering technique can work.