Non-exhaustive, Overlapping k-means


Non-exhaustive, Overlapping k-means J. J. Whang, I. S. Dhillon, and D. F. Gleich Teresa Lebair University of Maryland, Baltimore County October 29th, 2015 Teresa Lebair UMBC 1/38

Outline Introduction NEO-K-Means Objective Functional NEO-K-Means Algorithm Graph Clustering using NEO-K-Means Experimental Results Conclusions and Future Work Teresa Lebair UMBC 2/38

Introduction The traditional k-means clustering algorithm is exhaustive (every point is put into a cluster) and non-overlapping (no point belongs to more than one cluster). There are many applications in which there are outliers (points that do not belong to any cluster) and/or points that belong to multiple clusters. Application examples: Outlier detection in datasets. Popular EachMovie dataset: some movies belong to multiple genres. Biology: clustering genes by function results in overlapping clusters. Teresa Lebair UMBC 3/38

Introduction Many papers on clustering consider either cluster outliers or overlapping clusters, but not both. This paper is one of the first to consider both cluster outliers and overlapping clusters in a unified manner. One popular application is community detection (e.g., detecting clusters or communities of users in social networks). In this paper, the objective functional for traditional clustering is modified to produce the NEO-K-Means objective functional. Teresa Lebair UMBC 4/38

Introduction A NEO-K-Means Algorithm is proposed to minimize the NEO-K-Means objective functional. NEO-K-Means is also used to solve graph clustering problems. NEO-K-Means is tested experimentally on both vector and graph data. Teresa Lebair UMBC 5/38

Traditional k-means $X := \{x_1, x_2, \ldots, x_n\}$ is a set of data points. We want to partition $X$ into $k$ clusters $C_1, C_2, \ldots, C_k$, i.e., $\bigcup_j C_j = X$ and $i \neq j \implies C_i \cap C_j = \emptyset$. The goal of k-means is to pick the clusters that minimize the sum of squared distances of the points from their cluster centroids, i.e., to solve the minimization problem
$$\min_{\{C_j\}_{j=1}^k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \|x_i - m_j\|^2, \quad \text{where } m_j := \frac{\sum_{x_i \in C_j} x_i}{|C_j|}.$$
This is an NP-hard problem. However, the traditional k-means algorithm monotonically decreases the objective functional. Teresa Lebair UMBC 6/38
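
To make the objective concrete, here is a minimal NumPy sketch (not from the original slides) that evaluates the k-means objective for a hard assignment; the data matrix X and the label vector are illustrative assumptions.

import numpy as np

def kmeans_objective(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue                      # an empty cluster contributes nothing
        m_j = members.mean(axis=0)        # centroid m_j of cluster C_j
        total += ((members - m_j) ** 2).sum()
    return total

# Toy example: two well-separated clusters of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, labels, k=2))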

k-means Extension Define the assignment matrix $U = (u_{ij}) \in \mathbb{R}^{n \times k}$ such that $u_{ij} = 1$ if $x_i \in C_j$ and $u_{ij} = 0$ otherwise. In the case of traditional disjoint, exhaustive clustering, each row of $U$ contains exactly one entry equal to 1; all other entries are zero. Hence the trace of $U^T U$ is equal to the number of cluster assignments, $n$. To control the number of cluster assignments, we introduce the constraint that the trace of $U^T U$ is equal to $(1+\alpha)n$, where $0 \le \alpha \le (k-1)$. Teresa Lebair UMBC 7/38
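
As an illustration of the trace constraint (a sketch with a hypothetical assignment matrix, not from the slides), trace(U^T U) simply counts the total number of 1's in U:

import numpy as np

# Hypothetical assignment matrix for n = 4 points and k = 2 clusters:
# row i has a 1 in column j exactly when x_i is assigned to cluster C_j.
U = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [1, 1]])                     # the last point belongs to both clusters

n, k = U.shape
assignments = np.trace(U.T @ U)            # total number of cluster assignments
alpha = assignments / n - 1                # from trace(U^T U) = (1 + alpha) * n
print(assignments, alpha)                  # 5 assignments, so alpha = 0.25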

k-means Extension First extend the k-means objective as follows to minimize over assignment matrices $U$:
$$\min_{U} \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij} \|x_i - m_j\|^2, \quad \text{where } m_j = \frac{\sum_{i=1}^{n} u_{ij} x_i}{\sum_{i=1}^{n} u_{ij}}, \quad \text{s.t. } \operatorname{trace}(U^T U) = (1+\alpha)n.$$
Take $\alpha \ll (k-1)$ to avoid assigning every point to all $k$ clusters. The new objective functional allows for both outliers and overlapping clusters. Teresa Lebair UMBC 8/38

Testing k-means extension Tested the first extension of the k-means objective on a synthetic data set: Two overlapping clusters of ordered pairs, with several additional points as outliers. Each cluster generated from a Gaussian distribution. An algorithm similar to k-means is used. α is set to 0.1, the ground-truth value of α. Teresa Lebair UMBC 9/38

Testing k-means extension First extension of k-means fails to recover ground-truth clusters. Too many points are labeled as outliers. Teresa Lebair UMBC 10/38

NEO-K-Means Objective Necessary to control the degree of non-exhaustiveness. Let $I$ denote the indicator function: $I(\text{expression}) = 1$ if the expression is true, and $0$ otherwise. Let $\mathbf{1} \in \mathbb{R}^k$ denote the vector of all ones. Note that $(U\mathbf{1})_i$ denotes the number of clusters to which $x_i$ belongs. We update the objective functional by adding a non-exhaustiveness constraint. Teresa Lebair UMBC 11/38

NEO-K-Means Objective The NEO-K-Means objective is defined as follows:
$$\min_{U} \sum_{j=1}^{k} \sum_{i=1}^{n} u_{ij} \|x_i - m_j\|^2, \quad \text{where } m_j = \frac{\sum_{i=1}^{n} u_{ij} x_i}{\sum_{i=1}^{n} u_{ij}},$$
$$\text{s.t. } \operatorname{trace}(U^T U) = (1+\alpha)n, \quad \sum_{i=1}^{n} I\left((U\mathbf{1})_i = 0\right) \le \beta n.$$
Here $0 \le \beta \ll 1$ controls the number of points that can be labeled as outliers. The choice of $\alpha = \beta = 0$ recovers the traditional k-means objective. Teresa Lebair UMBC 12/38
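
A small sketch (illustrative, not from the slides) that evaluates the NEO-K-Means objective and checks the two constraints for a given 0/1 assignment matrix U:

import numpy as np

def neo_kmeans_objective(X, U):
    """Sum over clusters of u_ij * ||x_i - m_j||^2, with m_j the mean of the assigned points."""
    n, k = U.shape
    total = 0.0
    for j in range(k):
        w = U[:, j].astype(float)
        if w.sum() == 0:
            continue
        m_j = (w[:, None] * X).sum(axis=0) / w.sum()
        total += (w * ((X - m_j) ** 2).sum(axis=1)).sum()
    return total

def constraints_ok(U, alpha, beta):
    n = U.shape[0]
    enough_assignments = np.trace(U.T @ U) == round((1 + alpha) * n)   # trace constraint
    few_outliers = (U.sum(axis=1) == 0).sum() <= beta * n              # non-exhaustiveness constraint
    return enough_assignments and few_outliers

# Example: one overlapping assignment balances one unassigned (outlier) point.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [9.0, 9.0]])
U = np.array([[1, 0], [1, 0], [1, 1], [0, 0]])
print(neo_kmeans_objective(X, U), constraints_ok(U, alpha=0.0, beta=0.25))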

Testing NEO-K-Means Objective Tested the NEO-K-Means Objective using the previous synthetic data set. Outcome is much better than that of the previous k-means extension. Teresa Lebair UMBC 13/38

NEO-K-Means Algorithm At most $\beta n$ data points have no cluster membership, so at least $n - \beta n$ data points belong to a cluster. Sketch of the NEO-K-Means Algorithm: Initialize the cluster centroids (use any traditional k-means initialization strategy). Compute $d_{ij}$, the distance between each point $x_i$ and each cluster $C_j$, for all $i = 1, \ldots, n$ and $j = 1, \ldots, k$. Sort the data points in ascending order by distance to their closest cluster. Assign the first $n - \beta n$ data points in the sorted list to their closest clusters. Let $\bar{C}_j$ denote the assignments made in this step. Teresa Lebair UMBC 14/38

NEO-K-Means Algorithm Sketch of the NEO-K-Means Algorithm (Continued) Note that $\sum_{j=1}^{k} |\bar{C}_j| = n - \beta n$. Make an additional $\alpha n + \beta n$ assignments based on the $\alpha n + \beta n$ smallest entries of $D = (d_{ij})$ not already associated with assignments to the $\bar{C}_j$'s. Let $\hat{C}_j$ denote the assignments made in this step. Each cluster $C_j$ is then updated to be $C_j := \bar{C}_j \cup \hat{C}_j$. Repeat the process until the decrease in the objective function is sufficiently small, or the maximum number of iterations is reached. Teresa Lebair UMBC 15/38
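
The assignment step of the sketch above can be written compactly; the following is a minimal illustration (assuming squared Euclidean distances and 0/1 assignments), with centroid updates and the outer convergence loop omitted:

import numpy as np

def neo_kmeans_assign(X, centroids, alpha, beta):
    """One NEO-K-Means assignment round; returns an n-by-k 0/1 assignment matrix."""
    n, k = X.shape[0], centroids.shape[0]
    D = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)   # d_ij
    U = np.zeros((n, k), dtype=int)

    # Step 1: the n - floor(beta*n) points closest to some centroid get their closest cluster.
    closest = D.argmin(axis=1)
    order = np.argsort(D.min(axis=1))
    keep = order[: n - int(np.floor(beta * n))]
    U[keep, closest[keep]] = 1

    # Step 2: make floor(alpha*n + beta*n) extra assignments from the smallest
    # remaining entries of D (entries already used in Step 1 are excluded).
    extra = int(np.floor(alpha * n + beta * n))
    masked = np.where(U == 1, np.inf, D)
    rows, cols = np.unravel_index(np.argsort(masked, axis=None)[:extra], D.shape)
    U[rows, cols] = 1
    return U

# Example usage with random data and two of the points used as initial centroids.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
U = neo_kmeans_assign(X, X[:2], alpha=0.2, beta=0.1)
print(U.sum(), (U.sum(axis=1) == 0).sum())   # (1 + alpha) * n assignments, at most beta * n outliers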

NEO-K-Means Algorithm Teresa Lebair UMBC 16/38

NEO-K-Means Algorithm Theorem: The NEO-K-Means objective functional monotonically decreases through the application of the NEO-K-Means Algorithm while satisfying the constraints for fixed $\alpha$ and $\beta$. Proof. Let $J^{(t)}$ denote the objective at the $t$-th iteration. Then
$$J^{(t)} = \sum_{j=1}^{k} \sum_{x_i \in C_j^{(t)}} \|x_i - m_j^{(t)}\|^2 \;\ge\; \sum_{j=1}^{k} \sum_{x_i \in C_j^{(t+1)}} \|x_i - m_j^{(t)}\|^2 \;\ge\; \sum_{j=1}^{k} \sum_{x_i \in C_j^{(t+1)}} \|x_i - m_j^{(t+1)}\|^2 = J^{(t+1)}.$$
The first inequality holds because the reassignment step chooses the smallest available distances to the fixed centroids $m_j^{(t)}$, while the old assignment is also feasible; the second holds because each updated centroid $m_j^{(t+1)}$ is the mean of $C_j^{(t+1)}$, which minimizes the within-cluster sum of squared distances. Teresa Lebair UMBC 17/38

NEO-K-Means Algorithm: Choosing α and β Choosing β: Run traditional k-means. Let $d_i$ be the distance between a data point $x_i$ and its closest cluster. Compute the mean $\mu$ and standard deviation $\sigma$ of this set of distances. Consider $d_i$ an outlier if $d_i \notin [\mu - \delta\sigma, \mu + \delta\sigma]$ for some fixed $\delta > 0$ ($\delta = 6$ usually leads to a reasonable β). Set β equal to the proportion of the $d_i$'s that are outliers. Choosing α: Two different strategies can be used to choose α. Teresa Lebair UMBC 18/38
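
A short sketch of the β heuristic (illustrative; it assumes d is the vector of distances from each point to its closest cluster after an ordinary k-means run):

import numpy as np

def estimate_beta(d, delta=6.0):
    """Proportion of points whose closest-cluster distance falls outside [mu - delta*sigma, mu + delta*sigma]."""
    mu, sigma = d.mean(), d.std()
    outliers = (d < mu - delta * sigma) | (d > mu + delta * sigma)
    return outliers.mean()

# Example: distances with one extreme value; a smaller delta is used here because the toy sample is tiny.
d = np.array([0.2, 0.3, 0.25, 0.28, 0.22, 0.27, 0.24, 0.26, 0.23, 100.0])
print(estimate_beta(d, delta=2.0))   # 0.1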

NEO-K-Means Algorithm: Choosing α First strategy for choosing α (for small overlap): For each cluster $C_j$, consider the distances between each $x_i \in C_j$ and $C_j$. Compute the mean $\mu_j$ and standard deviation $\sigma_j$ of these distances. For each $x_l \notin C_j$, compute the distance $d_{lj}$ between $x_l$ and $C_j$. If $d_{lj}$ is less than $\mu_j + \delta\sigma_j$, consider $x_l$ to be in the overlapped region. Count the points in overlapped regions to estimate α. Teresa Lebair UMBC 19/38
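
A sketch of the first strategy (illustrative; it assumes D is the n-by-k matrix of distances d_ij from an ordinary k-means run and labels holds each point's single cluster):

import numpy as np

def estimate_alpha_small_overlap(D, labels, delta=6.0):
    """Count points lying in another cluster's overlap region and divide by n."""
    n, k = D.shape
    overlap_count = 0
    for j in range(k):
        in_j = labels == j
        mu_j = D[in_j, j].mean()            # mean distance of C_j's own members to C_j
        sigma_j = D[in_j, j].std()
        # points outside C_j that are still within mu_j + delta*sigma_j of C_j
        overlap_count += int((D[~in_j, j] < mu_j + delta * sigma_j).sum())
    return overlap_count / n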

NEO-K-Means Algorithm: Choosing α Second strategy for choosing α (for large overlap): Let $d_{ij}$ denote the distance between the point $x_i$ and the cluster $C_j$. Compute the normalized distance between each point $x_i$ and cluster $C_j$:
$$\bar{d}_{ij} := \frac{d_{ij}}{\sum_{l=1}^{k} d_{il}}.$$
Count the number of $\bar{d}_{ij}$'s whose value is less than $\frac{1}{k+1}$. Divide by $n$ to obtain α. Note that if a point $x_i$ is equidistant from all clusters $C_l$, we have $\bar{d}_{il} = \frac{1}{k}$ for all $l$. Using a threshold of $\frac{1}{k+1}$ gives us a stronger bound. Teresa Lebair UMBC 20/38
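
A sketch of the second strategy (illustrative; again D holds the distances d_ij from an ordinary k-means run):

import numpy as np

def estimate_alpha_large_overlap(D):
    """Count normalized distances below 1/(k+1) and divide by n."""
    n, k = D.shape
    D_norm = D / D.sum(axis=1, keepdims=True)   # each row of normalized distances sums to 1
    return (D_norm < 1.0 / (k + 1)).sum() / n

# A point equidistant from all clusters has every normalized distance equal to 1/k,
# which stays above the 1/(k+1) threshold, so it contributes nothing to the count.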

Weighted Kernel K-Means In kernel k-means, each point is mapped into a higher-dimensional feature space via the mapping $\phi$. Additionally, weights $\omega_i \ge 0$ can be introduced to differentiate each point's contribution to the objective functional. The weighted kernel k-means objective functional is
$$\min_{\{C_j\}_{j=1}^k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \omega_i \|\phi(x_i) - m_j\|^2, \quad \text{where } m_j = \frac{\sum_{x_i \in C_j} \omega_i \phi(x_i)}{\sum_{x_i \in C_j} \omega_i}.$$
Teresa Lebair UMBC 21/38

Weighted Kernel NEO-K-Means Can define an analogous weighted kernel NEO-K-Means objective functional:
$$\min_{U} \sum_{c=1}^{k} \sum_{i=1}^{n} u_{ic} \omega_i \|\phi(x_i) - m_c\|^2, \quad \text{where } m_c = \frac{\sum_{i=1}^{n} u_{ic} \omega_i \phi(x_i)}{\sum_{i=1}^{n} u_{ic} \omega_i},$$
$$\text{s.t. } \operatorname{trace}(U^T U) = (1+\alpha)n, \quad \sum_{i=1}^{n} I\left((U\mathbf{1})_i = 0\right) \le \beta n.$$
This NEO-K-Means extension allows us to consider non-exhaustive overlapping graph clustering / overlapping community detection. Teresa Lebair UMBC 22/38

Graph Clustering Using NEO-K-Means A graph G = (V, E) is a collection of vertices and edges. We can define an adjacency matrix $A = (a_{ij})$ such that $a_{ij}$ is equal to the weight of the edge between vertices i and j. Example: a graph on the vertices Alice, Bob, Linda, Paul, and Eve, with edges Alice–Bob (weight 8), Alice–Linda (weight 2), Bob–Paul (weight 1), and Linda–Paul (weight 4); Eve is isolated. Teresa Lebair UMBC 23/38

Graph Clustering Using NEO-K-Means For the example graph above, the adjacency matrix (rows and columns ordered Alice, Bob, Linda, Paul, Eve) is
$$A = \begin{pmatrix} 0 & 8 & 2 & 0 & 0 \\ 8 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 4 & 0 \\ 0 & 1 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
Teresa Lebair UMBC 24/38
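
The same example can be written down directly (a small sketch; vertex order Alice, Bob, Linda, Paul, Eve as in the matrix above):

import numpy as np

names = ["Alice", "Bob", "Linda", "Paul", "Eve"]
A = np.array([[0, 8, 2, 0, 0],
              [8, 0, 0, 1, 0],
              [2, 0, 0, 4, 0],
              [0, 1, 4, 0, 0],
              [0, 0, 0, 0, 0]], dtype=float)

degrees = A.sum(axis=1)            # vertex degrees, e.g. deg(Alice) = 10
D = np.diag(degrees)               # degree matrix used below for the normalized cut
print(dict(zip(names, degrees)))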

Graph Clustering Using NEO-K-Means We assume that there are no connections between any vertex i and itself, so $a_{ii} = 0$ for all $i = 1, \ldots, n$. Also assume that the graph is undirected (so A is symmetric). The traditional graph partitioning problem groups the vertices into k pairwise disjoint clusters $C_1 \cup \cdots \cup C_k = V$. Define the links function between two clusters as the sum of the edge weights between the clusters:
$$\operatorname{links}(C_j, C_l) := \sum_{x_{i_1} \in C_j} \sum_{x_{i_2} \in C_l} a_{i_1 i_2}.$$
Example: If $C_1 := \{\text{Alice}, \text{Bob}\}$ and $C_2 := \{\text{Linda}, \text{Paul}\}$, then $\operatorname{links}(C_1, C_2) = a_{AL} + a_{BL} + a_{AP} + a_{BP} = 2 + 0 + 0 + 1 = 3$. Teresa Lebair UMBC 25/38
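
A one-function sketch of the links computation (illustrative; it reuses the example adjacency matrix and reproduces the Alice/Bob vs. Linda/Paul value of 3):

import numpy as np

A = np.array([[0, 8, 2, 0, 0], [8, 0, 0, 1, 0], [2, 0, 0, 4, 0],
              [0, 1, 4, 0, 0], [0, 0, 0, 0, 0]], dtype=float)

def links(A, cluster_a, cluster_b):
    """Sum of edge weights between two vertex index sets."""
    return A[np.ix_(cluster_a, cluster_b)].sum()

C1, C2 = [0, 1], [2, 3]            # {Alice, Bob} and {Linda, Paul}
print(links(A, C1, C2))            # 3.0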

Graph Clustering Using NEO-K-Means The normalized cut (a popular measure for evaluating a graph partitioning) of a partition of G is defined as
$$\operatorname{Cut}(G) = \sum_{j=1}^{k} \frac{\operatorname{links}(C_j, V \setminus C_j)}{\operatorname{links}(C_j, V)}.$$
The normalized cut of the graph is obtained by minimizing this quantity over all possible partitions of V, i.e.,
$$\operatorname{NCut}(G) = \min_{C_1, \ldots, C_k} \operatorname{Cut}(G).$$
Let D be the diagonal matrix such that $d_{ii} = \sum_{j=1}^{n} a_{ij}$, i.e., the matrix of vertex degrees. Teresa Lebair UMBC 26/38
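
A minimal sketch of the normalized cut of a disjoint partition (illustrative; clusters are given as lists of vertex indices, and zero-degree clusters are skipped to avoid division by zero):

import numpy as np

A = np.array([[0, 8, 2, 0, 0], [8, 0, 0, 1, 0], [2, 0, 0, 4, 0],
              [0, 1, 4, 0, 0], [0, 0, 0, 0, 0]], dtype=float)

def normalized_cut(A, clusters):
    n = A.shape[0]
    total = 0.0
    for C in clusters:
        rest = [v for v in range(n) if v not in C]
        cut = A[np.ix_(C, rest)].sum()          # links(C_j, V \ C_j)
        vol = A[C, :].sum()                     # links(C_j, V), the cluster's total degree
        total += cut / vol if vol > 0 else 0.0
    return total

# Partition {Alice, Bob} and {Linda, Paul, Eve}: 3/19 + 3/11
print(normalized_cut(A, [[0, 1], [2, 3, 4]]))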

Graph Clustering Using NEO-K-Means We can rewrite $\operatorname{NCut}(G) = \min_{C_1, \ldots, C_k} \sum_{j=1}^{k} \frac{\operatorname{links}(C_j, V \setminus C_j)}{\operatorname{links}(C_j, V)}$ as
$$\operatorname{NCut}(G) = \min_{y_1, \ldots, y_k} \sum_{j=1}^{k} \frac{y_j^T (D - A) y_j}{y_j^T D y_j},$$
which is equivalent to maximizing $\sum_{j=1}^{k} \frac{y_j^T A y_j}{y_j^T D y_j}$, where $y_j$ denotes the indicator vector for the cluster $C_j$, i.e., $y_j(i) = 1$ if $v_i \in C_j$, and zero otherwise. This traditional normalized cut objective is for disjoint, exhaustive graph clustering. Teresa Lebair UMBC 27/38

Graph Clustering Using NEO-K-Means For NEO-K-Means graph clustering, consider the maximization problem
$$\max_{Y} \sum_{j=1}^{k} \frac{y_j^T A y_j}{y_j^T D y_j} \quad \text{s.t. } \operatorname{trace}(Y^T Y) = (1+\alpha)n, \quad \sum_{i=1}^{n} I\left((Y\mathbf{1})_i = 0\right) \le \beta n,$$
where Y is an assignment matrix whose jth column is $y_j$. Just as with the vector data, we may adjust α and β to control the degrees of overlap and non-exhaustiveness. This optimization problem for graph partitioning can be reformulated as a weighted kernel NEO-K-Means problem. Teresa Lebair UMBC 28/38

Graph Clustering Using NEO-K-Means Let $W \in \mathbb{R}^{n \times n}$ be the diagonal matrix such that $w_{ii}$ is equal to the weight $\omega_i$ of the i-th point (the vertex degree in the graph setting) for each $i = 1, \ldots, n$. Let K be a kernel matrix given by $K_{ij} = \phi(x_i)^T \phi(x_j)$. Finally, let $u_c$ be the c-th column of the assignment matrix U. The weighted kernel NEO-K-Means objective can be rewritten as
$$\min_{U} \sum_{c=1}^{k} \sum_{i=1}^{n} u_{ic} \omega_i \|\phi(x_i) - m_c\|^2 = \min_{U} \sum_{c=1}^{k} \left( \sum_{i=1}^{n} u_{ic} \omega_i K_{ii} - \frac{u_c^T W K W u_c}{u_c^T W u_c} \right).$$
Teresa Lebair UMBC 29/38

Graph Clustering Using NEO-K-Means Define the kernel as $K = \gamma W^{-1} + W^{-1} A W^{-1}$, where $\gamma > 0$ is chosen so that K is positive definite. Then
$$\min_{U} \sum_{c=1}^{k} \left( \sum_{i=1}^{n} u_{ic} \omega_i K_{ii} - \frac{u_c^T W K W u_c}{u_c^T W u_c} \right) = \min_{U} \left( \gamma(1+\alpha)n - \gamma k - \sum_{c=1}^{k} \frac{u_c^T A u_c}{u_c^T W u_c} \right),$$
which is equivalent to $\max_{U} \sum_{c=1}^{k} \frac{u_c^T A u_c}{u_c^T W u_c}$ since $\gamma(1+\alpha)n - \gamma k$ is constant for fixed α. Letting W = D and noting that U = Y demonstrates that the extended normalized cut objective can be formulated as a weighted kernel NEO-K-Means objective. Teresa Lebair UMBC 30/38
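
A sketch of the kernel construction (illustrative; the graph is a slightly modified example in which Eve is linked to Paul with weight 1 so that every vertex has positive degree, and gamma = 1.5 is a hypothetical value that happens to make K positive definite for this graph):

import numpy as np

def neo_graph_kernel(A, gamma):
    """K = gamma * W^{-1} + W^{-1} A W^{-1}, with W the diagonal degree matrix."""
    w_inv = np.diag(1.0 / A.sum(axis=1))        # assumes every vertex has positive degree
    return gamma * w_inv + w_inv @ A @ w_inv

A = np.array([[0, 8, 2, 0, 0], [8, 0, 0, 1, 0], [2, 0, 0, 4, 0],
              [0, 1, 4, 0, 1], [0, 0, 0, 1, 0]], dtype=float)

K = neo_graph_kernel(A, gamma=1.5)
print(np.all(np.linalg.eigvalsh(K) > 0))        # True: this gamma makes K positive definite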

Graph Clustering Using NEO-K-Means The authors use the following distance function between a vertex $v_i$ and a cluster $C_j$:
$$\operatorname{dist}(v_i, C_j) = -\frac{2\operatorname{links}(v_i, C_j)}{\deg(v_i)\deg(C_j)} + \frac{\operatorname{links}(C_j, C_j)}{\deg(C_j)^2} + \frac{\gamma}{\deg(v_i)} - \frac{\gamma}{\deg(C_j)},$$
where $\deg(v_i)$ denotes the degree of the vertex $v_i$, and $\deg(C_j)$ denotes the sum of the degrees of the vertices in $C_j$. The NEO-K-Means Algorithm can then be applied to the graph data using this distance function. Teresa Lebair UMBC 31/38
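
A sketch of this distance (illustrative; sign conventions follow the reconstruction above, and the cluster is given as a list of vertex indices):

import numpy as np

def graph_dist(A, i, C, gamma):
    """Kernel-style distance between vertex i and cluster C for the NEO-K-Means graph objective."""
    deg = A.sum(axis=1)
    deg_i = deg[i]
    deg_C = deg[C].sum()                        # sum of the degrees of the vertices in C
    links_iC = A[i, C].sum()                    # edge weight between vertex i and C
    links_CC = A[np.ix_(C, C)].sum()            # edge weight inside C
    return (-2.0 * links_iC / (deg_i * deg_C)
            + links_CC / deg_C ** 2
            + gamma / deg_i
            - gamma / deg_C)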

Experimental Metrics To measure the effectiveness of the NEO-K-Means algorithm, we use the average $F_1$ score: Define the $F_1$ score of the ground-truth cluster $S_i$ as
$$F_1(S_i) = \frac{2\, r_{S_i} p_{S_i}}{r_{S_i} + p_{S_i}}, \quad \text{where } p_{S_i} = \frac{|C_j \cap S_i|}{|C_j|}, \quad r_{S_i} = \frac{|C_j \cap S_i|}{|S_i|},$$
and j corresponds to the cluster $C_j$ which makes $F_1(S_i)$ as large as possible out of all the clusters. The average $F_1$ score is then
$$\bar{F}_1 = \frac{1}{|S|} \sum_{S_i \in S} F_1(S_i),$$
where S is the set of ground-truth clusters. A higher $\bar{F}_1 \in [0, 1]$ score indicates better clustering. Teresa Lebair UMBC 32/38
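
A short sketch of the average F1 computation (illustrative; ground-truth and found clusters are represented as Python sets of item ids):

def average_f1(ground_truth, found):
    """For each ground-truth cluster, take the best F1 over all found clusters, then average."""
    total = 0.0
    for S in ground_truth:
        best = 0.0
        for C in found:
            inter = len(S & C)
            if inter == 0:
                continue
            p, r = inter / len(C), inter / len(S)   # precision and recall against S
            best = max(best, 2 * p * r / (p + r))
        total += best
    return total / len(ground_truth)

# Example: two ground-truth clusters matched against two found clusters.
print(average_f1([{1, 2, 3}, {3, 4}], [{1, 2}, {3, 4, 5}]))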

Experimental Results Tested NEO-K-Means Algorithm on several vector data sets. Teresa Lebair UMBC 33/38

Experimental Results Compared effectiveness of NEO-K-Means to that of several other algorithms. NEO-K-Means consistently outperforms the other algorithms for this data set. Teresa Lebair UMBC 34/38

Experimental Results Additionally test NEO-K-Means with large graph data sets: Compare the average normalized cut of each algorithm when applied to large real-world datasets. Teresa Lebair UMBC 35/38

Experimental Results Also consider (i) the average $F_1$ score of different algorithms, and (ii) the average normalized cut and $F_1$ of NEO-K-Means for different values of α and β. Teresa Lebair UMBC 36/38

Conclusions and Future Work NEO-K-Means simultaneously considers non-exhaustive and overlapping clusters. New method outperforms state-of-the-art methods in terms of finding ground truth clusters. Conclude that NEO-K-Means is a useful algorithm. Plan to extend this type of clustering to other types of Bregman divergences. Teresa Lebair UMBC 37/38

References J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE TPAMI, 1997. J. J. Whang, D. F. Gleich, and I. S. Dhillon. Overlapping community detection using seed set expansion. CIKM, 2013. J. J. Whang, I. S. Dhillon, and D. F. Gleich. Non-exhaustive, Overlapping k-means. SIAM International Conference on Data Mining, pp. 936-944, 2015. http://mulan.sourceforge.net/datasets.html Teresa Lebair UMBC 38/38