Big-data Clustering: K-means vs K-indicators


Big-data Clustering: K-means vs K-indicators
Yin Zhang, Dept. of Computational & Applied Math., Rice University, Houston, Texas, U.S.A.
Joint work with Feiyu Chen & Taiping Zhang (CQU) and Liwei Xu (UESTC).
2016 International Workshop on Mathematical Issues in Information Sciences, Dec. 20, 2016, Shenzhen, China.

Outline
1. Introduction to Data Clustering
2. K-means: Model & Algorithm
3. K-indicators: Model & Algorithm
4. Numerical experiments


Partition data points into clusters. Figure: group $n$ data points into $k = 4$ clusters ($n \gg k$).

Non-globular and globular clusters. Figure: rows 1-2: non-globular; row 3: globular.

Clustering Images. Figure: group images into 2 clusters (cats and dogs).

Clustering Images II. Figure: group faces by individual.

Formal Description.
Input: data objects $m_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$; an (estimated) number of clusters $k < n$; a similarity measure ("distance").
Output: assignments to $k$ clusters such that (a) within clusters, objects are more similar to each other; (b) between clusters, objects are less similar to each other.
A fundamental technique in unsupervised learning.

Two-Stage Strategy in practice.
Stage 1: pre-processing the data. Transform data to latent features; this depends on the definition of similarity, and various data-specific techniques exist, e.g., PCA clustering and spectral clustering [1].
Stage 2: clustering the latent features. Method of choice: K-means (model [2] + algorithm [3]).
Our focus is on Stage 2.
[1] von Luxburg, A Tutorial on Spectral Clustering, 2007. [2] MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, 1967. [3] Lloyd, Least Squares Quantization in PCM, 1957.

Pre-processing.
Spectral Clustering [4]: given a similarity measure, (1) construct a similarity graph (many options, e.g., kNN); (2) compute the $k$ leading eigenvectors of the graph Laplacian; (3) cluster the rows via K-means (i.e., the Lloyd algorithm).
Dimension Reduction: Principal Component Analysis (PCA); Non-negative Matrix Factorization (NMF); Random Projection (RP).
Kernel Tricks: a nonlinear change of spaces.
Afterwards, K-means is applied to do the clustering; a minimal sketch of the spectral pipeline appears below.
[4] See von Luxburg, A Tutorial on Spectral Clustering, 2007 for references.
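
Below is a minimal numpy/scipy sketch of the first two steps of this spectral pipeline (not the authors' code); the Gaussian kNN weighting and the median-distance bandwidth are illustrative assumptions, and the returned rows would then be fed to K-means or KindAP.

```python
# Sketch: similarity graph + graph-Laplacian embedding (assumed details noted above).
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def spectral_embedding(M, k, n_neighbors=10):
    """Rows of M are data points; returns the k leading eigenvectors
    of the symmetric normalized graph Laplacian."""
    n = len(M)
    D2 = cdist(M, M, "sqeuclidean")
    sigma2 = np.median(D2)                       # assumed bandwidth choice
    W = np.exp(-D2 / sigma2)                     # Gaussian similarities
    idx = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]   # kNN (skip self)
    A = np.zeros_like(W)
    rows = np.repeat(np.arange(n), n_neighbors)
    A[rows, idx.ravel()] = W[rows, idx.ravel()]
    A = np.maximum(A, A.T)                       # symmetrize the kNN graph
    d = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - Dinv @ A @ Dinv              # normalized Laplacian
    _, V = eigh(L, subset_by_index=[0, k - 1])   # k smallest eigenpairs
    return V                                     # n-by-k latent features
```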


K-means Model. Given $\{m_i\}_{i=1}^n \subset \mathbb{R}^d$ and $k > 0$, the default K-means model is
$$\min_{x_1,\ldots,x_k} \; \sum_{i=1}^n \min_{j=1,\ldots,k} \|m_i - x_j\|^2,$$
where the $j$-th centroid $x_j$ is the mean of the points in the $j$-th cluster. The K-means model searches for $k$ centroids in data space; it is essentially discrete and generally NP-hard; it is suitable for globular clusters (2-norm).

Lloyd Algorithm, a.k.a. the K-means Algorithm [5] (or Lloyd-Forgy):
  Select k points as the initial centroids;
  while not converged do
    (1) assign each point to the closest centroid;
    (2) recompute the centroid for each cluster.
  end while
The algorithm converges to a local minimum at $O(dkn)$ cost per iteration; it is sensitive to initialization due to its greedy nature; it requires (costly) multiple restarts for consistency. A minimal sketch appears below.
[5] Lloyd, Least Squares Quantization in PCM, 1957.
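
A minimal numpy sketch of the Lloyd iteration above (not the authors' code); initialization and empty-cluster handling are simplified assumptions.

```python
# Sketch: one Lloyd run from given initial centroids.
import numpy as np

def lloyd(M, centers, max_iter=100):
    """M: n-by-d data; centers: k-by-d initial centroids."""
    for _ in range(max_iter):
        # (1) assign each point to its closest centroid: O(dkn) work
        d2 = ((M[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # (2) recompute each centroid as the mean of its cluster
        new = np.array([M[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):            # converged to a local minimum
            break
        centers = new
    return labels, centers
```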

(Non-uniform) Random Initialization: K-means++ [6]:
1. Randomly select a data point in M as the first centroid.
2. Randomly select a new centroid in M so that it is probabilistically far away from the existing centroids.
3. Repeat step 2 until k centroids are selected.
Advantages: achieves an $O(\log k)$-approximation of the optimal value in expectation; much better than uniformly random seeding in practice.
K-means Algorithm = K-means++/Lloyd: the method of choice. A seeding sketch follows below.
[6] Arthur and Vassilvitskii, k-means++: The Advantages of Careful Seeding, 2007.
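
A short numpy sketch of the K-means++ seeding just described: each new center is drawn with probability proportional to the squared distance to its nearest existing center (the standard D² rule, which realizes "probabilistically far away").

```python
# Sketch: K-means++ seeding (D-squared sampling).
import numpy as np

def kmeanspp_seeds(M, k, rng=np.random.default_rng()):
    centers = [M[rng.integers(len(M))]]          # step 1: uniform first center
    for _ in range(k - 1):                       # steps 2-3
        d2 = np.min(((M[:, None, :] - np.array(centers)[None, :, :]) ** 2)
                    .sum(-1), axis=1)            # squared distance to nearest center
        p = d2 / d2.sum()
        centers.append(M[rng.choice(len(M), p=p)])
    return np.array(centers)                     # k-by-d seeds for Lloyd
```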

Convex Relaxation Models.
LP Relaxation [7]: $O(n^2)$ variables.
SDP Relaxation [8]: $O(n^2)$ variables.
Sum-of-Norms Relaxation [9]: $O(n^2)$ evaluation cost,
$$\min_{x_1,\ldots,x_n} \; \sum_{i=1}^n \|m_i - x_i\|^2 + \gamma \sum_{i<j} \|x_i - x_j\|,$$
where the second term is sparsity-promoting.
Due to their high computational cost, fully convexified models/methods have not shaken the dominance of K-means. One needs to keep the same $O(n)$ per-iteration cost as K-means.
[7] Awasthi et al., Relax, No Need to Round: Integrality of Clustering Formulations, 2015. [8] Peng and Xia, Approximating k-means-type Clustering via Semidefinite Programming, 2007. [9] Lindsten et al., Just Relax and Come Clustering! A Convexification of k-means Clustering, 2011.

Why another new model/algorithm? A Revealing Example.
Synthetic data: generate $k$ centers with mutual distances $= 2$; create a cloud around each center within radius $< 1$. The clusters are globular and perfectly separated, with ground truth known.
Experiment: pre-processing with $k$ principal components; clustering with K-means and with KindAP (new); $k$ increases from 10 to 150.

Recovery Percentage. Figure: accuracy (%) vs. number of clusters $k$ for Lloyd (radius 0.33, 0.66, 0.99) and KindAP (radius 0.99), with $[n, d, k] = [40k, 300, k]$. Such behavior hinders K-means in big-data applications.

Speed. Figure: time (s, log scale) vs. number of clusters $k$ for Lloyd (radius 0.33, 0.66, 0.99) and KindAP (radius 0.99), with size $[n, d, k] = [40k, 300, k]$. KindAP runs faster than Lloyd (1 run) with GPU acceleration.


Ideal Data. Ideal K-rays data contains $n$ points lying on $k$ rays in $\mathbb{R}^d$. After permutation:
$$\hat{M} \;=\; \begin{bmatrix} h_1 p_1^T \\ h_2 p_2^T \\ \vdots \\ h_k p_k^T \end{bmatrix}_{n \times d} \;=\; \underbrace{\begin{bmatrix} h_1 & & & \\ & h_2 & & \\ & & \ddots & \\ & & & h_k \end{bmatrix}}_{n \times k} \; \underbrace{\begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_k^T \end{bmatrix}}_{k \times d}$$
where $p_1, \ldots, p_k \in \mathbb{R}^d$ are ray vectors and $h_i \in \mathbb{R}^{n_i}$ are positive vectors. Each data vector (a row of $\hat{M}$) is a positive multiple of a ray vector. Ideal K-points data are special cases of ideal K-rays data. A generator sketch follows below.
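
A small numpy sketch generating ideal K-rays data as defined above, where each point is a positive multiple of one of $k$ ray vectors; the scaling range and random rays are illustrative assumptions.

```python
# Sketch: ideal K-rays data generator.
import numpy as np

def k_rays_data(n, d, k, rng=np.random.default_rng()):
    P = rng.standard_normal((k, d))              # ray vectors p_1, ..., p_k
    labels = rng.integers(k, size=n)             # ray membership per point
    h = rng.uniform(0.5, 2.0, size=n)            # positive scalings (assumed range)
    M_hat = h[:, None] * P[labels]               # row i = h_i * p_{label_i}^T
    return M_hat, labels
```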

k Indicators. The indicator matrix $\hat{H} \geq 0$ contains $k$ indicators (orthogonal columns): $[\hat{H}]_{ij} > 0 \iff$ point $i$ is on ray $j$. The $j$-th column of $\hat{H}$ is an indicator for cluster $j$. E.g., $\hat{H} \in \mathbb{R}^{6 \times 2}$ consists of two indicators,
$$\hat{H}e_1 = (3, 2, 1, 0, 0, 0)^T, \qquad \hat{H}e_2 = (0, 0, 0, 2, 2, 2)^T:$$
points 1-3 are in cluster 1, and points 4-6 (identical) are in cluster 2.

Range Relationship. For ideal K-rays data, the range spaces of $\hat{M}$ and $\hat{H}$ are the same: $\mathcal{R}(\hat{M}) = \mathcal{R}(\hat{H})$. Consider data $M = \hat{M} + E$ with a small perturbation $E$; then
$$\mathcal{R}(U_{[k]}) \subseteq \mathcal{R}(M), \qquad \mathcal{R}(U_{[k]}) \approx \mathcal{R}(\hat{M}) = \mathcal{R}(\hat{H}),$$
where $U_{[k]} \in \mathbb{R}^{n \times k}$ contains the $k$ leading left singular vectors of $M$. The approximations become exact as $E$ vanishes.

Two Choices of Subspace Distance. Minimizing the distance between two subspaces:
K-means choice: $\mathrm{dist}(\mathcal{R}(U_{[k]}), \mathcal{R}(H)) = \|(I - P_{\mathcal{R}(H)})\, U_{[k]}\|_F$, where $P_{\mathcal{R}(H)}$ is the orthogonal projection onto $\mathcal{R}(H)$.
Our choice: $\mathrm{dist}(\mathcal{R}(U_{[k]}), \mathcal{R}(H)) = \min_{U \in \mathcal{U}_o} \|U - H\|_F$, where $\mathcal{U}_o$ is the set of all orthonormal bases for $\mathcal{R}(U_{[k]})$. A numerical sketch of both distances follows below.
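
A numpy sketch comparing the two distances above for orthonormal $U_{[k]}$ and a given $H$ with linearly independent columns; the second distance is an orthogonal Procrustes problem, solved by the same $k \times k$ SVD used later for the projection onto $\mathcal{U}_o$.

```python
# Sketch: the K-means vs. K-indicators subspace distances.
import numpy as np

def dist_kmeans(Uk, H):
    """||(I - P_R(H)) U_[k]||_F, with P_R(H) built from a QR basis of H."""
    Q, _ = np.linalg.qr(H)
    return np.linalg.norm(Uk - Q @ (Q.T @ Uk))

def dist_kindicators(Uk, H):
    """min over orthonormal bases U = U_[k] Z of ||U - H||_F (Procrustes)."""
    P, _, Qt = np.linalg.svd(Uk.T @ H)       # k-by-k SVD of U_[k]^T H
    U = Uk @ (P @ Qt)                        # the minimizing basis
    return np.linalg.norm(U - H)
```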

Two Models.
K-means model (a matrix version [10]):
$$\min_H \;\|(I - HH^T)\, U_{[k]}\|_F^2, \quad \text{s.t. } H \in \mathcal{H},$$
where the objective is highly non-convex in $H$.
K-indicators model:
$$\min_{U,H} \;\|U - H\|_F^2, \quad \text{s.t. } U \in \mathcal{U}_o,\ H \in \mathcal{H},$$
where the objective is convex, and
$$\mathcal{U}_o = \left\{ U_{[k]} Z \in \mathbb{R}^{n \times k} : Z^T Z = I \right\}, \qquad \mathcal{H} = \left\{ H \in \mathbb{R}^{n \times k}_+ : H^T H = I \right\}.$$
The K-indicators model looks more tractable.
[10] Boutsidis et al., Unsupervised Feature Selection for the k-means Clustering Problem, NIPS, 2009.

The K-indicators Model is Geometric.
$$\min_{U,H} \;\|U - H\|_F^2, \quad \text{s.t. } U \in \mathcal{U}_o,\ H \in \mathcal{H} \qquad (1)$$
is to find the distance between two nonconvex sets, $\mathcal{U}_o$ and $\mathcal{H}$. $\mathcal{H}$ is nasty, permitting no easy projection; $\mathcal{U}_o$ is milder, being easily projectable. Projection onto $\mathcal{U}_o$: with the $k$-by-$k$ SVD $U_{[k]}^T X = PDQ^T$,
$$P_{\mathcal{U}_o}(X) = U_{[k]}\,(PQ^T).$$

Partial Convexification. An intermediate problem:
$$\min_{U,N} \;\|U - N\|_F^2, \quad \text{s.t. } U \in \mathcal{U}_o,\ N \in \mathcal{N}, \qquad (2)$$
where $\mathcal{N} = \{N \in \mathbb{R}^{n \times k} : N \geq 0\}$ is a closed convex set. Reasons: $\mathcal{N}$ is convex, with $P_{\mathcal{N}}(X) = \max(0, X)$; the boundary of $\mathcal{N}$ contains $\mathcal{H}$. A local minimum of (2) can be found by alternating projection.

KindAP Algorithm. A 2-level alternating-projections algorithmic scheme. Figure: (1) alternating projection; (2) projection-like operator; (3) projection.

Uncertainty Information. The magnitude of $N_{ij}$ reflects the likelihood of point $i$ belonging to cluster $j$. Let $\hat{N}$ hold the elements of $N$ sorted row-wise in descending order, and define the soft indicator vector
$$s_i = 1 - \hat{N}_{i2} / \hat{N}_{i1} \in [0, 1], \qquad i = 1, \ldots, n.$$
Intuition: the closer $s_i$ is to 0, the more uncertainty there is in the assignment; the closer $s_i$ is to 1, the less uncertainty there is. A computation sketch follows below.
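
A one-function numpy sketch of the soft-indicator computation just defined, comparing the two largest entries in each row of $N$; the small floor is an assumption to guard against all-zero rows.

```python
# Sketch: soft indicators s_i = 1 - Nhat_i2 / Nhat_i1.
import numpy as np

def soft_indicators(N):
    Nh = -np.sort(-N, axis=1)                        # rows sorted descending
    return 1.0 - Nh[:, 1] / np.maximum(Nh[:, 0], 1e-12)
```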

Algorithm: KindAP.
Input: $M \in \mathbb{R}^{n \times d}$ and an integer $k \in (0, \min(d, n)]$.
  Compute the $k$ leading left singular vectors of $M$ to form $U_k$; set $U = U_k \in \mathcal{U}_o$.
  while not converged do
    Starting from $U$, find a minimizing pair $(U, N) \in \operatorname{argmin}\, \mathrm{dist}(\mathcal{U}_o, \mathcal{N})$.
    Find an $H \in \mathcal{H}$ close to $N$.
    $U \leftarrow P_{\mathcal{U}_o}(H)$.
  end while
Output: $U \in \mathcal{U}_o$, $N \in \mathcal{N}$, $H \in \mathcal{H}$. A rough sketch in code follows below.
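
A rough numpy sketch of this scheme: an inner alternating projection between $\mathcal{U}_o$ and $\mathcal{N}$, then a rounding step to $H$ and a restart from $P_{\mathcal{U}_o}(H)$. The stopping rules and the exact rounding step are assumptions, not the authors' implementation.

```python
# Sketch: 2-level alternating projections for KindAP (assumed details noted above).
import numpy as np

def proj_Uo(X, Uk):
    P, _, Qt = np.linalg.svd(Uk.T @ X)       # k-by-k SVD: U_[k]^T X = P D Q^T
    return Uk @ (P @ Qt)                     # P_Uo(X) = U_[k] (P Q^T)

def kindap(Uk, max_outer=20, max_inner=200, tol=1e-8):
    n, k = Uk.shape
    U, labels_old = Uk.copy(), None
    for _ in range(max_outer):
        for _ in range(max_inner):           # inner loop: alternate P_N and P_Uo
            N = np.maximum(U, 0.0)           # project onto N = {N >= 0}
            U_new = proj_Uo(N, Uk)
            done = np.linalg.norm(U_new - U) < tol
            U = U_new
            if done:
                break
        labels = N.argmax(axis=1)            # round N to an indicator H in H
        H = np.zeros((n, k))
        H[np.arange(n), labels] = 1.0
        H /= np.maximum(np.linalg.norm(H, axis=0), 1e-12)  # orthonormal columns
        U = proj_Uo(H, Uk)                   # restart the inner loop from P_Uo(H)
        if labels_old is not None and np.array_equal(labels, labels_old):
            break                            # assignments stabilized
        labels_old = labels
    return labels, U, N, H
```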


A typical run on synthetic data, $[d, n, k] = [500, 10000, 100]$:

```
***** radius 1.00 *****
Outer 1: 43  duh: 5.08712480e+00  idxchg: 9968
Outer 2: 13  duh: 3.02146591e+00  idxchg: 866
Outer 3:  2  duh: 3.02144864e+00  idxchg: 0
KindAP 100.00%  Elapsed time is 1.923610 seconds
Kmeans  88.66%  Elapsed time is 4.920525 seconds

***** radius 2.00 *****
Outer 1: 42  duh: 5.98082240e+00  idxchg: 9900
Outer 2:  3  duh: 5.55739995e+00  idxchg: 250
Outer 3:  2  duh: 5.55442009e+00  idxchg: 110
Outer 4:  2  duh: 5.55442000e+00  idxchg: 0
KindAP 100.00%  Elapsed time is 1.797684 seconds
Kmeans  82.73%  Elapsed time is 7.909552 seconds
```

Example 1: three tangential circular clusters in 2-D. A: synthetic data, $[n, d, k] = [7500, 2, 3]$. Figure: points colored according to their soft indicator values.

Example 2: groups of 40 faces from 4 persons (ORL data). B: ORL data, $[n, d, k] = [40, 4096, 4]$. Figure: five soft indicators sorted in descending order. Group 1: Mean(s) = 0.98, AC = 100.00%; Group 2: Mean(s) = 0.88, AC = 90.00%; Group 3: Mean(s) = 0.81, AC = 80.00%; Group 4: Mean(s) = 0.77, AC = 70.00%; Group 5: Mean(s) = 0.73, AC = 60.00%.

K-rays Data Example. Figure: original data, result of Lloyd, and result of KindAP (first vs. second dimension); points are colored according to their clusters.

Kernel Data Example. Figure: non-globular clusters in 2-D. Panels: Two spirals (Mean(s) = 0.99961); Cluster in cluster (Mean(s) = 0.99586); Corners (Mean(s) = 0.99964); Half-kernel (Mean(s) = 0.99999); Crescent & Full Moon (Mean(s) = 1); Outlier (Mean(s) = 1).

Real Examples: two popular datasets with large k.
COIL image dataset: k = 100 objects, each having 72 different images; n = 7200 images; d = 1024 pixels (image size 32 × 32).
TDT2 document dataset: k = 96 different usenet newsgroups; n = 10212 documents; d = 36771 words.
Pre-processing: kNN + normalized Laplacian + spectral clustering.

Algorithms Tested. Three algorithms compared:
KindAP;
KindAP+L: run 1 Lloyd starting from the KindAP centers;
Lloyd10000: K-means with 10000 replications.
K-means code: kmeans in the Matlab Statistics and Machine Learning Toolbox (R2016a, with GPU acceleration).
Computer: a desktop with an Intel Core i7-3770 CPU at 3.4 GHz and 8 GB RAM.

K-means Objective Value. Figure: square error vs. number of clusters k for KindAP, KindAP+L, and Lloyd (10,000 runs). A: TDT2, $[n, d, k] = [10212, 36771, 96]$ (KindAP+L best); B: COIL100, $[n, d, k] = [72k, 1024, k]$ (KindAP+L best).

Clustering Accuracy. Figure: accuracy (%) vs. number of clusters k for KindAP, KindAP+L, and Lloyd (10,000 runs). C: TDT2, $[n, d, k] = [10212, 36771, 96]$ (KindAP+L best); D: COIL100, $[n, d, k] = [72k, 1024, k]$ (KindAP best). The K-means (K-indicators) model fits TDT2 (COIL100) better.

Running Time (average time for Lloyd). Figure: time (s) vs. number of clusters k for KindAP, KindAP+L, and Lloyd (average time per run). E: TDT2, $[n, d, k] = [10212, 36771, 96]$; F: COIL100, $[n, d, k] = [72k, 1024, k]$. One KindAP run costs about as much as one Lloyd run.

Visualization on the Yale Face Dataset: $n = 165$, $k = 16$ ($k = 15$). Pipeline: $M \to M_k = U_k \Sigma_k V_k^T$; $U_k \to$ KindAP $\to (U, N, H)$; $W = M_k^T U$. Figure: A: $V_k$. B: $W$. C-E: a face $V_k c_1 = W c_2$, where $c_1$ is a row of $U_k \Sigma_k$ and $c_2$ is a row of $U$ (a "face" in cluster 16).

Summary. Contribution:

Property                   K-indicators   K-means
parameter-free             yes            yes
O(n) cost per iteration    yes            yes
non-greedy                 yes            no
no replications needed     yes            no
suitable for big-k data    yes            no
posterior info available   yes            no

An enhanced infrastructure for unsupervised learning.
Further work: global optimality under favorable conditions (already proven for ideal data).

Thank you!