Clustering in R^d: widely-used clustering methods and the k-means optimization problem (CSE 250B)

Clustering in R^d

Two common uses of clustering:
- Vector quantization: find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional space.
- Finding meaningful structure in data: finding salient groupings in the data.

Widely-used clustering methods:
1. K-means and its many variants
2. EM for mixtures of Gaussians
3. Agglomerative hierarchical clustering

The k-means optimization problem

Input: points x_1, ..., x_n in R^d; an integer k.
Output: centers, or representatives, μ_1, ..., μ_k in R^d.
Goal: minimize the average squared distance between points and their nearest representatives:

$$\mathrm{cost}(\mu_1, \dots, \mu_k) = \sum_{i=1}^{n} \min_{j} \|x_i - \mu_j\|^2$$

The centers carve R^d into k convex regions: μ_j's region consists of the points for which μ_j is the closest center.
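As a quick illustration, here is a minimal sketch of the cost function above, in Python with NumPy (not part of the original slides): each point is charged the squared distance to its nearest center.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Sum over points of the squared distance to the nearest center."""
    # dists[i, j] = ||x_i - mu_j||^2 for every point/center pair
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).sum()
```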

Lloyd's k-means algorithm

The k-means objective is an NP-hard optimization problem. Heuristic: the k-means (Lloyd's) algorithm.
- Initialize centers μ_1, ..., μ_k in some manner.
- Repeat until convergence:
  - Assign each point to its closest center.
  - Update each μ_j to the mean of the points assigned to it.
Each iteration reduces the cost, so the algorithm converges to a local optimum (a code sketch appears at the end of this section).

Initializing the k-means algorithm

Initialization matters. Typical practice: choose k data points at random as the initial centers. Another common trick: start with extra centers, then prune later.

A particularly good initializer: k-means++
- Pick a data point x at random as the first center; let C = {x} (the centers chosen so far).
- Repeat until the desired number of centers is attained:
  - Pick a data point x at random from the distribution Pr(x) ∝ dist(x, C)², where dist(x, C) = min_{z∈C} ||x − z||.
  - Add x to C.

Representing images using k-means codewords

How can a collection of images be represented as fixed-length vectors?
- Take all l × l patches (of fixed size) in all images, and extract features for each.
- Run k-means on this entire collection to get k centers (codewords).
- Associate any image patch with its nearest center.
- Represent an image by a histogram over {1, 2, ..., k}.
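The following is a compact sketch of Lloyd's algorithm with k-means++ initialization, written in NumPy; the function names and defaults are my own, not from the slides.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++: pick each new center with probability proportional to dist(x, C)^2."""
    centers = [X[rng.integers(len(X))]]            # first center: uniformly at random
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center.
        assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break                                   # converged to a local optimum
        centers = new_centers
    return centers, assign
```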

Streaming and online computation

Streaming computation: for data sets too large to fit in memory. Make one pass (or maybe a few passes) through the data. On each pass:
- See data points one at a time, in order.
- Update models/parameters along the way.
There is only enough space to store a tiny fraction of the data, or perhaps a short summary.

Online computation: an even more lightweight setup, for data that is continuously being collected. Initialize a model; then repeat forever: see a new data point, and update the model if need be.

Example: sequential k-means
1. Set the centers μ_1, ..., μ_k to the first k data points.
2. Set their counts to n_1 = n_2 = ... = n_k = 1.
3. Repeat, possibly forever: get the next data point x, let μ_j be the center closest to x, and update μ_j and n_j (see the code sketch below):

$$\mu_j \leftarrow \frac{n_j \mu_j + x}{n_j + 1}, \qquad n_j \leftarrow n_j + 1$$

K-means: the good and the bad

The good: fast and easy; effective for quantization.
The bad: geared towards data in which the clusters are spherical and of roughly the same radius.
Is there a similarly simple algorithm that accommodates clusters of more general shape?

Mixtures of Gaussians

Each of the k clusters is specified by:
- a Gaussian distribution P_j = N(μ_j, Σ_j)
- a mixing weight π_j
The overall distribution over R^d is a mixture of Gaussians:

$$\Pr(x) = \pi_1 P_1(x) + \cdots + \pi_k P_k(x)$$
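A minimal sketch of the sequential k-means update above (an assumed NumPy implementation, not from the slides): each center is a running mean of the points routed to it.

```python
import numpy as np

class SequentialKMeans:
    """Online k-means: centers are running means of the points assigned to them."""
    def __init__(self, first_k_points):
        self.centers = np.array(first_k_points, dtype=float)  # mu_1..mu_k = first k points
        self.counts = np.ones(len(self.centers))              # n_1 = ... = n_k = 1

    def update(self, x):
        j = np.argmin(((self.centers - x) ** 2).sum(axis=1))  # closest center to x
        # mu_j <- (n_j * mu_j + x) / (n_j + 1);  n_j <- n_j + 1
        self.centers[j] = (self.counts[j] * self.centers[j] + x) / (self.counts[j] + 1)
        self.counts[j] += 1
```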

The clustering task

We are given data x_1, ..., x_n in R^d. For any mixture model π_1, ..., π_k, P_1 = N(μ_1, Σ_1), ..., P_k = N(μ_k, Σ_k), the likelihood of the data is

$$\Pr(\text{data} \mid \pi_1 P_1 + \cdots + \pi_k P_k) = \prod_{i=1}^{n} \big(\pi_1 P_1(x_i) + \cdots + \pi_k P_k(x_i)\big) = \prod_{i=1}^{n} \sum_{j=1}^{k} \frac{\pi_j}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)\Big)$$

Optimization surface

Equivalently, minimize the negative log-likelihood,

$$-\sum_{i=1}^{n} \ln \sum_{j=1}^{k} \frac{\pi_j}{(2\pi)^{d/2}\,|\Sigma_j|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)\Big)$$

That is, find the maximum-likelihood mixture of Gaussians: the parameters {π_j, μ_j, Σ_j : j = 1, ..., k} maximizing the likelihood.

The EM algorithm

1. Initialize π_1, ..., π_k and P_1 = N(μ_1, Σ_1), ..., P_k = N(μ_k, Σ_k).
2. Repeat until convergence:
   - E-step: assign each point x_i fractionally between the k clusters:

     $$w_{ij} = \Pr(\text{cluster } j \mid x_i) = \frac{\pi_j P_j(x_i)}{\sum_{l} \pi_l P_l(x_i)}$$

   - M-step: update the mixing weights, means, and covariances (see the code sketch below):

     $$\pi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij}, \qquad \mu_j = \frac{1}{n\pi_j}\sum_{i=1}^{n} w_{ij}\, x_i, \qquad \Sigma_j = \frac{1}{n\pi_j}\sum_{i=1}^{n} w_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T$$

Example

[Figure: data with 3 clusters, each with 100 points; the panel shows the k-means solution.]
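A sketch of one EM iteration for a Gaussian mixture, following the updates above (assumed Python with NumPy/SciPy; a bare-bones illustration, not a robust implementation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    """One EM iteration: E-step computes soft assignments, M-step re-fits parameters."""
    n, k = len(X), len(pis)
    # E-step: w_ij proportional to pi_j * P_j(x_i), normalized over clusters.
    W = np.column_stack([pis[j] * multivariate_normal.pdf(X, mus[j], Sigmas[j])
                         for j in range(k)])
    W /= W.sum(axis=1, keepdims=True)
    # M-step: update mixing weights, means, and covariances.
    for j in range(k):
        nj = W[:, j].sum()
        pis[j] = nj / n
        mus[j] = W[:, j] @ X / nj
        diff = X - mus[j]
        Sigmas[j] = (W[:, j, None] * diff).T @ diff / nj
    return pis, mus, Sigmas
```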

Example

[Figure: data with 3 clusters, each with 100 points; the panel shows the EM solution for a mixture of Gaussians.]

Hierarchical clustering

Choosing the number of clusters (k) is difficult. Often there is no single right answer, because of multiscale structure. Hierarchical clustering avoids these problems. Example: gene expression data.

The single linkage algorithm

- Start with each point in its own, singleton, cluster.
- Repeat until there is just one cluster: merge the two clusters with the closest pair of points.
- Disregard singleton clusters.

(see the code sketch below)
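A naive sketch of single linkage (assumed Python/NumPy; cubic time, for illustration only — real implementations such as scipy.cluster.hierarchy.linkage are far more efficient):

```python
import numpy as np

def single_linkage(X):
    """Repeatedly merge the two clusters containing the closest pair of points."""
    clusters = [[i] for i in range(len(X))]            # each point starts as its own cluster
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1)) # all pairwise distances
    merges = []
    while len(clusters) > 1:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].min()  # closest pair of points
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges                                      # the merge history (a dendrogram)
```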

Linkage methods

- Start with each point in its own, singleton, cluster.
- Repeat until there is just one cluster: merge the two closest clusters.

How do we measure the distance between two clusters C and C'? Three commonly-used variants:

1. Average linkage: the average pairwise distance between points in the two clusters,

$$\mathrm{dist}(C, C') = \frac{1}{|C|\,|C'|} \sum_{x \in C} \sum_{x' \in C'} \|x - x'\|$$

2. Distance between cluster centers:

$$\mathrm{dist}(C, C') = \|\mathrm{mean}(C) - \mathrm{mean}(C')\|$$

3. Ward's method: the increase in k-means cost occasioned by merging the two clusters,

$$\mathrm{dist}(C, C') = \frac{|C|\,|C'|}{|C| + |C'|}\,\|\mathrm{mean}(C) - \mathrm{mean}(C')\|^2$$

Two further classical choices are single linkage and complete linkage:

$$\mathrm{dist}(C, C') = \min_{x \in C,\, x' \in C'} \|x - x'\|, \qquad \mathrm{dist}(C, C') = \max_{x \in C,\, x' \in C'} \|x - x'\|$$

(see the helper functions below)
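The cluster distances above, written as small NumPy helpers for clusters stored as point arrays (my own naming, a sketch rather than a library API):

```python
import numpy as np

def pairwise(C, Cp):
    # All distances ||x - x'|| between points of the two clusters.
    return np.linalg.norm(C[:, None] - Cp[None], axis=2)

def average_linkage(C, Cp):
    return pairwise(C, Cp).mean()

def centers_distance(C, Cp):
    return np.linalg.norm(C.mean(axis=0) - Cp.mean(axis=0))

def single_link(C, Cp):
    return pairwise(C, Cp).min()

def complete_link(C, Cp):
    return pairwise(C, Cp).max()

def ward(C, Cp):
    # Increase in k-means cost caused by merging C and C'.
    n, m = len(C), len(Cp)
    return n * m / (n + m) * centers_distance(C, Cp) ** 2
```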