Gaussian Mixture Models For Clustering Data. Soft Clustering and the EM Algorithm


K-Means Clustering
Input: observations x_i ∈ R^d, i ∈ {1, ..., N}, and the number of clusters k.
Output: cluster assignments and cluster centroids μ_j ∈ R^d, j ∈ {1, ..., k}.

K-Means Clustering
Let z_i be a binary vector of dimension k associated with each observation. If the i-th observation belongs to the j-th cluster, then z_ij = 1 and all other components of z_i are zero. Thus z_i can be viewed as a cluster label vector for each observation: a 1-of-k representation of the cluster assignment.
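As a tiny illustration (made-up values, not from the slides), with k = 3 an observation assigned to the second cluster has z = [0, 1, 0]:

import numpy as np

k = 3                        # number of clusters
assigned = 1                 # 0-based index of the assigned cluster (the second cluster)
z = np.zeros(k, dtype=int)
z[assigned] = 1              # 1-of-k label vector: array([0, 1, 0])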

K-Means Clustering
We can now cast k-means as a minimization problem with the objective function
J = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ||x_n − μ_k||²   (squared distance)
We need to find the z_nk and μ_k that minimize J.

K-Means Clustering
Minimizing J with respect to z_nk: the N points are independent, so each one can be optimized separately. Choose z_nk to be 1 for whichever k gives the minimum squared distance, i.e. assign the current observation to the nearest cluster center.
Minimizing J with respect to μ_k: take the derivative of J with respect to μ_k and set it to zero, giving
μ_k = (Σ_{n=1}^{N} z_nk x_n) / (Σ_{n=1}^{N} z_nk)

Finally! Iterative algorithm for K-Means:
Initialize the k centroids.
Repeat until convergence: calculate the assignments z_nk, then update the centroids μ_k.
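A minimal NumPy sketch of this loop (illustrative only: the function name, the initialization from random observations, and the simple convergence test are my own choices, not from the slides):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: X is (N, d); returns centroids (k, d) and hard labels (N,)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct random observations.
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: give each point to its nearest centroid (this sets z_nk).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, k) squared distances
        labels = d2.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its assigned points.
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # stop when the centroids no longer move
            break
        mu = new_mu
    return mu, labels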

Example: looking for two clusters. Initialize the centroids (the blue and red crosses in the figure).

Example: calculate the cluster assignments.

Example: re-calculate the centroids based on the new cluster assignments.

(Figures: successive iterations of the algorithm. Each iteration computes new assignments z_nk and new centroids μ_k; the blue and red centroids move at each step until, by the third iteration, the final result is reached.)

Minimizing the Objective Function
(Figure: J, the value of the objective function, plotted against the iteration number; the value decreases monotonically as the iterations proceed.)

Terminology
Model parameters (θ = {μ_1..k}): the centroids.
Complete data (y = {x_1..n, z_1..n}): the observations along with the cluster assignments.
Incomplete data (x_1..n): only the observations.
In clustering problems we only have incomplete data and need to find estimates for both θ and z_1..n.

Dissecting the K-Means Algorithm
(Diagram: initialize model parameters → θ^(old) → estimate cluster assignments z_nk → estimate new model parameters → θ^(new), which feeds back in.)
There is a cyclic dependency between the model parameters and the cluster assignments. Initially we guess the model parameters. Based on this guess we estimate new cluster assignments, which in turn affect the model parameters, which are then re-estimated. Estimating the assignments is the E-step; re-estimating the parameters is the M-step.

Another Problem
K-Means makes hard guesses for the cluster assignment. In some cases the model may not be sure about the exact cluster an observation belongs to. Can we make this probabilistic, so that z_nk gives the probability that the n-th observation belongs to the k-th cluster?

Probabilistic Clustering
Let's place a Gaussian centered at each of the means discovered by K-Means (assume we know the covariances). Since we have run the k-means algorithm we have access to the complete data, i.e. y = {x_1..n, z_1..n}. The probability of the complete data (the complete-data likelihood) is
P(X, Z | θ) = Π_n Π_k {π_k N(x_n | μ_k, Σ_k)}^{z_nk}

Probabilistic Clustering
We don't know the values of Z for our data; they are missing/hidden/latent. We need to get rid of Z to calculate the data likelihood by marginalizing it out:
P(X | θ) = Σ_Z P(X, Z | θ)
Let's see what happens to the complete-data likelihood when we marginalize out Z. For a single observation,
Σ_{z_i} P(x_i, z_i | θ) = Σ_{z_i} Π_k {π_k N(x_i | μ_k, Σ_k)}^{z_ik}
P(x_i | θ) = Σ_k π_k N(x_i | μ_k, Σ_k)

Gaussian Mixture Model
Data are generated from a mixture distribution:
P(x) = Σ_{k=1}^{K} π_k N(x | μ_k, Σ_k)
a linear superposition of K Gaussians, with the constraints 0 ≤ π_k ≤ 1 and Σ_{k=1}^{K} π_k = 1 (a multinomial distribution over components).
Generating data: pick one of the Gaussians at random with probability π_k, then sample a value from the Gaussian centered at μ_k.
Parameters of the GMM: θ = {π_1..k, μ_1..k, Σ_1..k}.
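To make the generative story concrete, here is a small NumPy sketch (the function, the 2-D setup, the mixing weights, and the covariances are illustrative assumptions; the three means mirror the example used later in the slides):

import numpy as np

def sample_gmm(n, pis, mus, sigmas, seed=0):
    """Draw n samples from a GMM with mixing weights pis, means mus, covariances sigmas."""
    rng = np.random.default_rng(seed)
    # Step 1: choose a component for each sample (multinomial over pi_k).
    comps = rng.choice(len(pis), size=n, p=pis)
    # Step 2: sample each value from the chosen component's Gaussian.
    X = np.array([rng.multivariate_normal(mus[c], sigmas[c]) for c in comps])
    return X, comps

# Illustrative 2-D example with three components.
pis = [0.3, 0.3, 0.4]
mus = [np.array([-1.0, -3.0]), np.array([-3.0, -3.0]), np.array([-4.75, -3.0])]
sigmas = [0.3 * np.eye(2)] * 3
X, true_comp = sample_gmm(500, pis, mus, sigmas)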

Estimating the Parameters
We want to estimate the model parameters such that the probability of the data being generated by the model is maximized:
θ* = arg max_θ P(X | θ), which is equivalent to θ* = arg max_θ log P(X | θ).
Let's apply this to our incomplete-data likelihood:
P(X | θ) = Π_n Σ_k π_k N(x_n | μ_k, Σ_k)
log P(X | θ) = Σ_n log ( Σ_k π_k N(x_n | μ_k, Σ_k) )
STUCK!! The logarithm acts on a sum over components, so the parameters do not decouple and there is no closed-form solution.

Estimating the Parameters
To make things a bit simpler, assume we know Z. Now we can maximize the complete-data log likelihood to estimate the model parameters:
P(X, Z | θ) = Π_n Π_k {π_k N(x_n | μ_k, Σ_k)}^{z_nk}
log P(X, Z | θ) = Σ_n Σ_k z_nk [ log π_k + log N(x_n | μ_k, Σ_k) ]
This is much easier to work with: the parameters are decoupled and can be maximized separately.

Estimating the Parameters
If we maximize the complete-data log likelihood we get the following estimates, where N_k = Σ_n z_nk:
μ_k = (Σ_n z_nk x_n) / (Σ_n z_nk) = (Σ_n z_nk x_n) / N_k
Σ_k = (1 / N_k) Σ_n z_nk (x_n − μ_k)(x_n − μ_k)^T
π_k = N_k / N
Are we done? What about the z_nk? We assumed they are known, but they are not!
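As a brief sketch of the step behind the μ_k estimate (not spelled out on the slide), differentiate the complete-data log likelihood with respect to μ_k and set it to zero:

\frac{\partial}{\partial \mu_k} \sum_{n}\sum_{j} z_{nj}\big[\log \pi_j + \log \mathcal{N}(x_n \mid \mu_j, \Sigma_j)\big]
  = \sum_{n} z_{nk}\, \Sigma_k^{-1}(x_n - \mu_k) = 0
\;\Longrightarrow\;
\mu_k = \frac{\sum_{n} z_{nk}\, x_n}{\sum_{n} z_{nk}} = \frac{1}{N_k}\sum_{n} z_{nk}\, x_n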

What about z_nk?
(Diagram, as before: initialize model parameters → θ^(old) → estimate cluster assignments z_nk → estimate new model parameters → θ^(new).)
Recall the game we played with k-means: guess the parameters, then estimate the z_nk. Fixing the parameters at some values, we now get a distribution over the missing Z, i.e. P(Z | X, θ). But this is a distribution; how do we get individual values for z_nk?

What about z_nk?
Let's evaluate the expected value of each z_nk under P(Z | X, θ):
E_{P(Z|X,θ)}[z_nk] = 1 · P(z_nk = 1 | x_n, θ_k) + 0 · P(z_nk = 0 | x_n, θ_k) = P(z_nk = 1 | x_n, θ_k)
Using Bayes' theorem:
P(z_nk = 1 | x_n, θ_k) = P(x_n | z_nk = 1, θ_k) P(z_nk = 1 | θ_k) / P(x_n | θ_k)
Here P(z_nk = 1 | x_n, θ_k) is the probability that the k-th component was chosen to generate x_n, P(x_n | z_nk = 1, θ_k) is the probability of generating x_n from the k-th component, and P(x_n | θ_k) is the incomplete-data likelihood for x_n.

What about z_nk?
So,
E_{P(Z|X,θ)}[z_nk] = π_k N(x_n | μ_k, Σ_k) / Σ_j π_j N(x_n | μ_j, Σ_j) = γ(z_nk)
This quantity can be viewed as the responsibility that the k-th component takes for explaining the observation x_n. Finally, we substitute this value for z_nk in our parameter estimates as our best guess for z_nk given the current model parameters.

EM for GMM-based clustering
1. Initialize the model parameters θ^(0).
2. E-step: evaluate the responsibilities using the current parameter estimates:
   γ(z_nk) = π_k N(x_n | μ_k, Σ_k) / Σ_j π_j N(x_n | μ_j, Σ_j)
3. M-step: re-estimate the parameters using the current responsibilities:
   μ_k = (Σ_n γ(z_nk) x_n) / N_k
   Σ_k = (1 / N_k) Σ_n γ(z_nk) (x_n − μ_k)(x_n − μ_k)^T
   π_k = N_k / N
   where N_k = Σ_n γ(z_nk).
4. If the convergence criterion is not satisfied, go back to step 2.
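A compact NumPy sketch of these four steps (illustrative only: it assumes scipy.stats.multivariate_normal for the Gaussian density, my own initialization scheme, and a log-likelihood-based stopping rule):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=200, tol=1e-6, seed=0):
    """EM for a k-component GMM. Returns mixing weights, means, covariances, responsibilities."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # 1. Initialize: uniform weights, means at random observations, broad covariances.
    pis = np.full(k, 1.0 / k)
    mus = X[rng.choice(N, size=k, replace=False)]
    sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: responsibilities gamma(z_nk) = pi_k N(x_n | mu_k, Sigma_k) / sum_j (...).
        dens = np.column_stack([pis[j] * multivariate_normal.pdf(X, mus[j], sigmas[j])
                                for j in range(k)])              # (N, k)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # 3. M-step: re-estimate the parameters from the responsibilities.
        Nk = gamma.sum(axis=0)                                   # effective counts N_k
        mus = (gamma.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mus[j]
            sigmas[j] = (gamma[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
        pis = Nk / N
        # 4. Convergence check on the incomplete-data log likelihood.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, sigmas, gamma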

EM for GMM-based clustering
Convergence criterion: check the change in the parameter values, or calculate the incomplete-data log likelihood
log P(X | θ) = Σ_n log ( Σ_k π_k N(x_n | μ_k, Σ_k) )
and stop if its value has not changed from the previous iteration, or the change is negligible (below a preset tolerance).
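In practice the sum inside the logarithm is usually evaluated in log space to avoid underflow; a small sketch assuming SciPy's logsumexp (not something the slides use):

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, sigmas):
    """Incomplete-data log likelihood log P(X | theta), computed stably in log space."""
    # log_dens[n, k] = log pi_k + log N(x_n | mu_k, Sigma_k)
    log_dens = np.column_stack([np.log(pis[j]) + multivariate_normal.logpdf(X, mus[j], sigmas[j])
                                for j in range(len(pis))])
    return logsumexp(log_dens, axis=1).sum()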

Example
(Figure: scatter plot of the data.) How many clusters? The data were sampled from three Gaussians centered at [-1,-3], [-3,-3], and [-4.75,-3].

Example
(Figure: two panels, ground truth vs. K-Means.) Run K-Means with 5 random starting points and select the solution with the minimum sum of squared distances.

Example
(Figure: two panels, ground truth vs. contour plot of the final GMM.) Soft clustering using a three-component Gaussian mixture model with a random starting point.

Example
Original means: [-1,-3], [-3,-3], [-4.75,-3]
K-Means centroids: [-1.335, -4.57], [-1.581, -1.6458], [-4.3681, -3.49]
Means of the three Gaussians discovered by the GMM: [-1.6, -.9663], [-.9747, -.991], [-4.7488, -.9717]
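A comparable experiment can be reproduced with off-the-shelf tools; the sketch below assumes scikit-learn (not used in the slides) and made-up covariances for the three blobs:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Illustrative data: three Gaussian blobs roughly matching the example's means.
rng = np.random.default_rng(0)
means = np.array([[-1, -3], [-3, -3], [-4.75, -3]])
X = np.vstack([rng.multivariate_normal(m, 0.3 * np.eye(2), size=200) for m in means])

km = KMeans(n_clusters=3, n_init=5, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print("k-means centroids:\n", km.cluster_centers_)
print("GMM means:\n", gmm.means_)
print("soft assignment of the first point:", gmm.predict_proba(X[:1]))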

EM Algorithm
A very powerful method for probabilistic models that involve latent/missing variables. Each iteration of EM is guaranteed not to decrease the data log likelihood, and the algorithm converges to a local maximum. It is sensitive to the starting point. We have applied it to Gaussian mixture models, which can approximate arbitrarily shaped densities. Besides clustering, it can also be used for density estimation.