Logistics. CS 6140: Machine Learning Spring K-means Algorithm. Today's Outline 3/27/16



CS 6140: Machine Learning, Spring 2016
Instructor: Lu Wang, College of Computer and Information Science, Northeastern University
Webpage:

Logistics
Exam is next week, March 31!
Guideline: http:// courses/slides_cs6140_sp16/ exam_guideline.pdf
Office hour moved to Wednesday at 4:30pm-5:30pm?

Today's Outline
Mixture Models
Expectation Maximization

K-means Algorithm
Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype.
Initialize prototypes, then iterate between two phases:
E-step: assign each data point to the nearest prototype.
M-step: update prototypes to be the cluster means.
The simplest version is based on Euclidean distance.

[Some slides are borrowed from Christopher Bishop]
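
The two phases of K-means can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference implementation: the function name `kmeans`, the initialization (pick K data points at random), and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Plain K-means with Euclidean distance.

    points: list of equal-length tuples of floats.
    Returns (prototypes, assignments).
    """
    rng = random.Random(seed)
    # Initialization (an assumption): use k distinct data points as prototypes.
    prototypes = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(n_iters):
        # E-step: assign each data point to its nearest prototype.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, prototypes[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments unchanged, so the means would not move either
        assignments = new_assignments
        # M-step: move each prototype to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its old prototype (assumption)
                prototypes[j] = [sum(c) / len(members) for c in zip(*members)]
    return prototypes, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
prototypes, assignments = kmeans(points, k=2)
print(prototypes, assignments)
```

On well-separated data like this toy set, the loop typically stops after a handful of iterations because the assignments no longer change.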

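Responsibilities
Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters such that $\sum_k r_{nk} = 1$.
Example: 5 data points and 3 clusters.

K-means Cost Function
$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2$
where $x_n$ are the data, $r_{nk}$ the responsibilities, and $\mu_k$ the prototypes.

Minimizing the Cost Function
E-step: minimize $J$ w.r.t. $r_{nk}$; this assigns each data point to the nearest prototype.
M-step: minimize $J$ w.r.t. $\mu_k$; this gives $\mu_k = \sum_n r_{nk} x_n / \sum_n r_{nk}$, so each prototype is set to the mean of the points in that cluster.
Convergence is guaranteed since neither step increases $J$ and there is a finite number of possible settings for the responsibilities.

Limitations of K-means
Hard assignments of data points to clusters: a small shift of a data point can flip it to a different cluster.
Not clear how to choose the value of K.
Solution: replace the hard clustering of K-means with soft probabilistic assignments.
Represents the probability distribution of the data as a Gaussian mixture model.

The Gaussian Distribution
Multivariate Gaussian with mean $\mu$ and covariance $\Sigma$, for $x$ in $D$ dimensions:
$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \}$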
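
To make the convergence argument concrete, the sketch below tracks the K-means cost (the sum of squared distances from each point to its assigned prototype) after every E-step and M-step on a toy set of 5 points and 3 clusters. The data, the starting prototypes, and the helper names are made up for illustration; responsibilities are stored as one-hot rows.

```python
def sq_dist(p, mu):
    """Squared Euclidean distance between two points."""
    return sum((x - m) ** 2 for x, m in zip(p, mu))


def cost(points, prototypes, r):
    """Sum over n, k of r[n][k] * ||x_n - mu_k||^2."""
    return sum(
        r[n][k] * sq_dist(points[n], prototypes[k])
        for n in range(len(points))
        for k in range(len(prototypes))
    )


def e_step(points, prototypes):
    """One-hot responsibilities: each point goes to its nearest prototype."""
    r = []
    for p in points:
        nearest = min(range(len(prototypes)),
                      key=lambda k: sq_dist(p, prototypes[k]))
        r.append([1 if k == nearest else 0 for k in range(len(prototypes))])
    return r


def m_step(points, r, prototypes):
    """Each prototype becomes the mean of the points assigned to it."""
    updated = []
    for k, old in enumerate(prototypes):
        members = [p for p, r_n in zip(points, r) if r_n[k] == 1]
        if members:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            updated.append(old)  # empty cluster: keep old prototype (assumption)
    return updated


# Toy example: 5 data points, 3 clusters (values chosen for illustration).
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0), (9.0, 0.0)]
prototypes = [(0.0, 1.0), (4.0, 5.0), (8.0, 1.0)]
history = []
for _ in range(3):
    r = e_step(points, prototypes)
    history.append(cost(points, prototypes, r))   # cost after E-step
    prototypes = m_step(points, r, prototypes)
    history.append(cost(points, prototypes, r))   # cost after M-step
print(history)
```

The recorded costs never go up from one half-step to the next, which is the property the convergence argument relies on.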

Gaussian Mixtures
Linear superposition of Gaussians:
$p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$
Normalization and positivity require $\sum_{k=1}^{K} \pi_k = 1$ and $0 \le \pi_k \le 1$.
Can interpret the mixing coefficients $\pi_k$ as prior probabilities.

[Figure: Example: Mixture of 3 Gaussians]
[Figure: Contours of Probability Distribution]

Sampling from the Gaussian
To generate a data point: first pick one of the components with probability $\pi_k$, then draw a sample from that component.
Repeat these two steps for each new data point.

[Figure: Synthetic Data Set]

Fitting the Gaussian Mixture
We wish to invert this process: given the data set, find the corresponding parameters: mixing coefficients, means, covariances.
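
The two-step sampling procedure can be sketched as follows. To stay within the standard library, this example assumes one-dimensional components with standard deviations instead of full covariance matrices; `sample_gmm` and its parameters are hypothetical names chosen for the illustration.

```python
import random


def sample_gmm(n, weights, means, stds, seed=0):
    """Draw n samples from a 1-D Gaussian mixture.

    Returns a list of (component_index, value) pairs; the component
    index is the label that is hidden when only the data are observed.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Step 1: pick a component with probability given by its mixing weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw a sample from that component's Gaussian.
        x = rng.gauss(means[k], stds[k])
        samples.append((k, x))
    return samples


labelled = sample_gmm(5, weights=[0.2, 0.5, 0.3],
                      means=[-10.0, 0.0, 10.0], stds=[1.0, 1.0, 1.0])
unlabelled = [x for _, x in labelled]  # what a fitting procedure would see
print(labelled)
```

Dropping the component index, as in `unlabelled`, gives the kind of data set that the fitting problem starts from.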

Fitting the Gaussian Mixture
[Figure: Synthetic Data Set Without Labels]
If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster.
Problem: the data set is unlabelled.
We shall refer to the labels as latent (= hidden) variables.

Posterior Probabilities
We can think of the mixing coefficients as prior probabilities for the components.
For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities.
These are given from Bayes' theorem by
$\gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}$
[Figure: Posterior Probabilities (colour coded)]

Maximum Likelihood for the GMM
The log likelihood function takes the form
$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \}$
Note: the sum over components appears inside the log.
There is no closed form solution for maximum likelihood.

Over-fitting in Gaussian Mixture Models
Singularities in the likelihood function when a component collapses onto a data point: with $\mu_j = x_n$ and $\Sigma_j = \sigma_j^2 I$ in $D$ dimensions,
$\mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{D/2} \sigma_j^{D}}$
then consider $\sigma_j \to 0$.
The likelihood function gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components.
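
A small sketch of the responsibilities and the log likelihood, again assuming one-dimensional components so that only the standard library is needed. The function names are hypothetical, and the log-sum-exp trick is an implementation choice added here for numerical stability; it is not something the slides prescribe. The loop at the end illustrates the singularity: shrinking the variance of a component centred on a data point keeps raising the log likelihood.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def responsibilities(x, weights, means, variances):
    """Return (gamma, log p(x)) for one point under a 1-D mixture."""
    # log(prior * likelihood) for each component.
    log_joint = [
        math.log(w) + log_gauss(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp: subtract the maximum before exponentiating.
    top = max(log_joint)
    log_px = top + math.log(sum(math.exp(l - top) for l in log_joint))
    # Bayes' theorem: posterior = joint / evidence.
    gamma = [math.exp(l - log_px) for l in log_joint]
    return gamma, log_px


def log_likelihood(data, weights, means, variances):
    """Sum over data points of the log of the mixture density."""
    return sum(responsibilities(x, weights, means, variances)[1] for x in data)


data = [0.0, 1.0, 2.0]
for var0 in (1.0, 1e-2, 1e-6):
    # Component 0 sits exactly on the data point 0.0; shrink its variance.
    print(var0, log_likelihood(data, [0.5, 0.5], [0.0, 1.0], [var0, 1.0]))
```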

Problems and Solutions
How to maximize the log likelihood: solved by the expectation-maximization (EM) algorithm.
How to avoid singularities in the likelihood function: solved by a Bayesian treatment.
How to choose the number K of components: also solved by a Bayesian treatment.

EM Algorithm: Informal Derivation
Let us proceed by simply differentiating the log likelihood.
Setting the derivative with respect to $\mu_k$ equal to zero gives
$\sum_{n=1}^{N} \gamma_k(x_n) \Sigma_k^{-1} (x_n - \mu_k) = 0$
where $\gamma_k(x_n)$ are the responsibilities, giving
$\mu_k = \frac{\sum_{n=1}^{N} \gamma_k(x_n) x_n}{\sum_{n=1}^{N} \gamma_k(x_n)}$
which is simply the weighted mean of the data.
Similarly for the covariances:
$\Sigma_k = \frac{\sum_{n=1}^{N} \gamma_k(x_n) (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma_k(x_n)}$
For the mixing coefficients, use a Lagrange multiplier to give
$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma_k(x_n)$
The solutions are not closed form since they are coupled.
This suggests an iterative scheme for solving them:
Make initial guesses for the parameters.
Alternate between the following two stages:
1. E-step: evaluate responsibilities
2. M-step: update parameters using ML results
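
The iterative scheme can be sketched for a one-dimensional mixture as below. This is a bare-bones illustration under stated assumptions: scalar variances stand in for covariance matrices, the function name `em_gmm_1d` is hypothetical, and there are no safeguards against collapsing variances or numerical underflow.

```python
import math


def em_gmm_1d(data, weights, means, variances, n_iters=50):
    """EM for a 1-D Gaussian mixture.

    Returns the updated (weights, means, variances) and the log
    likelihood recorded at each E-step.
    """
    weights, means, variances = list(weights), list(means), list(variances)
    n, k = len(data), len(weights)
    log_liks = []
    for _ in range(n_iters):
        # E-step: evaluate responsibilities gamma[i][j] for point i, component j.
        gamma, log_lik = [], 0.0
        for x in data:
            joint = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            p_x = sum(joint)
            log_lik += math.log(p_x)
            gamma.append([j / p_x for j in joint])
        log_liks.append(log_lik)
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            n_j = sum(g[j] for g in gamma)  # effective number of points
            means[j] = sum(g[j] * x for g, x in zip(gamma, data)) / n_j
            variances[j] = sum(
                g[j] * (x - means[j]) ** 2 for g, x in zip(gamma, data)
            ) / n_j
            weights[j] = n_j / n
    return weights, means, variances, log_liks


data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
weights, means, variances, log_liks = em_gmm_1d(
    data, weights=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0]
)
print(means, variances, weights)
```

The returned `log_liks` list makes it easy to check that the log likelihood does not decrease from one iteration to the next.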

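K-means Revisited
Consider a GMM with common covariances $\Sigma_k = \epsilon I$.
Take the limit $\epsilon \to 0$.
Responsibilities become binary:
$\gamma_k(x_n) = \frac{\pi_k \exp\{ -\lVert x_n - \mu_k \rVert^2 / 2\epsilon \}}{\sum_j \pi_j \exp\{ -\lVert x_n - \mu_j \rVert^2 / 2\epsilon \}} \to r_{nk} \in \{0, 1\}$
Expected complete-data log likelihood becomes
$\mathbb{E}_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2 + \text{const}$

EM in General
Consider an arbitrary distribution $q(Z)$ over the latent variables ($p$ is the true distribution).
The following decomposition always holds:
$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$
where
$\mathcal{L}(q, \theta) = \sum_Z q(Z) \ln \{ p(X, Z \mid \theta) / q(Z) \}$
$\mathrm{KL}(q \,\|\, p) = -\sum_Z q(Z) \ln \{ p(Z \mid X, \theta) / q(Z) \}$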
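
The limit that turns soft responsibilities into hard K-means assignments can be checked numerically. The sketch below assumes every component shares the covariance epsilon times the identity, so the normalizing constants cancel and only the squared distances matter; `soft_assign` and the example values are hypothetical.

```python
import math


def soft_assign(x, means, weights, eps):
    """Responsibilities for point x when every component has covariance eps * I."""
    # Unnormalized log posterior: log prior minus squared distance / (2 * eps).
    logits = [
        math.log(w) - sum((a - b) ** 2 for a, b in zip(x, m)) / (2 * eps)
        for w, m in zip(weights, means)
    ]
    top = max(logits)  # subtract the maximum to avoid overflow
    unnorm = [math.exp(l - top) for l in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]


x = (1.0, 0.0)
means = [(0.0, 0.0), (3.0, 0.0)]
weights = [0.3, 0.7]
for eps in (100.0, 1.0, 1e-3):
    print(eps, soft_assign(x, means, weights, eps))
```

For large epsilon the responsibilities stay close to the mixing coefficients; as epsilon shrinks, nearly all of the weight moves to the nearest mean, regardless of the mixing coefficients (as long as they are nonzero).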

[Figure: Decomposition]

Optimizing the Bound
E-step: maximize $\mathcal{L}(q, \theta)$ with respect to $q$; equivalent to minimizing the KL divergence; sets $q(Z)$ equal to the posterior distribution $p(Z \mid X, \theta)$ at the current parameters.
M-step: maximize the bound with respect to $\theta$; equivalent to maximizing the expected complete-data log likelihood.
Each EM cycle must increase the incomplete-data likelihood unless already at a (local) maximum.

[Figures: E-step; M-step]

Homework Reading Murphy ,
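
The decomposition of the log likelihood into a lower bound plus a KL divergence can be verified numerically for a single observation with a discrete latent variable. In this sketch `joint[k]` stands for the joint probability of the observation and latent state k at fixed parameters; the numbers and the function name `decompose` are made up for illustration.

```python
import math


def decompose(joint, q):
    """Split the log evidence into a lower bound and a KL term.

    joint[k]: joint probability of the observation and latent state k.
    q[k]: any distribution over the latent states.
    Returns (log_evidence, lower_bound, kl).
    """
    evidence = sum(joint)                      # marginal probability of the observation
    posterior = [j / evidence for j in joint]  # posterior over latent states
    lower_bound = sum(
        qk * math.log(jk / qk) for qk, jk in zip(q, joint) if qk > 0
    )
    kl = sum(
        qk * math.log(qk / pk) for qk, pk in zip(q, posterior) if qk > 0
    )
    return math.log(evidence), lower_bound, kl


joint = [0.10, 0.30, 0.05]

# Arbitrary q: the bound is loose by exactly the KL divergence.
log_ev, bound, kl = decompose(joint, q=[0.2, 0.5, 0.3])
print(log_ev, bound + kl, kl)

# E-step choice: q equal to the posterior makes the KL term vanish.
posterior = [j / sum(joint) for j in joint]
log_ev, bound, kl = decompose(joint, q=posterior)
print(log_ev, bound, kl)
```

Because the KL term is never negative, the bound never exceeds the log evidence, and setting q to the posterior closes the gap.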

CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Information Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logistics Exam