CS 6140: Machine Learning Spring 2016

1 CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang, College of Computer and Information Science, Northeastern University. Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu

2 Logistics. Exam is next week, March 31! Guideline: http:// courses/slides_cs6140_sp16/ exam_guideline.pdf Office hour moved to Wednesday at 4:30pm-5:30pm.

3 Today's Outline: Mixture Models; Expectation Maximization. [Some slides are borrowed from Christopher Bishop]

4 K-means Algorithm. Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype. Initialize prototypes, then iterate between two phases. E-step: assign each data point to the nearest prototype. M-step: update prototypes to be the cluster means. The simplest version is based on Euclidean distance.

5-13 [Figure slides] BCS Summer School, Exeter, 2003, Christopher M. Bishop

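14 Responsibilities. Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters such that $\sum_{k} r_{nk} = 1$ for every data point $n$. Example: 5 data points and 3 clusters give a $5 \times 3$ binary matrix $(r_{nk})$ with exactly one 1 in each row.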

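15 K-means Cost Function. $J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2$, where $x_n$ are the data, $r_{nk}$ the responsibilities, and $\mu_k$ the prototypes.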

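16 Minimizing the Cost Function. E-step: minimize $J$ w.r.t. the responsibilities $r_{nk}$, which assigns each data point to the nearest prototype. M-step: minimize $J$ w.r.t. the prototypes $\mu_k$, which gives $\mu_k = \sum_n r_{nk} x_n / \sum_n r_{nk}$, i.e. each prototype is set to the mean of the points in that cluster. Convergence is guaranteed since neither step increases $J$ and there is a finite number of possible settings for the responsibilities.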
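
The two alternating steps can be sketched in a few lines of plain Python. This is a minimal illustration rather than the course's reference code: the function names, the random choice of initial prototypes, and the rule of leaving an empty cluster's prototype unchanged are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iters=100, seed=0):
    """Run K-means; returns (prototypes, assignments, cost J)."""
    rng = random.Random(seed)
    # Assumption: initialize prototypes as k distinct data points chosen at random.
    prototypes = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # E-step: assign each point to its nearest prototype (ties go to the lowest index).
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, prototypes[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # responsibilities unchanged, so J cannot decrease further
        assignments = new_assignments
        # M-step: move each prototype to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old prototype
                prototypes[j] = [sum(c) / len(members) for c in zip(*members)]
    cost = sum(squared_distance(p, prototypes[a]) for p, a in zip(points, assignments))
    return prototypes, assignments, cost
```

Calling `kmeans(points, 2)` on a list of coordinate tuples returns the final prototypes, the hard assignment of each point, and the value of the cost function at convergence.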

17 Limitations of K-means. Hard assignments of data points to clusters: a small shift of a data point can flip it to a different cluster. Not clear how to choose the value of K. Solution: replace the hard clustering of K-means with soft probabilistic assignments. Represent the probability distribution of the data as a Gaussian mixture model.

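18 The Gaussian Distribution. Multivariate Gaussian: $\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} \lvert \Sigma \rvert^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right\}$, with mean $\mu$ and covariance $\Sigma$.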

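19 Gaussian Mixtures. Linear superposition of Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Normalization and positivity require $\sum_{k=1}^{K} \pi_k = 1$ and $0 \le \pi_k \le 1$. We can interpret the mixing coefficients $\pi_k$ as prior probabilities: $p(x) = \sum_{k} p(k) \, p(x \mid k)$.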

20 Example: Mixture of 3 Gaussians

21 Contours of Probability Distribution

22 Sampling from the Gaussian Mixture. To generate a data point: first pick one of the components with probability $\pi_k$, then draw a sample $x_n$ from that component. Repeat these two steps for each new data point.
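
The two-step procedure can be sketched as follows. To stay within the standard library the example assumes univariate components, each described by a mean and a standard deviation; the function name `sample_gmm` and its parameters are illustrative choices, not something taken from the slides.

```python
import random

def sample_gmm(n, weights, means, stds, seed=0):
    """Draw n samples from a 1-D Gaussian mixture; returns (samples, labels)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        # Step 1: pick a component k with probability equal to its mixing coefficient.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw a sample from the chosen Gaussian component.
        x = rng.gauss(means[k], stds[k])
        samples.append(x)
        labels.append(k)
    return samples, labels
```

The returned `labels` are exactly the quantities that are hidden in the fitting problem: discarding them leaves an unlabelled data set.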

23 Synthetic Data Set

25 Fitting the Gaussian Mixture. We wish to invert this process: given the data set, find the corresponding parameters (mixing coefficients, means, covariances). If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (= hidden) variables.

26 Synthetic Data Set Without Labels

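27 Posterior Probabilities. We can think of the mixing coefficients as prior probabilities for the components. For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities. These are given from Bayes' theorem by $\gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$.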
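
A minimal sketch of this Bayes' theorem computation, assuming univariate components so that only the standard library is needed; `gaussian_pdf` and `responsibilities` are names chosen for this example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability of each component given the observation x."""
    # Numerator of Bayes' theorem: prior (mixing coefficient) times likelihood.
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    # Denominator: the mixture density p(x).
    evidence = sum(joint)
    return [j / evidence for j in joint]
```

For a point far from every component the densities can underflow to zero, in which case this direct form divides by zero; a more careful version would work with log densities.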

28 Posterior Probabilities (colour coded)

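29 Maximum Likelihood for the GMM. The log likelihood function takes the form $\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$. Note: the sum over components appears inside the log. There is no closed-form solution for maximum likelihood.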

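30 Over-fitting in Gaussian Mixture Models. Singularities in the likelihood function when a component collapses onto a data point: with $\mu_j = x_n$ and $\Sigma_j = \sigma_j^2 I$ we have $\mathcal{N}(x_n \mid x_n, \sigma_j^2 I) = \frac{1}{(2\pi)^{D/2} \sigma_j^{D}}$, then consider $\sigma_j \to 0$. The likelihood function gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components.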
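
The singularity is easy to observe numerically. The sketch below uses a univariate two-component mixture on a small made-up data set (the data values and parameters are purely illustrative): one component stays broad while the other is centred exactly on a data point and its standard deviation is shrunk.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_likelihood(data, weights, means, stds):
    """Log likelihood of a 1-D Gaussian mixture: the sum over components sits inside the log."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)))
        for x in data
    )

# Illustrative data; the second component is centred on the data point 0.5.
data = [-1.0, 0.0, 0.5, 2.0]
stds_tried = [1.0, 1e-2, 1e-4, 1e-6]
lls = [
    log_likelihood(data, [0.5, 0.5], [0.0, 0.5], [1.0, std])
    for std in stds_tried
]
for std, ll in zip(stds_tried, lls):
    print(f"std of collapsing component = {std:g}: log likelihood = {ll:.3f}")
```

The broad component keeps the likelihood of the other points bounded away from zero, so the log likelihood keeps growing as the collapsing component narrows.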

31 Problems and Solutions. How to maximize the log likelihood: solved by the expectation-maximization (EM) algorithm. How to avoid singularities in the likelihood function: solved by a Bayesian treatment. How to choose the number K of components: also solved by a Bayesian treatment.

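32 EM Algorithm: Informal Derivation. Let us proceed by simply differentiating the log likelihood. Write $\gamma_{nk} \equiv p(k \mid x_n)$ for the responsibility of component $k$ for data point $x_n$. Setting the derivative with respect to $\mu_k$ equal to zero gives $0 = \sum_{n=1}^{N} \gamma_{nk} \, \Sigma_k^{-1} (x_n - \mu_k)$, giving $\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x_n$ with $N_k = \sum_{n=1}^{N} \gamma_{nk}$, which is simply the weighted mean of the data.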

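33 EM Algorithm: Informal Derivation. Similarly for the covariances: $\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x_n - \mu_k)(x_n - \mu_k)^{T}$, where $\gamma_{nk}$ are the responsibilities and $N_k = \sum_{n=1}^{N} \gamma_{nk}$. For the mixing coefficients, use a Lagrange multiplier (to enforce $\sum_k \pi_k = 1$) to give $\pi_k = \frac{N_k}{N}$.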

34 EM Algorithm: Informal Derivation. The solutions are not closed form since they are coupled. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages: 1. E-step: evaluate responsibilities. 2. M-step: update parameters using ML results.
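
The iterative scheme can be sketched for a univariate mixture using only the standard library. The function name `em_gmm_1d`, the fixed iteration count, and the `min_std` floor are assumptions of this example; the floor is a crude guard against the collapsing-component singularity and is not the Bayesian treatment mentioned in the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def em_gmm_1d(data, weights, means, stds, n_iters=50, min_std=1e-3):
    """EM for a 1-D Gaussian mixture, starting from the given initial guesses.

    Returns (weights, means, stds, history), where history holds the log
    likelihood evaluated at the start of each iteration.
    """
    n, k = len(data), len(weights)
    weights, means, stds = list(weights), list(means), list(stds)
    history = []
    for _ in range(n_iters):
        # E-step: evaluate responsibilities under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([v / total for v in joint])
        history.append(ll)
        # M-step: update parameters with the responsibility-weighted ML formulas.
        for j in range(k):
            nk = sum(r[j] for r in resp)  # effective number of points in component j
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk
            stds[j] = max(math.sqrt(var), min_std)  # assumption: floor to avoid collapse
            weights[j] = nk / n
    return weights, means, stds, history
```

As long as the floor is never triggered, the values in `history` should be non-decreasing, which gives a quick sanity check on an implementation.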

35-40 [Figure slides] BCS Summer School, Exeter, 2003, Christopher M. Bishop

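41 K-means Revisited. Consider a GMM with common covariances $\Sigma_k = \epsilon I$. Take the limit $\epsilon \to 0$. The responsibilities become binary: $\gamma_{nk} \to r_{nk} \in \{0, 1\}$. The expected complete-data log likelihood becomes $\mathbb{E}_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 + \text{const}$, so maximizing it is equivalent to minimizing the K-means cost function.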

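42 EM in General. Consider an arbitrary distribution $q(Z)$ over the latent variables ($p$ is the true distribution). The following decomposition always holds: $\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)$, where $\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}$ and $\mathrm{KL}(q \,\|\, p) = -\sum_{Z} q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)}$.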

43 Decomposition

44 Optimizing the Bound. E-step: maximize the bound $\mathcal{L}(q, \theta)$ with respect to $q(Z)$; this is equivalent to minimizing the KL divergence and sets $q(Z)$ equal to the posterior distribution $p(Z \mid X, \theta^{\text{old}})$. M-step: maximize the bound with respect to $\theta$; this is equivalent to maximizing the expected complete-data log likelihood. Each EM cycle must increase the incomplete-data likelihood unless already at a (local) maximum.
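
The decomposition of the log likelihood into a lower bound plus a KL divergence can be checked numerically for a single observation with a discrete latent component label. The sketch below assumes a univariate mixture; the function name `decompose` and its return layout are choices made for this example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def decompose(x, q, weights, means, stds):
    """Return (ln p(x), lower bound L(q), KL(q || posterior), posterior) for one point x.

    q is an arbitrary distribution over the component label of x.
    """
    # Joint p(x, z = k) for each component k, and the evidence p(x).
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]
    # Lower bound: sum_z q(z) ln( p(x, z) / q(z) ); terms with q(z) = 0 contribute 0.
    lower_bound = sum(qz * math.log(jz / qz) for qz, jz in zip(q, joint) if qz > 0)
    # KL divergence from q to the true posterior p(z | x).
    kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior) if qz > 0)
    return math.log(evidence), lower_bound, kl, posterior
```

Passing the returned `posterior` back in as `q` corresponds to the E-step: the KL term vanishes and the bound touches the log likelihood.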

45 E-step

46 M-step

47 Homework Reading Murphy ,
