Mixture Models and EM

Table of Contents (Chapter 9, Mixture Models and EM): K-means Clustering; Gaussian Mixture Models (GMM); Expectation Maximization (EM) for Mixture Parameter Estimation.

Introduction
Mixture models allow a complex distribution to be formed from simpler distributions over observed and latent variables. The distribution of the observed variables alone is obtained by marginalization. Mixture models also provide a method for clustering data. Maximum likelihood estimation in a mixture model is carried out with the Expectation Maximization (EM) algorithm.

K-means Clustering
Given a data set {x_1,..,x_N} in D-dimensional Euclidean space, partition it into K clusters; the assignment is given by one-of-K coding. This is also called unsupervised classification. μ_k is the center of the kth cluster. The indicator variable r_nk ∈ {0,1}, where k = 1,..,K, describes which of the K clusters data point x_n is assigned to: r_nk = 1 and r_nj = 0 for j ≠ k, because of the one-of-K coding.
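As a small illustration of the one-of-K coding of assignments, the sketch below (a hypothetical helper, assuming NumPy) builds the indicator matrix r_nk from a vector of hard cluster assignments.

import numpy as np

def one_of_K(assignments, K):
    # Encode hard cluster assignments as a one-of-K indicator matrix r,
    # with r[n, k] = 1 iff point n is assigned to cluster k.
    N = len(assignments)
    r = np.zeros((N, K), dtype=int)
    r[np.arange(N), assignments] = 1
    return r

# e.g. 4 points assigned to clusters 0, 2, 1, 0 with K = 3
print(one_of_K(np.array([0, 2, 1, 0]), K=3))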

K-means Clustering: Distortion Measure and Iterative Procedure
The distortion measure (sum of squared errors) is

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \| x_n - \mu_k \|^2,

the sum of squared distances of each point to its assigned cluster center μ_k. The goal is to find values for the {r_nk} and the {μ_k} so as to minimize J. This can be done by an iterative procedure consisting of two optimization steps, one w.r.t. r_nk and one w.r.t. μ_k:
- Initialize μ_k.
- Minimize J w.r.t. r_nk, keeping μ_k fixed (Expectation): assign the nth point to the closest cluster center.
- Minimize J w.r.t. μ_k, keeping r_nk fixed (Maximization): re-estimate the cluster centers from the current point assignments.
- Repeat until there is no or little change in either μ_k or r_nk.

Termination of K-means
The two phases, re-assigning data points to clusters and re-computing the cluster means, are repeated until there is no further change in the assignments. Since each phase reduces J, convergence is assured, but the algorithm may converge to a local minimum of J.
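A minimal NumPy sketch of this two-step procedure (the function name kmeans and the random-data-point initialization are illustrative choices, not taken from the slides):

import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    # Minimal K-means sketch: alternate the two steps that minimize J.
    rng = np.random.default_rng(rng)
    # Initialize mu_k by picking K random data points as centers.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # "E-step": assign each point to the closest center (defines r_nk).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # N x K squared distances
        assign = d2.argmin(axis=1)
        # "M-step": re-estimate each center as the mean of its assigned points.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # no change -> converged
            break
        mu = new_mu
    J = d2[np.arange(len(X)), assign].sum()  # distortion measure
    return mu, assign, J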

Image Segmentation
Goal: partition an image into regions, each of which has a homogeneous visual appearance or corresponds to objects or parts of objects. Each pixel is a point in R-G-B space, and K-means clustering is used to build a palette of K colors; each pixel is then classified by its intensity values into one of the K clusters (e.g. K = 2, 3, and 10). The method does not take into account the spatial proximity of different pixels.

Online K-means Clustering
In the online version (a Robbins-Monro procedure), the closest cluster center is updated one sample at a time,

\mu_k^{new} = \mu_k^{old} + \eta_n ( x_n - \mu_k^{old} ),

where η_n is a learning rate parameter made to decrease monotonically as more samples are observed. Only the M-step is needed.

Dissimilarity Measure
The Euclidean distance has limitations: it is inappropriate for categorical labels, and cluster means are not robust to outliers. A more general or robust dissimilarity measure ν(x, x') between a point and a cluster center can be used instead of the Euclidean distance of K-means, giving the distortion measure

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \nu(x_n, \mu_k).

When ν(·,·) measures the average dissimilarity to all the objects in the cluster, this gives the K-medoids algorithm. The M-step is then potentially more complex than for K-means.
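A sketch of the online update for a single incoming sample, assuming NumPy and a caller that supplies a decreasing learning rate η_n (the function name is illustrative):

import numpy as np

def online_kmeans_update(mu, x_n, eta_n):
    # Move the center closest to the new sample x_n towards it:
    # mu_k_new = mu_k_old + eta_n * (x_n - mu_k_old)
    k = np.argmin(((mu - x_n) ** 2).sum(axis=1))
    mu[k] = mu[k] + eta_n * (x_n - mu[k])
    return mu

# e.g. stream samples one at a time with eta_n = 1 / (n + 1)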

Gaussian Mixture Models (GMMs)
A GMM allows a complex distribution to be represented by a combination of simpler distributions, hence providing a richer class of density models than a single Gaussian. It gives a parameterized, enhanced form of clustering, introduces the concept of the latent variable, and motivates the EM algorithm for ML estimation.

GMM Formulation
To characterize a complex distribution of x, introduce the latent variable z, a discrete vector with K possible binary states (one-of-K). Define the joint distribution p(x, z) = p(x | z) p(z); p(x) is then defined by marginalizing over z:

p(x) = \sum_z p(x, z) = \sum_z p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),

where π_k = p(z_k = 1) is the probability that the kth element of z is 1.
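To make the marginal concrete, the short sketch below (assuming NumPy and SciPy; the function name and the toy two-component parameters are illustrative) evaluates p(x) for a given mixture.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    # p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)
    return sum(pi_k * multivariate_normal.pdf(x, mean=mu_k, cov=Sigma_k)
               for pi_k, mu_k, Sigma_k in zip(pis, mus, Sigmas))

# e.g. a two-component mixture in two dimensions
pis = [0.4, 0.6]
mus = [np.zeros(2), 3.0 * np.ones(2)]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), pis, mus, Sigmas))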

GMM Parameter Estimation
Given training data X = {x_1,..,x_N}, use ML to estimate the GMM parameters π, μ, and Σ, where π = [π_1, π_2,.., π_K], μ = [μ_1, μ_2,.., μ_K], and Σ = [Σ_1, Σ_2,.., Σ_K]. The likelihood is

L(\pi, \mu, \Sigma) = p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k),

and the log likelihood is

\ln L(\pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}.

Taking the derivatives of the log likelihood with respect to π, μ, and Σ and setting them to zero yields expressions involving the responsibilities

\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)},

where γ(z_nk) measures the contribution of the kth component to x_n but still depends on the parameters. This motivates Expectation Maximization (EM) for GMM parameter estimation.
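A single EM iteration built from these quantities could look like the sketch below (assuming NumPy and SciPy; the function name and the standard closed-form M-step updates are filled in from the usual GMM derivation rather than spelled out on the slides).

import numpy as np
from scipy.stats import multivariate_normal

def em_step_gmm(X, pis, mus, Sigmas):
    # One EM iteration for a GMM: the E-step computes the responsibilities
    # gamma(z_nk); the M-step re-estimates pi_k, mu_k, Sigma_k from them.
    N, K = len(X), len(pis)
    # E-step: gamma[n, k] proportional to pi_k * N(x_n | mu_k, Sigma_k)
    gamma = np.column_stack([pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                             for k in range(K)])
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: closed-form updates obtained from the zeroed derivatives
    Nk = gamma.sum(axis=0)
    pis = Nk / N
    mus = (gamma.T @ X) / Nk[:, None]
    Sigmas = []
    for k in range(K):
        d = X - mus[k]
        Sigmas.append((gamma[:, k, None] * d).T @ d / Nk[k])
    return pis, mus, Sigmas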

Relation to K-means Clustering
GMMs and K-means both identify clusters, but they are different: a GMM gives the cluster center, the cluster covariance matrix, and the cluster weight, while K-means only gives the cluster center; and K-means performs hard point assignment, while a GMM performs soft assignment.

EM Algorithm
EM is a general technique for finding maximum likelihood solutions for probabilistic models with latent variables (Dempster et al., 1977). The goal of EM is to find maximum likelihood solutions for models having latent variables or missing data. Latent variables are introduced in order to represent a complicated distribution with simpler components.

EM Algorithm
Let X be the observed data, Z the latent variables, and θ the parameters. The goal is to maximize the marginal log likelihood of the observed data,

\ln p(X \mid \theta) = \ln \left\{ \sum_Z p(X, Z \mid \theta) \right\}.

Maximization of p(X, Z | θ) is simple, but p(X | θ) is difficult because the log appears before the sum. Maximization of the complete-data log likelihood ln p(X, Z | θ) is assumed to be straightforward, but it cannot be done directly since Z is unknown. The latent Z is known only through p(Z | X, θ) when θ is known, so we consider the expected complete-data log likelihood.

EM: Algorithm
- Initialization: choose an initial set of parameters θ^old.
- E-step: use the current parameters θ^old to compute p(Z | X, θ^old) and form the expected complete-data log likelihood for general θ,

  Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \, \ln p(X, Z \mid \theta).

- M-step: determine θ^new by maximizing Q,

  \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}).

- Check convergence: stop, or set θ^old = θ^new and go back to the E-step.

EM Properties
Although EM is defined heuristically, it can be proved to maximize the likelihood function; the proof involves obtaining a lower bound on the log likelihood. EM guarantees that the likelihood does not decrease after each iteration.
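The E/M loop above can be summarized in a short generic sketch, assuming the E- and M-steps are supplied as callables and θ is stored as a flat NumPy array (both assumptions are illustrative, not from the slides).

import numpy as np

def em(theta_init, e_step, m_step, max_iters=100, tol=1e-6):
    # Generic EM skeleton: alternate E- and M-steps until the parameters stop changing.
    theta_old = np.asarray(theta_init, dtype=float)
    for _ in range(max_iters):
        posterior = e_step(theta_old)   # compute p(Z | X, theta_old)
        theta_new = m_step(posterior)   # theta_new = argmax_theta Q(theta, theta_old)
        if np.max(np.abs(theta_new - theta_old)) < tol:
            break                       # converged
        theta_old = theta_new           # otherwise go back to the E-step
    return theta_new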