Expectation Maximization (EM) and Gaussian Mixture Models

Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, and J. Friedman, Springer.

Unsupervised Learning: Motivation

Unsupervised learning aims at finding patterns or characteristics of the data; it does not need a class attribute. Consider the following data set of 20 observations:

-0.39 0.12 0.94 1.67 1.76 2.44 3.72 4.28 4.92 5.53
0.06 0.48 1.01 1.68 1.80 3.25 4.12 4.60 5.28 6.22

One goal is to model the density of the data points. A simple and common choice is a single Gaussian model, but the histogram of the data points shows that a single Gaussian model is a poor fit.
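As a minimal sketch of this single-Gaussian baseline (illustrative code, not from the book), the maximum likelihood fit just uses the sample mean and variance, and a coarse histogram already shows the two bumps that a single Gaussian cannot capture:

```python
import numpy as np

# The 20 observations from the slide above.
y = np.array([-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53,
              0.06, 0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22])

# Maximum likelihood fit of a single Gaussian: sample mean and variance.
mu_hat, var_hat = y.mean(), y.var()
print(f"mu = {mu_hat:.2f}, sigma^2 = {var_hat:.2f}")

# Bin counts of a coarse histogram; two separated groups of counts appear.
counts, edges = np.histogram(y, bins=7)
print(counts)
```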

Mixture Model: Basic Framework

This task is also called clustering: it relates to grouping or segmenting a collection of objects into subsets, or clusters, such that objects within each cluster are more closely related to one another than to objects assigned to different clusters. Clustering also yields descriptive statistics to ascertain whether or not the data consist of a set of distinct subgroups.

Mixture Model: Basic Framework

The mixture model is a probabilistic clustering paradigm. It is a useful tool for density estimation, and it can be viewed as a kind of kernel method. The Gaussian mixture model has the form

f(x) = Σ_{m=1}^{M} α_m φ(x; μ_m, Σ_m).

Mixture Model: Basic Framework

In the Gaussian mixture model f(x) = Σ_{m=1}^{M} α_m φ(x; μ_m, Σ_m), the α_m are mixing proportions with Σ_m α_m = 1, and each Gaussian density φ(·; μ_m, Σ_m) has its own mean μ_m and covariance matrix Σ_m. Any component densities can be used in place of the Gaussian, but the Gaussian mixture model is by far the most popular.
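To make the notation concrete, here is a minimal one-dimensional sketch of evaluating such a mixture density; the component parameters below are arbitrary placeholders rather than values from the text:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density phi(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mixture_pdf(x, alphas, mus, vars_):
    """Mixture density f(x) = sum_m alpha_m * phi(x; mu_m, var_m)."""
    return sum(a * gaussian_pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_))

# Placeholder parameters for an M = 2 component mixture (the alphas sum to 1).
alphas, mus, vars_ = [0.6, 0.4], [1.0, 4.5], [0.8, 0.9]
print(mixture_pdf(np.array([0.0, 2.0, 4.5]), alphas, mus, vars_))
```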

Mixture Model: Example

An example of a Gaussian mixture model with two components, with parameters (50, 5, 0.6) for component A and (65, 2, 0.4) for component B, read as mean, standard deviation, and mixing proportion. Sample data points generated from the model (label, value):

A 51  A 43  B 62  B 64  A 45  A 42  A 46  A 45  A 45  B 62
A 47  A 52  B 64  A 51  B 65  A 48  A 49  A 46  B 64  A 51
A 52  B 62  A 49  A 48  B 62  A 43  A 40  A 48  B 64  A 51
B 63  A 43  B 65  B 66  B 65  A 46  A 39  B 62  B 64  A 52
B 63  B 64  A 48  B 64  A 48  A 51  A 48  B 64  A 42  A 48
A 41
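As an illustration of how such labeled samples could be drawn (assuming the parameters above are read as mean, standard deviation, and mixing proportion for components A and B):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed reading of the slide's parameters:
# A ~ N(50, 5^2) with proportion 0.6, B ~ N(65, 2^2) with proportion 0.4.
components = {"A": (50.0, 5.0), "B": (65.0, 2.0)}

def sample(n):
    labels = rng.choice(["A", "B"], size=n, p=[0.6, 0.4])
    values = [int(round(rng.normal(*components[lab]))) for lab in labels]
    return list(zip(labels, values))

print(sample(10))  # e.g. [('A', 47), ('B', 64), ...]
```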

Mixture Model Learning: Sample Result

Due to the apparent bimodality, a single Gaussian distribution would not be appropriate. Instead we use a simple mixture model for density estimation, together with the associated EM algorithm for carrying out maximum likelihood estimation.

(Figure: maximum likelihood fit.)

Mixture Model Learning: Two-Component Model

The data suggest two separate underlying regimes, so instead we model Y as a mixture of two normal distributions:

Y1 ~ N(μ1, σ1²),   Y2 ~ N(μ2, σ2²),   Y = (1 − Δ)·Y1 + Δ·Y2,

where Δ ∈ {0, 1} with Pr(Δ = 1) = π. The generative representation is explicit: generate a Δ ∈ {0, 1} with probability π of being 1; depending on the outcome, deliver either Y1 (if Δ = 0) or Y2 (if Δ = 1).

Mixture Model Learning: Two-Component Model

Let φ_θ(y) denote the normal density with parameters θ = (μ, σ²). The density of Y is then

g_Y(y) = (1 − π) φ_θ1(y) + π φ_θ2(y).

Mixture Model Learning: Two-Component Model

Denote the training data by Z = (y_1, y_2, ..., y_N). We fit the model to the data by maximum likelihood over the parameters

θ = (π, θ1, θ2) = (π, μ1, σ1², μ2, σ2²).

The log-likelihood based on the N training cases is

ℓ(θ; Z) = Σ_{i=1}^{N} log[ (1 − π) φ_θ1(y_i) + π φ_θ2(y_i) ].
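As a small illustration (not the book's code), the observed-data log-likelihood can be evaluated directly for any given parameter values; the parameter values below are arbitrary placeholders:

```python
import numpy as np

def log_likelihood(y, pi, mu1, var1, mu2, var2):
    """Observed-data log-likelihood of the two-component Gaussian mixture."""
    y = np.asarray(y, dtype=float)
    phi1 = np.exp(-0.5 * (y - mu1) ** 2 / var1) / np.sqrt(2 * np.pi * var1)
    phi2 = np.exp(-0.5 * (y - mu2) ** 2 / var2) / np.sqrt(2 * np.pi * var2)
    # The sum of two terms sits inside the logarithm, which is what makes
    # direct maximization awkward (see the next slide).
    return np.sum(np.log((1 - pi) * phi1 + pi * phi2))

# Arbitrary placeholder parameters, evaluated on the 20 observations above.
y = [-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53,
     0.06, 0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22]
print(log_likelihood(y, pi=0.5, mu1=1.0, var1=1.0, mu2=4.5, var2=1.0))
```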

Mixture Model Learning: Two-Component Model

Direct maximization of ℓ(θ; Z) is quite difficult numerically, because of the sum of terms inside the logarithm. Instead, consider unobserved latent variables Δ_i taking values 0 or 1: Δ_i = 1 if y_i comes from model 2; otherwise Δ_i = 0 and y_i comes from model 1.

Mixture Model Learning: Two-Component Model

Suppose we knew the values of the Δ_i's. Then the (complete-data) log-likelihood would be

ℓ_0(θ; Z, Δ) = Σ_{i=1}^{N} [ (1 − Δ_i) log φ_θ1(y_i) + Δ_i log φ_θ2(y_i) ] + Σ_{i=1}^{N} [ (1 − Δ_i) log(1 − π) + Δ_i log π ].

Mixture Model Learning: Two-Component Model

Given the Δ_i's, the maximum likelihood estimates of μ1 and σ1² would be the sample mean and variance of those data with Δ_i = 0, and the estimates of μ2 and σ2² would be the sample mean and variance of those data with Δ_i = 1. The estimate of π would be the proportion of Δ_i equal to 1. Since the Δ_i's are actually unknown, we proceed in an iterative fashion, substituting for each Δ_i its expected value

γ_i(θ) = E(Δ_i | θ, Z) = Pr(Δ_i = 1 | θ, Z),

which is also called the responsibility of model 2 for observation i.

Two-Component Mixture Model: EM Algorithm

EM algorithm for two-component Gaussian mixtures:

1. Take initial guesses for the parameters μ̂1, σ̂1², μ̂2, σ̂2², π̂.

2. Expectation step: compute the responsibilities

γ̂_i = π̂ φ_θ̂2(y_i) / [ (1 − π̂) φ_θ̂1(y_i) + π̂ φ_θ̂2(y_i) ],   i = 1, 2, ..., N.

Two-Component Mixture Model: EM Algorithm

3. Maximization step: compute the weighted means and variances

μ̂1 = Σ_i (1 − γ̂_i) y_i / Σ_i (1 − γ̂_i),   σ̂1² = Σ_i (1 − γ̂_i)(y_i − μ̂1)² / Σ_i (1 − γ̂_i),

μ̂2 = Σ_i γ̂_i y_i / Σ_i γ̂_i,   σ̂2² = Σ_i γ̂_i (y_i − μ̂2)² / Σ_i γ̂_i,

and the mixing probability π̂ = Σ_i γ̂_i / N.

4. Iterate steps 2 and 3 until convergence.

Two-Component Mixture Model: EM Algorithm

In the expectation step we make a soft assignment of each observation to each model: the current estimates of the parameters are used to assign responsibilities according to the relative density of the training points under each model. In the maximization step, these responsibilities are used in weighted maximum-likelihood fits to update the estimates of the parameters.

Two-Component Mixture Model: EM Algorithm

Constructing the initial guesses: for μ̂1 and μ̂2, choose two of the y_i at random; set both σ̂1² and σ̂2² equal to the overall sample variance; the mixing proportion π̂ can be started at the value 0.5.
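Putting steps 1-4 together, here is a minimal NumPy sketch of this two-component EM procedure (random initialization of the means, overall variance for both components, π̂ = 0.5); it is an illustration of the algorithm as described above, not the book's code:

```python
import numpy as np

def em_two_gaussians(y, n_iter=20, seed=0):
    """EM for a two-component univariate Gaussian mixture (steps 1-4 above)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)

    # Step 1: initial guesses -- two observations chosen at random as the means,
    # the overall sample variance for both variances, and pi = 0.5.
    mu1, mu2 = rng.choice(y, size=2, replace=False)
    var1 = var2 = y.var()
    pi = 0.5

    def phi(x, mu, var):
        # Univariate normal density.
        return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    for _ in range(n_iter):
        # Step 2 (E-step): responsibilities of component 2 for each observation.
        num = pi * phi(y, mu2, var2)
        gamma = num / ((1 - pi) * phi(y, mu1, var1) + num)

        # Step 3 (M-step): weighted means, variances, and mixing proportion.
        w1, w2 = 1 - gamma, gamma
        mu1 = np.sum(w1 * y) / np.sum(w1)
        var1 = np.sum(w1 * (y - mu1) ** 2) / np.sum(w1)
        mu2 = np.sum(w2 * y) / np.sum(w2)
        var2 = np.sum(w2 * (y - mu2) ** 2) / np.sum(w2)
        pi = gamma.mean()

    return mu1, var1, mu2, var2, pi, gamma

# Example: run on the 20 observations from the earlier slide.
y = [-0.39, 0.12, 0.94, 1.67, 1.76, 2.44, 3.72, 4.28, 4.92, 5.53,
     0.06, 0.48, 1.01, 1.68, 1.80, 3.25, 4.12, 4.60, 5.28, 6.22]
print(em_two_gaussians(y)[:5])
```

Which component ends up labeled 1 or 2, and the exact trajectory of π̂, depend on the random initialization.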

Two-Component Mixture Model: Example of Running EM

Returning to the previous data set:

-0.39 0.12 0.94 1.67 1.76 2.44 3.72 4.28 4.92 5.53
0.06 0.48 1.01 1.68 1.80 3.25 4.12 4.60 5.28 6.22

The progress of the EM algorithm in maximizing the log-likelihood can be tracked through π̂, the maximum likelihood estimate of the proportion of observations in class 2, at selected iterations of the EM procedure:

Iteration    π̂
    1       0.485
    5       0.493
   10       0.523
   15       0.544
   20       0.546

Two-Component Mixture Model: Example of Running EM

The final maximum likelihood estimates consist of the two component means and variances and the mixing proportion, with π̂ ≈ 0.546 (the value reached at the last iteration above).

(Figure: the estimated Gaussian mixture density from this procedure (solid red curve), along with the responsibilities (dotted green curve).)

Mixture Models: Heart Disease Risk Data Set

Using Bayes' theorem, separate mixture densities in each class lead to flexible models for the class posterior Pr(G | X). Here we consider an application of mixtures to the heart disease risk factor (CHD) study.

Mixture Models: Heart Disease Risk Data Set

Using the combined data, we fit a two-component mixture of the form

f(x) = Σ_{m=1}^{2} α_m φ(x; μ_m, σ_m²),

with the (scalar) variances σ1² and σ2² not constrained to be equal. Fitting via the EM algorithm yields estimates of the two component means and variances and of the mixing proportions; note that the procedure does not use knowledge of the CHD labels.

Mixture Models: Heart Disease Risk Data Set

(Figure: the lower left and lower middle panels show the two estimated component densities separately; the lower right panel shows the component densities (orange and blue) along with the estimated mixture density (green).)

Mixture Models: Heart Disease Risk Data Set

The mixture model provides an estimate of the probability that observation i belongs to component m:

r̂_im = α̂_m φ(x_i; μ̂_m, σ̂_m²) / Σ_{k=1}^{2} α̂_k φ(x_i; μ̂_k, σ̂_k²),

where x_i is Age in this example. Suppose we threshold each value r̂_i2 and define δ̂_i = I(r̂_i2 > 0.5).

Mixture Models: Heart Disease Risk Data Set

Comparing the classification of each observation by CHD and by the mixture model:

                    Mixture model
                  δ̂ = 0    δ̂ = 1
CHD   No            232       70
      Yes            76       84

Although the mixture model did not use the CHD labels, it can discover the two CHD subpopulations. The error rate is (70 + 76)/462 ≈ 32%.
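A small sketch of this thresholding step and of the error-rate arithmetic, using the confusion counts from the table above (the responsibilities themselves would come from an EM fit such as the one sketched earlier):

```python
import numpy as np

# Thresholding rule from the previous slide: assign observation i to
# component 2 when its responsibility r_i2 exceeds 0.5.
def classify(resp2, threshold=0.5):
    return (np.asarray(resp2) > threshold).astype(int)

# Confusion counts from the table above (rows: CHD No/Yes, columns: delta_hat 0/1).
confusion = np.array([[232, 70],
                      [76, 84]])
errors = confusion[0, 1] + confusion[1, 0]   # off-diagonal cells
total = confusion.sum()
print(f"error rate = {errors / total:.1%}")  # about 32%
```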

Mixture Models: Heart Disease Risk Data Set

Linear logistic regression, using CHD as the response, achieves the same error rate (32%) when fit to these data by maximum likelihood.