Clustering Lecture 5: Mixture Model


Jing Gao, SUNY Buffalo

Outline
- Basics: motivation, definition, evaluation
- Methods: partitional, hierarchical, density-based, mixture model, spectral methods
- Advanced topics: clustering ensemble, clustering in MapReduce, semi-supervised clustering, subspace clustering, co-clustering, etc.

Using Probabilistic Models for Clustering

Hard vs. soft clustering:
- Hard clustering: every point belongs to exactly one cluster
- Soft clustering: every point belongs to several clusters with certain degrees

Probabilistic clustering:
- Each cluster is mathematically represented by a parametric distribution
- The entire data set is modeled by a mixture of these distributions

Gaussian Distribution

The probability density function $f(x)$ is a function of $x$ given the parameters $\mu$ and $\sigma$: changing $\mu$ shifts the distribution left or right, and changing $\sigma$ increases or decreases the spread.

$N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$
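As a quick illustration, here is a minimal NumPy sketch of this density; the function name and test values are ours, not from the slides:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate N(x | mu, sigma^2) at x (scalar or array)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Changing mu shifts the curve; changing sigma changes the spread.
print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))  # peak of the standard normal, ~0.3989
```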

Likelihood

Which Gaussian distribution is more likely to have generated the data? Define the likelihood as a function of $\mu$ and $\sigma$ given the observations $x_1, x_2, \ldots, x_n$:

$L(\mu, \sigma^2) = \prod_{i=1}^{n} N(x_i \mid \mu, \sigma^2)$
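In practice one compares log likelihoods rather than products of densities, since the product underflows quickly. A small sketch, with made-up data values for illustration:

```python
import numpy as np

def log_likelihood(data, mu, sigma):
    """Sum of per-point Gaussian log densities: log L(mu, sigma | data)."""
    return np.sum(-0.5 * ((data - mu) / sigma) ** 2
                  - np.log(np.sqrt(2 * np.pi) * sigma))

data = np.array([1.2, 0.8, 1.1, 0.9])
# A Gaussian centered near the data is "more likely" to have generated it:
print(log_likelihood(data, mu=1.0, sigma=0.2)
      > log_likelihood(data, mu=5.0, sigma=0.2))  # True
```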

Gaussian Distribution: Multivariate Case

The multivariate Gaussian with mean $\mu$ and covariance $\Sigma$ is

$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$

and the log likelihood of $n$ observations is

$L(\mu, \Sigma) = \sum_{i=1}^{n} \ln N(x_i \mid \mu, \Sigma) = \sum_{i=1}^{n} \left(-\frac{1}{2}(x_i-\mu)^T \Sigma^{-1} (x_i-\mu) - \frac{1}{2}\ln|\Sigma|\right) + \text{const}$

Maximum Likelihood Estimate (MLE)

Find the model parameters $\mu, \Sigma$ that maximize the log likelihood $L(\mu, \Sigma)$. For a Gaussian, the MLE has a closed form:

$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^T$
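A sketch of these closed-form estimates in NumPy; the synthetic data and variable names are ours:

```python
import numpy as np

# Synthetic 2-D data with known location and correlation structure.
X = np.random.randn(500, 2) @ np.array([[1.0, 0.3], [0.0, 0.5]]) + np.array([2.0, -1.0])

mu_hat = X.mean(axis=0)            # mu_hat = (1/n) sum_i x_i
Xc = X - mu_hat
Sigma_hat = (Xc.T @ Xc) / len(X)   # Sigma_hat = (1/n) sum_i (x_i - mu)(x_i - mu)^T
print(mu_hat, Sigma_hat)           # close to the true mean and covariance
```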

Gaussian Mixture

A Gaussian mixture is a linear combination of Gaussians,

$p(x) = \sum_{k=1}^{K} \pi_k \, N(x \mid \mu_k, \Sigma_k)$,

where the mixing weights satisfy $\pi_k \ge 0$ and $\sum_k \pi_k = 1$, and the parameters to be estimated are $\{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$.

Gaussian Mixture: Generative View

To generate a data point: first pick one of the $K$ clusters with probability $\pi_k$, then draw a sample from that cluster's distribution $N(x \mid \mu_k, \Sigma_k)$.
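The two-stage generative process can be sketched directly; all parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                      # mixing weights, sum to 1
mus = [np.array([0., 0.]), np.array([4., 4.]), np.array([0., 5.])]
Sigmas = [np.eye(2), 0.5 * np.eye(2), np.diag([2., 0.3])]

def sample_gmm(n):
    ks = rng.choice(len(pi), size=n, p=pi)          # step 1: pick a cluster k ~ pi
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks                                    # step 2: draw x ~ N(mu_k, Sigma_k)

X, z = sample_gmm(1000)   # z holds the latent cluster labels
```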

Gaussian Mixture: Learning

Maximize the log likelihood

$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} \pi_k N(x_i \mid \mu_k, \Sigma_k)$.

Each data point is generated by one of the $K$ clusters, so a latent variable $z_i \in \{1, \ldots, K\}$ is associated with each point. Regard the values of these latent variables as missing data.

Expectation-Maximization (EM) Algorithm: E-step

For given parameter values we can compute the expected values of the latent variables, the responsibilities

$\gamma_{ik} = p(z_i = k \mid x_i) = \frac{\pi_k N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_i \mid \mu_j, \Sigma_j)}$.

Note that each $\gamma_{ik}$ is a soft value in $[0, 1]$ instead of a hard 0/1 assignment, but we still have $\sum_{k=1}^{K} \gamma_{ik} = 1$.
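A sketch of the E-step using SciPy's multivariate normal density; the function and variable names are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mus, Sigmas):
    """Return gamma[i, k] = posterior probability that point i came from cluster k."""
    K = len(pi)
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                            for k in range(K)])      # pi_k * N(x_i | mu_k, Sigma_k)
    return dens / dens.sum(axis=1, keepdims=True)    # normalize: each row sums to 1
```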

Expectation-Maximization (EM) Algorithm: M-step

Maximize the expected complete log likelihood. The parameter updates are

$\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_i \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_i \gamma_{ik}}, \qquad \pi_k = \frac{1}{n}\sum_i \gamma_{ik}$.
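And a matching sketch of the M-step updates:

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate (pi, mus, Sigmas) from responsibility-weighted data."""
    n, K = gamma.shape
    Nk = gamma.sum(axis=0)                # effective cluster sizes N_k
    pi = Nk / n                           # pi_k = N_k / n
    mus = (gamma.T @ X) / Nk[:, None]     # mu_k = responsibility-weighted mean
    Sigmas = []
    for k in range(K):
        Xc = X - mus[k]
        Sigmas.append((gamma[:, k, None] * Xc).T @ Xc / Nk[k])  # weighted covariance
    return pi, mus, Sigmas
```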

EM Algorithm

Iterate the E-step and M-step until the log likelihood of the data no longer increases. EM converges to a local optimum, so the algorithm should be restarted with different initial parameter guesses (as in K-means).

Relation to K-means: consider a GMM with common covariance $\Sigma_k = \epsilon I$ for all clusters. As $\epsilon \to 0$, the two methods coincide: the responsibilities become hard assignments to the nearest mean.
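Tying the two steps together, a minimal EM loop with a log-likelihood convergence check. It reuses the e_step and m_step sketches above; the initialization scheme is one simple choice among many, and restarting from several seeds is left to the caller:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(len(X), K, replace=False)]    # random data points as initial means
    Sigmas = [np.cov(X.T) for _ in range(K)]         # global covariance as initial guess
    prev_ll = -np.inf
    for _ in range(iters):
        gamma = e_step(X, pi, mus, Sigmas)           # E-step: responsibilities
        pi, mus, Sigmas = m_step(X, gamma)           # M-step: parameter updates
        ll = np.sum(np.log(sum(pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                               for k in range(K))))  # data log likelihood
        if ll - prev_ll < tol:                       # stopped improving: local optimum
            break
        prev_ll = ll
    return pi, mus, Sigmas, gamma
```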


K-means vs. GMM

K-means:
- Objective function: minimize the sum of squared errors
- Can be optimized by an EM-style algorithm (E-step: assign points to clusters; M-step: optimize cluster centers)
- Performs hard assignment during the E-step
- Assumes spherical clusters with equal probability for each cluster

GMM:
- Objective function: maximize the log likelihood
- Optimized by the EM algorithm (E-step: compute the posterior probability of membership; M-step: optimize the parameters)
- Performs soft assignment during the E-step
- Can be used for non-spherical clusters and can generate clusters with different probabilities
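The practical difference in assignment style is easy to see with scikit-learn; the synthetic data below is ours:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Two blobs of points offset from each other.
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 3])

km = KMeans(n_clusters=2, n_init=10).fit(X)
gmm = GaussianMixture(n_components=2).fit(X)

print(km.labels_[:3])            # hard assignments, e.g. [0 1 0]
print(gmm.predict_proba(X[:3]))  # soft memberships: each row sums to 1
```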

Mixture Model: Strengths and Weaknesses

Strengths:
- Gives probabilistic cluster assignments
- Has a probabilistic interpretation
- Can handle clusters with varying sizes, variances, etc.

Weaknesses:
- Initialization matters
- Appropriate distributions must be chosen
- Overfitting issues

Take-away Message
- Probabilistic clustering
- Maximum likelihood estimation
- Gaussian mixture models for clustering
- The EM algorithm, which alternates between assigning points to clusters and estimating model parameters
- Strengths and weaknesses