Lecture 11: E-M and MeanShift CAP 5415 Fall 2007
Review of Segmentation by Clustering: each pixel is a data vector
Example (From Comaniciu and Meer)
Review of k-means: Let's find three clusters in this data. These points could represent RGB triplets in 3D.
Review of k-means: Begin by guessing where the center of each cluster is.
Review of k-means: Now assign each point to the closest cluster.
Review of k-means: Now move each cluster center to the center of the points assigned to it. Repeat this process until it converges.
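The three steps above (guess centers, assign points, move centers, repeat) can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the three 2-D blobs standing in for RGB triplets are an assumption.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Begin by guessing: pick k random data points as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the closest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        # (keep the old center if a cluster ends up empty)
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

# Three 2-D blobs standing in for color vectors
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.1, size=(50, 2))
                  for c in [(0, 0), (3, 0), (0, 3)]])
centers, labels = kmeans(data, k=3)
```

Note that this version fixes the iteration count rather than testing for convergence, which keeps the sketch short.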
Probabilistic Point of View: We'll take a generative point of view. How to generate a data point: 1) Choose a cluster, z, from (1 ... N). 2) Sample the point from the distribution associated with that cluster. 1D Example
Called a Mixture Model. z indicates which cluster is chosen: p(x) = sum_k p(z = k) p(x | z = k), where p(z = k) is the probability of choosing cluster k and p(x | z = k) is the probability of x given that the cluster is k.
To make it a Mixture of Gaussians, let each cluster's distribution be Gaussian: p(x) = sum_k pi_k N(x | mu_k, Sigma_k), where pi_k = p(z = k) is called a mixing coefficient.
Brief Review of Gaussians
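The slide presumably shows the Gaussian density; in standard notation (1-D case):

```latex
\mathcal{N}(x \mid \mu, \sigma^2) \;=\; \frac{1}{\sqrt{2\pi\sigma^2}}\,
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

with mean \mu and variance \sigma^2; the multivariate version replaces \sigma^2 with a covariance matrix \Sigma.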
Mixture of Gaussians
In Context of Our Previous Model Now, we have means and covariances
How does this help with clustering? Let's think about a different problem first: what if we had a set of data points and we wanted to find the parameters of the mixture model? Typical strategy: optimize the parameters to maximize the likelihood of the data.
Maximizing the likelihood: Easy if we knew which cluster each point belongs to. But we don't, so we compute the probability that each point belongs to each cluster using Bayes' rule.
Where this comes from Let's differentiate with respect to \mu_k
EM Algorithm: This is called the E-Step. M-Step: using these estimates of the cluster probabilities, maximize the likelihood over the rest of the parameters. Lots of interesting math and intuitions go into this algorithm that I'm not covering. Take Pattern Recognition!
Back to clustering: now we have, for each point, the probability of belonging to each cluster. This can be seen as a soft clustering.
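A sketch of the E-Step / M-Step loop for a 1-D mixture of two Gaussians, in standard textbook form (the variable names and the synthetic two-cluster data are my own, not from the lecture):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    pi = np.full(k, 1.0 / k)                 # mixing coefficients
    mu = np.linspace(x.min(), x.max(), k)    # spread the initial means out
    var = np.full(k, x.var())                # initial variances
    for _ in range(iters):
        # E-step: responsibilities gamma[n, j] = p(z = j | x_n) via Bayes' rule
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters using the responsibilities
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

# Two clearly separated 1-D clusters
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-4, 1, 200), rng.normal(4, 1, 200)])
pi, mu, var = em_gmm_1d(x, k=2)
```

The responsibilities `gamma` are exactly the soft cluster assignments mentioned on the slide.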
Another Clustering Application
Another Clustering Application: In this case, we have a video and we want to segment out what's moving or changing. (From C. Stauffer and W. Grimson)
Easy Solution: Average a bunch of frames to get a background image. Compute the difference between the background and the current frame to find the foreground.
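The easy solution can be sketched in a few lines of numpy. The synthetic frames and the threshold value are illustrative assumptions standing in for a real video:

```python
import numpy as np

# Ten noisy 32x32 grayscale frames around intensity 0.5
frames = np.random.default_rng(0).normal(0.5, 0.01, size=(10, 32, 32))
frames[5:, 10:20, 10:20] = 1.0   # a bright "moving object" in the later frames

background = frames.mean(axis=0)                       # average a bunch of frames
foreground = np.abs(frames[-1] - background) > 0.2     # thresholded difference mask
```

Pixels where the last frame differs strongly from the average are marked as foreground; everything else is background.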
The difficulty with this approach The background changes (From Stauffer and Grimson)
Solution: Fit a mixture model to the background, i.e., a background pixel could have multiple colors.
Can use this to track in surveillance
Suggested Reading Chapter 14, David A. Forsyth and Jean Ponce, Computer Vision: A Modern Approach. Chapter 3, Mubarak Shah, Fundamentals of Computer Vision
Advantages and Disadvantages
Mean-Shift Like EM, this algorithm is built on probabilistic intuitions. To understand EM we had to understand mixture models To understand mean-shift, we need to understand kernel density estimation (Take Pattern Recognition!)
Basics of Kernel Density Estimation: Let's say you have a bunch of points drawn from some distribution. What's the distribution that generated these points?
Using a Parametric Model: Could fit a parametric model (like a Gaussian). Why: can express the distribution with a small number of parameters (like mean and variance). Why not: limited in flexibility.
Non-Parametric Methods: We'll focus on kernel-density estimates. Basic Idea: use the data to define the distribution. Intuition: if I were to draw more samples from the same probability distribution, then those points would probably be close to the points that I have already drawn. Build the distribution by putting a little mass of probability around each data point.
Example (From Tappen Thesis)
Formally, the estimate is a sum of kernel functions centered on the samples. Most common kernel: the Gaussian or Normal kernel. Another way to think about it: make an image, put 1 (or more) wherever you have a sample, then convolve with a Gaussian.
What is Mean-Shift? The density will have peaks (also called modes). If we started at a point and did gradient ascent, we would end up at one of the modes. Cluster based on which mode each point belongs to.
Gradient Ascent? Actually, no. A set of iterative steps can be taken that will monotonically converge to a mode, with no worries about step sizes. This is an adaptive gradient ascent.
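A sketch of the mean-shift iteration with a Gaussian kernel: each step moves the current point to a kernel-weighted mean of the data, climbing to a mode with no step-size parameter. The sample set, bandwidth, and starting point are illustrative assumptions:

```python
import numpy as np

def mean_shift_mode(start, samples, h=0.5, iters=100):
    y = start
    for _ in range(iters):
        # Kernel weights: nearby samples pull hardest
        w = np.exp(-0.5 * ((y - samples) / h) ** 2)
        y_new = (w * samples).sum() / w.sum()   # shift to the weighted mean
        if abs(y_new - y) < 1e-8:               # converged to a mode
            break
        y = y_new
    return y

samples = np.array([-2.0, -1.8, -2.2, 2.0, 1.9, 2.1])
mode = mean_shift_mode(start=-1.0, samples=samples)   # climbs to the left mode
```

Starting points in the left basin converge to the left mode and vice versa, which is exactly the mode-based clustering described above.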
Results
Results
Normalized Cuts Clustering approach based on graphs First some background
Graphs: A graph G(V,E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its endpoints. (From Slides by Khurram Shafique)
Connected and Disconnected Graphs: A graph G is connected if there is a path from every vertex to every other vertex in G. A graph G that is not connected is called a disconnected graph. (From Slides by Khurram Shafique)
Can represent a graph with a matrix: one row per node (a, b, c, d, e). Adjacency Matrix W:

[ 0 1 0 0 1 ]
[ 1 0 0 0 0 ]
[ 0 0 0 0 1 ]
[ 0 0 0 0 1 ]
[ 1 0 1 1 0 ]

(Based on Slides by Khurram Shafique)
Can add weights to edges. Weight Matrix W:

[ 0 1 3 0 0 ]
[ 1 0 4 0 2 ]
[ 3 4 0 6 7 ]
[ 0 0 6 0 1 ]
[ 0 2 7 1 0 ]

(Based on Slides by Khurram Shafique)
Minimum Cut: A cut of a graph G is a set of edges S such that removing S from G disconnects G. The minimum cut is the cut of minimum weight, where the weight of cut <A,B> is the sum of the weights of the edges with one endpoint in A and the other in B: cut(A,B) = sum over u in A, v in B of w(u,v). (Based on Slides by Khurram Shafique)
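Computing the weight of a given cut from a weight matrix is a direct sum over cross-edges. The 5-node weight matrix here is an illustrative example, and `cut_weight` is a hypothetical helper name:

```python
import numpy as np

W = np.array([[0, 1, 3, 0, 0],
              [1, 0, 4, 0, 2],
              [3, 4, 0, 6, 7],
              [0, 0, 6, 0, 1],
              [0, 2, 7, 1, 0]], dtype=float)

def cut_weight(W, A, B):
    # Sum the weights of edges with one endpoint in A and one in B
    return sum(W[u, v] for u in A for v in B)

A, B = [0, 1], [2, 3, 4]
w = cut_weight(W, A, B)   # edges (0,2), (1,2), (1,4): 3 + 4 + 2 = 9
```

Finding the cut that minimizes this quantity over all partitions is the minimum-cut problem from the slide.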
Minimum Cut: There can be more than one minimum cut in a given graph. All minimum cuts of a graph can be found in polynomial time [1]. (Based on Slides by Khurram Shafique)
[1] H. Nagamochi, K. Nishimura and T. Ibaraki, Computing all small cuts in an undirected network. SIAM J. Discrete Math. 10 (1997) 469-481.
How does this relate to image segmentation? When we compute the cut, we've divided the graph into two clusters. To get a good segmentation, the weight on the edges should represent the pixels' affinity for being in the same group. (Images from Khurram Shafique)
Affinities for Image Segmentation: Brightness Features. Interpretation: high-weight edges for pixels that have similar intensity and are close to each other.
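One way to realize such a weight, in the spirit of Shi and Malik's brightness affinity (the sigma values and the connection radius are illustrative assumptions, and `affinity` is a hypothetical helper):

```python
import numpy as np

def affinity(intensity_i, intensity_j, pos_i, pos_j,
             sigma_i=0.1, sigma_x=4.0, radius=5.0):
    dist = np.linalg.norm(np.asarray(pos_i, float) - np.asarray(pos_j, float))
    if dist > radius:        # only nearby pixels get a nonzero edge
        return 0.0
    # High when intensities are similar AND pixels are spatially close
    return (np.exp(-(intensity_i - intensity_j) ** 2 / sigma_i ** 2)
            * np.exp(-dist ** 2 / sigma_x ** 2))

w_similar = affinity(0.50, 0.52, (0, 0), (1, 0))   # close, similar: high weight
w_dissimilar = affinity(0.50, 0.90, (0, 0), (1, 0))  # close, different: low weight
```

Both factors must be large for the edge weight to be large, matching the two conditions on the slide.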
Min-Cut won't work, though: the minimum cut will often choose a cut with one small cluster. (Image From Shi and Malik)
We need a better criterion. Instead of min-cut, we can use the normalized cut: Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where assoc(A,V) is the total weight of the edges from nodes in A to all nodes. Basic Idea: big clusters will increase assoc(A,V), thus decreasing Ncut(A,B).
Finding the Normalized Cut: NP-Hard Problem. Can find an approximate solution by finding the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem (D - W) y = lambda D y, where W is the weight matrix and D is the diagonal matrix of its row sums. That splits the data into two clusters. Can recursively partition the data to find more clusters. Code available on Jianbo Shi's webpage.
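A sketch of that spectral relaxation on a toy graph. It solves (D - W) y = lambda D y via the equivalent symmetric eigenproblem and thresholds the second eigenvector; the 4-node weight matrix is an illustrative assumption, not from the lecture:

```python
import numpy as np

# Two tightly-coupled pairs of nodes, weakly connected to each other
W = np.array([[0, 5, 1, 0],
              [5, 0, 0, 1],
              [1, 0, 0, 5],
              [0, 1, 5, 0]], dtype=float)
d = W.sum(axis=1)                       # degrees (row sums)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Equivalent symmetric form of (D - W) y = lambda D y:
# eigenvectors u of the normalized Laplacian, with y = D^{-1/2} u
L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
y = D_inv_sqrt @ vecs[:, 1]             # second-smallest eigenvalue's vector
labels = (y > 0).astype(int)            # threshold to split into two clusters
```

On this graph the split recovers the two strongly-connected pairs, cutting only the weak cross edges.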
Results Figure from Normalized cuts and image segmentation, Shi and Malik, 2000
So what if I want to segment my image? Ncuts is a very common solution Mean-shift is also very popular