Lecture 11: E-M and MeanShift CAP 5415 Fall 2007
Review of Segmentation by Clustering: each pixel is a data vector
Example (From Comaniciu and Meer)
Review of k-means: Let's find three clusters in this data. These points could represent RGB triplets in 3D.
Review of k-means: Begin by guessing where the center of each cluster is.
Review of k-means: Now assign each point to the closest cluster.
Review of k-means: Now move each cluster center to the center of the points assigned to it. Repeat this process until it converges.
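The three steps above (guess centers, assign points, move centers, repeat) can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the three 2-D blobs standing in for RGB triplets are an assumption.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Begin by guessing: pick k random data points as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the closest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        # (keep the old center if a cluster ends up empty)
        centers = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

# Three 2-D blobs standing in for color vectors
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.1, size=(50, 2))
                  for c in [(0, 0), (3, 0), (0, 3)]])
centers, labels = kmeans(data, k=3)
```

Note that this version fixes the iteration count rather than testing for convergence, which keeps the sketch short.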
Probabilistic Point of View: We'll take a generative point of view. How to generate a data point: 1) Choose a cluster, z, from (1 ... N). 2) Sample the point from the distribution associated with that cluster. 1D Example
Called a Mixture Model. z indicates which cluster is chosen: p(x) = sum_k p(z = k) p(x | z = k), where p(z = k) is the probability of choosing cluster k and p(x | z = k) is the probability of x given that the cluster is k.
To make it a Mixture of Gaussians, let each cluster's distribution be Gaussian: p(x) = sum_k pi_k N(x | mu_k, Sigma_k), where pi_k = p(z = k) is called a mixing coefficient.
Brief Review of Gaussians
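The slide presumably shows the Gaussian density; in standard notation (1-D case):

```latex
\mathcal{N}(x \mid \mu, \sigma^2) \;=\; \frac{1}{\sqrt{2\pi\sigma^2}}\,
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

with mean \mu and variance \sigma^2; the multivariate version replaces \sigma^2 with a covariance matrix \Sigma.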
Mixture of Gaussians
In Context of Our Previous Model Now, we have means and covariances
How does this help with clustering? Let's think about a different problem first: what if we had a set of data points and we wanted to find the parameters of the mixture model? Typical strategy: optimize the parameters to maximize the likelihood of the data.
Maximizing the likelihood: Easy if we knew which cluster each point belongs to. But we don't, so we compute the probability that each point belongs to each cluster using Bayes' rule.
Where this comes from Let's differentiate with respect to \mu_k
EM Algorithm: This is called the E-Step. M-Step: using these estimates of the cluster probabilities, maximize the likelihood over the rest of the parameters. Lots of interesting math and intuitions go into this algorithm that I'm not covering. Take Pattern Recognition!
Back to clustering: now we have, for each point, the probability of belonging to each cluster. This can be seen as a soft clustering.
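A sketch of the E-Step / M-Step loop for a 1-D mixture of two Gaussians, in standard textbook form (the variable names and the synthetic two-cluster data are my own, not from the lecture):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    pi = np.full(k, 1.0 / k)                 # mixing coefficients
    mu = np.linspace(x.min(), x.max(), k)    # spread the initial means out
    var = np.full(k, x.var())                # initial variances
    for _ in range(iters):
        # E-step: responsibilities gamma[n, j] = p(z = j | x_n) via Bayes' rule
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = pi * dens
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters using the responsibilities
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

# Two clearly separated 1-D clusters
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-4, 1, 200), rng.normal(4, 1, 200)])
pi, mu, var = em_gmm_1d(x, k=2)
```

The responsibilities `gamma` are exactly the soft cluster assignments mentioned on the slide.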
Another Clustering Application
Another Clustering Application: In this case, we have a video and we want to segment out what's moving or changing. (From C. Stauffer and W. Grimson)
Easy Solution: Average a bunch of frames to get a background image. Compute the difference between the background and the current frame to find the foreground.
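The easy solution can be sketched in a few lines of numpy. The synthetic frames and the threshold value are illustrative assumptions standing in for a real video:

```python
import numpy as np

# Ten noisy 32x32 grayscale frames around intensity 0.5
frames = np.random.default_rng(0).normal(0.5, 0.01, size=(10, 32, 32))
frames[5:, 10:20, 10:20] = 1.0   # a bright "moving object" in the later frames

background = frames.mean(axis=0)                       # average a bunch of frames
foreground = np.abs(frames[-1] - background) > 0.2     # thresholded difference mask
```

Pixels where the last frame differs strongly from the average are marked as foreground; everything else is background.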
The difficulty with this approach The background changes (From Stauffer and Grimson)
Solution: Fit a mixture model to the background, i.e., a background pixel could have multiple colors.
Can use this to track in surveillance
Suggested Reading Chapter 14, David A. Forsyth and Jean Ponce, Computer Vision: A Modern Approach. Chapter 3, Mubarak Shah, Fundamentals of Computer Vision
Advantages and Disadvantages
Mean-Shift Like EM, this algorithm is built on probabilistic intuitions. To understand EM we had to understand mixture models To understand mean-shift, we need to understand kernel density estimation (Take Pattern Recognition!)
Basics of Kernel Density Estimation: Let's say you have a bunch of points drawn from some distribution. What's the distribution that generated these points?
Using a Parametric Model: Could fit a parametric model (like a Gaussian). Why: can express the distribution with a small number of parameters (like mean and variance). Why not: limited in flexibility.
Non-Parametric Methods: We'll focus on kernel-density estimates. Basic Idea: use the data to define the distribution. Intuition: if I were to draw more samples from the same probability distribution, then those points would probably be close to the points that I have already drawn. Build the distribution by putting a little mass of probability around each data point.
Example (From Tappen Thesis)
Formally, the estimate is a sum of kernel functions centered on the samples. Most common kernel: the Gaussian or Normal kernel. Another way to think about it: make an image, put 1 (or more) wherever you have a sample, then convolve with a Gaussian.
What is Mean-Shift? The density will have peaks (also called modes). If we started at a point and did gradient ascent, we would end up at one of the modes. Cluster based on which mode each point belongs to.
Gradient Ascent? Actually, no. A set of iterative steps can be taken that will monotonically converge to a mode, with no worries about step sizes. This is an adaptive gradient ascent.
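A sketch of the mean-shift iteration with a Gaussian kernel: each step moves the current point to a kernel-weighted mean of the data, climbing to a mode with no step-size parameter. The sample set, bandwidth, and starting point are illustrative assumptions:

```python
import numpy as np

def mean_shift_mode(start, samples, h=0.5, iters=100):
    y = start
    for _ in range(iters):
        # Kernel weights: nearby samples pull hardest
        w = np.exp(-0.5 * ((y - samples) / h) ** 2)
        y_new = (w * samples).sum() / w.sum()   # shift to the weighted mean
        if abs(y_new - y) < 1e-8:               # converged to a mode
            break
        y = y_new
    return y

samples = np.array([-2.0, -1.8, -2.2, 2.0, 1.9, 2.1])
mode = mean_shift_mode(start=-1.0, samples=samples)   # climbs to the left mode
```

Starting points in the left basin converge to the left mode and vice versa, which is exactly the mode-based clustering described above.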
Results
Results
Normalized Cuts Clustering approach based on graphs First some background
Graphs: A graph G(V,E) is a triple consisting of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices, called its endpoints. (From Slides by Khurram Shafique)
Connected and Disconnected Graphs: A graph G is connected if there is a path from every vertex to every other vertex in G. A graph G that is not connected is called a disconnected graph. (From Slides by Khurram Shafique)
Can represent a graph with a matrix: one row per node (a, b, c, d, e). Adjacency Matrix W:

[ 0 1 0 0 1 ]
[ 1 0 0 0 0 ]
[ 0 0 0 0 1 ]
[ 0 0 0 0 1 ]
[ 1 0 1 1 0 ]

(Based on Slides by Khurram Shafique)
Can add weights to edges. Weight Matrix W:

[ 0 1 3 0 0 ]
[ 1 0 4 0 2 ]
[ 3 4 0 6 7 ]
[ 0 0 6 0 1 ]
[ 0 2 7 1 0 ]

(Based on Slides by Khurram Shafique)
Minimum Cut: A cut of a graph G is a set of edges S such that removing S from G disconnects G. The minimum cut is the cut of minimum weight, where the weight of cut <A,B> is the sum of the weights of the edges with one endpoint in A and the other in B: cut(A,B) = sum over u in A, v in B of w(u,v). (Based on Slides by Khurram Shafique)
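Computing the weight of a given cut from a weight matrix is a direct sum over cross-edges. The 5-node weight matrix here is an illustrative example, and `cut_weight` is a hypothetical helper name:

```python
import numpy as np

W = np.array([[0, 1, 3, 0, 0],
              [1, 0, 4, 0, 2],
              [3, 4, 0, 6, 7],
              [0, 0, 6, 0, 1],
              [0, 2, 7, 1, 0]], dtype=float)

def cut_weight(W, A, B):
    # Sum the weights of edges with one endpoint in A and one in B
    return sum(W[u, v] for u in A for v in B)

A, B = [0, 1], [2, 3, 4]
w = cut_weight(W, A, B)   # edges (0,2), (1,2), (1,4): 3 + 4 + 2 = 9
```

Finding the cut that minimizes this quantity over all partitions is the minimum-cut problem from the slide.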
Minimum Cut: There can be more than one minimum cut in a given graph. All minimum cuts of a graph can be found in polynomial time [1]. (Based on Slides by Khurram Shafique)
[1] H. Nagamochi, K. Nishimura and T. Ibaraki, Computing all small cuts in an undirected network. SIAM J. Discrete Math. 10 (1997) 469-481.
How does this relate to image segmentation? When we compute the cut, we've divided the graph into two clusters. To get a good segmentation, the weight on the edges should represent the pixels' affinity for being in the same group. (Images from Khurram Shafique)
Affinities for Image Segmentation: Brightness Features. Interpretation: high-weight edges for pixels that have similar intensity and are close to each other.
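One way to realize such a weight, in the spirit of Shi and Malik's brightness affinity (the sigma values and the connection radius are illustrative assumptions, and `affinity` is a hypothetical helper):

```python
import numpy as np

def affinity(intensity_i, intensity_j, pos_i, pos_j,
             sigma_i=0.1, sigma_x=4.0, radius=5.0):
    dist = np.linalg.norm(np.asarray(pos_i, float) - np.asarray(pos_j, float))
    if dist > radius:        # only nearby pixels get a nonzero edge
        return 0.0
    # High when intensities are similar AND pixels are spatially close
    return (np.exp(-(intensity_i - intensity_j) ** 2 / sigma_i ** 2)
            * np.exp(-dist ** 2 / sigma_x ** 2))

w_similar = affinity(0.50, 0.52, (0, 0), (1, 0))   # close, similar: high weight
w_dissimilar = affinity(0.50, 0.90, (0, 0), (1, 0))  # close, different: low weight
```

Both factors must be large for the edge weight to be large, matching the two conditions on the slide.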
Min-Cut won't work, though: the minimum cut will often choose a cut with one small cluster. (Image From Shi and Malik)
We need a better criterion. Instead of min-cut, we can use the normalized cut: Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where assoc(A,V) is the total weight of the edges from nodes in A to all nodes. Basic Idea: big clusters will increase assoc(A,V), thus decreasing Ncut(A,B).
Finding the Normalized Cut: NP-Hard Problem. Can find an approximate solution by finding the eigenvector with the second-smallest eigenvalue of the generalized eigenvalue problem (D - W) y = lambda D y, where W is the weight matrix and D is the diagonal matrix of its row sums. That splits the data into two clusters. Can recursively partition the data to find more clusters. Code available on Jianbo Shi's webpage.
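A sketch of that spectral relaxation on a toy graph. It solves (D - W) y = lambda D y via the equivalent symmetric eigenproblem and thresholds the second eigenvector; the 4-node weight matrix is an illustrative assumption, not from the lecture:

```python
import numpy as np

# Two tightly-coupled pairs of nodes, weakly connected to each other
W = np.array([[0, 5, 1, 0],
              [5, 0, 0, 1],
              [1, 0, 0, 5],
              [0, 1, 5, 0]], dtype=float)
d = W.sum(axis=1)                       # degrees (row sums)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Equivalent symmetric form of (D - W) y = lambda D y:
# eigenvectors u of the normalized Laplacian, with y = D^{-1/2} u
L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
y = D_inv_sqrt @ vecs[:, 1]             # second-smallest eigenvalue's vector
labels = (y > 0).astype(int)            # threshold to split into two clusters
```

On this graph the split recovers the two strongly-connected pairs, cutting only the weak cross edges.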
Results Figure from Normalized cuts and image segmentation, Shi and Malik, 2000
So what if I want to segment my image? Ncuts is a very common solution Mean-shift is also very popular