Expectation-Maximization Algorithm and Image Segmentation

Daozheng Chen

In computer vision, the image segmentation problem is to partition a digital image into multiple parts. The goal is to change the representation of the image so that it is more meaningful and easier to analyze [11]. In this assignment, we show how an image segmentation algorithm works in a real application. In the Electronic Field Guide (EFG) project, researchers want to segment the leaf region from an image and extract a set of points from its contour to represent the shape of the leaf [2]. A leaf image typically contains a single leaf on a surface with a rather uniform pattern, which makes the segmentation problem easier. Figure 1 shows an example of a leaf image.

A leaf image is represented by a matrix of pixels. Each pixel is a 3-by-1 vector giving the values of its red, green, and blue components, and each value is an integer between 0 and 255. Instead of working with this 3-dimensional data, we transform each pixel into the hue, saturation, and value (HSV) domain [7] and discard the hue component. Our task is to group this 2-dimensional data into two clusters: the leaf region and the non-leaf region.

Figure 1. A typical leaf image
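As a concrete starting point, here is a minimal MATLAB sketch of how the 2-dimensional (saturation, value) data set might be assembled from a leaf image; the file name and variable names are illustrative rather than prescribed by the assignment.

% Sketch: assemble the (saturation, value) data matrix from a leaf image.
rgb = imread('leaf1.jpg');     % nr-by-nc-by-3 array of uint8 values in [0, 255]
hsv = rgb2hsv(rgb);            % same size, double values in [0, 1]
[nr, nc, nch] = size(hsv);
S = hsv(:, :, 2);              % saturation channel
V = hsv(:, :, 3);              % value channel
X = [S(:) V(:)];               % one row per pixel: [saturation value]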

This is a data clustering problem, and Chapter 11 of [10] discusses the popular K-means algorithm. Let us review this algorithm to see what it optimizes and how it works. We have a set $X$ of $N$ data points $x_i \in \mathbb{R}^m$ and $K$ clusters. Each cluster $C_j$ has a center $c_j$, and each point is assigned to a cluster. We define the distance $d_i$ between a point $x_i$ and its cluster center as $d_i = \min_j \| x_i - c_j \|$, where $j = 1, \ldots, K$. Our objective is to find the $K$ cluster centers that minimize the sum of the distances between each point and its cluster center.

Given an initial guess of the centers (or the centers computed in the previous iteration), the K-means algorithm first assigns each data point to the cluster whose center is closest. Then, for each cluster, it updates the cluster center according to the data points assigned to it. It repeats these two steps until the centers no longer change. Algorithm 11.1 in [10] provides a detailed description of the algorithm. Let us do Challenge 1 to see how it works on our image data set.

CHALLENGE 1. For leaf1.jpg, leaf2.jpg, leaf3.jpg, and leaf4.jpg, generate the 2-D data points of saturation and value. Use the MATLAB function kmeans to group the data points into two clusters, and display the binary segmentation image. Use the 2-norm to measure the distance $d_i$, and use (0.4, 0.6) as the initial mean for the first cluster and (0.6, 0.4) as the initial mean for the second cluster. Each pixel in a binary image is either 0 or 1: if the data of a pixel in the original image belongs to cluster 1 (those pixels whose indices returned by kmeans are 1), set the pixel value to 0; otherwise, set it to 1. MATLAB image processing functions such as imread, imshow, imwrite, and rgb2hsv may be useful for manipulating the images and performing the HSV transformation. For leaf1.jpg, leaf2.jpg, and leaf3.jpg, use the MATLAB plot function to generate scatter plots of (1) all the 2-D data points, (2) the data points belonging to cluster 1, and (3) those belonging to cluster 2. In addition, use hist2d and Plot2dHist from MATLAB Central [8] to generate 2-D histograms for these three data sets. Display the plots and histograms only for leaf1.jpg for this Challenge, and save those for leaf2.jpg and leaf3.jpg for Challenge 4. You do not need to discuss the segmentation quality; we leave this to Challenge 4. Document your program so that it is easy to understand, and answer the following questions:
(a) Using the scatter plots and the 2-D histogram for leaf1.jpg, how many clusters can you see?
(b) What is the shape of the boundary that separates the clusters? Why does it have that shape?
(c) Discuss the advantages and disadvantages of visualizing our data using scatter plots and 2-D histograms.
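The mechanics of Challenge 1 might look roughly like the sketch below. It is one possible way to call kmeans and form the binary image, not the official solution; X, nr, and nc are assumed to come from the data-preparation sketch above.

% Sketch: K-means clustering of the (saturation, value) data and the
% corresponding binary segmentation image.
start = [0.4 0.6; 0.6 0.4];           % initial means for clusters 1 and 2
idx = kmeans(X, 2, 'Start', start);   % default distance is squared Euclidean
bw = reshape(idx ~= 1, nr, nc);       % cluster 1 -> 0, everything else -> 1
figure, imshow(bw);                   % binary segmentation image
figure, plot(X(idx==1,1), X(idx==1,2), 'r.', ...
             X(idx==2,1), X(idx==2,2), 'b.');   % scatter plot of the two clusters
xlabel('saturation'), ylabel('value');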

Expectation-Maximization (EM) algorithm

The K-means algorithm is simple. However, it easily gets stuck in a local optimum. The EM algorithm tends to get stuck less often than the K-means algorithm. The idea is to assign data points partially to different clusters instead of assigning each point to only one cluster. To do this partial assignment, we model each cluster using a probability distribution. A data point is then associated with each cluster with a certain probability, and in the final assignment it belongs to the cluster with the highest probability [6].

We can model this with a mixture of Gaussian distributions. The mixture model is a weighted sum of $K$ Gaussian distributions, where the weights sum to 1. Let the parameter of the $j$th distribution be $\theta_j$ and its weight be $w_j$. The probability of a data point $x_i$ given this model is
$$p(x_i \mid \Theta) = \sum_{j=1}^{K} w_j\, p_j(x_i \mid \theta_j),$$
where $\Theta = \{w_1, \ldots, w_K, \theta_1, \ldots, \theta_K\}$. To do clustering, we want to determine the probability of the cluster $C_{y_i}$ for each data point $x_i$ given $\Theta$, that is, $p(y_i \mid x_i, \Theta)$. This requires us to know $\Theta$. We can use the common maximum likelihood approach to determine $\Theta$. Assuming that each data point is independent and identically distributed according to this mixture model, the log-likelihood of our data set $X$ is
$$\log(P(X \mid \Theta)) = \log\Big(\prod_{i=1}^{N} p(x_i \mid \Theta)\Big) = \sum_{i=1}^{N} \log\Big(\sum_{j=1}^{K} w_j\, p_j(x_i \mid \theta_j)\Big).$$
However, finding $\Theta$ by direct maximization of this function is difficult, and this is where EM comes in [4]. In this algorithm, we suppose that our data set $X$ is incomplete and that $Y$ is the set of missing data. Given the old (or initial) model parameter $\Theta^{old}$, we perform the following two steps repeatedly.

Expectation Step (E-step): we use $\Theta^{old}$ and the incomplete data $X$ to express the expected value of the log-likelihood of the complete data:
$$E(\Theta, \Theta^{old}) = E[\log(P(X, Y \mid \Theta)) \mid \Theta^{old}, X].$$

Maximization Step (M-step): we use the expression for $E(\Theta, \Theta^{old})$ from the E-step to find the new parameter $\Theta = \Theta^{new}$ that maximizes $E(\Theta, \Theta^{old})$. Then we use $\Theta^{new}$ as $\Theta^{old}$ in the next iteration.

The log-likelihood increases in each iteration, and the parameter $\Theta^{new}$ from the M-step converges to a local maximum of the log-likelihood of the incomplete data $X$ [4]. Therefore, we can keep iterating until the log-likelihood increases by less than some threshold. To apply the EM algorithm to our mixture of Gaussians, we let the missing data set $Y$ be the indices $y_i$ of the clusters to which the data points $x_i$ belong. (So $x_i$ belongs to cluster $C_{y_i}$.) Now let us work through Challenge 2 to see how the E-step works for our mixture model. (The statement of Challenge 2 appears below, after the update formulas.)
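To make these quantities concrete, here is a sketch of how the mixture density $p(x_i \mid \Theta)$, the posteriors $p(j \mid x_i, \Theta)$, and the log-likelihood could be evaluated in MATLAB for a mixture of bivariate Gaussians; the variable names mu, Sigma, and w are illustrative assumptions, not part of the assignment.

% Sketch: evaluate the Gaussian mixture density, the posteriors
% p(j | x_i, Theta), and the log-likelihood for the N-by-2 data matrix X.
% Assumed parameters: mu{j} is 1-by-2, Sigma{j} is 2-by-2, w(j) is a scalar.
K = 2;
N = size(X, 1);
P = zeros(N, K);                      % P(i,j) = w_j * p_j(x_i | theta_j)
for j = 1:K
    d = bsxfun(@minus, X, mu{j});     % deviations from the jth mean
    quad = sum((d / Sigma{j}) .* d, 2);     % (x_i - mu_j) * inv(Sigma_j) * (x_i - mu_j)'
    P(:, j) = w(j) * exp(-0.5 * quad) / (2 * pi * sqrt(det(Sigma{j})));
end
mix = sum(P, 2);                      % p(x_i | Theta) for every pixel
post = bsxfun(@rdivide, P, mix);      % post(i,j) = p(j | x_i, Theta)
logL = sum(log(mix));                 % log-likelihood of the data set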

After completing Challenge 2, we get a general expression for $E(\Theta, \Theta^{old})$. It turns out that we can simplify it and get
$$E(\Theta, \Theta^{old}) = \sum_{j=1}^{K} \sum_{i=1}^{N} \log(w_j)\, p(j \mid x_i, \Theta^{old}) + \sum_{j=1}^{K} \sum_{i=1}^{N} \log(p(x_i \mid \theta_j))\, p(j \mid x_i, \Theta^{old}), \qquad (1)$$
assuming $0 < w_j < 1$ and $p(x_i \mid \theta_j) > 0$ for $j = 1, \ldots, K$ and $i = 1, \ldots, N$. Let $\mu_j^{new}$, $\Sigma_j^{new}$, and $w_j^{new}$ be the mean, covariance, and weight of the $j$th Gaussian that maximize $E(\Theta, \Theta^{old})$. It can be shown that
$$w_j^{new} = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \Theta^{old}), \qquad (2)$$
$$\mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, p(j \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}, \qquad (3)$$
$$\Sigma_j^{new} = \frac{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})\, (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}. \qquad (4)$$
Deriving formulas 2, 3, and 4 for more than 2 components and multivariate Gaussian distributions requires the use of a Lagrange multiplier and some knowledge of vector calculus [4]. However, let us do Challenge 3 to work out the derivation of a simple case.

CHALLENGE 2. Before we go into the problem, we need to introduce two tools for the analysis. The product rule of probability states that $p(A, B) = p(A \mid B)\, p(B)$. Bayes's rule says that $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$. In this problem, we assume that the $x_i$ are independently distributed; the same property holds for the $y_i$.
(a) Using the product rule of probability, verify that the log-likelihood of our complete data $(X, Y)$ is
$$\log(p(X, Y \mid \Theta)) = \sum_{i=1}^{N} \log\big(w_{y_i}\, p(x_i \mid \theta_{y_i})\big).$$
(b) Using Bayes's rule and the product rule of probability, verify that
$$p(y_i \mid x_i, \Theta^{old}) = \frac{w_{y_i}^{old}\, p(x_i \mid \theta_{y_i}^{old})}{\sum_{j=1}^{K} w_j^{old}\, p(x_i \mid \theta_j^{old})}.$$
(c) Let $y = [y_1, y_2, \ldots, y_N]$; we have
$$p(y \mid X, \Theta^{old}) = \prod_{i=1}^{N} p(y_i \mid x_i, \Theta^{old}).$$
Using this expression, we can show that
$$E(\Theta, \Theta^{old}) = E[\log(p(X, Y \mid \Theta)) \mid \Theta^{old}, X] = \sum_{y \in Q} \log(p(X, y \mid \Theta))\, p(y \mid X, \Theta^{old}),$$
where $Q$ in the rightmost formula is the domain of $y$. How many different $y$'s are in $Q$? Express this using $N$ and $K$. Is it practical to use this summation directly to evaluate $E(\Theta, \Theta^{old})$?
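As an illustration of what formulas (2), (3), and (4) look like in code, here is a sketch of the M-step updates, assuming a posterior matrix post with post(i,j) = $p(j \mid x_i, \Theta^{old})$ has already been computed as in the earlier E-step sketch.

% Sketch: M-step updates corresponding to formulas (2), (3), and (4).
for j = 1:K
    pj = post(:, j);                  % responsibilities of cluster j for every point
    nj = sum(pj);                     % effective number of points in cluster j
    w(j) = nj / N;                                    % formula (2)
    mu{j} = (pj' * X) / nj;                           % formula (3): weighted mean
    d = bsxfun(@minus, X, mu{j});
    Sigma{j} = (bsxfun(@times, pj, d)' * d) / nj;     % formula (4): weighted covariance
end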

CHALLENGE 3. For $K = 2$ and 1-dimensional Gaussian distributions, verify that formulas 2, 3, and 4 are correct. You may assume that you know formula 1.

Now let us implement this method and see how it works on our data set. Here is an important note you should be aware of before the implementation. The description of the E-step asks us to obtain $E(\Theta, \Theta^{old})$. However, in our case, we can work out formulas for the parameters that maximize $E(\Theta, \Theta^{old})$. So in the E-step, we only need to compute $p(j \mid x_i, \Theta^{old})$ for $j = 1, \ldots, K$ and $i = 1, \ldots, N$, which are the quantities used in formulas 2, 3, and 4 in the M-step.

CHALLENGE 4. For each image in Challenge 1, write a MATLAB program that uses the EM algorithm to do the image segmentation. Keep the same initial means as in Challenge 1, and use the identity matrix as the initial covariance matrix and 0.5 as the initial weight for each component. Set the maximum number of iterations to 100. Stop the iteration if
$$\frac{|L_{new} - L_{old}|}{|L_{old}|} \leq 0.001,$$
where $L_{new}$ is the log-likelihood based on the new parameters and $L_{old}$ is the log-likelihood based on the old parameters in an iteration. Compute and display the log-likelihood based on the new parameters in each iteration. As in Challenge 1, generate the binary segmentation image for each image, and produce a similar set of scatter plots and 2-D histograms for clusters 1 and 2 for leaf1.jpg, leaf2.jpg, and leaf3.jpg. Answer the following questions:
(a) For each image, does the log-likelihood increase in each iteration?
(b) Compare the scatter plots, 2-D histograms, and segmentation images for leaf1.jpg using EM with those using K-means, and discuss the results. How different are the scatter plots for clusters 1 and 2? How different are the 2-D histograms for clusters 1 and 2? How different are the final segmentations? What can you conclude?
(c) Compare the results of K-means and EM for leaf2.jpg and leaf3.jpg, respectively, following an approach similar to that in part (b). In addition, how is the total data distribution different from that of leaf1.jpg?
(d) Compare the segmentation images for leaf4.jpg using EM with those using K-means, and discuss the results. In what region does EM do badly? Does something similar happen in the other images?
(e) Based on your discussion for parts (b), (c), and (d), discuss the advantages and disadvantages of both methods in terms of segmentation quality. Which one is better in general for our images?
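A sketch of the overall iteration for Challenge 4 is given below; estep and mstep are hypothetical helper functions wrapping the two earlier sketches, and the initialization follows the challenge statement.

% Sketch: EM iteration with the stopping test of Challenge 4.
w = [0.5 0.5];
mu = {[0.4 0.6], [0.6 0.4]};                  % initial means, as in Challenge 1
Sigma = {eye(2), eye(2)};                     % initial covariance matrices
logLold = -Inf;
for iter = 1:100
    [post, logLnew] = estep(X, w, mu, Sigma); % hypothetical helper: posteriors and log-likelihood
    [w, mu, Sigma] = mstep(X, post);          % hypothetical helper: formulas (2)-(4)
    fprintf('iteration %d: log-likelihood %g\n', iter, logLnew);
    if abs(logLnew - logLold) / abs(logLold) <= 0.001
        break;
    end
    logLold = logLnew;
end
[pmax, idx] = max(post, [], 2);               % final hard assignment of each pixel
bw = reshape(idx ~= 1, nr, nc);               % binary segmentation image, as in Challenge 1
imshow(bw);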

POINTER. For background on probability, please refer to a standard text such as [9]. Forsyth and Ponce [6, Chapters 16, 17] give more description of K-means and EM image segmentation. A paper by Malik and his students [3] formulates the image segmentation problem using EM. The EM algorithm was explained and named in a classic paper by Arthur Dempster, Nan Laird, and Donald Rubin [5] in 1977. Bilmes [4] gives a very detailed description of the EM algorithm and discusses its application to Gaussian mixtures and hidden Markov models; the derivation of EM in this project follows the derivation in that paper. For more information on the EFG project, please see [1] and [2], as well as http://herbarium.cs.columbia.edu/. For more information on HSV, please refer to the book by Gonzalez and Woods [7].

Relation between K-means and EM Algorithms

K-means and EM are very similar. Within one iteration of the K-means algorithm, we first assign each data point to the cluster whose center is closest; then, for each cluster, we update its center according to the data points assigned to it in the previous step. Within one iteration of the EM algorithm, we first compute, for each data point and each cluster, the probability that the data point comes from that cluster; then, for the distribution of each cluster, we update its parameters based on the probabilities from the previous step.

Let us define a new set of probabilities $\hat{p}(j \mid x_i, \Theta^{old})$ for $i = 1, \ldots, N$ and $j = 1, \ldots, K$:
$$\hat{p}(j \mid x_i, \Theta^{old}) = \begin{cases} 1 & \text{if } j = \arg\max_s\, p(s \mid x_i, \Theta^{old}); \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$
Based on this new function, let us do Challenge 5 to see why the EM algorithm is a generalization of the K-means algorithm.

CHALLENGE 5.
(a) Suppose that in the E-step we replace $p(j \mid x_i, \Theta^{old})$ with $\hat{p}(j \mid x_i, \Theta^{old})$ and use $\hat{p}$ for the optimization in the M-step, so that $\hat{p}(j \mid x_i, \Theta^{old})$ replaces $p(j \mid x_i, \Theta^{old})$ in formulas (2), (3), and (4). What do these new formulas tell you about the parameters computed in the M-step? In particular, can you tell which data points are used to update the parameters of the $j$th distribution?
(b) Using this new probability function $\hat{p}$ and your discovery in part (a), can you describe an iteration of the new EM algorithm in the manner of the K-means algorithm, like the description at the beginning of this section?
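For experimenting with Challenge 5, the hard assignment of formula (5) can be formed from the soft posteriors of the E-step sketch roughly as follows (variable names as in the earlier sketches).

% Sketch: replace the soft posteriors with the hard assignment of formula (5).
[pmax, best] = max(post, [], 2);                   % best(i) = argmax_s p(s | x_i, Theta_old)
postHard = zeros(size(post));
postHard(sub2ind(size(post), (1:N)', best)) = 1;   % 1 for the most probable cluster, 0 elsewhere
% postHard can now be used in place of post in the M-step sketch.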

Bibliography

[1] Gaurav Agarwal, Haibin Ling, David Jacobs, Sameer Shirdhonkar, W. John Kress, Rusty Russell, Peter Belhumeur, An Dixit, Steve Feiner, Dhruv Mahajan, Kalyan Sunkavalli, Ravi Ramamoorthi, and Sean White. First steps toward an electronic field guide for plants. Taxon, 55.
[2] Peter N. Belhumeur, Daozheng Chen, Steven Feiner, David W. Jacobs, W. John Kress, Haibin Ling, Ida Lopez, Ravi Ramamoorthi, Sameer Sheorey, Sean White, and Ling Zhang. Searching the world's herbaria: A system for visual identification of plant species. In David A. Forsyth, Philip H. S. Torr, and Andrew Zisserman, editors, ECCV (4), volume 5305 of Lecture Notes in Computer Science. Springer, 2008.
[3] Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik. Color- and texture-based image segmentation using EM and its application to content-based image retrieval.
[4] Jeff A. Bilmes. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report.
[5] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[6] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, August 2002.
[7] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing (2nd Edition). Prentice Hall, January 2002.
[8] Kangwon Lee. 2D histogram matrix. MATLAB Central File Exchange.
[9] Alexander M. Mood, Franklin A. Graybill, and Duane C. Boes. Introduction to the Theory of Statistics. McGraw-Hill.
[10] Dianne P. O'Leary. Scientific Computing with Case Studies. SIAM Press, Philadelphia, 2009.

[11] Linda G. Shapiro and George C. Stockman. Computer Vision. Prentice Hall, January 2001.
