Colorado School of Mines. Computer Vision. Professor William Hoff, Dept of Electrical Engineering & Computer Science.

Professor William Hoff, Dept of Electrical Engineering & Computer Science. http://inside.mines.edu/~whoff/

Image Segmentation. Some material for these slides comes from https://www.csd.uwo.ca/courses/cs4487a/

Image Segmentation is the process of partitioning an image into multiple segments (sets of pixels). Example application: 3D medical image segmentation. Stack the boundaries found in 2D X-ray slices to create 3D models, saving the laborious manual tracing of individual slices. (From: Richard Szeliski, Computer Vision: Algorithms and Applications, 2011.)

Methods. Intensity based: thresholding. Edge based: region growing. Clustering: K-means, superpixels, Gaussian mixture models, mean shift, graph cuts. References: Szeliski book Ch. 5; Gonzalez and Woods book Ch. 10.

Thresholding. The basic segmentation operation: mask(x,y) = 1 if im(x,y) > T, and mask(x,y) = 0 if im(x,y) ≤ T. T is a threshold, either user defined or chosen automatically. This is the same as partitioning the histogram.
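A minimal Matlab sketch of this operation (the image name and threshold value are arbitrary choices for illustration):

im = im2double(imread('coins.png'));  % any grayscale image
T = 0.4;                              % user-defined threshold (arbitrary value)
mask = im > T;                        % mask(x,y) = 1 if im(x,y) > T, else 0
figure, imshow(mask);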

Otsu's Method for Global Thresholding. Choose the threshold k to minimize the variance within groups,
σ_W² = P1 σ1² + P2 σ2²
or, equivalently, to maximize the variance between groups,
σ_B² = P1 (m1 − mG)² + P2 (m2 − mG)²
where P1 = Σ_{i=0..k} p_i and P2 = Σ_{i=k+1..L−1} p_i are the group probabilities, mG is the global mean, and mk is the mean of class k. Used in Matlab's graythresh function. (Figure: the gray-level histogram of an example image.)
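A sketch of applying Otsu's threshold via graythresh, which returns a normalized threshold in [0, 1]:

im = im2double(imread('coins.png'));
T = graythresh(im);    % Otsu's method: minimizes the within-group variance
mask = im > T;         % apply the automatically chosen threshold
figure, imshow(mask);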

Thresholding as a Likelihood Ratio Test. Example: assume known probability distributions P1 ~ N(μ1, σ) and P2 ~ N(μ2, σ) for the pixel values. Thresholding can then be derived as a statistical decision, the likelihood ratio test:
log( P1(Ip) / P2(Ip) ) > T  →  pixel p is object; otherwise pixel p is background.
Here P1 and P2 are the known color models for object and background.

Problems: Overlapping Distributions. Example: background subtraction, ΔI = I_obj − I_bkg. The background differences follow P1 ~ N(0, σ) while the object differences P2 are roughly uniform over [0, 255], so the distributions overlap. Threshold the intensities at T.

Problems: Spatial Variation in Lighting. Adaptive thresholding can handle some variation.

Methods. Intensity based: thresholding. Edge based: region growing. Clustering: K-means, superpixels, Gaussian mixture models, mean shift, graph cuts. References: Szeliski book Ch. 5; Gonzalez and Woods book Ch. 10.

Region growing. Start with an initial set of pixels K (the initial seed(s)). Add to K any neighbor q of a pixel p in K if |Ip − Iq| < T. Repeat until nothing changes. The method stops at contrast edges.
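One way to experiment with this in Matlab is the Image Processing Toolbox function grayconnected, which grows a region from a seed pixel using an intensity tolerance (similar in spirit to the neighbor test above, though not identical). A sketch, with an assumed seed location and tolerance:

I = imread('coins.png');
row = 100; col = 120;                  % assumed seed pixel inside a region
tol = 20;                              % tolerance playing the role of T
BW = grayconnected(I, row, col, tol);  % grow the region from the seed
figure, imshowpair(I, BW, 'montage');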

Region growing

What can go wrong with region growing? Region growth may leak through a single weak spot in the boundary.

Region growing: the sea region leaks into the sky due to a weak boundary between them.

Methods. Intensity based: thresholding. Edge based: region growing. Clustering: K-means, superpixels, Gaussian mixture models, mean shift, graph cuts. References: Szeliski book Ch. 5; Gonzalez and Woods book Ch. 10.

General Grouping or Clustering (a.k.a. unsupervised learning). We have data points (samples, also called feature vectors or examples) x1, …, xn. Cluster similar points into groups; the points are not pre-labeled. Think of clustering as discovering the labels (e.g., grouping movies into sci-fi, horror, and documentaries). (Slides from Olga Veksler.)

Color Features. Represent image pixels as feature vectors x1, …, xn. For example, each pixel can be represented by its intensity, giving one-dimensional feature vectors, or by its color, giving three-dimensional feature vectors. Cluster them into k clusters, i.e., k segments. (Figure: an input image, the per-pixel feature vectors for clustering based on color, e.g. [9 4 2], [7 3 1], …, and the clustering in RGB (or LUV) space.)

Example: Indexed Storage of Color Images. If an image uses 8 bits for each of R, G, B, there are 2^24 possible colors. Most images don't use the entire space of possible values, so we can get by with fewer. We'll use k-means clustering to find the reduced set of colors. (Figures: an image using the full color space vs. the same image using only 32 discrete colors.)

K-means Clustering. Probably the most popular clustering algorithm. It assumes the number of clusters, k, is given, and optimizes the following objective function over the variables D_i (the clusters) and μ_i (the cluster centers):
E_k = SSE = Σ_{i=1..k} Σ_{x∈D_i} ||x − μ_i||²
the sum of squared errors (SSE) from the cluster centers μ_i.

K-means Clustering: Objective Function. A good (tight) clustering has a smaller value of SSE; a bad (loose) clustering has a larger value of SSE. (Figure: the same points D1, D2, D3 grouped two different ways, with the tighter grouping achieving the lower SSE.)

K-means Clustering: Algorithm.
Initialization step:
1. Pick k cluster centers randomly.
2. Assign each sample to the closest center.
Iteration steps:
1. Compute the mean of each cluster: μ_i = (1/|D_i|) Σ_{x∈D_i} x
2. Re-assign each sample to the closest mean.
Iterate until the clusters stop changing.
This procedure decreases the value of the objective function E_k(D, μ) = Σ_{i=1..k} Σ_{x∈D_i} ||x − μ_i||² over the optimization variables D = (D_1, ..., D_k) and μ = (μ_1, ..., μ_k). It is block coordinate descent: step 1 optimizes μ, step 2 optimizes D.
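The iteration above is short enough to write from scratch. A minimal sketch (pdist2 is from the Statistics and Machine Learning Toolbox; empty clusters are not handled here, whereas Matlab's built-in kmeans, used later in these slides, handles them via 'EmptyAction'):

function [idx, mu] = simpleKMeans(X, k)
% Minimal k-means (Lloyd's algorithm). X is n-by-d; k is the number of clusters.
n = size(X, 1);
mu = X(randperm(n, k), :);   % initialization: pick k cluster centers randomly
idx = zeros(n, 1);
while true
    % Re-assign each sample to the closest mean
    [~, newIdx] = min(pdist2(X, mu), [], 2);
    if isequal(newIdx, idx), break; end   % clusters stopped changing
    idx = newIdx;
    % Compute the mean of each cluster
    for i = 1:k
        mu(i, :) = mean(X(idx == i, :), 1);
    end
end
end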

K-means clustering examples: k = 3, k = 5, k = 10 (random colors are used to better show the segments/clusters).

K-means clustering examples: Segmentation. In this case K-means (k = 2) automatically finds a good threshold between the two clusters μ1 and μ2. K-means finds compact clusters.

K-means clustering examples: Color Quantization. Note the bias toward equal-size clusters.

clear all
close all

% Read image
RGB = im2double(imread('peppers.png')); RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('pears.png')); RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('tissue.png')); RGB = imresize(RGB, 0.5);
%RGB = im2double(imread('robot.jpg')); RGB = imresize(RGB, 0.15);
figure, imshow(RGB);

% Convert 3-dimensional (M,N,3) array to 2D (M*N,3)
X = reshape(RGB, [], 3);

k = 16;   % Number of clusters to find

% Call kmeans. It returns:
%  IDX: for each point in X, which cluster (1..k) it was assigned to
%  C: the k cluster centers
[IDX, C] = kmeans(X, k, ...
    'EmptyAction', 'drop');   % if a cluster becomes empty, drop it

% Reshape the index array back to a 2-dimensional image
I = reshape(IDX, size(RGB,1), size(RGB,2));

% Show the reduced-color image
figure, imshow(I, C);

% Plot pixels in color space, colored by their assigned cluster center
figure
hold on
for i = 1:20:size(X,1)
    plot3(X(i,1), X(i,2), X(i,3), ...
        '.', 'Color', C(IDX(i),:));
end
hold off

% Also plot the cluster centers
hold on
for i = 1:k
    plot3(C(i,1), C(i,2), C(i,3), 'ro', 'MarkerFaceColor', 'r');
end
hold off

(Plot: the pixel colors of the example image shown as points in RGB space, with the k cluster centers marked in red.)

Superpixels. Superpixel algorithms group pixels by color but adhere to image boundaries [SLIC superpixels, Achanta et al., PAMI 2011]. The feature vector for each pixel is its color plus its x, y coordinates (i.e., a five-dimensional vector: R, G, B, x, y).
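Recent versions of Matlab's Image Processing Toolbox also include a built-in SLIC implementation; a quick sketch (the target of 200 superpixels is arbitrary; the VLFeat-based example appears on the next slide):

A = imread('peppers.png');
[L, N] = superpixels(A, 200);             % SLIC: request roughly 200 superpixels
BW = boundarymask(L);                     % superpixel boundary mask
figure, imshow(imoverlay(A, BW, 'cyan')); % draw the boundaries on the image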


% VLFeat Example. If not already installed, download VLFeat from
% http://www.vlfeat.org/
clear all
close all

% First make sure the vlfeat code is in the path
if exist('vl_slic', 'file') == 0
    run('c:\users\william\documents\research\vlfeat-0.9.20\toolbox\vl_setup');
end

RGB = imread('peppers.png');

% Convert to LAB color space.
LAB = vl_xyz2lab(vl_rgb2xyz(RGB));

% Run SLIC superpixel segmentation.
regionsize = 30;
regularizer = 50.0;
segments = vl_slic(single(LAB), regionsize, regularizer);
segments = int32(segments);

% Draw superpixel boundaries on the image.
Dx = imfilter(segments, [-1 0; 0 1]);
Dy = imfilter(segments, [0 -1; 1 0]);
E = (Dx ~= 0) | (Dy ~= 0);   % true at boundaries between superpixels
E = ~E;                      % now zero at boundaries, one elsewhere
R = RGB(:,:,1); G = RGB(:,:,2); B = RGB(:,:,3);
Rs = R .* uint8(E);          % black out the boundary pixels
Gs = G .* uint8(E);
Bs = B .* uint8(E);
RGBs(:,:,1) = Rs; RGBs(:,:,2) = Gs; RGBs(:,:,3) = Bs;
figure, imshow(RGBs);

Problem with K-means: we need to estimate the covariance as well as the means of the clusters. (Figure from https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)

Expectation Maximization. K-means clustering is an example of a general method called expectation maximization (EM). In EM you have two sets of unknowns: some parameters (for example, the centroids of the clusters) and some latent, or hidden, variables (for example, the assignment of points to clusters). If you know one set of unknowns, it is easy to compute the other: if we know the assignment of points to clusters, we can compute the cluster centroids; if we know the cluster centroids, we can compute the probability that each point belongs to each cluster. So we simply alternate these steps: assume one quantity is known and compute the other, then assume that is known and compute the first. Repeat until convergence.

Gaussian Mixture Models (GMM). The probability distribution is a weighted sum of Gaussians:
p(x) = Σ_{k=1..K} π_k N(x | μ_k, Σ_k)
We want to find the parameters μ_k (mean of the kth Gaussian), Σ_k (covariance of the kth Gaussian), and π_k (weight of the kth Gaussian). The hidden (latent) variables are z_i (the assignment of point i to a Gaussian, 1..K).

EM Algorithm for GMM. Start with a guess for each μ_k, Σ_k, π_k.
E step: compute the responsibilities
r_ik = π_k N(x_i | μ_k, Σ_k) / Σ_{j=1..K} π_j N(x_i | μ_j, Σ_j)
The r_ik are a soft assignment of point i to cluster k.

EM Algorithm for GMM. M step: re-estimate the parameters using the responsibilities:
N_k = Σ_i r_ik
π_k = N_k / n
μ_k = (1/N_k) Σ_i r_ik x_i
Σ_k = (1/N_k) Σ_i r_ik (x_i − μ_k)(x_i − μ_k)ᵀ
Repeat the E step and M step until convergence.

(From Computer Vision: Models, Learning, and Inference, Simon J.D. Prince, Cambridge University Press, 2012.)

Matlab Example: run the Matlab function GMMExample_2D.m from the course website.
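If the course file is not at hand, a similar 2D experiment can be run with the Statistics and Machine Learning Toolbox's fitgmdist; a sketch with synthetic data (the blob parameters are arbitrary):

rng(1);                         % reproducible synthetic data
X = [randn(200,2)*0.5 + 2; ...  % blob 1
     randn(200,2)*0.8 - 1];     % blob 2
gm = fitgmdist(X, 2);           % EM fit of a 2-component GMM (means, covariances, weights)
idx = cluster(gm, X);           % hard assignment from the fitted model
post = posterior(gm, X);        % the soft assignments r_ik
figure, gscatter(X(:,1), X(:,2), idx);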

K-means vs. GMM.
K-means: hard assignment to clusters (separates the data points into blobs); only estimates the means μ_i; computationally cheap; sensitive to local minima; scales to higher dimensions (kernel K-means).
GMM: soft assignment (mode search); estimates the data distribution with multiple Gaussian modes; estimates both the mean μ_i and (co)variance Σ_i of each mode; more expensive (EM algorithm, Szeliski Sec. 5.3.1); sensitive to local minima; does not scale to higher dimensions.

Methods. Intensity based: thresholding. Edge based: region growing. Clustering: K-means, superpixels, Gaussian mixture models, mean shift, graph cuts. References: Szeliski book Ch. 5; Gonzalez and Woods book Ch. 10.

Parametric models. K-means and mixture of Gaussians use parametric models of the probability density: K-means uses a superposition of spherically symmetric distributions, MOG a superposition of Gaussians. What if your distribution can't be modeled this way? Where are the clusters in this space? (Comaniciu and Meer 2002: (a) input color image; (b) pixels plotted in L*u*v* space.)

A simple non-parametric alternative: mean shift clustering. Finds peaks (or modes) in the histogram. Does not assume the number of clusters is known. (Figure: data points; the data histogram and its modes; the resulting clustering.)

Finding Modes in a Histogram. How many modes (major peaks) are there? Easy to see, hard to compute.

Mean Shift [Fukunaga and Hostetler 1975; Cheng 1995; Comaniciu & Meer 2002]. Iterative mode search:
1. Initialize a random seed point and a window of fixed size.
2. Calculate the center of gravity of the points in the window (the "mean").
3. Translate the search window to the mean.
4. Repeat from Step 2 until convergence.
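A minimal Matlab sketch of this mode search with a flat (uniform) kernel of radius h; the function name and convergence tolerance are made up for illustration, and implicit expansion (R2016b+) is assumed:

function m = meanShiftMode(X, seed, h)
% X: n-by-d data points; seed: 1-by-d starting point; h: window radius
m = seed;
while true
    inWin = sqrt(sum((X - m).^2, 2)) < h;  % points inside the current window
    newM = mean(X(inWin, :), 1);           % center of gravity (the "mean")
    if norm(newM - m) < 1e-6, break; end   % window stopped moving: at a mode
    m = newM;
end
end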

Mean Shift Clustering [Fukunaga and Hostetler 1975; Cheng 1995; Comaniciu & Meer 2002]. Assigning points to clusters: start a separate mean-shift mode estimate at every input point and iterate until it reaches a mode; points that converge to the same mode form one cluster. Faster: do this for only a sparse set of points; the remaining points can then be classified based on the nearest evolution paths.
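Continuing the sketch, the basic (non-sparse) clustering step runs the mode search from every point and merges points whose modes coincide (meanShiftMode is the illustrative helper above; the h/2 merge tolerance is an assumption):

function labels = meanShiftCluster(X, h)
n = size(X, 1);
modes = zeros(size(X));
for i = 1:n
    modes(i, :) = meanShiftMode(X, X(i, :), h);  % mode reached from point i
end
labels = zeros(n, 1);
centers = [];                         % one representative mode per cluster
for i = 1:n
    j = 0;
    if ~isempty(centers)
        [dmin, jmin] = min(sqrt(sum((centers - modes(i, :)).^2, 2)));
        if dmin < h/2, j = jmin; end  % converged to an existing cluster's mode
    end
    if j == 0
        centers = [centers; modes(i, :)];  %#ok<AGROW> % a new mode/cluster
        j = size(centers, 1);
    end
    labels(i) = j;
end
end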

Mean shift results for segmentation: RGB+XY clustering [Comaniciu & Meer 2002]. 5D features: adding XY helps coherence in the image domain.

Mean shift results for segmentation: RGB+XY clustering [Comaniciu & Meer 2002]. Works well for color quantization.

Issues:
Window size (kernel bandwidth) selection is critical: it cannot be too small or too large, it indirectly controls the number of clusters (k), and different widths are needed in the RGB and XY parts of the feature space.
Color may not be sufficient (e.g., color overlap between object and background).
Integrating detailed boundary cues: contrast edges; explicit shape priors (smoothness, curvature, convexity, atlas).