Segmentation: Clustering, Graph Cut and EM


Segmentation: Clustering, Graph Cut and EM
Ying Wu
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL 60208
yingwu@northwestern.edu
http://www.eecs.northwestern.edu/~yingwu
1/29

Outline
  Motivations and Applications
  Image Segmentation by Clustering
    K-Means Algorithm
    Self-Organizing Map
  Image Segmentation by Graph Cut
    Basic Idea
    Block-diagonalization
  Segmentation by Expectation-Maximization
    Missing Data Problem
    E-M Iteration
  Issues Remained
2/29

Segmentation is a Fundamental Problem
The goal is to group similar components, such as image pixels, image regions, or even video clips.
It is an ill-posed problem: how do we define the similarity measure?
3/29

Background Subtraction
Video surveillance applications: separate the foreground from the background.
Assume a fixed camera and subtract the background estimate from each image.
Adaptive scheme, controlled by the weights w_a, w_i and w_c:
B_{n+1} = \frac{w_a F + \sum_i w_i B_{n-i}}{w_c}
4/29
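As an illustration, here is a minimal numpy sketch of the adaptive update above. The frame source, the default weights, and the threshold are illustrative assumptions, not values from the slides; when w_c is not supplied, the sketch normalizes by the weights actually used.

import numpy as np

def background_subtraction(frames, w_a=0.1, w_i=(0.5, 0.4), thresh=30.0, w_c=None):
    history = []                                  # recent background estimates B_{n-i}
    background = None
    for frame in frames:
        frame = np.asarray(frame, dtype=float)
        if background is None:
            background = frame.copy()
        # foreground mask: pixels far from the current background estimate
        mask = np.abs(frame - background) > thresh
        # adaptive update: B_{n+1} = (w_a F + sum_i w_i B_{n-i}) / w_c
        history = [background] + history[: len(w_i) - 1]
        used = list(zip(w_i, history))
        norm = w_c if w_c is not None else w_a + sum(w for w, _ in used)
        background = (w_a * frame + sum(w * B for w, B in used)) / norm
        yield mask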

Object Modeling
Represent an object by regions: a first step for recognition.
Represent a scene by a set of layers: motion segmentation.
Image segmentation vs. motion segmentation.
5/29

Basic Approaches
Segmentation by clustering
Segmentation by graph cut
Segmentation by the EM algorithm
6/29

Outline
  Motivations and Applications
  Image Segmentation by Clustering
    K-Means Algorithm
    Self-Organizing Map
  Image Segmentation by Graph Cut
    Basic Idea
    Block-diagonalization
  Segmentation by Expectation-Maximization
    Missing Data Problem
    E-M Iteration
  Issues Remained
7/29

K-Means Clustering
Assume the number of clusters, K, is given.
Use the center C_i of each cluster to represent that cluster.
How do we determine the identity of a data point? We need a distance measure D(x, y), e.g., D(x, y) = \|x - y\|^2.
Winner takes all:
l_k(x_k) = \arg\min_i D(x_k, C_i) = \arg\min_i \|x_k - C_i\|^2
where l_k is the label of the data point x_k.
K-means finds the clusters that minimize the total distortion
\phi(X, C) = \sum_i \sum_{x_j \in C_i} \|x_j - C_i\|^2
where C_i denotes the i-th cluster.
8/29

K-Means Clustering
To minimize \phi, the K-means algorithm iterates between two steps:
Labelling: suppose the p-th iteration ends with cluster centers C_i^{(p)}, i = 1,...,K. Label each data point against these centers, i.e., for each x_k find
l_k^{(p+1)}(x_k) = \arg\min_i \|x_k - C_i^{(p)}\|^2
and group the data points that belong to the same cluster:
\Omega_j = \{x_k : l_k(x_k) = j\}
Re-centering: re-calculate the centers:
C_i^{(p+1)} = \frac{\sum_{x_k \in \Omega_i} x_k}{|\Omega_i|}
Iterate between labelling and re-centering until convergence.
9/29
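A minimal K-means sketch following the two steps above. The inputs are assumptions: X is an (N, d) numpy array of data points, and the iteration count and random initialization are illustrative choices, not part of the slides.

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # C_i^(0)
    for _ in range(n_iters):
        # Labelling: l_k = argmin_i ||x_k - C_i||^2
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Re-centering: C_i = mean of the points assigned to cluster i
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers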

Self-Organizing Map (SOM)
A SOM can be used for visualizing high-dimensional data: it maps the data to a low-dimensional space based on competitive learning.
It is a two-layer neural network.
[Figure: inputs x_1, x_2, x_3 connected through weights to output neurons ξ_1, ξ_2, ..., ξ_{m-1}, ξ_m]
The number of neurons in the input layer equals the dimension of the input vector.
Each output neuron k has a connection weight vector W_k.
10/29

Competitive Learning
For an input x_k, all neurons compete against each other.
The winner is the one whose weight is closest to the input:
y_k = \arg\min_i D(x_k, W_i)
The index of the winner is taken as the output of the SOM.
Adjust the weight of the winner; train the neurons nearby, and counter-train those far away.
With a window function \Lambda(\|y - y_k\|), the Hebbian learning rule is
W_y(t+1) = W_y(t) + \eta(t)\,\Lambda(\|y - y_k\|)\,(x_k - W_y(t))
Intuition: the input data point attracts the neurons inside the window toward its location, but pushes the neurons outside the window away.
Relation to vector quantization (VQ) and K-means clustering?
11/29
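A small sketch of SOM competitive learning on a 1-D map of m neurons. The Gaussian window, learning-rate schedule, and iteration count are illustrative assumptions; note that a nonnegative Gaussian window only barely moves distant neurons rather than counter-training them as a Mexican-hat window would.

import numpy as np

def train_som(X, m=20, n_iters=1000, eta0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=m)].astype(float)   # weight vector W_y per output neuron
    grid = np.arange(m)                               # neuron indices y on the 1-D map
    for t in range(n_iters):
        x_k = X[rng.integers(len(X))]
        eta = eta0 * (1.0 - t / n_iters)              # decaying learning rate eta(t)
        sigma = max(sigma0 * (1.0 - t / n_iters), 0.5)
        # winner: y_k = argmin_i ||x_k - W_i||^2
        y_k = np.argmin(((W - x_k) ** 2).sum(axis=1))
        # window Lambda(|y - y_k|): pull the winner's neighbors toward x_k
        window = np.exp(-((grid - y_k) ** 2) / (2 * sigma ** 2))
        W += eta * window[:, None] * (x_k - W)
    return W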

Outline
  Motivations and Applications
  Image Segmentation by Clustering
    K-Means Algorithm
    Self-Organizing Map
  Image Segmentation by Graph Cut
    Basic Idea
    Block-diagonalization
  Segmentation by Expectation-Maximization
    Missing Data Problem
    E-M Iteration
  Issues Remained
12/29

Adjacency Graph and Affinity Matrix
We can represent the data set {x_1,...,x_N} by a graph G = {V, E}.
Each vertex represents an individual data point.
Each edge represents the adjacency of two data points, and the weight of the edge represents the affinity of the two points, for example
A_{ij} = \exp\left\{ -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right\}
i.e., the similarity of the two points.
Thus, the data set can be viewed as a weighted adjacency graph.
More importantly, it can also be viewed as an affinity matrix A.
13/29
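A one-function sketch of the Gaussian affinity matrix above; the scale parameter sigma is a user-chosen assumption (the slide leaves it free).

import numpy as np

def affinity_matrix(X, sigma=1.0):
    # A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))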

Block-diagonalization: Idea
If the data are grouped, then the affinity matrix is nearly block-diagonal (after a suitable ordering of the points).
Clustering can therefore be treated as finding the best permutation that block-diagonalizes A.
More specifically, the sum of the affinity values in the off-diagonal blocks is minimized, or equivalently the sum over the diagonal blocks is maximized.
14/29

Block-diagonalization: Formulation
Introduce an association vector (i.e., a projection) for each cluster component k:
w_k = [w_{k1}, w_{k2}, \ldots, w_{kN}]^T
where w_{ki} is the association of x_i with cluster k. A positive w_{ki} indicates that x_i belongs to cluster k to some extent, and a negative value indicates otherwise.
Usually such a projection vector is normalized, i.e., w_k^T w_k = 1, for k = 1,...,K.
Now we can formulate the problem as
w_k = \arg\max_{w_k} w_k^T A w_k \quad \text{s.t.} \quad w_k^T w_k = 1
15/29

Spectral Analysis
The solution is easy. The Lagrangian is
L = w_k^T A w_k + \lambda (1 - w_k^T w_k)
It is clear that
\frac{\partial L}{\partial w_k} = 2 A w_k - 2 \lambda w_k = 0 \quad \Longrightarrow \quad A w_k = \lambda w_k
What is this? An eigenvector w_k indicates the association of the data with cluster k, and the size of the cluster is given by the eigenvalue \lambda.
More significantly, we don't need to know K in advance: the number of significant \lambda's tells us K.
16/29
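A sketch of the spectral step above: the leading eigenvectors of the affinity matrix act as association vectors w_k, and the number of "significant" eigenvalues suggests K. The relative eigenvalue threshold and the final grouping by the dominant w_k are crude heuristics assumed here, not prescribed by the slides.

import numpy as np

def association_vectors(A, eig_threshold=0.1):
    eigvals, eigvecs = np.linalg.eigh(A)          # A is symmetric
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    significant = eigvals > eig_threshold * eigvals[0]
    K = int(significant.sum())                    # estimated number of clusters
    W = eigvecs[:, :K]                            # columns are the association vectors w_k
    return K, eigvals[:K], W

# Example usage (assuming `affinity_matrix` from the earlier sketch):
# A = affinity_matrix(X, sigma=1.0)
# K, lams, W = association_vectors(A)
# labels = np.argmax(np.abs(W), axis=1)           # group each point by its dominant w_k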

A Problem
Ideally, we can check the values of w_{ki} for grouping. But life is always complicated.
Suppose A has two identical eigenvalues:
A w_1 = \lambda w_1, \quad A w_2 = \lambda w_2
It is easy to see that any linear combination of w_1 and w_2 is also a valid eigenvector:
A(a_1 w_1 + a_2 w_2) = \lambda (a_1 w_1 + a_2 w_2)
This means we cannot simply use the values of w = a_1 w_1 + a_2 w_2 for grouping. Instead of using the 1-D subspace, we need to work in the 2-D subspace spanned by {w_1, w_2}.
If all K clusters are of roughly the same size, we will have K similar eigenvalues and have to go to a K-dimensional subspace. This is the worst case.
17/29

Graph Cut
We may view the problem from another angle: graph cut.
We still represent the data set by the affinity graph.
To divide the data set into two clusters, we need to find the weakest set of links between two subgraphs, each of which corresponds to one cluster.
A set of edges whose removal disconnects the graph is called a cut, so we need to find a minimum cut.
But there is a degeneracy here: separating a single isolated vertex gives the minimum cut. In other words, the minimum cut does not balance the sizes of the clusters.
18/29

Normalized Cut
So, the cut needs to be normalized. Suppose we partition V into A and B.
Let x \in \{1, -1\}^N be the indicator: x_i = 1 if vertex i is in A, and x_i = -1 otherwise.
Let d_i = \sum_j A_{ij} be the total connection from x_i to all other points.
Define the normalized cut
NCut(A, B) = \frac{cut(A, B)}{asso(A, V)} + \frac{cut(B, A)}{asso(B, V)} = \frac{\sum_{x_i > 0, x_j < 0} -A_{ij} x_i x_j}{\sum_{x_i > 0} d_i} + \frac{\sum_{x_i < 0, x_j > 0} -A_{ij} x_i x_j}{\sum_{x_i < 0} d_i}
Denote D = diag\{d_1, \ldots, d_N\}, \quad k = \frac{\sum_{x_i > 0} d_i}{\sum_i d_i}, \quad b = \frac{k}{1 - k}
19/29

Normalized Cut
Define y = (1 + x) - b(1 - x).
Shi & Malik (1997) gave a nice formulation [1]:
\min_x NCut(x) = \min_y \frac{y^T (D - A) y}{y^T D y} \quad \text{s.t.} \quad y_i \in \{1, -b\}, \quad y^T D \mathbf{1} = 0
This amounts to solving a generalized eigenvalue problem under constraints:
(D - A) y = \lambda D y
They showed that the eigenvector associated with the 2nd smallest eigenvalue bipartitions the graph.
[1] J. Shi and J. Malik, Normalized Cuts and Image Segmentation, CVPR '97
20/29
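A sketch of the spectral relaxation above: solve the generalized eigenvalue problem (D - A) y = \lambda D y and split on the sign of the eigenvector with the second-smallest eigenvalue. Thresholding at zero is a simplification assumed here; Shi & Malik also search over splitting thresholds. The graph is assumed connected so that D is positive definite.

import numpy as np
from scipy.linalg import eigh

def normalized_cut_bipartition(A):
    d = A.sum(axis=1)
    D = np.diag(d)
    # generalized symmetric eigenproblem (D - A) y = lambda D y
    eigvals, eigvecs = eigh(D - A, D)
    y = eigvecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    return y > 0                      # boolean partition indicator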

Outline
  Motivations and Applications
  Image Segmentation by Clustering
    K-Means Algorithm
    Self-Organizing Map
  Image Segmentation by Graph Cut
    Basic Idea
    Block-diagonalization
  Segmentation by Expectation-Maximization
    Missing Data Problem
    E-M Iteration
  Issues Remained
21/29

Generative Model and Missing Data
Assume each image pixel is produced by a probability density associated with one of the g image segments.
The data generation process: we first choose an image segment, and then generate the pixel from
p(x) = \sum_i p(x \mid \theta_i) \pi_i
where \pi_i is the prior for the i-th image segment and \theta_i is its parameter.
We can use a Gaussian for each component: p(x \mid \theta_i) \sim G(\mu_i, \Sigma_i).
Associate a label l_k with each x_k to denote its identity.
This mixture model is a generative model. The data labels are missing.
22/29

Formulation
So, our task is to do the inverse. Given a set of data points (image pixels) X = \{x_k, k = 1,\ldots,N\}, we need to estimate the parameters \theta_i, \pi_i, and estimate the label of each data point by
l_j^* = \arg\max_k p(l_j = k \mid x_j, \Theta), \quad \forall x_j
which maximizes the posterior probability for x_j.
Maximum Likelihood Estimation. The likelihood of the data set can be written as
p(X \mid \Theta) = \prod_j \left( \sum_{i=1}^g p(x_j \mid \theta_i) \pi_i \right)
Usually, we use the log likelihood:
\log p(X \mid \Theta) = \sum_j \log \left( \sum_{i=1}^g p(x_j \mid \theta_i) \pi_i \right)
But this is very ugly (why?) and intractable!
23/29
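For concreteness, a small sketch of the (incomplete-data) log likelihood above for a Gaussian mixture, evaluated with a log-sum-exp for numerical stability. The parameter containers `pis`, `mus`, `sigmas` are assumed lists/arrays of mixture parameters.

import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pis, mus, sigmas):
    # log_probs[j, i] = log pi_i + log N(x_j | mu_i, Sigma_i)
    log_probs = np.stack([
        np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=sigma)
        for pi, mu, sigma in zip(pis, mus, sigmas)
    ], axis=1)
    # log p(X | Theta) = sum_j log sum_i pi_i N(x_j | mu_i, Sigma_i)
    return logsumexp(log_probs, axis=1).sum()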

Missing Data and Indicator Variable
Introduce an indicator variable
z = [z_1, z_2, \ldots, z_g]^T
If a data point x is drawn from the k-th component, then z_k = 1 and all other z_{i \neq k} = 0.
This indicator variable tells the identity of a data point. It is the missing part! Why do we need it?
24/29

Good News!
Let's form the complete data:
y_k = [x_k^T, z_k^T]^T
and the complete data set Y = \{y_k, k = 1,\ldots,N\}.
The likelihood of a complete data point y_k:
p(y_k \mid \Theta) = \sum_{i=1}^g z_{ki}\, p(x_k \mid \theta_i), \qquad \log p(y_k \mid \Theta) = \sum_{i=1}^g z_{ki} \log p(x_k \mid \theta_i)
So, for the whole data set, we have
p(Y \mid \Theta) = \prod_{k=1}^N \sum_{i=1}^g z_{ki}\, p(x_k \mid \theta_i)
25/29

Good News and Bad News
And thus:
\log p(Y \mid \Theta) = \sum_{k=1}^N \log\left( \sum_{i=1}^g z_{ki}\, p(x_k \mid \theta_i) \right) = \sum_{k=1}^N \sum_{i=1}^g z_{ki} \log p(x_k \mid \theta_i)
Because the summation inside the log is eliminated, the ML estimation becomes easier:
\Theta^* = \arg\max_\Theta \log p(Y \mid \Theta)
However, the bad news is that the indicator variables z_k make the ML estimation difficult, since we do not know them.
26/29
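A tiny sketch of the complete-data log likelihood above, showing how the one-hot indicators remove the sum from inside the log. Z is assumed to be an (N, g) one-hot array; mus/sigmas are the component parameters.

import numpy as np
from scipy.stats import multivariate_normal

def complete_data_log_likelihood(X, Z, mus, sigmas):
    log_pdfs = np.stack([
        multivariate_normal.logpdf(X, mean=mu, cov=sigma)
        for mu, sigma in zip(mus, sigmas)
    ], axis=1)                         # shape (N, g): log p(x_k | theta_i)
    # log p(Y | Theta) = sum_k sum_i z_ki log p(x_k | theta_i)
    return (Z * log_pdfs).sum()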

Expectation-Maximization Iteration
Fortunately, life won't be too bad. A quite interesting phenomenon: if we knew z_k, i.e., the identity of each data point, we could easily estimate the density parameters \Theta by ML. At the same time, if we knew the density parameters, we could easily solve for the indicator variables z_k by MAP.
This phenomenon suggests an iterative procedure:
E-step: compute an expected value of the complete data, here only E[z_k];
M-step: maximize the log likelihood of the complete data to estimate \Theta.
It converges to a local maximum of the likelihood.
27/29

EM for Image Segmentation
Let's apply EM to image segmentation.
E-step:
E[z_{ki}] = 1 \cdot p(\text{k-th pixel comes from i-th component}) + 0 \cdot p(\text{k-th pixel doesn't come from i-th component}) = p(\text{k-th pixel comes from i-th component}) = \frac{\pi_i\, p(x_k \mid \theta_i)}{\sum_{j=1}^g \pi_j\, p(x_k \mid \theta_j)}
M-step:
\pi_i = \frac{1}{r} \sum_{l=1}^r p(i \mid x_l, \Theta)
\mu_i = \frac{\sum_{l=1}^r x_l\, p(i \mid x_l, \Theta)}{\sum_{l=1}^r p(i \mid x_l, \Theta)}
\Sigma_i = \frac{\sum_{l=1}^r p(i \mid x_l, \Theta)\, (x_l - \mu_i)(x_l - \mu_i)^T}{\sum_{l=1}^r p(i \mid x_l, \Theta)}
where r is the number of pixels.
28/29
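A compact EM sketch for the Gaussian mixture above, applied to pixel features (e.g., RGB values reshaped to an (r, d) array). The initialization, iteration count, and covariance regularizer are illustrative assumptions, not part of the lecture.

import numpy as np
from scipy.stats import multivariate_normal

def em_segmentation(X, g, n_iters=50, seed=0, reg=1e-6):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    r, d = X.shape
    pis = np.full(g, 1.0 / g)
    mus = X[rng.choice(r, size=g, replace=False)]
    sigmas = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(g)])
    for _ in range(n_iters):
        # E-step: responsibilities E[z_ki] = pi_i p(x_k|theta_i) / sum_j pi_j p(x_k|theta_j)
        probs = np.stack([
            pi * multivariate_normal.pdf(X, mean=mu, cov=sig)
            for pi, mu, sig in zip(pis, mus, sigmas)
        ], axis=1)
        resp = probs / probs.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_i, mu_i, Sigma_i from the responsibilities
        Nk = resp.sum(axis=0)
        pis = Nk / r
        mus = (resp.T @ X) / Nk[:, None]
        for i in range(g):
            diff = X - mus[i]
            sigmas[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] + reg * np.eye(d)
    labels = resp.argmax(axis=1)        # segment label per pixel
    return labels, pis, mus, sigmas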

Issues Remained
Structural parameters
  EM assumes a known number of components, a common problem in clustering. What if we don't know it?
  The Minimum Description Length (MDL) principle in theory; cross-validation in practice (see the model-selection sketch after this slide).
Curse of dimensionality
  What if the dimensionality of x is very high? There are too many parameters to estimate, which requires a huge amount of training data; otherwise, the estimation is heavily biased.
29/29
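A sketch of choosing the number of components with an MDL-like criterion. BIC is used here as a stand-in for MDL (the two are closely related), and scikit-learn's GaussianMixture is an assumed convenience, not part of the lecture.

import numpy as np
from sklearn.mixture import GaussianMixture

def choose_num_components(X, k_max=10, seed=0):
    bics = []
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bics.append(gmm.bic(X))        # lower BIC is better
    return int(np.argmin(bics)) + 1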