Missing variable problems


Missing variable problems In many vision problems, the maximum likelihood inference problem would be easy if some variables were known: fitting (if we knew which line each token came from, it would be easy to determine the line parameters); segmentation (if we knew which segment each pixel came from, it would be easy to determine the segment parameters); fundamental matrix estimation (if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix); etc. This sort of thing happens in statistics, too.

For independent data samples

Missing variables - strategy We have a problem with parameters and missing variables. This suggests: iterate until convergence; replace the missing variables with their expected values, given fixed values of the parameters; then fix the missing variables and choose the parameters to maximise the likelihood, given those fixed values of the missing variables. E.g., iterate till convergence: allocate each point to a line with a weight, which is the probability of the point given the line; refit the lines to the weighted set of points. Converges to a local extremum.
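A minimal sketch of this alternation, assuming hypothetical e_step and m_step functions supplied by the particular problem (these names are not from the slides):

```python
import numpy as np

def em(data, theta0, e_step, m_step, max_iters=100, tol=1e-6):
    """Generic EM-style alternation.

    e_step(data, theta)  -> expected values of the missing variables
    m_step(data, expect) -> parameters maximising the likelihood given those values
    (both are problem-specific and hypothetical here).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        expectations = e_step(data, theta)          # fill in the missing variables
        new_theta = np.asarray(m_step(data, expectations), dtype=float)
        if np.max(np.abs(new_theta - theta)) < tol: # parameters stopped moving
            theta = new_theta
            break
        theta = new_theta
    return theta
```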

Finite Mixtures P(x) = Σ_{i=1..3} a_i g_i(x; θ)
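As an illustration (not from the slides), a small sketch evaluating such a three-component mixture with Gaussian components; the weights and parameters are made up:

```python
import numpy as np
from scipy.stats import norm

# Mixing weights a_i (sum to 1) and component parameters theta_i = (mean, std).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([0.5, 1.0, 1.5])

def mixture_density(x):
    """P(x) = sum_i a_i g_i(x; theta_i) with Gaussian components g_i."""
    x = np.atleast_1d(x)[:, None]                        # shape (n, 1)
    component_pdfs = norm.pdf(x, loc=means, scale=stds)  # shape (n, 3)
    return component_pdfs @ weights                      # shape (n,)

print(mixture_density([-2.0, 0.0, 3.0]))
```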

Expectation of indicator variables

Segmentation with EM Figure from "Color- and Texture-Based Image Segmentation Using EM and Its Application to Content-Based Image Retrieval," S.J. Belongie et al., Proc. Int. Conf. Computer Vision, 1998, © 1998, IEEE

Lines and robustness We have one line, and n points. Some come from the line, some from noise. This is a mixture model:

P(point | line and noise params) = P(point | line) P(comes from line) + P(point | noise) P(comes from noise)
                                 = P(point | line) λ + P(point | noise) (1 − λ)

We wish to determine the line parameters and p(comes from line).

Estimating the mixture model Introduce a set of hidden variables, δ, one for each point. They are one when the point is on the line, and zero when off. If these are known, the negative log-likelihood becomes (the line's parameters are φ, c):

Q_c(x; θ) = Σ_i [ δ_i (x_i cos φ + y_i sin φ + c)² / (2σ²) + (1 − δ_i) k_n ] + K

Here K is a normalising constant and k_n is the noise intensity (we'll choose this later).

Substituting for delta We shall substitute the expected value of δ_i, for a given θ; recall θ = (φ, c, λ):

E[δ_i] = 1 · P(δ_i = 1 | θ, x_i) + 0 · P(δ_i = 0 | θ, x_i)

P(δ_i = 1 | θ, x_i) = P(x_i | δ_i = 1, θ) P(δ_i = 1) / [ P(x_i | δ_i = 1, θ) P(δ_i = 1) + P(x_i | δ_i = 0, θ) P(δ_i = 0) ]
                    = exp(−(x_i cos φ + y_i sin φ + c)² / (2σ²)) λ / [ exp(−(x_i cos φ + y_i sin φ + c)² / (2σ²)) λ + exp(−k_n) (1 − λ) ]

Notice that if k_n is small and positive, then this value is close to 1 when the distance is small, and close to zero when it is large.

Algorithm for line fitting Obtain some start point θ^(0) = (φ^(0), c^(0), λ^(0)). Now compute the δ's using the formula above. Now compute the maximum likelihood estimate of θ^(1): φ, c come from fitting to the weighted points; λ comes by counting. Iterate to convergence.
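A minimal sketch of this procedure, assuming fixed σ and k_n and the line parameterisation x cos φ + y sin φ + c = 0 used above (the helper names and weighted total-least-squares M-step are my choices, not from the slides):

```python
import numpy as np

def em_line_fit(x, y, sigma=0.05, k_n=2.0, n_iters=50):
    """EM for one line plus noise.  Line: x*cos(phi) + y*sin(phi) + c = 0."""
    phi, c, lam = 0.0, 0.0, 0.5                 # crude start point theta^(0)
    for _ in range(n_iters):
        # E-step: expected delta_i = P(point i comes from the line).
        d = x * np.cos(phi) + y * np.sin(phi) + c
        on_line = np.exp(-d**2 / (2 * sigma**2)) * lam
        noise = np.exp(-k_n) * (1 - lam)
        w = on_line / (on_line + noise)
        # M-step: refit the line to the weighted points, re-estimate lambda by counting.
        mx, my = np.average(x, weights=w), np.average(y, weights=w)
        xc, yc = x - mx, y - my
        S = np.array([[np.sum(w * xc * xc), np.sum(w * xc * yc)],
                      [np.sum(w * xc * yc), np.sum(w * yc * yc)]])
        _, evecs = np.linalg.eigh(S)
        n_vec = evecs[:, 0]                     # normal = smallest-eigenvalue eigenvector
        phi = np.arctan2(n_vec[1], n_vec[0])
        c = -(mx * np.cos(phi) + my * np.sin(phi))
        lam = np.mean(w)
    return phi, c, lam, w
```

Running this from several different start points and keeping the best fit helps with the local-maximum problem discussed below.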

The expected values of the deltas at the maximum (notice the one value close to zero).

Closeup of the fit

Choosing parameters What about the noise parameter, and the σ for the line? Several methods: from first-principles knowledge of the problem (seldom really possible); or play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much). Notice that if k_n is large, this says that points very seldom come from noise, however far from the line they lie; this usually biases the fit, by pushing outliers into the line. Rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem.

Other examples Segmentation: a segment is a Gaussian that emits feature vectors (which could contain colour; or colour and position; or colour, texture and position). Segment parameters are the mean and (perhaps) the covariance. If we knew which segment each point belonged to, estimating these parameters would be easy; the rest is on the same lines as fitting a line. Fitting multiple lines: rather like fitting one line, except there are more hidden variables. Easiest is to encode them as an array of hidden variables, representing a table with a one where the i-th point comes from the j-th line and zeros otherwise; the rest is on the same lines as above.
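For the segmentation case, a short sketch using a Gaussian mixture over per-pixel feature vectors, with scikit-learn's GaussianMixture running the EM; the feature choice (colour plus position) and component count are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_image(image, n_segments=5):
    """EM segmentation: each segment is a Gaussian over (r, g, b, x, y) features."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.column_stack([
        image.reshape(-1, 3).astype(float) / 255.0,   # colour, assuming 8-bit RGB
        xs.ravel() / w,                                # position, roughly scaled
        ys.ravel() / h,
    ])
    gmm = GaussianMixture(n_components=n_segments, covariance_type='full')
    labels = gmm.fit_predict(features)                 # EM fit, then hard assignment
    return labels.reshape(h, w)
```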

Issues with EM Local maxima can be a serious nuisance in some problems; there is no guarantee that we have reached the right maximum. Starting with k-means to cluster the points is often a good idea.

Local maximum

which is an excellent fit to some points

and the deltas for this maximum

A dataset that is well fitted by four lines

Result of EM fitting, with one line (or at least, one available local maximum).

Result of EM fitting, with two lines (or at least, one available local maximum).

Seven lines can produce a rather logical answer

Motion segmentation with EM Model an image pair (or video sequence) as consisting of regions of parametric motion; affine motion is popular:

( v_x )   ( a  b ) ( x )   ( t_x )
( v_y ) = ( c  d ) ( y ) + ( t_y )

Now we need to determine which pixels belong to which region, and estimate the parameters. Likelihood: assume I(x, y, t) = I(x + v_x, y + v_y, t + 1) + noise. This is a straightforward missing variable problem; the rest is calculation.
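A small sketch of the pieces involved: the affine flow field and the brightness-constancy residual used in the likelihood. The parameter layout is an assumption, nearest-neighbour sampling stands in for proper interpolation, and the full EM loop over multiple motion models is omitted:

```python
import numpy as np

def affine_flow(params, xs, ys):
    """v = A [x, y]^T + t for affine parameters params = (a, b, c, d, tx, ty)."""
    a, b, c, d, tx, ty = params
    return a * xs + b * ys + tx, c * xs + d * ys + ty

def pixel_weights(I0, I1, params, sigma=5.0):
    """Per-pixel weight under I(x,y,t) = I(x+vx, y+vy, t+1) + Gaussian noise."""
    h, w = I0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    vx, vy = affine_flow(params, xs, ys)
    # crude nearest-neighbour sampling of the warped next frame
    xw = np.clip(np.round(xs + vx), 0, w - 1).astype(int)
    yw = np.clip(np.round(ys + vy), 0, h - 1).astype(int)
    r = I1[yw, xw] - I0
    return np.exp(-r**2 / (2 * sigma**2))   # unnormalised per-pixel responsibility
```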

Three frames from the MPEG flower garden sequence. Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994, IEEE

Grey level shows the region number with the highest probability. Segments and the motion fields associated with them. Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994, IEEE

If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions, so we can re-render the sequence with some segments removed. Figure from "Representing Moving Images with Layers," by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994, IEEE

Some generalities Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC: you need to be able to get a parameter estimate from a manageably small number of random choices; RANSAC is usually better. We didn't present EM in its most general form: in the general form, the likelihood may not be a linear function of the missing variables; in this case, one takes an expectation of the likelihood, rather than substituting expected values of the missing variables.

Model Selection We wish to choose a model to fit to data. E.g., is it a line or a circle? E.g., is this a perspective or orthographic camera? E.g., is there an aeroplane there, or is it noise? Issue: in general, models with more parameters will fit a dataset better, but are poorer at prediction. This means we can't simply look at the negative log-likelihood (or fitting error).

The top is not necessarily a better fit than the bottom (actually, it is almost always worse).

We can discount the fitting error with some term in the number of parameters in the model.

Discounts AIC (an information criterion): choose the model with the smallest value of −2L(D; θ*) + 2p, where p is the number of parameters. BIC (Bayes information criterion): choose the model with the smallest value of −2L(D; θ*) + p log N, where N is the number of data points. Minimum description length: the same criterion as BIC, but derived in a completely different way.
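A small sketch of the bookkeeping, assuming you already have the maximised log-likelihood and parameter count for each candidate model (the candidates and numbers are hypothetical):

```python
import numpy as np

def aic(log_likelihood, n_params):
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_data):
    return -2.0 * log_likelihood + n_params * np.log(n_data)

# Hypothetical candidates: (name, maximised log-likelihood L(D; theta*), #parameters)
candidates = [("line", -120.4, 3), ("circle", -118.9, 4)]
N = 200
best = min(candidates, key=lambda m: bic(m[1], m[2], N))
print("BIC picks:", best[0])
```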

Cross-validation Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other. Average over multiple different splits, and choose the model with the smallest value of this average. The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data.
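A sketch of this split-and-score loop, here used to choose the number of mixture components; scikit-learn's GaussianMixture.score returns the average held-out log-likelihood per sample, and the candidate counts and stand-in data are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import ShuffleSplit

def cv_neg_log_likelihood(data, n_components, n_splits=10):
    """Average held-out negative log-likelihood over random 50/50 splits."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.5, random_state=0)
    scores = []
    for train_idx, test_idx in splitter.split(data):
        gmm = GaussianMixture(n_components=n_components).fit(data[train_idx])
        scores.append(-gmm.score(data[test_idx]))   # score = mean log-likelihood
    return np.mean(scores)

data = np.random.randn(500, 2)                       # stand-in data
best_k = min([1, 2, 3, 4], key=lambda k: cv_neg_log_likelihood(data, k))
```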

Model averaging Very often, it is smarter to use multiple models for prediction than just one. E.g., motion capture data: there is a small number of schemes that are used to put markers on the body; given we know the scheme S and the measurements D, we can estimate the configuration of the body X. We want

P(X | D) = P(X | S_1, D) P(S_1 | D) + P(X | S_2, D) P(S_2 | D) + P(X | S_3, D) P(S_3 | D)

If it is obvious what the scheme is from the data, then averaging makes little difference. If it isn't, then not averaging underestimates the variance of X: we think we have a more precise estimate than we do.
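A minimal sketch of that weighted combination, assuming each scheme supplies a per-scheme estimate of X and a posterior weight P(S_i | D); all the numbers are made up:

```python
import numpy as np

# Per-scheme estimates E[X | S_i, D] (hypothetical 3-vectors) and posteriors P(S_i | D).
estimates = np.array([[1.0, 0.2, 0.5],
                      [1.1, 0.1, 0.4],
                      [0.9, 0.3, 0.6]])
posteriors = np.array([0.6, 0.3, 0.1])     # must sum to 1

# Mean of the averaged prediction: E[X | D] = sum_i P(S_i | D) * E[X | S_i, D]
x_averaged = posteriors @ estimates
```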

Learning as Model fitting

Model Fitting, Learning Model: a mapping between observations and world properties, P(o, s). Parametric: P(o, s; θ), where θ are parameters that need to be fit (learned) from observations (data). Density estimation: learn P(o, s) from examples, e.g. histogram, KDE, or a mixture of parametric densities.
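For the non-parametric option, a small sketch of kernel density estimation on samples, using scipy's gaussian_kde; the data are a stand-in:

```python
import numpy as np
from scipy.stats import gaussian_kde

samples = np.random.randn(1000)             # stand-in draws from the unknown density
kde = gaussian_kde(samples)                  # Gaussian-kernel density estimate
print(kde.evaluate([0.0, 1.0, 2.0]))         # estimated density at query points
```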

What's the problem? Two sets of hidden parameters. Goal: find argmax_s P(s | o, θ). But P(s | o; θ) depends on θ, so a different θ gives a different inference of s. Optimize θ? Yes, but the optimal θ is only good for one s. We need to optimize both at the same time. One solution: the EM algorithm.

Learning templates for categories Assume a set of labelled examples for stimuli from categories C = 1, 2, etc. Stimuli are fixed images in Gaussian noise: s = I_1 + N. Then labelled examples can be used to learn a parametric model P(s | C) = N(m, K). Let K = σ²I. How do we estimate m across trials?
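Under that model (known covariance σ²I), the maximum likelihood estimate of m for a category is just the mean of its labelled trials; a minimal sketch, with stand-in stimuli and labels:

```python
import numpy as np

def estimate_templates(stimuli, labels):
    """ML estimate of the template m for each category under s = m + N(0, sigma^2 I)."""
    return {c: stimuli[labels == c].mean(axis=0) for c in np.unique(labels)}

stimuli = np.random.randn(50, 64)           # 50 trials of 64-pixel stimuli (stand-in)
labels = np.random.randint(1, 3, size=50)   # category labels C in {1, 2}
templates = estimate_templates(stimuli, labels)
```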

EM algorithm Normal maximum likelihood; but what to do when Y is unknown? Iterate: 1) set the value by averaging over Y, given the current parameters; 2) maximize, and use the result as the next value. Repeat 1) and 2).

Density Estimation: Mixture of Densities, e.g. θ = {μ, σ}

Mixture Model Applications

Motion Estimation

Motion Estimation

Motion Estimation