7 Image analysis: Computer Vision and Classification; Image Segmentation


7 Computer Vision and Classification

The k-nearest-neighbor method

The k-nearest-neighbor (knn) procedure has been used in data analysis and machine learning communities as a quick way to classify objects into predefined groups. This approach requires a training dataset where both the class y and the vector x of characteristics (or covariates) of each observation are known.

The k-nearest-neighbor method

The training dataset $(y_i, x_i)_{1 \le i \le n}$ is used by the k-nearest-neighbor procedure to predict the value of $y_{n+1}$ given a new vector of covariates $x_{n+1}$ in a very rudimentary manner. The predicted value of $y_{n+1}$ is simply the most frequent class found amongst the k nearest neighbors of $x_{n+1}$ in the set $(x_i)_{1 \le i \le n}$.
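
As an illustration of this rule, here is a minimal Python sketch (the helper name `knn_predict` and the Euclidean metric are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def knn_predict(x_new, X_train, y_train, k):
    """Classical knn rule: predict the most frequent class among the k
    training points closest to x_new (Euclidean distance assumed)."""
    d = np.linalg.norm(X_train - x_new, axis=1)   # distances to all x_i
    nearest = np.argpartition(d, k)[:k]           # indices of the k nearest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]                # ties broken arbitrarily
```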

The k-nearest-neighbor method

The classical knn procedure does not involve much calibration and requires no statistical modeling at all! There exists a Bayesian reformulation.

vision dataset (1)

vision: 1373 color pictures, described by 200 variables rather than by the whole table of 500 × 375 pixels. Four classes of images: class $C_1$ for motorcycles, class $C_2$ for bicycles, class $C_3$ for humans and class $C_4$ for cars.

vision dataset (2)

Typical issue in computer vision problems: build a classifier to identify a picture pertaining to a specific topic without human intervention. We use about half of the images (648 pictures) to construct the training dataset and we save the 689 remaining images to test the performance of our procedures.

vision dataset (3)

[figure]

A probabilistic version of the knn methodology (1)

Symmetrization of the neighborhood relation: if $x_i$ belongs to the k-nearest-neighborhood of $x_j$ and if $x_j$ does not belong to the k-nearest-neighborhood of $x_i$, the point $x_j$ is added to the set of neighbors of $x_i$. Notation: $i \sim_k j$. The transformed set of neighbors is then called the symmetrized k-nearest-neighbor system.
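
One way to build this symmetrized system in code is sketched below; the helper name `symmetrized_knn`, the Euclidean metric, and the naive O(n²) distance computation are assumptions made for the example.

```python
import numpy as np

def symmetrized_knn(X, k):
    """Return the symmetrized k-nn neighbor lists and the average
    neighborhood size N_k for an (n, p) covariate matrix X."""
    n = X.shape[0]
    knn = []
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)      # Euclidean distances
        d[i] = np.inf                             # exclude the point itself
        knn.append(set(np.argpartition(d, k)[:k]))
    sym = [set(s) for s in knn]
    for i in range(n):
        for j in knn[i]:
            sym[j].add(i)                         # symmetrize: i is added to j's neighbors
    neighbors = [np.array(sorted(s)) for s in sym]
    N_k = float(np.mean([len(s) for s in neighbors]))
    return neighbors, N_k
```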

A probabilistic version of the knn methodology (2)

A probabilistic version of the knn methodology (3)

$$P(y_i = C_j \mid y_{-i}, X, \beta, k) = \frac{\exp\Big( \beta \sum_{l \sim_k i} I_{C_j}(y_l) \big/ N_k \Big)}{\sum_{g=1}^{G} \exp\Big( \beta \sum_{l \sim_k i} I_{C_g}(y_l) \big/ N_k \Big)},$$

$N_k$ being the average number of neighbours over all $x_i$'s, $\beta > 0$, $y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n)$ and $X = \{x_1, \ldots, x_n\}$.
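
A hedged Python sketch of this full conditional, assuming neighbor lists and $N_k$ such as those produced by the hypothetical `symmetrized_knn` helper above (names are illustrative):

```python
import numpy as np

def knn_full_conditional(i, y, neighbors, beta, N_k, G):
    """P(y_i = C_g | y_{-i}, X, beta, k) for g = 1, ..., G under the
    probabilistic knn model; y holds labels in {1, ..., G} and
    neighbors[i] the symmetrized k-nn indices of point i."""
    counts = np.array([np.sum(y[neighbors[i]] == g) for g in range(1, G + 1)])
    weights = np.exp(beta * counts / N_k)         # exp(beta * sum I_{C_g}(y_l) / N_k)
    return weights / weights.sum()                # normalize over the G classes
```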

A probabilistic version of the knn methodology (4)

$\beta$ grades the influence of the prevalent neighborhood class; the probabilistic knn model is conditional on the covariate matrix $X$; the frequencies $n_1/n, \ldots, n_G/n$ of the classes within the training set are representative of the marginal probabilities $p_1 = P(y_i = C_1), \ldots, p_G = P(y_i = C_G)$. If the marginal probabilities $p_g$ are known and different from $n_g/n$, the various classes are reweighted according to their true frequencies.

Bayesian analysis of the knn probabilistic model

From a Bayesian perspective, given a prior distribution $\pi(\beta, k)$ with support $[0, \beta_{\max}] \times \{1, \ldots, K\}$, the marginal predictive distribution of $y_{n+1}$ is

$$P(y_{n+1} = C_j \mid x_{n+1}, y, X) = \sum_{k=1}^{K} \int_{0}^{\beta_{\max}} P(y_{n+1} = C_j \mid x_{n+1}, y, X, \beta, k)\, \pi(\beta, k \mid y, X)\, \mathrm{d}\beta,$$

where $\pi(\beta, k \mid y, X)$ is the posterior distribution given the training dataset $(y, X)$.

Unknown normalizing constant

Major difficulty: it is impossible to compute $f(y \mid X, \beta, k)$. Use instead the pseudo-likelihood made of the product of the full conditionals,

$$\hat f(y \mid X, \beta, k) = \prod_{g=1}^{G} \prod_{y_i = C_g} P(y_i = C_g \mid y_{-i}, X, \beta, k).$$
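
Continuing the sketch, the log pseudo-likelihood is just the sum of the log full-conditional probabilities of the observed labels (reusing the hypothetical `knn_full_conditional` helper above):

```python
import numpy as np

def log_pseudo_likelihood(y, neighbors, beta, N_k, G):
    """Log pseudo-likelihood: sum over i of log P(y_i | y_{-i}, X, beta, k)."""
    logp = 0.0
    for i in range(len(y)):
        probs = knn_full_conditional(i, y, neighbors, beta, N_k, G)
        logp += np.log(probs[y[i] - 1])           # labels assumed in {1, ..., G}
    return logp
```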

Pseudo-likelihood approximation

Pseudo-posterior distribution:

$$\hat\pi(\beta, k \mid y, X) \propto \hat f(y \mid X, \beta, k)\, \pi(\beta, k)$$

Pseudo-predictive distribution:

$$P(y_{n+1} = C_j \mid x_{n+1}, y, X) = \sum_{k=1}^{K} \int_{0}^{\beta_{\max}} P(y_{n+1} = C_j \mid x_{n+1}, y, X, \beta, k)\, \hat\pi(\beta, k \mid y, X)\, \mathrm{d}\beta.$$

MCMC implementation (1)

MCMC approximation is required: a random walk Metropolis-Hastings algorithm. Both β and k are updated using random walk proposals:

(i) for β, a logistic transformation

$$\beta^{(t)} = \beta_{\max}\, \exp(\theta^{(t)}) \big/ \left(\exp(\theta^{(t)}) + 1\right),$$

in order to be able to simulate a normal random walk on the θ's, $\tilde\theta \sim \mathcal{N}(\theta^{(t)}, \tau^2)$;

MCMC implementation (2)

(ii) for k, we use a uniform proposal on the r neighbors on either side of $k^{(t)}$, namely on

$$\{k^{(t)} - r, \ldots, k^{(t)} - 1,\ k^{(t)} + 1, \ldots, k^{(t)} + r\} \cap \{1, \ldots, K\}.$$

Using this algorithm, we can thus derive the most likely class associated with a covariate vector $x_{n+1}$ from the approximated probabilities,

$$\frac{1}{M} \sum_{i=1}^{M} P\left(y_{n+1} = l \mid x_{n+1}, y, X, (\beta^{(i)}, k^{(i)})\right).$$
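
A possible implementation of this random walk sampler is sketched below. It is only an illustration under stated assumptions: `neighbors_by_k[k]` and `N_by_k[k]` are assumed to hold the symmetrized k-nn system and its average size for each candidate k (e.g., built with the `symmetrized_knn` sketch for each k), β and k are updated jointly for brevity, the priors are taken flat, and the boundary asymmetry of the k proposal is ignored, as in the description above.

```python
import numpy as np

def knn_mh_sampler(y, neighbors_by_k, N_by_k, G, n_iter=20000,
                   beta_max=15.0, K=83, tau2=0.05, r=1, seed=0):
    """Random walk Metropolis-Hastings on (beta, k) for the pseudo-posterior."""
    rng = np.random.default_rng(seed)
    theta, k = 0.0, K // 2                        # arbitrary starting values
    beta = beta_max / (1.0 + np.exp(-theta))      # logistic transformation
    logpl = log_pseudo_likelihood(y, neighbors_by_k[k], beta, N_by_k[k], G)
    chain = []
    for _ in range(n_iter):
        # (i) normal random walk on theta, mapped back to beta in (0, beta_max)
        theta_p = rng.normal(theta, np.sqrt(tau2))
        beta_p = beta_max / (1.0 + np.exp(-theta_p))
        # (ii) uniform proposal on the r values on either side of k, kept in 1..K
        cands = [j for j in range(k - r, k + r + 1) if j != k and 1 <= j <= K]
        k_p = int(rng.choice(cands))
        logpl_p = log_pseudo_likelihood(y, neighbors_by_k[k_p], beta_p,
                                        N_by_k[k_p], G)
        # Jacobian of the beta <-> theta change of variable (flat prior on beta)
        log_jac = (np.log(beta_p * (beta_max - beta_p))
                   - np.log(beta * (beta_max - beta)))
        if np.log(rng.uniform()) < logpl_p - logpl + log_jac:
            theta, beta, k, logpl = theta_p, beta_p, k_p, logpl_p
        chain.append((beta, k))
    return np.array(chain)
```

The class of a new point $x_{n+1}$ can then be estimated by averaging $P(y_{n+1} = l \mid x_{n+1}, y, X, \beta^{(i)}, k^{(i)})$ over the stored $(\beta, k)$ pairs, as in the display above.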

MCMC implementation (3)

[Trace plots and histograms of the β and k sequences for the knn Metropolis-Hastings sampler, based on 20,000 iterations, with $\beta_{\max} = 15$, $K = 83$, $\tau^2 = 0.05$ and $r = 1$.]

Image Segmentation

The underlying structure of the true pixels is denoted by x, while the observed image is denoted by y. Both objects x and y are arrays, with each entry of x taking a finite number of values and each entry of y taking real values. We are interested in the posterior distribution of x given y provided by Bayes' theorem,

$$\pi(x \mid y) \propto f(y \mid x)\, \pi(x).$$

The likelihood, $f(y \mid x)$, describes the link between the observed image and the underlying classification.

Menteith dataset (1)

The Menteith dataset is a 100 × 100 pixel satellite image of the lake of Menteith. The lake of Menteith is located in Scotland, near Stirling, and offers the peculiarity of being called "lake" rather than the traditional Scottish "loch".

Menteith dataset (2)

[Satellite image of the lake of Menteith.]

Random Fields

If we take a lattice I of sites or pixels in an image, we denote by $i \in I$ a coordinate in the lattice. The neighborhood relation is then denoted by $\sim$. A random field on I is a random structure indexed by the lattice I, a collection of random variables $\{x_i;\ i \in I\}$ where each $x_i$ takes values in a finite set $\chi$.

Markov Random Fields

Let $x_{n(i)}$ be the set of values taken by the neighbors of $i$. A random field is a Markov random field (MRF) if the conditional distribution of any pixel given the other pixels only depends on the values of the neighbors of that pixel; i.e., for $i \in I$,

$$\pi(x_i \mid x_{-i}) = \pi(x_i \mid x_{n(i)}).$$

Ising Models

If pixels of the underlying (true) image x can only take two colors (black and white, say), x is then binary, while y is a grey-level image. We typically refer to each pixel $x_i$ as being foreground if $x_i = 1$ (black) and background if $x_i = 0$ (white). We have

$$\pi(x_i = j \mid x_{-i}) \propto \exp(\beta n_{i,j}), \qquad \beta > 0,$$

where $n_{i,j} = \sum_{l \in n(i)} I_{x_l = j}$ is the number of neighbors of $x_i$ with color j.

Ising Models

The Ising model is defined via full conditionals

$$\pi(x_i = 1 \mid x_{-i}) = \frac{\exp(\beta n_{i,1})}{\exp(\beta n_{i,0}) + \exp(\beta n_{i,1})},$$

and the joint distribution satisfies

$$\pi(x) \propto \exp\Big( \beta \sum_{j \sim i} I_{x_j = x_i} \Big),$$

where the summation is taken over all pairs (i, j) of neighbors.

Simulating from the Ising Model

The normalizing constant of the Ising model is intractable except for very small lattices I... Direct simulation of x is not possible!

Ising Gibbs Sampler

Algorithm (Ising Gibbs Sampler)

Initialization: For $i \in I$, generate independently $x_i^{(0)} \sim \mathcal{B}(1/2)$.

Iteration t (t ≥ 1):
1. Generate $u = (u_i)_{i \in I}$, a random ordering of the elements of I.
2. For $1 \le l \le |I|$, update $n_{u_l,0}^{(t)}$ and $n_{u_l,1}^{(t)}$, and generate

$$x_{u_l}^{(t)} \sim \mathcal{B}\left( \frac{\exp(\beta n_{u_l,1}^{(t)})}{\exp(\beta n_{u_l,0}^{(t)}) + \exp(\beta n_{u_l,1}^{(t)})} \right).$$
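
A minimal Python sketch of this Gibbs sampler on a square lattice with a four-neighbor structure (a slow, pedagogical version; the function name and lattice handling are illustrative):

```python
import numpy as np

def ising_gibbs(beta, size=100, n_iter=1000, seed=0):
    """Single-site Gibbs sampler for the Ising model on a size x size lattice."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(size, size))     # x_i^(0) ~ Bernoulli(1/2)
    for _ in range(n_iter):
        for s in rng.permutation(size * size):    # random ordering of the sites
            i, j = divmod(s, size)
            neigh = []
            if i > 0:        neigh.append(x[i - 1, j])
            if i < size - 1: neigh.append(x[i + 1, j])
            if j > 0:        neigh.append(x[i, j - 1])
            if j < size - 1: neigh.append(x[i, j + 1])
            n1 = int(sum(neigh))                  # neighbors colored 1
            n0 = len(neigh) - n1                  # neighbors colored 0
            p1 = np.exp(beta * n1) / (np.exp(beta * n0) + np.exp(beta * n1))
            x[i, j] = int(rng.uniform() < p1)     # Bernoulli(p1) draw
    return x
```

Running this for β between 0.3 and 1.2 reproduces the kind of simulations summarized on the next slide.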

Ising Gibbs Sampler (2)

[Simulations from the Ising model with a four-neighbor neighborhood structure on a 100 × 100 array after 1,000 iterations of the Gibbs sampler; β varies in steps of 0.1 from 0.3 to 1.2.]

Potts Models

If there are G colors and $n_{i,g}$ denotes the number of neighbors of $i \in I$ with color g ($1 \le g \le G$), that is, $n_{i,g} = \sum_{j \sim i} I_{x_j = g}$, then the full conditional distribution of $x_i$ is chosen to satisfy

$$\pi(x_i = g \mid x_{-i}) \propto \exp(\beta n_{i,g}).$$

This choice corresponds to the Potts model, whose joint density is given by

$$\pi(x) \propto \exp\Big( \beta \sum_{j \sim i} I_{x_j = x_i} \Big).$$

Simulating from the Potts Model (1)

Algorithm (Potts Metropolis-Hastings Sampler)

Initialization: For $i \in I$, generate independently $x_i^{(0)} \sim \mathcal{U}(\{1, \ldots, G\})$.

Iteration t (t ≥ 1):
1. Generate $u = (u_i)_{i \in I}$, a random ordering of the elements of I.
2. For $1 \le l \le |I|$, generate

$$\tilde{x}_{u_l}^{(t)} \sim \mathcal{U}\left(\{1, \ldots, x_{u_l}^{(t-1)} - 1,\ x_{u_l}^{(t-1)} + 1, \ldots, G\}\right),$$

Simulating from the Potts Model (2)

Algorithm (continued)

compute the $n_{u_l,g}^{(t)}$ and

$$\rho_l = \left\{ \exp\big(\beta n^{(t)}_{u_l, \tilde{x}_{u_l}}\big) \Big/ \exp\big(\beta n^{(t)}_{u_l, x_{u_l}}\big) \right\} \wedge 1,$$

and set $x_{u_l}^{(t)}$ equal to $\tilde{x}_{u_l}$ with probability $\rho_l$.
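
The two steps above translate into the following hedged Python sketch for a square lattice with four neighbors (again a slow, pedagogical version):

```python
import numpy as np

def potts_mh(beta, G, size=100, n_iter=1000, seed=0):
    """Single-site Metropolis-Hastings sampler for the G-color Potts model."""
    rng = np.random.default_rng(seed)
    x = rng.integers(1, G + 1, size=(size, size)) # x_i^(0) ~ U({1, ..., G})
    for _ in range(n_iter):
        for s in rng.permutation(size * size):    # random ordering of the sites
            i, j = divmod(s, size)
            neigh = []
            if i > 0:        neigh.append(x[i - 1, j])
            if i < size - 1: neigh.append(x[i + 1, j])
            if j > 0:        neigh.append(x[i, j - 1])
            if j < size - 1: neigh.append(x[i, j + 1])
            neigh = np.array(neigh)
            current = x[i, j]
            proposal = rng.integers(1, G)         # uniform over the other colors
            if proposal >= current:
                proposal += 1
            n_prop = int(np.sum(neigh == proposal))
            n_curr = int(np.sum(neigh == current))
            rho = min(1.0, np.exp(beta * (n_prop - n_curr)))
            if rng.uniform() < rho:
                x[i, j] = proposal
    return x
```

With G = 4 and the same range of β values, this reproduces the type of output summarized on the following slide.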

Simulating from the Potts Model (3)

[Simulations from the Potts model with four grey levels and a four-neighbor neighborhood structure, based on 1,000 iterations of the Metropolis-Hastings sampler; the parameter β varies in steps of 0.1 from 0.3 to 1.2.]

Posterior Inference (1)

The prior on x is a Potts model with G categories,

$$\pi(x \mid \beta) = \frac{1}{Z(\beta)} \exp\Big( \beta \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i} \Big),$$

where Z(β) is the normalizing constant. Given x, we assume that the observations in y are independent normal random variables,

$$f(y \mid x, \sigma^2, \mu_1, \ldots, \mu_G) = \prod_{i \in I} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big\{ -\frac{1}{2\sigma^2} (y_i - \mu_{x_i})^2 \Big\}.$$

Posterior Inference (2)

Priors:

$$\beta \sim \mathcal{U}([0, 2]),$$

$$\mu = (\mu_1, \ldots, \mu_G) \sim \mathcal{U}\big(\{\mu;\ 0 \le \mu_1 \le \ldots \le \mu_G \le 255\}\big),$$

$$\pi(\sigma^2) \propto \sigma^{-2}\, I_{]0,\infty[}(\sigma^2),$$

the last prior corresponding to a uniform prior on log σ.

Posterior Inference (3)

Posterior distribution:

$$\pi(x, \beta, \sigma^2, \mu \mid y) \propto \pi(\beta, \sigma^2, \mu)\, \frac{1}{Z(\beta)} \exp\Big( \beta \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i} \Big) \prod_{i \in I} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big\{ -\frac{1}{2\sigma^2} (y_i - \mu_{x_i})^2 \Big\}.$$

Full conditionals (1)

$$P(x_i = g \mid y, \beta, \sigma^2, \mu) \propto \exp\Big( \beta \sum_{j \sim i} I_{x_j = g} - \frac{1}{2\sigma^2} (y_i - \mu_g)^2 \Big),$$

which can be simulated directly.

With $n_g = \sum_{i \in I} I_{x_i = g}$ and $s_g = \sum_{i \in I} I_{x_i = g}\, y_i$, the full conditional distribution of $\mu_g$ is a truncated normal distribution on $[\mu_{g-1}, \mu_{g+1}]$ with mean $s_g/n_g$ and variance $\sigma^2/n_g$.

Full conditionals (2)

The full conditional distribution of $\sigma^2$ is an inverse gamma distribution with parameters $|I|/2$ and $\sum_{i \in I} (y_i - \mu_{x_i})^2/2$. The full conditional distribution of β is such that

$$\pi(\beta \mid x, y) \propto \frac{1}{Z(\beta)} \exp\Big( \beta \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i} \Big).$$
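
As an illustration, one hybrid-Gibbs update of the means and variance could look as follows (a sketch assuming each class is currently non-empty; the x-update combines the Potts term with the Gaussian term as in the first full conditional, and the β-update relies on the path sampling approximation of Z(β) described next):

```python
import numpy as np
from scipy.stats import truncnorm, invgamma

def update_mu_sigma2(y, x, mu, sigma2, rng):
    """One sweep over (mu_1, ..., mu_G) and sigma^2 given the segmentation x.
    y and x are flat pixel arrays, x with labels in {1, ..., G}."""
    G = len(mu)
    mu = mu.copy()
    for g in range(G):
        mask = (x == g + 1)
        n_g, s_g = mask.sum(), y[mask].sum()
        mean, sd = s_g / n_g, np.sqrt(sigma2 / n_g)
        lo = mu[g - 1] if g > 0 else 0.0          # truncation keeps the ordering
        hi = mu[g + 1] if g < G - 1 else 255.0
        a, b = (lo - mean) / sd, (hi - mean) / sd # standardized bounds
        mu[g] = truncnorm.rvs(a, b, loc=mean, scale=sd, random_state=rng)
    resid = y - mu[x - 1]                         # mu_{x_i} for every pixel
    sigma2 = invgamma.rvs(a=len(y) / 2, scale=np.sum(resid ** 2) / 2,
                          random_state=rng)
    return mu, sigma2
```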

Path sampling approximation (1)

$$Z(\beta) = \sum_x \exp\{\beta S(x)\}, \qquad \text{where } S(x) = \sum_{i \in I} \sum_{j \sim i} I_{x_j = x_i}.$$

Then

$$\frac{\mathrm{d}Z(\beta)}{\mathrm{d}\beta} = Z(\beta) \sum_x \frac{S(x) \exp\{\beta S(x)\}}{Z(\beta)} = Z(\beta)\, \mathbb{E}_\beta[S(X)],$$

so that

$$\frac{\mathrm{d} \log Z(\beta)}{\mathrm{d}\beta} = \mathbb{E}_\beta[S(X)].$$

Path sampling approximation (2)

Path sampling identity:

$$\log\big\{Z(\beta_1)/Z(\beta_0)\big\} = \int_{\beta_0}^{\beta_1} \mathbb{E}_\beta[S(x)]\, \mathrm{d}\beta.$$

Path sampling approximation (3)

For a given value of β, $\mathbb{E}_\beta[S(X)]$ can be approximated from an MCMC sequence. The integral itself can be approximated by computing the value of $f(\beta) = \mathbb{E}_\beta[S(X)]$ for a finite number of values of β and approximating $f(\beta)$ by a piecewise-linear function $\hat f(\beta)$ for the intermediate values of β.
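
In code, this amounts to estimating $\mathbb{E}_\beta[S(X)]$ on a grid of β values (for instance by averaging S(x) over sweeps of the Potts sampler sketched earlier) and integrating the piecewise-linear interpolant; a hedged sketch, with `S` and `log_Z_ratio` as illustrative names:

```python
import numpy as np

def S(x):
    """S(x): number of four-neighbor pairs sharing the same color,
    each pair counted once."""
    return int(np.sum(x[1:, :] == x[:-1, :]) + np.sum(x[:, 1:] == x[:, :-1]))

def log_Z_ratio(beta_grid, f_hat, beta0, beta1):
    """Path sampling estimate of log Z(beta1) - log Z(beta0), where f_hat[i]
    is an MCMC estimate of E_{beta_grid[i]}[S(X)] and beta_grid is increasing;
    the curve is treated as piecewise linear between grid points."""
    bs = np.linspace(beta0, beta1, 201)
    ys = np.interp(bs, beta_grid, f_hat)          # piecewise-linear f_hat(beta)
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(bs)))  # trapezoid rule
```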

Path sampling approximation (3)

[Approximation of f(β) for the Potts model on a 100 × 100 image, with a four-neighbor neighborhood and G = 6, based on 1500 MCMC iterations after burn-in.]

Posterior Approximation (1)

[Menteith dataset: sequences of the $\mu_g$'s based on 2,000 iterations of the hybrid Gibbs sampler (read row-wise from $\mu_1$ to $\mu_6$).]

Posterior Approximation (2)

[Histograms of the $\mu_g$'s.]

Posterior Approximation (3)

[Raw plots and histograms of the $\sigma^2$'s and β's based on 2,000 iterations of the hybrid Gibbs sampler.]

Image segmentation (1)

Based on $(x^{(t)})_{1 \le t \le T}$, an estimator of x needs to be derived from an evaluation of the consequences of wrong allocations. Two common ways correspond to the two loss functions

$$L_1(x, \hat{x}) = \sum_{i \in I} I_{x_i \ne \hat{x}_i}, \qquad L_2(x, \hat{x}) = I_{x \ne \hat{x}},$$

with corresponding estimators

$$\hat{x}_i^{\mathrm{MPM}} = \arg\max_{1 \le g \le G} P^{\pi}(x_i = g \mid y), \quad i \in I, \qquad \hat{x}^{\mathrm{MAP}} = \arg\max_x\, \pi(x \mid y).$$

Image segmentation (2)

$\hat{x}^{\mathrm{MPM}}$ and $\hat{x}^{\mathrm{MAP}}$ are not available in closed form! Approximation:

$$\hat{x}_i^{\mathrm{MPM}} = \arg\max_{g \in \{1, \ldots, G\}} \sum_{j=1}^{N} I_{x_i^{(j)} = g},$$

based on a simulated sequence $x^{(1)}, \ldots, x^{(N)}$.
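
A small Python sketch of this approximation (assuming the simulated segmentations are stacked in an array of shape (N, height, width) with labels in {1, ..., G}):

```python
import numpy as np

def mpm_estimate(x_samples, G):
    """Approximate MPM estimator: the most frequent label per pixel
    across the simulated segmentations x^(1), ..., x^(N)."""
    counts = np.stack([(x_samples == g).sum(axis=0) for g in range(1, G + 1)])
    return counts.argmax(axis=0) + 1              # back to labels in {1, ..., G}
```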

Image segmentation (3)

[(top) Segmented image based on the MPM estimate produced after 2,000 iterations of the Gibbs sampler and (bottom) the observed image.]