Multiple cosegmentation

Armand Joulin, Francis Bach and Jean Ponce. INRIA - École Normale Supérieure. April 25, 2012

Segmentation. Segmentation is a classical and fundamental vision problem. Problem: many possible solutions.

Existing solutions. Supervised segmentation: needs ground truth for every class of object and cannot deal with an unknown object (e.g., P. Krähenbühl and V. Koltun, NIPS 11). Interactive segmentation (scribbles or a bounding box, e.g., GrabCut): needs human interaction for each image.

Cosegmentation. Dividing a set of images by using the information they share. No prior information is given, but the foreground is common to the images while the backgrounds differ.

Previously existing methods (Rother et al. 2006, Singh and Hochbaum 2009, ...) only work with 2 images and the exact same object. The first method presented here works on multiple images and on an object class. The second one extends it to multiple images and multiple object classes.

Cosegmentation is also an ill-posed problem: in natural images, objects are linked with their environment, so the background is also common to all the images. Solutions: use user interaction on some images, or segment the background into meaningful regions.

The goals of our approach. Our method should: handle multiple images; work on any kind of object/stuff; segment the background into meaningful regions; use no prior information, while being easily extendable to interactive cosegmentation.

Method goals. Local consistency: maximize the spatial consistency within a particular image (Figure: image space). Separation of the classes: maximize the separability of the K classes between different images (Figure: feature space). Our framework: unsupervised discriminative clustering.

Notations. Each image i is reduced to a subsampled grid of pixels. For the n-th pixel, we denote by x_n its d-dimensional feature vector, and by y_n the K-vector such that y_nk = 1 if the n-th pixel is in class k and 0 otherwise.

Figure: image space. Normalized Cut (Shi and Malik, 2000): the similarity between two pixels is measured by an RBF function of their positions p_n and colors c_n. For an image i, our similarity matrix is

W^i_{nm} = \exp\!\big(-\lambda_p \|p_n - p_m\|_2^2 - \lambda_c \|c_n - c_m\|_2\big).

The Laplacian matrix is L = I - D^{-1/2} W D^{-1/2}, where D is the diagonal matrix of the row sums of W. We thus have the following spatial term in our cost function:

E_B(y) = \frac{\mu}{N} \operatorname{tr}(y^T L y).
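For concreteness, here is a minimal NumPy sketch of the intra-image similarity matrix and normalized Laplacian described above. It is not the authors' code (their implementation is in MATLAB); the per-pixel arrays positions and colors and the parameters lambda_p, lambda_c and mu are illustrative placeholders.

```python
import numpy as np

def normalized_laplacian(positions, colors, lambda_p=0.1, lambda_c=0.05):
    # Pairwise squared distances between positions and Euclidean distances
    # between colors, as in the similarity W^i_{nm} above.
    dp = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    dc = np.sqrt(((colors[:, None, :] - colors[None, :, :]) ** 2).sum(-1))
    W = np.exp(-lambda_p * dp - lambda_c * dc)
    d = W.sum(axis=1)                                    # row sums of W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt  # L = I - D^{-1/2} W D^{-1/2}

def spatial_term(Y, L, mu):
    # E_B(y) = (mu / N) * tr(Y^T L Y), with Y an N x K assignment matrix.
    return (mu / Y.shape[0]) * np.trace(Y.T @ L @ Y)
```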

Figure: feature space. Discriminative classifier: given the labels y, we solve the following problem:

E_U(y) = \min_{A \in \mathbb{R}^{K \times d},\, b \in \mathbb{R}^{K}} \frac{1}{N} \sum_{n=1}^{N} \ell\big(y_n, A\varphi(x_n) + b\big) + \frac{\lambda}{2K} \|A\|_F^2,

where \varphi is a non-linear mapping of the features and \ell is a loss function.

Mapping approximation. Our discriminative clustering framework works with positive definite kernels. We use the \chi^2 kernel matrix K:

K_{nm} = \exp\!\Big(-\lambda_h \sum_{d=1}^{D} \frac{(x_{nd} - x_{md})^2}{x_{nd} + x_{md}}\Big),

which is equivalent to applying a mapping \varphi from the feature space to a high-dimensional Hilbert space \mathcal{F} such that K_{nm} = \langle \varphi(x_n), \varphi(x_m) \rangle.
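A small NumPy sketch of the \chi^2 kernel matrix above (again illustrative, not the authors' code); X is assumed to hold non-negative histogram features, one row per pixel, and lambda_h and the eps guard are placeholder choices.

```python
import numpy as np

def chi2_kernel(X, lambda_h=1.0, eps=1e-12):
    # K_nm = exp(-lambda_h * sum_d (x_nd - x_md)^2 / (x_nd + x_md))
    num = (X[:, None, :] - X[None, :, :]) ** 2
    den = X[:, None, :] + X[None, :, :] + eps   # eps avoids 0/0 for empty bins
    return np.exp(-lambda_h * (num / den).sum(-1))
```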

Loss function. We choose the soft-max loss because it is suited to the multi-class setting and is related to probabilistic models:

\ell\big(y_n, A\varphi(x_n) + b\big) = -\sum_{k=1}^{K} y_{nk} \log \frac{\exp(a_k^T \varphi(x_n) + b_k)}{\sum_{l=1}^{K} \exp(a_l^T \varphi(x_n) + b_l)}.
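The soft-max loss is a negative log-softmax evaluated at the assigned class; a short illustrative NumPy version (with the usual max-subtraction for numerical stability) is given below.

```python
import numpy as np

def softmax_loss(Y, scores):
    # Y: N x K one-hot (or soft) labels; scores: N x K values A @ phi(x_n) + b.
    shifted = scores - scores.max(axis=1, keepdims=True)       # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(Y * log_probs).sum(axis=1).mean()                 # average loss over pixels
```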

Formulation. Find the set of labels y that leads to the best separation of the data into K classes:

\min_{y \in \{0,1\}^{N \times K},\, y 1_K = 1_N} \; \min_{A \in \mathbb{R}^{K \times d},\, b \in \mathbb{R}^{K}} E_U(y, A, b).

Problem: giving the same label to all the pixels yields a perfect separation.

Cluster size balancing. Two solutions: add linear constraints on the number of elements per class, or encourage the proportion of points per class to be uniform. We choose the second: it adds no parameters and has a probabilistic interpretation. We use the entropy of the class proportions within each image:

H(y) = -\sum_{i \in I} \sum_{k=1}^{K} \Big(\frac{1}{N_i} \sum_{n \in N_i} y_{nk}\Big) \log\Big(\frac{1}{N_i} \sum_{n \in N_i} y_{nk}\Big),

where i is an image and N_i is the number of pixels in i. Note: in a weakly supervised setting (e.g., interactive segmentation), this term can be modified to take prior knowledge into account.
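A sketch of this balancing term for a soft assignment matrix; image_index is a hypothetical length-N array mapping each pixel to its image, and eps is an illustrative guard against log(0).

```python
import numpy as np

def cluster_balance_entropy(Y, image_index, eps=1e-12):
    # H(y): entropy of the per-image class proportions (higher = more balanced).
    H = 0.0
    for i in np.unique(image_index):
        p = Y[image_index == i].mean(axis=0)   # class proportions within image i
        H -= (p * np.log(p + eps)).sum()
    return H
```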

Overall problem. Combining the unary and binary terms with the class balancing term, we obtain the following problem:

\min_{y \in \{0,1\}^{N \times K},\, y 1_K = 1_N} \Big[\min_{A \in \mathbb{R}^{K \times d},\, b \in \mathbb{R}^{K}} E_U(y, A, b)\Big] + E_B(y) - H(y).

Probabilistic interpretation. We introduce t_n \in \{0,1\}^I, indicating which image pixel n belongs to, and z_n \in \{1, ..., M\}, giving some observable information for each pixel n. The label y is a latent variable between x and the observable information z (x → y → z ← t), inducing an explaining-away phenomenon: the label y_n and the variable t_n compete to explain the observable information z_n.

More precisely, we suppose a bilinear model,

P(z_{nm} = 1 \mid t_n, y_n) = \sum_{i \in I} \sum_{k=1}^{K} y_{nk}\, t_{ni}\, G^{ik}_m, \qquad \text{with } \sum_{m=1}^{N} G^{ik}_m = 1,

and an exponential family model for Y = (y_1, ..., y_N) given X = (x_1, ..., x_N), with unary parameters (A, b) and binary parameter L.

Our cost function is the mean-field variational approximation of the following (regularized) negative conditional log-likelihood of Z = (z_1, ..., z_N) given X and T = (t_1, ..., t_N) under this model:

\min_{A \in \mathbb{R}^{K \times d},\, b \in \mathbb{R}^{K},\, G \in \mathbb{R}^{N \times K \times I},\, G^T 1_N = 1,\, G \ge 0} \; -\frac{1}{N} \sum_{n=1}^{N} \log p(z_n \mid x_n, t_n) + \frac{\lambda}{2K} \|A\|_F^2.

Z can encode must-link and must-not-link constraints between pixels (e.g., superpixels).

EM procedure. Recall the overall problem:

\min_{y \in \{0,1\}^{N \times K},\, y 1_K = 1_N} \Big[\min_{A \in \mathbb{R}^{K \times d},\, b \in \mathbb{R}^{K}} E_U(y, A, b)\Big] + E_B(y) - H(y).

This cost function is not jointly convex in y and (A, b), but it is convex in each of them independently. We therefore alternately optimize over each variable while fixing the other: L-BFGS for (A, b), and projected gradient descent for y (relaxed to the probability simplex for each pixel).
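The projected gradient step on the relaxed y needs a projection of each pixel's label vector back onto the probability simplex {v ≥ 0, Σ_k v_k = 1}. The slides do not specify which projection is used; the sketch below uses the standard sort-based Euclidean projection as one possible choice.

```python
import numpy as np

def project_rows_to_simplex(Y):
    # Euclidean projection of every row of Y onto the probability simplex,
    # applied after each gradient step on the relaxed labels y.
    N, K = Y.shape
    U = -np.sort(-Y, axis=1)                       # each row sorted in decreasing order
    css = np.cumsum(U, axis=1)
    ks = np.arange(1, K + 1)
    rho = (U - (css - 1.0) / ks > 0).sum(axis=1)   # number of active components per row
    theta = (css[np.arange(N), rho - 1] - 1.0) / rho
    return np.maximum(Y - theta[:, None], 0.0)
```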

The initialization. Since our problem is not convex, a good initialization is crucial. We propose a quadratic convex approximation related to Joulin et al. (CVPR 10). A quadratic function may lead to poor solutions, so we also use random initializations.

Initialization: quadratic approximation. The second-order Taylor expansion of our cost function is

J(y) = \frac{K}{2} \Big[\operatorname{tr}(y y^T C) + \frac{2\mu}{NK} \operatorname{tr}(y y^T L) - \frac{1}{N} \operatorname{tr}(y y^T \Pi_I)\Big],

where C = \frac{1}{N} \Pi_N \big(I - \Phi (N\lambda I_K + \Phi^T \Pi_N \Phi)^{-1} \Phi^T\big) \Pi_N is related to the reweighted ridge regression classifier (Joulin et al. CVPR 10). This is not convex because of the last term, which can be replaced by the following linear constraints:

\sum_{n \in N_i} y_{nk} \le 0.9\, N_i, \qquad \sum_{j \in I \setminus i} \sum_{n \in N_j} y_{nk} \ge 0.1\, (N - N_i),

and we then obtain a formulation similar to Joulin et al. (CVPR 10).

Experiments. Binary segmentation (foreground/background) on MSRC: high variability in foreground and background, around 30 images per class; we use SIFT features. Multi-class cosegmentation on iCoseg: low variability within each set (same illumination, ...), around 10 images per class; we use color histograms. Some extensions: GrabCut, the weakly supervised setting, and video key frames.
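As a toy illustration of the color-histogram features mentioned above (the exact descriptors are not detailed in the slides), one could build a normalized RGB histogram around each node of the subsampled pixel grid; grid_step, window, and bins are hypothetical parameters.

```python
import numpy as np

def color_histograms(image, grid_step=8, window=16, bins=5):
    # image: H x W x 3 uint8 array. Returns one L1-normalized RGB histogram per
    # grid node, suitable as input to the chi-squared kernel sketched earlier.
    H, W, _ = image.shape
    feats = []
    for y in range(0, H, grid_step):
        for x in range(0, W, grid_step):
            patch = image[max(0, y - window):y + window, max(0, x - window):x + window]
            hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
            hist = hist.ravel()
            feats.append(hist / max(hist.sum(), 1.0))
    return np.array(feats)
```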

Binary cosegmentation results (MSRC):

Class     Ours   Kim et al. (ICCV 11)   Joulin et al. (CVPR 10)
Bike      43.3   29.9                   42.3
Bird      47.7   29.9                   33.2
Car       59.7   37.1                   59.0
Cat       31.9   24.4                   30.1
Chair     39.6   28.7                   37.6
Cow       52.7   33.5                   45.0
Dog       41.8   33.0                   41.3
Face      70.0   33.2                   66.2
Flower    51.9   40.2                   50.9
House     51.0   32.2                   50.5
Plane     21.6   25.1                   21.7
Sheep     66.3   60.8                   60.4
Sign      58.9   43.2                   55.2
Tree      67.0   61.2                   60.0
Average   50.2   36.6                   46.7

Multi-class cosegmentation results:

Class             K   Ours   Joulin et al. (CVPR 10)   Kim et al. (ICCV 11)
Baseball player   5   62.2   53.5                      51.1
Brown bear        3   75.6   78.5                      40.4
Elephant          4   65.5   51.2                      43.5
Ferrari           4   65.2   63.2                      60.5
Football player   5   51.1   38.8                      38.3
Helicopter        3   43.3   67.8                       7.3
Kite Panda        2   57.8   58.0                      66.2
Monk              2   77.6   76.9                      71.3
Panda             3   55.9   49.1                      39.4
Skating           2   64.0   47.2                      51.1
Stonehenge        3   86.3   85.4                      64.6
Plane             3   45.8   39.2                      25.2
Face              3   70.5   56.4                      33.2
Average               64.8   58.1                      48.7

Extensions. GrabCut. Weakly supervised learning with image tags ({plane, sheep, sky, grass}). Video shot segmentation.

Limitations. Number of classes: each class must be present in each image (because of the entropy term). Running time: about half an hour to one hour (MATLAB implementation).

Thank you.