Segmentation. Bottom up Segmentation Semantic Segmentation

Size: px

Start display at page:

Download "Segmentation. Bottom up Segmentation Semantic Segmentation"

Simon Harrington
5 years ago
Views:

1 Segmentation Bottom up Segmentation Semantic Segmentation

2 Semantic Labeling of Street Scenes Ground Truth Labels 11 classes, almost all occur simultaneously, large changes in viewpoint, scale sky, road, sidewalk,tree,fence,pole,traffic sign,car,pedestrian, bicyclist

3 Object detection Basic idea: slide a window across image and evaluate a (object) face model at every location

4 Object detection Sliding window approach does not scale well Top pedestrian [HOG] detector only 15.3% with 85% FP

Ingredients What are the elementary regions

What is the role of geometric features How

5 Ingredients What are the elementary regions for classification? How to spatially integrate the evidence from neighbouring regions? What is the role of geometric features for classification? How to exploit street scene context? Video database 3 daytime one at dusk sequence

Semantic Labeling Problem formulation, compute optimal assignment of labels to regions given appearance and geometry features P (L A, G) = P (A, G L) P (L) P (A,

6 Semantic Labeling Problem formulation, compute optimal assignment of labels to regions given appearance and geometry features P (L A, G) = P (A, G L) P (L) P (A, G) sky building building columnpole tree signsymbol building tree columnpole building sky car columnpole car building sidewalk sidewalk road L =(l 1,l 2,...l S )

7 Semantic Segmentation Ingredients simultaneous segmentation and recognition choice of elementary regions (pixels, super-pixels, rectangular regions) choice of features to describe the regions (color edge statistics, area/shape/moments ) computation of the likelihoods of features given labels context modeling - spatial co-occurrence between class labels - modeling relative location of object classes - large support regions for training category classifier semantic labeling formulated in MRF/CRF framework

8 Image Labelling Problems In general - Assign a label to each image pixel Geometry Es1ma1on Image Denoising Object Segmenta1on Depth Es1ma1on Sky Building Tree Grass

9 Image Labelling MRF/CRF framework Typical pair-wise MRF probabilistic semantics P (x y) =P (y x)p (x) P (y x) = P (y i x i ) P (x) Prior distribution over labels i S set of labels x = {x 1,...,x n } Commonly used priors for semantic labeling - spatial co-occurrence between class labels - modeling relative location of object classes - large support regions for training category classifier

10 Markov Random Field Framework Typical pair-wise MRF in vision P (x y) =P (y x)p (x) P (y x) = i S P (y i x i ) If data likelihood is independent given the labels and prior is homegenous Ising prior with pairwise non zero potentials, we can write posterior as i S i S j N i P(x y) = 1 exp( log p(y i x i )+ β m x i x j Z m

Markov Random Field Framework Typical pair-wise MRF in vision i S i S j N i P(x y) = 1 exp( log p(y i x i )+ β m x i x j ) Z m Can be expressed in terms of an energy function

11 Markov Random Field Framework Typical pair-wise MRF in vision i S i S j N i P(x y) = 1 exp( log p(y i x i )+ β m x i x j ) Z m Can be expressed in terms of an energy function P(x) = 1 Z m exp( E(x,θ) Z m is a partition function important for learning Parameters, not critical for inference Typical energy function E(x) = D(x i ) + V(x i, x j ) i S (i, j) E

Toy Problem Binary image denoising Given a noisy image get most likely binary image Set of labels {white, black} Commonly used prior: Isotropic Ising Prior Goal find image x, such that p(x y)

12 Toy Problem Binary image denoising Given a noisy image get most likely binary image Set of labels {white, black} Commonly used prior: Isotropic Ising Prior Goal find image x, such that p(x y) is maximized i.e. find image that minimizes energy E(x) E(x) = D(x i ) + V(x i, x j ) V(x i, x j ) = β x i x j, β > 0 i S (i, j) E D(x i ) = log(1 θ) for x i = y i D(x i ) = log(θ) for x i y i

13 Foreground/Background Estimation Rother et al. SIGGRAPH04

14 Foreground/Background Estimation Data term Smoothness term Data term Es1mated using FG / BG colour models Smoothness term where Intensity dependent smoothness

15 Foreground/Background Estimation Data term Smoothness term

16 Foreground/Background Estimation Data term Smoothness term How to solve this optimisation problem?

17 Foreground/Background Estimation Data term Smoothness term How to solve this optimisation problem? Transform into min-cut / max-flow problem Solve it using min-cut / max-flow algorithm Loopy belief propagation, tree-weigted message passing etc Gibbs Sampling, ICM, Simulated Annealing

18 More general multi-label problems MAP of MRF - solve optimal labelling problem Labels semantic categories Nodes: pixels, superpixels, patches etc Toy example: 12 superpixels, 3 labels each available solvers: Kolmogorov PAMI'06, Werner PAMI'07 18

19 MRSC 21 class dataset Class co-occurrence Typically 2-5 classes per image, small scale and viewpoint variatio Context modeling spatial co-occurrence between class labels modeling relative location of object classes Best performing methods are discriminative (CRF framework)

20 Semantic Labeling P (L A, G) = P (A, G L) P (L) P (A, G) Appearance is captured using scale invariant features organized in visual vocabulary trees, discrete set of visual words V =(v 1,v 2,...,v S ) Maximize likelihood of the labels using appearance and geometry cues appearance likelihood geometry likelihood argmax L argmin L P (L V, G) = argmax P (V L)P (G L) P (L) L S S E app + g E geom + s E smooth i=1 i=1 (i,j) E

considered- buildings, roads, sky, cars and trees

21 Semantic Segmentation Sky Building Tree Road Image Ground Truth Annotation Semantic categories considered- buildings, roads, sky, cars and trees Fully annotated datasets 320 side views dataset 90 frontal views dataset

dimensional feature computed for each superpixel of image P.F. Felzenszwalb and D.P. Huttenlocher.

22 Semantic Segmentation Superpixel Extraction Images sites for semantic labeling - superpixels superpixels obtained from graph based segmentation method of [Felzenszwalb 2004] Features color, texture, location, perspective cues 194 dimensional feature computed for each superpixel of image P.F. Felzenszwalb and D.P. Huttenlocher. Efficient graph-based image segmentation, IJCV 2004 D. Hoiem, A.A. Efros and M. Hebert. Recovering surface layout from an image, IJCV 2007

Semantic Segmentation Observation likelihood - P(a i l i )

ensemble of weak learners Decision Trees as weak learner

Many Lines? Yes Sky! No Yes No Learn one vs.

by classifier with maximum score Sample decision tree for sky

23 Semantic Segmentation Observation likelihood - P(a i l i ) computed using boosting classifier Boosting classifiers Learn ensemble of weak learners Decision Trees as weak learner Superpixel Feature Spxl. Location High? Yes No Blue? Many Lines? Yes Sky! No Yes No Learn one vs. all boosting classifiers Superpixel semantic label determined by classifier with maximum score Sample decision tree for sky class Two separate models trained 320 side views dataset 90 frontal views dataset Side View Frontal View

24 Example Results Image Ground Truth Labeling Predicted Labeling

25 Nonparametric Methods Figure credit: J. Tighe and S. Lazebnik. Superparsing: Scalable nonparametric parising with superpixels, ECCV 2010

26 References J. Tighe and S. Lazebnik. Superparsing: Scalable nonparametric image parsing with superpixels. ECCV 2010 D. Eigen and R. Fergus. Nonparametric image parsing using adaptive neighbor sets, CVPR 2012 P. Sturgess, L. Ladicky, N. Crook, and P. Torr. Scalable cascade inference for semantic image segmentation. BMVC 2012 G. Singh and J.Kosecka: Nonparametric Scene Parsing with Adaptive Feature Relevance and Scene Context, CVPr 2013

27 Conditional Random Fields MAP of MRF - solve optimal labelling problem Posterior = product of likelihood and prior Sometimes difficult to obtain generative model i S P(x y) = 1 exp( log p(y i x i )+ β m x i x j ) Z m Conditional Random Fields Model the posterior directly i S i S i S j N i j N i P(x y) = 1 exp( f (x i, y)+ g(x,i x j, y)) Z m

28 Conditional Random Fields Seman6c Parsing of Indoors scenes Tree Graph Structure Informa6ve Features using RGB- D

29 Graph Structure: Our choice Minimum Spanning Tree Over 3D 5/5/2013 Semantic Parsing for Priming Object Detection in RGB-D Scenes

30 Potentials: Pairwise CRFs 5/5/2013 Semantic Parsing for Priming Object Detection in RGB-D Scenes

31 Potentials: Pairwise CRFs 5/5/2013 Semantic Parsing for Priming Object Detection in RGB-D Scenes

Undirected Graphical Models. Raul Queiroz Feitosa

Undirected Graphical Models. Raul Queiroz Feitosa Undirected Graphical Models Raul Queiroz Feitosa Pros and Cons Advantages of UGMs over DGMs UGMs are more natural for some domains (e.g. context-dependent entities) Discriminative UGMs (CRF) are better