Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

Semantic Segmentation

Outline Exploring Context with Deep Structured Models: Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel; Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation; arXiv. Learning CNN-based Message Estimators: Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel; Deeply Learning the Messages in Message Passing Inference; NIPS 2015.

Background Fully convolutional network for semantic segmentation (Long et al., CVPR 2015). Pipeline: a fully convolutional net produces a score map (a low-resolution prediction, e.g., 1/32 or 1/8 of the input image size), which is bilinearly upsampled to a prediction at the size of the input image.
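
The upsampling step in this pipeline can be sketched in a few lines; a minimal, loop-based bilinear upsampler (shapes and the sampling convention are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

# Bilinear upsampling of a low-resolution score map (H, W, K) by an integer
# factor, as in the FCN pipeline above. Each output pixel samples the four
# nearest low-resolution scores and blends them by their fractional offsets.
def bilinear_upsample(score, factor):
    h, w, k = score.shape
    out = np.empty((h * factor, w * factor, k))
    for i in range(h * factor):
        for j in range(w * factor):
            y = min(i / factor, h - 1)          # source coordinate (clamped)
            x = min(j / factor, w - 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = ((1 - dy) * (1 - dx) * score[y0, x0]
                         + (1 - dy) * dx * score[y0, x1]
                         + dy * (1 - dx) * score[y1, x0]
                         + dy * dx * score[y1, x1])
    return out
```

In practice this is a single call to a framework's interpolation op; the loop form only makes the blending weights explicit.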

Background Recent methods focus on the upsampling and refinement stage, e.g., DeepLab (ICLR 2015), CRF-RNN (ICCV 2015), DeconvNet (ICCV 2015), DPN (ICCV 2015). Pipeline: a fully convolutional net produces a score map (a low-resolution prediction, e.g., 1/32 or 1/8 of the input image size), which is bilinearly upsampled and refined into a prediction at the size of the input image.

Background Our focus: exploring contextual information using a deep structured model. Pipeline: a contextual deep structured model produces a score map (a low-resolution prediction, e.g., 1/32 or 1/8 of the input image size), which is bilinearly upsampled and refined into a prediction at the size of the input image.

Explore Context Spatial context: semantic relations between image regions. E.g., a car is likely to appear over a road; a person above a horse is more likely than a dog above a horse. We focus on two types of context: patch-patch context and patch-background context.

Patch-Patch Context Patch-Background Context

Overview

Patch-Patch Context Learning CRFs with CNN-based pairwise potential functions. FeatMap-Net produces the feature map, from which the CRF graph is created (nodes and pairwise connections).

Create the CRF graph (nodes and pairwise connections) from the feature map. Create nodes in the CRF graph: one node corresponds to one spatial position in the feature map. Generate pairwise connections: one node connects to the nodes that lie within a spatial range box (the box with the dashed lines).

Patch-Patch Context Constructing pairwise connections in the CRF graph:
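
The graph construction described above can be sketched directly: one node per spatial position of the feature map, and a pairwise connection from each node to every node inside its range box. A minimal sketch (the function name and the square-box neighbourhood of a given radius are assumptions for illustration):

```python
# One CRF node per feature-map position; pairwise edges connect each node
# to the nodes inside a (2*radius+1)^2 "range box" around it.
def build_crf_graph(height, width, radius):
    nodes = [(r, c) for r in range(height) for c in range(width)]
    edges = []
    for r, c in nodes:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < height and 0 <= nc < width):
                    continue
                # store each undirected connection exactly once
                if (r, c) < (nr, nc):
                    edges.append(((r, c), (nr, nc)))
    return nodes, edges
```

With radius 1 this reduces to an 8-connected grid; the paper's range box is larger, which is what lets the pairwise potentials capture longer-range patch-patch context.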

CRFs with CNN-based potentials The conditional likelihood for one image:

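
The equation image on this slide was not transcribed. A reconstruction of the standard CRF conditional likelihood, consistent with the unary and pairwise CNN potentials described later in the deck (notation is assumed):

```latex
P(\mathbf{y}\mid\mathbf{x};\boldsymbol{\theta})
  = \frac{1}{Z(\mathbf{x};\boldsymbol{\theta})}
    \exp\bigl[-E(\mathbf{y},\mathbf{x};\boldsymbol{\theta})\bigr],
\qquad
Z(\mathbf{x};\boldsymbol{\theta})
  = \sum_{\mathbf{y}'}\exp\bigl[-E(\mathbf{y}',\mathbf{x};\boldsymbol{\theta})\bigr],

E(\mathbf{y},\mathbf{x};\boldsymbol{\theta})
  = \sum_{p \in \mathcal{N}} U(y_p,\mathbf{x};\boldsymbol{\theta}_U)
  + \sum_{(p,q) \in \mathcal{E}} V(y_p, y_q, \mathbf{x};\boldsymbol{\theta}_V)
```

Here U and V are the CNN-based unary and pairwise potential functions, and the sum in Z runs over all label configurations, which is what makes exact learning expensive.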

Explore background context FeatMap-Net: a multi-scale network for generating the feature map.

Prediction Coarse-level prediction stage: P(y|x) is approximated using the mean-field algorithm. Prediction refinement stage: sharpen the object boundaries by leveraging low-level pixel information for smoothness. First up-sample the confidence map of the coarse prediction to the original input image size, then perform Dense-CRF (P. Krähenbühl and V. Koltun, NIPS 2011).

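
The mean-field approximation used in the coarse prediction stage can be sketched for a generic pairwise CRF. This is a naive illustration over explicit edges, not the paper's (or DenseCRF's) efficient implementation; shapes and potential tables are assumptions:

```python
import numpy as np

# Naive mean-field for a pairwise CRF with K labels.
# unary: (N, K) energy costs; pairwise: {(p, q): (K, K) cost table}.
# Each node's belief q_p is repeatedly set to the softmax of its expected
# energy under the current beliefs of its neighbours.
def mean_field(unary, pairwise, n_iters=10):
    n, k = unary.shape
    q = np.full((n, k), 1.0 / k)                 # uniform initialisation
    neighbours = {p: [] for p in range(n)}
    for (p, r), table in pairwise.items():
        neighbours[p].append((r, table))         # V[y_p, y_r]
        neighbours[r].append((p, table.T))       # seen from the other end
    for _ in range(n_iters):
        for p in range(n):
            e = unary[p].copy()
            for r, table in neighbours[p]:
                e += table @ q[r]                # E_q[ V(y_p, y_r) ]
            q[p] = np.exp(-(e - e.min()))
            q[p] /= q[p].sum()
    return q
```

After convergence, the per-node beliefs q approximate the marginals P(y_p | x), which form the confidence map that is then upsampled and refined.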

CRF learning Minimize the negative log-likelihood with SGD. Difficulty: calculating the gradient of the partition function requires marginal inference at each SGD iteration. Given the huge number of SGD iterations and the large number of nodes, this approach is impractical, or even intractable. We apply piecewise training to avoid repeated inference at each SGD iteration.
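
Piecewise training replaces the globally normalised likelihood by a product of independently normalised unary and pairwise likelihoods; a sketch of the objective (notation assumed):

```latex
P(\mathbf{y}\mid\mathbf{x})
  \approx \prod_{p} P_U(y_p\mid\mathbf{x})
          \prod_{(p,q)\in\mathcal{E}} P_V(y_p, y_q\mid\mathbf{x}),
\qquad
P_U(y_p\mid\mathbf{x})
  = \frac{\exp\bigl[-U(y_p,\mathbf{x})\bigr]}
         {\sum_{y'_p}\exp\bigl[-U(y'_p,\mathbf{x})\bigr]},
```

and analogously for each pairwise term. Each factor normalises only over its own (one or two) variables, so the global partition function never has to be evaluated and no marginal inference is needed inside the SGD loop.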

Results

PASCAL Leaderboard http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6

Examples on Internet images

Test image: street scene

Result from a model trained on street scene images (around 1000 training images)

Legend: Building, Road, Side-walk, Car, Tree, Fence, Rider, Person

Result from a model trained on street scene images (around 1000 training images)

Result from PASCAL VOC model

Test image: indoor scene

Result from NYUD trained model (around 800 training images)

Result from PASCAL VOC trained model

Result from NYUD trained model

Message Learning

CRFs+CNNs Conditional likelihood and energy function, with CNN-based (log-)potential functions (factor functions). A potential function can be unary, pairwise, or high-order. Factor graph: a factorization of the joint distribution of the variables. A CNN-based unary potential measures the labelling confidence of a single variable; a CNN-based pairwise potential measures the confidence of a pairwise label configuration.

Challenges in Learning CRFs+CNNs Prediction can be made by marginal inference (e.g., message passing). CRF-CNN joint learning: learn the CNN potential functions by optimizing the CRF objective, typically minimizing the negative conditional log-likelihood (NLL) with stochastic gradient descent over the CNN parameters. The partition function Z brings difficulties for optimization: each SGD iteration requires approximate marginal inference to calculate the factor marginals. CNN training needs a large number of SGD iterations, so training becomes intractable.
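
Why Z is the bottleneck can be seen from the NLL gradient for one training pair; a standard derivation (notation assumed, matching the likelihood and energy above):

```latex
-\nabla_{\boldsymbol{\theta}} \log P(\mathbf{y}\mid\mathbf{x};\boldsymbol{\theta})
  = \nabla_{\boldsymbol{\theta}} E(\mathbf{y},\mathbf{x};\boldsymbol{\theta})
  - \mathbb{E}_{\mathbf{y}' \sim P(\mathbf{y}'\mid\mathbf{x};\boldsymbol{\theta})}
      \bigl[\nabla_{\boldsymbol{\theta}} E(\mathbf{y}',\mathbf{x};\boldsymbol{\theta})\bigr]
```

The first term is an ordinary backpropagation through the CNN potentials at the ground-truth labelling; the expectation term decomposes into factor marginals, which is exactly the approximate marginal inference required at every SGD step.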

Solutions Traditional approach: apply approximate learning objectives that avoid inference, e.g., piecewise training or pseudo-likelihood; the potential functions are learned, and inference is performed for the final prediction. Our approach: directly target the final prediction. We do not learn the potential functions; instead, we learn CNN estimators that directly output the required intermediate values of an inference algorithm. We focus on message-passing-based inference for prediction (specifically loopy BP) and directly learn CNNs to predict the messages.

Belief propagation: message-passing-based inference. A simple example of marginal inference on the node y2, which is connected through factors to the neighbouring nodes y1 and y3. A message is a K-dimensional vector, where K is the number of classes (node states). Variable-to-factor messages and factor-to-variable messages are combined to obtain the marginal distribution (beliefs) of one variable.
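
The message and belief equations referenced on this slide were images and did not survive the transcription; the standard sum-product forms on a factor graph are (notation assumed):

```latex
m_{p \to F}(y_p) = \prod_{F' \in \mathcal{N}(p)\setminus\{F\}} m_{F' \to p}(y_p),
\qquad
m_{F \to p}(y_p)
  = \sum_{\mathbf{y}_F \setminus y_p} \psi_F(\mathbf{y}_F)
    \prod_{q \in \mathcal{N}(F)\setminus\{p\}} m_{q \to F}(y_q),
\qquad
b_p(y_p) \propto \prod_{F \in \mathcal{N}(p)} m_{F \to p}(y_p)
```

Here ψ_F is the factor (potential) function, N(·) denotes graph neighbours, and the belief b_p is the estimated marginal of node p. On trees these beliefs are exact; on loopy graphs (as here) they are an approximation.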

CNN message estimators Directly learn a CNN function to output the message vector; there is no need to learn the potential functions. The factor-to-variable message becomes a message prediction function formulated as a CNN, taking as input the image region and a dependent-message feature vector, which encodes all dependent messages from the neighbouring nodes that are connected to the node p through the factor F.
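
A toy sketch of such an estimator: a small network maps (features of the factor's image region, an aggregate of the dependent messages) directly to a K-dimensional message. The architecture, feature dimensions, and the sum aggregation are all illustrative assumptions, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5        # number of classes = message dimension
D_IMG = 32   # assumed dimension of the image-region feature
D_HID = 16   # hidden width of the toy estimator

W1 = rng.normal(size=(D_HID, D_IMG + K)) * 0.1
b1 = np.zeros(D_HID)
W2 = rng.normal(size=(K, D_HID)) * 0.1
b2 = np.zeros(K)

def estimate_message(patch_feat, dependent_msgs):
    """patch_feat: (D_IMG,) image-region feature;
    dependent_msgs: list of (K,) messages from the connected nodes."""
    # dependent-message feature vector: here simply the sum of the messages
    msg_feat = np.sum(dependent_msgs, axis=0)
    h = np.maximum(0.0, W1 @ np.concatenate([patch_feat, msg_feat]) + b1)
    logits = W2 @ h + b2
    # normalise so the output behaves like a distribution over the K states
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The point of the construction is that this one forward pass replaces both the learned factor potential and the sum over y_F in the factor-to-variable update.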

Learning the CNN message estimator The variable marginals are estimated from the CNN messages. Define the cross-entropy loss between the ideal marginal and the estimated marginal; minimizing this loss gives the optimization problem for learning.
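
A sketch of this objective (symbols assumed): with CNN-estimated factor-to-variable messages, the estimated marginal of node p is the normalised product of its incoming messages, and training minimises the cross entropy against the ground-truth label ŷ_p:

```latex
b_p(y_p;\boldsymbol{\theta})
  \propto \prod_{F \in \mathcal{N}(p)} \hat{m}_{F \to p}(y_p;\boldsymbol{\theta}),
\qquad
\min_{\boldsymbol{\theta}} \; -\sum_{p} \log b_p(\hat{y}_p;\boldsymbol{\theta})
```

Since the ideal marginal puts all mass on the ground-truth label, the cross entropy reduces to the negative log-belief at ŷ_p, and θ (the message-estimator CNN parameters) can be trained with ordinary backpropagation, with no partition function involved.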

Application to semantic segmentation