Lecture 7: Semantic Segmentation

CSED703R: Deep Learning for Visual Recognition (2017F)
Bohyung Han, Computer Vision Lab., bhhan@postech.ac.kr

Semantic Segmentation
- Segmenting images based on their semantic notion

Semantic Segmentation using CNN
- Given an input image, obtain a pixel-wise segmentation mask using a deep Convolutional Neural Network (CNN)
[Figure: a query image processed for image classification and for semantic segmentation]

Fully Convolutional Network (FCN)
- Converting fully connected layers to convolution layers
- Each fully connected layer is interpreted as a convolution with a large spatial filter that covers the entire input field
[Figure: pool, fc6, and fc7 as fully connected layers on the query image, and the same layers as convolution layers for a larger input field]
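The conversion of a fully connected layer into a convolution can be checked numerically. The sketch below is a minimal NumPy illustration, not the FCN implementation: the function name `fc_as_conv`, the layer sizes, and the random weights are all assumptions made for the example. It reshapes the weight matrix of a fully connected layer into a k x k filter bank and slides it over a feature map.

```python
import numpy as np

def fc_as_conv(feature_map, fc_weight, fc_bias, k):
    """Apply a fully connected layer as a k x k convolution (stride 1, no padding).

    feature_map: (C, H, W), fc_weight: (out_dim, C * k * k), fc_bias: (out_dim,)
    Returns an array of shape (out_dim, H - k + 1, W - k + 1).
    """
    c, h, w = feature_map.shape
    out_dim = fc_weight.shape[0]
    # Each row of the FC weight matrix becomes one C x k x k filter.
    kernel = fc_weight.reshape(out_dim, c, k, k)
    out = np.empty((out_dim, h - k + 1, w - k + 1))
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            patch = feature_map[:, y:y + k, x:x + k]
            out[:, y, x] = np.tensordot(kernel, patch, axes=3) + fc_bias
    return out

# Hypothetical sizes: 3 channels, a 7 x 7 input field, 5 output units.
rng = np.random.default_rng(0)
weight = rng.standard_normal((5, 3 * 7 * 7))
bias = rng.standard_normal(5)

small = rng.standard_normal((3, 7, 7))        # exactly the FC input size
fc_out = weight @ small.ravel() + bias        # ordinary fully connected layer
conv_small = fc_as_conv(small, weight, bias, k=7)

large = rng.standard_normal((3, 10, 12))      # larger input field
conv_large = fc_as_conv(large, weight, bias, k=7)
```

On the 7 x 7 input the convolution yields a single 1 x 1 output that matches the fully connected result. On the larger input it yields a 4 x 6 grid of outputs, one per position of the input field, which is what makes dense prediction possible.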

FCN for Semantic Segmentation
Network architecture [Long15]
- End-to-end CNN architecture for semantic segmentation
- Interpret fully connected layers as convolutional layers
[Figure: FCN pipeline. 500x500x3 input image; convolution layers pretrained on ImageNet and fine-tuned for segmentation; 16x16x21 score map; fixed deconvolution (64x64 bilinear interpolation) producing the segmentation]

How does this deconvolution work?
- Deconvolution filter: bilinear interpolation filter
- Same filter for every class
- No filter learning: the deconvolution layer is fixed
- Fine-tuning convolutional layers of the network with segmentation ground truth

[Long15] J. Long, E. Shelhamer, T. Darrell: Fully Convolutional Networks for Semantic Segmentation. CVPR 2015

DeconvNet
- Encoder-decoder architecture: learning a deep deconvolution network
- One of the seminal works for CNN-based semantic segmentation
- Fully supervised approach
- Symmetric architecture: conceptually more reasonable network
- Deep progressive decoder: better at identifying fine structures of objects
- Large prediction map: capable of predicting dense output scores

Operations in Deconvolution Network
- Unpooling: place activations at the pooled locations; preserve the structure of activations
- Deconvolution: densify sparse activations; bases to reconstruct shape
- ReLU: same as in the convolution network

[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation. ICCV 2015
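Two of the operations above can be sketched in a few lines of NumPy: the fixed bilinear deconvolution filter used by FCN, and the unpooling step used by DeconvNet. This is an illustrative sketch under assumptions, not code from either paper: all function names are hypothetical, the maps are single-channel, the deconvolution kernel size is taken to be twice the stride (a 64x64 kernel then corresponds to stride 32), and pooling windows are non-overlapping.

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear interpolation weights for a size x size deconvolution filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)

def bilinear_upsample(score_map, stride):
    """Fixed deconvolution: stamp a scaled kernel per input pixel, then crop."""
    h, w = score_map.shape
    k = 2 * stride
    kernel = bilinear_kernel(k)
    full = np.zeros(((h + 1) * stride, (w + 1) * stride))
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += score_map[i, j] * kernel
    crop = stride // 2
    return full[crop:crop + h * stride, crop:crop + w * stride]

def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling that also records where each maximum was."""
    h, w = x.shape  # h and w are assumed to be divisible by size
    windows = x.reshape(h // size, size, w // size, size)
    windows = windows.transpose(0, 2, 1, 3).reshape(h // size, w // size, size * size)
    return windows.max(axis=2), windows.argmax(axis=2)

def unpool(pooled, switches, size=2):
    """Place each pooled activation back at its recorded location, zeros elsewhere."""
    ph, pw = pooled.shape
    out = np.zeros((ph, pw, size * size))
    rows, cols = np.indices((ph, pw))
    out[rows, cols, switches] = pooled
    return out.reshape(ph, pw, size, size).transpose(0, 2, 1, 3).reshape(ph * size, pw * size)

upsampled = bilinear_upsample(np.ones((3, 3)), stride=2)
x = np.arange(16.0).reshape(4, 4)
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
```

The unpooled map is sparse (one nonzero per pooling window), which is why DeconvNet follows each unpooling with learned deconvolution layers that densify it. The bilinear filter, by contrast, has nothing to learn.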

How Does the Deconvolution Network Work?
- Visualization of activations
[Figure: activation maps at Deconv 14x14, Unpool 28x28, Deconv 28x28, Unpool 56x56, Deconv 56x56, Unpool 112x112, Deconv 112x112]

Training and Inference
Instance-wise training
- Data augmentation: object proposals, random cropping, flipping
- Two-stage training
  - Binary segmentation with ground truth
  - Full segmentation with object proposals
- Batch normalization
Instance-wise prediction
- Pipeline: 1. Input image, 2. Object proposals, 3. Prediction and aggregation, 4. Results
- Each class corresponds to one of the channels in the output layer.
- The label of a pixel is given by a max operation over all channels.
- Aggregation of 50 object proposals: max operations over all proposals

Results
[Figure: segmentation results]

DecoupledNet
Scenario
- Many training examples with weak labels
- Few training examples with strong labels
Decoupled architecture
- Decoupling classification and segmentation networks
- Customizing the input of segmentation using bridging layers
- Achieved outstanding performance

[Hong16] S. Hong, J. Oh, H. Lee, B. Han: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, CVPR 2016, Spotlight Presentation
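The instance-wise prediction step (max over proposals, then max over class channels) can be sketched as follows. This is a minimal illustration under assumptions: the function names and the tiny 4x4 example are hypothetical, and scores are assumed non-negative (for example class probabilities) so that pixels outside a proposal can be treated as zero.

```python
import numpy as np

def aggregate_proposals(image_shape, num_classes, proposals):
    """Pixel-wise max over proposal score maps placed back on the image plane.

    proposals: list of ((top, left), score_map), score_map of shape (num_classes, h, w).
    Assumes non-negative scores; pixels not covered by any proposal stay at zero.
    """
    height, width = image_shape
    scores = np.zeros((num_classes, height, width))
    for (top, left), score_map in proposals:
        _, h, w = score_map.shape
        region = scores[:, top:top + h, left:left + w]   # view into scores
        np.maximum(region, score_map, out=region)        # max over proposals
    return scores

def label_map(scores):
    """Label of each pixel: the channel (class) with the maximum score."""
    return scores.argmax(axis=0)

# Hypothetical example: 3 classes (0 = background), two overlapping 2x2 proposals.
prop_a = np.zeros((3, 2, 2))
prop_a[0], prop_a[1] = 0.1, 0.9
prop_b = np.zeros((3, 2, 2))
prop_b[0], prop_b[2] = 0.2, 0.8

scores = aggregate_proposals((4, 4), 3, [((0, 0), prop_a), ((1, 1), prop_b)])
labels = label_map(scores)
```

In the overlap of the two proposals, the class with the highest score across both proposals wins, which is the effect of taking the max over proposals before the max over channels.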

DecoupledNet
- Comparison to other algorithms on the PASCAL VOC 2012 validation set
- Per-class accuracy on the PASCAL VOC 2012 test set

TransferNet
Transfer learning for semantic segmentation
- Similar scenario to DecoupledNet
- No segmentation knowledge for target classes
- Transfer segmentation knowledge from other classes
Approach
- Using attention for individual classes
- Classify, attend, and segment
[Figure: input image, ground truth, densified attention, BaselineNet, TransferNet, TransferNet+CRF]

[Hong16] S. Hong, J. Oh, H. Lee, B. Han: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, CVPR 2016, Spotlight Presentation

Weakly Supervised Semantic Segmentation
Superpixel Pooling Network (SPN)
- Goal: construction of tentative ground-truth segmentation
- Feature map upsampling: 2 deconv layers + unpooling + 2 deconv layers
- Superpixel pooling layer: aggregates feature vectors spatially aligned with superpixels

[Kwak16] S. Kwak, S. Hong, B. Han: Weakly Supervised Semantic Segmentation using Superpixel Pooling Network. AAAI 2017
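The superpixel pooling layer can be pictured as a grouped reduction over the upsampled feature map. The sketch below assumes average pooling as the aggregation and a precomputed integer superpixel map; the function name `superpixel_pool` and the toy inputs are hypothetical and are not taken from the SPN paper.

```python
import numpy as np

def superpixel_pool(features, superpixels, num_superpixels):
    """Average the feature vectors of all pixels that fall in each superpixel.

    features: (C, H, W) feature map, already upsampled to the superpixel map size.
    superpixels: (H, W) integer map with values in [0, num_superpixels).
    Returns an array of shape (num_superpixels, C).
    """
    channels = features.shape[0]
    flat_features = features.reshape(channels, -1).T     # (H*W, C)
    flat_ids = superpixels.ravel()                        # (H*W,)
    sums = np.zeros((num_superpixels, channels))
    np.add.at(sums, flat_ids, flat_features)              # scatter-add per superpixel
    counts = np.bincount(flat_ids, minlength=num_superpixels)
    return sums / np.maximum(counts, 1)[:, None]          # empty superpixels stay zero

# Toy example: 2 channels on a 2x2 map, top row is superpixel 0, bottom row is 1.
features = np.array([[[1.0, 2.0], [3.0, 4.0]],
                     [[10.0, 20.0], [30.0, 40.0]]])
superpixels = np.array([[0, 0], [1, 1]])
pooled_features = superpixel_pool(features, superpixels, num_superpixels=2)
```

The output has one feature vector per superpixel regardless of its shape or size, so later layers can classify superpixels rather than pixels.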

Auto-Annotation for Semantic Segmentation
Dense segmentation label mining
- Goal: obtaining segmentation labels using web-crawled videos
- FG/BG segmentation
  - Using a graph-based segmentation technique
  - Based on class-specific attention, motion, and color
- Automatic video collection given text labels

[Hong16] S. Hong, D. Yeo, S. Kwak, H. Lee, B. Han: Weakly Supervised Semantic Segmentation using Web-Crawled Videos, CVPR 2017 Spotlight Presentation

Auto-Annotation for Semantic Segmentation: Results

Annotations            Method                            Mean IoU
Image labels           [Papandreou15a]                   33.8
                       [Pathak15b]                       35.3
                       [Pinheiro15]                      42.0
                       [Kolesnikov16]                    50.7
Extra annotations      Transfer learning [Hong16]        52.
                       Point supervision [Bearman16]     46.0
                       Bounding box [Papandreou15b]      58.5
                       Bounding box [Dai15]              62.0
                       Scribble [Lin et al. 2016]        63.
Videos (unannotated)   [Tokmakov16]                      38.
                       Ours                              58.

[HongCVPR16] Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han: Weakly Supervised Semantic Segmentation using Web-Crawled Videos, arXiv

DeepLab
Three contributions
- Atrous convolution
- Atrous Spatial Pyramid Pooling (ASPP)
- Fully connected Conditional Random Field (CRF)

Atrous convolution
- Alleviating limitations caused by reduced feature resolution
- Large receptive field with sparse parameters
- Effectively enlarging the field of view of filters to incorporate larger context
- Not increasing the number of parameters
[Figure: 1D atrous convolution, standard convolution vs. atrous convolution]

[Chen18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI 2018
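The claim that atrous convolution enlarges the field of view without adding parameters is easy to see in one dimension. The sketch below is a minimal NumPy illustration with a hypothetical function name; like most deep learning libraries it computes a cross-correlation (no kernel flip) and uses no padding.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """1D atrous (dilated) convolution: kernel taps are spaced `rate` samples apart."""
    taps = len(kernel)
    field_of_view = (taps - 1) * rate + 1      # input samples spanned by one output
    out = np.empty(len(signal) - field_of_view + 1)
    for i in range(len(out)):
        # Take every rate-th sample inside the field of view: exactly `taps` values.
        out[i] = np.dot(signal[i:i + field_of_view:rate], kernel)
    return out

signal = np.arange(10.0)
kernel = np.array([1.0, 0.0, -1.0])            # the same 3 parameters in both cases

standard = atrous_conv1d(signal, kernel, rate=1)   # field of view: 3 samples
atrous = atrous_conv1d(signal, kernel, rate=2)     # field of view: 5 samples
```

With rate 1 each output compares samples two positions apart; with rate 2 the same three weights compare samples four positions apart, so the filter sees a wider context at no extra parameter cost.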

Atrous Spatial Pyramid Pooling (ASPP)
- Improving prediction performance on multi-scale objects
- A variant of spatial pyramid pooling
- Multiple parallel atrous convolutional layers with different sampling rates

Fully connected Conditional Random Field (CRF)
- Pursuing better object boundary recognition

\[ E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j) \]
\[ \theta_i(x_i) = -\log P(x_i) \]
\[ \theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \left[ w_1 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2} \right) + w_2 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2} \right) \right] \]

[Chen18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI 2018
[Krahenbuhl11] P. Krahenbuhl, V. Koltun: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS 2011

Semantic Segmentation Performance
- Leaderboard for PASCAL VOC 2012: training on own data (comp6), mean IoU
[Figure: leaderboard]

Semantic Instance Segmentation
[Figure: image classification, object detection/localization, semantic segmentation, semantic instance segmentation]
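The pairwise term of the fully connected CRF in [Chen18] is a label compatibility multiplied by two Gaussian kernels: a bilateral one over pixel position and color, and a smoothness one over position only. The sketch below evaluates that term for a single pixel pair. It assumes a Potts compatibility (1 if the labels differ, 0 otherwise); the function name and the default weights and bandwidths are placeholder values chosen for the example, not the settings of the paper.

```python
import numpy as np

def pairwise_potential(label_i, label_j, pos_i, pos_j, color_i, color_j,
                       w1=4.0, w2=3.0,
                       sigma_alpha=50.0, sigma_beta=5.0, sigma_gamma=3.0):
    """Pairwise CRF potential for one pixel pair (hypothetical default parameters)."""
    if label_i == label_j:
        return 0.0                              # Potts model: no penalty for equal labels
    pos_dist = np.sum((np.asarray(pos_i, float) - np.asarray(pos_j, float)) ** 2)
    color_dist = np.sum((np.asarray(color_i, float) - np.asarray(color_j, float)) ** 2)
    # Bilateral kernel: nearby pixels with similar color should share a label.
    appearance = w1 * np.exp(-pos_dist / (2 * sigma_alpha ** 2)
                             - color_dist / (2 * sigma_beta ** 2))
    # Smoothness kernel: depends on spatial proximity only.
    smoothness = w2 * np.exp(-pos_dist / (2 * sigma_gamma ** 2))
    return float(appearance + smoothness)

same_label = pairwise_potential(1, 1, (0, 0), (0, 1), (10, 10, 10), (12, 10, 10))
similar_color = pairwise_potential(0, 1, (0, 0), (0, 1), (10, 10, 10), (12, 10, 10))
different_color = pairwise_potential(0, 1, (0, 0), (0, 1), (10, 10, 10), (200, 10, 10))
```

Assigning different labels to neighboring pixels of similar color costs more than doing so across a strong color edge, which is how the CRF pulls label boundaries toward image boundaries.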

Instance-sensitive Fully Convolutional Networks
Instance-sensitive score maps
- The outcome of a pixel-wise classifier of a relative position to instances
- k x k relative positions, which requires k^2-channel score maps
- Instance assembling through copy-and-paste
Architecture: two fully convolutional branches
- Estimating segment instances: generating instance-sensitive score maps
- Scoring the instances: generating objectness scores
Training and testing
- Training with sparsely sampled windows using an aggregated loss
- Dense prediction in multiple scales

[Dai16] J. Dai, K. He, Y. Li, S. Ren, J. Sun: Instance-sensitive Fully Convolutional Networks. ECCV 2016
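Instance assembling through copy-and-paste can be sketched as follows: a sliding window is divided into a grid of relative positions (written k x k here), and each grid cell copies its values from the score map dedicated to that relative position. The function name, the row-major channel ordering, and the constant-valued score maps are assumptions made for the illustration, not details from [Dai16].

```python
import numpy as np

def assemble_instance(score_maps, top, left, window):
    """Assemble one instance mask from k*k instance-sensitive score maps.

    score_maps: (k*k, H, W), channel r*k + c holds relative position (row r, col c).
    (top, left): window corner on the image; window: side length, divisible by k.
    """
    k = int(round(np.sqrt(score_maps.shape[0])))
    cell = window // k
    instance = np.empty((window, window))
    for r in range(k):
        for c in range(k):
            ys = slice(top + r * cell, top + (r + 1) * cell)
            xs = slice(left + c * cell, left + (c + 1) * cell)
            # Copy this cell from the score map of its relative position and paste it.
            instance[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = score_maps[r * k + c, ys, xs]
    return instance

# Toy example with k = 2: channel i is filled with the constant i, so the
# assembled window shows which score map each quadrant was copied from.
score_maps = np.stack([np.full((6, 6), float(i)) for i in range(4)])
instance = assemble_instance(score_maps, top=1, left=1, window=4)
```

Because assembling only copies values, it has no parameters; the same set of score maps serves every window position, so instance candidates can be produced densely over the image.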
