Lecture 7: Semantic Segmentation

Transcription:

CSED703R: Deep Learning for Visual Recognition (2017F)
Lecture 7: Semantic Segmentation
Bohyung Han, Computer Vision Lab., bhhan@postech.ac.kr

Semantic Segmentation
  Segmenting an image based on its semantic notion
  Given an input image, obtain a pixel-wise segmentation mask using a deep Convolutional Neural Network (CNN)
  [Figure: query image and its semantic segmentation]

Semantic Segmentation using CNN
  Image classification network
    pool5 (7x7x512), followed by fully connected layers fc6 and fc7
  Fully Convolutional Network (FCN)
    Converting fully connected layers to convolution layers
    Each fully connected layer is interpreted as a convolution with a large spatial filter that covers its entire input field
    For a larger input field: pool5 (22x22x512), fc6 (16x16), fc7 (16x16)
  [Figure: the same query image passed through fully connected layers and through the equivalent convolution layers]
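The conversion of fully connected layers into convolution layers can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy, a hypothetical channel count of 4 instead of the 512 on the slide, and made-up names (`fc`, `fc_as_conv`). It shows that the same weights, reshaped into a 7x7 filter, reproduce the fully connected output on a 7x7 input and yield a 16x16 map on a 22x22 input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes for illustration (the slide uses 7x7x512 pool5 features).
K, C, OUT = 7, 4, 3                            # filter size, input channels, output units
W_fc = rng.standard_normal((OUT, K * K * C))   # weights of a fully connected layer
b = rng.standard_normal(OUT)                   # bias of the same layer


def fc(x):
    """Fully connected layer applied to one K x K x C feature map."""
    return W_fc @ x.reshape(-1) + b


def fc_as_conv(x):
    """The same layer applied as a K x K 'valid' convolution with stride 1."""
    W_conv = W_fc.reshape(OUT, K, K, C)        # same parameters, viewed as filters
    height, width, _ = x.shape
    out = np.empty((height - K + 1, width - K + 1, OUT))
    for i in range(height - K + 1):
        for j in range(width - K + 1):
            patch = x[i:i + K, j:j + K, :]
            out[i, j] = np.tensordot(W_conv, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


small = rng.standard_normal((7, 7, C))     # input size the classifier was built for
large = rng.standard_normal((22, 22, C))   # larger input field

dense_small = fc_as_conv(small)            # shape (1, 1, OUT): one prediction
dense_large = fc_as_conv(large)            # shape (16, 16, OUT): a map of predictions
```

Each location of `dense_large` equals the fully connected layer evaluated on the corresponding 7x7 window, which is why the converted network produces a coarse score map instead of a single label.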

FCN for Semantic Segmentation
  Network architecture [Long15]
    End-to-end CNN architecture for semantic segmentation
    Interprets fully connected layers as convolutional layers
    [Figure: 500x500x3 input image, 16x16x21 score map, deconvolution with a 64x64 bilinear interpolation filter]
  Deconvolution filter
    Bilinear interpolation filter
    Same filter for every class
    No filter learning!
  How does this deconvolution work?
    The deconvolution layer is fixed.
    The convolutional layers of the network are fine-tuned with segmentation ground truth.
    [Figure: convolutional layers pretrained on ImageNet and fine-tuned for segmentation; deconvolution fixed]

[Long15] J. Long, E. Shelhamer, T. Darrell: Fully Convolutional Networks for Semantic Segmentation. CVPR 2015

DeconvNet
  Encoder-decoder architecture: learning a deep deconvolution network [Noh15]
  One of the seminal works for CNN-based semantic segmentation
  Fully supervised approach
  Symmetric architecture: conceptually more reasonable network
  Deep progressive decoder: better at identifying fine structures of objects
  Large prediction map: capable of predicting dense output scores

Operations in Deconvolution Network
  Unpooling
    Places activations back at their pooled locations
    Preserves the structure of activations
  Deconvolution
    Densifies sparse activations
    Filters act as bases to reconstruct shape
  ReLU
    Same as in the convolution network

[Noh15] H. Noh, S. Hong, B. Han: Learning Deconvolution Network for Semantic Segmentation. ICCV 2015
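The fixed bilinear deconvolution filter used by FCN can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names are hypothetical, the kernel size is assumed to be twice the stride (a 64x64 filter would then correspond to stride 32), and a single channel is upsampled because the same filter is shared by every class.

```python
import numpy as np


def bilinear_kernel(size):
    """2D bilinear interpolation weights for a size x size deconvolution filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - np.abs(rows - center) / factor) * (1 - np.abs(cols - center) / factor)


def upsample(x, stride):
    """Transposed convolution of a 2D map with a fixed bilinear kernel (no learning)."""
    k = 2 * stride                                   # assumed kernel size
    kernel = bilinear_kernel(k)
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input activation stamps a scaled copy of the kernel into the output.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out


coarse = np.ones((3, 3))          # toy coarse score map for one class
up = upsample(coarse, stride=2)   # 8x8 map; border rows and columns are attenuated
```

Away from the border, overlapping kernel copies sum to one, so a constant score map stays constant after upsampling. Frameworks usually crop the attenuated border afterwards.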

How Does the Deconvolution Network Work?
  Visualization of activations
  [Figure: activation maps after Deconv 14x14, Unpool 28x28, Deconv 28x28, Unpool 56x56, Deconv 56x56, Unpool 112x112, Deconv 112x112]

Training and Inference
  Instance-wise training
    Data augmentation: object proposals, random cropping, flipping
  Two-stage training
    Binary segmentation with ground truth
    Full segmentation with object proposals
  Batch normalization
  Instance-wise prediction with DeconvNet
    1. Input image
    2. Object proposals
    3. Prediction and aggregation
    4. Results
  Each class corresponds to one of the channels in the output layer.
  The label of a pixel is given by a max operation over all channels.
  Aggregation of 50 object proposals: max operation over all proposals

Results
  [Figure: segmentation results]

DecoupledNet
  Scenario
    Many training examples with weak labels
    Few training examples with strong labels
  Decoupled architecture
    Decoupling classification and segmentation networks
    Customizing the input of segmentation using bridging layers
  Achieved outstanding performance

[Hong16] S. Hong, J. Oh, H. Lee, B. Han: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, CVPR 2016, Spotlight Presentation
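The instance-wise prediction step (a max over all proposals, then a max over channels) can be sketched in a few lines. The code below is an illustration under assumptions: the proposal format `(top, left, score_map)`, the function names, and the convention that channel 0 is background are all hypothetical, and scores are assumed non-negative so that zero is a safe value outside every proposal.

```python
import numpy as np


def aggregate(image_shape, num_classes, proposals):
    """Pixel-wise max over proposals.

    proposals: list of (top, left, score_map), where score_map has shape
    (h, w, num_classes) and fits inside the image.
    """
    height, width = image_shape
    canvas = np.zeros((height, width, num_classes))   # assumes non-negative scores
    for top, left, scores in proposals:
        h, w, _ = scores.shape
        region = canvas[top:top + h, left:left + w]    # view into the canvas
        np.maximum(region, scores, out=region)         # keep the larger score per pixel
    return canvas


def label_map(canvas):
    """Label of each pixel: the channel with the maximum aggregated score."""
    return canvas.argmax(axis=-1)


a = np.zeros((2, 2, 3))
a[..., 1] = 0.9                    # proposal that is confident about class 1
b = np.zeros((2, 2, 3))
b[..., 2] = 0.6                    # overlapping proposal that prefers class 2

canvas = aggregate((3, 3), 3, [(0, 0, a), (1, 1, b)])
labels = label_map(canvas)
```

In the overlapping pixel the stronger response (class 1 at 0.9) wins, while pixels covered by no proposal fall back to channel 0.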

DecoupledNet
  [Table: comparison to other algorithms on the PASCAL VOC 2012 validation set]
  [Table: per-class accuracy on the PASCAL VOC 2012 test set]

TransferNet
  Transfer learning for semantic segmentation
    Similar scenario to DecoupledNet
    No segmentation knowledge for target classes
    Transfer segmentation knowledge from other classes
  Approach
    Using attention for individual classes
    Classify, attend, and segment
  [Figure: input image, ground truth, densified attention, BaselineNet, TransferNet, TransferNet+CRF]

[Hong16] S. Hong, J. Oh, H. Lee, B. Han: Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, CVPR 2016, Spotlight Presentation

Weakly Supervised Semantic Segmentation
  Superpixel Pooling Network (SPN) [Kwak16]
    Goal: construction of tentative ground-truth segmentation
    Feature map upsampling: 2 deconv layers + unpooling + 2 deconv layers
    Superpixel pooling layer: aggregates feature vectors spatially aligned with superpixels

[Kwak16] S. Kwak, S. Hong, B. Han: Weakly Supervised Semantic Segmentation using Superpixel Pooling Network. AAAI 2017
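A superpixel pooling layer of the kind described above can be sketched as follows. The slide only says that feature vectors aligned with a superpixel are aggregated; average pooling, the function name, and the array layout are assumptions made for this illustration.

```python
import numpy as np


def superpixel_pool(features, superpixels):
    """Average the feature vectors that fall inside each superpixel.

    features:    (H, W, C) upsampled feature map
    superpixels: (H, W) integer superpixel ids in [0, S)
    returns:     (S, C) one pooled feature vector per superpixel
    """
    channels = features.shape[-1]
    ids = superpixels.reshape(-1)
    flat = features.reshape(-1, channels)
    num_superpixels = int(ids.max()) + 1
    sums = np.zeros((num_superpixels, channels))
    np.add.at(sums, ids, flat)                         # unbuffered scatter-add per id
    counts = np.bincount(ids, minlength=num_superpixels)
    return sums / np.maximum(counts, 1)[:, None]       # guard against empty ids


features = np.array([[[1.0], [3.0]],
                     [[5.0], [7.0]]])                  # 2x2 map with one channel
superpixels = np.array([[0, 0],
                        [1, 1]])                       # top row and bottom row
pooled = superpixel_pool(features, superpixels)
```

Pooling over superpixels ties the prediction to image boundaries, which is what makes the output usable as a tentative ground-truth segmentation.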

Auto-Annotation for Semantic Segmentation
  Dense segmentation label mining
    Goal: obtaining segmentation labels using web-crawled videos
    Automatic video collection given text labels
  FG/BG segmentation
    Using a graph-based segmentation technique
    Based on class-specific attention, motion, and color

[Hong16] S. Hong, D. Yeo, S. Kwak, H. Lee, B. Han: Weakly Supervised Semantic Segmentation using Web-Crawled Videos, CVPR 2017 Spotlight Presentation

Auto-Annotation for Semantic Segmentation: Results

  Annotations            Method                            Mean IoU
  Image labels           [Papandreou15a]                   33.8
                         [Pathak15b]                       35.3
                         [Pinheiro15]                      42.0
                         [Kolesnikov16]                    50.7
  Extra annotations      Transfer learning [Hong16]        52.1
                         Point supervision [Bearman16]     46.0
                         Bounding box [Papandreou15b]      58.5
                         Bounding box [Dai15]              62.0
                         Scribble [Lin et al. 2016]        63.1
  Videos (unannotated)   [Tokmakov16]                      38.1
                         Ours                              58.1

[HongCVPR16] Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, Bohyung Han: Weakly Supervised Semantic Segmentation using Web-Crawled Videos, arXiv:1701.00352, 2017

DeepLab
  Three contributions [Chen18]
    Atrous convolution
    Atrous Spatial Pyramid Pooling (ASPP)
    Fully connected Conditional Random Field (CRF)
  Atrous convolution
    Alleviating limitations caused by reduced feature resolution
    Large receptive field with sparse parameters
    Effectively enlarging the field of view of filters to incorporate larger context
    Not increasing the number of parameters

Atrous convolution
  1D atrous convolution
  Standard convolution vs. atrous convolution
  [Figure: standard convolution compared with atrous convolution]

[Chen18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI
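A 1D atrous convolution can be written in a few lines of plain Python. The sketch assumes the usual definition y[i] = sum_k x[i + rate * k] * w[k] evaluated only where the filter fits; the function name and the toy signal are illustrative.

```python
def atrous_conv1d(x, w, rate):
    """1D atrous (dilated) convolution: y[i] = sum_k x[i + rate * k] * w[k]."""
    span = rate * (len(w) - 1) + 1            # field of view of the dilated filter
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span + 1)     # 'valid' output positions only
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]                           # three parameters regardless of the rate

standard = atrous_conv1d(signal, kernel, rate=1)   # field of view 3
atrous = atrous_conv1d(signal, kernel, rate=2)     # field of view 5, same 3 weights
```

With rate 2 the same three weights span five input samples, which is the sense in which the field of view grows without adding parameters.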

Atrous Spatial Pyramid Pooling (ASPP)
  Improving prediction performance on multi-scale objects
  A variant of spatial pyramid pooling
  Multiple parallel atrous convolutional layers with different sampling rates

Fully connected Conditional Random Field (CRF)
  Pursuing better object boundary recognition

  \[ E(x) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j) \]
  \[ \theta_i(x_i) = -\log P(x_i) \]
  \[ \theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \left[ w_1 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2} - \frac{\|I_i - I_j\|^2}{2\sigma_\beta^2} \right) + w_2 \exp\left( -\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2} \right) \right] \]

[Chen18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille: DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI
[Krahenbuhl11] P. Krahenbuhl, V. Koltun: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. NIPS 2011

Semantic Segmentation Performance
  Leaderboard for PASCAL VOC 2012
  Training on own data (comp6)
  Mean IoU
  http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=&compid=6

Semantic Instance Segmentation
  [Figure: image classification, object detection/localization, semantic segmentation, semantic instance segmentation]
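The potentials of the fully connected CRF can be evaluated for a single pixel pair as sketched below. This assumes the standard DeepLab form: a unary term -log P(x_i), and a pairwise term that applies a Potts compatibility mu(x_i, x_j) = [x_i != x_j] to an appearance kernel over position and color plus a smoothness kernel over position. The weights and bandwidths used as defaults are placeholder values, not the ones from the paper.

```python
import math


def unary_potential(prob):
    """theta_i(x_i) = -log P(x_i), with P taken from the CNN output."""
    return -math.log(prob)


def pairwise_potential(label_i, label_j, pos_i, pos_j, rgb_i, rgb_j,
                       w1=1.0, w2=1.0,
                       sigma_alpha=10.0, sigma_beta=10.0, sigma_gamma=3.0):
    """theta_ij(x_i, x_j) for one pixel pair (placeholder hyperparameters)."""
    if label_i == label_j:                    # Potts model: no penalty for equal labels
        return 0.0
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))   # squared distance
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))   # squared color difference
    appearance = w1 * math.exp(-d_pos / (2 * sigma_alpha ** 2)
                               - d_rgb / (2 * sigma_beta ** 2))
    smoothness = w2 * math.exp(-d_pos / (2 * sigma_gamma ** 2))
    return appearance + smoothness


# Two neighbouring pixels with different labels: similar colors are penalised
# more strongly than dissimilar ones.
similar = pairwise_potential(0, 1, (5, 5), (5, 6), (200, 10, 10), (198, 12, 9))
different = pairwise_potential(0, 1, (5, 5), (5, 6), (200, 10, 10), (20, 180, 40))
```

The higher penalty for similar-looking neighbours with different labels is what pushes label changes toward image edges and sharpens object boundaries.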

Instance-sensitive Fully Convolutional Networks
  Instance-sensitive score maps
    The outcome of a pixel-wise classifier of a relative position to instances
    $k \times k$ relative positions, which require $k^2$-channel score maps
    Instance assembling through copy-and-paste
  Architecture: two fully convolutional branches
    Estimating segment instances: generating instance-sensitive score maps
    Scoring the instances: generating objectness scores
  Training and testing
    Training with sparsely sampled windows using an aggregated loss
    Dense prediction in multiple scales

[Dai16] J. Dai, K. He, Y. Li, S. Ren, J. Sun: Instance-sensitive Fully Convolutional Networks. ECCV 2016
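Instance assembling through copy-and-paste can be sketched as follows. The code assumes one score map per relative position on a grid (stored in row-major order) and a square sliding window whose side is divisible by the grid size; the function name and the array layout are hypothetical.

```python
import numpy as np


def assemble_instance(score_maps, top, left, m, k):
    """Assemble an m x m instance mask from k*k instance-sensitive score maps.

    score_maps: (k*k, H, W); map r*k + c scores the relative position
                (row r, column c) inside an instance.
    The window with top-left corner (top, left) is split into k x k cells, and
    each cell is copied from the score map of its relative position.
    """
    cell = m // k                                     # assumes m is divisible by k
    out = np.empty((m, m), dtype=score_maps.dtype)
    for r in range(k):
        for c in range(k):
            rows = slice(top + r * cell, top + (r + 1) * cell)
            cols = slice(left + c * cell, left + (c + 1) * cell)
            out[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
                score_maps[r * k + c, rows, cols]
    return out


# Toy check: map i is filled with the value i, so the assembled window shows
# which score map each cell was copied from.
maps = np.stack([np.full((6, 6), i) for i in range(4)])   # k = 2, so 4 maps
mask = assemble_instance(maps, top=1, left=2, m=4, k=2)
```

Because assembling only copies values, the same score maps can be reused for every window position, which keeps the dense prediction cheap.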