Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Similar documents
Lecture 7: Semantic Segmentation

Presentation Outline. Semantic Segmentation. Overview. Presentation Outline CNN. Learning Deconvolution Network for Semantic Segmentation 6/6/16

Fully Convolutional Networks for Semantic Segmentation

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Deconvolutions in Convolutional Neural Networks

Structured Prediction using Convolutional Neural Networks

Efficient Segmentation-Aided Text Detection For Intelligent Robots

RGBd Image Semantic Labelling for Urban Driving Scenes via a DCNN

Martian lava field, NASA, Wikipedia

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

Semantic Segmentation

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

AUTOMATED DETECTION AND CLASSIFICATION OF CANCER METASTASES IN WHOLE-SLIDE HISTOPATHOLOGY IMAGES USING DEEP LEARNING

A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Advanced Video Analysis & Imaging

Real-Time Semantic Segmentation Benchmarking Framework

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Learning Fully Dense Neural Networks for Image Semantic Segmentation

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

R-FCN: Object Detection with Really - Friggin Convolutional Networks

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Detecting cars in aerial photographs with a hierarchy of deconvolution nets

Paper Motivation. Fixed geometric structures of CNN models. CNNs are inherently limited to model geometric transformations

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

Perceiving the 3D World from Images and Videos. Yu Xiang Postdoctoral Researcher University of Washington

arxiv: v3 [cs.cv] 8 May 2017

SEMANTIC segmentation has a wide array of applications

arxiv: v1 [cs.cv] 31 Mar 2016

Spatial Localization and Detection. Lecture 8-1

Fuzzy Set Theory in Computer Vision: Example 3

Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation

Thinking Outside the Box: Spatial Anticipation of Semantic Categories

Learning Deep Structured Models for Semantic Segmentation. Guosheng Lin

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Team G-RMI: Google Research & Machine Intelligence

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

arxiv: v1 [cs.cv] 7 Jun 2016

Object Detection Based on Deep Learning

Semantic Segmentation with Scarce Data

arxiv: v1 [cs.cv] 8 Mar 2017 Abstract

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Convolutional Networks in Scene Labelling

CNNS FROM THE BASICS TO RECENT ADVANCES. Dmytro Mishkin Center for Machine Perception Czech Technical University in Prague

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

arxiv: v4 [cs.cv] 6 Jul 2016

Lecture 5: Object Detection

DeepBIBX: Deep Learning for Image Based Bibliographic Data Extraction

Feature-Fused SSD: Fast Detection for Small Objects

arxiv: v1 [cs.cv] 11 Sep 2017

SEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL

Dense Image Labeling Using Deep Convolutional Neural Networks

Places Challenge 2017

Hand-Object Interaction Detection with Fully Convolutional Networks

arxiv: v1 [cs.cv] 11 Apr 2018

arxiv: v2 [cs.cv] 18 Jul 2017

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

arxiv: v3 [cs.cv] 25 Jul 2018

arxiv: v1 [cs.cv] 26 May 2017

Deep Learning with Tensorflow AlexNet

Unsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang

Object detection with CNNs

HIERARCHICAL JOINT-GUIDED NETWORKS FOR SEMANTIC IMAGE SEGMENTATION

Cascade Region Regression for Robust Object Detection

Real-time Object Detection CS 229 Course Project

arxiv: v1 [cs.cv] 5 Apr 2017

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Yiqi Yan. May 10, 2017

Object Detection on Self-Driving Cars in China. Lingyun Li

All You Want To Know About CNNs. Yukun Zhu

Deep Residual Learning

Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation

arxiv: v1 [cs.cv] 24 May 2016

Large-scale tissue histopathology image segmentation based on feature pyramid

YOLO9000: Better, Faster, Stronger

Deconvolution Networks

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

TEXT SEGMENTATION ON PHOTOREALISTIC IMAGES

Using Machine Learning for Classification of Cancer Cells

Study of Residual Networks for Image Recognition

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Recurrent Convolutional Neural Networks for Scene Labeling

Deep Fully Convolutional Networks with Random Data Augmentation for Enhanced Generalization in Road Detection

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Squeeze-SegNet: A new fast Deep Convolutional Neural Network for Semantic Segmentation

Understand Amazon Deforestation using Neural Network

Rich feature hierarchies for accurate object detection and semantic segmentation

UAV Navigation above Roads Using Convolutional Neural Networks

HYBRIDNET FOR DEPTH ESTIMATION AND SEMANTIC SEGMENTATION

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

CAP 6412 Advanced Computer Vision

Ryerson University CP8208. Soft Computing and Machine Intelligence. Naive Road-Detection using CNNS. Authors: Sarah Asiri - Domenic Curro

Transcription:

Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta

Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results

What is Semantic Segmentation? Input: RGB Image Output: A segmentation Mask

Encoder-Decoder Networks Encoder Decoder Encoder Takes an input image and generates a high-dimensional feature vector Aggregate features at multiple levels Decoder Takes a highdimensional feature vector and generates a semantic segmentation mask Decode features aggregated by encoder at multiple levels

Building Blocks of CNNs > Convolution > Down-Sampling > Up-Sampling

Convolution Filter weights are learned from data a0 a1 a2 a3 a4 a5 a6 a7 a8 x0 x1 x2 x3 x4 X5 x6 x7 x8

Down-Sampling > Max-pooling > Average Pooling > Strided Convolution a0 a1 b0 b1 a0 c1 b1 d3 Max-Pooling a2 a3 b2 b3 c0 c1 d0 d1 c2 c3 d2 d3 a c b d Avg. Pooling xx xx xx xx Strided Convolution (stride = 2)

Up-Sampling > Un-pooling > Deconvolution Un-Pooling a0 0 0 b1 0 0 0 0 a0 a1 b0 b1 a2 a3 b2 b3 a0 b1 0 c1 0 0 0 0 0 d3 c0 c1 d0 d1 c2 c3 d2 d3 Maxpooling c1 d3 Deconvolution 0 0 0 0 x0 x1 x2 x3 0 a0 b1 0 x4 x5 x6 x7 0 c1 d3 0 x8 x9 x10 x11 0 0 0 0 x12 x13 x14 x15

Encoder-Decoder Networks Encoder Decoder

Encoder-Decoder Networks Different Encoding Block Types VGG Inception ResNet

Encoder-Decoder Networks Different Encoding Block Types Input Max-Pool VGG Inception ResNetd Conv 3x3 Conv 3x3 Conv 3x3 Output

Encoder-Decoder Networks Different Encoding Block Types Input Max-Pool VGG Inception ResNet Max-Pool Conv 1x1 Conv 1x1 Conv 3x3 Conv 1x1 Conv 5x5 Conv 1x1 Concat Output

Encoder-Decoder Networks Different Encoding Block Types Input VGG Inception ResNet Conv 3x3 Conv 3x3 Sum Output

Different Encoding Block Types Performance on the ImageNet 2012 Validation Dataset Memory (in MB) 80 60 40 20 Memory per image Parameters (in Million) 150 100 50 Parameters 0 0 Inference Time (in ms) 40 30 20 10 Inference Time Error (in %) 10 9.5 9 8.5 8 Classification Error 0 7.5 VGG Inception ResNet-18

Encoder-Decoder Networks Encoder Decoder

Encoder-Decoder Networks Encoder Decoder

Encoder-Decoder Networks Different Decoding Block Types VGG Inception ResNet

Encoder-Decoder Networks Different Decoding Block Types Input Un-Pool VGG Inception ResNet Conv 3x3 Conv 3x3 Conv 3x3 Output

Encoder-Decoder Networks Different Decoding Block Types Input Deconv 1x1 VGG Inception ResNet Max-Pool Conv 1x1 Conv 1x1 Conv 3x3 Conv 1x1 Conv 5x5 Conv 1x1 Concat Output

Encoder-Decoder Networks Different Decoding Block Types Input VGG Inception ResNet DeConv 3x3 Sum Output

Classification vs Segmentation Error (in %) 9.35 9.3 9.25 9.2 9.15 9.1 Top-5 Classification Error Accuracy (in %) 60 55 50 45 40 35 FCN-32s Segmentation Accuracy 9.05 VGG Inception 30 VGG Inception (a) ImageNet Classificaiton Validation Set (b) PASCAL VOC 2011 Validation Set VGG Inception Source: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016.

Our Work on Segmenting GigaPixel Breast Biopsy Images

Challenges with the dataset > Limited computational resources > Sliding window approach is promising but Size of patch determines the context Some biological structures may cover several patches

Challenges with the dataset > Some biological structures are rare Necrosis and Secretion have less than 1% of all the pixels

Training details > Training Set: 30 ROIs 25,992 patches of size 256x256 with augmentation Split into training and validation set using 90:10 ratio > Test Set: 28 ROIs > Stochastic Gradient Descent for optimization > Implemented in Torch http://torch.ch/

Segmentation Results RGB Image Ground Truth Label

Segmentation Results Encoder-Decoder Network with skip connection RGB Image Prediction

Segmentation Results Multi-Resolution Encoder- Decoder Network RGB Image Prediction

Segmentation Results RGB Image Ground Truth Plain Multi-Resolution

Segmentation Results F1-Score

Why Segmentation? Results on Diagnosis > Segmented whole dataset (428 ROIs) with the model trained on 30 ROIs > Extracted histograms from segmentation masks and then trained different classifiers > Weak classifiers are as good as strong classifiers

Thank You!!

References 1. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016. (FCN-8s) 2. Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." ICCV. 2015. (DeConvNet) 3. V. Badrinarayanan; A. Kendall; R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation," TPAMI, 2017 (SegNet) 4. Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions., ICLR, 2016 (Dilation) 5. Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arxiv preprint arxiv:1606.00915 (2016). (DeepLab) 6. Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." ICCV. 2015. (CRFasRNN) 7. Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." CVPR. 2015. (HyperColumn)

Two 3x3 filters are same as one 5x5 filter Source: Rethinking the Inception Architecture for Computer Vision by Szegedy et al.