Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Similar documents
Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Classifying a specific image region using convolutional nets with an ROI mask as input

Final Report: Smart Trash Net: Waste Localization and Classification

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep Residual Learning

Spatial Localization and Detection. Lecture 8-1

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Lecture 5: Object Detection

Modern Convolutional Object Detectors

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Todo before next class

Martian lava field, NASA, Wikipedia

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Yiqi Yan. May 10, 2017

Computer Vision Lecture 16

Computer Vision Lecture 16

arxiv: v3 [cs.cv] 24 Jan 2018

R-FCN: Object Detection with Really - Friggin Convolutional Networks

YOLO9000: Better, Faster, Stronger

Learning Deep Features for Visual Recognition

Deep Learning for Object detection & localization

arxiv: v1 [cs.cv] 15 Oct 2018

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks

Learning Deep Representations for Visual Recognition

Object Detection on Self-Driving Cars in China. Lingyun Li

Automatic detection of books based on Faster R-CNN

Channel Locality Block: A Variant of Squeeze-and-Excitation

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Classification of objects from Video Data (Group 30)

An Analysis of Scale Invariance in Object Detection SNIP

An Analysis of Scale Invariance in Object Detection SNIP

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Object detection with CNNs

arxiv: v1 [cs.cv] 19 Feb 2019

Advanced Video Analysis & Imaging

Object Detection Based on Deep Learning

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

arxiv: v1 [cs.cv] 26 Jul 2018

arxiv: v1 [cs.cv] 19 Mar 2018

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

SON OF ZORN S LEMMA: TARGETED STYLE TRANSFER USING INSTANCE-AWARE SEMANTIC SEGMENTATION

Lecture 7: Semantic Segmentation

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Mask R-CNN. Kaiming He Georgia Gkioxari Piotr Dollár Ross Girshick Facebook AI Research (FAIR)

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Skin Lesion Attribute Detection for ISIC Using Mask-RCNN

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Visual features detection based on deep neural network in autonomous driving tasks

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

Kaggle Data Science Bowl 2017 Technical Report

Cascade Region Regression for Robust Object Detection

Detecting cars in aerial photographs with a hierarchy of deconvolution nets

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Amodal and Panoptic Segmentation. Stephanie Liu, Andrew Zhou

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Zero-shot learning with Fast RCNN and Mask RCNN through semantic attribute Mapping

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

arxiv: v1 [cs.cv] 29 Nov 2018

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Finding Tiny Faces Supplementary Materials

Team G-RMI: Google Research & Machine Intelligence

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Feature-Fused SSD: Fast Detection for Small Objects

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Photo OCR ( )

An Accurate and Real-time Self-blast Glass Insulator Location Method Based On Faster R-CNN and U-net with Aerial Images

arxiv: v1 [cs.cv] 23 Apr 2018

Optimizing Object Detection:

Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. Presented by: Karen Lucknavalai and Alexandr Kuznetsov

OBJECT DETECTION HYUNG IL KOO

arxiv: v3 [cs.cv] 17 May 2018

arxiv: v1 [cs.cv] 13 Mar 2019

arxiv: v2 [cs.cv] 19 Apr 2018

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Deep Learning Explained Module 4: Convolution Neural Networks (CNN or Conv Nets)

U-Net On Biomedical Images

Fuzzy Set Theory in Computer Vision: Example 3

Semantic Soft Segmentation Supplementary Material

SNIPER: Efficient Multi-Scale Training

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Convolutional Neural Networks

arxiv: v1 [cs.cv] 26 May 2017

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

arxiv: v2 [cs.cv] 30 Sep 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

arxiv: v2 [cs.cv] 10 Oct 2018

Rich feature hierarchies for accurate object detection and semantic segmentation

Know your data - many types of networks

arxiv: v2 [cs.cv] 18 Jul 2017

Transcription:

Mask R-CNN Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 1

Common computer vision tasks Image Classification: one label is generated for an image from the probability from a softmax function. Convolution+Pooling There is a cat in the image. 2

Common computer vision tasks Image Classification: one label is generated for an image from the probability from a softmax function. Convolution+Pooling There is a cat in the image. Where is the cat? 3

Common computer vision tasks A trained neural network roughly knows where the object is. Ref: Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014. 4

Common computer vision tasks Classification+Localization: one label is generated for an image based on the probability from a softmax function. Convolution+Pooling 5

Common computer vision tasks Object detection: detect the presence of objects and then label and localize them. 6

Semantic segmentation Upsampling the features to the same witdth and height as the input image. 1. 2. No fully connected layers in such an architecture 1 1 convolution is used to collapse all the channels of a feature volume into one chanel Ref: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015. 7

Sematic segmentation vs instance segmentation Source; https://stackoverflow.com/questions/33947823/what-is-semantic-segmentation-com pared-to-segmentation-and-scene-labeling 8

Sematic segmentation vs instance segmentation One pixel can have different semantic meaning in different regions Ref: Yi Li, et al. FCIS 9

What can Mask-RCNN do? 10

How does Mask-rcnn work at a high level? An FCN on RoIs is added to Faster-rcnn source: https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html 11

Faster-rcnn = Fast-rcnn + RPN Ref: Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. 12

RoIPool layer in fast-rcnn RoI pooling layer uses max pooling to covert the features inside any valid region of interest into a small feature map with a predefined size. RoI pooling Divide the h w RoI window into a H W grid of subwindows and then do do max-pooling in each sub-window 13

Best way to draw an apple?

Drawbacks of RoI pooling for segmentation task Translation equivariance: shifts in input lead to corresponding changes in the feature map Source: author s slide 15

One pixel in RoI means many pixels in the raw image The misalignment introduced by Quantization associated with RoIPool matters for a large stride Slide source: https://www.youtube.com/watch?v=xgi-mz3do2s&t=288s 16

Solution: RoI align Property 1: Pixel-to-pixel alignment https://en.wikipedia.org/wiki/bilinear_interpolation Source: author s slide 17

Solution: RoI align Property 2: equivariance Source: author s slide 18

Solution: RoI align Property 2: Scale-invariant Source: author s slide 19

FCN Mask head Per class binary masks are generated rather than a multinomial mask. No class competition. 20

How is a mask generated from FCN head? Source: author s slide 21

Backbone of Mask-rcnn: FPN.. RPN Region CNN features RoI Align Classification Regression Segmentation 22

Backbone of Mask-rcnn: FPN SSD: skip connection FPN: lateral connection 23

Experiments 24

State-of-art in 2016: FCIS 25

Main results: Mask R-CNN vs MNC and FCIS 26

Ablation: different backbone architecture More efficient encoder and decoder help generally. 27

Ablation: Binary vs Multinomial Loss 28

Ablation: RoIAlign vs RoPool 29

Bounding box detection results are improved by RoIAlign ResNeXt-101-FPN on COCO. 30

Bounding box detection benefits from mult-task learning Bounding box detection benefits from the segmentation task Backbone: ResNeXt-101-FPN on COCO. 31

Qualitative Results on Instance Segmentation https://www.youtube.com/watch?v=g7z4mkfrji4&t=639s 32

Generality of Mask-rcnn: Human keypoint detection 33

Thank you! 34

Supplementary slides follow 35

Invariance vs equivariance Translation means that each point/pixel in the image has been moved the same amount in the same direction Convolution operation is translation equivariant Maxpooling is translation invariance Ref: 37

Method for upsampling in segmentation task 1. 2. 3. Ref: Uppooling Interpolation deconvolution 38

ResNeXt: what is new compared to Resnet? Similar to Inception net but share the same topology among the multiple paths and use summation rather than concatenation to merge Cardinality the number of independent paths Width: the number of channels in output. Note: convolution is 3d computation. Ref: S. Xie, R. Girshick, P. Dollar, Z. Tu and K. He. Aggregated Residual Transformations for Deep Neural Networks 39

Position sensitive score map Ref: Jifeng Dai, et al. Instance FCN 40

Instance FCN vs FCIS Ref: Yi Li, et al. FCIS 41

Multi-task Network Cascades Ref: Yi Li, et al. Instance-aware Semantic Segmentation via Mult-task Network Cascades 42

AP metric Fig source: http://darkpgmr.tistory.com/162 43

RoI warp vs RoIPool The RoI warping layer crops a feature map region and warps it into a target size by interpolation. Ref: Jifeng Dai, kaiming He, Jian Sun Instance-aware Semantic Segmentation via Multi-task Network Cascades 44

Timing ResNet-50-FPN on COCO trainval135k 8-GPU 0.72s per 16-image mini-batch 45