R-FCN: Object Detection with Really - Friggin Convolutional Networks

Transcription:

R-FCN: Object Detection with Really - Friggin Convolutional Networks
Or: Region-based Fully Convolutional Networks
Jifeng Dai (Microsoft Research), Li Yi (Tsinghua Univ.), Kaiming He (FAIR), Jian Sun (Microsoft Research)
NIPS, 2016
VGG Reading Group - Sam Albanie

Object Detection

Some members of the postdeepluvian* object detection family tree
[Figure: family tree of detectors, each node labelled with its arXiv date and venue]
R-CNN (arXiv Nov 2013, CVPR 2014)
SPP-Net (arXiv June 2014, ECCV 2014)
Fast R-CNN (arXiv Apr 2015, ICCV 2015)
Faster R-CNN (arXiv June 2015, NIPS 2015)
YOLO (arXiv June 2015, CVPR 2016)
R-CNN minus R (arXiv June 2015, BMVC 2015)
ProNet (arXiv Nov 2015, CVPR 2016)
G-CNN (arXiv Dec 2015, CVPR 2016)
SSD (arXiv Dec 2015)
R-FCN (arXiv June 2016, NIPS 2016)
YOLO 9000 (arXiv Dec 2016)
DSSD (arXiv Jan 2017)
Figure labels: "NOW WITH MORE LAYERS", "Fully connected bidirectional inspiration layer", "Fully connected unidirectional inspiration layer"
*Serge Belongie-ism

Motivation: Sharing is Caring*
[Figure: ResNet-101 backbone (to scale) for R-CNN, Faster R-CNN and R-FCN, marking where the RoI stage sits in each. R-CNN: unshared convolutions. Faster R-CNN and R-FCN: shared convolutions.]
*Trademark: Salvation Army

Problem: Location Invariance
For image classification, we want location invariance
For object detection, we want location variance
In previous work, an RoI pooling layer has been inserted before the final convolutions to break the invariance, at the cost of reduced sharing

Solution: Position-Sensitive Score Maps
Waffle explanation. Much like neural networks, it works on multiple layers.

Position-Sensitive Score Maps
Channels take responsibility for relative spatial locations

Efficient Sharing of Diagrams

Backbone: ResNet-101
Minor modifications:
Remove the GAP (global average pooling)
Add a dimensionality reduction layer (1024)

Further Details
Bbox regression under standard parameterisation
Standard loss function
Online Hard Example Mining during training
Faster R-CNN-style alternating optimisation
Dilation used at conv5 (RPN works from conv4): gives a 2.6 mAP boost

Visualisation: Hit

Visualisation: Miss

Experiments

The Effect of Position Sensitivity on fully convolutional strategies
(naive Faster R-CNN still has an FC layer after RoI pooling)
Without position sensitivity, Faster R-CNN takes a major performance hit when the RoI pooling is late in the network

Standard Benchmarks: VOC 2007

Standard Benchmarks: VOC 2012

The Effect of Depth
Saturates at ResNet-101

The Effect of Proposal Type
Works pretty well with any proposal method

Summary
A little more efficient than Faster R-CNN
Simpler
Makes a trade-off between efficiency and accuracy

Appendix/Details

Standard Benchmarks: MS COCO

The Effect of Proposal Numbers: VOC 2007

Position-Sensitive RoI Pooling: for all the indexing fans
Scores are averaged over bins inside regions, where the (i,j)-th bin spans:
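For the indexing fans, a minimal NumPy sketch of the pooling and voting steps. It assumes the bin definition from the R-FCN paper: for an RoI of width w and height h with top-left corner (x0, y0), the (i,j)-th bin covers floor(i*w/k) <= x < ceil((i+1)*w/k) and floor(j*h/k) <= y < ceil((j+1)*h/k), measured relative to the corner. The function names (`psroi_pool`, `vote`), the channel layout (one contiguous block of C+1 channels per bin) and the integer RoI coordinates are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np

def psroi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling (sketch).

    score_maps: array of shape (k*k*(C+1), H, W)
    roi: (x0, y0, w, h) as integers in feature-map pixels (assumed)
    Returns an array of shape (k, k, C+1), indexed [j, i, class].
    """
    x0, y0, w, h = roi
    c1 = num_classes + 1  # C classes plus background
    # Assumed layout: channel (j*k + i)*(C+1) + c belongs to bin (i, j), class c.
    maps = score_maps.reshape(k, k, c1, *score_maps.shape[1:])
    pooled = np.zeros((k, k, c1))
    for j in range(k):          # bin row
        for i in range(k):      # bin column
            xs = x0 + math.floor(i * w / k)
            xe = x0 + math.ceil((i + 1) * w / k)
            ys = y0 + math.floor(j * h / k)
            ye = y0 + math.ceil((j + 1) * h / k)
            # Bin (i, j) reads only from its own group of score maps.
            pooled[j, i] = maps[j, i, :, ys:ye, xs:xe].mean(axis=(1, 2))
    return pooled

def vote(pooled):
    """Average the k*k bin scores into one (C+1)-dim vector per RoI."""
    return pooled.mean(axis=(0, 1))
```

As a size check, k = 3 with C = 20 classes gives 3*3*21 = 189 score-map channels. The pooling itself has no learned weights, so the per-RoI cost is just these averages.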

Standard Object Detection Multitask Loss Function
Class loss is computed by averaging the positional scores (i.e. voting) to produce a (C+1)-dim vector for each RoI, pushing it through softmax and computing cross entropy.
Regression loss is similar, producing a 4-dim vector which is passed into Huber loss.
The two losses are combined in a weighted sum.
Positive examples are formed from the RoIs that have intersection-over-union (IoU) overlap with a ground-truth box of at least 0.5, and negative otherwise.
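A minimal sketch of the per-RoI loss, assuming the form used in the R-FCN paper: cross entropy on the voted class scores plus lambda times a smooth-L1 (Huber with threshold 1) box loss that is only switched on for positive RoIs. The names `smooth_l1`, `softmax_cross_entropy` and `roi_loss`, the convention that label 0 means background, and the default `lam=1.0` are assumptions for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Huber loss with threshold 1, applied element-wise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def softmax_cross_entropy(scores, label):
    """Cross entropy of softmax(scores) against an integer class label."""
    z = scores - scores.max()                 # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def roi_loss(cls_scores, box_pred, label, box_target, lam=1.0):
    """Classification loss plus lam * regression loss for one RoI.

    cls_scores: (C+1,) voted class scores; label 0 is background (assumed).
    box_pred, box_target: (4,) box offsets (tx, ty, tw, th).
    """
    l_cls = softmax_cross_entropy(cls_scores, label)
    # Background RoIs have no ground-truth box, so they skip the box loss.
    l_reg = smooth_l1(box_pred - box_target).sum() if label > 0 else 0.0
    return l_cls + lam * l_reg
```

The label would come from the 0.5 IoU rule above: RoIs that pass it take the class of the matched ground-truth box, all others get label 0.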

Bounding Box Regression In Object Detection: R-CNN style
Predict bounding box updates with an additional 4*k*k-dim convolutional layer
Given $\{(P^i, G^i)\}_{i=1,\dots,N}$, where $P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$
Parameterise the mapping with linear functions $d_x(P), d_y(P), d_w(P), d_h(P)$ such that:
$$\hat{G}_x = P_w d_x(P) + P_x, \qquad \hat{G}_y = P_h d_y(P) + P_y \quad \text{(scale invariant)}$$
$$\hat{G}_w = P_w \exp(d_w(P)), \qquad \hat{G}_h = P_h \exp(d_h(P)) \quad \text{(log space)}$$
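A short sketch of this parameterisation in NumPy, assuming boxes are given as (centre x, centre y, width, height). `apply_deltas` is the mapping above; `encode_targets` is its inverse, which yields the regression targets for a proposal P and ground-truth box G. Both function names are made up for illustration.

```python
import numpy as np

def apply_deltas(P, d):
    """Map proposal P = (Px, Py, Pw, Ph) to a predicted box using d = (dx, dy, dw, dh)."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return np.array([
        pw * dx + px,        # centre shift, scaled by proposal width
        ph * dy + py,        # centre shift, scaled by proposal height
        pw * np.exp(dw),     # width change in log space
        ph * np.exp(dh),     # height change in log space
    ])

def encode_targets(P, G):
    """Inverse mapping: the deltas that would take proposal P exactly onto box G."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return np.array([
        (gx - px) / pw,
        (gy - py) / ph,
        np.log(gw / pw),
        np.log(gh / ph),
    ])
```

Zero deltas leave the proposal unchanged, and decoding the encoded targets recovers G, which is a handy sanity check when wiring up a regression head.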

OHEM: Online Hard Example Mining (bootstrapping)
Rank regions by loss and only use the top ranked
These hard examples will evolve as the network trains
OHEM is particularly efficient in R-FCN due to the (almost) free ranking of all region proposals
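The ranking step can be sketched in a few lines, assuming the per-RoI losses have already been computed in a forward pass over all proposals. The function name `select_hard_examples` and the `batch_size` parameter are hypothetical, and the sketch leaves out any de-duplication of overlapping RoIs.

```python
import numpy as np

def select_hard_examples(losses, batch_size):
    """Return the indices of the `batch_size` RoIs with the highest loss."""
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(-losses)      # indices sorted by descending loss
    return order[:batch_size]

# Only the selected RoIs would contribute to the backward pass.
per_roi_loss = np.array([0.1, 2.0, 0.5, 1.5])
hard = select_hard_examples(per_roi_loss, batch_size=2)
minibatch_loss = per_roi_loss[hard].mean()
```

Because R-FCN does almost no per-RoI computation, scoring every proposal just to rank it adds little to the forward pass, which is the "(almost) free" part.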

Alternating Optimisation: You put your left boot in, your left boot out
1. Train RPN
2. Use proposals to train Fast R-CNN
3. The resulting network is used to initialise RPN
4. Retrain Fast R-CNN with the updated RPN sharing convolutions

Dilated Convolutions
Figure 1 from Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions."

Dilated Convolutions
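In place of the figure, a 1-D toy version of a dilated convolution in NumPy. This is a sketch under simple assumptions (no padding, stride 1, cross-correlation as in most deep learning frameworks), and `dilated_conv1d` is a made-up name. The filter taps are spaced `dilation` samples apart, so the receptive field grows while the number of weights stays the same.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D cross-correlation with filter taps spaced `dilation` apart."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    span = (len(w) - 1) * dilation + 1    # receptive field of one output
    out_len = len(x) - span + 1           # assumes len(x) >= span
    out = np.zeros(out_len)
    for t in range(out_len):
        # Take every `dilation`-th input sample inside the window.
        out[t] = np.dot(w, x[t:t + span:dilation])
    return out

x = np.arange(8)
dense = dilated_conv1d(x, [1, 1, 1], dilation=1)    # each output sees 3 inputs in a row
dilated = dilated_conv1d(x, [1, 1, 1], dilation=2)  # same 3 weights, window of 5
```

In the R-FCN setting the same idea is applied in 2-D at conv5, so the stride can be reduced without shrinking what each filter sees.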