Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Presented by: Yixin Yang, Mingdong Wang

Object Detection

Applications
- Basic task for image understanding
- Outputs only bounding boxes
- Enables many downstream applications: almost everything requires detection
- Vehicles: traffic monitoring, accident detection, etc.
- Faces: security, apps

Applications
- Example: http://how-old.net/ (by Microsoft)

Basic Idea: Detection as classification
- Select a region (by any means)
- Classify this region
- Example: for each selected region ask "Cat?" and "Dog?". Most regions answer "No" to both; a region that covers the dog answers "Cat? No, Dog? Yes".

Basic Idea: Detection as classification
- Problem: too many candidate regions to classify, so detection is slow
- Solution: only look at regions that could plausibly contain an object

Region proposal
- Find regions that are likely to contain an object
- Class-invariant: proposals do not depend on the object category
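To make the "too many regions" problem concrete, the sketch below counts how many windows an exhaustive sliding-window search would have to classify. The image size, window sizes, aspect ratios and stride are illustrative assumptions, not values from the slides; only the figure of roughly 2k proposals comes from the deck.

```python
def count_windows(img_h, img_w, win_h, win_w, stride):
    """Number of win_h x win_w windows that fit in the image at the given stride."""
    if win_h > img_h or win_w > img_w:
        return 0
    rows = (img_h - win_h) // stride + 1
    cols = (img_w - win_w) // stride + 1
    return rows * cols


def count_all_windows(img_h, img_w, sizes, aspect_ratios, stride):
    """Total windows over several sizes and aspect ratios (ratio = width / height)."""
    total = 0
    for size in sizes:
        for ratio in aspect_ratios:
            win_w = int(round(size * ratio ** 0.5))
            win_h = int(round(size / ratio ** 0.5))
            total += count_windows(img_h, img_w, win_h, win_w, stride)
    return total


# Hypothetical 600 x 1000 image, four window sizes, three aspect ratios, stride 8.
exhaustive = count_all_windows(
    600, 1000, sizes=(64, 128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0), stride=8
)
proposals = 2000  # order of magnitude quoted in the slides for Selective Search
print(f"sliding windows: {exhaustive}, region proposals: ~{proposals}")
```

Each window costs one classifier evaluation, so cutting the candidate set down to a few thousand proposals is what makes a CNN classifier affordable.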

Region Proposal: Selective Search
- Feature-based
- Bottom-up merging
- Uijlings et al., "Selective Search for Object Recognition", IJCV 2013

Region Proposal: other methods
- Many other solutions exist
- EdgeBoxes is the best in practice
- Hosang et al., "What makes for effective detection proposals?", PAMI 2015

Outline
- Previous work
  - Region-based Convolutional Neural Network (R-CNN)
  - Spatial Pyramid Pooling Network (SPP-Net)
  - Fast R-CNN
- Faster R-CNN
  - Region Proposal Network (RPN)
  - Detection
- Experiments

R-CNN: Region Proposals + CNN
Three steps:
1. Use Selective Search to get region proposals (~2k per image)
2. Warp every region proposal to 227x227, then extract features with a CNN
3. Classify each region with a Support Vector Machine (SVM)

R-CNN
- Warp image: the inputs to the CNN must all have the same size
- Training:
  1. Pre-train the CNN for image classification
  2. Fine-tune the CNN for object detection
  3. Train a linear predictor for object detection

R-CNN: what is wrong with it?
- Training and testing are slow
- Takes a lot of disk space
- Reason: one ConvNet forward pass for each object proposal
- How to solve? Run one CNN pass over the whole image -> Spatial Pyramid Pooling Net (SPP-net)

Spatial Pyramid Pooling Net
- Very similar to R-CNN, but runs the CNN only once on the whole image
- In fact, it is the fully connected layers that need a fixed-size input; the convolutional layers do not

Spatial Pyramid Pooling Net
- Run one CNN pass over the input image to get the feature map
- Add an SPP layer after the last convolutional layer

Spatial Pyramid Pooling Net: SPP layer example
- The input of the SPP layer can be of arbitrary size
- The output size is 21x256 no matter the input size
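A minimal NumPy sketch of why the SPP output size is fixed. It assumes the pyramid levels are 4x4, 2x2 and 1x1 grids (16 + 4 + 1 = 21 bins) over a 256-channel feature map, which matches the 21x256 figure on the slide; the bin-boundary rule below is one reasonable choice, not necessarily the one used by the SPP-net authors.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(4, 2, 1)):
    """Max-pool an (H, W, C) feature map into a fixed (sum(n * n), C) array."""
    h, w, _ = feature_map.shape
    pooled = []
    for n in levels:
        # Fractional bin edges; floor/ceil makes every bin non-empty even when
        # H or W is not divisible by n (neighbouring bins may then overlap).
        row_edges = np.linspace(0, h, n + 1)
        col_edges = np.linspace(0, w, n + 1)
        for i in range(n):
            r0, r1 = int(np.floor(row_edges[i])), int(np.ceil(row_edges[i + 1]))
            for j in range(n):
                c0, c1 = int(np.floor(col_edges[j])), int(np.ceil(col_edges[j + 1]))
                pooled.append(feature_map[r0:r1, c0:c1].max(axis=(0, 1)))
    return np.stack(pooled)


rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.random((13, 13, 256)))
large = spatial_pyramid_pool(rng.random((10, 17, 256)))
print(small.shape, large.shape)  # both are (21, 256)
```

Because the number of bins depends only on the pyramid levels, the fully connected layers that follow always see an input of the same length.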

Spatial Pyramid Pooling Net
- Improvement of SPP-net: makes training and testing faster than R-CNN
- What is wrong with SPP-net? Cannot update the parameters below the SPP layer during training
- How to solve? Use a softmax classifier instead of an SVM -> Fast R-CNN

Fast R-CNN
There are two differences in Fast R-CNN:
- Add a Region of Interest (RoI) pooling layer after the last convolutional layer
- Two output vectors per RoI: softmax probabilities and bounding-box regression offsets
The architecture is trained end-to-end with a multi-task loss.

Fast R-CNN: RoI pooling layer
- Use max pooling to convert the features inside any valid region of interest into a small feature map with a fixed spatial extent of H x W

Fast R-CNN: RoI pooling layer (example)
- Input: a single 8x8 feature map and a region proposal (arbitrary size)
- Output: a 2x2 feature map for later use

Fast R-CNN: RoI pooling layer
- Pooling sections: an h x w region is divided into H x W sections of roughly h/H by w/W cells; in this case 5/2 by 7/2
- Notice that the size of the region of interest doesn't have to be perfectly divisible by the number of pooling sections

Fast R-CNN: RoI pooling layer
- Max pooling: take the maximum value in each of the sections
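The RoI pooling example above can be sketched in a few lines of NumPy. The feature-map values and the region position are made up for illustration; the sizes (8x8 map, 5x7 region, 2x2 output) follow the slides. Integer division is used to place the section boundaries, which is one common convention and an assumption here.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map to a fixed out_h x out_w grid.

    roi = (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    Assumes the region is at least out_h x out_w cells.
    """
    r0, c0, r1, c1 = roi
    region = feature_map[r0:r1, c0:c1]
    h, w = region.shape
    # Section boundaries; sections are uneven when h or w is not divisible.
    row_edges = [(i * h) // out_h for i in range(out_h + 1)]
    col_edges = [(j * w) // out_w for j in range(out_w + 1)]
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            section = region[row_edges[i]:row_edges[i + 1],
                             col_edges[j]:col_edges[j + 1]]
            out[i, j] = section.max()
    return out


# Hypothetical 8x8 feature map whose value at (r, c) is 8 * r + c.
fmap = np.arange(64, dtype=np.float32).reshape(8, 8)
# A region 5 cells high and 7 cells wide: rows 3..7, columns 0..6.
pooled = roi_max_pool(fmap, roi=(3, 0, 8, 7))
print(pooled)
```

With these boundaries the 5 rows split into 2 + 3 and the 7 columns into 3 + 4, so the four sections have different sizes but the output is always 2x2.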

Fast R-CNN: two output layers
- A discrete probability distribution (per RoI), $p = (p_0, p_1, \ldots, p_K)$, over K+1 categories
- Bounding-box regression offsets, $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, for each of the K object classes

Fast R-CNN: multi-task loss
$L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$
- $u$ is the ground-truth class
- $v$ is the ground-truth bounding-box regression target
- $L_{cls}(p, u) = -\log p_u$ is the log loss for the true class $u$

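Fast R-CNN: multi-task loss
The localization term, as defined in the Fast R-CNN paper, is
$L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i)$
where $\mathrm{smooth}_{L_1}(x) = 0.5 x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise.

Fast R-CNN: what is wrong with it?
- It uses Selective Search to get the region proposals, which is time-consuming
- How to solve? Use a convolutional neural network to generate the region proposals -> Faster R-CNN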
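A small NumPy sketch of the multi-task loss for a single RoI may help tie the symbols together. It follows the definition in the Fast R-CNN paper (log loss plus smooth L1 on the box offsets of the true class, switched off for background). Storing the offsets as a (K+1, 4) array with an unused background row is an indexing convenience assumed here, and the numbers are made up.

```python
import numpy as np


def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, and |x| - 0.5 elsewhere (element-wise)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)


def multi_task_loss(p, u, t, v, lam=1.0):
    """Fast R-CNN style loss for one RoI.

    p: (K+1,) softmax probabilities, index 0 = background
    u: ground-truth class index
    t: (K+1, 4) predicted offsets per class (row 0 is unused)
    v: (4,) ground-truth regression target
    """
    l_cls = -np.log(p[u])
    # The indicator [u >= 1] drops the box term for background RoIs.
    l_loc = smooth_l1(t[u] - v).sum() if u >= 1 else 0.0
    return l_cls + lam * l_loc


p = np.array([0.1, 0.7, 0.2])           # K = 2 object classes plus background
t = np.zeros((3, 4))
t[1] = [0.5, 0.0, 2.0, 0.0]             # predicted offsets for class 1
v = np.zeros(4)                         # target offsets
loss_object = multi_task_loss(p, 1, t, v)
loss_background = multi_task_loss(p, 0, t, v)
print(loss_object, loss_background)
```

For the object RoI the box term is 0.5 * 0.5^2 + (2.0 - 0.5) = 1.625; for the background RoI only the classification term remains.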

Faster R-CNN
- Object proposal is the bottleneck
  - Selective Search: ~2 s per image
  - EdgeBoxes: ~0.2 s per image, about as much time as the detection network itself
- The feature map used by the detector can also be used for generating proposals
- Why not also use a CNN for proposals?
  - Better performance
  - Shared computation

Region Proposal Network (RPN)
- Input: an image (feature map) of any size
- Output: a set of rectangular object proposals, each with an objectness score

RPN
- Insert the RPN after the last conv layer
- The RPN produces region proposals directly and serves as an "attention" mechanism for the detector
- After the RPN, use RoI pooling and a bounding-box regressor just like Fast R-CNN

RPN
- Slide a small network over the feature map
- For each n x n (n = 3) window, use a conv kernel to produce another feature map
- Now we have a K x K x 256 feature map (here K is the spatial size of the map, not the number of anchor boxes k used later)
- Each pixel in this feature map is an anchor
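The sliding network can be sketched as a 3x3 convolution followed by two per-position linear layers (1x1 convolutions), one for objectness scores and one for box coordinates. This is a shape-level NumPy sketch with random weights and no biases; the choice of k = 9 boxes per position and the two-scores-per-box layout come from the Faster R-CNN paper rather than the slides, and the small channel counts are only for the demo.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rpn_head(feature_map, w_conv, w_cls, w_reg):
    """Apply a 3x3 conv + ReLU, then two 1x1 convs, at every position.

    feature_map: (H, W, C)
    w_conv: (3, 3, C, D)   3x3 sliding window -> D-dimensional feature
    w_cls:  (D, 2 * k)     object / not-object scores for k boxes
    w_reg:  (D, 4 * k)     4 coordinates for k boxes
    """
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))
    # windows[h, w, c, i, j] = padded[h + i, w + j, c]
    windows = sliding_window_view(padded, (3, 3), axis=(0, 1))
    hidden = np.maximum(np.einsum("hwcij,ijcd->hwd", windows, w_conv), 0.0)
    scores = hidden @ w_cls
    deltas = hidden @ w_reg
    return scores, deltas


rng = np.random.default_rng(0)
H, W, C, D, k = 5, 7, 8, 16, 9          # small hypothetical sizes (the slides use D = 256)
fmap = rng.standard_normal((H, W, C))
w_conv = rng.standard_normal((3, 3, C, D))
w_cls = rng.standard_normal((D, 2 * k))
w_reg = rng.standard_normal((D, 4 * k))
scores, deltas = rpn_head(fmap, w_conv, w_cls, w_reg)
print(scores.shape, deltas.shape)       # (5, 7, 18) and (5, 7, 36)
```

Since every operation is a convolution, the same weights are reused at all positions and the head works on feature maps of any size.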

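RPN: Anchor
- For each anchor, propose k anchor boxes
  - The boxes are defined relative to this anchor
  - They act as a prior assumption on object size and shape
- For each box, regress:
  - Objectness score
  - Coordinates

Loss function
$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$
- $L_{cls}$: classification error
- $L_{reg}$: bounding-box coordinate regression error
- $p_i$ / $p_i^*$: predicted / ground-truth classification
- $t_i$ / $t_i^*$: predicted / ground-truth bounding-box coordinates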
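The anchor boxes can be generated once for a given feature-map size. The sketch below assumes the settings reported in the Faster R-CNN paper (stride 16, three scales and three aspect ratios, so k = 9); none of these numbers appear on the slides, and centring each anchor in the middle of its stride cell is a simplification chosen here.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchor boxes as (x1, y1, x2, y2) pixels."""
    shapes = []
    for s in scales:
        for r in ratios:                 # r = height / width, area stays s * s
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)            # (k, 2) as (width, height)

    # One anchor centre per feature-map position, mapped back to image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)         # each (feat_h, feat_w)
    centres = np.stack([cx, cy], axis=-1).reshape(-1, 1, 2)

    half = shapes[None, :, :] / 2.0
    boxes = np.concatenate([centres - half, centres + half], axis=-1)
    return boxes.reshape(-1, 4)          # position-major, then the k shapes


anchors = make_anchors(38, 50)
print(anchors.shape)                     # (38 * 50 * 9, 4)
```

The network then predicts, for every one of these boxes, an objectness score and four offsets relative to the box, which is why the anchors act as a prior.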

RPN training
- End-to-end trainable by stochastic gradient descent (SGD)
- Each mini-batch arises from a single image
- Randomly sample 256 anchors in the image
- Try to keep the positive-to-negative ratio at about 1:1; otherwise the negatives dominate and the loss is biased towards them

RPN & Fast R-CNN detector
- If trained separately, they will have different conv parameters
- Goal: share the conv layers
- How to write the update formula?
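The mini-batch sampling rule can be sketched as follows. The label convention (1 = positive, 0 = negative, -1 = ignored) and the padding with extra negatives when an image has fewer than 128 positives are assumptions in the spirit of the Faster R-CNN paper; the function name is hypothetical.

```python
import numpy as np


def sample_anchor_minibatch(labels, batch_size=256, pos_fraction=0.5, rng=None):
    """Pick anchor indices for one image, aiming at a 1:1 pos/neg ratio.

    labels: 1-D array with 1 = positive, 0 = negative, -1 = ignore.
    If there are too few positives, the batch is padded with negatives.
    """
    rng = np.random.default_rng() if rng is None else rng
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)
    pos_idx = rng.permutation(pos)[:n_pos]   # random subset without replacement
    neg_idx = rng.permutation(neg)[:n_neg]
    return np.concatenate([pos_idx, neg_idx])


# Hypothetical image: 20 positive, 5000 negative and 1000 ignored anchors.
labels = np.concatenate([np.ones(20), np.zeros(5000), -np.ones(1000)]).astype(int)
batch = sample_anchor_minibatch(labels, rng=np.random.default_rng(0))
print(len(batch), int((labels[batch] == 1).sum()), int((labels[batch] == 0).sum()))
```

Without this cap the thousands of negative anchors in a typical image would dominate the loss, which is the bias the slide warns about.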

Joint training
- Exact derivatives are hard to get: they require the derivatives of the RoI pooling layer w.r.t. the box coordinates
- Alternating training: train the RPN, then use its proposals to train the detector, then train the RPN again
- Approximate joint training: ignore the derivatives w.r.t. the box coordinates, as if the boxes were fixed
- 4-step training:
  1. Train the RPN, initialized from an ImageNet-pretrained network
  2. Train a detector with the RPN proposals, but using separate conv parameters
  3. Use the detector parameters to initialize the RPN, fix the shared layers, and fine-tune the RPN
  4. Fix the shared layers and fine-tune the detector

Other details
- Anchor box areas and aspect ratios: chosen without careful tuning
- Anchors that cross the image boundary: ignored during training, clipped to the image at test time
- Overlapping RPN proposals: use non-maximum suppression (NMS) to reduce the number of proposals
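The two test-time details above, clipping boxes to the image and NMS, can be sketched in NumPy. This is a plain greedy NMS, not the authors' implementation; boxes are (x1, y1, x2, y2) with positive area, and the default IoU threshold of 0.7 is the value reported in the Faster R-CNN paper, not one given on the slides.

```python
import numpy as np


def clip_boxes(boxes, img_h, img_w):
    """Clip (x1, y1, x2, y2) boxes so they lie inside the image."""
    boxes = boxes.astype(float)          # astype returns a copy
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img_w)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img_h)
    return boxes


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression; returns kept indices, best score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]   # drop boxes that overlap too much
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept_loose = nms(boxes, scores, iou_threshold=0.7)
kept_strict = nms(boxes, scores, iou_threshold=0.5)
print(kept_loose, kept_strict)
```

The first two boxes have an IoU of 81/119 (about 0.68), so the second one survives at a threshold of 0.7 but is suppressed at 0.5.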

Experiments
[Result tables and figures are not included in this transcript.]


Follow Up
- YOLO (You Only Look Once: Unified, Real-Time Object Detection)
  - Instead of region proposal + classification, directly regress the position and class of each bounding box
  - Converts object detection into a regression problem
- SSD (Single Shot MultiBox Detector)
- YOLOv2

Summary
Find a variable number of objects by classifying image regions.
- R-CNN: Selective Search + CNN + SVM, ~30 s / image
- Fast R-CNN: swap the order of convolutions and region extraction, ~2 s / image
- Faster R-CNN: compute region proposals within the network, ~0.2 s / image

Thanks for listening
Q & A