R-FCN: Object Detection with Really - Friggin Convolutional Networks

1 R-FCN: Object Detection with Really - Friggin Convolutional Networks (or: Region-based Fully Convolutional Networks)
Jifeng Dai (Microsoft Research), Yi Li (Tsinghua Univ.), Kaiming He (FAIR), Jian Sun (Microsoft Research)
NIPS, 2016
VGG Reading Group - Sam Albanie

2 Object Detection

3 Some members of the postdeepluvian* object detection family tree
[Figure: family tree of detectors annotated with arXiv and publication dates: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, R-CNN minus R, YOLO, YOLO 9000 ("now with more layers"), SSD, DSSD, ProNet, G-CNN and R-FCN (NIPS 2016), joined by "fully connected bidirectional inspiration" and "fully connected unidirectional inspiration" layers]
*Serge Belongie-ism

4 Motivation: Sharing is Caring*
[Figure: ResNet-101 backbone (to scale) for three detectors, marking where the RoI enters: R-CNN (unshared convolutions), Faster R-CNN (shared convolutions), R-FCN (shared convolutions)]
*Trademark: Salvation Army

5 Problem: Location Invariance
For image classification, we want location invariance.
For object detection, we want location variance.
In previous work, a RoI pooling layer was inserted before the final convolutions to break the invariance, at the cost of reduced sharing.

6 Solution: Position-Sensitive Score Maps
Waffle explanation: much like neural networks, it works on multiple layers.

7 Position-Sensitive Score Maps
Channels take responsibility for relative spatial locations.

8 Efficient Sharing of Diagrams

9 Backbone: ResNet-101
Minor modifications:
Remove the global average pooling (GAP) layer
Add a dimensionality reduction layer (1024 channels)

10 Further Details
Bbox regression under the standard parameterisation
Standard loss function
Online Hard Example Mining (OHEM) during training
Faster R-CNN-style alternating optimisation
Dilation used at conv5 (the RPN works from conv4), which gives a 2.6 mAP boost

11 Visualisation: Hit

12 Visualisation: Miss

13 Experiments

14 The Effect of Position Sensitivity on fully convolutional strategies
(The "naive" Faster R-CNN still has an FC layer after RoI pooling.)
Without position sensitivity, Faster R-CNN takes a major performance hit when the RoI pooling is late in the network.

15 Standard Benchmarks: VOC 2007

16 Standard Benchmarks: VOC 2012

17 The Effect of Depth
Saturates at ResNet-101

18 The Effect of Proposal Type
Works pretty well with any proposal method

19 Summary
A little more efficient than Faster R-CNN
Simpler
Makes a trade-off between efficiency and accuracy

20 Appendix/Details

21 Standard Benchmarks: MS COCO

22 The Effect of Proposal Numbers: VOC 2007

23 Position-Sensitive RoI Pooling: for all the indexing fans
Scores are averaged over bins inside regions, where the (i,j)-th bin spans:
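In the R-FCN paper, for a w × h RoI with top-left corner (x0, y0), bin (i, j) covers ⌊i·w/k⌋ ≤ x < ⌈(i+1)·w/k⌉ and ⌊j·h/k⌋ ≤ y < ⌈(j+1)·h/k⌉, with offsets taken relative to (x0, y0). The following is a minimal pure-Python sketch of that pooling followed by the vote over bins. The function name, the nested-list layout `score_maps[channel][y][x]`, and the channel ordering `(i * k + j) * num_classes + c` are assumptions made for illustration, not the authors' implementation.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling followed by voting (illustrative sketch).

    score_maps: nested lists indexed [channel][y][x], with k*k*num_classes
                channels; channel (i*k + j)*num_classes + c is assumed to hold
                the score of class c for bin (i, j).
    roi:        (x0, y0, w, h) in feature-map pixels, with w >= 1 and h >= 1.
    num_classes counts the background class as well (C + 1 in the slides).
    Returns (pooled, votes): pooled[i][j][c] is the average score inside
    bin (i, j); votes[c] is the average of pooled over all k*k bins.
    """
    x0, y0, w, h = roi
    pooled = [[[0.0] * num_classes for _ in range(k)] for _ in range(k)]
    for i in range(k):  # i indexes bins along x
        xs = range(math.floor(i * w / k), math.ceil((i + 1) * w / k))
        for j in range(k):  # j indexes bins along y
            ys = range(math.floor(j * h / k), math.ceil((j + 1) * h / k))
            n = len(xs) * len(ys)  # number of pixels in the bin
            for c in range(num_classes):
                ch = (i * k + j) * num_classes + c
                total = sum(score_maps[ch][y0 + y][x0 + x] for y in ys for x in xs)
                pooled[i][j][c] = total / n
    votes = [
        sum(pooled[i][j][c] for i in range(k) for j in range(k)) / (k * k)
        for c in range(num_classes)
    ]
    return pooled, votes


# Toy input: 8 constant maps (k = 2, two classes) on a 4x4 feature map,
# where channel ch has the value ch everywhere.
maps = [[[float(ch)] * 4 for _ in range(4)] for ch in range(8)]
pooled, votes = ps_roi_pool(maps, roi=(0, 0, 4, 4), k=2, num_classes=2)
print(votes)  # [3.0, 4.0]
```

Each bin reads only its own group of channels, which is how the channels "take responsibility" for a relative location; the per-RoI work is just averaging, with no learned layers after the pooling.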

24 Standard Object Detection Multitask Loss Function
Class loss is computed by averaging the positional scores (i.e. voting) to produce a (C+1)-dimensional vector for each RoI, pushing it through a softmax and computing the cross-entropy.
Regression loss is similar, producing a 4-dimensional vector which is passed into a Huber loss.
The two losses are combined in a weighted sum.
Positive examples are the RoIs whose intersection-over-union (IoU) overlap with a ground-truth box is at least 0.5; all other RoIs are negatives.
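A minimal sketch of this loss for a single RoI, in plain Python. All function names are hypothetical. Boxes are assumed to be (x1, y1, x2, y2) corners, label 0 is assumed to be background, the Huber loss is taken with threshold 1 (the smooth L1 form), the regression term is assumed to apply to positive RoIs only, and the weight `lam = 1.0` is an illustrative default.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_roi(roi, gt_boxes, gt_labels, threshold=0.5):
    """Class of the best-overlapping ground-truth box, or 0 (background)."""
    best_iou, best_label = 0.0, 0
    for box, label in zip(gt_boxes, gt_labels):
        overlap = iou(roi, box)
        if overlap > best_iou:
            best_iou, best_label = overlap, label
    return best_label if best_iou >= threshold else 0


def softmax_cross_entropy(scores, label):
    """-log softmax(scores)[label], computed with the log-sum-exp trick."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def smooth_l1(pred, target):
    """Huber loss with threshold 1, summed over the 4 box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(class_votes, label, box_pred, box_target, lam=1.0):
    """Weighted sum of class and box losses for one RoI."""
    cls_loss = softmax_cross_entropy(class_votes, label)
    reg_loss = smooth_l1(box_pred, box_target) if label > 0 else 0.0
    return cls_loss + lam * reg_loss


print(label_roi((0, 0, 2, 2), [(1, 0, 3, 2)], [3]))  # 0: IoU is 1/3, below 0.5
print(smooth_l1([0.5, 2.0], [0.0, 0.0]))             # 1.625
```

Here `class_votes` stands for the (C+1)-dimensional vector obtained by averaging the position-sensitive scores.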

25 Bounding Box Regression In Object Detection: R-CNN style
Predict bounding box updates with an additional $4k^2$-dimensional convolutional layer.
Training pairs: $\{(P^i, G^i)\}_{i=1,\dots,N}$, where $P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$.
Parameterise the mapping with linear functions $d_x(P), d_y(P), d_w(P), d_h(P)$ such that:
$\hat{G}_x = P_w d_x(P) + P_x$
$\hat{G}_y = P_h d_y(P) + P_y$ (scale invariant)
$\hat{G}_w = P_w \exp(d_w(P))$
$\hat{G}_h = P_h \exp(d_h(P))$ (log space)
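The parameterisation above can be sketched as a pair of functions: `decode` applies predicted offsets to a proposal, and `encode` inverts it to produce regression targets from a ground-truth box. The function names are hypothetical, and boxes are assumed to be (centre x, centre y, width, height) tuples as in R-CNN.

```python
import math


def encode(P, G):
    """Regression targets (dx, dy, dw, dh) that map proposal P onto box G."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return (
        (gx - px) / pw,      # shift in x, in units of the proposal width
        (gy - py) / ph,      # shift in y, in units of the proposal height
        math.log(gw / pw),   # log-space width scaling
        math.log(gh / ph),   # log-space height scaling
    )


def decode(P, d):
    """Apply offsets d = (dx, dy, dw, dh) to proposal P, giving the predicted box."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return (
        pw * dx + px,
        ph * dy + py,
        pw * math.exp(dw),
        ph * math.exp(dh),
    )


proposal = (10.0, 20.0, 4.0, 8.0)
print(decode(proposal, (0.5, -0.25, 0.0, 0.0)))  # (12.0, 18.0, 4.0, 8.0)
```

Dividing the shifts by the proposal size keeps the targets scale invariant, and predicting log-ratios keeps decoded widths and heights positive.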

26 OHEM: Online Hard Example Mining (bootstrapping)
Rank regions by loss and only use the top-ranked ones.
These hard examples will evolve as the network trains.
OHEM is particularly efficient in R-FCN due to the (almost) free ranking of all region proposals.
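The ranking step can be sketched in a few lines. This assumes the per-RoI losses from a forward pass are already available as a list; the function name and the `batch_size` parameter are hypothetical, and any de-duplication of overlapping RoIs is left out.

```python
def select_hard_examples(losses, batch_size):
    """Indices of the batch_size RoIs with the highest loss, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:batch_size]


roi_losses = [0.1, 2.0, 0.5, 1.2]
print(select_hard_examples(roi_losses, batch_size=2))  # [1, 3]
```

Only the selected RoIs would contribute to the backward pass. Because the selection is recomputed from the current losses at every iteration, the set of hard examples changes as the network trains.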

27 Alternating Optimisation: You put your left boot in, your left boot out
1. Train the RPN.
2. Use its proposals to train Fast R-CNN.
3. Use the resulting network to initialise the RPN.
4. Retrain Fast R-CNN with the updated RPN, sharing convolutions.

28 Dilated Convolutions
Figure 1 from Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions."

29 Dilated Convolutions
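As a rough illustration of dilation, here is a 1-D sketch in plain Python: the kernel taps are spaced `dilation` samples apart, so the receptive field grows without adding weights. The function name is hypothetical, and no padding is applied, so the output is shorter than the input.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # receptive field is span + 1 samples
    out = []
    for start in range(len(signal) - span):
        out.append(
            sum(kernel[t] * signal[start + t * dilation] for t in range(len(kernel)))
        )
    return out


signal = list(range(8))  # 0, 1, ..., 7
print(dilated_conv1d(signal, [1, 1, 1], dilation=1))  # [3, 6, 9, 12, 15, 18]
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # [6, 9, 12, 15]
```

With three taps, dilation 2 covers five input samples per output instead of three, which is how a dilated conv5 can keep a large receptive field while running at a finer stride.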

G-CNN: an Iterative Grid Based Object Detector

Mahyar Najibi (Univ. Maryland), Mohammad Rastegari (Univ. Maryland), Larry S. Davis (Univ. Maryland)
CVPR, 2016
VGG Reading Group - Sam Albanie
Motivation: Proposals