Lecture 5: Object Detection

Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.

Traditional Object Detection Algorithms
- Multi-scale sliding window + classifier

Region-based CNN (R-CNN)
- Input image
- Extract region proposals: any proposal method (e.g., selective search, edgebox)
- Compute CNN features: any architecture
- Classification: softmax, SVM
- Object detection
  - Independent evaluation of each proposal
  - Bounding box regression improves detection accuracy.
  - Mean average precision (mAP): 53.7% with bounding box regression on the VOC 2010 test set

[Girshick14] R. Girshick, J. Donahue, S. Guadarrama, T. Darrell, J. Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014
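The mAP figure quoted for R-CNN is the mean of per-class average precision (AP) values. The following is a minimal sketch of how AP for one class might be computed, assuming each detection has already been marked as a true or false positive (typically by an IoU threshold against the ground truth) and using the area under the monotone precision envelope. The function name and the toy inputs are hypothetical and are not taken from the lecture.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class as the area under the precision-recall envelope.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box
        (the matching step itself is assumed to be done elsewhere).
    num_ground_truth: number of ground-truth objects of this class.
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the maximum precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example (illustrative values): 4 detections, 2 ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2)
```

mAP is then the plain average of `ap` over all object classes.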

Selective Search
- Motivation
  - The sliding window approach is not feasible for object detection with convolutional neural networks.
  - We need a faster method to identify object candidates.
- Finding object proposals
  - Greedy hierarchical superpixel segmentation
  - Diversification of superpixel construction and merging
    - Using a variety of color spaces
    - Using different similarity measures
    - Varying starting regions

[Uijlings13] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders: Selective Search for Object Recognition. IJCV

Bounding Box Regression
- Learning a transformation of the bounding box
  - Region proposal: $P = (P_x, P_y, P_w, P_h)$
  - Ground truth: $G = (G_x, G_y, G_w, G_h)$
  - Transformation: $d(P) = (d_x(P), d_y(P), d_w(P), d_h(P))$
- Predicted box $\hat{G}$
  - $\hat{G}_x = P_w d_x(P) + P_x$
  - $\hat{G}_y = P_h d_y(P) + P_y$
  - $\hat{G}_w = P_w \exp(d_w(P))$
  - $\hat{G}_h = P_h \exp(d_h(P))$
- Each component is linear in the CNN feature: $d_*(P) = \mathbf{w}_*^\top \phi_5(P)$, where $\phi_5(P)$ is the CNN pool5 feature of the proposal.
- Learning the weights by regularized least squares:
  $\mathbf{w}_* = \arg\min_{\hat{\mathbf{w}}_*} \sum_i \left( t_*^i - \hat{\mathbf{w}}_*^\top \phi_5(P^i) \right)^2 + \lambda \lVert \hat{\mathbf{w}}_* \rVert^2$
  with regression targets $t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$.

R-CNN Detection Results
- VOC 2010 test set
- Feature analysis on VOC 2007 test set
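The bounding box regression transformation can be illustrated with a short sketch. It assumes boxes are given as (center x, center y, width, height); the function names and the example boxes are hypothetical, and the learned regressor is left out, so the targets are fed back directly to show that the transformation recovers the ground-truth box.

```python
import math


def regression_targets(P, G):
    """Targets (t_x, t_y, t_w, t_h) that map proposal P onto ground truth G.

    Boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return (
        (gx - px) / pw,        # shift in x, relative to the proposal width
        (gy - py) / ph,        # shift in y, relative to the proposal height
        math.log(gw / pw),     # log-scale change of the width
        math.log(gh / ph),     # log-scale change of the height
    )


def apply_transform(P, d):
    """Apply a transformation d = (d_x, d_y, d_w, d_h) to proposal P."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return (
        pw * dx + px,
        ph * dy + py,
        pw * math.exp(dw),
        ph * math.exp(dh),
    )


# Illustrative boxes, not taken from the lecture.
P = (50.0, 60.0, 20.0, 40.0)
G = (55.0, 58.0, 30.0, 36.0)
t = regression_targets(P, G)
G_hat = apply_transform(P, t)   # equals G up to floating-point error
```

In R-CNN the transformation is predicted from the pool5 feature of the proposal rather than computed from the ground truth, so `G_hat` only approximates `G` at test time.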

Fast R-CNN
- Fast version of R-CNN
  - 9x faster in training and 213x faster in testing than R-CNN
- A single feature computation and ROI pooling using object proposals
- Bounding box regression integrated into the network
- Single-stage training using a multi-task loss

[Girshick15] R. Girshick: Fast R-CNN, ICCV 2015

Faster R-CNN
- Fast R-CNN + RPN (Region Proposal Network)
- Proposal computation integrated into the network
- Marginal cost of proposals: 10ms

[Ren15] S. Ren, K. He, R. Girshick, J. Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS
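ROI pooling, listed under Fast R-CNN, turns a proposal of arbitrary size into a fixed-size grid of features so that one shared feature map can serve every proposal. The sketch below is a simplified illustration on a single-channel feature map stored as a list of lists. The function name, the inclusive integer ROI coordinates, and the floor/ceil bin boundaries are assumptions made for this example and may differ from the rounding used in any particular implementation.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool a region of a 2-D feature map into an out_h x out_w grid.

    roi = (x0, y0, x1, y1) in feature-map cells, with inclusive corners.
    """
    x0, y0, x1, y1 = roi
    roi_h = y1 - y0 + 1
    roi_w = x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        # Rows covered by bin i: floor for the start, ceil for the end.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(
                feature_map[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(row)
    return pooled


# Illustrative 4 x 6 feature map.
fm = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
pooled_even = roi_max_pool(fm, (1, 0, 4, 3))   # 4 x 4 region
pooled_odd = roi_max_pool(fm, (0, 0, 2, 2))    # 3 x 3 region, bins overlap
```

Both regions produce a 2 x 2 output even though their sizes differ, which is what lets the later fully connected layers accept any proposal.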

Object Detection Performance
- Faster R-CNN with ResNet
- The R-CNN family achieves state-of-the-art performance in object detection!
- [Figure: Pascal VOC 2007 object detection mAP (%), Faster R-CNN with ResNet]

SSD: Single Shot MultiBox Detector
- Main idea
  - Discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
  - Generates scores for the presence of each category in each default box
  - Produces adjustments to the box to better match the object shape
  - Combines predictions from multi-resolution feature maps to handle objects of various sizes

[Liu16] W. Liu, et al.: SSD: Single Shot MultiBox Detector. ECCV 2016
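The idea of default boxes over several aspect ratios and scales per feature map location can be sketched as follows. This is an illustration only: the function name, the feature map sizes, the scales, and the aspect ratios are assumptions chosen for the example and are not the configuration from the paper. Box width and height follow the common convention w = s * sqrt(a), h = s / sqrt(a) for scale s and aspect ratio a.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h), normalized to [0, 1], for a square
    fmap_size x fmap_size feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Center of the feature map cell in normalized image coordinates.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


ratios = (1.0, 2.0, 0.5)
# A finer map with small boxes and a coarser map with large boxes
# (illustrative values), combined into one set of default boxes.
fine = default_boxes(fmap_size=4, scale=0.2, aspect_ratios=ratios)
coarse = default_boxes(fmap_size=2, scale=0.6, aspect_ratios=ratios)
all_boxes = fine + coarse
```

For every entry of `all_boxes` the detector would predict per-category scores plus four offsets, so the number of predictions grows with the number of default boxes rather than with the number of proposals.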

Architecture
- [Figure: architecture]

Loss function
- $L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)$

Characteristics
- Simple relative to methods that require object proposals
- Easy to train and straightforward to integrate into systems that require a detection component

Results
- Accuracy: PASCAL VOC2007 test set
- Speed
- PASCAL VOC2012 test set
- COCO test-dev2015

You Only Look Once (YOLO)
- Characteristics
  - Framing object detection as a regression problem
  - Predicting bounding boxes and class probabilities directly from full images in one evaluation using a single network
  - Real-time performance: 45 FPS (150 FPS for the fast version)

[Redmon16] J. Redmon, S. Divvala, R. Girshick, A. Farhadi: You Only Look Once: Unified, Real-Time Object Detection. CVPR

You Only Look Once (YOLO)
- 2 boxes per grid cell: P(Object) * IOU
- Output: 7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
- Conditioned on object: P(Class | Object) (e.g., bicycle, car, dog, dining table)
- PASCAL VOC2012 test (leaderboard as of 11/6/2015)

YOLO9000
- Main idea and achievements
  - Joint training of classification and detection
  - Detecting over 9000 object categories with localization labels only for a small subset of classes
  - State-of-the-art performance: faster than SSD and more accurate than Faster R-CNN
- Techniques for performance improvement
  - Batch normalization
  - High resolution classifier
  - Dimension clusters: learning the prior of bounding box shapes through k-means clustering
  - Better parametrization of location prediction: more stable training

[Redmon17] J. Redmon, A. Farhadi: YOLO9000: Better, Faster, Stronger. CVPR 2017
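The size of the YOLO output tensor and the way the two probabilities combine can be checked with a few lines. The grid size, boxes per cell, and class count are the values from the slide; reading the 5 values per box as x, y, w, h plus a confidence follows the YOLO paper. The function name and the example probabilities are hypothetical.

```python
S = 7    # grid cells per side
B = 2    # boxes predicted per grid cell
C = 20   # object classes (PASCAL VOC)

# Each box carries 5 numbers (x, y, w, h, confidence); class probabilities
# are shared by all boxes of a cell.
values_per_cell = B * 5 + C
num_outputs = S * S * values_per_cell


def class_scores(box_confidence, class_probs):
    """Class-specific scores for one box.

    box_confidence approximates P(Object) * IOU and class_probs holds
    P(Class | Object) for the cell, so each product scores one class.
    """
    return [box_confidence * p for p in class_probs]


# Illustrative values: a fairly confident box in a cell that favors class 1.
scores = class_scores(0.5, [0.2, 0.8])
```

The products in `scores` are what would be thresholded to produce the final detections for that box.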

Results of YOLOv2
- PASCAL VOC2007
- PASCAL VOC2012 test set
- COCO test-dev
- The accuracy may not be consistent with the SSD paper because they are concurrent works, and the results of other algorithms may not be up to date.

Object detection with CNNs
[Figure: PASCAL VOC mean average precision (mAP), 0% to 80%, by year from 2006 to 2016, before and after CNNs]
Region proposals