Object detection with CNNs

Similar documents
Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Object Detection Based on Deep Learning

Lecture 5: Object Detection

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Spatial Localization and Detection. Lecture 8-1

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Yiqi Yan. May 10, 2017

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

YOLO9000: Better, Faster, Stronger

Modern Convolutional Object Detectors

OBJECT DETECTION HYUNG IL KOO

Optimizing Object Detection:

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Deep Learning for Object detection & localization

Unified, real-time object detection

arxiv: v1 [cs.cv] 15 Oct 2018

Rich feature hierarchies for accurate object detection and semantic segmentation

Improving Small Object Detection

Rich feature hierarchies for accurate object detection and semantic segmentation

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Hand Detection For Grab-and-Go Groceries

Feature-Fused SSD: Fast Detection for Small Objects

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

arxiv: v1 [cs.cv] 26 Jun 2017

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

SSD: Single Shot MultiBox Detector

arxiv: v1 [cs.cv] 4 Jun 2015

arxiv: v1 [cs.cv] 26 May 2017

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

G-CNN: an Iterative Grid Based Object Detector

arxiv: v1 [cs.cv] 3 Apr 2016

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Real-time Object Detection CS 229 Course Project

Yield Estimation using faster R-CNN

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

arxiv: v1 [cs.cv] 22 Mar 2018

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Gated Bi-directional CNN for Object Detection

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

arxiv: v2 [cs.cv] 23 Nov 2017

Optimizing Object Detection:

arxiv: v1 [cs.cv] 6 Jul 2017

arxiv: v1 [cs.cv] 18 Jun 2017

Regionlet Object Detector with Hand-crafted and CNN Feature

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

An algorithm for highway vehicle detection based on convolutional neural network

arxiv: v3 [cs.cv] 17 May 2018

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Computer Vision Lecture 16

arxiv: v2 [cs.cv] 23 Jan 2019

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

arxiv: v1 [cs.cv] 5 Oct 2015

Learning Globally Optimized Object Detector via Policy Gradient

Object Detection on Self-Driving Cars in China. Lingyun Li

YOLO 9000 TAEWAN KIM

Defect Detection from UAV Images based on Region-Based CNNs

arxiv: v2 [cs.cv] 30 Mar 2016

A MultiPath Network for Object Detection

Adaptive Object Detection Using Adjacency and Zoom Prediction

Why Is Recognition Hard? Object Recognizer

Computer Vision Lecture 16

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

arxiv: v3 [cs.cv] 29 Mar 2017

arxiv: v1 [cs.cv] 24 Jan 2019

arxiv: v2 [cs.cv] 13 Jun 2017

Fast Vehicle Detector for Autonomous Driving

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Abandoned Luggage Detection

arxiv: v2 [cs.cv] 8 Apr 2018

Context Refinement for Object Detection

arxiv: v2 [cs.cv] 18 Jul 2017

Project 3 Q&A. Jonathan Krause

arxiv: v3 [cs.cv] 12 Mar 2018

A Novel Representation and Pipeline for Object Detection

CAD: Scale Invariant Framework for Real-Time Object Detection

Robust Object detection for tiny and dense targets in VHR Aerial Images

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

arxiv: v1 [cs.cv] 18 Nov 2017

End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images

Object Recognition and Detection

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Parallel Feature Pyramid Network for Object Detection

Mimicking Very Efficient Network for Object Detection

Feature Fusion for Scene Text Detection

arxiv: v2 [cs.cv] 24 Jul 2018

arxiv: v4 [cs.cv] 30 Nov 2016

DeepBox: Learning Objectness with Convolutional Networks

Multi-Level Fusion based 3D Object Detection from Monocular Images

arxiv: v2 [cs.cv] 10 Oct 2018

Transcription:

Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year

Region proposals As an alternative to sliding window search, evaluate a few hundred region proposals Can use slower but more powerful features and classifiers Take advantage of low-level perceptual organization cues Proposal mechanism can be category-independent Proposal mechanism can be trained

Selective search Use hierarchical segmentation: start with small superpixels and merge based on diverse cues J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Evaluation of region proposals J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Selective search detection pipeline Feature extraction: color SIFT, codebook of size 4K, spatial pyramid with four levels = 360K dimensions J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Another proposal method: EdgeBoxes Box score: number of edges in the box minus number of edges that overlap the box boundary Uses a trained edge detector Uses efficient data structures (incl. integral images) for fast evaluation Gets 75% recall with 800 boxes (vs. 1400 for Selective Search), is 40 times faster C. Zitnick and P. Dollar, Edge Boxes: Locating Object Proposals from Edges, ECCV 2014.

R-CNN: Region proposals + CNN features Source: R. Girshick SVMs SVMs ConvNet SVMs ConvNet Classify regions with SVMs Forward each region through ConvNet ConvNet Warped image regions Region proposals Input image R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

R-CNN details Regions: ~2000 Selective Search proposals Network: AlexNet pre-trained on ImageNet (1000 classes), fine-tuned on PASCAL (21 classes) Final detector: warp proposal regions, extract fc7 network activations (4096 dimensions), classify with linear SVM Bounding box regression to refine box locations Performance: map of 53.7% on PASCAL 2010 (vs. 35.1% for Selective Search and 33.4% for DPM). R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

R-CNN pros and cons Pros Accurate! Any deep architecture can immediately be plugged in Cons Ad hoc training objectives Fine-tune network with softmax classifier (log loss) Train post-hoc linear SVMs (hinge loss) Train post-hoc bounding-box regressions (least squares) Training is slow (84h), takes a lot of disk space 2000 CNN passes per image Inference (detection) is slow (47s / image with VGG16)

Fast R-CNN Softmax classifier Linear + softmax Linear Bounding-box regressors FCs Fully-connected layers RoI Pooling layer Region proposals conv5 feature map of image Forward whole image through ConvNet ConvNet Source: R. Girshick R. Girshick, Fast R-CNN, ICCV 2015

Fast R-CNN training Log loss + smooth L1 loss Multi-task loss Linear + softmax Linear FCs Trainable ConvNet Source: R. Girshick R. Girshick, Fast R-CNN, ICCV 2015

Fast R-CNN: Another view R. Girshick, Fast R-CNN, ICCV 2015

ROI pooling closeup (actual output feature map size: 7 x 7) Image source

Fast R-CNN results Fast R-CNN Train time (h) 9.5 84 - Speedup 8.8x 1x Test time / image R-CNN 0.32s 47.0s Test speedup 146x 1x map 66.9% 66.0% Timings exclude object proposal time, which is equal for all methods. All methods use VGG16 from Simonyan and Zisserman. Source: R. Girshick

Faster R-CNN Region proposals Region Proposal Network feature map feature map share features CNN CNN S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015

Region proposal network (RPN) Slide a small window over the conv5 layer Predict object/no object Regress bounding box coordinates Box regression is with reference to anchors (3 scales x 3 aspect ratios)

Faster R-CNN results

Object detection progress mean0average0precision0(map) 80% 70% 60% 50% 40% 30% 20% 10% Before deep convnets Faster R-CNN Fast R-CNN R-CNNv1 Using deep convnets 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year

YOLO 1. Take conv feature maps at 7x7 resolution 2. Add two FC layers to predict, at each location, a score for each class and 2 bboxes w/ confidences (output is 7x7x30 for PASCAL) 7x speedup over Faster RCNN (45-155 FPS vs. 7-18 FPS) Some loss of accuracy due to lower recall, poor localization J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

YOLO v2 Batch normalization Pre-train on higher resolution ImageNet Use and improve anchor box idea from Faster RCNN Train at multiple resolutions Very good accuracy, very fast J. Redmon and A. Farhadi, YOLO9000: Better, Faster, Stronger, CVPR 2017

SSD W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. Berg, SSD: Single Shot MultiBox Detector, ECCV 2016.

Feature pyramid networks Improve predictive power of lower-level feature maps by adding contextual information from higherlevel feature maps Predict different sizes of bounding boxes from different levels of the pyramid (but share parameters of predictors) T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, Feature pyramid networks for object detection, CVPR 2017.

New detection benchmark: COCO (2014) 80 categories instead of PASCAL s 20 Current best map: 52% http://cocodataset.org/#home

New detection benchmark: COCO (2014) J. Huang et al., Speed/accuracy trade-offs for modern convolutional object detectors, CVPR 2017

Review: Object detection with CNNs

Review: R-CNN SVMs Classify regions with SVMs SVMs ConvNet SVMs ConvNet ConvNet Forward each region through ConvNet Warped image regions Region proposals Input image R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

Review: Fast R-CNN Softmax classifier Linear + softmax Linear Bounding-box regressors FCs Fully-connected layers RoI Pooling layer Region proposals conv5 feature map of image Forward whole image through ConvNet ConvNet R. Girshick, Fast R-CNN, ICCV 2015

Review: Faster R-CNN Region proposals Region Proposal Network feature map feature map share features CNN CNN S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015

Summary: Object detection with CNNs R-CNN: region proposals + CNN on cropped, resampled regions Fast R-CNN: region proposals + RoI pooling on top of a conv feature map Faster R-CNN: RPN + RoI pooling Next generation of detectors Direct prediction of BB offsets, class scores on top of conv feature maps Get better context by combining feature maps at multiple resolutions