Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Similar documents
Object detection with CNNs

Object Detection Based on Deep Learning

Lecture 5: Object Detection

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Spatial Localization and Detection. Lecture 8-1

Yiqi Yan. May 10, 2017

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Rich feature hierarchies for accurate object detection and semantic segmentation

OBJECT DETECTION HYUNG IL KOO

Rich feature hierarchies for accurate object detection and semantic segmentation

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Unified, real-time object detection

Computer Vision Lecture 16

Deep Learning for Object detection & localization

Modern Convolutional Object Detectors

Feature-Fused SSD: Fast Detection for Small Objects

arxiv: v1 [cs.cv] 5 Oct 2015

Computer Vision Lecture 16

Improving Small Object Detection

arxiv: v1 [cs.cv] 4 Jun 2015

arxiv: v1 [cs.cv] 15 Oct 2018

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

Computer Vision Lecture 16

Optimizing Object Detection:

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

arxiv: v1 [cs.cv] 26 Jun 2017

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

YOLO9000: Better, Faster, Stronger

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

G-CNN: an Iterative Grid Based Object Detector

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

A Novel Representation and Pipeline for Object Detection

Yield Estimation using faster R-CNN

Why Is Recognition Hard? Object Recognizer

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

A MultiPath Network for Object Detection

Regionlet Object Detector with Hand-crafted and CNN Feature

Adaptive Object Detection Using Adjacency and Zoom Prediction

Hand Detection For Grab-and-Go Groceries

SSD: Single Shot MultiBox Detector

arxiv: v1 [cs.cv] 3 Apr 2016

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Deformable Part Models

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Robust Object detection for tiny and dense targets in VHR Aerial Images

Fully Convolutional Networks for Semantic Segmentation

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Real-time Object Detection CS 229 Course Project

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Gated Bi-directional CNN for Object Detection

Final Report: Smart Trash Net: Waste Localization and Classification

An algorithm for highway vehicle detection based on convolutional neural network

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Learning to Segment Object Candidates

Object Recognition and Detection

Cascade Region Regression for Robust Object Detection

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

DeepBox: Learning Objectness with Convolutional Networks

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

arxiv: v1 [cs.cv] 6 Jul 2017

Martian lava field, NASA, Wikipedia

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Project 3 Q&A. Jonathan Krause

arxiv: v1 [cs.cv] 14 Dec 2015

arxiv: v3 [cs.cv] 17 May 2018

Lecture 7: Semantic Segmentation

Deep Neural Networks:

arxiv: v1 [cs.cv] 22 Mar 2018

arxiv: v1 [cs.cv] 26 May 2017

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

arxiv: v1 [cs.cv] 23 Apr 2015

Rich feature hierarchies for accurate object detection and semant

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Mimicking Very Efficient Network for Object Detection

arxiv: v2 [cs.cv] 18 Jul 2017

Faster R-CNN Features for Instance Search

Defect Detection from UAV Images based on Region-Based CNNs

arxiv: v1 [cs.cv] 18 Jun 2017

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

arxiv: v3 [cs.cv] 12 Mar 2018

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Structured Prediction using Convolutional Neural Networks

arxiv: v2 [cs.cv] 8 Apr 2018

arxiv: v2 [cs.cv] 23 Jan 2019

arxiv: v2 [cs.cv] 10 Apr 2017

arxiv: v2 [cs.cv] 30 Mar 2016

Segmentation as Selective Search for Object Recognition in ILSVRC2011

A CLOSER LOOK: SMALL OBJECT DETECTION IN FASTER R-CNN. Christian Eggert, Stephan Brehm, Anton Winschel, Dan Zecha, Rainer Lienhart

Transcription:

Deep learning for object detection Slides from Svetlana Lazebnik and many others

Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep convnets Using deep convnets 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year

Beyond sliding windows: Region proposals Advantages: Cuts down on number of regions detector must evaluate Allows detector to use more powerful features and classifiers Uses low-level perceptual organization cues Proposal mechanism can be category-independent Proposal mechanism can be trained

Selective search Use segmentation J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Selective search: Basic idea Use hierarchical segmentation: start with small superpixels and merge based on diverse cues J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Evaluation of region proposals J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Selective search detection pipeline Feature extraction: color SIFT, codebook of size 4K, spatial pyramid with four levels = 360K dimensions J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013

Another proposal method: EdgeBoxes Box score: number of edges in the box minus number of edges that overlap the box boundary Uses a trained edge detector Uses efficient data structures for fast evaluation Gets 75% recall with 800 boxes (vs. 1400 for Selective Search), is 40 times faster C. Zitnick and P. Dollar, Edge Boxes: Locating Object Proposals from Edges, ECCV 2014.

R-CNN: Region proposals + CNN features Source: R. Girshick SVMs SVMs ConvNet SVMs ConvNet Classify regions with SVMs Forward each region through ConvNet ConvNet Warped image regions Region proposals Input image R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

R-CNN details Regions: ~2000 Selective Search proposals Network: AlexNet pre-trained on ImageNet (1000 classes), fine-tuned on PASCAL (21 classes) Final detector: warp proposal regions, extract fc7 network activations (4096 dimensions), classify with linear SVM Bounding box regression to refine box locations Performance: map of 53.7% on PASCAL 2010 (vs. 35.1% for Selective Search and 33.4% for DPM). R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

R-CNN pros and cons Pros Accurate! Any deep architecture can immediately be plugged in Cons Ad hoc training objectives Fine-tune network with softmax classifier (log loss) Train post-hoc linear SVMs (hinge loss) Train post-hoc bounding-box regressions (least squares) Training is slow (84h), takes a lot of disk space 2000 convnet passes per image Inference (detection) is slow (47s / image with VGG16)

Fast R-CNN Softmax classifier Linear + softmax Linear Bounding-box regressors FCs Fully-connected layers RoI Pooling layer Region proposals conv5 feature map of image Forward whole image through ConvNet ConvNet Source: R. Girshick R. Girshick, Fast R-CNN, ICCV 2015

Fast R-CNN training Log loss + smooth L1 loss Multi-task loss Linear + softmax Linear FCs Trainable ConvNet Source: R. Girshick R. Girshick, Fast R-CNN, ICCV 2015

Fast R-CNN results Fast R-CNN Train time (h) 9.5 84 - Speedup 8.8x 1x Test time / image R-CNN 0.32s 47.0s Test speedup 146x 1x map 66.9% 66.0% Timings exclude object proposal time, which is equal for all methods. All methods use VGG16 from Simonyan and Zisserman. Source: R. Girshick

Faster R-CNN Region proposals Region Proposal Network feature map feature map share features CNN CNN S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015

Region proposal network Slide a small window over the conv5 layer Predict object/no object Regress bounding box coordinates Box regression is with reference to anchors (3 scales x 3 aspect ratios)

Faster R-CNN results

Object detection progress mean0average0precision0(map) 80% 70% 60% 50% 40% 30% 20% 10% Before deep convnets Faster R-CNN Fast R-CNN R-CNNv1 Using deep convnets 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year

Next trends New datasets: MSCOCO 80 categories instead of PASCAL s 20 Current best map: 37% http://mscoco.org/home/

Next trends Fully convolutional detection networks W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. Berg, SSD: Single Shot MultiBox Detector, arxiv 2016.

Next trends Networks with context S. Bell, L. Zitnick, K. Bala, and R. Girshick, Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks, arxiv 2015.

Review: Object detection with CNNs

Review: R-CNN SVMs Classify regions with SVMs SVMs ConvNet SVMs ConvNet ConvNet Forward each region through ConvNet Warped image regions Region proposals Input image R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.

Review: Fast R-CNN Softmax classifier Linear + softmax Linear Bounding-box regressors FCs Fully-connected layers RoI Pooling layer Region proposals conv5 feature map of image Forward whole image through ConvNet ConvNet R. Girshick, Fast R-CNN, ICCV 2015

Review: Faster R-CNN Region proposals Region Proposal Network feature map feature map share features CNN CNN S. Ren, K. He, R. Girshick, and J. Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, NIPS 2015