Modern Convolutional Object Detectors

Similar documents
R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Object detection with CNNs

Team G-RMI: Google Research & Machine Intelligence

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Fish Species Likelihood Prediction. Kumari Deepshikha (1611MC03) Sequeira Ryan Thomas (1611CS13)

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

TRAFFIC ANALYSIS USING VISUAL OBJECT DETECTION AND TRACKING

STREET OBJECT DETECTION / TRACKING FOR AI CITY TRAFFIC ANALYSIS

Pelee: A Real-Time Object Detection System on Mobile Devices

Lecture 5: Object Detection

Object Detection Based on Deep Learning

R-FCN: Object Detection with Really - Friggin Convolutional Networks

Fusion of Camera and Lidar Data for Object Detection using Neural Networks

arxiv: v1 [cs.cv] 13 Mar 2019

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Vehicle Classification on Low-resolution and Occluded images: A low-cost labeled dataset for augmentation

Yiqi Yan. May 10, 2017

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

arxiv: v1 [cs.cv] 4 Jun 2015

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Improving Small Object Detection

RON: Reverse Connection with Objectness Prior Networks for Object Detection

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

FCHD: A fast and accurate head detector

arxiv: v1 [cs.cv] 9 Dec 2018

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Efficient Segmentation-Aided Text Detection For Intelligent Robots

arxiv: v1 [cs.cv] 24 Jan 2019

arxiv: v1 [cs.cv] 26 May 2017

YOLO 9000 TAEWAN KIM

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Real-Time Semantic Segmentation Benchmarking Framework

arxiv: v2 [cs.cv] 19 Apr 2018

Spatial Localization and Detection. Lecture 8-1

arxiv: v1 [cs.cv] 22 Mar 2018

arxiv: v1 [cs.cv] 15 Oct 2018

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

Final Report: Smart Trash Net: Waste Localization and Classification

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Cascade Region Regression for Robust Object Detection

Automatic detection of books based on Faster R-CNN

Object Detection on Self-Driving Cars in China. Lingyun Li

Towards High Performance Video Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Regionlet Object Detector with Hand-crafted and CNN Feature

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Defect Detection from UAV Images based on Region-Based CNNs

R-FCN-3000 at 30fps: Decoupling Detection and Classification

Feature-Fused SSD: Fast Detection for Small Objects

arxiv: v1 [cs.cv] 5 Dec 2017

Recurrent Multi-frame Single Shot Detector for Video Object Detection

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

YOLO9000: Better, Faster, Stronger

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

Optimizing Object Detection:

Computer Vision Lecture 16

arxiv: v2 [cs.cv] 23 Nov 2017

An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches

Photo OCR ( )

Computer Vision Lecture 16

arxiv: v1 [cs.cv] 18 Nov 2017

arxiv: v1 [cs.cv] 24 Oct 2018

Classifying a specific image region using convolutional nets with an ROI mask as input

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

Unified, real-time object detection

arxiv: v2 [cs.cv] 23 Jan 2019

Deeply Cascaded Networks

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

An Analysis of Scale Invariance in Object Detection SNIP

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Learning Globally Optimized Object Detector via Policy Gradient

arxiv: v2 [cs.cv] 10 Apr 2017

Why Is Recognition Hard? Object Recognizer

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Deep Learning for Object detection & localization

Real-time Object Detection CS 229 Course Project

Progressive Neural Architecture Search

Context Refinement for Object Detection

arxiv: v1 [cs.cv] 26 Jun 2017

An Analysis of Scale Invariance in Object Detection SNIP

arxiv: v1 [cs.cv] 15 Aug 2018

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Robust Object detection for tiny and dense targets in VHR Aerial Images

Deep Residual Learning

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

Improving Region based CNN object detector using Bayesian Optimization

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DEEP NEURAL NETWORKS FOR OBJECT DETECTION

Rich feature hierarchies for accurate object detection and semantic segmentation

arxiv: v2 [cs.ro] 27 Aug 2018

ScratchDet: Exploring to Train Single-Shot Object Detectors from Scratch

arxiv: v3 [cs.cv] 6 Apr 2019

SSD: Single Shot MultiBox Detector

FaceNet. Florian Schroff, Dmitry Kalenichenko, James Philbin Google Inc. Presentation by Ignacio Aranguren and Rahul Rana

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Transcription:

Modern Convolutional Object Detectors Faster R-CNN, R-FCN, SSD 29 September 2017 Presented by: Kevin Liang

Papers Presented Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (NIPS 2015) Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun R-FCN: Object Detection via Region-based Fully Convolutional Networks (NIPS 2016) Jifeng Dai, Yi Li, Kaiming He, Jian Sun SSD: Single Shot MultiBox Detector (ECCV 2016) Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C Berg Speed/accuracy trade-offs for modern convolutional object detectors (CVPR 2017) Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Object Detection Models

Section 1 Background

Background Faster R-CNN R-FCN SSD Classification (top) vs Detection (bottom) Speed/Accuracy Comparison

Two-stage Detection Paradigm Many algorithms break up detection into two components: Box Proposal Object Classification

Regions with CNN features (R-CNN): 2013 Pros: Demonstrated ImageNet classification + AlexNet learned learned features also applicable for detection Significant jump in map performance on PASCAL VOC over state-of-the-art

Regions with CNN features (R-CNN) (continued) Cons: Proposals generated by external algorithm (Selective Search) SVM and bounding box regressors trained separately; saving features to train these is memory intensive Patches extracted from raw image; overlapping proposals lead to repeated CNN computation for classification Detection takes 47 s/image (GPU)

Fast R-CNN Single pass of entire image through CNN; patches extracted from convolutional feature maps Final classification and bounding box regression trained jointly with CNN Test time detection is 213x as fast as R-CNN Proposals still generated by external algorithm

Two-stage Detection Paradigm - Revisited

Section 2 Faster R-CNN

Two-stage Detection Paradigm - Revisited

Faster R-CNN: Model Architecture Box proposal and classification share convolutional computation Patches extracted from convolutional feature map Detection 5 fps

Faster R-CNN: Region Proposal Network Sliding window passed over final convolutional feature maps Box proposals are parameterized relative to fixed size reference boxes (termed anchors ) Multiple scales and and aspect ratios (eg 1x1, 2x1, 1x2, 2x2, etc.)

Faster R-CNN: Training

Faster R-CNN: Loss RPN Loss: L(p i, t i ) = 1 L cls (p i, p i ) + λ 1 p i L reg (t i, t i ) (1) N cls N reg i Bounding box parameterization: t x = (x x a )/w a, t y = (y y a )/h a t w = log(w/w a ), t h = log(h/h a ) t x = (x x a )/w a, t y = (y y a )/h a t w = log(w /w a ), t h = log(h /h a ) i Localization loss: L reg (t, t ) = smooth L1 (x) = j x,y,w,h smooth L1 (t j t j), { 0.5x 2 if x < 1 x 0.5 otherwise

Faster R-CNN: ResNet ResNet weights:

Section 3 R-FCN

Region-based Fully Convolutional Networks: Inspiration Fast and Faster R-CNN save time by sharing computation of repeated convolutional features for object classification and region proposals, respectively However, Faster R-CNN still contains several unshared fully connected layers that must be computed for each of hundreds of proposals Dilemma of opposing needs for translation invariance (classification) and translation variance (localization) Leads to unnatural insertion of region proposals between convolutional layers

Region-based Fully Convolutional Networks: Model Architecture

Background Faster R-CNN R-FCN SSD Speed/Accuracy Comparison Region-based Fully Convolutional Networks: Visualization

Region-based Fully Convolutional Networks: Benefits and Contributions 2.5-20x faster than Faster R-CNN Comparable accuracy Easy Online Hard Example Mining (OHEM) Use of atrous convolutions

Section 4 SSD

Two-stage Detection Paradigm - Revisited Again

Single Shot Multibox Detector: Model Architecture

Single Shot Multibox Detector: Default boxes

Section 5 Speed/Accuracy Comparison

CNNs and Meta Architectures Summary

Accuracy vs Speed

Accuracy/Speed vs Box proposals

Conclusions Found a speed-accuracy performance frontier For two-stage set-ups, more box proposals leads to better performance, but at the cost of speed and significantly diminishing returns. General correlation of convolutional feature extractor performance on ImageNet classification and performance as part of an object detector. Higher resolution images lead to higher quality localization, but at the cost of speed and memory. A similar trade-off exists with respect to convolutional feature extractor output stride (atrous convolutions). Won 2016 MS COCO object detection challenge by ensembling these implementations.

Background Faster R-CNN R-FCN SSD Open-Sourced TensorFlow Implementations Speed/Accuracy Comparison