Object Detection Based on Deep Learning

Similar documents
Object detection with CNNs

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Spatial Localization and Detection. Lecture 8-1

Lecture 5: Object Detection

Unified, real-time object detection

Yiqi Yan. May 10, 2017

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

OBJECT DETECTION HYUNG IL KOO

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Rich feature hierarchies for accurate object detection and semantic segmentation

YOLO9000: Better, Faster, Stronger

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Feature-Fused SSD: Fast Detection for Small Objects

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Rich feature hierarchies for accurate object detection and semantic segmentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

G-CNN: an Iterative Grid Based Object Detector

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection on Self-Driving Cars in China. Lingyun Li

Deep Learning for Object detection & localization

SSD: Single Shot MultiBox Detector

Category-level localization

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

arxiv: v1 [cs.cv] 4 Jun 2015

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

Real-time Object Detection CS 229 Course Project

Optimizing Object Detection:

arxiv: v1 [cs.cv] 26 May 2017

arxiv: v1 [cs.cv] 3 Apr 2016

A Novel Representation and Pipeline for Object Detection

Computer Vision Lecture 16

arxiv: v2 [cs.cv] 30 Mar 2016

CPSC340. State-of-the-art Neural Networks. Nando de Freitas November, 2012 University of British Columbia

Computer Vision Lecture 16

Object Detection with YOLO on Artwork Dataset

CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury

Localization-Aware Active Learning for Object Detection

PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

Object Recognition II

arxiv: v1 [cs.cv] 26 Jun 2017

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

arxiv: v4 [cs.cv] 30 Nov 2016

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Hand Detection For Grab-and-Go Groceries

arxiv: v1 [cs.cv] 15 Oct 2018

Optimizing Object Detection:

Learning Representations for Visual Object Class Recognition

arxiv: v1 [cs.cv] 1 Apr 2017

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Computer Vision Lecture 16

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Cascade Region Regression for Robust Object Detection

arxiv: v2 [cs.cv] 8 Apr 2018

Adaptive Object Detection Using Adjacency and Zoom Prediction

Future directions in computer vision. Larry Davis Computer Vision Laboratory University of Maryland College Park MD USA

Deformable Part Models

Regionlet Object Detector with Hand-crafted and CNN Feature

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Traffic Multiple Target Detection on YOLOv2

Modern Convolutional Object Detectors

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

Abandoned Luggage Detection

Single-Shot Refinement Neural Network for Object Detection -Supplementary Material-

Improving Small Object Detection

arxiv: v1 [cs.cv] 6 Jul 2017

Deep Neural Networks:

arxiv: v2 [cs.cv] 22 Sep 2014

Object Detection in Sports Videos

Yield Estimation using faster R-CNN

arxiv: v1 [cs.cv] 18 Jun 2017

Self Driving. DNN * * Reinforcement * Unsupervised *

segdeepm: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection

Visual features detection based on deep neural network in autonomous driving tasks

arxiv: v1 [cs.cv] 5 Oct 2015

Object Recognition and Detection

You Only Look Once: Unified, Real-Time Object Detection

TS 2 C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

arxiv: v2 [cs.cv] 13 Jun 2017

Mimicking Very Efficient Network for Object Detection

arxiv: v1 [cs.cv] 15 Aug 2018

arxiv: v1 [cs.cv] 13 Jul 2018

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

arxiv: v2 [cs.cv] 28 Jul 2017

Video Semantic Indexing using Object Detection-Derived Features

Project 3 Q&A. Jonathan Krause

arxiv: v1 [cs.cv] 9 Aug 2017

arxiv: v1 [cs.cv] 23 Apr 2015

Gated Bi-directional CNN for Object Detection

YOLO 9000 TAEWAN KIM

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Transcription:

Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016

Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf 2

Classification vs. Detection http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf 3

Benchmarks PASCAL VOC 2007/2012 ILSVRC MS COCO 4

PASCAL VOC 2007/2012 20 classes: Person: person Animal: bird, cat, cow, dog, horse, sheep Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor Train/val size: o VOC 2007 has 9,963 images containing 24,640 annotated objects. o VOC 2012 has 11,530 images containing 27,450 http://host.robots.ox.ac.uk/pascal/voc/ 5

ILSVRC (DET) 200 object classes 527,982 images http://image-net.org/ 6

MS COCO 91 categories (80 available) 123,287 images, 886,284 instances http://mscoco.org/ 7

Evaluation We use a metric called mean average precision (map) Compute average precision (AP) separately for each class, then average over classes A detection is a true positive if it has IoU with a ground-truth box greater than some threshold (usually 0.5) (map@0.5) Combine all detections from all test images to draw a precision / recall curve for each class; AP is area under the curve 8

Detection as Classification Problem: Need to test many positions and scales, and use a computationally demanding classifier (CNN) Solution: Only look at a tiny subset of possible positions http://cs231n.github.io/neural-networks-3/ Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 8-48 1 Feb 2016 9

Region Proposals. Selective Search J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, Selective Search for Object Recognition, IJCV 2013 10

Selective search detection pipeline Selective search + SIFT + bag-of-words + SVMs 11

R-CNN Regions: ~2000 Selective Search proposals Network: AlexNet pre-trained on ImageNet (1000 classes), finetuned on PASCAL (21 classes) Final detector: warp proposal regions, extract fc7 network activations (4096 dimensions), classify with linear SVM Bounding box regression to refine box locations Performance: map of 53.7% on PASCAL 2010 (vs. 35.1% for Selective Search and 33.4% for DPM). Ross Girshick Jeff Donahue Trevor Darrell Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014 12

R-CNN pros and cons Pros Accurate! Any deep architecture can immediately be plugged in Cons Ad hoc training objectives o Fine-tune network with softmax classifier (log loss) o Train post-hoc linear SVMs (hinge loss) o Train post-hoc bounding-box regressions (least squares) Training is slow (84h), takes a lot of disk space 2000 convnet passes per image Inference (detection) is slow (47s / image with VGG16) Ross Girshick Jeff Donahue Trevor Darrell Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014 13

Spatial Pyramid Pooling Layer In each candidate window, used a 4-level spatial pyramid: 1 1 2 2 3 3 6 6 Totally 50 bins to pool the features. This generates a 12,800- d (256 50) representation for each window K. Grauman and T. Darrell, The pyramid match kernel: Discriminative classification with sets of image features, in ICCV, 2005. S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, in CVPR, 2006. 14

SPP-net Apply b-box regressors Classify regions with SVMs Fully-connected layers ROIs from proposal method Spatial Pyramid Pooling layer conv5 feature map of image Forward whole image through ConvNet Input image He, K., Zhang, X., Ren, S., and Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/1406.4729v2, 2014 15

What s good about SPP-net? Pascal VOC 2007 results its realy faster He, K., Zhang, X., Ren, S., and Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/1406.4729v2, 2014 16

What s wrong with SPP'net? Inherits the rest of R-CNN s problems Introduces a new problem: cannot update parameters below SPP layer during training Trainable (3layers) Frozen (13layers) He, K., Zhang, X., Ren, S., and Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/1406.4729v2, 2014 17

Fast R-CNN Fast test time, like SPP-net One network, trained in one stage Higher mean average precision than slow R- CNN and SPP-net R. Girshick. Fast R-CNN. arxiv:1504.08083, 2015 18

Fast R-CNN Results 19

Fast R-CNN Problem 20

Region proposal network ~ 10 ms per image S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015. 21

Faster R-CNN 22 S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.

Faster R-CNN Training http://cs231n.github.io/neural-networks-3/ Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 8-48 1 Feb 2016 23

Faster R-CNN Results 24

Faster R-CNN Results MS COCO ResNet 101 + Faster R-CNN 25

Object detection progress PASCAL VOC 2007 26

ImageNet Detection 2013-2015 27

Next trends Fully convolutional detection networks You Only Look Once (YOLO) Single Shot Multibox Detector (SSD) 28

You Only Look Once YOLO Divide image into SxS grid If the center of an object falls into a grid cell, it will be the responsible for the object. Each grid cell predict: B-boxes Confidence scores Class probability J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arxiv preprint arxiv:1506.02640, 2015 29

YOLO Design J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arxiv preprint arxiv:1506.02640, 2015 30

Fast R-CNN & YOLO Speed > 45 fps J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arxiv preprint arxiv:1506.02640, 2015 31

Limitations Struggle with small objects Struggle with different aspects and ratios of objects Loss function is an approximation Loss function threats errors in different boxes ratio at the same. 32

Single Shot MultiBox Detector (SSD) W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015 33

SSD vs YOLO Architecture W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. E. Reed. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015 34

SSD Results Pascal PASCAL VOC 2007 PASCAL VOC 2012 35

SSD Results on MS COCO 36

Sources R-CNN Caffe + MATLAB : https://github.com/rbgirshick/rcnn Faster R-CNN YOLO SSD Caffe + MATLAB: https://github.com/shaoqingren/faster_rcnn Caffe + Python: https://github.com/rbgirshick/py-faster-rcnn Torch: https://github.com/andreaskoepf/faster-rcnn.torch TensorFlow: https://github.com/smallcorgi/faster-rcnn_tf Darknet: https://github.com/pjreddie/darknet TensorFlow: https://github.com/gliese581gg/yolo_tensorflow Caffe: https://github.com/xingwangsfu/caffe-yolo Caffe: https://github.com/weiliu89/caffe/tree/ssd 37

THANK YOU Yurii Pashchenko george.pashchenko@gmail.com