Yiqi Yan. May 10, 2017

Similar documents
Spatial Localization and Detection. Lecture 8-1

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Object Detection Based on Deep Learning

Object detection with CNNs

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Lecture 5: Object Detection

OBJECT DETECTION HYUNG IL KOO

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep Learning for Object detection & localization

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

YOLO 9000 TAEWAN KIM

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

YOLO9000: Better, Faster, Stronger

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Rich feature hierarchies for accurate object detection and semantic segmentation

Object Detection on Self-Driving Cars in China. Lingyun Li

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Yield Estimation using faster R-CNN

Rich feature hierarchies for accurate object detection and semantic segmentation

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Unified, real-time object detection

R-FCN: Object Detection with Really - Friggin Convolutional Networks

Regionlet Object Detector with Hand-crafted and CNN Feature

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Optimizing Object Detection:

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

DEEP NEURAL NETWORKS FOR OBJECT DETECTION

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Automatic detection of books based on Faster R-CNN

Computer Vision Lecture 16

Object Detection. Part1. Presenter: Dae-Yong

Self Driving. DNN * * Reinforcement * Unsupervised *

Real-time Object Detection CS 229 Course Project

G-CNN: an Iterative Grid Based Object Detector

Deep Residual Learning

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Final Report: Smart Trash Net: Waste Localization and Classification

Object Detection with YOLO on Artwork Dataset

Computer Vision Lecture 16

Fully Convolutional Networks for Semantic Segmentation

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Rich feature hierarchies for accurate object detection and semant

Lecture 7: Semantic Segmentation

Visual features detection based on deep neural network in autonomous driving tasks

Cascade Region Regression for Robust Object Detection

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

Computer Vision Lecture 16

Pixel Offset Regression (POR) for Single-shot Instance Segmentation

Advanced Video Analysis & Imaging

Classifying a specific image region using convolutional nets with an ROI mask as input

arxiv: v1 [cs.cv] 26 Jun 2017

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Project 3 Q&A. Jonathan Krause

Feature-Fused SSD: Fast Detection for Small Objects

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Todo before next class

Vehicle Classification on Low-resolution and Occluded images: A low-cost labeled dataset for augmentation

G-CNN: an Iterative Grid Based Object Detector

Zero-shot learning with Fast RCNN and Mask RCNN through semantic attribute Mapping

Modern Convolutional Object Detectors

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Photo OCR ( )

arxiv: v1 [cs.cv] 15 Oct 2018

Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation

Instance-aware Semantic Segmentation via Multi-task Network Cascades

A Comparison of CNN-based Face and Head Detectors for Real-Time Video Surveillance Applications

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Mimicking Very Efficient Network for Object Detection

Object Detection in Sports Videos

Why Is Recognition Hard? Object Recognizer

An Exploration of Computer Vision Techniques for Bird Species Classification

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

Optimizing Object Detection:

A Novel Representation and Pipeline for Object Detection

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Optimizing CNN-based Object Detection Algorithms on Embedded FPGA Platforms

CS 1674: Intro to Computer Vision. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh November 16, 2016

Structured Prediction using Convolutional Neural Networks

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

arxiv: v1 [cs.cv] 5 Oct 2015

SSD: Single Shot MultiBox Detector

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 11. Sinan Kalkan

Hand Detection For Grab-and-Go Groceries

Classification of objects from Video Data (Group 30)

A Deep Learning Framework for Authorship Classification of Paintings

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Machine Learning 13. week

Transcription:

Yiqi Yan May 10, 2017

P a r t I F u n d a m e n t a l B a c k g r o u n d s

Convolution Single Filter Multiple Filters 3

Convolution: case study, 2 filters 4

Convolution: receptive field receptive field 3 X 3 The deeper the network goes, the larger the receptive fields will be 5

Pooling Pooling on one single channel Pooling on the whole feature map 6

Computer Vision Tasks this is a cat here is a cat 7

Computer Vision Tasks Multiple Objects in one image! Pixel-wise Classification! 8

P a r t I I S t a t e - of- a r t m e t h o d s f o r o b j e c t d e t e c t i o n 9

Group One: RCNN & Modifications (CVPR 2014) RCNN: Region Based CNN (TOO SLOW!) (ECCV 2014) SPP-net: Spatial Pyramid Pooling in CNN (ICCV 2015) Fast RCNN (NIPS 2015) Faster RCNN (Online Object Detection) Group Two: Fast! Online Object Detection (ECCV 2016) SSD: Single Shot MultiBox Detector (CVPR 2016) YOLO (CVPR 2017) YOLO9000 Group Three: Deformable Convolutional Filters (CVPR 2015) DeepID-Net (arxiv 2017) Deformable Convolutional Networks Group Four: Detection + Segmentation (arxiv 2017) Mask R-CNN 10

P a r t I I I R e g i o n B a s e d C N N 11

Motivation Problem One: CNN seems unsuited for Object Detection Unlike image classification, detection requires localizing objects within an image Deep CNNs have very large receptive fields, which makes precise localization very challenging Problem Two: Deep networks need large dataset to train 12

Framework Solve Problem One Extract region proposals Recognition using regions Solve Problem Two Supervised Pre-training on ImageNet Domain-specified fine-tuning 13

Region Proposals Find blobby image regions that are likely to contain objects 14

Region Proposals: Selective Search over-segmentation region-merging Uijlings et al, Selective Search for Object Recognition, IJCV 2013 15

Training method Step 1: Supervised Pre-training Train a classification model for ImageNet (AlexNet, VGGNet, etc.) No localization can be done, thus this is just pre-training the parameters. VGG-19 (2015) 16

Training method Step 2: Domain-specified fine-tuning Change network architecture Instead of 1000 ImageNet classes, want N object classes + background (N+1 classes) Need to reinitialize the soft-max layer Fine-tuning using Region Proposals Keep training model using positive / negative regions from detection images This time, use Detection Datasets (VOC, ILCVRC, COC, etc.) 17

Run Detection Now we get the trained network, let us test it! Step 1: Extract region proposals for all images Step 2: (for each region) run through CNN, save pool5 features Step 3: use binary SVM to classify region features (WHY NOT just use soft-max) Step 5: bounding box regression: For each class, train a linear regression model to make up for slightly wrong proposals 18

Run Detection 19

P a r t I V F a s t R C N N 20

R-CNN Problems: too slow! Training is a multi-stage pipeline: RCNN SVMs bounding-box regression Training is expensive in space and time CNN features are stored for use of training SVMs and regression ~200GB disk place for PASCAL dataset! Object detection is slow: features are extracted from each object proposal 47s / image on a GPU! 21

Fast R-CNN Framework 22

ROI (region of interest) Pooling Pooling each region into a fixed size (7 X 7 in the paper) Back propagate similar to traditional max pooling 23

Multi-task loss The network outputs two vectors per ROI 1. Soft-max probabilities for classification 2. Per-class bounding-box regression offsets 1 st term: traditional cross-entropy loss for soft-max 2 nd term: error between predicted and true bounding-box 24

Why superior to RCNN: problem #1 & #2 RCNN problem #1: Training is a multi-stage pipeline RCNN problem #2: Training is expensive in space and time Faster RCNN: end-to-end training; no need to store features 25

Why superior to RCNN: problem #3 RCNN problem #3: Object detection is slow because features are extracted from each object proposal Fast RCNN: just run the whole image through CNN; regions are extracted from feature map 26

Comparison Fast RCNN is not fast enough! Bottleneck: Selective Search Region Proposal 27

P a r t V F a s t e r R C N N O n l i n e D e t e c t i o n! 28

Fast RCNN is not fast enough Main bottleneck: Selective Search Region Proposal Faster RCNN: Why not just make the CNN do region proposals too! 29

Faster RCNN framework Insert a Region Proposal Network (RPN) trained to produce region proposals directly ROI Pooling, soft-max classifier and bounding box regression are just like Fast RCNN 30

Region Proposal Network Similar to the multi-task training in fast RCNN: classification + bounding box prediction. The difference is that we only need two-class classification here: object & not object 31

End-to-end joint training! RPN classification RPN bbx regression Fast RCNN classification Fast R-CNN bbx regression 32

Comparison 33

P a r t V I Y O L O : Yo u O n l y L o o k O n O n c e 34

YOLO Framework Divide image into S x S grid (7 X 7 in the parper) Within each grid cell predict: B Boxes: 4 coordinates + confidence C Class scores Regression from image to 7 x 7 x (5 * B + C) tensor Direct prediction using a CNN 35

Comparison Faster than Faster R-CNN, but not as good 36

Great Contributors Ross Girshick: http://www.rossgirshick.info/ Kaiming He: http://kaiminghe.com/ Joseph Chet Redmon: https://pjreddie.com/ Code R-CNN (Cafffe+MATLAB): https://github.com/rbgirshick/rcnn Fast R-CNN (Caffe+MATLAB): https://github.com/rbgirshick/fast-rcnn Faster R-CNN (Caffe+MATLAB): https://github.com/shaoqingren/faster_rcnn (Caffe+Python): https://github.com/rbgirshick/py-faster-rcnn YOLO: http://pjreddie.com/darknet/yolo/ 37