Rich feature hierarchies for accurate object detection and semantic segmentation


Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. Presented by Pandian Raju and Jialin Wu

Last class: SGD for document recognition (LeCun et al. 1998) and ImageNet classification (Krizhevsky et al. 2012). What problems did they solve?

Image classification: ImageNet sample image. Image credit: http://rocknrollnerd.github.io/ml/2015/05/27/leopard-sofa.html

Image classification: ImageNet sample image. Image credit: https://www.pinterest.com/explore/facts-about-tigers

Image classification: ImageNet sample image. Image credit: https://www.youtube.com/watch?v=a1ofpnxiwvm

Image classification: MS COCO sample image. Image credit: https://github.com/pdollar/coco/blob/master/pythonapi/pycocodemo.ipynb

Object detection: MS COCO sample image. Image credit: https://github.com/pdollar/coco/blob/master/pythonapi/pycocodemo.ipynb

Semantic segmentation. Image credit: http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review

Different visual recognition tasks: image classification, object detection, semantic segmentation.

History

PASCAL VOC detection history. Image credit: Ross Girshick

DPM (Deformable Parts Model). Image credit: http://www.embeddedvisionsystems.it/solutions/ip2lib/117-dpm

Feature learning with CNNs: Fukushima 1980 (Neocognitron); LeCun et al. 1998 (SGD for document recognition); Krizhevsky et al. 2012 (ImageNet classification, AlexNet).

Brute force (classify every window at every position and scale)? Forget it!
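A rough count shows why brute force is impractical. The sketch below counts how many windows an exhaustive sliding-window search would have to classify; the image size, window shapes and stride are hypothetical values chosen for illustration, not numbers from the slides.

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Number of windows an exhaustive sliding-window search would classify.

    win_sizes: (width, height) window shapes to try; stride: step in pixels.
    """
    total = 0
    for w, h in win_sizes:
        if w > img_w or h > img_h:
            continue                            # window does not fit in the image
        nx = (img_w - w) // stride + 1          # horizontal positions
        ny = (img_h - h) // stride + 1          # vertical positions
        total += nx * ny
    return total

# Hypothetical setting: a 500x375 image, square windows from 32 to 368 pixels
# plus 2:1 wide windows, moved 4 pixels at a time.
sizes = [(s, s) for s in range(32, 369, 16)] + [(2 * s, s) for s in range(32, 189, 16)]
n = count_windows(500, 375, sizes, stride=4)
print(n, "windows vs. about 2000 region proposals")
```

Every one of those windows would need its own CNN forward pass, which is what region proposals avoid.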

Regions: Gu et al. 2009, Recognition using regions.

R-CNN high-level flow: (1) generate category-independent region proposals; (2) extract a feature vector for each region using a CNN; (3) classify each region using one linear SVM per class.
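A minimal sketch of that three-step flow, assuming NumPy. It is not the authors' implementation: `propose_regions`, `warp` and `cnn_features` are hypothetical stand-ins (random boxes instead of selective search, a nearest-neighbour resize, and average pooling plus a fixed random projection instead of a trained CNN). The 227×227 warp size and the 4096-d feature length follow R-CNN's AlexNet-style network.

```python
import numpy as np

rng = np.random.default_rng(0)
WARP = 227        # each proposal is warped to a fixed 227x227 input
FEAT_DIM = 4096   # length of the per-region feature vector
POOL = 8          # hypothetical: grid size used by the fake feature extractor
PROJ = rng.standard_normal((POOL * POOL * 3, FEAT_DIM)) / np.sqrt(POOL * POOL * 3)

def propose_regions(image, n=2000):
    """Hypothetical stand-in for selective search: n random boxes (x1, y1, x2, y2)."""
    h, w = image.shape[:2]
    xs = np.sort(rng.integers(0, w, size=(n, 2)), axis=1)
    ys = np.sort(rng.integers(0, h, size=(n, 2)), axis=1)
    return np.stack([xs[:, 0], ys[:, 0], xs[:, 1] + 1, ys[:, 1] + 1], axis=1)

def warp(image, box, size=WARP):
    """Crop a box and resize it to size x size (nearest neighbour, ignoring aspect ratio)."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]

def cnn_features(patch):
    """Hypothetical CNN stand-in: average-pool to a POOLxPOOLx3 grid, project to 4096-d, ReLU."""
    s = 224 // POOL                            # crop to 224 pixels so the grid divides evenly
    p = patch[:224, :224].reshape(POOL, s, POOL, s, 3).mean(axis=(1, 3))
    return np.maximum(p.reshape(-1) @ PROJ, 0.0)

def detect(image, svm_W, svm_b, n_proposals=2000):
    boxes = propose_regions(image, n_proposals)                      # step 1: proposals
    feats = np.stack([cnn_features(warp(image, b)) for b in boxes])  # step 2: (N, 4096)
    scores = feats @ svm_W + svm_b                                   # step 3: (N, classes)
    return boxes, scores

image = rng.random((375, 500, 3))
classes = ["cat", "dog", "car"]
W = 0.01 * rng.standard_normal((FEAT_DIM, len(classes)))   # untrained, illustrative weights
b = np.zeros(len(classes))
boxes, scores = detect(image, W, b, n_proposals=200)
print(boxes.shape, scores.shape)  # (200, 4) (200, 3)
```

Swapping in a real proposal method and a trained network changes only the first two functions; the per-class scoring stays a single matrix product.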

Region proposals: about 2000 region proposals per image, generated by the selective search algorithm (Uijlings et al. 2012), which combines the strengths of exhaustive search and segmentation.
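Selective search produces proposals by hierarchical grouping: start from an over-segmentation, repeatedly merge the two most similar neighbouring regions, and keep the bounding box of every region created along the way. The sketch below shows only that grouping loop under strong simplifying assumptions: the initial regions are given directly as boxes with colour histograms (no real segmentation), "neighbours" means overlapping or touching boxes, and similarity is just histogram intersection plus a size term, a reduced version of the measures in the selective search paper. The helper names are hypothetical.

```python
import numpy as np

def touches(a, b):
    """True if boxes (x1, y1, x2, y2) overlap or share an edge (simplified adjacency)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def similarity(r, s, img_size):
    colour = np.minimum(r["hist"], s["hist"]).sum()   # histogram intersection
    size = 1.0 - (r["size"] + s["size"]) / img_size   # favour merging small regions first
    return colour + size

def merge(r, s):
    size = r["size"] + s["size"]
    hist = (r["size"] * r["hist"] + s["size"] * s["hist"]) / size   # size-weighted histogram
    box = (min(r["box"][0], s["box"][0]), min(r["box"][1], s["box"][1]),
           max(r["box"][2], s["box"][2]), max(r["box"][3], s["box"][3]))
    return {"box": box, "size": size, "hist": hist}

def hierarchical_grouping(regions, img_size):
    """Greedy grouping; returns the box of every region in the hierarchy."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    active = set(range(len(regions)))
    while len(active) > 1:
        pairs = [(similarity(regions[i], regions[j], img_size), i, j)
                 for i in active for j in active
                 if i < j and touches(regions[i]["box"], regions[j]["box"])]
        if not pairs:
            break                                     # remaining regions are not adjacent
        _, i, j = max(pairs)                          # most similar neighbouring pair
        regions.append(merge(regions[i], regions[j]))
        active -= {i, j}
        active.add(len(regions) - 1)
        proposals.append(regions[-1]["box"])
    return proposals

def region(box, hist):
    x1, y1, x2, y2 = box
    h = np.asarray(hist, dtype=float)
    return {"box": box, "size": (x2 - x1) * (y2 - y1), "hist": h / h.sum()}

# Four quadrants of a 100x100 image: two reddish (top), two bluish (bottom).
init = [region((0, 0, 50, 50), [8, 1, 1]), region((50, 0, 100, 50), [7, 2, 1]),
        region((0, 50, 50, 100), [1, 1, 8]), region((50, 50, 100, 100), [1, 2, 7])]
boxes = hierarchical_grouping(init, img_size=100 * 100)
print(len(boxes), boxes[-1])  # 7 (0, 0, 100, 100)
```

Four initial regions yield 4 + 3 = 7 proposals at different scales; on real images the same loop over a few hundred segments gives the roughly 2000 boxes mentioned above.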

Feature vector using a CNN: 5 convolutional layers and 2 fully connected layers. Output: a 4096-dimensional feature vector per region.
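Where the 4096-d vector comes from can be traced with the standard output-size formula for convolution and pooling, out = floor((in + 2·pad − kernel) / stride) + 1. The layer hyperparameters below are those of the AlexNet/CaffeNet architecture that R-CNN builds on, with a 227×227 input; they are not spelled out on the slide, so treat the table as an assumption to check against the paper.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad, output channels); None keeps the input depth (pooling).
LAYERS = [
    ("conv1", 11, 4, 0, 96), ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256), ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384), ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256), ("pool5", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    shapes = {}
    for name, k, s, p, c in LAYERS:
        size = out_size(size, k, s, p)
        channels = channels if c is None else c
        shapes[name] = (size, size, channels)
    shapes["fc6"] = (4096,)   # fully connected layer on the flattened pool5 output
    shapes["fc7"] = (4096,)   # the 4096-d feature vector passed to the SVMs
    return shapes

shapes = trace_shapes()
for name, shape in shapes.items():
    print(name, shape)
```

pool5 comes out as 6×6×256, i.e. 9216 values, which the two fully connected layers reduce to 4096 dimensions.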

Linear SVMs: the CNN feature vector of a sample region is scored by one linear SVM per class (SVM for cat: no; SVM for dog: yes; SVM for lion: no).
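Each per-class SVM is a linear function of the region's feature vector, score = w·x + b, trained to separate that class's regions from everything else. The sketch below trains one such binary classifier by full-batch subgradient descent on the L2-regularised hinge loss. This is a simple stand-in, not necessarily the solver used in the paper (which also relies on hard negative mining, omitted here); the data and hyperparameters are synthetic.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM: minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).

    Full-batch subgradient descent; X is (n, d), y holds labels in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                                    # examples inside the margin
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
d = 16
X_pos = rng.normal(2.0, 1.0, size=(100, d))    # synthetic "dog" region features
X_neg = rng.normal(-2.0, 1.0, size=(100, d))   # synthetic background / other classes
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print("training accuracy:", (pred == y).mean())
```

At test time, thresholding each class's score at 0 gives the per-class yes/no decisions on the slide.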

Challenges: localization, addressed with region proposals and bounding box regression; a limited training set, addressed with supervised pre-training followed by fine-tuning.

Training: supervised pre-training of the CNN on ILSVRC 2012, fine-tuning with SGD on regions from PASCAL VOC, then training one linear SVM per class (e.g. cat, dog, car) on the resulting features.
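The two training stages label regions differently. The R-CNN paper reports that during fine-tuning a proposal is a positive for a class when its IoU with a ground-truth box of that class is at least 0.5 (otherwise background), whereas for the SVMs only the ground-truth boxes are positives, proposals with IoU below 0.3 against every instance of the class are negatives, and the grey zone in between is ignored. The sketch below encodes those thresholds; the function names and the label convention (0 = background) are illustrative choices, not the paper's code.

```python
import numpy as np

def iou_matrix(boxes, gt):
    """Pairwise IoU between boxes (N, 4) and gt (M, 4), format (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_b[:, None] + area_g[None, :] - inter)

def finetune_labels(proposals, gt_boxes, gt_classes, pos_thresh=0.5):
    """Class of the best-overlapping GT box if IoU >= 0.5, else 0 (background)."""
    ious = iou_matrix(proposals, gt_boxes)
    labels = gt_classes[ious.argmax(axis=1)].copy()
    labels[ious.max(axis=1) < pos_thresh] = 0
    return labels

def svm_negatives(proposals, gt_boxes, gt_classes, cls, neg_thresh=0.3):
    """Mask of proposals used as negatives for class `cls`'s SVM.

    Positives are the GT boxes of `cls` themselves; proposals overlapping a `cls`
    instance with IoU >= 0.3 fall in the grey zone and are ignored (mask False).
    """
    own = gt_boxes[gt_classes == cls]
    if len(own) == 0:
        return np.ones(len(proposals), dtype=bool)
    return iou_matrix(proposals, own).max(axis=1) < neg_thresh

gt = np.array([[10, 10, 60, 60]], dtype=float)
gt_cls = np.array([2])                                   # hypothetical class id 2 = "dog"
props = np.array([[12, 12, 58, 62],                      # IoU ~0.85
                  [20, 20, 70, 70],                      # IoU ~0.47
                  [100, 100, 150, 150]], dtype=float)    # no overlap
print(finetune_labels(props, gt, gt_cls))           # labels 2, 0, 0
print(svm_negatives(props, gt, gt_cls, cls=2))      # negatives: no, no, yes
```

The middle proposal shows the difference: it is background for fine-tuning but ignored when training the SVM.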

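Testing: intersection-over-union (IoU) measures how well two boxes overlap, as the area of their intersection divided by the area of their union. Image credit: http://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/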
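A minimal IoU function, assuming boxes as (x1, y1, x2, y2) in continuous coordinates. Some codebases instead treat coordinates as inclusive pixel indices and add 1 to every width and height, which changes the values slightly for small boxes.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # width of the overlap
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # height of the overlap
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```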

Testing: remove redundant, overlapping detections with non-maximum suppression (example SVM scores: 0.87 and 0.72).
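Greedy non-maximum suppression, applied independently for each class, can be sketched as follows. The paper rejects a region whose IoU with a higher-scoring selected region exceeds a learned threshold; the default of 0.3 below is only a placeholder, and the box coordinates and the third score are made up around the slide's 0.87 and 0.72 example.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS for one class; returns indices of kept boxes, best first.

    Keeps the highest-scoring box, drops every remaining box whose IoU with it
    exceeds iou_thresh, and repeats on what is left.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + area_r - inter)
        order = rest[overlap <= iou_thresh]          # survivors go to the next round
    return keep

boxes = np.array([[50, 50, 150, 150],       # scored 0.87
                  [60, 55, 160, 150],       # heavy overlap with the first, scored 0.72
                  [300, 300, 380, 400]],    # separate object
                 dtype=float)
scores = np.array([0.87, 0.72, 0.40])
print(nms(boxes, scores))  # [0, 2]
```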

Object detection results. Slide credit: Ross Girshick

Semantic segmentation results: R-CNN extends easily to the semantic segmentation task. It is compared with O2P, then the leading method for the task, which uses CPMC (Constrained Parametric Min-Cuts). Image credit: Ross Girshick

PROS

Intuitive: combines region proposals with CNN features.

Performant: easily scales with the number of classes.

Run time analysis: the feature vectors are computed once per region and shared by all SVM classes (the CNN is pre-trained on ImageNet and fine-tuned with SGD on PASCAL VOC regions). The feature vectors are low-dimensional (4K).
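Because the features are shared, scoring every class is a single matrix product: the N×4096 feature matrix times a 4096×K matrix holding one SVM weight vector per column. The R-CNN paper describes the class-specific computation in these terms (a matrix-matrix product plus NMS). The sizes mirror the slide's 2000 regions and 4K features; the values are random, and the printed timing depends entirely on the machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n_regions, feat_dim, n_classes = 2000, 4096, 20   # 20 classes, as in PASCAL VOC

features = rng.standard_normal((n_regions, feat_dim)).astype(np.float32)  # computed once
W = rng.standard_normal((feat_dim, n_classes)).astype(np.float32)         # one SVM per column
b = np.zeros(n_classes, dtype=np.float32)

start = time.perf_counter()
scores = features @ W + b          # (2000, 20): every region scored by every class
elapsed = time.perf_counter() - start
print(scores.shape, f"{elapsed * 1e3:.1f} ms")

# More classes only widen W; the expensive CNN features are reused unchanged.
W_big = rng.standard_normal((feat_dim, 1000)).astype(np.float32)
print((features @ W_big).shape)    # (2000, 1000)
```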

Run time analysis. Slide credit: Ross Girshick

Impact: one of the commonly used methods for semantic segmentation.

Ablation studies: analyzing the performance impact of different layers.

Ablation studies: effect of different layers and of fine-tuning on mAP (mean average precision). Without fine-tuning, fc7 generalizes worse than fc6. Representational power: conv layers > fully connected layers. Fine-tuning increases mAP by 8 percentage points.
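mAP is the mean over classes of each class's average precision (AP). A minimal sketch of AP for one class, assuming detections have already been matched to ground truth (PASCAL VOC counts a detection as a true positive when it overlaps a not-yet-matched ground-truth box with IoU of at least 0.5). It computes the all-point interpolated area under the precision/recall curve; the VOC 2007 benchmark used an 11-point interpolation instead, so numbers can differ slightly. The example inputs are made up.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP for one class.

    scores: detection confidences; is_tp: 1 if the detection matched a GT box;
    n_gt: number of ground-truth objects of the class.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))   # highest score first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / n_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # Interpolate: precision at recall r is the best precision at any recall >= r.
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_ap(per_class):
    """per_class: list of (scores, is_tp, n_gt) tuples, one per class."""
    return float(np.mean([average_precision(*c) for c in per_class]))

ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], n_gt=3)
print(round(ap, 3))  # 0.833
print(round(mean_ap([([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], 3), ([0.9, 0.8], [1, 1], 2)]), 3))  # 0.917
```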

Visualization of the network: showing which features each layer learns.

Visualization of the network: what features does each layer learn? Image credit: Ross Girshick

Evaluation: compared with several other baselines and methods.

Failure modes: analyzed the common failure modes and suggested a solution (bounding box regression).

Detection error analysis. Image credit: Ross Girshick

Bounding box regression: most errors are mislocalizations. BB regression trains a linear regression model to predict a new detection window from the pool5 features of a region proposal.
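The R-CNN paper parameterises the regression targets relative to the proposal P (centre x, centre y, width, height) and its ground-truth box G as t_x = (G_x − P_x)/P_w, t_y = (G_y − P_y)/P_h, t_w = log(G_w/P_w), t_h = log(G_h/P_h), and fits the linear map from features to targets with ridge (L2-regularised least squares) regression. The sketch below implements that encoding, its inverse, and a closed-form ridge fit; the boxes, the 32-d synthetic features (pool5 is 9216-d in R-CNN) and the regularisation strength are made-up values.

```python
import numpy as np

def to_cwh(boxes):
    """(x1, y1, x2, y2) -> centre x, centre y, width, height."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h, w, h

def encode(P, G):
    """Regression targets (tx, ty, tw, th) mapping proposals P to ground truth G."""
    px, py, pw, ph = to_cwh(P)
    gx, gy, gw, gh = to_cwh(G)
    return np.stack([(gx - px) / pw, (gy - py) / ph, np.log(gw / pw), np.log(gh / ph)], axis=1)

def decode(P, d):
    """Apply predicted offsets d to proposals P, returning (x1, y1, x2, y2) boxes."""
    px, py, pw, ph = to_cwh(P)
    cx, cy = px + pw * d[:, 0], py + ph * d[:, 1]
    w, h = pw * np.exp(d[:, 2]), ph * np.exp(d[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=1)

def fit_ridge(X, T, lam=1.0):
    """Closed-form ridge regression (X^T X + lam I)^-1 X^T T.

    A ones column is appended for the bias, which is regularised too for brevity.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ T)

P = np.array([[10.0, 10.0, 50.0, 60.0], [100.0, 80.0, 180.0, 200.0]])
G = np.array([[14.0, 8.0, 58.0, 62.0], [90.0, 85.0, 170.0, 190.0]])
t = encode(P, G)
print(np.allclose(decode(P, t), G))  # True: decode inverts encode

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))                    # synthetic stand-in for pool5 features
T = 0.1 * X @ rng.standard_normal((32, 4)) + 0.01 * rng.standard_normal((500, 4))
W = fit_ridge(X, T, lam=1.0)
pred = np.hstack([X, np.ones((500, 1))]) @ W
print(np.abs(pred - T).mean())                        # small residual on this synthetic data
```

Normalising the centre offsets by the proposal size and using log-space for width and height makes the targets scale-invariant, so one regressor per class works for small and large proposals alike.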

Source code: properly documented source code on GitHub.

Source code. Image credit: https://github.com/rbgirshick/rcnn

CONS

Computationally costly: every proposal has to go through the whole network.

Needs two steps for detection: the proposal step and the classification step cannot be unified.

Uses SVMs: no end-to-end training.

Violates spatial translation invariance: the culprits are the FC layers.

No global information.

Person or not?

The idea is simple: essentially no more than image classification.