Rich feature hierarchies for accurate object detection and semant

Similar documents
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Rich feature hierarchies for accurate object detection and semantic segmentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Rich feature hierarchies for accurate object detection and semantic segmentation

Spatial Localization and Detection. Lecture 8-1

Project 3 Q&A. Jonathan Krause

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Yiqi Yan. May 10, 2017

Object detection with CNNs

Using Machine Learning for Classification of Cancer Cells

OBJECT DETECTION HYUNG IL KOO

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection Based on Deep Learning

Refining Bounding-Box Regression for Object Localization

CAP 6412 Advanced Computer Vision

G-CNN: an Iterative Grid Based Object Detector

Refining Bounding-Box Regression for Object Localization. Naomi Lynn Dickerson

Deep Learning for Object detection & localization

WITH the recent advance [1] of deep convolutional neural

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Lecture 5: Object Detection

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Automatic detection of books based on Faster R-CNN

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Know your data - many types of networks

YOLO9000: Better, Faster, Stronger

YOLO: You Only Look Once Unified Real-Time Object Detection. Presenter: Liyang Zhong Quan Zou

Yield Estimation using faster R-CNN

Learning to Segment Object Candidates

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Multi-View 3D Object Detection Network for Autonomous Driving

YOLO 9000 TAEWAN KIM

Regionlet Object Detector with Hand-crafted and CNN Feature

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Gradient of the lower bound

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

An Exploration of Computer Vision Techniques for Bird Species Classification

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Object Detection with Discriminatively Trained Part Based Models

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

Deep Residual Learning

Optimizing Object Detection:

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Lecture 7: Semantic Segmentation

DEEP NEURAL NETWORKS FOR OBJECT DETECTION

Fuzzy Set Theory in Computer Vision: Example 3

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Towards Weakly- and Semi- Supervised Object Localization and Semantic Segmentation

Supplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization

Object Proposal Generation with Fully Convolutional Networks

Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014

Final Report: Smart Trash Net: Waste Localization and Classification

R-FCN: Object Detection with Really - Friggin Convolutional Networks

DeepPose & Convolutional Pose Machines

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Learning to Localize Objects with Structured Output Regression

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

OBJECT detection is one of the most important problems

Object Detection with YOLO on Artwork Dataset

G-CNN: an Iterative Grid Based Object Detector

Classification of objects from Video Data (Group 30)

Overall Description. Goal: to improve spatial invariance to the input data. Translation, Rotation, Scale, Clutter, Elastic

RECOGNIZING objects and localizing them in images is

Machine Learning. Deep Learning. Eric Xing (and Pengtao Xie) , Fall Lecture 8, October 6, Eric CMU,

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

All You Want To Know About CNNs. Yukun Zhu

Deformable Part Models

Why Is Recognition Hard? Object Recognizer

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Object Recognition II

LEARNING TO GENERATE CHAIRS WITH CONVOLUTIONAL NEURAL NETWORKS

CNN Basics. Chongruo Wu

Advanced Video Analysis & Imaging

Structured Prediction using Convolutional Neural Networks

Finding Tiny Faces Supplementary Materials

WE are witnessing a rapid, revolutionary change in our

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

THE goal of action detection is to detect every occurrence

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

Segmentation as Selective Search for Object Recognition in ILSVRC2011

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. By Joa õ Carreira and Andrew Zisserman Presenter: Zhisheng Huang 03/02/2018

CAP 6412 Advanced Computer Vision

Part Localization by Exploiting Deep Convolutional Networks

arxiv: v1 [cs.cv] 31 Mar 2016

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

Modern Object Detection. Most slides from Ali Farhadi

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

Unsupervised Deep Learning. James Hays slides from Carl Doersch and Richard Zhang

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

Computer Vision Lecture 16

CS395T paper review. Indoor Segmentation and Support Inference from RGBD Images. Chao Jia Sep

Computer Vision Lecture 16

Transcription:

Rich feature hierarchies for accurate object detection and semantic segmentation Speaker: Yucong Shen 4/5/2018

Develop of Object Detection 1 DPM (Deformable parts models) 2 R-CNN 3 Fast R-CNN 4 Faster R-CNN 5 Mask R-CNN

Framework of R-CNN 1 Region Proposal 2 Feature extraction 3 Classification

Region Proposal 1 Take an input image, extract 2000 bottom-up region proposals with selective search 2 Selective search: 1 Segment input image into 1K to 2K regions 2 Compute the similarity of neighboring regions 3 Merge the similarity neighboring regions 3 Resize all the proposal regions into 227 227

Selective Search: Color Similarity and Texture Similarity 1 Color Similarity For every region, we will get a color historgram, there would be 25 (75 if RGB) intervals: C i = {c 1 i, c2 i, cn i } S color (r i, r j ) = Σ n k=1 min(ck i, ck j ) 2 Texture Similarity For every region, calculate Guassian derivative for every channel s 8 different direction with variance 1 Obtain the histogram of them: T i = {t 1 i, t2 i,, tn i } S texture (t i, t j ) = Σ n k=1 min(tk i, tk j )

Feature Extractor 1 Network Stucture Train the 1 AlexNet 2 VGG-16 to extract a 4096-dimension feature vector 2 Supervised Pre-training pre-train CNN network on large auxiiary dataset (ImageNet) 3 Domain-specific fine-tuning Continue SGD training of CNN using only warped regions proposals The CNN architecture is unchanged except replacing the 1000-way classification layer with the (N+1)-way classification layer In each iteration, the batch size is 128, 32 positive samples, 96 negative samples 4 Object category classification Treat all region proposals with 05 with a ground-truth box as positives for that box s class and the rest as negatives Once features are extracted and training labels are applied, we optimize one linear SVM per class

Classification 1 Training SVM can only deal with 2-way classification problem Labeling the data as negative when IoU less than 03 Once features are extracted, train N linear SVM classifier After that, training a linear regression predict a new detection window given the 4096-dimension vectors features for a selective search region proposal 2 Testing Extract 2000 proposal regions with selective search, normalization all the regions to 227 227, forward propogate in CNN, extract the 4096-dimension feature vector, testing with the SVM classifier, removing the redundant bounding box with NMS, obtain the bounding box after bounding box regression

Result on PASCAL VOC and ILSIRC

Result on PASCAL VOC and ILSIRC

Visulization of Pool5

Result on PASCAL ILSIRC

ILSVRC2013 detection dataset 1 The ILSVRC2013 detection dataset is split into three sets: train (395,918), val (20,121), and test (40,152) in 200 classes 2 The training images are not exhaustively annotated, so train images cannot be used for hard negative mining, so use val for both training and validation 3 Training data is required for three procedures in R-CNN: (1) CNN fine-tuning, (2) detector SVM training (3) bounding-box regressor training

Object Proposal (A) the original object proposal at its actual scale relative to the transformed CNN inputs; (B) tightest square with context; (C) tightest square without context; (D) warp Warping with context padding (p = 16 pixels) outperformed the warped by a large margin (3-5 map points)

Bounding-box Regression 1 Purpose improve localization performance of the bounding-box 2 Approach Predict a new bounding box for the detection using a class-specific bounding-box regressor after scoring each selec- tive search proposal with a class-specific detection SVM 3 Algorithm The input is a set of N training pairs {(P i, G i )} i=1,2,,n, where P i = (P i x, P i y, P i ω, P i h ) specifies the pixel coordinates of the center of proposal P i s bounding box together with P i s width and height in pixels Each ground-truth bounding box G is specified as: G = (G x, G y, G ω, G h ) The goal is to learn a transformation that maps a proposed box P to a ground-truth box G

Bounding-box Regression Set the transformation as: d x (P), d y (P), d ω (P), d h (P) First specify translation of the center of P: Ĝ x = P ω d x (P) + P x Ĝ y = P h d y (P) + P x Then specify log-space translations of the width and height of P s bounding box: ˆ G ω = P ω exp(d ω ) Ĝ h = P h exp(d h ) Each function d (P) (where is one of x,y,h,w) can be obtained by a linear function from the pool5 featurens of proposal ϕ 5 (P): d (P) = w T ϕ 5 (P), where w is a vector of learnable model parameters

Bounding-box Regression We learn w by optimizing the regularized least squares objective (ridge regression): w = argmin w ˆ Σ N i (t i ŵ ϕ 5 (P i )) 2 + λ ŵ 2 The regression targets t are defined as: t x = (G x P x )/P ω t y = (G y P y )/P h t ω = log(g ω /P ω ) t h = log(g h /P h )