AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)


Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony Paek, In So Kweon.

State-of-the-art frameworks for object detection.

1. Region-CNN framework [Gkioxari et al., CVPR 14]: object proposals → CNN → SVM → bounding-box regression (BB Reg.) → non-maximum suppression (NMS).

Drawback: the maximally scored region tends to focus on a discriminative part (e.g. a face) rather than the entire object (e.g. a human body).
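The last stage of the R-CNN pipeline keeps the maximally scored region and suppresses overlapping ones. A minimal greedy NMS sketch for illustration; the (x1, y1, x2, y2) box format and the 0.3 IoU threshold are assumptions, not values taken from the slides.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thresh: float = 0.3) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]
```

Because only the top-scoring box of each cluster survives, a box around a highly discriminative part (a face) can win over a box around the whole body, which is the drawback noted above.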

2. Detection by CNN regression [Szegedy et al., NIPS 13]: a CNN maps the image directly to the box coordinates (x1, y1, x2, y2), i.e. the top-left corner (x1, y1) and the bottom-right corner (x2, y2).

Drawback: a direct mapping from an image to an exact bounding box is relatively difficult for a CNN.

Idea: an ensemble of weak predictions. Instead of regressing the exact box in one shot, the network repeatedly makes a weak prediction of the direction in which each corner of the current box should move, until it emits a stop signal for both corners.

Model: rather than a CNN regression model, use a CNN classification model.

Layers, from input to output: Convolution → Normalization → Pooling → Convolution → Normalization → Pooling → Convolution → Convolution → Convolution → Fully connected → Fully connected, followed by two output layers: top-left (TL) direction prediction and bottom-right (BR) direction prediction.

Each output layer is a 5-way classifier, $\mathbf{y} \in \mathbb{R}^5$: [3 directions, stop signal, no object].
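Each head is read out independently as a softmax over its five classes. A small decoding sketch; the concrete direction names (TL: right, down-right, down; BR: left, up-left, up) are our assumption about the three moves per corner, chosen so that both corners move inward, and the scores in the example are made up.

```python
import math
from typing import List, Tuple

# Assumed label order per head: [3 directions, stop signal, no object].
# The direction names are illustrative, not taken from the slides.
TL_LABELS = ["right", "down-right", "down", "stop", "no-object"]
BR_LABELS = ["left", "up-left", "up", "stop", "no-object"]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(tl_scores: List[float], br_scores: List[float]) -> Tuple[str, str]:
    """Return the most probable label of the TL head and of the BR head."""
    tl_probs, br_probs = softmax(tl_scores), softmax(br_scores)
    tl = TL_LABELS[max(range(5), key=lambda i: tl_probs[i])]
    br = BR_LABELS[max(range(5), key=lambda i: br_probs[i])]
    return tl, br


if __name__ == "__main__":
    print(decode([0.1, 2.0, 0.3, 0.5, -1.0], [0.2, 0.1, 0.0, 3.0, -2.0]))
    # ('down-right', 'stop')
```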

Iterative test: Ensemble of weak directions.

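At test time the two heads are queried repeatedly: each corner moves one step in its predicted direction, and the loop ends when both heads output the stop signal or a head reports no object. A minimal sketch; `predict` is a hypothetical stand-in for cropping the current box, resizing it and running the CNN, and the direction names, the fixed 30-pixel step and the iteration cap are assumptions.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

# Assumed unit moves per direction label (x grows right, y grows down).
TL_MOVES = {"right": (1, 0), "down-right": (1, 1), "down": (0, 1)}
BR_MOVES = {"left": (-1, 0), "up-left": (-1, -1), "up": (0, -1)}


def iterative_test(
    box: Box,
    predict: Callable[[Box], Tuple[str, str]],
    step: int = 30,
    max_iters: int = 50,
) -> Optional[Box]:
    """Refine `box` with weak direction predictions until both corners stop."""
    x1, y1, x2, y2 = box
    for _ in range(max_iters):
        tl, br = predict((x1, y1, x2, y2))
        if "no-object" in (tl, br):
            return None  # the network says the window holds no object
        if tl == "stop" and br == "stop":
            break  # both corners have converged
        if tl != "stop":
            dx, dy = TL_MOVES[tl]
            x1, y1 = x1 + dx * step, y1 + dy * step
        if br != "stop":
            dx, dy = BR_MOVES[br]
            x2, y2 = x2 + dx * step, y2 + dy * step
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    # Scripted predictions standing in for three CNN forward passes.
    script = iter([("down-right", "up-left"), ("right", "up"), ("stop", "stop")])
    print(iterative_test((0, 0, 300, 300), lambda b: next(script)))  # (60, 30, 270, 240)
```

No single prediction has to be accurate: each step only needs the right coarse direction, and the sequence of steps jointly pins down the box.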

Training AttentionNet.

1. Generating training samples.
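The slides give no details of sample generation. One plausible sketch, assuming training crops are sampled around a ground-truth box and each corner is labeled with the inward move that brings it closer to the ground-truth corner (stop when within a tolerance, no object when the crop covers too little of the object). The labeling rule, `tol`, `min_cover` and the function name are all hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical label names matching [3 directions, stop signal, no object].
TL_NAMES = {(True, True): "down-right", (True, False): "right",
            (False, True): "down", (False, False): "stop"}
BR_NAMES = {(True, True): "up-left", (True, False): "left",
            (False, True): "up", (False, False): "stop"}


def corner_labels(crop: Box, gt: Box, tol: float = 15.0,
                  min_cover: float = 0.5) -> Tuple[str, str]:
    """Assumed labeling rule for one training crop and its ground-truth box."""
    # Fraction of the ground-truth box that lies inside the crop.
    ix = max(0.0, min(crop[2], gt[2]) - max(crop[0], gt[0]))
    iy = max(0.0, min(crop[3], gt[3]) - max(crop[1], gt[1]))
    cover = ix * iy / ((gt[2] - gt[0]) * (gt[3] - gt[1]))
    if cover < min_cover:
        return "no-object", "no-object"
    # A corner keeps moving inward only while it is more than `tol` pixels away
    # from the ground-truth corner; smaller or negative offsets mean stop.
    move_right = gt[0] - crop[0] > tol
    move_down = gt[1] - crop[1] > tol
    move_left = crop[2] - gt[2] > tol
    move_up = crop[3] - gt[3] > tol
    return TL_NAMES[(move_right, move_down)], BR_NAMES[(move_left, move_up)]


if __name__ == "__main__":
    gt = (100, 60, 220, 200)
    print(corner_labels((0, 0, 300, 300), gt))      # ('down-right', 'up-left')
    print(corner_labels((95, 55, 225, 205), gt))    # ('stop', 'stop')
    print(corner_labels((400, 400, 500, 500), gt))  # ('no-object', 'no-object')
```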

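2. Minimizing the loss function by back-propagation and stochastic gradient descent:

$$L = \frac{1}{2} L_{\mathrm{softmax}}(\mathbf{y}_{TL}, \mathbf{t}_{TL}) + \frac{1}{2} L_{\mathrm{softmax}}(\mathbf{y}_{BR}, \mathbf{t}_{BR})$$

where $\mathbf{y}_{TL}$ and $\mathbf{y}_{BR}$ are the outputs of the TL and BR layers, and $\mathbf{t}_{TL}$ and $\mathbf{t}_{BR}$ are their target labels.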
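A minimal sketch of the forward computation of this loss for one sample, reading $L_{\mathrm{softmax}}$ as the usual softmax cross-entropy with integer targets (our assumption); in practice the gradients would come from the training framework's back-propagation.

```python
import math
from typing import List


def softmax_loss(y: List[float], t: int) -> float:
    """Softmax cross-entropy -log(softmax(y)[t]), computed stably."""
    m = max(y)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in y))
    return log_sum_exp - y[t]


def attentionnet_loss(y_tl: List[float], t_tl: int,
                      y_br: List[float], t_br: int) -> float:
    """L = 1/2 L_softmax(y_TL, t_TL) + 1/2 L_softmax(y_BR, t_BR)."""
    return 0.5 * softmax_loss(y_tl, t_tl) + 0.5 * softmax_loss(y_br, t_br)


if __name__ == "__main__":
    # With all-zero scores each head is uniform over 5 classes, so L = log 5.
    print(round(attentionnet_loss([0.0] * 5, 3, [0.0] * 5, 1), 4))  # 1.6094
```

The equal 1/2 weights simply average the two heads, so neither corner dominates the gradient.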

Result. (Good examples.)


Result. (Bad examples.)

How to detect multiple instances?

Extension to multiple instances: 1. Fast multi-scale sliding-window search using a fully convolutional network.

*Fast extraction of multi-scale dense activations. The network is trained on 227×227×3 inputs (Conv. 1 to Conv. 5, then FC 6, FC 7, FC 8), so a larger input such as 322×322×3 does not fit its fully connected layers. Idea: a fully connected layer can be implemented equivalently by a convolutional layer, so FC 6 and FC 7 are recast as Conv. 6 and Conv. 7 and the same network accepts the 322×322×3 input.
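Why the recast works: an FC layer over a flattened C×k×k input is a k×k convolution whose kernels are the same weights reshaped, and on a larger input that convolution slides over every k×k patch. A small pure-Python check with made-up sizes (C = 2, k = 2, three outputs, a 3×3 input) standing in for FC 6 over the Conv. 5 feature map; the function names are hypothetical.

```python
import random
from typing import List

Tensor3 = List[List[List[float]]]  # [channel][row][col]


def fc(weights: List[List[float]], bias: List[float], x: Tensor3) -> List[float]:
    """Fully connected layer on a (C, k, k) input flattened in (c, i, j) order."""
    flat = [v for ch in x for row in ch for v in row]
    return [sum(w * v for w, v in zip(wo, flat)) + b for wo, b in zip(weights, bias)]


def fc_as_conv(weights: List[List[float]], bias: List[float], x: Tensor3, k: int) -> Tensor3:
    """Reuse the FC weights as a k x k convolution (stride 1, no padding).

    Returns an [out][row][col] map; each position holds the FC output for the
    k x k patch starting at that position.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for wo, b in zip(weights, bias):
        fmap = []
        for r in range(H - k + 1):
            row = []
            for c0 in range(W - k + 1):
                s = b
                for ch in range(C):
                    for i in range(k):
                        for j in range(k):
                            s += wo[ch * k * k + i * k + j] * x[ch][r + i][c0 + j]
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out


if __name__ == "__main__":
    random.seed(0)
    C, k, n_out = 2, 2, 3
    weights = [[random.uniform(-1, 1) for _ in range(C * k * k)] for _ in range(n_out)]
    bias = [random.uniform(-1, 1) for _ in range(n_out)]
    image = [[[random.uniform(0, 1) for _ in range(3)] for _ in range(3)] for _ in range(C)]
    dense = fc_as_conv(weights, bias, image, k)
    for r in range(2):
        for c in range(2):
            patch = [[row[c:c + k] for row in ch[r:r + k]] for ch in image]
            direct = fc(weights, bias, patch)
            assert all(abs(direct[o] - dense[o][r][c]) < 1e-9 for o in range(n_out))
    print("dense map:", len(dense[0]), "x", len(dense[0][0]), "FC outputs, one per patch")
```

A single pass over the larger input therefore replaces many separate passes over cropped windows, and overlapping windows share the convolutional computation.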

Applied to a larger input, the converted network outputs a spatial grid of 4,096-dimensional activation vectors; each activation vector comes from one patch (window) of the input. Running the network on several rescaled copies of the image yields multi-scale dense activations.
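How large is the grid? Assuming (this is not stated on the slides) that the network follows the CaffeNet/AlexNet layer configuration for 227×227 inputs and that every layer rounds its output size down, the sketch propagates the spatial size layer by layer. Under these assumptions a 227×227 input gives a 1×1 grid and a 322×322 input a 3×3 grid, i.e. nine windows spaced 32 pixels apart; frameworks that round pooling sizes up can produce a slightly larger grid.

```python
from typing import List, Tuple

# Hypothetical CaffeNet/AlexNet-style stack: (name, kernel, stride, padding).
# FC 6 is treated as a 6x6 convolution and FC 7 as a 1x1 convolution.
LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
    ("fc6", 6, 1, 0), ("fc7", 1, 1, 0),
]


def dense_grid_side(side: int) -> int:
    """Side length of the FC 7 activation grid for a square input (floor rounding)."""
    for _, k, s, p in LAYERS:
        side = (side + 2 * p - k) // s + 1
    return side


if __name__ == "__main__":
    for side in (227, 322, 451):
        n = dense_grid_side(side)
        print(f"{side}x{side} input -> {n}x{n} grid of 4,096-d activation vectors")
```

Repeating this for each rescaled copy of the image gives one grid per scale; together they form the multi-scale dense activations.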


Extension to multiple instances: 2. Early rejection with the {TL, BR} constraint. Windows whose initial {TL, BR} predictions satisfy the constraint start the iterative test; windows that do not satisfy it are rejected.
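The slides do not spell out the {TL, BR} constraint. One reading, used here purely as an assumption: a window passes only if the object appears to lie inside it, i.e. TL predicts down-right (or stop) and BR predicts up-left (or stop); every other outcome, including no object, rejects the window before any iterations are spent on it. The label names and acceptance sets are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

# Assumed acceptance sets: both corners must point inward along the diagonal
# or already stop; "no-object" and axis-aligned moves are rejected.
TL_OK = {"down-right", "stop"}
BR_OK = {"up-left", "stop"}


def early_reject(windows: List[Box], preds: List[Tuple[str, str]]) -> List[Box]:
    """Keep windows whose first (TL, BR) prediction satisfies the assumed constraint."""
    return [w for w, (tl, br) in zip(windows, preds) if tl in TL_OK and br in BR_OK]


if __name__ == "__main__":
    windows = [(0, 0, 100, 100), (50, 0, 150, 100), (0, 50, 100, 150)]
    preds = [("down-right", "up-left"), ("right", "up-left"), ("no-object", "no-object")]
    print(early_reject(windows, preds))  # [(0, 0, 100, 100)]
```

Since the initial predictions come for free from the dense activations, the filter is cheap, and only surviving windows pay for repeated forward passes.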

Extension to multiple instances: overall architecture for sliding-window search.

Extension to multiple instances: merging multiple bounding boxes.
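Several windows can converge to nearly the same box around one object, so their results must be merged. The slides do not state the merging rule; as a stand-in (our assumption, not necessarily the paper's procedure), the sketch groups boxes greedily by IoU with each group's first member and replaces every group by its coordinate-wise mean. The 0.5 threshold is illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes: List[Box], thresh: float = 0.5) -> List[Tuple[float, ...]]:
    """Group boxes that overlap a group's first box, then average each group."""
    groups: List[List[Box]] = []
    for b in boxes:
        for g in groups:
            if iou(g[0], b) > thresh:
                g.append(b)
                break
        else:  # no existing group overlaps enough: start a new one
            groups.append([b])
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]


if __name__ == "__main__":
    boxes = [(10, 10, 110, 110), (14, 12, 114, 108), (300, 300, 400, 400)]
    print(merge_boxes(boxes))
    # [(12.0, 11.0, 112.0, 109.0), (300.0, 300.0, 400.0, 400.0)]
```

Averaging rather than suppressing keeps information from every converged window, which fits the ensemble-of-weak-predictions idea; a score-weighted average would be an equally plausible variant.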

Evaluation on PASCAL VOC Series.

PASCAL VOC 2007 Person: RCNN (58.7), AttentionNet, AttentionNet+RCNN.

PASCAL VOC 2012 Person: RCNN-based methods, AttentionNet, AttentionNet+RCNN.

[Figure: precision-recall curve on PASCAL VOC 2007 Person.]