Regionlet Object Detector with Hand-crafted and CNN Feature

Similar documents
Spatial Localization and Detection. Lecture 8-1

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Yiqi Yan. May 10, 2017

Object detection with CNNs

Rich feature hierarchies for accurate object detection and semantic segmentation

Object Detection on Self-Driving Cars in China. Lingyun Li

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Object Detection Based on Deep Learning

Deep Learning for Object detection & localization

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Lecture 5: Object Detection

Lecture 7: Semantic Segmentation

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

Generic Object Detection with Dense Neural Patterns and Regionlets

A Study of Vehicle Detector Generalization on U.S. Highway

Joint Object Detection and Viewpoint Estimation using CNN features

Cascade Region Regression for Robust Object Detection

Automatic detection of books based on Faster R-CNN

Rich feature hierarchies for accurate object detection and semantic segmentation

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

WE are witnessing a rapid, revolutionary change in our

Unified, real-time object detection

Is 2D Information Enough For Viewpoint Estimation? Amir Ghodrati, Marco Pedersoli, Tinne Tuytelaars BMVC 2014

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

Deformable Part Models

Fully Convolutional Networks for Semantic Segmentation

Project 3 Q&A. Jonathan Krause

Category-level localization

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Object Detection Design challenges

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Optimizing Object Detection:

Object Category Detection: Sliding Windows

Classification of objects from Video Data (Group 30)

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

YOLO 9000 TAEWAN KIM

Visual features detection based on deep neural network in autonomous driving tasks

Yield Estimation using faster R-CNN

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers

Return of the Devil in the Details: Delving Deep into Convolutional Nets

Pedestrian Detection via Mixture of CNN Experts and thresholded Aggregated Channel Features

PASCAL VOC Classification: Local Features vs. Deep Features. Shuicheng YAN, NUS

Object Detection. Computer Vision Yuliang Zou, Virginia Tech. Many slides from D. Hoiem, J. Hays, J. Johnson, R. Girshick

Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection

Multi-View 3D Object Detection Network for Autonomous Driving


Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

Final Report: Smart Trash Net: Waste Localization and Classification

Semantic Segmentation

Computer Vision Lecture 16

Can appearance patterns improve pedestrian detection?

Modern Convolutional Object Detectors

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

Computer Vision Lecture 16

Object Recognition II

Bilinear Models for Fine-Grained Visual Recognition

OBJECT DETECTION HYUNG IL KOO

Convolutional Neural Networks

arxiv: v1 [cs.cv] 5 Oct 2015

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

G-CNN: an Iterative Grid Based Object Detector

Object Detection. Part1. Presenter: Dae-Yong

Deep Learning. Visualizing and Understanding Convolutional Networks. Christopher Funk. Pennsylvania State University.

Pedestrian Detection with Deep Convolutional Neural Network

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

Rich feature hierarchies for accurate object detection and semant

Towards Large-Scale Semantic Representations for Actionable Exploitation. Prof. Trevor Darrell UC Berkeley

AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

Computer Vision Lecture 16

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

Object Recognition and Detection

A Deep Learning Framework for Authorship Classification of Paintings

Future directions in computer vision. Larry Davis Computer Vision Laboratory University of Maryland College Park MD USA

Traffic Multiple Target Detection on YOLOv2

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

arxiv: v1 [cs.cv] 26 Jun 2017

Learning Visual Semantics: Models, Massive Computation, and Innovative Applications

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs

Detection III: Analyzing and Debugging Detection Methods

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

R-FCN: Object Detection with Really - Friggin Convolutional Networks

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

WITH the recent advance [1] of deep convolutional neural

Large-scale Video Classification with Convolutional Neural Networks

Convolutional Neural Networks: Applications and a short timeline. 7th Deep Learning Meetup Kornel Kis Vienna,

Mask R-CNN. By Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick Presented By Aditya Sanghi

Using Machine Learning for Classification of Cancer Cells

1002 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 17, NO. 4, APRIL 2016

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Find that! Visual Object Detection Primer

Transcription:

Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu

Overview of this section Regionlet Object Detector Regionlet Localizer (re-localization) Regionlet with Deep CNN Feature CNN Feature Extraction Support Pixel Integral Image Application Examples Car Detection for Fine-grained Image Classification Pedestrian, Car, Cyclist Detection for Autonomous Driving

What is Regionlet Object Detector A significant extension to traditional boosting object detector Together with OverFeat and R-CNN, the Regionlet detector is one of the first several detectors that successfully adopt deep CNN features for generic object detection.

How does Regionlet detector connect to past/future RealBoost 1 Segmentation as Selective Search 2 Low-level Feature Deep CNN 3 Past Boosting Feature Selection Object Proposal Generalized Spatial Pyramid for CNN Feature Pooling 2013 CNN-based Object Detection Spatial Pyramid Pooling in SPP-Net 4 RoI Pooling in Fast R- CNN 5 future 1. C. Huang, et. al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004. 2. K. E. A. Van de Sande, et. al. Segmentation as selective search for object recognition. ICCV 2011 3. Krizhevsky, et. al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012 4. He, et. al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV 2014 5. Ross Girshick. Fast R-CNN. ICCV 2015

Boosting Object Detector x 2 x 1 N f X = β i h i (x i ) i=0 x N Weak classifier A detection window A sub-region where weak classifier is built based on

Traditional Boosting Detection Framework Model 1 Model 2 Operate on multiple scales to detect objects in different scales Use multiple components to detect objects with various aspect ratios How about a single model, but flexible during testing, no feature pyramids, no multiple components

What the Regionlet Detector Proposed A boosting classifier that can take inputs of different scales A boosting classifier that can take inputs of different viewpoints A boosting classifier containing feature pooling learning

Regionlet: Definition Region(R): Feature extraction region Regionlet(r 1, r 2, r 3 ): A sub-region in a feature extraction area whose position/resolution are relative and normalized to a detection window Region Regionlet

Regionlet: Definition(cont.) Regionlet coordinates are normalized Traditional (l, t, r, b) (50,50,180,180) h w Normalized l w, t h, r w, b h (.25,.25,.90,.90) (50,50,180,180) (.25,.25,.90,.90)

Regionlet: Definition(cont.) Regionlet definition = Generalized Spatial Pyramid Similar Both use relative coordinates Difference Regionlet: coordinates are relative to the detection window (not the image) Regionlet: coordinates are flexible (do not have to evenly divide the image/window) Regionlet feature extraction = Generalized Spatial Pyramid Pooling Rectangles in Generalized Spatial Pyramid Rectangles in Spatial Pyramid

Connection to other methods in pooling design Object Proposal CNN-based Object Detection Spatial Pyramid Pooling in SPP-Net 1 Generalized Spatial Pyramid for CNN Feature Pooling RoI Pooling in Fast R- CNN 2 1. He, et. al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV 2014 2. Ross Girshick. Fast R-CNN. ICCV 2015

Regionlet: Feature extraction Non-local pooling Could be Hand-crafted features or deep CNN features, whatever feature your like!

Regionlet Classifier Each weak classifier is based on a 1-D feature extracted from a region Feature Feature extraction x Regionlets n 1 Weak Classifier h x = v o 1 B x = 0 o=1 Strong Classifier T H X = β i i=1 h i (x i )

Detection Framework (a) (b) (c) Regionlet Region (a) : Input image (b) : Generate object regions 1,2,3 (c) : Feature extraction and pooling Generalized Spatial Pyramid Pooling inside Regionget Low-level features CNN features (will talk later) Max-pooling among Regionlets 1. K. E. A. Van de Sande, et. al. Segmentation as selective search for object recognition. ICCV 2011 2. B. Alexe, et. al. Measuring the objectness of image windows. T-PAMI 2012 3. S. Ren, et. al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015

Multiple scale & viewpoints Handling Not a motorbike Regionlet Model Adjusting the model to a candidate bounding box

Multiple scale & viewpoints Handling Motorbike Detected Regionlet Model Adjusting the model to a candidate bounding box

Weak Classifier Construction Weak learner on each REGION Eight lots lookup table Lookup table is learned Lot value is learned One lot is activated for one feature Regionlet feature x i (after pooling) Assign lot 0.01-0.2-0.5-0.4 0.02 0.15 0.5 0.3 N H(X) = LLT i (x i ) i=1 Weak learner output: -0.5

Regionlet Training How to get regions and regionlets Regions Regions are randomly sampled Effective Regions are greedily selected to reduce learning cost Regionlets Each Region & Regionlet configuration are randomly configured A Region and its regionlets configuration are selected simultaneously Region & Regionlet pool is fixed for each cascade learning

Regionlet: Training Constructing the regions/regionlets pool Small region, fewer regionlets -> fine spatial layout Large region, more regionlets -> robust to deformation Learning realboost 1 cascades 16K region/regionlets candidates for each cascade Learning of each cascade stops when the error rate is achieved (1% for positive, 37.5% for negative) Last cascade stops after collecting 5000 weak classifiers Result in 4-7 cascades 2-3 hours to finish training one category on a 8-core machine 1. C. Huang, et. al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004.

Regionlet: Testing No image resizing Any scale, any aspect ratio Adapt the model size to the same size as the object candidate bounding box One model, resize image + Multiple models, original image + Ours, One model, original image +

Overview of this section Regionlet Object Detector Regionlet Localizer Regionlet with Deep CNN Feature CNN Feature Extraction Support Pixel Integral Image Application Examples Car Detection Pedestrian, Car, Cyclist Detection for Autonomous Driving

Regionlet Localizer (object re-localization) Why a localizer is needed (classification & localization precision dilemma) VS Data augmentation during training to accommodate inaccurate localization As accurate location as possible during testing

Regionlet Localizer Regionlet feature can be reused for localization Each Regionlet feature is associated with a spatial location The location is learned during classifier training

Regionlet Localizer Regionlet feature can be reused for localization Regionlet classifier 1 Regionlet classifier N 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 8N dimensional binary vector

Regionlet Localizer 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 W l, t, r, b

Regionlet Localizer Training Random sample examples which have > 0.6 overlap with ground truth Less overlap gives poor results The regression task learns the location difference

Regionlet Localizer Experiment result on our car dataset for autonomous driving 17501 cars for training 12546 cars for testing Detection performance (% AP) 0.5 overlap 0.7 overlap Regionlet 62.7% 34.6% Regionlet + localization 65.3% 43.9% Improvement 2.6% 9.1%

Overview of this section Regionlet Object Detector Regionlet Localizer Regionlet with Deep CNN Feature CNN Feature Extraction Support Pixel Integral Image Application Examples Car Detection Pedestrian, Car, Cyclist Detection for Autonomous Driving

Regionlet with DCNN Deep CNN Deep structure learns high-level information Max-pooling is robust to parts misalignment Information are jointly learned How to establish a bridge for DCNN and Regionlet object detection framework?

Regionlet with DCNN Deep CNN structure Features from convolution layers retain spatial information Convolutional layers

Regionlet with DCNN Deep CNN structure Features from convolution layers retain spatial information A feature vector

Regionlets with DCNN Deep CNN structure image-convolution to generate features for the whole image + (a) Convolution kernels (b) Output the dense neural patterns

Support Pixel Integral Image CNN Feature Not available on each pixel Feature dimension is high for integral image We want dense integral feature We want to save memory Integral Image Computation Does not change if rowsum = 0 Does not change if colsum = 0

Support Pixel Integral Image Support Pixel Where the integral vector computation is inevitable

Support Pixel Integral Image Support Pixel Integral Image for CNN Feature 1 Support pixel integral vector (sparse) Dense virtual integral vector map 1. Wang et. al. Regionlets for Generic Object Detection. T-PAMI, 2015

Regionlet with DCNN Deep CNN feature for detection 1, 2 (a) Input image (b) Densely extracted feature maps (c) Generalized Spatial Pyramid Deep CNN Feature Pooling (d) Detected object bounding box 1. Zou et. al. Generic Object Detection with Dense Neural Patterns and Regionlets. BMVC, 2014 2. Wang et. al. Regionlets for Generic Object Detection. T-PAMI, 2015

Experiment Results Regionlet + CNN feature (No fine-tuning) Average precision on PASCAL VOC 2007 (%) map Regionlet 41.7 Regionlet-CNN pool5 49.3 R-CNN pool5 44.2 R-CNN FT fc7 BB 58.5

Experiment Results Visualization of selected neuron patterns

Running speed Regionlet detector + localizer : 5 fps using a single CPU core, (>30 fps) using 8 cores Regionlet + CNN: 3fps using a single CPU core

Application examples Car, pedestrian, cyclist detection for autonomous driving Ranked #1 on KITTI detection dataset Car Detection (map %) Moderate Easy Hard 3DVP 75.77% 87.46% 65.38% SS 74.30% 85.03% 59.48% SVM-Res 67.49% 78.11% 54.28% Regionlet 76.45% 84.75% 59.70%

Application examples Car, pedestrian, cyclist detection for autonomous driving Ranked #1 on KITTI detection dataset Pedestrian Detection (map %) Moderate Easy Hard paucenst 1 54.49% 65.26% 48.60% R-CNN 50.13% 61.61% 44.79% Fusion-DPM 46.67% 59.51% 42.05% Regionlet 61.15% 73.14% 55.21% 1. S. Paisitkriangkrai, C. Shen and A. Hengel, Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning

Application examples Car, pedestrian, cyclist detection for autonomous driving Ranked #1 on KITTI detection dataset Cyclist Detection (map %) Moderate Easy Hard paucenst 38.03% 51.62% 33.38% R-CNN 34.47% 50.07% 32.12% DF+DPM+ROI 30.90% 41.86% 27.75% Regionlet 58.72% 70.41% 51.83%

Application examples Car detection Totally 170K cars for fine-grained car recognition 11000 labelled for training detector 2745 labelled for testing 100% AP

Take Away Messages Regionlet extend the traditional boosting object detector It operates on object proposal (the generalized spatial pyramid pooling is the key) It integrates a pooling process It can easily integrate various features Regionlet is FAST ( 5 fps, single core CPU) Regionlet is FAST even with CNN Feature (3fps, single core CPU)

Thank you! We are hiring!