Segmentation as Selective Search for Object Recognition in ILSVRC2011

Koen van de Sande, Jasper Uijlings, Arnold Smeulders, Theo Gevers, Nicu Sebe, Cees Snoek
University of Amsterdam, University of Trento
ILSVRC2011 - November 7th, 2011

Object Recognition
- What is it? Classification (task #1)
- What is it, and where is it? Classification with localisation (task #2)
- Example classes shown: train, person, plane, bicycle

Exhaustive Search
References: Viola IJCV 2004; Dalal CVPR 2005; Felzenszwalb TPAMI 2010; Vedaldi ICCV 2009
- Logical first step: visits every location
- Imposes computational constraints on:
  - the number of possible locations considered (coarse grid, fixed aspect ratio)
  - the evaluation cost per location (weak features/classifiers)
- Impressive results
- To go beyond them, we need to be more sophisticated

Selective Search
References: Carreira CVPR 2010; Endres ECCV 2010; Uijlings CVPR 2009
- Selective search has been exploited convincingly for segmentation
- For segmentation, the emphasis is on finding few but good segments
  - 10-100 per image
  - Accurate delineation: contour detector
- For recognition:
  - Once discarded, an object will never be found again
  - Appearance including nearby context is more effective
- Use segmentation with a different goal

Selective Search for Recognition
Design criteria:
- High recall
- Coarse locations are sufficient: bounding boxes
- Fast to compute: efficient low-level features, <10s per image

Selective Search: High Recall
Reference: Gu CVPR 2009
- An image is intrinsically hierarchical
- Segmentation at a single scale won't find all objects

Selective Search: Approach
Reference: Van de Sande ICCV 2011
Hypotheses based on hierarchical grouping:
- Ground truth (shown for comparison)
- Initial segments from oversegmentation [Felzenszwalb2004]
- Group adjacent regions on color/texture cues
- Object hypotheses from all hierarchy levels
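The grouping steps above can be illustrated with a small sketch. It is not the authors' implementation: the region representation (a dict with `box`, `size` and an L1-normalised `hist`), the use of histogram intersection as the only similarity cue, and the function names are all assumptions made for illustration. It greedily merges the most similar pair of adjacent regions and keeps the bounding box of every region that ever exists in the hierarchy.

```python
import numpy as np

def hist_intersection(h1, h2):
    """Similarity of two L1-normalised histograms (1.0 means identical)."""
    return float(np.minimum(h1, h2).sum())

def box_union(a, b):
    """Tightest box (x0, y0, x1, y1) enclosing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def hierarchical_grouping(regions, neighbours):
    """Return the boxes of all regions from all hierarchy levels.

    regions:    dict int id -> {"box": (x0, y0, x1, y1), "size": int, "hist": array}
    neighbours: set of frozenset({i, j}) pairs of adjacent region ids
    (both structures are hypothetical, chosen for this sketch)
    """
    regions = dict(regions)
    neighbours = set(neighbours)
    hypotheses = [r["box"] for r in regions.values()]  # initial segments
    next_id = max(regions) + 1

    def similarity(pair):
        i, j = tuple(pair)
        return hist_intersection(regions[i]["hist"], regions[j]["hist"])

    while neighbours:
        best = max(neighbours, key=similarity)  # most similar adjacent pair
        i, j = tuple(best)
        ri, rj = regions[i], regions[j]
        size = ri["size"] + rj["size"]
        merged = {
            "box": box_union(ri["box"], rj["box"]),
            "size": size,
            # size-weighted average keeps the histogram L1-normalised
            "hist": (ri["size"] * ri["hist"] + rj["size"] * rj["hist"]) / size,
        }
        # Neighbours of either merged region become neighbours of the new one.
        rewired = set()
        for pair in neighbours:
            if pair == best:
                continue
            if i in pair or j in pair:
                other = next(k for k in pair if k != i and k != j)
                rewired.add(frozenset({other, next_id}))
            else:
                rewired.add(pair)
        neighbours = rewired
        del regions[i], regions[j]
        regions[next_id] = merged
        hypotheses.append(merged["box"])  # one new hypothesis per merge
        next_id += 1
    return hypotheses
```

Starting from n connected initial segments, this produces 2n - 1 boxes: the n initial segments plus one box per merge, up to the box covering everything.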

Example 1

Example 2

Selective Search: High Recall
- In some images, color cues work best
- In others, texture cues work while color fails
- No single segmentation strategy works everywhere
- We need a set of complementary segmentation strategies

Evaluation of object hypotheses
- Standard Pascal VOC overlap criterion
- A hypothesis is correct if its overlap with the object, normalized for size, is >= 50%
- Recall is the percentage of objects for which there is a hypothesis with >= 50% overlap
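As a worked illustration of the criterion above, the Pascal VOC overlap is the area of intersection divided by the area of union of the two boxes. The sketch below assumes boxes are `(x0, y0, x1, y1)` tuples in continuous coordinates (the official VOC code works on inclusive pixel coordinates); the function names are chosen for this example only.

```python
def overlap(a, b):
    """Pascal VOC overlap: intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall(gt_boxes, hypotheses, threshold=0.5):
    """Fraction of ground-truth objects covered by at least one hypothesis."""
    found = sum(
        1 for g in gt_boxes
        if any(overlap(g, h) >= threshold for h in hypotheses)
    )
    return found / len(gt_boxes)
```

Because the measure is normalized by the union, a hypothesis that is much larger or much smaller than the object scores low even when it fully contains, or is fully contained in, the ground-truth box.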

Multiple Complementary Color Spaces
Reference: Van de Sande ICCV 2011
- We diversify the set of segmentations: combine multiple initial segmentations and different color spaces
- Color spaces have complementary invariance properties: some include shadow/shading pixels in a segment, others do not
- Location hypotheses are class-independent
- VOC2007 test: 1,536 windows/image, 96.7% recall

Selective Search on ILSVRC2011
- Apply to the ILSVRC2011 train set
- Object hypotheses are class-independent
- ILSVRC2011 train images with bounding box annotations: 315,525
- Average #boxes/image: 1,565
- Average recall: 98.5%
- Recall is not informative anymore

Is recall@50%-overlap still the right measure?
- Missed: 1.5%
- Recall: 98.5%
- Average best overlap (ABO): 87.6%
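A minimal sketch of how average best overlap could be computed, assuming it is the mean, over all ground-truth objects, of the highest Pascal VOC overlap reached by any hypothesis. The box format `(x0, y0, x1, y1)` and the function names are assumptions for this example.

```python
def overlap(a, b):
    """Pascal VOC overlap: intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_best_overlap(gt_boxes, hypotheses):
    """Mean over objects of the best overlap achieved by any hypothesis."""
    best = [max(overlap(g, h) for h in hypotheses) for g in gt_boxes]
    return sum(best) / len(best)
```

Unlike recall at a fixed 50% threshold, this measure keeps increasing as hypotheses fit the objects more tightly, so it can still separate methods once recall is close to 100%.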

Average Best Overlap: Example
- What does ~87.6% ABO look like?
- Example boxes with overlap 88.4%, 87.9% and 87.4%
- Is high recall visible in the results?

VOC2011 detection challenge - aeroplane: high recall

VOC2011 detection challenge - cat: high recall

Task #1 - Classification

Task #1: Classification
- Bag-of-words system with a configuration similar to our VOC2008 participation
- Dense sampling and Harris-Laplace keypoints
- SIFT, RGB-SIFT, X-ColorSIFT (under submission at IJCV)
- Spatial pyramid: 1x1 and 1x3
- Vector quantization with hard assignment
- We have 1.3 million training images and 1000 objects this time around
- Visit http://www.colordescriptors.com
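The bag-of-words encoding listed above can be sketched as follows. This is an illustration only, not the colordescriptors software: it assumes descriptors are already extracted, that the codebook is given, that the 1x3 pyramid level means three horizontal bands, and that the final feature is L1-normalised. All function and parameter names are hypothetical.

```python
import numpy as np

def hard_assign(descriptors, codebook):
    """Index of the nearest codebook entry (Euclidean) for each descriptor."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def bow_pyramid(descriptors, ys, height, codebook):
    """Concatenate a 1x1 histogram with three horizontal-band histograms.

    descriptors: (N, D) array; ys: (N,) array of descriptor row positions;
    height: image height in the same units as ys; codebook: (K, D) array.
    """
    k = len(codebook)
    words = hard_assign(descriptors, codebook)
    hists = [np.bincount(words, minlength=k)]           # 1x1: whole image
    band = np.minimum((ys * 3 // height).astype(int), 2)
    for b in range(3):                                  # 1x3: three bands
        hists.append(np.bincount(words[band == b], minlength=k))
    feature = np.concatenate(hists).astype(float)
    return feature / max(feature.sum(), 1.0)            # L1 normalisation
```

With a codebook of K words the resulting feature has 4K dimensions: K for the whole image and K for each of the three bands.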

Training 1000 object classifiers
References: Van de Sande TMM 2011; Maji CVPR 2009
- Each object category has ~1,300 positive examples
- As negative examples: a random 10% of the train set
- The same set of negatives for all categories: re-use a big part of the kernel matrix
- Compute the kernel matrix on GPU: 10x faster than a quad-core CPU (40x faster than single-core)
- Histogram Intersection Kernel with Fast Approximation
- Based on the probabilistic output of LibSVM: select the 5 objects with the highest score per image
- Flat error: 0.346
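To make the kernel re-use concrete, here is a small sketch under stated assumptions: it computes the exact histogram intersection kernel with NumPy on the CPU (not the GPU implementation or the fast approximation cited on the slide), and the function names are invented for this example. Since every category shares the same negatives, the negative-versus-negative block is computed once and only the blocks involving the positives are recomputed per category.

```python
import numpy as np

def intersection_kernel(A, B):
    """Exact histogram intersection kernel: K[i, j] = sum_d min(A[i, d], B[j, d])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def kernel_for_category(K_neg_neg, positives, negatives):
    """Assemble one category's training kernel from a shared negative block.

    K_neg_neg is intersection_kernel(negatives, negatives), computed once
    and passed in for every category.
    """
    K_pos_pos = intersection_kernel(positives, positives)
    K_pos_neg = intersection_kernel(positives, negatives)
    return np.block([[K_pos_pos, K_pos_neg],
                     [K_pos_neg.T, K_neg_neg]])
```

With roughly 1,300 positives against a negative set of 10% of 1.3 million images, the shared block is by far the largest part of each category's kernel matrix, which is why re-using it pays off.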

Task #2 - Classification with localisation

Localisation System
Training:
- Use positives and mirrored positives
- Use object hypotheses to create difficult initial negatives (at most 7,500)
- Add 2 iterations of false positives (from 4,000 images)
Features:
- Bag-of-words, sampled at every pixel
- SIFT, X-ColorSIFT and RGB-SIFT
- Pyramid up to level 3
- Codebook size 4096
- Histogram Intersection Kernel with Fast Approximation
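The first two training steps can be sketched as below. The slide does not say how difficult negatives are chosen from the object hypotheses; this sketch assumes they are hypotheses that partially overlap a ground-truth box, with hypothetical thresholds of 20% and 50%. The mirroring helper assumes horizontal flipping. Names and defaults are illustrative only.

```python
def overlap(a, b):
    """Pascal VOC overlap: intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def mirror_box(box, image_width):
    """Box (x0, y0, x1, y1) of the same object in the horizontally flipped image."""
    x0, y0, x1, y1 = box
    return (image_width - x1, y0, image_width - x0, y1)

def initial_negatives(hypotheses, gt_boxes, low=0.2, high=0.5, max_count=7500):
    """Hypotheses that partially overlap an object (assumed selection rule)."""
    negatives = []
    for h in hypotheses:
        if len(negatives) >= max_count:
            break
        best = max((overlap(h, g) for g in gt_boxes), default=0.0)
        if low <= best < high:
            negatives.append(h)
    return negatives
```

The two later iterations of false positives would then add the highest-scoring wrong detections of the current classifier to this negative set before retraining.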

Applying Object Localisation
- 100,000 test images x 1000 object detectors = 100M classifications
- Practical problem: deadline too close
- Each object occurs exactly 100 times
- Apply each object detector only to:
  - the top-10 objects per image (in terms of classification likelihood)
  - the top-1000 images per object
- Reduction to 1.24M classifications
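One way to read the reduction above is as the union of two selections over the image-by-object score matrix: 100,000 x 10 = 1M pairs from the per-image rule and 1000 x 1000 = 1M pairs from the per-object rule, with shared pairs counted once. The reported 1.24M lies between 1M and 2M, which is consistent with that reading, but the union interpretation is an assumption. The sketch below uses NumPy and hypothetical names.

```python
import numpy as np

def select_pairs(scores, per_image, per_object):
    """Boolean mask of (image, object) pairs on which to run a detector.

    scores: (n_images, n_objects) classification likelihoods.
    A pair is selected if the object is among the top `per_image` objects
    of that image, or the image is among the top `per_object` images of
    that object.
    """
    n_images, n_objects = scores.shape
    mask = np.zeros(scores.shape, dtype=bool)
    # Top objects per image (row-wise).
    top_obj = np.argpartition(-scores, per_image - 1, axis=1)[:, :per_image]
    mask[np.arange(n_images)[:, None], top_obj] = True
    # Top images per object (column-wise).
    top_img = np.argpartition(-scores, per_object - 1, axis=0)[:per_object, :]
    mask[top_img, np.arange(n_objects)[None, :]] = True
    return mask
```

The number of selected pairs is `mask.sum()`, which is at least the larger of the two selections and at most their sum.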

ILSVRC 2011 Results
Classification: in the 5 labels selected for an image, is the correct label in there? It is for 65.4% of the images.
- Task 1 flat error, base: 0.346
- Task 1 flat error, reranked w/localisation: 0.310
Classification with localisation: among the 5 selected bounding boxes for an image, does one overlap >50% with the ground truth box, and is that box labeled correctly?
- Task 2 flat error, localisation: 0.425
- Just 8% error increase! (0.425 vs. 0.346, in absolute terms)
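The two flat error measures can be sketched as follows. This is a simplified reading of the evaluation described on the slide, not the official ILSVRC scoring code: it assumes one ground-truth label and one ground-truth box per image, boxes as `(x0, y0, x1, y1)` tuples, and at most 5 predictions per image. Function names are chosen for this example.

```python
def overlap(a, b):
    """Pascal VOC overlap: intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def classification_error(predicted_labels, true_labels):
    """Fraction of images whose true label is not among the 5 predicted labels."""
    wrong = sum(1 for preds, truth in zip(predicted_labels, true_labels)
                if truth not in preds[:5])
    return wrong / len(true_labels)

def localisation_error(predictions, truths):
    """Fraction of images with no correctly labeled box overlapping >50%.

    predictions: per image, a list of (label, box); truths: per image, (label, box).
    """
    wrong = 0
    for preds, (true_label, true_box) in zip(predictions, truths):
        hit = any(label == true_label and overlap(box, true_box) > 0.5
                  for label, box in preds[:5])
        wrong += 0 if hit else 1
    return wrong / len(truths)
```

Under this definition the localisation error can never be lower than the classification error for the same label predictions, because a localisation hit requires the label to be correct as well.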

Conclusions
- Adopted segmentation as a selective search strategy for object localisation:
  - High recall: >96% with ~1,500 locations
  - Coarse locations are sufficient: bounding boxes
  - Fast to compute: <10s per image
  - Class-independent
  - Enables the use of bag-of-words features
- Doing localisation on top of classification increases the error rate by just 8%
- Come visit poster 42: Segmentation as Selective Search for Object Recognition, ICCV 2011, K.E.A. van de Sande, J.R.R. Uijlings, T. Gevers, and A.W.M. Smeulders. Poster #42, Wednesday 17:20-20:00