Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

Similar documents
Semantic Pooling for Image Categorization using Multiple Kernel Learning

Learning to Localize Objects with Structured Output Regression

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search

Category-level localization

LEARNING OBJECT SEGMENTATION USING A MULTI NETWORK SEGMENT CLASSIFICATION APPROACH

Deep condolence to Professor Mark Everingham

Analysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Hierarchical Image-Region Labeling via Structured Learning

Object Detection Based on Deep Learning

Learning Representations for Visual Object Class Recognition

Detection III: Analyzing and Debugging Detection Methods

Learning to Localize Objects with Structured Output Regression

Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

Deformable Part Models

An Efficient Divide-and-Conquer Cascade for Nonlinear Object Detection

Find that! Visual Object Detection Primer

Linear combinations of simple classifiers for the PASCAL challenge

MEDICAL IMAGE SEGMENTATION WITH DIGITS. Hyungon Ryu, Jack Han Solution Architect NVIDIA Corporation

Ranking Figure-Ground Hypotheses for Object Segmentation

CPSC340. State-of-the-art Neural Networks. Nando de Freitas November, 2012 University of British Columbia

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Learning Object Representations for Visual Object Class Recognition

PASCAL VOC Classification: Local Features vs. Deep Features. Shuicheng YAN, NUS

Modern Object Detection. Most slides from Ali Farhadi

EECS 442 Computer vision Databases for object recognition and beyond Segments of this lectures are courtesy of Prof F. Li, R. Fergus and A.

Bag-of-features. Cordelia Schmid

Category-level Localization

A Multiple Kernel Learning Approach to Joint Multi-class Object Detection

Attributes. Computer Vision. James Hays. Many slides from Derek Hoiem

Generating Descriptions of Spatial Relations between Objects in Images

MULTIMODAL SEMI-SUPERVISED IMAGE CLASSIFICATION BY COMBINING TAG REFINEMENT, GRAPH-BASED LEARNING AND SUPPORT VECTOR REGRESSION

Loose Shape Model for Discriminative Learning of Object Categories

Stealing Objects With Computer Vision

Lecture 12 Recognition

Object Detection with Discriminatively Trained Part Based Models

Object Recognition II

Incremental Learning of Object Detectors Using a Visual Shape Alphabet

Object recognition (part 2)

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Beyond Bags of features Spatial information & Shape models

Part based models for recognition. Kristen Grauman

Efficient Object Localization with Gaussianized Vector Representation

2 Related Work. 2.1 Logo Localization

Lecture 12 Recognition. Davide Scaramuzza

Single-Shot Refinement Neural Network for Object Detection -Supplementary Material-

24 hours of Photo Sharing. installation by Erik Kessels

Part-based and local feature models for generic object recognition

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell

c 2011 by Pedro Moises Crisostomo Romero. All rights reserved.

Deformable Part Models with Individual Part Scaling

Diagnosing Error in Object Detectors


Multiple Kernels for Object Detection

Applying Supervised Learning

Human detection using histogram of oriented gradients. Srikumar Ramalingam School of Computing University of Utah

Object Recognition with Deformable Models

Multi-class image segmentation using Conditional Random Fields and Global Classification

G-CNN: an Iterative Grid Based Object Detector

Object Category Detection. Slides mostly from Derek Hoiem

Contents. Preface to the Second Edition

Class Segmentation and Object Localization with Superpixel Neighborhoods

Apprenticeship Learning: Transfer of Knowledge via Dataset Augmentation

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Immediate, scalable object category detection

Lecture 25: Review I

Computer Vision: Summary and Discussion

Weakly Supervised Object Recognition with Convolutional Neural Networks

A Comparison of l 1 Norm and l 2 Norm Multiple Kernel SVMs in Image and Video Classification

Unified, real-time object detection

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

ImageCLEF 2011

6.819 / 6.869: Advances in Computer Vision

Optimizing Object Detection:

Local Context Priors for Object Proposal Generation

Random Forest A. Fornaser

Learning Spatial Context: Using Stuff to Find Things

arxiv: v3 [cs.cv] 29 May 2016

Segmenting Objects in Weakly Labeled Videos

Semantic Texton Forests

Classifiers and Detection. D.A. Forsyth

2282 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012

Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation

Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

CS 343: Artificial Intelligence

Structured Models in. Dan Huttenlocher. June 2010

Multi-net System Configuration for Visual Object Segmentation by Error Backpropagation

Classification and Detection in Images. D.A. Forsyth

Object Detection on Self-Driving Cars in China. Lingyun Li

Layered Object Detection for Multi-Class Segmentation

CS6716 Pattern Recognition

Object Classification Problem

Object Segmentation by Alignment of Poselet Activations to Image Contours

A Discriminatively Trained, Multiscale, Deformable Part Model

Transcription:

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search Christoph H. Lampert, Matthew B. Blaschko, & Thomas Hofmann Max Planck Institute for Biological Cybernetics Tübingen, Germany Google, Inc. Zürich, Switzerland

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Identify all Objects in an Image

Overview... Object Localization Sliding Window Classifiers Efficient Subwindow Search Results

Sliding Window: Example 0.1

Sliding Window: Example -0.2

Sliding Window: Example -0.1

Sliding Window: Example 0.1

Sliding Window: Example... 1.5...

Sliding Window: Example 0.5

Sliding Window: Example 0.4

Sliding Window: Example 0.3

Sliding Window: Example 0.1-0.2-0.1 0.1... 1.5... 0.5 0.4 0.3

Sliding Window Classifier approach: sliding window classifier evaluate classifier at candidate regions in an image - argmax B B f I (B) for a 640 480 pixel image, there are over 10 billion possible regions to evaluate sample a subset of regions to evaluate scale aspect ratio grid size

Sliding Window Classifier approach: sliding window classifier evaluate classifier at candidate regions in an image - argmax B B f I (B) for a 640 480 pixel image, there are over 10 billion possible regions to evaluate sample a subset of regions to evaluate scale aspect ratio grid size We need a better way to search the space of possible windows

Overview... Object Localization Sliding Window Classifiers Efficient Subwindow Search Results

Efficient Object Localization Problem: Exhaustive evaluation of argmax B B f I (B) is too slow. Solution: Use the problem s geometric structure. Similar boxes have similar scores. Calculate scores for sets of boxes jointly (upper bound). If no element can contain the object, discard the set. Else, split the set into smaller parts and re-check, etc. efficient branch & bound algorithm

Branch & Bound Search Form a priority queue that stores sets of boxes. Optimality check is O(1). Split is O(1). Bound calculation depends on quality function. For us: O(1) No pruning step necessary n m images: empirical performance O(nm) instead of O(n 2 m 2 ). no approximations, solution is globally optimal

Branch & Bound Branch & bound algorithms have three main design choices Parametrization of the search space Technique for splitting regions of the search space Bound used to select the most promising regions

Sliding Window Parametrization low dimensional parametrization of bounding box (left, top, right, bottom)

Sets of Rectangles Branch-and-Bound works with subsets of the search space. Instead of four numbers [l, t, r, b], store four intervals [L, T, R, B ]: L = [l lo, l hi ] T = [t lo, t hi ] R = [r lo, r hi ] B = [b lo, b hi ]

Branch-Step: Splitting Sets of Boxes rectangle set [L, R, T, B] [L, R 1, T, B] with R 1 := [r lo, r lo +r hi 2 ] [L, R 2, T, B] with R 2 := [ r lo +r hi +1, r 2 hi ]

Bound-Step: Constructing a Quality Bound We have to construct f upper : { set of boxes } R such that i) f upper (B) max B B f (B), ii) f upper (B) = f (B), if B = {B}. Example: SVM with Linear Bag-of-Features Kernel f (B) = j α j h B, h j h B the histogram of the box B. = j α j k hb k hj k = k hb k w k, for w k = j α jh j k = x i B w c i, c i the cluster ID of the feature x i Example: Upper Bound Set f + (B) = x i B [w i] +, f (B) = x i B [w i]. Set B max := largest box in B, B min := smallest box in B. f upper (B) := f + (B max ) + f (B min ) fulfills i) and ii).

Evaluating the Quality Bound for Linear SVMs f (B) = x i B w i. f upper (B) = x i B max [w i ] + + Evaluating f upper (B) has same complexity as f (B)! Using integral images, this is O(1). x i B min [w i ].

Bound-Step: Constructing a Quality Bound It is easy to construct bounds for Boosted classifiers SVM Logistic regression Nearest neighbor Unsupervised methods... provided we have an appropriate image representation Bag of words Spatial pyramid χ 2 Itemsets... The following require assumptions about the image statistics to implement Template based classifiers Pixel based classifiers

Overview... Object Localization Sliding Window Classifiers Efficient Subwindow Search Results

Results: UIUC Cars Dataset 1050 training images: 550 cars, 500 non-cars 170 test images single scale 139 test images multi scale

Results: UIUC Cars Dataset Evaluation: Precision-Recall curves with different pyramid kernels 1.0 UIUC Cars (single scale) 1.0 UIUC Cars (multi scale) 0.8 bag of words 2x2 pyramid 4x4 pyramid 6x6 pyramid 8x8 pyramid 10x10 pyramid 0.8 bag of words 2x2 pyramid 4x4 pyramid 6x6 pyramid 8x8 pyramid 10x10 pyramid 0.6 0.6 recall recall 0.4 0.4 0.2 0.2 0.0 0.0 0.1 0.2 0.3 0.4 0.5 1-precision 0.0 0.0 0.2 0.4 0.6 0.8 1.0 1-precision

Results: UIUC Cars Dataset Evaluation: Error Rate where precision equals recall method \data set single scale multi scale 10 10 spatial pyramid kernel 1.5 % 1.4 % 4 4 spatial pyramid kernel 1.5 % 7.9 % bag-of-visual-words kernel 10.0 % 71.2 % Agarwal et al. [2002,2004] 23.5 % 60.4 % Fergus et al. [2003] 11.5 % Leibe et al. [2007] 2.5 % 5.0% Fritz et al. [2005] 11.4 % 12.2% Mutch/Lowe [2006] 0.04 % 9.4% UIUC Car Localization, previous best vs. our results

Results: PASCAL VOC 2007 challenge We participated in the PASCAL Challenge on Visual Object Categorization (VOC) 2007: most challenging and competitive evaluation to date training: 5,000 labeled images task: 5,000 new images, predict locations for 20 object classes aeroplane, bird, bicycle, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tv/monitor natural images, downloaded from Flickr, realistic scenes high intra-class variance

Results: PASCAL VOC 2007 challenge Results: High localization quality: first place in 5 of 20 categories. High speed: 40ms per image (excl. feature extraction) Example detections on VOC 2007 dog.

Results: PASCAL VOC 2007 challenge Results: High localization quality: first place in 5 of 20 categories. High speed: 40ms per image (excl. feature extraction) Precision Recall curves on VOC 2007 cat (left) and dog (right).

Results: Prediction Speed on VOC2006

Extensions Branch-and-bound localization allows efficient extensions: Multi-Class Object Localization: (B, C) opt = argmax B B, C C finds best object class C C. fi C (B) Localized retrieval from image databases or videos (I, B) opt = argmax f I (B) B B, I D find best image I in database D. Runtime is sublinear in C and D. Nearest Neighbor query for Red Wings Logo in 10,000 video keyframes in Ferris Buellers Day Off

Summary For a 640 480 pixel image, there are over 10 billion possible regions to evaluate Sliding window approaches trade off runtime vs. accuracy scale aspect ratio grid size Efficient subwindow search finds the maximum that would be found by an exhaustive search efficiency accuracy flexibile just need to come up with a bound Source code is available online

Outlook: Learning to Localize Objects Sucessful Sliding Window Localization has two key components: Efficiency of classifier evaluation this talk Training a discriminant suited to localization talk at ECCV 2008 Learning to Localize Objects with Structured Output Regression