Local Features based Object Categories and Object Instances Recognition


Local Features based Object Categories and Object Instances Recognition. Eric Nowak, Ph.D. thesis defense, 17th of March, 2008.

Thesis in Computer Vision. Computer vision is "the science and technology of machines that see and think" (Wikipedia). Object recognition is the subfield of computer vision whose goal is to recognize objects from image data. (Figure: an image goes through an algorithm that outputs the label BIKE.)

What we want to achieve: local features based recognition of object categories and object instances. (Figures: example images labeled YES or NO.)

What we want to achieve. Different tasks: categorization (YES/NO answer), localization, segmentation.

Applications of Object Recognition
- Look for all faces, or for Mr. Smith's face, in my personal collection of 20,000 pictures.
- Send an alert automatically when any tank, or an AMX-30B2, is seen by a surveillance camera.
- Drive autonomous vehicles: detect pedestrians, other vehicles, street tracks, etc.
- Look for bike images on the Internet.

Difficulties of object recognition. Compared with a reference object:
- Same pixels, same object: YES.
- Different pixels, different objects: NO.
- Different pixels, same object: YES.
Difficulty = signal variation.

Difficulties of object recognition: sources of variation
- Viewpoint change: orientation, translation, scale.
- Illumination modification: global brightness and contrast, light sources, shadows, etc.
- Clutter.
- Occlusions.
- Intra-class variations (specific to object categories, not instances).

All together: viewpoint variations, illumination variations, clutter, occlusions, intra-class variations. Use local representations for local invariance.

To Re-cognize: from Old French reconoistre and from Latin recognoscere, "to know again". A two-step process:
1. Knowledge acquisition: what characterizes or discriminates a given object? This knowledge is either provided by an expert or inferred from a data set (machine learning).
2. Knowledge usage: is what I see similar to what I learned?

State of the art

Object instance recognition
- Geometric alignment: Fischler & Bolles 81, Goad 83, Lowe 87, Stockman 87
- Global methods: Turk & Pentland 91, Murase & Nayar 95, Pontil & Verri 98, Schiele & Crowley 2000
- Local methods: Schmid & Mohr 97, Lowe 99

Object class recognition (triumph of local methods!)
- Global methods: same as above
- Local methods, rigid geometry: Schneiderman & Kanade 00, Viola & Jones 01, Torralba et al 04, Dalal & Triggs 05
- Local methods, no geometry: Csurka et al 04, Opelt et al 04, Zhang et al 05
- Local methods, flexible geometry: Fischler & Elschlager 73 (!!), Burl et al 98, Weber et al 00, Agarwal & Roth 02, Leibe & Schiele 03, Fergus et al 03, Berg et al 05, Felzenszwalb & Huttenlocher 05, Fei-Fei et al 06

But how should local regions be selected, described and combined to take a decision?

Overview of our work: ECCV06, DGA06, VS05, CVPR07 (detailed presentation).

The Bag of Words Algorithm ECCV06

The Bag of Words Algorithm (Csurka et al 2004). An object is described as a bag of words over a visual vocabulary, e.g. X = [0.1 0.4 0.1 0.4] for one image and X = [0.7 0.1 0.1 0.1] for another. (Figure credit: Li Fei-Fei.)

The Bag of Words Algorithm: issues
- Local region sampling (sampler, minimum scale, number of samples, ...)
- Local region description (normalization, descriptor, ...)
- Codebook computation (size, algorithm, specificity, ...)
- Image description (encoding [1-NN vs. threshold], normalization, ...)
- Image classification (discriminative/generative, kernel, ...)
Many claims in the literature, but the settings are not comparable! Goal: study the relative importance of the different steps.
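The image description step can be illustrated with a minimal sketch. It assumes 1-NN encoding with squared Euclidean distance and L1 normalization, which is only one of the settings the slide lists. The function names `nearest_codeword` and `bag_of_words`, the toy 2-D descriptors, and the 4-word codebook are hypothetical and chosen for illustration only.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (1-NN, squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_words(descriptors, codebook):
    """L1-normalized histogram of codeword occurrences for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumption): 2-D descriptors and a 4-word visual vocabulary.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
descriptors = (
    [(0.1, 0.1)]            # 1 region near word 0
    + [(0.9, 0.1)] * 4      # 4 regions near word 1
    + [(0.1, 0.9)]          # 1 region near word 2
    + [(0.9, 0.9)] * 4      # 4 regions near word 3
)
histogram = bag_of_words(descriptors, codebook)
```

With real data the descriptors would be SIFT or gray-level vectors and the codebook would come from a clustering step; both are outside the scope of this sketch.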

The Bag of Words Algorithm: datasets
- Graz01: 667 images, 3 categories + background (bikes, cars, humans)
- Xerox7: 1776 images, 7 categories (bike, book, building, ...)
- Pascal VOC 2005: 1373 images, 4 categories + background (cars, bikes, motorbikes, humans)
- KTH-TIPS: 810 images, 10 categories
- UIUCTex: 1000 images, 25 categories
- Brodatz: 112 images, 112 categories
(Figures: sample images from Graz01 and KTH-TIPS.)

The Bag of Words Algorithm: evaluation. Performance = average of the mean multiclass accuracy over 6 runs of 2-fold cross-validation. (Figure: multiclass accuracy of SIFT vs. gray-level descriptors on the Pascal 05 dataset.)
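The evaluation protocol can be sketched as follows. Two assumptions are made here: "mean multiclass accuracy" is read as the average of per-class accuracies, and the classifier is passed in as a hypothetical callback `train_and_predict(train_x, train_y, test_x)` returning predicted labels. The slides specify neither point, so this is an illustration rather than the protocol actually used.

```python
import random
from collections import defaultdict


def mean_multiclass_accuracy(y_true, y_pred):
    """Average of per-class accuracies (assumed reading of the slide)."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def repeated_two_fold(samples, labels, train_and_predict, runs=6, seed=0):
    """Average score over `runs` random 2-fold splits (2 scores per run)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)
        half = len(indices) // 2
        folds = (indices[:half], indices[half:])
        # Each half serves once as training set and once as test set.
        for train, test in (folds, folds[::-1]):
            predicted = train_and_predict(
                [samples[i] for i in train],
                [labels[i] for i in train],
                [samples[i] for i in test],
            )
            scores.append(
                mean_multiclass_accuracy([labels[i] for i in test], predicted)
            )
    return sum(scores) / len(scores)


# Toy usage (assumption): 1-D samples and a fixed threshold "classifier".
toy_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
toy_score = repeated_two_fold(
    toy_x, toy_y, lambda tx, ty, qx: ["a" if v < 0.5 else "b" for v in qx]
)
```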

The Bag of Words Algorithm: our contributions
- Measure of the relative importance of algorithmic choices.
- Most important parameter: the number of regions, hence random sampling (Rand). With few regions, Harris-Laplace (HL) and Laplacian of Gaussian (LoG) >> Rand; with many regions, Rand >> HL and LoG. HL and LoG suit matching; Rand suits categorization.
- Other parameters (little influence alone, but they make the difference when combined): codebook size, codebook source, codebook construction algorithm, non-linear SVM kernel, minimum sampling scale, histogram normalization.
- Best accuracy on Pascal 2005.

Bag of Words and Infrared Images DGA06

Bag of Words and Infrared Images. Infrared specificities: noise; low-resolution images; distance, humidity; intra-class variations due to local warming; plus target detection. Questions: do the previous conclusions hold (sampling, codebook size, ...)? Which operation conditions matter more (distance, noise, ...)?

Bag of Words and Infrared Images: dataset and parameters. 7 objects (tanks, light fighting vehicles, light transport vehicles), 10,000 images. Hybrid dataset: zoom, histogram and noise modification; inner contrast, outer contrast, atmosphere transmission. Joint work with DGA.
Parameters studied (learn on A, B, C; test on D):
- Codebook size
- Feature selection methods
- Standard patch size
- Sampling size (max size, min size, offset, ...)
- Activation values for codeword assignment
- Number of training images required

Bag of Words and Infrared Images: operation parameters
- Camera / target distance
- Atmosphere transmission
- Occlusion rate
- RSC (target-to-background contrast ratio)
- RSS (target contrast)
- ROI extraction precision
- Class grouping
Two situations: "already seen" (same data distribution) and "never seen" (learn on easy data, test on difficult data). No figures are shown: the results are classified.

Bag of Words and Infrared Images: our contributions
- Conclusions about algorithms: same behavior as on the visible-light state-of-the-art datasets; all the parameters can be tuned from a validation set.
- Conclusions about operation conditions (in-depth analysis for DGA): "already seen" is good, harder with occlusions; "never seen" is good, harder with occlusions or with clutter + bad target extraction.

Time / Accuracy trade-off VS05

Time / Accuracy trade-off. Problem: too many local regions and codewords. A solution: feature selection (interest points are not a solution for infrared images). A better solution: feature selection for Hierarchical Binary Classifiers (HBC).

Time / Accuracy trade-off. Two datasets: infrared images (1000 vehicle images, 1500 background images) and Xerox7 (350 images, 400 background images). Fewer patches => faster computation.

Time / Accuracy trade-off: our contributions. A feature selection scheme adapted to Hierarchical Binary Classifiers. It performs much better with very few features, as well with few features, and worse with all the features.

Learning Visual Similarity Measures for Comparing Never Seen Objects CVPR07

Motivation

Our Goal. Compute the visual similarity of two never seen objects, based on training pairs labeled Same or Different (equivalence constraints), despite occlusions and changes in pose, light and background.

Equivalence Constraints? (Figure: pairs of car images labeled Same or Different, contrasted with class labels: Class A for images of Car A, Class B for images of Car B.)

Equivalence Constraints
- Less informative than class labels: "car model X and car model Y" vs. "same/different car model".
- Cheaper to obtain, e.g. when the space of class labels is too large.
- Deal with new objects: "Which model?" cannot be answered; "Same or Different?" can be answered.

How to be robust to variabilities? Consider local representations. Get corresponding patch pairs.

Vocabulary for Local Representations
- Text: vocabulary of words (car, wheel, glass, motor, ...)
- Image: vocabulary of visual words
- Image pair: vocabulary of visual differences
HOW do the patches differ? => Characterize local differences.

Characterizing local differences (Ferencz et al, ICCV 05): $D(I_1, I_2) = f(d_1, d_2, d_3)$, where $d_1, d_2, d_3$ give only a weak characterization of the differences.

Characterizing local differences: our approach. Each patch pair is mapped to a characteristic difference by quantizing the patch pair space (N-dimensional). This carries much more information than a simple distance. How to compute this quantization?

Patch pair quantization algorithm. Input: the 2 SIFT descriptors of a patch pair. Each tree node holds a threshold (Thr=0.19, Thr=0.03, Thr=0.08 in the figure) and asks one question, e.g. "Are both descriptors larger than 0.19?". False: go to the left child. True: go to the right child.

Patch pair quantization algorithm. The quantizer / clusterer of the patch pair space (N-dimensional) is defined by the trees. The cluster centers (characteristic differences) are defined by the leaves.
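A minimal sketch of how one tree could quantize a patch pair, following the node test described on the slides. The `Node` class, the `quantize` function, the choice of descriptor dimensions, and the 3-dimensional toy descriptors are assumptions made for illustration; real SIFT descriptors have 128 dimensions, and only the three thresholds come from the slide.

```python
class Node:
    """Internal node (dim, thr, left, right) or leaf (leaf_id set)."""

    def __init__(self, dim=0, thr=0.0, left=None, right=None, leaf_id=None):
        self.dim = dim          # descriptor dimension tested at this node
        self.thr = thr          # threshold applied to both descriptors
        self.left = left        # child taken when the test is False
        self.right = right      # child taken when the test is True
        self.leaf_id = leaf_id  # cluster index, set only on leaves


def quantize(node, d1, d2):
    """Drop a patch pair (two descriptors) down the tree; return the leaf index."""
    while node.leaf_id is None:
        both_larger = d1[node.dim] > node.thr and d2[node.dim] > node.thr
        node = node.right if both_larger else node.left
    return node.leaf_id


# Toy tree using the thresholds from the slide; the tested dimensions are made up.
tree = Node(
    dim=0, thr=0.19,
    left=Node(dim=1, thr=0.03, left=Node(leaf_id=0), right=Node(leaf_id=1)),
    right=Node(dim=2, thr=0.08, left=Node(leaf_id=2), right=Node(leaf_id=3)),
)

leaf_a = quantize(tree, (0.30, 0.00, 0.10), (0.25, 0.50, 0.20))
leaf_b = quantize(tree, (0.30, 0.05, 0.00), (0.10, 0.04, 0.00))
```

Each leaf index plays the role of one "characteristic difference"; a forest would run the same traversal in every tree.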

How to learn the trees?
- Classical decision trees: for each node, select the best feature (which SIFT dimension) and the best threshold.
- Extremely Randomized Decision Trees (Geurts 06): ensemble of decision trees + combination rule. Each node is suboptimal, variance is small, fast to learn, good for clustering (Moosmann, Triggs and Jurie 06).
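The difference from classical trees can be sketched for a single node: instead of searching all features and thresholds, a few random (dimension, threshold) candidates are drawn and the best one is kept. This sketch assumes the candidates are scored by information gain on the Same/Different labels; that criterion, the function names, and the toy data are illustrative assumptions, not the exact procedure of the thesis.

```python
import math
import random


def entropy(labels):
    """Binary entropy of a list of 0/1 labels (1 = Same, 0 = Different)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def info_gain(pairs, labels, dim, thr):
    """Gain of the node test 'both descriptors larger than thr on dimension dim'."""
    right = [y for (a, b), y in zip(pairs, labels) if a[dim] > thr and b[dim] > thr]
    left = [y for (a, b), y in zip(pairs, labels) if not (a[dim] > thr and b[dim] > thr)]
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def pick_random_test(pairs, labels, n_dims, n_candidates=10, rng=None):
    """Draw random (dim, thr) candidates and keep the one with the highest gain."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_candidates):
        dim = rng.randrange(n_dims)
        values = [d[dim] for pair in pairs for d in pair]
        thr = rng.uniform(min(values), max(values))
        gain = info_gain(pairs, labels, dim, thr)
        if best is None or gain > best[0]:
            best = (gain, dim, thr)
    return best  # (gain, dim, thr)


# Toy patch pairs (assumption): 1-D descriptors, label 1 = Same, 0 = Different.
toy_pairs = [((0.9,), (0.8,)), ((0.9,), (0.1,))]
toy_labels = [1, 0]
gain, dim, thr = pick_random_test(toy_pairs, toy_labels, n_dims=1)
```

Because only a handful of candidates are evaluated per node, each split is suboptimal but cheap, which matches the "fast to learn" point on the slide.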

An image pair descriptor: (a) sample corresponding patch pairs; (b) cluster them with the forest; (c) update a global image pair descriptor.

Similarity Measure Computation. Two example image pair descriptors: x = [1 0 0 1 1 0 1] and x = [0 1 0 0 1 0 0]. Our goal: $S(I_1, I_2) = S(x) = \omega^{\top} x$.
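The last two steps can be sketched together: the leaves reached by the sampled patch pairs set the entries of the image pair descriptor x, and the similarity is the weighted sum of those entries. The leaf indices and the weight values below are made up for illustration, and the sketch does not cover how the weights are learned from the Same/Different training pairs.

```python
def pair_descriptor(leaf_ids, n_leaves):
    """Binary descriptor: entry i is 1 if at least one patch pair reached leaf i."""
    x = [0] * n_leaves
    for leaf in leaf_ids:
        x[leaf] = 1
    return x


def similarity(weights, x):
    """Weighted sum S(x) = w . x over the descriptor entries."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical leaves reached by the patch pairs of one image pair (7 leaves in total).
x = pair_descriptor([0, 3, 4, 6, 3], n_leaves=7)

# Hypothetical weights; positive values vote for Same, negative ones for Different.
weights = [0.5, -1.0, 0.25, 0.25, 0.125, -0.5, 0.5]
score = similarity(weights, x)
```

Here x reproduces the first example vector of the slide, [1 0 0 1 1 0 1]; thresholding the score would give the final Same/Different decision.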

Datasets
- Ferencz et al, cars: distortions, tiny details, crop.
- Our dataset, toycars: viewpoint, light, background.
- Jain et al, faces in the news: light, expression, pose, quality, annotation errors.
- Fleuret et al, COIL 100: full rotation, heterogeneous.

Comparison with State of the Art: equal error rate of precision-recall on never seen objects. (Figures: results on Ferencz and Toycars; left: Faces; right: COIL 100.)

Visualizations. Multidimensional scaling (2D): the L2 distances in 2D are as close as possible to the pairwise similarity matrix. (Figures: first the simple bag of words representation, then our similarity measure.)

Method Summary
- Consider corresponding local regions.
- Quantize patch pair differences with an Extremely Randomized Clustering Forest.
- Get a global image pair descriptor.
- The similarity measure is a weighted sum.

Our contributions
- A new concept: the informative characterization of local differences.
- Comparison of never seen objects.
- Very significant improvement over the state of the art.
- In-depth study of all the parameters of the algorithm.

Conclusions. Summary of the contributions: ECCV06, DGA06, VS05, CVPR07.

Open issues
- What is a good local representation?
- How to integrate geometry (and improve the results)?
- Sampling: how to bias the random detector (but not too much)?
- What is the role of the context? When is it more useful? How to use it?
- How to integrate physical object constraints (local warming) into statistical models?
- Should the same algorithm be used for all the problems? HOG for large categories (some geometry, no details); BoW for class recognition (no geometry, texture); kAS for class recognition (geometry, no texture); similarity measure for instance recognition (fine differences between corresponding local regions).
- How do the algorithms scale with respect to the number of classes (10,000!)? Overlapping classes. Unsupervised learning.

Thank you for your attention