Local Features based Object Categories and Object Instances Recognition

Eric Nowak, Ph.D. thesis defense, 17th of March, 2008

Thesis in Computer Vision
Computer vision is the science and technology of machines that see and think (Wikipedia).
Object recognition is the subfield of computer vision whose goal is to recognize objects from image data.
[Figure: Algorithm -> BIKE]

What we want to achieve: Local Features based Object Categories and Object Instances Recognition
[Figures: example images, each labeled YES or NO.]

What we want to achieve
Different tasks: categorization (answer: YES), localization, segmentation

Applications of Object Recognition
Look for all faces or for Mr. Smith's face in my personal collection of 20,000 pictures
Send an alert automatically when any tank or an AMX-30B2 is seen by a surveillance camera
Drive autonomous vehicles: detect pedestrians, other vehicles, street tracks, ...
Look for bike images on the Internet

Difficulties of object recognition
Difficulty = signal variation
[Figure: a reference object compared with three test images.]
Same pixels, same object: YES
Different pixels, different objects: NO
Different pixels, same object: YES

Difficulties of object recognition
View point change: orientation, translation, scale
Illumination modification: global brightness and contrast, light sources, shadows, ...
Clutter
Occlusions
Intra-class variations (specific to object categories, not instances)
All together: view point variations, illumination variations, clutter, occlusions, intra-class variations
=> Use local representations for local invariance

To Re-cognize
From Old French reconoistre and from Latin recognoscere: to know again
Two-step process:
  Knowledge acquisition: what characterizes or discriminates such and such object?
    Provided by an expert
    Inferred from a data set (machine learning)
  Knowledge usage: is what I see similar to what I learned?

State of the art
Object instance recognition
  Geometric alignment: Fischler & Bolles 81, Goad 83, Lowe 87, Stockman 87
  Global methods: Turk & Pentland 91, Murase & Nayar 95, Pontil & Verri 98, Schiele & Crowley 2000
  Local methods: Schmid & Mohr 97, Lowe 99
Object class recognition
  Global methods: idem above
  Local methods
    Rigid geometry: Schneiderman & Kanade 00, Viola & Jones 01, Torralba et al 04, Dalal & Triggs 05
    No geometry: Csurka et al 04, Opelt et al 04, Zhang et al 05
    Flexible geometry: Fischler & Elschlager 73 (!!), Burl et al 98, Weber et al 00, Agarwal & Roth 02, Leibe & Schiele 03, Fergus et al 03, Berg et al 05, Felzenszwalb & Huttenlocher 05, Fei-Fei et al 06
Triumph of local methods!
But how to select, describe and combine local regions to make a decision?

Overview of our work
ECCV06, DGA06, VS05, CVPR07 (detailed presentation)

The Bag of Words Algorithm ECCV06

The Bag of Words Algorithm (Csurka et al 2004)
[Figure (credit: Li Fei-Fei): each object is described as a bag of words over a visual vocabulary, e.g. X = [ 0.1 0.4 0.1 0.4 ] and X = [ 0.7 0.1 0.1 0.1 ].]

The Bag of Words Algorithm
Issues:
  Local regions sampling (sampler, min scale, number of samples, ...)
  Local regions description (normalization, descriptor, ...)
  Codebook computation (size, algorithm, specificity, ...)
  Image description (encoding [1-NN vs. threshold], normalization, ...)
  Image classification (discriminative / generative, kernel, ...)
Many claims in the literature, but the settings are not comparable!
Goal: study the relative importance of the different steps
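The image description step can be illustrated with a minimal sketch, assuming 1-NN encoding (each local descriptor votes for its nearest codeword) followed by L1 normalization of the histogram. The function names, the toy 2-D descriptors and the 4-word codebook are hypothetical and only serve the illustration; real descriptors such as SIFT have many more dimensions.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to `descriptor` (squared L2 distance)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors, codebook):
    """Encode an image as an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: 4 codewords and 5 local descriptors, all 2-D.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.9, 0.8]]
histogram = bag_of_words(descriptors, codebook)
print(histogram)
```

The resulting vector has one entry per codeword and sums to 1, like the X vectors shown on the previous slide. Every issue listed above (sampler, descriptor, codebook size, encoding, normalization) changes one piece of this pipeline.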

The Bag of Words Algorithm: datasets
Graz01: 667 images, 3 categories + background: bikes, cars, humans
Xerox7: 1776 images, 7 categories [bike, book, building, ...]
Pascal VOC 2005: 1373 images, 4 categories + background: cars, bikes, motorbikes, humans
KTH-TIPS: 810 images, 10 categories
UIUCTex: 1000 images, 25 categories
Brodatz: 112 images, 112 categories
[Figure: sample images from Graz01 and KTH-TIPS.]

The Bag of Words Algorithm
Performance = average of the mean multiclass accuracy over 6 runs of 2-fold cross-validation
[Plot: multiclass accuracy of SIFT vs. gray-level descriptors on the Pascal 05 dataset.]
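The performance measure can be sketched as follows. This is an illustration under one stated assumption: "mean multiclass accuracy" is read here as the average of the per-class accuracies, so that every class weighs the same regardless of its size. The slides do not spell out the definition, and the function names and toy labels are hypothetical.

```python
def mean_multiclass_accuracy(true_labels, predicted_labels):
    """Average of the per-class accuracies (assumed reading of the measure)."""
    per_class = []
    for c in sorted(set(true_labels)):
        indices = [i for i, t in enumerate(true_labels) if t == c]
        correct = sum(1 for i in indices if predicted_labels[i] == c)
        per_class.append(correct / len(indices))
    return sum(per_class) / len(per_class)


def overall_performance(fold_results):
    """Average the score over every (run, fold) pair.

    `fold_results` is a list of (true_labels, predicted_labels) tuples,
    e.g. 6 runs x 2 folds = 12 entries.
    """
    scores = [mean_multiclass_accuracy(t, p) for t, p in fold_results]
    return sum(scores) / len(scores)


# Hypothetical toy results for the two folds of a single run.
example = [
    (["bike", "bike", "car", "car"], ["bike", "car", "car", "car"]),
    (["bike", "car", "car", "car"], ["bike", "car", "car", "bike"]),
]
score = overall_performance(example)
print(score)
```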

The Bag of Words Algorithm: our contributions
Measure of the relative importance of algorithmic choices
Most important parameter: number of regions => random sampling (Rand)
  Few regions: HL (Harris-Laplace), LoG (Laplacian of Gaussian) >> Rand
  Many regions: Rand >> HL, LoG
  HL, LoG: suited to matching; Rand: suited to categorization
Other parameters (little influence alone, but they make the difference when combined):
  Codebook size
  Codebook source
  Codebook construction algorithm
  Non-linear SVM kernel
  Minimum sampling scale
  Histogram normalization
Best accuracy on Pascal 2005

Bag of Words and Infrared Images DGA06

Bag of Words and Infrared Images
Infrared specificities:
  Noise
  Low resolution images
  Distance, humidity
  Intra-class variations due to local warming
  + Target detection
Do the previous conclusions hold? [sampling, codebook size, ...]
Which operating conditions matter more? [distance, noise, ...]

Bag of Words and Infrared Images
7 objects [tanks, light fighting vehicles, light transport vehicles], 10,000 images
Hybrid dataset:
  Zoom, histogram, noise modification
  Inner contrast, outer contrast, atmosphere transmission
Joint work with DGA
Parameters studied (learn on A, B, C; test on D):
  Codebook size
  Feature selection methods
  Standard patch size
  Sampling size [max size, min size, offset, ...]
  Activation values for codeword assignment
  Number of training images required

Bag of Words and Infrared Images
Operation parameters:
  Camera / target distance
  Atmosphere transmission
  Occlusion rate
  RSC (target to background contrast ratio)
  RSS (target contrast)
  ROI extraction precision
  Class grouping
Two situations:
  Already seen: same data distribution
  Never seen: learn on easy data, test on difficult data
No figures: the results are classified

Bag of Words and Infrared Images: our contributions
Conclusions about algorithms:
  Same behavior as on state-of-the-art datasets of visible images
  All the parameters can be tuned from a validation set
Conclusions about operation conditions:
  In-depth analysis for DGA
  Already seen: good, harder with occlusions
  Never seen: good, harder with occlusions or clutter + bad target extraction

Time / Accuracy trade-off VS05

Time / Accuracy trade-off
Problem: too many local regions and codewords
A solution: feature selection (interest points are not a solution for infrared images)
A better solution: feature selection for Hierarchical Binary Classifiers (HBC)

Time / Accuracy trade-off
2 datasets: infrared images [1000 images of vehicles, 1500 images of background] + Xerox7 [350 images, 400 background]
Fewer patches => faster computation

Time / Accuracy trade-off: our contributions
A feature selection scheme adapted to Hierarchical Binary Classifiers
It performs much better with very few features, as well with few features, and worse with all the features

Learning Visual Similarity Measures for Comparing Never Seen Objects CVPR07

Motivation

Our Goal
Computing the visual similarity of two never seen objects
Based on training pairs labeled Same or Different (equivalence constraints)
Despite occlusions and changes in pose, light and background

Equivalence Constraints?
[Figure: images of Car A (class A) and Car B (class B), grouped into pairs labeled Same or Different.]

Equivalence Constraints
Less informative than class labels: "car model X" and "car model Y" vs. "same / different car model"
Cheaper to obtain, e.g. when the space of class labels is too large
Deal with new objects:
  Which model? CANNOT answer
  Same or Different? CAN answer

How to be robust to variabilities?
Consider local representations
Get corresponding patch pairs

Vocabulary for Local Representations
Text: vocabulary of words (car, wheel, glass, motor, ...)
Image: vocabulary of visual words
Image pair: vocabulary of visual differences
How do the patches differ? => Characterize local differences

Characterizing local differences (Ferencz et al, ICCV 05)
[Figure: corresponding patch pairs with distances d1, d2, d3.]
D(I1, I2) = f(d1, d2, d3)
d1, d2, d3: weak characterization of the differences

Characterizing local differences: our approach
[Figure: the patch pair space (ND) is quantized; each cell is a characteristic difference.]
Much more information than a simple distance
How to compute this quantization?

Patch pair quantization algorithm
[Figure: a decision tree whose nodes carry thresholds (Thr = 0.19, Thr = 0.03, Thr = 0.08).]
Node test on the 2 SIFT descriptors of a patch pair: are both larger than 0.19?
  False: left child
  True: right child
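The node test can be sketched as a small tree walk. This is an illustration, assuming each node compares one dimension of both SIFT descriptors with the node's threshold; the dictionary layout, the function names and the toy 3-D descriptors are hypothetical (real SIFT descriptors have 128 dimensions), and only the thresholds are taken from the slide.

```python
def both_above(descriptor_a, descriptor_b, dimension, threshold):
    """Node test: do both descriptors exceed `threshold` at `dimension`?"""
    return descriptor_a[dimension] > threshold and descriptor_b[dimension] > threshold


def find_leaf(node, descriptor_a, descriptor_b):
    """Walk down a tree: False goes to the left child, True to the right child."""
    while "leaf" not in node:
        go_right = both_above(descriptor_a, descriptor_b,
                              node["dimension"], node["threshold"])
        node = node["right"] if go_right else node["left"]
    return node["leaf"]


# Hypothetical tree with the three thresholds shown on the slide.
tree = {
    "dimension": 0, "threshold": 0.19,
    "left": {"dimension": 1, "threshold": 0.03,
             "left": {"leaf": 0}, "right": {"leaf": 1}},
    "right": {"dimension": 2, "threshold": 0.08,
              "left": {"leaf": 2}, "right": {"leaf": 3}},
}

# Toy descriptors of the two patches of one pair.
sift_a = [0.25, 0.01, 0.10]
sift_b = [0.30, 0.05, 0.12]
leaf = find_leaf(tree, sift_a, sift_b)
print(leaf)
```

The index of the leaf reached by a patch pair plays the role of its cluster label, that is, its characteristic difference.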

Patch pair quantization algorithm
Patch pair space (ND)
Quantizer / clusterer: defined by the trees
Cluster centers (characteristic differences): defined by the leaves

How to learn the trees?
Classical decision trees
  For each node, select the best feature [which SIFT dimension] and the best threshold
Extremely Randomized Decision Trees (Geurts 06)
  Ensemble of decision trees + combination rule
  Each node is suboptimal
  Variance is small
  Fast to learn
  Good for clustering (Moosmann, Triggs and Jurie 06)
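The difference with classical trees can be sketched for a single node. This is an illustration under assumptions, not the exact procedure of the thesis: a few (dimension, threshold) tests are drawn at random and the one with the highest information gain on the Same / Different labels is kept, instead of searching all dimensions and thresholds. The scoring function, the number of candidates and the toy data are hypothetical.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * math.log2(p)
    return result


def information_gain(labels, go_right):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left = [l for l, r in zip(labels, go_right) if not r]
    right = [l for l, r in zip(labels, go_right) if r]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def random_split(pairs, labels, n_candidates, rng):
    """Draw `n_candidates` random (dimension, threshold) tests and keep the best one."""
    n_dims = len(pairs[0][0])
    best = None
    for _ in range(n_candidates):
        dimension = rng.randrange(n_dims)
        values = [d[dimension] for pair in pairs for d in pair]
        threshold = rng.uniform(min(values), max(values))
        # Same node test as in the quantization tree: both descriptors above the threshold.
        mask = [a[dimension] > threshold and b[dimension] > threshold for a, b in pairs]
        gain = information_gain(labels, mask)
        if best is None or gain > best[0]:
            best = (gain, dimension, threshold)
    return best


# Hypothetical toy patch pairs (2-D descriptors) with equivalence labels.
pairs = [([0.9, 0.1], [0.8, 0.2]), ([0.7, 0.3], [0.9, 0.2]),
         ([0.9, 0.1], [0.1, 0.9]), ([0.2, 0.8], [0.8, 0.1])]
labels = ["same", "same", "different", "different"]
gain, dimension, threshold = random_split(pairs, labels, n_candidates=10,
                                          rng=random.Random(0))
print(gain, dimension, threshold)
```

Because only a handful of random tests are scored, each node is suboptimal but cheap to learn, which is the trade-off listed above.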

An image pair descriptor
a) Sample corresponding patch pairs
b) Cluster them with the forest
c) Update a global image pair descriptor

Similarity Measure Computation
Image pair descriptors (examples): X = [ 1 0 0 1 1 0 1 ], X = [ 0 1 0 0 1 0 0 ]
Our goal: S(I1, I2) = S(x) = ω^T x
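A minimal sketch of the weighted sum, assuming the image pair descriptor x is binary with one slot per leaf of the forest, set to 1 when at least one sampled patch pair reaches that leaf. The weight values below are hypothetical placeholders, and the function names and the forest layout (one tree with 4 leaves, one with 3) are illustrative only.

```python
def pair_descriptor(leaf_indices_per_tree, leaves_per_tree):
    """Build a binary vector with one slot per leaf of the forest.

    `leaf_indices_per_tree[t]` lists the leaves of tree t reached by the
    sampled patch pairs; a slot is 1 if at least one patch pair reached it.
    """
    x = []
    for reached, n_leaves in zip(leaf_indices_per_tree, leaves_per_tree):
        slots = [0] * n_leaves
        for leaf in reached:
            slots[leaf] = 1
        x.extend(slots)
    return x


def similarity(weights, x):
    """Linear similarity S(x) = w^T x."""
    return sum(w * v for w, v in zip(weights, x))


# Hypothetical forest: tree 0 has 4 leaves, tree 1 has 3 leaves.
x = pair_descriptor([[0, 3, 0], [0, 2]], [4, 3])
weights = [0.5, -1.0, 0.2, 0.8, 0.3, -0.6, 0.4]  # placeholder values
s = similarity(weights, x)
print(x, s)
```

With these toy inputs, x equals the first example vector of the slide, [1, 0, 0, 1, 1, 0, 1]. A higher score would mean "more likely Same", under the assumption that positive weights are attached to leaves typical of Same pairs.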

Datasets
Ferencz et al, cars: distortions, tiny details, crop
Our dataset, toycars: view point, light, background
Jain et al, faces in the news: light, expression, pose, quality, annotation errors
Fleuret et al, COIL 100: full rotation, heterogeneous

Comparison with State of the Art: Equal Error Rate of Precision Recall
[Plots, never seen objects: Ferencz, Toycars; left: Faces; right: COIL 100.]
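As a hedged illustration of the evaluation measure, assuming the "equal error rate of precision recall" is the point of the precision-recall curve where precision equals recall: when the number of pairs accepted as Same equals the number of truly Same pairs, both quantities share the same numerator and denominator, so the value can be read directly from the ranked list. The function name and the toy scores are hypothetical.

```python
def precision_recall_eer(scores, labels):
    """Precision (= recall) when as many pairs are accepted as there are true Same pairs.

    `labels` holds 1 for Same pairs and 0 for Different pairs, and must
    contain at least one Same pair.
    """
    n_positive = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    true_positive = sum(label for _, label in ranked[:n_positive])
    return true_positive / n_positive


# Hypothetical similarity scores for 6 test pairs (1 = Same, 0 = Different).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
eer = precision_recall_eer(scores, labels)
print(eer)
```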

Visualizations
Multidimensional scaling (2D): L2 distance in 2D as close as possible to the pairwise similarity matrix
Below: simple bag of words representation
Next page: our similarity measure

Method Summary
Consider corresponding local regions
Quantize patch pair differences (Extremely Randomized Clustering Forest)
Get a global image pair descriptor
The similarity measure is a weighted sum

Our contributions
A new concept: informative characterization of local differences
Comparison of never seen objects
Very significant improvement over the state of the art
In-depth study of all the parameters of the algorithm

Conclusions
Summary of the contributions: ECCV06, DGA06, VS05, CVPR07

Open issues
What is a good local representation?
How to integrate geometry (and improve the results)?
Sampling: how to bias the random detector (but not too much)?
What is the role of the context? When is it more useful? How to use it?
How to integrate physical object constraints (local warming) into statistical models?
Should the same algorithm be used for all the problems?
  HOG for large categories [some geometry, no details]
  BoW for class recognition [no geometry, texture]
  kAS for class recognition [geometry, no texture]
  Similarity measure for instance recognition [fine differences between corresponding local regions]
How do the algorithms scale with respect to the number of classes?
  10,000!
  Overlapping classes
  Unsupervised learning

Thank you for your attention