Object recognition (part 2)



Object recognition (part 2). CSE P 576. Larry Zitnick (larryz@microsoft.com)


Support Vector Machines
Modified from the slides by Dr. Andrew W. Moore (Nov 23rd, 2001).

Linear Classifiers
The input x passes through the classifier f, with parameters α, to give the estimated label y_est:
f(x,w,b) = sign(w · x - b)
In the figure, one marker denotes the +1 class and the other denotes the -1 class.
How would you classify this data?
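As a rough illustration of the decision rule f(x,w,b) = sign(w · x - b), here is a minimal sketch in Python with NumPy. The function name `linear_classify`, the sample points, and the choice to send points lying exactly on the boundary to +1 are assumptions made for this example, not part of the slides.

```python
import numpy as np

def linear_classify(X, w, b):
    """Label each row of X with +1 or -1 using f(x, w, b) = sign(w . x - b)."""
    scores = X @ w - b
    # Points exactly on the boundary (score 0) are assigned to +1 here;
    # that tie-breaking rule is an arbitrary choice for this sketch.
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2D data and parameters, chosen only for illustration.
X = np.array([[2.0, 2.0], [0.0, 0.0], [3.0, 1.0]])
w = np.array([1.0, 1.0])
b = 3.0
labels = linear_classify(X, w, b)
print(labels)
```

The weight vector w sets the orientation of the boundary and b shifts it along that direction, which is why many different (w, b) pairs can separate the same data.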

Linear Classifiers
f(x,w,b) = sign(w · x - b)
How would you classify this data? Any of these would be fine... but which is best?

Classifier Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).
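The margin definition above can be made concrete with a small sketch. It assumes the boundary is widened symmetrically, so the margin is twice the distance from the boundary to the closest datapoint; the function name `margin_width` and the toy data are hypothetical.

```python
import numpy as np

def margin_width(X, y, w, b):
    """Width the boundary w . x - b = 0 can grow to before touching a datapoint.

    Returns 0.0 if the boundary misclassifies (or touches) any point.
    """
    # Signed distance of each point to the boundary, positive when the
    # point lies on the side that matches its label.
    signed = y * (X @ w - b) / np.linalg.norm(w)
    if np.any(signed <= 0):
        return 0.0
    return 2.0 * float(signed.min())

# Two candidate boundaries for the same toy data (values are illustrative).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y = np.array([-1, -1, 1, 1])
narrow = margin_width(X, y, np.array([1.0, 0.0]), 0.5)
wide = margin_width(X, y, np.array([1.0, 0.0]), 1.5)
print(narrow, wide)
```

Both boundaries classify the data correctly, but the second sits midway between the classes and has the larger margin, which is the one a maximum margin classifier would prefer.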


Maximum Margin
Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Nonlinear Kernel (I)
Nonlinear Kernel (II)

Caltech-101: Drawbacks
- Smallest category size is 31 images: N_train ≤ 30
- Too easy? Left-right aligned; rotation artifacts
- Soon will saturate performance


Antonio Torralba generated these average images of the Caltech 101 categories (5/23/2011).
Jump to Nicolas Pinto's slides (page 29).

Objects in Context
R. Gokberk Cinbis
MIT Object Recognition and Scene Understanding

Papers
- A. Torralba. Contextual priming for object detection. IJCV 2003.
- A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora and S. Belongie. Objects in Context. ICCV 2007.

Object Detection: Probabilistic Framework (Single Object Likelihood)
Object presence at a particular location/scale, given all image features (local/object and scene/context): v = v_Local + v_Contextual

Contextual Reasoning
- Scene centered and object centered
- 2D reasoning and 2.5D / 3D reasoning (surface orientations w.r.t. camera: sky, vertical, support)
- Examples: Contextual priming for object detection; Objects in Context; Geometric context from a single image

Preview: Contextual Priming for Object Detection
Input test image
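One way to make the split into local and contextual features explicit is to apply Bayes' rule to the object likelihood. The notation here is an assumption for illustration: O stands for the object properties (presence, location, scale), and v_L and v_C abbreviate v_Local and v_Contextual.

```latex
P(O \mid v) = P(O \mid v_L, v_C)
            = \frac{P(v_L \mid O, v_C)\, P(O \mid v_C)}{P(v_L \mid v_C)}
```

Read this way, the factor P(O | v_C) plays the role of a context-driven prior on the object, while P(v_L | O, v_C) is the local evidence that a conventional detector would score.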


Preview: Contextual Priming for Object Detection
- Using previously collected statistics about filter output, predict information about objects.
- Correlate with many filters.
- Predict information about objects:
  - Which objects do I expect to see? (people, chair, car)
  - Where can I find the objects easily?
  - How large do I expect the objects to be?

Contextual Priming for Object Detection: Probabilistic Framework
- Local measurements (a lot in the literature)
- Contextual features


Contextual Priming for Object Detection: Contextual Features
- Gabor filters at 4 scales and 6 orientations
- Use PCA on filter output images to reduce the number of features (< 64)
- Use a mixture of Gaussians to model the probabilities. (Other alternatives include KNN, Parzen window, logistic regression, etc.)

Contextual Priming for Object Detection: Object Priming Results
(o_1 = people, o_2 = furniture, o_3 = vehicles and o_4 = trees)

Contextual Priming for Object Detection: Focus of Attention Results

Contextual Priming for Object Detection: Conclusions
- Proves the relation between low-level features and scene/context
- Can be seen as computational evidence for the (possible) existence of low-level-feature-based biological attention mechanisms
- Also a warning: does an object recognition system understand the object, or does it work from lots of background features?
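The contextual feature pipeline above (a Gabor filter bank at 4 scales and 6 orientations, followed by PCA) can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the wavelengths, kernel size, the 4x4 grid pooling, the random stand-in images, and every function name are hypothetical, and the mixture-of-Gaussians step is left out.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor kernel: a Gaussian envelope times an oriented cosine."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)

def gabor_bank(n_scales=4, n_orientations=6, size=15):
    """Bank of n_scales * n_orientations kernels (parameter values are assumptions)."""
    bank = []
    for s in range(n_scales):
        wavelength = 3.0 * 2.0 ** s
        sigma = 0.5 * wavelength
        for k in range(n_orientations):
            theta = np.pi * k / n_orientations
            bank.append(gabor_kernel(size, wavelength, theta, sigma))
    return bank

def filter_features(image, bank, grid=4):
    """Mean response magnitude per grid cell, concatenated over all filters."""
    h, w = image.shape
    image_fft = np.fft.fft2(image)
    feats = []
    for kernel in bank:
        # Circular convolution through the FFT; the kernel is zero-padded.
        response = np.abs(np.fft.ifft2(image_fft * np.fft.fft2(kernel, s=image.shape)))
        cropped = response[:h - h % grid, :w - w % grid]
        cells = cropped.reshape(grid, h // grid, grid, w // grid)
        feats.append(cells.mean(axis=(1, 3)).ravel())
    return np.concatenate(feats)

def pca_project(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
images = rng.random((10, 32, 32))   # random stand-ins for real images
bank = gabor_bank()
features = np.stack([filter_features(img, bank) for img in images])
reduced = pca_project(features, n_components=8)
print(len(bank), features.shape, reduced.shape)
```

With only 10 stand-in images the sketch keeps 8 components; on a real image collection the same projection would be run with a component count below 64, as the slide states.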

Preview: Objects in Context
1. Input test image.
2. Do segmentation on the image.
3. Do classification (find label probabilities) in each segment using only local information. Example candidate labels per segment: {building, boat, person}, {water, sky}, {road}, {building, boat, motorbike}.
4. Find the most consistent labeling according to object co-occurrences and local label probabilities. Example result: building, water, road, boat.
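The last step, picking the labeling that best agrees with both the local label probabilities and the object co-occurrences, can be illustrated with a toy brute-force search. This is only a sketch: it scores a labeling as the sum of log local probabilities plus pairwise co-occurrence terms and enumerates every labeling, which stands in for proper CRF inference. The labels, probabilities, and co-occurrence values are invented for the example.

```python
import itertools
import numpy as np

def best_labeling(local_probs, cooccurrence, labels):
    """Exhaustively search for the labeling with the highest combined score.

    local_probs[s][a] is the local probability of label a for segment s;
    cooccurrence[a, b] is a symmetric interaction score for labels a and b.
    """
    n_segments = len(local_probs)
    best, best_score = None, -np.inf
    for assignment in itertools.product(range(len(labels)), repeat=n_segments):
        score = sum(np.log(local_probs[s][a]) for s, a in enumerate(assignment))
        score += sum(cooccurrence[a, b]
                     for a, b in itertools.combinations(assignment, 2))
        if score > best_score:
            best, best_score = assignment, score
    return [labels[a] for a in best]

labels = ["boat", "car", "water"]
# Segment 0 looks slightly more like a car locally; segment 1 is clearly water.
local_probs = np.array([[0.45, 0.50, 0.05],
                        [0.05, 0.05, 0.90]])
# Hypothetical interaction scores: boat and water go together, car and water do not.
cooccurrence = np.array([[0.0, 0.0, 1.0],
                         [0.0, 0.0, -1.0],
                         [1.0, -1.0, 0.0]])

local_only = [labels[i] for i in local_probs.argmax(axis=1)]
with_context = best_labeling(local_probs, cooccurrence, labels)
print(local_only, with_context)
```

In this toy case the local classifier alone picks "car" for the first segment, and the co-occurrence term flips it to "boat" because the neighbouring segment is water. Exhaustive search grows exponentially with the number of segments, so it is only practical for tiny examples.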


Objects in Context: Local Categorization
- Extract random patches on zero-padded segments
- Calculate SIFT descriptors
- Use BoF. Training:
  - Cluster patches in training (Hier. K-means, K=10x3)
  - Histogram of words in each segment
  - NN classifier (returns a sorted list of categories)
- Each segment is classified independently.

Objects in Context: Contextual Refinement
- Contextual model based on co-occurrences
- Try to find the most consistent labeling with high posterior probability and high mean pairwise interaction (the mean interaction of all label pairs). Use a CRF for this purpose.
- Φ(i,j) is basically the observed label co-occurrences in the training set.

Objects in Context: Learning Context
- Using labeled image datasets (MSRC, PASCAL)
- Using labeled text-based data (Google Sets): contains lists of related items. A large set turns out to be useless! (anything is related)

Objects in Context: Results
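The bag-of-features steps in the local categorization slide (a histogram of visual words per segment, then a nearest-neighbour classifier that returns a sorted list of categories) might look roughly like this. The sketch assumes the vocabulary is already given as a flat array of cluster centres, whereas the slides build it with hierarchical K-means, and it uses 2D toy descriptors in place of SIFT; all names and numbers are hypothetical.

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest visual words for one segment."""
    # Squared distance from every descriptor to every vocabulary word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def rank_categories(query_hist, train_hists, train_labels):
    """Categories sorted by distance of their nearest training histogram."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    ranked = []
    for idx in np.argsort(dists):
        if train_labels[idx] not in ranked:
            ranked.append(train_labels[idx])
    return ranked

# Toy vocabulary of three "visual words" and four descriptors from one segment.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[0.1, 0.2], [9.5, 0.3], [10.2, -0.1], [0.0, 9.0]])
query = bof_histogram(descriptors, vocabulary)

# One training histogram per category, invented for the example.
train_hists = np.array([[0.2, 0.6, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.1, 0.2, 0.7]])
train_labels = ["boat", "water", "road"]
ranking = rank_categories(query, train_hists, train_labels)
print(query, ranking)
```

Returning the whole ranked list rather than a single label matters here, because the contextual refinement step needs alternative labels to switch to when the top local choice conflicts with its neighbours.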


Objects in Context Limitations: Context modeling
Figure columns: segmentation; categorization without context (local information only); categorization with co-occurrence context.
- First example: the result means P(person, horse) > P(person, dog).
- Second example: P(person, dog) > P(person, cow).
- But why? Isn't it only a dataset bias? We have seen in the previous example that P(person, dog) is common too.
- (Bonus Q: How did it handle the background?)

Objects in Context: Object-Object or Stuff-Object?
- Stuff-like: labels with high co-occurrences with other labels
- It looks like background stuff-object co-occurrences (such as water-boat) help, rather than foreground object co-occurrences (such as person-horse) [but car-person-motorbike is still useful in PASCAL].

Objects in Context Limitations: Segmentation
- Too good: a few or many?
- How to select a good segmentation among multiple segmentations?
- Can make object recognition and contextual reasoning (due to stuff detection) much easier


Objects in Context - Limitations
- No cue from unknown objects
- No spatial relationship reasoning
- Object detection part heavily depends on good segmentations
- Improvements using object co-occurrences are demonstrated with images where many labels are already correct. How good is the model?

Contextual Priming vs. Objects in Context
Scene -> Object
- Simpler training data (only the target object's labels are enough)
- Scene information is view-dependent (due to gist)
- Object detector independent
{Object,Stuff} <-> {Object,Stuff}
- May need a huge amount of labeled data
- Can be more generic than scene -> object with a very good model
- Contextual model is object detector independent, in theory. But:
  + uses segmentation: easier to detect stuff
  - uses segmentation: can be unreliable

Object recognition: We've come a long way
Fischler and Elschlager, 1973; Dollar et al., BMVC 2009

Finding the weakest link in person detectors
Devi Parikh (TTI, Chicago), Larry Zitnick (Microsoft Research)

Still a ways to go
Dollar et al., BMVC 2009

Part-based person detector
4 main components:
- Feature selection (color, intensity, edges)
- Part detection
- Spatial model
- NMS / context
Felzenszwalb et al.; Hoiem et al.
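The slides name NMS (non-maximum suppression) as the final component without describing it. A common greedy variant, shown here as a sketch and not necessarily the one used in the detector under study, keeps the highest-scoring box and discards any remaining box that overlaps it too much. The box format (x1, y1, x2, y2), the 0.5 overlap threshold, and the function names are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two heavily overlapping detections and one separate detection (toy values).
boxes = np.array([[0.0, 0.0, 10.0, 10.0],
                  [1.0, 1.0, 11.0, 11.0],
                  [20.0, 20.0, 30.0, 30.0]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.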

How can we help?
- Humans supply training data: 100,000s of labeled images.
- We design the algorithms. Going on 40 years. Help me!

Human debugging
Can we use humans to debug each component (feature selection, part detection, spatial model, NMS / context)? Amazon Mechanical Turk.

Human performance
- Humans: ~90% average precision
- Machines: ~46% average precision

Human debugging: part detection
- Is it a head, torso, arm, leg, foot, hand, or nothing? (Example answers: head, leg, nothing.)
- Low resolution: 20x20 pixels
- PASCAL VOC dataset
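The human and machine numbers above are reported as average precision. As a reminder of what that metric measures, here is a minimal sketch of one common, non-interpolated definition: the mean of the precision values at the ranks where true positives occur. The PASCAL VOC benchmark uses an interpolated variant, so this sketch is not a reproduction of the evaluation behind the ~90% and ~46% figures, and it assumes every positive appears somewhere in the ranked list. The function name and toy data are hypothetical.

```python
import numpy as np

def average_precision(scores, is_positive):
    """Non-interpolated AP: mean precision at the rank of each true positive."""
    order = np.argsort(scores)[::-1]
    hits = np.asarray(is_positive, dtype=float)[order]
    if hits.sum() == 0:
        return 0.0
    # Precision after each of the top-k detections, for k = 1..n.
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

# Four ranked detections, of which the 1st and 3rd are correct (toy values).
scores = [0.9, 0.8, 0.7, 0.6]
is_positive = [1, 0, 1, 0]
ap = average_precision(scores, is_positive)
print(ap)
```

Here the precision is 1 at the first hit and 2/3 at the second, giving an AP of 5/6.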

Part detections
Parts: head, torso, arm, hand, leg, foot; person.
Figure panels: machine vs. human part detections, at high and low resolution.

AP results
Figure panels: spatial model (person vs. not a person, at high and low resolution); context / NMS.

Conclusion (7:00 min)


Analysis: TextonBoost and Semantic Texton Forests
Daniel Munoz, 16-721, February 9, 2009

Papers
[shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context