Introduction to Object Recognition & Bag of Words (BoW) Models

1 Introduction to Object Recognition & Bag of Words (BoW) Models. Professor Fei-Fei Li, Stanford Vision Lab

2 What will we learn today? Introduction to object recognition: representation, learning, recognition. Bag of Words models (Problem Set 4 (Q2)): basic representation; different learning and recognition algorithms.

3 What are the different visual recognition tasks? 3

4 Classification: Does this image contain a building? [yes/no] Yes! 4

5 Classification: Is this a beach?

6 Image Search Organizing photo collections 6

7 Detection: Does this image contain a car? [where?] car 7

8 Detection: Which object does this image contain? [where?] Building clock person car 8

9 Detection: Accurate localization (segmentation) clock 9

10 Detection: Estimating object semantic & geometric attributes. Object: Building, 45º pose, 8-10 meters away; it has bricks. Object: Person, back; 1-2 meters away. Object: Police car, side view, 4-5 m away.

11 Applications of computer vision Computational photography Assistive technologies Surveillance Security Assistive driving 11

12 Categorization vs Single instance recognition: Does this image contain the Chicago Macy's building?

13 Categorization vs Single instance recognition Where is the crunchy nut? 13

14 Applications of computer vision Recognizing landmarks in mobile platforms + GPS 14

15 Activity or Event recognition What are these people doing? 15

16 Visual Recognition: Design algorithms that are capable of: classifying images or videos; detecting and localizing objects; estimating semantic and geometrical attributes; classifying human activities and events. Why is this challenging?

17 How many object categories are there? 17

18 Challenges: viewpoint variation Michelangelo

19 Challenges: illumination image credit: J. Koenderink 19

20 Challenges: scale 20

21 Challenges: deformation 21

22 Challenges: occlusion Magritte,

23 Challenges: background clutter Kilmeny Niland

24 Challenges: intra class variation 24

25 Some early works on object categorization: Turk and Pentland, 1991; Belhumeur, Hespanha, & Kriegman, 1997; Schneiderman & Kanade, 2004; Viola and Jones, 2000; Amit and Geman, 1999; LeCun et al; Belongie and Malik, 2002; Agarwal and Roth, 2002; Poggio et al. 1993

26 Basic issues Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data 26

27 Representation Building blocks: Sampling strategies Interest operators Multiple interest operators Dense, uniformly Randomly Image credits: L. Fei Fei, E. Nowak, J. Sivic 27

28 Representation Appearance only or location and appearance 28

29 Representation Invariances View point Illumination Occlusion Scale Deformation Clutter etc. 29

30 Representation: To handle intra-class variability, it is convenient to describe object categories using probabilistic models. Object models: generative vs discriminative vs hybrid.

31 Object categorization: the statistical viewpoint. $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$. Bayes rule: $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$

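32 Object categorization: the statistical viewpoint. $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$. Bayes rule: $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$, i.e. posterior ratio = likelihood ratio × prior ratio.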

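33 Object categorization: the statistical viewpoint. Discriminative methods model the posterior. Generative methods model the likelihood and the prior. Bayes rule: $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$, i.e. posterior ratio = likelihood ratio × prior ratio.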
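
The posterior-ratio form of Bayes rule on this slide can be sketched in a few lines. This is only an illustration: the function name `posterior_ratio` and all the probabilities below are made-up values, not numbers from the lecture.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Posterior ratio = likelihood ratio * prior ratio (Bayes rule)."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio


# Hypothetical values: p(image | zebra), p(image | no zebra), p(zebra).
ratio = posterior_ratio(lik_zebra=0.02, lik_no_zebra=0.001, prior_zebra=0.1)

# A ratio above 1 means "zebra" is the more probable explanation of the image.
decision = "zebra" if ratio > 1.0 else "no zebra"
print(ratio, decision)
```

A generative method would estimate the two likelihoods and the prior separately; a discriminative method would model the left-hand ratio directly.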

34 Discriminative models: Modeling the posterior ratio $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$. [Figure: decision boundary separating zebra from non-zebra examples]

35 Discriminative models: Nearest neighbor ($10^6$ examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik. Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998. Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio. Latent SVM, Structural SVM: Felzenszwalb 00; Ramanan 03. Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006. Source: Vittorio Ferrari, Kristen Grauman, Antonio Torralba

36 Generative models: Modeling the likelihood ratio $\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}$. [Figure: class densities $p(x \mid C_1)$ and $p(x \mid C_2)$ plotted against $x$]

37 Generative models: $p(\text{image} \mid \text{zebra})$: high, low; $p(\text{image} \mid \text{no zebra})$: low, high. [Figure: class densities $p(x \mid C_1)$ and $p(x \mid C_2)$]

38 Generative models: Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004. Hierarchical Bayesian topic models (e.g. pLSA and LDA): object categorization: Sivic et al. 2005, Sudderth et al; natural scene categorization: Fei-Fei et al. 2D part based models: constellation models: Weber et al 2000, Fergus et al 200; star models: ISM (Leibe et al 05). 3D part based models: multi aspects: Sun, et al,

39 Basic issues Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data 39

40 Learning: Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on the train/validation set (Disc.)

41 Learning: Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on the train/validation set (Disc.). Level of supervision: manual segmentation; bounding box; image labels; noisy labels. Batch/incremental. Priors.

42 Learning: Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on the train/validation set (Disc.). Level of supervision: manual segmentation; bounding box; image labels; noisy labels. Batch/incremental. Priors. Training images: issue of overfitting; negative images for discriminative methods.

43 Basic issues Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data 43

44 Recognition: Recognition task: classification, detection, etc.

45 Recognition Recognition task Search strategy: Sliding Windows Simple Computational complexity (x,y, S,, N of classes) BSW by Lampert et al 08 Also, Alexe, et al 10 Viola, Jones 2001, 45

46 Recognition: Recognition task. Search strategy: Sliding Windows: simple; computational complexity (x,y, S,, N of classes); BSW by Lampert et al 08; also Alexe, et al 10. Localization: objects are not boxes. Viola, Jones 2001,

47 Recognition: Recognition task. Search strategy: Sliding Windows: simple; computational complexity (x,y, S,, N of classes); BSW by Lampert et al 08; also Alexe, et al 10. Localization: objects are not boxes; prone to false positives. Non-max suppression: Canny 86; Desai et al, 2009. Viola, Jones 2001,
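
Sliding-window detectors tend to fire several times around the same object, which is why non-max suppression is mentioned here. The sketch below shows a common greedy, overlap-based variant. It is not the specific method of Canny 86 or Desai et al 2009; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping windows on one object, one elsewhere.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```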

48 Recognition Recognition task Search strategy Attributes Savarese, 2007 Sun et al 2009 Liebelt et al., 08, 10 Farhadi et al 09 Category: car Azimuth = 225º Zenith = 30º It has metal it is glossy has wheels Farhadi et al 09 Lampert et al 09 Wang & Forsyth 09 48

49 Recognition Recognition task Search strategy Attributes Context Semantic: Torralba et al 03 Rabinovich et al 07 Gupta & Davis 08 Heitz & Koller 08 L J Li et al 08 Yao & Fei Fei 10 Geometric Hoiem, et al 06 Gould et al 09 Bao, Sun, Savarese 10 49

50 Basic issues Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data 50

51 Part 1: Bag of words models This segment is based on the tutorial Recognizing and Learning Object Categories: Year 2007, by Prof L. Fei Fei, A. Torralba, and R. Fergus 51

52 Related works: Early bag of words models, mostly texture recognition: Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003. Hierarchical Bayesian models for documents (pLSA, LDA, etc.): Hofmann 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004. Object categorization: Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005. Natural scene categorization: Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz,

53 Object Bag of words 53

54 Analogy to documents

Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a stepwise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Bag of words for document 1: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Bag of words for document 2: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

55 definition of BoW Independent features face bike violin 55

56 definition of BoW Independent features histogram representation codewords dictionary 56

57 Representation: feature detection & representation; codewords dictionary; image representation. Learning: category models (and/or) classifiers. Recognition: category decision.

58 1.Feature detection and representation 58

59 1.Feature detection and representation Regular grid Vogel & Schiele, 2003 Fei Fei & Perona,

60 1.Feature detection and representation Regular grid Vogel & Schiele, 2003 Fei Fei & Perona, 2005 Interest point detector Csurka, et al Fei Fei & Perona, 2005 Sivic, et al

61 1.Feature detection and representation Regular grid Vogel & Schiele, 2003 Fei Fei & Perona, 2005 Interest point detector Csurka, Bray, Dance & Fan, 2004 Fei Fei & Perona, 2005 Sivic, Russell, Efros, Freeman & Zisserman, 2005 Other methods Random sampling (Vidal Naquet & Ullman, 2002) Segmentation based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei, Jordan, 2003) 61

62 1. Feature detection and representation: Detect patches [Mikolajczyk and Schmid 02] [Matas, Chum, Urban & Pajdla, 02] [Sivic & Zisserman, 03]; normalize patch; compute SIFT descriptor [Lowe 99]. Slide credit: Josef Sivic

63 1.Feature detection and representation 63

64 2. Codewords dictionary formation 64

65 2. Codewords dictionary formation Cluster center = code word Clustering/ vector quantization 65
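
The clustering step that produces the codewords can be sketched with plain k-means (Lloyd's algorithm). The slide only says "clustering / vector quantization", so the choice of k-means, the evenly spaced initialization, and the toy 2-D "descriptors" below are assumptions for illustration; real descriptors such as SIFT are much higher dimensional.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def kmeans_codebook(descriptors, k, iterations=20):
    """Cluster descriptors; each cluster center becomes one codeword."""
    n = len(descriptors)
    # Deterministic initialization (an assumption): evenly spaced data points.
    centers = [tuple(descriptors[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_center(d, centers)].append(d)
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[t] for m in members) / len(members) for t in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy descriptors forming two well-separated groups.
descriptors = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
codebook = kmeans_codebook(descriptors, k=2)
print(codebook)
```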

66 2. Codewords dictionary formation Fei-Fei et al

67 Image patch examples of codewords Sivic et al

68 Visual vocabularies: Issues How to choose vocabulary size? Too small: visual words not representative of all patches Too large: quantization artifacts, overfitting Computational efficiency Vocabulary trees (Nister & Stewenius, 2006) 68

69 3. Bag of words representation: Nearest neighbors assignment of each feature to a codeword in the codewords dictionary; k-d tree search strategy.

70 3. Bag of words representation: [Figure: histogram of codeword frequency over the codewords dictionary]
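
Once a dictionary exists, an image becomes a histogram of codeword frequencies. The sketch below uses a brute-force nearest-codeword search instead of the k-d tree mentioned on the slide, and normalizes the counts to frequencies; the codebook and descriptors are hypothetical toy values.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and count occurrences."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)), key=lambda j: squared_distance(d, codebook[j])
        )
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so images with different numbers of patches are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 3-word dictionary and four descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_descriptors = [(1, 1), (0, 2), (9, 9), (1, 9)]
hist = bow_histogram(image_descriptors, codebook)
print(hist)
```

Note that the histogram discards where each patch came from, which is exactly the "independent features" assumption of the BoW definition.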

71 Representation: 1. feature detection & representation; 2. codewords dictionary; 3. image representation

72 Learning and Recognition: codewords dictionary; category models (and/or) classifiers; category decision

73 Learning and Recognition: 1. Discriminative method: NN, SVM. 2. Generative method: graphical models. Category models (and/or) classifiers.

74 Discriminative classifiers category models Model space Class 1 Class N 74

75 Discriminative classifiers Query image Model space Winning class: pink 75

76 Nearest Neighbors classifier Query image Model space Winning class: pink Assign label of nearest training data point to each test data point 76

77 K-Nearest Neighbors classifier: Query image; model space; winning class: pink. For a new point, find the k closest points from the training data. The labels of the k points vote to classify. Works well provided there is lots of data and the distance function is good.

78 K-Nearest Neighbors classifier (from Duda et al.): Voronoi partitioning of feature space for 2-category 2D and 3D data. For k dimensions: k-d tree = space partitioning data structure for organizing points in a k-dimensional space; enables efficient search. Nice tutorial: /pbasic.pdf
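
The voting rule of the k-nearest-neighbors classifier can be sketched as follows. This version does a brute-force search rather than using a k-d tree, and it uses the L1 distance between histograms as one possible distance function; the training histograms and labels are made up.

```python
from collections import Counter


def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def knn_classify(query, train_histograms, train_labels, k=3):
    """Label the query by a majority vote among its k closest training points."""
    ranked = sorted(
        range(len(train_histograms)),
        key=lambda i: l1_distance(query, train_histograms[i]),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical BoW histograms over a 3-word dictionary.
train_histograms = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
]
train_labels = ["face", "face", "bike", "bike"]
label = knn_classify([0.75, 0.15, 0.10], train_histograms, train_labels, k=3)
print(label)
```

With k=1 this reduces to the plain nearest-neighbor classifier of the previous slide.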

79 Functions for comparing histograms: L1 distance: $D(h_1, h_2) = \sum_{i=1}^{N} |h_1(i) - h_2(i)|$. $\chi^2$ distance: $D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}$. Quadratic distance (cross-bin): $D(h_1, h_2) = \sum_{i,j} A_{ij} (h_1(i) - h_2(j))^2$. Jan Puzicha, Yossi Rubner, Carlo Tomasi, Joachim M. Buhmann: Empirical Evaluation of Dissimilarity Measures for Color and Texture. ICCV 1999
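
The three histogram distances named on this slide can be written down directly. This is a sketch of the standard definitions; skipping empty bins in the chi-squared distance (to avoid dividing by zero) and the identity matrix used for the cross-bin weights are choices made here for illustration.

```python
def l1_distance(h1, h2):
    """L1 distance: sum over bins of |h1(i) - h2(i)|."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def chi2_distance(h1, h2):
    """Chi-squared distance: sum of (h1(i) - h2(i))^2 / (h1(i) + h2(i))."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return total


def quadratic_distance(h1, h2, A):
    """Cross-bin distance: sum over i, j of A[i][j] * (h1(i) - h2(j))^2."""
    return sum(
        A[i][j] * (h1[i] - h2[j]) ** 2
        for i in range(len(h1))
        for j in range(len(h2))
    )


h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # no cross-bin interaction

d_l1 = l1_distance(h1, h2)
d_chi2 = chi2_distance(h1, h2)
d_quad = quadratic_distance(h1, h2, identity)
print(d_l1, d_chi2, d_quad)
```

With an identity weight matrix the quadratic distance reduces to the squared Euclidean distance; off-diagonal entries let perceptually similar bins partially match.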

80 Learning and Recognition: 1. Discriminative method: NN, SVM. 2. Generative method: graphical models.

81 Discriminative classifiers (linear classifier) category models Model space Class 1 Class N 81

82 Support vector machines: Find the hyperplane that maximizes the margin between the positive and negative examples. Support vectors: $x_i \cdot w + b = \pm 1$. Distance between point $x_i$ and the hyperplane: $\frac{|x_i \cdot w + b|}{\|w\|}$. Margin $= 2 / \|w\|$. Solution: $w = \sum_i \alpha_i y_i x_i$. Classification function (decision boundary): $w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b$. Credit slide: S. Lazebnik

83 Support vector machines: Classification: $w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b$. For a test point $x$: if $x \cdot w + b \ge 0$, class 1; if $x \cdot w + b < 0$, class 2. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery,
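
To make the SVM formulas concrete, the sketch below evaluates the weight vector, the margin, and the classification rule for a tiny hand-solved problem. No training is performed: the two support vectors, their multipliers (0.25 each), and the bias are hypothetical values chosen so that the support-vector condition holds exactly.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def weight_vector(alphas, labels, support_vectors):
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(support_vectors[0])
    return [
        sum(a * y * x[t] for a, y, x in zip(alphas, labels, support_vectors))
        for t in range(dim)
    ]


def classify(x, w, b):
    """Class 1 if x.w + b >= 0, otherwise class 2."""
    return 1 if dot(w, x) + b >= 0 else 2


# Hypothetical solved problem: one positive and one negative support vector.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]
b = 0.0

w = weight_vector(alphas, labels, support_vectors)
margin = 2.0 / math.sqrt(dot(w, w))  # margin = 2 / ||w||
print(w, margin, classify((2.0, 3.0), w, b), classify((-1.0, -2.0), w, b))
```

Both support vectors satisfy the condition that x.w + b equals +1 or -1, and the margin equals the distance between them, as expected for a two-point problem.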

84 Nonlinear SVMs: Datasets that are linearly separable work out great. But what if the dataset is just too hard? We can map it to a higher dimensional space, e.g. $x \mapsto (x, x^2)$. [Figures: 1-D data along the $x$ axis; the same data plotted as $x^2$ against $x$] Slide credit: Andrew Moore

85 Nonlinear SVMs: General idea: the original input space can always be mapped to some higher dimensional feature space where the training set is separable: $\Phi: x \to \varphi(x)$ (lifting transformation). Slide credit: Andrew Moore

86 Nonlinear SVMs: Nonlinear decision boundary in the original feature space: $\sum_i \alpha_i y_i K(x_i, x) + b$. The kernel $K$ is the inner product of the lifting transformation $\varphi(x)$: $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. NOTE: It is not required to compute $\varphi(x)$ explicitly. The kernel must satisfy the Mercer inequality. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery,
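
The remark that the lifting transformation never has to be computed explicitly can be checked on a small example. The quadratic kernel and its explicit 2-D feature map below are a standard textbook pair chosen here for illustration; they are not taken from the slide.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2, computed in the original 2-D space."""
    return dot(x, y) ** 2


def lift(x):
    """Explicit lifting phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) for 2-D input."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


x, y = (1.0, 2.0), (3.0, 4.0)
k_implicit = quadratic_kernel(x, y)   # kernel evaluated without lifting
k_explicit = dot(lift(x), lift(y))    # inner product in the lifted space
print(k_implicit, k_explicit)
```

Both routes give the same value, but the kernel route never builds the higher dimensional vectors, which matters when the lifted space is very large.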

87 Kernels for bags of features: Histogram intersection kernel: $I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$. Generalized Gaussian kernel: $K(h_1, h_2) = \exp\left(-\frac{1}{A} D(h_1, h_2)^2\right)$, where $D$ can be Euclidean distance, $\chi^2$ distance, etc. J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV
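
Both kernels on this slide operate directly on BoW histograms. In the sketch below the generalized Gaussian kernel is written as exp(-D^2 / A); that squared form, the use of the L1 distance for D, and the scale A = 2 are assumptions for illustration, and the distance function is passed in so another choice (for example chi-squared) can be substituted.

```python
import math


def histogram_intersection(h1, h2):
    """I(h1, h2) = sum over bins of min(h1(i), h2(i))."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def l1_distance(h1, h2):
    """L1 distance, used here as one possible choice of D."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def generalized_gaussian_kernel(h1, h2, distance, A):
    """K(h1, h2) = exp(-(1/A) * D(h1, h2)^2) for a chosen distance D."""
    return math.exp(-(distance(h1, h2) ** 2) / A)


h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]

k_int = histogram_intersection(h1, h2)
k_gauss = generalized_gaussian_kernel(h1, h2, l1_distance, A=2.0)
print(k_int, k_gauss)
```

For normalized histograms the intersection is 1 for identical inputs and falls toward 0 as they stop sharing mass; the Gaussian form behaves the same way but lets the scale A control how quickly similarity decays.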

88 Pyramid match kernel: Fast approximation of Earth Mover's Distance. Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic). K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV

89 Spatial Pyramid Matching Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR

90 What about multi-class SVMs? No definitive multi-class SVM formulation. In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs. One vs. others: Training: learn an SVM for each class vs. the others. Testing: apply each SVM to the test example and assign to it the class of the SVM that returns the highest decision value. One vs. one: Training: learn an SVM for each pair of classes. Testing: each learned SVM votes for a class to assign to the test example. Credit slide: S. Lazebnik
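
The two testing rules for combining two-class SVMs can be sketched as below. Training is omitted: each classifier is represented by a hypothetical linear `(w, b)` pair, and the class names and weights are made up. For the pairwise classifiers, a non-negative decision value is taken to vote for the first class of the pair, which is a convention chosen here.

```python
from collections import Counter


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_vs_others_predict(x, classifiers):
    """Pick the class whose SVM returns the highest decision value."""
    return max(classifiers, key=lambda c: dot(classifiers[c][0], x) + classifiers[c][1])


def one_vs_one_predict(x, pairwise):
    """Each pairwise SVM votes for one of its two classes; most votes wins."""
    votes = Counter()
    for (first, second), (w, b) in pairwise.items():
        votes[first if dot(w, x) + b >= 0 else second] += 1
    return votes.most_common(1)[0][0]


# Hypothetical already-trained linear classifiers, stored as (w, b).
classifiers = {
    "face": ([1.0, 0.0], 0.0),
    "bike": ([0.0, 1.0], 0.0),
    "violin": ([-1.0, -1.0], 0.0),
}
pairwise = {
    ("face", "bike"): ([1.0, -1.0], 0.0),
    ("face", "violin"): ([2.0, 1.0], 0.0),
    ("bike", "violin"): ([1.0, 2.0], 0.0),
}

x = [0.2, 0.9]
pred_ovo = one_vs_one_predict(x, pairwise)
pred_ovr = one_vs_others_predict(x, classifiers)
print(pred_ovr, pred_ovo)
```

One vs. others needs one classifier per class, while one vs. one needs one per pair of classes, so the latter grows quadratically with the number of categories.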

91 SVMs: Pros and cons Pros Many publicly available SVM packages: machines.org/software Kernel based framework is very powerful, flexible SVMs work very well in practice, even with very small training sample sizes Cons No direct multi class SVM, must combine two class SVMs Computation, memory During training time, must compute matrix of kernel values for every pair of examples Learning can take a very long time for large scale problems 91

92 Object recognition results: ETH-80 database of 8 object classes (Eichhorn and Chapelle 2004). Features: Harris detector, PCA-SIFT descriptor, d=10. Recognition rate by kernel: Match [Wallraven et al.] 84%; Bhattacharyya affinity [Kondor & Jebara] 85%; Pyramid match 84%. Slide credit: Kristen Grauman

93 Discriminative models: Nearest neighbor ($10^6$ examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik. Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998. Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio. Latent SVM, Structural SVM: Felzenszwalb 00; Ramanan 03. Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006. Source: Vittorio Ferrari, Kristen Grauman, Antonio Torralba

94 Learning and Recognition: 1. Discriminative method: NN, SVM. 2. Generative method: graphical models. Model the probability distribution that produces a given bag of features.

95 Generative models: 1. Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004. 2. Hierarchical Bayesian text models (pLSA and LDA): background: Hofmann 2001, Blei, Ng & Jordan, 2004; object categorization: Sivic et al. 2005, Sudderth et al; natural scene categorization: Fei-Fei et al

96 Some notations: w: a collection of all N codewords in the image, $w = [w_1, w_2, \ldots, w_N]$; c: category of the image

97 The Naïve Bayes model: Graphical model: $c \to w$, with a plate over the $N$ codewords. Posterior = probability that image I is of category c: $p(c \mid w) \propto p(c)\, p(w \mid c)$, where $p(c)$ is the prior probability of the object classes and $p(w \mid c)$ is the image likelihood given the class.

98 The Naïve Bayes model: Object class decision: $c^* = \arg\max_c p(c \mid w) \propto p(c)\, p(w \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$, where $p(w_n \mid c)$ is the likelihood of the nth visual word given the class, estimated by empirical frequencies of code words in images from a given class.
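
The Naïve Bayes decision rule can be sketched with word-frequency counts. Images are represented as lists of codeword indices; the toy training set is made up. Two details are additions made here and not from the slides: add-one (Laplace) smoothing so unseen codewords do not produce zero probabilities, and working with log probabilities to avoid numerical underflow in the product.

```python
import math
from collections import Counter


def train_naive_bayes(images, labels, vocab_size, smoothing=1.0):
    """Estimate log p(c) and log p(w | c) from codeword counts per class."""
    class_counts = Counter(labels)
    word_counts = {c: [0] * vocab_size for c in class_counts}
    for words, c in zip(images, labels):
        for w in words:
            word_counts[c][w] += 1
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        total = sum(counts) + smoothing * vocab_size
        # Smoothed empirical frequency of each codeword within class c.
        log_lik[c] = [math.log((n + smoothing) / total) for n in counts]
    return log_prior, log_lik


def predict(words, log_prior, log_lik):
    """c* = argmax_c [ log p(c) + sum_n log p(w_n | c) ]."""
    return max(
        log_prior,
        key=lambda c: log_prior[c] + sum(log_lik[c][w] for w in words),
    )


# Hypothetical training images as lists of codeword indices (vocabulary of 3).
images = [[0, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 2]]
labels = ["face", "face", "bike", "bike"]
log_prior, log_lik = train_naive_bayes(images, labels, vocab_size=3)

pred_a = predict([0, 0, 2], log_prior, log_lik)
pred_b = predict([2, 2, 1], log_prior, log_lik)
print(pred_a, pred_b)
```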

99 Csurka et al

100 Csurka et al

101 Other generative BoW models Hierarchical Bayesian topic models (e.g. plsa and LDA) Object categorization: Sivic et al. 2005, Sudderth et al Natural scene categorization: Fei Fei et al

102 Generative vs discriminative: Discriminative methods: computationally efficient & fast. Generative models: convenient for weakly supervised or unsupervised, incremental training; prior information; flexibility in modeling parameters.

103 Weakness of the BoW models: No rigorous geometric information of the object components. It's intuitive to most of us that objects are made of parts, yet BoW carries no such information. Not extensively tested yet for viewpoint invariance and scale invariance. Segmentation and localization unclear.

104 What have we learned today? Introduction to object recognition: representation, learning, recognition. Bag of Words models (Problem Set 4 (Q2)): basic representation; different learning and recognition algorithms.
