Lecture 12 Visual recognition


1 Lecture 12 Visual recognition Bag of words models for object recognition and classification Discriminative methods Generative methods Silvio Savarese Lecture 11 17Feb14

2 Challenges Variability due to: View point Illumination Occlusions Intraclass variability

3 Challenges: intraclass variation

4 Basic properties Representation How to represent an object category; which classification scheme? Learning How to learn the classifier, given training data Recognition How the classifier is to be used on novel data

5 Part 1: Bag-of-words models This segment is based on the tutorial Recognizing and Learning Object Categories: Year 2007, by Prof A. Torralba, R. Fergus and F. Li

6 Related works
Early bag of words models (mostly texture recognition): Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Hierarchical Bayesian models for documents (pLSA, LDA, etc.): Hofmann 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
Object categorization: Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
Natural scene categorization: Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006

7 Object Bag of words

8 Analogy to documents
Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a stepwise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Bag of words for document 1: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Bag of words for document 2: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

9 definition of BoW Independent features face bike violin

10 definition of BoW Independent features histogram representation codewords dictionary

11 learning Representation recognition feature detection & representation codewords dictionary image representation category models (and/or) classifiers category decision

12 1.Feature detection and description

13 1. Feature detection and description Regular grid Vogel & Schiele, 2003 Fei-Fei & Perona, 2005

14 1. Feature detection and description Regular grid Vogel & Schiele, 2003 Fei-Fei & Perona, 2005 Interest point detector Csurka, et al Fei-Fei & Perona, 2005 Sivic, et al. 2005

15 1. Feature detection and description Regular grid Vogel & Schiele, 2003 Fei-Fei & Perona, 2005 Interest point detector Csurka, Bray, Dance & Fan, 2004 Fei-Fei & Perona, 2005 Sivic, Russell, Efros, Freeman & Zisserman, 2005 Other methods Random sampling (Vidal-Naquet & Ullman, 2002) Segmentation based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei, Jordan, 2003)

16 1. Feature detection and description Compute SIFT descriptor [Lowe 99] Normalize patch Detect patches [Mikolajczyk and Schmid 02] [Matas, Chum, Urban & Pajdla, 02] [Sivic & Zisserman, 03] Slide credit: Josef Sivic

17 2. Codewords dictionary formation

18 2. Codewords dictionary formation

19 Example: color feature

20 Example: color feature b g r

21 2. Codewords dictionary formation Cluster center = code word Clustering / vector quantization E.g., K-means, see CS131A
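As a rough illustration of the clustering step, here is a minimal sketch of dictionary formation with plain Lloyd's K-means in NumPy. The function name `build_codebook`, the iteration count, and the synthetic 2D "descriptors" are assumptions made for this example; real pipelines would cluster SIFT-like descriptors and usually rely on a library implementation.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns a (k, d) array of cluster centers (codewords)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

# Synthetic stand-in for patch descriptors: two well-separated blobs in 2D.
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
codebook = build_codebook(toy, k=2)
print(codebook)  # one codeword near (0, 0), one near (5, 5)
```

Each row of `codebook` plays the role of one visual word; the vocabulary size is the `k` passed in.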

26 2. Codewords dictionary formation Image patch examples of codewords Sivic et al. 2005

27 2. Codewords dictionary formation Fei-Fei et al. 2005

28 2. Codewords dictionary formation Typically a codeword dictionary is obtained from a training set comprising all the object classes of interest

29 Visual vocabularies: Issues How to choose vocabulary size? Too small: visual words not representative of all patches Too large: quantization artifacts, overfitting Computational efficiency Vocabulary trees (Nister & Stewenius, 2006)

30 3. Bag of word representation Nearest neighbors assignment KD tree search strategy Codewords dictionary

31 frequency 3. Bag of word representation. codewords Codewords dictionary
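The histogram representation can be sketched as follows: assign each patch descriptor to its nearest codeword and count how often each codeword occurs. This is a minimal illustration with brute-force nearest-neighbor search (the KD-tree mentioned on the previous slide would replace the distance matrix for large vocabularies); the name `bow_histogram` and the toy arrays are assumptions for the example.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Normalised codeword-frequency histogram for one image."""
    # Squared Euclidean distance from each descriptor to each codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest codeword per patch
    counts = np.bincount(words, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

# Hypothetical 3-word dictionary and 4 patch descriptors from one image.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
patches = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [3.9, 4.2]])
hist = bow_histogram(patches, codebook)
print(hist)  # [0.25 0.5  0.25]
```

The spatial positions of the patches are discarded at this point, which is exactly the "independent features" assumption of the model.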

32 Representing textures Texture is characterized by the repetition of basic elements or textons For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003 Credit slide: S. Lazebnik

33 Representing textures histogram Universal texton dictionary Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003 Credit slide: S. Lazebnik

34 Invariance issues Scale? Rotation? View point? Occlusions? Implicit; depends on detectors and descriptors Kadir and Brady. 2003

35 Representation 1. feature detection & representation 2. codewords dictionary 3. image representation category models

36 Category models Class 1 Class N

37 Recognition codewords dictionary category models (and/or) classifiers category decision

38 Lecture 12 Visual recognition Bag of words models for object recognition and classification Discriminative methods Nearest neighbor Linear classifier SVM Generative methods Silvio Savarese Lecture 11 17Feb14

39 Discriminative classifiers category models Model space Class 1 Class N

40 Discriminative classifiers Query image Model space Winning class: pink

41 Nearest Neighbors classifier Query image Model space Winning class: pink Assign label of nearest training data point to each test data point

42 K Nearest Neighbors classifier Query image Model space Winning class: pink For a new point, find the k closest points from training data Labels of the k points vote to classify Works well provided there is lots of data and the distance function is good
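The voting rule can be sketched in a few lines. This is a minimal example assuming bag-of-words histograms as features and the L1 distance as the distance function; `knn_predict` and the toy training set are made up for illustration. With `k=1` it reduces to the nearest neighbor classifier of the previous slide.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query histogram by majority vote among its k nearest training points."""
    dists = np.abs(train_x - query).sum(axis=1)  # L1 distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 3-bin histograms with hypothetical category labels.
train_x = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
                    [0.1, 0.1, 0.8]])
train_y = ["face", "face", "bike", "bike", "violin"]
query = np.array([0.75, 0.15, 0.10])
print(knn_predict(train_x, train_y, query, k=3))  # face
```

Every prediction scans the whole training set here, which is why efficient search structures matter once there is "lots of data".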

43 K Nearest Neighbors classifier from Duda et al. Voronoi partitioning of feature space for 2-category 2D and 3D data For k dimensions: k-d tree = space-partitioning data structure for organizing points in a k-dimensional space Enables efficient search Nice tutorial:

44 Functions for comparing histograms
L1 distance: $D(h_1, h_2) = \sum_{i=1}^{N} |h_1(i) - h_2(i)|$
χ² distance: $D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}$
Quadratic distance (cross-bin): $D(h_1, h_2) = \sum_{i,j} A_{ij} (h_1(i) - h_2(j))^2$
Jan Puzicha, Yossi Rubner, Carlo Tomasi, Joachim M. Buhmann: Empirical Evaluation of Dissimilarity Measures for Color and Texture. ICCV 1999
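A minimal NumPy sketch of the three dissimilarity measures follows. It assumes the L1 distance is the sum of absolute bin differences, the χ² distance is the sum of squared differences divided by the bin sums, and the cross-bin quadratic distance weights the squared difference between bin i of one histogram and bin j of the other by a similarity matrix A. The small `eps` guarding against empty bins is an addition of this example, not part of the slide.

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-12):
    # eps avoids 0/0 when a bin is empty in both histograms.
    return ((h1 - h2) ** 2 / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    # diff[i, j] = h1(i) - h2(j); A[i, j] weights the cross-bin term.
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()

h1 = np.array([0.5, 0.5, 0.0])
h2 = np.array([0.25, 0.25, 0.5])
print(l1_distance(h1, h2))                    # 1.0
print(chi2_distance(h1, h2))                  # about 0.6667
print(quadratic_distance(h1, h2, np.eye(3)))  # 0.375
```

With `A` set to the identity only matching bins are compared; off-diagonal entries let perceptually similar codewords contribute as well.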

45 Discriminative classifiers (linear classifier) category models Model space Class 1 Class N w For a linear classifier, the training data is used to learn w and then discarded Only w is needed for classifying new data

46 Linear classifiers We want to classify two classes of points Each point $x_i$ can have one of two labels $y_i \in \{-1, +1\}$

47 Linear classifiers GOAL: learn a classifier f(x) such that: $f(x_i) \ge 0$ if $y_i = +1$ and $f(x_i) < 0$ if $y_i = -1$, where: $f(x) = x \cdot w + b$ Find the hyperplane (w, b) that separates the training points

48 Linear classifiers Once w, b are learnt, we can do classification of a test point x: if $x \cdot w + b \ge 0$: class 1; if $x \cdot w + b < 0$: class 2

49 Linear classifiers Which hyperplane is best?

50 Support vector machines
Select two hyperplanes such that: They separate the training points There are no points between them Their distance is maximized
The region bounded by them is called "the margin".
Support vectors: $x_i \cdot w + b = \pm 1$
Distance between point $x_i$ and hyperplane: $|x_i \cdot w + b| / \|w\|$
Margin $= 2 / \|w\|$
Solution: $w = \sum_i \alpha_i y_i x_i$, $b = \frac{1}{N} \sum_{i=1}^{N} (y_i - x_i \cdot w)$, averaged over the $N$ support vectors
Maximum margin solution: most stable under perturbations of the inputs

51 Support vector machines Classification of a test point x: $f(x) = \sum_i \alpha_i y_i (x_i \cdot x) + b$ if $f(x) \ge 0$: x is class 1; if $f(x) < 0$: x is class 2 C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
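To make the classification rule concrete, here is a small sketch that evaluates the decision function from the support vectors and compares it with the equivalent primal form using w (the weighted sum of support vectors). The two support vectors and their multipliers `alpha` are a hand-solved toy case chosen for this example, not the output of a real solver; the sign convention (non-negative score means class 1) is also an assumption.

```python
import numpy as np

# Hand-solved toy problem: one support vector per class.
sv_x = np.array([[1.0, 1.0], [-1.0, -1.0]])  # support vectors x_i
sv_y = np.array([1.0, -1.0])                 # their labels y_i
alpha = np.array([0.25, 0.25])               # assumed dual coefficients
bias = 0.0

def decision(x):
    """f(x) = sum_i alpha_i * y_i * (x_i . x) + b"""
    return float((alpha * sv_y * (sv_x @ x)).sum() + bias)

def classify(x):
    return 1 if decision(x) >= 0 else 2

# Equivalent primal form: w = sum_i alpha_i * y_i * x_i
w = (alpha * sv_y) @ sv_x
test_point = np.array([2.0, 0.0])
print(w)                         # [0.5 0.5]
print(decision(test_point))      # 1.0
print(float(test_point @ w + bias))  # 1.0, same value via w
print(classify(np.array([-3.0, 1.0])))  # 2
```

Both support vectors satisfy `decision(x_i) == y_i`, i.e. they lie exactly on the margin hyperplanes.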

52 Linearly not separable Linearly separable with small margins Linearly separable Linear separability Courtesy of A Zisserman

53 Linear separability Two possible solutions: Soft margins (introduced through slack variables) Nonlinear separation function (e.g., nonlinear SVM)

54 Soft margins The points can be linearly separated but there is a very narrow margin Or they cannot be separated at all IDEA: still seek a large-margin solution, even though some constraints are violated In general there is a trade-off between the margin and the number of mistakes on the training data Courtesy of A Zisserman

55 Soft margins By Corinna Cortes and Vladimir N. Vapnik, 1995 Use soft margin violations instead of the hard one: Find hyperplanes that split the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. Introduce slack variables to measure the degree of misclassification of the data
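For reference, the standard soft-margin formulation can be written as the optimization problem below. The slack variable $\xi_i$ measures how far point $i$ violates the margin, and the constant $C$ (a user-chosen parameter, not named on the slide) sets the trade-off between a wide margin and few violations.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\,(x_i \cdot w + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,N
```

Setting every $\xi_i = 0$ recovers the hard-margin problem; a large $C$ penalizes violations heavily, while a small $C$ favors a wider margin.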

56 Nonlinear SVMs Given a dataset that is not linearly separable along a single axis x, map it to a higher-dimensional space, e.g. $x \rightarrow (x, x^2)$ Slide credit: Andrew Moore

57 Nonlinear SVMs General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: x \rightarrow \varphi(x)$ (lifting transformation) Slide credit: Andrew Moore

58 Nonlinear SVMs Nonlinear decision boundary in the original feature space: $\sum_i \alpha_i y_i (x_i \cdot x) + b \;\rightarrow\; \sum_i \alpha_i y_i K(x_i, x) + b$ The kernel K is the inner product of the lifted points: $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ NOTE: It is not required to compute φ(x) explicitly The kernel must satisfy the Mercer condition C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
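The claim that φ(x) never has to be computed can be checked numerically on a small case. The sketch below uses the homogeneous quadratic kernel in 2D, chosen here only as an illustration (the slide does not name a specific kernel): its value equals the inner product of explicitly lifted 3D vectors.

```python
import numpy as np

def lift(x):
    """Explicit lifting for the 2D quadratic kernel: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without leaving the input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(quadratic_kernel(x, z))            # 1.0
print(float(np.dot(lift(x), lift(z))))   # 1.0 up to rounding
```

The kernel costs one 2D dot product, while the explicit route builds 3D vectors first; the gap grows quickly with input dimension and kernel degree.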

59 Kernels for bags of features
Histogram intersection kernel: $I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$
Generalized Gaussian kernel: $K(h_1, h_2) = \exp\left(-\frac{1}{A} D(h_1, h_2)^2\right)$
D can be Euclidean distance, χ² distance etc.
J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
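Both kernels are short to write down in NumPy. This sketch assumes the generalized Gaussian kernel has the form exp(-D(h1, h2)^2 / A) with a scaling constant A, and uses the Euclidean distance for D; swapping in another histogram distance only requires passing a different `dist` function. The function names are made up for the example.

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Sum over bins of the smaller of the two bin values."""
    return float(np.minimum(h1, h2).sum())

def euclidean(h1, h2):
    return float(np.sqrt(((h1 - h2) ** 2).sum()))

def generalized_gaussian_kernel(h1, h2, A=1.0, dist=euclidean):
    """exp(-D(h1, h2)^2 / A); A is an assumed scaling constant."""
    return float(np.exp(-dist(h1, h2) ** 2 / A))

h1 = np.array([0.5, 0.5, 0.0])
h2 = np.array([0.25, 0.25, 0.5])
print(intersection_kernel(h1, h2))          # 0.5
print(generalized_gaussian_kernel(h1, h2))  # exp(-0.375), about 0.687
print(generalized_gaussian_kernel(h1, h1))  # 1.0
```

For normalized histograms the intersection kernel lies between 0 (no shared mass) and 1 (identical histograms).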

60 Which classifier to use? Linear Nonlinear

61 Which classifier to use? Linear Nonlinear Let's add more training data! A more complex model can overfit if there is not enough training data!

62 What about multiclass SVMs? No definitive multiclass SVM formulation In practice, we have to obtain a multiclass SVM by combining multiple two-class SVMs
One vs. others: Training: learn an SVM for each class vs. the others Testing: apply each SVM to the test example and assign to it the class of the SVM that returns the highest decision value
One vs. one: Training: learn an SVM for each pair of classes Testing: each learned SVM votes for a class to assign to the test example
Credit slide: S. Lazebnik
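The two testing rules can be sketched without training anything, by starting from decision values that the two-class SVMs are assumed to have already produced for one test example. The numbers below are invented, and the convention that a non-negative pairwise score favors the first class of the pair is an assumption of this example.

```python
import numpy as np

def one_vs_others(scores):
    """scores[c] = decision value of the 'class c vs. the others' SVM."""
    return int(np.argmax(scores))

def one_vs_one(pair_scores, n_classes):
    """pair_scores[(a, b)] = decision value of the 'a vs. b' SVM (>= 0 favors a)."""
    votes = np.zeros(n_classes, dtype=int)
    for (a, b), score in pair_scores.items():
        votes[a if score >= 0 else b] += 1
    return int(np.argmax(votes))

# Hypothetical decision values for one test example and three classes.
ovo_scores = {(0, 1): -0.5, (0, 2): 0.7, (1, 2): 0.9}
print(one_vs_others(np.array([-0.3, 1.2, 0.4])))  # 1
print(one_vs_one(ovo_scores, n_classes=3))        # 1
```

One vs. others needs one SVM per class, while one vs. one needs one per pair of classes, so the latter grows quadratically with the number of categories.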

63 SVMs: Pros and cons
Pros: Many publicly available SVM packages: Kernel-based framework is very powerful, flexible SVMs work very well in practice, even with very small training sample sizes
Cons: No direct multiclass SVM, must combine two-class SVMs Computation, memory: during training time, must compute matrix of kernel values for every pair of examples Learning can take a very long time for large-scale problems

64 Discriminative classifiers (linear classifier) category models Model space Class 1 Class N w

65 Discriminative classifiers (linear classifier) Query image Model space Winning class: pink w

66 Caltech 101

67 Caltech 101 BOW ~15%

68 Spatial Pyramid Matching Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce $I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))$
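A rough sketch of the idea: compute codeword histograms over progressively finer grids, weight each level, concatenate, and compare two images with the histogram intersection. The level weights used here (1/2^L for level 0 and 1/2^(L-l+1) for level l >= 1) follow the commonly cited scheme from the paper as recalled for this example and should be checked against it; the function names and the normalized coordinates in [0, 1) are assumptions.

```python
import numpy as np

def pyramid_vector(xy, words, vocab_size, levels=2):
    """Weighted, concatenated per-cell codeword histograms for levels 0..levels.

    xy: (n, 2) feature positions scaled to [0, 1); words: (n,) codeword indices.
    """
    parts = []
    for l in range(levels + 1):
        cells = 2 ** l  # the grid is cells x cells at this level
        weight = 1.0 / 2 ** levels if l == 0 else 1.0 / 2 ** (levels - l + 1)
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        # One histogram bin per (cell, codeword) pair.
        flat = (cy * cells + cx) * vocab_size + words
        hist = np.bincount(flat, minlength=cells * cells * vocab_size)
        parts.append(weight * hist)
    return np.concatenate(parts)

def pyramid_match(v1, v2):
    """Histogram intersection of two pyramid vectors."""
    return float(np.minimum(v1, v2).sum())

words = np.array([0, 1])
img_a = pyramid_vector(np.array([[0.1, 0.1], [0.9, 0.9]]), words, vocab_size=3)
img_b = pyramid_vector(np.array([[0.9, 0.9], [0.1, 0.1]]), words, vocab_size=3)
print(pyramid_match(img_a, img_a))  # 2.0: same words in the same places
print(pyramid_match(img_a, img_b))  # 0.5: same words, swapped positions
```

The two toy images have identical plain bag-of-words histograms, so only the finer pyramid levels tell them apart.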

69 Caltech 101

70 Caltech 101 Pyramid matching

71 Discriminative models
Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik
Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio
Latent SVM, Structural SVM: Felzenszwalb 00; Ramanan 03
Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006
Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman. Slide adapted from Antonio Torralba

72 Next Lecture Bag of words models for object recognition and classification Generative methods 18Feb14
