Object Recognition in Images and Video: Advanced Bag-of-Words 27 April / 61

1 Object Recognition in Images and Video: Advanced Bag-of-Words
Prof. Andrew D. Bagdanov
Dipartimento di Ingegneria dell'Informazione, Università degli Studi di Firenze
andrew.bagdanov AT unifi.it
27 April 2017

2 Outline
1. Comments
2. Overview
3. The Bag-of-Words Model
4. Interlude
5. Advanced BOW: Spatial Pyramids
6. Advanced BOW: Sparse Coding
7. Advanced BOW: Fisher Vectors
8. Detection: Deformable Part Models
9. Discussion

3 Comments

4 Comments: Final Exam
For the exam (your presentations), we can be flexible. We're all busy, and I recognize that.
For those in Florence: we can arrange to do your presentations at any time (preferably by mid-June).
For those outside of Florence: we can also do the final exam by Skype, if that's easier.
I strongly desire to finish all exam presentations by mid-June.

5 Overview

6 Overview
In this lesson I will first pick up where I left off with an explanation of the basic Bag-of-Words (BOW) model.
Then, I will explain three extensions of this basic model (which is why the lecture is called Advanced Bag-of-Words).
Finally, I will cover an article on detection (which is only loosely connected with the Bag-of-Words).
Since you have all read the articles, I will cover the details briefly, then open the floor for discussion and questions.
Together, we will work to reach a deeper understanding of each contribution.

7 The Bag-of-Words Model

8 Three Magic Ingredients
Now we will shift our discussion to one of the first Big Breakthroughs in modern object recognition.
Visual Categorization with Bags of Keypoints, Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray. In: European Conference on Computer Vision (ECCV).
These ideas were developed independently, in many places, at the same time. This paper is one of the first and, in my opinion, the simplest explanation of the basic Bag-of-Words pipeline.
Returning again to our analogy with text retrieval, we now have a reasonably invariant way to describe local image structure. However, we still don't have a concept corresponding to words.
SIFT features are 128-dimensional real-valued vectors, which are not discrete enough to use in a TF*IDF model.

9 Feature Quantization
Key idea: use clustering to identify groups of SIFT points using a training set.
The cluster centers are used as the visual vocabulary: they are the words in our model.
All SIFT descriptors extracted from training or test images are quantized to the closest visual word in our vocabulary.
We have gone from an infinite class of SIFT descriptors to a finite class of visual words.
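The quantization step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the setup from Csurka et al.: random 128-dimensional vectors stand in for SIFT descriptors, the clustering is plain Lloyd's k-means, and the names `build_vocabulary` and `quantize` are hypothetical.

```python
import numpy as np

def quantize(descriptors, centers):
    """Index of the closest visual word for every descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means: returns k cluster centers (the visual words)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].copy()
    for _ in range(n_iter):
        words = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[words == j]
            if len(members) > 0:      # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

# Toy data: random vectors standing in for SIFT descriptors (an assumption).
rng = np.random.default_rng(1)
train = rng.random((500, 128))
vocab = build_vocabulary(train, k=10)
words = quantize(rng.random((30, 128)), vocab)   # one integer word per descriptor
```

Every descriptor, from a training or a test image, ends up as a single integer in the range 0 to k-1.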

10 Feature Pooling
One last problem: the number of SIFT descriptors is variable, since each image will yield a different number of points. Also, the order of the points matters when two images are compared descriptor by descriptor, and no canonical order exists.
This problem makes it hard to apply standard machine learning techniques to our representation (e.g. SVM, naive Bayes, nearest neighbor, etc.).
The solution: like in text retrieval, use pooling to build a fixed-length descriptor of images that is invariant to descriptor order.
Our descriptor is a histogram of frequencies of visual word occurrences in the image.
To compare images we can now use inner products (like TF*IDF), SVMs, and a vast array of tried and true classifiers.
This last point is the most important: given a training set of images labeled with object categories, we can train classifiers to recognize objects in unseen test images.
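A minimal sketch of the pooling step, assuming descriptors have already been quantized to integer word indices. The L1 normalization and the name `bow_histogram` are illustrative choices, not details taken from the paper.

```python
import numpy as np

def bow_histogram(words, vocab_size):
    """Fixed-length, order-invariant descriptor: visual word frequencies."""
    counts = np.bincount(words, minlength=vocab_size).astype(float)
    return counts / max(counts.sum(), 1.0)   # L1-normalize (an assumption)

# Two "images" with different numbers of quantized descriptors.
image_a = np.array([0, 2, 2, 3, 2])
image_b = np.array([3, 2, 0, 2, 2, 0, 1, 2])

h_a = bow_histogram(image_a, vocab_size=4)
h_b = bow_histogram(image_b, vocab_size=4)
h_a_reordered = bow_histogram(image_a[::-1], vocab_size=4)

similarity = float(h_a @ h_b)   # inner-product comparison, as with TF*IDF
```

Both images map to vectors of length 4 regardless of how many descriptors they contain, and reordering the descriptors leaves the histogram unchanged.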

11 The Bag-of-Words Model
This full pipeline is best explained graphically.

12 The Bag-of-Words Model
Csurka et al. demonstrated the BOW approach on a dataset with 7 object categories.
They extract BOW descriptors from training images and train one linear SVM per category, in a one-versus-all multiclass setup.

13 The Bag-of-Words
The punchline: the results on this challenging dataset are impressive.
The approach uses a small vocabulary of 1000 visual words (in text retrieval, 100K+ word dictionaries are common).
It also uses an extremely simple linear SVM for classification.

14 The Bag-of-Words
Added bonus: visual words are semantically meaningful (note that this example from Csurka et al. is highly cherry-picked).

15 The Bag-of-Words
Another bonus: the one-versus-all SVM architecture can recognize multiple object categories in images.
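To illustrate why a one-versus-all architecture can report several categories at once, here is a small sketch. The weights `W`, biases `b`, and histogram `h` are made-up numbers, not trained values, and the zero threshold is just the usual SVM decision boundary.

```python
import numpy as np

def one_vs_all_scores(h, W, b):
    """Scores of C independent linear classifiers on a BOW histogram h."""
    return W @ h + b

# Hypothetical weights, one row per category (not trained values).
W = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  2.0]])
b = np.array([-0.2, -0.2, -0.2])
h = np.array([0.5, 0.45, 0.05])

scores = one_vs_all_scores(h, W, b)
best = int(np.argmax(scores))                    # single-label decision
present = np.flatnonzero(scores > 0).tolist()    # multi-label decision
```

Taking the argmax gives one label per image, while thresholding each classifier independently lets two categories fire on the same image.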

16 BOW: Discussion
Like the SIFT descriptor, it is hard to overstate the impact and influence the Bag-of-Words model has had on the development of modern object recognition.
It is a hallmark result, despite its extreme simplicity (in hindsight).
The paper of Csurka et al. was the first to demonstrate the plausibility of efficient, accurate, and robust object recognition over many categories with extreme visual variance.
Clearly, this simple BOW model was only the beginning. The next ten years of computer vision were dominated by incremental improvements and refinements of this model.
In the next lecture we will head off in that direction with a survey of advanced Bag-of-Words models that came after.
Note: see the course website for the required reading for next week.

17 Interlude

18 Interlude: Ten Years of Progress
The Bag-of-Words was quickly adopted by the community as *the* method for object recognition.
There rapidly followed a long series of improvements over the basic BOW model. In the following, we will look at some important examples.
With the adoption of the BOW model and the growing interest in object recognition, several standard benchmark datasets were also established. Many of these datasets were developed in the context of international competitions:
PASCAL VOC: five years of competitions, many high-quality benchmark datasets.
ImageNet: first version with 1000 object categories, over 1M images. Current version has 10M+ images.

19 Advanced BOW: Spatial Pyramids

20 BOW is an Orderless Image Representation
The next paper we will examine is:
Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, S Lazebnik, C Schmid, J Ponce. In: Computer Vision and Pattern Recognition (CVPR).
The motivation behind this work is that the Bag-of-Words is an orderless image representation.
We can think of the BOW histogramming process as marginalizing spatial information away.
Assume we have encoded an image in quantized visual words; that is, each spatial location is represented by a single integer identifying the cluster center the SIFT descriptor is closest to.

21 BOW is an Orderless Image Representation
Assume we have $K = 1000$ visual words in our vocabulary, and let $I_q$ be the quantized image (i.e. each location is represented by an integer index).
Let $\delta_i$ be the (curried) Kronecker delta function:
$$\delta_i(j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
We can express the image as a discrete field of one-hot vectors, and the histogram as a sum over the field:
$$I_{\text{1-hot}}(x, y) = \left[ \delta_i(I_q(x, y)) \right]_{i=1}^{1000}$$
$$H(I_{\text{1-hot}}) = \sum_x \sum_y I_{\text{1-hot}}(x, y)$$
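The one-hot formulation can be checked numerically. This sketch assumes a tiny vocabulary of 5 words instead of 1000 and a random 4 by 6 quantized image; it shows that summing the one-hot field gives the ordinary histogram and that shuffling locations does not change it.

```python
import numpy as np

K = 5                                    # tiny vocabulary instead of K = 1000
rng = np.random.default_rng(0)
I_q = rng.integers(0, K, size=(4, 6))    # quantized image: one word index per location

# One-hot field: I_1hot[x, y, i] = delta_i(I_q[x, y]).
I_1hot = (I_q[:, :, None] == np.arange(K)).astype(int)

# Histogram as a sum over the field.
H = I_1hot.sum(axis=(0, 1))

# Summing over (x, y) discards position: permuting locations leaves H unchanged.
permuted = rng.permutation(I_q.ravel()).reshape(I_q.shape)
H_permuted = (permuted[:, :, None] == np.arange(K)).astype(int).sum(axis=(0, 1))
```

The equality of `H` and `H_permuted` is exactly the sense in which the representation is orderless.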

22 BOW is an Orderless Image Representation
Feature: we have a fixed-length representation of images.
Feature: this representation has strong invariance.
Bug: we lose all spatial coherence in this representation (it's too invariant).

23 Spatial Pyramids: Impose Some Order
The main idea: revisit global non-invariant representations based on aggregating statistics of local features, and use kernel-based recognition that computes rough geometric correspondence on a global scale.
Once you sweep away the gobbledygook (the supercazzola) in the paper, the method is quite simple: repeatedly subdivide the image, and compute histograms of local features at increasingly fine resolutions.
The spatial pyramid technique is simple and extremely effective. After its publication, it became ubiquitous in nearly all BOW pipelines.

24 Spatial Pyramids: A Technical Aside
The Support Vector Machine (SVM) is the standard classifier for BOW.
The linear SVM objective is to find the $w$ (and $b$) that minimizes:
$$\left[ \frac{1}{n} \sum_{i=1}^{n} \max\left(0,\; 1 - y_i (w \cdot x_i + b)\right) \right] + \lambda \|w\|^2$$
This can be rewritten as a constrained optimization problem:
$$\text{minimize} \quad \frac{1}{n} \sum_{i=1}^{n} \zeta_i + \lambda \|w\|^2$$
$$\text{subject to} \quad y_i (w \cdot x_i + b) \geq 1 - \zeta_i \;\text{ and }\; \zeta_i \geq 0, \;\text{ for all } i.$$
And the dual formulation:
$$\text{maximize} \quad f(c_1 \ldots c_n) = \sum_{i=1}^{n} c_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i c_i (x_i \cdot x_j) y_j c_j$$
$$\text{subject to} \quad \sum_{i=1}^{n} c_i y_i = 0, \;\text{ and }\; 0 \leq c_i \leq \frac{1}{2n\lambda} \;\text{ for all } i.$$
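As a rough illustration of the primal objective, the sketch below minimizes the average hinge loss plus $\lambda \|w\|^2$ by full-batch subgradient descent on six hand-picked 2-D points. The solver, step size, and data are assumptions made for the example; production SVM packages use more refined optimizers.

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """(1/n) * sum of hinge losses + lam * ||w||^2."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins).mean() + lam * (w @ w)

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=500):
    """Full-batch subgradient descent on the primal objective."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        active = y * (X @ w + b) < 1.0        # points violating the margin
        grad_w = -(y[active, None] * X[active]).sum(axis=0) / n + 2.0 * lam * w
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Six linearly separable toy points with labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-2.0, -1.0], [-3.0, -2.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w + b)
```

Only points with margin below 1 contribute to the subgradient, which is the primal counterpart of the dual having nonzero $c_i$ only for support vectors.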

25 Spatial Pyramids: A Technical Aside
The $c_i$ in the dual are formulated so that we can write the classifier vector as:
$$w = \sum_{i=1}^{n} c_i y_i x_i$$
Kernel trick: embed our feature vectors $x_i$ in a Hilbert space via $\phi(x_i)$.
$$\text{maximize} \quad f(c_1 \ldots c_n) = \sum_{i=1}^{n} c_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i c_i \, k(x_i, x_j) \, y_j c_j$$
$$\text{subject to} \quad \sum_{i=1}^{n} c_i y_i = 0, \;\text{ and }\; 0 \leq c_i \leq \frac{1}{2n\lambda} \;\text{ for all } i,$$
where $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.
So, we never have to actually embed our features; we just need to compute the kernel matrix of all pairs of inner products.
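The identity behind the kernel trick can be verified directly. In this sketch the coefficients `c` are random stand-ins rather than the solution of the dual, which is enough to show that scores computed from the kernel matrix match scores computed from the explicit vector $w$ when the kernel is linear. The function names are hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Kernel matrix of all pairwise inner products between rows of A and B."""
    return A @ B.T

def decision_values(c, y, K_test_train, b=0.0):
    """f(x) = sum_i c_i y_i k(x_i, x) + b, computed from kernel values only."""
    return K_test_train @ (c * y) + b

rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))
y_train = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
c = rng.random(6)                  # stand-in dual coefficients, NOT a solved dual
X_test = rng.normal(size=(4, 3))

# Explicit classifier vector w = sum_i c_i y_i x_i ...
w = (c * y_train) @ X_train
explicit = X_test @ w

# ... versus the same scores obtained from the kernel matrix alone.
kernelized = decision_values(c, y_train, linear_kernel(X_test, X_train))
```

Swapping `linear_kernel` for any other valid kernel changes the classifier without ever forming $\phi(x)$.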

26 Spatial Pyramids: A Technical Aside
Some popular kernels:
Linear: $k(x, y) = x \cdot y$
Gaussian RBF: $k(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)$
$\chi^2$: $k(x, y) = 1 - 2 \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{x_i + y_i}$
Exponential $\chi^2$: $k(x, y) = \exp\left( -\beta \left( 1 - \chi^2(x, y) \right) \right)$
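A sketch of these kernels for a pair of histograms. The $\chi^2$ form used here, $1 - 2\sum_i (x_i - y_i)^2 / (x_i + y_i)$, is one common convention and is an assumption; other references scale the sum differently. The small `eps` guarding empty bins is also an addition for numerical safety.

```python
import numpy as np

def linear(x, y):
    return float(x @ y)

def gaussian_rbf(x, y, sigma=1.0):
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def chi2(x, y, eps=1e-12):
    """Chi-squared kernel for non-negative histograms (assumed scaling)."""
    return float(1.0 - 2.0 * np.sum((x - y) ** 2 / (x + y + eps)))

def exp_chi2(x, y, beta=1.0):
    """Exponential chi-squared kernel: exp(-beta * (1 - chi2(x, y)))."""
    return float(np.exp(-beta * (1.0 - chi2(x, y))))

h = np.array([0.5, 0.5, 0.0])
g = np.array([0.25, 0.25, 0.5])

k_same = exp_chi2(h, h)     # identical histograms
k_diff = exp_chi2(h, g)     # different histograms
```

With this convention both the RBF and the exponential $\chi^2$ kernel equal 1 for identical inputs and decay as the histograms diverge.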

27 Spatial Pyramids: Structured Matching
The inspiration for Spatial Pyramids comes from a technique called pyramid matching for measuring similarity between $d$-dimensional point sets $X$ and $Y$.
It constructs a sequence of grids at resolutions $0, \ldots, L$ such that the grid at level $l$ has $2^l$ cells along each of the $d$ dimensions.
Thus, there is a total of $D = 2^{dl}$ cells at level $l$.
Finally, let $H_X^l$ and $H_Y^l$ denote the histograms of $X$ and $Y$ at level $l$.
So: $H_X^l(i)$ and $H_Y^l(i)$ are the numbers of points from $X$ and $Y$ that fall into the $i$th cell of the grid.
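A small sketch of the grid histograms, assuming points have been scaled into the unit cube $[0, 1)^d$. The function name `grid_histogram` and the four example points are illustrative only.

```python
import numpy as np

def grid_histogram(points, level):
    """Count points of a set in [0, 1)^d falling into each of the 2^(d*level) cells."""
    n_side = 2 ** level
    d = points.shape[1]
    # Per-axis cell index of every point (clipped in case a coordinate equals 1).
    cells = np.minimum((points * n_side).astype(int), n_side - 1)
    # Flatten the d per-axis indices into one cell index per point.
    flat = np.ravel_multi_index(tuple(cells.T), (n_side,) * d)
    return np.bincount(flat, minlength=n_side ** d)

X = np.array([[0.1, 0.1], [0.2, 0.8], [0.6, 0.7], [0.9, 0.9]])
H0 = grid_histogram(X, level=0)   # a single cell holding all points
H1 = grid_histogram(X, level=1)   # 2 x 2 grid
```

Level 0 always has one cell containing every point, and each finer level splits every cell in two along each dimension.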

28 Spatial Pyramids: Structured Matching
The main tool in defining Spatial Pyramids is the Histogram Intersection Kernel:
$$I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min\left( H_X^l(i), H_Y^l(i) \right)$$
What we're doing here is simply counting the number of points that hit the same cells in the dyadic spatial decomposition.
If we notice that the matches found at level $l$ also include all matches at finer levels, we can write the pyramid match kernel (abbreviating $I(H_X^l, H_Y^l)$ as $I^l$):
$$\kappa^L(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left( I^l - I^{l+1} \right) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l$$
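The two ways of writing the pyramid match kernel can be compared on a hand-made example. The level histograms below describe two hypothetical 1-D point sets of four points each; the function names are illustrative.

```python
import numpy as np

def intersection(h_x, h_y):
    """Histogram intersection: number of matched points, cell by cell."""
    return float(np.minimum(h_x, h_y).sum())

def pmk_new_matches(I):
    """kappa^L = I^L + sum_{l=0}^{L-1} (I^l - I^{l+1}) / 2^(L-l)."""
    L = len(I) - 1
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))

def pmk_weighted(I):
    """kappa^L = I^0 / 2^L + sum_{l=1}^{L} I^l / 2^(L-l+1)."""
    L = len(I) - 1
    return I[0] / 2 ** L + sum(I[l] / 2 ** (L - l + 1) for l in range(1, L + 1))

# Hand-made level histograms for two 1-D point sets (levels 0, 1, 2).
H_X = [np.array([4]), np.array([3, 1]), np.array([2, 1, 0, 1])]
H_Y = [np.array([4]), np.array([2, 2]), np.array([0, 2, 1, 1])]

I = [intersection(h_x, h_y) for h_x, h_y in zip(H_X, H_Y)]
kappa_a = pmk_new_matches(I)
kappa_b = pmk_weighted(I)
```

Here the intersections are 4, 3, and 2 at levels 0, 1, and 2, and both forms give 2.75: matches found only at coarse levels are discounted more heavily than matches that survive at the finest level.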

29 Spatial Pyramids: Structured Matching
To extend this idea to the BOW model, we can quantize all salient locations in the image (SIFT descriptors).
Let $X_m$ and $Y_m$ represent the spatial locations of all visual words of type $m$ in images $X$ and $Y$, respectively.
Then, we can write the Spatial Pyramid Match Kernel as:
$$K^L(X, Y) = \sum_{m=1}^{M} \kappa^L(X_m, Y_m)$$
Note that $\kappa^L$ is just a weighted sum of histogram intersections.
Note also that, for positive $c$, $c \min(a, b) = \min(ca, cb)$.
Thus, we can write the above as a single histogram intersection of concatenations of appropriately weighted channel histograms at all levels.
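The "single histogram intersection of concatenated, weighted histograms" view can be sketched as follows. The sketch assumes descriptor locations scaled to $[0, 1)^2$ and uses random word indices and positions as stand-ins for a real image; the function names are hypothetical.

```python
import numpy as np

def level_weight(level, L):
    """Weight of level l in the pyramid match kernel."""
    return 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)

def spatial_pyramid_feature(words, xy, vocab_size, L):
    """Concatenate weighted per-cell, per-word counts over levels 0..L.

    words: (n,) visual word index per descriptor
    xy:    (n, 2) descriptor locations scaled to [0, 1)
    """
    parts = []
    for level in range(L + 1):
        n_side = 2 ** level
        cell = np.minimum((xy * n_side).astype(int), n_side - 1)
        # Joint index over (cell row, cell column, visual word).
        flat = (cell[:, 0] * n_side + cell[:, 1]) * vocab_size + words
        counts = np.bincount(flat, minlength=n_side * n_side * vocab_size)
        parts.append(level_weight(level, L) * counts)
    return np.concatenate(parts)

def spm_kernel(f_x, f_y):
    """One histogram intersection over the concatenated, weighted histograms."""
    return float(np.minimum(f_x, f_y).sum())

rng = np.random.default_rng(0)
M, L = 3, 2
words_x, xy_x = rng.integers(0, M, 40), rng.random((40, 2))
words_y, xy_y = rng.integers(0, M, 50), rng.random((50, 2))

f_x = spatial_pyramid_feature(words_x, xy_x, M, L)
f_y = spatial_pyramid_feature(words_y, xy_y, M, L)
k_xy = spm_kernel(f_x, f_y)
```

With $M = 3$ words and $L = 2$ the feature has $3 \times (1 + 4 + 16) = 63$ dimensions. Because the level weights sum to 1, the kernel of an image with itself equals its number of descriptors.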

30 Spatial Pyramids: What's Going On
This is the diagram from the paper:
But this is how people usually think of the Spatial Pyramid:

31 Spatial Pyramids: Experiments
New Trend: More Data is Better.
Scenes-15: Fifteen categories with strong inter-class variability, intra-class similarity images per category.

32 Spatial Pyramids: Experiments
New Trend: More Data is Better.
Caltech-101: 101 categories with strong inter-class variability, intra-class similarity images per category.

33 Spatial Pyramids: Experiments
Results are impressive and interesting on Scenes-15:
And on Caltech-101:

34 Spatial Pyramids: Reflections
Despite the simplicity of the method, it consistently achieves an improvement over an orderless image representation.
This is also despite the fact that it works not by constructing explicit object models, but by using global cues as indirect evidence about the presence of an object.
This is not a trivial accomplishment, given that a well-designed bag-of-features method can outperform more sophisticated approaches based on parts and relations.
As I mentioned before, the Spatial Pyramid technique became a standard trick to significantly and consistently improve results for BOW models.
In the next two papers, we will look at more sophisticated coding techniques.

35 Advanced BOW: Sparse Coding

36 Locality-constrained Linear Coding
The original BOW model uses global pooling of descriptors, hence a single, global image representation.
Spatial Pyramids add some spatial structure to the image representation.
Can we improve the way features themselves are coded before pooling into the final image representation?
We will look at one approach to better feature coding in our next paper:
Locality-constrained linear coding for image classification, J Wang, J Yang, K Yu, F Lv, T Huang, Y Gong. In: Computer Vision and Pattern Recognition (CVPR).

37 Locality-constrained Linear Coding
Some observations:
BOW + SPM works really well.
But: it requires non-linear SVMs to achieve state-of-the-art performance.
As datasets grow larger, computing and storing the kernel matrix for solving the dual SVM formulation becomes onerous.
We hope for a better feature encoding that allows us to achieve state-of-the-art results, but with linear SVMs.
The key insight is to use the codebook (visual vocabulary) more effectively.
This is done through sparse coding to encode local features, followed by max pooling (as opposed to average pooling) to arrive at the global image description.

38 LLC: Basic Definitions
Let $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$ be the local descriptors of an image.
Given a set of codewords $B = [b_1, \ldots, b_M] \in \mathbb{R}^{D \times M}$, we want to encode each $x_i$ into an $M$-dimensional code.
Vector Quantization is used in the BOW, resulting in a set of 1-hot codes $C = [c_1, \ldots, c_N] \in \mathbb{R}^{M \times N}$.
Sparse coding can be used instead (reference 22).
This leads to lower quantization loss by using more elements of the codebook to encode local features.
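For reference, the two coding objectives are usually written as follows. This is a reconstruction in the notation of this slide, following the form I recall from the LLC paper, so treat the exact constraints as an assumption; $\lambda$ is a sparsity weight.

```latex
% Vector quantization: each code has exactly one non-zero entry, equal to 1.
\arg\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|^2
\quad \text{s.t.} \quad \|c_i\|_{0} = 1, \; \|c_i\|_{1} = 1, \; c_i \succeq 0, \; \forall i

% Sparse coding: the cardinality constraint is relaxed to an L1 penalty.
\arg\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|^2 + \lambda \|c_i\|_{1}
```

The only difference is how sparsity is imposed: a hard single-codeword constraint for vector quantization, a soft penalty for sparse coding.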

39 LLC: Paying a Price
LLC proposes to use an additional locality constraint (in feature space), where locality is smoothly modeled with an exponential.
So: code features, but pay a cost for using codewords far from the descriptor we are encoding.
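The LLC objective with its locality term is commonly stated as below. This is a reconstruction from my reading of the LLC paper rather than a transcription of the slide; $\odot$ is element-wise multiplication and $\sigma$ controls how quickly the penalty grows with distance.

```latex
\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|^2 + \lambda \, \| d_i \odot c_i \|^2
\quad \text{s.t.} \quad \mathbf{1}^{\top} c_i = 1, \; \forall i

d_i = \exp\!\left( \frac{\operatorname{dist}(x_i, B)}{\sigma} \right),
\qquad
\operatorname{dist}(x_i, B) = \left[ \operatorname{dist}(x_i, b_1), \ldots, \operatorname{dist}(x_i, b_M) \right]^{\top}
```

Entries of $d_i$ are large for codewords far from $x_i$, so assigning weight to those codewords is expensive.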

40 LLC: What's Going On
Here is a comparison:

41 LLC: What's Going On
The full pipeline:

42 LLC: Implementation
In practice, solving the constrained optimization problem for every descriptor is too costly.
Solution: select the $k$ nearest codewords in feature space, and solve a constrained least-squares problem using only those $k$ codewords.
Codebook optimization: maybe k-means doesn't yield an optimal codebook for LLC coding. Section 3 gives an iterative algorithm for building a codebook. In practice, everyone uses k-means.
Classification: the LLC embedding is rich enough to allow it to perform well with linear SVMs as classifiers.
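The approximate coding step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the codebook and descriptor are random stand-ins, the ridge parameter `beta` is a small value chosen for numerical stability, and the function name `llc_code` is hypothetical.

```python
import numpy as np

def llc_code(x, B, k=5, beta=1e-4):
    """Approximate LLC code of descriptor x (D,) over codebook B (M, D)."""
    M = B.shape[0]
    # 1. Pick the k codewords nearest to x in feature space.
    nearest = np.argsort(((B - x) ** 2).sum(axis=1))[:k]
    # 2. Constrained least squares on those k codewords only:
    #    minimize ||x - sum_j c_j b_j||^2 subject to sum_j c_j = 1.
    z = B[nearest] - x                       # codewords shifted to the descriptor
    C = z @ z.T                              # local covariance, shape (k, k)
    C += beta * np.trace(C) * np.eye(k)      # small ridge term for stability
    c_k = np.linalg.solve(C, np.ones(k))
    c_k /= c_k.sum()                         # enforce the sum-to-one constraint
    # 3. Scatter back into an M-dimensional code (zero for all other codewords).
    code = np.zeros(M)
    code[nearest] = c_k
    return code

rng = np.random.default_rng(0)
B = rng.normal(size=(50, 16))    # stand-in codebook with M = 50 codewords
x = rng.normal(size=16)          # stand-in descriptor
code = llc_code(x, B)
```

Under the sum-to-one constraint the reconstruction error equals $c^\top C c$, so the minimizer is proportional to $C^{-1}\mathbf{1}$, which is why a single small linear solve per descriptor suffices.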

43 LLC: Results
Extract dense HOG descriptors (8-pixel stride) at three scales.
Use $k = 5$ for approximate LLC encoding.
They also use a Spatial Pyramid (but don't explain the configuration).
The authors consider two pooling methods to arrive at the final image representation:
sum-pooling: just sum all codes (this is the BOW pooling).
max-pooling: take the maximum coefficient for each codeword.
They then report results with max-pooling and L2 normalization (since they use linear SVMs).
A linear SVM is used to train one-versus-all classifiers for each category.
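The two pooling options and the L2 normalization are easy to state in code. The three 4-dimensional codes below are made-up numbers standing in for the codes of three descriptors from one image.

```python
import numpy as np

def sum_pool(codes):
    """BOW-style pooling: add up the codes of all descriptors."""
    return codes.sum(axis=0)

def max_pool(codes):
    """Keep the largest coefficient seen for each codeword."""
    return codes.max(axis=0)

def l2_normalize(v, eps=1e-12):
    return v / max(np.linalg.norm(v), eps)

# Made-up codes for three descriptors over a 4-word codebook.
codes = np.array([[0.7, 0.3, 0.0, 0.0],
                  [0.0, 0.6, 0.4, 0.0],
                  [0.9, 0.1, 0.0, 0.0]])

pooled_sum = sum_pool(codes)
pooled_max = max_pool(codes)
feature = l2_normalize(pooled_max)
```

Sum pooling grows with the number of descriptors that use a codeword, while max pooling only records how strongly the codeword was ever activated.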

44 LLC: Results
On Caltech-101:
And on Caltech-256:

45 LLC: Results
A new dataset (which became the benchmark for object recognition).
The PASCAL Visual Object Categorization (VOC) competitions defined the state-of-the-art for five years.
20 object categories, high-quality annotations, recognition, segmentation, detection, etc.
Introduced the use of average precision to evaluate object recognition.

46 LLC: Results
Results on PASCAL 2007:

47 LLC: Reflections
The LLC encoding technique takes a different approach to enriching the image representation.
It uses sparse codes, but these codes are sparse in the sense that only local codewords can contribute to the encoding of features.
Global pooling is done using a max operation, which helps ensure global quasi-sparsity.
The resulting codes can be used with linear SVMs, which is a huge win for large datasets.
It beats other BOW/SPM approaches, and achieves results comparable to more complex state-of-the-art methods.

48 Advanced BOW: Fisher Vectors

49 FV: Clusters are Not Points
The main observation in Fisher Vector coding is that the quantization process is imprecise.
More precisely: clusters are distributions of points.

50 FV: Start with a Generative Model
Let $X = \{ x_t, \; t = 1 \ldots T \}$ be the descriptors extracted from an image.
Assume there is a generation process for $X$ modeled by a probability density function $u_\lambda$ with parameters $\lambda$.
$X$ can be characterized by the gradient:
$$G_\lambda^X = \frac{1}{T} \nabla_\lambda \log u_\lambda(X)$$
Why the gradient? Because it describes the contribution of each parameter to the generation process (also, the gradient of the data log likelihood is the Fisher Score).
Plus, the dimensionality depends only on the number of parameters in $\lambda$ and not on the number of patches in the image.

51 FV: Then define a kernel
A natural kernel to use for gradients of generative model likelihoods is the Fisher kernel:
$$K(X, Y) = {G_\lambda^X}^\top F_\lambda^{-1} G_\lambda^Y$$
where $F_\lambda$ is the Fisher Information Matrix:
$$F_\lambda = E_{x \sim u_\lambda}\left[ \nabla_\lambda \log u_\lambda(x) \, \nabla_\lambda \log u_\lambda(x)^\top \right]$$
Since $F_\lambda$ is symmetric and positive definite, its inverse has a Cholesky decomposition $F_\lambda^{-1} = L_\lambda^\top L_\lambda$.
So, we can rewrite $K(X, Y)$ as a dot-product between normalized vectors:
$$\mathcal{G}_\lambda^X = L_\lambda G_\lambda^X$$

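52 FV: The Fisher Vector
This $\mathcal{G}_\lambda^X$ is the Fisher Vector of image $X$.
Big win: learning a kernel classifier with the non-linear kernel $K$ is equivalent to learning a linear classifier on Fisher Vectors.
Implementation: assume the generative model is a mixture of Gaussians, and assume that all $x_t$ are generated independently:
$$G_\lambda^X = \frac{1}{T} \sum_{t=1}^{T} \nabla_\lambda \log u_\lambda(x_t)$$
Compute gradients with respect to the mean and diagonal covariance of all mixture components. The final descriptor is the concatenation, over all components $i$, of:
$$\mathcal{G}_{\mu,i}^X = \frac{1}{T \sqrt{w_i}} \sum_{t=1}^{T} \gamma_t(i) \left( \frac{x_t - \mu_i}{\sigma_i} \right)$$
$$\mathcal{G}_{\sigma,i}^X = \frac{1}{T \sqrt{2 w_i}} \sum_{t=1}^{T} \gamma_t(i) \left[ \frac{(x_t - \mu_i)^2}{\sigma_i^2} - 1 \right]$$
where $w_i$, $\mu_i$ and $\sigma_i$ are the weight, mean and standard deviation of component $i$, and $\gamma_t(i)$ is the soft assignment (posterior probability) of descriptor $x_t$ to component $i$.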
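A compact sketch of the Fisher Vector under a diagonal-covariance Gaussian mixture whose parameters are given. The $1/\sqrt{w_i}$ and $1/\sqrt{2 w_i}$ scalings follow the usual statement of the improved Fisher Vector and should be read as an assumption, as should the random toy data; the function name `fisher_vector` is hypothetical.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Fisher Vector of descriptors X (T, D) under a diagonal-covariance GMM.

    w: (K,) mixture weights, mu: (K, D) means, sigma: (K, D) standard deviations.
    """
    T = X.shape[0]
    diff = (X[:, None, :] - mu[None, :, :]) / sigma[None, :, :]      # (T, K, D)
    # Soft assignments gamma_t(i): posterior of component i given x_t.
    # The shared -D/2 * log(2*pi) constant cancels in the normalization.
    log_p = (np.log(w)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigma).sum(axis=1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                        # (T, K)
    # Gradients with respect to means and standard deviations.
    G_mu = (gamma[:, :, None] * diff).sum(axis=0) / (T * np.sqrt(w)[:, None])
    G_sigma = ((gamma[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)
               / (T * np.sqrt(2.0 * w)[:, None]))
    return np.concatenate([G_mu.ravel(), G_sigma.ravel()])          # length 2*K*D

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))                 # 20 toy descriptors, D = 4
w = np.array([0.5, 0.3, 0.2])                # K = 3 components
mu = rng.normal(size=(3, 4))
sigma = np.ones((3, 4))
fv = fisher_vector(X, w, mu, sigma)
```

The output length is $2KD$ regardless of how many descriptors the image has. As a sanity check, a single Gaussian fitted exactly to the descriptors yields a zero vector, since the data then pulls the parameters in no direction.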

53 FV: What's Going On
The Fisher Vector is a weighted average of gradients.
These gradients are defined at every point in descriptor space.
First compute the FV for each individual descriptor.
Then average pool the vectors to compute the FV encoding for the image.

54 FV: Improvements
There are really only two improvements to the Fisher Vector proposed in this paper.
Improvement 1: L2 normalize the Fisher Vectors before training an SVM (not surprising).
Improvement 2: power normalization. Also known as "signed square-root" or the Hellinger kernel, it compensates for bursty features.
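Both normalizations fit in a few lines. The sketch assumes the usual form of power normalization, $f(z) = \operatorname{sign}(z)\,|z|^{\alpha}$ with $\alpha = 0.5$, applied before L2 normalization; the input vector is made up.

```python
import numpy as np

def power_normalize(v, alpha=0.5):
    """Signed power: sign(z) * |z|^alpha; alpha = 0.5 is the signed square root."""
    return np.sign(v) * np.abs(v) ** alpha

def l2_normalize(v, eps=1e-12):
    return v / max(np.linalg.norm(v), eps)

fv = np.array([4.0, -9.0, 0.0, 0.25])          # made-up Fisher Vector entries
powered = power_normalize(fv)
normalized = l2_normalize(powered)
```

The square root compresses large entries (4 and -9 become 2 and -3) while expanding small ones (0.25 becomes 0.5), which dampens the influence of a few dominant, bursty dimensions.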

55 FV: Baseline Comparison
It has become standard practice to do a baseline comparison.
In these experiments, you want to evaluate the contribution of your work.

56 FV: Us versus Them
The proof is in the pudding.
PASCAL 2007:
Caltech 256:

57 FV: Reflections
The Fisher Vector is an alternative coding method.
Instead of quantizing descriptors, you represent each descriptor with a gradient.
This gradient represents the relationship between a local descriptor and all clusters in a generative model.
This encoding can significantly improve performance over the BOW framework.
Bonus: everything is linear, and we can use efficient solvers even for large-scale datasets.
The Fisher Vector was the state-of-the-art in Bag-of-Features coding until

58 Detection: Deformable Part Models

59 Deformable Part Models
[OTHER PRESENTATIONS]

60 Discussion

61 Discussion
Discuss

Aggregating Descriptors with Local Gaussian Metrics

Aggregating Descriptors with Local Gaussian Metrics Aggregating Descriptors with Local Gaussian Metrics Hideki Nakayama Grad. School of Information Science and Technology The University of Tokyo Tokyo, JAPAN nakayama@ci.i.u-tokyo.ac.jp Abstract Recently,

More information

Beyond bags of Features

Beyond bags of Features Beyond bags of Features Spatial Pyramid Matching for Recognizing Natural Scene Categories Camille Schreck, Romain Vavassori Ensimag December 14, 2012 Schreck, Vavassori (Ensimag) Beyond bags of Features

More information

Artistic ideation based on computer vision methods

Artistic ideation based on computer vision methods Journal of Theoretical and Applied Computer Science Vol. 6, No. 2, 2012, pp. 72 78 ISSN 2299-2634 http://www.jtacs.org Artistic ideation based on computer vision methods Ferran Reverter, Pilar Rosado,

More information

String distance for automatic image classification

String distance for automatic image classification String distance for automatic image classification Nguyen Hong Thinh*, Le Vu Ha*, Barat Cecile** and Ducottet Christophe** *University of Engineering and Technology, Vietnam National University of HaNoi,

More information

Mixtures of Gaussians and Advanced Feature Encoding

Mixtures of Gaussians and Advanced Feature Encoding Mixtures of Gaussians and Advanced Feature Encoding Computer Vision Ali Borji UWM Many slides from James Hayes, Derek Hoiem, Florent Perronnin, and Hervé Why do good recognition systems go bad? E.g. Why

More information

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

More information

ImageCLEF 2011

ImageCLEF 2011 SZTAKI @ ImageCLEF 2011 Bálint Daróczy joint work with András Benczúr, Róbert Pethes Data Mining and Web Search Group Computer and Automation Research Institute Hungarian Academy of Sciences Training/test

More information

Beyond Bags of Features

Beyond Bags of Features : for Recognizing Natural Scene Categories Matching and Modeling Seminar Instructed by Prof. Haim J. Wolfson School of Computer Science Tel Aviv University December 9 th, 2015

More information

Bag-of-features. Cordelia Schmid

Bag-of-features. Cordelia Schmid Bag-of-features for category classification Cordelia Schmid Visual search Particular objects and scenes, large databases Category recognition Image classification: assigning a class label to the image

More information

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Preliminary Local Feature Selection by Support Vector Machine for Bag of Features Tetsu Matsukawa Koji Suzuki Takio Kurita :University of Tsukuba :National Institute of Advanced Industrial Science and

More information

Part-based and local feature models for generic object recognition

Part-based and local feature models for generic object recognition Part-based and local feature models for generic object recognition May 28 th, 2015 Yong Jae Lee UC Davis Announcements PS2 grades up on SmartSite PS2 stats: Mean: 80.15 Standard Dev: 22.77 Vote on piazza

More information

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition

More information

arxiv: v3 [cs.cv] 3 Oct 2012

arxiv: v3 [cs.cv] 3 Oct 2012 Combined Descriptors in Spatial Pyramid Domain for Image Classification Junlin Hu and Ping Guo arxiv:1210.0386v3 [cs.cv] 3 Oct 2012 Image Processing and Pattern Recognition Laboratory Beijing Normal University,

More information

Exploring Bag of Words Architectures in the Facial Expression Domain

Exploring Bag of Words Architectures in the Facial Expression Domain Exploring Bag of Words Architectures in the Facial Expression Domain Karan Sikka, Tingfan Wu, Josh Susskind, and Marian Bartlett Machine Perception Laboratory, University of California San Diego {ksikka,ting,josh,marni}@mplab.ucsd.edu

More information

Fisher vector image representation

Fisher vector image representation Fisher vector image representation Jakob Verbeek January 13, 2012 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.11.12.php Fisher vector representation Alternative to bag-of-words image representation

More information

Codebook Graph Coding of Descriptors

Codebook Graph Coding of Descriptors Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'5 3 Codebook Graph Coding of Descriptors Tetsuya Yoshida and Yuu Yamada Graduate School of Humanities and Science, Nara Women s University, Nara,

More information

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition

More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

A Survey on Image Classification using Data Mining Techniques Vyoma Patel 1 G. J. Sahani 2

A Survey on Image Classification using Data Mining Techniques Vyoma Patel 1 G. J. Sahani 2 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 10, 2014 ISSN (online): 2321-0613 A Survey on Image Classification using Data Mining Techniques Vyoma Patel 1 G. J. Sahani

More information
