Locality-constrained and Spatially Regularized Coding for Scene Categorization


Aymen Shabou, Hervé Le Borgne
CEA, LIST, Vision & Content Engineering Laboratory, Gif-sur-Yvette, France

Abstract

Improving coding and spatial pooling for bag-of-words based feature design has gained a lot of attention in recent works addressing object recognition and scene classification. Regarding the coding step in particular, properties such as sparsity, locality and saliency have been investigated. The main contribution of this work consists in taking into account the local spatial context of an image in the usual coding strategies proposed in the state-of-the-art. For this purpose, given an image, dense local features are extracted and structured in a lattice. The latter is endowed with a neighborhood system and pairwise interactions. We propose a new objective function to encode local features, which preserves locality constraints both in the feature space and in the spatial domain of the image. In addition, an appropriate efficient optimization algorithm is provided, inspired by the graph-cut framework. In conjunction with the max-pooling operation and the spatial pyramid matching, which reflects a global spatial layout, the proposed method improves the performance of several state-of-the-art coding schemes for scene classification on three publicly available benchmarks (UIUC 8-sport, Scene-15 and Caltech-101).

1. Introduction

In recent works addressing object recognition and scene classification tasks, the bag-of-words (BoW) model is one of the most popular approaches to feature design. Inspired by the seminal work of [26], different approaches have been proposed to improve both its generative property, i.e., its ability to describe images accurately, and its discriminative power for classification. Despite remarkable progress, challenges remain concerning the extraction of local descriptors, codebook design, local descriptor coding and pooling, the inclusion of a spatial layout in the final feature, and the final classification.

[Figure 1. Schematic comparison of basis selection methods to code dense descriptors: (1) locality-constrained assignment, as adopted by some recent coding approaches [30, 12, 20]; (2) locality-constrained and spatially regularized assignment, corresponding to the proposed LCSR method.]

Given a training dataset, the first step of the BoW method consists in extracting local features, such as SIFT [21], HOG [8] and SURF [1], from images. Then a codebook (or a dictionary), which is a set of visual words, is built to represent them. Initial methods are based on clustering techniques, such as K-means [26]. Despite their efficiency, the obtained codebooks suffer from several drawbacks such as distortion errors and low discriminative ability [14, 28]. A more appropriate unsupervised dictionary learning method is sparse coding, which aims to learn an over-complete codebook ensuring a sparse representation of local descriptors [23, 22]. However, this approach is computationally expensive, even if progress was made toward accelerating the process [16]. Other approaches have rather attempted to improve the discriminative power of the codebook while compacting it, relying on supervised methods [14, 22, 2].
However, recent works [6, 24] show that, for the recognition task, codebook design is less critical than the subsequent stages (coding, pooling and spatial layout).
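In line with this observation, the experiments reported below simply build the codebook with K-means on a random subset of descriptors. The following is a minimal sketch, assuming SIFT descriptors have already been extracted; the array shapes, the scikit-learn API and the parameter names are the only assumptions made here.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, K=1024, n_samples=100_000, seed=0):
    """Cluster a random subset of local descriptors into K visual words.

    descriptors: (N, d) array of local features (e.g., 128-D SIFT).
    Returns B, a (K, d) array whose rows are the basis vectors b_i.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors), size=min(n_samples, len(descriptors)), replace=False)
    kmeans = KMeans(n_clusters=K, n_init=4, random_state=seed).fit(descriptors[idx])
    return kmeans.cluster_centers_

# Example: B = build_codebook(all_training_sifts)   # B.shape == (1024, 128)
```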

Coding consists in decomposing local features over a codebook in order to satisfy some desirable properties. Various strategies have been proposed in the literature. The earliest one is hard coding [26], a voting scheme that is simple yet highly sensitive to reconstruction errors induced by the codebook. A more robust voting approach is soft coding [28], which assigns a descriptor to all the visual words according to their distances. Sparse coding is an alternative [32] that is time consuming and, moreover, inconsistent when encoding similar descriptors [30, 11]. The authors of [33] introduced another coding property, called locality, that ensures sparsity while remaining efficient. Several implementations have been proposed by [30, 20], where each descriptor is coded on locally selected bases. Note also that in [12], the authors give another explanation for the success of locality coding, namely saliency: for a given descriptor and its corresponding local bases, the closer the nearest visual word is to the descriptor in comparison to the remaining local bases, the stronger its coding response should be.

The next step of BoW design is pooling the obtained codes into a compact signature. Usually, the max-pooling operation is used, leading to signatures that are appropriate for linear classifiers [32, 3, 2, 20]. Finally, the Spatial Pyramid Matching (SPM) step, proposed in [15], is usually exploited to include some spatial layout information in the BoW. Such fixed-size vectors can then feed a machine learning algorithm such as an SVM [7] or Boosting [25].

In the current work, the local feature coding step is investigated. While several techniques have outperformed the classic hard assignment by introducing either the locality or the similarity constraint in the feature space [30, 20, 12, 11], we propose a new formalism that implicitly preserves these properties while adding local contextual information from the spatial domain of the image. Figure 1 shows a schematic comparison. The proposed coding approach is divided into two steps.

1. The first step is an optimal basis selection for each local feature, formulated as a labeling problem. For this purpose, we introduce a novel objective function that includes locality and similarity (or coherency) constraints in both the feature space and the spatial domain of the image. Furthermore, we provide an appropriate efficient optimization algorithm, called α_knn-expansion, which is inspired by the fast optimization tools dedicated to Markov Random Field (MRF) based energy minimization [4].

2. The second step consists in assigning responses (or values) to the selected optimal bases.

This new approach enriches the BoW signature, leading to more accurate features for classification than the state-of-the-art methods. Furthermore, it is generic and can thus be added to several recent coding strategies.

The remainder of this paper is organized as follows. Section 2 discusses related work on the coding step within the BoW feature generation framework. The new coding strategy is introduced in section 3. Section 4 reports experimental studies and results on the following benchmarks: UIUC 8-sport [17] and 15-natural scenes [15] for event and scene classification respectively, and Caltech-101 [9] for object recognition.

2. Related work

Let us consider a codebook denoted by $B = \{b_i;\ b_i \in \mathbb{R}^d;\ i \in \mathcal{N}\}$, with $\mathcal{N} = \{1, \dots, K\}$, $K$ the size of the codebook and $d$ the dimensionality of a visual word (or basis vector).
The codebook is constructed on a subset of local descriptors $\{x_i;\ x_i \in \mathbb{R}^d;\ i = 1, \dots, N\}$ extracted from the training dataset. In the original BoW method [26], the coding of local descriptors is performed with hard assignment: each local descriptor is assigned to its nearest visual word, i.e.,

$$z_{i,j} = \begin{cases} 1 & \text{if } j = \operatorname*{argmin}_{j' = 1, \dots, K} \|x_i - b_{j'}\|_2^2, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)$$

with $z_i$ the code of size $K$ associated with the descriptor $x_i$. As reported in [14, 22, 28], such coding has several limitations, mainly its sensitivity to the distortion errors of the codebook. Using sparse coding [32] as an alternative has significantly improved the robustness to these problems. Coding is then performed by solving the $\ell_1$-norm regularized approximation problem:

$$z_i = \operatorname*{argmin}_{z \in \mathbb{R}^K} \|x_i - Bz\|_2^2 + \lambda \|z\|_1, \quad \lambda \in \mathbb{R}. \qquad (2)$$

Nevertheless, this optimization problem is computationally expensive and leads to inconsistent encoding of similar descriptors [30, 11]. Indeed, it might select different bases for similar descriptors due to the over-completeness of the codebook, which results in large deviations in the representation of similar local features.
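For reference, problem (2) can be solved with off-the-shelf l1 solvers. A minimal sketch using scikit-learn's SparseCoder is given below; the atom normalization and the value of lambda are illustrative assumptions, and this is only meant to make (2) concrete, not to describe the coding advocated in this paper.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_codes(X, B, lam=0.15):
    """Approximately solve eq. (2) for each row of X (N, d) over the codebook B (K, d)."""
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)   # l2-normalize the atoms
    coder = SparseCoder(dictionary=B_unit,
                        transform_algorithm="lasso_lars",
                        transform_alpha=lam)
    return coder.transform(X)                               # (N, K) sparse codes
```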

Therefore, the authors of [30, 20] propose more efficient and consistent coding methods relying on the locality property introduced by [33]. Their hypothesis is that descriptors approximately reside on a lower-dimensional manifold in the ambient descriptor space, so that Euclidean distances between descriptors and visual words are only meaningful within a local region. Hence, local bases are selected to perform the coding. The generalized formulation of the locality-constrained coding (LCC) problem is:

$$z_i = \operatorname*{argmin}_{z \in \mathbb{R}^K} \|x_i - Bz\|_2^2 + \lambda \|d_i \odot z\|_2^2, \quad \text{s.t. } \mathbf{1}^T z = 1, \qquad (3)$$

with $\odot$ the element-wise product, $d_i = \exp\!\left(\frac{\mathrm{dist}(x_i, B)}{\sigma}\right)$, $\mathrm{dist}(x_i, B) = [\mathrm{dist}(x_i, b_1), \dots, \mathrm{dist}(x_i, b_K)]^T$ the Euclidean distances between $x_i$ and the basis vectors, and $\sigma$ a parameter controlling the weight decay speed of the locality.

An alternative to improve the consistency of sparse coding has been proposed by [11]. It consists in adding a Laplacian term to the objective function (2), to perform codebook learning as well as the coding of local features, i.e.,

$$\operatorname*{argmin}_{B, Z} \|X - BZ\|_2^2 + \lambda \sum_i \|z_i\|_1 + \beta\, \mathrm{tr}(Z L Z^T), \quad \text{s.t. } \|b_j\|_2 \le 1,\ \forall j \in \mathcal{N}, \qquad (4)$$

with $L = A - W$ the Laplacian matrix obtained from the similarity matrix $W$ encoding the relationships between local features, and $A_{m,m} = \sum_n W_{m,n}$. Due to the extremely high number of local descriptors in a dataset, constructing the Laplacian matrix and learning sparse codes simultaneously is computationally infeasible, so a restriction to a selected set of local features, called template features, was necessary. But this technique remains computationally expensive due to the search for the k-nearest template features followed by the similarity-constrained sparse coding.

Salient coding [12] is another alternative that has shown interesting results while remaining efficient. It exploits the locality constraint to replace the conventional hard assignment (1) with a saliency degree of the nearest bases $b_j$ to $x_i$, defined as:

$$\psi_{i,j} = \phi\!\left( \frac{\|x_i - b_j\|_2^2}{\frac{1}{k-1} \sum_{m \neq j} \|x_i - \hat{b}_m\|_2^2} \right), \qquad (5)$$

where $\phi(\cdot)$ is a monotonically decreasing function and $\{\hat{b}_m\}_{m = 1, \dots, k}$ is the set of the k nearest bases to $x_i$. Salient coding improves the consistency of locality coding when it is performed on a small number of local bases compared to the dimension of the descriptors.

All the aforementioned coding schemes are applied to local features independently, except for Laplacian sparse coding, where a global similarity between local features is used to constrain sparsity. However, dense local features share contextual information locally in the spatial domain of a given image. This can be seen simply by computing a local pairwise similarity map between the local features extracted from an image, which shows local correlations in some regions of the image (see figure 4). Discarding this contextual information in the coding step induces codes that are inconsistent in terms of spatial context, and that are also less reliable when the spatial pooling operation is conducted to design the final signature. In the same figure, we also show the indices of the bases assigned to local features (as colored maps) using locality-constrained hard and soft coding strategies; considering the local spatial information improves the consistency of the coding with respect to the image context. To our knowledge, introducing the local contextual information in the coding step of the BoW approach has not been proposed in any previous work. In the next section, we propose a robust and fast method to achieve this type of coding.
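To make the schemes reviewed above concrete, here is a minimal sketch of hard assignment (1), localized soft assignment in the spirit of [20], and a salient response in the spirit of (5), for one descriptor over its k nearest visual words. The function names, the inverse-temperature parameter beta_s and the choice phi(t) = 1 - t are illustrative assumptions, not values taken from the cited papers.

```python
import numpy as np

def knn_bases(x, B, k=10):
    """Indices of the k nearest visual words (rows of B) to descriptor x, plus distances."""
    d2 = np.sum((B - x) ** 2, axis=1)            # squared Euclidean distances to all K words
    idx = np.argsort(d2)[:k]
    return idx, d2[idx]

def hard_code(x, B):
    """Eq. (1): a single 1 at the nearest visual word."""
    z = np.zeros(len(B))
    z[np.argmin(np.sum((B - x) ** 2, axis=1))] = 1.0
    return z

def localized_soft_code(x, B, k=10, beta_s=10.0):
    """Soft assignment restricted to the k nearest words (in the spirit of [20])."""
    idx, d2 = knn_bases(x, B, k)
    w = np.exp(-beta_s * d2)
    z = np.zeros(len(B))
    z[idx] = w / w.sum()
    return z

def salient_code(x, B, k=10):
    """Saliency response in the spirit of eq. (5): the nearest word receives a response
    measuring how much closer it is than the remaining k-1 local bases
    (phi(t) = 1 - t is one admissible monotonically decreasing choice)."""
    idx, d2 = knn_bases(x, B, k)
    z = np.zeros(len(B))
    z[idx[0]] = max(0.0, 1.0 - d2[0] / (d2[1:].mean() + 1e-12))
    return z
```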
3. Locality-constrained and spatially regularized coding

We propose here an energy-based formulation to achieve a robust basis selection, required to encode the dense local features of a given image. Then, a fast resolution algorithm is provided, relying on the graph-cut framework. Finally, we present the additional operations required to generate the final BoW signature.

3.1. Energy model

Under a Markovian assumption on a given image I, we can assume that neighboring patches within a constant or smooth region of the image should be coded on shared, or at least similar, bases from the codebook. In contrast, for neighboring patches within a discontinuous region, the corresponding local bases can differ, depending on the local descriptor information. Such an assumption leads to some interesting properties of the final coding: the assignment is less noisy, since the local spatial context is taken into account; the saliency property is extended to the spatial neighborhood; and the coding of similar descriptors is consistent with the Markovian prior assumed on images.

In order to obtain a robust basis selection following the proposed spatial assumption as well as the locality assumption stated in [33], we reformulate the problem as a labeling one. Formally, let us consider an image I. We denote by $\mathcal{P} = \{1, \dots, N_I\}$ the set of indices of the dense patches (or, more generally, sites) in I. A set of local features $X = \{x_p;\ x_p \in \mathbb{R}^d;\ p \in \mathcal{P}\}$ is extracted from I at all sites. Given a codebook $B = \{b_i;\ b_i \in \mathbb{R}^d;\ i \in \mathcal{N}\}$, we consider that each local feature is assigned to a subset of basis vectors of cardinality m belonging to the codebook. For simplicity of notation, a local feature $x_p$ is assigned to a set $y_p$ of indices of bases in B. Therefore, $\mathbf{Y} = \{y_p;\ y_p \in \mathcal{N}^m;\ p \in \mathcal{P}\}$ denotes the assignment of all the local features of the image I. In the LCC case for instance, each vector $y_p$ contains the indices of the m nearest visual words to $x_p$. The set of basis vectors related to the indices in $y_p$ is denoted $\hat{B}_p = \{\hat{b}_{p,i};\ i = 1, \dots, m\}$, and we define $\hat{B} = \{\hat{B}_p;\ p \in \mathcal{P}\}$. We also denote by $L_p = \{l_p^1, l_p^2, \dots, l_p^k\}$ the set of indices of the k nearest visual words to the local feature $x_p$, and call it the set of possible labels that site p can take. Following the locality assumption, each local feature should be assigned to bases of cardinality m within the set of its k nearest visual words in the codebook (k is set large enough to cover a large neighborhood in the feature space, i.e., k > m).
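A minimal sketch of these data structures on a regular grid of sites is given below, assuming the descriptors are laid out row by row; the 4-nearest-neighbor system and the m-nearest-word initialization follow the text, while the array layouts and function name are assumptions.

```python
import numpy as np

def build_lattice(X_grid, B, k=10, m=3):
    """X_grid: (H, W, d) dense descriptors on a regular grid; B: (K, d) codebook.
    Returns:
      L     - (H*W, k) indices of the k nearest visual words per site (the label sets L_p),
      Y0    - (H*W, m) initial assignment Y^(0): the m nearest words per site,
      edges - list of (p, q) pairs for the 4-neighbor system on the grid.
    """
    H, W, d = X_grid.shape
    X = X_grid.reshape(-1, d)
    # squared Euclidean distances between every site descriptor and every visual word
    d2 = (X ** 2).sum(1, keepdims=True) - 2.0 * X @ B.T + (B ** 2).sum(1)
    L = np.argsort(d2, axis=1)[:, :k]
    Y0 = L[:, :m].copy()
    edges = []
    for r in range(H):
        for c in range(W):
            p = r * W + c
            if c + 1 < W: edges.append((p, p + 1))    # right neighbor
            if r + 1 < H: edges.append((p, p + W))    # bottom neighbor
    return L, Y0, edges
```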

The labeling problem we consider consists in retrieving the optimal bases for each local feature among its k nearest ones, under the spatial contextual constraint. As a result, the locality assumption is enforced both in the feature space and in the spatial domain of the image. We introduce the following energy function to model this problem:

$$E(\mathbf{Y}) = \underbrace{\sum_{p \in \mathcal{P}} \underbrace{f_{\mathrm{data}}(x_p, \hat{B}_p)}_{E_p(y_p)}}_{E_{\mathrm{data}}} \;+\; \beta \underbrace{\sum_{p \sim q} \underbrace{w_{p,q}\, f_{\mathrm{prior}}(\hat{B}_p, \hat{B}_q)}_{E_{p,q}(y_p, y_q)}}_{E_{\mathrm{prior}}}, \qquad (6)$$

where $f_{\mathrm{data}}(x_p, \hat{B}_p) = \sum_{i=1}^{m} \|x_p - \hat{b}_{p,i}\|_2^2$ is the total distance between a descriptor $x_p$ and its m selected bases; $p \sim q$ indicates the indices of two spatially neighboring patches under a fixed neighborhood system (the 4-nearest-neighbor grid for instance); $f_{\mathrm{prior}}(\hat{B}_p, \hat{B}_q) = \sum_{i=1}^{m} \|\hat{b}_{p,i} - \hat{b}_{q,i}\|$ is the sum of the distances between the bases assigned to the neighboring patches $x_p$ and $x_q$; and $w_{p,q}$ is a local regularization parameter corresponding to the similarity between the local patches $x_p$ and $x_q$: the more similar the local patches are, the more we regularize the basis selection. Among the existing similarity measures for local features, we use the histogram intersection kernel [31], denoted $K(\cdot, \cdot)$, since it has shown good performance on histogram-based local features. We set the local hyper-parameters as follows:

$$w_{p,q} = \begin{cases} K(x_p, x_q) & \text{if } K(x_p, x_q) \geq T, \\ 0 & \text{otherwise.} \end{cases} \qquad (7)$$

On the one hand, this binary form of the local hyper-parameters ensures regularization only on similar neighboring patches, above a similarity threshold T. On the other hand, it reduces the sensitivity of the model to the global regularization parameter β. $E_{\mathrm{data}}$ is thus a likelihood term that penalizes assigning visual words far from the descriptors, whereas $E_{\mathrm{prior}}$ is a prior term that penalizes assigning different visual words to similar neighboring patches. Minimizing (6) leads to an optimal assignment configuration:

$$\tilde{\mathbf{Y}} = \operatorname*{argmin}_{\mathbf{Y}} E(\mathbf{Y} \mid \mathbf{X}, \hat{B}, W). \qquad (8)$$

We note that particular cases of the proposed energy function lead to the state-of-the-art basis selection schemes required for coding: β = 0 and m = 1 yields hard and salient coding [26, 15, 12]; β = 0 and m > 1 yields the approximate LCC as implemented in [30, 20].
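A minimal sketch of the energy (6) and of the weights (7) is given below, reusing the sites, codebook and 4-neighbor edges from the earlier snippet. Treating the histogram intersection of l1-normalized SIFT descriptors as the similarity K(., .) is an assumption made for illustration.

```python
import numpy as np

def intersection_kernel(xp, xq):
    """Histogram intersection of two l1-normalized descriptors."""
    return np.minimum(xp, xq).sum()

def pair_weights(X, edges, T=0.7):
    """Eq. (7): keep the similarity only above the threshold T."""
    w = {}
    for p, q in edges:
        k = intersection_kernel(X[p], X[q])
        w[(p, q)] = k if k >= T else 0.0
    return w

def energy(Y, X, B, edges, w, beta=1.0):
    """Eq. (6): data term plus spatially regularized prior term.
    Y: (P, m) selected word indices per site; X: (P, d); B: (K, d)."""
    e_data = sum(((X[p] - B[Y[p]]) ** 2).sum() for p in range(len(X)))
    e_prior = sum(w[(p, q)] * np.linalg.norm(B[Y[p]] - B[Y[q]], axis=1).sum()
                  for p, q in edges)
    return e_data + beta * e_prior
```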
3.2. Fast optimization algorithm

The proposed energy function (6) is non-convex for general forms of the distance functions $f_{\mathrm{prior}}$ and $f_{\mathrm{data}}$. Its minimization can nevertheless be performed efficiently, drawing on fast optimization tools dedicated to pairwise multi-label MRF energies [18]. In particular, the graph-cut approach has been successfully used to solve many labeling problems in computer vision [27]. One of the most popular iterative graph-cut based approximate multi-label optimization algorithms is the α-expansion [4], which relies on iterative binary moves of the current configuration [29]. At a given iteration (i), each site can either keep its current label or change it to a new one $\alpha^{(i)} \in \mathcal{L}$, with $\mathcal{L}$ a discrete set of labels. This binary move is performed optimally by building an appropriate graph on which a minimum-cut/maximum-flow is computed [10]. Binary partition moves are iterated until convergence to a local optimum of the energy; such large moves efficiently reach good local optima of non-convex energy functions. Besides its effectiveness, the algorithm can be applied to various labeling problems, even with unordered labels.

To solve the optimization problem (8), we propose an optimization algorithm extending the α-expansion in two directions. On the one hand, it deals with vectorial labels, i.e., a finite set of labels is assigned to each site. On the other hand, we constrain each site to take a label within an associated subset of labels, which can change from one site to another. The proposed optimization algorithm is called α_knn-expansion. At each iteration, it performs a binary expansion move toward a set of labels $\alpha = \{\alpha_1, \alpha_2, \dots, \alpha_m\} \in \mathcal{N}^m$, only for the subset of sites $S_\alpha \subset \mathcal{P}$ that have α within their set of k-nearest visual word indices, i.e., $S_\alpha = \{p \in \mathcal{P} \text{ such that } \alpha \subset L_p\}$. These sites are called active sites. Thereby, at each iteration of the optimization algorithm, a global binary move of all active sites is performed. Binary moves are iterated over a number of vectorial labels $\{\alpha^1, \alpha^2, \dots, \alpha^n\}$ within a cycle; if the energy decreases, a new cycle is started, until convergence to a local optimum (figure 2). We note that a possible initialization $\mathbf{Y}^{(0)}$ is given by the m nearest bases, as used for LCC.
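Figure 2 below gives the full pseudocode. As an illustration of the control flow only, here is a Python sketch in which the optimal graph-cut expansion move of step 2.a is replaced by a naive greedy stand-in; the names greedy_binary_move and n_moves are ours, not the paper's, and the snippet reuses energy, L, edges and w from the previous sketches.

```python
import numpy as np

def greedy_binary_move(Y, alpha, active, X, B, edges, w, beta):
    """Stand-in for the optimal graph-cut expansion move of step 2.a (figures 2-3):
    each active site keeps its labels or switches to alpha, whichever lowers the energy.
    The paper computes this move exactly with a min-cut/max-flow solver instead.
    This naive version is very slow and is meant only for illustration."""
    Y_new = Y.copy()
    for p in active:
        switch = Y_new.copy()
        switch[p] = alpha
        if energy(switch, X, B, edges, w, beta) < energy(Y_new, X, B, edges, w, beta):
            Y_new = switch
    return Y_new

def alpha_knn_expansion(Y0, X, B, L, edges, w, beta=1.0, n_moves=50, seed=0):
    """Outer loop of the alpha_knn-expansion (figure 2): cycles of binary moves toward
    randomly drawn vectorial labels alpha, restricted to the active sites."""
    rng = np.random.default_rng(seed)
    m = Y0.shape[1]
    Y, best = Y0.copy(), energy(Y0, X, B, edges, w, beta)
    improved = True
    while improved:                                    # one pass of this loop = one cycle
        improved = False
        for _ in range(n_moves):
            p = rng.integers(len(L))                   # draw a candidate vectorial label
            alpha = L[p, rng.choice(L.shape[1], size=m, replace=False)]
            active = [q for q in range(len(L)) if np.isin(alpha, L[q]).all()]
            Y_new = greedy_binary_move(Y, alpha, active, X, B, edges, w, beta)
            e_new = energy(Y_new, X, B, edges, w, beta)
            if e_new < best:
                Y, best, improved = Y_new, e_new, True
    return Y
```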

Figure 2. The α_knn-expansion based optimization algorithm:

  Input: Y^(0), W, B̂, X      Output: Ŷ
  For each cycle c do
    1. Select n vectorial labels {α^i}_{i=1..n} within the sets of k-nearest basis indices of the local features.
    2. For each iteration i ≤ n do
       (a) Perform an optimal binary expansion move toward α^i:
           Y^(i) = argmin_Y E(Y | X, B̂, W), such that y_p^(i) ∈ {y_p^(i-1), α^i}, ∀p ∈ P.
       (b) Ỹ := Y^(i)
    3. If E(Ỹ) < E(Y^(c-1)), then Y^(c) := Ỹ; else return Ỹ.

In order to achieve an optimal binary move (step 2.a in figure 2), a directed graph $G_\alpha = (\mathcal{A}_\alpha, \mathcal{E}_\alpha)$ is built, where $\mathcal{A}_\alpha$ is the set of nodes related to the active sites and $\mathcal{E}_\alpha$ is the set of oriented edges connecting neighboring nodes. Two auxiliary nodes s and t are added for the maximum-flow computation. Based on the efficient graph construction originally proposed in [13] for the α-expansion, figure 3 illustrates the graph topography of the proposed α_knn-expansion move as well as the capacities on its edges. Once the graph is constructed, the maximum flow is computed in polynomial time owing to the sub-modularity of the proposed energy function [13]. Indeed, for a binary expansion move, the sub-modularity constraint on the energy is satisfied for any likelihood term and any metric prior term [4]; this is the case of (6), since we use a metric prior to compute the distances between optimal bases. The efficient polynomial-time maximum-flow algorithm proposed by [13] (1) is well suited to grid-structured graphs, and thus to our problem. Additionally, minimizing (6) with the proposed α_knn-expansion algorithm is fast, since the expansion moves are restricted to the active sites only.

(1) http://pub.ist.ac.at/~vnk/software.html

[Figure 3. Illustrative example of the graph construction on an image with 4x4 sites: (a) graph topography for one optimal move of the α_knn-expansion algorithm for a given label α, where the active sites are connected to the source and sink nodes and the remaining sites are inactive at the current iteration; (b) detailed construction for the two possible neighboring configurations, active-active and active-inactive, with the corresponding edge capacities.]

3.3. Coding and pooling

Once the optimal bases are selected, assigning a response to each selected basis vector can be achieved with various strategies. We consider recent strategies from the literature, namely hard responses [15], salient responses [12] and soft responses, obtained either by solving a linear system [30] or by computing a posteriori probabilities of the local features belonging to the selected optimal bases [20]. For each image, the obtained codes are then aggregated with a max-pooling operation, following recent works [30, 12, 20], resulting in a unique and compact signature vector.
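A minimal sketch of the max-pooling and 3-level spatial pyramid used to turn per-patch codes into an image signature is given below. The grid geometry and equal layer weights follow the pipeline of section 4.1; the function name and the final l2 normalization are assumptions of ours.

```python
import numpy as np

def spm_signature(codes, H, W, levels=(1, 2, 4)):
    """Max-pool per-patch codes over a spatial pyramid.
    codes: (H*W, K) code vectors laid out row by row on the patch grid.
    Returns the concatenated signature of length K * sum(l*l for l in levels)."""
    grid = codes.reshape(H, W, -1)
    parts = []
    for l in levels:                                   # 1x1, 2x2 and 4x4 layers
        rs = np.linspace(0, H, l + 1).astype(int)
        cs = np.linspace(0, W, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                cell = grid[rs[i]:rs[i + 1], cs[j]:cs[j + 1]]
                parts.append(cell.reshape(-1, cell.shape[-1]).max(axis=0))   # max-pooling
    sig = np.concatenate(parts)
    return sig / (np.linalg.norm(sig) + 1e-12)         # l2-normalize before the linear SVM
```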
4. Experimental results

In this section, the proposed approach is tested on scene classification and object recognition tasks and evaluated on three well-known benchmarks intensively used in the literature [15, 30, 12, 20]: UIUC 8-sport [17], 15-natural scenes [15] and the Caltech-101 object categories [9].

4.1. Pipeline

In all the experiments we conduct, the same processing chain is used, following the literature settings to ensure consistency. The pipeline is as follows:

- dense local features (128-dimensional SIFTs) are extracted on a regular spatial grid, at a single scale, from downsized images (the maximum image size differs between 15-natural scenes and UIUC 8-sport); the step size is fixed to 4 pixels, with a fixed patch size;

- a codebook of size 1024 is created using K-means clustering on a randomly selected subset of SIFTs from the training dataset (about 10^5 SIFTs);

[Figure 4. Illustrative example of the local basis selection required for coding: (a) original image; (b) visualization of the extracted dense SIFTs using the method of [19]; (c) a pairwise local feature similarity map, for visualization only, obtained by setting the map elements {(w^h_{p,q})^2 + (w^v_{p,q})^2; p ~ q; (p, q) ∈ P^2} to zero under the threshold T = 0.7; (d) the conventional hard assignment (the color map corresponds to the indices of the selected visual words from the codebook); (e) the 3 nearest bases selected for each local feature as performed with LLC (an RGB color map visualizes the three visual word indices of each local patch); spatially regularized (f) hard assignment and (g) LLC-based assignment using the proposed approach.]

- for coding, we fix the step size for patch extraction to 4 pixels, with a fixed patch size. Even if the theoretical coding formalism of section 3 concerns patches surrounding all pixels, using a small step size does not reduce the consistency of the coding and accelerates its computation. Regarding the computation time required to retrieve the optimal bases, the choice of the two parameters m and k is crucial. In order to accelerate convergence, we set the number of optimal bases retrieved for each local feature to m = 3 among the k = 10 nearest visual words, which results in a reduced number of possible vectorial labels α^i ∈ N^3 to visit within each cycle of the optimization algorithm. We observed empirically that enlarging the set of retrieved optimal bases slightly improves the classification performance while increasing the computational time;

- the max-pooling operation is performed (even with hard assignment coding) and the SPM [15] with 3 levels (1x1, 2x2 and 4x4) is adopted, with the same weight at each layer;

- a one-vs-all linear SVM classifier is used, since it has shown good performance in categorization when paired with the max-pooling operation [32] (a skeleton of the full pipeline is sketched at the end of this subsection).

Our method is integrated into several coding schemes, namely hard assignment coding (HC) [15], locality-constrained linear coding (LLC) [30], salient coding (SC) [12] and localized soft assignment coding (LSC) [20], showing the improvement achieved when the local spatial context within images is included in the coding process. We have to mention that, for some of the state-of-the-art works, the results shown in the original papers are obtained with pipelines different from the one adopted here. For instance, sparse coding is used as an alternative to K-means for learning the codebook in [30, 11], dense SIFTs are extracted at three scales in [30, 12], a mix-order max-pooling operation is applied in [20], and the number of selected local bases varies from 5 to 10, etc. Therefore, in order to achieve fair comparisons with these works and obtain a coherent assessment, it was necessary to conduct all the experiments with the same pipeline and a common implementation. Since the source code of existing methods is not always available, we had to re-implement them and, as often happens, our engineering choices led to minor performance differences compared to the results reported in the original papers. Nevertheless, the performances we obtained remain fully consistent with existing works. Our experimental approach, consisting in using a common implementation and pipeline to ensure consistency, was also suggested by [5].
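A skeleton of this pipeline, reusing the sketches given earlier (build_codebook, build_lattice, pair_weights, alpha_knn_expansion, spm_signature), is shown below. The dense_sift extractor is a hypothetical placeholder, and hard responses on the selected bases are used for brevity, whereas the experiments also use soft and salient responses.

```python
import numpy as np
from sklearn.svm import LinearSVC

def image_signature(image, B, k=10, m=3, beta=1.0, T=0.7):
    """Dense descriptors -> LCSR basis selection -> codes -> max-pooled SPM signature."""
    X_grid = dense_sift(image)                       # (H, W, 128); hypothetical extractor
    H, W, _ = X_grid.shape
    L, Y0, edges = build_lattice(X_grid, B, k=k, m=m)
    X = X_grid.reshape(H * W, -1)
    w = pair_weights(X, edges, T=T)                  # eq. (7)
    Y = alpha_knn_expansion(Y0, X, B, L, edges, w, beta=beta)
    codes = np.zeros((H * W, len(B)))
    for p in range(H * W):                           # hard responses on the selected bases
        codes[p, Y[p]] = 1.0
    return spm_signature(codes, H, W)

def train_classifier(signatures, labels):
    """One-vs-all linear SVM on the pooled signatures, as in the experiments."""
    return LinearSVC(C=1.0).fit(np.vstack(signatures), labels)
```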
4.2. UIUC 8-sport

This dataset contains 8 sport categories for image-based event classification [17]. There are 1579 images, and each class has 137 to 250 images. Following the standard setting on this dataset, we use 10 random splits of the data; for each category, we randomly select 70 training images and 60 test images. Table 1 reports the classification accuracies obtained with the state-of-the-art coding approaches and with the proposed LCSR coding method. As we can see, for all the coding strategies, adding the contextual spatial information improves the classification accuracy significantly.
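For reference, a minimal sketch of this evaluation protocol (10 random splits with 70 training and 60 test images per class, accuracy averaged over splits) is given below; the function name, the sklearn classifier and the default parameters are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def evaluate(signatures, labels, n_train=70, n_test=60, n_splits=10, seed=0):
    """Mean and standard deviation of the accuracy (%) over random per-class splits."""
    signatures, labels = np.asarray(signatures), np.asarray(labels)
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.where(labels == c)[0])
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:n_train + n_test])
        clf = LinearSVC(C=1.0).fit(signatures[train_idx], labels[train_idx])
        accs.append(clf.score(signatures[test_idx], labels[test_idx]))
    return float(np.mean(accs)) * 100, float(np.std(accs)) * 100
```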

On the one hand, LCSR improves the classic hard assignment method (HC) by reducing assignment errors due to codebook distortions; we note that the classification accuracy of such a regularized hard assignment reaches the performance of recent locality-based coding methods. On the other hand, it improves the optimal basis selection step required by the locality-constrained coding strategies (LLC, LSC and SC). We note that for the saliency-based coding (SC), our approach extends the saliency property, originally considered in the feature space only, to the local spatial information. Hence, non-salient patches in the spatial domain of the image are discarded at the max-pooling step, yielding features that are more discriminative in terms of class-relative salient visual words and thus well suited to linear classifiers. We shall also indicate that the LCSR operation is computationally fast and therefore does not noticeably affect the computational time of the coding methods it is plugged into. As an order of magnitude, the time required to select the 3 optimal bases of each SIFT of an image, among its 10 nearest bases, is less than one second on a 2.66 GHz CPU. Compared to other recent works, we outperform the best classification rate of [11] (85.31%), which was obtained with a different and computationally far more demanding pipeline.

Table 1. Classification accuracies (%) on the UIUC 8-sport data set.
  Coding method   original approach   using the LCSR
  HC [15]                 ±               ± 1.68
  LLC [30]                ±               ± 1.52
  LSC [20]                ±               ± 1.56
  SC [12]                 ±               ± 1.14

[Figure 5. Confusion matrix for the UIUC 8-sport data set; per-class accuracies on the diagonal: Rock Climbing 95.8, Badminton 93.5, Bocce 64.8, Croquet 84.5, Polo 84.7, Rowing 90, Sailing 96.5, Snowboarding 88.]

4.3. 15-natural scenes

This dataset [15] contains 4485 images of 15 scene categories, each containing 200 to 400 images. Scenes vary from indoor to outdoor environments. Following the standard setup, we use 10 random splits of the data, considering 100 random images per class for training and the rest for testing. Table 2 provides the classification accuracies of the different approaches. Similarly to the previous dataset, adding the spatial context to local feature coding always improves the performance, by 2% on average.

Table 2. Classification accuracies (%) on the 15-natural scenes data set.
  Coding method   original approach   using the LCSR
  HC [15]                 ±               ± 0.5
  LLC [30]                ±               ± 0.36
  LSC [20]                ±               ± 0.51
  SC [12]                 ±               ± 0.5

4.4. Caltech-101

This dataset [9] has 101 object categories containing from 31 to 800 images each. We use 10 random splits of the data, considering 30 random images per class for training and the rest for testing, and provide the average classification rates in table 3. In contrast to the two previous datasets, which contain scene images, the current task deals with object recognition, for which the local spatial context is less relevant to image understanding. Nevertheless, it still leads to some improvement in classification accuracy. Indeed, the local spatial information may improve the coding step by reducing assignment errors that are mainly due to artifacts characterizing the images of this dataset (often resulting from transformations applied synthetically to the images to evaluate the robustness of classification to them).

Table 3. Classification accuracies (%) on the Caltech-101 data set.
  Coding method   original approach   using the LCSR
  HC [15]                 ±               ± 0.52
  LLC [30]                ±               ± 1.23
  LSC [20]                ±               ± 0.81
  SC [12]                 ±               ± 0.75
5. Conclusion

In this paper, we presented a promising local feature encoding method that exploits locality properties in both the feature space and the spatial domain of the image. Results show that our contribution improves state-of-the-art coding schemes, increasing the classification rates by 1% to 4% on UIUC 8-sport, 15-natural scenes and Caltech-101, using a standard experimental pipeline. Ongoing efforts are devoted to analyzing the proposed method in multi-label and multi-instance recognition tasks, since incorporating the local spatial information into features should be beneficial for recognizing multiple objects in a given image.

Acknowledgment: This work has been partially funded by I2S in the context of the Polinum project. We acknowledge support from the French ANR (Agence Nationale de la Recherche) through the YOJI (ANR-09-CORD-104) and PERIPLUS (ANR-10-CORD-026) projects.

References

[1] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In ECCV.
[2] Y. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In CVPR.
[3] Y. Boureau, J. Ponce, and Y. LeCun. A theoretical analysis of feature pooling in vision algorithms. In ICML.
[4] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI.
[5] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In BMVC.
[6] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In ICML.
[7] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR.
[9] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. CVIU.
[10] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Classic Papers in Combinatorics.
[11] S. Gao, I. Tsang, L. Chia, and P. Zhao. Local features are not lonely - Laplacian sparse coding for image classification. In CVPR.
[12] Y. Huang, K. Huang, Y. Yu, and T. Tan. Salient coding for image classification. In CVPR.
[13] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? PAMI.
[14] S. Lazebnik and M. Raginsky. Supervised learning of quantizer codebooks by information loss minimization. PAMI.
[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In CVPR.
[16] H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In NIPS.
[17] L. J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. In ICCV.
[18] S. Z. Li. Markov Random Field Modeling in Image Analysis. Springer-Verlag New York Inc.
[19] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: dense correspondence across scenes and its applications. PAMI.
[20] L. Liu, L. Wang, and X. Liu. In defense of soft-assignment coding. In ICCV.
[21] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV.
[22] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Discriminative learned dictionaries for local image analysis. In CVPR.
[23] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research.
[24] R. Rigamonti, M. A. Brown, and V. Lepetit. Are sparse representations really relevant for image classification? In CVPR.
[25] R. E. Schapire. The strength of weak learnability. Machine Learning.
[26] J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In ICCV.
[27] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. PAMI.
[28] J. van Gemert, C. Veenman, A. Smeulders, and J. Geusebroek. Visual word ambiguity. PAMI.
[29] O. Veksler. Efficient Graph-Based Energy Minimization. PhD thesis, Cornell University.
[30] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR.
[31] J. Wu and J. M. Rehg. Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel. In CVPR.
[32] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR.
[33] K. Yu, T. Zhang, and Y. Gong. Nonlinear learning using local coordinate coding. In NIPS.


Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Ask the locals: multi-way local pooling for image recognition

Ask the locals: multi-way local pooling for image recognition Ask the locals: multi-way local pooling for image recognition Y-Lan Boureau, Nicolas Le Roux, Francis Bach, Jean Ponce, Yann Lecun To cite this version: Y-Lan Boureau, Nicolas Le Roux, Francis Bach, Jean

More information

MULTI-REGION SEGMENTATION

MULTI-REGION SEGMENTATION MULTI-REGION SEGMENTATION USING GRAPH-CUTS Johannes Ulén Abstract This project deals with multi-region segmenation using graph-cuts and is mainly based on a paper by Delong and Boykov [1]. The difference

More information

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015

on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 on learned visual embedding patrick pérez Allegro Workshop Inria Rhônes-Alpes 22 July 2015 Vector visual representation Fixed-size image representation High-dim (100 100,000) Generic, unsupervised: BoW,

More information

Local Image Features

Local Image Features Local Image Features Ali Borji UWM Many slides from James Hayes, Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial Overview of Keypoint Matching 1. Find a set of distinctive key- points A 1 A 2 A 3 B 3

More information

Multiple Kernel Learning for Emotion Recognition in the Wild

Multiple Kernel Learning for Emotion Recognition in the Wild Multiple Kernel Learning for Emotion Recognition in the Wild Karan Sikka, Karmen Dykstra, Suchitra Sathyanarayana, Gwen Littlewort and Marian S. Bartlett Machine Perception Laboratory UCSD EmotiW Challenge,

More information

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition

Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tensor Decomposition of Dense SIFT Descriptors in Object Recognition Tan Vo 1 and Dat Tran 1 and Wanli Ma 1 1- Faculty of Education, Science, Technology and Mathematics University of Canberra, Australia

More information

CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION

CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 4, 2015 ISSN 2286-3540 CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION Ionuţ Mironică 1, Bogdan Ionescu 2, Radu Dogaru 3 In this paper we propose

More information

A Novel Extreme Point Selection Algorithm in SIFT

A Novel Extreme Point Selection Algorithm in SIFT A Novel Extreme Point Selection Algorithm in SIFT Ding Zuchun School of Electronic and Communication, South China University of Technolog Guangzhou, China zucding@gmail.com Abstract. This paper proposes

More information

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space.

Visual words. Map high-dimensional descriptors to tokens/words by quantizing the feature space. Visual words Map high-dimensional descriptors to tokens/words by quantizing the feature space. Quantize via clustering; cluster centers are the visual words Word #2 Descriptor feature space Assign word

More information

Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention

Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention Enhanced and Efficient Image Retrieval via Saliency Feature and Visual Attention Anand K. Hase, Baisa L. Gunjal Abstract In the real world applications such as landmark search, copy protection, fake image

More information

SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs. Yu Li, Dongbo Min, Michael S. Brown, Minh N. Do, Jiangbo Lu

SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs. Yu Li, Dongbo Min, Michael S. Brown, Minh N. Do, Jiangbo Lu SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs Yu Li, Dongbo Min, Michael S. Brown, Minh N. Do, Jiangbo Lu Discrete Pixel-Labeling Optimization on MRF 2/37 Many computer vision tasks

More information

Efficient Kernels for Identifying Unbounded-Order Spatial Features

Efficient Kernels for Identifying Unbounded-Order Spatial Features Efficient Kernels for Identifying Unbounded-Order Spatial Features Yimeng Zhang Carnegie Mellon University yimengz@andrew.cmu.edu Tsuhan Chen Cornell University tsuhan@ece.cornell.edu Abstract Higher order

More information

arxiv: v1 [cs.cv] 2 Sep 2013

arxiv: v1 [cs.cv] 2 Sep 2013 A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification Xiaojiang Peng 1,2, Qiang Peng 1, Yu Qiao 2, Junzhou Chen 1, and Mehtab Afzal 1 1 Southwest Jiaotong University,

More information

Markov Random Fields and Segmentation with Graph Cuts

Markov Random Fields and Segmentation with Graph Cuts Markov Random Fields and Segmentation with Graph Cuts Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project Proposal due Oct 27 (Thursday) HW 4 is out

More information

NTHU Rain Removal Project

NTHU Rain Removal Project People NTHU Rain Removal Project Networked Video Lab, National Tsing Hua University, Hsinchu, Taiwan Li-Wei Kang, Institute of Information Science, Academia Sinica, Taipei, Taiwan Chia-Wen Lin *, Department

More information

OBJECT CATEGORIZATION

OBJECT CATEGORIZATION OBJECT CATEGORIZATION Ing. Lorenzo Seidenari e-mail: seidenari@dsi.unifi.it Slides: Ing. Lamberto Ballan November 18th, 2009 What is an Object? Merriam-Webster Definition: Something material that may be

More information

UNSUPERVISED CO-SEGMENTATION BASED ON A NEW GLOBAL GMM CONSTRAINT IN MRF. Hongkai Yu, Min Xian, and Xiaojun Qi

UNSUPERVISED CO-SEGMENTATION BASED ON A NEW GLOBAL GMM CONSTRAINT IN MRF. Hongkai Yu, Min Xian, and Xiaojun Qi UNSUPERVISED CO-SEGMENTATION BASED ON A NEW GLOBAL GMM CONSTRAINT IN MRF Hongkai Yu, Min Xian, and Xiaojun Qi Department of Computer Science, Utah State University, Logan, UT 84322-4205 hongkai.yu@aggiemail.usu.edu,

More information

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding 2014 22nd International Conference on Pattern Recognition Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding Wai Lam Hoo, Tae-Kyun Kim, Yuru Pei and Chee Seng Chan Center of

More information

A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-dimensional Visual Data

A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-dimensional Visual Data 2013 IEEE International Conference on Computer Vision A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-dimensional Visual Data Lingqiao Liu CECS, Australian

More information

Recent Progress on Object Classification and Detection

Recent Progress on Object Classification and Detection Recent Progress on Object Classification and Detection Tieniu Tan, Yongzhen Huang, and Junge Zhang Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition

More information

Visual Object Recognition

Visual Object Recognition Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 & Kristen Grauman Department

More information

Learning based face hallucination techniques: A survey

Learning based face hallucination techniques: A survey Vol. 3 (2014-15) pp. 37-45. : A survey Premitha Premnath K Department of Computer Science & Engineering Vidya Academy of Science & Technology Thrissur - 680501, Kerala, India (email: premithakpnath@gmail.com)

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Multiple VLAD encoding of CNNs for image classification

Multiple VLAD encoding of CNNs for image classification Multiple VLAD encoding of CNNs for image classification Qing Li, Qiang Peng, Chuan Yan 1 arxiv:1707.00058v1 [cs.cv] 30 Jun 2017 Abstract Despite the effectiveness of convolutional neural networks (CNNs)

More information

Modeling Visual Cortex V4 in Naturalistic Conditions with Invari. Representations

Modeling Visual Cortex V4 in Naturalistic Conditions with Invari. Representations Modeling Visual Cortex V4 in Naturalistic Conditions with Invariant and Sparse Image Representations Bin Yu Departments of Statistics and EECS University of California at Berkeley Rutgers University, May

More information

Ordinal Random Forests for Object Detection

Ordinal Random Forests for Object Detection Ordinal Random Forests for Object Detection Samuel Schulter, Peter M. Roth, Horst Bischof Institute for Computer Graphics and Vision Graz University of Technology, Austria {schulter,pmroth,bischof}@icg.tugraz.at

More information

Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness

Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness Xiaodong Yang and YingLi Tian Department of Electrical Engineering City College, City University of New York Abstract.

More information