Image classification based on improved VLAD


Xianzhong Long, Hongtao Lu, Yong Peng, Xianzhong Wang, Shaokun Feng

Received: 25 August 2014 / Revised: 22 December 2014 / Accepted: 18 February 2015
Springer Science+Business Media New York 2015

Abstract Recently, a coding scheme called vector of locally aggregated descriptors (VLAD) has achieved great success in large-scale image retrieval owing to its efficient, compact representation. VLAD uses only the nearest-neighbor visual word in the dictionary to aggregate each descriptor. It offers fast retrieval and high retrieval accuracy with a small dictionary. In this paper, we give three improved VLAD variations for image classification: first, similar to the bag of words (BoW) model, we count the number of descriptors belonging to each cluster center and add it to VLAD; second, in order to expand the impact of residuals, squared residuals are taken into account; third, instead of one nearest-neighbor visual word, we look for the two nearest-neighbor visual words when aggregating each descriptor. Experimental results on the UIUC Sports Event, Corel 10 and 15 Scenes datasets show that the proposed methods outperform some state-of-the-art coding schemes in terms of classification accuracy and computation speed.

X. Long, School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, China. e-mail: lxz@njupt.edu.cn
H. Lu, Y. Peng, X. Wang, S. Feng, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
H. Lu, e-mail: htlu@sjtu.edu.cn
Y. Peng, e-mail: pengyong851012@sjtu.edu.cn
X. Wang, e-mail: wxz2453@sjtu.edu.cn
S. Feng, e-mail: superkkking@sjtu.edu.cn

Keywords Image classification, Scale-invariant feature transform, Vector of locally aggregated descriptors, K-means clustering algorithm

1 Introduction

As one of the most important and challenging tasks in computer vision and pattern recognition, image classification has recently attracted much attention. Several benchmark datasets are used to evaluate the performance of image classification algorithms, for example, UIUC Sports Event [23], Corel 10 [26], 15 Scenes [21], Caltech 101 [10] and Caltech 256 [14]. Many image classification models have recently been proposed, such as generative models [2, 22, 33], discriminative models [9, 18, 27, 39] and hybrid generative/discriminative models [3]. A generative model classifies images from a probabilistic viewpoint; it depends only on the data themselves and does not require training or learning parameters. In contrast, a discriminative model approaches the classification problem from a non-probabilistic perspective and needs to train or learn the parameters that appear in the classifier. Here, we only consider image classification based on discriminative models.

Among the discriminative models, the earliest bag of words (BoW) technique [35] won great popularity and found a wide range of applications in image retrieval [31], video event detection [37] and image classification [6, 13]. However, the BoW representation does not possess enough descriptive capability, because it is only a histogram of the number of image descriptors assigned to each visual word and it ignores the spatial information of the image. To solve this problem, the Spatial Pyramid Matching (SPM) model was put forward in [21]; it takes the spatial information of the image into account. In fact, SPM is an extension of the BoW model and has been shown to achieve better image classification accuracy than the latter [15, 36, 38].
Image classification based on the SPM model consists of five steps: local descriptor extraction, dictionary learning, feature coding, spatial pooling and classifier selection. Commonly used local descriptors include the Scale-Invariant Feature Transform (SIFT) [25], Histogram of Oriented Gradients (HoG) [7], Affine Scale-Invariant Feature Transform (ASIFT) [28], Oriented FAST and Rotated BRIEF (ORB) [34], etc. After the descriptors of all images have been extracted, vector quantization [21] or sparse coding [38] is used to train a dictionary. In the feature coding phase, the descriptor matrix of each image is mapped to a coefficient matrix by the chosen coding strategy.

The principle of spatial pooling deserves a clear illustration because it dominates the whole SPM-based image classification framework. During spatial pooling, an image is divided into increasingly finer subregions over $L$ layers, with $2^l \times 2^l$ subregions at layer $l$, $l = 0, 1, \ldots, L-1$. A typical partition uses three layers, i.e., $L = 3$. At layer 0, the image is taken as a whole; at layer 1, the image is divided into four regions; and at layer 2, each subregion of layer 1 is further divided into four, resulting in 16 smaller subregions. This process generates a spatial pyramid of three layers with a total of $1 + 4 + 16 = 21$ subregions. The spatial pyramid is then combined with the feature coding process, and different pooling functions are used, i.e., sum pooling [21] and max pooling [36, 38]. Finally, the feature vectors of the 21 subregions are concatenated into one long feature vector for the whole image. This is the spatial pyramid representation of the image, and its dimensionality is $21P$ ($P$ is the dictionary size). It is noteworthy that when only layer $l = 0$ is used, SPM reduces to the original BoW model. In the last step, a classifier such as a Support Vector Machine (SVM) [5] or Adaptive Boosting (AdaBoost) [11] is applied to classify the images.

Over the past several years, a number of dictionary learning methods and feature coding strategies have been brought forward for image classification. In [6], the K-means clustering algorithm, a vector quantization (VQ) technique, was used to generate the dictionary; during the feature coding phase, each local descriptor was given a binary value that specified the cluster center to which it belonged. This process is called BoW, and it produces a histogram representation over visual words. However, this approach is likely to result in a large reconstruction error because it limits the ability to represent descriptors. To address this problem, SPM based on sparse coding (ScSPM) was proposed in [38]; it replaces K-means clustering with an $L_1$ norm-based sparse coding scheme and generates the dictionary by learning from randomly sampled SIFT feature vectors. During the feature coding period, ScSPM uses sparse coding to code each local descriptor. However, ScSPM becomes very slow when the dictionary size grows large. In order to accelerate the computation while maintaining high classification accuracy, locality-constrained linear coding (LLC) was put forward in [36], which gives an analytical solution for feature coding. Furthermore, several improved image classification schemes based on SPM have also been suggested recently, such as spatial pyramid matching using Laplacian sparse coding [12], discriminative spatial pyramid [15], discriminative affine sparse codes [20], nearest neighbor basis vectors spatial pyramid matching (NNBVSPM) [24], etc. Finding efficient feature coding strategies is becoming an urgent research direction.
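As an illustration of the spatial pooling step described earlier in this section, the following is a minimal NumPy sketch of max pooling over a three-layer pyramid. It is not taken from the paper: the function name `spm_max_pool`, the row-wise layout of the codes, and the use of descriptor centre coordinates to decide the subregion are assumptions made for the example.

```python
import numpy as np

def spm_max_pool(codes, xy, width, height, levels=3):
    """Max-pool P-dimensional codes over a spatial pyramid (illustrative sketch).

    codes : (M, P) array, one code per descriptor (rows here, columns in the text).
    xy    : (M, 2) array of descriptor centre coordinates (x, y) in pixels (assumed input).
    Returns a vector of length P * (1 + 4 + ... + 4**(levels-1)), i.e. 21P for levels=3.
    """
    M, P = codes.shape
    pooled = []
    for l in range(levels):
        cells = 2 ** l  # layer l has cells x cells subregions
        # subregion index of every descriptor along x and y, clipped to the last cell
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for j in range(cells):
            for i in range(cells):
                mask = (cx == i) & (cy == j)
                # an empty subregion contributes a zero vector
                pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(P))
    return np.concatenate(pooled)
```

With `levels=1` only the whole-image region is pooled, which corresponds to the case in which the pyramid collapses to a single layer.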
In the field of pattern recognition, the Fisher vector (FV) technique has been used for image classification [4, 19, 29, 30]. FV is a strong framework that combines the advantages of generative and discriminative approaches. The key point of FV is to represent a signal by a gradient vector derived from a generative probability model and then to feed this representation to a discriminative classifier. Therefore, FV can be seen as a hybrid generative/discriminative model. The vector of locally aggregated descriptors (VLAD) can be viewed as a non-probabilistic version of the FV in which the gradient is taken only with respect to the mean and Gaussian mixture model (GMM) clustering is replaced by K-means. VLAD has been successfully applied to image retrieval [1, 8, 16, 17]. Taking higher-order statistics into account, researchers have proposed two further coding methods, i.e., vectors of locally aggregated tensors (VLAT) [32] and super-vector (SV) [41]. The dimensionality of VLAT is $P(D + D^2)$, where $D$ is the dimension of each descriptor; such a high-dimensional representation can result in very long computation times. Besides, SV is based on a probabilistic viewpoint and is still a generative model. Therefore, we do not consider the VLAT and SV feature coding algorithms. In this paper, we concentrate only on image classification methods based on discriminative models; BoW, ScSPM, LLC and VLAD are selected for comparison with our improved VLAD methods.

In order to obtain stronger coding ability and improve the classification rate or speed, three improved VLAD versions for image classification are given in this paper. First, similar to the bag of words (BoW) model, we count the number of descriptors belonging to each cluster center and add it to VLAD. In this way, our improved VLAD method possesses the characteristics of BoW. Second, in order to expand the impact of residuals, squared residuals are added to the original VLAD.
This doubles the dimension of the representation. Third, some descriptors have nearly the same distance to more than one visual word, so assigning them only to the nearest visual word, as the original VLAD does, is not appropriate. Instead of one nearest-neighbor visual word, we look for the two nearest-neighbor visual words when aggregating each descriptor.

The remainder of the paper is organized as follows: Section 2 introduces the basic idea of existing schemes. Our improved VLAD methods are presented in Section 3. In Section 4, the comparison results of image classification on three widely used datasets are reported. Finally, conclusions are drawn and some future research issues are discussed in Section 5.

2 Related work

Let $V$ be a set of $D$-dimensional local descriptors extracted from an image, i.e., $V = [v_1, v_2, \ldots, v_M] \in \mathbb{R}^{D \times M}$. Given a dictionary with $P$ entries, $W = [w_1, w_2, \ldots, w_P] \in \mathbb{R}^{D \times P}$, different feature coding schemes convert each descriptor into a $P$-dimensional code to generate the final image representation coefficient matrix $H = [h_1, h_2, \ldots, h_M] \in \mathbb{R}^{P \times M}$. Each column of $V$ is a local descriptor and corresponds to one coefficient vector, i.e., one column of $H$.

2.1 Bag of words (BoW)

The BoW representation groups local descriptors. It first generates a dictionary $W$ with $P$ visual words, usually obtained by the K-means clustering algorithm. Each $D$-dimensional local descriptor from an image is then assigned to the closest center. The BoW representation is obtained as the histogram of the assignments of all image descriptors to visual words. It is therefore a $P$-dimensional vector whose elements sum to the number of descriptors in the image. However, the BoW model does not consider the spatial structure of the image and has a large reconstruction error, so its image classification ability is restricted [6].
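The BoW coding of Section 2.1 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `nearest_word` and `bow_histogram` are hypothetical, and descriptors and visual words are stored as columns to match the notation $V \in \mathbb{R}^{D \times M}$, $W \in \mathbb{R}^{D \times P}$.

```python
import numpy as np

def nearest_word(V, W):
    """Index of the nearest visual word for every descriptor.

    V : (D, M) descriptors as columns; W : (D, P) dictionary as columns.
    """
    # squared Euclidean distances between all descriptors and words, shape (M, P)
    d2 = ((V[:, :, None] - W[:, None, :]) ** 2).sum(axis=0)
    return d2.argmin(axis=1)

def bow_histogram(V, W):
    """P-dimensional histogram of hard assignments; entries sum to M."""
    P = W.shape[1]
    return np.bincount(nearest_word(V, W), minlength=P)
```

The distance computation broadcasts a (D, M, P) array, which is fine for a toy example but would be replaced by a chunked or matrix-product formulation for large descriptor sets.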
2.2 Sparse coding spatial pyramid matching (ScSPM)

In ScSPM [38], by using sparse coding in place of vector quantization, followed by multi-layer spatial max pooling, the authors developed an extension of the traditional SPM method [21] and presented a linear SPM kernel based on sparse codes of SIFT descriptors. For image classification, ScSPM solves the following optimization problem:

$$\min_{W,H} \sum_{i=1}^{M} \|v_i - W h_i\|_2^2 + \lambda \|h_i\|_1 \qquad (1)$$

where $\|\cdot\|_2$ denotes the $L_2$ norm of a vector, i.e., the square root of the sum of the squared vector entries, and $\|\cdot\|_1$ is the $L_1$ norm of a vector, i.e., the sum of the absolute values of the vector entries. The parameter $\lambda$ controls the sparsity of the solution of formula (1): the larger $\lambda$ is, the sparser the solution will be. Experimental results in [38] demonstrated that linear SPM based on sparse coding of SIFT descriptors significantly outperformed the linear SPM kernel on histograms and was even better than the nonlinear SPM kernels. Nevertheless, using sparse coding both to learn the dictionary and to encode features is time consuming, especially for large-scale image datasets or large dictionaries.

2.3 Locality-constrained linear coding (LLC)

In LLC [36], inspired by the viewpoint of [40], which illustrated that locality is more important than sparsity, the authors generalized sparse coding to locality-constrained linear coding and suggested a locality constraint in place of the sparsity constraint in formula (1). LLC solves the following optimization problem:

$$\min_{H} \sum_{i=1}^{M} \|v_i - W h_i\|_2^2 + \lambda \|d_i \odot h_i\|_2^2 \quad \text{s.t. } \mathbf{1}^T h_i = 1, \ \forall i \qquad (2)$$

where $\mathbf{1} = (1, 1, \ldots, 1)^T$, $\odot$ denotes element-wise multiplication, and $d_i \in \mathbb{R}^P$ is a weight vector. In addition, each coefficient vector $h_i$ is normalized so that $\mathbf{1}^T h_i = 1$. Experimental results in [36] showed that LLC outperformed ScSPM on some benchmark datasets owing to its excellent properties, i.e., better reconstruction, local smooth sparsity and an analytical solution.

2.4 Vector of locally aggregated descriptors (VLAD)

The VLAD representation was proposed in [16] for image retrieval. $V = [v_1, v_2, \ldots, v_M] \in \mathbb{R}^{D \times M}$ represents the descriptor set extracted from an image. As in BoW, a dictionary $W = [w_1, w_2, \ldots, w_P] \in \mathbb{R}^{D \times P}$ is first learned using K-means. Then, for each local descriptor $v_i$, we look for its nearest-neighbor visual word $\mathrm{NN}(v_i)$ in the dictionary. Finally, for each visual word $w_j$, the differences $v_i - w_j$ of the vectors $v_i$ assigned to $w_j$ are accumulated. The final VLAD vector representation is $C = [c_1^T, c_2^T, \ldots, c_P^T]^T \in \mathbb{R}^{PD}$ ($c_j \in \mathbb{R}^D$, $j = 1, 2, \ldots, P$), which is obtained according to the following formula:

$$c_j = \sum_{v_i :\, \mathrm{NN}(v_i) = w_j} (v_i - w_j) \qquad (3)$$

The VLAD representation is the concatenation of the $D$-dimensional vectors $c_j$ and therefore has dimension $PD$, where $P$ is the dictionary size. Algorithm 1 gives the VLAD coding process.
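Formula (3) translates almost directly into code. The sketch below is a minimal NumPy version written for this text, not the authors' Algorithm 1; the function name `vlad` is hypothetical, and it returns the unnormalized vector $C$ with descriptors and words stored as columns.

```python
import numpy as np

def vlad(V, W):
    """Unnormalized VLAD vector of length P*D, following formula (3).

    V : (D, M) descriptors as columns; W : (D, P) dictionary as columns.
    """
    D, P = W.shape
    # squared distances (M, P) and hard assignment NN(v_i)
    d2 = ((V[:, :, None] - W[:, None, :]) ** 2).sum(axis=0)
    nn = d2.argmin(axis=1)
    C = np.zeros((P, D))
    for j in range(P):
        assigned = V[:, nn == j]                    # descriptors whose nearest word is w_j
        C[j] = (assigned - W[:, [j]]).sum(axis=1)   # accumulate residuals v_i - w_j
    return C.reshape(-1)                            # concatenate c_1, ..., c_P
```

A visual word with no assigned descriptors simply contributes a zero block, so the output length is always $PD$.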
Like the Fisher vector, the VLAD vector can then be power-normalized and $L_2$-normalized in sequence, where the power-normalization parameter $\alpha$ is empirically set to 0.5. It is worth noting that there is no SPM or pooling process in the VLAD coding algorithm. Existing experiments have shown that VLAD is an efficient feature coding method when the dictionary is small.

3 Improved VLAD

In this section, three improved VLAD methods are presented. They are named VLAD based on BoW, Magnified VLAD and Two Nearest Neighbor VLAD, respectively. As with VLAD, the improved VLAD representations can also be power-normalized and $L_2$-normalized, where the parameter $\alpha$ is again set empirically.

3.1 VLAD based on BoW

Inspired by BoW, we count the number of descriptors belonging to each cluster $w_j$ ($j = 1, \ldots, P$) and add it to VLAD. This improved VLAD method is called VLAD based on BoW (abbreviated as VLAD+BoW). The dimensionality of the VLAD+BoW representation is therefore $P(D + 1)$, where the extra dimension per visual word stores the BoW representation. By integrating the histogram information of visual words into VLAD, we expect VLAD+BoW to possess the characteristics of BoW and to improve the classification performance. VLAD+BoW is presented in Algorithm 2.

3.2 Magnified VLAD

In order to magnify the impact of residuals, squared residuals are taken into account. This improved version is called Magnified VLAD (abbreviated as MVLAD) and its dimension is $2PD$. The computation of MVLAD is given in Algorithm 3.
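The two variants above, together with the power and $L_2$ normalization, can be sketched as follows. Algorithms 2 and 3 are not reproduced in this text, so several details here are assumptions rather than the authors' implementation: the count is appended to each word's residual block (giving $P(D+1)$ entries), the squared residuals are taken element-wise and summed per word (giving $2PD$ entries), and power normalization is the signed form $\mathrm{sign}(x)|x|^{\alpha}$ commonly used with Fisher vectors. All function names are hypothetical.

```python
import numpy as np

def assign(V, W):
    """Hard assignment of each column of V (D, M) to its nearest column of W (D, P)."""
    d2 = ((V[:, :, None] - W[:, None, :]) ** 2).sum(axis=0)
    return d2.argmin(axis=1)

def power_l2_normalize(x, alpha=0.5):
    """Signed power normalization followed by L2 normalization (assumed form)."""
    x = np.sign(x) * np.abs(x) ** alpha
    n = np.linalg.norm(x)
    return x / n if n > 0 else x

def vlad_bow(V, W):
    """VLAD+BoW sketch: residual sum plus descriptor count per word, length P*(D+1)."""
    D, P = W.shape
    nn = assign(V, W)
    C = np.zeros((P, D + 1))
    for j in range(P):
        members = V[:, nn == j]
        C[j, :D] = (members - W[:, [j]]).sum(axis=1)  # VLAD part
        C[j, D] = members.shape[1]                    # BoW count (assumed layout)
    return C.reshape(-1)

def mvlad(V, W):
    """MVLAD sketch: residual sum and summed squared residuals per word, length 2*P*D."""
    D, P = W.shape
    nn = assign(V, W)
    C = np.zeros((P, 2 * D))
    for j in range(P):
        r = V[:, nn == j] - W[:, [j]]
        C[j, :D] = r.sum(axis=1)          # first-order residuals
        C[j, D:] = (r ** 2).sum(axis=1)   # element-wise squared residuals (assumption)
    return C.reshape(-1)
```

Either vector would then be passed through `power_l2_normalize` before being given to the classifier.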

3.3 Two nearest neighbor VLAD

In addition to the nearest-neighbor center, we seek a second nearest-neighbor center for each descriptor. This process is referred to as two nearest neighbor VLAD (abbreviated as TNNVLAD). The dimension of the TNNVLAD representation is still $PD$. TNNVLAD is a kind of soft coding method and can reduce the representation error. The specific details are shown in Algorithm 4. If $d_1 > \beta d_2$, the differences between $v_i$ and its two nearest-neighbor centers, each multiplied by 0.5, are accumulated. The value of $\beta$ is determined experimentally.

4 Experimental results

This section begins with a description of our experimental setting, followed by comparisons between our schemes and other prominent methods on three datasets, i.e., UIUC Sports Event, Corel 10 and 15 Scenes. Figure 1 shows example images of these datasets.
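The TNNVLAD rule of Section 3.3 can be sketched as below. Algorithm 4 is not reproduced in this text, so the sketch rests on assumptions: $d_1$ and $d_2$ are taken to be the Euclidean distances from $v_i$ to its nearest and second nearest visual words, and a descriptor that does not satisfy $d_1 > \beta d_2$ is assumed to be handled exactly as in the original VLAD. The function name `tnnvlad` is hypothetical.

```python
import numpy as np

def tnnvlad(V, W, beta=0.8):
    """TNNVLAD sketch, length P*D. Requires P >= 2.

    V : (D, M) descriptors as columns; W : (D, P) dictionary as columns.
    """
    D, P = W.shape
    dist = np.sqrt(((V[:, :, None] - W[:, None, :]) ** 2).sum(axis=0))  # (M, P)
    order = np.argsort(dist, axis=1)
    C = np.zeros((P, D))
    for i in range(V.shape[1]):
        j1, j2 = order[i, 0], order[i, 1]          # nearest and second nearest words
        d1, d2 = dist[i, j1], dist[i, j2]          # assumed meaning of d_1 and d_2
        v = V[:, i]
        if d1 > beta * d2:
            # the two words are almost equally close: split the residual between them
            C[j1] += 0.5 * (v - W[:, j1])
            C[j2] += 0.5 * (v - W[:, j2])
        else:
            # clear winner: plain VLAD accumulation (assumed fallback)
            C[j1] += v - W[:, j1]
    return C.reshape(-1)
```

Under these assumptions, $\beta = 1$ never triggers the soft branch for descriptors with a strict nearest word, so the sketch falls back to plain VLAD, while smaller values of $\beta$ soft-assign more descriptors.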

4.1 Experiments setting

A typical experimental setting for classifying images contains four main steps. First, we adopt the widely used SIFT descriptor [25] because of its good performance in image classification reported in [12, 21, 36, 38]. SIFT features are invariant to image scale and rotation and robust across a substantial range of affine distortion, added noise and changes in illumination. To be consistent with previous work, we use the same setting to extract SIFT descriptors: 128-dimensional SIFT descriptors are densely extracted from image patches on a grid with a step size of 8 pixels under a single patch size. We resize the maximum side (i.e., length or width) of each image to 300 pixels, except for UIUC Sports Event, where we resize the maximum side to 400 pixels because of the high resolution of the original images. Next, about twenty descriptors from each image are chosen at random to form a new matrix, which is taken as the input of the K-means clustering or sparse coding algorithm, and we then learn a dictionary of the specified size. In the third step, we use BoW, sparse coding, LLC, VLAD and the improved VLAD schemes to encode the descriptors and produce the new representation of each image. For the BoW model, the dimensionality of the new representation is the dictionary size $P$. For ScSPM and LLC, we combine a three-layer spatial pyramid matching model (21 subregions) with the max pooling function, so the dimension of the new representation is $21P$. The dimensionality of VLAD and of the improved VLAD methods can be found in Algorithms 1-4. In the final step, we apply a linear SVM classifier to the new representations, randomly selecting some columns per class for training and some other columns per class for testing. The classification accuracy for each category is then obtained by comparing the predicted labels of the test set with its ground-truth labels. Finally, we sum the classification accuracies of all categories and divide by the number of categories to obtain the overall classification accuracy. All results are obtained by repeating five independent experiments, and the average classification accuracy and the standard deviation over the five experiments are reported. All experiments are conducted in MATLAB on a server with an Intel X5650 CPU (2.66 GHz, 12 cores) and 32 GB RAM.

Fig. 1 Image examples of the datasets UIUC Sports Event (the left four), Corel 10 (the middle four), and 15 Scenes (the right four)

For the TNNVLAD algorithm, Fig. 2 illustrates the selection of the parameter $\beta$ on the three datasets. Specifically, Fig. 2 shows the classification accuracy of our TNNVLAD method as $\beta$ varies over the interval [0.1, 1] with the dictionary size fixed at 130. The results in Fig. 2 indicate that $\beta = 0.8$ is the best choice for TNNVLAD. Therefore, we fix $\beta = 0.8$ in the TNNVLAD algorithm in all our experiments.

Fig. 2 Classification accuracy of our TNNVLAD algorithm under different $\beta$ on the UIUC Sports Event, Corel 10 and 15 Scenes datasets

4.2 UIUC sports event dataset

UIUC Sports Event [23] contains 8 categories and 1579 images in total, with the number of images within each category ranging from 137 to 250. The 8 categories are badminton, bocce, croquet, polo, rock climbing, rowing, sailing and snow boarding. In order to compare with other methods, we randomly select 70 images per class as training data and 60 images per class as test data. Figure 3 compares the classification accuracy of our three improved VLAD schemes with that of the other four methods under different dictionary sizes, where the dictionary size ranges from 10 to 420 in steps of 10. From the results presented in Fig. 3, we notice that the classification accuracy of our methods surpasses that of all the other algorithms when the dictionary size is small and is comparable to that of the existing schemes when the dictionary size becomes large. This may be explained by the fact that the goal of VLAD is to aggregate local image descriptors into compact codes, so VLAD performs well with a small dictionary. Besides, Fig. 3 shows that BoW has the lowest performance, ScSPM is better than BoW, and LLC is better still than ScSPM; these observations are consistent with reports in the existing literature.

Fig. 3 Classification accuracy comparisons of various coding methods under different dictionary size on the UIUC Sports Event dataset

Based on Fig. 3, we list the best classification accuracy of the various approaches in Table 1, where the average classification accuracy, the standard deviation and the corresponding dictionary size are given.

Table 1 The best classification accuracy comparisons on the UIUC Sports Event dataset (mean ± std-dev, %)

Algorithm      Classification accuracy   Dictionary size
BoW [6]        ± 0.85                    390
ScSPM [38]     ± 2.20                    400
LLC [36]       ± 1.36                    330
VLAD [17]      ± 2.67                    220
VLAD+BoW       ± 0.87                    210
MVLAD          ± 1.85                    220
TNNVLAD        ± 1.26                    220

From Table 1, we can conclude that the best classification accuracies of our three improved methods are higher than those of the other four schemes on the UIUC Sports Event dataset. Our VLAD+BoW and TNNVLAD methods achieve more than 1 % higher accuracy than LLC, which is the state of the art and is based on the SPM model. Furthermore, the original VLAD and the improved VLAD methods reach their best classification accuracy with a small dictionary, whereas BoW, ScSPM and LLC need a large dictionary to reach their highest classification accuracy.

Moreover, the confusion matrices of our algorithms for the UIUC Sports Event dataset are shown in Fig. 4. To obtain the confusion matrices, the dictionary size is set to 130 in our three improved VLAD methods. In the confusion matrices, the element in the $i$th row and $j$th column ($i \neq j$) is the percentage of images from class $i$ that are misidentified as class $j$. The average classification accuracies of five independent experiments for the individual classes are listed along the main diagonal. Figure 4 thus shows the classification and misclassification status of each individual class. Our algorithms perform well for the classes badminton and rock climbing. We also notice that a high percentage of the classes bocce and croquet is classified wrongly, which may be because they are visually similar to each other: the balls in bocce and croquet have a very similar appearance.

Fig. 4 Confusion matrices (%) of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the UIUC Sports Event dataset

To further demonstrate the advantage of our methods in running speed, Fig. 5 reports the computation time of the various approaches with different dictionary sizes on the UIUC Sports Event dataset. The computation time of each method is the total time of five independent experiments, measured in seconds. From Fig. 5, we can see that the BoW method is the fastest owing to its low-dimensional representation. Meanwhile, we also observe that the ScSPM algorithm is the slowest.
This is because sparse coding is used both to learn the dictionary and to encode features in ScSPM, and solving the $L_1$-norm minimization problem is very time-consuming. The computation time of VLAD and of our three improved VLAD methods is smaller than that of LLC. These experimental results show that our algorithms have a certain advantage in computation time.

Fig. 5 Computation time (seconds) comparisons of various coding methods under different dictionary size on the UIUC Sports Event dataset

4.3 Corel 10 dataset

Corel 10 [26] contains 10 categories with 100 images per category. The categories are beach, buildings, elephants, flowers, food, horses, mountains, owls, skiing and tigers. Following the setting of [12, 26], we randomly select 50 images from each class as training data and use the remaining 50 images per class as test data. The classification accuracy of the various coding methods under different dictionary sizes on the Corel 10 dataset is compared in Fig. 6. We again see that our improved VLAD algorithms obtain good performance when the dictionary size is small.

Fig. 6 Classification accuracy comparisons of various coding methods under different dictionary size on the Corel 10 dataset

Based on Fig. 6, the best classification accuracies of the different algorithms are reported in Table 2.

Table 2 The best classification accuracy comparisons on the Corel 10 dataset (mean ± std-dev, %)

Algorithm      Classification accuracy   Dictionary size
BoW [6]        ± 0.91                    340
ScSPM [38]     ± 1.24                    340
LLC [36]       ± 1.66                    380
VLAD [17]      ± 1.47                    110
VLAD+BoW       ± 0.48                    130
MVLAD          ±                         280
TNNVLAD        ± 1.45                    130

From the results, we can see that the best classification accuracies of our three improved VLAD algorithms are higher than those of the other four schemes on the Corel 10 dataset. Moreover, all the algorithms based on VLAD obtain their best classification accuracy with a small dictionary, whereas BoW, ScSPM and LLC need a large dictionary to reach their best classification accuracy. Our TNNVLAD method is two percentage points higher than LLC, the best of the other methods. The confusion matrices for the Corel 10 dataset are given in Fig. 7. Our algorithms perform well for the classes flowers and horses, and perform poorly on the class mountains.

Fig. 7 Confusion matrices (%) of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the Corel 10 dataset

Figure 8 compares the computation time of the various coding methods under different dictionary sizes on the Corel 10 dataset. ScSPM requires more time than the other six algorithms. Although MVLAD needs more time than BoW and LLC, it is still far faster than ScSPM.

4.4 15 Scenes dataset

The 15 Scenes dataset [21] contains 15 categories and 4485 images in total, with the number of images within each category ranging from 200 to 400. The 15 categories are bedroom, suburb, industrial, kitchen, living room, coast, forest, highway, inside city, mountain, open country, street, tall building, office and store. The image content is varied, containing not only indoor scenes, such as living room and store, but also outdoor scenes, such as coast and forest. In order to compare with other methods, we randomly select 100 images per class as training data and use the rest as test data. Figure 9 compares the classification accuracy of the various coding methods under different dictionary sizes on the 15 Scenes dataset. Algorithms based on VLAD perform better than ScSPM and LLC when the dictionary size is small, but they become slightly worse than LLC when the dictionary size increases.

Fig. 8 Computation time (seconds) comparisons of various coding methods under different dictionary size on the Corel 10 dataset

Fig. 9 Classification accuracy comparisons of various coding methods under different dictionary size on the 15 Scenes dataset

Based on the data in Fig. 9, the best classification accuracies are presented in Table 3. For the 15 Scenes dataset, the best performance of our improved VLAD algorithms is comparable with or slightly lower than that of LLC and ScSPM. The confusion matrices for the 15 Scenes dataset are shown in Fig. 10. Our algorithms perform well for the classes suburb and forest. Besides, a high percentage of the classes bedroom and living room is classified wrongly, and the classes kitchen and living room also have a high misclassification rate; this may be because they are visually similar to each other. Figure 11 compares the computation time of the various coding methods under different dictionary sizes on the 15 Scenes dataset. ScSPM requires more time than the other six algorithms.

Table 3 The best classification accuracy comparisons on the 15 Scenes dataset (mean ± std-dev, %)

Algorithm      Classification accuracy   Dictionary size
BoW [6]        ± 0.61                    90
ScSPM [38]     ± 0.70                    420
LLC [36]       ± 1.02                    420
VLAD [17]      ± 0.50                    400
VLAD+BoW       ± 0.51                    280
MVLAD          ± 0.50                    280
TNNVLAD        ± 0.62                    400
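The accuracy measure used throughout Section 4 (per-class accuracy averaged over the categories) and the percentage confusion matrices can be sketched as follows. This is an illustrative NumPy sketch, not the MATLAB code used for the experiments; the function names are hypothetical and class labels are assumed to be integers from 0 to `n_classes - 1`.

```python
import numpy as np

def confusion_matrix_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent.

    Entry (i, j) is the percentage of test images of class i predicted as class j,
    so the main diagonal holds the per-class accuracies.
    """
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.maximum(row_sums, 1)   # guard against empty classes

def mean_per_class_accuracy(y_true, y_pred, n_classes):
    """Sum of per-class accuracies divided by the number of classes (in percent)."""
    return np.diag(confusion_matrix_percent(y_true, y_pred, n_classes)).mean()
```

Repeating the random train/test split several times and applying `np.mean` and `np.std` to the resulting accuracies gives figures in the same mean ± std-dev form as the tables.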

Fig. 10 Confusion matrices (%) of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the 15 Scenes dataset

Fig. 11 Computation time (seconds) of the coding methods BoW, ScSPM, LLC, VLAD, VLAD+BoW, MVLAD and TNNVLAD under different dictionary sizes on the 15 Scenes dataset

5 Conclusion and future work

In this paper, three feature coding schemes based on VLAD are proposed for image classification. We compare our schemes with some state-of-the-art methods, including BoW, ScSPM, LLC and VLAD. Experiments on different kinds of datasets (UIUC Sports Event, Corel 10 and 15 Scenes) demonstrate that, under small dictionary sizes, the classification accuracy of our improved VLAD coding strategies is better than that of these four classical methods. It is also noteworthy that our schemes are much faster than ScSPM, because the ScSPM algorithm needs more time to learn the dictionary and to code features with its sparse coding strategy. In many cases, classification accuracy and classification speed must be considered simultaneously. In the future, we will try to find more efficient feature coding strategies and apply them to large-scale image datasets.

Acknowledgments This work is sponsored by NUPTSF (Grant No. NY214168), National Natural Science Foundation of China (Grant No , ), Shanghai Science and Technology Committee (Grant No ) and European Union Seventh Framework Programme (Grant No ).

References

1. Arandjelovic R, Zisserman A (2013) All about VLAD. In: IEEE conference on computer vision and pattern recognition
2. Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: IEEE conference on computer vision and pattern recognition
3. Bosch A, Zisserman A, Muñoz X (2008) Scene classification using a hybrid generative/discriminative approach. IEEE Trans Pattern Anal Mach Int 30(4)
4. Cinbis RG, Verbeek J, Schmid C (2012) Image categorization using Fisher kernels of non-iid image models. In: IEEE conference on computer vision and pattern recognition
5. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3)

6. Csurka G, Dance CR, Fan LX, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, ECCV, vol 1
7. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE conference on computer vision and pattern recognition, vol 1
8. Delhumeau J, Gosselin PH, Jégou H, Pérez P (2013) Revisiting the VLAD image representation. In: ACM international conference on multimedia
9. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Proc 15(12)
10. Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Comp Vision Image Underst 106(1)
11. Freund Y, Schapire R (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory
12. Gao SH, Tsang IWH, Chia LT, Zhao PL (2010) Local features are not lonely: Laplacian sparse coding for image classification. In: IEEE conference on computer vision and pattern recognition
13. Grauman K, Darrell T (2005) The pyramid match kernel: Discriminative classification with sets of image features. In: International conference on computer vision, vol 2
14. Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset
15. Harada T, Ushiku Y, Yamashita Y, Kuniyoshi Y (2011) Discriminative spatial pyramid. In: IEEE conference on computer vision and pattern recognition
16. Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: IEEE conference on computer vision and pattern recognition
17. Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Int 34(9)
18. Jurie F, Triggs B (2005) Creating efficient codebooks for visual recognition. In: International conference on computer vision, vol 1
19. Krapac J, Verbeek J, Jurie F (2011) Modeling spatial layout with Fisher vectors for image categorization. In: IEEE international conference on computer vision
20. Kulkarni N, Li BX (2011) Discriminative affine sparse codes for image classification. In: IEEE conference on computer vision and pattern recognition
21. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: IEEE conference on computer vision and pattern recognition, vol 2
22. Li FF, Pietro P (2005) A Bayesian hierarchical model for learning natural scene categories. In: IEEE conference on computer vision and pattern recognition, vol 2
23. Li LJ, Li FF (2007) What, where and who? Classifying events by scene and object recognition. In: International conference on computer vision
24. Long X, Lu H, Li W (2012) Image classification based on nearest neighbor basis vectors. Multimed Tools Appl
25. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2)
26. Lu Z, Ip HHS (2009) Image categorization with spatial mismatch kernels. In: IEEE conference on computer vision and pattern recognition
27. Moosmann F, Triggs B, Jurie F (2007) Fast discriminative visual codebooks using randomized clustering forests. Advances in neural information processing systems
28. Morel J, Yu G (2009) ASIFT: a new framework for fully affine invariant image comparison. SIAM J Imaging Sci 2(2)
29. Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: IEEE conference on computer vision and pattern recognition
30. Perronnin F, Sánchez J, Mensink T (2010) Improving the Fisher kernel for large-scale image classification. In: European conference on computer vision
31. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE conference on computer vision and pattern recognition
32. Picard D, Gosselin PH (2011) Improving image similarity with vectors of locally aggregated tensors. In: IEEE international conference on image processing
33. Quelhas P, Monay F, Odobez JM, Gatica-Perez D, Tuytelaars T, Van Gool L (2005) Modeling scenes with local descriptors and latent aspects. In: International conference on computer vision, vol 1

34. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: An efficient alternative to SIFT or SURF. In: International conference on computer vision
35. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: International conference on computer vision
36. Wang JJ, Yang JC, Yu K, Lv FJ, Huang T, Gong YH (2010) Locality-constrained linear coding for image classification. In: IEEE conference on computer vision and pattern recognition
37. Xu D, Chang S (2008) Video event recognition using kernel methods with multilevel temporal alignment. IEEE Trans Pattern Anal Mach Int 30(11)
38. Yang JC, Yu K, Gong YH, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: IEEE conference on computer vision and pattern recognition
39. Yang L, Jin R, Sukthankar R, Jurie F (2008) Unifying discriminative visual codebook generation with classifier training for object category recognition. In: IEEE conference on computer vision and pattern recognition
40. Yu K, Zhang T, Gong YH (2009) Nonlinear learning using local coordinate coding. Adv Neural Inf Process Syst 22
41. Zhou X, Yu K, Zhang T, Huang TS (2010) Image classification using super-vector coding of local image descriptors. In: European conference on computer vision

Xianzhong Long obtained his Ph.D. degree from Shanghai Jiao Tong University in June. He received his B.S. degree from Henan Polytechnic University in 2007 and his M.S. degree from Xihua University in 2010, both in computer science. He is now an assistant professor at Nanjing University of Posts and Telecommunications. His research interests are computer vision, machine learning and image processing, specifically image classification, object recognition and clustering.

Hongtao Lu received his Ph.D. degree in Electronic Engineering from Southeast University, Nanjing. After graduation he became a postdoctoral fellow in the Department of Computer Science, Fudan University, Shanghai, China, where he spent two years. In 1999, he joined the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, where he is now a professor. His research interests include machine learning, computer vision and pattern recognition, and information hiding. He has published more than sixty papers in international journals such as IEEE Transactions and Neural Networks, and in international conferences. His papers have received more than 400 citations from other researchers.

Yong Peng received the B.S. degree in computer science from Hefei New Star Research Institute of Applied Technology and the M.S. degree from the Graduate University of Chinese Academy of Sciences. He is now working towards his Ph.D. degree at Shanghai Jiao Tong University. His research interests include machine learning, pattern recognition and evolutionary computation.

Xianzhong Wang received the B.S. degree in computer science from Anhui University of Technology. He is now a Master's candidate in the Computer Science and Engineering Department of Shanghai Jiao Tong University. His research interests include machine learning and human action recognition.

Shaokun Feng received the B.S. degree in information science from the University of Shanghai for Science and Technology. He is now working towards his M.S. degree at Shanghai Jiao Tong University. His research interests include machine learning, pattern recognition and deep learning.


More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

Spatial Hierarchy of Textons Distributions for Scene Classification

Spatial Hierarchy of Textons Distributions for Scene Classification Spatial Hierarchy of Textons Distributions for Scene Classification S. Battiato 1, G. M. Farinella 1, G. Gallo 1, and D. Ravì 1 Image Processing Laboratory, University of Catania, IT {battiato, gfarinella,

More information

IMAGE CLASSIFICATION WITH MAX-SIFT DESCRIPTORS

IMAGE CLASSIFICATION WITH MAX-SIFT DESCRIPTORS IMAGE CLASSIFICATION WITH MAX-SIFT DESCRIPTORS Lingxi Xie 1, Qi Tian 2, Jingdong Wang 3, and Bo Zhang 4 1,4 LITS, TNLIST, Dept. of Computer Sci&Tech, Tsinghua University, Beijing 100084, China 2 Department

More information

A Comparison of SIFT, PCA-SIFT and SURF

A Comparison of SIFT, PCA-SIFT and SURF A Comparison of SIFT, PCA-SIFT and SURF Luo Juan Computer Graphics Lab, Chonbuk National University, Jeonju 561-756, South Korea qiuhehappy@hotmail.com Oubong Gwun Computer Graphics Lab, Chonbuk National

More information

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding 2014 22nd International Conference on Pattern Recognition Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding Wai Lam Hoo, Tae-Kyun Kim, Yuru Pei and Chee Seng Chan Center of

More information

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba Adding spatial information Forming vocabularies from pairs of nearby features doublets

More information

Selection of Scale-Invariant Parts for Object Class Recognition

Selection of Scale-Invariant Parts for Object Class Recognition Selection of Scale-Invariant Parts for Object Class Recognition Gy. Dorkó and C. Schmid INRIA Rhône-Alpes, GRAVIR-CNRS 655, av. de l Europe, 3833 Montbonnot, France fdorko,schmidg@inrialpes.fr Abstract

More information

Locality-constrained Linear Coding for Image Classification

Locality-constrained Linear Coding for Image Classification Locality-constrained Linear Coding for Image Classification Jinjun Wang, Jianchao Yang,KaiYu, Fengjun Lv, Thomas Huang, and Yihong Gong Akiira Media System, Palo Alto, California Beckman Institute, University

More information

Image Classification by Hierarchical Spatial Pooling with Partial Least Squares Analysis

Image Classification by Hierarchical Spatial Pooling with Partial Least Squares Analysis ZHU, et al.: HIERARCHICAL SPATIAL POOLING 1 Image Classification by Hierarchical Spatial Pooling with Partial Least Squares Analysis Jun Zhu 1 junnyzhu@sjtu.edu.cn Weijia Zou 1 zouweijia@sjtu.edu.cn Xiaokang

More information

Multi-Class Image Classification: Sparsity Does It Better

Multi-Class Image Classification: Sparsity Does It Better Multi-Class Image Classification: Sparsity Does It Better Sean Ryan Fanello 1,2, Nicoletta Noceti 2, Giorgio Metta 1 and Francesca Odone 2 1 Department of Robotics, Brain and Cognitive Sciences, Istituto

More information

Improving Scene Classification by Fusion of Training Data and Web Resources

Improving Scene Classification by Fusion of Training Data and Web Resources 18th International Conference on Information Fusion Washington, DC - July 6-9, 2015 Improving Scene Classification by Fusion of Training Data and Web Resources Dongzhe Wang, and Kezhi Mao School of Electrical

More information

By Suren Manvelyan,

By Suren Manvelyan, By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan, http://www.surenmanvelyan.com/gallery/7116 By Suren Manvelyan,

More information

Learning based face hallucination techniques: A survey

Learning based face hallucination techniques: A survey Vol. 3 (2014-15) pp. 37-45. : A survey Premitha Premnath K Department of Computer Science & Engineering Vidya Academy of Science & Technology Thrissur - 680501, Kerala, India (email: premithakpnath@gmail.com)

More information

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart

IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM. Christian Eggert, Stefan Romberg, Rainer Lienhart IMPROVING VLAD: HIERARCHICAL CODING AND A REFINED LOCAL COORDINATE SYSTEM Christian Eggert, Stefan Romberg, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg ABSTRACT

More information

Exploring Bag of Words Architectures in the Facial Expression Domain

Exploring Bag of Words Architectures in the Facial Expression Domain Exploring Bag of Words Architectures in the Facial Expression Domain Karan Sikka, Tingfan Wu, Josh Susskind, and Marian Bartlett Machine Perception Laboratory, University of California San Diego {ksikka,ting,josh,marni}@mplab.ucsd.edu

More information

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Introduction to object recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others Overview Basic recognition tasks A statistical learning approach Traditional or shallow recognition

More information

Patch Descriptors. CSE 455 Linda Shapiro

Patch Descriptors. CSE 455 Linda Shapiro Patch Descriptors CSE 455 Linda Shapiro How can we find corresponding points? How can we find correspondences? How do we describe an image patch? How do we describe an image patch? Patches with similar

More information

A Hierarchial Model for Visual Perception

A Hierarchial Model for Visual Perception A Hierarchial Model for Visual Perception Bolei Zhou 1 and Liqing Zhang 2 1 MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, and Department of Biomedical Engineering, Shanghai

More information

Sparse coding for image classification

Sparse coding for image classification Sparse coding for image classification Columbia University Electrical Engineering: Kun Rong(kr2496@columbia.edu) Yongzhou Xiang(yx2211@columbia.edu) Yin Cui(yc2776@columbia.edu) Outline Background Introduction

More information

Supervised learning. y = f(x) function

Supervised learning. y = f(x) function Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the

More information

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers

Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane

More information

KNOWING Where am I has always being an important

KNOWING Where am I has always being an important CENTRIST: A VISUAL DESCRIPTOR FOR SCENE CATEGORIZATION 1 CENTRIST: A Visual Descriptor for Scene Categorization Jianxin Wu, Member, IEEE and James M. Rehg, Member, IEEE Abstract CENTRIST (CENsus TRansform

More information

Image classification based on support vector machine and the fusion of complementary features

Image classification based on support vector machine and the fusion of complementary features Image classification based on support vector machine and the fusion of complementary features Huilin Gao, a,b Wenjie Chen, a,b,* Lihua Dou, a,b a Beijing Institute of Technology, School of Automation,

More information

CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION

CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 4, 2015 ISSN 2286-3540 CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION Ionuţ Mironică 1, Bogdan Ionescu 2, Radu Dogaru 3 In this paper we propose

More information

ILSVRC on a Smartphone

ILSVRC on a Smartphone [DOI: 10.2197/ipsjtcva.6.83] Express Paper ILSVRC on a Smartphone Yoshiyuki Kawano 1,a) Keiji Yanai 1,b) Received: March 14, 2014, Accepted: April 24, 2014, Released: July 25, 2014 Abstract: In this work,

More information