Image classification based on improved VLAD
Xianzhong Long · Hongtao Lu · Yong Peng · Xianzhong Wang · Shaokun Feng

Received: 25 August 2014 / Revised: 22 December 2014 / Accepted: 18 February 2015
© Springer Science+Business Media New York 2015

Abstract Recently, a coding scheme called vector of locally aggregated descriptors (VLAD) has achieved tremendous success in large-scale image retrieval due to the efficiency of its compact representation. VLAD employs only the nearest-neighbor visual word in the dictionary to aggregate each descriptor. It offers fast retrieval speed and high retrieval accuracy under small dictionary sizes. In this paper, we give three improved VLAD variants for image classification: first, similar to the bag of words (BoW) model, we count the number of descriptors belonging to each cluster center and append this count to VLAD; second, in order to amplify the impact of residuals, squared residuals are taken into account; third, instead of one nearest-neighbor visual word, we look for the two nearest-neighbor visual words when aggregating each descriptor. Experimental results on the UIUC Sports Event, Corel 10 and 15 Scenes datasets show that the proposed methods outperform several state-of-the-art coding schemes in terms of classification accuracy and computation speed.

X. Long (✉): School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, China. lxz@njupt.edu.cn

H. Lu · Y. Peng · X. Wang · S. Feng: Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China. htlu@sjtu.edu.cn, pengyong851012@sjtu.edu.cn, wxz2453@sjtu.edu.cn, superkkking@sjtu.edu.cn
Keywords: Image classification · Scale-invariant feature transform · Vector of locally aggregated descriptors · K-means clustering algorithm

1 Introduction

As one of the most important and challenging tasks in computer vision and pattern recognition, image classification has recently attracted much attention. Several benchmark datasets are used to evaluate the performance of image classification algorithms, for example UIUC Sports Event [23], Corel 10 [26], 15 Scenes [21], Caltech 101 [10] and Caltech 256 [14]. Many image classification models have been proposed, such as generative models [2, 22, 33], discriminative models [9, 18, 27, 39] and hybrid generative/discriminative models [3]. A generative model classifies images from the viewpoint of probability; it depends only on the data themselves and does not require training or learning parameters. In contrast, a discriminative model solves the classification problem from a non-probabilistic perspective and needs to train or learn the parameters appearing in the classifier. Here, we consider only image classification based on discriminative models. Among discriminative models, the earliest bag of words (BoW) technique [35] won the greatest popularity and has found a wide range of applications in image retrieval [31], video event detection [37] and image classification [6, 13]. However, the BoW representation lacks descriptive capability, because it is merely a histogram of the number of image descriptors assigned to each visual word and it ignores the spatial information of the image. To address this problem, the Spatial Pyramid Matching (SPM) model was put forward in [21], which takes the spatial information of the image into account. In fact, SPM is an extension of the BoW model and has been shown to achieve better image classification accuracy than BoW [15, 36, 38].
Image classification based on the SPM model consists of five steps: local descriptor extraction, dictionary learning, feature coding, spatial pooling and classifier selection. Commonly used local descriptors include the Scale-Invariant Feature Transform (SIFT) [25], Histogram of Oriented Gradients (HoG) [7], Affine Scale-Invariant Feature Transform (ASIFT) [28] and Oriented FAST and Rotated BRIEF (ORB) [34]. After extracting descriptors from all images, vector quantization [21] or sparse coding [38] is used to train a dictionary. In the feature coding phase, each image's descriptor matrix is mapped to a coefficient matrix by the chosen coding strategy. The principle of spatial pooling deserves a clear illustration because it dominates the whole SPM-based classification framework. During spatial pooling, an image is divided into increasingly finer subregions over L layers, with 2^l × 2^l subregions at layer l, l = 0, 1, …, L−1. A typical partition uses three layers, i.e., L = 3. At layer 0, the image is taken as a whole; at layer 1, the image is divided into four regions; and at layer 2, each subregion of layer 1 is further divided into four, yielding 16 smaller subregions. This produces a spatial pyramid of three layers with 21 subregions in total. The spatial pyramid is then combined with the feature coding process, and different pooling functions are exploited, i.e., sum pooling [21] or max pooling [36, 38]. Finally, the feature vectors of the 21 subregions are concatenated into one long feature vector for the whole image. This process is the spatial pyramid representation of the image; the dimensionality of the resulting representation is 21P, where P is the dictionary size. It is noteworthy that when l = 0,
SPM reduces to the original BoW model. In the last step, a classifier such as a Support Vector Machine (SVM) [5] or Adaptive Boosting (AdaBoost) [11] is applied to classify images. Over the past several years, a number of dictionary learning methods and feature coding strategies have been put forward for image classification. In [6], the K-means clustering algorithm, a vector quantization (VQ) technique, was used to generate the dictionary; during the feature coding phase, each local descriptor was given a binary value specifying the cluster center to which it belonged. This process is called BoW, and it produces a histogram representation over visual words. However, this approach is likely to yield a large reconstruction error because it limits the ability to represent descriptors. To address this, the SPM method based on sparse coding (ScSPM) was proposed in [38], which employs an L1-norm-based sparse coding scheme in place of K-means clustering and generates the dictionary by learning from randomly sampled SIFT feature vectors. During the feature coding period, ScSPM uses the sparse coding strategy to encode each local descriptor. However, the computation of ScSPM becomes very slow when the dictionary size grows large. To accelerate the computation while maintaining high classification accuracy, locality-constrained linear coding (LLC) was put forward in [36], which gives an analytical solution for feature coding. Several further improved image classification schemes based on SPM have also been suggested recently, such as spatial pyramid matching using Laplacian sparse coding [12], discriminative spatial pyramid [15], discriminative affine sparse codes [20] and nearest neighbor basis vectors spatial pyramid matching (NNBVSPM) [24]. Finding efficient feature coding strategies has become an urgent research direction.
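The pyramid layout just described (layer l split into 2^l × 2^l cells) fixes the length of the pooled representation. A short sketch (our own helper, not code from any of the cited papers) makes the 21P count explicit:

```python
def spm_regions_and_dim(L, P):
    """Count SPM subregions over L layers and the pooled feature length.

    Layer l is split into 2**l x 2**l cells, contributing 4**l subregions;
    each subregion is pooled into a P-dimensional vector, and the vectors
    are concatenated into one long feature.
    """
    regions = sum(4 ** l for l in range(L))
    return regions, regions * P

# Three layers with a dictionary of P = 200 words:
# 1 + 4 + 16 = 21 subregions, i.e. a 21 * 200 = 4200-dimensional feature.
print(spm_regions_and_dim(3, 200))  # (21, 4200)
```

With L = 1 the representation collapses to a single P-dimensional histogram, which matches the remark that using only layer 0 reduces SPM to plain BoW.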
In the field of pattern recognition, the Fisher vector (FV) technique has been used for image classification [4, 19, 29, 30]. FV is a powerful framework that combines the advantages of generative and discriminative approaches. Its key idea is to represent a signal by a gradient vector derived from a generative probability model and then feed this representation to a discriminative classifier; FV can therefore be seen as a hybrid generative/discriminative model. The vector of locally aggregated descriptors (VLAD) can be viewed as a non-probabilistic version of FV in which the gradient is associated only with the mean and Gaussian mixture model (GMM) clustering is replaced by K-means. VLAD has been successfully applied to image retrieval [1, 8, 16, 17]. When higher-order statistics are considered, two further coding methods arise, i.e., vectors of locally aggregated tensors (VLAT) [32] and the super-vector (SV) [41]. The dimensionality of VLAT is P(D + D²), where D is the dimension of each descriptor; this high-dimensional representation can lead to very long computation times. Besides, SV takes a probabilistic viewpoint and is still a generative model. Therefore, we do not consider the VLAT and SV feature coding algorithms. In this paper, we concentrate on image classification methods based on discriminative models; BoW, ScSPM, LLC and VLAD are selected for comparison with our improved VLAD methods. In order to obtain stronger coding ability and improve the classification rate or speed, three improved VLAD versions for image classification are given in this paper. First, similar to the bag of words (BoW) model, we count the number of descriptors belonging to each cluster center and append it to VLAD; in this way, our improved VLAD method possesses the characteristics of BoW. Second, in order to amplify the impact of residuals, squared residuals are added to the original VLAD.
This makes the dimension of the new representation twice that of the original. Third, some descriptors lie at nearly the same distance from more than one visual word, so assigning such descriptors only to the single nearest visual word, as in the original VLAD, is not appropriate. Instead of one nearest-neighbor visual word, we look for the two nearest-neighbor visual words when aggregating each descriptor.

The remainder of the paper is organized as follows: Section 2 introduces the basic ideas of existing schemes. Our improved VLAD methods are presented in Section 3. In Section 4, comparison results for image classification on three widely used datasets are reported. Finally, conclusions are drawn and some future research issues are discussed in Section 5.

2 Related work

Let V be a set of D-dimensional local descriptors extracted from an image, i.e., V = [v_1, v_2, …, v_M] ∈ R^{D×M}. Given a dictionary with P entries, W = [w_1, w_2, …, w_P] ∈ R^{D×P}, different feature coding schemes convert each descriptor into a P-dimensional code to generate the final image representation coefficient matrix H, i.e., H = [h_1, h_2, …, h_M] ∈ R^{P×M}. Each column of V is a local descriptor corresponding to a coefficient vector, i.e., the corresponding column of H.

2.1 Bag of words (BoW)

The BoW representation groups local descriptors. It first generates a dictionary W with P visual words, usually obtained by the K-means clustering algorithm. Each D-dimensional local descriptor from an image is then assigned to the closest center. The BoW representation is the histogram of the assignments of all image descriptors to visual words. It therefore produces a P-dimensional vector whose elements sum to the number of descriptors in the image. However, the BoW model does not consider the spatial structure of the image and has a large reconstruction error, so its ability to classify images is restricted [6].
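The hard-assignment histogram described above can be sketched in a few lines of plain Python (the toy descriptors and words are ours, with squared Euclidean distance for the nearest-center search):

```python
def bow_histogram(descriptors, words):
    """Hard-assign each descriptor to its nearest visual word and count."""
    hist = [0] * len(words)
    for v in descriptors:
        # index of the closest word under squared Euclidean distance
        j = min(range(len(words)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, words[k])))
        hist[j] += 1
    return hist

words = [[0.0, 0.0], [1.0, 1.0]]             # P = 2 visual words, D = 2
descs = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8]]
print(bow_histogram(descs, words))           # [1, 2]; entries sum to M = 3
```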
2.2 Sparse coding spatial pyramid matching (ScSPM)

In ScSPM [38], by using sparse coding in place of vector quantization, followed by multilayer spatial max pooling, the authors developed an extension of the traditional SPM method [21] and presented a linear SPM kernel based on SIFT sparse coding. In the process of image classification, ScSPM solves the following optimization problem:

    min_{W,H} Σ_{i=1}^{M} ‖v_i − W h_i‖₂² + λ ‖h_i‖₁    (1)

where ‖·‖₂ denotes the L2 norm of a vector, i.e., the square root of the sum of the squared vector entries, and ‖·‖₁ is the L1 norm of a vector, i.e., the sum of the absolute values of the vector entries. The parameter λ controls the sparsity of the solution of formula (1): the bigger λ is, the sparser the solution will be. Experimental results in [38] demonstrated that linear SPM based on sparse coding of SIFT descriptors significantly outperformed the linear SPM kernel on histograms and was even better than the nonlinear SPM
kernels. Nevertheless, using sparse coding to learn the dictionary and to encode features is time-consuming, especially for large-scale image datasets or large dictionaries.

2.3 Locality-constrained linear coding (LLC)

In LLC [36], inspired by the viewpoint of [40] that locality is more important than sparsity, the authors generalized sparse coding to locality-constrained linear coding, replacing the sparsity constraint in formula (1) with a locality constraint. LLC solves the following optimization problem:

    min_H Σ_{i=1}^{M} ‖v_i − W h_i‖₂² + λ ‖d_i ⊙ h_i‖₂²    s.t. 1ᵀ h_i = 1, ∀i    (2)

where 1 = (1, 1, …, 1)ᵀ, ⊙ denotes element-wise multiplication, and d_i ∈ R^P is a weight vector. In addition, each coefficient vector h_i is normalized so that 1ᵀ h_i = 1. Experimental results in [36] showed that LLC outperformed ScSPM on several benchmark datasets due to its excellent properties, i.e., better reconstruction, local smooth sparsity and an analytical solution.

2.4 Vector of locally aggregated descriptors (VLAD)

The VLAD representation was proposed in [16] for image retrieval. Let V = [v_1, v_2, …, v_M] ∈ R^{D×M} be the descriptor set extracted from an image. As in BoW, a dictionary W = [w_1, w_2, …, w_P] ∈ R^{D×P} is first learned using K-means. Then, for each local descriptor v_i, we look for its nearest-neighbor visual word NN(v_i) in the dictionary. Finally, for each visual word w_j, the differences v_i − w_j of the vectors v_i assigned to w_j are accumulated. C = [c_1ᵀ, c_2ᵀ, …, c_Pᵀ]ᵀ ∈ R^{PD} (c_j ∈ R^D, j = 1, 2, …, P) is the final VLAD representation, obtained according to the following formula:

    c_j = Σ_{v_i : NN(v_i) = w_j} (v_i − w_j)    (3)

The VLAD representation is the concatenation of the D-dimensional vectors c_j and is therefore PD-dimensional, where P is the dictionary size. Algorithm 1 gives the VLAD coding process.
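Formula (3), followed by the power- and L2-normalization used for VLAD, can be sketched in plain Python (toy data and the helper name are ours; α = 0.5 as in the paper):

```python
import math

def vlad_encode(descriptors, words, alpha=0.5):
    """Accumulate residuals v_i - w_j over nearest-neighbor assignments
    (formula (3)), concatenate to a P*D vector, then apply the power law
    sign(x)*|x|**alpha followed by L2 normalization."""
    D = len(words[0])
    c = [[0.0] * D for _ in words]           # one D-dim residual sum per word
    for v in descriptors:
        j = min(range(len(words)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, words[k])))
        for d in range(D):
            c[j][d] += v[d] - words[j][d]
    flat = [x for cj in c for x in cj]        # concatenation: P*D dimensions
    flat = [math.copysign(abs(x) ** alpha, x) for x in flat]  # power law
    norm = math.sqrt(sum(x * x for x in flat)) or 1.0
    return [x / norm for x in flat]           # L2 normalization

words = [[0.0, 0.0], [1.0, 1.0]]              # P = 2, D = 2
code = vlad_encode([[0.2, 0.0], [0.8, 1.0]], words)
print(len(code))                              # 4, i.e. P * D
```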
Like the Fisher vector, the VLAD representation can then be power-normalized and L2-normalized sequentially, where the power-law parameter α is empirically set to 0.5. It is worth noting that there are no SPM and pooling processes in the VLAD coding algorithm. Existing experiments have shown that VLAD is an efficient feature coding method under small dictionary sizes.

3 Improved VLAD

In this section, three improved VLAD methods are presented. They are named VLAD based on BoW, Magnified VLAD and Two Nearest Neighbor VLAD, respectively. As with VLAD, the improved VLAD representations can also be power- and L2-normalized, with the parameter α empirically set to 0.5.

3.1 VLAD based on BoW

Inspired by BoW, we count the number of descriptors belonging to each cluster w_j (j = 1, …, P) and append it to VLAD. This improved method is called VLAD based on BoW (abbreviated VLAD+BoW). The dimensionality of the VLAD+BoW representation is therefore P(D + 1), where the extra dimension per visual word stores the BoW count. By integrating the histogram information of visual words into VLAD, we expect VLAD+BoW to possess the characteristics of BoW and improve classification performance. VLAD+BoW is presented in Algorithm 2.

3.2 Magnified VLAD

In order to magnify the impact of residuals, squared residuals are taken into account. This improved version is called Magnified VLAD (abbreviated MVLAD) and its dimension is 2PD. The computation of MVLAD is given in Algorithm 3.
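To make the dimensionalities above concrete, here is a plain-Python sketch (toy data and helper names are ours, not the paper's Algorithms 2-3) that builds the per-word counts of VLAD+BoW and the squared-residual block of MVLAD alongside the ordinary residual sums:

```python
def vlad_variants(descriptors, words):
    """Per visual word, accumulate residual sums (VLAD), assignment counts
    (the extra dimension of VLAD+BoW) and squared residuals (the extra
    block of MVLAD)."""
    P, D = len(words), len(words[0])
    res = [[0.0] * D for _ in range(P)]
    sq = [[0.0] * D for _ in range(P)]
    cnt = [0] * P
    for v in descriptors:
        j = min(range(P),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, words[k])))
        cnt[j] += 1
        for d in range(D):
            r = v[d] - words[j][d]
            res[j][d] += r
            sq[j][d] += r * r
    # VLAD+BoW: residuals plus one count per word -> P*(D+1) dimensions
    vlad_bow = [x for j in range(P) for x in res[j] + [float(cnt[j])]]
    # MVLAD: residuals plus squared residuals -> 2*P*D dimensions
    mvlad = [x for j in range(P) for x in res[j] + sq[j]]
    return vlad_bow, mvlad

words = [[0.0, 0.0], [1.0, 1.0]]
descs = [[0.2, 0.0], [0.8, 1.0], [1.2, 1.0]]
vb, mv = vlad_variants(descs, words)
print(len(vb), len(mv))  # 6 8 -> P*(D+1) and 2*P*D for P = D = 2
```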
3.3 Two nearest neighbor VLAD

In addition to the nearest-neighbor center, we seek a second nearest-neighbor center for each descriptor. This method is referred to as two nearest neighbor VLAD (abbreviated TNNVLAD). The dimension of the TNNVLAD representation is still PD. TNNVLAD is a kind of soft coding method and can reduce the representation error. The specific details are shown in Algorithm 4: if d₁ > β·d₂, where d₁ and d₂ are the distances to the nearest and second-nearest centers, the differences between v_i and its two nearest-neighbor centers are each accumulated with weight 0.5. The value of β is determined experimentally.

4 Experimental results

This section begins with an illustration of our experimental setting, followed by comparisons between our schemes and other prominent methods on three datasets, i.e., UIUC Sports Event, Corel 10 and 15 Scenes. Figure 1 shows example images from these datasets.
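Returning to the rule of Sect. 3.3, the TNNVLAD soft assignment can be sketched as follows (plain Python with toy data; β = 0.8, the value selected later in Sect. 4.1; the helper name is ours, not the paper's Algorithm 4):

```python
def tnn_vlad(descriptors, words, beta=0.8):
    """TNNVLAD aggregation: when d1 > beta * d2 (d1, d2 = distances to the
    nearest and second-nearest words), the residual is split 0.5/0.5 between
    the two nearest words; otherwise only the nearest word is used.
    The representation stays P*D-dimensional."""
    P, D = len(words), len(words[0])
    c = [[0.0] * D for _ in range(P)]
    for v in descriptors:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(v, w)) ** 0.5, j)
            for j, w in enumerate(words))
        (d1, j1), (d2, j2) = dists[0], dists[1]
        if d1 > beta * d2:                   # nearly equidistant: soft assign
            for j in (j1, j2):
                for d in range(D):
                    c[j][d] += 0.5 * (v[d] - words[j][d])
        else:                                # clear winner: hard assign
            for d in range(D):
                c[j1][d] += v[d] - words[j1][d]
    return [x for cj in c for x in cj]

words = [[0.0, 0.0], [1.0, 0.0]]
# [0.5, 0] is equidistant (split 0.5/0.5); [0.1, 0] clearly belongs to the
# first word, so it is hard-assigned.
print(tnn_vlad([[0.5, 0.0], [0.1, 0.0]], words))
```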
4.1 Experimental setting

A typical experimental setting for classifying images contains four main steps. First of all, we adopt the widely used SIFT descriptor [25] due to its good performance in image classification as reported in [12, 21, 36, 38]. SIFT features are invariant to image scale and rotation and robust across a substantial range of affine distortion, added noise and illumination change. To be consistent with previous work, we use the same setting to extract SIFT descriptors: 128-dimensional SIFT descriptors are densely extracted from image patches on a grid with a step size of 8 pixels under a single patch size. We resize the maximum side (length or width) of each image to 300 pixels, except for UIUC Sports Event, where we resize the maximum side to 400 pixels because of the high resolution of the original images. Next, about twenty descriptors from each image are chosen at random to form a new matrix, which is taken as the input of the K-means clustering or sparse coding algorithm, and a dictionary of the specified size is learned. In the third step, we exploit the BoW, sparse coding, LLC, VLAD and improved VLAD schemes to encode the descriptors and produce each image's new representation. For the BoW model, the dimensionality of the new representation is the dictionary size P. For ScSPM and LLC, we combine a three-layer spatial pyramid matching model (21 subregions) with the max pooling function, so the dimension of the new representation is 21P. The dimensionalities of VLAD and the improved VLAD methods can be found in Algorithms 1-4. At the final step, we apply a linear SVM classifier
Fig. 1 Image examples of the datasets UIUC Sports Event (the left four), Corel 10 (the middle four), and 15 Scenes (the right four)
for the new representations, randomly selecting some columns per class for training and some other columns per class for testing. It is then straightforward to obtain a classification accuracy for each category by comparing the predicted labels of the test set with its ground-truth labels. Finally, we sum the classification accuracies over all categories and divide by the number of categories to obtain the overall classification accuracy. All results are obtained by repeating five independent experiments, and the average classification accuracy and standard deviation over the five experiments are reported. All experiments are conducted in MATLAB on a server with an Intel X5650 CPU (2.66 GHz, 12 cores) and 32 GB RAM.

For the TNNVLAD algorithm, Fig. 2 shows the choice of the parameter β on the three datasets: it plots the classification accuracy of TNNVLAD as β varies in the interval [0.1, 1] with the dictionary size fixed at 130. The results in Fig. 2 indicate that β = 0.8 is the best choice for TNNVLAD, so we fix β = 0.8 in our experiments.

Fig. 2 Classification accuracy of our TNNVLAD algorithm under different β on the UIUC Sports Event, Corel 10 and 15 Scenes datasets

4.2 UIUC Sports Event dataset

UIUC Sports Event [23] contains 8 categories and 1579 images in total, with the number of images per category ranging from 137 to 250. The 8 categories are badminton, bocce, croquet, polo, rock climbing, rowing, sailing and snowboarding. To compare with other methods, we randomly select 70 images per class as training data and 60 images per class as test data. Figure 3 compares the classification accuracy of our three improved VLAD schemes with the other four methods under different dictionary sizes, where the dictionary size ranges from 10 to 420 with a step of 10.

Fig. 3 Classification accuracy comparisons of various coding methods under different dictionary size on the UIUC Sports Event dataset

From the results in Fig. 3, we notice that the classification accuracies of our methods surpass all the other algorithms when the dictionary size is small and are comparable to the existing schemes when the dictionary size becomes large. This may be explained by the fact that the goal of VLAD is to aggregate local image descriptors into compact codes, so VLAD obtains good performance for small dictionary sizes. We can also see from Fig. 3 that the performance of BoW is the lowest, ScSPM is better than BoW, and the classification accuracy of LLC is better still; these observations are consistent with reports in the existing literature. Based on Fig. 3, we list the best classification accuracy of each approach in Table 1, together with the standard deviation and the corresponding dictionary size.

Table 1 The best classification accuracy comparisons on the UIUC Sports Event dataset (mean ± std-dev)%

Algorithm: Classification Accuracy (Dictionary Size)
BoW [6]: ± 0.85 (390)
ScSPM [38]: ± 2.20 (400)
LLC [36]: ± 1.36 (330)
VLAD [17]: ± 2.67 (220)
VLAD+BoW: ± 0.87 (210)
MVLAD: ± 1.85 (220)
TNNVLAD: ± 1.26 (220)
Fig. 4 Confusion matrices of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the UIUC Sports Event dataset (%), over the classes badminton, bocce, croquet, polo, rock climbing, rowing, sailing and snowboarding
From Table 1, we can conclude that the best classification accuracies of our three improved methods exceed those of the other four schemes on the UIUC Sports Event dataset. Our VLAD+BoW and TNNVLAD methods achieve more than 1 % higher accuracy than LLC, which is the state of the art among the SPM-based models. Furthermore, the original and improved VLAD reach their best classification accuracy under small dictionary sizes, whereas BoW, ScSPM and LLC need large dictionaries to obtain their highest accuracy. The confusion matrices of our algorithms on the UIUC Sports Event dataset are shown in Fig. 4; when obtaining them, the dictionary size is set to 130 for our three improved VLAD methods. In a confusion matrix, the element in the i-th row and j-th column (i ≠ j) is the percentage of images from class i misidentified as class j; average classification accuracies over the five independent experiments for the individual classes are listed along the main diagonal. Figure 4 thus shows the classification and misclassification status for each individual class. Our algorithms perform well for the classes badminton and rock climbing. We also notice that the classes bocce and croquet have a high misclassification rate, which may be because they are visually similar to each other: balls in the bocce and croquet classes have very similar appearance. To further demonstrate the superiority of our methods in running speed, the computation times of the various approaches for different dictionary sizes on the UIUC Sports Event dataset are reported in Fig. 5. The computation time of each method is the total time of the five independent experiments, in seconds. From Fig. 5 we can see that BoW is the fastest method due to its low-dimensional representation, while ScSPM is the slowest.
This is because the sparse coding strategy is used both to learn the dictionary and to encode features in ScSPM, and solving the L1-norm minimization problem is very time-consuming. The computation times of VLAD and our three improved VLAD methods are smaller than that of LLC. These experimental results show that our algorithms have a clear advantage in computation time.

Fig. 5 Computation time comparisons of various coding methods under different dictionary size on the UIUC Sports Event dataset

4.3 Corel 10 dataset

Corel 10 [26] contains 10 categories with 100 images per category. The categories are beach, buildings, elephants, flowers, food, horses, mountains, owls, skiing and tigers. Following the setting of [12, 26], we randomly select 50 images from each class as training data and use the remaining 50 images per class as test data. Classification accuracy comparisons of the various coding methods under different dictionary sizes on the Corel 10 dataset are shown in Fig. 6. We again see that our improved VLAD algorithms obtain good performance when the dictionary size is small.

Fig. 6 Classification accuracy comparisons of various coding methods under different dictionary size on the Corel 10 dataset

According to Fig. 6, the best classification accuracies of the different algorithms are reported in Table 2. From the results, we can see that the best classification accuracies of our three improved VLAD algorithms are better than those of the other four schemes on the Corel 10

Table 2 The best classification accuracy comparisons on the Corel 10 dataset (mean ± std-dev)%

Algorithm: Classification Accuracy (Dictionary Size)
BoW [6]: ± 0.91 (340)
ScSPM [38]: ± 1.24 (340)
LLC [36]: ± 1.66 (380)
VLAD [17]: ± 1.47 (110)
VLAD+BoW: ± 0.48 (130)
MVLAD: ± (280)
TNNVLAD: ± 1.45 (130)
Fig. 7 Confusion matrices of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the Corel 10 dataset (%), over the classes beach, buildings, elephants, flowers, food, horses, mountains, owls, skiing and tiger
dataset. Moreover, all the VLAD-based algorithms obtain their best classification accuracy under small dictionary sizes, whereas BoW, ScSPM and LLC need large dictionaries. Our TNNVLAD method is two percentage points higher than LLC, the best of the other methods. The confusion matrices for the Corel 10 dataset are given in Fig. 7. Our algorithms perform well for the classes flowers and horses, and poorly for the class mountains. Figure 8 compares the computation times of the various coding methods under different dictionary sizes on the Corel 10 dataset. The ScSPM algorithm requires the most time of the seven algorithms. Although MVLAD needs more time than BoW and LLC, it still needs far less than ScSPM.

Fig. 8 Computation time comparisons of various coding methods under different dictionary size on the Corel 10 dataset

4.4 15 Scenes dataset

The 15 Scenes dataset [21] contains 15 categories and 4485 images in total, with the number of images per category ranging from 200 to 400. The 15 categories are bedroom, suburb, industrial, kitchen, living room, coast, forest, highway, inside city, mountain, open country, street, tall building, office and store. The image content is diverse, containing not only indoor scenes, such as living room and store, but also outdoor scenery, such as coast and forest. To compare with other methods, we randomly select 100 images per class as training data and use the rest as test data. Figure 9 compares the classification accuracy of the various coding methods under different dictionary sizes on the 15 Scenes dataset. The VLAD-based algorithms achieve better performance than ScSPM and LLC when the dictionary size is small, but become slightly lower than LLC as the dictionary size increases.
Fig. 9 Classification accuracy comparisons of various coding methods under different dictionary size on the 15 Scenes dataset

On the basis of the data in Fig. 9, the most prominent classification accuracies are presented in Table 3. For the 15 Scenes dataset, the best performance of our improved VLAD algorithms is comparable with or slightly lower than that of LLC and ScSPM. The confusion matrices for the 15 Scenes dataset are shown in Fig. 10. Our algorithms perform well for the classes suburb and forest. Besides, the classes bedroom and living room have a high mutual misclassification rate, as do kitchen and living room, which may be because they are visually similar to each other. Figure 11 reports the computation time comparisons of the various coding methods under different dictionary sizes on the 15 Scenes dataset; the ScSPM algorithm again requires the most time of the seven.

Table 3 The best classification accuracy comparisons on the 15 Scenes dataset (mean ± std-dev)%

Algorithm: Classification Accuracy (Dictionary Size)
BoW [6]: ± 0.61 (90)
ScSPM [38]: ± 0.70 (420)
LLC [36]: ± 1.02 (420)
VLAD [17]: ± 0.50 (400)
VLAD+BoW: ± 0.51 (280)
MVLAD: ± 0.50 (280)
TNNVLAD: ± 0.62 (400)
Fig. 10 Confusion matrices of our VLAD+BoW, MVLAD and TNNVLAD algorithms on the 15 Scenes dataset (%), over the classes bedroom, suburb, industrial, kitchen, living room, coast, forest, highway, inside city, mountain, open country, street, tall building, office and store
Fig. 11 Computation time comparisons of various coding methods under different dictionary size on the 15 Scenes dataset

5 Conclusion and future work

In this paper, three feature coding schemes based on VLAD are proposed for image classification. We compare our schemes with some state-of-the-art methods, including BoW, ScSPM, LLC and VLAD. Experiments on different kinds of datasets (UIUC Sports Event, Corel 10 and 15 Scenes) demonstrate that the classification accuracies of our improved VLAD coding strategies are better than those of the four previous classical methods under small dictionary sizes. At the same time, it is noteworthy that our schemes are much faster than ScSPM, because the ScSPM algorithm needs more time to learn the dictionary and encode features with the sparse coding strategy. In many cases, classification accuracy and classification speed need to be considered simultaneously. In the future, we will try to find more efficient feature coding strategies and apply them to large-scale image datasets.

Acknowledgments This work is sponsored by NUPTSF (Grant No. NY214168), the National Natural Science Foundation of China, the Shanghai Science and Technology Committee and the European Union Seventh Framework Programme.

References

1. Arandjelovic R, Zisserman A (2013) All about VLAD. In: IEEE Conference on Computer Vision and Pattern Recognition
2. Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: IEEE Conference on Computer Vision and Pattern Recognition
3. Bosch A, Zisserman A, Muñoz X (2008) Scene classification using a hybrid generative/discriminative approach. IEEE Trans Pattern Anal Mach Intell 30(4)
4. Cinbis RG, Verbeek J, Schmid C (2012) Image categorization using Fisher kernels of non-iid image models.
In: IEEE conference on computer vision and pattern recognition, pp Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):
6. Csurka G, Dance CR, Fan LX, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, ECCV, vol 1
7. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE conference on computer vision and pattern recognition, vol 1
8. Delhumeau J, Gosselin PH, Jégou H, Pérez P (2013) Revisiting the VLAD image representation. In: ACM international conference on multimedia
9. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15(12)
10. Fei-Fei L, Fergus R, Perona P (2007) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Comput Vis Image Underst 106(1)
11. Freund Y, Schapire R (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory
12. Gao SH, Tsang IWH, Chia LT, Zhao PL (2010) Local features are not lonely: Laplacian sparse coding for image classification. In: IEEE conference on computer vision and pattern recognition
13. Grauman K, Darrell T (2005) The pyramid match kernel: discriminative classification with sets of image features. In: International conference on computer vision, vol 2
14. Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset
15. Harada T, Ushiku Y, Yamashita Y, Kuniyoshi Y (2011) Discriminative spatial pyramid. In: IEEE conference on computer vision and pattern recognition
16. Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: IEEE conference on computer vision and pattern recognition
17. Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P, Schmid C (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9)
18. Jurie F, Triggs B (2005) Creating efficient codebooks for visual recognition. In: International conference on computer vision, vol 1
19. Krapac J, Verbeek J, Jurie F (2011) Modeling spatial layout with fisher vectors for image categorization. In: IEEE international conference on computer vision
20. Kulkarni N, Li BX (2011) Discriminative affine sparse codes for image classification. In: IEEE conference on computer vision and pattern recognition
21. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE conference on computer vision and pattern recognition, vol 2
22. Li FF, Pietro P (2005) A Bayesian hierarchical model for learning natural scene categories. In: IEEE conference on computer vision and pattern recognition, vol 2
23. Li LJ, Li FF (2007) What, where and who? Classifying events by scene and object recognition. In: International conference on computer vision
24. Long X, Lu H, Li W (2012) Image classification based on nearest neighbor basis vectors. Multimed Tools Appl
25. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2)
26. Lu Z, Ip HHS (2009) Image categorization with spatial mismatch kernels. In: IEEE conference on computer vision and pattern recognition
27. Moosmann F, Triggs B, Jurie F (2007) Fast discriminative visual codebooks using randomized clustering forests. In: Advances in neural information processing systems
28. Morel J, Yu G (2009) ASIFT: a new framework for fully affine invariant image comparison. SIAM J Imaging Sci 2(2)
29. Perronnin F, Dance C (2007) Fisher kernels on visual vocabularies for image categorization. In: IEEE conference on computer vision and pattern recognition
30. Perronnin F, Sánchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: European conference on computer vision
31. Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE conference on computer vision and pattern recognition
32. Picard D, Gosselin PH (2011) Improving image similarity with vectors of locally aggregated tensors. In: IEEE international conference on image processing
33. Quelhas P, Monay F, Odobez JM, Gatica-Perez D, Tuytelaars T, Van Gool L (2005) Modeling scenes with local descriptors and latent aspects. In: International conference on computer vision, vol 1
34. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. In: International conference on computer vision
35. Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: International conference on computer vision
36. Wang JJ, Yang JC, Yu K, Lv FJ, Huang T, Gong YH (2010) Locality-constrained linear coding for image classification. In: IEEE conference on computer vision and pattern recognition
37. Xu D, Chang S (2008) Video event recognition using kernel methods with multilevel temporal alignment. IEEE Trans Pattern Anal Mach Intell 30(11)
38. Yang JC, Yu K, Gong YH, Huang T (2009) Linear spatial pyramid matching using sparse coding for image classification. In: IEEE conference on computer vision and pattern recognition
39. Yang L, Jin R, Sukthankar R, Jurie F (2008) Unifying discriminative visual codebook generation with classifier training for object category recognition. In: IEEE conference on computer vision and pattern recognition
40. Yu K, Zhang T, Gong YH (2009) Nonlinear learning using local coordinate coding. Adv Neural Inf Process Syst 22
41. Zhou X, Yu K, Zhang T, Huang TS (2010) Image classification using super-vector coding of local image descriptors. In: European conference on computer vision

Xianzhong Long obtained his Ph.D. degree from Shanghai Jiao Tong University. He received his B.S. degree from Henan Polytechnic University in 2007 and his M.S. degree from Xihua University in 2010, both in computer science. He is now an assistant professor at Nanjing University of Posts and Telecommunications. His research interests are computer vision, machine learning and image processing, specifically image classification, object recognition and clustering.
Hongtao Lu got his Ph.D. degree in Electronic Engineering from Southeast University, Nanjing. After graduation he became a postdoctoral fellow in the Department of Computer Science, Fudan University, Shanghai, China, where he spent two years. In 1999, he joined the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, where he is now a professor. His research interests include machine learning, computer vision and pattern recognition, and information hiding. He has published more than sixty papers in international journals, such as IEEE Transactions and Neural Networks, and in international conferences. His papers have received more than 400 citations from other researchers.

Yong Peng received the B.S. degree in computer science from Hefei New Star Research Institute of Applied Technology and the M.S. degree from the Graduate University of the Chinese Academy of Sciences. He is now working towards his Ph.D. degree at Shanghai Jiao Tong University. His research interests include machine learning, pattern recognition and evolutionary computation.
Xianzhong Wang received the B.S. degree in computer science from Anhui University of Technology. He is now a Master's candidate in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests include machine learning and human action recognition.

Shaokun Feng received the B.S. degree in information science from the University of Shanghai for Science and Technology. He is now working towards his M.S. degree at Shanghai Jiao Tong University. His research interests include machine learning, pattern recognition and deep learning.
More informationA Hierarchial Model for Visual Perception
A Hierarchial Model for Visual Perception Bolei Zhou 1 and Liqing Zhang 2 1 MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems, and Department of Biomedical Engineering, Shanghai
More informationSparse coding for image classification
Sparse coding for image classification Columbia University Electrical Engineering: Kun Rong(kr2496@columbia.edu) Yongzhou Xiang(yx2211@columbia.edu) Yin Cui(yc2776@columbia.edu) Outline Background Introduction
More informationSupervised learning. y = f(x) function
Supervised learning y = f(x) output prediction function Image feature Training: given a training set of labeled examples {(x 1,y 1 ),, (x N,y N )}, estimate the prediction function f by minimizing the
More informationTraffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers
Traffic Signs Recognition using HP and HOG Descriptors Combined to MLP and SVM Classifiers A. Salhi, B. Minaoui, M. Fakir, H. Chakib, H. Grimech Faculty of science and Technology Sultan Moulay Slimane
More informationKNOWING Where am I has always being an important
CENTRIST: A VISUAL DESCRIPTOR FOR SCENE CATEGORIZATION 1 CENTRIST: A Visual Descriptor for Scene Categorization Jianxin Wu, Member, IEEE and James M. Rehg, Member, IEEE Abstract CENTRIST (CENsus TRansform
More informationImage classification based on support vector machine and the fusion of complementary features
Image classification based on support vector machine and the fusion of complementary features Huilin Gao, a,b Wenjie Chen, a,b,* Lihua Dou, a,b a Beijing Institute of Technology, School of Automation,
More informationCELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION
U.P.B. Sci. Bull., Series C, Vol. 77, Iss. 4, 2015 ISSN 2286-3540 CELLULAR AUTOMATA BAG OF VISUAL WORDS FOR OBJECT RECOGNITION Ionuţ Mironică 1, Bogdan Ionescu 2, Radu Dogaru 3 In this paper we propose
More informationILSVRC on a Smartphone
[DOI: 10.2197/ipsjtcva.6.83] Express Paper ILSVRC on a Smartphone Yoshiyuki Kawano 1,a) Keiji Yanai 1,b) Received: March 14, 2014, Accepted: April 24, 2014, Released: July 25, 2014 Abstract: In this work,
More information