Improved Spatial Pyramid Matching for Image Classification

Mohammad Shahiduzzaman, Dengsheng Zhang, and Guojun Lu
Gippsland School of IT, Monash University, Australia
{Shahid.Zaman,Dengsheng.Zhang,Guojun.Lu}@monash.edu

Abstract. Spatial analysis of salient feature points has been shown to be promising in image analysis and classification. In the past, spatial pyramid matching has made use of both salient feature points and spatial multiresolution blocks to match between images. However, different images or blocks can still have similar features under spatial pyramid matching; the analysis and matching are more accurate in scale space. In this paper, we propose to perform spatial pyramid matching in scale space. Specifically, pyramid match histograms are computed at multiple scales to refine the kernel for support vector machine classification. We show that the combination of salient point features, scale space, and spatial pyramid matching improves the original spatial pyramid matching significantly.

1 Introduction

Image classification has attracted a large amount of research interest in the past few decades due to the ever increasing volume of digital image data generated around the world. Traditionally, images are represented and retrieved using low-level features. Recently, machine learning tools have been widely used to classify images into semantic categories, so low-level features can now be used more effectively than ever. Image classification is an important application in computer vision. Our research goal is to improve methods for image classification, more specifically for natural scene images or images with some spatial configuration. We want to classify an image according to the semantic category of its scene, such as forest, road, or building. Our approach to whole-image categorization employs two renowned techniques, namely Spatial Pyramid Matching (SPM) [1] and scale space theory; our objective is to combine the power of these two methods.

In this paper, scene categorization is attempted with a global image representation developed from low-level image properties. An alternative approach is to obtain high-level semantic attributes by segmenting the objects in the scene (like a bed or a car) and classifying the scene accordingly. We believe scene classification can be done without extracting these high-level object cues. This is inspired by [2], where the authors showed that people can recognize natural scenes while overlooking most of the details in them (i.e., the constituent objects). Another study [3] showed that global information is as important as local information for scene classification by human subjects.

Scale is an important aspect of local feature detection and prominent cue detection in images. The most prominent example of using scale space and characteristic scale is the local invariant feature detector SIFT [4]. In SIFT, the authors use the maxima/minima across neighboring levels of scale space to find the interest points, or keypoints, of an image. Scene features like sand on a beach or certain textures in the curtains of a room are more evident at larger scales. Scale-space theory is a framework for multi-scale signal representation. It is a formal theory for handling image structures at different scales, representing an image as a one-parameter family of smoothed images, the scale-space representation, parameterized by the size of the smoothing kernel used for suppressing fine-scale structures [5].

In recent years the bag-of-features (BoF) model has been extremely popular in image categorization. The method treats an image as a collection of unordered appearance descriptors extracted from local patches. The patches or descriptors are quantized into discrete visual words of a codebook dictionary, and the image histograms are then compared and classified according to the dictionary. The BoF approach discards the spatial order of local descriptors, which severely limits the descriptive power of the image representation. By overcoming this problem, one particular extension of the BoF model, called spatial pyramid matching (SPM) [1], has achieved remarkable success on a range of image classification benchmarks and has been a major component of state-of-the-art systems, e.g., [6].

Our method is based on SPM. Like SPM, we use the subdivide-and-disorder principle: partition the image into smaller blocks and calculate orderless statistics of low-level image features within each block. Existing methods differ in the choice of features (such as pixel values, gradient orientations, or filter bank outputs) and the subdivision scheme (regular grids, quadtrees, or flexible image windows). Both SPM and our method are independent of the choice of features; any other type of feature can be plugged in to obtain a classification result. The authors of [7] offered an early insight into the subdivide-and-disorder principle by suggesting that locally orderless images play an important role in visual perception. While the SPM authors did not consider the Gaussian scale space of apertures from [7], we integrate that idea into SPM. The importance of locally orderless statistics is also evident from several recent publications. To summarize, our method provides a unified framework that combines the gains from the subdivide-and-disorder principle and scale-space apertures with a free choice of low-level features. It combines locally orderless statistics from multiple scales and from a fixed hierarchy of rectangular windows to achieve the scene classification task.

2 Related Methods

In this work we combine the power of multiresolution histograms with spatial pyramid matching, so our method consists of two concepts: multiresolution (scale space) analysis of the image, and spatial pyramid matching. In kernel-based learning methods like the support vector machine (SVM), we need to provide a kernel for learning and testing.

Fig. 1. Schematic illustration of the pyramid match kernel with two levels.

There are many kernels, which vary in formulation. For example, the histogram intersection kernel is a kernel matrix built by histogram intersection; essentially, it provides a pairwise similarity measure between the training and testing images. A pyramid match kernel (PMK) [8] works with an unordered image representation. The idea of the method is to compute multiresolution histograms and find the histogram intersection at each resolution. In figure 1, for two different images X and Y, histograms and the corresponding histogram intersections are computed at three resolution levels (0, 1, 2). From one level to the next coarser one, the bin size is doubled while the number of bins is halved. All new histogram matches at each resolution are then weighted and summed to form the histogram intersection kernel. The PMK has the limitation of discarding all spatial information.

Let us construct a sequence of grids at resolutions 0, 1, ..., L such that the grid at level $l$ has $2^l$ cells along each dimension. The number of matches $I_l$ at level $l$ is given by the histogram intersection function, so the number of new matches found at level $l$ is $I_l - I_{l+1}$ for $l = 0, 1, ..., L-1$. The weight associated with level $l$ is set to $\frac{1}{2^{L-l}}$.

Spatial pyramid matching (SPM) takes a different approach, performing pyramid matching in the two-dimensional image space and using traditional clustering techniques in feature space. In SPM the histogram computation is done at a single resolution, over multiple pyramid levels within that resolution, whereas in PMK it is done over multiple resolutions. PMK does not employ any feature clustering; it maps features directly into multiresolution histogram bins. SPM, on the other hand, uses feature clustering during histogram computation to find representative feature sets. In SPM, all feature vectors are first quantized into M discrete types (i.e., the total number of histogram indices is M). Figure 2 shows an example of constructing a three-level spatial pyramid. The image has three types of features, indicated by triangles, circles, and stars. In the top row, the image is subdivided at three different levels of resolution. In the bottom row, the number of features that fall in each subregion is counted. The spatial histograms are weighted according to the pyramid match kernel.
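The telescoping computation above is compact enough to state directly in code. The following is a minimal sketch, not the authors' implementation, of the pyramid match score between two images given their per-level histograms; the function names are ours, and the weights $1/2^{L-l}$ follow the definition above.

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Number of matches between two histograms: the sum of bin-wise minima.
    return np.minimum(h1, h2).sum()

def pyramid_match_score(hists_x, hists_y, L):
    """Weighted sum of new matches over grid levels 0 (coarsest) to L (finest).

    hists_x[l], hists_y[l]: histograms of images X and Y over the grid
    with 2**l cells along each dimension.
    """
    # I[l] counts the matches found at level l.
    I = [histogram_intersection(hists_x[l], hists_y[l]) for l in range(L + 1)]
    score = I[L]  # matches at the finest level carry full weight
    for l in range(L):
        # New matches appearing only at coarser level l are discounted
        # by the weight 1 / 2**(L - l).
        score += (I[l] - I[l + 1]) / 2 ** (L - l)
    return score
```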

Fig. 2. Three-level spatial pyramid example.

During kernel computation, each feature type m contributes two sets of two-dimensional vectors, $X_m$ and $Y_m$, representing the coordinates of features of type m found in the respective images. The final kernel is then the sum of the separate channel kernels:

$K^L(X, Y) = \sum_{m=1}^{M} K^L(X_m, Y_m)$  (1)

This method reduces to a standard bag of features when there is a single level. Since the pyramid match kernel is simply a weighted sum of histogram intersections, and $c \min(a, b) = \min(ca, cb)$ for positive numbers, $K^L$ can be implemented as a single histogram intersection of long vectors formed by concatenating the appropriately weighted histograms of all channels at all resolutions. So essentially we weight the histograms before computing the histogram intersection for convenience, as the reverse order would yield the same result. For L pyramid levels, M channels, and S scales, the resulting vector has dimensionality

$\left( M \sum_{l=0}^{L} 4^{l} \right) S = M \, \frac{1}{3} \left( 4^{L+1} - 1 \right) S$  (2)

Several experiments reported in the results section use the settings M = 200, L = 3, and S = 3, resulting in (3 × 17000)-dimensional histogram intersections. However, these operations are efficient: because the histogram vectors are extremely sparse, the computational complexity of the kernel is linear in the number of features.
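Equations (1) and (2) suggest a simple implementation. The sketch below, with names of our choosing, pre-weights the per-level histograms (expanding the telescoping sum of new matches gives weight $1/2^L$ for level 0 and $1/2^{L-l+1}$ for levels $l \ge 1$) and concatenates them, so that a single histogram intersection of the long vectors evaluates the whole kernel.

```python
import numpy as np

def spm_long_vector(level_histograms, L):
    """Pre-weight and concatenate per-level spatial histograms.

    level_histograms[l] holds the flattened M-bin histograms of the
    4**l spatial cells at pyramid level l.
    """
    weighted = []
    for l, h in enumerate(level_histograms):
        # Weights from expanding the weighted sum of new matches.
        w = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        weighted.append(w * np.asarray(h, dtype=float))
    return np.concatenate(weighted)

def intersection(v1, v2):
    # Since c*min(a, b) = min(c*a, c*b), one intersection of the weighted
    # long vectors equals the weighted sum of per-level intersections.
    return np.minimum(v1, v2).sum()

# Dimensionality check against equation (2): M = 200, L = 3, S = 3 gives
# 200 * (4**(3 + 1) - 1) // 3 * 3 = 51000, i.e. 3 x 17000 entries.
```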

One important aspect of the training and test images is that we run the experiments only on gray-level images; even when color images are available, we convert them to gray level. This decision follows the finding of [9] that removing color information from images does not make scene categorization tasks more attention-demanding.

3 Proposed Method: Multi-scale SPM

SPM uses a mechanism that combines local salient features and their spatial relationships so as to provide robust feature matching. However, in many cases different images or blocks can have similar histograms, and this degrades the performance of SPM. This drawback can be overcome by analyzing images in scale space, as such confusions can be resolved at different scales. For example, in figure 3, images (a) and (c) are artificially generated images with almost identical histograms; after Gaussian blurring, their histograms become noticeably more discriminative than the originals.

Fig. 3. (a) and (c) are different images with almost similar image histograms (b) and (d). (e) and (g) are the corresponding Gaussian-blurred images, and the previously small difference in histograms is now more prominent at higher scales (f and h).

For a given image f(x, y), its linear (Gaussian) scale-space representation is a family of derived signals L(x, y; t) defined by the convolution of f(x, y) with the Gaussian kernel

$g_t(x, y) = \frac{1}{2\pi t} e^{-\frac{x^2 + y^2}{2t}}$

such that

$L(x, y; t) = (g_t * f)(x, y)$  (3)

Inspired by scale-space theory, we propose a multi-scale spatial pyramid matching method. The key idea behind our method is to use scale space to gain more discriminative power in classification.
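As an illustration of equation (3), the sketch below builds the scale-space stack with SciPy's Gaussian filter. Note that `gaussian_filter` is parameterized by the standard deviation rather than the variance t, so we pass $\sigma = \sqrt{t}$; the function name and the example scales are our own choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space(image, ts):
    """Gaussian scale-space stack L(x, y; t) for the scale parameters ts."""
    image = np.asarray(image, dtype=float)
    # Convolution with g_t of equation (3) equals Gaussian smoothing
    # with standard deviation sigma = sqrt(t).
    return [gaussian_filter(image, sigma=np.sqrt(t)) for t in ts]

# Example: three scale levels with doubling variance.
# stack = scale_space(img, ts=[1.0, 2.0, 4.0])
```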

Fig. 4. Block diagram of the proposed method.

The major steps of our algorithm (figure 4) are as follows.

3.1 Feature Generation in Different Scales

First, SIFT features are generated from all the images at each scale on a regular grid. A dense feature representation is used here to avoid problems from superfluous data such as clutter and occlusion. 128-dimensional SIFT descriptors are calculated for all images at all scales on a regular grid with 8×8 spacing, using a 16×16 patch centered at each grid point. These features are saved to files for use in later steps.

3.2 Calculate Dictionary

The features are clustered according to the parameter M, which is the total number of bins in the computed histograms. It is often believed that increasing M will increase classification accuracy, but in our experiments the M = 200 setup gives accuracy comparable to M = 400 and M = 600. Again, the dictionary is built from all images at all scales. Each dictionary is calculated by K-means clustering over all the SIFT features extracted at a specific scale, with separate dictionaries calculated for separate scales. Figure 5 (left) shows the histogram of the values of a 200-word dictionary. The dictionaries are used for histogram generation in the later stages.
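The two steps above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes OpenCV's SIFT and scikit-learn's MiniBatchKMeans as stand-ins, with the 8-pixel grid step and 16×16 patch size taken from Section 3.1.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def dense_sift(gray, step=8, patch=16):
    """SIFT descriptors sampled on a regular grid (dense representation)."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(patch))
                 for y in range(step // 2, gray.shape[0], step)
                 for x in range(step // 2, gray.shape[1], step)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors  # one 128-dimensional descriptor per grid point

def build_dictionary(descriptor_sets, M=200, seed=0):
    """K-means codebook of M visual words for one scale."""
    km = MiniBatchKMeans(n_clusters=M, random_state=seed, n_init=3)
    km.fit(np.vstack(descriptor_sets))  # all features from that scale
    return km  # km.cluster_centers_ are the dictionary entries
```

Quantization then assigns each descriptor to its nearest word via `km.predict`, which feeds the histogram compilation in the next step.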

Fig. 5. Histogram plot of the calculated dictionary (left) and combined pyramid histogram plot of all individual histograms at different levels (right).

3.3 Compile Pyramid Histogram

For each scale, the image is divided from coarse to fine resolution, a histogram is computed in each region, and weights are assigned according to the PMK: a match at a finer resolution is given more weight than a match at a coarser resolution. After these steps we have all the data required to build the pyramid histogram. Given the histograms at the different scale levels, we can either simply concatenate them into one long histogram or compute an inter-scale intersection/selection before concatenating. We take the first approach in our method. Although this increases the size of the long histogram by the scale factor, that is not a problem performance-wise; our focus in this research is on increasing classification accuracy while leveraging currently available powerful hardware. Figure 5 (right) shows one such combined pyramid histogram. According to equation 2, the size of the histogram is 34000 for dictionary size 200, 3 pyramid levels, and scale level 2.

3.4 Kernel Computation and SVM Classification

For the SVM, we only need to build the histogram intersection kernel from the compiled pyramid histograms. As explained before, the histogram intersection kernel computation just requires finding the intersections of the long concatenated histograms formed in the previous step. For the training kernel, the intersection is computed between pairs of training histograms; for the testing kernel, it is computed between training and testing histograms. A gray-scale image map of the training and testing kernels is shown in figure 6. In the training kernel, a white line is visible along the diagonal, since each training image matches itself perfectly. In the testing kernel the matches are scattered, as the training and testing sets are different. For the SVM, we use a modified version of the libsvm library [10] that implements one-vs-all classification.
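Putting Sections 3.3 and 3.4 together, the sketch below bins quantized visual words into per-level spatial histograms and trains an SVM on a precomputed intersection kernel. The paper uses a modified libsvm; scikit-learn's SVC with kernel='precomputed' is our stand-in here, and the helper names (including `spm_long_vector` from the Section 2 sketch) are ours.

```python
import numpy as np
from sklearn.svm import SVC

def spatial_histograms(words, positions, img_shape, M, L):
    """Per-level spatial word histograms for one image at one scale.

    words: visual-word index of each dense feature; positions: its (x, y)
    location. Returns, for each level l, the flattened histograms of its
    4**l cells, ready for weighting and concatenation by spm_long_vector.
    """
    h, w = img_shape
    levels = []
    for l in range(L + 1):
        cells = 2 ** l
        hist = np.zeros((cells, cells, M))
        for m, (x, y) in zip(words, positions):
            cy = min(int(y * cells / h), cells - 1)
            cx = min(int(x * cells / w), cells - 1)
            hist[cy, cx, m] += 1
        levels.append(hist.ravel())
    return levels

def intersection_kernel_matrix(A, B):
    # K[i, j] = sum_k min(A[i, k], B[j, k]) over the long pyramid vectors.
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

# X_train, X_test: one multi-scale concatenated pyramid histogram per row.
# K_train = intersection_kernel_matrix(X_train, X_train)  # train vs. train
# K_test = intersection_kernel_matrix(X_test, X_train)    # test vs. train
# clf = SVC(kernel="precomputed").fit(K_train, y_train)
# predictions = clf.predict(K_test)
```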

Fig. 6. Histogram intersection kernel shown as an image for the training images (left) and testing images (right).

4 Experimental Results

4.1 Test Dataset

We tested our method on the scene category dataset [1], Caltech-101 [11], and Caltech-256 [12]. A brief statistical comparison of these three datasets is given in table 1.

4.2 Performance Metric

Two separate performance metrics are used to measure the results: combined accuracy and average of per-class accuracy. Per-class accuracy (P) is defined as the ratio of correctly classified images in a class to the total number of images in that class. If the total number of image categories is N, then the two metrics are defined as:

$\text{Average of per-class accuracy} = \frac{\sum_{i=1}^{N} P_i}{N}$  (4)

$\text{Combined accuracy} = \frac{\text{Total number of correctly classified images}}{\text{Total number of images in the dataset}} \times 100$  (5)

Table 1. Statistical information of the image datasets used

Dataset          No. of categories   Total no. of images   Avg. image size   Max. no. of train/test images used
Scene category   15                  4485                  300×250           100/rest
Caltech-101      102                 9144                  300×200           30/300
Caltech-256      257                 30607                 351×300           60/300
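In code, the two metrics of equations (4) and (5) reduce to a few lines; the following is a minimal sketch that returns both as percentages, matching the tables.

```python
import numpy as np

def combined_accuracy(y_true, y_pred):
    # Equation (5): correctly classified images over all images, in percent.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * (y_true == y_pred).mean()

def average_per_class_accuracy(y_true, y_pred):
    # Equation (4): mean over the N classes of the per-class accuracy P_i.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return 100.0 * np.mean(per_class)
```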

Table 2. Accuracy results for different combinations of parameters. The best result for each codebook size and pyramid level is marked with *.

Codebook size   Pyramid level   Scale level   Combined accuracy (%)   Avg. of per-class accuracy (%)
200             3               1             81.47 ± 0.59            81.11 ± 0.68
200             3               2             83.69 ± 0.50*           83.31 ± 0.59*
200             3               3             83.45 ± 0.57            83.21 ± 0.61
200             2               1             79.88 ± 0.52            81.1 ± 0.30
200             2               2             82.69 ± 0.67            82.25 ± 0.52*
200             2               3             82.78 ± 0.70*           82.21 ± 0.75
400             3               1             81.95 ± 0.57            81.1 ± 0.60
400             3               2             83.78 ± 0.64*           83.48 ± 0.58*
400             3               3             83.71 ± 0.54            83.29 ± 0.70
400             2               1             80.28 ± 0.53            81.4 ± 0.50
400             2               2             83.22 ± 0.44*           82.75 ± 0.40*
400             2               3             83.10 ± 0.63            82.67 ± 0.78

Table 3. Our result compared to the original SPM for codebook size = 400, pyramid level = 3, and scale level = 2

                                     SPM [1]        Proposed method
Average of per-class accuracy (%)    81.1 ± 0.60    83.48 ± 0.58
Combined accuracy (%)                81.95 ± 0.57   83.78 ± 0.64

Table 4. Caltech-101 results for codebook size = 400, pyramid level = 3, and scale level = 3

                                     SPM [1]        Proposed method
Average of per-class accuracy (%)    64.6 ± 0.7     67.36 ± 0.17
Combined accuracy (%)                70.59 ± 0.16   76.65 ± 0.46

Table 5. Caltech-256 results for codebook size = 400, pyramid level = 3, and scale level = 3

                                     SPM [12]       Proposed method
Average of per-class accuracy (%)    32.62 ± 0.41   37.54 ± 0.31
Combined accuracy (%)                34.98 ± 0.60   40.19 ± 0.12

Table 2 presents the extensive experiments over codebook size, pyramid level, and scale level, with results grouped by codebook size and pyramid level. The notable observation is that a scale level greater than one always produces better results than a single scale level. Using the combined accuracy metric, we obtain our best result with codebook size 400, pyramid level 3, and scale level 2. Scale level 1 is essentially the original SPM, so for scale level 1 we use the results from [1]. However, as the authors of [1] did not report combined accuracy, we calculated it using our own implementation of SPM. All results are obtained on a machine with two 64-bit quad-core processors and 48 GB of RAM.

Fig. 7. Per-class accuracy for the best result (average of per-class accuracy) reported in table 2.

All experiments are run ten times with randomly selected training and testing images; the average and standard deviation over all runs are reported here. Table 3 summarizes our best result compared to the original SPM. Figure 7 shows the per-class accuracy for the best result reported in table 2. Our method outperforms SPM in eleven categories and provides comparable performance in the remaining four. We tested whether the difference between the two methods reported in table 2 is statistically significant using the Matlab function ttest; the test indicated that the improvement obtained by the proposed method is indeed statistically significant. The results on Caltech-101 and Caltech-256 are presented in tables 4 and 5 and are in line with the results obtained on the scene category dataset. On both of these databases, the proposed method is better than SPM by a margin of around 3% according to the average of per-class accuracy metric, and by around 6% according to the combined accuracy metric.
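The significance test can be reproduced outside Matlab. We assume a paired test over the ten matched runs, which is what Matlab's ttest computes on the paired differences; the sketch below uses SciPy's equivalent.

```python
from scipy.stats import ttest_rel

def paired_significance(acc_proposed, acc_spm, alpha=0.05):
    """Paired t-test over matched runs (Matlab's ttest on the differences)."""
    t_stat, p_value = ttest_rel(acc_proposed, acc_spm)
    return t_stat, p_value, p_value < alpha

# Example: per-run combined accuracies over the ten random splits.
# t, p, significant = paired_significance(runs_proposed, runs_spm)
```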

5 Conclusion and Future Scope

This paper presents an improvement to the spatial pyramid matching scheme. We provide a simple, intuitive, and effective way to improve the SPM method. To the best of our knowledge, this has not been done by previous researchers. The proposed extension is quite general, not limited to any specific feature descriptors or classifiers, and can be used as a drop-in module or new baseline for SPM in image categorization systems. The weighting mechanism of the spatial pyramid matching (SPM) method is not sophisticated enough: it uniformly assigns a higher weight to the finer-resolution blocks and penalizes the coarse-resolution blocks with a lower weight. As a basic method this is acceptable, but consider a finer-resolution block containing only background or clutter; assigning it more weight only misleads the calculation. In the future, there is therefore room to redesign this weighting mechanism so that more weight is assigned only to the relevant blocks, irrespective of scale or spatial resolution.

References

1. Lazebnik, S., Schmid, C., Ponce, J.: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169-2178 (2006)
2. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42(3), 145-175 (2001)
3. Vogel, J., Schwaninger, A., Wallraven, C., Bülthoff, H.H.: Categorization of Natural Scenes: Local versus Global Information and the Role of Color. ACM Transactions on Applied Perception 4(3) (2007)
4. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91-110 (2004)
5. Witkin, A.P.: Scale-space filtering. In: Proceedings of the 8th International Joint Conference on Artificial Intelligence, pp. 1019-1022 (1983)
6. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge. In: VOC 2009 (2009), http://www.pascal-network.org/challenges/voc/voc2009/workshop/index.html
7. Koenderink, J., van Doorn, A.: The structure of locally orderless images. International Journal of Computer Vision 31(2-3), 159-168 (1999)
8. Grauman, K., Darrell, T.: The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In: Proceedings of the IEEE International Conference on Computer Vision, ICCV (2005)
9. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2005)
10. Chang, C., Lin, C.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
11. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proceedings of the IEEE Workshop on Generative-Model Based Vision, CVPR (2004)
12. Griffin, G., Holub, A., Perona, P.: Caltech-256 Object Category Dataset. Technical Report, Caltech (2007)