Comparison of Fuzzy C Means, K-Means and K-Medoids for Clustering in the Bag Of Visual Words Algorithm


Daniel Fernando Tello Gamarra
Control and Automation Engineering, Universidade Federal de Santa Maria (UFSM)
Santa Maria, Rio Grande do Sul, Brazil, CEP
daniel.gamarra@ufsm.br

Marco Antonio De Souza Leite Cuadros
Instituto Federal do Espirito Santo (IFES)
Serra, Espirito Santo, Brazil, CEP
marcoantonio@ifes.edu.br

Abstract

This paper compares the performance of three clustering techniques within the framework of the bag of visual words algorithm. The bag of visual words is a state-of-the-art technique for image classification, and one of its key steps is the construction of a codebook with a clustering technique. Simulations are run on an object database using the fuzzy c means, k-medoids and k-means algorithms in order to evaluate their influence on the bag of visual words algorithm when it is used for object classification.

I. INTRODUCTION

Human beings have an outstanding capacity for recognizing objects and for distinguishing among objects that belong to the same class. If machines could reproduce or mimic human object recognition with a good level of accuracy, this could pave the way for many applications in which machines interact with their environment more successfully. One of the state-of-the-art algorithms in object recognition is the Bag of Visual Words (BOV), initially proposed in [1] and inspired by the Bag of Words (BoW) algorithm created for text search applications. An essential step of the algorithm is the creation of a codebook with a clustering technique, k-means being the technique most frequently used for this purpose.
Other clustering algorithms have also been explored for the BOV in previous works. Dell'Agnello [2] studied the application of the Fuzzy C Means (FCM) algorithm and the Kernel Fuzzy C Means (KFCM) algorithm to the construction of visual codebooks; the results showed that KFCM performs better than FCM, but at a higher computational cost. Sujatha [3] implemented the BOW technique with a multiple-dictionary bag of words, in which the dictionaries are created using the Fuzzy C Means algorithm and their information is concatenated during image classification. Zhou [4] proposed a structured SOM (Self Organizing Maps) and a Spatial Constrained Fuzzy C Means with a hierarchical technique to improve the bag of visual words model by exploiting the spatial information of the data during clustering. Sujatha [5] also explored the application of the K-means and K-medoids clustering techniques to the bag of visual words.

This paper focuses on the evaluation of three clustering algorithms for the codebook construction of an image dataset in the BOV technique. The remainder of the paper is organized as follows. The second section describes the algorithms used in this work; the third section presents the experimental setup built around the Bag of Visual Words algorithm; the fourth section describes the experiments; finally, conclusions are drawn in the last section.

II. THEORETICAL BACKGROUND

A. Bag of Visual Words

The Bag of Visual Words algorithm is derived from the Bag of Words (BoW) algorithm, a successful technique applied in text retrieval.
The technique can be summarized in five steps according to [5]: (i) feature extraction from the objects, (ii) feature pool, (iii) codebook generation from the extracted features using clustering techniques, (iv) training, and (v) classification using techniques such as Support Vector Machines (SVM) or K-Nearest Neighbor (KNN) for object recognition. In the first step, as stated in [6], the BOV approach traditionally models the distribution of low-level local features using the scale-invariant feature transform (SIFT) [7], which computes the orientation and gradient of the keypoints from gray-level information. The k-means algorithm is then used to cluster the set of feature descriptors into k clusters; the center of each cluster is called a visual word, and the set of these visual words forms the codebook. Based on the visual words, as stated in [6], a histogram is generated by counting their numbers of occurrences. This histogram is defined as the BOV representation.

B. K-Means

The description of the K-means algorithm follows Bishop's book [8], which states that k-means clustering is applied to identify groups, or clusters, of data points in a multidimensional space. Given a data set $\{x_1, \ldots, x_N\}$, the goal is to partition the data into some number $K$ of clusters.
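As a concrete illustration of this partitioning goal, the following is a minimal Python sketch of the k-means iteration; the assignment and mean-update steps it alternates are the ones formalized in the remainder of this section. It is an illustration only, not the Matlab code used in the experiments: the function names, the toy data, and the use of a fixed random seed are assumptions made here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iter=50, seed=0):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    rng = random.Random(seed)
    # Initial centers are drawn at random from the data set.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the iteration has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels

# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(data, k=2)
print(labels)
```

In a BOV setting the points would be local descriptors and the returned centers would play the role of visual words; keeping the previous center for an empty cluster is one simple convention among several.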

Let $v_k$ denote the center of cluster $k$. Every data point $x_n$ has a corresponding set of binary indicator variables $r_{nk} \in \{0, 1\}$: if data point $x_n$ is assigned to cluster $k$, then $r_{nk} = 1$ and $r_{nj} = 0$ for $j \neq k$. An objective function, called a distortion measure, is then defined as

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - v_k \|^2 \tag{1}$$

The algorithm tries to minimize this cost function. First, initial values for the $v_k$ are chosen randomly from the data set. With the $v_k$ fixed, $J$ can be minimized for each $n$ separately by setting $r_{nk}$ to 1 for whichever value of $k$ gives the minimum value of $\| x_n - v_k \|^2$. In other words, the $n$th data point is simply assigned to the closest cluster center. Formally:

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - v_j \|^2 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

The cluster means are then recomputed with the equation

$$v_k = \frac{\sum_{n} r_{nk} \, x_n}{\sum_{n} r_{nk}} \tag{3}$$

The two phases of the algorithm, re-assigning data points to clusters and re-computing the cluster means, are repeated until there is no further change in the assignments or a maximum number of iterations is exceeded.

C. K-Medoids

As stated in [9], in contrast to the k-means algorithm, k-medoids selects data points as centers (medoids or exemplars), so a medoid is the most centrally located point in its cluster. As Shah and Singh state, the most common realization of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm, which is described through the following steps:

1) Initialize: randomly select k of the n data points as the medoids.
2) Assignment step: associate each data point with the closest medoid.
3) Update step: for each medoid m and each data point o associated with m, swap m and o and compute the total cost of the configuration. Select the medoid o with the lowest cost of the configuration.
4) Repeat: alternate steps 2 and 3 until there is no change in the assignments.

D. Fuzzy C Means (FCM)

As mentioned in [10], the popularity and usefulness of fuzzy C means result from three facts.
The algorithm is very simple, it is unsupervised, and it always converges. The fuzzy C means algorithm uses concepts from the field of fuzzy logic and fuzzy set theory. It is a very popular algorithm with different applications in image processing, such as segmentation [13]. The algorithm can be detailed following the description written in [11] and summarized in [12].

Given the data set $X$, choose the number of clusters $c$ with $1 < c < N$, where $N$ is the number of data objects; the weighting exponent $m > 1$, whose tuning can influence the performance of the algorithm; the termination tolerance $\varepsilon > 0$; and the norm-inducing matrix $A$. The shape of the clusters is determined by the choice of $A$; a common choice is $A = I$, which gives the standard Euclidean norm, as stated in [14]. The membership of the data samples in the clusters is described by the fuzzy partition matrix $U$, which is initialized randomly. The weight $\mu_{ik}$ represents the degree of membership of object $k$ in cluster $i$ and is contained in the $k$th column of $U$. Every object belongs to every cluster to some degree, and each cluster is characterized by its centroid $v_i$. The algorithm tries to minimize the cost function

$$J_{FCM} = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^{m} \, \| x_k - v_i \|^2 \tag{4}$$

Let $l$ denote the iteration number. The first step is to compute the cluster centroids:

$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^{m} x_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^{m}}, \quad 1 \le i \le c \tag{5}$$

The second step is to compute the squared distances between every data point and every centroid in the norm induced by $A$:

$$d_{ikA}^{2} = \left( x_k - v_i^{(l)} \right)^{T} A \left( x_k - v_i^{(l)} \right), \quad 1 \le i \le c, \; 1 \le k \le N \tag{6}$$

The third step is to update the elements of the partition matrix. If $d_{ikA} > 0$ for all $1 \le i \le c$, $1 \le k \le N$:

$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( d_{ikA} / d_{jkA} \right)^{2/(m-1)}} \tag{7}$$

Otherwise, when some distance is zero for object $k$, set $\mu_{ik}^{(l)} = 0$ for every cluster with $d_{ikA} > 0$ and choose the remaining $\mu_{ik}^{(l)} \in [0, 1]$ so that $\sum_{i=1}^{c} \mu_{ik}^{(l)} = 1$.
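The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab Fuzzy Logic Toolbox routine used in the experiments: it assumes $A = I$ (Euclidean norm), measures the change in $U$ with the largest absolute element difference, and splits the membership evenly among clusters whose centroid coincides with a data point. The function names, defaults, and toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance, i.e. d^2 with A = I."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fuzzy_c_means(points, c, m=1.2, eps=1e-5, max_iter=20, seed=0):
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix U (c rows, n columns), columns sum to 1.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    centers = []
    for _ in range(max_iter):
        # Step 1: centroids as means weighted by memberships raised to m.
        w = [[u[i][k] ** m for k in range(n)] for i in range(c)]
        centers = [[sum(w[i][k] * points[k][d] for k in range(n)) / sum(w[i])
                    for d in range(dim)] for i in range(c)]
        # Step 2: squared distances from every point to every centroid.
        d2 = [[sq_dist(points[k], centers[i]) for k in range(n)] for i in range(c)]
        # Step 3: membership update; (d_ik/d_jk)^(2/(m-1)) = (d2_ik/d2_jk)^(1/(m-1)).
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            zeros = [i for i in range(c) if d2[i][k] == 0.0]
            if zeros:
                # Singular case: share the membership among zero-distance clusters.
                for i in zeros:
                    new_u[i][k] = 1.0 / len(zeros)
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1))
                                            for j in range(c))
        # Stop when the partition matrix changes by less than eps.
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < eps:
            break
    return centers, u

# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(data, c=2)
hard = [max(range(2), key=lambda i: u[i][k]) for k in range(len(data))]
print(hard)
```

When a hard codebook is needed, the centroids can be used directly as visual words; the last two lines show how the fuzzy memberships can be reduced to hard labels by taking the cluster with the largest membership.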
This procedure is repeated for a fixed number of iterations or until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$, meaning that the change in the fuzzy partition matrix between consecutive iterations is below a given threshold.

III. EXPERIMENTAL SETUP

The experiments in this paper used the code from the hands-on session "Advanced Bag-of-Words Models for Visual Recognition", developed by Ballan and Seidenari and available for download at [15]. All the code is written in Matlab, using functions from the Matlab Image Processing Toolbox and the libsvm library, which is written in C and called from Matlab. The Fuzzy C Means and K-Medoids clustering algorithms were taken from the Matlab clustering and fuzzy logic toolboxes; the K-Means algorithm is the one included in Ballan's code. The classification step uses Support Vector Machines (SVM) as linear classifiers through the libsvm library.
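In this pipeline, each image reaches the classifier as a histogram over the codebook, as described in Section II-A. The following Python sketch illustrates that encoding step only; it is not the Matlab code of [15], and the function name, the toy codebook, and the toy descriptors are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def bov_histogram(descriptors, codebook):
    """Count how many descriptors fall closest to each visual word."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda j: sq_dist(d, codebook[j]))
        hist[nearest] += 1
    return hist

# Toy example (hypothetical): a codebook of two visual words in 2-D.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(0.5, 0.2), (1.0, -0.3), (9.0, 10.5), (0.1, 0.1)]
histogram = bov_histogram(descriptors, codebook)
print(histogram)
```

With a 500-word codebook the histogram would have 500 bins, and one such vector per image would be passed to the linear SVM.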


The algorithms were tested on a set of images taken from the Caltech dataset. The dataset was divided into two subsets, one with 4 object categories and another with 15 object categories. The first subset is shown in Figure 1 and contains four types of images: faces, motorbikes, cars and airplanes. Every type has 450 images, i.e. the face type has 450 different faces. The second subset is shown in Figure 2 and has the following types of images: bonsai, butterfly, crab, elephant, euphonium, faces, grandpiano, joshuatree, leopards, lotus, motorbikes, schooner, stopsign, sunflower and watch. The number of images per type varies between approximately 60 and 240. Dense SIFT (DSIFT) descriptors were used instead of the SIFT feature detector in order to increase the number of features extracted from the objects in the two sets of images. The number of visual words constructed was 500; the number of features was [...]. The number of images selected for training was 30, and the number selected for testing was 50.

IV. RESULTS

Experiments were run with the two subsets of the Caltech dataset, one with the 4-object subset and another with the 15-object subset. This section describes the algorithms used and the results obtained. Both subsets used the same clustering algorithms with the same parameters, which are briefly described next.

For the K-Means algorithm, a maximum of 50 iterations was defined. The K-Medoids algorithm used the Euclidean distance. For the Fuzzy C Means algorithm, the number of iterations was 20, the weighting exponent (m) was 1.2, and the termination tolerance (ε), also called the minimum amount of improvement, was set to 1e-5; we tried different values for m, but the performance of the clustering decreased.
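For reference, the PAM steps of Section II-C with the Euclidean distance can be sketched in Python as follows. This is a hedged illustration, not the Matlab toolbox routine used in the experiments: it defines the cost of a configuration as the sum of distances from each point to its closest medoid, and it tries every non-medoid point as a swap candidate, which is a superset of the swaps listed in step 3. The function names and the toy data are hypothetical.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of Euclidean distances from every point to its closest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k data points (by index) as the initial medoids.
    medoids = rng.sample(range(len(points)), k)
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        best_cost, best_medoids = cost, medoids
        # Step 3: try swapping each medoid with each non-medoid point.
        for i in range(k):
            for o in range(len(points)):
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                c = total_cost(points, candidate)
                if c < best_cost:
                    best_cost, best_medoids = c, candidate
        if best_medoids is medoids:
            break  # no swap lowers the cost any further
        cost, medoids = best_cost, best_medoids
    # Step 2: final assignment of every point to its closest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, labels = pam(data, k=2)
print(sorted(medoids), labels)
```

Unlike k-means, the returned centers are indices of actual data points, so in a BOV codebook each visual word would be one of the extracted descriptors. The exhaustive swap search is quadratic in the number of points per pass, which matters for large descriptor sets.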
The number of clusters, or visual words, is 500, and the number of features is [...].

A. 4-object subset

The clustering algorithm creates 500 visual words. Figure 3 shows one of these 500 visual words, obtained after applying the clustering algorithm.

Fig. 1. Images taken from the 4-object Caltech subset.
Fig. 2. Images taken from the 15-object Caltech subset.
Fig. 3. Visual word obtained from the clustering step in the BOV technique.

Figure 4 shows the classification results, in the form of a confusion matrix, of the Bag of Visual Words algorithm when the K-Means algorithm is used for clustering in one of the experiments. The confusion matrix is a contingency table in which each cell holds the number of objects that belong to a given a priori class and were assigned to a given predicted class. The accuracy metric can be computed from the confusion matrix. The diagonal of the matrix holds the correctly classified objects of each class and therefore represents the accuracy of the classifier, in this case a Support Vector Machine (SVM). Figure 5 shows the classification accuracy of the Bag of Visual Words algorithm when the K-Medoids algorithm is used for clustering in one of the experiments. Figure 6 shows the classification accuracy of the Bag of Visual Words algorithm when the FCM algorithm is used for clustering in one of the experiments.
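As an illustration of how accuracy follows from a confusion matrix, the Python sketch below computes the per-class and overall accuracy. The counts are invented for the example and are not the results reported in the figures; the sketch also assumes that rows index the a priori (true) class and columns the predicted class, which the text does not specify.

```python
def per_class_accuracy(cm):
    """Fraction of each true class that was predicted correctly (diagonal / row sum)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]

def overall_accuracy(cm):
    """Correctly classified objects (the diagonal) over all classified objects."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical 4-class confusion matrix with 50 test images per class.
cm = [[48, 1, 1, 0],
      [2, 45, 2, 1],
      [0, 3, 47, 0],
      [1, 0, 0, 49]]
class_acc = per_class_accuracy(cm)
mean_acc = sum(class_acc) / len(class_acc)
print(class_acc, mean_acc, overall_accuracy(cm))
```

When every class has the same number of test images, as in this made-up example, the average of the per-class accuracies equals the overall accuracy.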

Fig. 4. Confusion matrix obtained on the 4-object Caltech subset using a linear SVM and the K-Means algorithm for codebook construction.
Fig. 5. Confusion matrix obtained on the 4-object Caltech subset using a linear SVM and the K-Medoids algorithm for codebook construction.
Fig. 6. Confusion matrix obtained on the 4-object Caltech subset using a linear SVM and the FCM algorithm for codebook construction.

Figure 7 shows the performance of the three algorithms and their average accuracy. The accuracy refers to the performance of the SVM for the classification of each object, and its average, in a normal experiment run with a specific clustering algorithm. The categories 1, 2 and 3, represented by bars in the figure, are the accuracies of the K-Means, FCM and K-Medoids algorithms, respectively. From these categories it is possible to conclude that K-Means shows the best accuracy, followed by the K-Medoids and FCM algorithms.

Fig. 7. Accuracy of the clustering algorithms using SVM on the Caltech 4-object subset.

B. 15-object subset

The same experiments were executed on the second subset of the Caltech dataset, with 15 object types. Figure 8 shows the accuracy of the clustering algorithms on this larger subset. In this figure it is possible to observe that the best performance is again achieved with the K-Means and K-Medoids algorithms, which are hard clustering algorithms, compared with the FCM algorithm, which is considered a soft clustering method. The accuracy also decays compared with the subset that has fewer object types.

Fig. 8. Accuracy of the clustering algorithms using SVM on the Caltech 15-object subset.

V. CONCLUSION

The application of three clustering algorithms in the bag of visual words algorithm for object recognition has been explored. The tests carried out on an image database divided into two subsets of different images showed that the K-Means algorithm performs better than the K-Medoids and FCM algorithms. We can also notice that the accuracy of the algorithm decreased when the number of objects increased in our experiments, but more datasets and experiments need to be evaluated in order to find the limits of the relation between the accuracy and the number of objects in our framework. As future work, we plan to evaluate the performance of the clustering algorithms with different feature extraction methods for the BOV algorithm.

REFERENCES

[1] Sivic, J. and Zisserman, A. Video Google: A text retrieval approach to object matching in videos, International Conference on Computer Vision (ICCV), IEEE Computer Society.
[2] Dell'Agnello, D., Carneiro, G., Chin, T., Castellano, G. and Fanelli, A.M. (2013) Fuzzy clustering based encoding for Visual Object Classification, in IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS). (Edmonton-Canada).
[3] Sujatha, K.S., Keerthana, P., Suga Priya, S., Kaavya, E. and Vinod, B. (2012) Fuzzy based Multiple Dictionary Bag of Words for Image Classification, Procedia Engineering, Vol. 38, pp. 2196-2206.
[4] Zhou, G., Wang, Z., Wang, J. and Feng, D. (2010) Spatial Context for Visual Vocabulary Construction, in International Conference in Image Analysis and Signal Processing (IASP). (Zhejiang-China).
[5] Sujatha, K.S., Karthiga, G.M. and Vinod, B. (2012) Evaluation of Bag of Visual Words for Category Level Object Recognition, International Journal of Electronics Signals and Systems, Vol. 1, No. 4.
[6] Xu, S., Fang, T., Li, D. and Wang, S. Object Classification of Aerial Images With Bag-of-Visual Words, IEEE Geoscience and Remote Sensing Letters, Vol. 7, No. 2.
[7] Lowe, D.G. (1999) Object Recognition from Local Scale-Invariant Features, in Proceedings of the International Conference on Computer Vision. (Kerkyra-Greece).
[8] Bishop, C.M. Pattern Recognition and Machine Learning, Springer.
[9] Shah, S. and Singh, M. (2012) Comparison of a Time Efficient Modified K-mean Algorithm with K-Mean and K-Medoid Algorithm, in International Conference on Communication Systems and Network Technologies (CSNT). (Rajkot-India).
[10] Zhou, Z. (2010) Comparison of Four Kinds of Fuzzy C-Means Clustering Methods, in Symposium on Informatics Processing (ISIP). (Qingdao-China).
[11] Setnes, M. and Babuska, R. (1999) A Fuzzy Relational Classifier Trained by Fuzzy Clustering, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 29, No. 5.
[12] Gamarra, D.F.T. (2013) Forward Models with Cluster Validity Criteria Applied in Ballistic Reaching for Visual Servoing, International Review on Modelling and Simulations, Vol. 6.
[13] Gamarra, D.F.T. (2015) Fuzzy Image Segmentation Using Validity Indexes Correlation, International Journal of Computer Science and Information Technology, Vol. 7.
[14] Jain, N. and Shukla, S. (2012) Fuzzy Databases Using Extended Fuzzy C Means Clustering, Journal of Engineering Research and Applications, Vol. 2, No. 3.
[15] Ballan, L. and Seidenari, L. (2013) available:
