HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

2007 International Conference on Convergence Information Technology

Lixin Han, Guihai Chen
Department of Computer Science and Engineering, Hohai University, Nanjing, China
State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China
E-mail: lhan@hhu.edu.cn

Abstract

In recent years, there has been considerable interest in collaborative tagging. This paper proposes a hybrid fuzzy clustering method for collaborative tagging. The key feature of the method is that it combines fuzzy c-means with subtractive clustering to handle collaborative tagging problems. The method allows a resource to belong to more than one tagging, each with a different membership grade. The HFCT method does not need to know the number of taggings in advance, which avoids the difficulty of making an initial guess at that number.

1. Introduction

Nowadays collaborative tagging is becoming more popular. Collaborative tagging [] allows a number of web users with explicit or implicit social interactions to jointly annotate objects such as bookmarks and photographs, in order to retrieve and share information more efficiently. Influential web applications include the social bookmarking site del.icio.us, Flickr, Rojo, Furl, Technorati, Connotea, and Amazon []. Collaborative tagging lets users annotate web resources more easily, openly and freely than taxonomies and ontologies do. In contrast to formal classification systems, annotation systems are more adaptable in organizing information []. In contrast to the annotation of web resources in the Semantic Web area, users of collaborative tagging can add one or more tags to a resource manually without a predefined formal ontology []. Collaborative tagging is also called social annotation. In general, different users annotate a resource with the same or different tags. Thus, a resource may carry a single tagging or multiple taggings.
Single-labeled data means that each resource belongs to exactly one tagging. Multi-labeled data means that each resource can belong to several taggings simultaneously, and these taggings are not mutually exclusive. In this paper, we propose an algorithm called HFCT (Hybrid Fuzzy Clustering Method for Collaborative Tagging). The HFCT method applies fuzzy clustering to collaborative tagging problems. In the HFCT method, the subtractive clustering algorithm is used to determine the number of taggings for a given set of resources, and the fuzzy c-means algorithm is used to acquire the taggings corresponding to the specific resources. Each resource may belong to multiple taggings to some degree that is specified by a membership grade.

0-7695-3038-9/07 $5.00 007 IEEE DOI 0.09/ICCIT.007.55

2. Related work

Wu et al. [1] explore a social approach to semantic annotation. This approach, which complements semantic annotation, focuses on the "social annotations" of web resources. It extends the bigram Separable Mixture Model to a tripartite probabilistic model in order to derive the emergent semantics of the tags automatically. Based on an analysis of the characteristics of social annotations, Li et al. [4] propose an algorithm, the effective large scale annotation browser (ELSABer), for browsing social annotation data. Key features of the ELSABer algorithm include assigning semantic concepts that consist of semantically related annotations for semantic browsing, organizing annotations in a hierarchy for hierarchical browsing, and studying the power law distribution of social annotations for efficient browsing. Halpin et al. [2] explore the dynamics of distributed tagging systems. They show that, given sufficiently many active users and tags, the distribution of tag usage frequency tends to stabilize into a power law distribution. Golder et al. [3] analyze in detail the structure of the collaborative tagging system Delicious from dynamical aspects. Chirita et al. [5] propose P-TAG, a method that automatically generates annotation tags for Web pages on the personal Desktop. Their system employs the implicit background knowledge residing on each user's personal Desktop to produce personalized annotation tags instead of ontology-based annotations.

In contrast to the above work, the HFCT method employs subtractive clustering combined with fuzzy c-means to handle collaborative tagging problems. The subtractive clustering algorithm is used to determine the number of taggings for a given set of resources, and the fuzzy c-means algorithm is used to acquire the taggings corresponding to the specific resources. Each resource may belong to multiple taggings to some degree that is specified by a membership value.

3. A hybrid fuzzy clustering method for collaborative tagging problems

The HFCT method is a hybrid fuzzy clustering method for collaborative tagging problems.
The HFCT method consists mainly of the subtractive clustering algorithm and the fuzzy c-means algorithm. The HFCT method is described below:

{
  the subtractive clustering algorithm is used to determine the number of taggings for a given set of resources;
  the fuzzy c-means algorithm is used to acquire the taggings corresponding to the specific resources;
}

3.1. The subtractive clustering algorithm

The subtractive clustering algorithm [6], [7], which is a fast one-pass algorithm, extends a form of the grid-based mountain clustering method introduced by Ronald R. Yager and Dimitar P. Filev [8], [9]. The subtractive clustering algorithm can be used to estimate the number of clusters and the cluster centers for a given set of data. It first assumes that each data point is a potential cluster center. Then, based on the density of data points in the neighborhood of each candidate, it calculates a measure of the likelihood that the data point defines a cluster center [7], [8]. The subtractive clustering algorithm for estimating the number of taggings is described below:

{
  while some resource point is not within the radius of any cluster center
  {
    select the remaining resource point with the largest density value as a cluster center;
    delete all the resource points in the neighborhood of that cluster center;
  }
}

3.2. The fuzzy c-means algorithm

The best-known method of fuzzy clustering is the fuzzy c-means method (FCM) [10]. FCM introduces the concepts of fuzzy logic to classic K-means [11], [12]. Fuzzy set theory is an extension of classical set theory developed by Zadeh [13] as a way to deal with vague concepts. Classical set theory considers an object either as a member of a given set or not, that is, the indicator variable is either 1 or 0. In a fuzzy set, the indicator variable, called the membership, can take intermediate values in the interval [0, 1]. The FCM algorithm assumes that the number of clusters c is known in advance and minimizes an objective function to find the best set of clusters. Usually, membership functions are defined based on a distance function, such that membership degrees express the proximity of entities to cluster prototypes.

In the FCM algorithm, let $X = \{x_1, \dots, x_n\}$ denote a set of unlabeled feature vectors in $R^p$, and let c be an integer with $1 < c < n$. Each $x_j$ is the numerical representation of p features. Given X, a fuzzy c-partition of X is represented by a $c \times n$ fuzzy partition matrix $U = [u_{ij}]$ satisfying the conditions

$$0 \le u_{ij} \le 1 \ (1 \le i \le c,\ 1 \le j \le n), \qquad \sum_{i=1}^{c} u_{ij} = 1 \ (1 \le j \le n), \qquad \sum_{j=1}^{n} u_{ij} > 0 \ (1 \le i \le c),$$

where each value $u_{ij}$ represents the membership of the j-th feature vector in the i-th cluster. The clustering criterion used by the FCM algorithm is associated with the generalized least-squared errors function:

$$\min J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^m D_{ij} \quad \text{s.t.} \quad \sum_{i=1}^{c} u_{ij} = 1,\ j \in \{1, \dots, n\}; \quad 0 \le u_{ij} \le 1,\ i \in \{1, \dots, c\},\ j \in \{1, \dots, n\} \tag{1}$$

where c is the number of fuzzy clusters and $u_{ij} \in [0, 1]$ is the degree of membership of feature point $x_j$ in cluster i. The parameter $m > 1$, called the fuzzifier, is the degree of fuzzification and is used to increase or decrease the fuzziness. Higher values of m make the result fuzzier. $U = [u_{ij}]$ is a $c \times n$ constrained fuzzy c-partition matrix. If $m \to 1$, the membership degrees $u_{ij}$ tend to 0 or 1, so the classification tends to be crisp. If $m \to \infty$, $u_{ij} \to 1/c$, where c is the number of clusters.
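The two stages described in this section can be sketched together in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `subtractive_centers`, `memberships` and `fuzzy_c_means`, the Gaussian density measure with factor 4/radius², the radius value, the toy data, and the use of the squared Euclidean distance for D_ij are all choices made here for illustration. The center-selection loop follows the simplified delete-the-neighborhood pseudocode of Section 3.1 rather than Chiu's potential-subtraction rule, and the refinement stage uses the standard alternating FCM updates for m > 1, started from the subtractive centers.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def subtractive_centers(points, radius):
    """Select cluster centers with the delete-the-neighborhood loop of Section 3.1."""
    # Density of each point: Gaussian-weighted count of its neighbors.
    # The factor 4 / radius**2 is an assumed choice borrowed from Chiu's formulation.
    density = [sum(math.exp(-4.0 * sq_dist(p, q) / radius ** 2) for q in points)
               for p in points]
    remaining = set(range(len(points)))
    centers = []
    while remaining:
        # The densest point not yet covered becomes a cluster center.
        k = max(remaining, key=lambda i: density[i])
        centers.append(points[k])
        # Delete every remaining point that lies within `radius` of the new center.
        remaining = {i for i in remaining
                     if sq_dist(points[i], points[k]) > radius ** 2}
    return centers


def memberships(points, prototypes, m):
    """Membership u_ij of point j in cluster i; each point's grades sum to 1."""
    U = [[0.0] * len(points) for _ in prototypes]
    for j, x in enumerate(points):
        # Clamp distances away from zero so a point lying on a prototype is handled.
        d = [max(sq_dist(x, v), 1e-12) for v in prototypes]
        inv = [dij ** (-1.0 / (m - 1.0)) for dij in d]
        total = sum(inv)
        for i in range(len(prototypes)):
            U[i][j] = inv[i] / total
    return U


def fuzzy_c_means(points, prototypes, m=2.0, n_iter=100, tol=1e-9):
    """Refine prototypes by alternating the standard FCM updates (requires m > 1)."""
    V = [list(v) for v in prototypes]
    dim = len(points[0])
    for _ in range(n_iter):
        U = memberships(points, V, m)
        new_V = []
        for i in range(len(V)):
            # Each prototype is the mean of all points weighted by u_ij ** m.
            w = [U[i][j] ** m for j in range(len(points))]
            new_V.append([sum(wj * x[k] for wj, x in zip(w, points)) / sum(w)
                          for k in range(dim)])
        shift = max(sq_dist(a, b) for a, b in zip(V, new_V))
        V = new_V
        if shift < tol:
            break
    return memberships(points, V, m), V


# Toy data: three tight groups of five 2-D points (illustrative, not the paper's data).
offsets = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(2.0, 2.0), (10.0, 2.0), (6.0, 10.0)]
          for dx, dy in offsets]

centers = subtractive_centers(points, radius=3.0)  # number of taggings = len(centers)
U, V = fuzzy_c_means(points, centers, m=2.0)
print(len(centers))
```

In this sketch the number of clusters is never passed in: it falls out of the first stage as `len(centers)`, and the second stage only adjusts those centers and assigns graded memberships. The result depends on the assumed `radius`, so a different value can yield a different number of taggings.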
$V = [v_1, \dots, v_c]$ ($v_i \in R^p$) is the vector of cluster prototypes, and $D_{ij}$ is some distance metric between feature vector $x_j$ and cluster prototype $v_i$, which is taken equal to the squared distance

$$D_{ij} = \|x_j - v_i\|_A^2 = (x_j - v_i)^T A (x_j - v_i) \tag{2}$$

where the matrix A is a positive definite $p \times p$ weight matrix. If A is taken as the identity matrix I, the resulting Euclidean norm implies hyperspherical clusters.

4. Experiment results and discussion

In this section, we conduct experiments to evaluate the performance of the HFCT method, in order to discover the closely related resources and taggings. 34 Web pages are selected as resources. These resources are mapped into a two-dimensional vector space. The subtractive clustering algorithm is used to determine the number of taggings, and the fuzzy c-means algorithm is used to acquire the taggings corresponding to the specific resources.

The experiment result in Figure 1 shows that the subtractive clustering algorithm is effective. In Figure 1, every data point represents a resource and every cluster center represents a tagging. Therefore the total number of resources is 34 and the total number of taggings is 3.

The experiment result in Figure 2 shows that the fuzzy c-means algorithm is effective. After the number of cluster centers is determined in Figure 1, the fuzzy c-means algorithm in Figure 2 is used to adjust these cluster centers, in order to make them more reasonable. The experiment result in Figure 2 shows that a 'movies' tag annotates 4 resources, a 'music' tag annotates resources, a 'games' tag annotates 5 resources, and 7 resources are not forced to fully belong to any one tag.

Figure 1. The experiment result of the subtractive clustering algorithm (axes: X, Y)

Figure 2. The experiment result of the fuzzy c-means algorithm (axes: X, Y)

5. Conclusion

Nowadays collaborative tagging is becoming more popular. Fuzzy clustering problems are very useful in the practice and theory of collaborative tagging. In this paper, we propose an algorithm called HFCT (Hybrid Fuzzy Clustering Method for Collaborative Tagging).

The HFCT method employs fuzzy c-means combined with subtractive clustering to handle collaborative tagging problems. In the HFCT method, the subtractive clustering algorithm is used to determine the number of taggings for a given set of resources, and the fuzzy c-means algorithm is used to acquire the taggings corresponding to the specific resources. The HFCT method allows each resource to belong to multiple taggings with different degrees of belief. The HFCT method does not need to know the number of taggings in advance.

Acknowledgement

This work is also supported by the National Natural Science Foundation of China under grants 6067386 and 6057048, the National Grand Fundamental Research 973 Program of China under grant 00CB300, the State Key Laboratory Foundation of Novel Software Technology at Nanjing University under grant A00604, and the Natural Science Foundation of Jiangsu Province of China under grant BK00508.

References

[1] Xian Wu, Lei Zhang, Yong Yu. Exploring Social Annotations for the Semantic Web. In the Proceedings of the fifteenth International World Wide Web Conference (WWW2006). Edinburgh, Scotland, May 23-26, 2006.
[2] Harry Halpin, Valentin Robu, Hana Shepherd. The Complex Dynamics of Collaborative Tagging. In the Proceedings of the sixteenth International World Wide Web Conference (WWW2007). Banff, Alberta, Canada, May 8-12, 2007. pp. 211-220.
[3] Golder, S. and Huberman, B. A. 2005. The Structure of Collaborative Tagging Systems. Technical report, Information Dynamics Lab, HP Labs.
[4] Rui Li, Shenghua Bao, Ben Fei, Zhong Su, and Yong Yu. Towards Effective Browsing of Large Scale Social Annotations. In the Proceedings of the sixteenth International World Wide Web Conference (WWW2007). Banff, Alberta, Canada, May 8-12, 2007. pp. 943-952.
[5] Paul-Alexandru Chirita, Stefania Costache, Siegfried Handschuh, Wolfgang Nejdl. P-TAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web. In the Proceedings of the sixteenth International World Wide Web Conference (WWW2007). Banff, Alberta, Canada, May 8-12, 2007. pp. 845-854.
[6] C. W. Tao. Unsupervised fuzzy clustering with multi-center clusters. Fuzzy Sets and Systems. Volume 128, Issue 3, June 2002. pp. 305-322.
[7] Chiu, S., "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.
[8] Yager, R. and D. Filev, "Generation of Fuzzy Rules by Mountain Clustering," Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, pp. 209-219, 1994.
[9] Yager, R.R.; Filev, D.P. Approximate Clustering Via the Mountain Method. IEEE Transactions on Systems, Man and Cybernetics. Volume 24, Issue 8, Aug. 1994. pp. 1279-1284.
[10] J.C. Bezdek, R. Ehrlich, FCM: The Fuzzy c-means Clustering Algorithm, Computers and Geosciences 10 (1984) 191-203.
[11] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function, Plenum Press, 1981.
[12] J.C. Bezdek, Some non-standard clustering algorithms, in: Legendre, P. & Legendre, L. Developments in Numerical Ecology, NATO ASI Series, Vol. G14. Springer-Verlag, 1987.
[13] L.A. Zadeh, Fuzzy sets, Information and Control 8 (1965) 338-353.