HFCT: A Hybrid Fuzzy Clustering Method for Collaborative Tagging

2007 International Conference on Convergence Information Technology

Lixin Han, Guihai Chen
Department of Computer Science and Engineering, Hohai University, Nanjing, China
State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China
E-mail: lhan@hhu.edu.cn

Abstract

In recent years, there has been considerable interest in collaborative tagging. This paper proposes a hybrid fuzzy clustering method for collaborative tagging. A key feature of the method is that it combines fuzzy c-means and subtractive clustering to handle collaborative tagging problems. The method allows a resource to belong to more than one tagging, with different membership grades. The HFCT method does not need to know the number of taggings in advance, which avoids the difficulty of guessing that number initially.

1. Introduction

Nowadays collaborative tagging is becoming more popular. Collaborative tagging [] allows a number of web users with explicit or implicit social interactions to annotate objects such as bookmarks and photographs together, in order to retrieve and share information more efficiently. Influential web applications include the social bookmarking site del.icio.us, Flickr, Rojo, Furl, Technorati, Connotea, and Amazon []. Collaborative tagging lets users annotate web resources more easily, openly, and freely than taxonomies and ontologies do. Compared with formal classification systems, annotation systems are more adaptable in organizing information []. Unlike web resources annotated in the Semantic Web area, collaborative tagging lets users add one or more tags to a resource manually, without a predefined formal ontology []. Collaborative tagging is also called social annotation. In general, different users annotate a resource with the same or different tags. Thus, a resource may carry a single tagging or multiple taggings.
Single-labeled data means that each resource belongs to exactly one tagging. Multi-labeled data means that each resource can belong to several taggings simultaneously, and these taggings are not mutually exclusive.

In this paper, we propose an algorithm called HFCT (Hybrid Fuzzy Clustering Method for Collaborative Tagging), which applies fuzzy clustering to collaborative tagging problems. In HFCT, the subtractive clustering algorithm determines the number of taggings for a given set of resources. The fuzzy c-means algorithm then acquires the taggings corresponding to the specific resources. Each resource may belong to multiple taggings to a degree specified by its membership grade.

2. Related work

Wu et al. [1] explore a social approach to semantic annotation. This approach complements existing semantic annotation work by focusing on the "social annotations" of web resources. It extends the bigram Separable Mixture Model to a tripartite probabilistic model in order to derive the emergent semantics of tags automatically. Based on an analysis of the characteristics of social annotations, Li et al. [4] propose ELSABer (Effective Large Scale Annotation Browser) for browsing social annotation data. ELSABer has three key features:

- it assigns semantic concepts, each consisting of semantically related annotations, for semantic browsing;
- it organizes annotations hierarchically for hierarchical browsing;
- it exploits the power-law distribution of social annotations for efficient browsing.

Halpin et al. [2] explore the dynamics of distributed tagging systems. They show that, with sufficiently many active users and tags, the frequency distribution of tag use tends to stabilize into a power law. Golder et al. [3] analyze in detail the structure and dynamics of the collaborative tagging system Delicious. Chirita et al. [5] propose P-TAG, a method that automatically generates personalized annotation tags for Web pages on a user's personal Desktop. Their system uses the implicit background knowledge residing on each user's Desktop to produce personalized annotation tags instead of ontology-based annotations.

In contrast to the above work, HFCT combines two algorithms. Subtractive clustering determines the number of taggings for a given set of resources, and fuzzy c-means acquires the taggings corresponding to the specific resources. Each resource may belong to multiple taggings to a degree specified by its membership value.

3. A hybrid fuzzy clustering method for collaborative tagging problems

The HFCT method is a hybrid fuzzy clustering method for collaborative tagging problems.
The HFCT method consists mainly of two parts, the subtractive clustering algorithm and the fuzzy c-means algorithm. It is described below:

{
    use the subtractive clustering algorithm to determine the number of taggings for a given set of resources;
    use the fuzzy c-means algorithm to acquire the taggings corresponding to the specific resources;
}

3.1 The subtractive clustering algorithm

The subtractive clustering algorithm [6], [7] is a fast one-pass algorithm. It extends the grid-based mountain clustering method introduced by Ronald R. Yager and Dimitar P. Filev [8], [9]. It can estimate both the number of clusters and the cluster centers for a given set of data. The algorithm first treats each data point as a potential cluster center. Then, based on the density of data points in each point's neighborhood, it calculates a measure of the likelihood that the point would define a cluster center [7], [8]. The subtractive clustering algorithm for estimating the number of taggings is described below:

{
    while some resource point is not within the radius of any cluster center {
        select the resource point with the largest density value as a cluster center;
        delete all the resource points in the neighborhood of that cluster center;
    }
}

3.2 The fuzzy c-means algorithm

The best-known fuzzy clustering method is fuzzy c-means (FCM) [10]. FCM introduces concepts of fuzzy logic into classic k-means [11], [12]. Fuzzy set theory extends classical set theory and was developed by Zadeh [13] as a way to deal with vague concepts. Classical set theory considers an object either a member of a given set or not; that is, the indicator variable is either 1 or 0. In a fuzzy set, the indicator variable, called the membership, can take intermediate values in the interval [0, 1]. The FCM algorithm assumes that the number of clusters c is known in advance. It minimizes an objective function to find the best set of clusters. Membership functions are usually defined in terms of a distance function, so that membership degrees express the proximity of entities to cluster prototypes.

In the FCM algorithm, let $X = \{x_1, \ldots, x_n\}$ denote a set of unlabeled feature vectors in $\mathbb{R}^p$, and let $c$ be an integer with $1 < c < n$. Each $x_j$ is the numerical representation of $p$ features. Given $X$, a fuzzy c-partition of $X$ is represented by a $c \times n$ fuzzy partition matrix $U = [u_{ij}]$ satisfying the conditions

$$0 \le u_{ij} \le 1 \ (1 \le i \le c,\ 1 \le j \le n), \qquad \sum_{i=1}^{c} u_{ij} = 1 \ (1 \le j \le n), \qquad \sum_{j=1}^{n} u_{ij} > 0 \ (1 \le i \le c),$$

where each value $u_{ij}$ represents the membership of the $j$-th feature vector in the $i$-th cluster. The clustering criterion used by the FCM algorithm is the generalized least-squared-errors function

$$\min_{U, V} J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^m D_{ij} \quad \text{s.t.} \quad \sum_{i=1}^{c} u_{ij} = 1,\ j \in \{1, \ldots, n\}; \quad 0 \le u_{ij} \le 1,\ i \in \{1, \ldots, c\},\ j \in \{1, \ldots, n\} \qquad (1)$$

where $c$ is the number of fuzzy clusters and $u_{ij} \in [0, 1]$ is the degree of membership of feature point $x_j$ in cluster $i$. The parameter $m > 1$, called the fuzzifier, controls the degree of fuzzification: higher values of $m$ make the result fuzzier. $U = [u_{ij}]$ is a $c \times n$ constrained fuzzy c-partition matrix. As $m \to 1$, the membership degrees $u_{ij}$ tend to 0 or 1, so the classification tends to be crisp. As $m \to \infty$, $u_{ij} \to 1/c$, where $c$ is the number of clusters.
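The short pure-Python sketch below illustrates how the two steps of Sections 3.1 and 3.2 fit together. It is not the authors' implementation, and several of its choices are assumptions:

- The density is Chiu's Gaussian potential [7], with radius `ra` and squash radius 1.5·`ra`. Potential is subtracted around each new center instead of deleting neighboring points.
- Selection stops once the best remaining potential falls below 0.15 of the first one. Chiu uses separate accept and reject thresholds.
- $D_{ij}$ is the squared Euclidean distance, and $m = 2$.
- FCM starts from the subtractive-clustering centers.

FCM alternates the standard updates for (1): $v_i = \sum_j u_{ij}^m x_j / \sum_j u_{ij}^m$ and $u_{ij} = 1 / \sum_{k=1}^{c} (D_{ij}/D_{kj})^{1/(m-1)}$. The names `subtractive_clustering`, `fcm`, and `hfct` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (D_ij with A = I)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_clustering(points: List[Point], ra: float,
                           stop_ratio: float = 0.15) -> List[Point]:
    """Simplified Chiu-style subtractive clustering (hypothetical helper).

    Each point gets a Gaussian potential (neighborhood density). The point
    with the highest remaining potential becomes a center, potentials near
    it are reduced, and the loop stops once the best remaining potential
    falls below stop_ratio times the first center's potential.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2  # assumed squash radius rb = 1.5 * ra
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]
    first = max(potential)
    centers: List[Point] = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        pk = potential[k]
        if pk <= 0.0 or pk < stop_ratio * first:
            break
        centers.append(points[k])
        # Subtract the new center's influence; potential[k] drops to 0.
        potential = [p - pk * math.exp(-beta * sq_dist(x, points[k]))
                     for p, x in zip(potential, points)]
    return centers


def fcm(points: List[Point], centers: List[Point], m: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9
        ) -> Tuple[List[List[float]], List[List[float]]]:
    """Fuzzy c-means by alternating optimization of Eq. (1) with A = I."""
    c, n, dim = len(centers), len(points), len(points[0])
    v = [list(ctr) for ctr in centers]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (D_ij / D_kj)^(1/(m-1)).
        for j, x in enumerate(points):
            d = [sq_dist(x, v[i]) for i in range(c)]
            zeros = [i for i in range(c) if d[i] == 0.0]
            for i in range(c):
                if zeros:  # x_j coincides with a prototype
                    u[i][j] = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0))
                                        for k in range(c))
        # Prototype update: v_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        shift = 0.0
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            new = [sum(w[j] * points[j][t] for j in range(n)) / total
                   for t in range(dim)]
            shift = max(shift, sq_dist(new, v[i]))
            v[i] = new
        if shift < tol:
            break
    return u, v


def hfct(points: List[Point], ra: float, m: float = 2.0):
    """Hypothetical HFCT-style pipeline: estimate c, then refine with FCM."""
    return fcm(points, subtractive_clustering(points, ra), m)


# Toy data: three well-separated groups of four 2-D "resources".
blobs = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
         (10.0, 0.0), (10.5, 0.0), (10.0, 0.5), (10.5, 0.5),
         (0.0, 10.0), (0.5, 10.0), (0.0, 10.5), (0.5, 10.5)]
u, v = hfct(blobs, ra=3.0)
print(len(v))  # 3
```

On this toy data the subtractive step should pick one center per group, so `hfct` returns three prototypes near the group means. Every column of `u` sums to 1, as the constraints on $U$ require. A resource whose largest membership falls below some chosen cutoff could be read as not belonging fully to any single tag, in the spirit of Section 4. That cutoff is an assumption; the paper does not specify one.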
In Eq. (1), $V = [v_1\ v_2 \cdots v_c]$ ($v_i \in \mathbb{R}^p$) is the vector of cluster prototypes. $D_{ij}$ is a distance metric between feature vector $x_j$ and cluster prototype $v_i$, taken to be the squared distance

$$D_{ij} = \|x_j - v_i\|_A^2 = (x_j - v_i)^T A (x_j - v_i) \qquad (2)$$

where $A$ is a positive definite $p \times p$ weight matrix. If $A$ is taken as the identity matrix $I$, the resulting Euclidean norm implies hyperspherical clusters.

4. Experiment results and discussion

In this section, we conduct experiments to evaluate how well the HFCT method discovers closely related resources and taggings. We select 34 Web pages as resources and map them into a two-dimensional vector space. The subtractive clustering algorithm determines the number of taggings, and the fuzzy c-means algorithm acquires the taggings corresponding to the specific resources.

Figure 1 shows that the subtractive clustering algorithm performs well. In Figure 1, every data point represents a resource and every cluster center represents a tagging, giving 34 resources and 3 taggings in total. Figure 2 shows that the fuzzy c-means algorithm also performs well. Once the number of cluster centers has been determined in Figure 1, the fuzzy c-means algorithm in Figure 2 adjusts these cluster centers to make them more reasonable. In Figure 2, the resources are annotated by a 'movies' tag, a 'music' tag, and a 'games' tag (the 'games' tag annotates 5 resources), and 7 resources are not forced to belong fully to any one tag.

Figure 1. The experiment result of the subtractive clustering algorithm (scatter plot; axes X and Y)

Figure 2. The experiment result of the fuzzy c-means algorithm (scatter plot; axes X and Y)

5. Conclusion

Collaborative tagging is becoming very useful in both practice and theory. In this paper, we propose an algorithm called HFCT (Hybrid Fuzzy Clustering Method for Collaborative Tagging).

HFCT combines fuzzy c-means with subtractive clustering to handle collaborative tagging problems. The subtractive clustering algorithm determines the number of taggings for a given set of resources. The fuzzy c-means algorithm then acquires the taggings corresponding to the specific resources. HFCT allows each resource to belong to multiple taggings with different degrees of belief, and it does not need to know the number of taggings in advance.

Acknowledgement

This work is also supported by the National Natural Science Foundation of China under grants 6067386 and 6057048, the National Grand Fundamental Research 973 Program of China under grant 00CB300, the State Key Laboratory Foundation of Novel Software Technology at Nanjing University under grant A00604, and the Natural Science Foundation of Jiangsu Province of China under grant BK00508.

References

[1] Xian Wu, Lei Zhang, Yong Yu. Exploring Social Annotations for the Semantic Web. In Proceedings of the Fifteenth International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.
[2] Harry Halpin, Valentin Robu, Hana Shepherd. The Complex Dynamics of Collaborative Tagging. In Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, May 8-12, 2007, pp. 211-220.
[3] S. Golder, B. A. Huberman. The Structure of Collaborative Tagging Systems. Technical report, Information Dynamics Lab, HP Labs, 2005.
[4] Rui Li, Shenghua Bao, Ben Fei, Zhong Su, Yong Yu. Towards Effective Browsing of Large Scale Social Annotations. In Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, May 8-12, 2007, pp. 943-952.
[5] Paul-Alexandru Chirita, Stefania Costache, Siegfried Handschuh, Wolfgang Nejdl. P-TAG: Large Scale Automatic Generation of Personalized Annotation TAGs for the Web. In Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, May 8-12, 2007, pp. 845-854.
[6] C. W. Tao. Unsupervised Fuzzy Clustering with Multi-center Clusters. Fuzzy Sets and Systems, Vol. 128, Issue 3, June 2002, pp. 305-322.
[7] S. Chiu. Fuzzy Model Identification Based on Cluster Estimation. Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, Sept. 1994.
[8] R. Yager, D. Filev. Generation of Fuzzy Rules by Mountain Clustering. Journal of Intelligent & Fuzzy Systems, Vol. 2, No. 3, 1994, pp. 209-219.
[9] R. R. Yager, D. P. Filev. Approximate Clustering via the Mountain Method. IEEE Transactions on Systems, Man and Cybernetics, Vol. 24, Issue 8, Aug. 1994, pp. 1279-1284.
[10] J. C. Bezdek, R. Ehrlich. FCM: The Fuzzy c-means Clustering Algorithm. Computers and Geosciences, Vol. 10, 1984, pp. 191-203.
[11] J. C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981.
[12] J. C. Bezdek. Some Non-standard Clustering Algorithms. In: P. Legendre, L. Legendre (eds.), Developments in Numerical Ecology, NATO ASI Series, Vol. G14, Springer-Verlag, 1987.
[13] L. A. Zadeh. Fuzzy Sets. Information and Control, Vol. 8, 1965, pp. 338-353.