Fuzzy C-means Clustering with Temporal-based Membership Function


Indian Journal of Science and Technology, Vol (S()), DOI:./ijst//viS/, December

Aseel Mousa* and Yuhanis Yusof
School of Computing, Universiti Utara Malaysia, Sintok, Kedah, Malaysia; assiso@yahoo.com, yuhanis@uum.edu.my

Abstract
Objective: In this paper, a method is proposed to create clusters based on temporal information. Despite its popularity, the FCM algorithm does not use temporal information when creating clusters, which affects clustering accuracy. This paper presents an improved Fuzzy C-means algorithm that incorporates temporal information into the membership function used for clustering.
Methods: The proposed FCM algorithm uses the temporal neighbourhood of data points as the basis of clustering. To evaluate the algorithm, experiments were performed on three multi-labelled datasets: clinical free text (Medical), textual email messages (Enron), and Bibtex.
Findings: The experimental results show that the proposed function yields a smaller objective function value while requiring fewer iterations.
Application: The proposed work can benefit data mining in various domains such as information retrieval, healthcare and business management, owing to its ability to group data points that are not mutually exclusive.
Keywords: Fuzzy C-means, Data Clustering, Data Mining, Multi-labelled Data

Introduction
Clustering is the process of distinguishing among things according to certain requirements and rules. A clustering algorithm is a set of rules for classifying data whose distribution is unknown. Its main goal is to find the structure hidden in the data and, as far as possible, to assign data of the same nature to the same class according to some measure of similarity.
Clustering analysis, which yields a fuzzy partition of the sample space, has been widely used in areas such as data mining and pattern recognition. One example of a clustering algorithm is Fuzzy C-means (FCM), which has been successfully applied in fields such as medical imaging, target recognition and image segmentation. Niu and Huang propose an improved FCM algorithm based on Particle Swarm Optimization (PSO) to solve the problem of premature convergence in fuzzy c-means clustering. Their results show that the improved method handles noise better than previous methods, and that clustering performance improves on datasets whose dimensionality exceeds the number of samples. Lou et al. propose an improved clustering approach based on cluster density (FCM-CD). Taking into account the global point density of a cluster, they build a distance-correction regulatory factor and apply it to FCM. Their experimental results show that FCM-CD tolerates different densities and various cluster shapes well and achieves higher clustering accuracy. Zanaty proposes a new fuzzy c-means method for improving Magnetic Resonance Imaging (MRI) segmentation. The method, known as Possibilistic Fuzzy C-Means (PFCM), hybridizes the Fuzzy C-Means (FCM) and Possibilistic C-Means (PCM) functions. It can be realized by modifying the objective function of the conventional PCM algorithm with Gaussian exponent weights so as to produce memberships and possibilities simultaneously, along with the usual point prototypes or cluster centres for each cluster.
*Author for correspondence

In another study, the authors propose an improved FCM algorithm that adopts a novel strategy for selecting the initial cluster centres, addressing the difficulty that the traditional FCM algorithm has in choosing them. Kaur et al. present an automatic and effective intuitionistic fuzzy c-means method that extends the standard Intuitionistic Fuzzy C-Means (IFCM). Their model, the RBF kernel-based Intuitionistic Fuzzy C-Means (KIFCM), replaces the Euclidean norm with a different distance measure. Although there have been many studies on distance measures, existing approaches do not consider the temporal information in the data. Such information may provide insight into the characteristics of the data and hence help produce more accurate clusters. This paper introduces a membership function that operates on the temporal information of the data. The proposed function is then integrated into the FCM algorithm to cluster multi-labelled datasets.

Fuzzy C-means
The FCM algorithm assigns pixels to each category using fuzzy memberships. FCM determines, and iteratively updates, the membership values of each data point in a set of previously defined clusters, so any data point can be related to all clusters through its membership values. The algorithm assigns each data point a membership corresponding to each cluster centre, based on the distance between the data point and the centroid. It minimizes the objective function

J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \lVert x_i - c_j \rVert^{2}

The conventional FCM is given in the following algorithm.

Algorithm. Conventional FCM
1: Initialize the membership matrix U^{(0)} = [u_{ij}]
2: Choose a parameter \varepsilon > 0 to stop the iteration
3: Set the iteration counter l to its initial value
4: while l \le x
5:    Set the time matrix
6:    At step k, calculate the centre vectors C^{(k)} = [c_j] from U^{(k)}:
         c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}}
7:    Update the membership matrix U^{(k+1)}:
         u_{ij} = \frac{1}{\sum_{r=1}^{c} \left( d_{ij} / d_{ir} \right)^{2/(m-1)}}
8:    if \lVert U^{(k+1)} - U^{(k)} \rVert > \varepsilon
9:       go to step 6
10:   else
11:      stop at iteration l
12:   end if

where m is any real number greater than 1, u_ij is the degree of membership of x_i in cluster j, x_i is the i-th d-dimensional measured data point, c_j is the d-dimensional centroid of cluster j, ||·|| is any norm expressing the similarity between a measured data point and a centroid, c is the number of cluster centres, n is the number of data points, and d_ij = ||x_i − c_j|| is the distance, here Euclidean, between the i-th data point and the j-th cluster centre.

Methods
This study focuses on improving the fuzzy membership function used in the FCM algorithm in order to produce better clustering. The phases of the study are described in this section.

Data Collection
Three multi-label datasets are used in this study: Medical, Enron and Bibtex. They are described in the table below. All of them were obtained from the Mulan website. Mulan is an open-source library for learning from multi-label data, and its website provides a collection of multi-label datasets with descriptions (http://mulan.sourceforge.net/datasets.html).

Table. Description of datasets
Name      Domain   Instances   Labels   Density
Medical   Text
Enron     Text
Bibtex    Text

Function Design
Currently, the standard FCM algorithm does not use the temporal relationships between data points, so it may not be robust for close data points that are strongly related (i.e., overlapping data points). Overlap means that a data point may intrinsically belong to more than one cluster; in other words, each data point may be mapped to more than one label. A two-dimensional plot of overlapping classes illustrates this.

Vol (S()) December www.indjst.org Indian Journal of Science and Technology
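To make the conventional FCM updates concrete, and to show what overlap looks like in terms of memberships, here is a minimal NumPy sketch. It assumes a randomly initialized membership matrix (fixed seed), a fuzzifier m = 2, and the largest absolute change in any membership as the norm in the stopping test; the function name `fcm` and its parameters are illustrative, not taken from the paper. The "set the time matrix" step is omitted because it does not enter the conventional centre and membership updates.

```python
import numpy as np


def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Conventional fuzzy c-means (illustrative sketch).

    X: (n, d) data, c: number of clusters, m > 1: fuzzifier.
    Stops when the largest change in any membership is below eps
    (an assumed choice of norm for ||U(k+1) - U(k)||).
    Returns memberships U (n, c), centres C (c, d), J_m and iterations used.
    """
    rng = np.random.default_rng(seed)
    # Initial membership matrix U(0): random rows normalised to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)

    def centres_and_distances(U):
        W = U ** m
        # c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # d_ij = ||x_i - c_j||, floored to avoid division by zero.
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        return C, np.fmax(D, 1e-12)

    for it in range(1, max_iter + 1):
        C, D = centres_and_distances(U)
        # u_ij = 1 / sum_r (d_ij / d_ir)^(2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break

    C, D = centres_and_distances(U)
    J = float(np.sum(U ** m * D ** 2))  # J_m = sum_i sum_j u_ij^m d_ij^2
    return U, C, J, it


# Two groups mirrored through the origin plus one point exactly between them.
X = np.array([[-2.0, 0.0], [-1.8, 0.2], [-2.2, -0.2],
              [2.0, 0.0], [1.8, -0.2], [2.2, 0.2],
              [0.0, 0.0]])
U, C, J, iterations = fcm(X, c=2)
print(np.round(U[-1], 2))  # the in-between point: roughly equal memberships
```

Points inside either group receive a membership close to 1 in their own cluster, while the point halfway between the groups is split almost evenly between the two clusters, which is the kind of overlap the Function Design discussion describes.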

Figure. Two-dimensional plot of overlapping classes.

Non-overlapping data means that each data point belongs to exactly one cluster; in other words, clustering assigns an object to exactly one class, even when there are two or more class labels.

Temporal information indicates that data points that are neighbours in time are highly correlated and thus tend to have the same feature values. This means that the probability that they belong to the same cluster is high. To exploit the temporal information contained in the data, a temporal function is defined that gives the probability that a data point (pixel) belongs to a cluster.

The clustering starts by applying the conventional FCM to calculate the membership function. This membership function is then mapped to the temporal function to compute the temporal membership function. The operation stops when the distance between the cluster centres and the data points is less than a threshold. This threshold represents the minimum distance between a data point and a cluster centre, guaranteeing an optimal estimate of the number of clusters to solve the overlap problem. This small distance relates each data point to its true cluster, which is generally better than using a larger distance, and it still allows barely detectable clusters to be recognised, reducing the overlap between clusters. The proposed temporal-based membership function is included as a step in the improved FCM, as presented in the following algorithm.

Algorithm. Improved FCM
1: Initialize the membership matrix U^{(0)} = [u_{ij}]
2: Choose a parameter \varepsilon > 0 to stop the iteration
3: Set the iteration counter l to its initial value
4: for each i up to x
5:    Set the time matrix
6: end for
7: At step k, calculate the centre vectors C^{(k)} = [c_j] from U^{(k)} (as in the conventional FCM)
8: Update the membership matrix U^{(k+1)} (as in the conventional FCM)
9: if \lVert U^{(k+1)} - U^{(k)} \rVert > \varepsilon
10:   go to step 7
11: else
12:   stop at iteration l
13: end if
14: Calculate the temporal function and map it to the new membership function µnew_ij
15: if … go to step …
16: else go to step …
17: end if

The new membership function gives the probability that a data point (pixel) belongs to a cluster, and t is the time associated with each input vector. Each vector (data point) corresponds to a time value stored in the time matrix. The function combines the memberships, in each cluster, of the data points whose times fall within a square array of time values centred on t in the temporal domain, where c is the number of cluster centres; µnew_ij denotes the new membership in the temporal domain. In this study, a square array of the time matrix (the SQ matrix), associated with the matrix of data points, is used for ease of representation. The temporal function of a data point for a cluster is large if the majority of its neighbours belong to the same cluster. The temporal function supports the conventional membership function for normal datasets, and for multi-label datasets it reduces the overlapping weight by reprocessing overlapped data points that relate to more than one cluster, applying time-factor matching in the temporal membership function.

Results
In this section, the results of the experiments are presented. The first table below gives the iteration count that yields the minimum objective function, and the second gives the average distance and the objective function value for the three datasets.

Table. Iteration count that leads to the minimum objective function
Dataset   Conventional FCM   Improved FCM
Medical
Enron
Bibtex

Table. Average distance and objective function for the three datasets
Method             Medical                        Enron                          Bibtex
                   Avg. distance   Obj. function  Avg. distance   Obj. function  Avg. distance   Obj. function
Conventional FCM
Improved FCM

For the Medical dataset, the improved FCM is better than the conventional FCM both in the average distance between cluster centres and data points and in the objective function. For the Enron dataset, the improved method is better in objective function, while the difference in average distance between the conventional and improved FCM is small. Similarly, for the Bibtex dataset the average distances of the two methods are close, whereas their objective function values differ. This is because of the high overlap in the data, and it indicates that the conventional FCM is sensitive to overlapping data. Hence, temporal information is useful for mapping a data point to its cluster.

The figure below shows the cluster densities of the datasets under consideration. Clustering performance is strongly affected by data structure and cluster density, and it is poor when cluster densities differ greatly.

Figure. Cluster densities of (a) Medical, (b) Enron and (c) Bibtex.
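The results compare the two methods through the objective function and the average distance between data points and cluster centres. Since the exact temporal equation is not given in this text, the minimal NumPy sketch below assumes one plausible form, borrowed from spatial FCM variants used in image segmentation: for each data point, the memberships of its neighbours in time (a window of `half_window` positions on either side in time order, standing in for the square array of time values) are summed per cluster, multiplied into the point's own membership, and renormalised. The average distance is assumed to be the mean distance from each point to the centre of its highest-membership cluster. All function names and parameters are hypothetical.

```python
import numpy as np


def temporal_membership(U, times, half_window=1):
    """Hypothetical temporal membership step (assumed form, not the paper's equation).

    U: (n, c) memberships from conventional FCM; times: (n,) time stamps.
    For point i, h_ij sums the memberships in cluster j of the points lying
    within half_window positions of i in time order (i itself included).
    The new membership is u_ij * h_ij, renormalised so each row sums to 1.
    """
    order = np.argsort(times, kind="stable")
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))  # position of each point in time order
    n = U.shape[0]
    H = np.empty_like(U)
    for i in range(n):
        lo = max(rank[i] - half_window, 0)
        hi = min(rank[i] + half_window, n - 1)
        H[i] = U[order[lo:hi + 1]].sum(axis=0)
    new = U * H
    return new / new.sum(axis=1, keepdims=True)


def objective(X, U, C, m=2.0):
    """J_m = sum_i sum_j u_ij^m ||x_i - c_j||^2."""
    D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return float(np.sum(U ** m * D ** 2))


def average_distance(X, U, C):
    """Assumed definition: mean distance to the centre of each point's best cluster."""
    D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return float(D[np.arange(len(X)), U.argmax(axis=1)].mean())


# Toy data: point 2 is ambiguous (0.5/0.5) but its neighbours in time favour cluster 0.
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.7, 0.3], [0.1, 0.9]])
times = np.array([0, 1, 2, 3, 4])
X = np.array([[0.0], [0.2], [0.5], [0.4], [1.0]])
C = np.array([[0.1], [0.9]])

U_t = temporal_membership(U, times)
print(np.round(U_t[2], 3))  # [0.667 0.333]
print(objective(X, U, C), objective(X, U_t, C))
print(average_distance(X, U, C))
```

The ambiguous point's neighbours sum to (2.0, 1.0) across the two clusters, so its 0.5/0.5 membership becomes 2/3 versus 1/3: the temporal step resolves the overlap in favour of the cluster its neighbours in time belong to.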
As the cluster-density figure shows, the Medical dataset has highly different cluster densities, which affects clustering performance, whereas the Enron and Bibtex datasets have similar cluster densities. Two further figures show the progress of the objective function for the conventional FCM and for the improved FCM. From these figures, it can be concluded that the improved FCM is better than the conventional FCM in terms of both iteration count and objective function value: for each dataset, the number of iterations needed to reach the minimum objective function is smaller than with the conventional FCM. The value of the objective function affects clustering accuracy, because when a data point is close to the cluster centre, the objective function becomes small and the membership value becomes high.

Figure. Improved FCM objective function for (a) Medical, (b) Enron and (c) Bibtex.

Figure. Conventional FCM objective function for (a) Medical, (b) Enron and (c) Bibtex.

Conclusion
The Fuzzy C-means (FCM) algorithm is one of the most well-known clustering algorithms. Nevertheless, it does not use the temporal information contained in the data to create clusters based on temporal matching. In this paper, a new membership function is proposed for inclusion in FCM. The memberships of the neighbours in the temporal domain are aggregated to obtain the probability that a data point belongs to a specific cluster. The aim of the improved FCM is to produce quality clusters for multi-labelled datasets. The experiments revealed that the proposed method minimizes the objective function value while requiring fewer iterations.

References
Niu Q, Huang X. An improved fuzzy c-means clustering algorithm based on PSO. Journal of Software.
Lou X, Li J, Liu H. Improved fuzzy c-means clustering algorithm based on cluster density. Journal of Computational Information Systems.

Chattopadhyay S, Pratihar DK, Sarkar SCD. A comparative study of fuzzy C-means algorithm and entropy-based fuzzy clustering algorithms. Computing and Informatics.
Zanaty EA. An adaptive fuzzy C-means algorithm for improving MRI segmentation. Open Journal of Medical Imaging.
Blacknell D, Griffiths H. Radar Automatic Target Recognition (ATR) and Non-Cooperative Target Recognition (NCTR).
Kannan SR, Ramathilagam S, Pandiyarajan R. Modified bias field fuzzy C-means for effective segmentation of brain MRI. In: Gavrilova ML, editor. Transactions on Computational Science VIII. Springer-Verlag.
Lu Y, Ma T, Yi C, Xie X, Tian W, Zhong S. Implementation of the fuzzy C-means clustering algorithm in meteorological data. International Journal of Database Theory and Application.
Kaur P, Soni AK, Gosain A. RETRACTED: A robust kernelized intuitionistic fuzzy C-means clustering algorithm in segmentation of noisy medical images. Pattern Recognition Letters.
William R. Hierarchical Temporal Memory Cortical Learning algorithm for pattern recognition. ProQuest, UMI Dissertation Publishing.
Jun W, Shi-Tong W. Double indices FCM algorithm based on hybrid distance metric learning. Journal of Software.
Grabusts P. The choice of metrics for clustering algorithms. International Scientific and Practical Conference. Rēzekne: Rēzeknes Augstskola, Izdevniecība.
Cai W, Chen S, Zhang D. Fast and robust fuzzy C-means clustering algorithms incorporating local information for image segmentation. Pattern Recognition Letters.
Mullner D. Modern hierarchical, agglomerative clustering algorithms. arXiv.
Tsai D-M, Lin C-C. Fuzzy C-means based clustering for linearly and nonlinearly separable data. Pattern Recognition Letters.
Schwämmle V, Jensen ON. A simple and fast method to determine the parameters for fuzzy C-means cluster analysis. Bioinformatics.