On Sample Weighted Clustering Algorithm using Euclidean and Mahalanobis Distances

International Journal of Statistics and Systems, ISSN 0973-2675, Volume 12, Number 3 (2017), pp. 421-430
Research India Publications, http://www.ripublication.com

R. Deepana
Department of Statistics, Pondicherry University, India.
Corresponding author e-mail: deepana192007@gmail.com

Abstract

This paper investigates the clustering problem with different weights assigned to the observations. A weight selection procedure is proposed based on the Euclidean and Mahalanobis distance measures. The clustering results of the proposed weighted k-means algorithm are compared with those of the k-means algorithm. The numerical illustrations show that the proposed algorithm produces better groupings than the k-means algorithm.

Keywords: Euclidean Distance, Mahalanobis Distance, K-means Algorithm, Adjusted Rand Index

1. INTRODUCTION

Clustering is one of the most important unsupervised learning problems. Cluster analysis deals with problems where the groups are themselves unknown a priori and the primary purpose of the data analysis is to determine the groupings from the data themselves, so that the entities within the same group are in some sense more similar or homogeneous than those belonging to different groups. Cluster analysis is also known as segmentation analysis or taxonomy analysis [Jain and Dubes (1988)]. Cluster analysis has been widely used in several disciplines such as Statistics, Software Engineering, Biology, Psychology and other Social Sciences in order to identify natural groups in large amounts of data. Clustering has also been widely adopted by researchers within computer science, especially the database community.

The majority of clustering techniques start with the computation of similarity or distance measures. The most commonly used distance measures are the Euclidean, Mahalanobis and Minkowski distances [Everitt et al. (2011)]. Clustering methods are broadly classified into hierarchical and non-hierarchical methods. Non-hierarchical cluster analysis aims to find a grouping of objects which maximizes or minimizes some evaluation criterion. The k-means clustering algorithm is a well known non-hierarchical method. In the k-means algorithm, K points are first chosen as the initial centroids from the given dataset. Each data point is assigned to the closest centroid, and the centroid of each cluster is updated based on the points assigned to it. The assignments are repeated until the centroids are the same in two successive iterations. However, the k-means algorithm has some drawbacks: it is strongly sensitive to outliers, and it is difficult to fix the initial number of clusters.

To overcome the above mentioned drawbacks of the k-means algorithm, a lot of work has been done on weighted k-means algorithms by various researchers. Gnanadesikan et al. (1995) discussed the weighting and selection of variables in cluster analysis, using range scaling, weight scaling and auto scaling for the variables; experimental results on cluster recovery, variable selection and variable weighting for simulated and real datasets are discussed in their paper. Gnanadesikan et al. (2007) discussed different methods of scaling variables and determining their weights, using statistical measures such as sample variances, reciprocals of squared sample ranges and interquartile ranges; these weights do not depend on the particular clustering method. Hung et al. (2011) proposed a weighted k-means algorithm based on statistical variation, with two attribute-weight initializations derived from the variation in the data, and compared the clustering results of the weighted k-means algorithm under the different initialization methods. De Amorim and Komisarczuk (2012) developed the Minkowski weighted k-means algorithm, in which weights are computed for the features at each cluster; they compared six initialization methods in the Minkowski metric space in terms of accuracy and processing time, and found that, except for Ward's method, the initializations had poor accuracy for high-dimensional features. Melnykov and Melnykov (2014) proposed a k-means algorithm using the Mahalanobis distance with a novel approach for initializing the covariance matrices, since the Mahalanobis distance is not effective without a proper covariance matrix initialization strategy. De Amorim and Makarenkov (2016) introduced the Minkowski weighted k-means algorithm with distributed centroids for numerical, categorical and mixed types of data, and used the silhouette and Adjusted Rand Index (ARI) methods for evaluating cluster quality.
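For reference, the standard k-means loop summarized at the start of this section can be sketched as follows. This is a minimal illustration under assumed names (X for the data matrix, K for the number of clusters), not code from any of the cited papers.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # choose K data points as the initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # assignment step: squared Euclidean distance to every centroid
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):   # unchanged in two successive iterations
            break
        centroids = new_centroids
    return labels, centroids
```

The weighted variants reviewed here, and the method proposed below, modify the assignment and update steps of this loop by attaching a weight to each sample.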

Yu et al. (2011) considered a probability distribution to represent the sample weights for a dataset and applied maximum entropy methods to compute the sample weights for clustering methods such as k-means, fuzzy c-means and expectation-maximization. However, the clustering results are affected by the initial centroids and the initial weights. Huang et al. (2005) proposed a weighted k-means algorithm in which the weights are used to reduce the influence of noise variables. Yedla et al. (2010) proposed an algorithm for choosing the initial centroids for the k-means algorithm.

The review of research papers indicates that a lot of work has been carried out on variable weights and sample weights in cluster analysis. It can also be seen that not all samples in a dataset are equally important; there may be outliers in the dataset. In such cases, it is desirable to assign different weights to the observations. In this paper, an attempt has been made to obtain better clusters and to improve their accuracy by introducing a sample weighted clustering algorithm. The sample weights are calculated using two distance measures, namely the Euclidean and Mahalanobis distances between each sample and the cluster centroids. The weighted samples are assigned to the clusters based on the k-means criterion.

The rest of the paper is organized as follows. Section 2 describes the proposed sample weighted clustering algorithm. Section 3 investigates the performance of the proposed method on real datasets and compares it with the k-means algorithm. The conclusion is given in Section 4.

2. SAMPLE WEIGHTED CLUSTERING ALGORITHM

The present work is an attempt to obtain better clusters by introducing weighted samples in k-means clustering. In k-means clustering an observation is assigned to a cluster based on the minimum distance between the observation and the cluster centroid, so the initial cluster centroids play a pivotal role in the allocation of samples to clusters. Generally, in the k-means algorithm the initial centroids are chosen randomly. In this paper, a simple method of choosing the initial centroids is suggested. The initial centroids are chosen based on the following criteria.

The Euclidean distance d_i from the origin is calculated for each sample.
A new distance E_{d_i} = \exp(d_i) is calculated for each sample.
The calculated new distances are arranged in ascending order and the samples are rearranged according to the sorted distances.
The sorted samples are grouped into K clusters and the centroid for each cluster is chosen from the samples of that group.

The distance measure used in the clustering algorithm also affects the final cluster results, due to the nature of the sample observations. Generally, the k-means algorithm uses the Euclidean distance measure

D_{ik} = \sum_{j=1}^{p} (x_{ij} - C_{kj})^2, \quad k = 1, 2, \ldots, K,

which is suitable for clusters with spherical, homogeneous covariance matrices. The Mahalanobis distance measure

D_{ik} = (x_i - C_k)^T \Sigma^{-1} (x_i - C_k), \quad k = 1, 2, \ldots, K,

with covariance matrix \Sigma, is used when the clusters are elliptical in shape.
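The initial-centroid selection and the two distance measures just described can be sketched as follows, assuming NumPy. The way the sorted samples are split into K groups and the choice of one sample per group as its centroid are assumptions where the description is ambiguous; the function names are illustrative.

```python
import numpy as np

def init_centroids(X, K, seed=0):
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X, axis=1)        # Euclidean distance of each sample from the origin
    e = np.exp(d)                        # transformed distance E_{d_i} = exp(d_i)
    order = np.argsort(e)                # samples rearranged by the new distance, ascending
    groups = np.array_split(order, K)    # sorted samples grouped into K groups (contiguous split assumed)
    # one sample per group is taken as that cluster's initial centroid (random pick assumed)
    return np.vstack([X[rng.choice(g)] for g in groups])

def euclidean_sq(X, centroids):
    # D_ik = sum_j (x_ij - C_kj)^2
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def mahalanobis_sq(X, centroids, cov):
    # D_ik = (x_i - C_k)^T Sigma^{-1} (x_i - C_k)
    inv_cov = np.linalg.inv(cov)
    diff = X[:, None, :] - centroids[None, :, :]     # shape (n, K, p)
    return np.einsum('nkp,pq,nkq->nk', diff, inv_cov, diff)
```

Since the exponential is monotone, sorting by E_{d_i} gives the same ordering as sorting by d_i; the transform is kept here only to mirror the procedure described above.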

In the present work, the two distance measures are used in calculating the weights for the samples. The weights are obtained by minimizing the objective function

M = \sum_{i=1}^{n} \sum_{k=1}^{K} (W_{ik})^{\alpha} D_{ik}                              (1)

subject to \sum_{i=1}^{n} W_{ik} = 1, k = 1, 2, \ldots, K, and W_{ik} \geq 0,

where D_{ik} is the distance between the i-th sample and the k-th cluster centroid (Euclidean or Mahalanobis) and W_{ik} is the weight of the i-th sample assigned to the k-th cluster. The weights derived by minimizing the objective function are given by

W_{ik} = \frac{1}{\sum_{t} \left( D_{ik} / D_{tk} \right)^{\frac{1}{\alpha - 1}}}, \quad k = 1, 2, \ldots, K,                              (2)

where the sum runs over the samples t assigned to cluster k and D_{tk} is the distance between the t-th sample in the k-th cluster and the k-th cluster centroid. Thus W_{ik} is the reciprocal of the sum of the ratios D_{ik}/D_{tk} raised to the power 1/(\alpha - 1). Here \alpha is the weight parameter; its main use is to reduce the influence of noise points on the clustering results. The value of \alpha affects the clustering results: if \alpha is too large it leads to poor performance, or it allows more noise points to affect the clustering, and if \alpha = 1 the method is equivalent to k-means clustering without weights. In general, it is suggested to take values of \alpha between 2.001 and 2.09.

A second set of weights, based on the 1/(\alpha - 1)-th power of the ratio of the sum of distances between the samples in each group and its cluster center to the total sum of distances between the samples in all groups and their cluster centers, is also calculated and used in the clustering algorithm:

W_{ik} = \left[ \frac{\sum_{m=1}^{n_k} D_{mk}}{\sum_{t=1}^{K} \sum_{m=1}^{n_t} D_{mt}} \right]^{\frac{1}{\alpha - 1}},                              (3)

where n_k denotes the number of samples in the k-th cluster.
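A minimal sketch of the two weighting rules is given below, assuming NumPy, a distance matrix D of shape (n, K) (for example from the functions sketched earlier) and a current assignment vector labels. The grouping used for Equation (3) follows the reading given above and should be treated as an assumption rather than a definitive implementation.

```python
import numpy as np

def weights_eq2(D, labels, alpha):
    # W_ik = 1 / sum_t (D_ik / D_tk)^(1/(alpha-1)), the sum running over samples t in cluster k
    n, K = D.shape
    D = np.maximum(D, 1e-12)                 # guard against zero distances
    p = 1.0 / (alpha - 1.0)
    W = np.zeros((n, K))
    for k in range(K):
        Dtk = D[labels == k, k]              # distances of the members of cluster k to centroid k
        for i in range(n):
            W[i, k] = 1.0 / np.sum((D[i, k] / Dtk) ** p)
    return W

def weights_eq3(D, labels, alpha):
    # W_ik = [ within-cluster distance sum of cluster k / total within-cluster distance sum ]^(1/(alpha-1))
    n, K = D.shape
    p = 1.0 / (alpha - 1.0)
    within = np.array([D[labels == k, k].sum() for k in range(K)])
    ratio = within / within.sum()
    return np.tile(ratio ** p, (n, 1))       # every sample of a given cluster receives the same weight
```

With \alpha in the suggested range 2.001 to 2.09, the exponent 1/(\alpha - 1) stays close to 1, so the weights vary smoothly with the distances.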

The proposed sample weighted clustering algorithm is described below.

1. Choose the number of clusters K for the dataset and the initial centroids.
2. Calculate the distance between each sample x_i and each cluster center C_k using the Euclidean or Mahalanobis distance measure, and assign each sample to the cluster whose center is closest.
3. Calculate the weights for each sample using Equations (2) and (3).
4. Recalculate the new cluster centers using the formula C_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i, where |C_k| denotes the number of data points in the k-th cluster.
5. Cluster update: assign each object to its closest centroid.
6. Update the weights in Equation (1).
7. Repeat steps (2) to (6) until the cluster centers remain the same.

3. EXPERIMENTAL RESULTS

Three real datasets are used to compare the proposed algorithm with the k-means algorithm. The datasets are taken from the UCI Machine Learning Repository. The Iris dataset contains three classes of iris flowers: setosa, versicolor and virginica. It consists of 150 observations, 50 per class, with four attributes, namely sepal length, sepal width, petal length and petal width [https://archive.ics.uci.edu/ml/datasets/iris]. The Wine dataset contains the results of a chemical analysis of wines grown in the same region of Italy by three different cultivars. It contains 178 observations in three classes with 13 attributes: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines and proline [https://archive.ics.uci.edu/ml/datasets/wine]. The Ionosphere dataset consists of 351 observations with 34 variables from two clusters [https://archive.ics.uci.edu/ml/datasets/Ionosphere].

In this paper, the weighted k-means algorithm is run more than 50 times. In the first iteration weights are not used; after the first iteration, the weights based on Equations (2) and (3) are used for the further analysis. The Adjusted Rand Index (ARI) is used for cluster validation; the value of the ARI lies between 0 and 1 [Everitt et al. (2011)]. The Adjusted Rand Index and the clustering accuracy are computed in this paper. The two weighted clustering methods are compared with the original k-means clustering on the three real datasets. The best classifications of the datasets are shown in Figures 1, 2 and 3. The misclassification rate, ARI and accuracy of the clustering are shown in Tables 1, 2 and 3.
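As an illustration of the evaluation step, the ARI and the clustering accuracy can be computed as sketched below, assuming scikit-learn and SciPy are available; the small label arrays are dummy placeholders, while in the paper the known Iris, Wine and Ionosphere classes are used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, confusion_matrix

def clustering_accuracy(true_labels, pred_labels):
    # Match cluster ids to class ids with the Hungarian algorithm on the confusion
    # matrix, then report the fraction of correctly classified samples.
    cm = confusion_matrix(true_labels, pred_labels)
    row, col = linear_sum_assignment(-cm)      # maximize the matched counts
    return cm[row, col].sum() / cm.sum()

true_labels = np.array([0, 0, 1, 1, 2, 2])     # dummy example labels
pred_labels = np.array([1, 1, 0, 0, 2, 2])
print(adjusted_rand_score(true_labels, pred_labels))   # ARI
acc = clustering_accuracy(true_labels, pred_labels)
print(acc, 1.0 - acc)                                  # accuracy and misclassification rate
```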

Three types of centroids are used in this work: random centroids, median centroids and the proposed centroids.

Figure 1. Scatter plots for the Iris dataset with three clusters labelled 1, 2, 3. a) k-means using the Euclidean distance; b) k-means using the Mahalanobis distance; c) k-means using the Euclidean distance with weights-1; d) k-means using the Mahalanobis distance with weights-1; e) k-means using the Euclidean distance with weights-2; f) k-means using the Mahalanobis distance with weights-2.

Table 1 gives the clustering results for the Iris dataset. The k-means algorithm with median centroids converged in 11 iterations with 84.4% correct classification using the Euclidean distance and in 10 iterations with 85.1% using the Mahalanobis distance. The k-means algorithm with the proposed centroids converged in 13 iterations with 83.2% correct classification using the Euclidean distance and in 13 iterations with 89.1% using the Mahalanobis distance. The sample weights based on Equation (2) are then used in the k-means algorithm; the results of the sample weighted algorithm with the Euclidean distance (ED-1) and the Mahalanobis distance (MD-1) are also shown in Table 1. The proposed sample weighted clustering method with all three types of centroids gives better classification than the k-means algorithm. MD-1 with median centroids converged in 8 iterations with 93% correct classification. The sample weighted clustering algorithm with the proposed centroids converged in 10 iterations with 85% correct classification using the Euclidean distance and in 12 iterations with 92% using the Mahalanobis distance. The sample weights based on Equation (3) are denoted ED-2 (Euclidean distance) and MD-2 (Mahalanobis distance). The proposed algorithm with median centroids converged in 10 iterations with 92% correct classification using the Euclidean distance and in 8 iterations with 97% using the Mahalanobis distance. Finally, with the centroids chosen according to the proposed method, the weighted clustering algorithm converged in 6 iterations with 85% correct classification using the Euclidean distance and in 7 iterations with 90% using the Mahalanobis distance.

The ARI for the Iris dataset using the Equation (3) sample weights with the proposed centroids (0.927) gives a good result.

Table 1. Experimental results: results provided by the k-means and weighted k-means algorithms for the Iris dataset.

         Misclassification Rate        ARI                           Accuracy
Method   Random   Median   Proposed    Random   Median   Proposed    Random   Median   Proposed
ED       0.233    0.073    0.081       0.717    0.717    0.770       0.738    0.846    0.833
MD       0.183    0.083    0.153       0.832    0.906    0.753       0.727    0.852    0.892
ED-1     0.122    0.162    0.223       0.850    0.741    0.755       0.804    0.843    0.751
MD-1     0.095    0.101    0.023       0.732    0.903    0.902       0.826    0.937    0.922
ED-2     0.101    0.073    0.102       0.800    0.801    0.823       0.816    0.927    0.852
MD-2     0.038    0.056    0.052       0.810    0.922    0.927       0.857    0.973    0.901

Table 2 shows the clustering results for the Wine dataset. The k-means algorithm with median centroids converged in 14 iterations with 78% correct classification using the Euclidean distance and in 11 iterations with 85% using the Mahalanobis distance. With the sample weights based on Equation (2) and the proposed centroids, the algorithm converged in 8 iterations with 88% correct classification using the Euclidean distance (ED-1) and in 10 iterations with 97% using the Mahalanobis distance (MD-1). The proposed algorithm with median centroids converged in 10 iterations with 92% correct classification using the Euclidean distance and in 8 iterations with 97% using the Mahalanobis distance. With the sample weights based on Equation (3) and the proposed centroids, the algorithm converged in 7 iterations with 87.8% correct classification using the Euclidean distance (ED-2) and in 9 iterations with 98.3% using the Mahalanobis distance (MD-2). The ARI for the Wine dataset using the Equation (2) sample weights with median centroids and the Mahalanobis distance (MD-1) is 0.927, and the ARI of MD-2 with the proposed centroids is 0.973.

Figure 2. Scatter plots for the Wine dataset with three clusters labelled 1, 2, 3. a) k-means using the Euclidean distance; b) k-means using the Mahalanobis distance; c) k-means using the Euclidean distance with weights-1; d) k-means using the Mahalanobis distance with weights-1; e) k-means using the Euclidean distance with weights-2; f) k-means using the Mahalanobis distance with weights-2.

Table 2. Experimental results: results provided by the k-means and weighted k-means algorithms for the Wine dataset.

         Misclassification Rate        ARI                           Accuracy
Method   Random   Median   Proposed    Random   Median   Proposed    Random   Median   Proposed
ED       0.293    0.255    0.202       0.461    0.406    0.523       0.735    0.785    0.738
MD       0.087    0.083    0.052       0.571    0.852    0.893       0.712    0.852    0.835
ED-1     0.267    0.238    0.192       0.532    0.427    0.567       0.832    0.811    0.882
MD-1     0.057    0.055    0.025       0.725    0.943    0.922       0.852    0.925    0.973
ED-2     0.253    0.225    0.171       0.709    0.521    0.521       0.809    0.840    0.878
MD-2     0.255    0.011    0.023       0.758    0.965    0.973       0.750    0.978    0.985

Table 3 shows the clustering results for the Ionosphere dataset. Comparing all methods on the Ionosphere dataset, the clustering method ED-1 with median centroids converged in 13 iterations with 80% correct classification and MD-1 converged in 10 iterations with 71.7%. With the proposed centroids, the algorithm converged in 12 iterations with 82.8% correct classification using the Euclidean distance (ED-2) and in 11 iterations with 75.3% using the Mahalanobis distance (MD-2). The ARI value of the MD-2 method with the proposed centroids is 0.782.

Figure 3. Scatter plots for the Ionosphere dataset with two clusters labelled 1, 2. a) k-means using the Euclidean distance; b) k-means using the Mahalanobis distance; c) k-means using the Euclidean distance with weights-1; d) k-means using the Mahalanobis distance with weights-1; e) k-means using the Euclidean distance with weights-2; f) k-means using the Mahalanobis distance with weights-2.

Table 3. Experimental results: results provided by the k-means and weighted k-means algorithms for the Ionosphere dataset.

         Misclassification Rate        ARI                           Accuracy
Method   Random   Median   Proposed    Random   Median   Proposed    Random   Median   Proposed
ED       0.392    0.110    0.362       0.023    0.330    0.528       0.593    0.763    0.682
MD       0.328    0.051    0.232       0.153    0.532    0.690       0.558    0.612    0.672
ED-1     0.421    0.154    0.208       0.523    0.539    0.568       0.601    0.803    0.713
MD-1     0.308    0.047    0.192       0.452    0.587    0.728       0.567    0.717    0.623
ED-2     0.326    0.110    0.223       0.523    0.533    0.572       0.658    0.823    0.823
MD-2     0.235    0.061    0.210       0.279    0.502    0.782       0.721    0.675    0.752

The experimental results show that the proposed method provides better results and higher accuracy than the k-means algorithm.

4. CONCLUSION

This work examined two different methods of computing sample weights. This is achieved by developing a new procedure that generates a weight for each sample with respect to each cluster. The sample weights obtained by the proposed methods have been compared with each other, and the observed results have been compared with those of the original k-means algorithm.

The experimental results show the effectiveness of the new algorithm.

REFERENCES

[1] Anil K. Jain and Richard C. Dubes (1988), Algorithms for Clustering Data, Prentice Hall Inc.
[2] Brian S. Everitt, Sabine Landau, Morven Leese and Daniel Stahl (2011), Cluster Analysis, 5/e, Wiley Series in Probability and Statistics.
[3] Gnanadesikan R, Kettenring J R and Tsao S L (1995), Weighting and Selection of Variables for Cluster Analysis, Journal of Classification, 12(1), 113-136.
[4] Gnanadesikan R, Kettenring J R and Srinivas Maloor (2007), Better Alternatives to Current Methods of Scaling and Weighting Data for Cluster Analysis, Journal of Statistical Planning and Inference, 137(11), 3483-3496.
[5] Igor Melnykov and Volodymyr Melnykov (2014), On K-means Algorithm with the Use of Mahalanobis Distances, Statistics and Probability Letters, 84(C), 88-95.
[6] Joshua Zhexue Huang, Michael K. Ng, Hongqiang Rong and Zichen Li (2005), Automated Variable Weighting in k-means Type Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 657-668.
[7] Jian Yu, Miin-Shen Yang and E. Stanley Lee (2011), Sample Weighted Clustering Methods, Computers and Mathematics with Applications, 62, 2200-2208.
[8] Madhu Yedla, Sandeep Malle and Srinivasa T M (2010), An Improved K-means Clustering Algorithm with Refined Initial Centroids, Journal of Networking Technology, 1(3), 113-117.
[9] Renato Cordeiro de Amorim and Peter Komisarczuk (2012), On Initializations for the Minkowski Weighted K-means, Lecture Notes in Computer Science (IDA 2012), 7619, 45-55.
[10] Renato Cordeiro de Amorim and Vladimir Makarenkov (2016), Applying Subclustering and Lp Distance in Weighted K-means with Distributed Centroids, Neurocomputing, 173(3), 700-707.
[11] Wen-Liang Hung, Yen-Chang Chang and E. Stanley Lee (2011), Weight Selection in W-K-means Algorithm with an Application in Color Image Segmentation, Computers and Mathematics with Applications, 62(2), 668-676.