Ensemble Fuzzy Clustering using Cumulative Aggregation on Random Projections


IEEE TRANSACTIONS ON FUZZY SYSTEMS

Ensemble Fuzzy Clustering using Cumulative Aggregation on Random Projections

Punit Rathore, Member, IEEE, James C. Bezdek, Life Fellow, IEEE, Sarah M. Erfani, Sutharshan Rajasegarar and Marimuthu Palaniswami, Fellow, IEEE

Abstract—Random projection is a popular method for dimensionality reduction due to its simplicity and efficiency. In the past few years, random projection and fuzzy c-means based cluster ensemble approaches have been developed for high dimensional data clustering. However, they require large amounts of space for storing a big affinity matrix, and incur large computation time while clustering in this affinity matrix. In this paper, we propose a new random projection, fuzzy c-means based cluster ensemble framework for high dimensional data. Our framework uses cumulative agreement to aggregate fuzzy partitions. Fuzzy partitions of random projections are ranked using external and internal cluster validity indices. The best partition in the ranked queue is the core (or base) partition. Remaining partitions then provide cumulative inputs to the core, thus arriving at a consensus best overall partition built from the ensemble. Experimental results with Gaussian mixture datasets and a variety of real datasets demonstrate that our approach outperforms three state-of-the-art methods in terms of accuracy and space-time complexity. Our algorithm runs one to two orders of magnitude faster than other state-of-the-art algorithms.

Index Terms—High Dimensional Data, Fuzzy Clustering, Random Projection, Ensemble Clustering, Cumulative Agreement.

I. INTRODUCTION

Clustering is an essential method of exploratory data analysis in which data are partitioned into several subsets such that objects in each subset are similar to each other, and dissimilar to members of other subsets. Clustering is an underlying tool for knowledge discovery [1], outlier/anomaly detection [2]–[5], indexing [6], and compression [7].
With the rapid advancement of Internet of Things (IoT) technologies, mobile computing, smart mobile devices, and social network services, data are growing at very fast rates. Many biomedical applications such as physiological monitoring, imaging, and sequencing [8] produce large amounts of high-dimensional data [9]. This article is about clustering algorithms that can be used for such large, high dimensional datasets. High dimensional feature vector data, i.e., data described by a large number of attributes, poses two challenges for clustering. First, the so-called curse of dimensionality, which is caused by the lack of a sufficient number of samples in most high dimensional data, makes it difficult to find statistically meaningful structures in the data [10]. Second, noisy and irrelevant attributes in the data can worsen the performance of a clustering algorithm. One possible solution to improve the utility of clustering algorithms for high dimensional data is to perform dimensionality reduction [11]. Feature subset selection [12] and feature transformations to lower dimensional spaces are two well known methods for dimensionality reduction. Popular algorithms for feature extraction, such as Principal Component Analysis (PCA) [13] and Singular Value Decomposition (SVD) [14], use well-defined criteria to optimize the projection in lower dimensional space.

Punit Rathore and Marimuthu Palaniswami are with the Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville, Victoria, Australia. E-mail: {prathore.student, palani}@unimelb.edu.au. James C. Bezdek and Sarah M. Erfani are with the School of Computing and Information Systems, The University of Melbourne, Victoria, Australia. E-mail: {jbezdek, sarah.erfani}@unimelb.edu.au. Sutharshan Rajasegarar is with the School of Information Technology, Deakin University, Geelong, Victoria, Australia. E-mail: sutharshan.rajasegarar@deakin.edu.au.
Unlike these algorithms, random projection [15]–[17] is a relatively simple, computationally efficient linear transformation method which does not use any special criteria to find "optimal" lower dimensional projections. Two key properties, namely low computational complexity and (approximate) distance preservation in lower dimension subspaces, make random projection [16] an attractive choice for dimensionality reduction. Over the past few years, ensemble clustering has drawn significant attention in addressing the clustering problem. Random projection based ensemble frameworks [18]–[21] have been proposed for high-dimensional clustering using fuzzy or probabilistic clustering algorithms. These approaches use random projection to generate multiple subsets in a lower dimension from the original dataset, and then some method of integration is used across the soft clustering results obtained on all projected datasets. Among these random projection based fuzzy clustering approaches, the most recent approaches [20], [21] require less memory and run faster than earlier approaches [18], [19]. However, the ensemble algorithms developed in [20], [21] still require very large amounts of space for storing a big affinity matrix; moreover, they take a lot of time to cluster the affinity matrix. Generating and combining multiple output partitions from clustering has been done in several ways [22]–[28]. However, most of the existing merging algorithms suffer from time and/or space complexity problems. Among these approaches, agreement (voting) based merging [25]–[28] is the most popular and relatively computationally efficient approach. To the best of our knowledge, none of the algorithms based on merging cluster ensembles using the agreement approach have been studied for large and high-dimensional datasets. In this paper, we propose a new, simple and efficient random projection based ensemble framework using a cumulative agreement scheme to aggregate multiple fuzzy membership

matrices based on their quality. Cluster Validity Indices (CVIs) are used to determine the quality of consensus partitions. This framework eliminates the need for a final time-consuming clustering step such as the ones reported in [19]–[21] to obtain output partitions. Our aggregation method employs an agreement based approach [27], [28], which, to our knowledge, has previously been studied for only crisp partitions. Our algorithm extends this idea to the soft case for effective aggregation of fuzzy partitions, which are obtained using the Fuzzy c-Means (FCM) clustering algorithm [29] on randomly projected datasets. The ensemble approach used in our framework combines fuzzy partitions in a sequential manner, thus avoiding the complexity required by simultaneous aggregation of the suite of fuzzy partitions produced by clustering many random projections of the high dimensional data. Our method, which we call Cumulative Agreement FCM (CAFCM), scales linearly in the number of data points and the number of repetitions, making our random projection based ensemble approach feasible for large and high dimensional datasets. We evaluate the performance of our proposed framework on two synthetic and six real high dimensional datasets to demonstrate its superiority and robustness over three state-of-the-art approaches.

Here is an outline of the rest of this article. Section II presents preliminaries on fuzzy and crisp partitions and random projection methods. Section III presents a review of related work. Our agreement based aggregation model is discussed in Section IV. Section V describes the use of CVIs in our framework to achieve the best performance. Section VI presents the proposed framework for Cumulative Agreement Fuzzy c-Means (CAFCM) for ensemble fuzzy clustering which uses random projection and cumulative agreement. Section VII discusses the numerical experiments and results, followed by the conclusions and discussion in Section VIII.

II.
PRELIMINARIES

In this section, we introduce our notation for crisp and soft partitions and present the random projection method.

A. Matrix Representation for Fuzzy and Crisp Partitions

Consider a set of n objects O = {o_1, o_2, ..., o_n}, where each object is defined by a set of features in the form of X = {x_1, x_2, ..., x_n} ⊂ R^p. The non-degenerate (no zero rows corresponding to empty clusters) soft (fuzzy/probabilistic) and crisp c-partitions of n objects are matrices, denoted as:

M_fcn = {U ∈ R^{c×n} | ∀i ∈ {1,...,c}, ∀j ∈ {1,...,n} : u_ij ∈ [0,1]; Σ_{i=1}^{c} u_ij = 1 ∀j; Σ_{j=1}^{n} u_ij > 0 ∀i},   (1a)
M_hcn = {U ∈ M_fcn | u_ij ∈ {0,1}, ∀i, j},   (1b)

where u_ij represents the membership of data point j in cluster i for fuzzy clustering. If the clustering is probabilistic, the value u_ij = p_ij of data point j is the posterior probability that, given point j, it came from class i. Soft partitions are more flexible than crisp partitions in that each object can have membership in more than one cluster. In this paper, FCM is used to generate soft partitions in random projections of X. However, our ensemble approach for high dimensional data clustering is equally applicable to probabilistic clustering algorithms such as the Gaussian Mixture Model (GMM) [30], implemented with the Expectation-Maximization (EM) [31] algorithm.

B. Random Projection

A random projection (RP) is a linear transformation from R^p to R^q, represented by a matrix T. Let X = {x_1, x_2, ..., x_n} ⊂ R^p be a set of n points in p dimensions, denoted as the "upspace". X can be mapped to a reduced dimension dataset Y = {y_1, y_2, ..., y_n} ⊂ R^q, q ≪ p, denoted as the "downspace", by the linear transformation of X with T. Most random projection methods are based on the Johnson-Lindenstrauss (JL) lemma [32]. It is not clear from [15]–[17] which random projection function T is best for clustering, so we will use a variant of the JL lemma proposed by Achlioptas in [16]. The theorem proved by Achlioptas is as follows:

Theorem 1: Let matrix X ∈ R^{n×p} be a dataset of n points and p attributes. Given ε > 0 and β > 0, for any integer

q ≥ q_0 = (4 + 2β) log(n) / (ε²/2 − ε³/3).   (2)

The parameter ε controls the accuracy in distance preservation, while β controls the probability that distance preservation to within 1 ± ε is achieved. Let T be a p × q random matrix, in which each element t_ij is drawn from one of the following independently identically distributed distributions:

t_ij = { +1 with probability 1/2;  −1 with probability 1/2 }   (3)

t_ij = √3 · { +1 with probability 1/6;  0 with probability 2/3;  −1 with probability 1/6 }   (4)

Let Y = (1/√q) XT be the projection matrix of the n points in R^q. Let f : R^p → R^q map the i-th row of X to the i-th row of Y. Then for any u, v ∈ X, with probability at least 1 − n^{−β}, we have

(1 − ε) ‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε) ‖u − v‖².

According to Theorem 1, if the reduced (downspace) dimension q is equal to or bigger than the JL lower bound q_0, then pairwise squared Euclidean distances are preserved within a multiplicative factor of 1 ± ε, and we say that Y has a JL certificate. An older version of this projection operator is based on randomly choosing each element of T from a Gaussian distribution with zero mean and unit variance, which carries a similar guarantee [16], [33]. However, the authors in [34] assert that the JL bound often holds for q ≪ q_0. They called such projections "rogue random projections". We will study the use of rogue random projections in our ensemble clustering approach.
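The projection of Theorem 1 can be sketched in a few lines of NumPy. This is a minimal illustration (not the authors' MATLAB code) using the sparse distribution of Eq. (4) and synthetic Gaussian upspace data; the sizes n, p, q below are arbitrary choices for the demonstration:

```python
import numpy as np

def achlioptas_matrix(p, q, rng):
    # Sparse distribution of Eq. (4): entries are sqrt(3) * {+1 w.p. 1/6,
    # 0 w.p. 2/3, -1 w.p. 1/6}, drawn i.i.d.
    return rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                      size=(p, q), p=[1/6, 2/3, 1/6])

def random_project(X, q, rng):
    # Downspace mapping Y = (1/sqrt(q)) X T of Theorem 1
    T = achlioptas_matrix(X.shape[1], q, rng)
    return (X @ T) / np.sqrt(q)

rng = np.random.default_rng(42)
n, p, q = 100, 1000, 200
X = rng.standard_normal((n, p))      # synthetic upspace data
Y = random_project(X, q, rng)

# Pairwise squared distances should be preserved to within 1 +/- eps
d_up = np.sum((X[0] - X[1]) ** 2)
d_down = np.sum((Y[0] - Y[1]) ** 2)
ratio = d_down / d_up                # close to 1 when q is near the JL bound
```

With q well below p, the ratio concentrates around 1, which is the distance-preservation property that makes RP attractive for clustering.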

III. RELATED WORK

In this section, we review existing random projection based cluster ensemble methods for high dimensional data clustering and agreement based combination schemes.

A. Random Projection Based Ensemble Approaches

Several ensemble approaches have been proposed for high dimensional data clustering, which are based on random projection and fuzzy c-means. The main idea of the existing approaches is as follows: First, multiple downspace datasets {Y_r}, r = 1, ..., N, are generated in a fixed lower dimension R^q using RP, where N is the number of RPs. Then, FCM clustering is performed on each downspace copy to obtain N fuzzy partitions, e.g., U_r = FCM(Y_r), where U_r ∈ M_fcn. These output partitions {U_r} are aggregated using an ensemble scheme. The final output partition is typically obtained by performing soft clustering on the rows of an aggregated matrix.

Apparently, the first cluster ensemble approach that used random projection was proposed in [19], in which GMM/EM clustering was used to obtain probabilistic partitions P_i ∈ M_fcn, where p_i(c, θ) is the probability of point i being in cluster c under a model θ. Subsequently, a similarity matrix M_i was computed between two joint probability distributions for each downspace dataset. The final similarity matrix M was obtained by averaging the M_i's, and then the final clustering output was obtained by applying a hierarchical clustering algorithm, called complete linkage (CL), on the aggregated similarity matrix M. A similar approach using FCM for fuzzy clustering (EFCM) was used in [18] to find the significant genes in DNA micro-array data. Random projection was used to reduce the data dimensionality. Then, the FCM clustering algorithm was employed on each downspace dataset to generate membership matrices U_r ∈ M_fcn. Then for each r, a similarity matrix M_r was computed as M_r = U_r^T U_r ∈ R^{n×n}. Then, an aggregated similarity matrix M was calculated by averaging the N M_r's across multiple projection runs.
The distance matrix D = 1 − M was computed, and then FCM was performed on the rows of D ∈ R^{n×n} to obtain a final membership matrix. Both of the above approaches have space complexity O(n²) for storing the similarity matrix M. There is a time complexity of O(n² log(n)) in applying complete linkage (the GMM/EM based approach) and O(dlnc²) in applying FCM (the EFCM approach) on D ∈ R^{n×n}, where n is the number of data points, d is the dimension of the matrix on which clustering is applied (for the EFCM approach, d = n), c is the number of clusters, and l is the number of iterations used by FCM. There is an additional time complexity of O(cNn²) in the EFCM approach due to computing the product of the N partition matrices and their transposes. Therefore, both of these algorithms are limited to applications for which the number of objects n is small (e.g., some thousands of samples), and the original dimension p of the upspace data is large (e.g., more than tens of thousands). As n increases, the EFCM approach becomes intractable for big data.

To address the limitations of these two approaches for big data clustering, Popescu et al. [20] proposed a new method, RPFCM-A, that began with FCM clustering of random projections of the data. The resultant membership matrices {U_r} were concatenated as U_con = [U_1^T U_2^T ... U_N^T], and the final membership partition was obtained by applying FCM to the rows of the aggregated matrix U_con ∈ R^{n×cN}. Concatenating the transposes of N partitions of dimension c × n side by side results in an n × cN matrix, which is significantly smaller than M_r. This approach eliminates the time complexity spent computing products of the membership matrices and their transposes. Thus, it seems more suitable than the EFCM based approach. However, it still requires the multiplication of the concatenated matrix with its transpose when a crisp output partition is desired. Moreover, this scheme has time complexity of O(dlnc²) when applying FCM to the concatenated matrix U_con ∈ R^{n×cN}, where d = cN.
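The concatenation step of RPFCM-A can be shown in a short NumPy sketch, using hypothetical membership matrices (N = 3 projections, c = 2 clusters, n = 5 points are arbitrary illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
N, c, n = 3, 2, 5                     # hypothetical ensemble sizes
Us = []
for _ in range(N):
    U = rng.random((c, n))
    Us.append(U / U.sum(axis=0))      # columns sum to 1, so each U is in M_fcn

# RPFCM-A aggregation: stack the transposed memberships side by side,
# giving U_con = [U_1^T U_2^T ... U_N^T] of size n x cN (here 5 x 6).
U_con = np.hstack([U.T for U in Us])
```

Each row of U_con collects one point's memberships across all N projections, which is the representation that RPFCM-A then clusters with FCM.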
If the number of clusters c in the data and the number of downspace datasets N are such that cN > p, the dimension of the agreement matrix becomes higher than the original dimension of the dataset, which makes this approach unsuitable for high dimensional data clustering. Mao et al. [21] proposed a modified approach, RPFCM-B, based on spectral graph partitioning. Instead of considering the full agreement matrix U_con, they performed the clustering on the first c left singular vectors of Û_con, where Û_con = SVD(U_con) ∈ R^{n×c}, which reduces the computational time compared to the RPFCM-A approach. However, there is space complexity of O(cnN), and computational complexity of O(n(cN)²) for the SVD and O(dlnc²) for the FCM clustering, where d = c.

B. Agreement Based Combination Schemes

Among existing ensemble approaches, agreement based merging algorithms are popular due to their simplicity and computational efficiency. The idea of the agreement based combination scheme for fuzzy clustering was first introduced by Dimitriadou et al. [27], which is based on minimizing the average squared distance between ensemble membership partitions and an output optimal partition. This algorithm computes an approximate solution in a sequential manner, in which the best cluster label permutation is obtained for each ensemble partition with respect to a reference partition, followed by updating the reference partition through averaging. However, the determination of the best cluster label for each cluster in a partition for large values of c is a time consuming task due to the computation of squared distances between partitions across each possible permutation of cluster labels. The labelling correspondence problem is solved in [26] using a maximum-likelihood estimate found with the Hungarian method [35], and then plurality voting is applied to obtain an optimal partition. The Hungarian algorithm can be costly because it is O(c³).
The most recent work on consensus clustering employs a voting based mechanism [28], where the cluster label assignment problem is addressed using a contingency matrix, which requires less computation time than that required by previous methods. The study in [28] was limited to crisp partitions. This scheme may not enjoy the same performance for soft partitions, which are obtained from projected datasets

using random projection. This is because random projection produces highly unstable and radically different outputs [19], [33]. Although a fair amount of work has been done on agreement based aggregation schemes, only a few schemes are applicable to soft clustering. In our work, we eliminate the use of FCM clustering on the aggregated matrix to get a final output partition, using an agreement based aggregation scheme which is computationally efficient and easy to implement. Fig. 1 compares the three FCM based schemes in [18], [20] and [21] to our proposed CAFCM method. In the next section, we discuss our agreement based scheme for aggregating the fuzzy partitions {U_r}, r = 1, ..., N, obtained from FCM clustering on N randomly projected datasets.

[Fig. 1: Four methods of ensemble FCM clustering using random projection — EFCM [18], RPFCM-A [20], RPFCM-B [21], and CAFCM (this paper).]

IV. AGREEMENT BASED AGGREGATION MODEL

The objective of an aggregation model is to find a partition U_f which represents a set of N fuzzy partitions {U_r}, the representation being optimal in some well-defined sense. We assume that U_f and the U_r are all the same size (c × n). Let u_i^(r) and u_i^(f) be the label vectors of data point x_i for the partitions U_r and U_f, respectively. That is, u_i^(r) is the i-th column of U_r, and similarly for u_i^(f). The average dissimilarity function h(U_r, U_f) is chosen as an optimality criterion, and can be expressed as the average squared distance between the n columns of U_r and U_f, as [27]

h(U_r, U_f) = (1/n) Σ_{i=1}^{n} ‖u_i^(r) − u_i^(f)‖².   (5)

The computation in equation (5) measures the similarity between U_r and (the unknown solution) U_f on the assumption that the c clusters in U_r and U_f are "aligned", i.e., the rows of U_r and U_f represent the clusters in the same order. This is the so-called "registration problem" in clustering, and care must be taken to ensure that all of the partitions being aggregated are aligned in this sense. This problem is exacerbated when the partitions are fuzzy. We want to relabel the N U_r's so that they are aligned. This ensures that they will be aligned with the unknown U_f. One way to approach this problem is to let Π_b(U_r) represent the mapping of partition U_r to an optimally relabelled partition U_{r,b} with respect to some base (or core) partition U_b. Then, an optimal partition can be obtained as the solution to [27]

U_f = argmin_{U_b ∈ M_fcn} (1/N) Σ_{r=1}^{N} h(Π_b(U_r), U_b).   (6)

The solution of this minimization problem in [27] gives u_i^(f) as the arithmetic mean of u_i^(r) over all partitions. In order to obtain the best cluster label permutation for each ensemble partition, the squared distance (minimization) between the ensemble and base partitions was chosen as the mapping Π_b(U_r). A contingency weight matrix based mapping scheme was proposed in [28] as a solution of (6). These solutions are not effective in combining multiple fuzzy partitions which are obtained using random projections. Our experiments with this method did not show very promising results. So, we turned to another approach, which effectively combines fuzzy partitions, obtained using RPs, based on their quality, as measured by cluster validity indices. The concept behind the agreement based ensemble approach is that pairs of points that stick together (appear in the same cluster) in most or all of the individual partitions should also stick together in the final ensemble partition. Suppose the number of clusters c_r for individual partitions U_r is randomly selected within some range {c_min, ..., c_max}.
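The dissimilarity of Eq. (5) is a one-liner; the sketch below uses a tiny hypothetical pair of aligned 2 × 2 partitions to show it (illustrative values only):

```python
import numpy as np

def h(U_r, U_f):
    # Eq. (5): mean squared distance between corresponding columns,
    # assuming the two c x n partitions are already aligned (registered)
    return np.mean(np.sum((U_r - U_f) ** 2, axis=0))

U_f = np.array([[1.0, 0.0], [0.0, 1.0]])   # crisp 2 x 2 reference
U_r = np.array([[0.9, 0.1], [0.1, 0.9]])   # nearby fuzzy partition
dist = h(U_r, U_f)                          # small, since U_r is close to U_f
```

If the rows of U_r were permuted (clusters relabelled), h would jump, which is exactly the registration problem the mapping Π_b is meant to fix.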
The intuition underlying our approach is that pairs of points that are members of a cluster for higher values of c should be considered more strongly associated with each other than pairs of points which are together in a cluster at a smaller value of c. The N partitions obtained by applying FCM clustering to N random projections will have different information content (quality). The best quality partition, which has maximum information content about the cluster label distribution, is chosen as the base partition, U_b, in the first step of the aggregation. Assuming that we do not have any prior knowledge for the selection of the base partition and the "true" number of clusters, we use an internal cluster validity index (CVI) to choose the base partition (discussed in the next section). The remaining N − 1 partitions are ranked in decreasing order of quality based on their relationship to the base partition, and are combined sequentially based on their rank. The objective of this scheme is to secure the strongest agreement between the highest ranked partitions in the queue and the base partition. In this way, low-quality partitions will have minimal effect on the quality of the overall output partition. Minor variations in ranking are not expected to impact the performance of this scheme, because using an ordered sequence based on decreasing quality effectively integrates the good and bad fuzzy partitions, and decreases the effects of bad partitions on the overall output. If the base partition is of poor quality or there is major variation in ranking (for example, a few

poor-quality partitions are in the top five partitions in the CVI queue), then we expect performance to deteriorate. At the other extreme, if all N partitions are of roughly the same quality, then the selection of the base partition and ranking of the remaining partitions will not have a significant effect on the output partition. In the next section, we discuss the use of CVIs to achieve the best performance for CAFCM.

V. QUALITY OF CONSENSUS PARTITIONS

The projected datasets can be drastically different from each other due to the random mapping from upspace to downspace. Consequently, clustering on these different downspace datasets with any algorithm may result in output partitions of different quality. To determine the quality of partitions, we use a cluster validity index (CVI). A CVI is a measure of cluster quality that can be used to identify the "best" member amongst a set of multiple partitions (where best means, with respect to the CVI in use). External CVIs require ground truth information, whereas internal CVIs use only the data and/or algorithmic outputs. See [36]–[40] and Table X for a detailed analysis and discussion of various internal and external CVIs. The quality of the output partition U_f constructed by CAFCM depends on the quality of the base partition U_b, which is chosen in the initialization phase. The fuzzy partition from the set {U_r}, r = 1, ..., N, which best preserves the structure of the ground truth partition of labeled data will be taken as the base partition. The intuition behind using the best member from the set of ensemble partitions as the base partition is that the output partition U_f should contain the maximum amount of information about structure in the data that is present in the best quality partition amongst all ensemble partitions. Most importantly, this will eventually lead us to a method for identifying U_b for the unlabeled data case.
The quality of individual fuzzy partitions compared to a ground truth (labeled data) partition can be determined using a soft external CVI. Let the quality of any partition U_r with respect to the ground truth partition U_gt, using an external soft CVI V_ext_s, be denoted as V_ext_s(U_r | U_gt), where the subsubscript s means soft. Based on the optimality of V_ext_s(U_r | U_gt), the N ensemble partitions can be ranked in descending order of quality such that

V_ext_s(U_(1) | U_gt) ≥ V_ext_s(U_(2) | U_gt) ≥ ... ≥ V_ext_s(U_(N) | U_gt),   (7)

where parenthetical subscripts indicate the permutation of the original indices that results in the ordering shown in (7), and we assume without loss of generality that the CVI is max-optimal (best is maximum). This gives a set of sorted partitions based on their quality with respect to the external CVI V_ext_s. In real-world applications, the data are unlabeled, so the ground truth information required to evaluate partition quality based on (7) is not available. In this case, a question that must be answered is: can internal CVIs (V_int_s) be used to achieve similar rankings for a set of partitions, i.e., U^(ext_s)_sorted ≈ U^(int_s)_sorted? Internal/external (I/E) matching analysis is discussed in Section VII to determine whether the same base partition and a similar ranking of the sorted partitions, suggested by an external CVI, can be obtained using internal CVIs. Assuming that a similar set of partitions U^(int_s)_sorted = U^(ext_s)_sorted can be obtained using an internal CVI, the best quality partition for unlabeled data, U_(1) from U^(int_s)_sorted, can be chosen as the base partition U_b. Using the base partition in Algorithm 1, chosen by this criterion, results in an output partition U_f which is an aggregation of the ensemble of inputs that is optimal with respect to the chosen CVI. This minimizes the average dissimilarity between ensemble matrices and the best quality partition, which best preserves apparent cluster structure or information about X. Next, we discuss the proposed framework, CAFCM.

VI.
CUMULATIVE AGREEMENT FCM (CAFCM) ALGORITHM

Suppose we have a set of ensemble partitions U_sorted = {U_(r)}, r = 1, ..., N, each partition having c_r clusters, ranked according to (7) in decreasing order of their quality with respect to a specified CVI. Let the best (first) partition U_(1) in U_sorted have c clusters and take U_b = U_(1). The partitions {U_(r)}, r = 2, ..., N, are designated as voting partitions with respect to U_b. The entries of each column vector of the stochastic matrix U_(r) ∈ M_{f c_r n} represent the degrees of membership of that object in each cluster (rows), and sum to 1, whereas, in the Moore-Penrose pseudoinverse U_(r)^{−1} ∈ M_{f n c_r}, each column vector turns into the row (cluster) vector {c_i}, i = 1, ..., c_r, whose entries sum to 1 [41]. These values can be interpreted as the weight of each data point (rows) in the cluster (column) vector c_i. Multiplying the pseudoinverse of U_(r) with the base partition U_b gives the weight matrix W_{r,b} ∈ R^{c×c_r},

W_{r,b} = U_b U_(r)^{−1}.   (8)

Due to the pseudoinverse U_(r)^{−1} in the weight matrix calculation, the entries in W_{r,b} do not lie in the range [0,1]. The relabelling of partition U_(r) against the base partition U_b is achieved by multiplying U_(r) with this weight matrix W_{r,b}, which gives the transformed partition U_{r,b} as

U_{r,b} = W_{r,b} U_(r).   (9)

The degrees of membership in the transformed partition U_{r,b} correspond to the degrees of membership in U_(r), scaled by the entries of W_{r,b}. This accomplishes the vote by U_(r) to the base partition U_b. The ensemble approach in [28], which computes the weight matrix W¹ as

W = U_b U_(r)^T,   (10)

is a special case of approach (8) (suitable for fuzzy partitions). Both approaches are demonstrated in Example 1 with a base partition U_b and an ensemble partition U_(r). The mutual information between the transformed and the base partition is measured using the soft Normalized Mutual Information (NMI) index V_NMI_s [37]. It can be inferred from the NMI values in Example 1 that U_{r,b} contains more mutual information with respect to the base partition U_b than the transformed partition obtained using (10) in (9).
¹The columns of the weight matrix W are normalized in [28] such that w_ij ∈ [0,1] and Σ_{j=1}^{c_r} w_ij = 1.

Example 1: Consider a fuzzy base partition U_b of size 3 × 4 and an ensemble fuzzy partition U_(r) of size 2 × 4. [The numeric entries of U_b and U_(r), of the weight matrices W_{r,b} computed using (8) and W computed with (10), and of the corresponding transformed partitions obtained using (9), are not recoverable from the source; only one of the resulting index values survives:] V_NMI_s(U_{r,b} | U_b) = 0.2178.

When multiplying the partition U_(r) with the weight matrix W_{r,b}, each row vector {c_i}, i = 1, ..., c_r, of U_(r) votes for each of the clusters {c_j}, j = 1, ..., c, of U_b, with weights w_ij from the cumulative vote weight matrix W_{r,b}. In the general case, each partition U_(r) from U_sorted casts its vote with U_b this way, in decreasing order of quality, in a sequential manner. Following [27], the base partition U_b^(i) at iteration i is calculated by averaging the last base partition U_b^(i−1) with the transformed partition U_{i,b}. It is evident from (8) and (9) that U_{r,b}, and in turn U_f, will have the same number of clusters as the base partition U_b. If the number of clusters c_r for each ensemble partition is chosen randomly from c_min to c_max, the criterion of selecting the base partition based on the CVI ranking (refer to Section V) does not always capture the most meaningful information, i.e., the true number of clusters, in the base partition. The problem of finding the true or best number of clusters using CVIs is well addressed in the literature. In our work, each ensemble partition having the best number of clusters c_r is obtained using a chosen CVI. For each downspace dataset, FCM clustering is performed with the number of clusters varying from c_min to c_max. Depending on the evaluation of the CVI, the ensemble partition U_r having the CVI-best number of clusters c_r is obtained for each downspace dataset. Our CAFCM algorithm for high dimensional data clustering using random projection and cumulative agreement based aggregation with FCM clustering is presented in Algorithm 1.
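The voting step of Eqs. (8)–(10) can be reproduced in NumPy. The membership values below are hypothetical (the entries of Example 1 did not survive extraction), chosen only so the shapes match the example (U_b is 3 × 4, U_(r) is 2 × 4):

```python
import numpy as np

# Hypothetical memberships, columns summing to 1 as required by M_fcn
U_b = np.array([[0.8, 0.1, 0.2, 0.1],
                [0.1, 0.8, 0.1, 0.1],
                [0.1, 0.1, 0.7, 0.8]])   # base partition, 3 clusters x 4 points
U_r = np.array([[0.9, 0.7, 0.2, 0.1],
                [0.1, 0.3, 0.8, 0.9]])   # ensemble partition, 2 x 4

W_rb = U_b @ np.linalg.pinv(U_r)   # Eq. (8): c x c_r weight matrix
U_rb = W_rb @ U_r                  # Eq. (9): U_r relabelled against U_b
W_crisp = U_b @ U_r.T              # Eq. (10): the special case used in [28]
```

Note that U_rb has the same number of rows (clusters) as U_b regardless of c_r, which is why the output partition U_f inherits the cluster count of the base partition.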
In Step 1 of Algorithm 1, multiple downspace datasets {Y_r} are generated in a fixed lower dimension, the downspace R^q, using random projection, as discussed in Section II. In Step 2, FCM clustering is applied to each downspace dataset Y_r with the number of clusters varying from c_min to c_max. In Step 3, the partition U_r with the best number of clusters c_r is obtained for each downspace dataset, using a chosen CVI. This step gives N fuzzy partitions, each having a CVI-best number of clusters c_r. In Step 4, these N fuzzy partitions are ranked based on their quality as in (7).

TABLE I: Time and space complexity of four FCM-based ensemble approaches

Ensemble Method | Time Complexity | Space Complexity
EFCM [18] | O(dlnc²) + O(cNn²), d = n | O(n²)
RPFCM-A [20] | O(dlnc²) + O(cNn²), d = cN | O(n²)
RPFCM-B [21] | O(dlnc²) + O(n(cN)²), d = c | O(cnN)
CAFCM (Proposed) | O(nNc²) | O(cn)

l is the number of iterations to termination, d is the dimension of the matrix on which clustering is applied, c is the number of clusters, n is the number of data points, and N is the number of random projections.

Algorithm 1 CAFCM: Cluster Ensemble for FCM Clustering with Random Projection
Input: Dataset X ∈ R^{n×p}, cluster range {c_min, ..., c_max}, downspace dimension q, number of random projections N.
Output: Fuzzy partition U_f.
Step 1: Dataset generation in downspace.
  for r = 1 to N do
    Generate downspace dataset Y_r ∈ R^{n×q} using Y = (1/√q) XT, where T ∈ R^{p×q} is the random matrix built using (3).
  end for
Step 2: Run FCM on each Y_r, obtaining U_r ∈ M_fcn for c = c_min to c_max.
Step 3: Get partitions {U_r} ∈ M_{f c_r n}, each partition having a CVI-best number of clusters c_r, choosing each c_r with an internal cluster validity index V_int_s.
Step 4: Get a set U_sorted of sorted partitions {U_(r)} ∈ M_{f c_r n}, as given in (7), using the cluster validity index V_int_s.
Step 5: Assign the best partition U_(1) (from Step 4) as the base partition, i.e., U_b^(1) = U_(1).
  for i = 2 to N do
    W_{i,b} = U_b^(i−1) U_(i)^{−1}
    U_{i,b} = W_{i,b} U_(i)
    U_b^(i) = ((i − 1)/i) U_b^(i−1) + (1/i) U_{i,b}
  end for
  U_f = U_b^(N).
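Algorithm 1 can be sketched end to end in NumPy. This is an illustration under stated assumptions, not the authors' MATLAB implementation: plain FCM with fuzzifier m = 2, the ±1 projection of Eq. (3), and normalized partition entropy standing in for the PEB index of [38]; the synthetic two-blob data at the bottom is invented for the demo:

```python
import numpy as np

def partition_entropy(U):
    # Normalized partition entropy (min-optimal): stand-in for V_PEBs [38]
    c, n = U.shape
    return -np.sum(U * np.log(U + 1e-12)) / (n * np.log(c))

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    # Plain FCM returning a c x n membership matrix in M_fcn
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)       # cluster centers
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                      # FCM membership update
        if np.abs(U_new - U).max() < tol:
            return U_new
        U = U_new
    return U

def cafcm(X, q, N, c_range, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    parts = []
    for r in range(N):                                     # Steps 1-3
        T = rng.choice([1.0, -1.0], size=(p, q))           # Eq. (3)
        Y = (X @ T) / np.sqrt(q)
        parts.append(min((fcm(Y, c, seed=r) for c in c_range),
                         key=partition_entropy))           # CVI-best c_r
    parts.sort(key=partition_entropy)                      # Step 4: rank by CVI
    U_b = parts[0]                                         # Step 5: base partition
    for i, U_r in enumerate(parts[1:], start=2):
        W = U_b @ np.linalg.pinv(U_r)                      # Eq. (8)
        U_b = ((i - 1) * U_b + W @ U_r) / i                # vote (Eq. 9) + average
    return U_b

X = np.vstack([np.random.default_rng(1).standard_normal((30, 50)) - 4,
               np.random.default_rng(2).standard_normal((30, 50)) + 4])
U_f = cafcm(X, q=10, N=4, c_range=[2, 3])
```

Because the voting loop only multiplies small matrices and takes a running average, no n × n affinity matrix is ever formed, which is the source of the O(cn) space bound in Table I.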
In our experiments, the Normalized Partition Entropy (PEB) V_PEB_s [38] was chosen as the internal index in Steps 3 and 4. Step 5 corresponds to the cumulative agreement based aggregation approach discussed in this section. While FCM is part of the title of our algorithm, we point out that this scheme applies without change when the ensemble of soft partitions is generated by ANY fuzzy or probabilistic clustering algorithm. The time and space complexity of the proposed aggregation approach and of the three state-of-the-art ensemble approaches used for comparison is shown in Table I. Our aggregation approach has time complexity of O(nNc²) for matrix multiplication and computation of the pseudoinverse of the rectangular matrix [42]. The fast Moore-Penrose inverse method [42] was used to compute the pseudoinverse of the ensemble partition U_(r). Therefore, the proposed aggregation

method has linear computational complexity in the number (n) of input samples. The CAFCM approach also has the minimal space complexity, O(cn), which is required to store the base partition that is updated sequentially in each iteration.

VII. EXPERIMENTS

We performed five sets of experiments. In the first experiment, we explored the effect on the output partition of using downspace datasets generated by the different RP distributions (3) and (4). In the second experiment, an internal CVI validation test was performed among all internal CVIs to choose the best c_r corresponding to each RP, and subsequently the best internal CVI was chosen. In the third experiment, an internal/external (I/E) agreement test was performed to determine whether the partition ranking achieved by a soft external CVI can also be obtained using a soft internal CVI. Based on the agreement performance of each internal CVI against the soft external CVI, we chose one best internal CVI to obtain sorted partitions for each dataset in our ensemble approach. In the fourth experiment, we demonstrate the effect on the CAFCM output partition of altering the ordering sequence of ensemble partitions. In the last experiment, we compare different cluster ensemble approaches for high dimensional data clustering. To facilitate the comparison of these approaches, we denote the approach of [18] as EFCM, of [20] as RPFCM-A, of [21] as RPFCM-B, and our cumulative agreement based approach (Algorithm 1) as CAFCM. The experiments were performed in the MATLAB environment on a normal PC with the following configuration; OS: Windows 7 (64 bit); processor: Intel(R) Core(TM); RAM: 16GB.

A. Datasets and Parameter Settings

We performed our experiments on the following datasets.
1) Synthetic datasets: Two synthetic datasets, each having n data points in p = 1000 dimensions, were constructed by drawing labeled samples from a mixture of three Gaussian distributions.
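Labeled data of this kind can be generated in a few lines. The sketch below assumes equal mixing proportions (the paper does not state the proportions in this excerpt) and takes per-component means and standard deviations that are constant across all p dimensions, as in Table II; names are illustrative:

```python
import numpy as np

def make_gaussian_mixture(n, p, means, stds, rng=None):
    """Draw n labeled points in p dimensions from a Gaussian mixture.

    Equal mixing proportions are assumed. means/stds are per-component
    scalars replicated across all p dimensions (spherical components).
    Returns the data matrix X (n x p) and the component labels.
    """
    rng = np.random.default_rng(rng)
    k = len(means)
    labels = rng.integers(0, k, size=n)        # equal-probability component choice
    X = np.empty((n, p))
    for j in range(k):
        idx = labels == j
        X[idx] = rng.normal(means[j], stds[j], size=(idx.sum(), p))
    return X, labels

# GM1-style data: component means -6, 0, 6 and stds 1, 2, 3 in every direction
X, y = make_gaussian_mixture(300, 1000, means=[-6.0, 0.0, 6.0],
                             stds=[1.0, 2.0, 3.0], rng=0)
```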
GM1 is a well separated Gaussian mixture, while GM2 presumably has overlapping Gaussian clusters because its means are closer together than those in GM1. The properties of these synthetic datasets are given in Table II.

TABLE II: Properties of the two synthetic datasets GM1 and GM2

        Component means                       Standard deviations in all directions
GM1     (-6,-6,...), (0,0,...), (6,6,...)     (1,1,...), (2,2,...), (3,3,...)
GM2     (-2,-2,...), (0,0,...), (2,2,...)     (1,1,...), (2,2,...), (3,3,...)

2) Real datasets: Six publicly available real high-dimensional labeled datasets were chosen to demonstrate the applicability of our approach. The details are as follows:

KDD CUP 99 [43]: We used a sample of KDD CUP 99, which contains a wide variety of internet attacks simulated in a military environment. It consists of instances of 41-dimensional vectors, and each vector is labeled to specify the attack type. We normalized all 41 features to the interval [0,1] by subtracting the minimum and then dividing by the resulting maximum, so that they all had the same scale. This dataset contains 22 types of simulated attacks which fall into one of four main categories [43].

ACT [44]: This is a time-series dataset which contains data representing 19 activities, such as sitting, walking, and jumping, captured by 45 motion sensors over a 5 minute window sampled at 25 Hz. Each activity is performed by 8 different subjects. The 5-min signals are divided into 5-sec segments, so that 480 (= 60 x 8) signal segments are obtained for each activity. In each segment there are a total of 125 (= 5 sec x 25 Hz) rows and 45 columns. We concatenated each segment to obtain 9120 (= 480 x 19) instances in 5625 (= 125 x 45) dimensions. All features were normalized to [0,1] using the method discussed earlier.

Forest Covertype [45]: These data consist of 54 cartographic features obtained by the U.S. Geological Survey and U.S. Forest Service, collected from 30m x 30m cells, which were then categorized into 7 forest cover types.
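The [0,1] feature scaling used for KDD CUP and ACT (subtract each feature's minimum, then divide by the resulting maximum) can be sketched as follows. The function name is ours, and the guard against constant columns is an added safety not mentioned in the paper:

```python
import numpy as np

def minmax_scale(X):
    """Per-feature [0,1] scaling: subtract each column's minimum, then
    divide by the resulting maximum (i.e. the column range). Constant
    columns are left at zero to avoid division by zero."""
    X = X.astype(float)
    shifted = X - X.min(axis=0)
    rng = shifted.max(axis=0)
    rng[rng == 0] = 1.0   # constant features: leave as zeros
    return shifted / rng
```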
This is a challenging dataset for any clustering algorithm, as it contains ten continuous features and 44 binary features (four wilderness types and 40 soil types). Because of the mixed nature of the 54 features, we began by developing our own distance metric, combining Euclidean distance on the normalized continuous features (within [0,1]) with Hamming distance on the binary features, to give similar weight to all the features. But the clustering results were slightly worse than using Euclidean distance alone. After several experiments, we discovered that the binary features do not add much value in discriminating the forest cover type. Using the Euclidean distance with scaled continuous features and all binary features yielded the best results in our experiments; therefore, we used the Euclidean distance model for the Forest dataset. We normalized the continuous features to the interval [0,1].

MNIST [46]: This dataset is a subset of a large set of handwritten images from the National Institute of Standards and Technology (NIST). It contains a total of 70,000 binary images of the digits 0 to 9, each of dimension 784 (= 28 x 28). The main problem with handwritten images is that a single character can be written in many, often quite different, ways. This causes overlapping clusters in the data and makes it challenging for clustering.

HAR [47]: This time-series dataset contains instances of 6 daily activities performed by 30 subjects while carrying a waist-mounted smartphone with embedded inertial sensors. It is a preprocessed dataset which has 561 features with time and frequency domain variables.

CIFAR-10 [48]: This dataset contains 60,000 32 x 32 color images in 10 classes, with 6000 images per class. The classes are mutually exclusive. We concatenated each image into a 3072 (= 32 x 32 x 3)-dimensional feature vector.

3) Parameters: The model and error norms were both Euclidean for FCM, except for the two time-series datasets. The

Cosine distance was used as the model norm for HAR and ACT, based on its performance in previous studies [20]. This was done by replacing the Euclidean norm with the cosine distance in the FCM function. In this case the resultant algorithm is not alternating optimization, since the FCM objective function has been abandoned; it is instead an instance of alternating cluster estimation. The number of random projections (RPs) N is chosen as 30 unless stated otherwise. The weighting exponent is m = 2 and the maximum number of iterations is 100 for the MATLAB implementation of FCM. Termination occurs when the absolute value of the difference between successive values of the FCM objective function (using either distance) is less than the termination threshold ε.

B. Evaluation Criteria

Adjusted Rand index: The soft version [36] of the adjusted Rand index ARI (Hubert and Arabie [49]) is used as the external soft CVI. This index, V_ARIs(U|U_gt), measures the degree to which a fuzzy partition U matches a crisp ground truth partition U_gt. Higher values indicate a better match, so V_ARIs is a max-optimal CVI. This index attains its maximum of 1 when U = U_gt, and its minimum may be negative when its expected value is not zero. The normalized partition entropy (PEB) V_PEBs [38], partition index (SC) V_SCs [50], normalized partition coefficient (PCR) V_PCRs [51], and Xie-Beni index (XB) V_XBs [52] are used for the internal CVI comparisons. Based on the min- or max-optimality of each internal CVI, a set U of partitions, ordered in decreasing quality as in (7), is obtained for each internal CVI V_ints. The performance of each internal CVI V_ints against the external CVI V_ARIs is evaluated using two metrics:

Kendall's rank correlation coefficient [53]: Let E_exts and E_ints be the position vectors of V_exts and V_ints respectively, which contain the rankings of the sorted (in descending order of quality) partitions. Kendall's coefficient τ measures the similarity between the orderings E_exts and E_ints, and is given as [53]:

  τ = (number of concordant pairs - number of discordant pairs) / (N(N-1)/2).   (11)
Kendall's τ takes values in [-1,1]: 1 indicates perfect agreement between the two rankings, and -1 perfect disagreement.

Position of the base partition: The selection of the best quality partition as the base partition is important in our approach. Let e_U(1) denote the position in E_ints of the best partition U_(1) (first in E_exts); then the position metric

  V_Ub = 1 - (e_U(1) - 1)/(N - 1),  V_Ub in [0,1],   (12)

is used to evaluate how accurately an internal CVI determines the position of the base partition in E_ints. The integer e_U(1) is the position in the internal ranking E_ints of the partition that matches U_(1) = U_b, so e_U(1) can take any value from 1 to N. Suppose e_U(1) = 1, so that U_(1) is the best partition in both rankings E_exts and E_ints; then V_Ub = 1. On the other hand, suppose e_U(1) = N; then V_Ub = 0. So the range of V_Ub is [0,1]: it is maximal at 1 when the best external and best internal partitions are the same, and minimal at 0 when the best external partition is the worst internal partition. The higher the value of V_Ub, the higher the ranking of the best partition U_(1) in E_ints.

TABLE III: Average V_ARIs and downspace data generation time for distributions (3) and (4)

Random matrix        GM1: V_ARIs   Time (s)      GM2: V_ARIs   Time (s)
Distribution (3)
Distribution (4)

The evaluation criteria used to compare the performances of the different ensemble approaches are:

Accuracy: The similarity of the final clustering solution U_f with respect to the ground truth partition U_gt is measured using V_ARIs(U_f|U_gt) for all four fuzzy ensemble approaches.

Run-time: Running time is also an important criterion for comparison, since it relates to the scalability of an algorithm. For each dataset, we pre-generated the downspace datasets using random projection, and used the same projection matrices for all algorithms. We kept the number of RPs N and all other parameters fixed for all approaches. We also compare the four fuzzy ensemble approaches based on the aggregation time T_agg required to obtain a final output partition U_f from the N ensemble partitions.
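The two agreement measures (11) and (12) can be computed directly from the position vectors; the following is a minimal sketch (the function names are ours, not the paper's):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings given as position vectors
    (rank[i] = position of partition i in that ordering), per Eq. (11)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def position_metric(e_u1, N):
    """V_Ub of Eq. (12): 1 when the externally best partition is also
    ranked first internally, 0 when it is ranked last."""
    return 1.0 - (e_u1 - 1) / (N - 1)
```

Identical rankings give τ = 1, exactly reversed rankings give τ = -1, matching the interpretation above.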
C. Selection of the Random Matrix T for Downspace Data (Y) Generation

We conducted an experiment to demonstrate that either of equations (3) or (4) can be used as the basis for random projection. Using datasets GM1 and GM2 with distributions (3) and (4), we generated downspace datasets {Y_r} (q = 100) and used them in our framework for ensemble clustering. The average (10 trials) execution times for downspace data generation, and the corresponding soft adjusted Rand indices V_ARIs for the output partitions, are shown in Table III. These values confirm that there is very little difference between the projections based on equations (3) and (4). As also shown in [16], both (3) and (4) are very simple probability distributions, and all mathematical operations required to compute Y = (1/sqrt(q)) X T are very efficient and easy to implement. Consequently, we used distribution (3) to generate the downspace datasets in all the remaining experiments.

D. Internal CVI Validation for the Best c_r

The base partition should ideally contain the nominally "true" target value for the number of clusters, c_gt, identified by U_gt. In this regard, the best-c validation test [40] was performed using the four soft internal CVIs to estimate c_gt in all datasets. The downspace dimension q was chosen as 20. For the choices ε = β = 0.25 and n = 10000, q_o = 1591, so q is well below the JL bound q_o. In this experiment, FCM was performed on each downspace dataset, partitioning the data at each value of c between c_min and c_max. The lower (c_min) and upper (c_max) limits were chosen such that they under- and

over-estimated the possible number of clusters in the data. The best quality partition U_r, having c_r clusters, was chosen using each CVI based on its min/max optimality. This procedure was performed for each downspace projection, and the (rounded) average of the best c's was used as an estimate of the true number of clusters in the upspace data. In this test, randomly chosen subsets of each upspace dataset were used for the big datasets.

TABLE IV: Averages (20 trials) of the best c's from all internal CVIs (V_ints)

Dataset        c_gt    <V_PEBs>   <V_SCs>   <V_XBs>   <V_PCRs>
Synthetic datasets
  GM1
  GM2
Real datasets
  MNIST
  CIFAR
  HAR
  FOREST
  ACT
  KDD CUP
Root mean square error

TABLE V: Average values (5 trials) of Kendall's τ and (V_Ub) of internal CVIs against V_ARIs

Dataset          <V_PEBs>     <V_SCs>   <V_XBs>   <V_PCRs>
Synthetic datasets
  GM1                                             (1.00)
  GM2                                             (1.00)
Real datasets
  MNIST          0.36                             (0.97)
  CIFAR                                           (0.98)
  HAR            0.68                             (0.99)
  FOREST         0.17                             (0.96)
  ACT            0.65                             (1.00)
  KDD CUP        0.19                             (0.93)
Column average   0.52                             (0.98)

Table IV shows the estimated number of clusters in each dataset for each of the internal CVIs. The value of the apparent^2 true number of clusters c_gt is shown in the second column of Table IV. The values in the last row of Table IV show the square root of the sum of squared errors (RMSE) between c_gt and the estimated values for each internal CVI. In this exercise, V_SCs produces slightly more reliable estimates of c_gt than the other three CVIs, whilst V_PEBs produces the second best estimates of c_gt. We remark that these conclusions are not generally applicable: one could test many different CVIs and get different best results, or change datasets and discover that V_SCs and V_PEBs performed badly, and so on, ad infinitum. It can also be observed from Table IV that V_PCRs works best for MNIST and V_PEBs for ACT, while V_SCs is best for the rest of the datasets.
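Since V_PEBs is used in Steps 3 and 4 of Algorithm 1, a minimal sketch of a normalized partition entropy may be useful. The exact definition of V_PEBs in [38] is not reproduced in this excerpt; the version below (the classical partition entropy divided by log c, which makes values comparable across different c) is a common normalization and should be read as an assumption:

```python
import numpy as np

def normalized_partition_entropy(U, eps=1e-12):
    """Sketch of a normalized partition entropy for a fuzzy partition U
    (c x n, columns sum to 1). The classical partition entropy is
    -(1/n) * sum(u * log(u)); dividing by log(c) maps it to [0,1].
    Min-optimal: 0 for a crisp partition, 1 for the maximally fuzzy one.
    Illustrative stand-in; the exact V_PEBs definition in [38] may differ.
    """
    c, n = U.shape
    pe = -np.sum(U * np.log(U + eps)) / n
    return pe / np.log(c)
```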
We tested the performance of the CAFCM algorithm using both V_SCs and V_PEBs in Step 3, and the final results were very similar. Therefore, we chose V_PEBs as the best internal CVI, based on this result and the I/E agreement test (next), for use in Steps 3 and 4 of the CAFCM algorithm.

E. The Internal/External (I/E) Agreement Test

In this experiment, we performed the internal/external (I/E) agreement test, in which the performance of an internal CVI is compared with that of an external CVI to assess whether they both yield similar base partitions and similar partition rankings [29], [54]. We compared the partition rankings and the base partition obtained using the external CVI, V_ARIs, with the partition rankings and base partition obtained using each of the four internal CVIs. Among these four internal CVIs, the CVI which determines the partition ranking and base partition most similar to those obtained using the external CVI is chosen for use in our framework. Using this best internal CVI, we hope to achieve the desired partition rankings and base partition in the best possible way when ground truth data are not available (the unlabeled case).

1) Partition ranking comparison: Step 3 of the CAFCM algorithm produces the N ensemble partitions, each having the best c_r number of clusters. The ranking of the ensemble of fuzzy partitions is established using the external CVI V_ARIs and each of the four soft internal CVIs V_PEBs, V_SCs, V_XBs, and V_PCRs, based on partition quality. The partition ranking E_ints of each of the four soft internal CVIs was compared with the partition ranking E_exts of the soft external CVI V_ARIs for each dataset, using the Kendall rank correlation coefficient.

2) Base partition comparison: Besides the partition rankings, the selection of the base partition, U_b, is also important in our framework.

^2 We say apparent because it is well known that labeled data which contain c1 physically labeled subsets often possess c2 ≠ c1 "best clusters" with respect to a given model and algorithm [29].
In this experiment, the position e_U(1) of the base partition U_b (the best external CVI partition, first in E_exts) in each internal CVI partition ranking E_ints was used to compute the position metric V_Ub for each internal CVI and each dataset. The values of τ and V_Ub were computed between the ranking E_exts = {E_ARIs} and each ranking in E_ints = {E_PEBs, E_SCs, E_XBs, E_PCRs}, using (11) and (12). This procedure was repeated 5 times for each dataset. Table V shows the averaged values of τ and V_Ub (in parentheses) corresponding to the order of the N fuzzy partitions established by each internal CVI for each dataset. The notation <CVI> in the first row of the table indicates the basis of the τ and V_Ub values displayed in each column; it is not to be confused with the values of the CVIs themselves, which are NOT shown. The values in each column are formatted with just enough resolution so that the optimal values can be seen. Apparently all of the CVIs except V_XBs perform well for the two synthetic datasets, which means that three internal CVIs are able to achieve almost the same ranking of partitions as that obtained by the external CVI V_ARIs. The τ values of all four CVIs degrade for the real datasets. However, the V_Ub values of V_PCRs and V_PEBs are high for all real datasets, which means they reliably choose the best quality partition from the N ensemble partitions. The last row of Table V contains

column averages, and it shows that overall V_PCRs and V_PEBs perform well (with a very slight advantage to V_PEBs), while V_XBs performs worst. Based on this overall performance of the four internal CVIs in determining partition rankings and the base partition, the performance of the internal CVI V_PEBs agrees best with the performance of the soft external index V_ARIs. Therefore, we chose V_PEBs to determine the base partition and the set of sorted partitions required in Step 4 of the CAFCM algorithm. The CVI V_PEBs is also used in Step 3 of Algorithm 1 to obtain the ensemble partitions having the best c_r numbers of clusters.

F. Effect of the Ordering Sequence of Partitions on the Output Partition

To demonstrate the effect on the output partition of altering the ordering of the ranked queue, as shown in (7), we performed an experiment using datasets GM1 and GM2, considering two cases, viz., where the sequence of ensemble partitions is (i) ordered and (ii) arbitrary. First, we obtained a base partition for each dataset in the manner described above.

TABLE VI: The effects of ordered versus random aggregation of ensemble partitions (tabulated values are the 10-trial averages of V_ARIs)

Ordering of partitions           GM1 (q = 30)    GM2 (q = 100)
Decreasing order of quality
Arbitrary order

Table VI compares the V_ARIs values of the output partition obtained when the ensemble partitions are combined sequentially based on their CVI quality, as in (7), with the V_ARIs values of the output partition obtained when the N-1 remaining partitions are combined with the base partition in an arbitrary order. The average V_ARIs values (10 trials) in Table VI make it clear that combining the remaining partitions according to their CVI rank yields better V_ARIs values (and hence a better output partition) than arbitrary combination.

G. Comparison of Different Cluster Ensemble Methods

In this experiment, we compare the performance of our approach with the three existing ensemble approaches for high dimensional data clustering using random projection with FCM.
We discuss the performance of all four cluster ensemble approaches in 5 data groups (G1-G5), based on the different attributes of the datasets.

Synthetic datasets at different downspace dimensions q (G1): For the synthetic datasets GM1 and GM2, experiments were performed for the downspace dimensions q = 10, 20, 30, 50, 100. These q values correspond to rogue random projections, which are chosen irrespective of ε and β (below the JL bound), as mentioned in Section VII-D. The average V_ARIs values and ensemble times T_agg of all approaches over 5 trials for GM1 and GM2 are shown in Table VII. The best performing approach for each downspace dimension is highlighted in bold. It is evident from the values in Table VII that even with q = 10, all the ensemble approaches achieve very good clustering results (V_ARIs > 0.9) for the GM1 dataset.

TABLE VIII: Average V_ARIs values and ensemble times T_agg (s) for different numbers of RPs (N) on the GM2 dataset

        EFCM          RPFCM-A       RPFCM-B       CAFCM
N       ARI   T_agg   ARI   T_agg   ARI   T_agg   ARI   T_agg

This is because the clusters in this dataset are (probably) well separated from each other. EFCM and RPFCM-B get perfect results (V_ARIs = 1) for q = 10 and 20. The CAFCM approach performs reasonably well (V_ARIs > 0.9) in significantly less computation time, and achieves perfect results for q = 30. It can be concluded from Table VII that the CAFCM approach is many times faster than the other three approaches. All four approaches get perfect results for q = 30 and above, so we do not compare them at higher downspace dimensions. For the GM2 dataset, CAFCM performs significantly better than the other three approaches for all downspace dimensions except q = 10. The weak performance of CAFCM for q = 10 may be because the distribution of points among clusters changes in each consensus partition, which in turn causes weak agreement of the points with any cluster across the consensus partitions; for q > 10, the additional features produce stronger agreement of each data point with some cluster.
The CAFCM algorithm performs aggregation in negligible time compared to the other three approaches for both synthetic datasets. This is because, unlike the other ensemble approaches, CAFCM does not run FCM on a final aggregation matrix to obtain the final membership matrix. In order to compare the performance of all four ensemble methods with respect to stability, the standard deviations (rounded off) of the V_ARIs values are shown alongside the averages in Table VII. We can see that CAFCM seems to be the least variable among all the approaches. This might be due to the smoothing effect of sequentially averaging the transformed partitions and base partition (refer to Algorithm 1). The EFCM algorithm seems to be the most stable of the other three approaches.

Synthetic dataset GM2 at different numbers of RPs, N (G2): We conducted another experiment on the GM2 dataset for different numbers of RPs, N (the ensemble size). For datasets having high diversity (overlapping clusters) like GM2, increasing N may be beneficial because there will probably be much more diversity in the random projections due to the mixed clusters in the upspace. Table VIII shows the average V_ARIs values and ensemble times (5 trials) of all approaches for a fixed value of q (= 40). It can be noted that CAFCM gives the best performance for all N except N = 5 and 10. As expected, the adjusted Rand index V_ARIs increases for all approaches as N increases. Unlike the existing approaches, increasing the ensemble size has a negligible effect on the computational time of CAFCM. The maximum speedup, CAFCM:EFCM, is 4200:1 at N = 50, and the minimum speedup, CAFCM:RPFCM-B, is

11:1 at N = 20.

TABLE VII: Average V_ARIs values and ensemble times T_agg (in s) for all approaches on the GM1 and GM2 datasets (V_ARIs entries are mean ± standard deviation)

        EFCM            RPFCM-A         RPFCM-B         CAFCM
q       V_ARIs  T_agg   V_ARIs  T_agg   V_ARIs  T_agg   V_ARIs  T_agg
GM1 dataset, c_r in {2,8}
GM2 dataset, c_r in {2,8}

High dimensional real datasets (ACT, HAR, MNIST, and CIFAR) at different q (G3): In this group we discuss the performance on the real datasets ACT, HAR, MNIST, and CIFAR, which have relatively high dimensions (in the hundreds and thousands) compared to the KDD CUP and FOREST datasets, which have smaller upspace dimensions. For the G3 datasets, the downspace dimensions q = 10, 20, 30, 50, 100 were chosen. Line plots are used to present the V_ARIs values of all ensemble approaches at the different downspace dimensions in the left columns of Figs. 2 and 3, whereas the right columns of Figs. 2 and 3 show the time performance (on a logarithmic scale) of all ensemble approaches for the different numbers of downspace dimensions. We did not apply EFCM to MNIST and CIFAR, whose large n would cause an out-of-memory error and the associated computational load; therefore, the time performance for these datasets is shown on a non-logarithmic scale. The minimum and maximum numbers of clusters in the consensus partitions are shown in the title of the figure for each dataset. Figs. 2(a) and (b) show that CAFCM outperforms all the other ensemble methods for the two time-series datasets (HAR and ACT). For the image datasets (MNIST and CIFAR), the performance of CAFCM is comparable to RPFCM-B, and outperforms RPFCM-A. The aggregation time for CAFCM is quite small compared to the other three approaches, which agrees with our time complexity analysis as discussed in Section III.

KDD CUP and FOREST Covertype (G4): The upspace dimensions for FOREST and KDD CUP are 54 and 41, respectively, so we chose the downspace dimensions to be q = 10, 20, 30, 40.
For each of these datasets, the experiments were performed on a subset of n = 100,000 instances. Consequently, the EFCM algorithm was not applied to these datasets, to avoid the associated computational load. The performance of all ensemble approaches on these two datasets is shown in Figs. 3(a) and (b), respectively. The CAFCM approach performs better than the other three ensemble methods at almost all of the downspace dimensions. The CAFCM algorithm achieves near-best accuracy even with q = 10 (about 25% of the upspace dimensions) for these two datasets. The time performance in Fig. 3(b) shows that even for the large datasets, CAFCM takes negligible time for aggregation compared to the other approaches.

Performance of all ensemble approaches for different numbers of samples n (G5): In order to demonstrate the applicability of our algorithm to big data, the time performance of each ensemble approach for different numbers of samples of the KDD CUP dataset is presented in Fig. 4 (on a logarithmic scale). EFCM tests were limited to n = 20,000 input samples to avoid the large computational burden. We see that CAFCM takes just a few seconds even for n = 100,000 samples. The maximum computational time of CAFCM (for 100,000 samples) is no more than the minimum time (for 10,000 samples) taken by the other approaches.

VIII. CONCLUSIONS AND DISCUSSION

This paper introduces a simple and computationally efficient framework called CAFCM for high dimensional data clustering, which employs FCM clustering on an ensemble of random projections. Three other state-of-the-art ensemble approaches that also use FCM clustering are discussed in this paper. These approaches require large amounts of space for storing a big affinity matrix. In addition, they require FCM clustering on a large affinity matrix to obtain the final partition, so they incur much larger computation times than CAFCM does. The CAFCM algorithm eliminates the complexity involved in dealing with a final affinity matrix by using a cumulative agreement based fuzzy partition aggregation approach.
The final CAFCM partition is achieved by cumulative agreement based relabelling and averaging of the ensemble of fuzzy partitions. Each partition is taken sequentially from a ranked queue established per equation (7). The ranks are computed with a cluster validity index. The highest ranking partition becomes the core partition U_b, and this partition drives the agreement procedure. We experimented with different internal CVIs to assess the quality of ensemble partitions having known target (true) numbers of labeled subsets. The performance of the four internal CVIs was correlated with the assessments made by the soft external ARI, V_ARIs. The normalized soft partition entropy index V_PEBs led to the best final partitions in the experiments presented here. Once the CVIs for Steps 3 and 4 in Algorithm 1 are chosen, our approach does not require any prior knowledge of the number of clusters that might be present in the dataset, which makes it attractive for real clustering problems. We demonstrated the superiority of our CAFCM approach by comparing it with three existing approaches on two Gaussian

Fig. 2: V_ARIs values (left column) and aggregation times T_agg (right column) for different downspace dimensions: (a) HAR dataset, c_r in {3,10}; (b) ACT dataset, c_r in {15,25}; (c) CIFAR dataset, c_r in {4,16}; (d) MNIST dataset, c_r in {4,16}.

mixture datasets and six real datasets. Our experimental results show that CAFCM outperforms the other three approaches in terms of accuracy, stability, and space and time complexity. The experimental results reveal that on average our algorithm runs one to two orders of magnitude faster than the other state-of-the-art algorithms, and at best can achieve speedups on the order of 4000:1. We also showed that CAFCM can produce reasonable performance even for downspace dimensions well below the JL bound (rogue random projections). This is very important when the dataset has many features. For example, even with q = 10, the CAFCM approach produced good results on the ACT data. The proposed CAFCM algorithm has linear O(n) time complexity in the number (n) of data points. We also showed empirically that our algorithm scales linearly in the number of samples (n) for a big dataset (KDD CUP). The CAFCM ensemble time for n = 100,000 samples was less than the minimum ensemble time of the other approaches for any number of samples. The CAFCM algorithm may take hundreds of seconds for very large (n ≈ 10^9) datasets.

Fig. 3: V_ARIs values (left column) and aggregation times T_agg (right column) for different downspace dimensions: (a) KDD dataset, c_r in {15,25}; (b) FOREST dataset, c_r in {3,15}.

Fig. 4: KDD CUP dataset: aggregation time T_agg for different numbers of samples (n).

However, our aggregation approach takes only about a second for n = 100,000 samples, and we estimate that it will take only a few seconds for n = 10^6 data points.

ACKNOWLEDGMENT

We acknowledge the support of the Australian Research Council (ARC) Linkage Project grant, and the ARC Linkage Infrastructure, Equipment and Facilities (LIEF) scheme grant.

REFERENCES

[1] S. K. Halgamuge and L. Wang, Classification and Clustering for Knowledge Discovery. Springer Science & Business Media, 2005, vol. 4.
[2] M. Moshtaghi, S. Rajasegarar, C. Leckie, and S. Karunasekera, "Anomaly detection by clustering ellipsoids in wireless sensor networks," in 5th International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2009.
[3] J. C. Bezdek, T. C. Havens, J. M. Keller, C. Leckie, L. Park, M. Palaniswami, and S. Rajasegarar, "Clustering elliptical anomalies in sensor networks," in IEEE International Conference on Fuzzy Systems (FUZZ), 2010.
[4] S. M. Erfani, M. Baktashmotlagh, S. Rajasegarar, S. Karunasekera, and C. Leckie, "R1SVM: a randomised nonlinear approach to large-scale anomaly detection," in Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2015.
[5] S. M. Erfani, M. Baktashmotlagh, S. Rajasegarar, V. Nguyen, C. Leckie, J. Bailey, and K. Ramamohanarao, "R1STM: one-class support tensor machine with randomised kernel," in Proceedings of the 2016 SIAM International Conference on Data Mining. SIAM, 2016.
[6] E. Keogh, K. Chakrabarti, M.
Pazzani, and S. Mehrotra, "Locally adaptive dimensionality reduction for indexing large time series databases," ACM SIGMOD Record, vol. 30, no. 2.
[7] Q. Du and J. E. Fowler, "Hyperspectral image compression using JPEG2000 and principal component analysis," IEEE Geoscience and Remote Sensing Letters, vol. 4, no. 2.
[8] E. P. Xing, M. I. Jordan, R. M. Karp et al., "Feature selection for high-dimensional genomic microarray data," in Proceedings of the International Conference on Machine Learning (ICML), vol. 1, 2001.
[9] E. Bingham and H. Mannila, "Random projection in dimensionality reduction: applications to image and text data," in Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[10] M. Steinbach, L. Ertöz, and V. Kumar, "The challenges of clustering high dimensional data," in New Directions in Statistical Physics. Springer, 2004.
[11] L. Parsons, E. Haque, and H. Liu, "Subspace clustering for high dimensional data: a review," ACM SIGKDD Explorations Newsletter, vol. 6, no. 1.
[12] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic subspace clustering of high dimensional data for data mining applications," SIGMOD Rec., vol. 27.
[13] H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, no. 6, p. 417.
[14] G. H. Golub and C. Reinsch, "Singular value decomposition and least squares solutions," Numerische Mathematik, vol. 14, no. 5.
[15] S. Kaski, "Dimensionality reduction by random mapping: fast similarity computation for clustering," in Proceedings of the International Joint Conference on Neural Networks, vol. 1, 1998.
[16] D. Achlioptas, "Database-friendly random projections," in Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2001.
[17] C. H. Papadimitriou, H. Tamaki, P. Raghavan, and S. Vempala, "Latent

14 14 IEEE TRANSACTIONS ON FUZZY SYSTEMS semantc ndexng: A probablstc analyss, n Proceedngs of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposum on Prncples of Database Systems, 1998, pp [18] R. Avogadr and G. Valentn, Fuzzy ensemble clusterng based on random projectons for dna mcroarray data analyss, Artfcal Intellgence n Medcne, vol. 45, no. 2, pp , [19] X. Z. Fern and C. E. Brodley, Random projecton for hgh dmensonal data clusterng: A cluster ensemble approach, n Proceedngs of Internatonal Conference on Machne Learnng (ICML, vol. 3, 2003, pp [20] M. Popescu, J. Keller, J. Bezdek, and A. Zare, Random projectons fuzzy c-means (rpfcm for bg data clusterng, n IEEE Internatonal Conference on Fuzzy Systems (FUZZ-IEEE, 2015, pp [21] M. Ye, W. Lu, J. We, and X. Hu, Fuzzy-means and cluster ensemble wth random projecton for bg data clusterng, Mathematcal Problems n Engneerng, [22] A. Strehl and J. Ghosh, Cluster ensembles a knowledge reuse framework for combnng multple parttons, Journal of Machne Learnng Research, vol. 3, pp , [23] X. Z. Fern and C. E. Brodley, Solvng cluster ensemble problems by bpartte graph parttonng, n Proceedngs of the twenty-frst ACM Twenty-frst Internatonal Conference on Machne Learnng, 2004, p. 36. [24] A. Topchy, A. K. Jan, and W. Punch, Clusterng ensembles: Models of consensus and weak parttons, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 27, no. 12, pp , [25] A. L. Fred and A. K. Jan, Data clusterng usng evdence accumulaton, n Proceedngs of the 16th Internatonal Conference on Pattern Recognton, vol. 4, 2002, pp [26] S. Dudot and J. Frdlyand, Baggng to mprove the accuracy of a clusterng procedure, Bonformatcs, vol. 19, no. 9, pp , [27] E. Dmtradou, A. Wengessel, and K. Hornk, A combnaton scheme for fuzzy clusterng, Internatonal Journal of Pattern Recognton and Artfcal Intellgence, vol. 16, no. 07, pp , [28] H. G. Ayad and M. S. 
Kamel, Cumulatve votng consensus method for parttons wth varable number of clusters, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 30, no. 1, pp , [29] J. C. Bezdek, Prmer on Cluster Analyss: Four Basc Methods that (Usually Work. Frst Edton Desgn Publshng, 2017, vol. 1. [30] G. J. McLachlan and K. E. Basford, Mxture models: Inference and applcatons to clusterng. Marcel Dekker, 1988, vol. 84. [31] A. P. Dempster, N. M. Lard, and D. B. Rubn, Maxmum lkelhood from ncomplete data va the em algorthm, Journal of the Royal Statstcal Socety. Seres B (methodologcal, pp. 1 38, [32] W. B. Johnson and J. Lndenstrauss, Extensons of lpschtz mappngs nto a hlbert space, Contemporary Mathematcs, vol. 26, no , p. 1, [33] S. Dasgupta, Experments wth random projecton, n Proceedngs of the Sxteenth conference on Uncertanty n artfcal ntellgence, 2000, pp [34] J. C. Bezdek, X. Ye, M. Popescu, J. Keller, and A. Zare, Random projecton below the JL lmt, n Proceedngs of Internatonal Jont Conference on Neural Network (IJCNN, 2016, pp [35] H. W. Kuhn, The hungaran method for the assgnment problem, Naval Research Logstcs Quarterly, vol. 2, no. 1-2, pp , [36] D. T. Anderson, J. C. Bezdek, M. Popescu, and J. M. Keller, Comparng fuzzy, probablstc, and possblstc parttons, IEEE Transactons on Fuzzy Systems, vol. 18, no. 5, pp , [37] Y. Le, J. C. Bezdek, J. Chan, N. X. Vnh, S. Romano, and J. Baley, Generalzed nformaton theoretc cluster valdty ndces for soft clusterngs, n IEEE Symposum on Proceedngs of the Eghth Internatonal Conference on Numercal Taxonomy, 2014, pp [38] J. C. Bezdek, Mathematcal models for systematcs and taxonomy, n Proceedngs of the Eghth Internatonal Conference on Numercal Taxonomy, 1975, pp [39] N. R. Pal and J. C. Bezdek, On cluster valdty for the fuzzy c-means model, IEEE, vol. 3, no. 3, pp , [40] J. C. Bezdek, M. Moshtagh, T. Runkler, and C. Lecke, The generalzed C ndex for nternal fuzzy cluster valdty, IEEE Transactons on Fuzzy Systems, vol. 24, no. 
6, pp , [41] J. Wall, Generalzed nverses of stochastc matrces, Lnear Algebra and ts Applcatons, vol. 10, no. 2, pp , [42] P. Courreu, Fast computaton of moore-penrose nverse matrces, CoRR, vol. abs/ , [Onlne]. Avalable: abs/ [43] M. Tavallaee, E. Bagher, W. Lu, and A.-A. Ghorban, A detaled analyss of the kdd cup 99 data set, n Proceedngs of the Second IEEE Symposum on Computatonal Intellgence for Securty and Defence Applcatons, [44] K. Altun, B. Barshan, and O. Tunçel, Comparatve study on classfyng human actvtes wth mnature nertal and magnetc sensors, Pattern Recognton, vol. 43, no. 10, pp , [45] J. A. Blackard and D. J. Dean, Comparatve accuraces of artfcal neural networks and dscrmnant analyss n predctng forest cover types from cartographc varables, Computers and Electroncs n Agrculture, vol. 24, no. 3, pp , [46] Y. LeCun, C. Cortes, and C. J. Burges, The mnst dataset of handwrtten dgts, URL lecun. com/exdb/mnst, [47] D. Anguta, A. Gho, L. Oneto, X. Parra, and J. L. Reyes-Ortz, A publc doman dataset for human actvty recognton usng smartphones. n ESANN, [48] A. Krzhevsky and G. Hnton, Learnng multple layers of features from tny mages. Cteseer, [49] L. Hubert and P. Arabe, Comparng parttons, Journal of Classfcaton, vol. 2, no. 1, pp , [50] N. Zahd, M. Lmour, and A. Essad, A new cluster-valdty for fuzzy clusterng, Pattern Recognton, vol. 32, no. 7, pp , [51] M. Roubens, Pattern classfcaton problems and fuzzy sets, Fuzzy Sets and Systems, vol. 1, no. 4, pp , [52] X. L. Xe and G. Ben, A valdty measure for fuzzy clusterng, IEEE Transactons on Pattern Analyss and Machne Intellgence, vol. 13, no. 8, pp , [53] M. G. Kendall, Rank correlaton methods. Grffn, [54] O. Arbelatz, I. Gurrutxaga, J. Muguerza, J. M. Pérez, and I. Perona, An extensve comparatve study of cluster valdty ndces, Pattern Recognton, vol. 46, no. 
1, pp , Punt Rathore receved the Master of Technology (M.Tech n Electrcal Engneerng (Instrumentaton from the Indan Insttute of Technology, Kharagpur, Inda n He has worked as Researcher n TATA Steel Lmted, Inda for three and half years ( He s currently pursung the Ph.D. degree wth the Department of Electrcal and Electronc Engneerng, Unversty of Melbourne, Melbourne, Australa. Hs research nterests nclude bg data clusterng, ncremental clusterng, spato-temporal analytcs, Internet of Thngs, machne learnng, pattern recognton, and sgnal processng. James C. Bezdek (LF 10 receved the PhD n Appled Math, Cornell Unversty, Jm s past presdent of NAFIPS (North Amercan Fuzzy Informaton Processng Socety, IFSA (Internatonal Fuzzy Systems Assocaton and the IEEE CIS (Computatonal Intellgence Socety as the NNC: foundng edtor the Int l. Jo.Approxmate Reasonng and the IEEE : Lfe fellow of the IEEE and IFSA; recpent of the IEEE 3rd Mllennum, IEEE CIS Fuzzy Systems Poneer, IEEE Frank Rosenblatt TFA and the Kempe de Feret IPMU awards. He retred n Hs research nterests nclude optmzaton, pattern recognton, clusterng n very large data, coclusterng, and vsual clusterng. Sarah M. Erfan s a lecturer n the School of Computng and Informaton Systems at The Unversty of Melbourne. Her research nterests nclude machne learnng, large-scale data mnng, cyber securty, and data prvacy.

Sutharshan Rajasegarar received the B.Sc. Engineering degree in Electronic and Telecommunication Engineering (First Class Honours) from the University of Moratuwa, Moratuwa, Sri Lanka, in 2002, and the Ph.D. degree from the University of Melbourne, Melbourne, VIC, Australia. He is currently a Research Fellow with the Department of Electrical and Electronic Engineering, University of Melbourne. His current research interests include wireless sensor networks, anomaly/outlier detection, spatio-temporal estimation, the Internet of Things, machine learning, pattern recognition, signal processing, and wireless communication.

Marimuthu Palaniswami (F'12) received the M.E. degree in electrical, electronic and control engineering from the Indian Institute of Science, Bengaluru, India, the M.Eng.Sc. degree in electrical, electronic and control engineering from the University of Melbourne, Melbourne, VIC, Australia, and the Ph.D. degree from the University of Newcastle, NSW, Australia. He is currently a Professor with the University of Melbourne. He represents Australia as a core partner in EU FP7 projects such as SENSEI, SmartSantander, the Internet of Things Initiative, and SocIoTal. He has been funded by several Australian Research Council (ARC) and industry grants (over $40 million) to conduct research in sensor networks, Internet of Things (IoT), health, environmental, machine learning, and control areas. He has published over 400 refereed research papers, and leads one of the largest funded ARC Research Networks, on Intelligent Sensors, Sensor Networks and Information Processing. His current research interests include SVMs, sensors and sensor networks, IoT, machine learning, neural networks, pattern recognition, signal processing, and control.

TABLE IX: The contingency table A used to compare partitions U and V

A = U V^T, where u_i = row i of U and v_j = row j of V:

                     Partition V
Partition U     v_1    v_2   ...   v_r   | Sums
u_1             n_11   n_12  ...   n_1r  | n_1.
u_2             n_21   n_22  ...   n_2r  | n_2.
u_3             n_31   n_32  ...   n_3r  | n_3.
...             ...    ...   ...   ...   | ...
u_c             n_c1   n_c2  ...   n_cr  | n_c.
Sums            n_.1   n_.2  ...   n_.r  | n_.. = n

TABLE X: Cluster validity indices (CVIs) used in this paper

Soft external CVIs:
- Adjusted Rand Index (ARI_s) [36]: ARI_s = [a - (a+c)(a+b)/(a+b+c+d)] / {[(a+c)+(a+b)]/2 - (a+c)(a+b)/(a+b+c+d)}. The parameters a, b, c, and d (refer to [49]) are derived from the generalized contingency matrix A = φ U V^T (Table IX), where φ = n / Σ_{i=1}^{c} n_i. Max-optimal; maximum = 1, and the minimum can be negative if the index falls below its expected value.
- Normalized Mutual Information (NMI_s) [37]: NMI_s = MI(U,V) / max(H(U), H(V)), where MI = Σ_{i=1}^{c} Σ_{j=1}^{r} (n_ij/n) log[(n_ij/n) / (n_i n_j / n^2)] and H(U) = -Σ_{i=1}^{c} (n_i/n) log(n_i/n), with the counts n_ij, n_i, n_j taken from the contingency matrix. Max-optimal; ranges in [0,1].

Soft internal CVIs (u_ij is the fuzzy membership degree of object x_j in the i-th cluster; c is the number of clusters):
- Normalized Partition Entropy (PEB) [38]: PEB = [-(1/n) Σ_{i=1}^{c} Σ_{j=1}^{n} u_ij log(u_ij)] / ln(c). This validity index requires only the membership values. Min-optimal; ranges in [0,1].
- Normalized Partition Coefficient (PCR) [51]: PCR = (c ||U||^2 / n - 1) / (c - 1), where ||U||^2 = Σ_{i=1}^{c} Σ_{j=1}^{n} (u_ij)^2. This validity index requires only the membership values. Max-optimal; ranges in [0,1].
- Partition Index (SC) [50]: SC = Σ_{i=1}^{c} [Σ_{j=1}^{n} (u_ij)^m ||x_j - V_i||^2] / [n_i Σ_{k=1}^{c} ||V_k - V_i||^2], where V_i is the center of cluster i and m is the weighting exponent. This validity index requires both the membership values and the dataset. Min-optimal.
- Xie-Beni (XB) [52]: XB = Σ_{i=1}^{c} Σ_{j=1}^{n} (u_ij)^m ||x_j - V_i||^2 / [n min_{i≠j} ||V_i - V_j||^2]. This validity index requires both the membership values and the dataset. Min-optimal.
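The soft CVIs of Table X are simple functions of the fuzzy membership matrix U (c x n, columns summing to 1). The sketch below is our NumPy illustration, not the authors' implementation; the function names (contingency, soft_nmi, peb, xie_beni) are hypothetical. It builds the contingency matrix A = U V^T of Table IX and evaluates NMI, PEB, and XB.

```python
import numpy as np

def contingency(U, V):
    """Contingency matrix A = U V^T for two partitions (rows = clusters)."""
    return U @ V.T

def soft_nmi(U, V):
    """Soft NMI = MI(U,V) / max(H(U), H(V)), from the contingency matrix. Max-optimal."""
    A = contingency(U, V)
    n = A.sum()
    P = A / n                        # joint frequencies n_ij / n
    pi = P.sum(axis=1)               # row marginals n_i / n
    pj = P.sum(axis=0)               # column marginals n_j / n
    nz = P > 0                       # avoid log(0)
    mi = (P[nz] * np.log(P[nz] / np.outer(pi, pj)[nz])).sum()
    h = lambda p: -(p[p > 0] * np.log(p[p > 0])).sum()
    return mi / max(h(pi), h(pj))

def peb(U):
    """Normalized partition entropy: PE / ln(c). Min-optimal, in [0, 1]."""
    c, n = U.shape
    nz = U > 0
    return -(U[nz] * np.log(U[nz])).sum() / (n * np.log(c))

def xie_beni(U, X, centers, m=2):
    """Xie-Beni: fuzzy compactness over minimum center separation. Min-optimal."""
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)  # c x n distances
    num = ((U ** m) * d2).sum()
    sep = ((centers[None] - centers[:, None]) ** 2).sum(axis=2)    # c x c separations
    np.fill_diagonal(sep, np.inf)                                  # ignore i == j
    return num / (X.shape[0] * sep.min())
```

For crisp partitions these reduce to their hard counterparts: two identical crisp partitions give NMI = 1, and a crisp U gives PEB = 0.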


More information

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1)

For instance, ; the five basic number-sets are increasingly more n A B & B A A = B (1) Secton 1.2 Subsets and the Boolean operatons on sets If every element of the set A s an element of the set B, we say that A s a subset of B, or that A s contaned n B, or that B contans A, and we wrte A

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

Biostatistics 615/815

Biostatistics 615/815 The E-M Algorthm Bostatstcs 615/815 Lecture 17 Last Lecture: The Smplex Method General method for optmzaton Makes few assumptons about functon Crawls towards mnmum Some recommendatons Multple startng ponts

More information

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016)

Parallel Numerics. 1 Preconditioning & Iterative Solvers (From 2016) Technsche Unverstät München WSe 6/7 Insttut für Informatk Prof. Dr. Thomas Huckle Dpl.-Math. Benjamn Uekermann Parallel Numercs Exercse : Prevous Exam Questons Precondtonng & Iteratve Solvers (From 6)

More information

A Deflected Grid-based Algorithm for Clustering Analysis

A Deflected Grid-based Algorithm for Clustering Analysis A Deflected Grd-based Algorthm for Clusterng Analyss NANCY P. LIN, CHUNG-I CHANG, HAO-EN CHUEH, HUNG-JEN CHEN, WEI-HUA HAO Department of Computer Scence and Informaton Engneerng Tamkang Unversty 5 Yng-chuan

More information

High-Boost Mesh Filtering for 3-D Shape Enhancement

High-Boost Mesh Filtering for 3-D Shape Enhancement Hgh-Boost Mesh Flterng for 3-D Shape Enhancement Hrokazu Yagou Λ Alexander Belyaev y Damng We z Λ y z ; ; Shape Modelng Laboratory, Unversty of Azu, Azu-Wakamatsu 965-8580 Japan y Computer Graphcs Group,

More information

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints

TPL-Aware Displacement-driven Detailed Placement Refinement with Coloring Constraints TPL-ware Dsplacement-drven Detaled Placement Refnement wth Colorng Constrants Tao Ln Iowa State Unversty tln@astate.edu Chrs Chu Iowa State Unversty cnchu@astate.edu BSTRCT To mnmze the effect of process

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints

Sum of Linear and Fractional Multiobjective Programming Problem under Fuzzy Rules Constraints Australan Journal of Basc and Appled Scences, 2(4): 1204-1208, 2008 ISSN 1991-8178 Sum of Lnear and Fractonal Multobjectve Programmng Problem under Fuzzy Rules Constrants 1 2 Sanjay Jan and Kalash Lachhwan

More information

A Robust Method for Estimating the Fundamental Matrix

A Robust Method for Estimating the Fundamental Matrix Proc. VIIth Dgtal Image Computng: Technques and Applcatons, Sun C., Talbot H., Ourseln S. and Adraansen T. (Eds.), 0- Dec. 003, Sydney A Robust Method for Estmatng the Fundamental Matrx C.L. Feng and Y.S.

More information

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes

R s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges

More information

Lecture 13: High-dimensional Images

Lecture 13: High-dimensional Images Lec : Hgh-dmensonal Images Grayscale Images Lecture : Hgh-dmensonal Images Math 90 Prof. Todd Wttman The Ctadel A grayscale mage s an nteger-valued D matrx. An 8-bt mage takes on values between 0 and 55.

More information

CSE 326: Data Structures Quicksort Comparison Sorting Bound

CSE 326: Data Structures Quicksort Comparison Sorting Bound CSE 326: Data Structures Qucksort Comparson Sortng Bound Steve Setz Wnter 2009 Qucksort Qucksort uses a dvde and conquer strategy, but does not requre the O(N) extra space that MergeSort does. Here s the

More information

Face Recognition University at Buffalo CSE666 Lecture Slides Resources:

Face Recognition University at Buffalo CSE666 Lecture Slides Resources: Face Recognton Unversty at Buffalo CSE666 Lecture Sldes Resources: http://www.face-rec.org/algorthms/ Overvew of face recognton algorthms Correlaton - Pxel based correspondence between two face mages Structural

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

Learning-Based Top-N Selection Query Evaluation over Relational Databases

Learning-Based Top-N Selection Query Evaluation over Relational Databases Learnng-Based Top-N Selecton Query Evaluaton over Relatonal Databases Lang Zhu *, Wey Meng ** * School of Mathematcs and Computer Scence, Hebe Unversty, Baodng, Hebe 071002, Chna, zhu@mal.hbu.edu.cn **

More information

EXTENDED BIC CRITERION FOR MODEL SELECTION

EXTENDED BIC CRITERION FOR MODEL SELECTION IDIAP RESEARCH REPORT EXTEDED BIC CRITERIO FOR ODEL SELECTIO Itshak Lapdot Andrew orrs IDIAP-RR-0-4 Dalle olle Insttute for Perceptual Artfcal Intellgence P.O.Box 59 artgny Valas Swtzerland phone +4 7

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Decson surface s a hyperplane (lne n 2D) n feature space (smlar to the Perceptron) Arguably, the most mportant recent dscovery n machne learnng In a nutshell: map the data to a predetermned

More information

Accounting for the Use of Different Length Scale Factors in x, y and z Directions

Accounting for the Use of Different Length Scale Factors in x, y and z Directions 1 Accountng for the Use of Dfferent Length Scale Factors n x, y and z Drectons Taha Soch (taha.soch@kcl.ac.uk) Imagng Scences & Bomedcal Engneerng, Kng s College London, The Rayne Insttute, St Thomas Hosptal,

More information

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION

CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION 24 CHAPTER 2 PROPOSED IMPROVED PARTICLE SWARM OPTIMIZATION The present chapter proposes an IPSO approach for multprocessor task schedulng problem wth two classfcatons, namely, statc ndependent tasks and

More information

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT

APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT 3. - 5. 5., Brno, Czech Republc, EU APPLICATION OF MULTIVARIATE LOSS FUNCTION FOR ASSESSMENT OF THE QUALITY OF TECHNOLOGICAL PROCESS MANAGEMENT Abstract Josef TOŠENOVSKÝ ) Lenka MONSPORTOVÁ ) Flp TOŠENOVSKÝ

More information

GSLM Operations Research II Fall 13/14

GSLM Operations Research II Fall 13/14 GSLM 58 Operatons Research II Fall /4 6. Separable Programmng Consder a general NLP mn f(x) s.t. g j (x) b j j =. m. Defnton 6.. The NLP s a separable program f ts objectve functon and all constrants are

More information

Hermite Splines in Lie Groups as Products of Geodesics

Hermite Splines in Lie Groups as Products of Geodesics Hermte Splnes n Le Groups as Products of Geodescs Ethan Eade Updated May 28, 2017 1 Introducton 1.1 Goal Ths document defnes a curve n the Le group G parametrzed by tme and by structural parameters n the

More information

LECTURE : MANIFOLD LEARNING

LECTURE : MANIFOLD LEARNING LECTURE : MANIFOLD LEARNING Rta Osadchy Some sldes are due to L.Saul, V. C. Raykar, N. Verma Topcs PCA MDS IsoMap LLE EgenMaps Done! Dmensonalty Reducton Data representaton Inputs are real-valued vectors

More information

Random Kernel Perceptron on ATTiny2313 Microcontroller

Random Kernel Perceptron on ATTiny2313 Microcontroller Random Kernel Perceptron on ATTny233 Mcrocontroller Nemanja Djurc Department of Computer and Informaton Scences, Temple Unversty Phladelpha, PA 922, USA nemanja.djurc@temple.edu Slobodan Vucetc Department

More information

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images

Using Fuzzy Logic to Enhance the Large Size Remote Sensing Images Internatonal Journal of Informaton and Electroncs Engneerng Vol. 5 No. 6 November 015 Usng Fuzzy Logc to Enhance the Large Sze Remote Sensng Images Trung Nguyen Tu Huy Ngo Hoang and Thoa Vu Van Abstract

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b)

cos(a, b) = at b a b. To get a distance measure, subtract the cosine similarity from one. dist(a, b) =1 cos(a, b) 8 Clusterng 8.1 Some Clusterng Examples Clusterng comes up n many contexts. For example, one mght want to cluster journal artcles nto clusters of artcles on related topcs. In dong ths, one frst represents

More information

Edge Detection in Noisy Images Using the Support Vector Machines

Edge Detection in Noisy Images Using the Support Vector Machines Edge Detecton n Nosy Images Usng the Support Vector Machnes Hlaro Gómez-Moreno, Saturnno Maldonado-Bascón, Francsco López-Ferreras Sgnal Theory and Communcatons Department. Unversty of Alcalá Crta. Madrd-Barcelona

More information