Rough Fuzzy c-means Subspace Clustering


Chapter 4

Rough Fuzzy c-means Subspace Clustering

In this chapter, we propose a novel adaptation of the rough fuzzy c-means algorithm for high dimensional data by modifying its objective function. The proposed algorithm automatically detects the relevant cluster dimensions of a high dimensional data set. Since the weights assigned to attributes are specific to each cluster, an efficient subspace clustering scheme is generated. We also discuss the convergence of the proposed algorithm. The remainder of this chapter is organised as follows: section 4.1 introduces rough set theory; section 4.2, on related work, describes how classical clustering methods have been adapted to suit the requirements of high dimensional data; section 4.3 extends the rough fuzzy c-means algorithm for subspace clustering in the form of the Rough Fuzzy c-means Subspace (RFCMS) algorithm; section 4.4 discusses the convergence of the proposed algorithm; section 4.5 presents the results of applying the RFCMS algorithm on several UCI data sets; and finally, section 4.6 summarizes the chapter.

4.1 Introduction

Pawlak introduced rough set theory as a new framework for dealing with imperfect knowledge [Pawlak, 1991]. Rough set theory provides a methodology for addressing the problem of relevant feature selection: it selects a set of information-rich features from a data set that retains the semantics of the original data and, unlike statistical approaches, requires no human inputs [Jensen, 1999]. It is often possible to arrive at a minimal feature set (called a reduct in rough set theory) that can be used for data analysis tasks such as classification and clustering [Lingras and West, 2004], [Mitra et al., 2006]. When feature selection approaches based on rough sets are combined with an intelligent classification system, such as one based on fuzzy systems or neural networks, they retain the descriptive power of the overall classifier and result in a simplified system structure, which enhances the understandability of the resultant system [Shen, 2007].

Following Rutkowski, we describe the notion of rough sets used to model uncertainty in information systems [Rutkowski, 2008]. Formally, an information system is a pair $(U, A)$, where $U$ is a non-empty finite set of objects and $A$ is a non-empty finite set of attributes such that each attribute $a \in A$ has an associated value set $V_a$, i.e. $a : U \to V_a$ for every $a \in A$. A decision system $DS$ is defined as a pair $(U, A \cup \{d\})$, where $d \notin A$ is called the decision attribute and the elements of $A$ are called condition attributes. For an attribute set $B \subseteq A$, the set of objects in the information system that are indiscernible w.r.t. $B$ is described by the indiscernibility relation $IND_{IS}(B)$ defined as:

$$IND_{IS}(B) = \{(x_1, x_2) \in U^2 \mid a(x_1) = a(x_2) \; \forall a \in B\}.$$

The objects $x_1$ and $x_2$ are indiscernible from each other by attributes from $B$ if $(x_1, x_2) \in IND_{IS}(B)$. The equivalence classes of the $B$-indiscernibility relation are denoted by $[x]_B$. If $X \subseteq U$, then $X$ can be approximated using $B$ by constructing three approximations, namely, the $B$-lower approximation

$$\underline{B}X = \{x \mid [x]_B \subseteq X\},$$

the $B$-upper approximation

$$\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\},$$

and the $B$-boundary region $\overline{B}X - \underline{B}X$ of $X$. Evidently, the boundary region consists of all objects in the upper approximation but not in the lower approximation of $X$.

Bazan et al. discuss various techniques for rough set reduct generation and argue that classical reducts, being static, may not be stable in randomly chosen samples of a given decision table [Bazan et al., 2000]. To deal with such situations they focus on reducts that are stable over different subsets of samples chosen from a given decision table. Such reducts are called dynamic reducts. They compute reducts using an order-based genetic algorithm and subsequently extract dynamic reducts, which are used to generate classification rules. Each rule set is associated with a measure called the rule strength, which is used later to resolve conflicts when several rules are applicable. Slezak generalized the concept of reduct by introducing the notion of association reducts, corresponding to both association rules and rough set reducts [Ślezak, 2005]. He defined an association reduct as a pair (A, B) of disjoint subsets of attributes such that all data-supported patterns involving A approximately determine those involving B. He developed an information theory based algorithm to compute association reducts. As the algorithm needs to examine all association reducts, it has exponential time requirements. To alleviate this hardship, Slezak targeted significantly smaller ensembles of dependencies providing reasonably rich knowledge, and developed an order-based genetic algorithm to achieve this [Ślezak, 2009]. Shen and Jensen proposed the concept of a retainer as an approximation of a reduct [Richard and Qiang, 2001]. The authors suggest a heuristic to compute the retainer and demonstrate its usefulness for the classification task. For clustering a textual database consisting of N documents with a vocabulary of size V, Li et al. developed an algorithm based on approximate reducts that works in time O(VN) [Li et al., 2006].
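To make the approximation operators above concrete, the following minimal Python sketch (illustrative only, not part of the thesis) computes the $B$-lower and $B$-upper approximations of a set $X$ by grouping objects into the equivalence classes of the $B$-indiscernibility relation.

```python
from collections import defaultdict

def approximations(data, X):
    """data: object -> tuple of its values on the attributes in B; X: a set of objects.
    Returns the B-lower and B-upper approximations of X."""
    classes = defaultdict(set)            # equivalence classes [x]_B of IND_IS(B)
    for obj, values in data.items():
        classes[values].add(obj)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= X:                 # [x]_B wholly contained in X
            lower |= eq_class
        if eq_class & X:                  # [x]_B intersects X
            upper |= eq_class
    return lower, upper                   # boundary region = upper - lower

# Four objects described by two condition attributes in B:
data = {'a': (0, 1), 'b': (0, 1), 'c': (1, 0), 'd': (1, 1)}
print(approximations(data, X={'a', 'c'}))   # ({'c'}, {'a', 'b', 'c'})
```

In the example, objects a and b are indiscernible, so a set containing only one of them cannot be described exactly: a falls in the boundary region.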

4.2 Related Work

Rough sets have been widely used for classification and clustering [Lingras and West, 2004], [Mitra et al., 2006], [Pawlak, 1991]. The classical k-means algorithm has been extended to the rough k-means algorithm by Lingras et al. [Lingras and West, 2004]. In the rough k-means algorithm, a cluster in the lower approximation, called the core cluster, is surrounded by a buffer or boundary set containing objects with unclear membership status [Lingras and West, 2004]. A data point in the lower approximation surely belongs to a cluster, whereas the membership of objects in an upper approximation is uncertain. The signature of each cluster is represented by its center, lower approximation, and upper approximation. If the lower and upper approximations are equal, then the buffer set is empty and the data objects are crisply assigned to the cluster. The rough k-means algorithm follows an iterative process wherein cluster centers are updated until a convergence criterion is met. Asharaf et al. have extended the rough k-means algorithm in such a way that it does not require prior specification of the number of clusters [Asharaf and Murty, 2004]. They have proposed a two-phase algorithm: it identifies a set of leaders which act as prototypes in the first phase; subsequently, a set of supporting leaders are identified, which can act as leaders provided they yield better partitioning. The evolutionary rough k-medoids algorithm [Peters et al., 2008] is based on the family of rough clustering algorithms and the classical k-medoids algorithm [Kaufman and Rousseeuw, 1990]. Malyszko et al. have extended rough k-means clustering to rough entropy clustering [Malyszko and Stepaniuk, 2009]. It is an iterative process: first a predefined number of weight pairs are selected; for each weight pair a new offspring clustering is determined and its rough entropy is computed; and the partition which gives the highest rough entropy is selected. Liu et al. have proposed a feature selection method, ISODATA-RFE, for high dimensional gene expression data sets [Liu et al., 2012]. Bhattacharya distance is used to rank the features of the training set, and features with low Bhattacharya distance are removed from the feature set. For separating different classes, the fuzzy ISODATA algorithm is used to calculate the sensitivity index of each feature. A recursive feature elimination method is applied to the feature set to remove unimportant features, generating multiple nested candidate feature subsets. Finally, the feature subset with the least error is selected for use in classification and clustering algorithms. Own and Abraham have proposed a new weighted rough set framework based classification for neonatal jaundice [Own and Abraham, 2012]. The weighted information table is built by applying class-equal sample weighting: while samples in the majority class have smaller weight, samples in the minority class have larger weight. A weighted reduction algorithm, MLEM2, exploits the significance of the attributes to extract a set of diagnosis rules from the decision system of the NeoNatal Jaundice database. Deng et al. have proposed an enhanced entropy weighting subspace clustering algorithm for high dimensional gene expression data [Deng et al., 2011]. Its objective function integrates fuzzy within-cluster compactness and between-cluster information simultaneously. [Cordeiro de Amorim and Mirkin, 2012] have extended the weighted k-means algorithm proposed by Huang et al., replacing the Euclidean distance metric by the Minkowski metric for measuring distances, as the Euclidean distance cannot capture the relationship between the scales of the feature values and the feature weights. Bai et al. have proposed a novel weighting algorithm for categorical data [Bai et al., 2011]. The algorithm computes two weights for each dimension in each cluster; these weight values are used to identify the subsets of attributes which can categorize different clusters.

Rough set theory has been applied in conjunction with fuzzy set theory in several domains such as fuzzy rule extraction, reasoning with uncertainty, fuzzy modelling, and feature selection [Maji and Pal, 2010]. The classical fuzzy c-means algorithm has been used in conjunction with rough sets to develop the rough fuzzy c-means (RFCM) algorithm [Mitra and Banka, 2007]. The concept of membership in FCM enables efficient handling of overlapping partitions, while rough sets are aimed at modelling uncertainty in data. Such hybrid techniques provide a strong paradigm for uncertainty handling in various application domains such as pattern recognition, image processing, mining stock prices, vocabulary for information retrieval, fuzzy clustering, dimensionality reduction, data mining and knowledge discovery [Maji and Paul, 2011], [Maji and Pal, 2010]. Maji and Pal proposed an algorithm, RFCMdd, for selecting the most informative bio-basis (medoids), where each partition is represented by a medoid computed as the weighted average of the crisp lower approximation and fuzzy boundary [Maji and Pal, 2007b]. Maji introduced a quantitative measure of similarity among genes based on fuzzy rough sets to develop the fuzzy-rough supervised attribute clustering (FRSAC) algorithm [Maji, 2011].

4.3 Rough Fuzzy c-means Subspace Clustering

In this section, we propose an algorithm based on the rough fuzzy c-means algorithm for subspace clustering.

4.3.1 Rough c-means

The rough c-means algorithm [Lingras and West, 2004] extends the concept of c-means by considering each cluster as an interval or rough set $X$, where the lower and upper approximations $\underline{B}X$ and $\overline{B}X$ are characteristics of the rough set $X$. A rough set has the following properties:

(i) An object $x_j$ can belong to at most one lower approximation.

(ii) If $x_j \in \underline{B}X$ of cluster $X$, then $x_j \in \overline{B}X$ also.

(iii) If $x_j$ does not belong to any lower approximation, then it belongs to two or more upper approximations, i.e. overlap between clusters is possible.

The iterative steps of the rough c-means algorithm are as follows:

Algorithm 2 Rough c-means Algorithm

1. Choose initial means $z_i$, $1 \le i \le k$, for the $k$ clusters.

2. Assign each data point $x_j$, $1 \le j \le n$, to the lower approximation $\underline{B}U_i$ or to the upper approximations $\overline{B}U_i$, $\overline{B}U_{i'}$ of the cluster pair $U_i$, $U_{i'}$ by computing the difference in its distances $d_{i'j} - d_{ij}$, where $d_{ij}$ is the distance of the $j$th data point $x_j$ from the centroid $z_i$ of cluster $U_i$.

3. Let $d_{ij}$ be minimum and $d_{i'j}$ be the next to minimum. If $d_{i'j} - d_{ij}$ is less than some threshold, then $x_j \in \overline{B}U_i$ and $x_j \in \overline{B}U_{i'}$, and $x_j$ cannot be a member of any lower approximation; else $x_j \in \underline{B}U_i$ such that the distance $d_{ij}$ is minimum over the $k$ clusters.

4. Compute the new mean $z_i$ for each cluster as

$$z_i = \begin{cases} \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} x_j}{|\overline{B}U_i - \underline{B}U_i|} & \text{if } \underline{B}U_i = \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] w_{low} \dfrac{\sum_{x_j \in \underline{B}U_i} x_j}{|\underline{B}U_i|} + w_{up} \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} x_j}{|\overline{B}U_i - \underline{B}U_i|} & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] \dfrac{\sum_{x_j \in \underline{B}U_i} x_j}{|\underline{B}U_i|} & \text{otherwise,} \end{cases}$$

where the parameters $w_{low}$ and $w_{up}$ represent the relative importance of the lower and upper approximations respectively. Thus, RCM generates three types of clusters, with objects (i) in both the lower and upper approximations, (ii) only in the lower approximation, and (iii) only in the upper approximation.

5. Repeat steps 2-4 until convergence, i.e., there are no more new assignments, or the upper limit on the number of iterations is reached.

Note: $w_{up} = 1 - w_{low}$, $0.5 < w_{low} < 1$, and the threshold is a small positive constant.
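A minimal Python sketch of one iteration of Algorithm 2 follows; the function names and default parameter values are illustrative assumptions, not the thesis implementation (which, as section 4.5 notes, was written in MATLAB).

```python
import numpy as np

def rcm_step(X, Z, threshold=0.1, w_low=0.75):
    """One pass of steps 2-4: rough assignment followed by the center update."""
    w_up = 1.0 - w_low
    n, k = X.shape[0], Z.shape[0]
    D = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)   # d_ij: point-to-center distances
    lower = [[] for _ in range(k)]      # indices in each lower approximation
    boundary = [[] for _ in range(k)]   # indices in each boundary region
    for j in range(n):
        order = np.argsort(D[j])
        i, i2 = order[0], order[1]      # closest and second-closest clusters
        if D[j, i2] - D[j, i] < threshold:
            boundary[i].append(j)       # ambiguous: upper approximations of both clusters
            boundary[i2].append(j)
        else:
            lower[i].append(j)          # unambiguous: lower approximation of the closest cluster
    Z_new = Z.copy()
    for i in range(k):
        lo, bd = X[lower[i]], X[boundary[i]]
        if len(lo) and len(bd):
            Z_new[i] = w_low * lo.mean(axis=0) + w_up * bd.mean(axis=0)
        elif len(bd):
            Z_new[i] = bd.mean(axis=0)
        elif len(lo):
            Z_new[i] = lo.mean(axis=0)
    return Z_new, lower, boundary
```

The step is repeated until the assignments stop changing or an iteration cap is reached, mirroring step 5 above.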

4.3.2 Rough-Fuzzy c-means

The rough-fuzzy c-means algorithm [Mitra et al., 2006] incorporates a weighted distance in terms of the fuzzy membership value $u_{ij}$ of a data point $x_j$ to a cluster mean $z_i$, instead of the absolute individual distance $d_{ij}$ of the $j$th data point from the $i$th cluster center. The iterative steps of the algorithm are as follows:

Algorithm 3 Rough Fuzzy c-means Algorithm

1. Choose initial means $z_i$, $1 \le i \le k$, for the $k$ clusters.

2. Compute $u_{ij}$ by eq. 3.9 for $k$ clusters and $n$ data objects.

3. Assign each data point $x_j$ to the lower approximation $\underline{B}U_i$ or to the upper approximations $\overline{B}U_i$, $\overline{B}U_{i'}$ of the cluster pair $U_i$, $U_{i'}$ by computing the difference in its memberships $u_{ij} - u_{i'j}$.

4. Let $u_{ij}$ be maximum and $u_{i'j}$ be the next to maximum. If $u_{ij} - u_{i'j}$ is less than some threshold, then $x_j \in \overline{B}U_i$ and $x_j \in \overline{B}U_{i'}$, and $x_j$ cannot be a member of any lower approximation; else $x_j \in \underline{B}U_i$ such that the membership $u_{ij}$ is maximum over the $k$ clusters.

5. Compute the new mean $z_i$ for each cluster as

$$z_i = \begin{cases} \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_j}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} & \text{if } \underline{B}U_i = \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] w_{low} \dfrac{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} x_j}{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha}} + w_{up} \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_j}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] \dfrac{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} x_j}{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha}} & \text{otherwise.} \end{cases}$$

6. Repeat steps 2-5 until convergence, i.e., there are no more new assignments, or the upper limit on the number of iterations is reached.

Note: $w_{up} = 1 - w_{low}$, $0.5 < w_{low} < 1$, and the threshold is a small positive constant.
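As a sketch of steps 2-4, the helpers below (hypothetical names, not the thesis code) compute the standard FCM memberships referenced as eq. 3.9 and perform the rough assignment from the gap between the two highest memberships; the same assignment pattern is reused by RFCMS below.

```python
import numpy as np

def fcm_memberships(X, Z, m=2.0, eps=1e-12):
    """Standard FCM memberships u_ij (the eq. 3.9 referenced above)."""
    D2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) + eps   # squared distances d_ij^2
    ratio = (D2[:, :, None] / D2[:, None, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)                                  # (n, k); rows sum to 1

def rough_assign(U, threshold=0.1):
    """Steps 3-4: route each object to one lower approximation, or to the
    boundary regions of the two clusters whose memberships are nearly tied."""
    n, k = U.shape
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for j in range(n):
        order = np.argsort(U[j])[::-1]
        i, i2 = order[0], order[1]        # highest and second-highest membership
        if U[j, i] - U[j, i2] < threshold:
            boundary[i].append(j)
            boundary[i2].append(j)
        else:
            lower[i].append(j)
    return lower, boundary
```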

4.3.3 Rough Fuzzy c-means Subspace Clustering Algorithm

The proposed algorithm, called Rough Fuzzy c-means Subspace (RFCMS), has been developed by hybridizing the concept of fuzzy membership for objects (in clusters) and for dimensions (fuzzy membership serves as the weight of a dimension) with rough set based approximations of clusters.

Objective Function

Let $\underline{B}U_i$, $\overline{B}U_i$ and $\overline{B}U_i - \underline{B}U_i$ denote the lower approximation, upper approximation, and boundary region of the $i$th cluster $U_i$ respectively. In [Lingras and West, 2004] the classical objective function of the fuzzy c-means algorithm has been modified in the rough framework by incorporating the lower and upper approximations of the clusters. We have extended the objective function of the rough fuzzy c-means algorithm [Lingras and West, 2004] by incorporating the weights of dimensions as relevant to different clusters. We associate with the $i$th cluster the weight vector $\omega_i$, which represents the relative relevance of the different attributes for the $i$th cluster. Thus, in the matrix $W = [\omega_{ir}]_{k \times d}$, $\omega_{ir}$ denotes the contribution of the $r$th dimension to the $i$th cluster. The contributions from all dimensions add up to 1 for each cluster:

$$\sum_{r=1}^{d} \omega_{ir} = 1, \quad 1 \le i \le k, \quad (4.1)$$

$$\omega_{ir} \in [0, 1], \quad 1 \le i \le k, \; 1 \le r \le d. \quad (4.2)$$

The proposed RFCMS algorithm minimizes the following objective function $J_{RFCMS}$ to partition the data set into $k$ clusters:

$$J_{RFCMS} = \begin{cases} aA + bB & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\ A & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i = \emptyset \\ B & \text{otherwise,} \end{cases}$$

where

$$A = \sum_{i=1}^{k} \sum_{x_j \in \underline{B}U_i} \sum_{r=1}^{d} \mu_{ij}^{\alpha} \omega_{ir}^{\beta} d_{ijr}^2, \qquad B = \sum_{i=1}^{k} \sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \sum_{r=1}^{d} \mu_{ij}^{\alpha} \omega_{ir}^{\beta} d_{ijr}^2. \quad (4.3)$$

In the above formulation, $A$ and $B$ correspond to the lower approximation and the boundary region, and the parameters $a$ and $b$ control their respective contributions. Here

$$d_{ijr}^2 = (x_{jr} - z_{ir})^2 \quad (4.4)$$

is the distance between the $i$th cluster center and the $j$th data object along the $r$th dimension. The parameters $\alpha \in (1, \infty)$ and $\beta \in (1, \infty)$ are weighting exponents which control the fuzzification of $\mu_{ij}$ and $\omega_{ir}$ respectively. Minimizing 4.3 w.r.t. $\mu_{ij}$ and $\omega_{ir}$ we get:

$$\mu_{ij} = \frac{1}{\sum_{l=1}^{k} \left[ \dfrac{\sum_{r=1}^{d} \omega_{ir}^{\beta} d_{ijr}^2}{\sum_{r=1}^{d} \omega_{lr}^{\beta} d_{ljr}^2} \right]^{1/(\alpha-1)}} \quad (4.5)$$

$$\omega_{ir} = \frac{1}{\sum_{l'=1}^{d} \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^2}{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijl'}^2} \right]^{1/(\beta-1)}} \quad (4.6)$$

The weights of dimensions are computed using eq. 4.6 as in [Kumar and Puri, 2009].
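The coupled updates (4.5) and (4.6) can be computed with a few vectorized operations; the following numpy sketch (assumed array shapes and illustrative names, not the authors' MATLAB code) is one way to do it.

```python
import numpy as np

def update_memberships(X, Z, W, alpha, beta, eps=1e-12):
    """Eq. 4.5: mu_ij = 1 / sum_l [(sum_r w_ir^b d_ijr^2)/(sum_r w_lr^b d_ljr^2)]^(1/(a-1))."""
    D2 = (X[:, None, :] - Z[None, :, :]) ** 2              # (n, k, d): per-dimension distances d_ijr^2
    Dw = (D2 * (W ** beta)[None, :, :]).sum(axis=2) + eps  # (n, k): dimension-weighted distances
    ratio = (Dw[:, :, None] / Dw[:, None, :]) ** (1.0 / (alpha - 1.0))
    return 1.0 / ratio.sum(axis=2)                         # (n, k); rows sum to 1 over clusters

def update_weights(X, Z, U, alpha, beta, eps=1e-12):
    """Eq. 4.6: w_ir = 1 / sum_l [(sum_j mu_ij^a d_ijr^2)/(sum_j mu_ij^a d_ijl^2)]^(1/(b-1))."""
    D2 = (X[:, None, :] - Z[None, :, :]) ** 2              # (n, k, d)
    S = ((U ** alpha)[:, :, None] * D2).sum(axis=0) + eps  # (k, d): weighted scatter per dimension
    ratio = (S[:, :, None] / S[:, None, :]) ** (1.0 / (beta - 1.0))
    return 1.0 / ratio.sum(axis=2)                         # (k, d); rows sum to 1 over dimensions
```

The small eps term is a numerical guard against zero distances and is an implementation assumption, not part of the formulation.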

Cluster Center

The cluster centers are computed as:

$$z_{ir} = \begin{cases} \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} & \text{if } \underline{B}U_i = \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] a \dfrac{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha}} + b \dfrac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} & \text{if } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \\[2ex] \dfrac{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha}} & \text{otherwise.} \end{cases} \quad (4.7)$$

As the objects lying in the lower approximation definitely belong to the cluster, they are assigned a higher weight than objects lying in the boundary region. For the case $a = 1$, the cluster center may get stuck in a local optimum, because the cluster cannot see the objects lying in the boundary region and therefore may not be able to move towards the best cluster center. In order to maintain a greater degree of freedom to move, the values of the parameters $a$ and $b$ are set as $0 < b < a < 1$ such that $a + b = 1$ [Maji and Pal, 2007a]. Like FCM [Bezdek et al., 1987] and Yan's fuzzy curve tracing algorithm [Yan, 2004], the proposed RFCMS algorithm converges, at least along a subsequence, to a local optimum solution. The iterative steps of the algorithm are as follows:

Algorithm 4 Rough Fuzzy c-means Subspace Clustering Algorithm

1. Choose initial cluster centers $z_i$, $1 \le i \le k$, for the $k$ clusters.

2. Compute $\mu_{ij}$ by eq. 4.5 for $k$ clusters and $n$ data objects.

3. Let $\mu_{\hat{i}j}$ be maximum and $\mu_{i'j}$ be the next to maximum for an object $x_j$. If $\mu_{\hat{i}j} - \mu_{i'j}$ is less than some threshold $\epsilon$, then $x_j \in \overline{B}U_{\hat{i}}$ and $x_j \in \overline{B}U_{i'}$, and $x_j$ cannot be a member of any lower approximation; else $x_j \in \underline{B}U_{\hat{i}}$ such that the membership $\mu_{\hat{i}j}$ is maximum over the $k$ clusters.

4. Compute $\omega_{ir}$ by eq. 4.6 for $k$ clusters and $d$ dimensions.

5. Compute the new cluster centers $z_i$ for each cluster, as in eq. 4.7.

6. Repeat steps 2-5 until convergence, i.e., there are no more new assignments, or the limit on the maximum number of iterations is reached.

Note: $a = 1 - b$, $0.5 < a < 1$, and the threshold is a small positive constant.
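Putting the pieces together, a minimal sketch of the full RFCMS iteration follows. It reuses the illustrative helpers update_memberships, update_weights and rough_assign sketched above; the initialization scheme and the default parameter values are assumptions, not the authors' choices.

```python
import numpy as np

def rfcms(X, k, alpha=2.0, beta=2.0, a=0.75, threshold=0.1,
          tol=1e-3, max_iter=100, seed=0):
    """One possible realisation of Algorithm 4; a plays the role of w_low, b = 1 - a."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    b = 1.0 - a
    Z = X[rng.choice(n, size=k, replace=False)].astype(float)  # step 1: initial centers
    W = np.full((k, d), 1.0 / d)                               # start with uniform dimension weights
    U = np.full((n, k), 1.0 / k)
    for _ in range(max_iter):
        U = update_memberships(X, Z, W, alpha, beta)           # step 2, eq. 4.5
        lower, boundary = rough_assign(U, threshold)           # step 3, rough assignment
        W = update_weights(X, Z, U, alpha, beta)               # step 4, eq. 4.6
        Z_new = Z.copy()
        for i in range(k):                                     # step 5, eq. 4.7
            Ua = U[:, i] ** alpha
            lo, bd = lower[i], boundary[i]
            if lo and bd:
                Z_new[i] = (a * (Ua[lo][:, None] * X[lo]).sum(0) / Ua[lo].sum()
                            + b * (Ua[bd][:, None] * X[bd]).sum(0) / Ua[bd].sum())
            elif bd:
                Z_new[i] = (Ua[bd][:, None] * X[bd]).sum(0) / Ua[bd].sum()
            elif lo:
                Z_new[i] = (Ua[lo][:, None] * X[lo]).sum(0) / Ua[lo].sum()
        converged = np.linalg.norm(Z_new - Z) < tol            # stopping rule used in section 4.5
        Z = Z_new
        if converged:
            break
    return Z, U, W
```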

4.4 Convergence

In this section, we discuss the convergence criteria of the proposed algorithm along with their proofs. Along the lines of the global convergence property of the FCM algorithm, the global convergence property of RFCMS states that, for any data set and initialization parameters, an iteration sequence of the RFCMS algorithm either (i) converges to a local minimum, or (ii) has a subsequence that converges to a stationary point. Theorems 4.1, 4.2 and 4.3 below show that the necessary and sufficient conditions hold for $U$, $W$, and $Z$ respectively.

Theorem 4.1 Let $\eta : M_f^{kn} \to \mathbb{R}$, $\eta(U) = J_{RFCMS}(U, W, Z)$, where $W \in M_f^{kd}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $U \in M_f^{kn}$ is a strict local minimum of $\eta$ if and only if $U$ is calculated by the equation:

$$\mu_{ij} = \frac{1}{\sum_{l=1}^{k} \left[ \dfrac{\sum_{r=1}^{d} \omega_{ir}^{\beta} d_{ijr}^2}{\sum_{r=1}^{d} \omega_{lr}^{\beta} d_{ljr}^2} \right]^{1/(\alpha-1)}}$$

Proof 4.1 We have to minimize $J_{RFCMS}$ with respect to $U$ and $W$, subject to constraints 2.11 and 4.1, where $\alpha \in (1, \infty)$ and $\beta \in (1, \infty)$. In order to ensure the non-negativity of $\mu_{ij}$ and $\omega_{ir}$, we set $\mu_{ij} = S_{ij}^2$ and $\omega_{ir} = P_{ir}^2$. The constraints 2.11 and 4.1 have been adjoined to $J_{RFCMS}$ with sets of Lagrange multipliers $\{\lambda_j, 1 \le j \le n\}$ and $\{\phi_i, 1 \le i \le k\}$ to formulate the new objective function:

$$\tilde{J}_{RFCMS} = \sum_{j=1}^{n} \sum_{i=1}^{k} \sum_{r=1}^{d} S_{ij}^{2\alpha} P_{ir}^{2\beta} d_{ijr}^2 + \sum_{j=1}^{n} \lambda_j \left( \sum_{i=1}^{k} S_{ij}^2 - 1 \right) + \sum_{i=1}^{k} \phi_i \left( \sum_{r=1}^{d} P_{ir}^2 - 1 \right)$$

Now, we compute the first order derivative of $\tilde{J}_{RFCMS}$ with respect to $S_{ij}$, a necessary condition for optimality:

$$\frac{\partial \tilde{J}_{RFCMS}}{\partial S_{ij}} = 2\alpha S_{ij}^{2\alpha-1} \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2 + 2\lambda_j S_{ij} = 0 \quad (4.8)$$

$$= 2 S_{ij} \left[ \alpha S_{ij}^{2\alpha-2} \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2 + \lambda_j \right] = 0 \quad (4.9)$$

Assuming that $S_{ij} \neq 0$, $1 \le j \le n$, $1 \le i \le k$, we get:

$$\alpha S_{ij}^{2\alpha-2} \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2 + \lambda_j = 0,$$

i.e. $S_{ij}^{2\alpha-2} = \dfrac{-\lambda_j}{\alpha \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2}$, and since $S_{ij}^{2\alpha-2} = (S_{ij}^2)^{\alpha-1} = \mu_{ij}^{\alpha-1}$,

$$\mu_{ij} = \left[ \frac{-\lambda_j}{\alpha \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2} \right]^{1/(\alpha-1)} \quad (4.10)$$

Using constraint eq. 2.11 in eq. 4.10, we get:

$$\sum_{i=1}^{k} \mu_{ij} = \sum_{i=1}^{k} \left[ \frac{-\lambda_j}{\alpha \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2} \right]^{1/(\alpha-1)} = 1.$$

Substituting the value of $\lambda_j$ in eq. 4.10, we obtain:

$$\mu_{ij} = \frac{1}{\sum_{l=1}^{k} \left[ \dfrac{\sum_{r=1}^{d} \omega_{ir}^{\beta} d_{ijr}^2}{\sum_{r=1}^{d} \omega_{lr}^{\beta} d_{ljr}^2} \right]^{1/(\alpha-1)}} \quad (4.11)$$

Now, to prove the sufficiency condition, we compute the second order partial derivatives:

$$\frac{\partial^2 \tilde{J}_{RFCMS}}{\partial S_{ij} \partial S_{i'j'}} = \begin{cases} 2\alpha(2\alpha-1) \sum_{r=1}^{d} S_{ij}^{2\alpha-2} P_{ir}^{2\beta} d_{ijr}^2 + 2\lambda_j & \text{if } i = i', \; j = j', \\ 0 & \text{otherwise} \end{cases} \quad (4.12)$$

$$= 2\alpha(2\alpha-1) \mu_{ij}^{(\alpha-1)} \bar{d}_{ij}^2 + 2\lambda_j, \quad (4.13)$$

where $\bar{d}_{ij}^2 = \sum_{r=1}^{d} P_{ir}^{2\beta} d_{ijr}^2$. Substituting the values of $\mu_{ij}$ and $\lambda_j$ in 4.13, we get:

$$= \left( 2\alpha(2\alpha-1) - 2\alpha \right) \left[ \sum_{l=1}^{k} \left( \frac{1}{\bar{d}_{lj}^2} \right)^{1/(\alpha-1)} \right]^{-(\alpha-1)} \quad (4.14)$$

$$= 4\alpha(\alpha-1) \left[ \sum_{l=1}^{k} (\bar{d}_{lj}^2)^{1/(1-\alpha)} \right]^{(1-\alpha)} \quad (4.15)$$

Letting $a_j = \left[ \sum_{l=1}^{k} (\bar{d}_{lj}^2)^{1/(1-\alpha)} \right]^{(1-\alpha)}$, $1 \le j \le n$, we have

$$\frac{\partial^2 \tilde{J}_{RFCMS}}{\partial S_{ij} \partial S_{ij}} = \gamma_j, \quad \text{where } \gamma_j = 4\alpha(\alpha-1) a_j, \; 1 \le j \le n. \quad (4.16)$$

Hence the Hessian matrix of $U$ is a diagonal matrix with $n$ distinct eigenvalues, each of multiplicity $k$. Under the assumptions $\alpha > 1$, $\beta > 1$ and $\bar{d}_{lj}^2 > 0$ for all $l, j$, it follows that $\gamma_j > 0$ for all $j$. Thus, the Hessian matrix of $U$ is positive definite, and the sufficiency condition is proved.

Theorem 4.2 Let $\zeta : M_f^{kd} \to \mathbb{R}$, $\zeta(W) = J_{RFCMS}(U, W, Z)$, where $U \in M_f^{kn}$ and $Z \in \mathbb{R}^{kd}$ are fixed. Then $W \in M_f^{kd}$ is a strict local minimum of $\zeta$ if and only if $W$ is calculated by the equation:

$$\omega_{ir} = \frac{1}{\sum_{l'=1}^{d} \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^2}{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijl'}^2} \right]^{1/(\beta-1)}}$$

Proof 4.2 In order to obtain the first order necessary condition for optimality, we set the gradient of $\tilde{J}_{RFCMS}$ w.r.t. $P_{ir}$ equal to zero:

$$\frac{\partial \tilde{J}_{RFCMS}}{\partial P_{ir}} = 2\beta \sum_{j=1}^{n} S_{ij}^{2\alpha} P_{ir}^{2\beta-1} d_{ijr}^2 + 2\phi_i P_{ir} = 0 \quad (4.17)$$

We assume that $P_{ir} \neq 0$, $1 \le i \le k$, $1 \le r \le d$. Computing in the same manner as in Theorem 4.1, we obtain:

$$P_{ir}^2 = \left[ \frac{-\phi_i}{\beta \sum_{j=1}^{n} S_{ij}^{2\alpha} d_{ijr}^2} \right]^{1/(\beta-1)}$$

Since $\omega_{ir} = P_{ir}^2$, we get:

$$\omega_{ir} = \left[ \frac{-\phi_i}{\beta \sum_{j=1}^{n} S_{ij}^{2\alpha} d_{ijr}^2} \right]^{1/(\beta-1)}$$

Using constraint eq. 3.4, we get:

$$\sum_{r=1}^{d} \omega_{ir} = \sum_{r=1}^{d} \left[ \frac{-\phi_i}{\beta \sum_{j=1}^{n} S_{ij}^{2\alpha} d_{ijr}^2} \right]^{1/(\beta-1)} = 1 \quad (4.18)$$

Substituting the value of $\phi_i$ in eq. 4.18, we obtain:

$$\omega_{ir} = \frac{1}{\sum_{l'=1}^{d} \left[ \dfrac{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijr}^2}{\sum_{j=1}^{n} \mu_{ij}^{\alpha} d_{ijl'}^2} \right]^{1/(\beta-1)}} \quad (4.19)$$

Now, to prove the sufficiency condition, we compute the second order partial derivatives:

$$\frac{\partial^2 \tilde{J}_{RFCMS}}{\partial P_{ir} \partial P_{i'r'}} = \begin{cases} 2\beta(2\beta-1) \sum_{j=1}^{n} P_{ir}^{2\beta-2} S_{ij}^{2\alpha} d_{ijr}^2 + 2\phi_i & \text{if } i = i', \; r = r', \\ 0 & \text{otherwise} \end{cases} \quad (4.20)$$

$$= 2\beta(2\beta-1) \, \omega_{ir}^{(\beta-1)} \hat{d}_{ir}^2 + 2\phi_i, \quad (4.22)$$

where

$$\hat{d}_{ir}^2 = \sum_{j=1}^{n} S_{ij}^{2\alpha} d_{ijr}^2. \quad (4.23)$$

Substituting the values of $\omega_{ir}$ and $\phi_i$ in 4.22, we get:

$$= \left( 2\beta(2\beta-1) - 2\beta \right) \left[ \sum_{l'=1}^{d} \left( \frac{1}{\hat{d}_{il'}^2} \right)^{1/(\beta-1)} \right]^{-(\beta-1)} = 4\beta(\beta-1) \left[ \sum_{l'=1}^{d} (\hat{d}_{il'}^2)^{1/(1-\beta)} \right]^{(1-\beta)}$$

Letting $b_i = \left[ \sum_{l'=1}^{d} (\hat{d}_{il'}^2)^{1/(1-\beta)} \right]^{(1-\beta)}$, $1 \le i \le k$, we have

$$\frac{\partial^2 \tilde{J}_{RFCMS}}{\partial P_{ir} \partial P_{ir}} = \kappa_i, \quad \text{where } \kappa_i = 4\beta(\beta-1) b_i, \; 1 \le i \le k. \quad (4.24)$$

Hence the Hessian matrix of $W$ is a diagonal matrix with $k$ distinct eigenvalues, each of multiplicity $d$. Under the assumptions $\alpha > 1$, $\beta > 1$ and $\hat{d}_{il'}^2 > 0$ for all $i, l'$, it follows that $\kappa_i > 0$ for all $i$. Thus, the Hessian matrix of $W$ is positive definite, and the sufficiency condition is proved.

Theorem 4.3 Let $\xi : \mathbb{R}^{kd} \to \mathbb{R}$, $\xi(Z) = J_{RFCMS}(U, W, Z)$, where $U \in M_f^{kn}$ and $W \in M_f^{kd}$ are fixed. Then $Z$ is a strict local minimum of $\xi$ if and only if $Z$ is calculated using eq. 4.7.

Proof 4.3 We now discuss the necessary and sufficient conditions for the cluster centers $Z$ to converge. We compute the first order derivative of $J_{RFCMS}$ with respect to $z_{ir}$, which is again a necessary condition for optimality; setting it to zero (the common positive factor $2\omega_{ir}^{\beta}$ cancels) yields:

$$a \sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} (x_{jr} - z_{ir}) + b \sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} (x_{jr} - z_{ir}) = 0$$

Thus, the cluster centers are computed as the weighted average of the crisp lower approximation and the fuzzy boundary:

$$z_{ir} = a \, \frac{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in \underline{B}U_i} \mu_{ij}^{\alpha}} + b \, \frac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} \quad \text{when } \underline{B}U_i \neq \emptyset \wedge \overline{B}U_i - \underline{B}U_i \neq \emptyset \quad (4.25)$$

Hence eq. 4.25 can be written as:

$$z_{ir} = a \, z_{ir}^{lower} + b \, z_{ir}^{upper},$$

where

$$z_{ir}^{lower} = \frac{\sum_{x_j \in \underline{B}U_i} x_{jr}}{|\underline{B}U_i|} \quad (4.26)$$

$$z_{ir}^{upper} = \frac{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_{jr}}{\sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}} \quad (4.27)$$

As an object may not belong to both the lower approximation and the boundary region, the convergence of a cluster center depends on both components of eq. 4.25. Eqs. 4.26 and 4.27 can be written as:

$$|\underline{B}U_i| \, z_{ir}^{lower} = \sum_{x_j \in \underline{B}U_i} x_{jr} \quad (4.28)$$

$$\left( \sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} \right) z_{ir}^{upper} = \sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha} x_{jr} \quad (4.29)$$

Eqs. 4.28 and 4.29 represent a linear set of equations. In order to prove convergence, we treat eqs. 4.26 and 4.27 as Gauss-Seidel iterations for solving this set of equations, with the $\mu_{ij}$ considered fixed. The sufficient condition of the Gauss-Seidel algorithm for assuring convergence is that the matrix representing each iteration be diagonally dominant. The matrices corresponding to eqs. 4.28 and 4.29 are:

$$\tilde{A} = \begin{bmatrix} |\underline{B}U_1| & 0 & \cdots & 0 \\ 0 & |\underline{B}U_2| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |\underline{B}U_k| \end{bmatrix}, \qquad \tilde{B} = \begin{bmatrix} \eta_1 & 0 & \cdots & 0 \\ 0 & \eta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \eta_k \end{bmatrix},$$

where $\eta_i = \sum_{x_j \in (\overline{B}U_i - \underline{B}U_i)} \mu_{ij}^{\alpha}$. The sufficient condition for the matrices $\tilde{A}$ and $\tilde{B}$ to be diagonally dominant is $|\underline{B}U_i| > 0$ and $\eta_i > 0$ respectively.

Also, going by the convergence theorem proposed for FCM by [Bezdek et al., 1987], by [Maji and Pal, 2007a], and by the convergence analysis of the fuzzy curve tracing algorithm [Yan, 2004], the matrices $\tilde{A}$ and $\tilde{B}$ are the Hessians of $A$ and $B$ w.r.t. $z_{ir}^{lower}$ and $z_{ir}^{upper}$ respectively, with all positive eigenvalues; hence these matrices are diagonally dominant. Thus, by Theorems 4.1, 4.2 and 4.3, the proposed algorithm RFCMS converges, at least along a subsequence, to a local optimum solution.

4.5 Experiments

In this section, we present the comparative performance of the proposed subspace clustering algorithm RFCMS against FCM, RCM, RFCM, DOC, and PROCLUS, using UCI data sets [uci, ]. While FCM, RCM and RFCM are full dimensional clustering algorithms, PROCLUS and DOC are subspace clustering algorithms tailored for high-dimensional applications. We used the MATLAB version of FCM and the OpenSubspace Weka [osw, ] implementations of DOC and PROCLUS, and implemented the RCM, RFCM, and RFCMS algorithms in MATLAB. In all the experiments with the FCM, RCM, RFCM and RFCMS algorithms, the stopping criterion parameter $\epsilon$ was set to $10^{-3}$ and the maximum number of iterations was restricted to 100. However, in all the experiments we conducted, the algorithms always converged before the limit on the number of iterations was reached. The normed difference between successive iterations of the matrix $Z$ is compared with the threshold parameter $\epsilon$ to define the convergence criterion. Based on experimentation, we set the values of the parameters a = 0.85 and b = 0.25 for the RCM, RFCM and RFCMS algorithms. The parameters for the DOC algorithm were used as mentioned in [Procopiuc et al., 2002]. The number of clusters k was set equal to the number of classes given in each data set, as indicated in Table 4.1. We have evaluated the effect of the fuzzification parameters α and β of the RFCMS algorithm and the fuzzification parameter m of the FCM and RFCM algorithms. We evaluated the performance of all the algorithms w.r.t. quality and validity measures. The sets of relevant dimensions computed by each of the subspace clustering algorithms RFCMS, DOC and PROCLUS are shown for all the data sets.

4.5.1 Data Sets

We experimented with the Alzheimer, Breast Cancer, Spambase, Wine, Diabetes and Magic data sets from the UCI data repository [uci, ]. These data sets are heterogeneous in terms of size, number of clusters, and distribution of classes, and have no missing values. General characteristics of the data sets are summarized in Table 4.1.

Table 4.1: Data Sets (columns: Instances, Attributes, Classes; rows: Alzheimer, Breast Cancer, Spambase, Wine, Diabetes, Magic)

4.5.2 Effect of Fuzzification Parameters

For the RFCMS algorithm, the best combination of fuzzification parameters α and β was determined by varying the values of α and β in the range 2-10 independently of each other. This was done for each data set. Similarly, the best value of the fuzzification parameter m for the FCM and RFCM algorithms was determined by varying the values of m. Table 4.2 shows the complete list of fuzzification parameters we found for the different data sets as a result of fine-tuning.
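A simple way to realize this tuning loop is an exhaustive grid search over the 2-10 range quoted above; the sketch below is illustrative, reuses the hypothetical rfcms helper sketched earlier, and assumes a caller-supplied score standing in for any of the validity measures of the next subsection.

```python
def tune_fuzzifiers(X, k, score, values=range(2, 11)):
    """Exhaustive search over the fuzzifier grid; score maps the clustering
    output (Z, U, W) to a validity value where higher is better."""
    best_pair, best_val = None, float('-inf')
    for alpha in values:
        for beta in values:
            Z, U, W = rfcms(X, k, alpha=float(alpha), beta=float(beta))
            val = score(Z, U, W)
            if val > best_val:
                best_pair, best_val = (alpha, beta), val
    return best_pair
```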

Table 4.2: Fuzzifier Values (RFCMS: α and β; FCM: m; RFCM: m) for the Alzheimer, Breast Cancer, Spambase, Wine, Diabetes and Magic data sets

4.5.3 Cluster Validity

Table 4.3 shows the accuracy results for all the algorithms and data sets. The RFCMS algorithm has the highest accuracy for the Breast Cancer, Spambase and Wine data sets. The FCM algorithm achieves the highest accuracy for the Alzheimer data set, the RFCM algorithm achieves the highest accuracy for the Magic data set, and the DOC algorithm achieves the highest accuracy for the Diabetes data set.

Table 4.3: Accuracy (RFCMS, FCM, RCM, RFCM, PROCLUS, and DOC)

In Tables 4.4, 4.5, 4.6, and 4.7, we present the results of applying recall, specificity, precision and F1-measure to the clustering schemes produced by the different algorithms. The RFCMS algorithm achieves the highest recall and specificity for the Breast Cancer, Spambase and Wine data sets. The FCM algorithm achieves the highest recall and specificity for the Alzheimer data set, the RFCM algorithm achieves the highest recall and specificity for the Magic data set, and the DOC algorithm achieves the highest recall and specificity for the Diabetes data set.

Table 4.4: Recall (RFCMS, FCM, RCM, RFCM, PROCLUS, and DOC)

Table 4.5: Specificity (RFCMS, FCM, RCM, RFCM, PROCLUS, and DOC)

The RFCMS algorithm has the highest precision for the Breast Cancer, Spambase, Diabetes, Magic and Wine data sets, while the FCM algorithm achieves the highest precision for the Alzheimer data set. The RFCMS algorithm achieves the highest F1-measure for the Breast Cancer, Spambase and Wine data sets. The FCM algorithm achieves the highest F1-measure for the Alzheimer data set, and the RFCM algorithm achieves the highest F1-measure for the Magic data set. The FCM, RCM and RFCM algorithms achieve the highest F1-measure for the Diabetes data set. In summary, it can be seen that no algorithm is a clear winner w.r.t. all measures across all the data sets.
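For reference, once each cluster is matched to a class, the measures reported in Tables 4.3-4.7 reduce to simple functions of the confusion counts; a minimal sketch (binary one-vs-rest counts assumed) follows.

```python
def validity_measures(tp, fp, tn, fn):
    """Accuracy, recall, specificity, precision and F1 from confusion counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    recall      = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    f1          = 2 * precision * recall / (precision + recall)
    return accuracy, recall, specificity, precision, f1
```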

Table 4.6: Precision (RFCMS, FCM, RCM, RFCM, PROCLUS, and DOC)

Table 4.7: F1-measure (RFCMS, FCM, RCM, RFCM, PROCLUS, and DOC)

4.5.4 Subspaces Generated

The proposed algorithm RFCMS is an objective function based subspace clustering algorithm. For such algorithms, the fewer the number of dimensions, the smaller the error or scatter among the objects of a cluster. We have compared the RFCMS, DOC and PROCLUS algorithms in terms of the number of dimensions found. Tables 4.8, 4.9, 4.10, 4.11, 4.12 and 4.13 show the sets of dimensions found for the Alzheimer, Breast Cancer, Spambase, Wine, Diabetes and Magic data sets by the RFCMS, PROCLUS and DOC algorithms. For all the data sets mentioned above, the RFCMS algorithm finds subspaces with fewer dimensions.

Cluster No.   RFCMS       PROCLUS    DOC
1             4           4,6,7      1,2,3,4,5,6,7
2             4, 5, 7     4,5,6      1,2,3,4,5,6,7
3             4, 5, 6     4,5,6      1,2,3,4,5,6,7

Table 4.8: Dimensions: RFCMS, PROCLUS and DOC for Alzheimer

Table 4.9: Dimensions: RFCMS, PROCLUS and DOC for Breast Cancer

Table 4.10: Dimensions: RFCMS, PROCLUS and DOC for Spambase

Cluster No.   RFCMS              PROCLUS                 DOC
1             3, 8, 11           1,2,3,6,7,8,9,11,...    ...
2             ..., 8, 11         1,3,6,7,8,9,11,...      ...
3             ..., 7, 8, 9, 11   1,...                   ...

Table 4.11: Dimensions: RFCMS, PROCLUS and DOC for Wine

Cluster No.   RFCMS    PROCLUS    DOC
1             1,6,7    1,6-8      1,...
2             ...,7    1,4,5,7    1,6-8

Table 4.12: Dimensions: RFCMS, PROCLUS and DOC for Diabetes

Cluster No.   RFCMS    PROCLUS      DOC
1             4,5      3,4,5,8,9    2-6,8,9
2             4,5      1,2,3,4,5    1-5,8

Table 4.13: Dimensions: RFCMS, PROCLUS and DOC for Magic

4.5.5 Experiments on Biological Datasets

In this section, we present the comparative performance of the proposed projected clustering algorithm RFCMS with the EWKM, FWKM and LAC algorithms on biological data sets. RFCMS, EWKM, FWKM and LAC are all subspace clustering algorithms tailored for high-dimensional applications. We used the Weka implementations of EWKM, FWKM and LAC [Peng and Zhang, 2011]. The parameters for the EWKM, FWKM and LAC algorithms were used as mentioned in [Jing et al., 2007], [Jing et al., 2005] and [Domeniconi et al., 2007]. We have evaluated the effect of the fuzzification parameters α and β of the RFCMS algorithm, and we evaluated the performance of all the algorithms w.r.t. validity measures. The sets of relevant dimensions computed by RFCMS are shown for all the data sets.

Data Sets

We experimented with the Colon, Embryonal Tumours, Prostate and Leukemia data sets [bio, ]. These data sets are heterogeneous in terms of size and have no missing values. We have chosen data sets which are pre-classified, as this helps in evaluating the results of applying clustering algorithms. General characteristics of the data sets are summarized in Table 4.14.

Table 4.14: Data Sets (columns: Instances, Attributes, Classes; rows: Colon Cancer, Embryonal Tumours, Leukemia, Prostate)

Effect of Fuzzification Parameters

For the RFCMS algorithm, the best combination of fuzzification parameters α and β was determined by varying the values of α and β in the range 2-5 independently of each other. This was done for each data set. Table 4.15 shows the complete list of fuzzification parameters we found for the different data sets as a result of fine-tuning.

Data Sets            α   β
Colon Cancer         2   4
Embryonal Tumours    3   5
Leukemia             3   4
Prostate             2   2

Table 4.15: Fuzzifier Values

Cluster Validity

Table 4.16 shows the accuracy results for all the algorithms and data sets. The RFCMS algorithm achieves the highest accuracy for the Colon and Leukemia data sets. The FWKM and LAC algorithms achieve the highest accuracy for the Embryonal Tumours data set, and the FWKM algorithm achieves the highest accuracy for the Prostate data set. However, the accuracy of RFCMS was comparable with that of the FWKM algorithm for both the Embryonal Tumours and Prostate data sets.

Table 4.16: Accuracy (RFCMS, EWKM, FWKM and LAC)

Table 4.17: Specificity (RFCMS, EWKM, FWKM and LAC)

In Tables 4.18, 4.17, 4.19, and 4.20, we present the results of applying recall, specificity, precision and F1-measure to the clustering schemes produced by the different algorithms. The RFCMS algorithm achieves the highest recall for the Leukemia data set; the EWKM, FWKM and LAC algorithms achieve the highest recall for the Embryonal Tumours data set; and the FWKM algorithm achieves the highest recall for the Prostate data set. The RFCMS algorithm achieves the highest specificity for the Colon, Embryonal Tumours, Prostate and Leukemia data sets. The RFCMS algorithm has the highest precision for the Colon and Leukemia data sets; the EWKM, FWKM and LAC algorithms achieve the highest precision for the Embryonal Tumours data set; and the EWKM and LAC algorithms achieve the highest precision for the Prostate data set. The RFCMS algorithm achieves the highest F1-measure for the Colon, Embryonal Tumours, and Prostate data sets, while FWKM achieves the highest F1-measure for the Leukemia data set.

Table 4.18: Recall (RFCMS, EWKM, FWKM and LAC)

Table 4.19: Precision (RFCMS, EWKM, FWKM and LAC)

Table 4.20: F1-measure (RFCMS, EWKM, FWKM and LAC)

Subspaces Generated

Figures 4.1 to 4.12 show the sets of dimensions found for the Colon, Embryonal Tumours, Prostate and Leukemia data sets by the RFCMS, EWKM and LAC algorithms. The RFCMS algorithm finds fewer dimensions as compared to the EWKM and LAC algorithms. For the Embryonal Tumours data set, the EWKM and LAC algorithms fail to distinguish the relevance of dimensions for cluster 2, whereas the RFCMS algorithm distinguishes the relevant and non-relevant dimensions for cluster 2. For the Prostate data set, the RFCMS algorithm finds fewer dimensions as compared to the EWKM and LAC algorithms. For the Leukemia data set, the results of the RFCMS, EWKM, and LAC algorithms are comparable.

Figure 4.1: RFCMS: Memberships of dimensions in cluster 1 and cluster 2 for the Colon data set

Figure 4.2: EWKM: Memberships of dimensions in cluster 1 and cluster 2 for the Colon data set

Figure 4.3: LAC: Memberships of dimensions in cluster 1 and cluster 2 for the Colon data set

Figure 4.4: RFCMS: Memberships of dimensions in cluster 1 and cluster 2 for the Embryonal Tumours data set

Figure 4.5: EWKM: Memberships of dimensions in cluster 1 and cluster 2 for the Embryonal Tumours data set

Figure 4.6: LAC: Memberships of dimensions in cluster 1 and cluster 2 for the Embryonal Tumours data set

Figure 4.7: RFCMS: Memberships of dimensions in cluster 1 and cluster 2 for the Prostate data set

Figure 4.8: EWKM: Memberships of dimensions in cluster 1 and cluster 2 for the Prostate data set

Figure 4.9: LAC: Memberships of dimensions in cluster 1 and cluster 2 for the Prostate data set

Figure 4.10: RFCMS: Memberships of dimensions in cluster 1 and cluster 2 for the Leukemia data set

Figure 4.11: EWKM: Memberships of dimensions in cluster 1 and cluster 2 for the Leukemia data set

Figure 4.12: LAC: Memberships of dimensions in cluster 1 and cluster 2 for the Leukemia data set

4.6 Summary

In this chapter, we have proposed a novel subspace clustering algorithm which employs a combination of rough set and fuzzy set theory. The Rough Fuzzy c-means Subspace (RFCMS) algorithm is an extension of the rough fuzzy c-means algorithm which incorporates fuzzy memberships of both data points and dimensions in each cluster. In each iteration, the cluster centers are updated and each data point is assigned to the lower approximation or the upper approximations of clusters; this process is repeated until the convergence criterion is met. We have also discussed the convergence of the proposed algorithm. The results of applying the proposed approach to UCI data sets show that the proposed algorithm scores over its competitors in terms of several validity measures. The proposed algorithm can be used in conjunction with density based algorithms to automatically detect the number of clusters.


More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier

Rough Set Approach to Unsupervised Neural Network based Pattern Classifier Rough Set Approach to Unsupervised Neural based Pattern Classifier Ashwin Kothari, Member IAENG, Avinash Keskar, Shreesha Srinath, and Rakesh Chalsani Abstract Early Convergence, input feature space with

More information

Open and Closed Sets

Open and Closed Sets Open and Closed Sets Definition: A subset S of a metric space (X, d) is open if it contains an open ball about each of its points i.e., if x S : ɛ > 0 : B(x, ɛ) S. (1) Theorem: (O1) and X are open sets.

More information

CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16)

CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16) CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze Institute for

More information

Multiple Classifier Fusion using k-nearest Localized Templates

Multiple Classifier Fusion using k-nearest Localized Templates Multiple Classifier Fusion using k-nearest Localized Templates Jun-Ki Min and Sung-Bae Cho Department of Computer Science, Yonsei University Biometrics Engineering Research Center 134 Shinchon-dong, Sudaemoon-ku,

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Fuzzy C-means Clustering with Temporal-based Membership Function

Fuzzy C-means Clustering with Temporal-based Membership Function Indian Journal of Science and Technology, Vol (S()), DOI:./ijst//viS/, December ISSN (Print) : - ISSN (Online) : - Fuzzy C-means Clustering with Temporal-based Membership Function Aseel Mousa * and Yuhanis

More information

Time Series Clustering Ensemble Algorithm Based on Locality Preserving Projection

Time Series Clustering Ensemble Algorithm Based on Locality Preserving Projection Based on Locality Preserving Projection 2 Information & Technology College, Hebei University of Economics & Business, 05006 Shijiazhuang, China E-mail: 92475577@qq.com Xiaoqing Weng Information & Technology

More information

S. Sreenivasan Research Scholar, School of Advanced Sciences, VIT University, Chennai Campus, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu, India

S. Sreenivasan Research Scholar, School of Advanced Sciences, VIT University, Chennai Campus, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu, India International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 10, October 2018, pp. 1322 1330, Article ID: IJCIET_09_10_132 Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=10

More information

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values

A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values A Comparison of Global and Local Probabilistic Approximations in Mining Data with Many Missing Attribute Values Patrick G. Clark Department of Electrical Eng. and Computer Sci. University of Kansas Lawrence,

More information

Road map. Basic concepts

Road map. Basic concepts Clustering Basic concepts Road map K-means algorithm Representation of clusters Hierarchical clustering Distance functions Data standardization Handling mixed attributes Which clustering algorithm to use?

More information

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16

CLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf

More information

Cluster quality assessment by the modified Renyi-ClipX algorithm

Cluster quality assessment by the modified Renyi-ClipX algorithm Issue 3, Volume 4, 2010 51 Cluster quality assessment by the modified Renyi-ClipX algorithm Dalia Baziuk, Aleksas Narščius Abstract This paper presents the modified Renyi-CLIPx clustering algorithm and

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22

INF4820. Clustering. Erik Velldal. Nov. 17, University of Oslo. Erik Velldal INF / 22 INF4820 Clustering Erik Velldal University of Oslo Nov. 17, 2009 Erik Velldal INF4820 1 / 22 Topics for Today More on unsupervised machine learning for data-driven categorization: clustering. The task

More information

Mi Main goal of rough sets - id induction of approximations of concepts.

Mi Main goal of rough sets - id induction of approximations of concepts. Rough Sets and Rough-Fuzzy Clustering Algorithm Pradipta Maji Machine Intelligence Unit Indian Statistical Institute, Kolkata, INDIA E-mail:pmaji@isical.ac.in Web:http://www.isical.ac.in/~pmaji ac in/

More information

Relational Partitioning Fuzzy Clustering Algorithms Based on Multiple Dissimilarity Matrices

Relational Partitioning Fuzzy Clustering Algorithms Based on Multiple Dissimilarity Matrices Relational Partitioning Fuzzy Clustering Algorithms Based on Multiple Dissimilarity Matrices Francisco de A.T. de Carvalho a,, Yves Lechevallier b and Filipe M. de Melo a a Centro de Informática, Universidade

More information

Improved Performance of Unsupervised Method by Renovated K-Means

Improved Performance of Unsupervised Method by Renovated K-Means Improved Performance of Unsupervised Method by Renovated P.Ashok Research Scholar, Bharathiar University, Coimbatore Tamilnadu, India. ashokcutee@gmail.com Dr.G.M Kadhar Nawaz Department of Computer Application

More information

Chapter 8 The C 4.5*stat algorithm

Chapter 8 The C 4.5*stat algorithm 109 The C 4.5*stat algorithm This chapter explains a new algorithm namely C 4.5*stat for numeric data sets. It is a variant of the C 4.5 algorithm and it uses variance instead of information gain for the

More information

Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Particle Swarm Optimization applied to Pattern Recognition

Particle Swarm Optimization applied to Pattern Recognition Particle Swarm Optimization applied to Pattern Recognition by Abel Mengistu Advisor: Dr. Raheel Ahmad CS Senior Research 2011 Manchester College May, 2011-1 - Table of Contents Introduction... - 3 - Objectives...

More information

MTAEA Convexity and Quasiconvexity

MTAEA Convexity and Quasiconvexity School of Economics, Australian National University February 19, 2010 Convex Combinations and Convex Sets. Definition. Given any finite collection of points x 1,..., x m R n, a point z R n is said to be

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM

MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM CHAPTER-7 MODELING FOR RESIDUAL STRESS, SURFACE ROUGHNESS AND TOOL WEAR USING AN ADAPTIVE NEURO FUZZY INFERENCE SYSTEM 7.1 Introduction To improve the overall efficiency of turning, it is necessary to

More information

Rough Fuzzy C-means and Particle Swarm Optimization Hybridized Method for Information Clustering Problem

Rough Fuzzy C-means and Particle Swarm Optimization Hybridized Method for Information Clustering Problem Journal of Communications Vol. 11, No. 12, December 2016 Rough Fuzzy C-means and Particle Swarm Optimization Hybridized Method for Information Clustering Problem F. Cai and F. J. Verbeek Section Imaging

More information

RPKM: The Rough Possibilistic K-Modes

RPKM: The Rough Possibilistic K-Modes RPKM: The Rough Possibilistic K-Modes Asma Ammar 1, Zied Elouedi 1, and Pawan Lingras 2 1 LARODEC, Institut Supérieur de Gestion de Tunis, Université de Tunis 41 Avenue de la Liberté, 2000 Le Bardo, Tunisie

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

On the Consequence of Variation Measure in K- modes Clustering Algorithm

On the Consequence of Variation Measure in K- modes Clustering Algorithm ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:

More information

An adjustable p-exponential clustering algorithm

An adjustable p-exponential clustering algorithm An adjustable p-exponential clustering algorithm Valmir Macario 12 and Francisco de A. T. de Carvalho 2 1- Universidade Federal Rural de Pernambuco - Deinfo Rua Dom Manoel de Medeiros, s/n - Campus Dois

More information

The Application of K-medoids and PAM to the Clustering of Rules

The Application of K-medoids and PAM to the Clustering of Rules The Application of K-medoids and PAM to the Clustering of Rules A. P. Reynolds, G. Richards, and V. J. Rayward-Smith School of Computing Sciences, University of East Anglia, Norwich Abstract. Earlier research

More information

Finding Rough Set Reducts with SAT

Finding Rough Set Reducts with SAT Finding Rough Set Reducts with SAT Richard Jensen 1, Qiang Shen 1 and Andrew Tuson 2 {rkj,qqs}@aber.ac.uk 1 Department of Computer Science, The University of Wales, Aberystwyth 2 Department of Computing,

More information

AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS

AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS H.S Behera Department of Computer Science and Engineering, Veer Surendra Sai University

More information