Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems

ATTILA GYENESEI
Turku Centre for Computer Science (TUCS)
University of Turku, Department of Computer Science
Lemminkäisenkatu 14A, FIN-20520 Turku
FINLAND

Abstract: The problem of mining association rules for fuzzy quantitative items was introduced and an algorithm proposed in [5]. However, the algorithm assumes that fuzzy sets are given. In this paper we propose a method to find the fuzzy sets for each quantitative attribute in a database by using clustering techniques. We present a scheme for finding the optimal partitioning of a data set during the clustering process regardless of the clustering algorithm used. More specifically, we present an approach for evaluation of clustering partitions so as to find the best number of clusters for each specific data set. This is based on a goodness index, which assesses the most compact and well-separated clusters. We use these clusters to classify each quantitative attribute into fuzzy sets and define their membership functions. These steps are combined into a concise algorithm for finding the fuzzy sets. Finally, we describe the results of using this approach to generate association rules from a real-life dataset. The results show that a higher number of interesting rules can be discovered, compared to partitioning the attribute values into equal-sized sets.

Key-Words: association rules, fuzzy items, quantitative attributes, clustering

1 Introduction

Since knowledge can often be expressed in a more natural way by using fuzzy sets, many decision support problems can be greatly simplified. We attempt to take advantage of fuzzy sets in knowledge discovery from databases. One important topic in knowledge discovery and decision support research is concerned with the discovery of interesting association rules [1]. An interesting association rule describes an interesting relationship among different attributes. Given a set of transactions where each transaction is a set of items, an association rule is an expression of the form $X \Rightarrow Y$, where X and Y are sets of items.
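For a rule X ⇒ Y, the two standard measures are support (the fraction of transactions that contain every item of X and Y together) and confidence (the fraction of transactions containing X that also contain Y). A minimal Python sketch of both measures follows; the function names and the toy transactions are illustrative assumptions and are not taken from the paper's data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent together) over support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies; report zero by convention
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical toy transactions, for illustration only.
transactions = [
    {"beer", "chips", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
]
rule_support = support(transactions, {"beer", "chips", "diapers"})
rule_confidence = confidence(transactions, {"beer", "chips"}, {"diapers"})
```

Mining then amounts to keeping only the rules whose two values reach the user-specified minimum support and minimum confidence.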
An example of an association rule is: 4% of transactions that contain beer and potato chips also contain diapers; 5% of all transactions contain all of these items. Here 4% is called the confidence of the rule, and 5% the support of the rule. The problem is to find all association rules that satisfy user-specified minimum support and minimum confidence constraints.

The problem of mining boolean association rules over supermarket data was introduced in [2], and later broadened in [3], for the case of databases consisting of categorical attributes alone. In practice the information in many, if not most, databases is not limited to categorical attributes, but also contains much quantitative data. The problem of mining quantitative association rules was introduced and an algorithm proposed in [4]. The algorithm involves discretizing the domains of quantitative attributes into intervals in order to reduce the domain into a categorical one. An example of a rule according to this definition would be: % of married people between age 5 and 7 have at least cars. However, these intervals may not be concise and meaningful enough for human experts to easily obtain nontrivial knowledge from the rules discovered.

In [5], we showed a method to handle quantitative attributes using a fuzzy approach. Instead of using intervals, the method employs linguistic terms to represent the revealed regularities and exceptions. We assigned each quantitative attribute several fuzzy sets which characterize it. Fuzzy sets provide a smooth transition between a member and a non-member of a set. The fuzzy association rule is also easily understandable to a human because of the linguistic terms associated with the fuzzy sets. Using the fuzzy set concept, the above example could be rephrased e.g. % of married old people have several cars.

However, the algorithm proposed in [5] for fuzzy association rule mining suffers from the following problem. The user or an expert must provide this algorithm with the required fuzzy sets of the quantitative attributes and their corresponding membership functions. It is unrealistic to assume that experts can always provide the fuzzy sets of the quantitative attributes in the database for fuzzy association rule mining. To deal with this problem, we intend to find the fuzzy sets by using clustering techniques.

In this paper, we present an approach for clustering scheme evaluation. It aims at evaluating the schemes produced by a specific clustering algorithm, assuming different input parameter values. These schemes are evaluated using a new clustering scheme validity index, which we define. Our goal is not to propose a new clustering algorithm or to evaluate a variety of clustering algorithms, but to produce the clustering scheme with the most compact and well-separated clusters for any given algorithm.

The remainder of the paper is organized as follows. In the next section we describe the proposed goodness index for clustering scheme evaluation. In Section 3, we exploit the discovered cluster centers to classify the quantitative attribute values into fuzzy sets, and show a method to find the corresponding membership function for each fuzzy set. Then we formulate our approach into a precise algorithm in Section 4. In Section 5 the experimental results are reported, comparing obtained association rules both qualitatively and quantitatively. The paper ends with a brief conclusion in Section 6.

2 Clustering Scheme Evaluation

The objective of the clustering methods is to provide in some sense optimal partitions of a data set. In general, they should search for well-separated clusters whose members are close to each other. Another problem in clustering is to decide the optimal number of clusters that best fits a data set. The majority of clustering algorithms produce a partitioning based on the input parameters (e.g. number of clusters, minimum density) that finally lead to a finite number of clusters. Thus, the application of an algorithm assuming different input parameter values results in different partitions of a particular data set, which are not easily comparable.
A solution to this problem is to run the algorithm repeatedly with different input parameter values and compare the results against a well-defined validity index. A number of cluster validity indices are described in the literature. A cluster validity index for crisp clustering proposed in [6] attempts to identify compact and separated clusters. Other validity indices for crisp clustering have been proposed in [7] and [8]. The implementation of most of these measures is computationally very expensive, especially when the number of clusters and the number of objects in the data set grow very large [9]. Other validity measures are proposed in [10], [11]. We should mention that the evaluation of proposed measures and the analysis of their reliability have been quite limited.

In the following, we define a goodness index for evaluating clustering schemes based on the validity index defined for the fuzzy c-means method (FCM) in [11]. We use the same concepts for validation, but the goodness index can be used for any clustering method, not just for FCM. Assume that we study a quantitative attribute X.

Definition 1 The variance of an attribute X, denoted $\sigma(X)$, is defined as

$$\sigma(X) = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2,$$

where $x_1, \ldots, x_n$ are the attribute instances, and $\bar{x}$ is the mean given by

$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k.$$

Definition 2 The variance of cluster i containing elements $X_i = \{x_1, \ldots, x_{n_i}\}$ is given by

$$\sigma(X_i, r_i) = \frac{1}{n_i} \sum_{k=1}^{n_i} (x_k - r_i)^2,$$

where $r_i$ is the center of cluster i, having $n_i$ elements.

Definition 3 The average scattering (separation) for c clusters is defined as

$$\mathrm{Scat}(X, R) = \frac{1}{c} \sum_{i=1}^{c} \frac{\sigma(X_i, r_i)}{\sigma(X)},$$

where R is the set of c cluster centers.

Scat(X,R) indicates the average compactness of clusters. A small value for this term indicates compact clusters, and as the scattering within clusters increases (they become less compact) the value of Scat(X,R) also increases.

Definition 4 The total separation between clusters is given by

$$\mathrm{Ds}(R) = \frac{D_{\max}}{D_{\min}} \sum_{i=1}^{c} \left( \sum_{j=1}^{c} |r_i - r_j| \right)^{-1},$$

where $D_{\max}$ is the maximum, and $D_{\min}$ is the minimum distance between cluster centers.
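Definitions 1 to 4 translate directly into code for a one-dimensional attribute. The sketch below is an illustration under stated assumptions: each value is assigned to its nearest center (crisp clustering), empty clusters contribute zero variance, at least two distinct centers are given, and all function names and the toy values are hypothetical.

```python
def variance(values, center=None):
    """Mean squared deviation from `center` (from the mean if center is None)."""
    if center is None:
        center = sum(values) / len(values)
    return sum((x - center) ** 2 for x in values) / len(values)


def assign(values, centers):
    """Group values by nearest center (crisp assignment, an assumption here)."""
    clusters = [[] for _ in centers]
    for x in values:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    return clusters


def scat(values, centers):
    """Definition 3: average cluster variance relative to the attribute variance."""
    clusters = assign(values, centers)
    # Empty clusters are skipped, i.e. treated as having zero variance.
    within = sum(variance(m, r) for m, r in zip(clusters, centers) if m)
    return within / (len(centers) * variance(values))


def ds(centers):
    """Definition 4: (D_max / D_min) * sum_i 1 / sum_j |r_i - r_j|."""
    gaps = [abs(a - b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    ratio = max(gaps) / min(gaps)
    return ratio * sum(1.0 / sum(abs(r - s) for s in centers) for r in centers)


# Hypothetical toy attribute with two tight groups.
values = [1, 2, 3, 11, 12, 13]
centers = [2, 12]
scat_value = scat(values, centers)
ds_value = ds(centers)
```

For these toy values the within-cluster variances are small compared with the overall variance, so the scattering term is close to zero, which is the behaviour expected of compact clusters.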

The term total separation sounds like a measure that we want to maximize. However, here the opposite holds: a smaller value is better. Ds(R) indicates the total separation (scattering) between the c clusters, and generally, this term will increase with the number of clusters. Now, we can define our goodness index based on the last two definitions.

Definition 5 The goodness index for cluster scheme R within set X is as follows:

$$G(X, R) = \alpha \, \mathrm{Scat}(X, R) + \mathrm{Ds}(R),$$

where $\alpha$ is a weighting factor equal to $\mathrm{Ds}(c_{\max})$, and $c_{\max}$ is the maximum number of input clusters.

The goodness index uses cluster separation as an indication of the average scattering between clusters. Minimizing the separation thus also tends to minimize the possibility of selecting a cluster scheme with significant differences in cluster distances. Since the two terms of the goodness index are of different ranges, a weighting factor is needed in order to incorporate both terms in a balanced way. (Note that the influence of the weighting factor is an issue for further study as mentioned in [11].)

Fig.1: Example of Goodness Index for the Attribute Age (goodness index against number of clusters)

For example, Fig.1 shows the values of the goodness index as a function of the number of clusters for attribute Age, which is given in Section 5. We can see that the best number of clusters is three for this dataset.

3 Determining Fuzzy Sets by Using the Discovered Cluster Scheme

After we have obtained the best cluster scheme (i.e. centers of clusters), we can use this to classify the quantitative attribute values into c fuzzy sets. We divide the attribute interval into c sub-intervals by using the discovered $r_i$ values, with a coverage of p percent between two adjacent ones, and give each sub-interval a symbolic name related to its position (Fig.2).

Fig.2: Example of the proposed fuzzy partitions (sets low, middle and high between MinValue and MaxValue, with cluster centers and effective bounds marked)

To specify our heuristic method, we give the following definitions.
Definition 6 The effective upper bound, denoted $d_i^{+}$, for fuzzy set i is given by:

$$d_i^{+} = r_i + 0.5\,(100\% + p)\,(r_{i+1} - r_i),$$

where p is the overlap parameter in %, and $r_i$ is the center of cluster i, $i = \{1, 2, \ldots, c-1\}$. $d_i^{+}$ is also the fuzzy lower bound of cluster i+1.

Definition 7 The effective lower bound, denoted $d_j^{-}$, for fuzzy set j is as follows:

$$d_j^{-} = r_j - 0.5\,(100\% + p)\,(r_j - r_{j-1}),$$

where p is the overlap parameter in %, and $r_j$ is the center of cluster j, $j = \{2, 3, \ldots, c\}$. $d_j^{-}$ is also the fuzzy upper bound of cluster j-1.

Notice that $0.5\,(100\% + p) = 50\% + p/2$. These definitions become clear by inspecting Fig.2. To quote an example, we classify the attribute Age into three fuzzy sets as given in Table 1, where Age ranges from 5 to 9.

Table 1: The ranges of fuzzy sets for Age (p = 30%)

Fuzzy set       Range             Cluster center
(Age,young)     5 to 43.95        31.65
(Age,middle)    38.28 to 65.80    50.58
(Age,high)      58.77 to 9        73.99

In the following, we describe how to generate the corresponding membership function for each fuzzy set of a quantitative attribute. Let $\{r_1, r_2, \ldots, r_i, \ldots, r_c\}$ be the cluster centers for a quantitative attribute. We use the following formulas to define the required membership functions for each fuzzy set.

For the fuzzy set with cluster center $r_1$, the membership function for element x is given by

$$f(r_1, x) = \begin{cases} 1 & \text{if } x \le d_2^{-} \\ \dfrac{d_1^{+} - x}{d_1^{+} - d_2^{-}} & \text{if } d_2^{-} < x < d_1^{+} \\ 0 & \text{if } x \ge d_1^{+} \end{cases}$$

For the fuzzy set with cluster center $r_c$, the membership function for element x is given by

$$f(r_c, x) = \begin{cases} 0 & \text{if } x \le d_c^{-} \\ \dfrac{x - d_c^{-}}{d_{c-1}^{+} - d_c^{-}} & \text{if } d_c^{-} < x < d_{c-1}^{+} \\ 1 & \text{if } x \ge d_{c-1}^{+} \end{cases}$$

For the fuzzy set with cluster center $r_i$, where $1 < i < c$, the membership function for element x is given by

$$f(r_i, x) = \begin{cases} 0 & \text{if } x \le d_i^{-} \\ \dfrac{x - d_i^{-}}{d_{i-1}^{+} - d_i^{-}} & \text{if } d_i^{-} < x < d_{i-1}^{+} \\ 1 & \text{if } d_{i-1}^{+} \le x \le d_{i+1}^{-} \\ \dfrac{d_i^{+} - x}{d_i^{+} - d_{i+1}^{-}} & \text{if } d_{i+1}^{-} < x < d_i^{+} \\ 0 & \text{if } x \ge d_i^{+} \end{cases}$$

4 An Algorithm for Finding Fuzzy Sets by Using a Clustering Scheme Goodness Index

In Section 2 we have defined a goodness index for clustering scheme evaluation. We exploit this index during the clustering process in order to define the optimal number of clusters for a quantitative attribute. More specifically, we first define the range of input parameters (e.g. number of clusters) of a clustering algorithm. Let parameter c denote the number of clusters, to be optimized. The range of values for c is defined by an expert, so that the clustering schemes produced are compatible with expected attribute partitions. Then, a clustering algorithm is performed for each value c and the results of clustering are evaluated using goodness index G. We use the discovered most compact and well-separated clusters to classify each quantitative attribute into fuzzy sets. After that, we can generate the corresponding membership function for each fuzzy set. The steps for finding fuzzy sets can be summarized as: (1) finding the best clustering scheme by using a goodness index for each quantitative attribute, (2) constructing fuzzy sets with the c cluster centers, and (3) deriving the corresponding membership functions.
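The effective bounds of Definitions 6 and 7 and the membership functions of Section 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the cluster centers are sorted, the overlap p is passed as a fraction (0.3 for 30%), fuzzy sets are indexed from 0, and the helper names are hypothetical.

```python
def fuzzy_bounds(centers, p):
    """Return (lower, upper): effective bounds d_i^- and d_i^+ per fuzzy set.

    lower[0] and upper[-1] stay None, because the first set has no lower
    bound and the last set has no upper bound.
    """
    c = len(centers)
    lower, upper = [None] * c, [None] * c
    for i in range(c - 1):
        gap = centers[i + 1] - centers[i]
        upper[i] = centers[i] + 0.5 * (1 + p) * gap          # d_i^+
        lower[i + 1] = centers[i + 1] - 0.5 * (1 + p) * gap  # d_{i+1}^-
    return lower, upper


def membership(i, x, lower, upper):
    """Membership degree of x in fuzzy set i (0-based index)."""
    # Rising edge: 0 at d_i^-, reaching 1 at d_{i-1}^+.
    if lower[i] is not None:
        if x <= lower[i]:
            return 0.0
        if x < upper[i - 1]:
            return (x - lower[i]) / (upper[i - 1] - lower[i])
    # Falling edge: 1 at d_{i+1}^-, reaching 0 at d_i^+.
    if upper[i] is not None:
        if x >= upper[i]:
            return 0.0
        if x > lower[i + 1]:
            return (upper[i] - x) / (upper[i] - lower[i + 1])
    return 1.0


# Three Age cluster centers with a 30% overlap, used here as an example.
centers = [31.65, 50.58, 73.99]
lower, upper = fuzzy_bounds(centers, 0.3)
degrees_at_60 = [membership(i, 60, lower, upper) for i in range(3)]
```

Inside an overlap region the rising edge of one set and the falling edge of its neighbour share the same two end points, so the two membership degrees of a value add up to one.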
Main algorithm (C_alg, X, c_min, c_max, p)

(* First phase: finding the optimal number of clusters and cluster centers *)
Initialize: c ← c_max
repeat
    Run the clustering algorithm C_alg for data set X to produce c cluster centers R
    if c = c_max then α ← Ds(R) endif
    Compute the goodness index G(c) = G(X, R)
    if c = c_max or G(c) < G_opt then
        G_opt ← G(c)
        c_opt ← c
        R_opt ← R
    endif
    c ← c - 1
until c = c_min - 1

(* Second phase: constructing fuzzy sets with the c_opt cluster centers *)
for i := 1 to c_opt do
    if i < c_opt then determine d_i^+ by using p
    if i > 1 then determine d_i^- by using p
endfor

(* Third phase: generating membership function for each fuzzy set *)
for each x ∈ X do
    for each r_i ∈ R_opt do
        Compute the corresponding membership function f(r_i, x)
    endfor
endfor
End algorithm

Parameters:
C_alg = the clustering algorithm
X = {x_1, x_2, …, x_n}, the set of attribute values to be clustered
c_min = the minimum number of clusters
c_max = the maximum number of clusters
p = overlap parameter in %

5 Experimental Results

We assessed the effectiveness of our approach by experimenting with a real-life dataset. The data set comes from research by the U.S. Census Bureau. The data had 6 quantitative attributes for 63756 families: age of family head in years ("head" is the reference person in a family), number of persons, children in family, education level of head, head's personal income and family income.
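The first phase of the main algorithm of Section 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a plain one-dimensional k-means with a deterministic initialization stands in for the clustering algorithm C_alg, the data are synthetic rather than the census data, and all function names are hypothetical.

```python
def kmeans_1d(values, c, iterations=100):
    """Stand-in for C_alg: plain 1-D k-means, returns (centers, clusters)."""
    data = sorted(values)
    n = len(data)
    # Assumed initialization: evenly spaced positions in the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * c)] for i in range(c)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(c)]
        for x in data:
            nearest = min(range(c), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        updated = [sum(m) / len(m) if m else centers[i]
                   for i, m in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters


def mean_sq_dev(values, center):
    return sum((x - center) ** 2 for x in values) / len(values)


def scat(values, centers, clusters):
    """Average cluster variance relative to the overall variance."""
    overall = mean_sq_dev(values, sum(values) / len(values))
    within = sum(mean_sq_dev(m, r) for m, r in zip(clusters, centers) if m)
    return within / (len(centers) * overall)


def ds(centers):
    """Total separation: (D_max / D_min) * sum_i 1 / sum_j |r_i - r_j|."""
    gaps = [abs(a - b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    return (max(gaps) / min(gaps)) * sum(
        1.0 / sum(abs(r - s) for s in centers) for r in centers)


def best_scheme(values, c_min, c_max):
    """Sweep c from c_max down to c_min; keep the smallest goodness index."""
    alpha, best = None, None
    for c in range(c_max, c_min - 1, -1):
        centers, clusters = kmeans_1d(values, c)
        if alpha is None:
            alpha = ds(centers)  # weighting factor: Ds at c_max
        g = alpha * scat(values, centers, clusters) + ds(centers)
        if best is None or g < best[0]:
            best = (g, c, centers)
    return best


# Synthetic values with three well-separated groups (not the census data).
values = [20, 22, 25, 27, 45, 48, 50, 52, 70, 72, 75, 78]
g_opt, c_opt, r_opt = best_scheme(values, 2, 5)
```

On this synthetic input the sweep selects three clusters: with two clusters the scattering term dominates, while with four or five the ratio of largest to smallest center distance inflates the separation term.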

Fig.3: Goodness index as a function of the number of clusters (attributes IncHead and IncFam)

First, we evaluate the proposed approach for finding the optimal clustering scheme using the above data set. The clustering schemes are discovered using the C-means algorithm while its input parameter (number of clusters) takes values between 2 and 5 for the attributes FamPers and NumKids, and between 2 and 9 for the others (see Fig.3 for attributes IncHead and IncFam). Applying the first phase of our algorithm (see Section 4), Table 2 shows the best number of clusters for different attributes. After finding the best cluster scheme, we can create the fuzzy sets for each quantitative attribute by using the discovered cluster centers. For example, the ranges of the fuzzy sets of Age are shown in Table 1. These ranges include all values where the membership function is positive.

Table 2: The best number of clusters

Attr.      No. of clust.   Cluster centers
Age        3               31.65, 50.58, 73.99
FamPers    2               .73, 4.48
NumKids    3               .6, .4, 4.5
EdHead     3               34.8, 39.39, 43.35
IncHead    4               3436, 48, 84933, 7354
IncFam     4               656, 47396, 88938, 6794

In the following, we illustrate how the above concept (clustering-based partitioning) gives a larger number of frequent itemsets and interesting association rules than the case when we do not use the proposed approach. In the latter case we use the same number of attribute elements for each interval (quantile-based partitioning). Note that the same definitions of membership functions are used for both methods as described in Section 3. In deriving the association rules, we apply the algorithm described in [5], developed for fuzzy attributes. It is an extension of the well-known technique based on incrementally finding the frequent sets [3]. Fig.4(a) and Fig.4(b) show the average support and the number of frequent itemsets for different minimum support thresholds. As expected, the average support increases and the number of frequent itemsets decreases as the minimum support increases from 10% to 50%.
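The quantile-based baseline used for comparison places the same number of attribute values in each interval. A minimal sketch of such an equal-frequency split is given below; the function name and the sample values are hypothetical, and how the baseline turns these intervals into fuzzy sets beyond reusing the membership functions of Section 3 is not assumed here.

```python
def equal_frequency_intervals(values, k):
    """Split the sorted values into k groups of (nearly) equal size."""
    data = sorted(values)
    n = len(data)
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]


# Hypothetical attribute values, for illustration only.
sample = [20, 22, 25, 27, 45, 48, 50, 52, 70, 72, 75, 78, 90]
groups = equal_frequency_intervals(sample, 3)
sizes = [len(g) for g in groups]
```

Unlike the clustering-based partitioning, the interval boundaries here depend only on the ranks of the values, so a dense group of similar values can be split across two intervals.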
We can see that the clustering-based partitioning gives a higher number of frequent itemsets. However, the quantile-based partitioning gives higher average support values if the minimum support is between 0.35 and 0.5. Note, however, that this average is generated by only two frequent itemsets.

Fig.4: (a) Average Support (b) Number of Frequent Itemsets (clustering method and quantile method, against minimum support)

Fig.5(a) shows the average confidence for different minimum confidence thresholds. The result is quite similar to that in Fig.4(a). We can see that the clustering-based partitioning gives in most cases higher average confidence values. Fig.5(b) shows the number of generated rules as a function of the minimum confidence threshold, for both the clustering and the quantile case. The minimum support was set to 3%. The results are as expected: the numbers of rules for the clustering-based partitioning case are larger, but both decrease with increasing confidence threshold.

Fig.5: (a) Average Confidence (b) Number of Interesting Rules (clustering method and quantile method, against minimum confidence)

Finally, we show some interesting rules. The minimum support was set to 3% and the minimum confidence to 5%.

IF EdHead is medium THEN IncHead is low
IF IncHead is medium THEN IncFam is medium
IF FamPers is low AND NumKids is low THEN EdHead is low
IF FamPers is low AND NumKids is low AND IncHead is low THEN IncFam is low

We see that the rules are very easy for anyone to read and understand. This is our main goal in using fuzzy partitions for attributes. Of course, the usefulness of the rules can only be judged by a human.

6 Conclusion

The problem of mining association rules for fuzzy quantitative items was introduced in [5]. However, the algorithm assumes that the fuzzy sets are given. In this paper we have proposed a method to find the fuzzy sets for each quantitative attribute in a database by using clustering techniques. We defined the goodness index G for clustering scheme evaluation, based on two criteria: compactness and separation. The goodness index is a variant of the indices defined for the fuzzy c-means algorithm in [11], adapted to crisp clustering algorithms. Our approach is independent of the clustering algorithm used to partition the data set. After having obtained the best cluster scheme, we exploited the discovered cluster centers to classify the quantitative attribute values into fuzzy sets, and showed a method to find the corresponding membership function for each fuzzy set discovered. Then we combined the different steps into an explicit algorithm. The experimental results demonstrated that by using the goodness index G as a basis for generating clusters (and thereby fuzzy sets), a higher number of fuzzy association rules can be discovered. According to our observations, we claim that the generated rules are very meaningful for real-life data sets.

References:
[1] G. Piatetsky-Shapiro, W.J. Frawley, Knowledge Discovery in Databases. AAAI Press, 1991.
[2] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases. Proc. of ACM SIGMOD, 1993, pp. 207-216.
[3] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases. Proc. of the 20th VLDB Conference, 1994, pp. 487-499.
[4] R. Srikant, R. Agrawal, Mining quantitative association rules in large relation tables. Proceedings of ACM SIGMOD, 1996, pp. 1-12.
[5] A. Gyenesei, Mining Weighted Association Rules for Fuzzy Quantitative Items. Proceedings of PKDD Conference, Lyon, 2000, pp. 416-423.
[6] J.C. Dunn, Well separated clusters and optimal fuzzy partitions. J. Cybern, 1974, pp. 95-104.
[7] R.N. Dave, Validating fuzzy partitions obtained through c-shells clustering. Pattern Recognition Letters, Vol.17, 1996, pp. 613-623.
[8] Z. Huang, A Fast Clustering Algorithm to Cluster very Large Categorical Data Sets in Data Mining. DMKD, 1997.
[9] X.L. Xie, G. Beni, A Validity measure for Fuzzy Clustering. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.13, No.4, 1991.
[10] Gath, B. Geva, Unsupervised Optimal Fuzzy Clustering. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.11, No.7, 1989.
[11] R. Rezaee, Lelieveldt, Reiber, A new cluster validity index for the fuzzy c-mean. Pattern Recognition Letters, 19, 1998, pp. 237-246.