An Overview of Mining Fuzzy Association Rules

Tzung-Pei Hong and Yeong-Chyi Lee

Abstract Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Many types of knowledge and technology have been proposed for data mining. Among them, finding association rules from transaction data is the most common. Most studies have shown how binary-valued transaction data may be handled. Transaction data in real-world applications, however, usually consist of fuzzy and quantitative values, so designing sophisticated data-mining algorithms able to deal with various types of data presents a challenge to workers in this research field. This chapter thus surveys some fuzzy mining concepts and techniques related to association-rule discovery. The motivation for moving from crisp mining to fuzzy mining is described first. Some crisp mining techniques for handling quantitative data are then briefly reviewed. Several fuzzy mining techniques, including mining fuzzy association rules, mining fuzzy generalized association rules, and mining both membership functions and fuzzy association rules, are then described. The advantages and the limitations of fuzzy mining are also discussed.

1 Introduction

Knowledge discovery in databases (KDD) has become a process of considerable interest in recent years, as the amounts of data in many databases have grown tremendously large. KDD means the application of nontrivial procedures for identifying effective, coherent, potentially useful, and previously unknown patterns in large databases [20, 45]. The KDD process generally consists of three phases: pre-processing, data mining and post-processing [17, 2]. Among them, data mining plays a critical role in KDD. Depending on the types of databases to be processed, mining approaches may be classified as working on transaction databases, temporal databases, relational databases, and multimedia databases, among others [31].
On the other hand, depending on the classes of knowledge derived, mining approaches may be classified as finding association rules, classification rules, clustering rules, and sequential patterns [14], among others. Among them, finding association rules in transaction databases is the most common in data mining [41, 50, 18, 1, 42, 25, 48, 51].

H. Bustince et al. (eds.), Fuzzy Sets and Their Extensions: Representation, Aggregation and Models. © Springer

An association rule is an expression X → Y, where X is a set of items and Y is usually a single item. It means that, in the set of transactions, if all the items in X exist in a transaction, then Y is also in the transaction with a high probability. Most studies have shown how binary-valued transaction data may be handled. Transaction data in real-world applications, however, usually consist of fuzzy and quantitative values. Designing sophisticated data-mining algorithms able to deal with various types of data presents a challenge to workers in this research field.

Fuzzy set theory was first proposed by Zadeh in 1965 [58]. It has been used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [35]. It is primarily concerned with quantifying and reasoning using natural language in which words can have ambiguous meanings. Fuzzy sets can be thought of as an extension of traditional crisp sets, in which each element must either be in or not in a set. The theory has been applied in many fields such as manufacturing, engineering, diagnosis, and economics, among others [35, 37, 59]. Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect in specific domains [33, 34, 19, 24, 46, 29, 53]. Strategies based on decision trees [40] were proposed in [16, 47, 21]. Wang et al. also proposed a fuzzy version space learning strategy for managing vague information [53]. Recently, fuzzy set theory has also been applied to data mining to find interesting association rules or sequential patterns in transaction data with quantitative values [23, 29, 28, 8, 6, 3, 57].

This chapter thus attempts to survey some fuzzy mining concepts and techniques for association rules. The remaining part of this chapter is organized as follows. Some related crisp mining approaches for association rules are first reviewed in Sect. 2.
Some fuzzy mining techniques, including mining fuzzy association rules, mining fuzzy generalized association rules, mining both membership functions and fuzzy association rules, and integrating clustering with fuzzy association rules, are then described in Sects. 3 to 5. The advantages and the limitations of fuzzy mining are discussed and conclusions are given in Sect. 6.

2 Some Crisp Data-mining Approaches for Association Rules

The goal of data mining is to discover important associations among items such that the presence of some items in a transaction will imply the presence of some other items. To achieve this purpose, Agrawal and his co-workers proposed the famous Apriori algorithm, which is based on the concept of large itemsets, to find association rules in transaction data [41, 50]. They divided the mining process into two phases. In the first phase, candidate itemsets were generated and counted by scanning the transaction data. If the number of transactions in which an itemset appeared (called the support of the itemset) was larger than a pre-defined threshold value (called the minimum support), the itemset was considered a large itemset. Itemsets containing only one item were processed first. Large itemsets containing only single items were then combined to form candidate itemsets containing two items. This process was repeated until all large itemsets had been found. In the second phase, association rules were induced from the large itemsets found in the first phase. All possible association rules from each large itemset were formed, and their confidence values were calculated. The confidence was defined as the support of the itemset in the whole rule over the support of the itemset in the left-hand side of the rule. Those rules with calculated confidence values larger than a predefined threshold (called the minimum confidence) were output as association rules.

Han et al. proposed the Frequent-Pattern-tree structure (FP-tree) for efficiently mining association rules without generation of candidate itemsets [22]. The approach compressed a database into a tree structure storing only large items. Three steps were involved in FP-tree construction. The database was first scanned to find all items with their frequencies. The items with supports larger than a predefined minimum support were selected as large 1-itemsets (items). Next, the large items were sorted in descending order of frequency. Finally, the database was scanned again to construct the FP-tree according to the sorted order of large items. The construction process was executed tuple by tuple, from the first transaction to the last one. After all transactions were processed, the FP-tree was completely constructed. After the FP-tree was constructed from a database, a mining procedure called FP-Growth [22] was executed to find all large itemsets. FP-Growth did not need to generate candidate itemsets for mining, but derived frequent patterns directly from the FP-tree. It was a recursive process, handling the frequent items one by one. A conditional FP-tree was generated for each frequent item, and from the tree the large itemsets with the processed item could be recursively derived. Many mining methods for finding association rules based on the FP-tree structure have also been proposed [32, 4, 49, 60].

Cai et al. proposed weighted mining to reflect the different importance of items [36].
Each item was assigned a numerical weight given by users. Weighted supports and weighted confidences were then defined to determine interesting association rules.

In the above approaches, the items in transactions are binary. That is, the value of an item is treated as 1 if the item is present in a transaction, and as 0 if it is not present. Items in real applications may, however, be presented with rich data types, such as quantitative and categorical types. Some approaches were then proposed for handling items with quantitative and categorical values while discovering association rules. Piatetsky-Shapiro proposed a mining approach which partitioned quantitative attributes into intervals [27]. The value of each quantitative attribute could be presented as a range rather than a single value, and was allowed to appear in the left-hand side (antecedent) of a rule. Srikant and Agrawal then proposed a method for mining association rules from transactions with quantitative attributes [51]. Their method first determined the number of partitions for each quantitative attribute, and then mapped all possible values of each attribute into a set of consecutive integers. The number of intervals was determined by a partial completeness factor, which was used to evaluate the information lost by partitioning. Adjacent intervals were also allowed to be merged into a larger one. Their approach then found large itemsets whose support values were greater than the user-specified minimum-support levels. These large itemsets were then processed to generate association rules. An interest measure was also introduced to get interesting association rules.
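The two-phase Apriori procedure described in this section can be illustrated with a minimal Python sketch. It is not Agrawal et al.'s implementation: the function names `apriori` and `rules` are hypothetical, support is taken as an absolute transaction count, and both thresholds are applied with "greater than or equal to", which is an assumption made here for simplicity.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Phase 1: return {itemset: support count} for every large itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    large = {}
    k = 1
    while candidates:
        # Count each candidate by scanning the transaction data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}  # assumption: >=
        large.update(level)
        # Join large k-itemsets into (k+1)-candidates and keep only those
        # whose k-item subsets are all large.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return large

def rules(large, min_confidence):
    """Phase 2: form rules with a single-item consequent from large itemsets."""
    found = []
    for itemset, support in large.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # confidence = support(whole rule) / support(left-hand side)
            confidence = support / large[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return found
```

Because every subset of a large itemset is itself large, the antecedent's support is always available in the dictionary returned by `apriori`, so the second phase needs no further database scan.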

Fukuda et al. proposed a mining approach to discover optimized association rules from quantitative data [18]. Two problems were discussed, one for mining optimized support rules and the other for mining optimized confidence rules. Mining optimized support rules focused on finding an association rule with a maximum support value on the condition that the confidence value of the rule had to be larger than a given minimum confidence. On the other hand, mining optimized confidence rules emphasized finding an association rule with a maximum confidence value on the condition that the support value of the rule had to be larger than a given minimum support. The authors first found the ranges of quantitative attributes by sampling, sorting and filling data into buckets before the mining process proceeded. Association rules containing quantitative attributes in the left-hand side were then found by an effective search approach. In addition, Rastogi and Shim extended Fukuda et al.'s approach to find association rules with disjunctive conditions [30]. Lent et al. proposed a geometry-based algorithm to deal with the problem of clustering association rules from quantitative attributes [11]. Each association rule was initially represented by a primitive attribute value (or range). Adjacent association rules were then clustered together to form generalized rules. Much other research is still in progress.

3 Mining Fuzzy Association Rules

As mentioned above, fuzzy set theory is concerned with quantifying and reasoning using natural language. It is thus well suited to handling quantitative values by fuzzy sets. Several fuzzy mining approaches have thus been proposed to find interesting association rules or sequential patterns in transaction data with quantitative values. In this section, we focus on the data mining approaches for finding association rules with predefined membership functions.
Chan and Au proposed the F-APACS algorithm to mine fuzzy association rules [23]. They first transformed quantitative attribute values into linguistic terms and then used adjusted difference analysis to find interesting associations among attributes. It had the advantage that user-specified thresholds were not needed since statistical analysis was used. In addition, both positive and negative associations could be found. Kuok et al. proposed a fuzzy mining approach to handle numerical data in databases with attributes and derived fuzzy association rules [3]. At nearly the same time, Hong et al. proposed a fuzzy mining algorithm to mine fuzzy rules from quantitative transaction data [29]. Basically, these fuzzy mining algorithms first used membership functions to transform each quantitative value into a fuzzy set in linguistic terms. They then calculated the scalar cardinality of each linguistic term on all the transaction data. The mining process based on fuzzy counts was then performed to find fuzzy association rules. Hong et al. described the fuzzy mining steps in detail. Their fuzzy mining algorithm is described below as a good reference [8].
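The fuzzification step shared by these approaches, transforming a quantitative value into membership values over linguistic terms, can be illustrated with a small Python sketch. The trapezoidal shape, the helper names `trapezoid` and `fuzzify`, and the three regions Low, Middle and High with their breakpoints are all hypothetical choices made for illustration; the chapter does not prescribe particular membership functions.

```python
import math

def trapezoid(a, b, c, d):
    """Membership function: 0 outside (a, d), 1 on [b, c], linear in between.

    Setting a == b or c == d gives a shoulder; b == c gives a triangle.
    """
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Hypothetical fuzzy regions for the purchased quantity of one item.
REGIONS = {
    "Low": trapezoid(0, 0, 1, 6),
    "Middle": trapezoid(1, 6, 6, 11),
    "High": trapezoid(6, 11, math.inf, math.inf),
}

def fuzzify(value, regions=REGIONS):
    """Return the fuzzy set of one quantitative value as {region: membership}."""
    return {name: mu(value) for name, mu in regions.items()}

print(fuzzify(3))
```

Under these assumed regions a quantity of 3 belongs partly to Low and partly to Middle, which is the kind of graded assignment that the scalar cardinality in the following algorithm sums over all transactions.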

The Fuzzy Data Mining Algorithm:

INPUT: A set of n training data, each with m attribute values, a set of membership functions, a predefined minimum support value α, and a predefined minimum confidence value λ.

OUTPUT: A set of fuzzy association rules.

STEP 1: Transform the quantitative value $v_j^{(i)}$ of each transaction datum $D^{(i)}$ for each attribute $A_j$, j = 1 to m, into a fuzzy set $f_j^{(i)}$ represented as

$$\left( \frac{f_{j1}^{(i)}}{R_{j1}} + \frac{f_{j2}^{(i)}}{R_{j2}} + \cdots + \frac{f_{jl}^{(i)}}{R_{jl}} \right)$$

using the given membership functions, where i is the number of the transaction currently being processed, $R_{jk}$ is the k-th fuzzy region (linguistic term) of attribute $A_j$, $f_{jk}^{(i)}$ is the value of the membership function in $R_{jk}$ for the value $v_j^{(i)}$, and $l$ (= $|A_j|$) is the number of fuzzy regions for $A_j$.

STEP 2: Calculate the count of each attribute region (linguistic term) $R_{jk}$ in the transaction data:

$$count_{jk} = \sum_{i=1}^{n} f_{jk}^{(i)}.$$

STEP 3: Collect each attribute region (linguistic term) to form the candidate set $C_1$.

STEP 4: Check whether $count_{jk}$ of each $R_{jk}$ ($1 \le j \le m$ and $1 \le k \le |A_j|$) is larger than or equal to the predefined minimum support value α. If $R_{jk}$ satisfies this condition, put it in the set of large 1-itemsets ($L_1$). That is:

$$L_1 = \{ R_{jk} \mid count_{jk} \ge \alpha,\ 1 \le j \le m \text{ and } 1 \le k \le |A_j| \}.$$

STEP 5: IF $L_1$ is not null, then do the next step; otherwise, exit the algorithm.

STEP 6: Set r = 1, where r is used to represent the number of items kept in the current large itemsets.

STEP 7: Join the large itemsets $L_r$ to generate the candidate set $C_{r+1}$ in a way similar to that in the Apriori algorithm [41], except that two regions (linguistic terms) belonging to the same attribute cannot simultaneously exist in an itemset in $C_{r+1}$. Restated, the algorithm first joins $L_r$ and $L_r$ under the condition that r-1 items in the two itemsets are the same and the other one is different. It then keeps in $C_{r+1}$ the itemsets which have all their sub-itemsets of r items existing in $L_r$ and do not have any two items $R_{jp}$ and $R_{jq}$ ($p \ne q$) of the same attribute $A_j$.

STEP 8: Do the following substeps for each newly formed (r+1)-itemset s with items ($s_1, s_2, \ldots, s_{r+1}$) in $C_{r+1}$:

(a) Calculate the fuzzy value of each transaction datum $D^{(i)}$ in s as $f_s^{(i)} = f_{s_1}^{(i)} \wedge f_{s_2}^{(i)} \wedge \cdots \wedge f_{s_{r+1}}^{(i)}$, where $f_{s_j}^{(i)}$ is the membership value of $D^{(i)}$ in region $s_j$. If the minimum operator is used for the intersection, then:

$$f_s^{(i)} = \min_{j=1}^{r+1} f_{s_j}^{(i)}.$$

(b) Calculate the count of s in the transactions as:

$$count_s = \sum_{i=1}^{n} f_s^{(i)}.$$

(c) If $count_s$ is larger than or equal to the predefined minimum support value α, put s in $L_{r+1}$.

STEP 9: IF $L_{r+1}$ is null, then do the next step; otherwise, set r = r+1 and repeat STEPs 7 to 8.

STEP 10: Collect the large itemsets together.

STEP 11: Construct association rules for each large q-itemset s with items ($s_1, s_2, \ldots, s_q$), $q \ge 2$, using the following substeps:

(a) Form each possible association rule as follows:

$$s_1 \wedge \cdots \wedge s_{k-1} \wedge s_{k+1} \wedge \cdots \wedge s_q \rightarrow s_k, \quad k = 1 \text{ to } q.$$

(b) Calculate the confidence values of all association rules using:

$$\frac{\sum_{i=1}^{n} f_s^{(i)}}{\sum_{i=1}^{n} \left( f_{s_1}^{(i)} \wedge \cdots \wedge f_{s_{k-1}}^{(i)} \wedge f_{s_{k+1}}^{(i)} \wedge \cdots \wedge f_{s_q}^{(i)} \right)}.$$

STEP 12: Output the association rules with confidence values larger than or equal to the predefined confidence threshold λ.

In the above fuzzy mining approach, all the linguistic terms are used. As an alternative, each item can use only the linguistic term with the maximum cardinality in later mining processes [28]. It can thus keep the same number of items as the original attributes. The alternative therefore focuses on the most important linguistic terms and can reduce the time complexity. Its derived set of association rules is, however, less complete than that obtained by considering all the linguistic terms. A trade-off thus exists between rule completeness and time complexity.

In addition, items may have different importance. Yue et al. thus extended the above concept to find fuzzy association rules with weighted items from transaction data [57]. Each item was given a weight to represent its importance, and each weight was in the range [0, 1].
They also adopted Kohonen self-organizing maps to derive fuzzy sets for numerical attributes. Weighted supports and weighted confidences were utilized to discover weighted fuzzy association rules.
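STEPs 2 to 12 of the fuzzy mining algorithm given earlier in this section can be sketched in Python as follows. This is an illustrative sketch rather than Hong et al.'s implementation: the function name `fuzzy_mine` and the input layout (one dictionary per transaction mapping hypothetical `(attribute, region)` pairs to membership values, i.e. the output of STEP 1) are assumptions, the minimum operator is used for the intersection, and the minimum support is assumed to be positive so that the confidence denominator is never zero.

```python
from itertools import combinations

def fuzzy_mine(fuzzy_data, min_support, min_confidence):
    """fuzzy_data: list of {(attribute, region): membership} dicts, one per transaction."""
    def count(itemset):
        # STEP 8: minimum operator per transaction, summed over all transactions.
        return sum(min(row.get(item, 0.0) for item in itemset) for row in fuzzy_data)

    # STEPs 2-4: scalar cardinality of each region, then the large 1-itemsets.
    items = {item for row in fuzzy_data for item in row}
    level = {}
    for item in items:
        c = count((item,))
        if c >= min_support:
            level[frozenset([item])] = c
    large = dict(level)

    r = 1
    while level:  # STEPs 5-9
        r += 1
        candidates = set()
        for a in level:
            for b in level:
                s = a | b
                if len(s) != r:
                    continue
                if len({attr for attr, _ in s}) < r:
                    continue  # two regions of the same attribute are not allowed
                if all(frozenset(sub) in level for sub in combinations(s, r - 1)):
                    candidates.add(s)
        level = {}
        for s in candidates:
            c = count(s)
            if c >= min_support:
                level[s] = c
        large.update(level)  # STEP 10

    # STEPs 11-12: rules with a single-region consequent.
    found = []
    for s, c in large.items():
        if len(s) < 2:
            continue
        for consequent in s:
            antecedent = s - {consequent}
            confidence = c / large[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return large, found
```

Here the minimum support is a fuzzy count, matching the comparison of $count_{jk}$ with α in STEP 4, not a fraction of the number of transactions.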

Hong et al. then proposed a mining approach for extracting interesting weighted fuzzy association rules from transactions, in which the parameters (minimum support and minimum confidence) needed in the mining process were given in linguistic terms [6]. Items were evaluated by managers in linguistic terms to reflect their importance, and these evaluations were then transformed into fuzzy sets of weights. The approach then transformed linguistic weighted items, minimum supports and minimum confidences into fuzzy sets, and filtered out weighted large itemsets by fuzzy operations. Weighted association rules with linguistic supports and confidences were then derived from the weighted large itemsets.

4 Mining Fuzzy Generalized Association Rules

Most mining algorithms for association rules focused on single-concept levels. However, mining multiple-concept-level rules may lead to the discovery of more specific and important knowledge from data. Relevant item taxonomies are usually predefined in real-world applications and can be represented by hierarchy trees. Terminal nodes on the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed by lower-level nodes. A simple example is given in Fig. 1. In this example, food falls into two classes: drink and bread. Drink can be further classified into milk and juice. Similarly, clothes are divided into jackets and T-shirts. Only the terminal items (milk, juice, bread, jacket and T-shirt) can appear in transactions. Srikant and Agrawal thus proposed a method for finding generalized association rules, which included generalized items (the non-terminal nodes in the taxonomy) [48]. Their mining process could be divided into four phases. In the first phase, ancestors of items in each given transaction were added according to the predefined taxonomy. In the second phase, large itemsets were generated in a way similar to the Apriori algorithm.
In the third phase, all possible generalized association rules were induced from the large itemsets found in the second phase. The rules with calculated confidence values larger than the minimum confidence were kept. In the fourth phase, uninteresting association rules were pruned and interesting rules were output according to the following interest requirements:

1. A rule that had no ancestor rules (obtained by replacing the items in a rule with their ancestors in the taxonomy) was mined out;
2. The support of a rule was R times larger than the expected supports of its ancestor rules, where R was called a minimum interest;
3. The confidence of a rule was R times larger than the expected confidences of its ancestor rules.

Fig. 1 An example of predefined taxonomic structures (Food: Drink and Bread, with Drink: Milk and Juice; Clothes: Jackets and T-shirts)

In addition, Han and Fu also proposed a method for finding level-crossing association rules at multiple levels [1]. Nodes in predefined taxonomies were first encoded using sequences of numbers and the symbol * according to their positions in the hierarchy tree. A top-down progressively deepening search approach was then used to explore level-crossing association relationships.

Wei et al. considered the partial relationships possibly existing in a taxonomy [56]. An item may partially belong to more than one parent item. For instance, tomato may partially belong to both fruit and vegetable with different degrees. Wei et al. thus defined a fuzzy taxonomic structure and considered the extended degrees of support, confidence and interest measures for mining generalized association rules. Hong et al. proposed a fuzzy multiple-level mining algorithm for extracting implicit knowledge from transactions stored as quantitative values [5]. The proposed generalized fuzzy mining algorithm was based on Srikant and Agrawal's approach to find fuzzy interesting rules from quantitative data. The quantitative items may be from any level of the given taxonomy. They also proposed another fuzzy multiple-level mining algorithm which adopted a top-down progressively deepening approach [28]. Shen et al. devised an algorithm for mining generalized association rules with fuzzy taxonomic structure from quantitative databases based on Wei's algorithm [39].
They handled the quantitative data by Srikant and Agrawal's algorithm [51], which partitioned quantitative attributes into several intervals. Kaya et al. then extended Hong et al.'s approaches [6, 43] for discovering multi-cross-level fuzzy weighted association rules [9].

In summary, mining fuzzy generalized association rules differs from mining fuzzy association rules in the following ways.

1. Non-terminal items must be processed. These items are usually considered expanded items.
2. The candidate set $C_2$ must be processed in a particular way. The two items in an itemset in $C_2$ must not have relationships in the taxonomy. This check needs to be done only for $C_2$. The other candidate sets derived from $C_2$ will not have hierarchical relationships due to the characteristic of sub-itemset checking.
3. Cross-level association rules need to be found.
4. The interestingness of association rules must consider generalization criteria from the taxonomy. Interest requirements are checked to remove uninteresting rules.
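The first two points in the summary above can be illustrated with a short Python sketch. It uses crisp item sets for brevity, and the child-to-parent dictionary `TAXONOMY`, which mirrors the example of Fig. 1, as well as the helper names `ancestors`, `extend`, `related` and `prune_c2`, are hypothetical and not taken from any of the cited algorithms.

```python
# Hypothetical child -> parent encoding of the taxonomy in Fig. 1.
TAXONOMY = {
    "milk": "drink", "juice": "drink",
    "drink": "food", "bread": "food",
    "jacket": "clothes", "t-shirt": "clothes",
}

def ancestors(item, parent=TAXONOMY):
    """Return all ancestors of an item, nearest first."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found

def extend(transaction, parent=TAXONOMY):
    """Add the ancestors of every item, so that non-terminal items are processed."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended

def related(a, b, parent=TAXONOMY):
    """True if one item is an ancestor of the other."""
    return a in ancestors(b, parent) or b in ancestors(a, parent)

def prune_c2(candidates, parent=TAXONOMY):
    """Drop candidate 2-itemsets whose two items are hierarchically related."""
    return [pair for pair in candidates if not related(pair[0], pair[1], parent)]

print(sorted(extend({"milk", "jacket"})))
print(prune_c2([("milk", "drink"), ("milk", "bread"), ("drink", "clothes")]))
```

Filtering only the 2-itemsets is enough in an Apriori-style search, because any larger candidate containing a related pair would fail the sub-itemset check against the pruned $L_2$.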

5 Mining both Membership Functions and Fuzzy Association Rules

The approaches in the above sections mined fuzzy rules under a given set of membership functions. The given membership functions had a critical influence on the final mining results. Although many approaches for learning membership functions were proposed [12, 13, 38, 54], most of them were used for classification or control problems. Recently, fuzzy data-mining algorithms for extracting both association rules and membership functions from quantitative transactions have been proposed. For example, Wang et al. tuned membership functions for intrusion detection systems based on the similarity of association rules [55]. Kaya et al. proposed a GA-based clustering method to derive a predefined number of membership functions for getting a maximum profit within an interval of user-specified minimum support values [7]. In their approach, the membership functions of quantitative attributes were obtained by GAs and were then used to discover fuzzy association rules. The goal was to output the membership functions which would generate the maximum number of large itemsets. For this purpose, the parameter values of the membership functions of the quantitative attributes were encoded into a real-valued string for evolving.

Hong et al. proposed several algorithms to dynamically adapt membership functions by genetic algorithms and used them to fuzzify the quantitative transactions [44, 26]. They proposed a GA-based framework for searching for membership functions suitable for mining problems and then used the final best set of membership functions to mine association rules. The proposed framework is shown in Fig. 2. The framework maintained a population of sets of membership functions, and used the genetic algorithm to automatically derive the resulting one. It first transformed each set of membership functions into a fixed-length string.
It then chose appropriate strings for mating, gradually creating good offspring membership function sets. The offspring membership function sets then underwent recursive evolution until a good set of membership functions had been obtained. The fitness was evaluated by the number of large 1-itemsets and the suitability of the membership functions. The suitability of membership functions was composed of two terms, overlap and coverage. The overlap ratio of two membership functions was defined as the overlap length divided by half the minimum span of the two functions. If the overlap length was larger than half the span, then these two membership functions were considered somewhat redundant, and an appropriate penalty was applied. The coverage ratio of a set of membership functions for an item was defined as the coverage range of the functions divided by the maximum quantity of that item in the transactions. The larger the coverage ratio, the better the derived membership functions. Besides, a larger number of large 1-itemsets would usually result in a larger number of all itemsets with a higher probability, which would thus usually imply more interesting association rules. The evaluation by 1-itemsets was, however, faster than that by all itemsets or interesting association rules. Using the number of large 1-itemsets could thus achieve a trade-off between execution time and rule interestingness.
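The overlap and coverage terms described above can be made concrete with a small Python sketch. Several choices here are assumptions rather than the authors' stated design: each membership function is taken to be an isosceles triangle encoded as a hypothetical `(center, half_span)` pair, the overlap penalty is charged only for the part of the overlap ratio above 1, the coverage term enters as the reciprocal of the coverage ratio, and fitness divides the number of large 1-itemsets by the resulting suitability value.

```python
def overlap_ratio(f1, f2):
    """Overlap length divided by half the minimum span of the two functions."""
    (c1, w1), (c2, w2) = sorted([f1, f2])
    overlap = max(0.0, (c1 + w1) - (c2 - w2))
    return overlap / min(w1, w2)

def coverage_ratio(functions, max_quantity):
    """Length covered by the functions within [0, max_quantity], over max_quantity."""
    intervals = sorted((max(0.0, c - w), min(max_quantity, c + w)) for c, w in functions)
    covered, end = 0.0, 0.0
    for lo, hi in intervals:
        lo = max(lo, end)  # skip the part already covered by earlier intervals
        if hi > lo:
            covered += hi - lo
            end = hi
    return covered / max_quantity

def suitability(functions, max_quantity):
    """Assumed combination: overlap penalty plus reciprocal of coverage (lower is better)."""
    penalty = sum(max(overlap_ratio(f, g), 1.0) - 1.0
                  for i, f in enumerate(functions) for g in functions[i + 1:])
    return penalty + 1.0 / coverage_ratio(functions, max_quantity)

def fitness(n_large_1_itemsets, functions, max_quantity):
    """Assumed fitness: number of large 1-itemsets divided by suitability."""
    return n_large_1_itemsets / suitability(functions, max_quantity)

# Three evenly spaced triangles covering [0, 12]: no penalty, full coverage.
print(fitness(5, [(3, 3), (6, 3), (9, 3)], 12))
```

With this combination, heavily overlapping or poorly covering function sets get a larger suitability value and hence a lower fitness for the same number of large 1-itemsets.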

Fig. 2 GA-based framework for searching membership functions (a genetic fuzzy MF acquisition process evolves a population of chromosomes, each encoding one membership function set and evaluated by fuzzy mining for large 1-itemsets on the transaction database; the final membership function set is then used for mining fuzzy association rules)

Hong et al. also proposed an enhanced approach, called the cluster-based fuzzy-GA mining algorithm, to speed up the evaluation process and keep nearly the same quality of solutions [15]. The proposed algorithm first divided the chromosomes in a population into k clusters by using the k-means clustering approach. All the chromosomes in a cluster then used the number of large 1-itemsets derived from the representative chromosome in the cluster and their own suitability of membership functions to calculate the fitness values. Since the chromosomes with similar coverage and overlap factors would form a cluster, they would have nearly the same shape of membership functions and induce about the same number of large 1-itemsets. For each cluster, the chromosome which was the nearest to the cluster center was thus chosen to derive its number of large 1-itemsets. All chromosomes in the same cluster then used the number of large 1-itemsets derived from the representative chromosome as their own. Finally, each chromosome was evaluated by this number of large 1-itemsets divided by its own suitability value. The evaluation cost could thus be greatly reduced due to the time saved in finding 1-itemsets.

6 Discussion and Conclusion

This chapter has given a broad survey of fuzzy data mining for association rules. Several fuzzy mining techniques, including mining fuzzy association rules, mining fuzzy generalized association rules, and mining both membership functions and fuzzy association rules, have been described. Some crisp mining techniques for handling quantitative data have also been briefly reviewed.

Verlinde et al. investigated the difference between the rules obtained by fuzzy mining and by crisp mining through experiments and claimed from the experimental results that the difference was small [52]. They constrained the rules to have only a single item in both the antecedent and the consequent. The experiments were made in environments where the data were normally distributed and the data boundaries were at the sides of the normal distributions. It is apparent that the rule difference in this situation is small, since the data used by fuzzy mining and by crisp mining do not differ very much. The reduction of the boundary effect by the fuzzy mining approaches on normally distributed data in large databases is thus not significant. However, the fuzzy set is usually used for modeling human perception of a concept. The membership functions defined do not necessarily correspond to the distribution of the data. This is also the reason why the fuzzy set was proposed and has not been replaced by probability theory.

In summary, when compared to conventional crisp-set mining methods for quantitative data, fuzzy-mining approaches can get smoother mining results due to the fuzzy membership characteristics. The mined rules are expressed in linguistic terms, which are more natural and understandable for human beings. Besides, nearly all the fuzzy mining approaches can easily be reduced to crisp ones by assigning membership functions with values always equal to 1 or 0. These features make fuzzy mining promising in real applications.

Acknowledgment

The authors would like to thank Mr. Chien-Shing Chen for his help in making the experiments.
This research was supported by the National Science Council of the Republic of China under contract NSC E References 1. Agrawal R, Imielinksi T and Swami A (1993) Mining association rules between sets of items in large database. In: Proceedings of the ACM SIGMOD Conference, Washington DC, USA, pp Agrawal R, Srikant R (1994) Fast algorithm for mining association rules. In: Proceedings of the International Conference on Very Large Databases, pp Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 8th IEEE International Conference on Data Engineering, pp Blishun AF (1987) Fuzzy learning models in expert systems. Fuzzy Sets and Systems, 22: Cai CH, Fu WC, Cheng CH, Kwong WW (1998) Mining association rules with weighted items. In: Proceedings of the International Database Engineering and Applications Symposium. Cardiff, Wales, UK, pp decampos LM, Moral S (1993) Learning rules for a fuzzy inference model. Fuzzy Sets and Systems, 59: Chan CC, Au WH (1997) Mining fuzzy association rules. Conference on Information and Knowledge Management. Las Vegas, pp

8. Chang RLP, Pavlidis T (1977) Fuzzy decision tree algorithms. IEEE Transactions on Systems, Man and Cybernetics, 7.
9. Chen MS, Han J, Yu PS (1996) Data mining: an overview from database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6).
10. Chen CH, Hong TP, Tseng VSM (2006) Cluster-based fuzzy-genetic mining approach for association rules and membership functions. In: Proceedings of the 2006 IEEE International Conference on Fuzzy Systems.
11. Clair C, Liu C, Pissinou N (1998) Attribute weighting: a method of applying domain knowledge in the decision tree process. In: Proceedings of the 7th International Conference on Information and Knowledge Management.
12. Clark P, Niblett T (1989) The CN2 induction algorithm. Machine Learning, 3.
13. Cordón O, Herrera F, Villar P (2001) Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base. IEEE Transactions on Fuzzy Systems, 9(4).
14. Cordón O, Herrera F, Hoffmann F, Magdalena L (2001) Evolutionary tuning and learning of fuzzy knowledge bases. Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases. World Scientific Publishing Company, Singapore.
15. Delgado M, Gonzalez A (1993) An inductive learning procedure to identify fuzzy systems. Fuzzy Sets and Systems, 55.
16. Ezeife CI (2002) Mining incremental association rules with generalized FP-tree. In: Proceedings of the 15th Conference of the Canadian Society for Computational Studies of Intelligence on Advances in Artificial Intelligence.
17. Famili A, Shen W, Weber R, Simoudis E (1997) Data preprocessing and intelligent data analysis. Intelligent Data Analysis, 1(1).
18. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. In: Fayyad U, Piatetsky-Shapiro G, Smyth P (eds): Advances in Knowledge Discovery & Data Mining. AAAI/MIT.
19. Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1991) Knowledge discovery in databases: an overview. The AAAI Workshop on Knowledge Discovery in Databases.
20. Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1996) Mining optimized association rules for numeric attributes. The ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.
21. Gonzalez A (1995) A learning methodology in uncertain and imprecise environments. International Journal of Intelligent Systems, 10.
22. Han J, Fu Y (1995) Discovery of multiple-level association rules from large databases. In: Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland.
23. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. The 2000 ACM SIGMOD International Conference on Management of Data, 29(2).
24. Hong TP, Lee CY (1996) Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems, 84.
25. Hong TP, Kuo CS, Chi SC (1999) A data mining algorithm for transaction data with quantitative values. In: Proceedings of the 8th International Fuzzy Systems Association World Congress.
26. Hong TP, Kuo CS, Chi SC (1999) Mining association rules from quantitative data. Intelligent Data Analysis, 3(5).
27. Hong TP, Kuo CS, Chi SC (2001) Trade-off between computation time and number of rules for fuzzy mining from quantitative data. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(5).
28. Hong TP, Chiang MJ, Wang SL (2002) Mining from quantitative data with linguistic minimum supports and confidences. In: Proceedings of the 8th IEEE International Conference on Fuzzy Systems.
29. Hong TP, Lin KY, Chien BC (2003) Mining fuzzy multiple-level association rules from quantitative data. Applied Intelligence, 18(1).

30. Hong TP, Lin KY, Wang SL (2003) Fuzzy data mining for interesting generalized association rules. Fuzzy Sets and Systems, 138(2).
31. Hong TP, Chen CH, Wu YL, Lee YC (2003) Mining membership functions and fuzzy association rules. In: Proceedings of 2003 The Joint Conference on AI, Fuzzy System, and Grey System.
32. Hong TP, Chen CH, Wu YL, Lee YC (2006) A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions. Soft Computing, 10(11).
33. Hong TP, Lin JW, Wu YL (2006) A Fast Updated Frequent Pattern Tree. In: Proceedings of the 17th IEEE International Conference on Systems, Man, and Cybernetics.
34. Kandel A (1992) Fuzzy Expert Systems. CRC Press, Boca Raton.
35. Kaya M, Alhajj R (2003) A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining. In: Proceedings of The IEEE International Conference on Fuzzy Systems.
36. Kaya M, Alhajj R (2004) Mining multi-cross-level fuzzy weighted association rules. In: Proceedings of IEEE International Conference Intelligent Systems, Varna, Bulgaria.
37. Kuok C, Fu A, Wong M (1998) Mining fuzzy association rules in databases. SIGMOD Record, 27(1).
38. Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of International Conference on Data Engineering (ICDE 97), Birmingham, England.
39. Mamdani EH (1974) Applications of fuzzy algorithms for control of simple dynamic plants. Proceedings of IEEE, 121.
40. Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. The AAAI Workshop on Knowledge Discovery in Databases.
41. Mannila H (1997) Methods and problems in data mining. In: Proceedings of the 6th International Conference on Database Theory (ICDT 97), LNCS, Springer-Verlag, 1186.
42. Park JS, Chen MS, Yu PS (1997) Using a hash-based method with transaction trimming for mining association rules. IEEE Transactions on Knowledge and Data Engineering, 9(5).
43. Piatetsky-Shapiro G (1991) Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, AAAI/MIT Press.
44. Qiu Y, Lan YJ, Xie QS (2004) An improved algorithm of mining from FP-tree. In: Proceedings of the 3rd International Conference on Machine Learning and Cybernetics, Shanghai.
45. Quinlan JR (1987) Decision tree as probabilistic classifier. In: Proceedings of the 4th International Machine Learning Workshop, Morgan Kaufmann, San Mateo, CA.
46. Rastogi R, Shim K (1998) Mining optimized association rules with categorical and numeric attributes. In: Proceedings of the 14th IEEE International Conference on Data Engineering, Orlando.
47. Rives J (1990) FID3: fuzzy induction decision tree. In: Proceedings of the 1st International Symposium on Uncertainty, Modeling and Analysis.
48. Setnes M, Roubos H (2000) GA-fuzzy modeling and classification: Complexity and performance. IEEE Transactions on Fuzzy Systems, 8(5).
49. Shen H, Wang S, Yang J (2004) Fuzzy taxonomic, quantitative database and mining generalized association rules. In: Proceedings of the 4th International Conference on Rough Sets and Current Trends in Computing (RSCTC 2004), Uppsala, Sweden.
50. Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21st International Conference on Very Large Data Bases.
51. Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Canada.
52. Verlinde H, De Cock M, Boute R (2006) Fuzzy versus quantitative association rules: a fair data driven comparison. IEEE Transactions on Systems, Man and Cybernetics, 36(3).
53. Wang CH, Hong TP, Tseng SS (1996) Inductive learning from fuzzy examples. In: Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, New Orleans.

54. Wang CH, Hong TP, Tseng SS (1998) Integrating fuzzy knowledge by genetic algorithms. IEEE Transactions on Evolutionary Computation, 2(4).
55. Wang W, Bridges SM (2000) Genetic algorithm optimization of membership functions for mining fuzzy association rules. In: Proceedings of the International Joint Conference on Information Systems, Fuzzy Theory and Technology.
56. Wei Q, Chen G (1999) Mining generalized association rules with fuzzy taxonomic structures. In: Proceedings of the 18th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), NY, USA.
57. Yue S, Tsang E, Yeung D, Shi D (2000) Mining fuzzy association rules with weighted items. In: Proceedings of the IEEE International Conference on System, Man and Cybernetics.
58. Zadeh LA (1965) Fuzzy sets. Information and Control, 8(3).
59. Zadeh LA (1988) Fuzzy logic. IEEE Computer.
60. Zaïane OR, Mohammed EH (2003) COFI-tree mining: A new approach to pattern growth with reduced candidacy generation. In: Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations.

Maintenance of fast updated frequent pattern trees for record deletion Tzung-Pei Hong a,b,, Chun-Wei Lin c, Yu-Lung Wu d a Department of Computer Science and Information Engineering, National University