Discovering fuzzy time-interval sequential patterns in sequence databases


Yen-Liang Chen
Department of Information Management, National Central University

Cheng-Kui Huang
Department of Information Management, National Central University

Abstract

Given a sequence database and a minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in the database. From the discovered sequential patterns, we can know what items are frequently bought together and in what order they appear. However, they cannot tell us the time gaps between successive items in patterns. Accordingly, Chen, Chiang and Ko have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items (Chen et al. 2003). A time-interval sequential pattern has a form like $(A, I_2, B, I_1, C)$, meaning that we buy A first, then after an interval of $I_2$ we buy B, and finally after an interval of $I_1$ we buy C, where $I_2$ and $I_1$ are predetermined time ranges. Although this new type of pattern can alleviate the above concern, it causes the sharp boundary problem. That is, when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. An efficient algorithm, the FTI-Apriori algorithm, is developed for mining fuzzy time-interval sequential patterns by modifying the traditional Apriori algorithm. An experimental study of the algorithm is presented.

Keywords: data mining, sequential patterns, sequence data, time interval, fuzzy sets

1. Introduction

Data mining extracts implicit, previously unknown and potentially useful information from databases.
The discovered information and knowledge are useful for various applications, including market analysis, decision support, fraud detection and business management. Many approaches have been proposed to extract information, and mining sequential patterns is one of the most important (Han et al. 2000). The problem of mining sequential patterns, first introduced in the mid-1990s, is to discover patterns that occur frequently in a sequence database (Agrawal et al. 1995; Pei et al. 2000). A typical example of a sequential pattern is a customer who, having bought a computer, returns to buy a scanner and a microphone.

Although the discovered sequential patterns can reveal what items are frequently bought together and in what order they appear, they cannot tell us the time gaps between successive items. Not knowing the time intervals means that, although we know what items will be bought next, we have no idea when the next purchase will happen; this makes it difficult to take the right action at the right time. In view of this problem, Chen, Chiang and Ko (Chen et al. 2003) have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. The following are some examples of time-interval sequential patterns:
(a) Having bought a laser printer, a customer returns to buy a scanner in three months and then a CD burner in six months.
(b) A customer revisits website A within a week.
(c) After an operation X, a patient is very likely to be infected by virus Y in two weeks.

Here, we briefly restate the approach proposed by Chen, Chiang and Ko (Chen et al. 2003). The input of their problem contains a sequence database $S$, a set $I = \{i_1, i_2, \dots, i_m\}$ of items and a set $TI = \{I_0, I_1, I_2, \dots, I_r\}$ of time intervals, where $TI$ is a complete and non-overlapping partition of the time domain. A sequence $B = (b_1, \&_1, b_2, \&_2, \dots, b_{v-1}, \&_{v-1}, b_v)$ is a time-interval sequence if $b_i \in I$ for $1 \le i \le v$ and $\&_i \in TI$ for $1 \le i \le v-1$. The output is all time-interval sequences that occur frequently in database $S$. A time-interval sequential pattern has a form like $(A, I_2, B, I_1, C)$, meaning that we buy A first, then after an interval of $I_2$ we buy B, and finally after an interval of $I_1$ we buy C, where $I_2$ and $I_1$ are predetermined time intervals.
Although sequential patterns extended with time intervals can offer more information than those without, the approach may cause the sharp boundary problem. That is, when a time interval is near the boundary of two adjacent ranges, we either ignore or overemphasize it. For example, let the interval $I_2$ be $5 \le t < 10$ and $I_3$ be $10 \le t < 20$, where $t$ is the time gap between two successive items. If the time gap between items A and B is near 10, either a little larger or a little smaller than 10, it is not fair to judge that the time interval between A and B is entirely in $I_2$ or entirely in $I_3$. However, according to the original definition of Chen, Chiang and Ko, it can only be one hundred percent in $I_2$ or in $I_3$. This difficulty can be adequately tackled by using fuzzy techniques, for fuzzy set theory allows this time gap to be 50% in $I_2$ and at the same time 50% in $I_3$. This simple example indicates that the fuzzy concept is better than the partition method because fuzzy sets provide a smooth transition between members and non-members of a set.

Besides the above-mentioned benefit, there are several other reasons that support the use of fuzzy time intervals in place of crisp intervals. First, human knowledge can be represented more naturally and appropriately by fuzzy logic, and how to partition and represent the time interval is a sort of human knowledge. Second, it is widely recognized that many real-world situations are intrinsically fuzzy, and the partition of time intervals is one of them. Third, fuzzy time intervals are simple and easy for users. For example, if we use fuzzy sets to handle the time intervals, we can first define the linguistic terms that are meaningful and understandable to users. Then, for each such term we can choose an appropriate fuzzy function to represent it.

A number of studies have exploited fuzzy techniques to mine fuzzy association rules or sequential patterns from databases. These efforts can be roughly classified as follows:
(1) fuzzy representation of an item's quantity (Lee et al. 1997),
(2) fuzzy representation of quantitative attributes (Hong et al. 1999; Zhang 1999),
(3) fuzzy product taxonomies or generalization hierarchies (Chen et al. 2002),
(4) fuzzy representation of item importance (Yue et al. 2000),
(5) fuzzy representation of transactions (Lee 2000),
(6) fuzzy support and confidence measures (Kuok et al. 1998),
(7) using fuzzy techniques for determining linguistic terms or domain partitions (Fu et al. 1998; Vazirgiannis 1998), and
(8) using fuzzy techniques to determine a rule's interestingness (Au et al. 1997; Au et al. 1998; Au et al. 1999; Au et al. 2003).

To our knowledge, no research has applied fuzzy techniques to deal with time intervals in time-interval sequential patterns. We, therefore, extend the original research of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns can be discovered from databases. Some linguistic terms, such as Long, Middle and Short, will be provided to represent time intervals. A fuzzy time-interval sequential pattern may then have a form like: having bought a laser printer, a customer returns to buy a scanner in a Short period and then a CD burner in a Long period.
The rest of this paper is organized as follows. Section 2 formally defines the problem and the fuzzy time-interval sequential pattern. Section 3 develops an algorithm to find fuzzy time-interval sequential patterns by modifying the traditional Apriori algorithm. Section 4 shows the performance of the algorithm. Conclusions are finally drawn in Section 5.

2. Problem Definition

As done in the previous research of Chen, Chiang and Ko (Chen et al. 2003), we represent a sequence in the following way.

Definition 1. A sequence $s$ is represented as $((a_1, t_1), (a_2, t_2), (a_3, t_3), \dots, (a_n, t_n))$, where $a_j$ is an item and $t_j$ stands for the time at which $a_j$ occurs, $1 \le j \le n$, and $t_{j-1} \le t_j$ for $2 \le j \le n$. In the sequence, if items occur at the same time, they are ordered alphabetically.

From the time tags attached to the items in sequence $s$, we can compute the time-interval values as $ti_j = t_{j+1} - t_j$, where $j = 1, 2, \dots, n-1$. For example, if we have a sequence $s = ((a, 1), (b, 4), (e, 29))$, then its time-interval values are 3 and 25.

Suppose we have the set $LT = \{lt_j \mid j = 1, 2, \dots, l\}$ of linguistic terms. Then we use $\mu_{lt_j}(ti)$ to denote the membership degree of time-interval value $ti$ to linguistic term $lt_j$. Two approaches have been used to determine linguistic terms and fuzzy membership functions (Medasani et al. 1998). The first approach relies on domain experts to specify the functions based on their background knowledge and requirements. The second approach obtains the functions in a preprocessing phase that learns them from the data, for example by neural networks (Lin et al. 1991), genetic algorithms (Karr et al. 1993), clustering (Fu et al. 1998) or entropy measures (Ross 1995). A complete fuzzy mining process may therefore contain two phases, where the first phase learns fuzzy functions from data and the second phase discovers patterns according to the fuzzy functions learned in the first phase. Almost all of the existing papers in fuzzy mining deal only with the second phase by assuming that the fuzzy functions are given, because this simplifies the presentation and allows the focus to remain on the design of mining algorithms. For these reasons, we adopt the same assumption that the fuzzy functions are given.

Example 1. Suppose we want to represent a time interval by using three linguistic terms: Short (S), Middle (M) and Long (L). Their membership functions can be represented as follows.

$$\mu_{Short}(ti) = \begin{cases} 1, & ti \le 2 \\ (15 - ti)/13, & 2 < ti \le 15 \\ 0, & ti > 15 \end{cases}$$

$$\mu_{Middle}(ti) = \begin{cases} 0, & \text{either } ti \le 2 \text{ or } ti > 28 \\ (ti - 2)/13, & 2 < ti \le 15 \\ (28 - ti)/13, & 15 < ti \le 28 \end{cases}$$

$$\mu_{Long}(ti) = \begin{cases} 0, & ti \le 15 \\ (ti - 15)/13, & 15 < ti \le 28 \\ 1, & ti > 28 \end{cases}$$

Fig. 1. The fuzzy membership functions for the time-interval concept.
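The membership functions of Example 1 and the time-interval computation can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`mu_short`, `mu_middle`, `mu_long`, `time_intervals`, `fuzzify`) are hypothetical, and the breakpoints 2, 15 and 28 are taken from the membership degrees used in the worked examples of this paper.

```python
def mu_short(ti):
    """Membership degree of a time-interval value in Short."""
    if ti <= 2:
        return 1.0
    if ti <= 15:
        return (15 - ti) / 13
    return 0.0


def mu_middle(ti):
    """Membership degree of a time-interval value in Middle."""
    if ti <= 2 or ti > 28:
        return 0.0
    if ti <= 15:
        return (ti - 2) / 13
    return (28 - ti) / 13


def mu_long(ti):
    """Membership degree of a time-interval value in Long."""
    if ti <= 15:
        return 0.0
    if ti <= 28:
        return (ti - 15) / 13
    return 1.0


# Hypothetical lookup table from linguistic term to membership function.
MEMBERSHIP = {"Short": mu_short, "Middle": mu_middle, "Long": mu_long}


def time_intervals(sequence):
    """Gaps ti_j = t_{j+1} - t_j between successive (item, time) pairs."""
    times = [t for _, t in sequence]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def fuzzify(ti):
    """Membership degree of ti in every linguistic term, rounded to 2 places."""
    return {term: round(mu(ti), 2) for term, mu in MEMBERSHIP.items()}
```

For the sequence ((a, 1), (b, 4), (e, 29)), `time_intervals` yields the gaps 3 and 25, and `fuzzify` maps each gap to its degrees in Short, Middle and Long.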
By applying the fuzzy functions above, we find that the time-interval value 3 is 0.92/Short + 0.08/Middle + 0.0/Long and the time-interval value 25 is 0.0/Short + 0.23/Middle + 0.77/Long. According to the linguistic terms and the membership functions, we can define the fuzzy time-interval sequence as follows.

Definition 2. Let $I = \{i_1, i_2, \dots, i_m\}$ be the set of all items and $LT = \{lt_j \mid j = 1, 2, \dots, l\}$ be the set of all linguistic terms. A sequence $\alpha = (b_1, lg_1, b_2, lg_2, \dots, b_{r-1}, lg_{r-1}, b_r)$ is a fuzzy time-interval sequence if $b_i \in I$ for $1 \le i \le r$ and $lg_i \in LT$ for $1 \le i \le r-1$.

Definition 3. Let $s = ((a_1, t_1), (a_2, t_2), (a_3, t_3), \dots, (a_n, t_n))$ be a sequence and $\alpha = (b_1, lg_1, b_2, lg_2, \dots, b_{r-1}, lg_{r-1}, b_r)$ be a fuzzy time-interval sequence, where $r \ge 2$. Let $\mu_{lg_i}(t)$ denote the membership degree of time-interval value $t$ to linguistic term $lg_i$. Suppose there are $K$ lists of indexes in $s$, denoted as $1 \le w_{k,1} < w_{k,2} < \dots < w_{k,r} \le n$ for $k = 1$ to $K$, each of which satisfies $b_1 = a_{w_{k,1}}$, $b_2 = a_{w_{k,2}}$, ..., and $b_r = a_{w_{k,r}}$. Then we say that $\alpha$ is contained in $s$ with degree $\gamma$, or that $\alpha$ is a fuzzy time-interval subsequence of $s$ with degree $\gamma$, iff the following conditions hold:
(1) $ti_{w_{k,i}} = t_{w_{k,i+1}} - t_{w_{k,i}}$ for $i = 1, 2, \dots, r-1$ and $k = 1, 2, \dots, K$;
(2) $\gamma = \max_{1 \le k \le K} \min_{1 \le i \le r-1} \{\mu_{lg_i}(ti_{w_{k,i}})\}$.

Definition 3 does not cover the case $r = 1$, where the fuzzy time-interval sequence degenerates into a crisp sequence containing a single item. To make the definition complete, we add the following amendment.

Definition 4. When a fuzzy time-interval sequence contains only a single item, it can be represented as $\alpha = (b_1)$, where $b_1 \in I$. In such a case, we say that $\alpha$ is contained in $s$ with degree 1 if there exists an integer $j$, where $1 \le j \le n$, such that $b_1 = a_j$.

The total number of items in a fuzzy time-interval sequence $\alpha$ is referred to as the length of the sequence. A fuzzy time-interval sequence whose length is $k$ is referred to as a fuzzy $k$-time-interval sequence.

Example 2. Suppose we are given a sequence $s = ((a, 4), (d, 5), (d, 10), (e, 28))$ and a fuzzy time-interval sequence $\alpha = (a, Short, d, Middle, e)$. There are two ways that we can match $\alpha$: one is $((a, 4), (d, 5), (e, 28))$ and the other is $((a, 4), (d, 10), (e, 28))$.
For the first case, the degree is $\min\{\mu_{Short}(1), \mu_{Middle}(23)\} = \min\{1, 5/13\} = 0.385$. The second case has the degree $\min\{\mu_{Short}(6), \mu_{Middle}(18)\} = \min\{9/13, 10/13\} = 0.692$. Consequently, $\alpha$ is contained in $s$ with degree $\max\{0.385, 0.692\} = 0.692$.

For ease of reference, let $\Upsilon(\alpha, s)$ represent the degree to which a fuzzy time-interval sequence $\alpha$ is contained in sequence $s$, determined according to Definitions 2, 3 and 4. A transaction is represented by $\langle sid, s \rangle$, where $sid$ is the identifier of this transaction and $s$ is a sequence. A sequence database $S$ is formed by a set of transactions. For a given fuzzy time-interval sequence $\alpha$, its support in database $S$ is defined as follows.

Definition 5. $support_S(\alpha) = \sum_{(sid, s) \in S} \Upsilon(\alpha, s) \,/\, |S|$
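Definitions 3 to 5 can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the paper's code: the names `mu`, `degree` and `support` are hypothetical, a pattern is assumed to be a tuple that alternates items and linguistic terms, a sequence is a list of (item, time) pairs, and the membership functions are those of Example 1.

```python
def mu(term, ti):
    """Membership functions of Example 1 (Short, Middle, Long)."""
    if term == "Short":
        return 1.0 if ti <= 2 else (15 - ti) / 13 if ti <= 15 else 0.0
    if term == "Middle":
        if ti <= 2 or ti > 28:
            return 0.0
        return (ti - 2) / 13 if ti <= 15 else (28 - ti) / 13
    if term == "Long":
        return 0.0 if ti <= 15 else (ti - 15) / 13 if ti <= 28 else 1.0
    raise ValueError(f"unknown linguistic term: {term}")


def degree(pattern, sequence):
    """Degree to which `pattern` is contained in `sequence`.

    Implements the max over index lists of the min over membership
    degrees (Definition 3); a single-item pattern yields 1 or 0
    (Definition 4).
    """
    items, terms = pattern[0::2], pattern[1::2]

    def best(pos, prev):
        # Best min-degree for matching items[pos:] after index `prev`.
        if pos == len(items):
            return 1.0
        result = 0.0
        for w in range(prev + 1, len(sequence)):
            item, t = sequence[w]
            if item != items[pos]:
                continue
            # The first item has no preceding gap, so its link degree is 1.
            link = 1.0 if pos == 0 else mu(terms[pos - 1], t - sequence[prev][1])
            if link > result:  # otherwise this match cannot improve the max
                result = max(result, min(link, best(pos + 1, w)))
        return result

    return best(0, -1)


def support(pattern, database):
    """Definition 5: sum of containment degrees divided by |S|."""
    return sum(degree(pattern, s) for s in database) / len(database)
```

Applied to Example 2, `degree(("a", "Short", "d", "Middle", "e"), [("a", 4), ("d", 5), ("d", 10), ("e", 28)])` evaluates both matches and keeps the larger degree, 9/13.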

A fuzzy time-interval sequence $\alpha$ is called a fuzzy time-interval sequential pattern, or a frequent fuzzy time-interval sequence, if its support in $S$ is greater than or equal to the user-specified minimum support (called min_sup). A fuzzy time-interval sequential pattern with length $k$ is referred to as a fuzzy $k$-time-interval sequential pattern. Given a sequence database and min_sup, the goal of fuzzy time-interval sequential pattern mining is to determine all the fuzzy time-interval subsequences in the sequence database whose supports are greater than or equal to min_sup.

Sid   Sequence
10    ((a, 1), (b, 4), (e, 29))
20    ((d, 1), (a, 2), (d, 24))
30    ((b, 1), (a, 11), (e, 28))
40    ((f, 1), (b, 5), (c, 19))
50    ((a, 4), (b, 5), (d, 10), (e, 28))
60    ((a, 0), (b, 5), (e, 30))
70    ((j, 2), (a, 17), (h, 17))
80    ((c, 3), (i, 10), (f, 18))
90    ((h, 4), (a, 10), (b, 21))
100   ((g, 0), (a, 0), (b, 3), (e, 30))

Fig. 2. A sequence database.

Example 3. Consider the sequence database shown in Fig. 2 with the linguistic terms defined in Example 1. If min_sup = 0.3, then we can find the fuzzy time-interval sequential pattern (a, Short, b, Long, e) with support 0.308 in the database. Four transactions (Sid = 10, 50, 60 and 100) contribute to this pattern, with degrees 0.77, 0.62, 0.77 and 0.92 respectively. According to Definition 5, the support of this pattern is (0.77 + 0.62 + 0.77 + 0.92)/10 = 0.308.

3. Algorithms for Mining Fuzzy Time-interval Sequential Patterns

The goal of this section is to develop an algorithm for mining fuzzy time-interval sequential patterns from databases. The algorithm is developed by modifying the well-known Apriori algorithm.

3.1. The FTI-Apriori algorithm

The Fuzzy Time Interval (FTI)-Apriori algorithm repeatedly executes two phases to generate the patterns. The first phase generates candidate sequences of length $k$, denoted by $C_k$, from the frequent sequences of length $k-1$, denoted by $L_{k-1}$. So, each candidate sequence generated in the current cycle has one more item and one more linguistic term than the frequent sequences in the preceding cycle. After finding the set of candidate sequences, the second phase scans the database to determine the support of each candidate pattern; the candidates whose support reaches min_sup form $L_k$, the set of all frequent sequences of length $k$. In the following, we discuss how to execute the first phase for different values of $k$:

(1) For $k = 1$: The set of candidate patterns of length 1, $C_1$, is generated by listing all distinct items in the database.

(2) For $k = 2$: Traditionally, $C_2$ is obtained by directly joining $L_1$ with $L_1$. However, since the first item and the second item in $C_2$, say $b$ and $c$, may have various fuzzy time-interval relations, pairs for all possible fuzzy time-interval relations must be generated. Let us consider an explanatory example. Suppose that $(b)$ and $(c)$ belong to $L_1$ and $LT = \{lt_1, lt_2, lt_3, lt_4, lt_5\}$. Then there are 20 candidate fuzzy time-interval sequences in $C_2$ (2 first items $\times$ 5 linguistic terms $\times$ 2 second items). Some of them are $(b, lt_1, b)$, $(b, lt_3, b)$, $(b, lt_2, c)$, $(c, lt_2, b)$ and $(c, lt_2, c)$. In short, $C_2$ can be generated as $L_1 \bowtie LT \bowtie L_1$, where $\bowtie$ denotes join.

(3) For $k > 2$: Let $(b_1, lg_1, b_2, lg_2, \dots, lg_{k-1}, b_k)$ be a fuzzy $k$-time-interval sequence in $L_k$. Then the fuzzy $(k-1)$-time-interval sequences $(b_1, lg_1, b_2, lg_2, \dots, lg_{k-2}, b_{k-1})$ and $(b_2, lg_2, \dots, b_{k-1}, lg_{k-1}, b_k)$ must also be frequent, because the support of $(b_1, lg_1, b_2, lg_2, \dots, lg_{k-1}, b_k)$ can be no larger than the supports of the other two. (For the proof, please refer to Theorem A.1 in the Appendix.) Therefore, if the time-interval sequences $(b_1, lg_1, b_2, lg_2, \dots, lg_{k-2}, b_{k-1})$ and $(b_2, lg_2, \dots, b_{k-1}, lg_{k-1}, b_k)$ exist in $L_{k-1}$, then $(b_1, lg_1, b_2, lg_2, \dots, lg_{k-1}, b_k)$ must exist in $C_k$. All the time-interval sequences in $C_k$ can be generated by joining the time-interval sequences in $L_{k-1}$ in this way.
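The candidate-generation cases for $k = 2$ and $k > 2$ can be sketched as below. This is only one plausible reading of the join described above: the function bodies are assumptions (the paper names `fuzzy_apriori_gen` but does not list its steps), `generate_c2` is a hypothetical helper name, and patterns are assumed to be tuples that alternate items and linguistic terms.

```python
def generate_c2(l1_items, terms):
    """C_2 = L_1 join LT join L_1: every ordered item pair under every term."""
    return [(b, lt, c) for b in l1_items for lt in terms for c in l1_items]


def fuzzy_apriori_gen(l_prev):
    """Join frequent (k-1)-sequences into candidate k-sequences, k > 2.

    `left` and `right` join when `left` without its first item and term
    equals `right` without its last term and item; the candidate is
    `left` extended by the last term and item of `right`.
    """
    candidates = []
    for left in l_prev:
        for right in l_prev:
            if left[2:] == right[:-2]:
                candidates.append(left + right[-2:])
    return candidates


if __name__ == "__main__":
    c2 = generate_c2(["b", "c"], ["lt1", "lt2", "lt3", "lt4", "lt5"])
    print(len(c2))  # number of candidates for two items and five terms
    l2 = [("a", "Short", "b"), ("a", "Long", "e"), ("b", "Long", "e")]
    print(fuzzy_apriori_gen(l2))
```

The quadratic scan over `l_prev` keeps the sketch short; indexing the sequences by their shared prefix would be the natural refinement for larger $L_{k-1}$.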
Next, we discuss how to execute the second phase, i.e., how to determine the supports of all patterns in $C_k$. To this end, a tree structure, called the fuzzy candidate tree, is used as a basis. The candidate tree is similar to the prefix tree adopted in previous research (Agrawal et al. 1994; Liu et al. 2003). The major difference is that the traditional approach labels each tree branch with an item name, whereas the new approach attaches two components to each branch: an item name and a linguistic term. Suppose we are given a candidate set $C_k$. Initially, we have an empty tree with a single root node. Then we insert every fuzzy time-interval pattern in $C_k$ into the tree, just as we would build a prefix tree. After all the patterns in $C_k$ have been inserted, the tree is built. Next, we traverse the tree once for every transaction. After finishing the traversal for a given transaction, we can determine the degrees to which the patterns in the tree are contained in that transaction. Finally, after the tree has been traversed for all transactions, the support value of every pattern is kept in the corresponding leaf node of the tree, so we can determine which patterns are frequent and which are not.
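The support-counting phase can be sketched with a small prefix tree. The layout below is an assumption made for illustration, not the authors' data structure: first-level branches are keyed by an item, deeper branches by a (linguistic term, item) pair, all candidates are assumed to have the same length, and the names `Node`, `build_tree`, `visit` and `count_supports` are hypothetical. `FIG2` transcribes the database of Fig. 2 and `mu` encodes the membership functions of Example 1.

```python
def mu(term, ti):
    """Membership functions of Example 1 (Short, Middle, Long)."""
    if term == "Short":
        return 1.0 if ti <= 2 else (15 - ti) / 13 if ti <= 15 else 0.0
    if term == "Middle":
        if ti <= 2 or ti > 28:
            return 0.0
        return (ti - 2) / 13 if ti <= 15 else (28 - ti) / 13
    return 0.0 if ti <= 15 else (ti - 15) / 13 if ti <= 28 else 1.0  # Long


class Node:
    """One tree node; `pattern` is set only on leaves."""

    def __init__(self):
        self.children = {}
        self.pattern = None


def build_tree(candidates):
    """Insert every candidate of C_k into a prefix tree."""
    root = Node()
    for pattern in candidates:
        node = root.children.setdefault(pattern[0], Node())
        for i in range(1, len(pattern), 2):
            node = node.children.setdefault((pattern[i], pattern[i + 1]), Node())
        node.pattern = pattern
    return root


def visit(node, sequence, last, degree, best):
    """Follow branches matched by `sequence`, tracking the min degree so far."""
    if node.pattern is not None:  # leaf: keep the best degree for this pattern
        best[node.pattern] = max(best.get(node.pattern, 0.0), degree)
        return
    for (term, item), child in node.children.items():
        for w in range(last + 1, len(sequence)):
            if sequence[w][0] != item:
                continue
            d = min(degree, mu(term, sequence[w][1] - sequence[last][1]))
            if d > 0.0:
                visit(child, sequence, w, d, best)


def count_supports(candidates, database):
    """Traverse the tree once per transaction and return each support."""
    root = build_tree(candidates)
    totals = {pattern: 0.0 for pattern in candidates}
    for sequence in database:
        best = {}
        for w, (item, _) in enumerate(sequence):
            child = root.children.get(item)
            if child is not None:
                visit(child, sequence, w, 1.0, best)
        for pattern, d in best.items():
            totals[pattern] += d
    return {p: total / len(database) for p, total in totals.items()}


FIG2 = [
    [("a", 1), ("b", 4), ("e", 29)], [("d", 1), ("a", 2), ("d", 24)],
    [("b", 1), ("a", 11), ("e", 28)], [("f", 1), ("b", 5), ("c", 19)],
    [("a", 4), ("b", 5), ("d", 10), ("e", 28)], [("a", 0), ("b", 5), ("e", 30)],
    [("j", 2), ("a", 17), ("h", 17)], [("c", 3), ("i", 10), ("f", 18)],
    [("h", 4), ("a", 10), ("b", 21)], [("g", 0), ("a", 0), ("b", 3), ("e", 30)],
]
```

Calling `count_supports([("a", "Short", "b", "Long", "e")], FIG2)` gives a support of about 0.308 for that candidate, in line with Example 3. Keeping a per-transaction `best` dictionary realizes the max over matches, while the running `degree` argument realizes the min along one match.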

In the following, the major steps of the FTI-Apriori algorithm are listed. For clarity, we omit the detailed functions and steps.

Input: Sequence database $S$, minimum support min_sup, and linguistic terms $LT$;
Output: The complete set of fuzzy time-interval patterns
Variable: c.count is the support of time-interval sequence c
Method:
    $C_1$ = find_all_items($S$);
    $L_1 = \{c \in C_1 \mid c.count \ge min\_sup\}$
    For each $i_1 \in L_1$ {
        For each $i_2 \in L_1$ {
            For each $ltd \in LT$ {
                $c = i_1 * ltd * i_2$; add $c$ to $C_2$;
            }
        }
    }
    $L_2 = \{c \in C_2 \mid c.count \ge min\_sup\}$
    For ($k = 3$; $L_{k-1} \ne \emptyset$; $k$++) do begin {
        $C_k$ = fuzzy_apriori_gen($L_{k-1}$);
        Build the fuzzy candidate tree from $C_k$;
        For each sequence $s \in S$ {
            Traverse the fuzzy candidate tree and accumulate the supports;
        }
        $L_k = \{c \in C_k \mid (c.count / |S|) \ge min\_sup\}$
    }
    return $\bigcup_k L_k$;

Fig. 3. The FTI-Apriori algorithm.

Example 4. Consider the sequence database shown in Fig. 2 and assume that we set min_sup to 0.3. $C_1$ is generated with the following counts: (a): 8, (b): 7, (c): 2, (d): 2, (e): 5, (f): 2, (g): 1, (h): 2, (i): 1, (j): 1. Then we have $L_1 = \{a, b, e\}$ because their supports (0.8, 0.7 and 0.5) are larger than min_sup. After that, $C_2$ can be generated by joining $L_1$ with $LT = \{Short, Middle, Long\}$, whose membership functions are given in Fig. 1. For the pattern "a then b", which is matched in Sid = 10, 50, 60, 90 and 100, the supports under the different linguistic terms are:

Short = (0.92 + 1.0 + 0.77 + 0.31 + 0.92)/10 = 3.92/10 = 0.392
Middle = (0.08 + 0.0 + 0.23 + 0.69 + 0.08)/10 = 1.08/10 = 0.108
Long = (0.0 + 0.0 + 0.0 + 0.0 + 0.0)/10 = 0.0

Among the above three, only the pattern "a then b in Short" enters $L_2$, since its support is greater than min_sup. Besides this pattern, the other patterns in $L_2$ are "a then e in Long", with support 0.361, and "b then e in Long", whose support also reaches min_sup.

After the generation of $L_2$, the algorithm starts to produce $C_k$ and $L_k$ for $k > 2$. Since the patterns in $L_2$ are (a, Short, b), (a, Long, e) and (b, Long, e), the only candidate pattern that we can generate for $C_3$ is (a, Short, b, Long, e). The following computation indicates that the support of this pattern exceeds min_sup, and thus we have $L_3$ = {(a, Short, b, Long, e)}.

Sid=10: min{(a, 0.92/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=50: min{(a, 1.0/Short, b), (b, 0.62/Long, e)} = 0.62
Sid=60: min{(a, 0.77/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=100: min{(a, 0.92/Short, b), (b, 0.92/Long, e)} = 0.92
Support = (0.77 + 0.62 + 0.77 + 0.92)/10 = 0.308

4. Experimental Results

In this section, we perform a simulation study of the FTI-Apriori algorithm. It is implemented in Java (Sun J2SDK 1.4.1_02) and tested on a PC with two Intel Pentium III 933 processors and 1 GB of main memory under the Windows 2000 operating system. Neither multithreading nor parallel computing is used in our implementation. Synthetic datasets are generated by applying the well-known synthetic data generation algorithm of Agrawal et al. (Agrawal et al. 1995). Basically, each transaction is a sequence of itemsets. However, we extend the transaction data so that the items in different itemsets have different time values and those in the same itemset have the same time value. For each customer, a value $w$ is drawn from a Poisson distribution with mean $T_I$. The drawn value $w$ represents the average time interval between successive itemsets in the sequence of this particular customer. After that, we determine the intervals between successive itemsets of this customer by repeatedly drawing values from a Poisson distribution with mean $w$.
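The time-stamping step of the data generator can be illustrated with the following sketch. It is a hedged reading of the description above and not the generator used in the experiments: the names `poisson` and `assign_times` are hypothetical, the first itemset is assumed to occur at time 0, items within an itemset are listed alphabetically, and the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random


def poisson(mean, rng):
    """Draw from a Poisson distribution (Knuth's method, small means only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def assign_times(itemsets, mean_interval, rng):
    """Attach a time value to each item of one customer's sequence.

    `mean_interval` plays the role of T_I. The value `w` is the
    customer-specific average gap, and each gap between successive
    itemsets is drawn from a Poisson distribution with mean `w`.
    """
    w = poisson(mean_interval, rng)
    t, timed = 0, []  # assumption: the first itemset occurs at time 0
    for index, itemset in enumerate(itemsets):
        if index > 0:
            t += poisson(w, rng)
        # Items of the same itemset share one time value.
        timed.extend((item, t) for item in sorted(itemset))
    return timed


if __name__ == "__main__":
    rng = random.Random(0)
    print(assign_times([{"a", "b"}, {"c"}, {"d"}], 15, rng))
```

A gap of 0 can be drawn, in which case items of different itemsets receive the same time value; a generator that must keep them distinct would need to redraw or add 1 to each gap.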
Table I lists the parameters used in the simulation; the first eight parameters are the classical ones used in previous research, while the last parameter, $T_I$, is new and created for the problem considered here. In the simulation, some parameters are fixed: $N = 10000$, $N_S = 5000$, $N_I = 25000$ and $T_I = 15$; $|D|$ is also fixed.

Table I. Parameters

$|D|$    Number of customers
$|C|$    Average number of transactions per customer
$|T|$    Average number of items per transaction
$|S|$    Average length of maximal potentially large sequences
$|I|$    Average size of itemsets in maximal potentially large sequences
$N_S$    Number of maximal potentially large sequences
$N_I$    Number of maximal potentially large itemsets
$N$      Number of items
$T_I$    Average length of time intervals

Table II. Parameter settings of the data sets (columns in the original: Name, $|C|$, $|T|$, $|S|$, $|I|$)

C10-T2.5-S4-I
C10-T5-S4-I1.25
C10-T5-S4-I2.5
C20-T2.5-S4-I1.25
C20-T2.5-S4-I2.5
C20-T2.5-S8-I2.5

The first comparison examines the run times of FTI-Apriori for different minimum supports. The comparison is carried out on the six data sets shown in Table II, where the minimum support threshold is varied from 3.0% to 1.5%. Fig. 4 summarizes the results.

Fig. 4. Run times for the six data sets (one panel per data set; x-axis: minimum support; y-axis: run time in seconds; series: FTI-Apriori).

5. Conclusion

Sequential pattern mining is useful for discovering customer purchasing patterns over time from transactional databases. Since the method was first proposed by Agrawal et al. (Agrawal et al. 1995), it has become an established and active research area. The existing methods, however, do not discover the time intervals between successive items in a pattern. In view of this problem, Chen, Chiang and Ko proposed a method to discover the time-interval information between successive items in a pattern. With this additional information, we can know when the next purchase will happen after the previous purchase was made. Although time-interval sequential patterns can provide more information than those without time intervals, the approach may cause the sharp boundary problem. That is, when a time interval is near the boundary of two adjacent ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns can be discovered from databases. Linguistic terms, such as Long, Middle and Short, are provided to represent time intervals.

Fuzzy time-interval sequential pattern mining represents a new and promising research area in data mining. The results of this paper can be extended by considering time constraints, spatial constraints, fuzzy time hierarchies and other kinds of time-related knowledge. Furthermore, it is important to explore how different fuzzy membership functions may influence the mining results.

References

Agrawal, R., and Srikant, R. Fast Algorithms for Mining Association Rules, in Proceedings of 1994 International Conference Very Large Data Bases, 1994.
Agrawal, R., and Srikant, R. Mining Sequential Patterns, in Proceedings of 1995 International Conference Data Engineering, 1995.

Au, W. H., and Chan, K. C. C. Mining fuzzy association rules, in Proc. 6th Int. Conference Information Knowledge Management, Las Vegas, NV, 1997.
Au, W. H., and Chan, K. C. C. An effective algorithm for discovering fuzzy rules in relational databases, in Proceedings IEEE International Conference Fuzzy Systems, vol. II, 1998.
Au, W. H., and Chan, K. C. C. FARM: A data mining system for discovering fuzzy association rules, in Proceedings FUZZ-IEEE 99, vol. 3, 1999.
Au, W. H., and Chan, K. C. C. Mining fuzzy association rules in a bank-account database, IEEE Transactions on Fuzzy Systems (11), 2003.
Chen, G., and Wei, Q. Fuzzy association rules and the extended mining algorithms, Information Sciences (147), 2002.
Chen, Y. L., Chiang, M. C., and Ko, M. T. Discovering Time-interval Sequential Patterns in Sequence Databases, Expert Systems with Applications (25:3), 2003.
Fu, A. W. C., Wong, M. H., Sze, S. C., Wong, W. C., Wong, W. L., and Yu, W. K. Finding fuzzy sets for the mining of fuzzy association rules for numerical attributes, in Proceedings International Symposium Intelligent Data Engineering Learning (IDEAL 98), Hong Kong, 1998.
Hong, T. P., Kuo, C. S., and Chi, S. C. Mining association rules from quantitative data, Intelligent Data Analysis (3), 1999.
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., and Hsu, M. C. FreeSpan: Frequent Pattern-projected Sequential Pattern Mining, in Proceedings of 2000 International Conference on Knowledge Discovery and Data Mining, 2000.
Karr, C. L., and Gentry, E. J. Fuzzy control of pH using genetic algorithms, IEEE Transactions on Fuzzy Systems (1:1), 1993.
Kuok, C. M., Fu, A., and Wong, M. H. Mining fuzzy association rules in databases, SIGMOD Record (27:1), 1998.
Lin, C. T., and Lee, C. S. G. Neural network based fuzzy logic control and decision systems, IEEE Transactions on Computers (40:12), 1991.
Lee, J. H., and Kwang, H. L. An extension of association rules using fuzzy sets, presented at the IFSA 97, Prague, Czech Republic, 1997.
Lee, J. W. T. An ordinal framework for data mining of fuzzy rules, in FUZZ IEEE 2000, San Antonio, TX, 2000.
Liu, G., Lu, H., Xu, Y., and Yu, J. X. Ascending frequency ordered prefix-tree: efficient mining of frequent patterns, in Proceedings of the Eighth International Conference on Database Systems for Advanced Applications, 2003.

Medasani, S., Kim, J., and Krishnapuram, R. An overview of membership function generation techniques for pattern recognition, International Journal of Approximate Reasoning (19), 1998.
Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H. Mining access patterns efficiently from web logs, in Proceedings of 2000 Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000.
Ross, T. J. Fuzzy Logic with Engineering Applications, McGraw-Hill, Inc., 1995.
Vazirgiannis, M. A classification and relationship extraction scheme for relational databases based on fuzzy logic, in Proceedings Research Development Knowledge Discovery Data Mining, Melbourne, Australia, 1998.
Yue, J. S., Tsang, E., Yeung, D., and Daming, S. Mining fuzzy association rules with weighted items, in Proc. IEEE International Conference Systems, Man, Cybernetics, Nashville, TN, 2000.
Zhang, W. Mining fuzzy quantitative association rules, in Proceedings 11th International Conference Tools Artificial Intelligence, Chicago, IL, 1999.
