Discovering fuzzy time-interval sequential patterns in sequence databases

Yen-Liang Chen
Department of Information Management, National Central University

Cheng-Kui Huang
Department of Information Management, National Central University

Abstract

Given a sequence database and a minimum support threshold, the task of sequential pattern mining is to discover the complete set of sequential patterns in the database. From the discovered sequential patterns, we can know what items are frequently bought together and in what order they appear. However, they cannot tell us the time gaps between successive items in patterns. Accordingly, Chen, Chiang and Ko have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items (Chen et al. 2003). A time-interval sequential pattern has a form like $(A, I_2, B, I_1, C)$, meaning that we buy A first, then after an interval of $I_2$ we buy B, and finally after an interval of $I_1$ we buy C, where $I_2$ and $I_1$ are predetermined time ranges. Although this new type of pattern can alleviate the above concern, it causes the sharp boundary problem. That is, when a time interval is near the boundary of two predetermined time ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research so that fuzzy time-interval sequential patterns are discovered from databases. An efficient algorithm, the FTI-Apriori algorithm, is developed for mining fuzzy time-interval sequential patterns by modifying the traditional Apriori algorithm. An experimental study of the algorithm is presented.

Keywords: data mining, sequential patterns, sequence data, time interval, fuzzy sets

1. Introduction

Data mining extracts implicit, previously unknown and potentially useful information from databases.
The discovered information and knowledge are useful for various applications, including market analysis, decision support, fraud detection and business management. Many approaches have been proposed to extract information, and mining sequential patterns is one of the most important (Han et al. 2000). The problem of mining sequential patterns, first introduced in the mid-1990s, is to discover patterns that occur frequently in a sequence database (Agrawal et al. 1995; Pei et al. 2000). A typical sequential pattern is one in which a customer who has bought a computer returns to buy a scanner and a microphone.

Although the discovered sequential patterns can reveal what items are frequently bought together and in what order they appear, they cannot tell us the time gaps between successive items. Without the time intervals, we know what items will be bought next but have no idea when the next purchase will happen, which makes it difficult to take the right action at the right time. In view of this problem, Chen, Chiang and Ko (Chen et al. 2003) have proposed a generalization of sequential patterns, called time-interval sequential patterns, which reveals not only the order of items but also the time intervals between successive items. The following are some examples of time-interval sequential patterns: (a) Having bought a laser printer, a customer returns to buy a scanner in three months and then a CD burner in six months. (b) A customer revisits website A within a week. (c) After an operation X, a patient is very likely to be infected by virus Y in two weeks.

Here, we briefly restate the approach proposed by Chen, Chiang and Ko (Chen et al. 2003). The input of their problem contains a sequence database S, a set $I = \{i_1, i_2, \ldots, i_m\}$ of items and a set $TI = \{I_0, I_1, I_2, \ldots, I_r\}$ of time intervals, where TI is a complete and non-overlapping partition of the time domain. A sequence $B = (b_1, \&_1, b_2, \&_2, \ldots, b_{v-1}, \&_{v-1}, b_v)$ is a time-interval sequence if $b_i \in I$ for $1 \le i \le v$ and $\&_i \in TI$ for $1 \le i \le v-1$. The output is all time-interval sequences that occur frequently in database S. A time-interval sequential pattern has a form like $(A, I_2, B, I_1, C)$, meaning that we buy A first, then after an interval of $I_2$ we buy B, and finally after an interval of $I_1$ we buy C, where $I_2$ and $I_1$ are predetermined time intervals.
Although sequential patterns extended with time intervals can offer more information than those without time intervals, the approach may cause the sharp boundary problem. That is, when a time interval is near the boundary of two adjacent ranges, we either ignore or overemphasize it. For example, let the interval of $I_2$ be $5 \le t < 10$ and that of $I_3$ be $10 \le t < 20$, where t is the time gap between two successive items. If the time gap between items A and B is near 10, either a little larger or a little smaller than 10, it is not fair to judge whether the time interval between A and B is in $I_2$ or in $I_3$. However, according to the original definition of Chen, Chiang and Ko, it can only be one hundred percent in $I_2$ or in $I_3$. This difficulty can be adequately tackled by using fuzzy techniques, for fuzzy set theory allows this time gap to be 50% in $I_2$ and at the same time 50% in $I_3$. This simple example indicates that the fuzzy concept is better than the partition method because fuzzy sets provide a smooth transition between member and non-member of a set.

Besides the above-mentioned benefit, there are several other reasons that support the use of fuzzy time intervals in place of crisp intervals. First, human knowledge can be represented more naturally and appropriately by fuzzy logic, and how to partition and represent the time interval is a sort of human knowledge. Second, it is widely recognized that many real-world situations are intrinsically fuzzy, and the partition of time intervals is one of them. Third, fuzzy time intervals are simple and easy for users. For example, if we use fuzzy sets to handle the time intervals, we can first define the linguistic terms that are meaningful and understandable to users. Then, for each such term, we can choose an appropriate fuzzy function to represent it.

A number of studies have exploited fuzzy techniques to mine fuzzy association rules or sequential patterns from databases. These efforts can be roughly classified into the following: (1) fuzzy representation of an item's quantity (Lee et al. 1997), (2) fuzzy representation of quantitative attributes (Hong et al. 1999; Zhang 1999), (3) fuzzy product taxonomies or generalization hierarchies (Chen et al. 2002), (4) fuzzy representation of item importance (Yue et al. 2000), (5) fuzzy representation of transactions (Lee 2000), (6) fuzzy support and confidence measures (Kuok et al. 1998), (7) using fuzzy techniques for determining linguistic terms or domain partitions (Fu et al. 1998; Vazirgiannis 1998), and (8) using fuzzy techniques to determine a rule's interestingness (Au et al. 1997; Au et al. 1998; Au et al. 1999; Au et al. 2003).

To our knowledge, no research has ever applied fuzzy techniques to deal with time intervals in time-interval sequential patterns. We, therefore, extend the original research of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns can be discovered from databases. Some linguistic terms, such as Long, Middle, and Short, will be provided to represent time intervals. A fuzzy time-interval sequential pattern may then have a form like: Having bought a laser printer, a customer returns to buy a scanner in a Short period and then a CD burner in a Long period.
The rest of this paper is organized as follows. Section 2 formally defines the problem and the fuzzy time-interval sequential pattern. Thereafter, Section 3 develops an algorithm to find fuzzy time-interval sequential patterns by modifying the traditional Apriori algorithm. Section 4 shows the performance of the algorithm. Conclusions are finally drawn in Section 5.

2. Problem Definition

As done in the previous research of Chen, Chiang and Ko (Chen et al. 2003), we represent a sequence in the following way.

Definition 1. A sequence s is represented as $((a_1, t_1), (a_2, t_2), (a_3, t_3), \ldots, (a_n, t_n))$, where $a_j$ is an item and $t_j$ stands for the time at which $a_j$ occurs, $1 \le j \le n$, and $t_{j-1} \le t_j$ for $2 \le j \le n$. In the sequence, if items occur at the same time, they are ordered alphabetically.

From the time tags attached to the items in sequence s, we can compute the time-interval values as $ti_j = t_{j+1} - t_j$, where $j = 1, 2, \ldots, n-1$. For example, if we have a sequence s as ((a, 1), (b, 4), (e, 29)), then its time-interval values are 3 and 25. Suppose we have the set $LT = \{lt_j \mid j = 1, 2, \ldots, l\}$ of linguistic terms. Then we use $\mu_{lt_j}(ti)$ to denote the membership degree of time-interval value ti to linguistic term $lt_j$.

Two approaches have been used to determine linguistic terms and fuzzy membership functions (Medasani et al. 1998). The first approach relies on domain experts to specify the functions based on their background knowledge and requirements. The second approach assumes that the functions are obtained by a preprocessing phase that learns the functions from the data, such as learning by neural network (Lin et al. 1991), by genetic algorithm (Karr et al. 1993), by clustering method (Fu et al. 1998), and by entropy measure (Ross 1995). Therefore, a complete process in fuzzy mining may contain two phases, where the first phase learns fuzzy functions from data and the second phase discovers patterns according to the fuzzy functions learned in the first phase. Interestingly but not surprisingly, almost all of the existing papers in fuzzy mining deal only with the second phase by assuming that the fuzzy functions are given, because this simplifies the presentation and allows the focus to stay on the design of mining algorithms. For these reasons, we adopt the same assumption that the fuzzy functions are given.

Example 1. Suppose we want to represent a time interval by using three linguistic terms: Short (S), Middle (M), and Long (L). Their membership functions can be represented as follows.

$$\mu_{Short}(ti) = \begin{cases} 1, & ti \le 2 \\ (15 - ti)/13, & 2 < ti < 15 \\ 0, & ti \ge 15 \end{cases}$$

$$\mu_{Middle}(ti) = \begin{cases} 0, & \text{either } ti \le 2 \text{ or } ti \ge 28 \\ (ti - 2)/13, & 2 < ti \le 15 \\ (28 - ti)/13, & 15 < ti < 28 \end{cases}$$

$$\mu_{Long}(ti) = \begin{cases} 0, & ti \le 15 \\ (ti - 15)/13, & 15 < ti < 28 \\ 1, & ti \ge 28 \end{cases}$$

Fig. 1. The fuzzy membership functions for time-interval concept.
By applying the fuzzy functions above, we find that the time-interval value 3 is 0.92/Short + 0.08/Middle + 0.0/Long and the time-interval value 25 is 0.0/Short + 0.23/Middle + 0.77/Long. According to the linguistic terms and the membership functions, we can define the fuzzy time-interval sequence as follows.

Definition 2. Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all items and $LT = \{lt_j \mid j = 1, 2, \ldots, l\}$ be the set of all linguistic terms. A sequence $\alpha = (b_1, lg_1, b_2, lg_2, \ldots, b_{r-1}, lg_{r-1}, b_r)$ is a fuzzy time-interval sequence if $b_i \in I$ for $1 \le i \le r$ and $lg_i \in LT$ for $1 \le i \le r-1$.

Definition 3. Let $s = ((a_1, t_1), (a_2, t_2), (a_3, t_3), \ldots, (a_n, t_n))$ be a sequence and $\alpha = (b_1, lg_1, b_2, lg_2, \ldots, b_{r-1}, lg_{r-1}, b_r)$ be a fuzzy time-interval sequence, where $r \ge 2$. Let $\mu_{lg_i}(t)$ denote the membership degree of time-interval value t to linguistic term $lg_i$. Suppose there are K lists of indexes in s, denoted as $1 \le w_{k,1} < w_{k,2} < \cdots < w_{k,r} \le n$ for k = 1 to K, each of which satisfies the condition $b_1 = a_{w_{k,1}}$, $b_2 = a_{w_{k,2}}$, ..., and $b_r = a_{w_{k,r}}$. Then we say that α is contained in s with degree γ, or that α is a fuzzy time-interval subsequence of s with degree γ, iff the following conditions hold:
(1) $ti_{w_{k,i}} = t_{w_{k,i+1}} - t_{w_{k,i}}$ for $i = 1, 2, \ldots, r-1$ and $k = 1, 2, \ldots, K$;
(2) $\gamma = \max_{1 \le k \le K} \min_{1 \le i \le r-1} \{\mu_{lg_i}(ti_{w_{k,i}})\}$.

Definition 3 does not cover the case r = 1, where the fuzzy time-interval sequence degenerates into a crisp sequence containing a single item. To make the definition complete, we add the following amendment.

Definition 4. When a fuzzy time-interval sequence contains only a single item, it can be represented as $\alpha = (b_1)$, where $b_1 \in I$. In such a case, we say that α is contained in s with degree 1 if there exists an integer j, where $1 \le j \le n$, such that $b_1 = a_j$.

The total number of items in a fuzzy time-interval sequence α is referred to as the length of the sequence. A fuzzy time-interval sequence whose length is k is referred to as a fuzzy k-time-interval sequence.

Example 2. Suppose we are given a sequence s = ((a, 4), (d, 5), (d, 10), (e, 28)) and a fuzzy time-interval sequence α = (a, Short, d, Middle, e). There are two ways that we can match α: one is ((a, 4), (d, 5), (e, 28)) and the other is ((a, 4), (d, 10), (e, 28)).
For the first case, the degree is $\min\{\mu_{Short}(1), \mu_{Middle}(23)\} = \min\{1, 5/13\} = 0.385$. The second case has the degree $\min\{\mu_{Short}(6), \mu_{Middle}(18)\} = \min\{9/13, 10/13\} = 0.692$. Consequently, α is contained in s with degree max{0.385, 0.692} = 0.692.

For ease of reference, let ϒ(α, s) represent the degree to which a fuzzy time-interval sequence α is contained in sequence s, determined according to Definitions 2, 3 and 4. A transaction is represented by <sid, s>, where sid is the identifier of the transaction and s is a sequence. A sequence database S is formed by a set of transactions. For a given fuzzy time-interval sequence α, its support in database S is defined as follows.

Definition 5. $support_S(\alpha) = \sum_{(sid, s) \in S} \Upsilon(\alpha, s) \,/\, |S|$
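The definitions above can be checked with a short script. The following is a minimal sketch, not the authors' implementation: it encodes the three membership functions of Example 1, enumerates index lists by brute force to obtain the degree of Definitions 3 and 4, and averages degrees as in Definition 5. The function names (`mu_short`, `degree`, `support`) are illustrative, and integer time stamps are assumed so that exact fractions can be used.

```python
from fractions import Fraction
from itertools import combinations


def mu_short(ti):
    # 1 up to a gap of 2, then linearly down to 0 at a gap of 15.
    if ti <= 2:
        return Fraction(1)
    return Fraction(15 - ti, 13) if ti < 15 else Fraction(0)


def mu_middle(ti):
    # Triangular: 0 outside (2, 28), peak of 1 at a gap of 15.
    if ti <= 2 or ti >= 28:
        return Fraction(0)
    return Fraction(ti - 2, 13) if ti <= 15 else Fraction(28 - ti, 13)


def mu_long(ti):
    # 0 up to a gap of 15, then linearly up to 1 at a gap of 28.
    if ti <= 15:
        return Fraction(0)
    return Fraction(ti - 15, 13) if ti < 28 else Fraction(1)


MU = {"Short": mu_short, "Middle": mu_middle, "Long": mu_long}


def degree(alpha, s):
    """Degree to which alpha = (b1, lg1, b2, ..., br) is contained in s.

    s is a list of (item, time) pairs in non-decreasing time order.
    """
    items, terms = alpha[0::2], alpha[1::2]
    best = Fraction(0)
    # Brute force over every increasing index list w_1 < ... < w_r.
    for w in combinations(range(len(s)), len(items)):
        if any(s[i][0] != b for i, b in zip(w, items)):
            continue
        gaps = [s[w[i + 1]][1] - s[w[i]][1] for i in range(len(w) - 1)]
        # A single-item sequence has no gaps and gets degree 1 (Definition 4).
        gamma = min((MU[lg](g) for lg, g in zip(terms, gaps)),
                    default=Fraction(1))
        best = max(best, gamma)
    return best


def support(alpha, database):
    """Average containment degree over all sequences (Definition 5)."""
    return sum(degree(alpha, s) for s in database) / len(database)


# Example 2: the two matches give 5/13 and 9/13, so the degree is 9/13.
s = [("a", 4), ("d", 5), ("d", 10), ("e", 28)]
alpha = ("a", "Short", "d", "Middle", "e")
gamma = degree(alpha, s)
```

The brute-force enumeration is exponential in the pattern length and is only meant to make the max-min semantics concrete.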

A fuzzy time-interval sequence α is called a fuzzy time-interval sequential pattern or a frequent fuzzy time-interval sequence if its support in S is greater than or equal to the user-specified minimum support (called min_sup). A fuzzy time-interval sequential pattern with length k is referred to as a fuzzy k-time-interval sequential pattern. Given a sequence database and min_sup, the goal of fuzzy time-interval sequential pattern mining is to determine all the fuzzy time-interval subsequences in the sequence database whose supports are greater than or equal to min_sup.

Sid    Sequence
10     ((a, 1), (b, 4), (e, 29))
20     ((d, 1), (a, 2), (d, 24))
30     ((b, 1), (a, 11), (e, 28))
40     ((f, 1), (b, 5), (c, 19))
50     ((a, 4), (b, 5), (d, 10), (e, 28))
60     ((a, 0), (b, 5), (e, 30))
70     ((j, 2), (a, 17), (h, 17))
80     ((c, 3), (i, 10), (f, 18))
90     ((h, 4), (a, 10), (b, 21))
100    ((g, 0), (a, 0), (b, 3), (e, 30))

Fig. 2. A sequence database.

Example 3. Consider the sequence database shown in Fig. 2 with the linguistic terms defined in Example 1. If min_sup = 0.3, then we can find the fuzzy time-interval sequential pattern (a, Short, b, Long, e) with support 0.308 in the database. Four transactions (Sid = 10, 50, 60 and 100) contribute to this pattern, with degrees 0.77, 0.62, 0.77 and 0.92, respectively. According to Definition 5, the support of this pattern is (0.77 + 0.62 + 0.77 + 0.92)/10 = 0.308.

3. Algorithms for Mining Fuzzy Time-interval Sequential Patterns

The goal of this section is to develop an algorithm for mining fuzzy time-interval sequential patterns from databases.

3.1. The FTI-Apriori algorithm

The Fuzzy Time Interval (FTI)-Apriori algorithm is developed by modifying the well-known Apriori algorithm. Basically, two phases are repeatedly executed to generate the patterns. The first phase generates candidate sequences of length k, denoted by $C_k$, from the frequent sequences of length k-1, denoted by $L_{k-1}$. So, each candidate sequence generated in the current cycle will have one more item and one more linguistic term than the frequent sequences in the preceding cycle. After finding the set of candidate sequences, the second phase scans the database to determine the support of each candidate pattern, and the resulting set comprises all frequent sequences of length k.

In the following, we discuss how to execute the first phase for different values of k:

(1) For k = 1: The set of candidate patterns of length 1, $C_1$, is generated by listing all distinct items in the database.

(2) For k = 2: Traditionally, $C_2$ is obtained by directly joining $L_1$ with $L_1$. However, since the first item and the second item in $C_2$, say b and c, may have various fuzzy time-interval relations, pairs for all possible fuzzy time-interval relations must be generated. Let us consider an explanatory example. Suppose that (b) and (c) belong to $L_1$ and $LT = \{lt_1, lt_2, lt_3, lt_4, lt_5\}$. Then there are in total 2 × 5 × 2 = 20 candidate fuzzy time-interval sequences in $C_2$. Some of them are $(b, lt_1, b)$, $(b, lt_3, b)$, $(b, lt_2, c)$, $(c, lt_2, b)$ and $(c, lt_2, c)$. In short, $C_2$ can be generated as $L_1 \times LT \times L_1$, where × denotes join.

(3) For k > 2: Let $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ be a fuzzy k-time-interval sequence in $L_k$. Then the fuzzy (k-1)-time-interval sequences $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-2}, b_{k-1})$ and $(b_2, lg_2, \ldots, b_{k-1}, lg_{k-1}, b_k)$ must also be frequent, because the support of $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ can be no larger than the supports of the other two. (For the proof, please refer to Theorem A.1 in the Appendix.) Therefore, if the time-interval sequences $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-2}, b_{k-1})$ and $(b_2, lg_2, \ldots, b_{k-1}, lg_{k-1}, b_k)$ both exist in $L_{k-1}$, then $(b_1, lg_1, b_2, lg_2, \ldots, lg_{k-1}, b_k)$ must exist in $C_k$. All the time-interval sequences in $C_k$ can be generated by joining the time-interval sequences in $L_{k-1}$ in this way.
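The candidate-generation rules for k = 2 and k > 2 can be sketched in a few lines. This is an illustrative sketch under assumptions: patterns are represented as flat tuples such as `("a", "Short", "b")`, and the function names `gen_c2` and `fuzzy_apriori_gen` are chosen for this example rather than taken from any published implementation.

```python
from itertools import product


def gen_c2(l1, terms):
    """C_2: every ordered pair of frequent items joined by every term."""
    return {(b, lg, c)
            for b, c in product(sorted(l1), repeat=2)
            for lg in terms}


def fuzzy_apriori_gen(l_prev):
    """C_k from L_{k-1}.

    alpha and beta are joined when alpha without its first item and term
    equals beta without its last term and item; the candidate is alpha
    extended by beta's last term and item.
    """
    return {alpha + beta[-2:]
            for alpha in l_prev
            for beta in l_prev
            if alpha[2:] == beta[:-2]}


# Two frequent items and five linguistic terms give 2 * 5 * 2 = 20 candidates.
c2 = gen_c2({"b", "c"}, ["lt1", "lt2", "lt3", "lt4", "lt5"])

# Joining three frequent 2-sequences leaves a single length-3 candidate.
l2 = {("a", "Short", "b"), ("a", "Long", "e"), ("b", "Long", "e")}
c3 = fuzzy_apriori_gen(l2)
```

Here `c3` contains only `("a", "Short", "b", "Long", "e")`, because that is the only pair in `l2` whose suffix and prefix overlap.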
Next, we discuss how to execute the second phase, i.e., how to determine the supports of all patterns in $C_k$. To this end, a tree structure, called the fuzzy candidate tree, is used as a basis. Basically, the candidate tree is similar to the prefix tree adopted in previous research (Agrawal et al. 1994; Liu et al. 2003). The major difference is that the traditional approach labels each tree branch with an item name, whereas the new approach attaches two components to each branch: an item name and a linguistic term. Suppose we are given a candidate set $C_k$. Initially, we have an empty tree with a single root node. Then we insert every fuzzy time-interval pattern in $C_k$ into the tree, just as we would build a prefix tree. After all the patterns in $C_k$ have been inserted, the tree is built. Next, we traverse the tree once for every transaction. After finishing the traversal for a given transaction, we can determine the degrees to which the patterns in the tree are contained in that transaction. Finally, after the tree has been traversed for all transactions, the support value of every pattern is kept in the corresponding leaf node of the tree, so we can determine which patterns are frequent and which are not.
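Since the detailed tree routines are not spelled out, the following is one possible sketch of a fuzzy candidate tree, under stated assumptions: each edge is keyed by a (linguistic term, item) pair, with the term set to `None` for the first item; one depth-first traversal per sequence records the best (max-min) degree reached at each pattern node; and the membership function `mu` encodes Example 1. The names `Node`, `build_tree`, `traverse` and `collect` are hypothetical.

```python
from fractions import Fraction


def mu(term, ti):
    # Membership functions of Example 1 (integer gaps assumed).
    if term == "Short":
        return Fraction(1) if ti <= 2 else Fraction(max(15 - ti, 0), 13)
    if term == "Middle":
        return Fraction(max(min(ti - 2, 28 - ti), 0), 13)
    if term == "Long":
        return Fraction(1) if ti >= 28 else Fraction(max(ti - 15, 0), 13)
    raise ValueError(term)


class Node:
    def __init__(self):
        self.children = {}        # (term, item) -> Node
        self.count = Fraction(0)  # sum of degrees over traversed sequences
        self.is_pattern = False   # True where a candidate ends


def build_tree(candidates):
    root = Node()
    for alpha in candidates:
        node = root
        edges = [(None, alpha[0])] + list(zip(alpha[1::2], alpha[2::2]))
        for edge in edges:
            node = node.children.setdefault(edge, Node())
        node.is_pattern = True
    return root


def traverse(root, s):
    """Add to every pattern node the degree to which s contains it."""
    best = {}

    def visit(node, pos, last_time, degree):
        if node.is_pattern:
            best[node] = max(best.get(node, 0), degree)
        for (term, item), child in node.children.items():
            for j in range(pos, len(s)):
                if s[j][0] != item:
                    continue
                gap_degree = 1 if term is None else mu(term, s[j][1] - last_time)
                d = min(degree, gap_degree)
                if d > 0:  # a zero degree can never contribute to support
                    visit(child, j + 1, s[j][1], d)

    visit(root, 0, None, Fraction(1))
    for node, d in best.items():
        node.count += d


def collect(node, n, prefix=()):
    """Return {pattern: support} for every pattern stored in the tree."""
    out = {prefix: node.count / n} if node.is_pattern else {}
    for (term, item), child in node.children.items():
        step = (item,) if term is None else (term, item)
        out.update(collect(child, n, prefix + step))
    return out


# The sequence database of Fig. 2.
DB = [
    [("a", 1), ("b", 4), ("e", 29)],
    [("d", 1), ("a", 2), ("d", 24)],
    [("b", 1), ("a", 11), ("e", 28)],
    [("f", 1), ("b", 5), ("c", 19)],
    [("a", 4), ("b", 5), ("d", 10), ("e", 28)],
    [("a", 0), ("b", 5), ("e", 30)],
    [("j", 2), ("a", 17), ("h", 17)],
    [("c", 3), ("i", 10), ("f", 18)],
    [("h", 4), ("a", 10), ("b", 21)],
    [("g", 0), ("a", 0), ("b", 3), ("e", 30)],
]

tree = build_tree([("a", "Short", "b", "Long", "e")])
for seq in DB:
    traverse(tree, seq)
supports = collect(tree, len(DB))
```

With exact fractions the single candidate receives support 4/13, which rounds to the 0.308 obtained from the rounded per-sequence degrees 0.77, 0.62, 0.77 and 0.92.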

In the following, the major steps of the FTI-Apriori algorithm are listed. For clarity, we omit the detailed functions and steps.

Input: Sequence database S, minimum support min_sup, and linguistic terms LT;
Output: The complete set of fuzzy time-interval patterns
Variable: c.count is the support of time-interval sequence c
Method:
  C_1 = find_all_items(S);
  L_1 = {c in C_1 | c.count >= min_sup}
  For each i_1 in L_1 {
    For each i_2 in L_1 {
      For each ltd in LT {
        c = i_1 * ltd * i_2; add c to C_2;
      }
    }
  }
  L_2 = {c in C_2 | c.count >= min_sup}
  For (k = 3; L_{k-1} is not empty; k++) do begin {
    C_k = fuzzy_apriori_gen(L_{k-1});
    Build the fuzzy candidate tree from C_k;
    For each sequence s in S {
      Traverse the fuzzy candidate tree and accumulate the supports;
    }
    L_k = {c in C_k | (c.count / |S|) >= min_sup}
  }
  return the union of all L_k;

Fig. 3. The FTI-Apriori algorithm.

Example 4. Consider the sequence database shown in Fig. 2 and assume that we set min_sup to 0.3. $C_1$ is generated as follows, where each item is followed by the number of sequences containing it: (a): 8, (b): 7, (c): 2, (d): 2, (e): 5, (f): 2, (g): 1, (h): 2, (i): 1, (j): 1. Then we have $L_1$ = {a, b, e} because their supports are larger than min_sup. After that, $C_2$ can be generated by joining $L_1$ with LT = {Short, Middle, Long}, whose membership functions are given in Fig. 1. The supports of the pattern a then b with the different linguistic terms are:

Short: 3.92/10 = 0.392
Middle: 1.08/10 = 0.108
Long: 0.0

Among the above three, only the pattern a then b in Short enters $L_2$, since its support is greater than min_sup. Besides this pattern, $L_2$ also contains a then e in Long (support 0.361) and b then e in Long.

After the generation of $L_2$, the algorithm starts to produce $C_k$ and $L_k$ for k > 2. Since the patterns in $L_2$ are (a, Short, b), (a, Long, e) and (b, Long, e), the only candidate pattern that we can generate for $C_3$ is (a, Short, b, Long, e). The following computation indicates that the support of this pattern exceeds min_sup, and thus we have $L_3$ = {(a, Short, b, Long, e)}.

Sid=10: min{(a, 0.92/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=50: min{(a, 1.0/Short, b), (b, 0.62/Long, e)} = 0.62
Sid=60: min{(a, 0.77/Short, b), (b, 0.77/Long, e)} = 0.77
Sid=100: min{(a, 0.92/Short, b), (b, 0.92/Long, e)} = 0.92
Support = (0.77 + 0.62 + 0.77 + 0.92)/10 = 0.308

4. Experimental Results

In this section, we perform a simulation study of the FTI-Apriori algorithm. It is implemented in Java (Sun J2SDK 1.4.1_02) and tested on a PC with two Intel Pentium III 933 processors and 1GB of main memory under the Windows 2000 operating system. Neither multithreading nor parallel computing is used in our implementation.

Synthetic datasets are generated by applying the well-known synthetic data generation algorithm of Agrawal and Srikant (Agrawal et al. 1995). Basically, each transaction is a sequence of itemsets. However, we extend the transaction data so that items in different itemsets have different time values and items in the same itemset have the same time value. For each customer, a value w is drawn from a Poisson distribution with mean $T_I$. The drawn value w represents the average time interval between successive itemsets in the sequence of this particular customer. After that, we determine the intervals between successive itemsets of this customer by repeatedly drawing values from a Poisson distribution with mean w.
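The two-level Poisson scheme for time stamps can be sketched as follows. This is an illustration under assumptions, not the generator used in the experiments: the Poisson sampler uses Knuth's multiplication method (chosen here only because the Python standard library has no Poisson sampler), each customer's first itemset is placed at time 0, and the names `poisson` and `assign_times` are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for small means such as 15; returns 0 when lam is 0.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def assign_times(rng, n_itemsets, mean_interval=15):
    """Return one time stamp per itemset of a single customer."""
    # w is this customer's own average gap, drawn around the global mean.
    w = poisson(rng, mean_interval)
    times, t = [], 0
    for _ in range(n_itemsets):
        times.append(t)
        t += poisson(rng, w)  # gap to the next itemset
    return times


rng = random.Random(0)  # fixed seed so the sketch is reproducible
times = assign_times(rng, 10)
```

Because a Poisson draw can be 0, two successive itemsets may receive the same time stamp under this sketch; a generator that needs strictly increasing times would have to redraw or add an offset.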
Table I lists the parameters used in the simulation; the first eight parameters are the classical ones used in previous research, but the last parameter $T_I$ is a new parameter created for the problem considered here. In the simulation, some parameters are fixed: N = 10000, $N_S$ = 5000, $N_I$ = 25000 and $T_I$ = 15; |D| is also fixed.

Table I. Parameters
|D|     Number of customers
|C|     Average number of transactions per customer
|T|     Average number of items per transaction
|S|     Average length of maximal potentially large sequences
|I|     Average size of itemsets in maximal potentially large sequences
N_S     Number of maximal potentially large sequences
N_I     Number of maximal potentially large itemsets
N       Number of items
T_I     Average length of time intervals

Table II. Parameters of the data sets
Name                 |C|    |T|    |S|    |I|
C10-T2.5-S4-I        10     2.5    4
C10-T5-S4-I1.25      10     5      4      1.25
C10-T5-S4-I2.5       10     5      4      2.5
C20-T2.5-S4-I1.25    20     2.5    4      1.25
C20-T2.5-S4-I2.5     20     2.5    4      2.5
C20-T2.5-S8-I2.5     20     2.5    8      2.5

The first experiment compares the run times of the algorithm for different minimum supports. The comparison is carried out on the six data sets shown in Table II, where the minimum support threshold is varied from 3.0% to 1.5%. Fig. 4 summarizes the results.

Fig. 4. Run times for the six data sets. (Six panels, one per data set in Table II; each plots the run time in seconds of FTI-Apriori against the minimum support.)

5. Conclusion

Sequential pattern mining is useful in discovering customer purchasing patterns over time from transactional databases. Since the method was first proposed by Agrawal and Srikant (Agrawal et al. 1995), it has become an established and active research area. The existing methods, however, do not discover the time intervals between successive items in the pattern. In view of this problem, Chen, Chiang and Ko proposed a novel method to discover the time-interval information between successive items in the pattern. With this additional information, we can know when the next purchase will happen after the previous purchase was made.

Although time-interval sequential patterns can provide more information than those without time intervals, the approach may cause the sharp boundary problem. That is, when a time interval is near the boundary of two adjacent ranges, we either ignore or overemphasize it. Therefore, this paper uses the concept of fuzzy sets to extend the original research of Chen, Chiang and Ko so that fuzzy time-interval sequential patterns can be discovered from databases. Linguistic terms, such as Long, Middle and Short, are provided to represent the time intervals.

Fuzzy time-interval sequential pattern mining represents a new and promising research area in data mining. The results of this paper can be extended by considering time constraints, spatial constraints, fuzzy time hierarchies and other kinds of time-related knowledge. Furthermore, it is important to explore how different fuzzy membership functions may influence the result of mining.

References

Agrawal, R., and Srikant, R. Fast Algorithms for Mining Association Rules, in Proceedings of 1994 International Conference Very Large Data Bases, 1994.
Agrawal, R., and Srikant, R. Mining Sequential Patterns, in Proceedings of 1995 International Conference Data Engineering, 1995.

Au, W. H., and Chan, K. C. C. Mining fuzzy association rules, in Proc. 6th Int. Conference Information Knowledge Management, Las Vegas, NV, 1997.
Au, W. H., and Chan, K. C. C. An effective algorithm for discovering fuzzy rules in relational databases, in Proceedings IEEE International Conference Fuzzy Systems, vol. II, 1998.
Au, W. H., and Chan, K. C. C. FARM: A data mining system for discovering fuzzy association rules, in Proceedings FUZZ-IEEE 99, vol. 3, 1999.
Au, W. H., and Chan, K. C. C. Mining fuzzy association rules in a bank-account database, IEEE Transactions on Fuzzy Systems (11), 2003.
Chen, G., and Wei, Q. Fuzzy association rules and the extended mining algorithms, Information Sciences (147), 2002.
Chen, Y. L., Chiang, M. C., and Ko, M. T. Discovering Time-interval Sequential Patterns in Sequence Databases, Expert Systems with Applications (25:3), 2003.
Fu, A. W. C., Wong, M. H., Sze, S. C., Wong, W. C., Wong, W. L., and Yu, W. K. Finding fuzzy sets for the mining of fuzzy association rules for numerical attributes, in Proceedings International Symposium Intelligent Data Engineering Learning (IDEAL 98), Hong Kong, 1998.
Hong, T. P., Kuo, C. S., and Chi, S. C. Mining association rules from quantitative data, Intelligent Data Analysis (3), 1999.
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., and Hsu, M. C. FreeSpan: Frequent Pattern-projected Sequential Pattern Mining, in Proceedings of 2000 International Conference on Knowledge Discovery and Data Mining, 2000.
Karr, C. L., and Gentry, E. J. Fuzzy control of pH using genetic algorithms, IEEE Transactions on Fuzzy Systems (1:1), 1993.
Kuok, C. M., Fu, A., and Wong, M. H. Mining fuzzy association rules in databases, SIGMOD Record (27:1), 1998.
Lin, C. T., and Lee, C. S. G. Neural network based fuzzy logic control and decision systems, IEEE Transactions on Computers (40:12), 1991.
Lee, J. H., and Kwang, H. L. An extension of association rules using fuzzy sets, presented at the IFSA 97, Prague, Czech Republic.
Lee, J. W. T. An ordinal framework for data mining of fuzzy rules, in FUZZ IEEE 2000, San Antonio, TX, 2000.
Liu, G., Lu, H., Xu, Y., and Yu, J. X. Ascending frequency ordered prefix-tree: efficient mining of frequent patterns, in Proceedings of the Eighth International Conference on Database Systems for Advanced Applications, 2003.

Medasani, S., Kim, J., and Krishnapuram, R. An overview of membership function generation techniques for pattern recognition, International Journal of Approximate Reasoning (19), 1998.
Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H. Mining access patterns efficiently from web logs, in Proceedings of 2000 Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000.
Ross, T. J. Fuzzy Logic with Engineering Applications, McGraw-Hill, Inc.
Vazirgiannis, M. A classification and relationship extraction scheme for relational databases based on fuzzy logic, in Proceedings Research Development Knowledge Discovery Data Mining, Melbourne, Australia, 1998.
Yue, J. S., Tsang, E., Yenng, D., and Daming, S. Mining fuzzy association rules with weighted items, in Proc. IEEE International Conference Systems, Man, Cybernetics, Nashville, TN, 2000.
Zhang, W. Mining fuzzy quantitative association rules, in Proceedings 11th International Conference Tools Artificial Intelligence, Chicago, IL, 1999.


Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Discovery of Multi-level Association Rules from Primitive Level Frequent Patterns Tree Virendra Kumar Shrivastava 1, Parveen Kumar 2, K. R. Pardasani 3 1 Department of Computer Science & Engineering, Singhania

More information

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set

To Enhance Projection Scalability of Item Transactions by Parallel and Partition Projection using Dynamic Data Set To Enhance Scalability of Item Transactions by Parallel and Partition using Dynamic Data Set Priyanka Soni, Research Scholar (CSE), MTRI, Bhopal, priyanka.soni379@gmail.com Dhirendra Kumar Jha, MTRI, Bhopal,

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

Applying Data Mining to Wireless Networks

Applying Data Mining to Wireless Networks Applying Data Mining to Wireless Networks CHENG-MING HUANG 1, TZUNG-PEI HONG 2 and SHI-JINN HORNG 3,4 1 Department of Electrical Engineering National Taiwan University of Science and Technology, Taipei,

More information

Optimized Weighted Association Rule Mining using Mutual Information on Fuzzy Data

Optimized Weighted Association Rule Mining using Mutual Information on Fuzzy Data International Journal of scientific research and management (IJSRM) Volume 2 Issue 1 Pages 501-505 2014 Website: www.ijsrm.in ISSN (e): 2321-3418 Optimized Weighted Association Rule Mining using Mutual

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Mining Fuzzy Association Rules Using Mutual Information

Mining Fuzzy Association Rules Using Mutual Information Mining Fuzzy Association Rules Using Mutual Information S. Lotfi, M.H. Sadreddini Abstract Quantitative Association Rule (QAR) mining has been recognized as an influential research problem over the last

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

Maintenance of fast updated frequent pattern trees for record deletion

Maintenance of fast updated frequent pattern trees for record deletion Maintenance of fast updated frequent pattern trees for record deletion Tzung-Pei Hong a,b,, Chun-Wei Lin c, Yu-Lung Wu d a Department of Computer Science and Information Engineering, National University

More information

Efficient Remining of Generalized Multi-supported Association Rules under Support Update

Efficient Remining of Generalized Multi-supported Association Rules under Support Update Efficient Remining of Generalized Multi-supported Association Rules under Support Update WEN-YANG LIN 1 and MING-CHENG TSENG 1 Dept. of Information Management, Institute of Information Engineering I-Shou

More information

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm

Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm Marek Wojciechowski, Krzysztof Galecki, Krzysztof Gawronek Poznan University of Technology Institute of Computing Science ul.

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database

An Improved Frequent Pattern-growth Algorithm Based on Decomposition of the Transaction Database Algorithm Based on Decomposition of the Transaction Database 1 School of Management Science and Engineering, Shandong Normal University,Jinan, 250014,China E-mail:459132653@qq.com Fei Wei 2 School of Management

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction International Journal of Engineering Science Invention Volume 2 Issue 1 January. 2013 An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction Janakiramaiah Bonam 1, Dr.RamaMohan

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011

AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR UTILITY MINING. Received April 2011; revised October 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 7(B), July 2012 pp. 5165 5178 AN EFFICIENT GRADUAL PRUNING TECHNIQUE FOR

More information

Generating Cross level Rules: An automated approach

Generating Cross level Rules: An automated approach Generating Cross level Rules: An automated approach Ashok 1, Sonika Dhingra 1 1HOD, Dept of Software Engg.,Bhiwani Institute of Technology, Bhiwani, India 1M.Tech Student, Dept of Software Engg.,Bhiwani

More information

Mining Temporal Association Rules in Network Traffic Data

Mining Temporal Association Rules in Network Traffic Data Mining Temporal Association Rules in Network Traffic Data Guojun Mao Abstract Mining association rules is one of the most important and popular task in data mining. Current researches focus on discovering

More information

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity

WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity WIP: mining Weighted Interesting Patterns with a strong weight and/or support affinity Unil Yun and John J. Leggett Department of Computer Science Texas A&M University College Station, Texas 7783, USA

More information

SeqIndex: Indexing Sequences by Sequential Pattern Analysis

SeqIndex: Indexing Sequences by Sequential Pattern Analysis SeqIndex: Indexing Sequences by Sequential Pattern Analysis Hong Cheng Xifeng Yan Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign {hcheng3, xyan, hanj}@cs.uiuc.edu

More information

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports

Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports Mining Rare Periodic-Frequent Patterns Using Multiple Minimum Supports R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad Hyderabad,

More information

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2017) Vol. 6 (3) 213 222 USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS PIOTR OŻDŻYŃSKI, DANUTA ZAKRZEWSKA Institute of Information

More information

Incrementally mining high utility patterns based on pre-large concept

Incrementally mining high utility patterns based on pre-large concept Appl Intell (2014) 40:343 357 DOI 10.1007/s10489-013-0467-z Incrementally mining high utility patterns based on pre-large concept Chun-Wei Lin Tzung-Pei Hong Guo-Cheng Lan Jia-Wei Wong Wen-Yang Lin Published

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey G. Shivaprasad, N. V. Subbareddy and U. Dinesh Acharya

More information

International Journal of Advance Engineering and Research Development. Fuzzy Frequent Pattern Mining by Compressing Large Databases

International Journal of Advance Engineering and Research Development. Fuzzy Frequent Pattern Mining by Compressing Large Databases Scientific Journal of Impact Factor(SJIF): 3.134 e-issn(o): 2348-4470 p-issn(p): 2348-6406 International Journal of Advance Engineering and Research Development Volume 2,Issue 7, July -2015 Fuzzy Frequent

More information

FUFM-High Utility Itemsets in Transactional Database

FUFM-High Utility Itemsets in Transactional Database Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 3, March 2014,

More information

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery

AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery : Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Hong Cheng Philip S. Yu Jiawei Han University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center {hcheng3, hanj}@cs.uiuc.edu,

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

M.Kannan et al IJCSET Feb 2011 Vol 1, Issue 1,30-34

M.Kannan et al IJCSET Feb 2011 Vol 1, Issue 1,30-34 Genetic Data Mining With Divide-And- Conquer Strategy M.Kannan, P.Yasodha, V.Srividhya CSA Dept., SCSVMV University, Enathur, Kanchipuram - 631 561. Abstract: Data mining is most commonly used in attempts

More information

Datasets Size: Effect on Clustering Results

Datasets Size: Effect on Clustering Results 1 Datasets Size: Effect on Clustering Results Adeleke Ajiboye 1, Ruzaini Abdullah Arshah 2, Hongwu Qin 3 Faculty of Computer Systems and Software Engineering Universiti Malaysia Pahang 1 {ajibraheem@live.com}

More information

Mining Quantitative Association Rules on Overlapped Intervals

Mining Quantitative Association Rules on Overlapped Intervals Mining Quantitative Association Rules on Overlapped Intervals Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

More information

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data

FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data FM-WAP Mining: In Search of Frequent Mutating Web Access Patterns from Historical Web Usage Data Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University,

More information

An Algorithm for Frequent Pattern Mining Based On Apriori

An Algorithm for Frequent Pattern Mining Based On Apriori An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior

More information

Review Paper Approach to Recover CSGM Method with Higher Accuracy and Less Memory Consumption using Web Log Mining

Review Paper Approach to Recover CSGM Method with Higher Accuracy and Less Memory Consumption using Web Log Mining ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Review Paper Approach to Recover CSGM Method with Higher Accuracy and Less Memory Consumption using Web Log Mining Abstract Shrivastva Neeraj

More information

A Graph-Based Approach for Mining Closed Large Itemsets

A Graph-Based Approach for Mining Closed Large Itemsets A Graph-Based Approach for Mining Closed Large Itemsets Lee-Wen Huang Dept. of Computer Science and Engineering National Sun Yat-Sen University huanglw@gmail.com Ye-In Chang Dept. of Computer Science and

More information

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

What Is Data Mining? CMPT 354: Database I -- Data Mining 2 Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT

More information

FP-Growth algorithm in Data Compression frequent patterns

FP-Growth algorithm in Data Compression frequent patterns FP-Growth algorithm in Data Compression frequent patterns Mr. Nagesh V Lecturer, Dept. of CSE Atria Institute of Technology,AIKBS Hebbal, Bangalore,Karnataka Email : nagesh.v@gmail.com Abstract-The transmission

More information

Mining Generalized Sequential Patterns using Genetic Programming

Mining Generalized Sequential Patterns using Genetic Programming Mining Generalized Sequential Patterns using Genetic Programming Sandra de Amo Universidade Federal de Uberlândia Faculdade de Computação Uberlândia MG - Brazil deamo@ufu.br Ary dos Santos Rocha Jr. Universidade

More information

Generalized Knowledge Discovery from Relational Databases

Generalized Knowledge Discovery from Relational Databases 148 IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.6, June 2009 Generalized Knowledge Discovery from Relational Databases Yu-Ying Wu, Yen-Liang Chen, and Ray-I Chang Department

More information

Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal

Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal Keshavamurthy B.N., Mitesh Sharma and Durga Toshniwal Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, Uttarkhand, India. bnkeshav123@gmail.com, mitusuec@iitr.ernet.in,

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

Association Rule Mining. Entscheidungsunterstützungssysteme

Association Rule Mining. Entscheidungsunterstützungssysteme Association Rule Mining Entscheidungsunterstützungssysteme Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

More information

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules

A Comparative study of CARM and BBT Algorithm for Generation of Association Rules A Comparative study of CARM and BBT Algorithm for Generation of Association Rules Rashmi V. Mane Research Student, Shivaji University, Kolhapur rvm_tech@unishivaji.ac.in V.R.Ghorpade Principal, D.Y.Patil

More information

Sequential Pattern Mining A Study

Sequential Pattern Mining A Study Sequential Pattern Mining A Study S.Vijayarani Assistant professor Department of computer science Bharathiar University S.Deepa M.Phil Research Scholar Department of Computer Science Bharathiar University

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai

Salah Alghyaline, Jun-Wei Hsieh, and Jim Z. C. Lai EFFICIENTLY MINING FREQUENT ITEMSETS IN TRANSACTIONAL DATABASES This article has been peer reviewed and accepted for publication in JMST but has not yet been copyediting, typesetting, pagination and proofreading

More information

Algorithm for Efficient Multilevel Association Rule Mining

Algorithm for Efficient Multilevel Association Rule Mining Algorithm for Efficient Multilevel Association Rule Mining Pratima Gautam Department of computer Applications MANIT, Bhopal Abstract over the years, a variety of algorithms for finding frequent item sets

More information

Mining Association Rules From Time Series Data Using Hybrid Approaches

Mining Association Rules From Time Series Data Using Hybrid Approaches International Journal Of Computational Engineering Research (ijceronline.com) Vol. Issue. ining Association Rules From Time Series Data Using ybrid Approaches ima Suresh 1, Dr. Kumudha Raimond 2 1 PG Scholar,

More information

Association Rule Mining. Introduction 46. Study core 46

Association Rule Mining. Introduction 46. Study core 46 Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

More information

Value Added Association Rules

Value Added Association Rules Value Added Association Rules T.Y. Lin San Jose State University drlin@sjsu.edu Glossary Association Rule Mining A Association Rule Mining is an exploratory learning task to discover some hidden, dependency

More information

Association Rule Mining from XML Data

Association Rule Mining from XML Data 144 Conference on Data Mining DMIN'06 Association Rule Mining from XML Data Qin Ding and Gnanasekaran Sundarraj Computer Science Program The Pennsylvania State University at Harrisburg Middletown, PA 17057,

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules

Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Graph Based Approach for Finding Frequent Itemsets to Discover Association Rules Manju Department of Computer Engg. CDL Govt. Polytechnic Education Society Nathusari Chopta, Sirsa Abstract The discovery

More information

Improving the Efficiency of Web Usage Mining Using K-Apriori and FP-Growth Algorithm

Improving the Efficiency of Web Usage Mining Using K-Apriori and FP-Growth Algorithm International Journal of Scientific & Engineering Research Volume 4, Issue3, arch-2013 1 Improving the Efficiency of Web Usage ining Using K-Apriori and FP-Growth Algorithm rs.r.kousalya, s.k.suguna, Dr.V.

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

Challenges and Interesting Research Directions in Associative Classification

Challenges and Interesting Research Directions in Associative Classification Challenges and Interesting Research Directions in Associative Classification Fadi Thabtah Department of Management Information Systems Philadelphia University Amman, Jordan Email: FFayez@philadelphia.edu.jo

More information

Implementation of CHUD based on Association Matrix

Implementation of CHUD based on Association Matrix Implementation of CHUD based on Association Matrix Abhijit P. Ingale 1, Kailash Patidar 2, Megha Jain 3 1 apingale83@gmail.com, 2 kailashpatidar123@gmail.com, 3 06meghajain@gmail.com, Sri Satya Sai Institute

More information

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea.

Roadmap DB Sys. Design & Impl. Association rules - outline. Citations. Association rules - idea. Association rules - idea. 15-721 DB Sys. Design & Impl. Association Rules Christos Faloutsos www.cs.cmu.edu/~christos Roadmap 1) Roots: System R and Ingres... 7) Data Analysis - data mining datacubes and OLAP classifiers association

More information

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining

Discovery of Frequent Itemset and Promising Frequent Itemset Using Incremental Association Rule Mining Over Stream Data Mining Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 5, May 2014, pg.923

More information

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10) CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

More information

Association Rule Mining for Multiple Tables With Fuzzy Taxonomic Structures

Association Rule Mining for Multiple Tables With Fuzzy Taxonomic Structures Association Rule Mining for Multiple Tables With Fuzzy Taxonomic Structures Praveen Arora, R. K. Chauhan and Ashwani Kush Abstract Most of the existing data mining algorithms handle databases consisting

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS K. Kavitha 1, Dr.E. Ramaraj 2 1 Assistant Professor, Department of Computer Science,

More information

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS

A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS A NOVEL ALGORITHM FOR MINING CLOSED SEQUENTIAL PATTERNS ABSTRACT V. Purushothama Raju 1 and G.P. Saradhi Varma 2 1 Research Scholar, Dept. of CSE, Acharya Nagarjuna University, Guntur, A.P., India 2 Department

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Sequential Pattern Mining Methods: A Snap Shot

Sequential Pattern Mining Methods: A Snap Shot IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-661, p- ISSN: 2278-8727Volume 1, Issue 4 (Mar. - Apr. 213), PP 12-2 Sequential Pattern Mining Methods: A Snap Shot Niti Desai 1, Amit Ganatra

More information

Fuzzy Cognitive Maps application for Webmining

Fuzzy Cognitive Maps application for Webmining Fuzzy Cognitive Maps application for Webmining Andreas Kakolyris Dept. Computer Science, University of Ioannina Greece, csst9942@otenet.gr George Stylios Dept. of Communications, Informatics and Management,

More information

An Algorithm for Mining Frequent Itemsets from Library Big Data

An Algorithm for Mining Frequent Itemsets from Library Big Data JOURNAL OF SOFTWARE, VOL. 9, NO. 9, SEPTEMBER 2014 2361 An Algorithm for Mining Frequent Itemsets from Library Big Data Xingjian Li lixingjianny@163.com Library, Nanyang Institute of Technology, Nanyang,

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Mining User - Aware Rare Sequential Topic Pattern in Document Streams

Mining User - Aware Rare Sequential Topic Pattern in Document Streams Mining User - Aware Rare Sequential Topic Pattern in Document Streams A.Mary Assistant Professor, Department of Computer Science And Engineering Alpha College Of Engineering, Thirumazhisai, Tamil Nadu,

More information

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Improved Apriori Algorithm was Applied in the System of Elective Courses in Colleges and Universities

More information

Maintaining Frequent Itemsets over High-Speed Data Streams

Maintaining Frequent Itemsets over High-Speed Data Streams Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,

More information

2. Department of Electronic Engineering and Computer Science, Case Western Reserve University

2. Department of Electronic Engineering and Computer Science, Case Western Reserve University Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,

More information

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views

Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Fast Discovery of Sequential Patterns Using Materialized Data Mining Views Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo

More information

A Conflict-Based Confidence Measure for Associative Classification

A Conflict-Based Confidence Measure for Associative Classification A Conflict-Based Confidence Measure for Associative Classification Peerapon Vateekul and Mei-Ling Shyu Department of Electrical and Computer Engineering University of Miami Coral Gables, FL 33124, USA

More information

Mining Vague Association Rules

Mining Vague Association Rules Mining Vague Association Rules An Lu, Yiping Ke, James Cheng, and Wilfred Ng Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong, China {anlu,keyiping,csjames,wilfred}@cse.ust.hk

More information

An Algorithm for Mining Large Sequences in Databases

An Algorithm for Mining Large Sequences in Databases 149 An Algorithm for Mining Large Sequences in Databases Bharat Bhasker, Indian Institute of Management, Lucknow, India, bhasker@iiml.ac.in ABSTRACT Frequent sequence mining is a fundamental and essential

More information

A Comprehensive Survey on Sequential Pattern Mining

A Comprehensive Survey on Sequential Pattern Mining A Comprehensive Survey on Sequential Pattern Mining Irfan Khan 1 Department of computer Application, S.A.T.I. Vidisha, (M.P.), India Anoop Jain 2 Department of computer Application, S.A.T.I. Vidisha, (M.P.),

More information